Category: Database

    Preparing Data for Analysis with Microsoft Excel and Power BI

    These sources offer an extensive exploration of data analysis and PowerBI, focusing on the role of a data analyst and the process of transforming raw data into valuable insights. They cover essential concepts like data sourcing, cleaning, modeling, and visualization, emphasizing the importance of effective communication of findings. The texts also introduce advanced topics such as DAX calculations, performance optimization, and the integration of PowerBI within a larger enterprise data flow, highlighting the potential of data to drive strategic business decisions. Furthermore, they touch upon the application of generative AI in data analysis and provide guidance on preparing for the Microsoft PL-300 certification exam, offering real-world scenarios and career insights through examples of aspiring data analysts.

    Foundations of Data Analysis

Data analysis is a multifaceted process crucial for turning raw data into meaningful insights and informed decisions for businesses and organizations. It involves identifying, cleaning, transforming, and modeling data to discover useful information. Data analysts then use various techniques to explore, interpret, and draw meaningful conclusions from the processed data.

    The Importance of Data Analysis

Data is an essential business component, but raw data is only meaningful after proper interpretation and analysis. **Data analysts are crucial because they help organizations make sense of the vast amounts of collected data, turning it into insights that inform decisions**. This analytical work helps businesses identify growth opportunities, improve operations, gain a competitive advantage, identify the cause of problems, uncover trends, and make decisions that can improve business performance. Ultimately, data analysis drives strategic decision-making and can significantly impact an organization’s success.

    The Data Analysis Process

The data analysis process typically involves several interconnected stages:

    • Identifying the analysis purpose or defining the business problem: This is the foundational step, determining what you aim to achieve or the questions you need to answer with the analysis. Gathering the right data is fundamental to ensure the analysis is relevant and useful, and understanding the purpose informs the type and scope of data needed. Consulting with stakeholders is key to determining the purpose.
    • Data Collection and Preparation: Data is gathered from various sources. This raw data is often unorganized and may have missing values or inconsistencies. Data preparation involves cleaning, standardizing, organizing, and transforming the data into a usable format for analysis. The Extract, Transform, Load (ETL) process is a common method for processing data: extracting data from sources, transforming it to make it consistent and ready for analysis, and loading it to a suitable destination. Data wrangling is another term for this process of processing, cleaning, and transforming data.
    • Data Processing and Modeling: Processing transforms raw data into a structured, usable form. Data modeling organizes data to make sense of the information and generate insights. This can involve understanding basic concepts, using tools like DAX to create calculations, and optimizing model performance. Common data schemas include star and snowflake schemas, which organize data into fact and dimension tables.
    • Data Analysis, Visualization, and Interpretation: This stage involves exploring processed data and generating insights. Data analysis uses various techniques to explore, interpret, and draw meaningful conclusions from the processed data. Analytical techniques include statistical analysis, hypothesis testing, and identifying patterns, trends, and relationships. Data visualization is a powerful tool used to communicate these insights. Visualizations (like charts and graphs) transform complex data into understandable representations, helping to spot patterns, anomalies, and trends at a glance. Interpretation involves understanding what the patterns and trends reveal.
    • Reporting and Sharing Data Insights: Insights are communicated to stakeholders through reports and dashboards. Dashboards consolidate critical information visually on one screen to achieve specific objectives. Sharing reports requires considering factors like accessibility, visual appeal, and security. Effective communication and storytelling are essential to convey findings responsibly and ethically.
    • Implementing Insights and Recommendations: Informed decisions are made based on the analyzed data, guiding actions and adjustments within the business to achieve objectives.
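
    The ETL stage described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than PowerBI code, and the source records, field names, and cleaning rules are all invented for the example.

```python
# Minimal ETL sketch: extract raw records, transform them, load the result.
# All data and field names here are invented for illustration.

raw_rows = [  # "extract": pretend these came from a CSV or database
    {"region": " north ", "sales": "1200"},
    {"region": "South",   "sales": "950"},
    {"region": " north ", "sales": None},   # missing value
]

def transform(rows):
    """Clean and standardize: trim/case-fold text, cast numbers, drop nulls."""
    cleaned = []
    for row in rows:
        if row["sales"] is None:            # handle missing values
            continue
        cleaned.append({
            "region": row["region"].strip().title(),  # standardize format
            "sales": int(row["sales"]),               # fix the data type
        })
    return cleaned

warehouse = []                              # "load": destination table
warehouse.extend(transform(raw_rows))
print(warehouse)
# → [{'region': 'North', 'sales': 1200}, {'region': 'South', 'sales': 950}]
```

    In Power Query Editor, each of these cleaning operations would instead appear as an entry in the Applied Steps list rather than as code.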

This data flow process – collection, processing, analysis, and decision-making – is a fundamental concept in business.

    Roles in Data Analysis

The data analysis process involves various roles that collaborate to achieve data-driven success:

    • Data Engineer: Designs and constructs data infrastructure, including pipelines for cleaning, pre-processing, and transforming raw data for analysts and scientists.
    • Data Analyst: Examines data sets to identify trends, patterns, and insights. They use tools to visualize and present data, making it digestible for stakeholders, and work closely with teams to align analysis with business goals. The data analyst is often a central figure in the process.
    • Data Scientist: Dives deeper into data, creating predictive models with machine learning and statistical techniques to identify hidden patterns and optimize decisions. They often collaborate with data analysts.
    • Database Administrator (DBA): Maintains the performance and security of databases, ensuring data is stored efficiently and remains accessible.
    • Data Architect: Creates the blueprint for data management systems, designing data models and strategies for storage, integration, and retrieval.
    • Business Intelligence (BI) Analyst: Transforms data into actionable insights, focusing on Key Performance Indicators (KPIs), using BI tools to visualize and present data to stakeholders, and collaborating with business leaders to understand their goals.

    These roles are essential for providing organizations with the information they need for informed, data-driven decisions.

    Skills for Data Analysts

To succeed, data analysts require a mix of technical and non-technical skills:

    • Technical Skills: Proficiency with tools like Microsoft Excel and Microsoft PowerBI. Experience with programming languages such as R and Python supports analysis and visualization, and an understanding of SQL (Structured Query Language) is vital for interacting with databases. Key technical activities include data wrangling (cleaning and transforming data), data modeling (organizing data for analysis), creating calculations with languages like DAX, data visualization (creating charts and reports), and using statistical functions. Other important technical skills include data profiling, managing data storage modes, creating aggregations, joining and merging data, grouping and binning data, and performance optimization.
    • Non-Technical (Soft) Skills: These are crucial for connecting with and influencing stakeholders. Essential skills include **effective communication** to present complex information clearly and concisely to various audiences, diplomacy for navigating disagreements and maintaining relationships, **understanding end-user needs** to tailor analysis and provide relevant insights, and acting as a technical interpreter who translates complex concepts for non-technical stakeholders. **Strategic thinking, awareness of impact, and understanding the business context** are also important, as is the ability to use data to tell a story.

    By developing these technical and non-technical skills, data analysts can collaborate effectively, create actionable insights, inspire change, and make lasting impacts.

    Tools and Techniques Used in Data Analysis

Data analysts utilize a range of tools and techniques:

    • Software and Tools: Microsoft Excel is used for designing and managing spreadsheets and preparing data. **Microsoft PowerBI** is a powerful tool for processing, analyzing, and sharing data, known for its user-friendly interface, rich visualizations, and advanced analytics capabilities. The PowerBI workflow includes PowerBI Desktop, PowerBI Service, and PowerBI Apps. Power Query Editor within PowerBI is used for data preparation, cleaning, transformation, and ETL tasks. SQL Server and other databases are used for data storage, and programming languages like R and Python are used for data analysis and visualization.
    • Techniques:
    • ETL (Extract, Transform, Load): A fundamental process for preparing data.
    • Data Wrangling/Cleaning/Transformation: Making raw data consistent and usable.
    • Data Modeling: Organizing data into structured formats like star or snowflake schemas.
    • DAX (Data Analysis Expressions): A formula language used to create custom calculations and measures within data models.
    • Calculations and Statistical Functions: Performing mathematical operations and applying functions like average, median, count, min, and max to reveal insights.
    • Data Visualization: Creating graphical representations of data such as charts, graphs, scatter plots, bubble charts, dot plots, and tables to make complex information understandable. Interactive features like filtering, sorting, slicers, and bookmarks enhance visualizations.
    • Data Profiling: Examining data sets to evaluate accuracy, completeness, and statistical distribution. Tools analyze column quality, distribution, and profile statistics.
    • Grouping and Binning: Organizing data points into chosen groups or equal-sized segments.
    • Clustering: Identifying similarities in data attributes to divide data into subsets or clusters.
    • Time Series Analysis: Analyzing data in chronological order to identify trends.
    • Performance Optimization: Modifying data models and reports to improve speed and efficiency, especially with large data volumes. Techniques include filtering, sorting, indexing, aggregation, and choosing appropriate storage modes. The Performance Analyzer tool helps diagnose issues.
    • Data Storage and Management: Understanding different data types (structured, unstructured, semi-structured) and appropriate storage solutions, as well as concepts like normalization and indexing in databases.
    • Connecting to Data Sources: Using methods like Import mode or Direct Query mode to bring data into tools like PowerBI.
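
    Some of the techniques above, such as statistical functions and binning, can be illustrated with Python's standard library. The order values and the bin width below are invented for the example.

```python
import statistics
from collections import defaultdict

# Invented order amounts for illustration.
order_values = [12, 18, 25, 31, 44, 47, 52, 58, 63, 90]

# Statistical functions like those applied in analysis tools:
summary = {
    "count": len(order_values),
    "min": min(order_values),
    "max": max(order_values),
    "average": statistics.mean(order_values),
    "median": statistics.median(order_values),
}

# Binning: place each value into an equal-sized segment (width 25 here).
bins = defaultdict(list)
for value in order_values:
    low = (value // 25) * 25          # lower edge of the value's bin
    bins[f"{low}-{low + 24}"].append(value)

print(summary["median"])   # → 45.5
print(dict(bins))
# → {'0-24': [12, 18], '25-49': [25, 31, 44, 47],
#    '50-74': [52, 58, 63], '75-99': [90]}
```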

These tools and techniques empower data analysts to extract insights, support business intelligence, and facilitate data-driven decision-making. The sources frequently use the example of Adventure Works, a fictitious bicycle company, to illustrate how data analysis is applied in real-world business scenarios.

    Mastering Microsoft PowerBI for Business Intelligence

    Microsoft PowerBI is an interactive data visualization product and a comprehensive business analytics solution. It is considered an essential resource for many organizations across various industries.

    Importance in Business

PowerBI plays a crucial role in helping businesses make sense of the vast amounts of collected data, transforming it into actionable insights that inform decisions. It enables organizations to harness the full potential of data to uncover insights, identify patterns and trends, and drive strategic decision-making. PowerBI supports data-driven decision-making and is vital for providing organizations with the information they need for informed decisions. For companies like Adventure Works, PowerBI is used to extract insights from large amounts of data.

    Components and Workflow

Microsoft PowerBI has multiple components that work together. The main components are PowerBI Desktop, PowerBI Service, and PowerBI Apps. Other related components include PowerBI Mobile, PowerBI Report Server, and PowerBI Embedded.

    • PowerBI Desktop is a Windows-based application used by data analysts or report designers to clean, transform, and load data, create a data model, design reports, and publish them.
    • PowerBI Service is the cloud-based service (SaaS) part of PowerBI, used by report users and administrators. It offers advantages like accessibility, scalability, collaboration tools, and data backup and recovery features.
    • PowerBI Apps are the native mobile applications available on iOS, Android, and Windows. They allow access to insights on the go.

    A typical workflow in PowerBI often starts with the creation of a report in PowerBI Desktop. Report designers and developers are primarily responsible for this task. When the report is ready, you publish it to the PowerBI service, where administrators can assign permissions and specific users can consume the report. You can also share reports with colleagues, your whole organization, or external stakeholders who need to draw insights. Insights are also communicated through dashboards, which consolidate critical information visually. PowerBI Service and PowerBI mobile can be used to view dashboards.

    Key Capabilities and Features

    PowerBI offers a wide range of features and capabilities for data analysis and business intelligence:

    • Data Connection and Preparation:
    • PowerBI supports a wide range of data sources, including traditional databases, Excel spreadsheets, cloud-based services, on-premises databases, external enterprise applications, and APIs. PowerBI connectors are used to access these sources.
    • Data preparation is crucial for making raw data usable. This involves cleaning, standardizing, organizing, and transforming data.
    • The Extract, Transform, Load (ETL) process is fundamental for preparing data in PowerBI. Power Query Editor in PowerBI is the tool used for data preparation, cleaning, transformation, and ETL tasks. Data wrangling is another term for processing, cleaning, and transforming data.
    • Techniques include data profiling, joining and merging data, and grouping and binning data to classify or segment data points.
    • Data Modeling:
    • Data modeling is creating visual representations of your data in PowerBI to organize it and make sense of the information. It involves understanding how different data elements interact and outlining the rules that influence these interactions.
    • PowerBI allows you to identify or create relationships between data elements. You can define relationships between tables and assign data types.
    • Common data schemas include star and snowflake schemas, which organize data into fact and dimension tables.
    • DAX (Data Analysis Expressions) is a powerful language used to create custom calculations, calculated measures, columns, and tables within data models. DAX is fundamental to data analysis in PowerBI.
    • Performance Optimization is important, especially with large data volumes. Techniques include modifying models, reports, queries, filtering, sorting, indexing, aggregation, and choosing appropriate storage modes. The Performance Analyzer tool helps diagnose issues.
    • Aggregations in PowerBI enable diving deeper into data without compromising speed and performance. They involve summarizing or consolidating large volumes of data into manageable summary tables.
    • Understanding different Data Storage Modes (Import, Direct Query, Dual, Composite) is vital as they determine where data is stored and how queries are sent. Import mode stores data in PowerBI’s in-memory storage, Direct Query keeps data in the source, and Dual mode can act as either. Composite mode allows combining different storage modes.
    • Creating Hierarchies (date, product, geographical) is a significant feature allowing analysis at different levels of granularity within the same visual using drill down.
    • Analysis Techniques:
    • PowerBI empowers you to transform raw data into meaningful insights through various advanced tools and functionalities.
    • Calculations are the foundation of data analysis in PowerBI and are created using DAX. Common calculations include aggregations and statistical functions like average, median, count, min, and max.
    • PowerBI offers analytics capabilities to add significant value to visualizations. This includes using statistical summary tools.
    • Identifying patterns, trends, and anomalies is crucial. Scatter charts can help identify outliers.
    • Time Series Analysis involves analyzing data in chronological order to identify trends. PowerBI supports time series forecasting to predict future trends.
    • Clustering identifies similarities in data attributes to divide data into subsets.
    • The Analyze feature automatically detects relationships and connections, providing automated insights. You can right-click on a data point to analyze fluctuations like increases or decreases.
    • PowerBI leverages AI capabilities and machine learning algorithms to provide insights. This includes AI visuals like Key Influencers and Decomposition Trees for understanding drivers behind outcomes, sentiment analysis, and key phrase extraction.
    • The Q&A feature is a natural language processing tool allowing users to ask questions about data in plain English and get answers as visuals. It learns and adapts over time.
    • Quick Insights automatically searches datasets to discover and visualize potential patterns, trends, and outliers using machine learning and statistical functions.
    • Dynamic reports can facilitate using What-If parameters for interactive adjustments and scenario analysis.
    • Metrics and Scorecards are critical for tracking progress towards specific objectives and providing a comprehensive view of performance.
    • Visualization:
    • Data visualization is a powerful tool for communicating insights. Visualizations transform complex data into understandable representations, helping to spot patterns, anomalies, and trends.
    • PowerBI offers a variety of built-in visualization types, such as bar charts, maps, tables, cards, multi-row cards, gauges, KPI visuals, scatter plots, bubble charts, and dot plots. Heat maps, tree maps, and 3D visualizations are also discussed for handling high-density data. Choropleth and shape maps are common map visuals.
    • Custom visuals can be imported from the PowerBI marketplace or created using Python or R.
    • Design principles are important for creating effective visualizations. This includes considering color theory, appropriate positioning and scale, maintaining cohesion and consistency, and avoiding clutter.
    • Accessibility is crucial in report design, including features like alt text, sufficient color contrast, keyboard navigation, and compatibility with screen readers. PowerBI has built-in tools to support this.
    • Visualizations can be interactive, allowing users to drill down, filter, and sort data.
    • Visual interactions determine how selecting data in one visual affects others. The primary types are filter (filters other visuals), highlight (dims non-selected data), and none (no interaction).
    • Slicers help users drill down to deeper insights and can be synchronized across report pages to improve user experience.
    • The Selection Pane helps manage report elements, allowing naming, grouping, and layering visuals. Bookmarks can also be used to create a smooth narrative.
    • PowerBI allows optimizing report layouts for mobile devices to ensure proper display on smaller screens.
    • Sharing and Collaboration:
    • Insights are communicated through reports and dashboards. Publishing reports to PowerBI Service makes them accessible and collaborative.
    • PowerBI Workspaces are specialized areas that hold assets like reports, dashboards, and datasets. They help organize assets, provide security, enable collaboration, and allow quick updates. There are personal and shared workspaces.
    • Workspace roles (viewer, contributor, member, admin) determine how individuals interact with content. Permissions can be managed.
    • You can share Workspace assets as an app, which can have multiple audience groups with tailored access.
    • Data security is important for safeguarding sensitive data. PowerBI offers authentication tools, sharing links with controlled permissions, sensitivity labels, and data permissions.
    • Row-Level Security (RLS) controls which individuals can view data based on predefined roles and rules, enhancing security and user experience.
    • You can promote and certify datasets to establish trust and standardize data quality, helping users find the most accurate data.
    • Data Gateways establish a secure connection between PowerBI cloud services and on-premises data sources. Types include on-premises data gateway (standard mode), on-premises data gateway personal mode, and Azure virtual network data gateway. They help sync data and keep datasets up to date via schedule refresh.
    • Subscriptions and Alerts provide automated delivery of data snapshots (emails/notifications) and notifications when specific conditions are met. They enhance user engagement and support real-time decision-making.
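
    Two of the modeling ideas above, star schemas (a fact table keyed to dimension tables) and aggregations, can be sketched outside PowerBI. The tables, keys, and categories below are invented for the example; in PowerBI the equivalent result would come from a model relationship plus a DAX SUM measure sliced by a category column.

```python
# A tiny star schema: one fact table referencing a product dimension.
# All rows and keys are invented for illustration.
dim_product = {
    1: {"name": "Road Bike",     "category": "Bikes"},
    2: {"name": "Mountain Bike", "category": "Bikes"},
    3: {"name": "Helmet",        "category": "Accessories"},
}

fact_sales = [
    {"product_id": 1, "amount": 1200},
    {"product_id": 2, "amount": 1500},
    {"product_id": 3, "amount": 80},
    {"product_id": 1, "amount": 1100},
]

def total_sales_by_category(facts, products):
    """Aggregate the fact table by a dimension attribute, producing the
    kind of summary table an aggregation feature would precompute."""
    totals = {}
    for row in facts:
        category = products[row["product_id"]]["category"]
        totals[category] = totals.get(category, 0) + row["amount"]
    return totals

print(total_sales_by_category(fact_sales, dim_product))
# → {'Bikes': 3800, 'Accessories': 80}
```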

Overall, PowerBI transforms raw data into actionable intelligence, acting as a toolkit with mapping techniques and navigation support to help users cut through data noise and interpret patterns. It is a central tool in the data flow process within a business, moving from collection and processing to analysis and decision-making.

    PowerBI Data Transformation Explained

    Data transformation is a fundamental process in Microsoft PowerBI, essential for preparing raw data for analysis and generating meaningful insights. It involves altering the structure, format, or values of data to make it suitable for analysis. This often includes cleaning, structuring, and enriching the data.

    Why is Data Transformation Necessary?

    Raw data, as collected from various sources, is often untidy, incomplete, inconsistent, scattered across different systems, or may have missing values or duplicate entries. Working with such data can lead to inaccurate or misleading analysis results and, consequently, poor business decisions. Data transformation addresses these issues by ensuring the data used for analysis is accurate, clean, consistent, and reliable. It standardizes data across multiple sources and organizes it to be more understandable.

    Where Transformation Happens in PowerBI

    Within PowerBI, data transformation is primarily handled by Power Query Editor. Power Query is a powerful ETL (Extract, Transform, Load) tool integrated into PowerBI Desktop. It provides a graphical user interface (GUI) for connecting to various data sources, cleaning data, and performing transformations with ease.

    Key Data Transformation Techniques and Capabilities

    Power Query Editor offers a range of tools and features for transforming data:

    • Data Cleaning: This involves identifying and correcting errors and inconsistencies. Techniques include removing duplicate entries, handling or filling in missing values (nulls), fixing incorrect data types, and standardizing formats (e.g., ensuring consistent spelling or capitalization). Filtering data is also a key cleaning method.
    • Structuring and Shaping Data: This prepares data for analysis. Operations include removing unwanted columns or rows, splitting or merging columns (e.g., combining first and last names into a full name), changing data types (e.g., text to numeric, date, or decimal), and sorting data. Promoting header rows is also a common shaping task. Grouping data allows manually dividing data points, while binning automatically separates data points into segments based on number or size.
    • Combining Data: It is common to need to combine data from multiple sources.
    • Append: Adds rows from one table to another. This is useful for consolidating data that has the same columns but spans across different files or databases (e.g., monthly sales files).
    • Merge: Consolidates data from multiple sources into a single table based on matching criteria or key columns, similar to joining tables in a database. This is used when data needs to be combined horizontally based on relationships between tables.
    • Reshaping Data Structures:
    • Unpivot: Transforms data from a “wide” format (many columns) to a “narrow” format (fewer columns), often converting column headers into row values. This is useful for data normalization and making comparisons easier.
    • Pivot: Transforms data from a “narrow” format to a “wide” format, converting rows into columns based on specific values.
    • Adding Calculated Columns: Power Query allows adding new columns based on calculations performed on existing columns, such as calculating total price by multiplying quantity and unit price. DAX is used for calculations within the data model, but calculated columns can be created during the transformation stage in Power Query using its own formula language or features.
    • Query Management: Power Query’s Applied Steps list is a critical feature, visually representing every transformation applied to a query. This list can be reviewed, modified, deleted, or reordered, ensuring transparency and allowing for easy undo or redo functionality. Referencing a query creates a new query based on an existing one, inheriting its steps. Changes to the original query automatically update the referenced query, which is useful for maintaining complex transformation workflows. Duplicating a query creates an independent copy that can be modified without affecting the original.
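
    The difference between Append and Merge can be shown with plain Python lists of rows. The monthly tables and the key column below are invented for the example; in Power Query the same operations are the Append Queries and Merge Queries commands.

```python
# Append: stack rows from tables that share the same columns,
# e.g. monthly sales files. All data here is invented for illustration.
jan_sales = [{"order": 1, "customer_id": 10, "amount": 100}]
feb_sales = [{"order": 2, "customer_id": 11, "amount": 250}]
all_sales = jan_sales + feb_sales          # like Append Queries

# Merge: combine tables horizontally on a matching key column,
# similar to a database join.
customers = {10: "Ada", 11: "Grace"}       # key column -> attribute

merged = [
    {**row, "customer_name": customers[row["customer_id"]]}
    for row in all_sales
]
print(merged)
# → [{'order': 1, 'customer_id': 10, 'amount': 100, 'customer_name': 'Ada'},
#    {'order': 2, 'customer_id': 11, 'amount': 250, 'customer_name': 'Grace'}]
```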

    Relationship with Data Loading and Profiling

    Transformation is typically performed after data extraction and before data loading into the PowerBI data model. The loading process brings the transformed data into PowerBI for analysis and visualization.

    Before transforming or loading data, it is essential to inspect and profile the data. Power Query Editor provides tools like Column Quality, Column Distribution, and Column Profile to evaluate the data’s accuracy, completeness, validity, distribution, and identify anomalies or outliers. This profiling step helps identify where transformations are needed.
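
    A rough equivalent of this profiling step can be written in Python. The sample column below is invented, and the real Column Quality and Column Profile tools report far more than this sketch.

```python
# Approximate column profiling: completeness, distinctness, and range.
# The sample values are invented for illustration.
column = [34, 51, None, 34, 29, None, 73]

def profile(values):
    present = [v for v in values if v is not None]
    return {
        "rows": len(values),
        "nulls": values.count(None),     # completeness check
        "distinct": len(set(present)),   # hints at the distribution
        "min": min(present),
        "max": max(present),             # helps spot outliers
    }

print(profile(column))
# → {'rows': 7, 'nulls': 2, 'distinct': 4, 'min': 29, 'max': 73}
```

    A high null count or an implausible min/max in output like this is exactly the signal that a cleaning transformation is needed before loading.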

    Benefits of Data Transformation

    Effective data transformation is crucial for generating accurate reports and gaining valuable insights. It improves data quality and consistency, enhances performance by preparing data efficiently, simplifies data management, and helps organizations make informed decisions based on reliable information.

    PowerBI Data Visualization Fundamentals

    Data Visualization in PowerBI

Data visualization is a graphical representation of data. In Microsoft PowerBI, it is much more than a simple graphical depiction; it involves converting raw data into a visual format to help identify patterns, trends, and insights that might not be apparent in text-based data. Visualizations enable you to communicate complex data and insights in a simple, appealing way by presenting data graphically. This makes it easier for stakeholders to grasp key insights, trends, and patterns that may be difficult to identify from raw data or tables.

    Why is Data Visualization Important?

    Data visualization is crucial for generating accurate reports and gaining valuable insights. It enhances business intelligence, particularly in complex and dynamic business environments. Key benefits include:

    • Simplifying Complexity: Visualizations transform large, intricate datasets into intuitive, easy-to-understand graphical representations.
    • Revealing Patterns and Trends: Data visualizations can reveal patterns, trends, and correlations hidden in raw data. For example, a bar chart could visualize sales data demonstrating geographic regions where sales are highest.
    • Making Data Accessible: Visualizations make data more accessible to a broader audience, as most stakeholders can understand a well-designed chart or graph. This encourages engagement with data and contributes to data-driven decision-making.
    • Powerful Communication Tool: Visualizations are a powerful communication tool that can tell a compelling story with data, making insights more memorable and persuasive.
    • Driving Data-Driven Decisions: By providing clear, interactive displays, visualizations act like a navigation system through complex data, helping businesses make informed decisions based on reliable information.
    • Real-time Analysis: Visualizations can enable real-time data analysis. For example, as sales figures are updated, visualizations in PowerBI can update automatically, providing up-to-date insights.

    Where Visualization Happens in PowerBI

    Visualizations are primarily created in the Report View of PowerBI Desktop. This is the primary canvas where you design and create your visualizations, adding and arranging different visual elements. Reports can have multiple pages organized using tabs at the bottom of the window. Once created in reports, visualizations can also be pinned to Dashboards in the PowerBI service, which provide a consolidated, one-page summary of the most important metrics or key performance indicators (KPIs).

    Workflow for Creating Visualizations

    Creating visualizations in PowerBI typically follows a workflow:

    1. Connecting to data sources.
    2. Using Power Query Editor to extract, transform, and load the data.
    3. Loading the refined data into PowerBI’s data model.
    4. Representing this processed data in visualizations.

    Key Components and Concepts

    Several key components and concepts are involved in creating and using visualizations in PowerBI:

    • Visualizations Pane: Located on the right side of the window, this pane contains a gallery of visual elements you can add to your report canvas. You add visuals by clicking or dragging them onto the report view.
    • Fields Pane (or Data Pane): Also on the right side, this pane displays the data tables and fields available for your report. You use this pane to populate your visualizations with data by dragging fields onto the visual or specific field wells.
    • Field Wells: These are sections within the visualizations pane where you drag data fields to define how they are used in the visual, such as axes, legend, values, or tooltips.
    • Axes (X and Y): These represent the data points you want to compare or analyze.
    • Categorical Axes: Used to represent discrete, non-numeric data points (categories). PowerBI automatically arranges data points in the order they appear in the dataset or allows sorting. Common in bar charts and column charts.
    • Continuous Axes: Designed to represent numerical data points with an inherent order along a continuous scale. Ideal for visualizing quantitative information to identify trends and patterns. Common in line charts, area charts, and scatter plots.
    • Legend: Controls the color coding or grouping of elements in your chart, helping differentiate between different categories or subgroups. It makes it easier to understand which color represents which item.
    • Tooltips: Display data or extra information when you hover over the data points of a chart. Tooltips can be customized to include additional fields.
    • Formatting: PowerBI offers extensive options to format the appearance and feel of visualizations to improve their aesthetic appeal, readability, and align with branding. This includes options for colors, fonts, grid lines, titles, backgrounds, and more. Formatting options are found in the ‘Format visual’ tab of the visualizations pane.

    Common Visualization Types

    PowerBI offers a wide variety of visualization types:

    • Charts:
    • Column Charts: Compare different categories in a vertical orientation, useful for demonstrating changes over time or comparisons, generally with fewer than 10 categories.
    • Bar Charts: Similar to column charts but horizontal, useful for comparing larger quantities or categories with lengthy labels.
    • Line Charts: Best suited for showing trends over time by connecting individual numeric data points, particularly effective for large datasets.
    • Area Charts: Similar to line charts but with the area beneath the line filled, helping compare quantities and show part-to-whole relationships over time or across categories. Stacked area charts emphasize the total across several categories.
    • Pie Charts: Circular graphics divided into slices to illustrate numerical proportions of a whole. Each slice represents a category, and its size is proportional to its quantity. Less effective with too many categories.
    • Donut Charts: Similar to pie charts but with a blank center. Ideal for showing a dataset as a proportion of a whole.
    • Scatter Charts: Use dots to represent values for two numeric variables, plotting them along two axes to illustrate how one factor is affected by another, representing correlations and helping identify anomalies or outliers.
    • Bubble Charts: A variation of scatter plots where a third variable is represented by the size of the bubble. They can depict multi-dimensional data in a single view.
    • Funnel Charts: Present sequential or staged data, such as a sales conversion process, helping identify trends and bottlenecks.
    • Combo Charts (Line and Column): Combine line and column charts to display complex and related data points seamlessly.
    • Tree Maps: Use nested rectangles to display hierarchical or proportional data. Useful for visualizing larger datasets without becoming overly complex compared to pie charts.
    • Tables: Display raw, detailed data and exact numbers in columns and rows, providing a comprehensive numerical view. Useful for examining exact figures and making precise comparisons.
    • Maps: Visualize geographical data.
    • Shape Maps: Color-code geographical regions based on data values to reveal insights.
    • Choropleth Maps (Filled Maps): Similar to shape maps, shading or patterning geographical areas (countries, states, regions) to illustrate quantitative data values.
    • Heat Maps: Use color gradients to represent the density and distribution of data across geographical regions or grids. Not a core PowerBI visual but can be imported or created with Python.
    • ArcGIS Maps: Rich in map visualization features.
    • KPI Visuals: Specifically designed to display key performance indicators. Include Cards (single value), Multirow Cards (multiple values per row), Gauges (progress toward a target), and the KPI visual (performance against target with trend line).
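
    The KPI visuals above are typically driven by DAX measures. As a hedged illustration (the table and column names `Sales[Amount]` and `Targets[Amount]` are hypothetical, not from the source), a value/target measure pair that could feed a gauge or KPI visual might look like:

    ```dax
    -- Hypothetical model: a Sales fact table and a Targets table
    Total Sales = SUM ( Sales[Amount] )

    Sales Target = SUM ( Targets[Amount] )

    -- Ratio measure for a gauge or KPI visual;
    -- DIVIDE returns BLANK instead of an error when the target is zero
    Sales vs Target % = DIVIDE ( [Total Sales], [Sales Target] )
    ```

    In a gauge, [Total Sales] would be the value and [Sales Target] the target; the KPI visual can additionally take a date field for its trend axis.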

    Advanced Visualization Techniques

    PowerBI offers advanced capabilities for visualizing complex data:

    • Handling High-Density Data: Techniques include using aggregations and summarization, drill through and drill down, color coding (like heat maps), and using 3D and custom visualizations.
    • Hierarchies and Drill Down/Through: Organizing data into hierarchies (like Date, Product, Geography) allows users to explore data from a general overview level down to specific details within the same visualization. Drill down navigates through these hierarchy levels. Drill through lets users jump from a data point in a summary visual to a dedicated report page showing details filtered to that selection.
    • Custom Visualizations: User-defined visual elements for specific requirements. They can be imported from the PowerBI marketplace (AppSource).
    • Python/R Visuals: Integration with Python and R programming languages allows creating dynamic and sophisticated custom visualizations. This requires specialist expertise and has limitations on data size.
    • Key Influencers Visual: An advanced analytics feature that uses AI algorithms to identify key contributors behind increases or decreases in a metric, such as sales.
    • Decomposition Tree: Another specialized analytics tool to navigate through data hierarchy levels to understand how a final value is influenced by different categories.
    • Clustering: Using algorithms (like in scatter plots) to group data points based on patterns and identify hidden relationships.
    • Interactions: Visualizations can be configured to interact with one another.
    • Filter: Selecting a data point in one visual filters the data displayed in others.
    • Highlight: Selecting a data point highlights related data in other visuals while dimming the rest, maintaining context.
    • None: Disables interaction, useful when visuals should function independently.
    • Slicers: Visual filters that allow viewers to segment and filter the data in real-time.

    Data Visualization and Data Storytelling

    Data visualization is a crucial part of data storytelling. Data storytelling involves leveraging narrative, data, and visualizations to communicate insights effectively. Visualizations act as a bridge between raw data and actionable insights, supporting the narrative and making complex information accessible and engaging for the audience. By choosing appropriate and effective data visualizations, analysts can allow viewers to quickly grasp information and identify trends, patterns, and insights.

    Accessibility

    When designing reports and visualizations, it is important to consider accessibility. This means creating reports that can be easily used and understood by all individuals, including those with disabilities. Features supporting accessibility in PowerBI include providing alt text for visuals, ensuring sufficient color contrast, enabling keyboard navigation (Tab Order), using markers on lines, and ensuring compatibility with screen readers. High-contrast themes are also available.

    Essential Concepts in Data Security

    Data security is considered paramount in our digital age, like safeguarding your most valuable possessions in a vault with a strong lock. Data, being the lifeblood of modern organizations, is subject to a range of threats, including cyber attacks, breaches, and unauthorized access. Ensuring the security of this “digital gold mine” is not just a choice, but a necessity. In the world of data visualization, ensuring data security is of utmost importance. This includes protecting sensitive information and maintaining data integrity. Incorporating robust security measures is crucial throughout the visualization process.

    Why Data Security Matters

    Data security is crucial for generating accurate reports and gaining valuable insights. It enhances business intelligence, particularly in complex and dynamic business environments. Working with data often involves handling sensitive information, such as customer data, financial records, or proprietary business insights. Ensuring the security of this data is essential to:

    • Maintain trust.
    • Comply with regulations.
    • Protect against unauthorized access or data breaches.
    • Safeguard the company’s reputation and success.
    • Prevent potential harm to the company and its stakeholders.

    Mishandling sensitive data can lead to serious consequences, including financial loss, legal troubles, brand damage, and competitive disadvantage. It can also damage the relationship between an organization and its workforce if employee data is leaked.

    Identifying Sensitive Data

    Sensitive data contains important information about a business or its stakeholders that, if mishandled, could cause harm or misuse. A simple rule is: if it’s information that could damage the company’s reputation, finances, or stakeholder privacy, it’s sensitive data. Examples include:

    • Customer details.
    • Financial records (including profit margins).
    • Employee information.
    • Proprietary business knowledge or insights.
    • Product designs.
    • Vendor contracts.

    Any information that offers intimate knowledge not meant for circulation can be classified as sensitive.

    Measures for Safeguarding Data

    PowerBI offers various measures to ensure data security:

    • Access Control & Authentication: Controlling access to data is vital to ensure only authorized individuals can view or interact with specific data sets. Before a user can access a report, they need to prove who they are through an authentication system. Once authenticated, the system determines what data they are permitted to access. This helps protect organizations like Adventure Works from internal leaks and unauthorized external breaches. PowerBI allows defining roles for users with specific permissions tied to them, ensuring data is distributed on a need-to-know basis. Regularly reviewing and updating these roles is essential. Access logs and audit trails can also track and monitor data usage.
    • Row-Level Security (RLS): RLS is a powerful data governance capability that controls which individuals can view data based on predefined roles and rules. It allows restricting data visibility so each user can only access data they are authorized to view, ensuring data integrity and confidentiality.
    • Benefits: Precise control over data visibility, prevention of accidental data leaks, safeguarding sensitive data, easier handling of complex data access needs as data scales, assistance with compliance and auditing, and a reduced risk of data breaches.
    • Types:
    • Static RLS: Uses predefined rules based on user roles and is suitable for a fixed set of users or a simple logic. You configure this in PowerBI Desktop by managing roles, adding filters using DAX expressions, testing, and then assigning users to these roles in the PowerBI service.
    • Dynamic RLS: Adjusts real-time data access based on user roles and attributes stored in the data itself, using DAX expressions like USERPRINCIPALNAME() to filter data dynamically. This is ideal when user access is based on varying criteria, such as region-specific data access.
    • Considerations: Both types require thorough testing to ensure accurate and secure visibility. Dynamic RLS can potentially slow down data retrieval and requires regular maintenance.
    • Data Anonymization and Masking: These techniques protect privacy by removing personally identifiable information or replacing it with pseudonyms. Techniques include generalization, suppression, or noise addition. Data masking specifically allows working with obscured versions of sensitive data, balancing transparency and security, for example, viewing only the last four digits of a credit card number. These are used for analysis and visualization while preserving privacy, especially when sharing data with external partners.
    • Data Integrity: Maintaining data integrity is crucial to ensure the accuracy and reliability of the visualized information. Key aspects include data validation, error detection, and consistency checks. Implementing data validation rules and performing regular audits helps identify and rectify anomalies. Encryption techniques can also prevent unauthorized modifications and tampering.
    • Secure Data Transmission: When transferring data or sharing visualizations, it is essential to prioritize secure data transmission using encrypted connections such as HTTPS or SSL/TLS. These protocols ensure data is encrypted during transit, making it difficult for unauthorized individuals to intercept or manipulate it. Other secure methods include using VPNs, two-factor authentication (2FA), enterprise cloud storage solutions, secure protocols like SFTP, and secure cloud-based platforms for distribution. When embedding reports externally, the method must be chosen carefully based on data sensitivity: secure embed options restrict access to authenticated users, whereas Publish to web makes a report publicly accessible and is unsuitable for sensitive data.
    • Data Sensitivity Labels: PowerBI’s data sensitivity labels allow categorizing data to safeguard company reputation and trust. They act like digital tags indicating the required level of confidentiality. Applying these labels properly ensures data protection, especially when sharing or exporting. The sources mention six categories: Personal, Public, General, Confidential, Highly Confidential, and Restricted. These labels can also include encryption settings, preventing access even if a file is inadvertently shared.
    • Sharing Permissions and Link Management: PowerBI’s link sharing feature allows distributing reports via a URL. However, this poses security risks, so access must be carefully managed. PowerBI offers different sharing options for links (e.g., people in your organization, specific people). Configuring sharing permissions is vital to safeguard data by determining who can access it and what they can do. Permission types include Read (view only), Build (use data for analysis/reports but not change source), Reshare (distribute to authorized users), Write (alter data sets), and Owner (comprehensive control). These permissions can be configured using the ‘Manage permissions’ option in the PowerBI service. When sharing externally, it is important to carefully control what information is shared and maintain strict security measures. Safe links with clear permissions, expiration dates, and limitations to specific users enhance report security. User licensing also needs to be considered for external partners.
    • External Sharing Settings: PowerBI administrators can adjust settings to enable external sharing while maintaining security standards, such as authorizing users or groups, setting content restrictions, controlling link expiration, and mandating authentication.
    • PowerBI Gateways: Data gateways, such as the on-premises data gateway, bridge the gap between PowerBI’s cloud services and on-premises data sources, allowing secure use of on-premises data in the cloud. The connection is outbound, which helps reduce security vulnerabilities.
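
    To make the RLS distinction above concrete, here is a hedged sketch of the two filter styles. The table and column names (a Sales Territory table with a [Region] column, and a Users table storing each user's login in [Email]) are illustrative assumptions, not from the source:

    ```dax
    -- Static RLS: a DAX table filter attached to a "West Region" role
    -- on the Sales Territory table; every member of the role sees only West rows.
    [Region] = "West"

    -- Dynamic RLS: a filter on the Users table that matches rows to the
    -- signed-in user at query time, so one role serves many users.
    [Email] = USERPRINCIPALNAME()
    ```

    The static filter is defined per role in Power BI Desktop (Modeling > Manage roles); the dynamic filter relies on relationships propagating from the Users table to the fact tables, which is why it needs thorough testing, as the source notes.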

    Data Security in the Data Flow

    Security considerations are relevant throughout the data flow stages: collection, processing, analysis, and decision-making. Processes within a business govern how data is acquired, stored, manipulated, and shared to support operations. Safeguarding data is important during data preparation (cleaning, transformation) and ensuring accurate data (data refresh). Planning for data storage and management involves considering security and implementing measures to protect data against unauthorized access, theft, tampering, and emerging threats.

    Roles and Responsibilities

    Various roles are involved in ensuring data security. Data analysts often work with sensitive data and must handle it with care. Database administrators safeguard the security and overall health of an organization’s databases. Data architects design strategies for data storage, integration, and retrieval, collaborating with other data professionals to align designs with business needs and support security objectives. BI analysts transform data into actionable insights and must work closely with other data professionals, considering data security when presenting to stakeholders. PowerBI Administrators control organizational settings related to security, including external sharing. Workspace roles (viewer, contributor, member, admin) define levels of interaction and access to assets.

    In conclusion, security is a fundamental aspect of data visualization in PowerBI, crucial for protecting sensitive information, maintaining trust, ensuring data integrity, and complying with regulations. By implementing measures such as access control, RLS, data anonymization, secure transmission, sensitivity labels, and proper sharing permissions, organizations can build trust, protect sensitive information, and deliver reliable insights to stakeholders.

    Microsoft Power BI: Data Analysis Study Guide

    Quiz

    1. What are the three key pieces of information required to construct an IF function formula in Excel? An IF function requires a logical test, a value to display or perform if the test is true, and a value to display or perform if the test is false.
    2. Explain the primary difference between a nested IF function and an IFS function in Excel. A nested IF function involves placing one IF function inside another as an argument, typically in the “value if false” section. An IFS function is designed to handle multiple logical tests sequentially without requiring nesting.
    3. According to the source material, why is gathering the right data crucial in the data analysis process? Gathering the right data is essential because it ensures the analysis is focused, relevant, and useful for the end user. Using irrelevant data will not provide insights needed for informed decisions.
    4. What is the primary purpose of data profiling in Power BI, and what are two tools available in the Power Query editor for this? Data profiling identifies potential issues and anomalies within a dataset, enabling informed decisions about data cleaning and transformation. Column quality and column distribution are two tools in the Power Query editor for data profiling.
    5. Define the terms “unique” and “distinct” as they are used in data profiling within Power BI, according to the source. “Unique” refers to the total number of values that appear only once in a column. “Distinct” refers to the total number of different values in a column, regardless of how many times each value appears.
    6. What is DAX (Data Analysis Expressions) and what is its primary function in Power BI? DAX is a programming language used in Power BI (among other Microsoft tools) to create custom calculations on data models and generate additional information not present in the original data.
    7. Explain the concept of “row context” in DAX calculations. Row context refers to the current row of a table being evaluated within a calculation. When a DAX expression is evaluated for a specific row, it considers the values in that row as the context for the calculation, allowing for row-level operations.
    8. What are “calculated columns” in Power BI, and how do they differ from standard columns? Calculated columns are new columns added to an existing table in Power BI that display the results of a DAX formula. Unlike standard columns, which are populated by imported data, calculated columns are generated dynamically based on existing data.
    9. Describe the purpose of the CALCULATE function in DAX. The CALCULATE function in DAX evaluates an expression within a context that is modified by specified filters. It allows you to alter the filter context of a calculation, enabling more focused analysis.
    10. What is the primary requirement for a table to be marked as a “date table” in Power BI for time intelligence calculations to function correctly? For a table to function correctly as a date table for time intelligence calculations, it must contain one record for each day, have no missing or blank dates, and span from the minimum to the maximum date present in the data.
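
    The CALCULATE and date-table answers above can be sketched in DAX. This is an illustrative, hedged example; the Sales table and its [Amount], [Region], and [OrderDate] columns are hypothetical names chosen for the sketch:

    ```dax
    -- Question 9: CALCULATE evaluates SUM in a filter context
    -- narrowed to a single region.
    West Sales = CALCULATE ( SUM ( Sales[Amount] ), Sales[Region] = "West" )

    -- Question 10: a calculated date table with one row per day and no gaps,
    -- spanning the minimum to the maximum date in the data.
    Dates = CALENDAR ( MIN ( Sales[OrderDate] ), MAX ( Sales[OrderDate] ) )
    ```

    After creating such a table, it would still need to be marked as a date table (Table tools > Mark as date table) for time intelligence functions to behave correctly.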

    Answer Key

    1. Logical test, value if true, value if false.
    2. Nested IF places IF functions inside each other as arguments; IFS handles multiple tests sequentially without nesting.
    3. It ensures the analysis is focused, relevant, and useful for the end user and provides necessary insights for informed decisions.
    4. To identify potential issues and anomalies within the dataset; Column quality and Column distribution.
    5. Unique: Total number of values that appear only once. Distinct: Total number of different values regardless of frequency.
    6. A programming language used for creating custom calculations and generating additional data not in the original model.
    7. The current row being evaluated in a calculation, considering the values in that specific row.
    8. New columns added using DAX formulas; they are calculated dynamically, while standard columns are from imported data.
    9. To evaluate an expression in a filter context modified by specified filters.
    10. One record per day, no missing or blank dates, and spans from minimum to maximum date.

    Essay Format Questions

    1. Compare and contrast the star schema and snowflake schema data models in Power BI. Discuss their key characteristics, advantages, disadvantages, and when you might choose one over the other.
    2. Explain the concept of evaluation context in DAX. Discuss how row context and filter context interact and impact the results of DAX calculations, providing examples of each.
    3. Describe the different types of measures in Power BI (additive, semi-additive, and non-additive). Provide examples of each and explain how the approach to aggregation differs for each type.
    4. Discuss the importance of effective data visualization in Power BI for conveying insights to stakeholders. Describe at least three different visualization types mentioned in the source material and explain how they can be used to display key performance indicators (KPIs).
    5. Explain the process of creating and utilizing data hierarchies in Power BI. Discuss why hierarchies are beneficial for data analysis and reporting, and describe how you can create your own custom hierarchies using different data fields.

    Glossary of Key Terms

    • Autofill: A feature in Excel that allows you to quickly copy formulas or data down a column or across a row.
    • Logical Function: A function in Excel or Power BI that performs a calculation based on whether a condition is true or false.
    • IF Function: A logical function in Excel that returns one value if a condition is true and another value if it’s false.
    • Logical Operators: Symbols used in logical functions to compare values (e.g., =, >, <, >=, <=, <>).
    • Nested IF: An Excel formula where one IF function is placed inside another IF function’s arguments.
    • IFS Function: An Excel function that checks multiple conditions and returns a value corresponding to the first true condition.
    • Serial Numbers: How Excel interprets and stores dates for calculation purposes.
    • AutoFill Double-click Shortcut: A quick method in Excel to copy a formula down a column by double-clicking the fill handle.
    • DAX (Data Analysis Expressions): A programming language used in Power BI, Excel Power Pivot, and SQL Server Analysis Services for creating custom calculations and data analysis.
    • Data Modeling: The process of creating visual representations of data and defining relationships between data elements in Power BI.
    • Schemas: Structures used to organize data in a data model, such as star and snowflake schemas.
    • Relationships: Connections between tables in a data model, typically based on common key columns.
    • Cardinality: The nature of the relationship between two tables (e.g., one-to-one, one-to-many, many-to-many).
    • Cross-filter Direction: The direction in which filters propagate through relationships in a Power BI data model (e.g., single, bidirectional).
    • Calculated Tables: New tables created in a Power BI data model using DAX formulas based on existing data or combinations of data sources.
    • Cloned Tables: Exact copies of existing tables in a Power BI data model, often created to manipulate data without affecting the original table.
    • Calculated Columns: New columns added to an existing table in a Power BI data model that display the results of a DAX formula.
    • Measures: Dynamic calculations or metrics created in Power BI using DAX to summarize, analyze, and compare data across dimensions.
    • Additive Measures: Measures that can be meaningfully summed across any dimension (e.g., total sales quantity).
    • Semi-additive Measures: Measures that can be summed across some dimensions but not all, often problematic with the time dimension (e.g., inventory balance).
    • Non-additive Measures: Measures that cannot be meaningfully summed across any dimension (e.g., profit margin percentage).
    • Row Context: In DAX, the current row being evaluated within a calculation.
    • Filter Context: In DAX, the set of filter constraints applied to the data before it’s evaluated by an expression.
    • CALCULATE Function: A powerful DAX function that evaluates an expression in a context modified by specified filters.
    • Time Intelligence Functions: Specialized DAX functions designed to work with date and time data for temporal analysis (e.g., TOTALYTD, DATESBETWEEN, DATEADD).
    • Common Date Table (Date Dimension): A dedicated table in a data model containing a continuous list of dates, required for time intelligence calculations.
    • Data Granularity: The level of detail captured in a data set or data field (high granularity means more detail).
    • Data Profiling: The process of examining and summarizing data to understand its structure, content, and quality.
    • Column Quality: A data profiling feature in Power BI that categorizes values in a column as valid, error, or empty.
    • Column Distribution: A data profiling feature in Power BI that shows the frequency and distribution of values in a column.
    • Append Queries: A process in Power Query to combine rows from two or more tables with the same column structure into a single table.
    • Merge Queries: A process in Power Query to combine data from two or more tables based on matching values in common columns (similar to SQL joins).
    • Join Type: Determines how rows from two tables are combined during a merge query based on matching criteria (e.g., left outer, inner).
    • Primary Key: A column or set of columns in a table that uniquely identifies each row.
    • Foreign Key: A column or set of columns in one table that establishes a relationship to the primary key in another table.
    • Data Hierarchy: A structured way to organize data fields into levels, allowing for drill-down analysis in visualizations.
    • Drill Down/Up: Features in Power BI visualizations that allow users to navigate through different levels of a data hierarchy.
    • Bookmarks: A feature in Power BI reports that captures the current state (filters, slicers, visual state) and allows users to quickly return to that state.
    • Key Performance Indicators (KPIs): Measurable values that indicate the effectiveness of a company or department in achieving business objectives.
    • Card Visualization: A Power BI visual that displays a single data point or value.
    • Multi-row Card Visualization: A Power BI visual that displays one or more data points, with each data point on a separate row.
    • Radial Gauge: A Power BI visual that displays a single value measuring progress toward a goal or target.
    • KPI Visual: A Power BI visual specifically designed to track the performance of a metric against a target, often including a trend line.
    • Histogram: A type of bar chart used to visualize the frequency distribution of data, grouping values into ranges or bins.
    • Top N Analysis: A method to filter data to show only the top or bottom specified number of values based on a criterion.
    • Geo Hierarchy: A data hierarchy based on geographical locations (e.g., continent, country, state, city).
    • Custom Visualizations: Visualizations in Power BI created using programming languages like Python or R or developed to meet specific analytical or aesthetic needs.
    • Workspace Apps: A feature in Power BI Service that allows you to package and share an entire workspace (data sets, reports, dashboards) with specific users or teams.
    • Impact Analysis: A tool in Power BI Service to view which workspaces, reports, or dashboards are affected by a data set.
    • Lineage View: A view in Power BI Service that shows the connections and dependencies between different items in a workspace.
    • Permissions: Settings in Power BI Service that control who can access and interact with data sets, reports, dashboards, and workspace apps.
    • Use Relationship Function: A DAX function that allows you to activate an inactive relationship between tables for a specific calculation.
    • Role-Playing Dimension: A single dimension table in a data model that can play multiple roles in relationships with a fact table (e.g., a Date table related to both Order Date and Ship Date).

    Briefing Document: Excel and Power BI Data Analysis Techniques

    Summary:

    This document summarizes the key concepts and techniques presented in the provided source material, focusing on fundamental data manipulation in Excel and various advanced data analysis and visualization capabilities in Microsoft Power BI. The sources cover Excel’s date/time and logical functions (IF, nested IFs, IFS), and delve into Power BI topics such as data modeling, DAX (Data Analysis Expressions), data preparation (profiling, cleaning, transforming, loading, merging, appending), visualization types, hierarchical data, bookmarks, and performance optimization. The importance of non-technical skills, data quality, and understanding analysis objectives is also highlighted.

    Key Themes and Important Ideas:

    1. Excel Fundamentals:

    • Working with Dates and Time: Excel interprets dates as serial numbers, allowing for calculations like subtraction. Functions like TODAY(), NOW(), DAY(), MONTH(), YEAR(), and DATE() are used to extract or combine date components and create dynamic date/time formulas.
    • “Excel interprets stored dates as serial numbers…”
    • “you can separate the date into its component parts so that you can focus on the year element type an equal sign the word year and an open parenthesis in cell H5…”
    • “…you also reviewed functions for creating dynamic formulas that calculate time and date values these include the today and now functions…”
    • “…you can also divide a date entry into its component parts using day month and year or return these components as a single date with the date function…”
    • Logical Functions (IF, Nested IFs, IFS): Logical functions allow Excel to perform actions based on conditions or logic, essentially asking “yes” or “no” questions about data.
    • “when working with Excel you might need to execute a function under certain conditions or logic in these instances you can use a logical function calculation like an if function…”
    • “You can use logical functions to ask yes or no questions about your data if the function returns yes as its answer then you can direct Excel to perform the required action however if the function returns an answer of no then Excel can be directed to perform a different action…”
    • Logical Operators: These operators are crucial for logical tests within formulas and compare values against specified criteria. Examples include =, >, <, >=, <=, and <>.
    • “for these tests to work the formula must contain logical operators the logical operators determine what kind of question the formula is asking and what value it needs for its answer these operators can be used to compare both text and numeric entries…”
    • “The equal sign is the first of the mathematical operators that Excel uses in logical functions excel uses this operator to check if the value of one item is equal to that of another item…”
    • “finally a very useful set of logical operators is not equal to this is when the less than and greater than symbols are typed back to back this combination of operators is interpreted by Excel as not equal to…”
    • IF Function Syntax: The IF function requires three arguments: a logical test, a value if true, and a value if false.
    • “when constructing the if function formula you need to give Excel three pieces of information the first piece of information is called the logical test… The next instruction tells Excel what to do or what to display if the test returns a result of true… The third and final argument is what Excel should do or display if the logical test returns the result of false…”
    • Nesting IF and IFS Functions: Nested IF functions allow for multiple conditions to be tested sequentially, with subsequent IF functions embedded within the value if false argument of the previous one. The IFS function provides an alternative, designed to run a series of tests without nesting, executing the action for the first test that returns true.
    • “what if you need to test for multiple conditions? You can use nested if and ifs functions…”
    • “nesting functions is the technique of adding another function to the formula as an argument for the original function in other words you can place one function inside another to expand its functionality…”
    • “One approach would be to create what is known as a nested if formula the formula begins with an if that performs an initial logic test if the test turns out to be true then the formula will simply process whatever action is specified in the value if true argument however the result of the logical test could also be false if so then another if function in the value of false argument could run another test and process different actions…”
    • “The second approach is to use a function called ifs an ifs function is designed to run a series of tests that don’t require you to nest other functions the ifs function steps through the tests checking each one if a test is false it continues to move through the tests until it finds one that is true when a logical test returns true as a result the formula performs or displays whatever is in the value if true for that test it then stops running tests…”
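    The two approaches can be sketched side by side. Assuming hypothetical bonus tiers based on a sales figure in B2, the nested IF runs its second test inside the value-if-false argument of the first, while IFS lists test/result pairs and stops at the first test that returns true:

```excel
=IF(B2>=20000, "Gold", IF(B2>=10000, "Silver", "Bronze"))

=IFS(B2>=20000, "Gold", B2>=10000, "Silver", TRUE, "Bronze")
```

    In the IFS version, the final TRUE acts as a catch-all so the formula always returns a value even when no earlier test is true.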

    2. Power BI – Data Modeling and DAX:

    • Data Modeling: Creating visual representations of data and defining relationships between data elements to generate insights. Power BI is a key tool for this.
    • “data modeling is creating visual representations of your data in PowerBI you can use these representations to identify or create relationships between data elements by exploring these relationships you can generate new insights into your data to improve your business…”
    • “microsoft PowerBI is a fantastic tool for creating data models and generating insights and you don’t need an IT related qualification to begin using it…”
    • Schemas (Flat, Star, Snowflake): Different ways to structure data models. Star and Snowflake schemas are common, organizing data into fact and dimension tables.
    • “you’ll learn to identify different types of data schemas like flat star and snowflake…”
    • “when deciding on the data schema you plan to use for your analysis the most common schema types are star and snowflake schemas you may recall that in these schemas data is broken down into fact and dimension tables…”
    • Relationships: Connecting tables based on common keys (primary and foreign keys). Cardinality (one-to-one, one-to-many, many-to-many) and cross-filter direction are important aspects of relationships.
    • “you’ll create and maintain relationships in a data model using cardinality and cross-filter direction…”
    • “a table relationship is how two tables are connected to each other…”
    • “in the products table the product ID column is what’s known as a primary key each value in the product ID column is unique… in the sales table the product ID column is what’s known as a foreign key it’s not the primary key of the table but instead it establishes a relationship to the products table…”
    • “Now that you know how to establish a relationship between two tables the next important aspect is the cardinality of the relationship in PowerBI there are three types of cardinality one-to-one, many-to-one (or one-to-many) and many-to-many…”
    • DAX (Data Analysis Expressions): A programming language used in Power BI (and other Microsoft tools) to create custom calculations and generate information not present in the original data model. It uses functions, operators, and constants.
    • “if it’s possible to derive the data from the original model you can use DAX data analysis expressions to create custom calculations to generate the data…”
    • “dax is a programming language used in Microsoft SQL Server analysis services Power Pivot in Excel and PowerBI it is a library of functions operators and constants used in formulas or expressions to create additional information about the data not present in the original data model…”
    • “to master DAX you need to understand its syntax different data types the operators and how to refer to columns and tables using functions…”
    • DAX Syntax: Typically involves specifying the name of the new calculation, an equal sign, the DAX function name, and arguments within parentheses (often referencing table and column names).
    • “first write the name of your new calculation then add the equal sign operator next write the name of your DAX function then parenthesis that contain the logic of your formula write a table name enclosed in single quotes followed by the column name enclosed in square brackets…”
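    Following that pattern, a minimal DAX calculation (the 'Sales' table and SalesAmount column are hypothetical names) would be:

```dax
Total Revenue = SUM ( 'Sales'[SalesAmount] )
```

    The calculation name precedes the equal sign, SUM is the DAX function, and its argument references the table in single quotes and the column in square brackets.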
    • Operators in DAX: Used for various calculations and comparisons, including arithmetic, comparison, logical, and concatenation.
    • “dax formulas rely on operators there are many different types of operators they can be used to perform arithmetic calculations compare values work with strings or test conditions…”
    • DAX Functions: Reusable pieces of logic for tasks like aggregations, conditional logic, and time intelligence calculations. Examples include SUM, AVERAGEX, and SUMMARIZE.
    • “functions are reusable pieces of logic that can be used in a DAX formula these functions can perform various tasks including aggregations conditional logic and time intelligence calculations…”
    • “commonly used DAX formulas and functions include calculate sum and average…”
    • Row Context and Filter Context: DAX formulas are evaluated within a context. Row context refers to the current row being evaluated in a calculation. Filter context refers to the constraints applied to the data before evaluation, determining the subset of data used for calculations.
    • “dax computes formulas within a context the evaluation context of a DAX formula is the surrounding area of the cell in which DAX evaluates and computes the formula this surrounding area is determined by the set of rows and filters to be evaluated in a DAX expression it determines which subset of data is used to perform calculations…”
    • “row context refers to the table’s current row being evaluated within a calculation…”
    • “filter context refers to the filter constraints applied to the data before it’s evaluated by the DAX expression…”
    • CALCULATE Function: A powerful DAX function that can alter the filter context of a calculation. It evaluates an expression within a context modified by specified filters.
    • “calculate along with its companion calculate table is the only DAX function that can alter the filter context during a DAX calculation…”
    • “the calculate function evaluates an expression in a context modified by the specified filters…”
    • “from the examples you have learned the calculate only modifies the outer filter context by applying new filters this is done by either overriding the existing filter or by combining new filters with the existing ones…”
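    A minimal sketch of how CALCULATE modifies the filter context, assuming hypothetical 'Sales' and 'Product' tables:

```dax
Bike Sales =
CALCULATE (
    SUM ( 'Sales'[SalesAmount] ),     -- expression to evaluate
    'Product'[Category] = "Bikes"     -- filter that overrides any existing Category filter
)
```

    Whatever Category filter a report visual applies, this measure evaluates the sum under a context where Category is "Bikes".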
    • Measures: Calculations or metrics that generate meaningful insights from data, often using DAX. They are essential for quantitative analysis and can be categorized as additive, semi-additive, and non-additive.
    • “a measure is a calculation or metric that generates meaningful insights from data measures are an important aspect of data analysis and play a lead role in creating calculated tables and columns…”
    • “there are three different types of measures additive semi-additive and non-additive which type of measure is used depends on the needs of your data and its dimensions…”
    • Additive, Semi-Additive, and Non-Additive Measures:
    • Additive: Can be meaningfully aggregated across any dimension (e.g., total sales).
    • Semi-Additive: Can be aggregated over some dimensions but not all, often time (e.g., inventory balance).
    • Non-Additive: Cannot be meaningfully aggregated across any dimension (e.g., profit margin percentage).
    • Statistical Functions in Measures: Functions like AVERAGE, COUNT, DISTINCTCOUNT, MIN, and MAX are used in measures to calculate values related to statistical distributions and probability.
    • “a key element of measures is statistical functions statistical functions calculate values related to statistical distributions and probability to reveal information about your data several common statistical functions are used in measures like average median and count…”
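    Two illustrative measures built on statistical functions (the table and column names are hypothetical):

```dax
Average Order Value = AVERAGE ( 'Sales'[SalesAmount] )

Customer Count = DISTINCTCOUNT ( 'Sales'[CustomerID] )
```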
    • Calculated and Cloned Tables/Columns: Calculated tables and columns are new elements created within a data model using DAX formulas. Calculated tables can combine data from multiple sources or normalize dimension tables. Cloned tables are exact copies used for manipulation without altering the original. Calculated columns add derived data to existing tables.
    • “you can use calculated and cloned tables to enhance your data sets and improve your analysis…”
    • “a calculated table is a new table created within a data model based on data from different sources a calculated column is a new column added to an existing table that presents the results of a calculation…”
    • “cloning a table can be extremely useful for manipulating or augmenting data without affecting the original table…”
    • “calculated columns are custom data columns that are created within a Microsoft PowerBI data model using data analysis expressions or DAX language…”
    • Time Intelligence Functions: Specialized DAX functions for working with date and time data to perform advanced temporal analysis, including period-to-date calculations, comparisons, and moving averages. A common date table is a prerequisite.
    • “time is the dimension that virtually underpins all data analysis and for this reason time intelligence functions hold a position of paramount importance time intelligence functions are specialized functions designed to work with date and time data enabling users to perform advanced temporal analysis and gain deeper insight into historical data…”
    • “a common date table or date dimension is a prerequisite for time intelligence calculations you can’t execute them without a date dimension…”
    • “important time intelligence DAX functions is total year-to-date… date year-to-date function… dates between… same period last year… date add function…”
    • Common Date Table: A critical dimension table for time intelligence calculations, requiring one record per day, no missing or blank dates, and covering the full date range of the data. Can be created in Power BI using Power Query or DAX (CALENDAR, CALENDARAUTO).
    • “a common date table or date dimension is a prerequisite for time intelligence calculations…”
    • “the date dimension must meet the following requirements there must be one record per day there must be no missing or blank dates and it must start from the minimum date and end at the maximum date corresponding to the fields in your parameters…”
    • “you can create a date dimension in PowerBI using either Power Query or DAX this is useful when working on large data sets with complex calculations you can create a date dimension with DAX using the calendar and calendar auto functions…”
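    A date dimension meeting those requirements can be sketched in DAX with either CALENDAR (explicit range) or CALENDARAUTO (range inferred from the model's date columns); the date range and the 'Sales' table below are illustrative:

```dax
Date = CALENDAR ( DATE ( 2022, 1, 1 ), DATE ( 2024, 12, 31 ) )

-- or let Power BI infer the range from the model:
Date = CALENDARAUTO ()

-- once the date table exists, time intelligence functions can reference it:
Sales YTD = TOTALYTD ( SUM ( 'Sales'[SalesAmount] ), 'Date'[Date] )
```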
    • USERELATIONSHIP and CROSSFILTER Functions: USERELATIONSHIP (one word in DAX) activates an inactive relationship between two tables for a specific calculation; it can only be used inside functions that take a filter argument, such as CALCULATE. The related CROSSFILTER function changes the cross-filter direction between two tables for a specific measure while preserving the model’s original settings.
    • “with the cross filter function you can change the cross filter direction for a specific measure while maintaining the original settings… Fortunately Adventure Works can use the cross filter function to alter the direction while maintaining the original settings…”
    • “the cross filter function changes the cross filter direction between two tables for a specific measure while maintaining the original settings…”
    • “you can only use use relationship within DAX functions that take filter as an argument for example calculate calculate table and total YTD…”
    • “the use relationship function in DAX overrides this relationship and establishes a temporary relationship between the date column of the date table and the shipping date column of the sales table this inactive relationship becomes active only during the current calculation when using the use relationship function there are some essential points to consider…”
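    A sketch of that pattern (table and column names assumed), activating the inactive relationship to the sales table's shipping-date column for a single measure:

```dax
Sales by Ship Date =
CALCULATE (
    SUM ( 'Sales'[SalesAmount] ),
    USERELATIONSHIP ( 'Date'[Date], 'Sales'[ShipDate] )   -- active only for this calculation
)
```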

    3. Power BI – Data Preparation and Transformation:

    • Importance of Gathering the Right Data: The objective or purpose of the analysis informs the data collection process, ensuring the data is focused, relevant, and useful for the end user.
    • “gathering the right data is crucial for conducting a successful analysis however before you can start collecting data it’s essential to determine and understand the purpose or goals of the analysis you can then collect the appropriate data to conduct an analysis that is focused relevant and useful for the end user of the analysis…”
    • “the purpose of your analysis will inform what is the right data to collect including the type and scope of the data to gather and use in the analysis…”
    • Data Profiling: Analyzing data to understand its structure, content, quality, and patterns. Helps identify potential issues and anomalies for cleaning and transformation. Power BI’s Power Query Editor offers Column Quality, Column Distribution, and Column Profile tools.
    • “data profiling is the process of examining and analyzing a data set to understand its structure content quality and patterns…”
    • “data profiling enables the identification of potential issues and anomalies within the data set this proactive approach allows you to make informed decisions about data cleaning transformation and enrichment ultimately leading to improved data quality…”
    • “microsoft PowerBI offers the following two profiling tools in the Power Query editor column quality and column distribution…”
    • “column quality focuses on valid error and empty rows on each column allowing you to validate your row values…”
    • “column distribution provides a set of visuals underneath the names of the columns that showcase the frequency and distribution of the values in each of the columns…”
    • “another type of profiling in PowerBI is column profile column profile provides column statistics such as minimum maximum average frequently occurring values and standard deviation…”
    • Unique vs. Distinct: In Power BI, “unique” refers to values that appear only once, while “distinct” refers to the total number of different values regardless of frequency.
    • “before delving into data profiling tools let’s first consider two important factors in data profiling unique and distinct in PowerBI unique is known as total number of values that only appear once distinct is known as total number of different values regardless of how many of each you have…”
    • Data Cleaning: Addressing inconsistencies, errors, and missing values identified during profiling.
    • “you explored evaluating data, data statistics and column properties reviewing why data evaluation is crucial Power Query’s profiling capabilities and different evaluation methods through an interactive activity you practiced analyzing a data set for anomalies and statistical irregularities preparing you for real world scenarios as a PowerBI data analyst you also explore data inconsistencies unexpected or null values and data quality issues you may encounter as a PowerBI data analyst as well as resolving data import errors…”
    • Transforming and Loading Data: Shaping data into a usable format and loading it into the data model. Includes creating and transforming columns, changing data types, and applying query steps.
    • “next you explored transforming and loading data you reviewed creating and transforming columns understanding the importance of selecting appropriate column data types and how to transform columns and create calculated columns in Power Query you brushed up on shaping and transforming tables and applying query steps to shape the data exploring reference queries you recapped when to use reference or duplicate queries and also unpacked the differences between merge and append queries and explored the different types of joins…”
    • Merge vs. Append Queries:
    • Append: Combines rows from multiple tables into a single table (stacking data). Works best when tables have the same column structure.
    • Merge: Combines columns from multiple tables based on a common key (joining data). Requires selecting a join type (left outer, right outer, full outer, inner, left anti, right anti).
    • “Append queries are a great way to consolidate data from multiple sources into a single table… append queries works well when the columns in the data source are well aligned and the desired resulting table should match the format of the data sources however you may encounter more complex scenarios requiring the merging of data from different sources this is where merge queries comes in…”
    • “to merge two tables you need to tell the merge query which type of join you would like to use the join type informs PowerBI how to merge the two tables a join requires that there is a common column between the two tables… this is known as the join key…”
    • “powerbi supports the following join types left outer right outer full outer inner join left anti join and right anti join…”
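    In Power Query's M language, the two operations can be sketched as steps inside a let expression (the table and column names are hypothetical):

```m
// Append: stack rows from two tables with the same column structure
Appended = Table.Combine ( { Sales2023, Sales2024 } ),

// Merge: left outer join of Sales to Products on the ProductID join key
Merged = Table.NestedJoin (
    Sales, {"ProductID"},
    Products, {"ProductID"},
    "ProductDetails", JoinKind.LeftOuter )
```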

    4. Power BI – Visualization and Presentation:

    • Visualizing KPIs: Displaying key performance indicators using Power BI visuals like Cards, Multi-row Cards, Radial Gauges, and the dedicated KPI visual. KPIs differ from regular charts by aligning with strategic business objectives.
    • “kpis differ from regular charts and metrics because they align directly with strategic business objectives instead of simply presenting raw data KPIs offer insight into how that data impacts overall business goals and progress…”
    • “microsoft PowerBI offers a range of visualizations to display KPIs including cards multirow cards gauges and the KPI visual…”
    • Card Visuals: Display a single value or data point, ideal for essential statistics.
    • “the card visualization displays one value or a single data point this type of visualization is ideal for representing essential statistics you want to track on your PowerBI dashboard or report…”
    • Multi-row Card Visuals: Display one or more data points, with one data point per row.
    • “next is the multirow card visualization that displays one or more data points with one data point for each row…”
    • Radial Gauge Visuals: Circular arcs displaying a single value, measuring progress toward a goal.
    • “another visualization you can use is the radial gauge this visual is a circular arc that displays a single value measuring progress toward a goal or target or indicates the health of a single measure…”
    • KPI Visual: Tracks a metric’s performance against a target and includes a trend line.
    • “lastly the KPI visual in PowerBI is a powerful tool for tracking the performance of a metric against a target the KPI visual also includes a trend line or chart to show the data’s trajectory over time…”
    • Data Granularity: Refers to the level of detail captured in a data set or field. High granularity provides deeper, more precise insights. The appropriate level of granularity depends on the analysis objectives.
    • “data granularity refers to the level of detail or depth captured in a certain data set or data field granular data provides deeper and more precise insights this delivers more nuanced and valuable findings…”
    • “data granularity isn’t about always having the highest level of detail it’s about having the appropriate level of detail before you begin your analysis ask yourself do you require high granularity or low granularity the decision should depend on the specific requirements and objectives of the analysis…”
    • Histograms: Visualizations illustrating the frequency distribution of data by grouping data points into ranges or bins. Often use bar or area charts.
    • “a histogram is a way to visualize a top N data query result while the TOPN function in PowerBI is a built-in DAX function that retrieves the top N records from a data set based on specific criteria it compares the parameters provided and returns the corresponding rows from the data source the n in top N refers to the number of values at the top or bottom data points are grouped into ranges or bins making the data more understandable a histogram is a great way to illustrate the frequency distribution of your data…”
    • Top N Analysis: Filtering data to display only the top or bottom ‘n’ values based on specific criteria, enabling quick identification of significant data points.
    • “the top N analysis prevents this by sorting the data to display according to a category’s best or worst data points this enables stakeholders to quickly identify the top or bottom values in the data and make data-driven decisions efficiently…”
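    A top N filter can be expressed in DAX with the TOPN function. This sketch (the [Total Sales] measure and 'Product' table are assumed names) sums sales over the five best-selling products:

```dax
Top 5 Product Sales =
SUMX (
    TOPN ( 5, VALUES ( 'Product'[ProductName] ), [Total Sales], DESC ),
    [Total Sales]
)
```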
    • Data Hierarchies: Structured ways to organize data (e.g., geographical, product categories) to allow users to drill down into data at different levels of detail. Can be created automatically by Power BI (for dates) or manually.
    • “PowerBI offers a way to unravel this mystery by creating a data hierarchy hierarchies provide a structured way to organize and visualize data allowing users to uncover hidden insights and tell a compelling story…”
    • “PowerBI has automatically created a hierarchy with all the date fields such as estimated delivery date and order date… How can you create a hierarchy of your own? Let’s create a hierarchy for product related data using the product category product subcategory color and product name fields…”
    • Map Visualizations: Used for visualizing geographical data. Requires correctly formatting geographical columns as data categories (Country, State/Province, City) and can benefit from using latitude and longitude coordinates for precision. Geo hierarchies enhance map visualizations.
    • “for map visualizations defining a precise location is especially important this is because some designations are ambiguous due to the presence of one location name in multiple regions for example there is a Southampton in England Pennsylvania and New York adding longitude and latitude coordinates solves this issue but if the data set does not have this information you will need to make sure to format the geographical columns as the appropriate data category…”
    • “adding depth to map visualizations leverages geo hierarchies you can drill down from country to state state to city and so on…”
    • Bookmarks: Capture and save the current state of a report (filters, slicers, display properties, current page, visual selection) to share specific views with others or for easy navigation.
    • “bookmarks in PowerBI are a way to capture the current state of the report you are viewing and share this state with other viewers…”
    • “when adding a bookmark there are four state options that you can save data properties such as filters and slicers display properties such as visualization highlighting and visibility current page changes which present the page that was visible when you added the bookmark and selecting if the bookmark applies to all visuals or selected visuals…”
    • Using Variables for Troubleshooting: Variables in DAX store values or tables temporarily, allowing for breaking down complex formulas into smaller, manageable parts. This aids in debugging and understanding the calculation process.
    • “maybe the weight of potential inaccuracies weighs on you mistakes mean mistrust in data and mistrust in data can lead to poor business decisions in this video you’ll learn how to use variables in DAX to troubleshoot issues like this one…”
    • “to recap a variable in DAX lets you store a value or a table to be used later in your formula think of them as placeholders or temporary storage units for your data by breaking down your DAX formula into smaller pieces and storing parts of the calculation in variables you can keep track of each step making the process more comprehensible and easier to debug…”
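    A sketch of that debugging style, with hypothetical column names: each intermediate value is held in a variable, so any single step can be returned on its own while troubleshooting:

```dax
Profit Margin % =
VAR TotalRevenue = SUM ( 'Sales'[SalesAmount] )
VAR TotalCost    = SUM ( 'Sales'[TotalCost] )
VAR Profit       = TotalRevenue - TotalCost
RETURN
    -- while debugging, RETURN TotalRevenue (or any other variable) to inspect that step
    DIVIDE ( Profit, TotalRevenue )
```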
    • Power BI Service – Dashboards: Dashboards provide a single page view of key metrics and visuals from one or more reports. They are available in Power BI Service and mobile, but not Desktop. Tiles from reports or other dashboards can be pinned to dashboards.
    • “a PowerBI dashboard is a single page view of key metrics and visuals from one or more reports…”
    • “you can create and copy dashboards you must use the Microsoft PowerBI service you can view dashboards in Microsoft PowerBI service and in Microsoft PowerBI mobile dashboards are not available in PowerBI desktop…”
    • Duplicating Dashboards and Pinning Tiles: Dashboards can be duplicated in Power BI Service. Tiles from reports or other dashboards can be pinned to existing or new dashboards to consolidate visuals.
    • “to create a copy of a dashboard you must be the creator of the dashboard… you cannot pin tiles from dashboards shared with you only from dashboards created by you…”
    • “to duplicate a dashboard log into your PowerBI service and open the workspace that contains your dashboard… to pin a tile from one dashboard to another open the product sales dashboard from my workspace and hover the cursor on the tile to pin then select more options and select pin tile from the dropdown…”
    • Custom Visualizations (Python/R): Power BI allows for creating custom visualizations using Python or R programming languages for more advanced or specific needs. Requires installing Python/R and enabling scripting in Power BI.
    • “you can create custom visualization in PowerBI using Python or R programming languages these visualizations are imported from a file on your local computer you can also develop PowerBI visuals to meet your analytical or aesthetic needs…”
    • “using R or Python to develop your own PowerBI visuals or to customize existing ones is an optional expertise you may wish to pursue it if you have a coding background a familiarity with Python or want to extend your skill set into this area…”
    • Data Access and Permissions in Power BI Service: Power BI Service allows for managing data access and permissions at the dataset level and through workspace apps. Lineage view helps understand the impact of a dataset on reports and dashboards.
    • “effective data access and permission management is crucial to ensure that the right individuals have the appropriate level of access to sensitive data and reports…”
    • “with data set level permissions PowerBI service enables you to assign specific permissions to data sets while sharing you can ensure that although colleagues can access and utilize the data they cannot make changes to it this ensures the sanctity of vital data sets…”
    • “workspace apps in PowerBI allow you to share entire workspaces including data sets dashboards and reports a workspace app is a full data package that can be shared with specific users or teams ensuring a comprehensive sharing experience…”
    • “to check how many workspaces reports or dashboards are affected by a data set you can perform what is known as impact analysis to do this you go to your workspace and hover on a data set then select the more options three dots next to it and select show lineage…”
    • Using Microsoft Copilot in Bing for DAX Assistance: Copilot can help troubleshoot DAX formulas, suggest corrections, and offer alternative approaches for complex calculations like nested IFs.
    • “Microsoft Copilot in Bing can also be a valuable companion in troubleshooting and improving your DAX formulas…”
    • “microsoft Copilot in Bing can help guide you through the correct structuring of calculate formulas suggest how to perform dynamic aggregations and even detect and suggest fixes to syntax errors…”
    • “Copilot can simplify this by suggesting straightforward alternatives or helping restructure these nested conditions into manageable components…”

    5. General Concepts:

    • Importance of Non-Technical Skills: Developing non-technical skills like understanding end-user needs, relaying findings to stakeholders, collaboration, and creating actionable insights are crucial for data analysts.
    • “non-technical skills are equally vital these include a keen understanding of the needs of end users and the ability to relay findings and concepts to stakeholders of varying technical knowledge by developing these non-technical skills you can better collaborate with stakeholders create actionable insights inspire change and make lasting impacts enriching your own career and contributing to the growth and success of those around you…”
    • Data Quality: Emphasized throughout the data preparation process, focusing on completeness, accuracy, uniqueness, and consistency.
    • “data profiling enables the identification of potential issues and anomalies within the data set this proactive approach allows you to make informed decisions about data cleaning transformation and enrichment ultimately leading to improved data quality…”

    This briefing document provides a high-level overview of the key topics and concepts covered in the provided source material, offering a foundation for understanding essential data analysis techniques in both Excel and Power BI.

    Excel Functions and Power BI Data Modeling

    • How do Excel’s logical functions, such as the IF function, work and what are they used for?
    • Excel’s logical functions are used to ask yes or no questions about your data. Based on the answer to that question (true or false), Excel can be directed to perform different actions or display different values. The IF function is a common example, requiring three pieces of information: a logical test (a condition to check, often using logical operators), what to do if the test is true, and what to do if the test is false. For example, you could use an IF function to check if a sales figure is greater than or equal to a target; if true, award a bonus, and if false, award nothing. Logical operators like =, >, <, >=, <=, and <> (not equal to) are essential components of these tests.
    • When might you need to use multiple conditions in Excel logical functions, and what are the approaches?
    • You might need to test for multiple conditions when a simple yes/no question isn’t sufficient. For instance, determining different bonus levels based on varying sales thresholds. There are two main approaches: using nested IF functions or using the IFS function. A nested IF involves placing an IF function within another IF function’s “value if false” argument to perform a subsequent test if the initial one is false. The IFS function is designed to run a series of tests without nesting, stepping through each condition until one is true and then performing the corresponding action.
    • What is Data Analysis Expressions (DAX) in Power BI and what are its key components?
    • DAX is a programming language used in Power BI, SQL Server Analysis Services, and Power Pivot in Excel. It’s a library of functions, operators, and constants used to create custom calculations and additional information that are not present in the original data model. Key components of DAX include syntax (defining calculations, often starting with a name, equals sign, and function), operators (for arithmetic, comparison, logic, and concatenation), functions (reusable logic for tasks like aggregation, conditional logic, and time intelligence), and understanding the data model (tables, relationships, and context).
    • How do row context and filter context influence DAX calculations in Power BI?
    • DAX formulas compute values within a context. Row context refers to the current row being evaluated within a calculation. This allows calculations to be performed row by row, which is useful for tasks like creating calculated columns where a calculation is applied to each row independently. Filter context refers to the filter constraints applied to the data before a DAX expression is evaluated. This determines which subset of data is used for calculations. Changes in filters (like selecting a specific product category or region) will alter the filter context, leading to different results for the same DAX measure.
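    The distinction can be sketched by contrasting a calculated column, which DAX evaluates once per row of its table (row context), with a measure, whose result shifts with the report's filters (filter context); the column names here are hypothetical:

```dax
-- row context: computed for each row of the Sales table
Line Profit = 'Sales'[SalesAmount] - 'Sales'[TotalCost]

-- filter context: recomputed under whatever filters the visual applies
Total Profit = SUM ( 'Sales'[SalesAmount] ) - SUM ( 'Sales'[TotalCost] )
```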
    • What are measures in Power BI, what types exist, and why are they important for analysis?
    • Measures in Power BI are dynamic calculations or metrics used to generate insights from data. They are essential for quantitative analysis and summarizing, calculating, and comparing data. There are three main types: additive measures (which can be meaningfully summed across all dimensions, like total sales), semi-additive measures (which can be summed across some dimensions but not all, particularly time, like inventory balance), and non-additive measures (which cannot be meaningfully summed across any dimension, like percentages or ratios). Measures are important because they compute values on the fly based on the current filter context, allowing for dynamic analysis and reporting.
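    The three types can be sketched with hypothetical Sales and Inventory tables and an assumed 'Date' dimension:

    ```
    -- Additive: sums meaningfully across every dimension
    Total Sales = SUM ( Sales[SalesAmount] )

    -- Non-additive: a ratio must be recomputed at each level, never summed
    Profit Margin = DIVIDE ( SUM ( Sales[Profit] ), SUM ( Sales[SalesAmount] ) )

    -- Semi-additive: take the closing balance for the current date range
    -- rather than summing balances across time
    Closing Stock = CALCULATE ( SUM ( Inventory[Balance] ), LASTDATE ( 'Date'[Date] ) )
    ```

    Because each is a measure rather than a stored column, the values recompute on the fly as the filter context changes.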
    • What are calculated and cloned tables in Power BI and when would you use them?
    • Calculated tables are new tables created within a Power BI data model using DAX expressions, often based on data from existing tables or even multiple sources. Cloned tables are exact copies of existing tables. You would use calculated tables to combine data from different sources, normalize dimension tables (like in a snowflake schema), create a common date dimension table, or generate summary tables from large datasets. Cloned tables are useful when you need to manipulate or augment data without affecting the original table, especially if the original data is refreshed periodically.
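    Two minimal DAX sketches, assuming a model that contains a Sales table and needs a shared date dimension covering hypothetical years 2022–2024:

    ```
    -- Calculated table: a common date dimension generated with CALENDAR
    Date Table = CALENDAR ( DATE ( 2022, 1, 1 ), DATE ( 2024, 12, 31 ) )

    -- Cloned table: an exact copy you can reshape or augment without
    -- affecting the original Sales table when it refreshes
    Sales Copy = Sales
    ```

    The date range here is an assumption; in practice you would span the earliest and latest dates that appear in your fact tables.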
    • How do data granularity and geographical hierarchies contribute to data analysis in Power BI?
    • Data granularity refers to the level of detail captured in a dataset or data field. High granularity provides deeper and more precise insights, while low granularity offers a more summarized view. Choosing the appropriate level of granularity depends on the analysis objectives. Geographical hierarchies in Power BI (like Country > State > City) provide a structured way to organize and visualize data based on location. They allow users to drill down into data from a broad overview to a more detailed level, enabling the analysis of trends and performance at different geographical scales.
    • What is the significance of data modeling, schemas (Star and Snowflake), and table relationships in Power BI?
    • Data modeling in Power BI involves creating visual representations of your data and defining relationships between data elements to generate new insights. Schemas, such as the Star and Snowflake schemas, are common structures for organizing data into fact tables (containing measurements and metrics) and dimension tables (providing contextual attributes). Table relationships, established using primary and foreign keys, define how these tables are connected. Understanding and correctly configuring cardinality (one-to-one, one-to-many, many-to-many) and cross-filter direction in these relationships is crucial for accurate data analysis and filter propagation in Power BI calculations.
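    A small sketch of how a relationship is used in calculations, assuming a hypothetical star schema with a FactSales table related many-to-one to a DimProduct table:

    ```
    -- RELATED follows the many-to-one relationship from each fact row
    -- to its matching dimension row (a calculated column on FactSales)
    Category = RELATED ( DimProduct[Category] )

    -- Filters flow from the one side (dimensions) to the many side (facts),
    -- so this measure automatically respects any product or date slicers
    Total Units = SUM ( FactSales[Units] )
    ```

    If the cardinality or cross-filter direction of the relationship is configured incorrectly, RELATED may be unavailable and filters will not propagate as expected, which is why these settings matter for accurate results.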
    Power BI Tutorial For Beginners To Advanced | Master Power BI From Beginner to Expert, By Microsoft

    The Original Text

    data is an important part of your day-to-day existence. Think about how many times you collect and make use of data every day. For example, you may have recently compared the cost of flights to find the best value for your vacation, or you might have asked your friends to let you know what dates they’re available to meet for a party so that you can find a day that suits everyone in the group. So how do data analysts make use of information? Just like when you plan your vacation or party, they identify and gather important data, then study and analyze the data to generate the insights that they need. Data analysts carry out these tasks using a range of techniques, tools, and software like Microsoft Excel and Microsoft PowerBI. These might sound like complicated technologies, but it’s possible to approach them from an entry-level stage and develop competency, and there is high demand at an organizational level for individuals who can demonstrate proficiency with these tools. The career opportunities available for data analysts include a range of roles, from business analyst to data scientist to database administrator. With the increasing digitization of all aspects of life, the demand for these roles across all business sectors is greater than ever. With the right knowledge and skills, you could be the next data analyst an organization is looking for. You might be keen to pursue a career in data analytics, but you might also be concerned that you don’t have a relevant university degree or prior experience, or maybe the cost is just too high. Don’t let these concerns hold you back. If you’re fascinated by the world of data and willing to join us, then we’re offering you a chance to embark on a learning journey that prepares you for an exciting career in data analytics. This Microsoft PowerBI Analyst Professional Certificate consists of a series of courses that act as a solid foundation of fundamental knowledge and imparts the skill set required for an entry-level job in data analytics. In addition,
    finishing this program also prepares you for Exam PL-300: Microsoft PowerBI Data Analyst. Earning a Microsoft certification provides industry-endorsed evidence of your skills, demonstrates your willingness to stay on top of the latest trends and demands, and helps you stand out in a fast-changing industry. You’ll begin this program with an overview of how to design and manage spreadsheets using Microsoft Excel. This overview begins with a guide to Excel elements and techniques, along with guidance on how to organize data. You’ll then learn how to prepare data for analysis using different functions. This overview of Excel will help you to understand the importance of sourcing and organizing data, so you’ll follow it with an exploration of the different stages and roles in the data analysis process. You’ll begin by learning about essential data analysis concepts and the role of the data analyst. You’ll then review the tools required to source, gather, transform, and analyze data effectively. Sourcing data is important, but so is preparing it for analysis. That’s why you’ll also learn how to bring data into PowerBI and clean and transform it for analysis. You’ll begin by learning about different data sources in PowerBI. You’ll then learn techniques for importing the data. Lastly, you’ll discover how to clean and transform data. Once you’ve imported your data, you then need to organize it so that you can make sense of the information to generate insights, so you’ll also review techniques for modeling data. You’ll start by developing an understanding of basic data modeling concepts. You’ll then learn how to use DAX in PowerBI to create calculations. Finally, you’ll discover how to optimize the performance of a data model in PowerBI. The ability to generate insights from your data is great, but you also need to be able to communicate these insights. That’s why you’ll also explore the techniques and tools used to create visual presentations of data. You’ll begin by exploring visualization concepts, and
    you’ll also learn how to create reports. Next, you’ll learn how to ensure your reports contain navigation and accessibility elements. You’ll then explore how to bring data to the user by managing access and creating dashboards. Finally, you’ll review methods and techniques for identifying patterns and trends in your data. Another important skill you’ll require is the ability to make use of available PowerBI assets, so you’ll also learn how to create, use, monitor, and manage a workspace, and you’ll discover how to manage, share, and secure data sets in PowerBI. Not only do you need to be able to visualize your data, but it’s also important that you can use it to tell a story or narrative. During this program, you’ll explore how to design robust and compelling visualizations to communicate your data with stakeholders. You’ll start by exploring key principles of design and the importance of narrative. You’ll then learn techniques for designing report pages with powerful visuals, and you’ll review design principles and techniques for dashboards. You’ll complete a final capstone project where you’ll put your new skills to use by developing a PowerBI dashboard. In the final course, you’ll prepare for the PL-300 exam by undertaking a practice exam. This exam covers all the main topics of the Microsoft certification Exam PL-300, so it’ll also help you determine if you’re ready for the real thing. Once you complete the program, it’s time to start exploring potential careers, and don’t forget to share your Coursera Professional Certificate to get that extra advantage. Congratulations on your decision to become a data analyst and to help make sense of data for others. Now let’s get started. Have you ever faced the challenge of making decisions or providing insights based on large amounts of data? This can be quite a daunting task, especially if the data is difficult to read and understand. Fortunately, you’ve come to the right place. This course on preparing data for analysis in Microsoft Excel will equip you with
    the skills you need to work with large blocks of data and make them easier to read and understand. Data analysis is a process that involves defining the purpose of the data, then gathering, cleaning, and analyzing it to gain insights. Businesses often use data analysis to obtain usable, relevant information that can assist them in making educated business decisions. However, this is usually done with large amounts of data that you need to cleanse, transform, and analyze. You will often have to present this data in charts, tables, and graphs that provide relevant insights. Your data insights will help organizations to lessen the risks associated with making business decisions. Microsoft Excel can assist you in analyzing data for your business, and you don’t need an IT-related qualification to do this. The Preparing Data for Analysis with Microsoft Excel course is designed for anybody who’s interested in learning about preparing data for analysis within a business context. It also establishes a foundation for anyone striving to have a career in data analytics. Through data analytics in Excel, you will be able to collect, store, and delve deeper into your business’s data. You will also learn to harness the power of data using tools for sourcing, gathering, transforming, and analyzing data. Now let’s go over a brief overview of what you will learn over the next few weeks. To kickstart your learning journey, you’ll discover the fundamental and essential Microsoft Excel elements and techniques for creating workbook content. These techniques include entering, formatting, managing, and adding data to worksheets. You’ll then learn how to read large blocks of data and review the steps for sorting and filtering data in Excel. Next, you’ll discover how to use formulas and functions to perform calculations in Excel. Then you’ll learn how to prepare data for analysis using functions. You’ll explore functions that are used to clean or standardize text to prepare it for effective analysis. You’ll then investigate the use
    of date and time functions in Excel so that you can complete actions like creating timeline information in a spreadsheet. You’ll also review logical functions like IF and IFS, and you’ll learn how to use these logical functions to generate content like data columns. In the last module, you’ll undertake a final project. In this project, you’ll create a worksheet with an executive summary of a business’s month-by-month profit margin performance compared to the previous year. This project will help you prepare for the final capstone project at the end of this program. Finally, you’ll have a chance to recap what you’ve learned and focus on areas you feel you can improve on. Throughout the course, you will encounter many videos that will gradually guide you toward a solid understanding of preparing data for analysis. Watch, pause, rewind, and re-watch the videos until you are confident in your skills. Then consolidate your knowledge by consulting the course readings and measuring your understanding of key topics by completing the different knowledge checks and quizzes. By the end of the course, you’ll be equipped with the necessary skills to work effectively with data in Microsoft Excel. Good luck as you start this exciting learning journey. The Microsoft PowerBI Analyst program is an excellent resource to start your career, whether you’re a beginner or a seasoned professional looking to improve your skills. Data is the driving force behind this ever-changing modern world, shaping and developing industries and society. It has transformed the way institutions operate, from banks and hospitals to schools and supermarkets, and for businesses, data is everything. It informs decisions and helps create value for customers. Content streaming services analyze data to decide what content to promote, social media services analyze data to determine what products their customers are interested in, and your local supermarket gathers and analyzes data to ensure the products you want are available. The result
    of having all this data is that professional analysts are required to process and sort it to gain the insights that drive both the business and social worlds. Are you intrigued by this career field and wondering how to get started? Let’s meet two other students who have just begun their careers in entry-level positions, and discover how and why they have chosen to embark upon career paths in this field with Microsoft and Coursera. Lucas, a recent information technology graduate, is currently searching for his first IT job. He is eager to secure a position in the IT sector that offers good earning potential and quick career progression. He wants to work full-time in data analysis, as he feels this career would offer both benefits. During his degree, he found working with and analyzing cloud-based data to be the most enjoyable element, hence his focus on this career path. Lucas currently works shifts in a warehouse environment, so he will need the flexibility of self-paced learning. His earnings are low, so he wants to achieve the qualification using the same basic laptop he relied upon as a student. Despite being a beginner, Lucas has already mapped out his career and certification path and has enrolled in the Microsoft PowerBI Analyst program. He plans to apply for an entry-level position as a data analyst once he has successfully completed the program and passed the PL-300 exam. As a data analyst, he will inspect data, identify key business insights for new business opportunities, and help solve business problems. Amelia has been working as an administrative assistant in sales and marketing since leaving high school. Now that a few years have passed, she is ready to embark upon a new career path. In her current role, Amelia has seen PowerBI reports and dashboards created by colleagues and shared with the team. She was impressed at how the information was used to shape and focus the sales campaigns, and this sparked an interest in a career in data analysis. Amelia’s job requires her to work long
    hours, so the ability to structure her own learning path is vital. She also has a long commute, so she would like to access e-learning through her smartphone or tablet. Pursuing the PowerBI Analyst qualification will showcase her dedication and help her apply for more senior roles in the department in the short term. Amelia doesn’t have a scientific background, but she finds IT concepts logical and easy to understand, so she’s embarking on the Microsoft PowerBI Analyst program, as it doesn’t assume a pre-existing high level of technical knowledge. In the long term, she hopes to secure an entry-level role as a PowerBI analyst. As a PowerBI analyst, she will be responsible for building data models, creating data assets like reports and dashboards, and ensuring data requirements are met. You may be in a similar position to Lucas and Amelia and possess an interest in this exciting field of data analysis. Like them, you can begin your career in this field by enrolling in the Microsoft PowerBI Analyst program. This will be the start of your new adventure. Good luck with your learning journey. Generative AI stands at the forefront of a transformative era, reshaping our interaction with data and redefining the boundaries of creativity across diverse sectors. This innovative tool utilizes sophisticated statistical techniques to generate content across text, images, and code, empowering individuals and industries with remarkable capabilities. In this video, you’ll gain an understanding of the multifaceted landscape of generative AI, exploring its vast capabilities, industry implications, and the career opportunities it presents. Before we get into more detail, let’s answer the question: what is generative AI? Examples of these models are generative adversarial networks, or GANs, and transformer models. With these models, generative AI can create outputs that closely mimic human-made content. Using generative AI as an assistant can make a positive contribution across multiple industries. For example, imagine a trendy
    clothing store using generative AI to design unique patterns and styles based on customer preferences. With GANs, the AI could generate lifelike images of clothing designs, enabling the store to offer personalized options to each customer. This application not only enhances the shopping experience but also streamlines the design process, illustrating how generative AI is reshaping industries through its creative capabilities. Now that you’re up to speed on what generative AI is, let’s explore some of its capabilities across different functions. Firstly, there’s text generation, where generative AI models like generative pre-trained transformer, or GPT, can compose essays, generate creative writing, automate customer support, and more. Imagine how generative AI can bring the store’s collection to life for shoppers, effortlessly crafting engaging product descriptions, captivating social media posts, and personalized customer communication that mimics the tone and style of human interaction. Next, there’s image creation. Generative AI can transform textual descriptions into stunning visual representations. For the retail store, this means converting text into realistic images of new apparel designs, from elegant evening gowns to casual streetwear, providing the store’s creative team with endless inspiration and flexibility in bringing their vision to life. This capability is revolutionizing fields such as graphic design, video game development, film production, and marketing and branding, where custom visuals can be created quickly and at scale. With audio production, the store’s marketing and branding department uses generative AI’s audio capabilities to synthesize speech, compose music, and create sound effects. Generative AI produces captivating audiovisual content for advertising campaigns, captivating audiences and enhancing brand visibility. In addition to its applications in creative fields like fashion, generative AI also showcases its capability in code generation. Imagine the retail store leveraging
    generative AI to optimize its online presence. AI would aid the store’s programmers by suggesting improvements, completing lines of code, or even creating entire programs. This would not only streamline website development but also enhance the user experience, ensuring seamless navigation and captivating visuals for online shoppers. Finally, there is data synthesis. In the fashion world, staying ahead of the curve is crucial, and generative AI aids the store in achieving just that. It utilizes extensive data sets on fashion trends, customer preferences, and style influencers. The store can conduct market research and analyze customer behavior ethically and responsibly by generating synthetic data sets that maintain statistical properties without compromising individual privacy. This application is crucial for training further AI models where access to real data might be restricted or unethical. So what are the industry implications of this emerging technology? The deployment of generative AI across various industries indicates a major shift in operational dynamics. In healthcare, AI-generated models can predict patient outcomes, personalize treatment plans, and automate administrative tasks. In finance, AI can manage risk assessment, automate trading, and personalize banking services. The creative industry is seeing an explosion of innovation and inspiration, as generative AI-aided tools are contributing hugely to the fields of art, music, and literature, pushing the boundaries of traditional creativity. As AI evolves, its impact on the workforce and industry standards will be significant. The demand for AI knowledge is growing, and learning to work with AI will be crucial for career advancement in all fields. Jobs that traditionally didn’t involve technology will start using AI tools more often. This shift will require professionals in most fields to develop new skills and undergo additional training to effectively integrate generative AI into their work. As a result, educational programs and workshops
    focusing on generative AI and its applications are becoming increasingly important, offering valuable resources for those looking to stay relevant and excel in their careers. Both businesses and individuals need to understand and adapt to generative AI’s capabilities to fully harness its potential. Generative AI is not just a tool for creating and automating; it is a catalyst for innovation and transformation across all areas. In this video, you gained an understanding of the capabilities of generative AI and its implications for various industries. You also explored some of the career opportunities it will create. As we continue to explore and expand these technologies’ capabilities, the opportunities for advancement and creativity are limitless. Welcome to the age of generative AI, where everyone has the chance to redefine the boundaries of what is possible. Generative AI is transforming businesses today by gathering information and creating all kinds of content, changing how businesses operate. Let’s imagine a renowned restaurant called Chef’s Table. As Chef Andre strives to innovate and delight his patrons with new dishes, he turns to generative AI to enhance his culinary creations. The technology behind this ability involves using models trained on huge sets of data to do tasks such as text generation, image creation, and even code synthesis. In Chef Andre’s kitchen, generative AI acts as his trusty sous chef, assisting him in developing innovative recipes, crafting visually stunning presentations, and even optimizing kitchen workflows. Just like Chef Andre relies on his sous chef to complement his skills and creativity, generative AI complements businesses by providing them with new insights, ideas, and efficiencies. In this video, you’ll explore the technical foundations and potential applications of generative AI in businesses like Chef’s Table. You’ll also assess its limitations and examine the ethical considerations that arise when using it. First, let’s gain some insight into the technical
    foundations of generative AI. It operates primarily through two types of models: generative adversarial networks, or GANs, and transformer-based models. GANs involve two neural networks, the generator and the discriminator, working in tandem to produce highly realistic outputs. Imagine the generator as a chef preparing a new dish and the discriminator as a food critic tasting it. The chef, the generator, creates new dishes, while the food critic, the discriminator, evaluates them. If the critic cannot distinguish between the chef’s creations and dishes from renowned restaurants, then the chef has succeeded. This collaborative process results in the creation of highly realistic and refined outputs. Transformers, used by models like generative pre-trained transformer, or GPT, and bidirectional encoder representations from transformers, or BERT, use attention mechanisms to create text that is contextually relevant and stylistically coherent. Attention mechanisms play a crucial role in the model’s functionality. These mechanisms enable the model to focus selectively on various parts of the input data, much like a chef carefully chooses the best ingredients for a dish. This selective focus allows the model to highlight important information and maintain a clear grasp of the context. Imagine a chef who not only selects fresh ingredients but also keeps the recipe and cooking techniques in mind to craft a delicious and well-balanced meal. Similarly, attention mechanisms ensure that the text generated by the model is coherent and contextually appropriate, rather than a random assortment of words. These technologies rely on deep learning, needing a lot of computing power and data to train them. How well a generative AI model works depends on the quality and variety of its training data, which affects its ability to generalize to new information without perpetuating biases. So you’ve learned about the technical foundations of generative AI, but what
    are its practical applications in various business functions? In marketing and customer engagement, generative AI can craft personalized content at scale, from email marketing campaigns to dynamic web content. Think of this as a chef preparing a personalized menu for each diner based on their preferences, creating unique and delightful dining experiences. AI models can enhance engagement and conversion rates by analyzing existing customer data and tailoring messages that resonate on an individual level. Additionally, generative AI assists in optimizing operational efficiencies and logistics. For instance, AI can forecast demand trends, simulate supply chain scenarios, and recommend adjustments. This is like a chef estimating the number of diners, planning the menu, and ordering ingredients to minimize waste and make customers happy. This predictive capability enables Chef’s Table to make informed decisions, reduce costs, and improve service delivery. In the area of human resources, AI-driven analysis of job descriptions and applicant data helps streamline the recruitment process. By generating and evaluating diverse job descriptions, AI can attract a wide range of candidates, potentially reducing biases often found in manual processes. Additionally, generative AI can simulate training scenarios, providing personalized learning experiences for employees. Think of this as a chef conducting cooking classes tailored to the skill levels and learning styles of each student, ensuring everyone learns effectively. Another application of generative AI is document management and technical writing. It can analyze extensive data sets of documents to learn and replicate the necessary formatting, style, and technical language specific to different business sectors. For example, AI models trained on legal documents can help to draft contracts that comply with current laws and regulations. Furthermore, models trained on medical texts can help in preparing accurate clinical trial reports. The technology’s ability to
    understand and generate technical content is like Chef Andre mastering the preparation of complex dishes, ensuring consistency and high standards without extensive manual effort. One of the standout features of generative AI is its capacity to mimic specific writing styles. This capability is particularly useful in marketing and customer communications, where maintaining a consistent brand voice is crucial. By training on a company’s historical communication data, AI can generate content that aligns with the brand’s tone, style, and audience engagement strategies. Additionally, it can adapt to different styles as needed, much like a versatile chef who can cook various cuisines to cater to diverse tastes and cultural preferences. Finally, the ability of generative AI to produce coherent and contextually relevant text has wide-ranging applications in business. For instance, it can generate product descriptions, marketing copy, or news articles with little to no human input, significantly speeding up the content creation process. Moreover, in customer service, AI-driven chatbots can handle inquiries and provide responses in real time, improving customer experiences and operational efficiency. These applications demonstrate the potential of generative AI to take over repetitive and time-consuming tasks, enabling employees to focus on more strategic activities, much like a chef relying on a well-trained kitchen staff to handle routine tasks while focusing on creating innovative dishes. Despite its capabilities, generative AI is not without limitations and may raise some ethical concerns. The quality of output can vary significantly depending on the model’s training. Inaccuracies can emerge, especially when the AI encounters data or requests outside its training scope. Moreover, there’s the potential for AI to reinforce or amplify biases present in the training data, leading to unfair outcomes or ethical dilemmas. This is similar to a chef needing to ensure their ingredients are fresh and free from
    contaminants, as any issue can affect the final dish. Ethical concerns that must be addressed include issues such as data privacy, intellectual property, and the potential for misuse. Therefore, businesses must establish clear guidelines and ethical frameworks to govern AI use, ensuring that AI-generated outputs align with legal and moral standards. Think of it as a chef adhering to food safety regulations and ethical sourcing practices to ensure every dish is not only delicious but also responsibly made. In this video, you learned how generative AI offers substantial benefits across various business functions, enhancing productivity, decision-making, and customer engagement. However, to leverage this technology effectively, businesses must understand its technical foundations, potential applications, and limitations. You also gained insight into how responsible use of generative AI, guided by strong ethical principles, is essential to harness its full potential while reducing associated risks. As businesses continue to integrate AI into their operations, the focus must remain on creating value responsibly, ensuring that AI solutions are deployed in a manner that is both effective and ethical. Like a master chef, businesses must blend innovation with responsibility to create a successful and sustainable future. Picture a future where machines not only grasp our language but also craft it with remarkable finesse, where creativity knows no bounds as artificial minds effortlessly generate images and ideas. This isn’t the stuff of sci-fi dreams; it’s the emergence of generative AI, a tool that will complement and benefit us in both our work and our everyday lives. To gain a better understanding of generative AI, it is crucial to dive into its foundational technologies, such as machine learning models and their architectural nuances. Let’s get started by exploring the distinguishing features of generative AI. Unlike traditional AI, which typically focuses on analysis and classification, generative AI is
    proactive in creating new content. This shift from passive analysis to active creation is transformative, especially in handling complex tasks such as natural language processing, or NLP, and synthetic image generation. NLP enables machines to read, understand, and generate human language, while synthetic image generation involves creating artificial images using computer programs and algorithms. It’s like a digital artist creating a convincing picture of a landscape they’ve never seen before. The introduction of transformers, a type of model architecture that relies on mechanisms called attention and self-attention, has revolutionized NLP. Models like Google’s bidirectional encoder representations from transformers, or BERT, and OpenAI’s GPT series use these transformers. They learn the relationships between words in a text, but not in the usual order from start to end; instead, they can understand different parts of the text at the same time. It’s like reading a mystery novel and being able to pick up on clues scattered throughout the book all at once. This way of learning allows more things to be processed at the same time, making training quicker and more efficient. So those are some of the distinguishing features, but what are the technical foundations of generative AI? It primarily operates through two types of machine learning: supervised and unsupervised. In supervised learning, models are trained on labeled data sets, allowing them to learn a function that can map input data to desired outputs. For example, a model might be trained to generate text summaries by learning from a data set of articles paired with their respective summaries. Unsupervised learning, on the other hand, involves training models on data without explicit labels. Here, the aim is for the models to discover inherent patterns and relationships in the data. This approach is particularly beneficial for generative AI, as it allows the model to learn to create content that is not bound by predefined labels, enabling more
innovative and adaptive applications. Next, let's take a closer look at some of the core technologies behind generative AI. At the heart of its capabilities are neural networks, particularly generative adversarial networks (GANs) and variational autoencoders (VAEs). Variational autoencoders encode input data into a compressed representation and then decode it back to reconstruct the input. The process involves optimizing the parameters of the encoder and decoder so that the output closely matches the input, allowing the model to generate new data samples from learned representations. Language models are constantly evolving, so it's important to keep up to date with these advancements. Language models such as GPT-3 and BERT demonstrate significant advancements in generative AI. These models use transformer architectures, which rely on self-attention mechanisms to process sequences of data, like sentences, in ways that consider the context provided by other parts of the sequence. This is crucial for generating coherent and contextually appropriate text. Word2vec, another critical technology, involves vectorizing words into a geometric space where words with similar meanings are located close to each other. This enables more nuanced understanding and generation of text based on semantic similarities rather than just syntactic rules. Generative AI has many business applications and can revolutionize several key areas. Let's explore some in more detail. Firstly, there's content generation. GPT models excel in generating written content by leveraging transformer architecture, which allows them to understand context and generate coherent and contextually appropriate text. These models are pre-trained on a wide variety of internet text and fine-tuned for specific applications, enabling them to create high-quality articles, blogs, and other written materials. Next is personalization. The process starts with collecting user data from sources like websites, apps, and social media. Integrated data
pipelines using tools like Apache Kafka or Google Cloud Dataflow consolidate this data in real time. Real-time analytics platforms such as Apache Spark Streaming or AWS Kinesis process the data to extract insights, which feed into a personalization engine that generates tailored recommendations, content, and communications. These personalized interactions are delivered using APIs integrated with various platforms to ensure low-latency responses. Edge computing technologies like AWS Greengrass or Azure IoT Edge process data closer to the user. Additionally, there's automation. AI models trained on large data sets and using advanced algorithms automate these processes, improving efficiency and reducing costs. The technical backbone includes robotic process automation (RPA) for executing repetitive tasks, AI-powered software tools for intelligent decision making, and cloud services that provide the necessary scalability and support continuous learning and adaptation of the models. This infrastructure ensures that AI systems remain up to date and can handle increasing volumes of work effectively. And finally, innovation. Generative AI fosters innovation by simulating and modeling various scenarios to predict outcomes, aiding businesses in developing new products and services with higher success rates. This involves using advanced AI models for predictive analytics, scenario planning, and risk assessment, including techniques like regression analysis, time series forecasting, Monte Carlo simulations, Bayesian networks, and stress testing. Large data sets from diverse sources are processed using tools like Apache Hadoop and Apache Spark. Simulation tools such as digital twins and optimization algorithms are used to predict performance and find optimal solutions. From what you have learned in this video, it is clear that generative AI is a powerful tool that, when leveraged responsibly, can provide significant advantages to businesses by automating tasks, personalizing customer experiences, and driving
innovation. You've gained an understanding of how generative AI continues to evolve, providing useful business applications. As the technology continues to evolve, it will likely become an even more integral part of the digital business landscape. It's no secret that generative AI has significantly transformed various job functions in the workplace, from automating routine tasks to enhancing creative processes. These systems use vast amounts of data to create new content, make predictions, and even make decisions. Despite its revolutionary potential, generative AI is not without its pitfalls and shortcomings, which raise several risks, challenges, and ethical considerations that must be carefully managed. In this video, you will gain further insight into these challenges and limitations. But first, let's explore how generative AI can be integrated into different job functions. In many sectors, generative AI tools are employed to streamline operations and enhance productivity. For example, in roles such as content creation, AI can produce drafts, suggest edits, and generate creative ideas, which allows human workers to focus on more strategic aspects of their work. Similarly, in software development, AI can write code, debug, and even test software, streamlining the development process and reducing time to market. A significant shortcoming of generative AI was highlighted by the use of OpenAI's GPT-3 in generating medical advice. In one instance, GPT-3 was used to provide mental health support, and it suggested to a simulated user experiencing distress to commit self-harm. This incident underscored the danger of relying on AI for sensitive tasks without robust safeguards. The model generated harmful advice because it lacked the nuanced understanding and ethical judgment required in mental healthcare, relying instead on patterns learned from its training data. This example demonstrates the potential risks and severe consequences of deploying AI without adequate human oversight and ethical considerations.
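To make the idea of human oversight concrete, here is a toy sketch of the kind of safeguard such an incident motivates: before an AI response reaches a user, it is checked against a list of high-risk phrases and flagged for human review. The function name and phrase list are invented for illustration; this is not a production moderation system, which would use far more sophisticated classifiers.

```python
def needs_human_review(ai_response, blocked_phrases):
    """Flag AI-generated text for human review when it touches
    high-risk topics (a toy safeguard, not a real content filter)."""
    text = ai_response.casefold()
    return any(phrase in text for phrase in blocked_phrases)

# Hypothetical watch list for a mental-health support context.
blocked = ["self-harm", "diagnosis", "dosage"]

flagged = needs_human_review("You should consider self-harm", blocked)
safe = needs_human_review("Here is a summary of your article", blocked)
# flagged is True (response is held for a human), safe is False.
```

Even a crude gate like this ensures a person, not the model, makes the final call in sensitive situations.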
These capabilities not only optimize efficiency but also offer significant cost savings and scalability for growing businesses. However, the integration of AI into these roles is not always seamless. The reliance on AI can lead to job displacement as roles traditionally filled by humans become automated. Furthermore, the quality of AI-generated outputs can be inconsistent. While AI excels in generating structured content, it struggles with tasks requiring deep understanding or emotional intelligence, often producing outputs that are awkward or contextually inappropriate. Earlier, you learned that businesses need to adopt ethical considerations given the potential for bias in AI-generated content. Since AI models learn from data, they inherently acquire the biases found in their training data sets. This can result in discriminatory practices, such as favoring one demographic group over another when AI is used in HR for resume screening or job recommendations. Maintaining the privacy of personal data is a primary objective for businesses. When using generative AI systems to interact with personal data, care must be taken to ensure confidentiality and user privacy. These systems can inadvertently expose sensitive information or even be used to generate deepfakes, contributing to misinformation and potentially harming individuals' reputations. Next, let's examine some of the challenges of reliability and accountability when using generative AI. AI systems are notorious for their black-box nature, meaning the processes they use to reach conclusions are not always clear. This lack of transparency can lead to reliability issues, where businesses find it challenging to understand or predict the AI's behavior. This is particularly problematic in high-stakes environments like healthcare or finance, where unexpected AI decisions can have serious consequences. Accountability is another challenge. When errors occur, it's difficult to determine responsibility between the AI developers, the users, and the AI
itself. This complicates legal and regulatory frameworks, which are often ill-equipped to handle the novel implications of AI technology. Despite their advanced capabilities, generative AI systems often lack common sense reasoning, a basic human ability to make practical judgments about everyday situations. AI can generate plausible-sounding responses or content that, upon closer examination, is nonsensical or impractical. This limitation is due to the AI's reliance on pattern recognition instead of understanding underlying principles or contexts. Implementing generative AI in a workplace context involves various hurdles. These include the technical challenge of integrating AI with existing IT systems, the need for significant investment in technology and training, and the ongoing requirement to update and maintain AI systems to adapt to new data or changing conditions. Additionally, if an organization is resistant to change and its staff are doubtful about AI, this can also make it harder to implement effectively. To reduce potential harm and ensure ethical AI deployment, it is crucial to adhere to guidelines like those set by major technology companies, including Microsoft. These guidelines emphasize fairness, reliability, privacy, inclusiveness, accountability, and transparency. Organizations must commit to rigorous testing and auditing of AI systems to identify and correct biases, protect data privacy, and ensure that AI systems perform as intended without infringing on ethical norms. In this video, you learned that while generative AI presents remarkable opportunities for transforming workplace operations and enhancing productivity, its implementation must be approached with a nuanced understanding of its limitations and potential risks. By prioritizing ethical considerations and responsible use, organizations can harness the benefits of generative AI while mitigating its shortcomings. This balanced approach is essential for realizing the full potential of AI technologies in a manner that
respects human values and social standards. At this point in the course, you might view Microsoft Excel as a complicated software application or believe it's only used for working with financial data. However, Excel is designed to be very user-friendly and can assist with many different types of data and tasks. In this video, you'll discover Excel's primary purpose and use cases and explore key parts of the software's user interface, including the command tabs. Adventure Works, a multinational manufacturing company that produces and distributes bicycles and accessories globally, needs to input some data into Excel. To assist with this task, the company has recruited you and several new employees. However, before starting the task, the company has decided to train you to use the software so that you can improve your experience with Excel. This training will help you better manage and analyze the data required for the task at hand. Let's begin by understanding what Excel can do for Adventure Works. Microsoft Excel is a software application that businesses use to store data, like financial figures, and create calculations based on this data. Users can interpret the data they store by creating visuals or using Excel's built-in analysis features. They can then use the insights derived from these interpretations to inform business strategies or influence decisions. With Adventure Works' vast product line and global presence, Excel's capabilities will be crucial in managing and analyzing its data efficiently. Before you can start using Excel, it's essential to understand how to navigate the software's user interface and locate the features you need. Excel's user interface is designed to be accessible and includes various elements that help you interact with the software effectively. The first of these elements is the title bar. It's located at the top of the Excel window and displays the name of your file, the search option, and other essential features. The worksheet is the primary area where you
can input data into cells using either the keyboard or other input devices. The command tabs are located below the title bar and provide quick access to Excel's hundreds of commands, which are organized in areas called tabs or ribbons. To find the command you need, click the relevant tab to reveal the related commands. Let's take a few moments to explore these features and discover how you can use them to input data. One of the main areas of Excel is the grid. This area contains the worksheet, which is where you enter data or information. It's divided into rows and columns, and you input information into cells, where a column and row intersect. Just above the worksheet is the formula bar. When you type information into a cell in the spreadsheet, it appears in both the cell and the formula bar. When you create a calculation, the result appears in the cell, while the formula that drives the result appears in the formula bar. In other words, the formula bar always shows the actual contents of the cell. There is a green title bar at the top of the screen. On the left is the autosave button. In the browser version of Excel, you can find the app launcher button here, which you can use to access other Microsoft 365 programs. The title bar also contains a useful undo button. When autosave is turned on, creating a new Excel document automatically assigns the name Book to your new file. You can view the file name within the title bar. To rename a file, select the title bar and type an alternative name. File names can contain spaces and capital letters. You can also use punctuation marks; however, it is best to avoid punctuation marks, as some characters are not permitted. Also, file names can contain a maximum of 255 characters, but it's recommended that you use 31 characters at most. You can select the same box to manage the location in which you store the file. To the right of the file name is the search feature. Select the search box and then select find to open a dialogue box where you can
search for content, like text or figures, in your files. You can use the options choice in the bottom right of the dialogue box to refine and control Excel searches. You can also search for a recent action you've applied to a cell. Next, let's explore the command tabs. Excel has hundreds of commands organized in storage areas called tabs or ribbons. You can select a tab heading to view its ribbon and related commands. Let's review the most frequently used tabs. The home ribbon is the first ribbon that appears when you open a file. It contains the most frequently used commands you'll rely on for standard everyday tasks, like formatting and sorting data. You can use the commands on the insert ribbon to add different elements to a file, like charts or comments. The draw ribbon offers you drawing tools for marking your worksheet, while the page layout ribbon lets you alter the appearance of a spreadsheet when printed. The formulas ribbon contains commands that you can use to manage more complex calculations. You can use the data ribbon to perform different actions with data, such as transform, query, sort, and filter operations. Adventure Works is expected to work with large blocks of information, and the data ribbon's sort and filter commands are useful for these tasks. You'll mostly use the commands on the review ribbon once you've created a spreadsheet. For example, you can use them to manage security settings or collaborate with colleagues. The view ribbon offers Excel users commands to make it easier to view large spreadsheets, such as freeze panes, which keeps titles visible when moving through data blocks. There are also extra tabs, called contextual tabs, that appear during specific actions or when certain items are selected. For example, if you add a bar plot to your worksheet, then the chart design and format tabs appear on screen. These extra tabs contain commands relevant to the tasks you're working on. This demonstration provided only a brief overview of Excel's interface, and it's
completely normal if you feel like you need more help with this information. Learning any new software requires time and practice, so don't worry if you don't fully understand everything just yet. As you continue through the course, you'll have more opportunities to explore these commands and features in greater depth, and you'll become more comfortable with Excel's interface. By learning about its key elements, including the command tabs, you've built a solid foundation of Excel's primary purpose and use cases. Keep up the good work. Excel is a powerful tool for organizing and analyzing data, but sometimes, when you're dealing with large amounts of information, it can be difficult to make sense of it all. That's where formatting comes in. In this video, you'll discover how to enter and format data in Excel to improve its readability. Adventure Works has created a list of its offices using Excel. However, important information is missing from these files. It's also difficult to read the data because it's not correctly formatted. Let's help Adventure Works to add and format its data. The green cursor box is in the top left-hand corner of the worksheet. You can move the cursor by pointing and selecting a cell. The cell location indicator shows you where you are on the sheet. You can also use the arrow keys on the keyboard to move the cursor. As you type, the entry appears in the cell and on the formula bar. You can use the backspace key to delete any typing errors. The office location is missing from cell C21. Select C21, type Delaware, and then press enter to confirm your entry. The entry appears in the cell and formula bar. The data lines up to the left of the cell to indicate that it's text. Type the number 130422 and confirm it in cell E21. The entry sits to the right of the cell. In Excel, text aligns to the left of the cell and numbers to the right. Excel treats an entry that contains both letters and numbers as text. You can also manually set the alignment with the alignment buttons on the home
ribbon. Excel also offers an autocomplete feature as a shortcut for entering data. For example, column D already contains several instances of the word partner, so if you type the letter P in cell D21, then Excel suggests the word partner as a possibility. Press enter to accept the suggestion. You can also ignore it by continuing to type an alternative word. Next, New Jersey needs to be added. Type the word new in C16. This prompts an incorrect suggestion, so you must type New Jersey in full. Now, if you type new in C17, Excel waits to see what letter is typed next before suggesting a word, because there is more than one entry beginning with new. In the browser version of Excel, you'll be presented with a drop-down list of multiple suggestions, from which you could select New Jersey. Column C contains state names. This causes a floating dialogue called convert to geography to appear. Select the dialogue to instruct Excel to recognize text entries as geographic locations. You can select the card symbol to the left of the entry to interact with Bing and generate information about the location. Keep in mind that if you print your worksheets, the card symbols beside the entries will appear on the print. Like other Microsoft 365 apps, Excel has an undo feature. In the desktop version, this feature is located on the title bar; in the browser version, it is located to the left of the home ribbon. Select the undo feature to reverse recent actions. In this case, you'll remove the geographic locations tag and return the entries to normal text. The next action is to type New York in full in C18. Autocomplete has no suggestions, as New York hasn't appeared in the column before. A different shortcut called autofill can be used to add New York to C19 and C20. With the cursor still on C18, position the mouse pointer over the bottom right-hand corner of the cell. The pointer changes to a narrow black cross. Now hold down the mouse button and drag it down. This action autofills the entry into the cells underneath.
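The autocomplete behaviour described above follows a simple rule: Excel only offers a suggestion when exactly one existing entry in the column starts with what you've typed so far. As a rough sketch of that logic in Python (the function name and sample values are invented, and real Excel applies additional rules):

```python
def autocomplete(existing_entries, typed_prefix):
    """Suggest a completion only when exactly one distinct entry in the
    column starts with the typed prefix, mimicking (in simplified form)
    Excel's column autocomplete behaviour."""
    prefix = typed_prefix.casefold()
    matches = {entry for entry in existing_entries
               if entry.casefold().startswith(prefix)}
    if len(matches) == 1:
        return matches.pop()
    return None  # zero or several matches: wait for more typing

# Column D already holds "Partner" entries, so "P" is unambiguous.
suggestion_d = autocomplete(["Partner", "Partner", "Reseller"], "P")

# Column C holds both "New Jersey" and "New York", so "New" is ambiguous
# and no single suggestion can be made yet.
column_c = ["Delaware", "New Jersey", "New York"]
suggestion_new = autocomplete(column_c, "New")
```

Typing one more letter ("New J" or "New Y") would make the prefix unique again and restore a suggestion.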
Now that you've entered the data in your spreadsheet, you need to format it. Formatting data makes it easier to read, and correct formatting on numeric entries prevents misunderstandings. Here, the numbers in E2 to H21 are financial data. To make this clear, highlight the numbers by selecting all the data from E2 to H21, then select the currency button in the number group. The currencies are available on the drop-down menu. Alternatively, you can use the comma format to display a comma separator and two decimal places. You can use the increase or decrease decimal buttons to customize the number of decimal places. The percentage button is both a format and an action button: it adds the percentage symbol, and it also multiplies the cell content by 100. Select undo to reverse this. The drop-down above these buttons presents other number formats. These formats include dates, as dates are treated as numbers in Excel. Your next task is to format the column titles so that they stand out. Type the heading state code in B1. The text overflows into the adjacent empty cell. Once you add state in C1, two characters of the B1 heading are masked; however, the formula bar confirms that the whole heading is still there. The column's title is partially hidden, so you need to make the full title visible. From the home ribbon, choose wrap text to stack the words in the cell. You can also format a heading to stand out using font options. In this example, the size of the heading has been increased to Calibri 12, and a blue background color has been applied. You can also center the heading using the alignment section of the ribbon. Another Excel shortcut is the format painter, which is found on the left of the home ribbon. This shortcut copies format settings from one cell to another. Select the format painter to display a paintbrush and copy B1's style, then highlight A1 to H1 to paint those cells with the copied format. This action also copies the wrap text and center alignments. You should now be familiar with the different
methods and shortcuts you can use to enter and format data in Excel. This video also demonstrated how this knowledge can be applied to help Adventure Works complete and format their Excel sheet. Great work. Reading and editing the contents of a large spreadsheet with hundreds or even thousands of data entries can seem like a large task. Thankfully, Microsoft Excel offers several features and keyboard shortcuts that help you navigate and edit your spreadsheets. Over the next few minutes, you'll explore these features and shortcuts and learn how to use them. Adventure Works has sent you a large inventory file. They need you to check the current information in the file and add some new data. There are over 100 entries in the file to navigate through; however, you can quickly review these entries and add new ones through Excel's navigation features and keyboard shortcuts. There are several useful navigation and editing features available in Excel. The freeze panes feature, for example, keeps an area of the screen static. You could use it to freeze a specific row. The static area remains on screen while you scroll freely through the other content. You can use the new window option to open a second view of your file. With this feature, you can keep one part of the file within view as you work in another area. The name box is another useful Excel feature. The name box is the title of an area located between the ribbon and the worksheet, to the left of the formula bar. When you type a cell reference in this box and press enter, the cell cursor moves to that position on the sheet. The name box can also be used to assign a name to a cell. Finally, there are also several keyboard shortcuts that you can use to speed up the navigation and editing of a spreadsheet. Let's discover more about how these features and shortcuts operate by helping Adventure Works. First, you need to freeze key rows to give yourself a more efficient view of the data. From the window group of the view ribbon, you can access several
options. Two of these include freeze panes and new window. Select the freeze panes drop-down to view three choices: freeze panes, freeze top row, and freeze first column. Select freeze top row to make the row currently visible at the top of the screen static. Be aware that row one isn't always the top visible row. A horizontal line appears under the top row to indicate the static area. The selected frozen row remains static while the other rows below it scroll off screen. You can also select freeze first column to make the first column currently visible on screen static. In this case, it's the category column. Again, the first column, column A, isn't always the one that becomes static. Selecting the freeze first column option automatically turns off the freeze top row option. Once you've frozen an area of the screen, the first choice in the freeze panes drop-down menu changes to unfreeze panes. Select unfreeze panes to release all static areas on screen. What if you need to freeze the screen in two directions at the same time? For example, to help Adventure Works view its worksheet more clearly, you need to make sure that all row titles and the data in columns A and B are visible. To do this, you first need to select C2 to move the cursor to that position. Then, in the freeze panes drop-down, select the freeze panes option. Once this option is selected, Excel identifies the cursor position and freezes everything above and to its left. Your cursor is currently on C2, so Excel freezes columns A and B along with row one. Again, you can use the unfreeze panes option on the freeze panes drop-down to release all areas of the screen. You must also have the totals in row 152 available on the screen while editing other areas of the spreadsheet. You can use the new window command to open another view of the file in a new window. This window isn't a separate copy of the file; it's just a different view of the same file. With both views visible, you can now review the totals data in row 152 while editing the
cells in other areas of the spreadsheet. To close this second view, just select the X in the top right-hand corner of its window. You can also move quickly around the worksheet using keyboard shortcuts. Let's take a moment to explore some keyboard shortcuts available to Windows users. Press control and home to jump to cell A1 at the top left of the worksheet. If the freeze top row choice is turned on, the cursor will instead jump to cell A2. But what if you need to move to the end of your work to continue data entry? Press control and end to move the cursor to the last cell in the worksheet that contains content. Rather than simply moving the cursor, hold down the shift key while pressing either the control and home or the control and end combinations; Excel selects the entire block as it moves the cursor. You can also use the name box to move quickly to specific cells. The name box is located to the left of the formula bar. The box typically displays the cell reference for your cursor's current position; however, if you type a different cell reference and press enter, your cursor jumps to the specified cell. The name box is also a useful method for assigning names to cells. A cell name helps users to identify data content, since it's more descriptive than just a cell reference. Adventure Works needs you to rename the totals cell in row 152 to units in stock, so position the cursor on the cell, then in the name box, type the text units underscore in underscore stock and press enter. Cell names must be unique and cannot contain spaces; you can use the underscore symbol to substitute for spaces. If the cell is referenced in a calculation, its name and reference are visible. You can view the name from the drop-down list in the name box. You can check which cell the name is assigned to by selecting the name manager on the formula ribbon. In the drop-down, select the cell name to move the cursor to the cell. You can use these same steps to view and access this cell from any sheet in the workbook. For example, from the
Products Two sheet, selecting the units in stock cell name from the name box drop-down brings you back to that cell on the Products One sheet. You should now know the Excel features and shortcuts to help you navigate and edit spreadsheets. You can use these tools to assist you in any Adventure Works Excel-based assignments. Well done. Have you ever opened a Microsoft Excel worksheet only to find the content structure difficult to interpret? Perhaps it contains irrelevant entries or needs too much scrolling to navigate. In this video, you'll learn how to use Excel's sort and filter features to organize content so you can read and identify data quickly and efficiently. Over at Adventure Works, the company checked its inventory data for records related to a specific supplier. However, the Excel file that contains the data is poorly structured and difficult to navigate. Adventure Works needs your help to sort and filter the information so that only the supplier's data is visible. Before you begin helping Adventure Works, let's examine the concepts of sorting and filtering in Excel. Excel offers users a series of sort and filter commands. These commands change the position of data in the worksheet window so that it's easier to understand. In other words, they don't change the data; they change how it's displayed. It's also important to remember that the sort and filter commands are not the same; they work on data in different ways. You need to understand these differences to prevent any misreading of the data. Let's begin with the sort feature. The sort feature is found in the sort and filter group in the data ribbon. This feature reorders the worksheet by physically moving rows into new positions. To return the data to its original position, you must use the undo command; however, if a sort was not your last action, you may inadvertently reverse other steps. You should also be careful if saving your workbook after applying a sort. Once your changes are saved, the sort order applied to the data is
permanent, and an undo is no longer possible. Now that you're familiar with the sort feature, let's focus on filtering. Filtering refines the data displayed based on the criteria of your choosing. However, unlike with sort, the rows are not repositioned. Instead, Excel hides all the rows that don't match your chosen criteria. This leaves a subset of rows visible. This subset can be reduced further by applying more filters. Let's learn more about how these actions work by helping Adventure Works restructure its inventory Excel file. The Adventure Works inventory Excel file is currently sorted by category. You need to restructure it using the sort and filter commands. Access these commands from the sort and filter group in the data ribbon. The sort ascending and sort descending commands are shortcut choices. When you select one, Excel checks the location of your cursor. It then uses the column in which the cursor is located as the key for the sort. Place your cursor on column B, which is the date entered column, then select sort ascending, which is now called oldest to newest. The rows are now organized in date order. Excel interprets dates as numbers, so it has performed a numeric sort. Had you placed the cursor in the supplier column, Excel would have performed a text-based sort. You can select undo on the title bar to restore the previous row order. Adventure Works has requested that the data be sorted by supplier, the data in column D, and that the most recent entry is visible first within each block of supplier data. Sorting by the supplier and then sorting by the date won't work here, because one sort would cancel out the other. Instead, you need to perform a multi-level sort. This technique lets you sort data in two ways simultaneously. First, from the sort and filter group of the data tab, select the sort button to open a sort dialogue box. At the top right of the dialogue box, you need to confirm that there's a tick in the my data has headers box. This instructs Excel to exclude the first row from
the sort. Next, use the drop-down menu under column to instruct Excel to perform the first sort by supplier. You can retain the defaults of sort on cell values and order A to Z. Then select the add button to display additional sort fields. Use these fields to configure the second sort level by date entered. Again, retain the default of sort on cell values, but change the order to newest to oldest. Then select okay to exit the dialogue box and sort the data as required. You have now sorted the data by supplier and date entered. Select undo on the title bar to reverse the sort. Next, Adventure Works needs you to filter the records to view only the data related to the supplier called Cycles AS. The first step when filtering is to turn on the filtering feature. Select the filter button on the sort and filter group of the data tab to add filter arrows to each column heading. You can now filter the data using the arrows next to each heading to open drop-down lists. Each filter arrow also has an additional submenu to allow for more precise filtering. Excel recognizes the type of content in the column and generates context-sensitive choices, such as equals, does not equal, begins with, and more. Select the arrow next to the supplier column heading to display a list of suppliers. A tick mark beside an entry indicates that its rows are currently visible. Remove the tick marks next to all list entries except for Cycles AS, then select apply. Excel hides all other rows in the worksheet so that only the Cycles AS data is visible. There are now only 10 rows visible in the sheet, all of which relate to Cycles AS. You can confirm this by checking the bottom left of the Excel screen; here, it states that 10 records were found. Select the arrow next to the unit price to apply another filter. From the drop-down, put a tick in the box to the left of item seven, then select the apply button. The filter only works on the 10 visible records, so you have now displayed only rows where Cycles AS is the supplier and seven is the unit
price you might ask yourself how do I know if data has been filtered in Excel there are two ways to determine if data has been filtered the first is to check the filter arrow to the right of the column heading if there is a funnel symbol on the filter arrow then your list is filtered the other method is to check for breaks in the sequence of row numbers on the left hand side of the display area for example a row sequence of 8 9 112 indicates that rows 10 to 111 have been filtered out so how can you remove filtering to make other data visible again in the column header select the arrow or arrow and funnel symbol then select the clear filter option from the drop-own menu to clear a specific filter while retaining the others you can also select the clear choice in the sort and filter group of the data tab to clear all filters you’ve now removed all filters and restored the full data display thanks to your help Adventure Works has the inventory data it needs and you should now be familiar with using the sort and filter actions to organize and identify data quickly and efficiently well done congratulations on reaching the end of the first week in this course on preparing data for analysis with Microsoft Excel in this week you explored the fundamentals of Microsoft Excel by learning how to create workbook content and work with blocks of data in Excel let’s take a few minutes to recap the key skills you gained during this week’s lessons you began with an introduction to the program in which you discovered what topics you will learn about as you progress through the different courses you were also given guidance on how to be successful in this course this guidance included helpful tips on how to structure your study and ways in which you can approach the learning material you were then introduced to other learners in a meet and greet session during which you explained why you’re taking this course and what you hope to achieve from it finally you explored a list of valuable 
resources you can use to succeed in the course.

In the second lesson, you learned how to create workbook content. You began with an introduction to Microsoft Excel, developing an understanding of the importance and function of the application, including how it's used in everyday business to store, calculate, and gain insights from data. You then learned how to navigate Excel using its user interface, or UI. The UI comprises three key areas: the title bar, which contains the name of your file, the search option, and other primary features; the worksheet, the main area used to input data into cells; and the command tabs, which provide quick access to Excel's commands, organized in areas called tabs or ribbons.

You then learned how to enter and format data in Excel. You explored the different ways data can be added to a worksheet, discovered how to use formatting to improve the readability of a spreadsheet, and reviewed keyboard shortcuts for data entry and formatting. Next, you learned how to manage worksheets. You then undertook an exercise in which you demonstrated your new skills by adding data to a worksheet, followed by a knowledge check that tested your understanding of the material. Finally, you explored additional resources to enhance your learning.

In the third and final lesson of this week, you focused on working with blocks of data in Excel. You began by learning how to read large data blocks, exploring Excel's navigation and editing features such as the Freeze Panes feature, the New Window feature, the Name Box, and keyboard shortcuts. You then developed an understanding of the concepts of sort and filter, learned how to identify the key differences between them, and learned how to sort and filter data in Excel so that you can organize and identify data quickly and efficiently. You explored different methods for sorting data in a worksheet, including alphanumeric sort and the multi-level sort feature, and you discovered how to use the filter feature to control data visibility in a worksheet.

Next, you undertook an exercise in which you demonstrated your new skills by sorting and filtering data in a worksheet. This was followed by a knowledge check and a module quiz, both of which tested your understanding of the key concepts you explored. You should now be familiar with the fundamentals of Microsoft Excel: you should be capable of creating workbook content and of using different methods for working with blocks of data. Great work! I look forward to guiding you through the lessons next week, in which you'll learn how to use formulas and functions in Excel.

Analyzing data often involves making calculations. However, when working with large blocks of data, calculations can quickly become confusing. Luckily, Microsoft Excel can calculate numerical information using formulas, and you can solve real-life data analysis problems in Excel with a little planning and some basic math. Over the next few minutes, you'll learn how
Excel processes calculations and how to create a formula using the correct syntax.

Over at Adventure Works, the accounting staff are amending a spreadsheet that records orders placed with suppliers. Their first task is to update the prices and order amounts, and they need to work out the purchasing cost by creating a calculation in the data. But first, they need to understand how Excel reads, interprets, and implements calculations. Let's take a few minutes to explore formulas and calculations, and then help Adventure Works.

A formula in Excel is a calculation performed on the values in a range of cells in your worksheets. Examples of these calculations include addition, subtraction, multiplication, and division. Once the calculation is completed, the formula returns a result, even if that result is an error.

Now that you're familiar with what a formula is, let's find out more about how formulas work. All formulas begin with an equal sign, followed by a calculation or function. Formulas can contain numbers or cell references. For example, the formula =A1+B1 instructs Excel to add the values in cells A1 and B1. Excel usually reads a formula from left to right. Characters are used to indicate the type of calculation Excel should perform: the plus character (+) is used for addition, the minus character (-) for subtraction, the asterisk (*) for multiplication, and the forward slash (/) for division. The formula bar shows the formula in the cell you are working in, while the worksheet shows the result of that formula. This is important to take note of when you are creating or working with calculations.

A formula can also be static or dynamic. A formula containing fixed numbers is static and always generates the same result. For example, the formula in E2 is static because it contains specific numerical values; it will not update if any of the monthly figures in cells A2, B2, or C2 change. On the other hand, a formula that contains cell references is dynamic, because Excel always uses the current value in each referenced cell. The formula in E3 is dynamic because it includes cell references.

A formula can also include a reference to a cell that itself contains a formula, creating a chain of calculations. For example, the formula in E1 refers to cell C1, and C1 contains a formula that calculates the data in cells A1 and B1. If the values in cells A1 and B1 change, then the formulas in cells C1 and E1 will both change. In other words, a change at one end affects all other formulas in the chain.

A formula can also refer to a cell in another sheet. This reference must include the worksheet name followed by an exclamation mark. The other worksheet can be in the same workbook or in another Excel file; references to cells in other workbooks are called links or external references. The formula in this screenshot references the Product sheet within the same workbook: it states that this cell is equal to the contents of H2 in the Product sheet plus the contents of A1 in this sheet, written as =Product!H2+A1.

Now that you're familiar with the basics of a formula, let's view one in action by helping Adventure Works determine the cost of the items it's ordering from its supplier. Begin by positioning the cursor on K3, in the Cost column; this is the cell where the result should be displayed. Then type an equal sign. To determine the cost of the order, you need to multiply the contents of I3, the unit price, by the contents of J3, the number ordered. Select cell I3 to add that reference to the formula; the equal sign and the cell reference are displayed both in the result cell K3 and in the formula bar. Next, type an asterisk to represent multiplication, then select cell J3. This reference is colored red on the formula bar, and the cell is highlighted in red. Press the Enter key to complete the formula. This creates a result of 79,050, which is now visible in K3.

Adventure Works decides to make a change to its order: it wants to reduce the number of units that it ordered by 250.
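The idea behind a dynamic formula can be sketched outside Excel too. The Python snippet below (with made-up cell values, not the worksheet's actual figures) models the K3 = I3 * J3 calculation as a rule that always reads the current cell values, so amending an input and recalculating yields a new result, just as pressing Enter does in the worksheet.

```python
# A minimal sketch (not Excel itself) of how a dynamic formula behaves:
# K3 = I3 * J3 is stored as a rule over cell references, so changing an
# input cell changes the result on the next recalculation.
# The cell values below are hypothetical, chosen only for illustration.

cells = {"I3": 12.5, "J3": 6000}                # unit price, number ordered
formulas = {"K3": lambda c: c["I3"] * c["J3"]}  # dynamic: reads current values

def recalc(cells, formulas):
    """Re-evaluate every formula against the current cell values."""
    return {ref: rule(cells) for ref, rule in formulas.items()}

print(recalc(cells, formulas)["K3"])  # 75000.0

cells["J3"] -= 250                    # reduce the order, as Adventure Works does
print(recalc(cells, formulas)["K3"])  # 71875.0
```

A static formula, by contrast, would be the literal value `12.5 * 6000` typed into the cell: changing J3 afterwards would not affect it.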
So how can Adventure Works update the formula with this new information? Amend the figure in J3 and press Enter. This causes the formula in K3 to recalculate and generate a new result of $65,875.

If you double-click on a cell such as K3, this opens edit mode. While you're in edit mode, Excel places colored highlights around the cells referenced in the calculation. It's easy to begin editing a cell accidentally with a double-click, and if the cell contains a formula, particularly one you didn't create, this can be a little worrying. Pressing the Escape key is a safe way to cancel an edit without amending any of the information within a cell.

You have explored how calculations in Excel can be useful in data analysis. By now, you should know how Excel processes calculations and how to create a formula using the correct syntax. You will learn more about formulas as you progress in your learning journey. Well done!

Microsoft Excel doesn't just store data; it also assists with calculations, a fundamental component of Excel and data analysis. So it's important that your calculations are correct and reliable. In this video, you'll learn how Excel processes calculations, discover how to construct the syntax for calculations, and edit your syntax to avoid errors.

Jamie at Adventure Works is working on a purchase sheet that has been updated to include information on new orders placed with suppliers. She now needs to create calculations that correctly display the difference between purchasing costs and sales amounts. The formulas she creates will contain a mixture of multiplication and subtraction, and she needs to be confident that those operations happen in the correct sequence. Let's take a few minutes to explore how these formulas work, beginning with operators.

The symbols that are used to indicate mathematical actions in Excel are known as operators. Operators are used for actions like addition, subtraction, multiplication, and division. For example, you can use operators to add the values of two cells together or to divide the value of one cell by another.

When working through a formula, Excel does not always calculate the expressions, or steps, from left to right. Excel handles the operators in a calculation according to a key mathematical principle called the order of precedence. The order of precedence assigns greater importance to some mathematical symbols over others, which means that Excel calculates formulas according to the hierarchical position of each symbol within that order. Don't worry if you don't fully understand what the order of precedence is; it's covered in a later reading. In terms of importance, Excel processes division and multiplication symbols before addition and subtraction.

However, you can control how Excel executes calculations by using parentheses in your formulas. This is a key technique for creating formulas that generate reliable results. Parentheses instruct Excel as to which part of a calculation must be executed first, even if this contradicts the order of precedence.

Let's explore the use of parentheses in formulas. Suppose you want Excel to add the numbers 2 and 3 together and then multiply the subtotal by 4, so you type the formula =2+3*4. Excel will not process this calculation left to right. Instead, Excel will first multiply 3 by 4, which gives a result of 12, and then add 2, giving a formula result of 14. This is because the multiplication symbol has a higher priority in the order of precedence. Adding parentheses to the calculation allows you to instruct Excel to "do this bit first", so you could rewrite your calculation by placing part of the formula, in this instance 2+3, in parentheses: =(2+3)*4. Now you've directed Excel to add 2 and 3 as its first step and then multiply the result of that addition by 4. The result of this calculation is 20, not 14 as it was previously.

It is important to have a clear understanding of where to put parentheses in a calculation. Placing parentheses in the wrong position in a formula, or not including them at all, could change how Excel understands and implements the calculation. An incorrect calculation result may not always be obvious, as it may seem plausible.

There are also times when you may need to reproduce cell entries and formulas within a worksheet. When a formula is copied, it is important to consider the appearance of the cell references. There are two ways a cell reference can appear in a calculation: relative and absolute. A relative cell reference means that if you copy a formula to a new cell, Excel will adjust the row numbers or the column letters in the cell references to update the formula relative to its new location. This ensures that the formula is correct for the row or column it has been copied to. For example, when the formula in K3, which reads =I3*J3, is copied down using the autofill feature, Excel adjusts the cell references for each row.

But what if a cell reference needs to stay the same when the formula is copied elsewhere? For this to happen, you must make the cell reference an absolute reference. When Excel copies a formula, it keeps absolute references constant and does not adjust them. For example, if the formula in L3 is copied down through the column, then the reference for the cell that contains the exchange rate needs to stay the same. When the formula in L3 is copied down, the K3 reference in the formula will adjust to a different row number; however, the N2 reference should not change, since the exchange rate appears only in that one cell. To make a cell reference absolute, add a dollar sign before the column letter and before the row number. This instructs Excel to keep the cell reference constant during the copy operation, so that all copies of the formula contain the original cell reference. Don't worry if you find these concepts difficult to follow; you'll explore how to control calculations in more detail in a later video, and there are also additional resources available at the end of the lesson.

Excel will also recalculate and update all formula results when a file is opened, so files that contain a lot of complex calculations will be slower to open fully on screen than ones that contain only data. Fortunately, you can turn the automatic recalculation feature off; just remember to switch it back on when you are done working with the file. To change the recalculation mode, select the Calculation Options drop-down on the Formulas ribbon, then select the recalculation mode you need for your file.

Well done! You now know how to control how Excel works through the steps in a formula, and you're able to identify the correct syntax to use if calculations are going to be copied elsewhere in the spreadsheet. Great work!

A Microsoft Excel formula can be complex and include many steps. In this video, you'll explore the correct syntax for Excel calculations that contain multiple steps and discover how to adjust a formula to ensure that it copies a calculation correctly. Amy at Adventure Works is preparing a price quote in a worksheet for the client Contoso Bikes, which wants to order bicycle parts for its retail outlets. Let's find out more about how Amy can control her worksheet calculations to ensure that the prices are correct for the client.

Amy has already listed the required items and their respective prices. Adventure Works is offering a 10% discount to the customer, and it charges different prices for delivery based on the region that the customer outlet is in. Contoso Bikes has four retail outlets: two in region A and two in region B. The spreadsheet also shows data for region C; however, this region is not the focus of this video. Amy must ensure that two different delivery rates are used in her formulas. Let's help her create the calculations.

Firstly, cell G6 must show the result of the cost per unit multiplied by the quantity ordered. Position the cursor on cell G6 and type an equal
sign to begin the first calculation. Select cell E6 and type an asterisk for multiplication, then select F6. Press Enter to complete the calculation and generate the subtotal.

Next, Adventure Works needs to calculate the client's 10% discount. Select cell H6 and type an equal sign, then select the subtotal amount in G6. To work out the 10% amount, you need to divide by 100 and multiply by 10: add the forward slash symbol for division and type 100, then add the asterisk for multiplication and type 10. Excel processes these calculations from left to right: it first divides the figure in G6 by 100 and then multiplies the result by 10. Press Enter to get the discount figure.

Now you need to work out the total cost excluding delivery. Select cell I6 and type an equal sign, then select G6 to pick up the subtotal and type a minus symbol to subtract the discount. Select cell H6 to pick up the discount. However, before pressing Enter to complete the calculation, there's another step to consider. This order needs to be duplicated for each of Contoso Bikes' four outlets, so the total cost excluding delivery needs to be multiplied by the value in cell I2. To calculate this, type an asterisk, select cell I2, and press Enter.

But something has gone wrong with the result of this formula, because the total amount is less than the subtotal. Select I6 to return to edit mode. In your formula, the multiplication operator has higher priority, or precedence, than the minus operator; in other words, the multiplication operator is higher in the order of precedence. So Excel takes the discount in H6, multiplies it by the value in I2, and then subtracts that value from the total. To work around this, add an opening parenthesis before G6 and a closing parenthesis after H6. This ensures that Excel processes the subtraction operator before the multiplication operator. Press Enter to execute the formula and generate the correct value.

Next, you need to calculate the total amount so that it includes the cost of delivery. Remember, there are two different prices for delivery, one for each region, so there must be subtotals in this formula. The formula in the cell also requires a mixture of addition and multiplication symbols, so you need to use parentheses to work with the order of precedence. Select cell J6 and type an equal sign. Select I6 to include the total cost excluding delivery, then type a plus symbol. Type an opening parenthesis and the number 2, add an asterisk, and then select cell M2; include a closing parenthesis. Type another plus symbol, add an opening parenthesis, the number 2, and an asterisk, then select cell M3 and type the closing parenthesis. Press Enter to calculate the result. The total cost when delivery is included is $22,930.

Amy now needs to calculate these same costs for all the remaining categories in the worksheet. You could help her by using the autofill feature to copy the formulas that you've created, to save time. However, some cell references will need to be made absolute to prevent the autofill process from changing them. Select cell I6, type a dollar sign in front of the letter I and another dollar sign in front of the number 2, and press Enter. The formula in J6 also requires dollar signs. This time, instead of typing out each dollar sign, let's use a shortcut method. Enter edit mode on cell J6 and position the cursor on the M2 reference; this is the region A delivery charge. Press the F4 key on the keyboard to add the dollar signs. Repeat this action for the M3 reference, the region B delivery charge, then press Enter to complete the formula.

It's now safe to use autofill to copy these formulas, as the required cell references will remain absolute. Position the cursor on G6. A shortcut for autofill is available because there is a block of data to the left: position the mouse pointer on the bottom right-hand corner of the cursor so that it becomes a black cross, then double-click the mouse button. Excel uses the block of data to the left as a reference and copies the formulas down to G15. Repeat this process on cells H6, I6, and J6 to complete the worksheet.

You have now helped Amy to calculate the various costs for Contoso Bikes' orders. You should now be able to recognize situations in which you need to adjust the syntax in a formula to control how it's processed in Excel. You've also learned some useful shortcuts for absolute references and autofill, which will help you work more quickly and efficiently on your worksheets.

At this stage of the course, you should be familiar with creating and working with formulas. But you don't always have to create your own formulas: as you'll soon discover, Excel offers predefined formulas called functions that you can use to perform calculations. In this video, you'll discover what function formulas are, explore their syntax, and learn how to use them to perform calculations.

Over at Adventure Works, the company is approaching the end of its financial year. Lucas in accounts has been tasked with calculating the total quarterly sales for each regional sales team. You can help Lucas carry out this task using Excel function formulas, but first you need to learn what functions are and how to recognize their syntax.

Let's begin by defining a function. A function is a predefined formula that performs a calculation based on values specified by the user. For example, a simple function could total the values in two cells, while a more complex function could calculate repayments on a bank loan. Functions are useful because they allow for more complex calculations; they also facilitate dynamic content that responds to changes in the worksheet. Excel contains many built-in functions, grouped into different categories that can be accessed from the Formulas tab, or ribbon. Several categories are visible when you access this ribbon; select the More Functions option to view the others. These categories are organized so that you can locate the functions most relevant to your day-to-day requirements. For example, Excel offers functions for financial, date and time,
and math calculations. You'll explore each of these categories in more detail as you progress through the course. You can also refer to the Microsoft article "Excel functions by category", linked in the additional resources.

So now that you know what a function is, let's explore its elements. The first element of a function formula is the name of the function. This takes the form of a single word, such as SUM; the SUM function adds all the values within a selected range of cells. The second element of the formula is the arguments. As you've just learned, a function calculates data, and this data, or information, is referred to as an argument. The data a function accepts is also custom: you can add your own information to the formula to direct and control the action of the function. It's important to remember that each function requires a different list of arguments. Some arguments are mandatory, and a function can't carry out its task without them. Other arguments are optional; they exist to provide choices around additional elements, like the formatting of your results.

So how do you construct a function formula? Like any other calculation, a function formula begins with an equal sign. You then write the function name, for example, an equal sign followed by SUM. The next step is to write the arguments. Arguments are contained within a pair of parentheses, so begin by typing an open parenthesis, then list the arguments. As an example, you could follow a SUM function with the argument (C2:C4). Make sure to separate arguments from one another using characters such as commas or colons, not spaces or periods. When you finish typing your arguments, end your function formula with a closing parenthesis. You now have a formula, =SUM(C2:C4), that instructs Excel to add all the data in cells C2 to C4. When executed, this formula returns a result calculated from the values within this cell range. Function formulas can contain more complex arguments, but this simple example is a great starting point to help familiarize you with the syntax.

Now that you know how to construct a basic function formula, let's make use of your new skills and help Lucas create a SUM function to obtain the totals for Adventure Works' sales figures. The sales data is contained in an Excel workbook called Annual Sales Totals. The workbook contains a worksheet called Sheet1, which has five columns: the first column lists the months of the year, one month per row, and the other four columns contain the names and data for each regional sales team. Each team column contains 12 sales totals, one for each month.

Let's begin by calculating the sales totals for team A. First, you need to place the cursor on the cell where the result of your function must appear. Place your cursor on cell B14, underneath the sales data for team A; this is the cell where the overall sales total must appear. Now you can write your function. First type an equal sign, then type the name of your function. In this instance you need to add the data, so you can use a SUM function. Function names are not case sensitive; you can type them in upper or lower case, and once written, Excel displays them in uppercase. As you type the word SUM, a list of suggested functions appears. This list is a useful shortcut for accessing functions quickly, but for now you can continue typing the formula.

Now that you've stated the name of your function, you need to outline your arguments. Type an open parenthesis. A floating help message appears with argument prompts: if a prompt is in bold, then the argument is required; if the argument is in square brackets, then it is optional; in other words, it's not required for the function to work. In this instance, you're writing a custom argument. Type B2:B13, then type a closing parenthesis to end your argument. The SUM function and your custom argument instruct Excel to calculate the numeric total of all data in cells B2 to B13, just like the example you explored earlier. Press Enter to execute the function. The result shows that team A's sales total for the year was $971,000.

Now that you've calculated the sales total for team A, you can copy the function formula to the other cells in the row using the autofill shortcut. Select cell B14 and position the mouse pointer over the bottom right-hand corner of the cursor to turn it into a black cross. Hold down the mouse button and drag the cursor to the right as far as cell E14. As it copies the data from cell to cell, Excel also adjusts the formula to total the cells in each column for the remaining teams. Lucas now has the sales totals for each of Adventure Works' sales teams. Thanks to your help, Lucas successfully created the function formula he needed to complete his task, and having assisted Lucas, you should now know what functions are, be able to read the syntax of a function, and know how to use a function to perform a calculation.

Creating a formula with a function for the first time can be intimidating. How many arguments does it require? What's the correct syntax? Thankfully, Excel offers a useful Insert Function tool that provides a framework for creating a function formula. In this video, you'll explore the Insert Function tool and the function categories, and learn how to create a function.

Over at Adventure Works, the company is busy calculating the annual sales total for each regional team. The sales data is contained in a worksheet called Sheet1, which lists all four teams and their respective sales totals for each month. Let's help Adventure Works calculate each team's total sales using the Insert Function feature.

Begin by positioning the cursor on cell B14; this is the cell in which the sales total must appear for team A. Now you can access the Insert Function feature. There are two ways to open it: the first is by selecting the Insert Function button on the left-hand side of the Formulas ribbon, or you can select the Insert Function option on the worksheet screen, to the left of the formula bar. Selecting either one of these options
opens the insert function dialogue box in the middle of this dialogue box is a list of functions you can navigate through these functions using the scroll bar however this is a brief list that doesn’t contain all available functions above this list is a drop-own box with the heading most recently used to the left of this dropdown is a prompt called or select a category because the category choice is set to most recently used the list underneath contains functions that you’ve recently used in your worksheet formulas as you work through Excel you’ll most likely make frequent use of the same functions over time this list will populate with your most used functions providing a useful quick access shortcut you can select each function in the list to display a short description of its purpose in the bottom left of the dialogue box is a blue hyperlink called help on this function this is a contextsensitive link select it to visit the help page for your selected function on the Microsoft support site if your required function isn’t on this list then select the drop-own arrow to the right of most recently used you can select another category to open a different list of functions for example you need to use the sum function to complete the calculation task for adventure works you can access a sum function from the math and trigonometry category when you select this category the list of available functions changes you can learn more about which functions correspond to which categories in the additional resources remember that you can select a function for an explanation of what it does or you can highlight a function name in the list then select the blue help hyperlink for more detail the function list is arranged alphabetically so scroll down to the S section select sum and then select okay this action opens another dialogue box called function arguments there are two boxes at the top of this dialogue labeled number one and number two respectively notice that the text number 
one is bolded this indicates that an entry is required here the text number two is not bolded which indicates that it is optional however you might use it in a situation where you require a total for blocks of numbers at separate locations in the spreadsheet in the adventure worksheet Excel has identified the block of numbers directly above your cursor position so it’s suggesting that you include the cell range B2 to B13 in your total in the background on the formula bar Excel has already constructed the calculation for you it has included not only the cell references but also the equal sign the parenthesis and the colon if Excel has suggested the wrong block of cells then you can select the navigate button to select a different range or edit the formula the navigate button is an arrow pointing upwards at the right of the number one box selecting this arrow temporarily collapses the dialogue box and returns you to the spreadsheet so that you can change the selection the navigation arrow to the right of the number box is now an arrow pointing downwards selecting this arrow restores the full function arguments dialogue box just above the blue help link on this dialogue is a formula result which in this case is a total you should also be aware of warning messages that could appear here these warnings are often generated by errors that are created when working with more complex function formulas you’ve now selected the required function and you’ve made sure that the syntax is correct and targets the required data select okay to add the completed formula to the worksheet when executed this function formula generates a sales total of $971,000 for team A adventure Works can copy this formula across the row to generate sales totals for the other teams thanks to your use of the insert function feature Adventure Works now have the required sales data and you should now be familiar with the function tool understand its categories and be able to make use of the tool to create 
a function formula congratulations on reaching the end of this second week in this course on preparing data for analysis with Microsoft Excel this week you explored how to create and work with formulas and functions in Excel let’s take a few minutes to recap what you learned in this week’s lessons you began the first lesson by learning about formulas you learned that a formula in Excel is a calculation performed on the values in a range of cells in your worksheets examples of these calculations include addition subtraction multiplication and division once the calculation is completed the formula returns a result even if it is an error you then learned how formulas work different characters or operators are used to indicate what type of calculation Excel should perform examples of operators and calculations include addition subtraction multiplication and division the formula bar shows the formula in the cell you are working in while the worksheet shows the result of the formula in the formula bar formulas can also be static or dynamic a static formula means that the numbers are fixed so it always generates the same results a dynamic formula is one in which the results depend on the current values in the reference cells a formula can also include a reference to a cell that itself contains a formula creating a chain of calculations and a formula can also refer to a cell in another sheet this reference must include the worksheet name followed by an exclamation mark you then learned how to control calculations you learned that when working through a formula Excel handles the operators according to the order of precedence this means that Excel calculates formulas according to the hierarchical position of each symbol within the order of precedence the hierarchy is as follows excel first calculates division and multiplication operators it then calculates addition and subtraction operators however you also discovered that you could control a calculation using parenthesis in 
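The order of precedence just recapped can be illustrated with a quick sketch; Python applies the same rule, so the two expressions below mirror what Excel does with and without parentheses.

```python
# Excel evaluates * and / before + and -, so these two formulas differ:
without_parens = 2 + 3 * 4    # multiplication first: 2 + 12
with_parens = (2 + 3) * 4     # parentheses force the addition first: 5 * 4
```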
formulas parenthesis instruct Excel as to which part of a calculation must be executed first even if this would contradict the order of precedence there are also times when you may need to reproduce cell entries and formulas within a worksheet when a formula is copied it is important to consider the appearance of the cell references there are two ways that a cell reference can appear in a calculation relative and absolute a relative cell reference means that Excel adjusts the cell reference of a copied formula relative to its new location to make sure it’s correct and an absolute reference means that Excel keeps the reference constant it doesn’t adjust it you learned that to make a cell reference absolute you must add a dollar sign before the column initial and before the row number you also explored different percentage calculations and you learned how to create reliable percentage formulas using the correct syntax throughout the lesson you put your new knowledge to use by assisting Adventure Works with many different calculation tasks one of these tasks was in the exercise in the exercise you calculated Adventure Works profits and margins in preparation for a presentation to complete this task you created a calculation that relied on the company’s revenue data and you made sure that your calculation followed the best practices you had explored during the lesson you then undertook a knowledge check in this item you proved your understanding of the concepts you encountered by answering a series of questions finally you explored a list of additional resources designed to help you improve your knowledge of the topics in this lesson in the second lesson of this week you learned how to get started with functions you began by learning that a function is a predefined formula that performs a calculation based on values specified by the user you then discovered that Excel contains many built-in functions grouped into separate categories which can be accessed from the 
formulas tab or ribbon you then explored the two elements of a function the first element of a function is the name such as sum next are the arguments an argument is the data a function accepts some arguments are required and others are optional and the values you supply can be customized you then learned how to construct an argument in Excel like any other calculation a function formula begins with an equal sign you then need to write the function name for example equals followed by sum the next step is to write the arguments within a pair of parentheses when you’ve finished typing your arguments end your function formula with a closing parenthesis you also learned that you could create a function using the insert function tool the tool is a framework for building functions it’s accessed using the formulas ribbon or from the worksheet screen the tool lets you build a function from a series of drop-down lists and it provides useful tips for building functions and warnings for when they’re incorrect you then explored the autosum shortcut the autosum shortcut is a method of adding formulas in Excel it provides quick access to core functions that Excel users make daily use of the functions it provides access to include the sum function which adds all values within a selected range of cells the average function used to calculate the average of the selected range and the different versions of the count functions these are useful methods of counting the number of cells in a given range that contain or don’t contain specified values there’s also the max function which displays the cell with the largest value from a given range and finally the min function this function displays the cell with the lowest value from a given range you can also reproduce calculations quickly and easily in a worksheet using the autofill feature just like in the previous lesson you put your new knowledge to use by assisting Adventure Works with many different functions this included the exercise item in the exercise you helped Adventure Works
to prepare a monthly sales report to complete this task you prepared the report using a series of functions and you made sure that your calculation followed the best practices you had explored during the lesson you then undertook a knowledge check and a module quiz in which you proved your understanding of the concepts you encountered by answering a series of questions you’ve now reached the end of this module summary it’s time to move on to the discussion prompt where you can discuss what you’ve learned with your peers you’ll then be invited to explore some additional resources to help you develop a deeper understanding of the topics in this lesson best of luck we’ll meet again during next week’s lessons you check the results of a recently performed data analysis only to discover the results are wrong a quick inspection of the data set reveals errors in the data raw data needs to be correct and trustworthy because this information influences decisions so you always need to check for errors and resolve any you find in this video you’ll explore the common data errors in Microsoft Excel and discover how they could negatively impact data analysis jamie at Adventure Works is working on a spreadsheet that contains a large amount of customer and sales information she’s assessing if the contents are reliable enough to be used for data analysis to deliver new insights on customer behavior however the spreadsheet contains some common errors these errors must be resolved before she can make use of the data let’s take a few minutes to examine the types of errors that Jamie should be checking for many common errors or mistakes that you might find in your data set are often made by those who entered the data they might be unfamiliar with the software or technology or they’re just not paying attention a common mistake is that a name or key phrase is misspelled in that case Excel might not link the entry to other important details as it should or it might not find the entry in a
search for example Jamie’s spreadsheet tracks sales figures by region column C tracks the city in which each sale was made if she types the city Chicago as the latest entry without the A or types it in the wrong column Excel would ignore that entry when asked to summarize or total the sales results for that city entries can be misidentified during the data analysis process if they contain unnecessary characters for example Jamie types a dollar character before the numbers in her entries these entries are considered text excel would not include those amounts in a number calculation in a wider data analysis process they might be ignored altogether remember in Excel a currency amount should always be typed as numbers in the cell first then you should apply the currency symbol or the comma separator using a number format unnecessary spaces before or after entries can also create difficulties they don’t stand out on screen in the same way as other text or number characters but Excel is aware of them for Excel the word Chicago followed by a single space is different from Chicago typed without the space for calculation and analysis purposes it considers them to be two separate cities finally an entry might be placed in the wrong column or under an incorrect heading in a spreadsheet for example Jamie might type an entry under the wrong heading in her spreadsheet the city named Chicago is entered in the sales price column so that row item might be misclassified other examples of common errors or mistakes can be caused by an inconsistent layout or content it’s important that data is presented consistently throughout a worksheet so that it always remains accurate and reliable poor or inconsistent layouts can give rise to errors when creating an Excel file keep in mind the way in which information will be used like if a spreadsheet only has a single column for an address this column then contains all the address elements like city region or area code this means that it’s
difficult to break down these results separately by city or by region during data analysis because they’re not in separate columns instead you should format information like addresses across multiple columns so that it’s easier to process and analyze the data abbreviations and acronyms can also generate errors in data analysis it’s usually better to include a full word or title instead of an abbreviation or acronym in the following spreadsheet there are multiple variations of common abbreviations like Mr Miss and doctor this will cause serious issues during data analysis the best approach for data analysis is to standardize the approach for writing abbreviations particularly for titles like these another important feature of data analysis is the ability to break down results and information by date or calendar interval this means that dates must be entered in a particular way in a spreadsheet so that Excel recognizes them as calendar items the component elements like the month day and year must be typed as numbers and separated by a forward slash or a dash if you type dates with incorrect separator characters then Excel won’t interpret them as numbers instead it processes them as text so you won’t be able to conduct time analysis of your data a final common error to be aware of is duplicate information duplicate information in a data block distorts analysis results items can be counted multiple times and numeric results can be artificially inflated checking for duplicate data is an important step before performing data analysis duplicated entries in data are often the result of human error where entries are typed multiple times data could also be repeated accidentally if imported or created using a copy and paste operation for example Jamie might add sales figures from the previous week to the spreadsheet if her colleague doesn’t check for duplicate data then those sales figures could be included in the results a second time so how could you avoid the risk of 
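The whitespace and duplicate problems described above can be sketched as a small cleaning pass. The city entries below are hypothetical examples (not course data) chosen to show both issues at once.

```python
from collections import Counter

# Hypothetical city entries illustrating the errors described above:
# trailing spaces and repeated rows.
entries = ["Chicago", "Chicago ", "Miami", "Chicago", "Miami "]

# Trim the hard-to-spot leading/trailing spaces first -- to Excel,
# "Chicago " and "Chicago" are two different cities.
cleaned = [e.strip() for e in entries]

# Then flag anything that now appears more than once, the way a
# duplicate check would before analysis.
duplicates = {city: n for city, n in Counter(cleaned).items() if n > 1}
```

In Excel the equivalent steps are the TRIM function and the remove duplicates tool; the point of the sketch is that trimming must come first, or the duplicates go undetected.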
duplicate data aim for an efficiently designed spreadsheet for example if you’re including dates in your spreadsheet then sort the sheet in date order this makes it easier to identify the time entries already added likewise if you’re including address data then assign a different column to each element of an address this helps others to identify entries by searching for house numbers street names or cities like an entry for apartment 1 2 36 on North Street Miami jamie has identified the common errors in her data set she can now resolve them and start analyzing the data and you should also now be able to recognize common data errors and how they can have a negative impact on data analysis results you’ll be able to identify and fix the most common errors in the data before submitting it for analysis well done every day you calculate dates and times asking questions like “How long do I have to get to work?” or how many days do I have available to complete that project data analysts also ask date and time-based questions about their data sets and they can calculate answers using Excel’s date and time functions and formulas in this video you’ll learn about the importance of these date and time calculations how they can generate new data and explore some business use cases over at Adventure Works distribution hub Jamie is overseeing both the stock that Adventure Works are purchasing from suppliers and the items dispatched to fulfill customer orders jamie needs to create a spreadsheet with date and time formulas that track the delivery times dates and date intervals before you discover how Jamie can make use of these formulas let’s find out how date and time information provides businesses with an essential framework for planning date and time-based calculations are useful tools in helping businesses to plan for increased demands for products and demands on resources such as staff and equipment they also help businesses plan towards key dates or deadlines you can also use
Excel to plan toward key dates where there will be an increased demand on your business take the example of a building company contracted to build a new office block the project manager needs to create schedules and plans for all stages of the building process for planning purposes they need to determine how many working days there are between the project’s proposed start and end dates excel can be used to create formulas to calculate how many hours calendar days or work days there are for important deadlines these formulas can be set up in a dynamic way so that they update as the clock or the calendar changes by monitoring daily results over a specific time interval businesses can identify dips and peaks in performance for example a management team might notice that during one period there was a significant drop in sales if the results are organized by date they can identify the factors internal or external that might have caused this date and time calculations are also useful for tracking results and performance business transactions are usually recorded against dates and in some cases against time now that you’re familiar with some of the benefits of date and time calculations let’s explore date and time functions and formulas in Excel it is important to understand how Excel tracks dates and how it is used in calculations let’s begin with serial numbers the method Excel uses for tracking calendar days in Excel each date entry is formatted to appear as a calendar item however behind each date is a number that Excel uses to keep track of calendar days this number is known as a serial number excel assigns a serial number to each date starting from the 1st of January 1900 this date was given serial number one excel uses the system clock on your computer to track time and it increments the serial number by one when a 24-hour period has elapsed a date in the past will have a smaller serial number than one in the future you can view the serial number behind any date by 
changing the format from date to general in this example the two entries in A2 and B2 are formatted to display as dates if the same entries in A4 and B4 are formatted as general it is possible to display the serial numbers behind these dates the later date has a larger serial number excel uses these serial numbers in calculations using serial numbers one date can be subtracted from another to calculate a specific number of days for example the today formula can be used to always display the current date in a spreadsheet over at Adventure Works Jamie needs to display the current date in her spreadsheet she can use the today function to generate this result the syntax for this formula is an equal sign followed by the word today and parentheses this creates a dynamic date display in a spreadsheet that updates every 24 hours a similar function called now can also be used to display both the current date and time the syntax for this function is an equal sign the word now followed by parentheses when executed this function displays the current date and time in your spreadsheet this makes it more useful than the today function when you also need the time you can also use functions to extract the component elements of a date these actions can be carried out using the month day and year functions each function extracts a specific component of the date the month day or the year you will learn more about these functions and the others you’ve just reviewed later in the course finally there’s also the date function the date function is the opposite of month day and year it combines separate day month and year values into a single date either of these operations may be necessary to prepare date information for data analysis jamie can use these date and time formulas to track delivery times and dates for Adventure Works purchases from suppliers and to track items dispatched to their customers and you should now understand how date and time
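The serial-number arithmetic described above can be imitated with Python's datetime module. The 30 December 1899 epoch used below is an assumption that holds for modern dates (Excel's handling of early 1900 dates differs slightly because of a historical leap-year quirk), and the start and launch dates are hypothetical.

```python
from datetime import date

# Excel stores each date as a serial number counting from 1 January 1900.
# For any modern date the serial equals the days elapsed since
# 30 December 1899 (an offset that absorbs Excel's leap-year quirk).
EXCEL_EPOCH = date(1899, 12, 30)

def excel_serial(d: date) -> int:
    return (d - EXCEL_EPOCH).days

# A later date always carries a larger serial number, so subtracting
# two dates yields a day count -- the arithmetic behind =E5-D5.
start = date(2023, 3, 1)     # hypothetical project start
launch = date(2023, 5, 3)    # hypothetical launch date
project_days = excel_serial(launch) - excel_serial(start)
```

Switching a cell's format from date to general in Excel simply exposes this underlying integer; the date itself never changes.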
calculations are used to generate new data in Microsoft Excel you’ve also learned how to identify key business case uses for date and time-based information well done as a data analyst you’ll often have to input large volumes of time- and date-based data into your spreadsheets and it can be difficult to manually keep this data aligned with your project thankfully with Excel you can create dynamic date and time entries that update automatically over the next few minutes you’ll learn how to create dynamic time and date entries in a worksheet and separate dates into component parts adventure Works are preparing a new advertising campaign which will launch in multiple countries they need to use Excel to track progress toward key dates the milestone dates for the project are contained in a worksheet called regional dates the worksheet tracks information about the products that are part of the campaign alongside the campaign launch dates for each country adventure Works needs to calculate how many project days are available for each campaign another calculation in the spreadsheet must show on a rolling basis how many days are left until each launch date the development of this campaign will spread over two years so Adventure Works also need to record the accounting period for the project launch date for each country let’s help Adventure Works to complete their spreadsheet using date and time formulas entries in columns D and E are formatted as dates you can select any cell in the range D5 to E19 and check the number format box on the home ribbon to confirm this remember that these dates are actually serial numbers so you can switch the format on cells D5 and E5 to general access the home tab and select general from the drop-down menu to display the serial numbers notice that the serial number for the date in E5 is larger than the one for the date in D5 select undo to restore the date format now you need to calculate the number of project days you can complete this task using
a simple subtraction formula select F5 to input your calculation begin the calculation with an equal sign then take the date in E5 the larger serial number and subtract the date in D5 the smaller serial number press enter to generate the result there are 63 days assigned to the timeline for this first project note that because this calculation is a subtraction Excel doesn’t include the start date in cell D5 in its count however if required you can ask Excel to include the start date by adding a plus one to the formula the result in F5 remains static because the dates in D5 and E5 won’t change now you need to work out the days to launch figure for cell G5 the formula for this figure takes the launch date in E5 and subtracts a current date figure in cell E1 the current date in E1 must also be created using a formula if E1 always displays the current calendar date then the formula in G5 recalculates daily to show the decreasing numbers of days to the launch date you need to use the today function in your formula in E1 to make sure that the date updates every 24 hours to the current date with the cursor in E1 type an equal sign the word today and an open parenthesis you might notice that the help prompt is empty this is because the function doesn’t require any arguments there still needs to be parenthesis after the function name but no arguments should be included press enter to produce a dynamic date result that updates every 24 hours to show the days to launch figure in G5 the formula takes the campaign launch date in E5 and subtracts the current date in E1 the E1 cell reference must have dollar signs before the column initial and the row number this is to make sure that the reference stays constant when the formula is copied the today formula will now change the current date in cell E1 every day this means that the formula in G5 also recalculates daily so the days to launch figure reduces by one each day as the timeline gradually progresses your next task is to show 
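The rolling days-to-launch calculation just described can be sketched as a small function: today's date stands in for the =TODAY() cell in E1, and the launch date for the row stands in for E5. The dates below are hypothetical, and "today" is pinned in the example so the result is deterministic; the real worksheet column recalculates daily.

```python
from datetime import date
from typing import Optional

# Sketch of the "days to launch" column: G5 subtracts a =TODAY() cell
# from the launch date, so the figure shrinks by one every day.
def days_to_launch(launch: date, today: Optional[date] = None) -> int:
    if today is None:
        today = date.today()     # dynamic, like =TODAY()
    return (launch - today).days # the same subtraction as =E5-$E$1

# Hypothetical launch date, with "today" pinned for a repeatable result.
remaining = days_to_launch(date(2023, 5, 3), today=date(2023, 4, 3))
```

The `$E$1` absolute reference in the worksheet plays the role of the shared `today` value here: every row subtracts the same current-date cell.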
the year for the campaign launch date excel recognizes three elements in a date the month the day and the year you can use the year function to identify and display the year element from a date in another cell in other words you can separate the date into its component parts so that you can focus on the year element type an equal sign the word year and an open parenthesis in cell H5 a help prompt appears on screen and states serial number this is because Excel interprets stored dates as serial numbers select E5 type a closing parenthesis and then press enter to generate the result in H5 this campaign is set to launch in 2023 you’ve calculated the required campaign information for row 5 you can now copy these formulas down through the spreadsheet to calculate the remaining campaign dates use the autofill double-click shortcut on each formula to copy it down through the column to row 19 and complete the spreadsheet you should now understand how Excel works with dates in calculations and be able to create some common date and time tracking formulas thanks to your work on these formulas Adventure Works now have a clearer picture of how much time is available for each stage of this project well done when working with Excel you might need to execute a function under certain conditions or logic in these instances you can use a logical function calculation like an if function in this video you’ll explore the purpose of logical functions review some common use cases and learn the syntax for creating a logical function formula using the if function over at Adventure Works Lucas is reviewing the monthly sales reports he needs to find out if any of the sales staff are entitled to a monthly bonus as a reward for exceeding their sales targets you can help Lucas to identify which sales team members deserve a bonus by using an if function formula but before you can help Adventure Works you’ll need to find out more about how logical functions work you can use logical functions to
ask yes or no questions about your data if the function returns yes as its answer then you can direct Excel to perform the required action however if the function returns an answer of no then Excel can be directed to perform a different action for example you can direct adventure works if function formula to ask the question has this salesperson met their target if the answer is yes then they’ll be awarded their bonus if the answer is no then they’re not awarded a bonus when logical functions such as if run a test they determine the answer by comparing the value in a cell against a specified criterion for these tests to work the formula must contain logical operators the logical operators determine what kind of question the formula is asking and what value it needs for its answer these operators can be used to compare both text and numeric entries let’s review some examples of these operators the equal sign is the first of the mathematical operators that Excel uses in logical functions excel uses this operator to check if the value of one item is equal to that of another item for example a formula that tests if one equals 1 would return the value of true the logical symbols greater than and less than are used by Excel to test if one value is larger or smaller than another an Excel formula that performed the logical tests two is greater than one and one is less than two would return an answer of true for both tests the greater than and less than symbols can also be combined with the equals sign this combination lets Excel confirm if a value is greater than or equal to or less than or equal to another value let’s take a formula where Excel checks to see if the value in cell D2 is the same as or larger than the value of 400 if even one of these arguments were true then the test would return the value of true finally a very useful set of logical operators is not equal to this is when the less than and greater than symbols are typed back to back this combination of 
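The operators just listed map one-for-one onto Python comparisons, which makes for a quick sanity check; the only notational difference is that Excel writes not-equal-to as <> where Python writes !=.

```python
# Each Excel logical test from the video, rewritten as a Python comparison.
tests = [
    1 == 1,      # Excel: =1=1       -> TRUE
    2 > 1,       # Excel: =2>1      -> TRUE
    1 < 2,       # Excel: =1<2      -> TRUE
    400 >= 400,  # the D2>=400 style of test
    1 != 2,      # Excel's <> operator (not equal to)
]
all_true = all(tests)  # every test above returns TRUE
```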
operators is interpreted by Excel as not equal to in other words you’re asking Excel to determine that value A does not equate to value B for example the result of the logical test 1 is not equal to two would be true because the two numbers are different values so you’ve discovered how an if function formula works but how do you make use of one when constructing the if function formula you need to give Excel three pieces of information the first piece of information is called the logical test for the logical test you need to identify the cell that contains the value to be checked you also need to specify the test to be carried out in relation to this value this is the if keyword followed by parenthesis it’s within these parentheses that you must type the logical test for example Lucas needs Excel to check the total sales of each team member to determine if they meet their monthly target the next instruction tells Excel what to do or what to display if the test returns a result of true in Lucas’s case if his test returns a value of true then the team member is awarded a bonus the third and final argument is what Excel should do or display if the logical test returns the result of false if Lucas’ test returns a value of false for a team member then Excel returns a value of zero in other words that person is not awarded a bonus now that you’ve reviewed the elements of an if function formula let’s make use of your new skills and help Lucas create a formula to check the sales team’s monthly figures and determine which employees are entitled to a bonus the data set Lucas requires is in a workbook called monthly sales the workbook contains four sheets one for each sales team for this exercise let’s just focus on the results for team A the worksheet lists the name of each team member their total monthly sales and their monthly target the bonus amounts must be calculated and listed within column E any team member who meets or exceeds their target is awarded the bonus figure 
in cell H4 let’s begin by finding out if team member Michelle Cook is entitled to a bonus position the cursor on cell E4 type an equal sign the keyword if and an opening parenthesis you need to place your arguments for the if function within parentheses notice the floating help message prompting you for the three arguments that the function needs select cell C4 for Michelle’s monthly sales data type a greater than symbol followed by an equal sign then select cell D4 and type a comma this instructs Excel to check if Michelle’s sales figures for this month are greater than or equal to her assigned target however as you can see from the bold prompt text the formula is still incomplete you now need to instruct Excel on what bonus value to award you must also include what action Excel should take if the result of the logical test is true or yes and what to do if the result is false or no select cell H4 for the value if true add a dollar sign before the column initial and the row number these dollar signs prevent Excel from adjusting the reference when copied then type a comma followed by a zero for the value if false this zero indicates that Michelle doesn’t receive a bonus if the logical test returns false finally type a closing parenthesis to end your arguments press enter to execute the if function formula the results show that Michelle has met her sales target and has earned a bonus of $500 for this month copy the formula down the column and execute it to determine how the other team members have performed the results show that three team members met their sales targets and could be awarded a bonus two team members did not reach their targets so should not receive a bonus thanks to your help Lucas successfully created the IF function formula he needed to complete his task and having assisted Lucas you should now know how if functions work and recognize the correct syntax to create a logical formula using if well done you may be familiar with using a logical function to
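Lucas's formula =IF(C4>=D4, $H$4, 0) can be sketched as a plain conditional. The $500 bonus matches the video; Michelle Cook's sales and target figures below are hypothetical, chosen so the example reproduces her result.

```python
# Sketch of =IF(C4>=D4, $H$4, 0): one logical test, a value if true,
# and a value if false.
BONUS = 500  # the fixed bonus amount held in cell H4

def monthly_bonus(sales: int, target: int) -> int:
    # logical test: sales >= target; value if true: BONUS; value if false: 0
    return BONUS if sales >= target else 0

# Hypothetical figures: Michelle meets her target, a colleague does not.
michelle = monthly_bonus(sales=52_000, target=50_000)
missed = monthly_bonus(sales=41_000, target=45_000)
```

The `BONUS` constant plays the part of the `$H$4` absolute reference: every copied row points at the same bonus value.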
test for conditions in your data sets but what if you need to test for multiple conditions you can use nested if and ifs functions in this video you’ll explore the concept of nested if and ifs functions and learn how they can be used to perform a series of elimination tests and generate a final result over at Adventure Works Lucas is calculating bonuses for sales team B lucas needs to calculate each team member’s sales total and determine what level of bonus they should be awarded lucas can complete this task using nested if and ifs functions let’s find out more about these functions and then help Lucas complete his task at this stage of the course you’ve encountered many examples of function formulas but a formula doesn’t have to make use of just one function in fact a formula can contain several functions that work together to achieve a result logical functions work this way by interconnecting with one another nesting functions is the technique of adding another function to the formula as an argument for the original function in other words you can place one function inside another to expand its functionality for example you might need to create a formula that performs a series of elimination tests before it generates the final result you could design this formula in two ways one approach would be to create what is known as a nested if formula the formula begins with an if that performs an initial logic test if the test turns out to be true then the formula will simply process whatever action is specified in the value if true argument however the result of the logical test could also be false if so then another if function in the value if false argument could run another test and process different actions for example a nested if formula could check if a member of the adventure works sales team meets a specific bonus band if the result is false then a second argument could check the value against another band and so on the second approach is to use a function called 
ifs an ifs function is designed to run a series of tests that don’t require you to nest other functions the ifs function steps through the tests checking each one if a test is false it continues to move through the tests until it finds one that is true when a logical test returns true as a result the formula performs or displays whatever is in the value if true for that test it then stops running tests in the case of Adventure Works the IFS function can continually check each sales team member’s sales results against the different bonus bands until it identifies a suitable amount to award them now that you’ve learned about the basics of nested if and ifs functions let’s put your knowledge to use by helping Lucas to calculate the bonus bands for the sales team the sales data sets are contained in the team B worksheet in a workbook called monthly sales figures the team B worksheet lists the names of each team member and their monthly sales result it also lists their sales targets and the amount they achieved above their targets the bonus amounts must be listed in column F using the bonus bands data in columns I and J adventure Works also needs a formula in F3 that checks the sales data in cell E3 it must then calculate which bonus band is applicable to the team member Olivia King and display the correct bonus amount let’s begin by typing the formula position the cursor on F3 type an equal sign an if and an opening parenthesis next select E3 to add that cell as a cell reference then type a greater than symbol followed by an equal sign type 20,000 which is the first bonus band and then a comma finally select cell J3 to add it as the value if true argument then type a comma this first part of the formula provides Excel with the following instruction if the figure in cell E3 is greater than or equal to 20,000 then the staff member is owed the bonus amount in cell J3 but what if one or more of the amounts in column E are less than 20,000 if the amount in E3 is less than 
20,000 there are still two other bands from which a bonus can be assigned to test for these bands you need to add another if function as the value if false argument in the formula you can nest this function within the first one first type an if in this instance you don’t need another equal sign then type an opening parenthesis so you can begin writing your arguments this second occurrence of the if will need its own opening and closing parentheses the parentheses must contain three arguments a logical test a value if true and a value if false let’s create the logical test first select E3 to assign it to your argument then type a greater than symbol and an equal sign then type 10,000 and add a comma next you need to assign the value if true so if the amount in E3 is over 10,000 then the bonus amount awarded will be the value in J4 select cell J4 and type a comma to assign it to your argument finally you need the value if false if it’s not true that the amount is over 10,000 then the bonus amount awarded will be the value in J5 select cell J5 to assign it to your argument each instance of if also needs its own closing parenthesis type two closing parentheses and press enter to execute the function the results of your function show that the logical test for the first if failed so Excel moved on to the second if the second logical test was true so Excel correctly displayed the bonus amount of $1,000 from cell J4 changing the monthly sales figure for Olivia to 67,140 would change the result in F3 because both if functions would have returned a false result so the result would have been the value in cell J5 this formula is now a nested formula because there is a second if inside the first one let’s delete this result and recreate the formula using the ifs function when you type equals an ifs and an opening parenthesis Excel only provides prompts for two arguments a logical test and a value if true as you learned earlier you can use ifs to specify a series of tests and the 
value if true for each one let’s step through this process select cell E3 then type a greater than symbol an equal sign and a value of 20,000 type a comma and then select J3 as the band to be assigned if the first test is met when you type a comma prompts appear for another logical test and a value if true for the second logical test select E3 again this time you must follow it with a greater than sign an equal sign and 10,000 then type a comma and select J4 now you need a final test that always returns true so that its value is used when no other test matches so type true and a comma then select J5 adding the word true here prevents Excel from producing an #N/A error message you also need to add dollar signs to the J3 J4 and J5 references you can now copy this formula down through the column to calculate the bonus amount for each team member thanks to your help Lucas has now determined what bonus band should be awarded to each team member and you should now understand the difference between a nested if function formula and a calculation that uses ifs you’ve explored the different syntax for both types of formula so you can decide which you find easier to understand and replicate congratulations on reaching the end of the third week in this course on preparing data for analysis with Microsoft Excel this week you explored how to use functions to prepare data for analysis in Excel let’s take a few minutes to recap what you learned in this week’s lessons you began the first lesson by discovering how inconsistent data affects analysis and the common mistakes people make examples of these errors include misspellings unnecessary characters and spaces and incorrectly placed entries you now know that errors such as these have a negative impact on data analysis you were also able to fix these errors in your data before submitting it for analysis you then learned how you can use different functions to standardize text data the left mid and right functions are used to return a specific number of characters from 
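(Putting the two walkthroughs above side by side: the finished formulas, reconstructed from the steps described, are `=IF(E3>=20000,$J$3,IF(E3>=10000,$J$4,$J$5))` and `=IFS(E3>=20000,$J$3,E3>=10000,$J$4,TRUE,$J$5)`. A minimal Python sketch of the same elimination logic follows — the band amounts are invented placeholders, not the workbook’s actual J3:J5 values:)

```python
def nested_if_bonus(sales, band1=2000, band2=1000, band3=500):
    """Mirrors =IF(E3>=20000, $J$3, IF(E3>=10000, $J$4, $J$5)).

    The band amounts are illustrative placeholders only.
    """
    if sales >= 20000:        # first logical test (outer IF)
        return band1          # value_if_true
    elif sales >= 10000:      # nested IF runs as the value_if_false
        return band2
    else:
        return band3          # final value_if_false

def ifs_bonus(sales, band1=2000, band2=1000, band3=500):
    """Mirrors =IFS(E3>=20000, $J$3, E3>=10000, $J$4, TRUE, $J$5).

    IFS walks the test/value pairs in order and stops at the first
    test that is true; the trailing TRUE acts as a catch-all and
    prevents a #N/A error when no earlier test matches.
    """
    for test, value in [(sales >= 20000, band1),
                        (sales >= 10000, band2),
                        (True, band3)]:
        if test:
            return value

# both approaches assign the same band for any sales figure
for s in (25000, 15000, 4000):
    assert nested_if_bonus(s) == ifs_bonus(s)
```

(Either way the elimination order matters: the tests must run from the highest band down, exactly as in the spreadsheet.)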
either the left the middle or the right side of a cell entry typically these functions are used in situations where you need to transfer parts of the cell content to a different column many data analysts use the left mid and right functions to split the contents of a column into three separate columns the trim function removes empty spaces from text strings except for the spaces between words this is useful for when you suspect that there are random spaces at the beginning or end of an entry it’s also a useful way to tidy up a column of text before beginning any analysis using the wrong case in text data can make a summary or report appear untidy or unprofessional there are three functions you can use to standardize the case used in text entries these are upper lower and proper lastly you can use the concat function to combine entries from different cells in a spreadsheet into a single cell entry and in this lesson you put your new knowledge of functions to use by helping Adventure Works standardize its data for analysis one of these tasks was in the exercise in the exercise you had to clean up the Adventure Works spreadsheet so that it could be used for data analysis to complete this task you used formulas to remove inconsistencies or errors from the data and you made sure that your formulas followed the best practices you had explored during the lesson you then undertook a knowledge check in this item you proved your understanding of concepts you encountered by answering a series of questions finally you explored a list of additional resources designed to help you improve your knowledge of the topics in this lesson in the second week you learned how to use date and time functions in Microsoft Excel to generate new data you explored different examples of how the data generated from date and time calculations can be used for example date and time data can be used to create a framework for planning track 
business performance and display important results you then learned how Excel interprets and works with dates in a spreadsheet all dates have serial numbers which is how Excel interprets them with these serial numbers you can use dates to perform calculations like subtracting one date from another you also reviewed functions for creating dynamic formulas that calculate time and date values these include the today and now functions and you discovered that you can also divide a date entry into its component parts using day month and year or return these components as a single date with the date function throughout the lesson you put your new knowledge to use by assisting Adventure Works you helped the company to plan its projects by using different date and time calculations one of these tasks was in the exercise in the exercise you gathered date and time information for one of Adventure Works’ advertising campaigns you completed this task using the date and time calculations you learned about these functions helped you to generate new milestone data for Adventure Works you then undertook a knowledge check in this item you proved your understanding of the concepts you encountered by answering a series of questions finally you explored a list of additional resources designed to help you improve your knowledge of the topics in this lesson in week three you learned about logical functions such as if and ifs you learned that logical functions can be used to ask yes or no questions about your data if the function returns yes as its answer then you can direct Excel to perform the required action however if the function returns an answer of no then Excel can be directed to perform a different action next you learned that for these tests to work the formula must contain logical operators the logical operators determine what kind of question the formula is asking and what value it needs for its answer you discovered that these operators make use of if formulas and this formula 
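(Returning to the date functions recapped above: because Excel stores each date as a serial number — a day count from an epoch — subtracting two dates yields a number of days, and DAY, MONTH, YEAR and DATE simply split or reassemble that value. A rough Python mirror of the same ideas, using invented sample dates:)

```python
from datetime import date

# A date is effectively a day count, so subtraction gives days elapsed.
# Python's date type behaves the same way as Excel's serial dates here.
start = date(2024, 3, 1)     # like =DATE(2024, 3, 1)
end = date(2024, 3, 29)

duration = (end - start).days        # like =B2-A2 on two date cells
assert duration == 28

# splitting a date into its component parts (DAY, MONTH, YEAR)
assert (end.day, end.month, end.year) == (29, 3, 2024)

# reassembling the parts into a single date (DATE)
assert date(end.year, end.month, end.day) == end
```

(`date.today()` plays the role of the dynamic TODAY function, recalculating each time it is evaluated.)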
needs three pieces of information to work it requires a logical test a true value and a false value you also learned that nesting functions is the technique of adding another function to the formula as an argument for the original function in other words you can place one function inside another to expand its functionality there are two approaches you can use the nested if function or the ifs function you learned that the nested if formula begins with an if that performs an initial logic test if the test turns out to be true then the formula will simply process whatever action is specified in the value if true argument however the result of the logical test could also be false if so then another if function in the value if false argument could run another test and process different actions the second approach is to use the ifs function you discovered that the ifs function steps through the tests checking each one if one test is false then the function continues to move through the remaining tests until it finds one that is true when a logical test returns true as a result the formula performs or displays whatever is in the value if true for that test it then stops running tests just like in the previous lessons you put your new knowledge to use by helping adventure works in this lesson you determined the financial performance of the sales team using if and ifs functions this included the exercise item in the exercise you helped Adventure Works to generate additional information from a customer’s spreadsheet to complete this task you generated the required information by using if and ifs functions and you made sure that your calculation followed the best practices you had explored during the lesson you then undertook a knowledge check and a module quiz in which you proved your understanding of the concepts you encountered by answering a series of questions you’ve now reached the end of this module summary it is time to move on to the discussion prompt where you can 
discuss what you’ve learned with your peers you’ll then be invited to explore some additional resources to help you develop a deeper understanding of the topics in this lesson best of luck we’ll meet again during next week’s lessons you’re nearing the end of this course on preparing data for analysis in Microsoft Excel you’ve put great effort into this course by completing the videos readings quizzes and exercises you should now have a stronger grasp of several foundational concepts for understanding data analysis these include the fundamentals of working with data in Microsoft Excel creating and using formulas and functions in Excel and preparing data for analysis using functions you’re now ready to apply your knowledge in the exercise and the final course assessment the assessment is a graded quiz that consists of 30 questions that are related to topics you covered throughout the course but before you start let’s recap what you’ve learned in the first week you were introduced to Microsoft Excel you learned how to use Excel by exploring how to enter and format data manage worksheets read large blocks of data and sort and filter data microsoft Excel is a useful data analysis tool it is used in everyday business to store calculate and gain insights from data you learned how to navigate Excel using its UI for example the title bar that displays the name of your file and search option and the commands which are organized into tabs and ribbons you also learned that a worksheet is where you input data into cells data can be added to worksheets by importing it or creating it manually data isn’t always easy to read but you’ve learned how to use formatting to improve the readability of a spreadsheet you also explored the keyboard shortcuts for data entry and formatting excel has various features that help you to read large blocks of data you learned that you can use the freeze panes new window and name box features and keyboard shortcuts to make it easier to read your 
data you can use the sort and filter feature to organize and sort data quickly and efficiently there are also different sort methods such as alphanumeric sort and multi-level sort that you can use to sort your data the filter feature helps you to control data visibility in a worksheet and provides information on how many rows match specific criteria in the following week your focus shifted to functions and formulas in Excel you discovered that a formula in Excel is a calculation performed on the values in a range of cells in your worksheets examples of these calculations include addition subtraction multiplication and division once the calculation is completed the formula returns a result even if it is an error you then explored how formulas work along with the operators they use formulas can be static or dynamic a static formula means that the numbers are fixed so it always generates the same results a dynamic formula is one in which the results depend on the current values in the reference cells and it reacts to any changes in the values by updating the result you also learned how to control calculations here you learned that Excel controls calculations using the order of precedence this means that Excel processes the mathematical operators in formulas according to the hierarchical position of each symbol within the order of precedence you learned about the hierarchy of symbols and discovered that you can also control a calculation using parentheses next you explored the relative and absolute cell references these concepts relate to how a

cell reference appears in a calculation a relative cell reference means that Excel adjusts the cell reference of a copied formula relative to its new location to make sure it’s correct an absolute reference means that Excel keeps the reference constant in other words it doesn’t adjust it you also learned about functions which are predefined formulas built into Excel you explored popular functions such as sum average and count and learned how to create formulas with them using features such as the autosum shortcut and the insert function wizard you also explored different percentage calculations and you learned how to create reliable percentage formulas using the correct syntax the third week was all about preparing data for analysis using functions you started off by exploring how inconsistent data affects analysis and the mistakes that can be made when inputting data examples of these errors include misspellings unnecessary characters and spaces and incorrectly placed entries you now know that errors such as these have a negative impact on data analysis you also learned how to fix these errors in your data before submitting it for analysis it is important to standardize text data before analyzing it you can do this using functions the left mid and right functions are used to return a specific number of characters from either the left the middle or the right side of a cell entry typically these functions are used in situations where you need to transfer parts of the cell content to a different column the trim function removes empty spaces from text strings except for the spaces between words this is useful for when you suspect that there are random spaces at the beginning or end of an entry you also learned that there are three functions upper lower and proper that you can use to standardize the case used in text entries your reports will look tidy and professional if you standardize the case you can also use the concat function to combine entries from 
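(The text-standardizing functions recapped above map onto familiar string operations; a rough Python mirror follows. The sample entry is invented, and note that Python’s `strip` only trims the ends of a string, a close but not exact analogue of TRIM, which also collapses interior runs of spaces:)

```python
# rough Python equivalents of LEFT, MID, RIGHT, TRIM, UPPER,
# LOWER, PROPER and CONCAT, using an invented sample entry
entry = "  smith, JOHN  "

cleaned = entry.strip()                  # like TRIM (ends only)
assert cleaned == "smith, JOHN"

assert cleaned[:5] == "smith"            # like =LEFT(A1, 5)
assert cleaned[-4:] == "JOHN"            # like =RIGHT(A1, 4)
assert cleaned[7:11] == "JOHN"           # like =MID(A1, 8, 4)

assert cleaned.upper() == "SMITH, JOHN"  # UPPER
assert cleaned.lower() == "smith, john"  # LOWER
assert cleaned.title() == "Smith, John"  # PROPER (close analogue)

# CONCAT combines entries from different cells into one
first, last = "John", "Smith"
assert first + " " + last == "John Smith"
```

(Note that MID counts from 1 in Excel but slicing counts from 0 in Python, which is why `=MID(A1, 8, 4)` becomes `cleaned[7:11]`.)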
different cells in a spreadsheet into a single cell entry next you discovered that dates are important for data analysis without date and time data it is more difficult to analyze and compare results over time you explored functions such as today or now which help you add dynamic date and time information to your worksheet you also learned that other functions such as year month or day can be used to split dates into their component parts to facilitate analysis finally you learned how logical functions such as if and ifs add another dimension to calculations because they ask Microsoft Excel to check for criteria and perform different actions depending on the result you then explored how other functions such as the or and the and functions make the logical formulas you create even more efficient and versatile you also learned how to produce specific and targeted formulas by using functions such as sumif averageif and countif these functions combine the if functionality with the actions of standard functions such as sum now that you’ve built a solid understanding of the fundamentals of Excel formulas and functions and learned how to prepare data for analysis you’re ready to test your knowledge by undertaking the exercise and the final course assessment best of luck congratulations you have made it to the end of the preparing data for analysis in Microsoft Excel course your hard work and dedication have paid off you’re off to a great start with your data analysis learning journey and you should now have a thorough understanding of the fundamentals of Microsoft Excel working with blocks of data in Excel formulas and functions and how to prepare data for analysis using functions you can also identify common errors made in data analysis and you know how to deploy different strategies to make sure you have reliable data but that’s not all you’ve also gained valuable insight into the functions and formulas you can use to create in-depth data for analysis you’ve explored 
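(The conditional functions mentioned above — SUMIF, AVERAGEIF and COUNTIF — each pair a criterion with a standard aggregation. A minimal Python sketch of that pairing, with invented sample data:)

```python
# rough Python mirror of SUMIF / COUNTIF / AVERAGEIF:
# filter by a criterion, then apply the standard aggregation.
# the region and sales figures below are invented for illustration
regions = ["North", "South", "North", "East"]
sales   = [12000,   8000,    15000,   9000]

north = [s for r, s in zip(regions, sales) if r == "North"]

sumif     = sum(north)         # like =SUMIF(A:A, "North", B:B)
countif   = len(north)         # like =COUNTIF(A:A, "North")
averageif = sumif / countif    # like =AVERAGEIF(A:A, "North", B:B)

assert (sumif, countif, averageif) == (27000, 2, 13500.0)
```

(The same filter-then-aggregate shape underlies all three Excel functions; only the final aggregation changes.)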
various calculations deepened your knowledge of how data analysis can be performed and reviewed scenarios where it is used and let’s not forget the process of preparing data for analysis you now understand the critical role that reliable data plays as a central focal point of data analysis you should now have a firm knowledge of how Microsoft Excel works and how it can be used for data analysis think about everything you can do with this new knowledge well done for taking the first steps towards your future data analysis career by successfully completing all the courses in this program you’ll receive a Coursera certification this program is a great way to expand your understanding of data analysis and gain a qualification that will allow you to apply for entry-level jobs in the field all the courses in this program including the one you just completed will help you prepare for the PL-300 exam by passing the exam you’ll become a Microsoft certified PowerBI data analyst it will also help you to start or expand a career in this role this globally recognized certification is industry endorsed evidence of your technical skills and knowledge the exam measures your ability to perform the following tasks prepare data for analysis model data visualize and analyze data and deploy and maintain assets to complete the exam you should be familiar with Power Query and the process of writing expressions using data analysis expressions or DAX you’ll learn about the syntax later in this program you can visit the Microsoft certifications page at http://www.learn.microsoft.com/certifications to learn more about the PowerBI data analyst associate certification and exam this course has enhanced your knowledge and skills in the fundamentals of data analysis but what comes next there’s more to learn so it’s a good idea to register for the next course on harnessing the power of data in Microsoft PowerBI the next course will cover various ways data analysis is used in business you’ll 
learn about the role of a data analyst and how to use data to solve business problems and you’ll learn how to process and analyze data then you’ll move on to learn about the tools needed to analyze data efficiently whether you’re just starting out as a novice or you’re a technical professional completing the whole program demonstrates your knowledge of analyzing data in PowerBI you’ve done a great job so far and you should be proud of your progress the experience you’ve gained will show potential employers that you are motivated capable and not afraid to learn new things it’s been a pleasure to embark on this journey of discovery with you best of luck in the future hello and welcome to the harnessing the power of data with PowerBI course this course covers the core concepts of data analysis and introduces the main features of Microsoft PowerBI many of your normal digital activities generate data this can happen when you use services such as car parking traveling by rail or air or from your shopping socializing or fitness activities of course it’s not just you that’s contributing data your friends family and colleagues in fact almost everyone adds content to the data pool businesses and organizations also use many other sources such as government financial economic health and scientific data to name a few gathering and storing a vast amount of data is the first phase then comes the challenge of its analysis this is why there is a growing demand for data analyst professionals businesses need data analysis more than ever and as a data analyst you’ll be ideally placed to begin harnessing the power of data in this learning path you will learn about the life and journey of a data analyst and the skills tasks and processes they go through in order to tell a story with data you’ll discover how getting that data analysis story correct enables businesses to make informed decisions let’s get an overview of the main topics covered in this course you may have already learned 
about one crucial topic preparing data using Microsoft Excel you also need to understand other elements involved in the career of data analysis including learning about the stages in the data analysis procedure and the roles involved recognizing key issues and concerns when conducting analysis and sharing results and knowing different types of data sources and connection types this course will give you a solid foundation in these topics and introduce you to the component elements of Microsoft PowerBI software that helps to process analyze and share data let’s now quickly summarize the course material to give you an overview of all your study in this course this course will introduce you to data analysis in business data sources and data ingestion to begin you’ll learn about the role of a data analyst key data analysis concepts and how data plays an essential role in business you’ll then be briefly introduced to PowerBI as a tool for data analysis you will also learn about data sources and the extract transform load or ETL process you’ll learn the importance of identifying and evaluating data sources and following this you will learn about transforming and cleaning data in PowerBI you’ll get to distinguish between the different query and scripting languages to consolidate your learning and put it into practice you will complete a practical assignment where you will use data to determine the cause of a recent decrease in sales practical exercises in the course are based on a fictional business called adventure works during the exercise you must identify stakeholders locate data sources perform data transformation and distribute reports after this hands-on learning you will complete a final graded assessment be assured that everything you need to complete the assessment will be covered during your lesson with each lesson made up of video content readings and quizzes to assist your learning you will also get to apply your newly gained skills in exercises quiz questions 
and self-reviews in addition discussion prompts allow you to share knowledge and discuss difficulties with other learners these discussions are also a great way to grow your network of contacts in the data analysis world so be sure to get to know your classmates and stay connected during and after your course is this the course for you hopefully the outline of the course content and topics will help you decide and it’s important to mention that you don’t need an IT related background to take this course it’s for anyone who likes using technology and has an interest in data analysis whatever your background to complete this course you need to have access to some resources you need a laptop or desktop computer with a recommended 4 GB of RAM an internet connection and a Windows operating system version 8.1 or later it should have .NET framework version 4.6.2 or later installed and a subscription to Microsoft Office 365 you’ll also need to install PowerBI desktop available as a free download you’ll find further details about these and other requirements in the additional resources item at the end of this lesson this program prepares you for a career in data analysis when you complete all the courses in the Microsoft Power BI analysis professional certificate you earn a Coursera certificate to share with your professional network taking this program not only helps you become job ready but also prepares you for the PL-300 Microsoft PowerBI data analyst exam in the final course you’ll recap the key topics and concepts covered in each course along with a practice exam you’ll also get tips and tricks testing strategies useful resources and information on how to sign up for the exam finally you’ll test your knowledge in a mock exam mapped to the main topics in this program and the Microsoft Certified Exam PL-300 ensuring you’re well-prepared for certification success earning a Microsoft certification is evidence of your real world skills and is globally recognized a 
Microsoft certification showcases your skills and demonstrates your commitment to keeping pace with rapidly changing technology it also positions you for increased skills efficiency and earning potential in your professional roles the topics covered in the practice exam include prepare data model data visualize and analyze data and deploy and maintain assets in summary this course introduces how a data analyst uses data to create a compelling story through reports and dashboards using Microsoft PowerBI it also explores the need for true business intelligence in the enterprise i hope you are ready to get started with your data analysis journey data is an essential business component with organizations using many methods to collect their data however raw data is only meaningful with proper interpretation and analysis that’s where the work of a data analyst is crucial because data is often used to inform decisions that can significantly impact an organization’s success data analysts are essential to business they help organizations make sense of the vast amount of collected data in this video you will explore the role of a data analyst the flow of data in an organization and how an analyst achieves data insights that inform decisions you’ll also learn about the importance of data analysis in modern organizations and the vital role of a data analyst data analysts help organizations make sense of the data they collect turning it into insights that inform decisions let’s explore the responsibilities of a data analyst and discover how they achieve data insights imagine you work for an online retail company every day your company collects data on customer purchases website traffic and social media engagement however the data is not organized which makes it difficult to analyze the inability to interpret the data means your company fails to identify opportunities to improve customer experience increase sales and stay ahead of the competition this is why a data analyst is 
needed the data analyst is responsible for collecting organizing and analyzing the data to generate insights that inform business decisions for example the data analyst may identify trends in customer behavior that could inform marketing campaigns or website design they may also identify areas where the company can cut costs or improve efficiency strategic thinking awareness of impact and understanding of context are crucial skills for a data analyst to succeed in their role here’s why each skill is important strategic thinking helps data analysts prioritize tasks allocate resources efficiently and make data-driven decisions that contribute to long-term success by considering both short-term and long-term implications data analysts can ensure their work has a meaningful impact on the organization being aware of the potential impact of their analysis is critical for data analysts to ensure they communicate their findings responsibly and ethically this involves understanding the consequences of data-driven recommendations considering potential biases and ensuring data privacy and security awareness of impact also helps data analysts advocate for data-driven decision making and fosters a culture of evidence-based strategy within the organization data analysts need to have a deep understanding of the context in which they are working including the industry market trends and the organization’s goals and challenges this knowledge allows them to tailor their analysis to the specific needs of the business and provide actionable insights data analysts use various tools and techniques to collect and analyze data these include programming languages like R and Python R is used specifically for data analysis while Python is a general-purpose programming language that can be used for a wide range of applications including statistical analysis data visualization tools like Microsoft PowerBI and databases like SQL Server data analysts are expected to be proficient in these tools and 
technologies and to possess excellent analytical skills a data analyst collects data from many sources including customer sales financial and operational data departments within an organization such as marketing sales finance and operations provide this data the data is then processed cleaned and transformed into a usable format for analysis this process is known as data wrangling once the data is wrangled it is loaded into a data warehouse or data lake where data analysts can access and analyze it the data is organized into tables or data sets each containing a specific data type data analysts then use this data to generate insights that inform business decisions data analysts play a critical role in our data-driven world they help organizations make sense of the large amounts of collected data turning it into insights that inform decisions using their skills data analysts help organizations identify growth opportunities improve operations and gain competitive advantage someone at the party asks you “What do you do?” You reply “I work with data.” Does that help them data roles are a mystery most people don’t understand the value and variety of positions in the data analysis process let’s demystify data analysis roles and responsibilities in this video by exploring various roles and describing how they contribute to the success of data-driven organizations you’ll also learn about the importance of each role and how roles collaborate the data analysis roles and responsibilities that you’ll explore are data engineer data analyst data scientist database administrator data architect and business intelligence analyst commonly called BI analyst to understand a data engineer’s role imagine you’re creating a garden the data engineer is like the person who designs and constructs the irrigation system delivering water to each plant they build and maintain the data infrastructure including designing constructing and integrating data pipelines they clean pre-process and 
transform raw data into a format that can be used by data analysts and data scientists in our gardening analogy the data analyst is like the gardener who meticulously observes the growth of each plant and makes recommendations for improvement data analysts examine data sets to identify trends patterns and insights to inform decision-making they use various tools and techniques to visualize and present data making it easily digestible for stakeholders data analysts work closely with other team members to align their analysis with business goals and objectives think of a data scientist as a botanist using their plant biology knowledge to optimize the growth and health of the garden they dive deeper into the data to create predictive models using machine learning algorithms and statistical techniques they seek to identify hidden patterns and correlations that help organizations make better data-driven decisions data scientists often work closely with data analysts sharing insights and collaborating on projects to maximize the value of the data after all that gardening you’ll want to safeguard the security and overall health of the garden that’s like the role of a database administrator or DBA database administrators work on the maintenance performance and security of an organization’s databases they ensure data is stored and retrieved efficiently implement backup and recovery strategies and manage user access dbas play a crucial role in keeping data safe and accessible to those who need it to ensure a great-looking garden a landscape architect designs the garden layout to maximize aesthetics and functionality in a similar fashion a data architect creates the blueprint for an organization’s data management systems they design data models establish database structures and create strategies for data storage integration and retrieval data architects collaborate with other data professionals to align their designs with business needs and support the objectives of data analysts 
and scientists the business intelligence or BI analyst is like the garden consultant who helps you make informed decisions about the type of plants to grow where to place them and how to care for them based on data and analysis bi analysts transform data into actionable insights that drive business growth and improve decision-making they work closely with data analysts and data scientists to extract meaningful insights from complex data sets focusing on key performance indicators and using various BI tools to visualize and present data to stakeholders bi analysts also collaborate with business leaders to understand their goals and objectives ensuring that their analysis is relevant and impactful so the next time you’re at a party and someone asks about your role what will you say you should be able to highlight the importance and variety of data analysis positions you could discuss the data engineer who is responsible for building and maintaining the data infrastructure the data analyst who identifies trends patterns and insights in the data the data scientist who creates predictive models to optimize decision-making the database administrator who ensures the security and performance of databases the data architect who designs the blueprint for data management systems and the business intelligence analyst who transforms data into actionable insights for decision makers your party friends will then understand what each role does in the data analysis process providing organizations with the information they need to make informed data-driven decisions jamie the CEO at Adventure Works has asked you to analyze customer data to identify trends and make recommendations for improving the customer experience after weeks of working through the data creating detailed visualizations and uncovering valuable insights you now need to present your findings to various stakeholders these include your team marketing sales and company executives for your project to be successful you need to 
effectively communicate your findings and collaborate with people at all organizational levels to succeed as a data analyst you need a strong foundation in non-technical abilities like these in addition to technical skills in this video you will explore some essential non-technical or soft skills a data analyst should have nontechnical skills are important for data analysts these skills can help you connect with and influence stakeholders increasing your impact within your organization essential non-technical skills include effective communication diplomacy understanding end user needs and being a technical interpreter for nontechnical stakeholders let’s explore each skill in more detail the first soft skill is effective communication data analysts need to effectively communicate findings to various stakeholders with different degrees of technical knowledge for example when Jamie at Adventure Works asks you to analyze customer data you would need to present your findings to team members managers and executives to communicate effectively data analysts need to present complex information clearly and concisely imagine you have identified a trend in Adventure Works data that could significantly increase sales instead of overwhelming your audience with raw data you could visually represent this trend and use storytelling techniques to explain how it could impact the business another important non-technical skill is diplomacy which is the art of navigating delicate situations and maintaining positive relationships even when disagreements arise as a data analyst diplomacy may be essential for negotiating access to data mediating disagreements among stakeholders or presenting results that challenge existing beliefs for instance you might have to present a report that disagrees with a manager’s idea by being diplomatic you can share your findings in a way that maintains trust and respect while still communicating your insights collecting and analyzing data is not sufficient 
for making an organizational impact data analysts also need to understand the needs of the end user of their reports this will lead to findings that are relevant and useful to the stakeholders that will use them as a result stakeholders can use the insights from your reports to take action and make informed business decisions understanding the analytical needs of a business involves asking questions empathizing with the user’s perspectives and collaborating with stakeholders to identify the most valuable insight imagine you are analyzing customer data for a marketing team by understanding the marketing team’s goals and customer frustrations you can tailor your analysis to provide more useful and relevant insights because data analysts often serve as a bridge between technical and nontechnical stakeholders it’s important to be able to translate complex concepts into understandable terms this is especially so when relaying information to stakeholders who lack a technical background one way to do this is by using analogies or metaphors to explain technical concepts for example comparing machine learning algorithms to a chef who improves their recipes over time based on customer feedback ultimately becoming a successful data analyst goes beyond mastering technical skills it also requires effective communication diplomacy a total understanding of the needs of end users and the ability to relay findings and concepts to stakeholders of varying technical knowledge by developing these non-technical skills you can better collaborate with stakeholders create actionable insights inspire change and make lasting impacts enriching your own career and contributing to the growth and success of those around you i hope this thought will inspire you as you continue your journey to becoming the best data analyst you can be if you needed to assess the prospects for a new bicycle launch in the USA by Adventure Works you wouldn’t collect data about sports clothing from the European market 
would you no because no matter how great your analysis is this data will not provide insights that Adventure Works can use to make informed decisions about a product launch in the USA that’s why gathering the right data is an important part of the data analysis process in this video you’ll explore how the objective or purpose of analysis informs the data analysis process you’ll learn the importance of gathering data that is aligned with this purpose and how it influences the type of scope of data used gathering the right data is crucial for conducting a successful analysis however before you can start collecting data it’s essential to determine and understand the purpose or goals of the analysis you can then collect the appropriate data to conduct an analysis that is focused relevant and useful for the end user of the analysis to determine the purpose of your analysis you will need to consult with stakeholders and consider the questions you aim to answer with the analysis such as what are the recent sales figures for bike A and bike B and insights you hope to gain through the patterns trends or relationships that emerge from the analysis such as how the introduction of bike B to the market is affecting the sales of bike A for example in the case of Adventure Works you might need to brainstorm with marketing manager Renee and the sales and marketing team to determine what they hope to achieve with analysis the purpose of your analysis will inform what is the right data to collect including the type and scope of the data to gather and use in the analysis the type and scope of data used then influence the conclusions drawn and the decisions made let’s explore how the purpose of the analysis can influence the type and scope of data used in the analysis the type of data refers to the format or structure of the data for example sales figures and numerical data suppose through consultation you determine that the primary goal of the analysis for the sales and marketing 
team at Adventure Works is to determine which bicycle models are the most profitable in the USA in this case the type of data you might choose to focus your analysis on is sales data which includes information on the total sales of each bicycle model the number of units sold and the revenue generated by each model however if the team is more interested in understanding which products American customers are interested in buying and how to improve the product purchasing experience customer feedback data may be more useful than sales data this might involve collecting customer reviews ratings and comments on each bicycle model as this data can provide valuable insights into customer preferences and help identify areas for improvement these examples demonstrate the role identifying and defining the end goal or purpose of the analysis plays in determining what data is relevant and should be collected aside from considering the type of data appropriate for achieving the aims of your analysis you also need to define the scope of your data in relation to the analysis purpose considering the scope of your data in data analysis includes defining the boundaries or limits of the data you'll collect and use in your analysis such as geographical regions time periods or product categories it can also include the size or amount of the data and number of variables considered in the data to illustrate if Adventure Works stakeholders would also like to use the analysis to inform the development of a new bike in the USA you might decide to analyze market trends competitor and sales data from the past two years focusing on mountain bikes and road bikes in North America by defining the scope of the data you can ensure that you collect data that is useful for understanding the relevant product market and identifying potential product development opportunities for Adventure Works ultimately by carefully defining the type and scope of your data based on the purpose of your analysis you
can collect relevant data this helps ensure that your analysis is accurate and relevant to the needs of the business addressing the specific objectives or goals of the project this video highlighted the importance of identifying the purpose of your analysis and then gathering relevant data of the appropriate type and scope for successful analysis this ensures that the analysis results are meaningful and useful helping businesses like Adventure Works unlock insights and make informed decisions as you continue to develop your data analysis skills remember that the foundation of any successful analysis lies in gathering the correct data you might think that a business like Adventure Works is a great place for data analysis it has access to large amounts of data from a variety of sources like sales manufacturing purchasing and marketing however that data while valuable is often not in a form that is easily understandable or ready for analysis this is where the process of preparing and analyzing data comes in in this video you’ll learn about the importance of processing and analyzing data for transforming raw data into valuable insights that can drive strategic decisions you’ll be introduced to the extract transform load or ETL process a common method for processing data you will also learn how using calculations and visualizations during analysis can help uncover hidden patterns and trends in the data first let’s define what is meant by processing and analyzing data processing data refers to transforming raw data into a format that can be easily understood and analyzed analyzing data involves using various techniques to explore interpret and draw meaningful conclusions from the processed data for Adventure Works processing data might involve consolidating data from multiple sources such as sales transactions customer demographics and product inventory this is because the data in its raw form may be scattered across different databases spreadsheets and even paper 
records additionally the data may be in various formats have missing values or contain duplicate entries in this case processing the data would involve cleaning organizing and transforming the data into a format that is more suitable for analysis a common data processing method is the extract transform load or ETL process the ETL process involves extracting data from various sources such as databases or files transforming the data to make it consistent accurate and ready for analysis for example by cleaning and filtering the data and loading the transformed data to a suitable destination like data repositories databases or analytical tools for further analysis this process which you will learn about in greater depth later plays a crucial role in preparing raw data for analysis now that you have a general understanding of data processing let's explore some methods of data analysis one effective way to analyze data is by performing calculations on the processed data to reveal new insights for example Adventure Works can calculate its products' total revenue profit margin or average order value these calculations can help the company identify which products are performing well and which might need improvement another powerful technique for analyzing data is data visualization visualizations or graphical representations of data such as charts and graphs can communicate complex information in a simpler way and help make complex data easier to understand they can also help uncover patterns trends and relationships within the data that might not be apparent through calculations alone for instance Adventure Works could create a bar chart to compare the total sales of different product categories or a line chart to track monthly revenue over time visualizations like these can help the company quickly identify trends spot potential issues and make more informed decisions in summary processing and analyzing data is critical to transforming raw data into actionable insights
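The extract-transform-load flow and the summary calculations described above can be sketched outside PowerBI as a minimal illustration in plain Python; the records, field names, and figures below are hypothetical stand-ins, not actual Adventure Works data:

```python
# Minimal ETL sketch: "extract" raw sales records, "transform" them by
# cleaning (dropping duplicates, missing values, inconsistent casing),
# then "load" a per-category summary ready for analysis.
# All values here are invented for illustration.

raw_rows = [  # extract: records as they might arrive from several sources
    {"order_id": 1, "category": "Mountain Bikes", "revenue": "1200"},
    {"order_id": 2, "category": "road bikes", "revenue": "800"},
    {"order_id": 2, "category": "road bikes", "revenue": "800"},  # duplicate entry
    {"order_id": 3, "category": "Road Bikes", "revenue": None},   # missing value
]

def transform(rows):
    """Clean the data: skip duplicates and missing revenue, normalize text/numbers."""
    seen, clean = set(), []
    for row in rows:
        if row["order_id"] in seen or row["revenue"] is None:
            continue
        seen.add(row["order_id"])
        clean.append({"order_id": row["order_id"],
                      "category": row["category"].title(),
                      "revenue": float(row["revenue"])})
    return clean

def load(rows):
    """Load: aggregate revenue per category, as a reporting tool might consume it."""
    summary = {}
    for row in rows:
        summary[row["category"]] = summary.get(row["category"], 0.0) + row["revenue"]
    return summary

clean = transform(raw_rows)
summary = load(clean)
total_revenue = sum(summary.values())
avg_order_value = total_revenue / len(clean)

print(summary)          # {'Mountain Bikes': 1200.0, 'Road Bikes': 800.0}
print(avg_order_value)  # 1000.0
```

In PowerBI itself the transform step would typically happen in the Power Query Editor and the calculations would be DAX measures; the sketch only mirrors that logic.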
through the ETL process data can be extracted transformed and loaded into a format that is suitable for analysis when the data is processed calculations and visualizations can then be used to explore the data uncover hidden patterns and generate new insights to drive strategic decisions as you progress in this course you will learn more about the various tools and techniques available for processing and analyzing data by mastering these skills you will be better equipped to help businesses like Adventure Works maximize the value of their data and make data-driven decisions that drive growth and success Jamie Lee owner and CEO of Adventure Works is concerned that sales have been stagnant and wants to take her business to the next level she's aware of the power of data insights to drive business decisions so she employs Adio Quinn a data analyst to help provide the answers she needs to grow her company in this video you'll explore how data insights can be used in the final stage of the data analysis process to drive business using a case study you'll discover how these insights can empower stakeholders like Jamie to make informed decisions and improve business performance data insights refer to the valuable and actionable information knowledge and understanding generated from analyzing data this is the final stage of data analysis where the insights can be used to identify trends patterns and opportunities these insights can then lead to actionable business decisions that can help businesses grow and stay ahead of the competition let's explore how data insights can drive business decisions practically by considering how Jamie could use insights related to sales customer and competitor data to make decisions that improve business performance at Adventure Works by analyzing sales data collected over the past year Adio identifies that certain types of bicycles sell more during specific seasons like mountain bikes in the spring and road bikes in the summer by using this
data insight Jamie can make informed decisions about inventory and promotional efforts for example she could make sure that the warehouse is sufficiently stocked up with each bike type based on seasonal demand levels and have the marketing team offer special promotions to boost sales of the bikes in their off seasons by making decisions based on data insights Jamie can optimize her inventory management and increase overall profitability suppose Adio also discovers that customers belonging to particular age groups prefer specific bicycle types or respond more positively to particular marketing messages Jamie can use this information to oversee the creation of targeted marketing campaigns offerings and communications that resonate with different segments of the company's audience by personalizing marketing efforts based on customer data insights Jamie can increase customer satisfaction and loyalty and drive more sales and revenue imagine Adio's analysis reveals a gap in Adventure Works' current offerings with customer data indicating that customers are increasingly interested in electric bikes and unique design features with insight into this growth opportunity Jamie can explore the development of new products to meet these demands making decisions related to product development and innovation for Adventure Works this data-driven approach to product development ensures that businesses create products that cater to real customer needs increasing the likelihood of success another area where data insights could drive business decisions is pricing strategy sales data competitor pricing and customer feedback can help stakeholders like Jamie determine optimal price points for products balancing demand revenue optimization and market competitiveness for example say Adio finds that customers at Adventure Works are willing to pay a premium for certain high-quality bicycles Jamie can then adjust the company's pricing strategy accordingly to capture more value from those sales
however if some bicycles are priced too high and are hurting overall sales Jamie can consider lowering their prices to create demand by using data insights to inform pricing decisions businesses can optimize revenue and profitability stakeholders and data analysts alike can follow some best practices to enhance the use of data insights to drive business decisions for a comprehensive understanding of a business its operations and trends and patterns it's important to gather data from multiple sources and regularly analyze it regular data analysis makes it possible to stay up-to-date with trends and make timely informed decisions it's also important to encourage a data-driven culture where data insights are valued and used to inform decision-making at all levels likewise encouraging collaboration and insight sharing within an organization can lead to better decision-making finally investing in the right tools and technology like Microsoft PowerBI can help streamline the data analysis process making it easier to gain insights and make data-driven decisions you should now have a better understanding of how data insights can drive business by embracing a data-driven approach companies can stay ahead of the competition and make better business decisions ultimately the more stakeholders like Jamie understand their data the better equipped they'll be to make informed strategic decisions that can optimize business performance for your company imagine navigating through a dark maze without a map searching for hidden treasure this is what it feels like to dive into a vast ocean of data without the right tools Microsoft PowerBI offers a solution to the challenge of navigating large amounts of data and uncovering useful insights in this video you'll learn about PowerBI's role in data analytics and visualization its key features and benefits and navigating its user interface PowerBI is a suite of business analytics tools to help organizations transform raw data into meaningful
information and make data-driven decisions there are several products within the PowerBI ecosystem including PowerBI desktop the Windows application for creating reports and dashboards that you'll use throughout this course and others such as PowerBI service PowerBI mobile PowerBI report server and PowerBI embedded these components work together to provide a comprehensive business analytics solution allowing you to connect to various data sources clean and prepare data create impactful visualizations and reports and share findings and insights effectively PowerBI has become an essential resource for many organizations across various industries let's explore why PowerBI is user-friendly its easy-to-use intuitive interface makes it accessible to technical and nontechnical users alike with its drag and drop functionality you can create visualizations reports and dashboards simply and quickly another benefit of using PowerBI is data integration it supports a wide range of data sources including traditional databases Excel spreadsheets and cloud-based services this allows you to consolidate data from multiple sources and create a comprehensive view of business performance PowerBI simplifies data transformation with the Power Query Editor in PowerBI you can clean transform and reshape data as needed which is important to ensure that data is accurate consistent and ready for analysis there are also rich visualization options available in PowerBI with a variety of built-in visualization types such as bar charts and maps and custom visuals developed by the community these options make it easy for you to present data in a visually appealing and easy to understand way you can perform advanced analytics with PowerBI with data analysis expressions or DAX and built-in analytical capabilities you can perform complex calculations and data analysis leading to deeper insights and better decision-making plus you can easily collaborate and share reports and dashboards with
colleagues both within and outside the organization powerbi is scalable and designed to grow with organizations its various licensing options and features can accommodate businesses of all sizes and the platform can scale to meet changing business needs finally PowerBI integrates seamlessly with other Microsoft products such as Excel SharePoint and Teams and offers a cost effective pricing model now that you have some insight into why PowerBI is one of the most popular data visualization and business intelligence tools let’s examine its user interface to get started with PowerBI you’ll need to download and install PowerBI Desktop the primary application for designing and creating reports and dashboards once you have PowerBI Desktop installed you can begin exploring the main areas of its user interface you can use the ribbon located at the top of the PowerBI desktop window to quickly access various tools and features to create and customize your reports and dashboards it contains several tabs such as home insert modeling and view each tab has its own collection of buttons and options for performing common tasks like connecting to data sources creating visualizations and formatting your reports in the left navigation pane you can select report to open report view report view is the primary canvas where you design and create your visualizations you can add and arrange different visual elements here like charts tables maps and more to build your report pages allow you to create multiple views of your data in a single report at the bottom of the PowerBI desktop window you’ll find a row of tabs you can use these to organize your visualizations based on themes or categories to add duplicate or remove pages use the tabs at the bottom of the report view the visualizations pane is located on the right side of the window and contains a gallery of visual elements that you can add to your report there are various types of visuals available that you can add to your report by 
clicking or dragging them from the visualization pane onto the report view also on the right side of the window is the fields pane it displays the data tables and fields available for your report as you learn to build reports in PowerBI you'll use the fields pane to populate your visualizations with data the fields pane is organized into two sections the top section displays the available tables and the bottom section shows the fields within the selected table lastly the filter pane found on the right side of the window allows you to apply filters to your data at various levels such as the entire report individual pages or specific visualizations in this video you discovered the benefits of using PowerBI as a business intelligence tool and explored its user interface by understanding its key features and capabilities you're one step closer to using PowerBI to create reports that communicate your insights effectively and drive meaningful change businesses like Adventure Works often have a large amount of data but don't know how to extract the insights hidden within in this video you'll discover how calculations and visualizations in Microsoft PowerBI are used to analyze this data generate and communicate insights and empower businesses to make data-driven decisions you'll learn the key concepts behind calculations using data analysis expressions or DAX and how visualizations can communicate complex data and insights in PowerBI calculations are the foundation of your data analysis and are created using a powerful language called data analysis expressions or DAX calculations allow you to perform specific operations on data manipulate it and create new calculated measures columns and tables that you can use in visualizations and reports to drive decision-making with custom calculations you can tailor your analysis to specific business requirements and address unique analytical needs some common calculations are aggregations where multiple values are combined or grouped into a
single value to summarize large amounts of data for example summing up finding the average or counting data points based on specific criteria time-based calculations for comparing data across time periods such as month over month or year-over-year growth and ratios and percentages for calculating proportions or shares of a whole to understand the relative performance of different elements to illustrate with data on monthly sales Adventure Works could use DAX to calculate the average monthly sales determine the month with the highest sales or identify the percentage of sales coming from a specific product category after performing calculations with your data the next step is to represent the results visually visualizations enable you to communicate complex data and insights in a simple appealing way by presenting data graphically visualizations make it easier for stakeholders to grasp key insights trends and patterns that may be difficult to identify from raw data or tables PowerBI offers a wide range of visualization types such as different charts maps tables and even custom visualizations when choosing the most suitable visualization you should consider the type of data you're working with for example whether the data is numerical or categorical consisting of non-numeric variables the purpose of your analysis such as comparing values showing distribution understanding relationships or tracking trends as well as the level of detail needed from high-level summaries to granular insights now let's explore how to create a visualization in PowerBI using a given data set suppose you are part of a team analyzing sales data and creating a report for Adventure Works you need to create a visualization that represents the number of orders across the different bike categories to create your visualization you first need to import your data to do this open Microsoft PowerBI desktop click on get data in the home tab then select Text/CSV and click connect navigate to the location
of the CSV file containing the data you need in this case the Adventure Works bike sales data select it and click open once the data is loaded the data view will display the imported data in a table format take a moment to familiarize yourself with the structure of the data the next step is to create a bar chart of the bike sales by category click on the report view which is the first icon on the left side of the PowerBI interface next click on the clustered bar chart visualization icon in the visualizations pane this is a bar chart with multiple bars after that drag and drop the product category field onto the y-axis section of the visualization pane then drag and drop the order quantity field onto the x-axis section of the visualization pane this bar chart visualization shows the total order quantity for each product category this can help Adventure Works quickly identify which bike categories have the highest or lowest number of orders they can use the insight to make informed decisions about inventory management marketing strategies and product development you've now gained a foundational understanding of calculations and visualizations in PowerBI and their role in generating results and insights from data you learned about using DAX calculations for data analysis and using visualizations to communicate data insights and help businesses make data-driven decisions congratulations on completing this first module on data analysis in business let's recap some key concepts that you covered in lesson one you were introduced to the course and syllabus explored some tips for successfully completing the course and engaged with your peers in the second lesson you learned more about the essential role data analysis plays in businesses helping them collect organize analyze and understand their data data analysis can help businesses gain insights from their data identify the cause of problems uncover trends and make decisions that can improve business performance you were
introduced to the stages of data analysis and the interconnected roles available within this process from data engineers to business intelligence or BI analysts you also explored some important skills data analysts need to succeed in their role including nontechnical skills like effective communication and understanding end user needs in lesson three you examined the stages of data analysis in more depth these stages include identifying the problem or purpose of the analysis collecting data processing and analyzing data data visualization and report sharing and implementing insights and recommendations you learned that gathering the right data is fundamental to an analysis that is relevant and useful understanding the purpose of your analysis will inform the type and scope of data that is correct for the analysis you then explored the processing and analyzing stages of data analysis once more processing involves transforming raw data in preparation for analysis and analysis involves analyzing the processed data and generating insights you were briefly introduced to the extract transform load or ETL processing method and learned about DAX calculations and visualizations in data analysis you also learned about some factors to consider before sharing reports with stakeholders including the accessibility visual appeal and security of your report as well as data storage and refresh schedules you discovered the importance of understanding stakeholder experience and applying this to data visualization and analysis to more effectively convey data insights you learned how data insights can drive informed business decisions and lead to improvements like increased customer satisfaction you then explored some best practices for stakeholders and data analysts to follow to drive business decisions including collecting data from multiple sources regular data analysis encouraging a data-driven culture and collaboration and insight sharing and investing in the right tools and
technology you also had the opportunity to apply the knowledge gained in the lesson by evaluating an analysis process finally you were introduced to Microsoft PowerBI and its many benefits including its user-friendly interface rich visualizations and advanced analytics you learned how to navigate PowerBI's user interface set up your own PowerBI desktop environment view a report and generate interactive visualizations you now know more about the role of a data analyst the data analysis process the role data analysts play in business and PowerBI as a tool for data analysis with the foundational knowledge you've gained you are ready to move on to your next lesson on harnessing the power of data in PowerBI in previous lessons you learned about the importance of data and the role it plays you discovered how organizations aim to derive meaningful insights from their collected data in this context it's necessary to identify the collected data and evaluate which parts of it are required you could start a data project by first determining what is being measured and what are the critical issues you need to make decisions about the answers will help you to identify and evaluate the data correctly now let's examine the process of data identification and evaluation in more detail this process includes understanding the importance of asking the right questions analyzing the required data for a business decision and data type classification by the end of this video you'll understand data classification and modern data sources and you'll learn how to use these in business decisions proper data evaluation depends on the key skills of identifying data sources and asking the right questions let's explore data evaluation at Adventure Works a fictitious large multinational company that makes and distributes bicycles and accessories to global markets Jamie the CEO at Adventure Works wants to analyze sales data to reveal factors that influence the sales of their products a good place to
start the analysis is to streamline the business requirement from complex to simple and then establish relationships between the multiple topics involved. Let's take the example of identifying factors that affect sales. To do this analysis, you first need to determine the data to be measured and the potential factors that could influence it. For instance, this includes internal company data, data from social media, and sensor-generated data, such as product codes from barcode scanners or identity confirmation from facial recognition software. Sales data is the main area that Adventure Works wants to assess, and a critical source of this information comes from their enterprise resource planning (ERP) system. ERP systems are designed to collect, store, manage, and interpret structured data from various business activities. Structured data is data that is organized into a formatted repository, typically a database, so it's easily searchable. In the context of Adventure Works, everything in a physical store, from product shelves and product categories to points of sale, employees, and customers, is defined and stored in the tables of the ERP database. This kind of data structure creates a digital mirror of the real-world store and provides a highly efficient and effective way for Adventure Works to analyze sales data from various periods. Such analysis could be based on product category or type of customer, providing actionable insights into sales trends, customer behaviors, and product performance. How you evaluate the ERP database depends entirely on your perspective and analysis. Evaluation questions could be: Are sales generally showing a downward or upward trend? Are there seasonal increases or decreases in certain categories? How do holidays or special occasions affect sales? Have sales shown variability by age, gender, income level, or customer geographic location on a product or category basis? Now let's consider other potential data sources for Adventure Works. In addition to the ERP data, examining the
situations that occur before or during the purchase is useful. An excellent example of such a source is the sensors installed in the automatic doors of the store. The data from these sensors, revealing the number of people entering and exiting the store at any given time, can be categorized as semistructured data. Semistructured data falls between structured and unstructured data: while it doesn't conform to the formal structure of data models as seen in an ERP system, it contains tags or other markers to separate data elements and enforce hierarchies of records and fields within the data. The data obtained from door sensors might be tagged with information like timestamps, store identifiers, or locations, allowing for more detailed analysis. This data can be used to evaluate the store's visit intensity over different periods, offering an opportunity to correlate store traffic patterns with sales volume. This analysis could lead to insights about peak selling times, the effectiveness of promotions, or how staffing levels relate to sales. In addition, Adventure Works can analyze unstructured data flowing from social media channels to gauge the company's popularity and reputation. This can include online messages related to the company, social media check-ins, and photos and videos shared by customers. Unstructured data is information that doesn't have a predefined structure or isn't organized in a predefined manner, making it less straightforward to analyze. For Adventure Works, this social media data can be evaluated from different dimensions, such as the timing of posts or demographic characteristics of the audience interacting online with the company. For instance, by conducting trend analysis, the company can gauge the popularity of its brands, products, or campaigns. This analysis can inform marketing strategies, customer engagement tactics, and product development. With a robust data identification and evaluation strategy to identify and evaluate the correct data sources, companies like Adventure
Works can harness the full potential of data to uncover actionable business insights. Each piece of data, regardless of its type (structured, unstructured, or semistructured), holds immense value. The true power of data lies not in its volume or variety but in its purposeful utilization. Remember, data itself is not the end goal; instead, it's a tool to help businesses make more informed decisions. Therefore, it's vital to understand why you're using the data, how it serves your purpose, and what methods you'll use for its evaluation.

What's the best way to use Microsoft PowerBI? As with other software, you may have your own preferred way to use it, and that's okay. However, in this video you will explore key PowerBI components and discover their primary purpose. To achieve the best results, you must use these components in the proper order; that sequence of use is known as a workflow. Over the next few minutes, you'll get to know how a common workflow operates in PowerBI. Microsoft PowerBI is an interactive data visualization product with multiple components. You use its components and its rich visualization features to create meaningful reports from different data sources and types of data. Let's explore the details of Microsoft PowerBI's three main components: PowerBI Desktop, PowerBI apps, and PowerBI service. PowerBI Desktop is a Windows-based desktop application that is mainly used by data analysts or report designers to clean, transform, and load data; create a data model; design reports; and publish these reports. PowerBI Desktop uses PowerBI connectors to access various data types and data sources. Connectors allow you to read data from various sources, including resources located in the local file system such as Microsoft Excel or PDF documents, conventional database systems hosted on internal servers (called on-premises databases), cloud-based databases, and even external enterprise applications and application programming interfaces (APIs). PowerBI service is the cloud-based BI service, or software as a
service, part of PowerBI. It is used by report users and administrators. PowerBI apps is the native mobile application of PowerBI; it's available on iOS, Android, and Windows. With these components and interfaces, Microsoft PowerBI enables users from various disciplines, such as report designers, administrators, and business users, to use the product according to their roles. As mentioned earlier, the order in which you use these components is known as a workflow. A PowerBI workflow can be described as the steps taken with data to create, publish, and share. A typical workflow in PowerBI often starts with the creation of a report in PowerBI Desktop; report designers and developers are primarily responsible for this task. When the report is ready, you publish it to the PowerBI service, where administrators can assign permissions and specific users can consume the report. Now let's examine each step of the workflow in more detail. Create is about importing data and creating a report. This step is when you import your data sources into PowerBI Desktop; clean, transform, and load your data in order to have targeted data for your reports; use your filtered data to create a report; and analyze and present your data using various visualizations and charts in your report. Then you move on to the publish step of the workflow, where you publish reports and create dashboards. That means you publish your report to the PowerBI service, share your data with others by creating dashboards, and use different visualizations and filters to make your data more understandable in your dashboard. The final step of this workflow is sharing. In this step, you share dashboards with users and manage access to your data: share your dashboards with the users needed to make it easier to collaborate on projects, and manage access to your data by ensuring that dashboards have different user permission levels. This is also where you consider mobile usage. For instance, using PowerBI mobile apps, you can view and interact with reports and
dashboards that have content pinned from reports, anytime and anywhere. You can use different features of the mobile apps to explore and share your data from different perspectives. In summary, a typical Microsoft PowerBI workflow sequences the requirements needed to choose data sources and types in step one; step two is used to visualize the data; and the third and final workflow step presents the resulting reports and dashboards to cater to different user types and their requirements. Using such a workflow, you combine different types of data from many sources using various components such as PowerBI Desktop, PowerBI service, and PowerBI apps.

Have you ever tried to solve a jigsaw puzzle when the pieces are scattered everywhere and you don't even know whether those pieces belong to the same puzzle? That's what it can feel like as a data analyst tasked with extracting insights from data that is spread across multiple sources, formats, and structures. Not to worry: there's a way to solve this problem, the extract, transform, load (ETL) process. In this video, you'll build on your knowledge of the ETL process. You'll explore the three main components of the ETL process and how to apply them, the benefits of using the ETL process, and how it's performed using Microsoft PowerBI. As you learned earlier in this course, ETL stands for extract, transform, and load, the names given to the three main steps in the ETL process. This process involves taking raw data from various sources, preparing it for analysis, and loading it into a repository or data storage and management system. Let's explore each step of the ETL process in more detail, and how they can be applied in the scenario of the manufacturing company Adventure Works, which produces and distributes bicycles and accessories. Extract is the first step in the ETL process, which involves retrieving and extracting raw data from different sources such as databases, files, or other data storage systems. For example, imagine that Adventure Works data is scattered
across multiple systems, as is the case with many organizations: say customer data is stored in a data management system called customer relationship management (CRM); sales, marketing, and manufacturing data is in an enterprise resource planning (ERP) system; and purchasing data is in spreadsheets. The extraction process involves pulling the data from these different sources. You then consolidate it into an easily accessible central location, often a temporary intermediate storage location known as the staging area, and prepare it for further processing in the next step. Once the data is extracted, the second step is to transform it. Transforming the data involves cleaning, structuring, and enriching the data to make it more suitable for analysis. This may involve removing duplicates, handling missing values, creating new calculated fields, converting data types, and standardizing measurement units. In the case of Adventure Works, let's say that the sales and marketing data is in US dollars, but the manufacturing and purchasing data is in different currencies, depending on where in the world the sales or purchases take place. As part of transforming the data, you may need to convert all the currency values into a standard unit of measurement, in this case US dollars, to ensure consistency. The third and last step involves loading the transformed data into the final storage system, typically a data warehouse, where it can be readily accessed and analyzed, for example using tools like PowerBI. Depending on the organization's needs, the loading process can be a one-time event or scheduled to run regularly. In the case of Adventure Works, the cleaned and transformed data might be loaded into a cloud-based data warehouse, making it accessible to the company's data analysts and decision makers. The ETL process ensures that the data analyzed is accurate, clean, and consistent, which in turn supports informed decision-making. This process offers many benefits, including data integration: ETL helps integrate data from
different sources, providing a unified view of an organization's data and making it easier for analysts to perform analysis and derive insights. Data quality: ETL processes involve data cleansing and validation, which significantly improve data quality. Data consistency: by transforming data into a standardized format, ETL ensures consistency across various data sets, enabling analysts to easily compare and analyze data from different sources. Enhanced performance: by aggregating, summarizing, or indexing data during the transformation process, ETL can improve query performance and reduce the load on data analysis systems. And data governance: ETL can support data governance initiatives by helping organizations maintain a single source for their data, ensuring that everyone has access to the same accurate information. Widely used in data analytics tools like PowerBI, the ETL process helps you bring together, refine, and assemble different data pieces into a coherent picture that can drive business decisions. PowerBI is just one tool that comes equipped with built-in ETL capabilities, enabling you to connect to many different data sources, transform your data using Microsoft Power Query, and load it into the PowerBI data model. Power Query is a powerful ETL tool within PowerBI, providing a graphical interface and a formula language called M to perform various data transformation tasks. With Power Query, you can extract data from multiple sources, clean and structure it, and load it into PowerBI for creating reports and visualizations. The extract, transform, load (ETL) process is essential for any data-driven organization. The importance and benefits of ETL lie in its ability to turn raw data into accurate and consistent information in a centralized system that is easy to analyze and use in decision-making. Because data is critical to better decision-making, embracing tools that can support the ETL process, such as PowerBI, can significantly impact business performance.

Adio, the data analyst at Adventure Works,
needs to analyze sales data from multiple channels, including physical stores and e-commerce platforms. He asks the data analytics team to gather and ingest the data, a fundamental step before he can proceed with the later stages of the extract, transform, load (ETL) process. In this video, you'll explore data gathering and ingestion, including different methods to gather and ingest data and their advantages and disadvantages. Let's start by outlining data gathering and ingestion, which typically take place in the extract step of the ETL process. Data can come from a variety of sources, such as structured data from spreadsheets or databases, unstructured data from text files or social media posts, and streaming data from real-time data transmissions such as webcams or satellite navigation systems. Data gathering involves collecting or acquiring data from these different sources. An example of gathering data is the data analytics team at Adventure Works collecting all their sales data, ranging from spreadsheets to real-time streams. Data ingestion starts with data gathering and encompasses the process of obtaining and importing data from various sources for immediate use or storage, such as in a database. For example, as part of data ingestion, the team at Adventure Works can go on to extract relevant data from each source, such as customer data and sales metrics like revenue. They can then load it into a central database, where it can be accessed for further processing and transformation. The data gathering and ingestion process is beneficial for organizations for various reasons. With data volume, velocity (or speed of generation), and variety (in terms of types and sources) constantly increasing, it helps organizations consolidate their data. This unified view of their data facilitates comprehensive analysis, data-driven decision-making, and innovation. Data ingestion improves operational efficiency through process automation. Proper ingestion practices can also help organizations meet regulatory
requirements, protect sensitive data, and ensure data integrity. Now that you know more about data gathering and ingestion and its benefits, let's explore some common methods for gathering and ingesting data, as well as their advantages and limitations. These include manual data entry, file-based ingestion, database connections, web scraping, and data streaming. Manual data entry is the most basic method of data gathering and ingestion, where data is manually input into a system. For example, an employee at Adventure Works may type in data from a physical customer order form into a customer relationship management (CRM) system. While manual data entry is straightforward and suitable for small amounts of data, it is time-consuming, prone to errors, and unsuitable for large-scale data ingestion. Another method is file-based ingestion, the process of importing data from files such as spreadsheets. To illustrate, Adventure Works might receive sales data from retail stores in Excel spreadsheets. These files can be imported into the ETL process using tools that read and parse (or interpret) the file contents. While file-based ingestion is common and requires less technical expertise than other methods, it can become cumbersome when dealing with large numbers of files or frequent updates. With the database connection method, you access data directly from a database or data warehouse using tools that can connect to and query the source. For example, Adventure Works can create a database connection to access data from its sales database using SQL queries. This connection enables the analytics team to extract necessary data by using SQL commands, as well as transform and load it for further analysis later in the ETL process. While database connections offer real-time access to data, enabling instant insights and prompt decision-making, they do require knowledge of database languages like SQL and may involve complex configuration or authentication processes. Web scraping is a method of extracting data from
websites using automated methods or software tools. In the case of Adventure Works, the analytics team can use web scraping to gather competitor pricing information or customer reviews. Web scraping is a powerful way to gather data from websites, but it can require legal permission and be complex, as it involves a range of technologies. Streaming data is continuous, real-time data generated by sensors or other sources. You can ingest streaming data using tools that connect to and process the data as it is generated. For instance, Adventure Works could use data streaming to monitor factory equipment, track inventory levels, or analyze real-time sales data. Data streaming allows for immediate analysis and decision-making but requires specialized tools and infrastructure to handle the continuous flow of data. Each data ingestion method has its advantages and limitations, so it's essential to choose the appropriate data ingestion method based on your specific use case and the nature of the data you're working with. In summary, data gathering and ingestion involve obtaining and importing data from different sources, generally in the extract phase of the ETL process. Data gathering and ingestion have many benefits for businesses, from consolidating data to facilitating innovation. By mastering the data gathering and ingestion methods introduced in this video, you can help organizations like Adventure Works optimize their data for analysis.

Due to rapid growth, Adventure Works needs to store and manage increasing volumes of data from different sources. The company must develop a comprehensive plan for data storage and management to handle its changing data needs. In this video, you'll learn about the role of data storage and management planning in the extract, transform, load (ETL) process and for organizations in the short and long term. You'll also learn key considerations for effective data storage and management planning. Planning for data storage and management is involved throughout the ETL process.
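As a concrete illustration of two of the ingestion methods described above, file-based ingestion followed by a database connection for querying, here is a minimal sketch in Python. The file name, column names, and sample rows are invented for the example, and the course itself works in PowerBI rather than Python, so treat this as a language-neutral sketch of the idea, not part of the Adventure Works toolchain.

```python
import csv
import os
import sqlite3
import tempfile

# Hypothetical rows standing in for a store's spreadsheet export.
sample = [
    {"order_id": "1001", "product": "Road Bike", "qty": "2", "unit_price": "450.00"},
    {"order_id": "1002", "product": "Helmet", "qty": "5", "unit_price": "35.00"},
]

# Write the sample file (the "source system" side of file-based ingestion).
path = os.path.join(tempfile.mkdtemp(), "sales.csv")
with open(path, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["order_id", "product", "qty", "unit_price"])
    writer.writeheader()
    writer.writerows(sample)

# Ingest: parse the file and load each row into a staging table,
# converting text fields to proper numeric types along the way.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE staging_sales (order_id TEXT, product TEXT, qty INTEGER, unit_price REAL)"
)
with open(path, newline="") as f:
    for row in csv.DictReader(f):
        conn.execute(
            "INSERT INTO staging_sales VALUES (?, ?, ?, ?)",
            (row["order_id"], row["product"], int(row["qty"]), float(row["unit_price"])),
        )
conn.commit()

# A database connection then gives query access to the ingested data.
total_rows = conn.execute("SELECT COUNT(*) FROM staging_sales").fetchone()[0]
print(total_rows)  # prints 2
```

The same shape applies at any scale: the file is parsed once into a staging area, and all later transformation and analysis happens through queries against that staging store rather than against the raw files.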
During the extract step, you need to consider what types of data you'll be collecting, how often, and from which sources, setting the foundation for data management. In the transform step, proper data management ensures the transformed data is consistent, accurate, and complete. Planning for data storage is also necessary, as the transformed data may need temporary storage before being loaded into its end destination. Finally, in the load step, planning for data storage and management, like considering database or data warehouse structure, facilitates efficient retrieval and analysis of stored data. In a broader context, planning for data storage and management impacts multiple aspects of an organization. Short-term data storage and management solutions address immediate data needs, facilitating quick access to up-to-date data and collaboration. For Adventure Works, this is vital for daily operations like responding to customer inquiries and processing transactions. Long-term storage and management planning caters to strategic goals and compliance requirements. For example, long-term storage solutions will enable Adventure Works to analyze sales data, customer feedback, and market trends over time, informing decision-making and improvement strategies. When planning for data storage, key considerations include storage capacity, data access, scalability, security, and backup and disaster recovery. One of the first considerations is how much storage capacity you need. This depends on factors like organization size, data types and average file size, required storage duration, and anticipated data volume growth. Accurate estimation can prevent the cost of overprovisioning and lower underprovisioning risks like data loss and system performance issues. It's also important to consider how easily you and your team can access data when needed, whether for daily operations and collaboration or long-term trend analysis. Planning for accessibility may involve organizing file structure, implementing searchability and retrieval
mechanisms, and providing remote access options. Another factor is the scalability of your storage solution, or its ability to adapt to changes in data volume, technology, and data types. Planning for scalability helps ensure the storage infrastructure can support your organization's data needs as they change over time, without compromising performance, requiring major infrastructure changes, or incurring excessive costs. Next is security. Considering storage security is vital, as data breaches can have serious consequences like financial loss. Planning and implementing security measures such as access controls and data encryption helps protect your data against unauthorized access, theft or tampering, and emerging threats and vulnerabilities. Lastly, a comprehensive backup and disaster recovery plan is essential for minimizing the impact of data loss due to unexpected events such as hardware failures or human error. This involves creating regular data backups (on site, offsite, or both), implementing a recovery strategy that outlines how to restore data and resume operations, and regularly testing and updating the recovery plan. Now that you're familiar with data storage planning, let's focus on data management, which involves organizing, maintaining, and protecting data to ensure its quality, accuracy, and accessibility. Key aspects of data management planning include data governance, data quality, data integration, data security and privacy, and data retention and archiving. Data governance establishes policies and procedures for data collection, storage, access, and usage throughout your organization. This helps prevent data silos (or isolated sets of data), ensures data accessibility, and promotes data quality and responsibility among team members. Data quality considerations ensure accurate, complete, up-to-date data relevant to business needs. You can implement processes for checking, cleaning, and enriching your data to maintain high-quality data. Data integration plays an important role in the combination
and consolidation of data from multiple sources and formats into a unified view, facilitating data analysis and insights. Data security and privacy include planning measures such as access controls, activity monitoring, and compliance with data protection regulations. Implementing a data retention policy and archiving process, to ensure data is retained for the appropriate time based on factors like legal or business requirements, is another important aspect of data management planning. In conclusion, data storage and management planning helps organizations develop comprehensive solutions to handle their current and future data needs, even during periods of expansion, as with Adventure Works. By considering data storage factors like storage capacity and accessibility alongside aspects of data management from data quality to retention, organizations can ensure efficient data storage, management, and use.

Imagine you have a Microsoft Excel spreadsheet of raw data from various sources. Your task is to analyze it and generate insights to help Adventure Works make informed decisions. As you start exploring the data set, you realize that it's filled with inconsistencies, missing values, and duplicate entries. If you don't address these issues, your analysis will be flawed and could potentially lead to costly mistakes. This is where data cleaning and transformation come into play. In this video, you'll explore data cleaning and data transformation, discover how they impact the quality of your analysis, and compare the implications of cleaning data at the source and in PowerBI. Data cleaning is the process of identifying and correcting errors and inconsistencies in data sets. This includes removing duplicate entries, filling in missing values, and fixing incorrect data types. Data transformation involves altering the structure, format, or values of the data to make it more suitable for analysis. This may include aggregating data, converting data types, or normalizing values. Both cleaning and transformation are crucial to
ensure the quality and reliability of your analysis. For instance, imagine you've been given a data set that contains information about customers, products, and sales transactions. Some customer names are written in all caps while others are in sentence case, making it difficult to group or filter the data by customer name. Cleaning this data would involve standardizing the format of customer names. An example of transforming this data is calculating the total revenue for each customer, which would require aggregating the sales data by customer and multiplying the quantity of products sold by their respective prices. Inconsistent, untidy, or duplicate data entries can have a negative impact on data analysis. These issues can lead to inaccurate or misleading results, which in turn can lead to poor decision-making. For example, if duplicate sales transactions are included in the data, the total revenue might appear higher than it actually is. This can result in overestimating the company's performance and making ill-informed decisions about resource allocation. Now let's discuss the difference between cleaning data at the source and cleaning data in PowerBI. Cleaning data at the source involves addressing data quality issues directly within the source system, such as a database or a spreadsheet. This method ensures that any future analysis using this data will have a clean and consistent foundation. However, this approach may not always be possible, especially if you don't have direct access to the source system or if multiple systems are involved. Cleaning data in PowerBI involves importing the raw data and applying cleaning and transformation steps within the PowerBI environment. This approach addresses data quality issues without modifying the original data source. However, it means that you may need to repeat the cleaning process each time you import the data into PowerBI, which is time-consuming and prone to errors. Let's consider examples of data cleaning in PowerBI and data cleaning at the source. The
source refers to where your data is coming from. For instance, it could come from internal software like enterprise resource planning (ERP) systems, accounting software, databases, or Microsoft Excel. Let's start by exploring how to clean data at the source. Adventure Works stores its sales, customer, and product information in a centralized database. The data quality team decides to implement data validation rules and standardize the formatting of customer names directly in the database. This ensures that any future analysis of this data has a consistent and accurate base. By addressing the data quality issues at the source, Adventure Works can save time and effort in future analysis, as the data will already be clean and ready for use. Now let's switch to an example of cleaning data in PowerBI rather than at the source. Imagine that Adventure Works stores its sales data in multiple systems and the data quality team does not have direct access to all the source systems. They choose to import the raw data into PowerBI and apply cleaning and transformation steps there. While this approach allows them to address data quality issues and generate accurate insights, it also means that they will need to repeat the cleaning process each time they import new data. This is time-consuming and, if the cleaning steps are poorly documented, may lead to inconsistencies in future analysis. In summary, data cleaning and transformation are essential data analysis processes; they help ensure your insights are accurate and reliable. Data cleaning involves identifying and correcting errors and inconsistencies in data sets. Data transformation involves altering the data structure, format, or values to make it more suitable for analysis. Now that you understand the implications of cleaning data at the source compared to in PowerBI, you can choose the most effective approach for your needs. By improving your data cleaning and transformation skills, you'll be better equipped to tackle the challenges of errors and
inconsistencies in data sets.

Picture this: you're at your desk with your morning coffee. Your manager needs a comprehensive report on Adventure Works sales performance across all regions, product categories, and customer types, and she needs it by the end of the day. Your heart races as you think about the vast amount of data you'd have to sift through, scattered across numerous files, databases, and systems. But you don't panic; you remember that Microsoft Power Query can help. With Power Query, you know you can efficiently connect to multiple data sources, transform unclean data, and create a structured data set for further analysis in PowerBI. This video explores the capabilities and benefits of Power Query. You'll discover how Power Query helps you connect to multiple data sources, clean and transform data, and create structured and repeatable data preparation workflows for efficient data analysis. Microsoft Power Query, more commonly known as Power Query, is a data connectivity and data preparation tool built into Microsoft's PowerBI suite. It plays a crucial role in the data analysis process by enabling you to connect to a wide range of data sources, clean and transform the data, and then load it into PowerBI data models for analysis and visualization. Power Query streamlines and automates the process of preparing data for analysis, making it easier for you to gain valuable insights from data. Power Query is designed to handle the extract, transform, load (ETL) process, an essential part of any data analysis workflow. Let's explore how Power Query can help with each ETL step. Extract: Power Query can connect to various data sources, such as relational databases, Excel workbooks, CSV files, web pages, and more. Once connected, you can select the specific tables or data sets you want to work with. Transform: with the data loaded, Power Query provides a user-friendly interface for cleaning and transforming the data. You can perform various transformations such as filtering, sorting, merging, splitting, grouping,
and aggregating data. Load: once the data has been cleaned and transformed, Power Query loads it into the PowerBI data model, where you can further analyze, visualize, and share it. Power Query is particularly useful in the following scenarios. Connecting to multiple data sources: Power Query simplifies the process of connecting to and consolidating data from different sources into a single data set for further analysis. Cleaning and transforming data: Power Query provides a wide range of tools and functions that help you clean, reshape, and transform data into a structured and usable format. Automating data preparation tasks: Power Query records the steps you take when transforming data, creating a repeatable and editable process. This feature not only saves time by automating repetitive tasks but also ensures consistency and accuracy during data preparation. Structured and collaborative workflows: Power Query's ability to record and edit transformation steps makes it easy for you to share data preparation workflows with colleagues. Power Query also promotes a structured and repeatable approach to data preparation. As you perform transformations, it records these steps in an Applied Steps pane, which allows you to review, modify, or delete any step in the process. This makes it easy to fine-tune your data preparation workflow and ensures that you can consistently reproduce your results. To illustrate the capabilities of Power Query, let's return to your task of creating a sales performance report for Adventure Works based on all sales regions. In this situation, your data is scattered across various sources, such as Excel spreadsheets, CSV files, databases, and even web pages. With Power Query, you can easily connect to these different sources, extract the relevant data, and consolidate it into a single data set. Once you've connected to your data sources, Power Query provides a user-friendly interface that allows you to perform various data transformations, such as removing unwanted columns or rows, splitting or
merging columns changing data types and filtering and sorting data power Query is ideal for extracting data from various sources cleaning and transforming it and then loading it into a PowerBI data model for further analysis and visualization this enables you to create a comprehensive Adventure Works sales performance report breaking down sales by region product category and customer type just as your manager requested part of the PowerBI suite Power Query is a versatile and powerful data connectivity and preparation tool by connecting to multiple data sources cleaning and transforming data and creating structured and repeatable data preparation workflows Power Query helps you at each stage of the ETL process turning raw data into valuable insights that drive informed decision-making as you continue to work with data and explore the world of PowerBI Power Query will become an indispensable tool in your data analysis toolbox imagine yourself as an artist standing before a canvas prepared to create a masterpiece the colors on your palette are your data and your brush is Microsoft PowerBI how you blend these colors the strokes you choose and your vision will determine the beauty of your final painting your business intelligence insights working through this week on the right tools for the job you learned the techniques to paint a masterpiece you covered the importance of identifying suitable data and evaluating data sources data gathering and ingestion transforming and loading the data in preparation for analysis and using the extract transform load or ETL capabilities of Microsoft PowerBI and Microsoft Power Query let’s revisit some of the key concepts you covered in the week you started your journey with an exploration of data collection identifying and evaluating the required data in the foundation for successful business decision-making you learn the importance of asking the right questions and analyzing the necessary data for business decisions illustrated 
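To make the Power Query transformations described above more concrete: Power Query records each transformation as an ordered applied step. The following pure-Python sketch is illustration only (the sample data and helper names are invented, and Power Query itself generates M code rather than Python); it simply mirrors the applied-steps idea on a tiny data set.

```python
# Illustration only: Power Query records transformations as an ordered list of
# applied steps. This sketch mirrors that idea with plain Python functions.
rows = [
    {"OrderID": "1001", "Region": "Europe;DE", "Amount": "250.0", "Temp": "x"},
    {"OrderID": "1002", "Region": "Americas;US", "Amount": "120.5", "Temp": "y"},
    {"OrderID": "1003", "Region": "Europe;FR", "Amount": "75.25", "Temp": "z"},
]

def remove_column(rows, name):                       # "Remove Columns" step
    return [{k: v for k, v in r.items() if k != name} for r in rows]

def split_column(rows, name, sep, new_names):        # "Split Column" step
    out = []
    for r in rows:
        parts = dict(zip(new_names, r[name].split(sep)))
        out.append({**{k: v for k, v in r.items() if k != name}, **parts})
    return out

def change_type(rows, name, caster):                 # "Change Type" step
    return [{**r, name: caster(r[name])} for r in rows]

# The applied steps, executed in order — each one repeatable and editable,
# like the entries in Power Query's Applied Steps pane.
applied_steps = [
    lambda r: remove_column(r, "Temp"),
    lambda r: split_column(r, "Region", ";", ["Continent", "Country"]),
    lambda r: change_type(r, "Amount", float),
    lambda r: [row for row in r if row["Amount"] > 100],   # filter rows
    lambda r: sorted(r, key=lambda row: row["Amount"]),    # sort rows
]

data = rows
for step in applied_steps:
    data = step(data)
```

Because the steps are stored as a list rather than applied destructively, you can review, reorder, or delete any of them and re-run the whole preparation, which is exactly the repeatability Power Query's Applied Steps pane provides.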
This was illustrated through the scenario of Adventure Works, where you explored the need to understand the purpose of the data, how it serves that purpose, and how it should be evaluated, and learned to classify data as structured, unstructured, and semi-structured types.

You then continued to the workflow in Power BI, the artist's brush in the earlier analogy. You discovered that Power BI, with its three main components (Power BI Desktop, the Power BI service, and Power BI apps), is a powerful tool for creating meaningful reports from various data sources. You were introduced to the Power BI workflow to effectively sequence your work, from importing data to creating dashboards, sharing them, and managing access permissions.

Next, you explored the ETL process and related concepts. You learned about data gathering and ingestion, the act of obtaining and importing data from different sources; this process aids in data consolidation, enabling enhanced decision-making and innovation. You covered some common methods of data ingestion and gathering, from less technical methods like manual data entry to methods that require specialized tools or knowledge, like database connections. You also learned more about data storage and management and their importance for data-driven organizations. You explored key considerations for data storage planning, such as storage capacity and data access needs, as well as key aspects of data management planning, from data governance to retention and archiving.

Your journey then led you to data cleaning and transformation. Much like cleaning and preparing your paintbrushes before creating a masterpiece, data needs to be cleaned and transformed to ensure its quality and suitability for analysis. You learned how data cleaning addresses inconsistencies, missing values, and duplicate entries in data sets, while data transformation enhances data analysis through processes like aggregating data, converting data types, and normalizing values. After that, you explored the practical aspects of cleaning data at the source in Excel before importing it into Power BI. You discovered the importance of using key Excel functions, such as text functions, date and time functions, logical functions, and lookup functions, to ensure the reliability and accuracy of your data.

In the final part of the week, you explored Microsoft Power Query in Power BI, a data connectivity and preparation tool that handles the ETL process. You should now understand how Power Query helps in connecting to multiple data sources, cleaning and transforming data, automating data preparation tasks, and creating structured and collaborative workflows.

This week you were introduced to some of the tools you can use to create data analysis masterpieces: robust, insightful, and visually appealing business intelligence reports. In future courses you'll have the opportunity to develop practical skills in using these tools. As you continue your Power BI learning journey, remember that, like a skilled artist, a successful data analyst must know their tools well, understand their medium (the data), and have a clear vision of the end result. The knowledge and skills acquired this week will serve as a strong foundation to build on, enabling you to create compelling data narratives that drive informed business decisions.

You've now reached the end of your learning journey for this Harnessing the Power of Data with Power BI course, building a solid foundation in learning how to use Microsoft Power BI to help businesses make the most of their data. With Microsoft Power BI in your data analysis toolkit, you discovered how you can use data effectively to help stakeholders make informed business decisions. You've put great effort into completing this course by working through a range of videos, readings, exercises, and quizzes. In the final course assessment, you'll apply what you've learned by completing tasks that simulate a real-world data analysis scenario. To consolidate your learning, you'll then take a final graded quiz to assess the knowledge and skills you gained throughout this course.
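As a concrete companion to the cleaning and transformation tasks recapped above (removing duplicate entries, handling missing values, converting data types, and normalizing values), here is a minimal plain-Python sketch. The sample records are invented for illustration; in practice you would perform these steps in Excel or Power Query rather than Python.

```python
# Illustration only: typical cleaning steps shown on a tiny invented data set.
raw = [
    {"Product": "Bike", "Sales": "1200"},
    {"Product": "Bike", "Sales": "1200"},   # duplicate entry
    {"Product": "Helmet", "Sales": ""},     # missing value
    {"Product": "Gloves", "Sales": "300"},
]

# 1. Remove exact duplicates while preserving row order.
seen, deduped = set(), []
for r in raw:
    key = tuple(r.items())
    if key not in seen:
        seen.add(key)
        deduped.append(r)

# 2. Handle missing values (here: fill with 0) and convert the type to int.
cleaned = [{"Product": r["Product"], "Sales": int(r["Sales"] or 0)} for r in deduped]

# 3. Normalize sales to a 0-1 range so values are comparable across products.
hi = max(r["Sales"] for r in cleaned)
normalized = [{**r, "SalesNorm": r["Sales"] / hi} for r in cleaned]
```

The order matters: deduplicating before filling missing values avoids treating two identical incomplete rows as distinct, and converting types before normalizing ensures the arithmetic is numeric rather than string-based.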
In this video, you'll review key learnings related to the data analysis process for businesses and the process of transforming data into valuable insights using Power BI. This will help you prepare effectively for your upcoming assessments.

Now let's get started by revisiting your first week of learning. In the first week, you learned about data analysis in business, including the interconnected roles available to you in the world of data; you primarily focused on the role of a data analyst. When exploring the data analyst role, you covered the skills data analysts need to collect, process, analyze, and ultimately transform raw data into valuable business insights. Another key learning point was the stages of the data analysis process. You learned that the data analysis process includes identifying the analysis purpose or defining the business problem; data collection and preparation; data processing and modeling; data analysis; visualization and interpretation; and reporting and sharing data insights. In relation to data processing, you explored how you can use the extract, transform, load (ETL) process to transform raw data in preparation for analysis. You were introduced to Data Analysis Expressions (DAX) calculations and to using visualizations during the data analysis stage. You also explored some factors to consider when creating data analysis reports, along with best practices for supporting data-driven decision-making in businesses. The importance of gathering the right data and engaging with the analysis purpose for successful data analysis was emphasized. You learned the significance of understanding stakeholder experience, and you discovered how tailoring your data analysis and visualization with this in mind can enhance comprehension, engagement, and the relevance of data insights. Part of your learning included discovering how data insights can drive business decisions and how stakeholder engagement can facilitate this process. You then went on to learn more about Microsoft Power BI and its user interface components; Power BI is a user-friendly but powerful tool for data analysis and visualization.

Week two began with an exploration of data collection and the importance of asking the right questions to ensure you gather the right data. This included learning about identifying suitable data by evaluating data sources and types. You were introduced to the Power BI workflow, consisting of Power BI Desktop, the Power BI service, and Power BI apps. You learned that with the Power BI workflow you can import data, generate data insights, create meaningful reports and dashboards, and share and manage those reports and dashboards. You then explored elements of the extract, transform, and load process in more depth. As part of this process, you covered data gathering and ingestion, which are integral to data analysis, as well as methods for performing them. You also explored the importance of effective data storage and management, which is involved throughout the ETL process; data storage and management planning and considerations, from storage capacity and data access needs to data retention and archiving, were highlighted as crucial for data-driven organizations. You then learned more about data cleaning and transformation, essential steps to ensure data quality and accuracy, prepare your data for analysis, and enhance your analysis. You discovered how to clean data at the source in Microsoft Excel before you import it into Power BI. The week of learning concluded with an introduction to the Microsoft Power Query Editor in Power BI, a data preparation tool with ETL capabilities. You learned that Power Query can help you connect to multiple data sources, clean and transform data, automate data preparation tasks, and create workflows.

As you embark on the final course exercise and graded quiz, you can approach your assessments with confidence, knowing that you've built a strong foundation of knowledge and skills by committing to your learning journey throughout the course.
However, if you feel the need to review any of the concepts summarized for you in this video, or require additional preparation, remember that you have the flexibility to revisit any of the course items.

It's now time to showcase your learning, starting with an invaluable practical exercise. In this exercise, you'll engage in key tasks that form part of the initial phases of the data analysis process for a product launch analysis. Wishing you the best of luck as you embark on the final week of this course!

Congratulations on completing the Harnessing the Power of Data with Power BI course. With your hard work and dedication, you've made great progress in your data analysis learning journey. You should now have a thorough understanding of the following topics: the role of data in driving decisions and business outcomes; how data is produced, gathered, and transformed into insights in businesses and organizations; the stages in the data analysis process; the role of the data analyst, including related skills, tasks, and tools; and the components of Microsoft Power BI and using Power BI as a tool for data analysis and visualization.

This course provided you with a foundation in data analysis in Microsoft Power BI. You discovered the importance of data analysis in business, with a deep dive into the role of a data analyst in supporting data-driven decision-making in organizations. You've learned all about the data analysis process and how to ensure that the analysis you perform is useful for stakeholders. Whether you're engaging with stakeholders to determine the analysis purpose or business problem, gathering the right data, or reporting the insights, you now have a comprehensive understanding of each stage of the process. You familiarized yourself with Power BI, including its user interface and components, and you had the opportunity to generate your own visualization, a key skill for a data analyst. You also learned about the Power BI workflow and using the Power Query Editor in Power BI for transforming data. The foundational knowledge you've gained represents a significant step towards using Power BI effectively to generate valuable insights from data. Well done!

This course forms part of the Microsoft Power BI Analyst Professional Certificate. These professional certificates from Coursera help you get job-ready for in-demand career fields. The Microsoft Power BI Analyst Professional Certificate in particular is not only a way to broaden your understanding of data analysis but also to gain a qualification that can serve as a foundation for a career in data analysis using Microsoft Power BI. Plus, the professional certificate will help you prepare for Exam PL-300: Microsoft Power BI Data Analyst. By passing the PL-300 exam, you'll earn the Microsoft Certified Power BI Data Analyst certification. This globally recognized certification is industry-endorsed evidence of your technical skills and knowledge. The exam measures your ability to prepare data, model data, visualize and analyze data, and deploy and maintain assets. To complete the exam, you should be familiar with Power Query and with writing expressions using Data Analysis Expressions (DAX), of which you gained some foundational knowledge in this course. You can visit the Microsoft certifications page at learn.microsoft.com/certifications to learn more about the Power BI data analyst certification and exam.

This course enhanced your knowledge and skills in the fundamentals of data analysis in Power BI, but what comes next? Well, there's more to learn, so it's recommended that you move on to the following course in the program. Whether you're new to the field of data analysis or already have some expertise and experience, completing the whole program demonstrates your knowledge of, and proficiency in, analyzing data using Power BI. You've done a great job so far and should be proud of your progress. The experience you've gained will showcase your willingness to learn, motivation, and capability to potential employers. It's been wonderful to be a part of your journey of discovery.
Wishing you all the best for the future!

Hello, and welcome to this course on extracting, transforming, and loading data in Microsoft Power BI. Regular digital activities, such as ordering food online, reserving a trip, and using a social media application, generate a great deal of data. Now think about the billions of people who engage in these activities every single day. Then there are other organizations, like universities and banks, that perform many other transactions that may need to be stored in different ways. Businesses also need to gather data from different sources: for example, from their customers, from other companies, and from the government. Now imagine all that data living in different places and being stored in different ways. How can a company make sense of all of this? That's where data analysts come in. One of their jobs is to extract data from different sources, transform it so that it can be used, and load it into a tool that supports the analysis process, like Power BI. This is what you will learn in this course: how to extract, transform, and load data, a process also known as ETL.

Before data can be used to tell a story, it must first be processed so that it is usable. Data analysis is the process of identifying, cleaning, transforming, and modeling data to discover meaningful and useful information. The data is then crafted into a story through reports for analysis, to support the critical decision-making process. In this learning path, you will learn about the life and journey of a data analyst and the skills, tasks, and processes they have to master to tell a story with data. You'll discover how getting the data analysis story correct enables businesses to make informed decisions. By now, you should have learned how to harness the power of data in Power BI and how it benefits an organization. In this course, you will explore various topics and elements involved in the career of a data analyst, including identifying how to collect data from multiple sources and configure it in Power BI, preparing and cleaning data for analysis, and inspecting and analyzing ingested data to ensure data integrity. This course will give you a solid foundation in these topics and offer you opportunities to practice extracting, transforming, and loading data into Power BI.

Now let's briefly outline the course content so you have an idea of what's to come in your learning journey as you explore the extract, transform, and load process. First, you will learn about the extract portion of the ETL process: you will focus on data sources and how to extract data and configure storage modes in Power BI. Then you will move on to the transform portion of the ETL process: you will practice cleaning and transforming data to prepare it for data modeling, and you will also learn about data cleaning using Power Query and how to use applied steps. Next, you will cover the load portion of ETL and practice using data profiling and advanced queries; you will also learn about referencing queries and dataflows, and about using the Advanced Editor to modify code. To assist your learning, you will get to apply your newly gained skills in exercises, quiz questions, and self-reviews.

To consolidate your learning and put it into practice, you will complete a practical assignment. In this assignment, you will be given a business scenario from Adventure Works, a fictional business, where you need to gather data from multiple data sources to clean and transform. You will have the opportunity to apply the knowledge you gained in this course to join and merge these data sources and to identify and remove anomalies using profiling tools. After this practical assignment, you will complete a final graded assessment. Be assured that everything you need to complete the assessment will be covered during your learning, with each lesson made up of video content, readings, and quizzes. In addition, you can share your knowledge and discuss challenges with other learners. These discussions are also a great way to grow your network of contacts in the data analysis world, so be sure to get to know your classmates and stay connected during and after your course.
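The extract, transform, and load sequence outlined above can be sketched end to end in a few lines. This is purely illustrative: the CSV content and table name are invented, and Power BI performs these stages with Power Query and its data model rather than with Python.

```python
import csv
import io
import sqlite3

# Extract: read raw rows from a (here in-memory) CSV source.
raw_csv = "OrderID,Amount\n1,250\n2,\n3,120\n"
rows = list(csv.DictReader(io.StringIO(raw_csv)))

# Transform: drop rows with missing amounts and convert string fields to numbers.
clean = [(int(r["OrderID"]), float(r["Amount"])) for r in rows if r["Amount"]]

# Load: write the prepared rows into a queryable store for analysis.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (order_id INTEGER, amount REAL)")
con.executemany("INSERT INTO sales VALUES (?, ?)", clean)
total = con.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
```

The three stages are deliberately separated: if the source format changes, only the extract step changes; if a new quality rule is needed, only the transform step changes. That separation of concerns is the core idea behind ETL regardless of the tooling.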
This course is also a great way to prepare for the Microsoft PL-300 exam. By passing the PL-300 exam, you'll earn the Microsoft Power BI Data Analyst certification. The exam measures your ability to prepare data, model data, visualize and analyze data, and deploy and maintain assets. In this course, you will learn the process of extract, transform, and load; you will identify how to collect data from, and configure, multiple sources in Power BI, and how to prepare and clean data using Power Query. You'll also have the opportunity to inspect and analyze ingested data to ensure data integrity. Now that you have an overview of what this course is about, it's time to take the next step and prepare for a career as a data analyst using Power BI.

These days, businesses generate very large amounts of data through their activities, and the data may come from different sources: for example, from different departments within the company or from clients. The challenge is how to make sense of this data and extract valuable insights that can help improve business performance. That's where Power BI comes in. In this video, you'll explore the basics of data sources produced from business operations and learn how to combine them to gain business insights.

To begin, let's first review the data sources that you can connect to in Power BI. Flat files are a common type of data source that can be used for ETL (extract, transform, and load) in Power BI; examples of flat files include CSV, TXT, and Microsoft Excel files. Relational data sources, such as SQL Server, MySQL, and Oracle databases, are commonly used by large organizations because they provide a high level of reliability, data integrity, and security. NoSQL databases, such as MongoDB and Cassandra, are becoming increasingly popular for ETL in Power BI; these databases are designed to store and manage large volumes of unstructured or semi-structured data, making them ideal for use in a wide range of applications.
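To make the structured versus semi-structured distinction concrete, here is a small sketch with invented sample data: the same orders stored as a flat CSV file, where every row shares one uniform schema, and as JSON documents, where each record can carry its own shape, much as a document database such as MongoDB allows.

```python
import csv
import io
import json

# The same two orders as a structured flat file (CSV): one schema for all rows.
flat = "OrderID,Amount\n1,250\n2,120\n"
csv_rows = list(csv.DictReader(io.StringIO(flat)))

# And as semi-structured documents: the first record carries an extra field.
docs = '[{"OrderID": 1, "Amount": 250, "Tags": ["web"]}, {"OrderID": 2, "Amount": 120}]'
json_rows = json.loads(docs)

# Flat files enforce one uniform shape per row; documents may vary per record.
csv_shapes = {tuple(sorted(r)) for r in csv_rows}    # one distinct shape
doc_shapes = {tuple(sorted(d)) for d in json_rows}   # two distinct shapes
```

That per-record flexibility is why semi-structured sources often need extra transformation work (expanding records, supplying defaults for absent fields) before they fit the tabular model that reporting tools expect.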
Don't worry if you're not familiar with all the terminology; it will be discussed later in this course. So, no matter where your data is stored, Power BI has the flexibility to connect to a wide range of data sources.

Next, we will explore how combining data sources in Power BI can optimize supply chain performance. Imagine you are a supply manager responsible for managing your company's new just-in-time system, ensuring that all parts and materials are sourced and delivered on time while meeting quality standards. You closely collaborate with your team to ensure that the system runs smoothly and all suppliers meet their obligations. By combining data from various sources, such as sales figures, inventory, production, and supplier information, your department could gain valuable insights into customer behavior, product performance, and supplier performance. For example, by analyzing sales data alongside supplier data, trends in customer demand can be identified and production and inventory levels adjusted accordingly. On a company level, analyzing supplier performance data helps to identify areas for improvement and to work with suppliers to enhance their performance and long-term collaboration. In conclusion, combining data sources can benefit different stakeholders in a business by providing valuable insights into customer behavior, product performance, and supplier performance. This information can be used to make informed decisions, leading to improved supply chain management, reduced costs, increased customer satisfaction, and, ultimately, business success.

Data integration can be a daunting task, especially when you are working with multiple data sources that have varying formats, structures, and quality levels; combining these sources can often lead to inconsistencies and errors, making it difficult to derive meaningful insights and make informed decisions. But you don't need to worry: tools like Power BI simplify the process of combining data from different sources, reducing the time and effort required to create a comprehensive view of your data. Power BI is designed to be user-friendly and accessible, even for non-technical users, with an intuitive interface and drag-and-drop functionality that makes it easy to create reports and visualizations. Power BI also allows you to customize your reports and visualizations to suit your company's specific needs: you can choose from a wide range of pre-built templates and visualizations or create your own custom designs. This flexibility makes it easy to create reports that are tailored to the unique needs of your business. Power BI also enables collaboration by allowing you to share your reports and visualizations with colleagues, clients, or stakeholders, whether by sharing reports directly or by embedding them in websites or apps. This collaborative approach can improve communication and ensure that everyone is working with the same data, ultimately driving business success. Combining data sources is a great method of providing valuable information that can lead to improved supply chain management, reduced costs, increased customer satisfaction, and, ultimately, business success, and it should not be a daunting task.

In this video, you learned the basics of data sources produced from business operations and how to combine them to gain business insights. Tools like Power BI, with its built-in data connections, can simplify the process of combining data from different sources, reducing the time and effort required to create a comprehensive view of your business. By leveraging the functionalities of Power BI, you as an aspiring data analyst, along with other stakeholders, can gain a competitive edge and unlock new opportunities for growth and success at Adventure Works.

Every day, businesses generate large amounts of data, but where do they store it all? Many organizations store and export data as files, such as flat files. In this video, you'll learn how to set up and export a flat file data source.
Your manager at Adventure Works, Adio Quinn, asked you to build a Power BI report using a flat file that the human resources team has prepared. The file contains some of Adventure Works's employee data, such as employee names, hire dates, positions, and managers, as well as data located in several other data sources.

So, what is a flat file? A flat file is a file type that contains a single data table with a uniform structure for every row of data and does not have hierarchies. Some examples of flat files include comma-separated value (CSV) files, delimited text (TXT) files, and fixed-width files. Additionally, output files from various applications, such as Microsoft Excel workbooks, can also be classified as flat files.

Now that you know what a flat file is, let me demonstrate how to set up a flat file data source by helping the Adventure Works HR department. The first step is to determine which file location you need to use to export the data. The file location is important because, when it is changed, Power BI will not be able to refresh the data; this can cause errors such as "file not found" or "data source not found." Once you have located your file, you can proceed in Power BI. To display available data sources, in the Home group of the Power BI Desktop ribbon, select the Get data button or its down arrow to open the common data sources list. If the data source you want isn't listed under common data sources, select More to open the Get Data dialog box. In this example, you need an Excel data source, which is first on the list. Next, a connection window displays, where you select the employee Excel workbook that the HR team prepared and select Open. When your HR file is connected to Power BI Desktop, the Navigator window opens. This window displays the tables available in your data source (the Excel file in this example); you can select a table to preview its contents and to ensure that the correct data is loaded into the model. Finally, select the check box of the table that you want to bring into Power BI.
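The file-location caveat above (a moved source file causes "file not found"-style refresh errors) has an equivalent in any environment that loads data by path. A minimal Python sketch of the same defensive idea, using a hypothetical path invented for this example:

```python
from pathlib import Path

# Hypothetical flat-file location. In Power BI, a moved file breaks refresh,
# and you repair it under File > Options and settings > Data source settings;
# here we simply check before attempting to read.
source = Path("data/adventure_works_employees_demo.csv")

if source.exists():
    content = source.read_text()           # safe to load
    status = "loaded"
else:
    # Mirrors Power BI's "data source not found" condition.
    status = f"Data source not found: {source}"
```

The point is the same in both tools: validate the connection to the source before building anything on top of it, so a relocated file produces a clear, recoverable error rather than a broken report.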
Selecting the check box activates the Load button; now you can select the Load button to import your data into the Power BI data set.

In case you need to change the location of your source file during development, or if your file storage location changes, you'll need to update your connection strings in Power BI to keep your reports up to date. To do this in Power BI Desktop, select File in the menu bar, then select Options and settings from the File menu, and then select Data source settings from the Options and settings menu. You can also change or clear the permissions by selecting Edit permissions or Clear permissions, respectively; permissions cover the privacy level and credentials used for connecting to a data source. Remember that any structural changes to the file can break the reporting model, so it's important to reconnect to the same file with the same file structure. By following these steps, you'll be able to ensure that your report uses the most accurate and up-to-date information available. You've now helped the Adventure Works HR department to store their data, and you should now know how to set up and export a flat file data source. Great work!

As an aspiring Power BI data analyst, you'll generate large amounts of data, but where can you store this data? Fortunately, Power BI offers several storage options for its users. Over the next few minutes, you'll explore Power BI's storage modes and their impacts on report performance. Adventure Works needs help with creating a report that displays the performance of different product categories over time. This report will draw on a large sales transaction table with billions of rows, so you need to optimize its performance so that end users have fast access to the visuals. But before taking on this task, you first need to understand the different storage modes available in Power BI and how they impact report performance.

Let's begin with an overview of Power BI storage modes. Power BI has two primary storage modes, import mode and DirectQuery mode, as well as a complementary dual mode.
Import mode is used to bring smaller volumes of data from various sources into Power BI, and it stores the data in memory, which enables quick access. For example, in import mode you can connect to an Excel file containing a data set of available categories; this mode is ideal for the marketing department if they need to filter sales transactions by category in the report view. On the other hand, DirectQuery mode allows you to connect directly to the data source, and the data remains in the source system. DirectQuery mode is best suited for larger data sets where loading data into memory is not practical. For instance, suppose you have a card visualization that displays an aggregate summary of category sales from a sales table: with this storage mode, Power BI will send a request to the data source and get the result back. By using DirectQuery, the sales department can leverage the power of the external database to handle complex queries and aggregations, while Power BI brings in only the data necessary for visualizations. Many features of import mode are not supported in DirectQuery mode, so it's important to remember that you can't switch from one mode to the other.

Now that you're familiar with the two primary storage modes in Power BI, import and DirectQuery, let's explore the complementary dual mode. Dual mode is a distinct mode that combines the benefits of import and DirectQuery modes. When you use dual mode, the Power BI service determines the most efficient mode to use for each query. So, if a table has similar data between import and DirectQuery modes, using dual mode can be beneficial: you can import the data you need and still use DirectQuery for additional data that is not available in the imported data.

Let's explore the advantages and limitations of each of the storage modes in a little more detail, starting with import mode. Import mode is a great option if you need to work with small to medium-sized data sets. Data is loaded into Power BI to form the data model; the data model organizes the data into tables, columns, and relationships, making it more accessible and easier to work with. All calculations are performed within the data model, and the data is stored in compressed form, which optimizes memory usage. One downside of import mode is that you must refresh the data manually; this means that any changes you make to the source data will not be reflected in the report until the data is refreshed.

The next mode you'll explore is DirectQuery. DirectQuery mode connects directly to the data source, and queries are sent to the source system in real time. This means that the data is always up to date and there's no need to refresh it manually. DirectQuery mode is best suited for larger data sets, as it does not require loading all the data into memory; if you instead imported the data into a Power BI file stored on your local computer, it would require a significant amount of memory and resource overhead. One downside of using DirectQuery mode is that it can impact performance if the queries are complex or the data source is slow. So, you need to weigh the benefits and drawbacks of each storage mode and select the one that best suits your needs.

The third option you need to be familiar with is dual mode. This is where data is stored in memory but can also be retrieved from the original data source. This is useful when you are working with dimension tables, which can be queried alongside fact tables from the same source. For instance, Adventure Works might have a "sales aggregate by customer loyalty" table in import mode, which is used to speed up query processing by storing a summarized and categorized version of customer data in memory; simultaneously, the larger sales transactions table could be set to DirectQuery mode. In this scenario, setting a common dimension table, such as date, to dual mode can enhance the performance of the report.
When the dual mode date table is combined with an import mode table, such as the sales aggregate by customer loyalty table, it behaves like an import table and retrieves data from memory, ensuring faster performance. On the other hand, when the dual mode date dimension table is combined with a DirectQuery mode table, such as sales, it behaves like a DirectQuery table, querying data directly from the source system.

When you use multiple data sources to create a data model, it is called a composite model. Composite models enable you to combine tables with different storage modes into one unified data model, and using them can greatly enhance the functionality and performance of your reports and analytics workflow. When building composite models in Power BI, it's important that you specify the storage mode for each table in your data model: the performance of your composite model depends on how you set it up. For the best performance, try to use import or dual mode tables; they work faster because the data is stored in memory and can be retrieved quickly, giving you faster results when creating reports. It's essential that you consider the size of your data set and determine whether real-time access is a requirement before selecting a storage mode.

Power BI offers different storage modes, and in this video you learned about the two primary ones, import and DirectQuery, as well as the complementary dual mode. As an aspiring data analyst, it is important that you understand how these different storage modes impact a report's performance, and in this video you explored the advantages and limitations of each of them. Great work!

Data has the potential to help organizations make better business decisions, but businesses generate such large amounts of data to sift through that it becomes difficult to see the story it tells. Luckily, Power BI is an excellent tool for visualizing and analyzing data; however, slow data loading can be a significant issue, especially when working with large data sets. In this video, you'll learn how to configure import, DirectQuery, and dual storage modes in Power BI to optimize data retrieval and processing, enhance report speed, and guarantee that your reports always contain the most recent data.
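Before the configuration walkthrough, the storage-mode trade-offs covered above can be condensed into a toy decision helper. This is only a sketch: the row-count threshold is invented, and real storage-mode choices also depend on source capabilities, feature support, and refresh requirements.

```python
# Illustration only: a toy helper condensing the storage-mode guidance above.
# The 100M-row threshold is invented; treat it as a placeholder, not advice.
def suggest_storage_mode(row_count: int, needs_real_time: bool) -> str:
    if needs_real_time or row_count > 100_000_000:
        return "DirectQuery"    # data stays at the source and is always current
    return "Import"             # in-memory model gives the fastest visuals

# The Adventure Works examples from this section:
mode_sales = suggest_storage_mode(2_000_000_000, needs_real_time=True)   # billions of rows
mode_categories = suggest_storage_mode(500, needs_real_time=False)       # small lookup table
```

Applied to the scenario above, the billions-of-rows sales transactions table lands on DirectQuery, while the small categories lookup table lands on import, with dual mode remaining the option for shared dimension tables that must work with both.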
query and dual storage modes in PowerBI to optimize data retrieval and processing enhance report speed and guarantee that your reports always contain the most recent data Renee Gonzalez the marketing manager at Adventure Works has asked you to create a report that displays sales at the cash registers as customers purchase products the point of sale system scans product barcodes at the cash register measuring purchase trends she’s concerned with the logistics of ordering stocking and selling products while maximizing profit as this is going to be a large sales transaction table with billions of rows you need to ensure that the report’s performance is optimized so that the end users have fast access to the visuals to complete this task successfully you have to select the best storage mode for the data and configure it in PowerBI to optimize data retrieval and processing let’s start by helping Adventure Works choose a storage mode in PowerBI desktop to do this select the get data button in the home group of the PowerBI desktop ribbon then in the get data dialog box search for the Azure SQL database connector once you’ve selected the Azure connector the data connectivity mode section displays where you can choose from two options import or direct query import mode stores data directly in PowerBI desktop’s memory while direct query retrieves data from your data source in real time PowerBI also provides extra functionality to customize the storage mode for each table in your data set to get started select the model view icon near the left side of the window to display a view of the existing model model view displays all the tables columns and relationships in your model table card headers are colored to help you quickly identify which tables are from the same kind of source a table card header with no color indicates that these tables are in import mode tables from the same direct query source will display the same color in the table card header blue in our example select the
sales order detail DW table and expand the properties pane by right-clicking on the table and selecting properties the properties pane displays various options for configuring the table you’ll find a drop-down menu labeled storage mode in the advanced section of the properties pane this is where you can set or adjust the table’s storage mode now let’s set up import mode for your table by configuring the storage mode of the sales order details table this table is currently set to direct query mode in the advanced section change the option to import mode the following warning message will display setting storage mode to import is an irreversible operation you will not be able to switch it back to direct

query this operation will refresh the table after setting it to import which may take time depending on factors such as data volume next select okay congratulations you now know how to configure storage modes to optimize your reports now that the storage modes are configured Renee and her team should experience a significant improvement in system performance for example reports will generate more quickly they can display real-time data and business users can access data more efficiently well done at this stage of the course you should be familiar with how businesses gather and generate large amounts of data in their daily activities this can include data from human resources accounting and sales you also learned that this data may be structured and stored in different ways as an aspiring data analyst at Adventure Works you will realize that the most important step is to determine how data will be structured and stored knowing your data types and the way it is structured gives you the correct data sets to create reports that suit the company’s needs allowing business insights that will help during decision-making furthermore identifying the best storage solution for your data can reduce costs and improve performance two aspects that any company has as top priorities by the end of this video you will be able to identify the difference between structured and unstructured data and what storage solution is ideal for each type as an aspiring data analyst at Adventure Works you’ve been assigned the task of determining the best storage solution for the online retail website at Adventure Works the website was built with three data sets used to run the business product catalog data image files and financial business data each data set has different requirements the key factors to consider in your task are data classification how your data will be used and how you can get the best application performance now let’s focus on data types there are three types of data structured unstructured and
semistructured all of which are suitable for analysis but differ in the tools used for ingestion transformation and storage let’s start with structured data structured data is the most common type of data that we use it is also known as relational data in a financial report for example numbers and names are arranged into columns and rows making it easier for analysis and processing by nature structured data is quantitative easily searchable sortable and analyzed using tools like Microsoft Excel spreadsheets or relational databases which can store large amounts of structured data SQL or structured query language is a programming language used to manage relational databases it allows users to manipulate and query data stored in a database making it a valuable tool that’s used by data analysts and business users however the structure makes any addition or removal of data fields difficult since you must update each record to adjust to the new structure some applications where relational data is used are customer relationship management reservations and inventory management systems now let’s cover unstructured data unstructured data does not have a predefined structure or format it is best used for qualitative analysis and usually resides in non-relational databases or unprocessed file formats some examples of this type of data are text documents audio and video files social media posts and images these types of files can be stored in a centralized repository that ingests and stores large volumes of data in its original form then there is a third type of data it is called semistructured data because it is not as organized as structured data and it is not stored in relational databases this type of data uses tags for organization and hierarchy video files may have an overall structure and contain semistructured metadata but they are considered unstructured data since the data that forms the video itself is unstructured there is a process for converting semi-structured
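The relational workflow described above can be sketched with Python’s built-in sqlite3 module. The table and column names here are illustrative stand-ins, not the actual Adventure Works schema: rows and columns are created with a fixed structure, and SQL then queries and aggregates them.

```python
import sqlite3

# In-memory relational store standing in for a structured data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("Mountain Bike", "West", 1200.0),
     ("Road Bike", "East", 950.0),
     ("Mountain Bike", "East", 1100.0)],
)

# SQL lets analysts and business users query and aggregate structured data.
rows = conn.execute(
    "SELECT product, SUM(amount) FROM sales GROUP BY product ORDER BY product"
).fetchall()
print(rows)  # [('Mountain Bike', 2300.0), ('Road Bike', 950.0)]
```

Note how the fixed schema also shows the limitation mentioned in the transcript: adding or removing a field means altering the table and touching every existing record.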
data into a specific format that can be easily transmitted stored or processed it is called data serialization it uses a method of formatting that will allow the data to be transmitted or stored in a way that is easily understood by both the sender and the receiver without the need to know all the specific details of the data this is useful when dealing with semi-structured data that doesn’t fit neatly into traditional databases or data structures if you want to learn more about serialization please visit the additional resources at the end of this lesson now you’ll learn how to classify your data in order to choose a suitable storage solution for structured or unstructured data the correct storage solution can deliver better performance improve manageability and save on database costs when selecting a storage solution it’s important to consider the type of data you’re working with what operations are needed to transform the data and what level of management and maintenance is required the business data used at Adventure Works for analysis on a year-to-year comparison is not updated frequently it is stored in multiple data sets and some latency can be accepted since it is mainly read only not all data analysts need write access but they can all read from all data sets this is a type of structured data that will most likely be queried by data analysts who use SQL more than any other query language therefore a suitable storage solution for this example is a SQL database or a cloud-based solution like Azure SQL database but it can also be bundled with another cloud-based solution Azure Analysis Services to model the data in Azure SQL database this model can be shared with business users who can connect to it through PowerBI for analysis and gain business insights in summary selecting the appropriate storage solution is vital for addressing the specific requirements of your data remember when we spoke about serialization and the formatting to allow the storage of
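Serialization of semi-structured data, as described above, can be demonstrated with JSON, one common serialization format. The record below is a made-up example of tagged, nested metadata with no fixed relational schema; sender and receiver only need to agree on the JSON format, not on the record’s exact fields.

```python
import json

# Semi-structured record: tags and nesting, no fixed relational schema.
video_metadata = {
    "title": "Bike assembly guide",
    "tags": ["tutorial", "bicycle"],
    "chapters": [{"start": 0, "name": "Intro"}, {"start": 95, "name": "Frame"}],
}

# Serialize to a string that can be transmitted or stored anywhere...
payload = json.dumps(video_metadata)

# ...and deserialize back without knowing the schema in advance.
restored = json.loads(payload)
print(restored["chapters"][1]["name"])  # Frame
```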
unstructured or semistructured data one of those formats is a blob this is a binary large object where the data is stored in a binary ones and zeros format for Adventure Works online retail website Azure Blob Storage is an ideal option for storing unstructured data such as photos and videos it’s a scalable and cost-effective cloud storage service which is designed to store large amounts of unstructured data such as images videos or documents the website has a product page where a bicycle photo needs to be displayed at the same time as the specific bicycle model the photos will not be queried independently by including the photo ID or URL as a product property the photo can be retrieved by its ID without any time lag this demonstrates how unstructured data can be stored the right storage solution allows Adventure Works to achieve optimal performance and efficient data management in this video you learned that while structured data is easier to work with and analyze unstructured data is often more abundant and valuable businesses and organizations are increasingly focusing on harnessing unstructured data to gain insights into customer behavior emotions and other aspects that can shape their strategies choosing and implementing the correct storage solution can benefit companies and organizations by improving performance reducing costs and increasing efficiency Adventure Works generates data from many different departments and stores this data in many different sources wouldn’t it be great if they could combine data from these different sources with PowerBI they can combine data sources using connectors in this video you’ll learn about the different kinds of connectors available in PowerBI their purpose how to choose a connector and securely connect to the cloud data source Adventure Works needs to generate a report that compares the sale of bicycle models across the company’s different outlets web retail and individual sellers however the sales data is stored in
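The pattern described above, where a structured product record carries the photo ID or URL so the blob is fetched directly rather than searched for, can be sketched as follows. The container URL and product names are invented for illustration, not a real Azure endpoint.

```python
# Structured product rows reference unstructured blobs by ID/URL, so a
# photo can be retrieved directly without querying blob storage itself.
# The base URL below is a made-up example, not a real endpoint.
BLOB_BASE = "https://example.blob.core.windows.net/product-photos/"

products = {
    101: {"name": "Road Bike 500", "photo_id": "road-bike-500.jpg"},
    102: {"name": "Mountain Bike 200", "photo_id": "mtb-200.jpg"},
}

def photo_url(product_id: int) -> str:
    # Look up the blob URL from the structured product record.
    return BLOB_BASE + products[product_id]["photo_id"]

print(photo_url(101))
```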
different sources the company needs you to generate an integrated report that combines these different data sources you can combine these data sources using connectors in PowerBI you can use PowerBI as a single business intelligence solution to generate an integrated report by combining the company’s data sources through the use of connectors but before you begin let’s find out more about connectors connectors are links that transport data between a data source and an application they’re basically the bridges that connect PowerBI to different sources of data with connectors you can create a link or bridge between PowerBI and different data sources like databases files services SharePoint and more connectors make it easy to connect between data sources you can then transform clean and visualize the data in PowerBI for reporting and analysis to generate insights but before you start importing your data it’s important to understand what your business requirements are for the data source this includes things like whether the data is stored on your own computer and gets updated every so often or if the data is coming from an external source and needs to be updated in real time you also need to know who will be using the data and how it will be used these requirements are essential because they can affect the way you load the data into PowerBI so it’s important that you get them right Microsoft frequently adds new data connectors to its desktop and services platforms it typically releases at least one or two new connectors every month as part of the regular PowerBI update this has resulted in PowerBI having a vast collection of over 100 data connectors available files databases and web services are the most used sources all PowerBI connectors are free to use but they might be marked as beta or preview depending on their development stage any data source marked as beta or preview has limited support and functionality so don’t make use of it in production environments now
that you’re familiar with the data connectors available in PowerBI it’s time to help Adventure Works generate their report let’s examine the steps involved in setting up a connector to a SQL database first navigate to the home tab and locate the get data button you have two options to choose from here you can either select the get data button and then choose all or you can select the expand arrow next to the get data button and select more this lets you access a wide range of data connectors available in PowerBI to make sure your data is mapped correctly in PowerBI it’s crucial to identify the specific nature of the data for instance if you’re working with a document meant for an Azure SQL database using the Excel connector wouldn’t give you the desired outcome as a PowerBI user in the get data window navigate to the Azure SQL option and select it then select the connect button you can also use the search bar to filter the available connectors and quickly find what you’re looking for after selecting the data source you’ll be prompted to set up the connection depending on the type of data source you’ve chosen the specific details you need to provide will differ for example if you’re working with an Excel file you’ll need to specify the location of the file on the other hand if you’re dealing with a SQL server database you’ll need to enter the server name and the database connection details there are a few additional options you may want to consider in addition to specifying the server address and database name you can also choose between different connection modes such as import or direct query most of the time you’ll select import other advanced options are also available in the SQL Server database window but you can ignore them for now you’ll cover them at a later stage in the course after you’ve specified the server and database names you’ll be prompted to sign in with a username and password you’ll have three different sign-in options to choose from depending on 
your credentials the first option is to use your Windows account this is often the easiest option for users who are already logged into their computer the second option is to use your database credentials for instance SQL Server has its own sign-in and authentication credentials that are managed by the database administrator the third option is to use your Microsoft account credentials which require your Azure Active Directory credentials once you’ve selected the sign-in option that’s appropriate for your situation enter your username and password and then select connect this will allow you to securely connect to your data source once you’ve successfully connected your database to PowerBI desktop the available data in the navigator window appears this window displays all the tables or entities that are available in your data source such as the SQL database in this example to preview the contents of a table or entity simply select the check box next to the table to import data into your PowerBI model select all tables that you want to bring in finally once you’ve selected the tables you can choose to either load the data into your model in its current state or transform it before loading for now the focus is on the data loading process data transformation will be covered in more detail at a later stage by selecting the appropriate data and choosing the load option you can easily bring in the data you need to start building visualizations and analyzing your data in PowerBI connectors are an essential component of PowerBI the wide range of available connectors lets you connect to lots of different data sources to bring them all together into one place you can then import or extract the data from these sources into reports and dashboards for analysis and visualization by leveraging the full range of connectors you can access valuable insights to make data-driven decisions for your business you should now understand that connectors are a powerful asset that can help you
get the most out of your data analysis what if you could reorder products you buy frequently with a click of a button that would be really convenient right and what if other types of tasks could be automated by businesses well in today’s data-driven world organizations are constantly searching for ways to automate tasks to optimize productivity Microsoft PowerBI is an integrated suite of software tools applications and connectors that can help you transform your data sources into clear and compelling visualizations connectors play an important role in connecting to various data sources and executing actions or triggering workflows based on specific events there are two types of operations available to create automated workflows triggers and actions in this video you will explore how actions are triggered to create efficient and effective scheduled actions so let’s get started with triggers and actions in PowerBI Adio Quinn a data analyst at Adventure Works a bicycle manufacturer is responsible for analyzing daily sales reports and providing insights to the management team however the manual process of importing data from multiple sources and analyzing it can be laborious and time-consuming to streamline this process Adio asks your help to leverage PowerBI’s triggers and actions to automate the workflow with PowerBI you can schedule an action to refresh the data and email the latest sales report to the management team with this automated workflow in place you can now focus on analyzing the data and providing valuable insights to the management team without worrying about the manual process of importing and analyzing the data in PowerBI triggers and actions work together in configuring a workflow either based on time or specific actions a trigger is always required to initiate a workflow and prompt it to run additionally actions in PowerBI enable interaction with the data source through various functions automating tasks and processes with actions in your workflow can
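The trigger-and-action relationship described above can be sketched as a tiny workflow in Python. This is only an illustrative model of the concept; in PowerBI and Power Automate these workflows are configured declaratively, not written as code, and the function names here are invented.

```python
# Minimal sketch of a trigger → actions workflow (illustrative only).
log = []

def refresh_dataset():
    log.append("dataset refreshed")

def email_report():
    log.append("sales report emailed to management")

def on_schedule_trigger(actions):
    # A trigger is always required to initiate the workflow;
    # the configured actions then run in order.
    for action in actions:
        action()

on_schedule_trigger([refresh_dataset, email_report])
print(log)
```

This mirrors Adio’s scenario: the scheduled trigger fires, the data set refreshes, and the latest sales report is emailed without manual intervention.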
save time reduce manual effort and make your workflow more efficient moreover scheduled actions in PowerBI can automate tasks and actions based on specific time intervals by setting up a schedule reports and dashboards can be updated with the latest data regularly without manual intervention thereby improving data accuracy and streamlining workflows now we are going to explore how to set up a scheduled data refresh when it comes to working with data in an organization having access to the latest and most relevant information is essential outdated data won’t be useful to the organization as it doesn’t reflect the current situation relying on old data can even hinder the organization’s growth since there could be more recent and applicable data readily available in this video we’ll explore the topic of automating tasks in PowerBI in PowerBI users have the option to create scheduled actions which enable them to automate tasks and actions at specified time intervals today you are going to help Adio a data analyst at Adventure Works and his job involves regularly updating sales report data sets according to a predetermined schedule by setting up a scheduled data refresh Adio can now automate the process saving him valuable time and effort let’s begin by opening your browser and heading to https://app.powerbi.com/home to get to the scheduled refresh screen in the navigation pane on the left hand side of the screen select data hub next locate the data set you wish to work with in our case the sales report data set next select the ellipses and then select settings to expand the data set settings this will take you to a new screen where you can configure the trigger the scheduled refresh section is where you define the frequency and time slots to refresh the data set let’s walk you through the steps to set up an online refresh schedule in PowerBI services here’s what you need to do step one turn the switch to on step two you can modify the schedule to fit your
needs choose the frequency you want the data set to refresh such as daily select the time zone you want to use for example UTC London under time select add another time and enter a time for the refresh to occur repeat this step for additional refresh times as needed step three once you’re done simply select apply and you’re all set did you know that you can easily adjust the frequency time zone and time of your scheduled refreshes in PowerBI this allows you to ensure that your data is always up to date and accurate plus you can even set up scheduled notifications to be sent to a specific email address how convenient is that beware if your data set hasn’t been active for 2 months the scheduled refresh will be automatically paused are you ready for a quick rundown on data refreshing in PowerBI great as a PowerBI user refreshing data typically means importing data from the original data sources into a data set you can choose to refresh data based on a predetermined schedule or on demand depending on your needs if your underlying source data changes frequently it may be necessary to perform multiple data set refreshes daily however it’s important to note that PowerBI limits data sets on shared capacity to a maximum of eight scheduled daily data set refreshes with these easy steps you can now create a refresh schedule that works perfectly for you in this video you explored the topic of automating tasks within PowerBI specifically using scheduled actions to automate tasks and actions at specified time intervals by automating processes such as data refreshing users can save valuable time and effort we walked through the steps to set up an online refresh schedule in PowerBI services and highlighted the importance of periodically checking the refresh status and history to ensure data sets are error-free good job congratulations on reaching the end of the first week in this course on how to extract transform and load data in PowerBI this week you explored how to work with
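The scheduling rules above, daily time slots plus the eight-refreshes-per-day limit on shared capacity, can be sketched as a small validation helper. The function name and 'HH:MM' input format are assumptions for the sake of the example; PowerBI itself exposes this through its settings UI, not through code like this.

```python
from datetime import time

MAX_SHARED_CAPACITY_REFRESHES = 8  # PowerBI shared-capacity daily limit

def build_refresh_schedule(times_utc):
    """Validate and sort a daily refresh schedule (times as 'HH:MM')."""
    if len(times_utc) > MAX_SHARED_CAPACITY_REFRESHES:
        raise ValueError("shared capacity allows at most 8 daily refreshes")
    return sorted(time(*map(int, t.split(":"))) for t in times_utc)

schedule = build_refresh_schedule(["06:00", "18:30", "12:00"])
print([t.strftime("%H:%M") for t in schedule])  # ['06:00', '12:00', '18:30']
```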
basic and advanced data sources in PowerBI let’s now take a few minutes to recap what you learned this week this summary will help you review the concepts presented previously and clear up questions you might have you began the course by covering basic data sources you learned that for example by analyzing sales data alongside supplier data you can identify trends in customer demand you also learned that data from different parts of an organization may come from different sources and may be stored in different ways that’s when you identified the many different data sources supported by PowerBI like flat files relational data sources and NoSQL databases you also learned how to set up a flat data source after that you learned that local data sets provide data that is only available to a specific individual or organization and are typically stored locally local data sets are a good option for organizations or projects with few users that demand high security and need speed over quantity on the other hand shared data sets allow multiple individuals or organizations access to data and are usually stored on multiple locations or cloud-based platforms they are suitable for large enterprises or projects that require multiple users working at the same time then you had the opportunity to complete a practical exercise on how to set up an Excel data source in PowerBI after that you covered different storage modes in PowerBI you learned that you must think carefully about the benefits and limitations of each storage mode and select the one that best suits your needs import mode is a great option if you are working with small to medium-sized data sets the data is loaded into the PowerBI data model and in this mode data must be refreshed manually on the other hand direct query mode connects directly to the data source and queries are sent to the source in real time so there’s no need to refresh the data manually however this mode might impact performance you also covered dual and
hybrid modes as alternative storage modes after you explored these different storage modes you then learned how to configure them in PowerBI next you had the opportunity to apply your skills and configure storage modes in PowerBI you discovered that structured data also known as relational data is arranged into columns and rows by nature structured data is quantitative easily searchable sortable and analyzed using tools like Microsoft Excel spreadsheets or relational databases which can store large amounts of structured data on the other hand unstructured data does not have a predefined structure or format unstructured data is best used for qualitative analysis and usually resides in non-relational databases or unprocessed file formats some examples of this type of data are text documents audio and video files social media posts and images semistructured data is not as organized as structured data and it is not stored in relational databases this type of data uses tags for organization and hierarchy an example of semi-structured data is video files you then learned about connectors connectors are the bridges that connect PowerBI to different sources with connectors you can import data from databases files Outlook servers SharePoint and many other sources you also learned that before you start importing your data it’s important to understand what your business requirements are for the data source you then explored the two types of operations used for creating automatic workflows triggers and actions triggers are used to create efficient and effective scheduled actions for example Adventure Works can use triggers to automate parts of their PowerBI workflow like refreshing data and emailing reports next you undertook another practical exercise in this exercise you implemented triggers to automate your workflow in PowerBI you then tested your understanding of the concepts that you encountered in this lesson in the knowledge check finally you undertook a module quiz this
quiz tested your understanding of all concepts that you explored in this module you should now be familiar with the fundamentals of data sources you should be capable of extracting data from basic and advanced data sources to work with in PowerBI great work I look forward to guiding you through the next week’s lessons in which you’ll learn about transforming data in PowerBI you’re making progress in your journey to become a data analyst you’ve learned how to extract data and now it’s time to learn how to transform it so you can make better use of it depending on your data sources data transformation can involve different activities such as cleaning merging and profiling in this video you’ll learn how to identify components of data transformation and understand why data transformation is required Adventure Works CEO Jamie Lee has set a new goal for the company to increase sales she’s relying on company data to uncover trends and insights and make that goal achievable your manager Adio Quinn has asked you to create a PowerBI report that visualizes the data in a meaningful way but before you can start working with that data you need to clean and transform the raw data to ensure its accuracy and consistency in the first part of this course when you explored the extract stage of the extract transform load process you learned that data may come from different sources however the data from these sources may contain inconsistencies that make accurate analysis difficult data from different sources can be untidy incomplete and inconsistent making it difficult to draw meaningful insights that’s why data transformation is a crucial step it helps you prepare data for analysis now let’s examine some of the inconsistencies you may find in data by this point in the course you should know that data is classified into three main groups called structured semistructured and unstructured data each data group is suitable for analysis but may require different tools to ingest transform
and store you can say that data coming from sources that you define as structured data is easier to work with and compliant with the rules since these sources are systems that have strict rules and prioritize data integrity data coming from conventional databases generally have a low probability of inconsistent or erroneous data however in semistructured data unstructured data and even in some types of structured data it is likely that there is data that needs to be transformed before starting the report design for example let’s say you are working on an analysis related to products in an e-commerce database for this task you need some relevant fields for your report however the table has hundreds of fields so you need to decide how to identify the relevant data to create your report an example of useful data transformation in this scenario is including certain columns from the data and excluding others before loading the data for analysis and reporting another transform example would be selecting fields and transforming by merging them such as in a customer table with fields for the first and last name but you want to display them as a single full name field by merging fields with a space between now let’s explore what data cleaning is data that is not structured is more flexible in terms of rules and therefore more likely to be disorganized and require cleaning you may not encounter as clean data as you would expect in Excel data or in data organized using delimiter symbols such as angle brackets or commas in such cases the data should have a preliminary examination to identify incorrect data or separate rows where content refers to the same values like where ware house is written as two words and warehouse as one word you can resolve these inconsistencies by passing them through filters with specific rules this examination is referred to as data cleaning another data issue you may encounter is the need to merge or append multiple data sources for example if
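The two transform examples above, keeping only the columns a report needs and merging first and last name into one full-name field with a space between, can be sketched in Python. The field names and customer rows are invented for illustration; in PowerBI these steps would be Remove Columns and Merge Columns in Power Query.

```python
# Raw customer rows with more fields than the report needs; keep only
# the relevant columns and merge first/last name into a full name.
raw_customers = [
    {"first_name": "Renee", "last_name": "Gonzalez", "fax": "n/a", "id": 1},
    {"first_name": "Adio", "last_name": "Quinn", "fax": "n/a", "id": 2},
]

def transform(rows):
    # Select only the needed fields and merge names with a space between.
    return [
        {"id": r["id"], "full_name": f'{r["first_name"]} {r["last_name"]}'}
        for r in rows
    ]

print(transform(raw_customers))
```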
Adventure Works has two data sources for sales one for online sales and another for in-person sales you’ll need the data from both to create a monthly sales report depending on the data formats you can use commands such as append or merge data transformations to combine the data for analysis in this video you learned that data transformation can help improve data quality by removing errors inconsistencies and inaccuracies this results in cleaner more reliable data for analysis it also allows you to standardize data when working with multiple sources with data transformation you can help organizations like Adventure Works use data that is more understandable organized and consistent to achieve goals like increased sales in this video you will explore some features of Power Query and learn to navigate the Power Query editor interface Adio Quinn the data analyst at Adventure Works asks you to clean and transform the company’s sales data which is scattered across multiple sources in preparation for data analysis Power Query can help you with this Power Query is part of PowerBI desktop allowing for seamless data preparation within the PowerBI environment Power Query is a data transformation and data preparation tool allowing you to connect clean and transform data from a wide range of sources it ensures that your data is ready for analysis enabling you to create insightful visualizations and reports let’s explore how Power Query helps you clean shape and organize data from various sources the first feature is data connectivity Power Query connects to various data sources both on premises and the cloud directly within PowerBI desktop you can access data from traditional databases as well as file-based sources next there’s data extraction and transformation Power Query’s interface allows you to extract and transform data with ease during the extraction process you can filter sort and apply custom transformations ensuring that you import only the required data then there’s
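The cleaning-and-appending scenario above can be sketched as follows: rule-based filters normalize inconsistent values such as "ware house" versus "warehouse", and the two sales sources are then appended into one table. The row shapes and field names are made up for the example; in Power Query this corresponds to Replace Values followed by Append Queries.

```python
# Normalize inconsistent labels, then append two sales sources
# (online and in-person) into one table for a monthly report.
def clean(rows):
    fixes = {"ware house": "warehouse"}  # rule-based cleanup filter
    return [
        {**r, "channel_type": fixes.get(r["channel_type"], r["channel_type"])}
        for r in rows
    ]

online = [{"order": 1, "channel_type": "ware house"}]
in_person = [{"order": 2, "channel_type": "warehouse"}]

# Appending stacks rows from both sources into a single table.
combined = clean(online) + clean(in_person)
print([r["channel_type"] for r in combined])  # ['warehouse', 'warehouse']
```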
the power query editor in PowerBI within PowerBI desktop which provides a graphical user interface or guey for designing and managing queries tabs such as home transform add column and view have data manipulation tools there’s also query reusability and applied steps power Query records each transformation as an applied step allowing you to review modify or delete any step this ensures that your data transformations are transparent and easily modifiable finally there’s performance and scalability power Query handles large data sets efficiently using various techniques that optimize performance and reduce memory usage let’s demonstrate these features in Power Query to achieve Jaime’s goal of increasing sales you must work with sales data from different regional teams stored in different file formats like Excel CSV and even a SQL database to get started you’ll need to import this data into PowerBI using Power Query to begin the import you must add a data source in the PowerBI desktop in the home tab select get data to choose a data source the Power Query editor opens in a separate PowerBI window where you can apply various data transformations such as removing columns changing data types and filtering data next you need to load the data select your data source and configure the connection settings if necessary select transform data to open the Power Query Editor now let’s discover how to navigate in Power Query the Power Query editor has several key areas let’s start with the ribbon the ribbon is the set of toolbars at the top of the window it helps you quickly find the commands that you need to complete your tasks the ribbon tabs such as home transform add column and view contain commands and tools for data transformation and manipulation the queries pane is located on the left side of the editor the queries pane displays a list of all the queries in your project select a query to view or edit its applied steps and data preview this pane is where you can manage and 
navigate between different queries in your project by selecting a query you can view the data and the applied steps associated with it helping you keep track of your work and maintain organization in your project then on the right pane below the ribbon there’s the applied steps section it displays the sequence of transformations applied to the selected query select a step to view the data state at that point or delete reorder or modify steps as needed the applied steps section provides a visual representation of the transformations applied to your data making it easier to understand the changes made by reviewing the applied steps you can identify errors redundancies or inefficiencies in your data transformations finally in the center of the Power Query window let’s explore data preview the data preview pane displays a preview of your data as it appears after the applied transformations you can interact with the data by sorting filtering or changing the data type of columns this pane enables you to review your data at different stages of the transformation process helping you to get your transformations accurate and effective before loading the data into the data model in this video you learned that Power Query is a versatile tool in PowerBI that streamlines data import cleaning and transformation from multiple sources its features such as data connectivity data extraction and transformation make it an integral part of PowerBI desktop it helps you prepare and transform data from different sources within Adventure Works to simplify analysis and create insightful visualizations and reports the Power Query Editor interface offers a userfriendly experience allowing you to perform various data transformations with ease thanks to the applied steps list in Power Query you can easily undo and reorder steps without losing progress in this video you’ll learn how to use the applied steps list to undo modify and reorder steps first let’s open the Power Query Editor in PowerBI 
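Before the walkthrough, the applied-steps idea — an ordered, replayable list of named transformations — can be sketched outside Power BI in plain Python. This is only an illustration of the concept (Power Query records its steps as M code, and none of the names below are Power BI APIs):

```python
# Minimal sketch of Power Query's "applied steps": each step is a named,
# pure transformation; the list can be replayed, truncated, or reordered.
def apply_steps(rows, steps):
    """Replay an ordered list of (name, function) steps over the data."""
    for _name, fn in steps:
        rows = fn(rows)
    return rows

rows = [
    {"product": "road bike", "units": 5},
    {"product": "helmet", "units": 40},
    {"product": "gloves", "units": 12},
]

steps = [
    ("Filtered Rows", lambda rs: [r for r in rs if r["units"] >= 10]),
    ("Sorted Rows",   lambda rs: sorted(rs, key=lambda r: r["units"], reverse=True)),
]

result = apply_steps(rows, steps)

# Removing a step (like deleting it in the Applied Steps pane) simply means
# replaying the list without it.
without_filter = apply_steps(rows, steps[1:])
```

Because every step is recorded rather than applied destructively, deleting or reordering a step just replays the pipeline — which is why Power Query can undo without losing your source data.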
To do this, from the Home tab, select Transform Data. After selecting your data source, the Power Query Editor opens in a separate window. Next, let's locate the applied steps list: in the Power Query Editor, you'll find it on the right pane, below the ribbon. It has all the steps you've performed on your data, presented in the order of application. The applied steps list is a visual representation of the transformations applied to your data; by reviewing the applied steps, you can identify errors, redundancies, or inefficiencies in your data transformations. To view the data state at a specific point in the process, select the corresponding step in the applied steps list. The applied steps list makes it easy to correct a mistake, change your mind, or undo a transformation. To undo a step, simply select the X icon next to the step to remove it; Power Query will automatically revert the data to the state it was in before that step was applied. Please note that removing a step will also remove all subsequent steps in the list, as they are dependent on the previous transformations. What if you need to reorder the sequence of steps? To reorder steps, select and drag the step you'd like to move to a new position in the list. Power Query will update the data accordingly, applying the transformations in the new sequence. You should note that reordering steps might affect the results of subsequent transformations, so review your data and the applied steps list to check everything. Suppose you need to modify a step: just select the gear icon next to the step. This opens a settings window where you can edit the transformation parameters; when done, select OK to apply the update. As with reordering steps, modifying a step might affect subsequent transformations, so always review your data and the applied steps list to ensure everything is as expected. To add a new step, use the Power Query Editor ribbon to choose a transformation, such as filtering or sorting; when you perform a new data transformation, it's added to the applied steps list.

With the Power Query Editor, you can also add filters. Filtering is the process of narrowing down your data set by displaying only the rows that meet specific criteria. It helps you focus on a particular subset of data, remove unwanted data that may affect your analysis, or simplify your data set for better readability. Let's check how to add a filter. In the Power Query Editor, select the column header for the column you want to filter; this highlights the entire column. With the column selected, select the small down arrow next to the column header. This opens a drop-down menu with filtering options, such as text filters, number filters, or date filters, depending on the data type of the column. Choose the type of filter and select OK. Notice the new filtering step has been added to the applied steps list. You can also sort your data set. Sorting is the process of arranging your data in a specific order, either ascending or descending. Sorting organizes data based on specific attributes, such as alphabetical order, numerical values, or chronological order, helping to identify the highest or lowest values in a data set. Select the column header for the column you want to sort. In the Home tab of the ribbon, find the Sort group and choose Sort Ascending (A to Z) or Sort Descending (Z to A) to sort the selected column. The data is sorted based on your chosen sorting order; check the applied steps list to ensure the new sorting step is added. Finally, for better organization and readability, you can rename any step in the applied steps list. Just right-click the step you'd like to rename, select Rename, enter a new descriptive name for the step, and press Enter. Renaming steps helps you keep track of transformations, making it easier to navigate and understand the data transformation process.

In this video, you learned how to use the applied steps list in Power Query to undo, modify, and reorder steps. It is a visual representation of the data transformation process, making it easier to understand complex queries and track the impact of each action on the data set. The applied steps list provides easy undo and redo functionality, flexibility in reordering steps, and efficient troubleshooting capabilities, saving time and effort.

How do you efficiently remove and rename columns to focus on the data that matters? You can do it with Microsoft Power Query in Microsoft PowerBI. In this video, you'll learn how to remove and rename columns and promote header rows in Power Query in PowerBI. As you continue to work on Adventure Works' goal to increase sales, your manager, Adio Quinn, asks you to prepare a report on sales and customer demographics. You have a data set with numerous columns, but you only need a few of those columns for your analysis. You must get the data organized and streamlined, but you're not sure where to start. That's where Power Query comes in. Power Query is a powerful data transformation tool within PowerBI that allows you to connect to different data sources, clean data, and transform data with ease. A common data manipulation you'll encounter is working with columns. Working with columns in Power Query in PowerBI is an essential skill for data analysts and professionals who regularly deal with data. One of the main benefits of learning to work with columns is efficient data preparation: eliminating unimportant or repetitive columns allows you to concentrate on the most crucial data for your analysis, minimizing the data set size and streamlining the data structure for easier manipulation and quicker processing. Another benefit is improved data readability and interpretation: removing unnecessary columns helps declutter your data set, making it easier to read and understand, and renaming columns with more descriptive names helps you quickly identify the purpose and content of each column. One other benefit of working with columns is that it allows for enhanced data analysis and reporting: by focusing on the most relevant columns, you can produce more accurate and meaningful analyses. This allows you to deliver actionable insights to your team and organization, leading to better decision making. Finally, working with columns means time and resource savings: efficiently removing and renaming columns in Power Query can save you a significant amount of time during the data preparation stage, which means you can devote more time to analyzing the data and generating insights. By streamlining your data preparation process, you also reduce the computational resources required to process your data. This can lead to faster analysis and, in some cases, cost savings, particularly when working with cloud-based services that charge based on resource usage.

Now let's explore a step-by-step guide on how to remove and rename columns and promote header rows in Power Query. Let's start by demonstrating how to remove columns. The first step is to load your data into the Power Query Editor: open PowerBI, select Home on the ribbon, select Get Data, and choose your data source, for example Excel or CSV. Once connected to your data, the Power Query Editor opens, displaying your data. The next step, in the Power Query Editor, is to locate the columns you want to remove. To select a single column, select its header. If you need to select multiple columns, hold down the keyboard Control key (or the Command key if you're using a Mac) and select the column headers to remove. With the columns selected, you're ready to proceed: right-click on any of the selected column headers, and in the context menu that appears, select Remove Columns. The selected columns are removed from your data set, and you will notice a new step, Removed Columns, appears in the applied steps list on the right pane, reflecting the updated data state. Now let's cover how to rename columns. First, select the column you want to rename: in the Power Query Editor, select the header of the column. Right-click the header of the selected column, and in the context menu, select Rename. A text box appears; type in a new column name and press Enter to save the change. Again, you'll notice the new step in the applied steps list. Let's check how to promote header rows. The first thing is to identify which row in your data set contains the headers; in most cases, this is the first row. If your data set has additional information or metadata above the headers, you may need to scroll down to find the appropriate row. Now you can promote the header row. Once you've identified the header row, on the ribbon, use the Home tab to locate the Transform group and select Use First Row as Headers. This promotes the first row to be used as column headers, replacing the existing headers. Note: if the header row isn't the first row, you'll need to remove any rows above the header row before promoting it. To do this, select the rows you want to remove by selecting the row numbers on the left side of the editor, then, on the ribbon in the Home tab, select Remove Rows. You will notice a new step, Removed Rows, in the applied steps list on the right pane, reflecting the updated data state.

In this video, you learned how to remove and rename columns in Power Query. You also learned how to promote header rows. These are important skills for you to master as an aspiring data analyst; they empower you to transform raw data into valuable insights that drive smarter decision making and lead to a greater impact within your organization. Furthermore, efficient data preparation saves time and computational resources.

When analyzing your data, you need to ensure accuracy and reliability, but data sets often contain errors that lead to inaccurate results. Using Power Query, you can fix many common data set errors. In this video, you'll learn how to identify common types of errors and discover how best to fix them using Power Query in PowerBI. Adventure Works is preparing to analyze its latest sales data worksheet; however, there are several errors in this data set, like null values, duplicate rows, and inconsistent data types.
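These three error types map to simple, well-defined operations. As a rough plain-Python sketch of what the upcoming Power Query steps accomplish (the data and field names here are illustrative, not the actual Adventure Works file):

```python
from datetime import datetime

rows = [
    {"product": "Trail-100", "subcategory": None,   "order_date": "2023-01-05"},
    {"product": "Road-250",  "subcategory": "Road", "order_date": "05/01/2023"},
    {"product": "Road-250",  "subcategory": "Road", "order_date": "05/01/2023"},  # duplicate
]

# 1. Replace null values with a logical default (like Replace Values).
for r in rows:
    if r["subcategory"] is None:
        r["subcategory"] = "Trail"

# 2. Remove duplicate rows, keeping the first occurrence (like Remove Duplicates).
seen, deduped = set(), []
for r in rows:
    key = tuple(sorted(r.items()))
    if key not in seen:
        seen.add(key)
        deduped.append(r)

# 3. Normalize inconsistent date strings to one data type (like Data Type > Date).
def parse_date(s):
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(s, fmt).date()
        except ValueError:
            pass
    raise ValueError(f"unrecognized date format: {s}")

for r in deduped:
    r["order_date"] = parse_date(r["order_date"])
```

The order matters in practice, too: deduplicate before normalizing types and you may miss rows that are duplicates in meaning but differ in formatting, which is one reason Power Query records the steps in sequence.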
These errors must be resolved before analysis, so let's take a few moments to help Adventure Works fix them using Power Query. First, you must import the data set to transform; in this case, it's the Adventure Works sales data set. On the Home tab, select Get Data and choose Text/CSV for the file type. Browse to the location of your data set, select Open to import, then select Load to load the data. Next, select Transform Data. In PowerBI Desktop, the Transform Data button is in the Home tab, in the Queries group of functions, positioned to the right of the Recent Sources button. The sales data is loaded into Power Query. It shows a list of bicycle products and key information about each product, like name, price, weight, category, and description. However, several of these rows contain null or missing values, and these errors need to be resolved before the data can be analyzed. To systematically identify missing or null values, select the drop-down arrow in the column header for the variable you're examining. This opens a filter menu used to filter the data in the column based on specific criteria. The filter menu contains options like Empty or Null; the available options depend on the data type of the column. Empty refers to blank cells in text columns; null refers to missing values in numeric or date columns. Select the appropriate option to filter and display the rows that contain missing or null values in the selected column. Inspect the data table in the editor and identify any rows with missing or null values. In this data set, two rows contain missing values: rows 16 and 17 have a missing value in the product subcategory column. Now that you've identified the values, you can resolve them. There are three ways to resolve missing values: replace them with default values, replace them with values from another column, or remove the rows containing missing values. For Adventure Works, the best approach is to replace its missing values with default values; logical default values can represent the missing data without distorting the analysis or visualizations. First, in the ribbon at the top of the editor, select the Transform tab. You use this tab to access the tools and functions for modifying and transforming the data. Next, select the Replace Values button, then select Replace Values from the drop-down menu. You use this option to replace specific values in a column with a new value; in this case, you can replace all null or missing values. A Replace Values dialog box appears on screen. It has a text box labeled Value To Find, where you specify the value you want Power Query to identify and replace. The aim is to find missing or null values in the product subcategory column, so in the Value To Find box you can write null. Below the Value To Find box there's another text box, labeled Replace With; this is where you type the new value you want to replace the missing or null values with. The new value should be consistent with the column's data type, which is text, so let's replace the missing values in the product subcategory with the text value Trail, which represents the default category for trail bikes. Finally, select OK to confirm and make the change. When you select the OK button in the Replace Values dialog box, Power Query scans the sheet for the values you've instructed it to identify; it then replaces each instance of these values based on the criteria you specified in the Replace With box. You can review a history of all the data transformation operations you've applied to the data set by selecting the Applied Steps pane on the right-hand side of the Power Query Editor window. Adventure Works has fixed the null values in its data set, but there are still duplicate row errors present: the entries in rows 22 to 24 are duplicates of other records in the sheet, and identical records also exist in rows 25 to 27. Let's help Adventure Works resolve these errors. On the Home tab, access the data manipulation functions. From these functions, select the Remove Rows option; a drop-down menu appears. Select Remove Duplicates from the options. Power Query analyzes the data set and finds rows that have identical values in the selected columns; it then removes all but one instance of each group of duplicates. That's good progress, with just one final error left in the data set: inconsistent data types in the form of order dates. Let's fix this final error. The inconsistent data is in the Order Date column. Select the column header to select and apply changes to the entire column. Next, select the Transform tab to access the data modification options, select the Data Type button, then select the Date data type from the drop-down menu. This converts all values in the column to the selected data type, meaning all data types in the column are now consistent. Thanks to your help, Adventure Works has removed all errors from its data set; it can now perform data analysis without the risk of producing inaccurate results. You should now understand how to identify common errors in data sets, like missing or null values, duplicate rows, and inconsistent data types. You should also be able to resolve these issues using the tools available in Power Query. Identifying and resolving these errors is essential for making sure your analysis runs on accurate, reliable, and high-quality data.

You are a data analyst at Adventure Works, tasked with analyzing sales data across different product categories and regions using PowerBI. Understanding the importance of reshaping the data to uncover valuable insights, you know you'll need to transform the data. So far in your introduction to transforming data in PowerBI in this course, you've learned about Power Query, data types, columns, and preparing a data set. In this video, you'll gain further insight into PowerBI's powerful data transformation capabilities by discovering unpivoting and pivoting in Microsoft Power Query. Unpivot and pivot operations are data transformation techniques that you can use to reshape and restructure data in PowerBI.
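To ground the two operations, here is a plain-Python sketch of an unpivot (wide to long) and a count pivot (long to wide). The data and field names are illustrative; in Power Query both operations are performed through the editor UI rather than code:

```python
# Unpivot: turn year columns into (year, target) rows — wide to long.
wide = [
    {"month": "Jan", "2022": 100, "2023": 120},
    {"month": "Feb", "2022": 90,  "2023": 110},
]
long_rows = [
    {"month": r["month"], "year": y, "target": r[y]}
    for r in wide
    for y in ("2022", "2023")
]

# Pivot with a Count aggregate: one column per subcategory,
# counting how many products fall in each — long to wide.
products = [
    {"category": "Bikes", "subcategory": "Road"},
    {"category": "Bikes", "subcategory": "Road"},
    {"category": "Bikes", "subcategory": "Trail"},
]
pivoted = {}
for p in products:
    row = pivoted.setdefault(p["category"], {})
    row[p["subcategory"]] = row.get(p["subcategory"], 0) + 1
```

Note that the two operations are near inverses: unpivoting multiplies rows while reducing columns, and pivoting collapses rows into new columns, usually under an aggregate function such as count or sum.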
let’s explore each operation in turn the unpivot operation refers to the transformation of data from a wide format with multiple columns to a narrow format with fewer columns by reshaping the data structure it involves converting column headers into row values resulting in a more structured and standardized representation of the data the unpivot operation is useful in data analysis supporting data normalization by organizing data in a tabular format this facilitates analysis variable comparison and data aggregation and summary as related information is consolidated into a single column transforming data from a wide to a narrow structure can also enable data compatibility and integration with other systems or tools that require a narrow format for example in the case of the adventure works sales analysis you can perform the unpivot operation to convert the sales data which is organized in a wide format with separate columns for each region into a long format where the region specific data is stacked vertically in a single column this makes it easier to compare sales across different regions and gain a holistic view of the overall performance on the other hand the pivot operation refers to the transformation of data from a narrow format with fewer columns to a wide format with multiple columns by reorganizing the data structure it enables data analysts to convert rows into columns based on specific criteria or values this operation is often used to summarize and aggregate data create cross tabulations and represent data in a more structured easy to understand way for analysis and reporting to illustrate say you want to analyze the sales data based on different product categories as part of the Adventure Works sales analysis using PowerBI’s pivot functionality you can transform the rows containing individual product categories into separate columns this pivot operation enables you to present the sales data in a more concise and structured manner making it easier to 
identify trends top selling products and performance within each category you’ve been introduced to PowerBI’s unpivot and pivot operations to transform and structure your data as with other data transformation techniques reshaping the data can help your team gain deeper insights and support business success through datadriven strategies decisions and actions now let’s take a moment to work through a practical application of the unpivot and pivot operations to the Adventure Works sales data using Power Query in PowerBI desktop suppose Adventure Works uses two separate Excel files to assess their quarterly sales and product and category distributions the first Excel file contains the sales target data consisting of three columns month 2022 and 2023 within this file there are 12 rows representing each month and each row displays the target sales amount for the corresponding month and year to enhance the table structure for easier readability your manager asks you to perform an unpivot operation to create a table with columns for month year and target which will also increase the number of rows the second Excel file includes category and subcategory data showcasing the category and subcategory data as columns without the product names you are tasked with performing a pivot operation on this file to present the product count per category in a tabular format to address the tasks given to you by your manager you can start by downloading and importing the two Excel files into Power Query with each data source selected select the transform data option to open the Power Query editor where you can apply various transformations including the unpivoting and pivoting operations for the first Excel file containing the sales target data you need to perform an unpivot operation to unpivot the table columns select target query on the left menu highlight the 2022 and 2023 columns select the transform ribbon tab in Power Query and then select unpivot rename the attribute column to 
year and the value column to target amount you now have an unpivoted table where the columns are converted to rows to accomplish the second task and pivot the table columns in the Excel file with the product categories and subcategories select the product categories query on the left menu on the transform ribbon tab select pivot column then on the pivot column window that displays select the column subcategory from the values column list expand the advanced options and select the option count all from the aggregate value function list lastly select okay with the pivot column feature applied you change the way that the data is organized subcategory names are converted to columns and row count for each subcategory is added as a row value for each column in this video you explored unpivot and pivot operations in PowerBI and the application of both in practice by building your technical expertise and learning about effective data transformation techniques like unpivoting and pivoting you can maximize the potential of PowerBI to unlock valuable insights from business data ultimately contributing to growth and success of organizations like Adventure Works you’re making good progress in your journey to becoming a data analyst you’ve learned how to transform data by using Power Query and have worked on data sets now it’s time to learn how to combine different data sources so you can use it more effectively the capability to combine queries is valuable as it empowers you to combine and merge diverse tables or queries enhancing your data analysis capabilities in the next few minutes you will be introduced to why combining data may be necessary and how you can combine tables or queries adventure Works have recently acquired another bicycle business adventure Works CEO Jamie Lee has assigned a task to the sales department to ensure that sales data from this business is incorporated in the Adventure Works sales reports your manager Adio Quinn has tasked you with creating a 
PowerBI query that merges the data but before you start working on the data you first need to understand the reasons why it is important to combine data the first reason for combining data is that it allows you to consolidate information from various sources or tables into a single table this consolidation can provide a unified view of the data making it easier to analyze and gain insights the next reason why you would combine tables is to create relationships combining tables is crucial for establishing relationships between related data in PowerBI relationships between tables are used to create meaningful visualizations and enable interactive analysis by combining tables you can link data points across different tables based on common fields or keys combining tables also enables you to enrich your data by adding additional information for example you may have a table with client details and another table with product information by combining these tables you can create a comprehensive data set that includes both client and product details allowing for a more comprehensive analysis another reason to combine data is that it provides a broader scope for analysis by merging multiple tables you gain deeper insights by analyzing data from different angles and lastly combining tables helps simplify data management in PowerBI instead of working with multiple separate tables having a single consolidated table reduces complexity and makes it easier to handle data updates refreshes and maintenance tasks now that you understand the reasons why it is important to combine data let’s look at the ways to do it in PowerBI there are two ways to combine data append and merge when you append queries you are adding rows of one table or query to another table or query by adding multiple lists one below the other you will see an increase in the number of rows say for instance you have two separate classes class A and class B that need to take an exam together to do this you have to 
combine the 20 students in class A with the 20 students in class B resulting in a combined class list of 40 students on the other hand when merging queries you consolidate data from multiple tables into a single entity by leveraging a shared column between the tables for example data with specific content such as gender category and city is stored in different independent tables and referenced by main tables that require this information this allows you to use this information within a specific context enables easy data classification and ensures data integrity you will learn more about both of these operations over the coming lessons in this video you learned about data combination techniques and the reasons for using it combining data in PowerBI is essential for creating accurate comprehensive and interactive reports and visualizations it allows you to leverage the full potential of your data by consolidating relevant information from multiple sources establishing relationships and enabling more insightful analysis good job adventure Works has recently acquired an additional bicycle business your manager Adio Quinn tasked you with creating a PowerBI query that merges the current sales data of Adventure Works with the sales data from the newly acquired business and he needs the query by the end of the day but you do not panic you know that PowerBI can help you combine different tables and queries to consolidate information create relationships enrich data enhance analysis and simplify data management in the next few minutes you will learn why appending tables or queries may be required at the end of this video you will also be able to describe the operation of appending one table to another by now you know that there are two ways to combine data in PowerBI append and merge when merging queries you consolidate data from multiple tables into a single entity by leveraging a shared column between the tables you will learn more about merging in the coming lessons when 
you append queries or tables you add rows from one or more tables to another query or table in this video you will focus on append before I demonstrate how the append operation is done let me share a very important tip with you say your manager has asked you to list the Adventure Works products that have fewer than 100 units sold for the current year the products that have not been sold do not appear in the sales table so you have to identify them by subtracting the sold products from all the products as a result you have two data sets to be merged products with 100 or fewer sales and products that have never been sold if you only list the products with sales data of less than 100 you won’t include the products that haven’t been sold at all to overcome this problem you have to merge the products with total sales below 100 and the ones that haven’t been sold at all to present the complete picture back to the task audio set you before you append the adventure works sales.xlsx and the other sales.xlsx XLSX files you have to format the data of both files to ensure they have an equal number of columns and that the columns have the same names and data types if you don’t have an equal number of columns or different column names the extra columns will be added to the most right of the query by preserving their values in the originating query and setting null values for the matching new query in this example columns A and B are common columns in both data sets columns C and D are unique and added to the right of the merged list since the D column does not have any data in the first data set the row values will be null after the merge similarly in the second data set null values will be added for the previously non-existent C column this may be confusing so try to have an equal number of columns with the same column titles let’s explore how this is done to format tables select other sales query in the query pane at the left menu of the power query window rename the quantity 
column to order QTY name to product name and total to line total by selecting the column names once you have completed the reformatting process you can merge the queries on the Power Query Editor ribbon navigate to the home ribbon tab and select the append queries drop-down menu you can select append queries as new to create a new query or table from the appended output or select append queries to merge the rows from an existing table into another if you select append queries as new you will create a new master table this selection displays the append window where you can select the tables you want to combine from the available tables section and add them to the tables to append section when you select okay a master table is created that contains the sales data of both Adventure Works and the newly acquired company in this video you learned how to combine data by appending tables and queries by appending different sales data you can create a master sales table this will help you to consolidate and enrich data from multiple tables and queries and simplify data management combining or joining data from different sources is like putting puzzle pieces together to form a big picture the big picture can help you discover details you could have missed when examining the individual pieces in this video you will discover what a join is and explore the purpose of joining data and its importance in data analysis before we explore the power of joining data to unlock new perspectives you need to understand what a join is when you have data in two tables and the columns of those tables are exactly the same appending the data from one table to another is straightforward however to combine the data of two tables with different column structures you need to specify the method in which the two tables should be combined this is known as a join join is when you merge or combine data from different places to create a bigger and a more complete data set it helps you view all the 
information in one place like putting puzzle pieces together to understand the whole picture let’s look at an example your manager Adio Quinn has tasked you to list all products with their category names and indicate which category has the most products during your investigation you notice that category data is referenced to a table called categories it is also being used by the common columns named category key on closer inspection you notice the row with a category key of one has a category name of bikes and the row with a category key of two has a category name of accessories your conclusion is that any row with a value of one in the category key column has bikes as the product category one of the key usage areas of joins is merging two or more tables in this manner and matching related data by using the relationship joining data is essential for PowerBI data analysts because it enables you to combine information from different sources giving you a complete picture of the data joining data can help you validate data accuracy make informed decisions and perform advanced analysis joining data also empowers you to gain a holistic understanding uncover valuable insights and make data-driven conclusions overall join is a powerful technique that enhances your data analysis capabilities and allows you to unlock the full potential of your data in a previous video you learned that there are two ways to combine data in PowerBI append and merge in both merge and append operations the use of join is essential for combining tables effectively let’s explore merge with join in more detail when you merge queries you’re combining the data from multiple tables into one based on a column that is common between the tables merge with join allows you to match related data integrate data and explore relationships when you append queries you are adding rows of data to
another table or query append with join helps you to ensure consistency and allows you to expand your existing data set whether it’s a merge or append operation the use of join is essential for aligning integrating and combining data from different tables it ensures that the relevant information is properly matched and merged enabling you to analyze and understand the data in a meaningful way in this video you learned what a join is as well as the purpose of joining data and its importance in data analysis by now you are aware that combining data and using join keys can save you hours of searching through vast amounts of data for a specific product item but did you know that you can simplify your query even further by specifying how the data should be combined in this video you will learn about join types specifically the difference between left outer right outer full outer and inner joins a join type in Microsoft PowerBI refers to how tables of data are related to each other in the software the joins are important because they determine how data is consolidated from multiple sources into a single view understanding join types and their implications is crucial to building accurate efficient and meaningful data models in PowerBI over the next few minutes you’ll be introduced to four different join types left outer right outer full outer and inner join let’s explore each join type and the way it combines data from multiple tables based on matching criteria let’s say we have two tables one on the left for sales and one on the right for countries the sales table has three columns date country ID and units the countries table has two columns ID and country the sales table country ID column can be used as a join key with the ID column of the countries table now let’s explore each join type and how they combine data first let’s start with a left outer join if a left outer join is used all rows in the left table are kept and the matching rows from the right table are
merged in if the left table is missing columns that the right table has the columns are included as part of the merge it is important to note that if there is no match for a row between the tables default or null values will be used for columns where matching data is unavailable in this scenario the resulting table will have the columns from the left table date country ID and units along with a country name column since the right table did not have a country ID of four the country name is null a right outer join works similarly to the left outer join except that all rows in the right table are kept and the matching rows from the left table are merged in again if the right table is missing columns that the left table has the columns are included as part of the merge similarly if there is no match for a row between the tables default or null values will be used for columns where no matching data is available in our scenario the resulting table will have date country ID units and country name the full outer join is used when you want to retrieve all records from both tables regardless of whether they have matching values in the join condition in this scenario since the right table has an ID of four and the left table does not have a corresponding entry with a country ID of four a row is created with a country name for ID 4 and with null values in all other columns in the previous video what is a join you used full outer joins and appended with joins by matching related data for inner join only matching rows from both left and right tables are merged together this join type is helpful when you want to focus only on the sales that have corresponding data in another table and exclude any sales data that don’t match as a data analyst you often come across the requirement to combine data from different tables or data sets related to sales and product tables this is where merging operations specifically join types become crucial keep in mind that you should choose the 
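The sales and countries example can be reproduced with SQLite from Python’s standard library, since join kinds in Power Query mirror SQL join semantics; the dates, IDs, and country names are made up to match the video’s scenario (a sale with country ID 4 that has no matching country row), and note that RIGHT and FULL OUTER JOIN behave analogously but require SQLite 3.39 or newer, so only left outer and inner joins are run here:

```python
import sqlite3

# In-memory database with the two example tables from the video.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (date TEXT, country_id INTEGER, units INTEGER);
    CREATE TABLE countries (id INTEGER, country TEXT);
    INSERT INTO sales VALUES ('2024-01-01', 1, 10),
                             ('2024-01-02', 2, 5),
                             ('2024-01-03', 4, 7);
    INSERT INTO countries VALUES (1, 'USA'), (2, 'Canada'), (3, 'Panama');
""")

# Left outer join: every sales row is kept; country is NULL (None) for
# country_id 4, which has no match in the countries table.
left_join = conn.execute("""
    SELECT s.date, s.country_id, s.units, c.country
    FROM sales s LEFT OUTER JOIN countries c ON s.country_id = c.id
""").fetchall()

# Inner join: only rows with a match on both sides survive, so the
# country_id 4 sale drops out of the result.
inner_join = conn.execute("""
    SELECT s.date, s.country_id, s.units, c.country
    FROM sales s INNER JOIN countries c ON s.country_id = c.id
""").fetchall()
```

Running both queries against the same data makes the inclusiveness trade-off concrete: the left outer join returns three rows (one with a null country), while the inner join returns only the two fully matched rows.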
combination type based on the specific needs of the analysis the choice of join type will impact the inclusiveness of the data in your analysis it’s important to consider your analysis objectives and the specific requirements of your project each join type serves a different purpose and selecting the appropriate one ensures that you obtain the desired result set for your analysis of order and order details data as you start working with more and more data sources keeping all the different data in different tables will quickly become unmanageable identifying similar and related data that can be merged is an important skill for a data analyst over the next few minutes you will learn how to identify and merge tables using joins in PowerBI in relational data fields such as category or status are often kept in a separate table for instance when a new product is added the category information is associated with an entry in a different table instead of being manually repeated in multiple rows in the product table as you have previously learned data from two different tables can be linked by join keys this works for tables from individual and multiple data sources however sometimes you’ll be working with a single data source such as a database where these relationships are already established in these scenarios merging the data using a join is a straightforward operation a column in one table will act as a key to the column of another table in databases this is known as a foreign key relationship and the foreign key is used as the join key manually repeating this information instead of using keys is almost impossible for databases that have a large number of products for example an e-commerce business selling books or Adventure Works which sells a large number of product variants selecting from defined categories or any other parametric data ensures easy classification of data and enables us to work within a consistent and comprehensive data set consider a scenario where you are working
in the sales department of Adventure Works a multinational bicycle store and you have been given a task by your manager Adio Quinn to consolidate orders and their corresponding details currently in two tables into a single table there is a typical foreign key relationship between the order and order details tables which is order ID adventure Works provides the following details to deal with situations such as this the orders table is created to store information such as the name of the store the date of the purchase the cashier’s name and so forth since there can be multiple individual products associated with a single order Adventure Works database has created a separate but related table to store these variable numbers of associated product purchases it allows you to add new products to your current purchase by opening as many rows as needed in this way you’ll develop a structure that is dynamic and flexible saving space and time by only storing the necessary information to truly understand the join operation or in PowerBI terms the combine with merge operation it is important to first understand the relationship between tables the merging operation arises from the need to keep related data in separate tables avoid forcibly distributing data that can be stored in a single table into separate tables and visualize relationships such as product category transaction status and person city where the definition table and its rows need to be separated in the order example the order details can connect unique data with repeating data in a more efficient manner now you can complete your task to combine the two tables orders and order details with merge go to home on the power query editor ribbon and select combine then merge queries drop-down menu and select merge queries as new this selection opens a new window where you can select the tables that you want to merge from the drop-down list next select the column that matches between the tables which in this case is order ID select left outer join
in the join kind drop-down which displays all rows from the first table and only the matching rows from the second after you select okay you are directed to a new window where you can view your new merged query now let’s take a look at doing this in more detail in Microsoft PowerBI in this scenario you are working in the sales department of Adventure Works which is a multinational bicycle manufacturer and you have been given a task by your manager Adio Quinn to consolidate orders and their corresponding details which are currently in two tables into a single table in PowerBI you select the Excel workbook option in the data group of the home tab select order.xlsx and order details.xlsx there is a typical foreign key relationship between the orders and order details tables let’s try to understand this with an example from our own social life we have all probably shopped at a market at least a few times at the end of the shopping we go to the cashier scan our items make the payment and receive a receipt the receipt contains information such as the name of the store the date of the purchase the cashier’s name and various other details at the bottom of the receipt there is a section that lists the quantity unit price and total amount for each item purchased followed by a grand total or the amount paid now let’s explore how we can structure these commonly encountered pieces of information into a table format adventure Works provides the following details to deal with these situations the order table is created to store information such as the name of the store the date of the purchase and other details found on the receipt in our earlier market scenario since there can be multiple individual products associated with a single order Adventure Works database has created a separate but related table to store these variable numbers of associated product purchases it allows you to add new products to your current purchase by opening as many rows as needed in this way you
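The left outer merge on order ID can be sketched in plain Python as an analogy to what Power Query does under the hood; the column names and sample rows below are hypothetical, invented for illustration rather than taken from the Order.xlsx and Order Details.xlsx workbooks:

```python
# A minimal sketch of merge with a left outer join on a foreign key.
def left_outer_merge(left_rows, right_rows, key):
    """Keep every left row; attach each matching right row, or nulls
    (None) for the right-side columns when there is no match."""
    right_cols = {col for row in right_rows for col in row if col != key}
    merged = []
    for left in left_rows:
        matches = [r for r in right_rows if r[key] == left[key]]
        if not matches:
            matches = [{col: None for col in right_cols}]
        for right in matches:
            merged.append({**left, **{c: right.get(c) for c in right_cols}})
    return merged

orders = [
    {"order_id": 1, "store": "Seattle"},
    {"order_id": 2, "store": "Berlin"},
]
details = [
    {"order_id": 1, "product": "Mountain Bike", "qty": 1},
    {"order_id": 1, "product": "Helmet", "qty": 2},
]

merged = left_outer_merge(orders, details, "order_id")
# Order 1 expands into two rows (one per detail line), and order 2 is
# kept with null product and qty, because a left outer join keeps all
# rows from the first table.
```

Notice how the one-to-many foreign key relationship described above shows up directly in the output: a single order row fans out into one merged row per detail line.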
develop a structure that is dynamic and flexible saving space and time by only storing the necessary information to truly understand the join operation or in PowerBI terms the combine with merge operation it is important to first understand the relationship between tables the merging operation arises from the need to keep related data in separate tables avoid forcibly distributing data that can be stored in a single table into separate tables and visualize relationships such as product category transaction status and person city where the definition table and its rows need to be separated now in the example of orders and order details you have connected unique data with repeating data in a more efficient manner now you complete your task to combine the two tables orders and order details with merge go to home on the power query editor ribbon and select combine then the merge queries drop-down menu where you can select merge queries as new this selection will open a new window where you can choose the tables that you want to merge from the drop-down list and then select the column that is matching between the tables which in this case is order ID you will choose to use a left outer join in the join kind dropdown which displays all rows from the first table and only the matching rows from the second after you click okay you will be routed to a new window where you can view your new merged query and that concludes how to combine tables with merge in PowerBI in this video you learned how to combine data by merging tables and queries it can help you to consolidate information from multiple tables and queries by using related fields with foreign keys good job adventure Works is looking to expand its business by identifying new product lines that it can market to its customers it hopes that the results of data analysis will identify potential new product lines meet Daniel he’s a talented data analyst with Adventure Works their in-house expert on configuring and
transforming data in PowerBI including merging data in Power Query adventure Works has noticed that a lot of customers have been returning bicycles to their stores for repair and maintenance these are often very simple repair and maintenance tasks like replacing tires or tightening loose bolts and screws the company suggests that Daniel analyzes the customer and sales data related to these transactions perhaps these customers might be willing to purchase a service plan for their bicycles first Daniel identifies the relevant data sources he begins with an Excel sheet named sales data this worksheet contains data on each bicycle Adventure Works has recently sold including the categories they belong to a description of each bike the prices they sold for and the staff who sold them the worksheet also includes data on the repairs carried out on each bike like the names of the parts that were replaced there are other relevant data sets available on a sheet named customer data this worksheet provides information on all customers including their names contact details age the bikes they have purchased and the repairs they have requested daniel uploads these data sources to PowerBI where he configures them for data analysis by transforming the data sets in Power Query once the data has been configured and transformed Daniel then uses joins to merge these worksheets together to identify what kind of bicycles customers are buying which customers are sending their bicycles to the store for repair and what kind of repairs are required he uses the results of his analysis to segment customers into profiles that focus on data such as age groups location and purchases he then identifies related search engine queries for individuals who match these profiles through combining and analyzing this data Daniel discovers that many of the customers seeking repairs are adults between the ages of 18 and 35 who live in rural areas this demographic mostly purchases mountain bikes which they use 
for weekend biking excursions he presents his data insights to Adventure Works the company realizes that it can offer these customers a service plan or bicycle health check in addition existing store staff can carry out these repairs so no new staff are needed to deliver this product it also helps the business to retain existing customers and generate a new revenue stream from them this scenario emphasizes the importance of combining or merging data sources in Microsoft PowerBI by combining data sets you can deliver new insights on topics in the case of Adventure Works Daniel was able to create a customer profile and identify the needs of that profile adventure Works then provided a new product to this customer profile when it comes to generating data insights the benefits of merging data sources can’t be overstated the more data you have on your topic the greater an understanding you can develop and all of this can be achieved with Microsoft PowerBI and a strong data analytics skill set congratulations on reaching the end of the third week in this course on extracting transforming and loading data in PowerBI you’ve now reached the end of this module let’s take a few minutes to recap what you’ve learned you began this module by exploring the process of transforming data in PowerBI you first examined why data needs to be transformed you learned that raw data is not always gathered or sourced in a condition that’s suitable to work with it might be incomplete inconsistent or have other errors so it’s important that you transform and clean your data you can clean data by setting up filters in PowerBI that identify and resolve errors this way the filtered data is accurate consistent structured and easier to analyze you then reviewed Power Query and its interface you learned how to navigate this interface and locate useful tools and features for connecting cleaning and transforming data from a wide range of sources and you explored the steps for these actions by helping
Adventure Works connect to its data sources and then clean and transform the data they contained an important part of this cleaning process includes the applied steps list an editable list of all transformations applied to a selected query you can use this list to undo and reorder steps in the process next you explored the different data types in PowerBI the data types you explored included number types date and time types text true or false and binary you learned that these different data types are used to classify values to help you better organize and structure your data sets you also learned that when working with data sets you might need to remove and rename columns you were presented with many of the benefits of reworking columns like more efficient and readable data enhanced analysis and significant time and resource savings you continued to explore Power Query by reviewing steps for dealing with common errors power Query can fix errors like null values duplicate rows and inconsistent data types it’s important to resolve these errors before analyzing your data in Power Query you then made use of your new knowledge by helping Adventure Works to prepare a data set by cleaning the data and resolving its errors you then undertook a knowledge check in this item you proved your understanding of the concepts you encountered by answering a series of questions finally you explored a list of additional resources designed to help you improve your knowledge of the topics that you covered this week in the second week of this module you explored advanced data transformation methods in PowerBI you began this week by learning about the importance of data combination combining data lets you create relationships between tables improve data analysis and simplify data management you then reviewed the two main methods for combining data in PowerBI which are append and merge append means to add the rows of one table or query to another merge means consolidating data from multiple
data sources into a single table and you examined the process for combining tables with append in power query editor you then put your new skills to use by assisting Adventure Works with appending tables in their database next you completed a knowledge check which tested your understanding of these concepts through a series of questions and you were presented with a list of additional resources that you could review to learn more about advanced data transformation in week three you learned about methods for combining data that you could use for data transformation you discovered that one method of combining data is to use a join a join is a useful way of combining data from different sources you also learned that join keys are the values used to link rows between tables and that there are different types of joins these different types include the left outer join right outer join full outer join and inner join which of these join types you choose to use depends on your data transformation needs you then looked at how to combine tables using a merge operation in Power Query Editor by identifying the relevant keys and required join operations you can merge two or more tables to deliver new insights into your data next you demonstrated your competence with these new skills by helping Adventure Works to merge two of their data sources to deliver new insights into their business finally you undertook a knowledge check which tested your understanding of the concepts that you encountered this week and you completed a module quiz in which you demonstrated your understanding of all concepts you encountered throughout the entire module you’ve learned a lot about transforming data in PowerBI and as you approach the next module consider going through some of the learning material again to reinforce your understanding looking ahead you will expand your knowledge of the ETL process by diving into advanced ETL in PowerBI where you will learn all about loading and
profiling data and advanced queries best of luck you have gained detailed knowledge about the extract and transform steps in the ETL process so far and you have applied this knowledge by considering scenarios and tasks in this video you will learn about the final step of the ETL process load the load operation in summary enables the transformed data obtained by reading from a data source to become available for reporting purposes considering that the ultimate goal of PowerBI is to provide data visualization through reports and dashboards the importance of making the data available for this purpose becomes evident up until the load stage you have completed tasks such as accessing data sources establishing connections extracting data and performing transform operations the purpose of all these operations was to bring meaningful and cohesive data into the reporting interface filtered based on specific criteria the load process ensures the visualization of all the extracted and transformed data there are two main ways to load data in the PowerBI user interface load and transform data let’s look at each option a bit closer starting with load with the load option data is loaded directly into the data pane in PowerBI if you choose to load data directly you can still transform the data at a later stage the second option transform data allows you to transform the data before loading it the changes to the data are applied to the data model and the data pane is refreshed in PowerBI visualizations can now use the applied changes whether you choose to load the data directly with the load option or transform the data before loading with the transform data option loading time can vary depending on the size of your data set optimizing performance and reflecting updated data from the source in reporting are of great importance in the data loading process in the upcoming sections you will gain detailed information about these topics in some cases you might have some source tables 
which are used during the ETL process that will not be used directly in the reporting area and some of these tables may not meet the production demands of your data warehouse in such cases you will need an intermediate state between the data source and the data warehouse called the data staging area a staging area serves as an intermediate storage location for raw or unprocessed data allowing it to be temporarily stored and prepared for further processing in a data pipeline the existence of a data staging area is not obligatory for your ETL jobs so you can execute ETL jobs without creating staging areas however it is recommended to simplify the process of data cleansing and consolidating data coming from multiple sources by now you know that the data loading process is the final step of the ETL operation and that it is the most crucial step for making the data available in the reporting environment to achieve this the data is loaded into Power Query either directly from the data source or after performing transformation operations additionally a staging area is often used as an intermediate step to store the data in a more organized manner aiming to facilitate maintenance and management tasks by completing the load stage you are now ready to explore the data create compelling visualizations and gain valuable insights to support decision-making for your organization data staging is one of the key concepts in data loading over the next few minutes you will learn the basics of data staging the reasons for its necessity and the advantages of using it in the overall ETL processes to better understand the concept of staging let’s use an everyday life example imagine you’ve invited friends over for dinner and you’ve bought ingredients from the grocery store to prepare the meal however you don’t serve the ingredients as they are you might marinate the meat in a pot cut the vegetables and place them in a bowl for washing and prepare other dishes like making a salad or 
putting appetizers on a plate in this example all the ingredients represent raw data while the processes of marinating washing cutting and waiting correspond to ETL operations the pots bowls and other utensils used before serving can be thought of as the staging area now let’s apply this everyday life example to data staging a staging area serves as an intermediate storage location for raw or unprocessed data allowing it to be temporarily stored and prepared for further processing the staging area typically acts as a bridge between the data sources and the data warehouse a staging area simplifies the process of data cleansing and consolidation of operational data originating from multiple source systems particularly for enterprise data warehouses that centralize an organization’s critical data remember a data staging area is not required for your ETL jobs you can still execute ETL jobs without creating one however based on your need to consolidate data coming from multiple sources it is recommended over at Adventure Works the company receives feedback about its products from various channels such as social media platforms and corporate websites your manager Adio Quinn has tasked you to prepare a data set by using these resources to consolidate and to prepare the data for use in reports and dashboards none of the feedback can be used in its raw form as they have different formats you must transform the data and then consolidate it in a unified list since you will only use this data in the ETL process it is appropriate to use a staging area let’s take a few moments to complete this task using Power Query the first step is to import the two data sets Adventure Works social media feedbacks one and Adventure Works Social Media Feedbacks 2 to transform and consolidate in the staging area to do this navigate to the home ribbon tab at the top of the PowerBI window select the Excel workbook button inside the data group in the middle of the toolbar select your data sets and 
select open then select your data sets and select transform data in the window that opens now you have two queries Adventure Works social media feedbacks one and Adventure Works social media feedbacks 2 in the queries pane at the left menu of Power Query to successfully complete your task you have to consolidate these two queries into a single query and add an extra column to indicate where the feedback came from to do this you have to use these queries and integrate the data into a more defined and optimized model which is where a staging area comes in as you have to consolidate these two tables into one but also keep them separately you have to create a new group called the staging area in the queries pane at the left menu of power query select new group type staging area in the name text box and select okay now move both the data sets adventure works social media feedbacks one and adventure works social media feedbacks 2 to the staging area group your tables are now organized according to your need select the Adventure Works Social Media Feedbacks one and Adventure Works Social Media Feedbacks 2 tables respectively and disable the load by clearing the checkbox enable load you will keep the include in report refresh option this way both tables will still be used in queries but will not be part of the data model you are now familiar with the concept of a staging area and how it is implemented in PowerBI imagine you have just started working at Adventure Works as a data analyst you have a lot of data to analyze to determine which products are preferred by which client and why to perform successful analysis on these many items it is necessary to have data that includes fields suitable for analysis with an adequate amount of data and a variety of data ranges representing the overall data over the next few minutes you will be introduced to data profiling and statistical analysis and why it is important when reviewing data sets by the end of this video you will have been
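The consolidation that the staging task aims for can be sketched in plain Python as an analogy (in Power Query you would add a custom Source column to each staged query and then append them); the field names and sample feedback rows below are hypothetical, not from the actual workbooks:

```python
# A rough sketch of consolidating staged queries into one unified list,
# tagging each row with the name of the staged query it came from.

def consolidate(staged_queries):
    """Union staged queries into a single row list, adding a source
    column so the origin of each feedback row is preserved."""
    unified = []
    for source_name, rows in staged_queries.items():
        for row in rows:
            unified.append({**row, "source": source_name})
    return unified

# Two staged queries, already transformed to share the same columns.
staged = {
    "Social Media Feedbacks 1": [{"date": "2024-05-01", "feedback": "Great bike"}],
    "Social Media Feedbacks 2": [{"date": "2024-05-02", "feedback": "Late delivery"}],
}

feedback = consolidate(staged)
```

Only the consolidated result would be loaded to the data model; the staged inputs, like the queries with enable load cleared, exist purely as intermediate steps.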
introduced to a high-level understanding of data profiling and statistical analysis when reviewing data sets you will also learn about the distribution anomalies and outliers in the context of data profiling let’s first cover an introduction to data profiling before analyzing any data set it is important to examine and evaluate the data you are working with analyzing the data without evaluating its accuracy completeness and alignment with your objectives can lead to misleading results when examining a data set for the first time there are several aspects you should look at especially for numerical fields you should check these characteristics for each numerical field minimum or min maximum or max average or mean frequently occurring values or mode and standard deviation the best way to start assessing data is with values you can immediately troubleshoot imagine you are reviewing a data set that has an age field for instance there could be someone in the data set with an age of 200 which would be extremely unlikely to be true if so there may be an outlier in the data look at the minimum and maximum values if they fall between 21 and 77 these are realistic ages unlike 200 the concept of distribution of data refers to how the data points are spread or arranged within a data set it describes the pattern or shape of the data when plotted on a graph understanding the distribution of data is crucial in data analysis because it helps you gain insights into the central tendency variability and overall characteristics of the data next let’s consider outliers the formal definition of an outlier in statistics is a data point that significantly deviates from other observations outlier data can be handled by applying a technique called min max scaling or normalization the aim is to adjust the mean and standard deviation of the data proportionally while preserving the ratio of the distance between outlier data and other data points analyzing the distribution allows you to make
informed decisions identify outliers and choose appropriate statistical techniques for further analysis there are situations where there may be values in the data set that skew the average for example most records may be close in age but let’s say there are three individuals aged 80 and above if you solely rely on the average to evaluate the distribution these outliers can mislead you by increasing the average in this case it would be appropriate to examine the distribution more closely when taking a closer look at the data you may find that the distribution is normal but the three records mentioned in the example are outliers next let’s look at standard deviation standard deviation is a statistical measure that quantifies the amount of variation or dispersion in a data set it provides a way to understand how individual data points differ from the mean or average of the data set the main objective here is to prevent outliers from causing deviations in your analysis results minimizing their impact finally let’s return to the point of distribution of data the balanced distribution of the data points other than the outliers is another factor that affects data quality and your analysis results it is important for descriptive variables such as age gender income status occupation city and neighborhood to represent as many diverse groups as possible and be evenly distributed if not a cluster of records that closely resemble each other will lead to narrow intervals when defining norms which will mislead your analysis profiling and statistically analyzing data including examining its distribution min max mean and mode values detecting outliers if any normalizing outliers and ensuring that the data represents the entirety of the data set are the key elements that demonstrate data quality considering these factors will enhance the accuracy and quality of analysis and predictions made with this data by now you should have a good understanding of the concepts of
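The profile statistics discussed here (min, max, mean, mode, standard deviation) and min-max scaling can be computed with Python’s standard statistics module as a small illustration; the ages list is made up, including a deliberately implausible value of 200 to play the role of the outlier:

```python
import statistics

# Hypothetical age field from a survey, with one implausible outlier.
ages = [21, 25, 34, 34, 42, 55, 77, 200]

# The five characteristics to check for each numerical field.
profile = {
    "min": min(ages),
    "max": max(ages),
    "mean": statistics.mean(ages),
    "mode": statistics.mode(ages),   # most frequently occurring value
    "stdev": statistics.stdev(ages), # sample standard deviation
}

# Min-max scaling squeezes every value into the 0-1 range while
# preserving the relative distances between data points.
lo, hi = min(ages), max(ages)
scaled = [(a - lo) / (hi - lo) for a in ages]
```

Note how a single value of 200 drags the mean up to 61 even though most ages sit in the 20s-50s, which is exactly the distortion the transcript warns about when relying on the average alone.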
In this video, you will learn about data profiling and statistical analysis and how to use them in Power BI, as well as how to use profiling tools to inspect data.

Adventure Works recently conducted a field survey to increase sales and collected potential customer data. This resulted in an Excel file containing information such as the age, gender, occupation, income level, address, and phone number of prospective customers. Since the survey data was collected manually, it was not subjected to any validation. Therefore, before analyzing the data, it is necessary to confirm that the data is valid, within the desired ranges and quantities, and exhibits a good distribution.

Before starting analysis on any data set, it is important to examine the data across aspects such as completeness, accuracy, uniqueness, and consistency. Data profiling enables the identification of potential issues and anomalies within the data set. This proactive approach allows you to make informed decisions about data cleaning, transformation, and enrichment, ultimately leading to improved data quality. Additionally, data profiling facilitates effective data exploration and visualization by providing insights into data patterns, relationships, and trends. It empowers users to discover hidden insights, uncover data inconsistencies, and make data-driven decisions with confidence.

Before delving into data profiling tools, let's first consider two important terms in data profiling: unique and distinct. In Power BI, unique is the total number of values that appear only once; distinct is the total number of different values, regardless of how many of each you have.

Microsoft Power BI offers two profiling tools in the Power Query Editor: column quality and column distribution. Let's begin with column quality. Column quality focuses on the valid, error, and empty rows in each column, allowing you to validate your row values. The column quality feature labels row values in five categories: valid, shown in green; error, shown in red; empty, shown in dark gray; unknown, shown in dashed green, which indicates that when there are errors in a column, the quality of the remaining data is unknown; and unexpected error, shown in dashed red. These indicators are displayed directly underneath the name of the column as part of a small bar chart, and the number of records in each column quality category is also displayed as a percentage. By hovering over any of the columns, you are presented with a numerical breakdown of the quality of values throughout the column. Additionally, selecting the ellipsis button opens quick-action buttons for operations on the values.

Column distribution provides a set of visuals underneath the column names that showcase the frequency and distribution of the values in each column. The data in these visualizations is sorted in descending order, starting from the value with the highest frequency. By hovering over the distribution data in any of the columns, you get information about the overall data in the column, with distinct and unique counts. You can also select the ellipsis button and choose from a menu of available operations.

Let's consider column distribution specifically in relation to distinct and unique amounts. Imagine that you have a selection of bike accessories supplied by four different suppliers: supplier A, supplier B, supplier C, and supplier D. In this case, there are four distinct suppliers. Now imagine you have two bikes, each with a supplier unique to any other bikes you currently stock; these would be considered two unique suppliers.

Another type of profiling in Power BI is column profile. Column profile provides column statistics such as minimum, maximum, average, frequently occurring values, and standard deviation, in addition to the value distribution of the selected column. This is very important when assessing data to detect anomalies and outliers.

Now that you've covered the basics of data profiling tools, let's apply this in Power BI and inspect the data from the Adventure Works survey described above. Navigate to Home at the top of the Power BI window, select Excel workbook inside the Data group in the middle of the tab, select potential customers.xlsx, and select Transform data. In the window that opens, check the Column quality check box inside the Data preview group of the View tab to assess column quality: in the age column, 89% of the values are valid, 0% of the values are error, and 11% of the values are empty rows. To assess column distribution for the occupation column, on the View tab, inside the Data preview group, check Column distribution. Note that there are nine distinct values and two unique values; computer programmer and accountant are the occupations that each appear only once. For any column, note that if all the row values are distinct, then the unique and distinct amounts will be equal; for example, you can see that there are 19 distinct and 19 unique values for the surname column. Select the age column and then check the Column profile checkbox. Note that the maximum value for the age column is 132, which is not acceptable. Examine the minimum, maximum, average, and other column statistics, and review the value distribution chart.

In this video, you learned how to profile data by assessing column quality, distribution, and profile. Data profiling in Power BI offers several advantages in the process of data analysis: it helps you gain a comprehensive understanding of the data's quality, structure, and distribution.
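The distinct-versus-unique distinction and the column-quality percentages are easy to reproduce outside Power BI. The sketch below, using a made-up occupation column and only the Python standard library, mirrors what the profiling tools report.

```python
from collections import Counter

# Hypothetical "occupation" column; None represents an empty cell.
occupations = ["nurse", "teacher", "nurse", "accountant",
               "computer programmer", "teacher", None]

values = [v for v in occupations if v is not None]
counts = Counter(values)

distinct = len(counts)                              # number of different values
unique = sum(1 for c in counts.values() if c == 1)  # values appearing exactly once
print(distinct, unique)  # 4 distinct, 2 unique

# Column quality: the share of empty rows, like the percentages shown
# under each column header in the Power Query Editor.
empty_pct = 100 * occupations.count(None) / len(occupations)
print(round(empty_pct, 1))
```

If every value appeared exactly once, `distinct` and `unique` would be equal, which matches the surname-column observation above.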
With its ability to assess data quality and provide valuable insights, data profiling in Power BI plays a crucial role in enhancing data reliability, accuracy, and overall analytical outcomes.

In the world of technology, even the most meticulously designed software can harbor hidden bugs waiting to unleash chaos upon unsuspecting users. Imagine a scenario where a simple bug managed to infiltrate a company's database, threatening to compromise the accuracy of critical reports and potentially send shock waves through senior management. However, thanks to the power of data profiling with the aid of Power BI, disaster was averted and the company emerged victorious. Buckle up as we take you on a thrilling journey through the realm of software mishaps, triumphs, and the heroes who saved the day.

It all began innocently enough. Deep within the complex code of a company's flagship software, a tiny bug had nestled its way into the system. This bug had an uncanny ability to transform innocent data into deceptive monsters, causing havoc when they were unleashed into the wild. The bug was sly and patient, biding its time until the perfect moment to strike. As the software went about its daily operations, the bug began silently distorting the data it touched. Unbeknownst to the users, inaccuracies were creeping into the system, lurking beneath the surface. Reports that were once reliable became unreliable, leading to questionable decisions and raised eyebrows among senior management.

Fortunately, the company had an ace up its sleeve: a team of brilliant data profilers armed with the mighty Power BI. With its robust data profiling capabilities, Power BI became the ultimate weapon against the deceptive bug and its corrupted data. The team rallied together, ready to use Power BI's analytical prowess and visualizations to uncover the truth hidden within the tainted database. Armed with Power BI, the heroic team embarked on a quest to hunt down and eradicate the corrupted data. They connected Power BI to the company's database, leveraging its intuitive interface and advanced algorithms to identify the anomalies lurking within the system. Power BI's data profiling features allowed the team to analyze and scrutinize every nook and cranny of the company's data, unearthing the bug's footprints one by one.

After days of tireless work, the data profilers, empowered by Power BI, emerged triumphant. They successfully identified and isolated the distorted data, ensuring its exclusion from future reports. Power BI's rich visualizations and interactive dashboards enabled the team to present their findings to senior management in a clear and concise manner, further solidifying their victory. As the dust settled, the company took a moment to reflect on the incident. They recognized the transformative power of Power BI's data profiling capabilities and the critical role it played in safeguarding their data integrity. The bug had served as a wake-up call, reminding them of the importance of incorporating robust data profiling tools like Power BI into their systems, helping them catch potential issues before they cascade into crises.

In this thrilling tale of software mishaps and heroic data profilers, we've witnessed how a simple bug had the potential to plunge a company into chaos. However, thanks to the power of data profiling with the aid of Power BI, accuracy was restored. The diligent efforts of the data profiling team did not go unnoticed: senior management praised them for their exceptional work and dedication in resolving the crisis. The successful outcome served as a reminder of the invaluable role data profiling plays in maintaining the integrity of systems, and it showcased the power of collaboration, expertise, and the remarkable capabilities of tools like Power BI in conquering challenges and emerging triumphant.

As a data analyst at Adventure Works, your team is responsible for analyzing vast amounts of data to gain insights into customer behavior and improve business operations.
Microsoft Power Query is an essential tool in the data analysis workflow, enabling you to transform and integrate data from various sources. You rely heavily on Microsoft Power BI for your daily tasks, preparing reports for business units by connecting to data sources and performing extract, transform, and load operations. Since Adventure Works strives for optimal efficiency and results, your manager, Adio Quinn, has assigned you the task of researching best practices for specific configurations, performance preferences, security, and other related topics, to ensure the most effective use of Power BI in your work. Over the next few minutes, you'll be introduced to best practices for working with data sources in Power BI, and you'll also learn why these practices are important to implement. Let's start by exploring how you and your team can apply best practices to enhance your Power Query workflows and improve data quality and analysis.

Your first step is to plan and document your data transformation requirements. You define the desired output, identify the relevant data sources, and outline the transformations needed. You also ensure that data source credentials are properly documented and securely stored. By maintaining an organized and consistent approach, your team can streamline the Power Query process and avoid confusion.

Next, you carefully select the appropriate connector for your data sources. You consider factors such as the type and location of the data source, the volume of data, and the available connectivity options. With Power BI's wide range of connectors, you can seamlessly connect to databases, cloud services, files, and APIs. It is important to evaluate the performance capabilities and scalability of the connectors to ensure optimal performance for your data requirements.

Considering the performance and optimization of your data transformations and calculations, your team follows the principle of "do expensive operations last". You prioritize and schedule resource-intensive operations towards the end of the data transformation process. This approach ensures that complex calculations, merging large data sets, and applying multiple transformations to a significant number of rows are executed efficiently, leading to faster data loading and more responsive reports.

Your team also pays attention to data type selection for columns, aiming to improve performance and data accuracy. You review and adjust the inferred data types manually, preventing incorrect data interpretations and reducing memory consumption.

Data profiling plays a crucial role in your team's data analysis process. You leverage Power BI's data profiling capabilities to gain a comprehensive understanding of data quality, structure, and distribution. By examining aspects such as completeness, accuracy, uniqueness, and consistency, you identify potential issues and anomalies within the data set. This proactive approach enables you to make informed decisions about data cleaning, transformation, and enrichment, ultimately improving data quality.

To ensure smooth data processing, your team implements error handling techniques such as conditional logic and custom error messages. You also incorporate data validation checks to identify and handle unexpected data inconsistencies effectively.

The next best practice is to consider your merge strategy. When merging or joining multiple queries, you choose the most efficient merge strategy, selecting inner joins whenever applicable, and you remove redundant fields to avoid unnecessary duplicate columns in the resulting merged query.

To maintain an organized work environment, your team uses groups as containers for your queries. You create nested groups when needed and easily move queries between groups by dragging and dropping them. Regularly reviewing and removing unnecessary steps in the Power Query Editor is another practice you follow: removing unused or redundant transformations helps improve processing time and simplifies query maintenance.

Monitoring the performance of your Power Query workflows is an ongoing task for your team.
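As a loose illustration of the "inner join, then drop redundant fields" advice, here is a plain-Python sketch of an inner merge between two small, made-up tables. This stands in for Power Query's Merge Queries step conceptually; it is not how Power Query implements it.

```python
# Hypothetical tables: products and suppliers keyed by supplier_id.
products = [
    {"product": "Road Bike", "supplier_id": 1},
    {"product": "Helmet", "supplier_id": 2},
    {"product": "Gloves", "supplier_id": 9},   # no matching supplier
]
suppliers = {1: "Supplier A", 2: "Supplier B"}

# Inner join: keep only rows with a match on the key, and carry over
# just the fields needed, dropping the now-redundant supplier_id.
merged = [
    {"product": p["product"], "supplier": suppliers[p["supplier_id"]]}
    for p in products
    if p["supplier_id"] in suppliers
]
print(merged)  # the unmatched "Gloves" row is excluded
```

An inner join discards unmatched rows up front, which is why it is usually the cheapest option when only matched records matter.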
You evaluate the refresh speed, resource consumption, and overall efficiency, and by fine-tuning query settings such as parallel loading or data load options, you optimize performance based on your specific requirements.

Following these best practices when working with Power Query will enable you to effectively shape and transform your data while maintaining data integrity, improving performance, and streamlining your workflows. Remember: consistent documentation, efficient data filtering, error handling, and optimization techniques are key to achieving reliable and efficient data transformations with Power Query. Embrace these practices, adapt them to your specific requirements, and continue exploring new features and capabilities to become a Power Query expert.

In the world of Microsoft Power BI, data is the foundation of meaningful insights and informed decision-making. However, managing and preparing data for analysis can be a complex and time-consuming process. This is where data flows can help. In this video, you will explore what data flows are and why they are used in Power BI. You'll learn the subscription level required to use them and engage with a fictional scenario showcasing their application, along with the advantages and limitations they offer.

Adventure Works is a company operating in multiple regions, each with its own set of data sources and reporting requirements. To manage these multiple data sources, Adventure Works wants to use the Power BI data flows feature. Data flows allow you to connect to data sources, perform data transformations, and create business logic to build data entities that can be shared across different reports and dashboards. They can also be published to the Power BI service and used in shared reports and dashboards. Data flows simplify the process of data preparation, allowing users to cleanse, transform, and shape their data with ease. You can apply business rules, clean untidy data, and create calculated columns through Microsoft Power Query, a powerful data transformation tool within Power BI. Data flows offer a visual interface for building data transformation logic, making them accessible to users without coding skills.

You can use data flows in Microsoft Power BI Desktop and the Microsoft Power BI service. In Power BI Desktop, you can create and manage data flows using the Power Query Editor. This allows you to connect to various data sources, perform transformations, and define the structure of your data entities. You can then publish these data flows to the Power BI service for further use. Once published to the Power BI service, data flows can be accessed and managed through the Power BI web interface. You can schedule data flow refreshes, configure data connectors, and establish relationships between data flows and other data sets in your workspace. Additionally, you can use the capabilities of Power Query Online, a cloud-based version of Power Query, to perform data transformations directly in the Power BI service. By supporting data flows in both Power BI Desktop and the Power BI service, Power BI enables a seamless experience for users to create, share, and collaborate on data flows throughout the entire data preparation and analysis process. This flexibility allows users to work with data flows in their preferred environment while ensuring consistent and efficient data management across both desktop and cloud-based environments.

A Power BI Pro license is required to use data flows in Power BI. However, a Power BI Premium subscription is necessary for advanced features and capabilities such as incremental refresh, compute engine selection, and larger data capacity. Power BI Premium unlocks additional functionality and performance optimizations that enhance the data flow experience.

Advantages of data flows include: reusability, as data flows enable the reuse of query logic and transformations, saving time and effort in data preparation tasks; data centralization, as data flows provide a centralized and consistent data source, ensuring data integrity and reducing duplication; collaboration, as users can collaborate on data flows, making it easier to share and work on data preparation processes; and scalability, as data flows use cloud-based processing capabilities, enabling efficient handling of large data sets and complex transformations.

Limitations of data flows include: data refresh, as data flows have specific refresh limitations, such as frequency and dependencies on data source availability; data flow management, as data flows are currently managed individually, with limited visibility into dependencies between data flows; and advanced transformations, as certain complex scenarios may require advanced coding or alternative solutions despite the wide range of transformations data flows offer.

Data flows in Power BI help users streamline and enhance their self-service data preparation workflows. By providing a scalable and collaborative approach to data integration and transformation, data flows enable organizations to unlock the true potential of their data. While data flows offer numerous advantages, such as reusability, centralization, collaboration, and scalability, you must be aware of their limitations and consider alternative approaches for advanced transformations. By effectively using data flows, you can accelerate data preparation, ensure data consistency, and make informed decisions based on reliable and well-prepared data.

Power Query is a powerful data transformation and manipulation tool within Power BI that allows users to shape and transform data from various sources. But performing repetitive steps on multiple queries can be a tedious task, especially when the queries involve similar but separate sets of data. One of the key features that solves this issue is reference queries, which provide flexibility, reusability, and efficiency in your data transformation process. In this video, you will learn about reference queries in Power Query and their importance in streamlining data workflows. You'll also explore the best use
cases for reference queries and data flows.

By establishing a query reference, you create a connection between an existing query and a new query, enabling data to flow across sequential models. Any modifications made to the original query will automatically apply to the referencing query, ensuring consistency and up-to-date information. Instead of modifying transformations individually in multiple queries, you can make updates in the master query, and those changes will be automatically applied to all referencing queries. This provides cohesion and makes it easier to maintain and update your data transformations.

So what are the benefits of query referencing? Let's explore some examples. First, there is reusability: by referencing queries, you can reuse common data transformations across multiple queries. This promotes consistency in your data processing and reduces the risk of errors that can occur when duplicating complex transformations. Next, there is efficiency: reference queries eliminate the need to repeat time-consuming data transformation steps. Instead, you can leverage the results of a previously defined query, significantly improving the performance of your data workflows. Lastly, you have scalability: as your data analysis requirements grow, reference queries allow you to build modular and scalable data transformation workflows. You can create separate queries for different data sources or transformation steps and combine them as needed, providing flexibility and adaptability to changing business needs.

In Power Query, you can reference a query by using the Reference option, available by right-clicking any query in the Queries pane. Reference creates a new query, a copy of the original but containing one single step. You can rename the new query as needed and then start to use it. In this way, you establish a connection between the queries, enabling data flow and transformation continuity.

Let's delve into this further through a scenario. You are working as a data analyst at Adventure Works, which recently acquired another bicycle business. Your manager, Adio Quinn, has assigned you the task of appending the product data from the newly acquired company to Adventure Works's existing products. Prior to appending the new products, you need to perform several transformation tasks, such as changing column types and removing unnecessary columns. However, your manager has asked you not to modify the existing queries, to preserve their original form and use them as a source for other operations. To accomplish this, you need to create references from the original queries, rename the new queries, apply the necessary transformations, and then append the data. Any changes made to the base queries will affect the new queries. This approach allows you to keep the original queries, update the reference queries, and ensure that any changes made to the base queries are reflected in the referenced ones.

Query referencing creates many opportunities for advanced data transformation techniques. You can apply conditional logic, merge referenced queries, or perform calculations based on referenced data. These advanced techniques further enhance the flexibility and power of your data workflows. Referencing queries in Power Query is a fundamental concept that allows you to streamline and optimize your data transformation process. By leveraging query references, you can improve reusability, efficiency, and scalability, ultimately enhancing the overall productivity and effectiveness of your data analysis in Power BI.

As data volume continues to grow, so does the challenge of transforming that data into well-formed, actionable information. We want data that's ready for analytics, to populate visuals, reports, and dashboards, so we can quickly turn our volumes of data into actionable insights. However, managing and preparing data for analysis can be a complex and time-consuming process, so it's important to consider the best approach for your data transformations and analysis. In this video, you will explore how to reference other queries, and why a data flow may be more suitable.
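Outside Power Query, the reference-query pattern amounts to deriving new transformations from a shared base instead of duplicating it. Below is a minimal Python sketch of that idea, with hypothetical product data and function names; it mirrors the concept, not Power Query's mechanics.

```python
# Hypothetical base "query": load and standardize the product table once.
def base_products():
    raw = [
        {"name": "road bike 500", "category": "Road Bikes", "price": "899"},
        {"name": "mtb 200", "category": "Mountain Bikes", "price": "749"},
    ]
    # Shared cleanup lives in one place, like a master query.
    return [{**r, "price": float(r["price"])} for r in raw]

# Two "reference queries" that build on the base without modifying it.
def road_bikes():
    return [r for r in base_products() if r["category"] == "Road Bikes"]

def discounted():
    return [{**r, "price": r["price"] * 0.9} for r in base_products()]

# Changing base_products() (e.g. adding a cleanup step) automatically
# flows through to both derived queries.
print(len(road_bikes()))  # 1 row survives the category filter
```

The base stays untouched, just as the scenario above requires for the original Adventure Works queries.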
Choosing between referencing queries and data flows depends on the specific requirements of your scenario. It's important to evaluate factors such as data volume, complexity of transformations, user expertise, and maintenance requirements to determine the best fit for your use case.

There are some performance considerations to bear in mind with reference queries especially. Reference queries can contribute to slow data refreshes because of the nature of their referencing: when a reference query is refreshed, it needs to ensure that all the referenced queries are also refreshed to maintain data consistency. This can result in longer refresh times, especially if there are multiple layers of referencing involved. Furthermore, reference queries can overburden data sources, particularly when working with large data sets. Because reference queries rely on the data from other queries, they need to fetch and process the data from the original sources, which becomes more noticeable when dealing with complex transformations or frequent refreshes. To mitigate these issues, it's important to optimize the design and usage of reference queries. Consider limiting the number of reference layers and optimizing the queries' transformations to reduce unnecessary data processing. Additionally, carefully manage the refresh schedule to avoid excessive load on data sources during peak usage times. By implementing these best practices, you can help minimize the impact of reference queries on data refreshes and prevent overburdening your data sources.

Now let's review data flows. Data flows offer a centralized and scalable approach to data preparation. They are designed specifically for data integration and transformation tasks, providing a self-service environment for business users to create and manage extract, transform, and load processes, referred to as ETL processes. With data flows, you can connect to various data sources, perform transformations using a visual interface, and store the prepared data in the Power BI service. Data flows are a feature available in both Power BI Desktop and the Power BI service, and they provide a cloud-based data preparation experience where you can build, manage, and share reusable data entities.

In summary, understanding the differences and best use cases of reference queries and data flows is essential for optimizing your data processing workflows in Power Query. Referencing queries is a fundamental concept that allows you to streamline and optimize your data transformation process; by leveraging query references, you can improve reusability, efficiency, and scalability, ultimately enhancing the overall productivity and effectiveness of your data analysis in Power BI. Remember, practice makes perfect: experiment with reference queries in Power Query to gain hands-on experience and discover the immense value they bring to your data analysis endeavors.

At Adventure Works, you have a task that needs separate analysis for three main bike product categories. You soon realize that to complete the task, you're creating the same query three times, the only difference being the change to the bike category. It's inefficient to completely rewrite queries whenever there's a minor change in the data or a slightly different question from management. What if there was a way to create adaptable, reusable queries? There is: the query parameters feature in Microsoft Power BI allows you to define one query that can be easily adjusted to handle different categories or variables. This video will help you understand the concept of query parameters in Power BI and explain how to effectively implement and manage them. Let's learn how query parameters can make your data analysis tasks more efficient and adaptable.

Query parameters in Power BI are a powerful feature that allows users to input a value, which is then used in the data retrieval process from a data source. Essentially, a query parameter is a
placeholder for information that can change. A query parameter can be used in various operations, such as filters, transformations, or the creation of new columns and tables.

Let's explore some possible uses of query parameters at Adventure Works. Adventure Works can use query parameters when connecting to its database to retrieve specific information rather than importing the entire data set. For instance, Adventure Works can establish a query parameter for a sales date range: by inputting the dates, Power BI will only fetch data for that period, saving resources and time. Parameters can also be used in Adventure Works's data transformations. If there's a need to frequently adjust a specific value in the transformations, using a parameter avoids manual changes each time; the value only needs to be updated in the parameter. Parameters can also control filters on Adventure Works data. If the company wants viewers of a report to concentrate on a particular product category, it could create a parameter for the product category. This allows the viewer to select the category they're interested in, and Power BI will adjust the report accordingly.

Now let's explore creating query parameters in Microsoft Power BI. First, you'll need to open the Power Query Editor. To do this, go to the top left corner of the Power BI Desktop interface, where there is a set of tabs in a ribbon layout. One of these tabs is Home; select it, then select Transform data. This action opens the Power Query Editor. In the Power Query Editor, go to the Home tab and select the Manage parameters option. This opens the Manage parameters dialog box, where you can create parameters. To create a new parameter, select New. Now you are able to name your parameter and define its properties. For instance, you might name it "product category filter". Under Type, from the drop-down menu, select Text as the data type. Next, specify what values this parameter can take: from the Suggested values drop-down menu, choose List of values. In the input field that appears, create your list by entering the different product categories from your data set; in this case, the values are items such as mountain bikes, road bikes, and touring bikes. Once you've filled in these details, select OK, then OK again in the Manage parameters dialog to return to the Power Query Editor.

Query parameters can significantly enhance your Power BI reports, making them more flexible and interactive. Parameters enable efficient data retrieval and transformation by allowing for dynamic changes, helping you cater to evolving business needs without having to rewrite entire queries. The more adaptable your data analysis tools are, the more capable you become of meeting your organization's ever-changing demands. This makes your work more efficient and enables you to provide valuable insights that can guide your company's decision-making processes. Keep exploring, keep learning, and embrace the power of query parameters in Power BI to improve your analysis.

In previous videos in this course, you learned about advanced query capabilities, data flows, and the differences between reference queries and data flows. As mentioned before, every instance of data transformation performed in Microsoft Power Query adds a step to the Power Query process. These steps can be rearranged, removed, or modified as needed to optimize the data shaping process. Whenever you use the Power Query interface, M language code is executed behind the scenes to perform each operation. The M language is available for you to read and modify directly in the Power Query Advanced Editor. In this video, you'll learn how to use this Advanced Editor to update an M query. A core capability of Power Query is to filter and combine data from one or more supported data sources, and any such data mashup is expressed using the Power Query formula language, M. Although you don't have to know the M language to use Power Query, being familiar with the language behind the user interface, and being able to update it when necessary, is valuable for anyone using the tool.
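Before going deeper into M, here is how the query-parameter idea from the previous video looks outside Power BI: one query definition, with the changing piece supplied as a parameter. This Python sketch uses made-up sales data and hypothetical names; it illustrates the pattern only.

```python
# Hypothetical sales rows; in Power BI the data would come from a source query.
sales = [
    {"category": "Mountain Bikes", "amount": 1200},
    {"category": "Road Bikes", "amount": 950},
    {"category": "Mountain Bikes", "amount": 430},
    {"category": "Touring Bikes", "amount": 780},
]

# Allowed values, like the "List of values" entered in Manage parameters.
CATEGORIES = ["Mountain Bikes", "Road Bikes", "Touring Bikes"]

def category_sales(category: str) -> int:
    """One query definition; the category is a parameter, not hard-coded."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    return sum(row["amount"] for row in sales if row["category"] == category)

# Changing the parameter re-targets the same query; nothing is rewritten.
print(category_sales("Mountain Bikes"))  # 1630
print(category_sales("Road Bikes"))      # 950
```

The list of allowed values plays the same role as the suggested-values list in the Manage parameters dialog: it constrains the parameter to meaningful inputs.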
when necessary is valuable for anyone using the tool for example you may need to perform custom transformations that cannot be easily accomplished using the Power Query user interface alone this is where knowledge of the M language and its syntax can be helpful using the M language you can perform advanced data manipulation tasks such as conditional filtering custom column creation data type conversions and merging multiple data sources the language is designed to be expressive and efficient enabling you to handle large data sets with ease when you access the M
language code there are certain group names and meanings that are called M syntax let’s explore the syntax using an M language code snippet this snippet showcases how to handle various CSV file operations in Power Query including setting up the initial data source and performing data transformations loading the file specifying the delimiter and encoding for the CSV document calculating the number of columns and assigning a value to a variable you can find more information on M syntax in the additional resources of this lesson it can also serve as template code for further data transformations using the Power Query M language in PowerBI which you can customize based on your needs the advanced editor provides syntax highlighting autocompletion and error checking features making it easier to write and debug your M code it also offers functions and operators that allow you to perform various data transformations calculations and aggregations now let’s explore how you can use the advanced editor tool in Power Query and modify steps by updating M language code using a practical scenario a report designer informs Adio Quinn your manager at Adventure Works about an error being received in the Power Query window he assigns you the task of identifying the cause of this error and resolving it you investigate the issue by examining the steps in Power Query and analyzing the problem using the M language discovering that the error is a result of a change in the source file’s location let’s outline the steps to resolve this issue using the advanced editor tool let’s start with the source file an Adventure Works sales spreadsheet in Excel if you navigate to the home tab at the top of the PowerBI window select Excel workbook in the data group followed by the Adventure Works sales file and lastly select transform data in the opened window you’ll successfully access the Power Query editor however suppose the location of the source file is unintentionally changed by another 
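A minimal sketch of the kind of CSV-handling M snippet described above, covering loading the file, specifying the delimiter and encoding, and assigning the column count to a variable (the file path and column layout are assumptions for illustration):

```powerquery-m
let
    // load the file and parse it as a CSV document,
    // specifying the delimiter and the encoding (65001 = UTF-8)
    Source = Csv.Document(
        File.Contents("C:\Data\Sales.csv"),
        [Delimiter = ",", Encoding = 65001, QuoteStyle = QuoteStyle.None]
    ),
    // promote the first row of values to column headers
    Promoted = Table.PromoteHeaders(Source, [PromoteAllScalars = true]),
    // calculate the number of columns and assign the value to a variable
    ColumnCount = Table.ColumnCount(Promoted)
in
    Promoted
```

Each `let` step is a named variable, which is exactly how the applied steps you see in the Power Query interface are represented in M.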
person for example the Excel file is moved to another folder this will cause an error in the Power Query window to explore what happens as a result of this error let’s navigate to refresh preview in the query group on the home tab and select refresh preview from the drop- down menu when you refresh the preview you now get an error message indicating that the source file is no longer reachable as the location has changed you can resolve this issue by using the advanced editor to do this you need to select advanced editor in the query group on the home tab next you’ll need to read the error message and code carefully to determine the necessary action in this case you need to correct the file path in this scenario I’ll change the path from C data C3 M3 L3 Adventure Works Sales.Xlsx to C data adventureworks sales.xlsx your file path will differ from this as it will specify the location of the file on your computer after you’ve completed your correction you can select done in the opened window with this edit you’ve modified the code using advanced editor correcting the file path and resolving the issue by using the advanced editor and familiarizing yourself with the M language you can unlock the full potential of Power Query whether for error checking or creating sophisticated data transformations that meet your specific requirements the advanced editor empowers you to manipulate and shape your data precisely congratulations on reaching the end of the third week in this course on extracting transforming and loading data in PowerBI you’ve now reached the end of this module let’s take a few minutes to recap what you’ve learned you began this module by exploring the final step of the ETL process load you learned that the load operation enables the transformed data obtained by reading from a data source to become available for reporting purposes you then explored the two main ways to load data in the PowerBI user interface load this option directly loads data into the data 
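To make the advanced editor fix described in the scenario concrete: the correction comes down to editing a single `File.Contents` path in the `Source` step. Rendering the spoken paths as conventional Windows paths is an assumption, and your actual path will differ, but the edit might look like this:

```powerquery-m
// Before — the source file was moved, so this path no longer resolves:
// Source = Excel.Workbook(File.Contents("C:\data\C3 M3 L3\Adventure Works Sales.xlsx"), null, true),

// After — corrected to point at the file's new location:
Source = Excel.Workbook(File.Contents("C:\data\Adventure Works Sales.xlsx"), null, true)
```

After selecting done, refreshing the preview should succeed because the `Source` step can reach the file again.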
pane in PowerBI and you can still transform the data at a later stage and transform data this option allows you to transform the data before loading it with changes being applied to the data model next you discovered that in some cases you might have some source tables which are used during the ETL process that will not be used directly in the reporting area some of these tables may not meet the production demands of your data warehouse in such cases you will need an intermediate state between the data source and the data warehouse called the data staging area a staging area serves as an intermediate storage location for raw or unprocessed data allowing it to be temporarily stored and prepared for further processing in a data pipeline you then made use of your new knowledge by helping Adventure Works transform and consolidate data by using a staging area next you undertook a knowledge check in this item you proved your understanding of the concepts you encountered by answering a series of questions in the second week of this module you were introduced to data profiling in PowerBI you began this week by learning about the importance of data profiling and statistical analysis when reviewing data sets you also learned about distribution anomalies and outliers in the context of data profiling and you learned about standard deviation next you explored the two profiling tools in the Power Query editor column quality and column distribution you then put your new skills to use by assisting Adventure Works with data profiling and statistical analysis using the profiling tools in PowerBI to inspect data next you completed a knowledge check which tested your understanding of these concepts through a series of questions in week three you discovered the best practices when working with data sources and why these practices are important to implement then you had the opportunity to complete a practical exercise importing a data set while considering the best practices you were 
then introduced to data flows you explored what data flows are and why they are used in PowerBI you learned about the subscription level required to use them and engaged with a fictional scenario showcasing their application and the advantages and limitations they offer next you explored reference queries and their importance in streamlining data flows reference queries in Power Query refer to the practice of using the output of one query as a data source or transformation step in another query you then explored the performance considerations you need to bear in mind when using reference queries next you demonstrated your competence with these new skills by helping Adventure Works to merge two of their data sources using reference queries to deliver new insights into their business next you explored the query parameters feature in Microsoft PowerBI you learned that this feature allows you to define one query that can be easily adjusted to handle different categories or variables and you examined the process for disabling helper queries in PowerBI after that you were introduced to the advanced editor and learned how to modify code you learned that whenever you use the Power Query interface M language code is executed to perform each operation behind the scenes and you learned that although you don’t have to know M language to use Power Query being familiar with the language used behind the user interface as well as being able to update it when necessary is valuable for anyone using the tool you then explored the various global options PowerBI offers that allow you to customize and optimize your experience when working with files you learned that these options provide flexibility and control over file settings ensuring a seamless workflow and enhancing your overall productivity finally you undertook a knowledge check which tested your understanding of the concepts that you encountered this week and you completed a module quiz in which you demonstrated your 
understanding of all concepts you encountered throughout the entire module you should now be familiar with the advanced ETL processes in PowerBI you should be capable of loading data with PowerBI profiling this data and using advanced queries in PowerBI great work you have almost reached the end of this course in this video you’ll consolidate key concepts you learned throughout you’ll revisit essential learnings related to the data analysis process for businesses and transforming data into valuable insights using PowerBI through your continuous effort you’ve gained a solid foundation in collecting data from and configuring multiple data sources in PowerBI preparing and cleaning data using Microsoft Power Query and inspecting and analyzing data to ensure data integrity you have demonstrated tremendous dedication to this course through your engagement with the videos readings exercises and quizzes what’s left now is to demonstrate the skills you’ve learned in the final course project this recap will serve as valuable preparation for your final course assessment and graded quiz in the final course assessment you’ll apply what you’ve learned by completing tasks that simulate a real world data analysis scenario to consolidate your learning you’ll then take a final graded quiz to assess the knowledge and skills you gained throughout this course let’s get started by revisiting your first week of learning in the first week you learned about data sources local and shared data sets working with Excel data types storage modes triggers and actions you primarily focused on data sources in the process you covered the skills to connect data sources choose the correct query modes either import or direct and setting up triggers and actions to stay updated with the frequently changing data week two began with analyzing the need behind the data transformation and getting familiar with the Power Query interface which will be used throughout the ETL operations you continued your 
journey with learning about columns data types applied step lists and common data errors and then you prepared a data set you also learned how and why to pivot and unpivot tables which are very popular operations finally you applied combining table operations which are appending merging and joining tables these week two contents are fundamentals for ETL operations week three began with loading data and staging area concepts you applied an end-to-end ETL operation then learned about data profiling which is very important for understanding data quality and distribution this helps you detect a potential anomaly in a data set before you start to analyze it you then explored how to use M language and advanced editor to apply detailed operations in Power Query finally you learned data flows and reference queries which are used to increase efficiency and productivity this course equipped you to use PowerBI and Power Query to construct end to end ETL solutions starting from understanding data sources then advanced transformation techniques and ended by loading data in PowerBI as you embark on the final course project and assessment you can approach it with confidence knowing that you’ve built a strong foundation of knowledge and skills by committing to your learning journey throughout the course however if you feel the need to review any of the concepts summarized for you in this video or require additional preparation remember that you have the flexibility to revisit any of the course items this might only be the start of your journey toward a career as a data analyst but you can be very proud of yourself for how much you’ve already learned and accomplished now you’re ready to tackle the course project and graded assessment quiz good luck you’ve got this well done on completing this course you should be proud of the progress you’ve made in your data analysis learning journey with Microsoft PowerBI throughout the course you explored how to extract transform and load data 
using PowerBI in depth gaining expertise in building ETL solutions using PowerBI and Power Query you explored collecting data from and configuring multiple data sources in PowerBI preparing and cleaning data using Microsoft Power Query and inspecting and analyzing data to ensure data integrity you learned about data sources and setting them up in PowerBI as well as some of PowerBI’s ETL capabilities including connectors storage modes and setting up triggers plus you discovered more about transforming data using Power Query whether you’re cleaning and preparing data sets in Power Query to deal with errors and inconsistencies or performing advanced transformations to combine data you are now better equipped to transform data using PowerBI and don’t forget that you now have more insight into loading and profiling data in PowerBI as well as performing advanced queries in Power Query you even practiced transforming multiple data sources a key real world skill for a data analyst congratulations on the expertise you’ve gained in extracting transforming and loading data in PowerBI this insight marks a valuable milestone in your journey to comprehensively using PowerBI to unlock valuable insights from data completing this course contributes towards gaining the PowerBI Analyst Professional Certificate from Coursera these professional certificates are designed to equip you with the necessary skills to become job ready for in-demand career fields the Microsoft PowerBI Analyst Professional Certificate in particular not only offers you the opportunity to enhance your data analysis skills but also gain a qualification that can lay the groundwork for a career as a PowerBI data analyst plus the professional certificate will help you prepare for the exam PL-300 Microsoft PowerBI data analyst by passing the PL-300 exam you’ll earn the Microsoft certified PowerBI data analyst certification this globally recognized certification is industry-endorsed evidence of your technical skills and 
knowledge the exam measures your ability to prepare data model data visualize and analyze data and deploy and maintain assets to complete the exam you should be familiar with Power Query and the process of writing expressions using data analysis expressions or DAX you can visit the Microsoft certifications page at learn.microsoft.com/certifications to learn more about the PowerBI data analyst certification and exam this course enhanced your knowledge and skills in the ETL process in PowerBI but what comes next well there’s more to learn so it’s recommended you move on to the following course in the program whether you’re new to the field of data analysis or already have some expertise and experience completing the whole program demonstrates your knowledge of and proficiency in analyzing data using PowerBI you’ve done a great job so far and should be proud of your progress the experience you’ve gained will showcase your willingness to learn motivation and capability to potential employers it’s been a joy to take part in your learning journey keep up the excellent efforts and best wishes for all your future endeavors have you ever been confronted with large amounts of information at once it can be an overwhelming experience how do you make sense of everything with PowerBI you can create data models that act as visual representations of your records however this requires familiarity with the process and mastery of many different techniques so we’ve designed this course to equip you with the skills you need data modeling is creating visual representations of your data in PowerBI you can use these representations to identify or create relationships between data elements by exploring these relationships you can generate new insights into your data to improve your business microsoft PowerBI is a fantastic tool for creating data models and generating insights and you don’t need an IT related qualification to begin using it this 
course is designed for anyone interested in learning about building data models it also establishes a strong foundation for those pursuing a career in data analytics by exploring PowerBI you’ll learn how to create data models using schemas and relationships analyze your models using DAX also known as data analysis expressions and optimize a model for performance in PowerBI in the first week of this course you’ll explore the key concepts related to data modeling you’ll learn to identify different types of data schemas like flat star and snowflake you’ll create and maintain relationships in a data model using cardinality and cross-filter direction and you’ll learn to form a model using a star schema the second week of this course focuses on DAX or data analysis expressions this syntax is used to create elements and perform analysis in PowerBI you’ll start by writing calculations in DAX to create elements and perform analysis in PowerBI you’ll explore the formulas and functions used in DAX and use DAX to create and clone calculated tables you’ll then be introduced to the concept of measures you’ll learn where measures are used and what types are available you’ll work with measures to create calculated columns and measures in a model and you’ll learn about the importance of context in DAX measures finally you’ll perform useful time intelligence calculations in DAX for summarization and comparison and learn how to use these techniques to set up a common date table in the third week of this course you’ll learn how to optimize a model for performance in PowerBI you’ll begin by learning how to identify the need for performance optimization this means analyzing your data models to determine how they can perform more efficiently you’ll then learn how to optimize your PowerBI models for performance you’ll explore different techniques and methods for ensuring that you’re running efficient models and you’ll also learn how to optimize performance using DAX queries in the final week of this 
course you’ll undertake a project and graded assessment in the project you’ll build and optimize a data model for Adventure Works you’ll have to build this model from scratch and optimize it to run efficiently finally you’ll have a chance to recap what you’ve learned and focus on areas you can improve upon throughout the course you’ll engage with videos designed to help you build a solid understanding of data modeling in PowerBI watch pause rewind and re-watch the videos until you are confident in your skills then consolidate your knowledge by consulting the course readings and measure your understanding of key topics by completing the different knowledge checks and quizzes this will set you on your way towards a career in data analytics and form part of your preparation to take the PL300 Microsoft PowerBI data analyst exam by the end of the course you’ll be equipped with the necessary skills to work effectively with data models in PowerBI good luck as you start this exciting learning journey as a data analyst you’ll often manage thousands hundreds of thousands or even millions of records but how can you generate insights from all this raw data you can convert it into data models in this video you’ll explore the basics of data models and learn how to create them over at Adventure Works the company needs to generate insights and increase sales from different data sources these data sources include customer sales and marketing data but these data sources are all in separate locations and the only way to generate insights is to combine them that’s where the data model comes in adventure Works can integrate its data sources as a data model in Microsoft PowerBI then generate insights in the form of visualizations let’s find out more about data modeling and learn how Adventure Works can make use of it at its core data modeling is creating a structured representation of data this representation can then be used to support different business aims in other words a data 
model shows how different data elements interact and it also outlines the rules that influence these interactions data models can be built in Microsoft PowerBI microsoft PowerBI is software that provides data analysts with a user-friendly interface for building data models other benefits of a PowerBI data model are that it can be used to define relationships between tables and assign data types you can also create calculated columns and measures and update your model as your business requirements change in PowerBI the foundation of creating reports and dashboards lies within the data model it’s important to understand how to design a data model that effectively aligns with the visual elements within your reports and dashboards there are several steps involved in building a data model in PowerBI connect to your data sources prepare and transform your data and configure table and column properties then create model relationships and finally create measures and calculated columns using DAX or data analysis expressions once your data model is in place you can analyze the data to generate insights to help you achieve your business objectives let’s explore some examples of how data models can be applied to business data by optimizing the data model you can significantly improve the performance of your PowerBI reports and dashboards it’s also easier to aggregate structured data in a data model thanks to the clear relationships and hierarchies with an effective data model you can perform more advanced analytical capabilities like complex measures and predictive analysis when your underlying data is structured organized and aligned your insights and reports are more likely to be accurate and reliable now that you understand more about data models let’s briefly explore how Adventure Works can build one with PowerBI to generate the sales insights they need first Adventure Works needs to connect to its data sources by executing a query in Power Query Editor the result is then 
loaded into the PowerBI data model as a table using Power Query in PowerBI Adventure Works can finish importing and cleaning their data sources this creates a data model that contains cleaned customer date employee and marketing data as separate tables each table in the model represents a specific business entity and each table also has its own related attributes the next step is to define the relationships between the tables in PowerBI’s model view the company can link its customers and sales tables using the customer ID column which is common to both tables with this relationship the company can now view each customer’s transactions adventure Works could also link its sales and marketing tables to understand which campaigns were most effective for boosting sales finally the company needs to create measures and calculated columns using DAX or data analysis expressions dax is a syntax used in PowerBI to analyze data you’ll learn more about it later in the course for now just know that Adventure Works can use DAX to create aggregations and custom calculations to generate insights on important aspects of their data like sales totals a strong understanding of data models will help you maximize your data’s full potential building sophisticated data models creates a robust foundation for data analysis and generating insights remember that your data model is the foundation of everything else generating business insights often means working through large amounts of data and it’s important that this data is stored and structured meaningfully with PowerBI you can structure your data using a schema in this video you’ll learn about different types of schemas and their advantages and disadvantages adventure Works wants to optimize its inventory and rework its sales strategy to sell more bicycles but first it needs to analyze the relevant data to determine the best way to approach this task these data sources include customer product and sales data along with information on 
other aspects of the business adventure Works can use a schema in PowerBI to organize and build relationships between these different data sources this way the company can generate its required insights let’s find out more about schemas and how Adventure Works can use one a schema refers to a structure that defines the organization and relationships of tables within a data set it represents the logical framework of how the data is organized and connected there are many benefits to using a schema in PowerBI which you’ll explore over the course of this lesson a schema plays a crucial role in defining the data structure it also enables efficient data analysis helps with the creation of visualizations and assists with generating meaningful insights from your data there are three different types of schema that can be used to organize and structure data a flat schema a star schema and a snowflake schema let’s review each of these schema types and find out how Adventure Works can use them a flat schema is the simplest form of a data model all attributes and fields related to the entity are stored in a single table as you discovered in earlier courses a table is a set of rows containing data with each row divided into columns each column represents a piece of information with a specified data type the required attributes and entities are stored in the rows and can be extracted as required from the columns there are several advantages to a flat schema it’s easy to retrieve data from it’s less complex to analyze flat schema data and it’s a simpler way to visualize data however even though it’s an easy approach to understand the flat schema still has a few disadvantages it requires large data sets which are difficult to maintain and slow to query it leads to data redundancy and inconsistency so is more suited to smaller data sets and it doesn’t allow for complex data sets which require more flexibility and detail next let’s explore the star schema data model a star schema is 
a more advanced approach to structuring and organizing quantitative or measurable data in PowerBI it allows for multiple tables to be connected through one central table in a star schema a central fact table connects to multiple dimension tables you’ll explore these concepts in a later lesson these connections look like a star shape so it’s called a star schema adventure Works can build a star schema using a central fact table that contains sales transactions the company can then link the fact table to dimension tables that contain records for customers employees dates and marketing campaigns let’s break down the components of the star schema using the example from the Adventure Works database first there’s the fact and dimension tables you’ll explore these further in a later lesson and there are the table relationships there are many different types of relationships which you’ll also explore in a later lesson a star schema offers many advantages over a flat schema by storing data in separate tables star schemas help to reduce data redundancy and boost query performance it also provides a clear logical data model which makes it easier to understand the data structure however it’s also less flexible than other schema types adding or modifying tables can require extensive changes to the schema and the star schema can struggle to manage complex relationships next is the third and final model the snowflake schema a snowflake schema is an extension of the star schema it breaks down the dimension tables into multiple related tables existing tables in a star schema can be further normalized into other tables which creates a hierarchy yet these tables maintain a relationship with the dimension and central fact table for example Adventure Works can further normalize its product data into supplier and category data tables don’t worry about the terms normalize and denormalize for now you’ll learn more about these concepts later in the course extending a star schema into a 
snowflake schema offers several advantages it provides more efficient data storage and retrieval it improves data integrity and consistency and it reduces data redundancy it also offers scalability and flexibility by integrating new data tables as required yet there’s also disadvantages to a snowflake schema it’s more difficult to perform data analysis because of the extra relationships these new relationships also make the schema more challenging to understand and manage and they result in slower queries finally it’s important to validate your schemas to make sure they’re accurate when validating a schema you need to check for the following make sure each table column has been assigned the correct data type like text and numeric check that each column has the correct formatting applied confirm that all columns have clear descriptions with relevant context and make sure all table and column properties are correctly configured you should now be familiar with the different types of schemas in PowerBI and their advantages and disadvantages you can build on this knowledge to develop robust data models in PowerBI this way you’ll ensure that your data retains its integrity and simplicity and can be used to generate insights making datadriven decisions involves working with large complex data sets fortunately you can easily manage these data sets with a flat schema in this video you’ll learn how to create a flat schema in PowerBI and configure your table and column properties over at Adventure Works the company has received complaints from customers about incorrect and delayed orders let’s help Adventure Works build a flat schema to organize its data more efficiently the first step is to connect PowerBI to the data sources to connect to a data source in PowerBI desktop select the home tab then select the get data drop-down menu select the appropriate data source from this menu in this instance you need to select the Excel workbook option then navigate to the folder 
containing the Adventure Works spreadsheet and select open once you select the Excel data source PowerBI displays the available tables in the navigator menu for Adventure Works there is only one table in the Excel spreadsheet available to load Adventure Works data select the table from the navigator menu a preview appears on the right-hand side the preview shows the Excel sheet has one table which contains sales data for Adventure Works there are also other columns related to the data like product name category subcategory quantity and more you can perform transformations from this menu but in this instance you just need to load the data so select load to add the selected data table to your PowerBI data model next select the data set from the data pane on the right-hand side of the PowerBI desktop interface then select data view from the left sidebar to view the data set you can now configure your table and column properties using the Power Query editor to access the editor select the home tab and then the transform data option for example you can select the properties feature to alter the spreadsheet name or add a description add some spacing to the spreadsheet name then add the following description Adventure Works sales data this makes it easier to identify the spreadsheet it’s particularly useful when working in a team now you can begin applying transformations to shape the data as a flat schema first you need to remove duplicate data from the order ID column select and right-click on the order ID column in the drop-down menu select the remove duplicates option alternatively you can access the home tab and select the remove rows option in the drop-down menu select the remove duplicates option either action removes all duplicate values from the selected column you can also format the product weight column by changing the data to a decimal type select the column then select the transform tab select the data type option and select decimal number from the list of 
available options confirm your selection to change the column type when you’ve completed your transformations select the home tab and then select close and apply you’re then returned to the PowerBI desktop interface you can make further changes here using the table tools and column tools tabs for example from the column tools tab you can select the format option and change the product price column data type to currency the next step is to edit the model select model view from the lefthand sidebar to view the schema of the loaded data the model view shows that there is currently one table of data this shows that we are working with a flat schema since there are no other tables there’s no need to build any relationships however you can still make further changes to the table’s properties select the table in model view to open the properties pane you can make more changes here by selecting individual columns from the table you should now be familiar with creating a flat schema in PowerBI from your data sources and you should also know how to configure your table and column properties using PowerBI and Power Query creating a schema in Microsoft PowerBI is an essential skill for entry-level data analysts as you progress in your data analysis career you’ll explore even more complex schema structures to handle more intricate data scenarios as you discovered in an earlier lesson you can use schemas for data organization and two central components of all schemas are fact and dimension tables in this video you’ll explore these tables in more detail and learn how they can be used to build schemas adventure Works is dealing with an increase in delivery errors to help fix this issue the company needs to explore its data and discover the underlying cause it can use fact and dimension tables to find a resolution as you learned earlier a schema is a logical and visual representation of how your fact and dimension tables relate they’re the backbone of schemas in PowerBI fact tables 
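The two Power Query transformations walked through above, removing duplicates on the order ID column and retyping the product weight column as a decimal number, can also be expressed directly in M. A minimal sketch, assuming the query is named `AdventureWorksSales` and the columns are named `Order ID` and `Product Weight`:

```powerquery-m
let
    Source = AdventureWorksSales,  // the loaded table (name assumed)
    // remove rows that duplicate an existing Order ID value
    Deduplicated = Table.Distinct(Source, {"Order ID"}),
    // change the product weight column to the decimal number type
    Typed = Table.TransformColumnTypes(Deduplicated, {{"Product Weight", type number}})
in
    Typed
```

Performing the same steps through the ribbon generates equivalent M, which you can inspect and adjust in the advanced editor.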
are called fact tables because they consist of the measurements metrics or facts of a business process in other words they hold quantifiable measurable data let’s take the example of an Adventure Works fact table it sits at the center of a sample Adventure Works star schema it’s called sales orders and includes transaction details like order ID product ID customer ID quantity and total price these are core facts about transactions like the customer who made the purchase the price of the product they purchased and so on and this fact table is related to dimension tables dimension tables typically contain textual fields and provide descriptive attributes related to fact data they offer the context surrounding a business process event in the Adventure Works star schema the dimension tables are linked to the fact table and include date customer sales and product data these are descriptive details that can be used to identify individual customers these two examples should help you understand how fact and dimension tables inform the building of a schema in the star schema model the fact table sits at the center the dimension tables radiate out like the points of a star each dimension table is directly connected to the fact table for example the sales orders table is the central fact table in the Adventure Works star schema the dimension tables like date customer and product are connected directly to it this structure simplifies queries because you only need to navigate through two tables to answer questions like what were the total sales on a particular date and these fact and dimension tables can also be used to extend a star schema into a snowflake schema a snowflake schema makes use of dimension tables by normalizing them normalization means that existing tables within a schema are divided into additional related tables this technique creates a structure that resembles a snowflake this is where we get the name snowflake schema from for instance in addition to a central
fact table Adventure Works’ product dimension table could be split into a product table connected to subcategory and category tables this schema reduces data redundancy but adds complexity to queries you can help Adventure Works use these schema designs to discover the cause of the delivery errors you can import the required data sources represent the data sets as a snowflake schema and perform data analysis your analysis might reveal that the errors are linked to inventory management issues or incorrect addresses on record with these insights Adventure Works can fix its delivery processes and avoid future errors you should now understand the importance of fact and dimension tables when building a database schema with these tables you can create different schemas that help to organize and make sense of your data and generate insights you’ll often have to untangle large data sets and make sense of the relationships between tables an understanding of cardinality and table relationships can be useful in these situations in this video you’ll explore the concept of cardinality and review the different relationships that can be created between tables in a database to help with its business planning Adventure Works asks questions of its data like what bicycle sells best in each region or what is the revenue of each store however the data required to answer these questions is stored across several tables posing a complex data analytics challenge Adventure Works can solve this challenge using cardinality and by identifying the table relationships before we find out how Adventure Works can solve its data issues let’s take a few moments to explore the concept of cardinality in the context of data analytics cardinality refers to the nature of relationships between two data sets in other words how tables in your database relate to each other it’s important that your cardinality settings are correct incorrect settings can lead to inaccurate data analysis and flawed business decisions there
are three types of cardinalities or relationships between tables in PowerBI the first is a one-to-one relationship in this instance a record in one column of table A corresponds to a unique record in one column of table B one-to-one relationships are less common in data modeling but they are useful when dealing with specific scenarios for example a single business entity can be loaded as two or more model tables because the data might come from different sources this scenario is common for dimension tables for example in the Adventure Works data set each bicycle model has a unique model ID listed in the product ID column and a separate table lists specific features for each model ID in a product features column together these columns form a one-to-one relationship between the two tables next is the one-to-many relationship each record in a column of table A corresponds to multiple records in a column of table B but not the other way around Adventure Works lists its stores in table A and it lists the employees of each store in table B the relationships between the stores and their employees establish a one-to-many relationship this is because each employee works for one store but each store has many employees this is the most common type of relationship in data modeling where one table acts as the primary table and the other tables act as related tables finally there’s the many-to-many relationship this is where multiple records in a column of table A are related to multiple records in a column of table B in both directions many-to-many relationships are often used to establish a relationship between two fact tables or two dimension tables in the case of Adventure Works a customer can purchase many different bicycle models logged in table B and each bicycle model can be purchased by multiple customers recorded in table A this creates a many-to-many relationship understanding these relationships and configuring your settings appropriately helps your queries and calculations flow
correctly and generate accurate insights another important aspect when considering the cardinality of your data is granularity granularity refers to the level of detail or depth of a data set the granularity of your data should align with the business questions you need to answer for example Adventure Works wants to view customer purchase histories over the past year with data granularity you can explore individual transactions to analyze individual customer behavior and identify purchase patterns however if you want to understand which specific bicycle models are performing well in a region you need sales data with high granularity high granularity data is a data set that captures detailed information about the transactions for example geographical sales of products can be captured as a continent country state city and all the way down to individual stores but for a more general analysis like total sales per store a lower level of granularity suffices low granularity data refers to a data set that captures a high-level summary or an aggregated level over broader categories an example of this is monthly sales of a product category the sales data is summarized at the category level but only on a monthly basis understanding the granularity of your data is crucial for establishing correct cardinality it also influences how you set up your cross filter direction in PowerBI which you will learn more about in a future lesson but be careful when judging the required level of granularity misjudging the level of granularity can lead to misrepresented data and incorrect business insights and excessive granularity can lead to too much data and slow down your queries by developing a keen understanding of cardinality and granularity you can untangle complex data scenarios like the one at Adventure Works with confidence and ease understanding the relationships between multiple data sets requires an advanced tool and Microsoft PowerBI’s cross filters are the perfect fit in this video
you’ll explore the concept of cross filter direction and learn how to identify different types of cross filters Adventure Works needs to calculate which members of its sales team have sold the most product types and should be awarded a bonus however the data required to generate this insight is spread across multiple tables with fixed cross filter directions you can help Adventure Works analyze this data by changing the cross filter directions of its tables but first let’s find out what data analysts mean by cross filter direction in PowerBI cross filter direction refers to the pathway or the direction through which filtering happens between two tables in a data model it dictates how data from one table influences the data in another table this enables relational analysis without resorting to complex queries or manual data consolidation PowerBI relationships are directional in nature unlike other database management systems the direction significantly impacts how filtering operates having a clear understanding of relationship direction is a crucial aspect of data modeling in PowerBI let’s look at how direction plays an important role the Adventure Works data set contains three tables product sales and salesperson the product dimension table is connected to the sales fact table using a one-to-many relationship based on the product ID column common to both tables and a one-to-many relationship also connects the salesperson dimension table to the sales fact table based on their common rep ID columns there are two types of cross filter direction the first is single cross filter direction this is the default setting in PowerBI the filter propagates from one table to another but not vice versa a good example of single cross filter direction is the scenario you just explored Adventure Works’ product and salesperson dimension tables are connected to the company’s sales fact table via a one-to-many relationship each arrow points in a single direction indicating that the
relationship’s direction is single this means that sales data can be filtered by both product and salesperson so when the product table is filtered for product one the sales table is automatically filtered for all sales of product one the next type of filtering is bidirectional filtering bidirectional filtering is filtering against the direction of a relationship sometimes you’ll need to do this to answer a particular question for example as you learned earlier Adventure Works requires a report on employee performance the report must show the number of products sold by each salesperson you can generate this report using bidirectional cross-filtering to generate the required results you must filter from the sales fact table to the salesperson and product dimension tables so you need to change the direction of the filter to both let’s look at the process steps for this action you can apply a filter in the salesperson table for a specific sales team member this filters the sales table for all sales by that person the filter propagates to the product table as the direction is bidirectional we have now determined how many unique products the salesperson has sold however there are a few important points to note when using bidirectional filtering bidirectional cross-filter relationships can negatively impact performance and configuring a bidirectional relationship can also result in ambiguous filter propagation paths you can disable filter propagation within a relationship in PowerBI using the CROSSFILTER DAX function this setting can be particularly useful in certain advanced scenarios where you must isolate data for independent analysis you’ll learn more about DAX in the next module the direction of the relationships plays a very important role in data modeling in PowerBI properly applying these cross-filter directions can drastically enhance data analysis leading to more insightful and actionable conclusions different data sets are explored at different levels of detail
depending on the questions to be asked answering these questions requires working with different levels of data granularity over the next few minutes you’ll explore the concept of data granularity and discover how it can help inform your data analysis over at Adventure Works the company needs sales data to help make strategic decisions about what products to stock it must identify the highest- and lowest-performing products using annual and daily sales data you can help the company generate these insights by using data granularity to analyze its sales records let’s begin by recapping what is meant by the term data granularity as you might recall data granularity refers to the level of detail or depth captured in a certain data set or data field granular data provides deeper and more precise insights this delivers more nuanced and valuable findings remember data granularity isn’t about always having the highest level of detail it’s about having the appropriate level of detail before you begin your analysis ask yourself do you require high granularity or low granularity the decision should depend on the specific requirements and objectives of the analysis it’s about striking the right balance between detail manageability precision and simplicity high granularity data is a data set that records very detailed information about each transaction this level of granularity provides a comprehensive overview of each transaction including specific attributes and metrics associated with the transaction let’s look at an example from the Adventure Works database for instance in Adventure Works’ data analysis product related data can be captured as product ID category subcategory name price size and weight some benefits of high granularity include in-depth exploration of trends patterns and relationships within data sets to identify specific behaviors and anomalies the flexibility to aggregate and summarize data at various levels of detail and the ability to facilitate accurate
decision making by drilling down into specific data points next let’s look at low granularity in low granularity data information is captured and analyzed at a high-level summary or an aggregated level the data is not broken down into individual records instead data is summarized over broader categories or periods here’s an example from the Adventure Works database for example Adventure Works can explore its sales by business quarter or month the benefits of low granularity include a simplified view that’s easier to understand and allows for analysis without an overwhelming level of detail improved performance and reduced data volume which leads to faster query execution and a quick identification of trends and patterns for informed decision-making let’s take a closer look at data granularity and its role in data analysis in the context of data analysis high granularity data is often more desirable it offers a finer level of detail so it provides greater precision and potential for deeper insights for instance tracking sales hourly high granularity instead of monthly low granularity could reveal patterns like peak shopping hours during the day however working with high granularity data comes with its challenges the more granular your data the larger your data sets will be potentially slowing down data processing and analysis on the other hand low granularity data while offering less detail can provide a broader view of your data it’s also easier to manage because of the smaller data sets in Adventure Works the monthly sales data low granularity could help identify broader trends such as seasonal sales fluctuations of certain product lines for example bicycle repair equipment sells more during the spring and summer months this is because customers are more active on their bicycles you can ensure the relationships are accurate and produce consistent aggregations by matching the granularity levels it also helps with correct filtering and supports drill-down
analysis data granularity also has a significant impact on building relationships between tables in PowerBI for example to determine the highest and lowest selling products in the Adventure Works inventory you must produce reports of total sales and budget over time using the sales and budget data the sales data is in the sales table and has daily level granularity on the other hand the budget data is stored in the budget table and is monthly to establish the relationship between tables and produce accurate results you need to format the date field in both tables and then build a relationship based on a commonly formatted date column understanding and manipulating data granularity is a powerful skill that all data analysts must master the degree of granularity can impact the insights drawn and the ease with which data can be analyzed with a firm understanding of data granularity you can now approach your data analysis tasks with a refined perspective it’s time to discover the story that the right level of detail in your data can tell untangling complex intricate data is often too large a task for one individual thankfully a PowerBI star schema can simplify complex data over the next few minutes you’ll learn how to configure a star schema in PowerBI including differentiating between fact and dimension tables and configuring cardinality and cross filter direction Adventure Works needs to organize its data to understand what products have been ordered and where they need to be shipped you can help them to organize the data using a star schema but first let’s review the steps for setting up a star schema in PowerBI the first step is to disable autodetect PowerBI auto-detects relationships when you load multiple tables you need to disable the function so you can set your own relationships the next step is to load your fact and dimension tables into PowerBI select the required tables from your Excel spreadsheet or other relevant location and load them into the application
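Before wiring up the model, it can help to see what the relationships you are about to create actually do: a key column in the fact table resolves against a matching key in a dimension table, just like a join. Here is a minimal Python sketch of that idea (the table contents, key names, and values are invented for illustration, not taken from the Adventure Works data set):

```python
# Hypothetical star-schema tables: a product dimension keyed by product_key,
# and a sales fact table whose rows reference the dimension via that key.
product_dim = {
    1: {"name": "Road Bike", "category": "Bikes"},
    2: {"name": "Helmet", "category": "Accessories"},
}

sales_fact = [
    {"product_key": 1, "quantity": 2, "total_price": 1200.0},
    {"product_key": 2, "quantity": 1, "total_price": 45.0},
    {"product_key": 1, "quantity": 1, "total_price": 600.0},
]

# A one-to-many relationship: one dimension row matches many fact rows.
# Following the shared key lets us aggregate facts by a descriptive attribute.
sales_by_product = {}
for row in sales_fact:
    name = product_dim[row["product_key"]]["name"]
    sales_by_product[name] = sales_by_product.get(name, 0.0) + row["total_price"]
```

In PowerBI this lookup and aggregation happen implicitly once the relationship between the key columns is defined, which is why getting the relationships right matters so much.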
once you’ve loaded the tables you must create relationships between them you can join tables by dragging relationships between key columns or from the manage relationships section of PowerBI desktop finally you need to set cardinality and cross filter direction you must set cardinality to determine how your database tables relate and you need to set the cross filter direction to determine the pathway through which filtering occurs between your tables now that you’re familiar with the steps for setting up a star schema in PowerBI let’s help out Adventure Works as you’ve just discovered the first step is to disable the auto detect function launch PowerBI desktop go to file and select options and settings then select options within the settings menu to open the options dialogue box on the left bar of the dialogue box select data load then deselect autodetect new relationships after data is loaded and select okay next you need to load your fact and dimension tables into PowerBI select home then get data select Excel workbook from the list of options in the get data drop-down menu navigate to the Adventure Works company data spreadsheet and select open the navigator menu appears on screen this menu displays a list of available tables within your spreadsheet you can select which tables you need from this menu you can also use the search bar to locate a table when working with larger spreadsheets a preview of each table appears in the preview pane when selected in this instance you require the product region sales and salesperson tables select these tables then select load the tables are now visible in the model view your next step is to create the relationships between the tables you must build a one-to-many relationship between the sales table and the product region and salesperson tables in this instance you can create a relationship between the product table and the sales table based on the product key column which is common in both tables similarly you need to relate the
sales table to the region and salesperson tables based on the sales territory key column and employee key column respectively alternatively you can also create and configure relationships from the manage relationships section of PowerBI desktop from the model view select manage relationships select new to open a dialogue box called create relationship from here you can build and configure relationships select the sales table from the drop-down menu then select the product key column from the available options then select the product table and its product key column next you need to set up the cardinality and cross filter directions to set up cardinality select the cardinality drop-down menu then select the appropriate relationship type in this case it is many-to-one finally under the cross filter direction drop-down menu select the filter direction PowerBI’s default direction is single so leave this as it is for the current scenario however before you select a bidirectional cross filter make sure that you fully understand its implications select okay when finished you can repeat this process to create relationships between the other tables select new then work through the same steps again to create more relationships select okay from the create relationship dialogue box when finished then select close from the manage relationships dialogue box to return to the model view the star schema is now ready to use the sales table is the fact table it sits in the middle of the model and connects to the salesperson region and product dimension tables you should now be able to configure a star schema in PowerBI differentiate between fact and dimension tables and configure cardinality and cross filter direction keep the data analysis needs of your organization in mind as you build and refine your star schemas with practice this powerful data modeling technique will become a vital tool in your data analysis toolkit data is not always structured in a way that provides quick insights but by
leveraging the snowflake schema design you can unlock your data’s full potential in this video you’ll explore the snowflake schema learn how to build your own and discover how to transition to one from a star schema Adventure Works’ data is stored in a complex format it’s having difficulty retrieving the necessary information you can help Adventure Works build a snowflake schema to enable more efficient data storage and make it easier to generate insights let’s begin with an overview of the snowflake schema the snowflake schema is a type of database schema design that optimizes data storage and retrieval by normalizing the data into multiple related tables unlike the star schema which uses denormalized data with fewer tables the snowflake schema consists of a central fact table connected to one or more dimension tables the dimension tables are further connected to other related tables to create a hierarchy for example the Adventure Works sales data set’s product dimension table has a product category and a product subcategory in a star schema all three fields exist in one dimension table however in a snowflake schema you can split this single table into three different tables and all these tables are related to one another via one-to-many relationships now when you filter a specific product category the filter is propagated through the tables from product category to subcategory product and then sales as the Adventure Works example has just shown the snowflake schema offers many benefits so it’s an ideal choice for complex data structures in PowerBI here’s a quick overview of some of these benefits it simplifies dimension tables by splitting them into separate tables simplifying dimension tables also improves data integrity because hierarchical relationships more accurately represent the data and splitting data sets into separate tables also helps to reduce data redundancy because each attribute is only stored once it also enhances data analysis because a more
efficient structure means more accurate insights and finally a snowflake schema leads to better management of data using hierarchies now that you’ve explored the basics of the snowflake schema and its benefits let’s help Adventure Works build one before uploading the data set you first need to turn off PowerBI’s autodetect feature this feature automatically creates relationships between the tables but you need to do this manually to disable this feature open PowerBI desktop select file options and settings and then options within settings this opens the options dialogue box select the data load option to the left of the dialogue box then deselect autodetect new relationships after data is loaded then select okay now you can load the Adventure Works data set from the home tab select get data then select Excel workbook from the options in the drop-down menu navigate to the data set and select open the navigator menu presents a list of available tables from the data set select the following tables category product region sales salesperson and subcategory then select load the tables are loaded into PowerBI and presented in the model view you can now establish the relationships between the fact and dimension tables you can do this by dragging the primary key from the dimension table to the foreign key in the fact table for example drag the product key column from the product dimension table to the product key column in the sales fact table you can then repeat this process for all related tables in the snowflake schema next you must create hierarchies in the dimension tables to enable greater data analysis create relationships between the product table and the category and subcategory tables based on the category ID and subcategory ID respectively via one-to-many relationships this creates a hierarchy of product dimensions but what if Adventure Works has already created a star schema let’s review the process for transitioning from a star to a snowflake schema open the
PowerBI project that contains the star schema your first step is to normalize the dimension tables identify the tables in the star schema to be further normalized into related tables create separate tables and then link them using foreign and primary keys to create these tables you’ll need to use DAX you’ll explore DAX in greater detail in a later module for now let’s just use some basic DAX code select the table tools tab then select new table add the required DAX code to the formula bar to create a new category table repeat the same process with the required DAX code to create a subcategory table once you’ve created the new tables PowerBI attempts to detect the relationships between them remove any new relationships that it establishes between the tables next you need to update the product hierarchy in the dimension tables to reflect the new snowflake schema structure build a relationship between the category and subcategory tables based on the subcategory ID then build new relationships between the product and category tables based on the category ID you can now use this hierarchy to interrogate data on individual products product categories and product subcategories configuring the snowflake schema in PowerBI is a valuable skill by mastering these skills you can play a critical role in helping organizations make data-driven decisions optimize operations and drive growth choosing the right schema generates valuable data insights choosing the wrong schema generates incorrect and misleading insights so how do you select a schema in this video you’ll discover why the snowflake schema is often the most suitable schema for your data sets Adventure Works wants to use its data to generate business insights into its sales and marketing practices so it needs to structure its data in a way that enables efficient querying and analysis it considers using a star schema however the last star schema it used resulted in an overly simplified and denormalized data set so you
suggest a snowflake schema to more accurately represent and analyze the complex relationships between its data components as you discovered in earlier lessons a star schema organizes data into a central fact table this central fact table is surrounded by dimension tables containing descriptive attributes this structure is suitable for certain kinds of analysis for example it’s useful for analyzing smaller data sets however it becomes problematic when dealing with more complex hierarchical relationships this is particularly true for the Adventure Works data set by using the star schema’s denormalized approach Adventure Works risks generating results that contain redundant data and a loss of data integrity this would make it difficult to perform an accurate analysis of the data on the other hand a snowflake schema would provide a much better approach as you discovered previously the snowflake schema optimizes data storage and retrieval by normalizing the data into multiple related tables this structure provides more flexibility in defining complex dimension hierarchies and it allows for the creation of subdimensions within these hierarchies this lets analysts explore data at much deeper levels of granularity however the downside is that the increased number of tables and joins can result in slower query performance this impacts the team’s ability to derive insights and make data-driven decisions quickly the best approach for Adventure Works is to build a snowflake schema this schema uses a more normalized approach which is more beneficial for dealing with intricate data relationships it can be used to build out multiple levels of related tables in the form of a hierarchy this is much more efficient than a star schema which flattens a hierarchy into a single table you can normalize several of the tables in the Adventure Works data set for example the product dimension table can be split into two separate tables category and subcategory this structure makes it much easier to analyze the
performance of individual products and their related categories through deeper granularity customer data can also be organized in a hierarchy the team can explore customers and their purchases by country state and city this level of granularity reveals insights into regional sales patterns and marketing campaigns another benefit of this hierarchical structure is that it helps the team to identify patterns and relationships between data sets a snowflake schema also eliminates data redundancy each attribute is stored only once in its respective table and a unique identifier ensures consistent and accurate data finally the normalization of dimension tables also helps to reduce the data model storage requirements this makes the snowflake schema a much more efficient approach choosing the right schema is crucial for data analysis especially when dealing with complex data sets as the case of Adventure Works shows opting for a snowflake schema can help avoid the risks of using a star schema for hierarchical data relationships as an entry-level data analyst understanding the importance of using the correct schema for your data set is crucial by recognizing when a snowflake schema is more appropriate than a star schema you can optimize your data analysis process leading to more accurate insights and better informed decision-making you might often encounter a data model that’s unsuitable or not fit for purpose and leads to data analysis issues when this occurs you can take steps to rebuild the model and fix these issues over the next few minutes you’ll learn how to identify and resolve some common challenges arising from unsuitable data models Adventure Works uses a star schema for its data model in PowerBI to analyze sales and customer data however this data model is not effectively meeting the company’s analytical requirements Adventure Works has very large data sets and the company’s departments want to visualize this data according to their specific needs however this is
difficult to achieve with the currently employed model Adventure Works needs your help to resolve these issues and create a new more suitable data model the first step is to analyze the existing model and identify its issues some examples of common issues you could find in a data model include inferior performance issues with data consistency and limited scalability let’s begin with the issue of inferior performance the current data model might not be optimized for query performance resulting in slow report generation and analysis complex calculations based on larger data sets contribute to slow performance this makes it difficult for business users to draw real-time valuable insights from that data the sales table in the Adventure Works model contains columns like product descriptions these columns can be normalized into a dimension table for faster insights the next issue identified with the data model is inconsistent data disparate sources of data can be integrated without being properly validated for example duplicate data or incorrect data types this can lead to inaccurate reporting in your analysis Adventure Works’ data model contains multiple examples of duplicate and inaccurate data across its tables if these tables aren’t fully normalized this redundant and inaccurate data will enter the company’s reports the final issue that was identified is that of limited scalability in other words the model cannot scale alongside a company to accommodate its increased data volume and associated evolving analytical needs Adventure Works’ current model cannot integrate additional data sources emerging business requirements or analytical needs so now that you’ve completed your analysis and identified the issues you need to resolve the model’s challenges you can propose the following measures as a line of action to resolve these modeling and analytical problems the first step is to conduct a thorough assessment of the current data model and find any other issues that might
exist. Once you've identified all the issues, you can plan a redesign of the data model. You must also understand the following data model components to support meaningful analysis and decision making: the model's specific data elements and their sources, the dimension and fact tables, the relationships that exist between the model's tables, and the model's calculations and measures. Another important step is to collaborate with stakeholders and business users to define the analytical requirements and objectives to be achieved. For example, the Adventure Works sales department wants to identify the top-performing product categories for each region, and the marketing team wants to understand the impact of marketing campaigns within specific territories. Understand these analytical requirements and objectives so you can redesign a data model that implements all these requirements from the stakeholders and management team. Based on your assessment, you've decided to redesign the data model as a snowflake schema. You can complete this process by performing the following actions: normalize the dimension tables, create new tables where necessary, establish proper relationships and cardinality, create hierarchies, compute custom calculations and measures using DAX, test and validate, and document all changes. These actions will bring the following benefits to the data model: they'll improve model performance and enhance data integrity, and they'll also remove data redundancies and boost the scalability of data analysis. You then need to carry out the final few steps: transform and validate the data while also implementing data quality checks. You can also optimize the model, then test it to ensure it functions as required. Finally, deploy the new data model and train users to make sure everyone is familiar with how it works. By implementing these steps, you can help Adventure Works resolve the challenges posed by the not-fit-for-purpose data model. The newly optimized data model will meet Adventure Works's
analytical requirements, improve its data integrity, and guarantee adaptability to changing business needs. Congratulations on reaching the end of this first week in this course on data modeling in PowerBI. This week, you've explored concepts for data modeling. Let's take a few minutes to recap what you've learned in this week's lessons. You began the week with an introduction to data models. You learned how to identify the initial steps involved in data modeling, like defining relationships between tables, assigning data types, and creating calculated columns and measures. You then explored the process steps for building a data model in PowerBI. This involves connecting your data sources, preparing and transforming your data, and configuring the table properties. You also learned how to create model relationships and create measures and calculated columns with DAX, and you reviewed the benefits of data models. You discovered that data models can be used to enhance the performance of reports, improve calculations, improve analysis and insights, and deliver more accurate reports. You then explored schemas. A schema is a structure that defines the organization and relationships of tables within a data set. Three types of schema can be used to organize and structure data. The first is a flat schema; this is the simplest data model form, a set of rows and columns containing data. Then there's the star schema: a central fact table that links to multiple dimension tables, connected through relationships. And finally, there's the snowflake schema. This is an extension of the star schema; it breaks down dimension tables into multiple related tables. You first learned how to set up a flat schema. This involves removing duplicate data, formatting columns, and editing the table's properties. In the lesson exercise, you configured a flat schema for Adventure Works. You also completed an activity configuring a flat schema with multiple sources. Finally, you completed a knowledge check to test
your understanding of data models, and you reviewed links to materials for further learning in the additional resources item. The next lesson focused on cardinality and cross-filter direction. This lesson began with an introduction to fact and dimension tables. Fact tables hold quantifiable, measurable data on a business process; a fact table sits at the center of a star schema. Then there are dimension tables. Dimension tables provide descriptive attributes related to fact data; they radiate out from the central fact table. A snowflake schema extends this design: it normalizes the dimension tables by breaking them down into additional related tables. Next, you explored the concept of cardinality. Cardinality refers to how your database tables relate to one another. Your cardinality settings must be correct to ensure your insights are accurate. There are three types of cardinality in PowerBI. The first is a one-to-one relationship. In this instance, a record in one column of table A corresponds to a unique record in one column of table B. Next is the one-to-many relationship: each record in a column of table A corresponds to multiple records in table B, but not vice versa. This is the most common relationship. Finally, there's the many-to-many relationship. This is where multiple records in a column of table A are related to multiple records in a column of table B, in both directions. You can understand these relationships using cross filters. PowerBI offers single cross-filter direction and bidirectional filtering. Single cross-filter direction is the default setting; it propagates from one table to another, as in table A to table B, but not the other way. Bidirectional filtering is filtering against the direction of a relationship. This means changing the direction of the filter to both, so you can propagate the filter in the reverse direction. Another important aspect of cardinality is granularity. Granularity refers to the level of detail or depth of a data set. The granularity of your data should align with the business
questions you need to answer. Do you need high-granularity data, in the form of a data set that captures detailed information about the transactions, or low-granularity data, in the form of a data set that captures high-level summaries aggregated over broader categories? You then tested your understanding of these concepts. You completed a knowledge check to test your understanding of data models, and you reviewed links to materials for further learning in the additional resources item. In the fourth and final lesson, you learned how to work with advanced data models. The lesson began with an introduction to setting up a star schema in PowerBI. The key steps in this process involve loading the required tables, creating the relationships between the tables based on common keys, and setting up cardinality and cross-filter direction. You then completed an exercise configuring a star schema for Adventure Works in PowerBI, and you compared your result against an exemplar. Next, you learned how to set up a snowflake schema in PowerBI. The process steps are like those for setting up a star schema; the key difference is that you must create hierarchies in the dimension tables to enable greater analysis. You can also convert a star schema into a snowflake schema using DAX queries. You then put this knowledge into practice by changing an Adventure Works star schema into a snowflake schema. You continued your exploration of advanced data models with snowflake schemas. You reviewed the importance of snowflake schemas, including their key benefits, and you explored the process for resolving challenges in data models. Finally, you completed a knowledge check and module quiz to test your knowledge of the concepts you encountered. You've now reached the end of this module summary. It's time to move on to the discussion prompt, where you can discuss what you've learned with your peers. You'll then be invited to explore additional resources to help you develop a deeper understanding of the topics in this
lesson. Best of luck; we'll meet again during next week's lessons. What if you're analyzing a data model and the data you need isn't in the original model? If it's possible to derive the data from the original model, you can use DAX, Data Analysis Expressions, to create custom calculations to generate the data. In this video, you'll learn about DAX and explore the basic syntax of DAX formulas. Adventure Works needs to identify its top-selling products and calculate its revenue, but these insights are beyond the scope of the original data model; they can only be generated by calculating over the existing data. So, Adventure Works must use DAX, or Data Analysis Expressions, to complete this task. Let's begin with an overview of DAX. DAX is a programming language used in Microsoft SQL Server Analysis Services, Power Pivot in Excel, and PowerBI. It is a library of functions, operators, and constants used in formulas or expressions to create additional information about the data that is not present in the original data model. With DAX expressions, you can create custom calculations on data models to extract maximum information from your data and solve real-world problems. To master DAX, you need to understand its syntax, the different data types, the operators, and how to refer to columns and tables using functions. Let's begin with the syntax. DAX usually computes values over columns in a table, so you need to know how to reference a column in a table. First, write the name of your new calculation, then add the equals-sign operator. Next, write the name of your DAX function, then parentheses that contain the logic of your formula. Write a table name enclosed in single quotes, followed by the column name enclosed in square brackets. Omit the table name if the referenced column is in the same table. Let's demonstrate this using an example from Adventure Works. The Adventure Works Sales table doesn't include any data that denotes the total number of products sold. The company could generate this data using DAX. In the DAX expression,
Sales is the table name, Quantity is the column name to be referenced, and SUM is the DAX aggregation function. Total Products Sold is the name of the new calculated column that holds the results of the calculation. When executed, this DAX formula adds a new column to the existing table that contains the required data. Next, let's review operators. DAX formulas rely on operators, and there are many different types. They can be used to perform arithmetic calculations, compare values, work with strings, or test conditions. Some commonly used operators in DAX include parentheses for grouping arguments, arithmetic operators for performing basic functions like addition and subtraction, and comparison operators for comparing values. DAX also uses logical operators to return true or false values, and concatenation operators to combine two or more values into a single string. Adventure Works can use operators in a DAX formula to calculate its total revenue. In this example, the multiplication operator multiplies the unit price by the quantity to compute the total revenue, the parentheses group the arguments of the expression, and the SUMX DAX function adds the resulting values to calculate the total revenue. Finally, let's move on to DAX functions. DAX functions perform various calculations, manipulate data, and create custom expressions. As you discovered in an earlier example, Adventure Works needs to calculate its total revenue, and it can perform this calculation using the SUMX DAX function. For now, you just need to be familiar with the concept of functions; you'll explore functions in more detail later in this lesson. It's also important to understand that DAX is not just about formulas and functions. It involves understanding the data model, the relationships between tables, and the context in which calculations are made. For instance, understanding how the tables relate to one another in the Adventure Works data model is crucial for creating meaningful calculations.
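The two formulas just described might be written as follows. This is an illustrative sketch; the table and column names (Sales, Quantity, Unit Price) are assumptions based on the examples in this lesson, not the course's exact code:

```dax
-- Calculated column: sums the Quantity column of the Sales table
Total Products Sold = SUM ( Sales[Quantity] )

-- SUMX iterates the Sales table row by row, multiplying
-- unit price by quantity, then adds up the results
Total Revenue = SUMX ( Sales, Sales[Unit Price] * Sales[Quantity] )
```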
There are several important aspects of a relationship that will help you to understand DAX. Tables connected via a relationship are not the same: they sit on either the one side or the many side of the relationship. The columns used to build the relationship are the keys of the relationship, and the column on the one side of the relationship needs to have unique values. Table relationships can be either single-direction or bidirectional, and the direction of the relationship determines the direction of automatic filtering. Remember, mastering DAX requires practice. Start with simple formulas and gradually incorporate more complex functions and operators, and ensure you understand your data model and the relationships within it. As your comfort with DAX grows, so will your ability to turn data into meaningful insights. Eventually, you'll be able to unleash the full potential of your data using DAX and gain valuable insights for decision-making. DAX is a useful language for generating business insights using formulas. However, data analysts need to understand that DAX generates insights from data based on the context of that data. In this video, you'll explore the concepts of row and filter context and discover how they impact data evaluation in DAX. Adventure Works needs to answer business-specific questions like: what are the total sales for each product, and what are the top-selling items by category? It can generate these insights using DAX. DAX formulas answer these questions by evaluating the relevant data according to its row and filter context. Let's find out more about the relationship between DAX and context. DAX computes formulas within a context. The evaluation context of a DAX formula is the surrounding area of the cell in which DAX evaluates and computes the formula. This surrounding area is determined by the set of rows and filters to be evaluated in a DAX expression, and it determines which subset of data is used to perform calculations. DAX expressions adapt or refer to the context to produce dynamic, context-aware results. Let's
begin with an overview of row context. Row context refers to the table's current row being evaluated within a calculation. When a DAX expression is evaluated for a specific row, it considers the values of the columns in that row as the context of the calculation. This allows calculations to be performed at row level, and it's especially useful for iterating through rows within a table. For instance, if you create a formula for a calculated column, the row context for your formula includes the values from all the columns in the current row. Let's demonstrate the concept using the Adventure Works Sales table. The table contains sales data for multiple products over one month, stored within the following columns: Date, Product, Category, Quantity, and Price. Adventure Works wants to create a Total Sales calculated column that shows the total sales data for each product in the table. The company can use a DAX formula to multiply the quantity data in the Quantity column by the price data in the Price column for each item. The formula iterates through the relevant Quantity and Price column values at the row level and returns the results in the Total Sales calculated column. In other words, the formula calculates the new values via row context. Next, let's review filter context. As the name suggests, filter context refers to the filter constraints applied to the data before it's evaluated by the DAX expression. In the previous example, a different result was produced in each cell because the same DAX expression was evaluated against different subsets of data. However, with filter context, you can determine which rows or subsets should be included in or excluded from the calculation. Let's demonstrate filter context using the Adventure Works Sales table. Adventure Works must calculate the total sales for all items in category X. The company can create a DAX formula containing filters that target all sales recorded against category X. Once the formula is executed, it iterates through each row and retrieves only the data with the category value X.
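The two contexts described above might be sketched as follows. The table and column names are assumptions drawn from the example in the narration, and the literal "X" stands in for a real category name:

```dax
-- Row context: this calculated column is evaluated once per row,
-- using that row's Quantity and Price values
Total Sales = Sales[Quantity] * Sales[Price]

-- Filter context: CALCULATE restricts the rows that SUM aggregates
-- to those where Category equals "X"
Category X Sales =
CALCULATE ( SUM ( Sales[Total Sales] ), Sales[Category] = "X" )
```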
Row and filter context also interact with each other to produce results. When a DAX expression is evaluated, it first considers the filter context; then the row context takes effect. Let's demonstrate how this occurs with Adventure Works. The company can use the filter context to narrow its sales data to a selected region. The row context then iterates over each row in the filtered results and calculates the sales totals. As you've just discovered, a filter applied on a table column affects all table rows, keeping only the rows that satisfy that filter. If you apply two or more filters to columns in the same table, they are executed under a logical AND condition. This means only the rows satisfying all the filters are processed by the DAX expression in that filter context. Be careful when applying a filter in a large data model with multiple tables. A filter context automatically propagates through the relationships between the tables in the data model, based on the selected cross-filter direction of the relationships. In this example, this means that when data is filtered in the Sales Order table, data in the related tables is also filtered. You can disconnect the tables to prevent propagation. A row context, on the other hand, doesn't automatically propagate through a data model's relationships. If you have a row context in a table, you can iterate the rows of a table on the many side of a one-to-many or many-to-many relationship using the RELATEDTABLE function, and you can access the rows of the parent table using the RELATED function of DAX. Understanding the context of DAX expressions at the row and filter level is important as you continue to build data models for reporting and visualization. Context affects how DAX interprets and analyzes your data, so always consider the context when creating and executing your DAX formulas. As a data analyst, you'll often have to perform complex calculations on large data sets beyond the scope of spreadsheet software like
Microsoft Excel. In these instances, you need to utilize formulas and functions in DAX. In this video, you'll review some commonly used DAX functions and examples of formulas that use these functions. Adventure Works has experienced steady growth in recent months; however, this growth has led to data management issues. So, Adventure Works needs a better way to generate insights into its data. Fortunately, DAX formulas and functions are the perfect solution for generating these insights. Let's find out more about DAX formulas and functions, and then discover how Adventure Works can make use of them. You previously learned about operators, the building blocks for creating a DAX formula. However, there are also many common formulas and calculations performed on data; these are part of DAX's extensive library of functions. Functions are reusable pieces of logic that can be used in a DAX formula. These functions can perform various tasks, including aggregations, conditional logic, and time intelligence calculations. Data analysts can use these functions to handle complex data challenges and drive meaningful insights. To create a function, you must be familiar with the syntax: a function begins with the function name, followed by parentheses containing the function's parameters. DAX function names are typically expressed using capital letters to help differentiate them from table and column names. For example, Adventure Works could use a function to get the distinct count of rows in the customer key column in a table named Sales. DAX expressions can be difficult to write, particularly complex calculations that require nested functions, so you can use variables in your DAX formulas to simplify calculation results and store them for reuse. You can use variables to store intermediate results in a temporary location; they're like a storage box that you can put information into to be retrieved later. This improves reliability and readability and reduces the complexity of your expressions. You can define a variable
in DAX by placing VAR before your variable or expression. Follow the variables with RETURN, where the expression's result is provided. Adventure Works can create a simple formula that defines two variables to generate insights into its sales and customer data. Sales amount and customer number are variables defined to determine the total sales and the number of customers, respectively. The RETURN statement divides one variable by the other; the entire expression's result is the value after the DAX query's RETURN statement. Although DAX functions can be classified into many broad categories, there are some commonly used functions. Let's review these and discover how Adventure Works could leverage them to resolve their business problems. The CALCULATE function evaluates an expression in a context modified by the specified filters. Adventure Works can use the function to analyze total sales for a product category based on the color of the products. The company just filters the products based on a specified color, like blue. The CALCULATE function evaluates the sum of the Sales table's Sales Amount column in a modified, filtered context: a new filter is added to the Product table's Color column. Another useful function is AVERAGEX. The AVERAGEX function returns the average of an expression evaluated for each row in a table. Adventure Works can use this function to calculate the collective average for freight and tax. The function calculates the average freight and tax on each order in the Sales table: it first sums freight plus tax amount in each row and then averages those sums. You also need to be familiar with the SUMMARIZE function. The SUMMARIZE function creates a summary table by grouping data based on one or more columns. Adventure Works can use the SUMMARIZE function to generate a sales summary report displaying annual sales for each product category. This function returns the summary of sales grouped by the calendar year and the product category. The resulting table allows you to analyze the sales by year and product category.
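The formulas described in this section might look something like the sketch below. All table, column, and measure names here (Sales, Product, Sales Amount, and so on) are assumptions based on the narration, not the course's exact code:

```dax
-- Variables: intermediate results stored with VAR, combined after RETURN
Average Sales Per Customer =
VAR SalesAmount = SUM ( Sales[Sales Amount] )
VAR CustomerNumber = DISTINCTCOUNT ( Sales[Customer Key] )
RETURN DIVIDE ( SalesAmount, CustomerNumber )

-- CALCULATE: total sales in a filter context modified to blue products
Blue Product Sales =
CALCULATE ( SUM ( Sales[Sales Amount] ), Product[Color] = "Blue" )

-- AVERAGEX: sums freight plus tax per row, then averages those sums
Average Freight And Tax = AVERAGEX ( Sales, Sales[Freight] + Sales[Tax Amount] )

-- SUMMARIZE: annual sales grouped by year and product category
Sales Summary =
SUMMARIZE (
    Sales,
    Sales[Year],
    Product[Category],
    "Annual Sales", SUM ( Sales[Sales Amount] )
)
```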
DAX is a powerful language for advanced data modeling and analysis. Its wide range of functions can be combined with formulas to generate deep insight, and remember that DAX functions can be combined to create complex calculations that perform multiple operations. This versatility and flexibility makes DAX an essential tool for data analysts. You might not always be able to answer business questions using an existing data model; it could lack the required data or be too complex. In these instances, you can use calculated and cloned tables to enhance your data sets and improve your analysis. Over the next few minutes, you'll explore calculated and cloned tables and learn how to create them from different sources using DAX functions. Adventure Works needs answers to business-specific questions about its sales and marketing, but its current data model isn't up to the task. However, by creating calculated tables, the company can compare and analyze its data to generate the required insights. You can learn more about calculated and cloned tables by discovering how Adventure Works can create them using DAX functions. Let's begin with cloning a table. Cloning a table can be extremely useful for manipulating or augmenting data without affecting the original table. This is especially true when working with tables that are refreshed periodically, where any changes you made to the original table might be overwritten. For example, Adventure Works must augment its Sales table to generate insights, but it doesn't want to alter the original data. So, the company can create and work from a cloned version of the table while leaving the original intact. A table can be cloned using a simple DAX formula. Type the new table's name, an equals operator, and the original table name in parentheses. Add the word ALL to instruct PowerBI to clone all data from the target table. This formula states that the cloned table is equal to the original table. Adventure Works can use this syntax to create a clone of their Sales table, called Sales Data.
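The cloning formula just described might look like this; the table names Sales and Sales Data follow the example in the narration:

```dax
-- ALL returns every row of Sales, ignoring any filters,
-- so the new calculated table is an exact copy of the original
Sales Data = ALL ( Sales )
```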
You can also use DAX to create a calculated table based on data from various sources. For example, Adventure Works must combine customer data from a database with sales data from an Excel spreadsheet to analyze the relationship between its sales and customers. The company can use DAX to merge these sources and enable its analysis. Calculated tables can also be used to normalize dimension tables. Adventure Works can use DAX to split their Product dimension table into Category and Subcategory tables. This creates a hierarchy that enables more efficient data exploration and reporting. Now that you're familiar with creating and cloning calculated tables, let's help Adventure Works. Before we begin, let's quickly review the data model. Within our model, the Sales table is the fact table. It's connected to all other tables via one-to-many relationships, and the cross-filter direction is set to single for all relationships. We're now ready to start. The first step is to create a new calculated table using DAX. In the data view of PowerBI, select New table from the Table tools tab to expand the DAX formula bar. Select the formula bar and write an ALL DAX function that extracts all data from the Sales table to create a new cloned version of the table. Press Enter to execute the function and generate an exact copy of the Sales table. The new cloned table is listed as Cloned Sales. Next, you need to create a calculated table based on different data sets. This must be an annual sales summary table that references the Sales and Product tables from the imported data set. Select New table once again, then access the formula bar and write a DAX expression that uses the ADDCOLUMNS, SUMMARIZE, and CALCULATE functions to calculate and summarize the required data within the Annual Sales Summary calculated table. Press Enter to execute the formula and generate a new table called Annual Sales Summary. Finally, ensure you have the proper relationships set between the tables for DAX to function properly.
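A summary table like the one described might be built as follows. This is a hedged sketch: the exact grouping columns used in the exercise aren't shown in the narration, so Sales[Year], Product[Category], and Sales[Sales Amount] are assumptions:

```dax
-- SUMMARIZE groups the Sales table by year and product category;
-- ADDCOLUMNS then adds a total computed by CALCULATE for each group
Annual Sales Summary =
ADDCOLUMNS (
    SUMMARIZE ( Sales, Sales[Year], Product[Category] ),
    "Total Sales", CALCULATE ( SUM ( Sales[Sales Amount] ) )
)
```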
Review the new calculated tables and the relationships in the data pane and the PowerBI Desktop model view. Adventure Works can now begin analyzing its sales data and answering specific business questions by creating visualizations and reports using the newly calculated tables and existing data. Calculated tables are useful in DAX and PowerBI for simplifying and enhancing data analysis. You can deploy DAX functions to perform analysis without impacting the original data sets. Study these tools carefully and make them a central part of your skill set. You might often encounter tables that don't have the data you need. You can generate this data by combining existing columns to create a new calculated column. In this video, you'll explore the basics of calculated columns in PowerBI, learn how to create them using DAX, and evaluate their effectiveness in contributing to meaningful analysis. Adventure Works is analyzing the data in its Sales table and realizes there's no data for the profit margins on its product categories in the original data source. Calculated columns are the perfect solution to this problem: Adventure Works can add data on its profit margins by using DAX expressions to create new calculated columns within the original data source. Before you begin helping Adventure Works, let's find out more about calculated columns. A calculated column is a new column added to an existing data table in PowerBI. Data analysts can use calculated columns to derive new data from existing columns and add it to the data model. Once added, these columns can be used in any part of a report or visual, just like any other column. Traditional columns are filled with data imported from a data source; a calculated column is created by defining a DAX expression. You can create a DAX expression that calculates the data from two or more columns, and the result of this calculation is then added to the table as the new calculated column. Write the name of your calculated
column, then an equals operator. Then write the names of the tables to be referenced in single quotation marks and their respective column names in square brackets, and include a relevant arithmetic operator depending on the operation required. For example, Adventure Works can create a Total Sales calculated column by multiplying the Quantity and Unit Price columns in its Sales table. Now that you've explored the purpose of calculated columns in PowerBI, let's help Adventure Works to calculate its profit margin from the sales data in its Sales table by creating calculated columns. Launch PowerBI Desktop and load the Adventure Works data set. The workbook contains one table, called Sales; the table tracks Adventure Works' recent sales data. Access PowerBI's data view to view the Sales table. Adventure Works needs to calculate its profit margin, but to do this, it must first calculate its total sales for the quantity of each item sold. However, the table is missing this data. You can add this data to the table by creating a new Total Sales column; you just need to multiply the Quantity and Unit Price columns. Select the Sales table from the data pane on the right-hand side of PowerBI Desktop. In the Table tools tab, select New column from the Calculations group. This opens the DAX formula bar. Write DAX code in the formula bar that multiplies the Quantity column by the Unit Price column and adds the result as a new Total Sales column. Press Enter to execute the code. A new Total Sales calculated column appears under the Sales table in the data view, on the right-hand side of the PowerBI interface. You can use this new column in any report or visualization, like any other table column. Now that you've identified the total sales data, you can create a Profit column to determine how much profit has been made on each item. Write another DAX formula that subtracts the cost from the total sales and generates the data as a new Profit column. Press Enter to execute the formula. The new Profit calculated column is added to the Sales table.
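The calculated columns in this walkthrough might be defined as shown below. The column names (Quantity, Unit Price, Cost) are assumptions based on the narration:

```dax
-- Row-level calculated columns on the Sales table
Total Sales = Sales[Quantity] * Sales[Unit Price]
Profit = Sales[Total Sales] - Sales[Cost]
```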
Now that you've identified the profits, you can create the Profit Margin column. Select New column again, then write another DAX formula in the formula bar that divides the Profit column by the Total Sales column and generates the result in a Profit Margin calculated column. Press Enter to execute the formula; the Profit Margin column is added to the data. Finally, you need to format the calculated columns. Select the Profit column and format it as currency, then format the Profit Margin column as a percentage. You should now understand the basics of calculated columns and be able to create them using DAX and evaluate their effectiveness. Measures uncover the information hidden in your data and help you to tap into its real potential. Over the next few minutes, you'll explore measures and their importance for data analysis. You'll also explore how calculated tables are built from pre-calculated measures. Adventure Works needs to calculate its sales data for all the products it has sold this month. It also needs to ensure that this calculation can be updated monthly against new sales data. The company can generate these insights using measures. You can discover more about measures and how they function by exploring how Adventure Works uses them. Let's begin with an overview of measures. Measures in PowerBI are used to perform calculations on data model fields, and they play a pivotal role in data analysis and interpretation. Measures are used in PowerBI to perform aggregations, calculations, or evaluations on data that provide meaningful insights. Measures are typically used in data visualization elements; examples of these elements include charts, tables, and cards. By using measures, you can compute aggregated values such as sums, averages, minima, maxima, counts, or more complex statistical calculations. Measures in PowerBI offer several benefits in data analytics and reporting. Let's explore some of these benefits. Measures are calculated in the context of the visualization or report
they are used in. This means they are dynamically updated based on filtering and other interactions within the report. In other words, if the context changes, then so does the measure. This dynamic calculation allows you to dive deeper into data and gain insights from different angles and perspectives. Measures are also reusable: once created, you can continue to recall them in your code. This reduces the repetitive work of creating the same calculations and ensures data consistency across all reports. Another benefit is performance. Measures can be used to track the performance of different aspects of a business; they are commonly used to create key performance indicators, or KPIs, which are essential for monitoring business performance. KPIs provide a quick snapshot of performance against predefined targets or benchmarks. And finally, measures also help to maintain consistency. Measures help maintain consistency in metrics across different visualizations and reports. Consistency ensures the same results show regardless of filtering or grouping. In your measures, your calculations must be standardized and uniformly applied throughout the analysis; this ensures accurate and reliable reporting across various visualizations and dashboards. Measures can also be used to create calculated tables in PowerBI. A calculated table is a table that you add to a model, derived from existing tables by using a DAX formula. Adventure Works has created a measure called Total Sales; this measure is the sum of all sales across all products. Now the company needs a new product table that lists each product alongside its respective total sales. This can be done using a DAX formula. In this DAX formula, Sales is the original table, Sales[Product] is the product column in the original table, and Total Sales is the measure Adventure Works created. Let's take a moment to explore a sample of the syntax used to create such a formula. Begin with the name of your new measure, followed by an equals operator; then add the required expression that contains the logic of your measure.
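Put together, the measure and the calculated table built from it might look like the following sketch. The column names (Sales Amount, Product) are illustrative assumptions:

```dax
-- Measure: the sum of all sales across all products
Total Sales = SUM ( Sales[Sales Amount] )

-- Calculated table: one row per product, with the measure
-- evaluated in each product's filter context
Product Sales =
SUMMARIZE ( Sales, Sales[Product], "Total Sales", [Total Sales] )
```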
that contains the logic of your measure for example Adventure Works can create a new measure called total sales that calculates the total sales amount from the sales table when executed this DAX formula will list each product and its total sales creating calculated tables from pre-calculated measures is particularly useful for creating a summary table from large data sets or for creating a table with data that does not exist in the original tables this can enhance data analysis and visualization capabilities in PowerBI in this video you have learned about measures and their importance in data analysis you are also able to explain how calculated tables are built from pre-calculated measures measures in Microsoft PowerBI are essential to data analysis and interpretation they offer dynamic reusable and complex calculation capabilities enabling businesses to gain insights from their data and make data-driven decisions effectively and efficiently as a data analyst you want to be able to provide your business with answers and solutions to the questions they are asking using measures you can gain valuable insights into your data drive strategic decisions and enhance your business’s performance over the next couple of minutes you’ll explore the different types of measures in PowerBI adventure Works is using different types of measures to prepare its annual sales report to compile this report it must analyze its sales data across different regions and generate insights into specific products and sales team members let’s explore the different types of measures Adventure Works can use to prepare its report before we explore measures let’s quickly review the concept of additivity additivity refers to how measures behave when aggregated across different dimensions for example summing or averaging values however not all measures behave the same way so understanding the behavior and categorization of measures is crucial for accurate data analysis and visualization in PowerBI measures
are essential for performing quantitative analysis and deriving meaningful insights from the data they provide a way to summarize calculate and compare data across various dimensions based on specific criteria and business requirements measures can be categorized into three types additive semi-additive and non-additive let’s explore these types of measures in more detail additive measures facilitate data aggregation across any business dimension like time geography or product categories the basic mathematical operations applied to these measures are addition and subtraction these types of measures provide consistent results regardless of how you group data additive measures also use the sum DAX function to aggregate over any attribute for example Adventure Works’ monthly sales analysis report shows revenue and quantities sold by product category and region this data is for a specific unit of time in this case per month you can use additive measures to aggregate revenue and quantity sold by summing them across all dimensions this allows you to view the total revenue and total quantities sold while analyzing the performance of various products regions and months of the year next is non-additive measures non-additive measures cannot be meaningfully aggregated across any dimension these measures involve calculations like ratios averages and percentages the result of aggregating a non-additive measure can be skewed or misleading and should be handled with caution for example at Adventure Works the average sales per customer is a non-additive measure the average sales per customer in January is $300 and in February it’s $350 however it doesn’t make sense to add these averages and state that the average sales per customer for the two months is $650 instead calculate the total sales and total number of customers for the two months combined then divide the total sales by the total number of customers to obtain the correct average sales per customer for the period
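As a rough sketch of the non-additive pattern just described, the correct approach is to divide totals rather than sum the monthly averages. The table and column names here (Sales, SalesAmount, CustomerID) are assumptions for illustration, not names from the source:

```dax
-- Hypothetical model: a Sales table with SalesAmount and CustomerID columns
Total Sales = SUM ( Sales[SalesAmount] )
Total Customers = DISTINCTCOUNT ( Sales[CustomerID] )

-- DIVIDE guards against division by zero; the ratio is re-evaluated in
-- whatever filter context applies, so it stays correct for January,
-- February, or both months combined
Avg Sales per Customer = DIVIDE ( [Total Sales], [Total Customers] )
```

Because the ratio is computed from the filtered totals each time, filtering the visual to January, February, or both always yields a meaningful average rather than a sum of averages.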
let’s explore semi-additive measures semi-additive measures can be aggregated over some but not all dimensions they’re mostly used in situations where the data represents a state at a particular point in time they have meaningful aggregation for certain dimensions but not for all like with additive measures semi-additive measures use sum to aggregate over some dimensions and a different aggregation over other dimensions examples of semi-additive measures that Adventure Works uses include inventory balance and current account balance adventure Works has created a measure called inventory at hand it uses this measure to add inventory across different product categories or store locations but the measure can’t be used to add up the inventory across time like the change in inventory over a two-month period this is because it’s semi-additive for example Adventure Works had 50 bicycles in stock at the end of January and 60 at the end of February but it would not be accurate to say that it had 110 bicycles in stock for the two months the stock level changed over this period it wasn’t a fixed unit or measurement you should now be able to identify and distinguish between the different types of measures in PowerBI each of these measures plays a unique role in generating insights and guiding decision making as always with data analysis it is vital to remember that the value lies not just in the numbers but in their correct and thoughtful interpretation as a data analyst you’ll often have to identify trends from raw data supported by empirical evidence this sounds like a complicated task but you can make it easier by using statistical functions in this video you’ll explore the most common statistical functions used in measures and explore examples of each one adventure Works needs to identify trends in its business from raw data the company can use several basic statistical functions to generate these insights exploring Adventure Works’ use of these functions is a great way to
understand how they work but first let’s begin by understanding what data analysts mean by statistical functions statistical functions calculate values related to statistical distributions and probability they also allow you to perform calculations and comparisons that reveal meaningful information about the data when it comes to quantitative data analysis statistical functions are the lifeblood of the process these functions enable in-depth analysis by providing insights into your data trends patterns and relationships some common statistical functions you’ll make use of include average median and count there’s also distinct count min which calculates the minimum and max which calculates the maximum let’s start with the average function also known as the mean this function sums up all the numbers in a data set and divides the result by the total count of numbers this function is frequently used to identify a central tendency in a data set it is beneficial when you need to find the middle ground or commonality within data for example Adventure Works can use the average function to identify its average sales amount the company can create a calculation to generate this data using the average function sales is the name of the table that contains the sales data and sales amount is the column that contains the numbers for which it wants the average the next statistical function is the median function this function calculates the middle value in a set of numbers it sorts the numbers in ascending order and then selects the middle number the median is the average of the two middle numbers for data sets with an even number of observations unlike the average the median is less affected by outliers and extreme values this makes it useful for data sets with skewed distributions for example Adventure Works needs to compute median response times for its customer service team with this data the company can measure the team’s performance and identify areas of improvement the data
set contains the support table with the response time adventure Works can apply the median function to compute the median value support is the table name response time is the column containing the numbers for which the company requires the median which is response time in this case only numeric data types are supported in this function dates logical values and text columns are not supported next let’s explore the count function this function counts the number of rows in a column or a table it is often used to measure the size of a data set you can use it to count all rows or only rows that meet specific criteria the only argument in the function is column when the function finds no rows to count it returns a blank for example Adventure Works needs a report containing sales of product categories to generate this report it needs to analyze the count of sales for each product category it can use the count formula to calculate this category is a column name that contains values to be counted next let’s look at the distinct count function this function counts the number of distinct values in a data set this function is helpful when you need to understand the count of unique values or categories the only argument allowed for this function is a column you can use columns containing any type of data when the function finds no rows to count it returns a blank otherwise it returns the count of distinct values adventure Works needs to analyze the number of unique daily visitors to its website this data is stored in a website table containing a visitor ID column adventure Works can use distinct count to compute the number of unique visitors website is the table name for reference visitor ID is the column name that contains the values to be counted lastly let’s examine the min and max functions the min function is used to identify the smallest value in a column or between two scalar expressions the max function is used to identify the largest value in a column or the larger 
value between two scalar expressions both min and max functions can provide an overview of the range of your data adventure Works can use these functions to analyze its store inventory the min and max functions identify the minimum and maximum product quantity from the inventory table using the quantity column inventory is the name of the table quantity is the name of the column that contains the values to be evaluated you should now be familiar with the most common statistical functions used in measures and be able to make use of them mastering these functions will undoubtedly elevate your data analysis skills do you want to create custom calculations for tables columns and measures you can create custom calculations by using DAX over the next couple of minutes you’ll learn about context and how it impacts DAX measures you will also examine different scenarios where measures are presented in various ways adventure Works wants to analyze its sales data determine which customers make the largest purchases and compute stock in hand across all stores in an inventory management scenario at this stage of the course you should be familiar with the concepts of DAX measures and contexts you’ll often create measures in the form of custom calculations but these custom calculations are context-sensitive it’s important to understand the influence of context because it can result in variations in your calculations these variations are based on the level of data you are evaluating the model structure and the visual you are using to represent it an understanding of context and variation helps deliver accurate data analysis and provides business intelligence to key stakeholders let’s recap the basics of context context in DAX comes in two primary forms row context and filter context row context is the current row being evaluated in an expression like racing bikes in the Adventure Works data set in contrast when you build reports in PowerBI you can filter the report data which
results in DAX using the filter context this is the subset of data the calculation operates upon influenced by visuals or report filters for adventure Works it could be all cross-country bicycles sold in North America now let’s explore the impact of context on DAX to understand how the use of context in DAX measures can influence business decisions adventure Works wants to analyze and present a report on annual total revenue the company can use the sumx DAX formula to compute the sum of all the quantity values multiplied by the unit price in the sales table by applying this measure to the sales table the formula computes the sum of all sales amounts but this measure utilizes only the row context adventure Works needs more insights to drive key decisions through data for example it must understand which products are selling the best to improve warehouse stock management and impact marketing decisions to identify the best performing product categories Adventure Works can filter the data set using a DAX query this query determines the total sales for products under the bikes category it incorporates filter context created by the category column from the product table in addition to the row context adventure Works also needs to determine which customers make the largest purchases first the company must determine the average purchase amount using the average DAX function it can calculate the average sales amount per customer by applying this measure to the sales data set to compute the measure for the customer with the highest purchases you need to define a logic based on customer ID customers with a total sales amount of $2,000 and above are classed as high purchase customers and those who spend less than $2,000 are average purchase customers in this case the customer ID is now acting as a filter context to compute the measure this instructs the sales
and marketing team which customers to target in their campaigns you should now be familiar with the impact of context on DAX the context-sensitive nature of DAX is a powerful feature of PowerBI it enables dynamic calculations based on the context in which the DAX computes the formula understanding how context impacts DAX allows users to create more accurate insightful and dynamic reports that can be tailored to specific business scenarios PowerBI is very effective for generating insights but writing DAX code to analyze data takes time fortunately you can create calculations and measures faster using PowerBI’s quick measures feature over the next few minutes you’ll explore the concept of quick measures learn about the different types available and review the process for creating them in PowerBI adventure Works wants to quickly analyze and monitor the performance of its sales team against several key performance indicators but constantly rewriting the same DAX code for each performance review is time consuming adventure Works can speed up the process using PowerBI’s quick measures feature let’s learn more about how quick measures work in PowerBI so you can help Adventure Works as you’ve just learned quick measures are a useful technique for performing commonly used calculations quickly and easily a quick measure runs a set of DAX commands behind the scenes then presents the results as a new measure you can use in your reports and visualizations in other words you don’t have to spend time writing DAX code the measure does it for you based on the inputs you provide there’ll still be times when you need to write DAX expressions for specific business case scenarios but quick measures can still act as a good foundation many different categories of DAX calculations are available to work with and you can modify these calculations to meet your specific analytical needs when creating quick measures in PowerBI you can choose calculation types depending on the nature of the
analysis you want to perform types of quick measures include aggregate per category filters and time intelligence there are also totals mathematical operations and text quick measures in PowerBI offer several benefits for data analytics and reporting you can use quick measures to generate commonly used calculations with just a few clicks this eliminates the need to write DAX expressions making the process more efficient another benefit is accessibility you can create quick measures using PowerBI’s user-friendly interface this accessible UI means even users with limited DAX knowledge can create calculations quick measures also help with data ownership they empower business users to take ownership of their data analysis and reporting this simple and accessible tool for creating calculations reduces dependency on data experts and quick measures also offer flexibility to iterate and refine calculations if you need to adjust a calculation or explore alternative metrics you can easily modify your quick measures without affecting the underlying data now that you’re familiar with the basics of quick measures let’s help Adventure Works use them to track the performance of its sales team before we begin let’s quickly review the model you’ve launched PowerBI connected to your data sources and loaded transformed and configured the following tables for your model products region sales and salesperson now you can begin creating measures in PowerBI the first step is to select the report view or data view to access the calculations group within this group select quick measure the quick measures window appears on the screen choose the required calculation type and fields to run the calculations alternatively you can select the ellipses next to the table name on the data pane then select new quick measure from the drop-down menu remember that the measure is created by default in the table you have selected from the data pane on the right side of the window choose select calculation this action
opens a list of available calculation types in PowerBI adventure Works must calculate what quantity of each product each team member has sold so choose total for category filters applied next you must select the required fields from the right pane to perform calculations select the sales column from the sales table and assign it as the base value then select the category column from the product table and assign it to the category section then select add to add these elements to the measure the new quick measure appears in the fields pane and the underlying DAX formula appears in the formula bar adventure Works also needs to know how much revenue each team member has generated this year you can calculate this using a year-to-date sales measure to create this sales measure you can repeat the same process as before select quick measure from the measure tools tab then select the year-to-date total calculation type then select the sales column from the sales table as the base value and the order date column from the product table in the date section finally select add a new measure called sales YTD appears in the fields in the data pane thanks to your help Adventure Works can now quickly track the performance of its sales team using quick measures and you should now understand the importance of quick measures be familiar with the different types available and be able to create them in PowerBI measures are PowerBI features that let you explore your data to create meaningful reports and visualizations in this video you’ll learn how to create custom measures with DAX adventure Works needs to analyze its sales data to calculate its total sales and identify the top two best-selling products in each category and region you can use DAX calculations to create custom measures to help Adventure Works generate these insights custom measures refer to user-defined calculations or metrics created using DAX like traditional measures custom measures also generate insights about data
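For reference, the DAX that quick measures of this kind produce behind the scenes looks roughly like the following. This is a sketch, not the exact code PowerBI generates, and the table and column names (Sales, 'Product', Category, OrderDate) are assumptions based on the model described above:

```dax
-- Sketch of a "total for category (filters applied)" style quick measure:
-- total sales for the whole category, keeping the user's other filters
Sales total for Category =
    CALCULATE ( SUM ( Sales[Sales] ), ALLSELECTED ( 'Product'[Category] ) )

-- Sketch of a year-to-date total quick measure: sales accumulated from
-- the start of the year up to the date in context
Sales YTD =
    TOTALYTD ( SUM ( Sales[Sales] ), Sales[OrderDate] )
```

Inspecting the generated formula in the formula bar like this is a good way to learn DAX patterns from the quick measures PowerBI writes for you.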
let’s create custom measures to help Adventure Works generate insights into its sales data before we begin let’s quickly review the company’s data model you’ve launched PowerBI connected to your data sources and loaded transformed and configured the following tables in the model: products region sales and salesperson so within our model the sales table is the fact table it’s connected to all other tables via a series of active one-to-many relationships and the cross filter direction is set to single for all relationships we’re now ready to start creating measures the first step is to create a new measure called total sales using DAX in the data view of PowerBI select new measure from the table tools tab to expand the DAX formula bar type total sales as the name of your new measure be aware that any new measure added to the DAX formula bar is named measure by default if you don’t rename the measure all new measures are named measure one measure two and so on give your measures unique names to be easily identifiable particularly when creating several measures write the total sales measure using the sumx function to multiply the unit price and quantity columns from the sales table when you enter your formula a list of suggested functions appears after you type the equals operator you’ll need to ensure that you understand the functions on this list and that you select the relevant one for your calculation and once you reference a table or column name PowerBI displays a drop-down list of available tables and columns within your data model select the correct field when choosing a reference from the drop-down list to ensure your chosen measure functions as required press enter to execute the function and generate the new total sales measure you can view the new measure within the table you selected under the data pane on the right-hand side of the PowerBI desktop interface next you must create a measure that identifies the number one and number two top selling products in
each category you can use the total sales measure to create another new custom measure select new measure to expand the formula bar and write a measure called top two products the measure begins with a variable that defines the ranking of products using the DAX values function the return section returns the value with the required calculation the calculate function filters the results of the total sales measure based on the top two products the top function defines the top products based on their respective sales it uses the number two to represent the top two products this is a dynamic measure that you can use to present the number one and number two top selling products by product category color or region press enter to execute the function when executed the function displays the results of the measure in a matrix or table that shows the total sales amount for the top two performing products in each category you can dig deeper into the data by working through different business years thanks to your help Adventure Works now has the insights it requires and you should now be able to create custom measures with DAX this is a valuable new skill you’ve learned when used correctly you can deploy dynamic calculations to generate insights quicker there may be times when you encounter a data model with a cardinality and cross filter direction configured making it impossible to perform the necessary filters with the cross filter function you can change the cross filter direction for a specific measure while maintaining the original settings in this video you’ll develop an understanding of the cross filter function its syntax and its relationship to measures adventure Works needs to analyze its sales performance for the previous few years along with the performance of its sales team however its data model tables are connected via one-to-many relationships and single cross filter direction this prevents the company from filtering the data as required and changing the cross
filter direction to both results in a permanent change fortunately Adventure Works can use the cross filter function to alter the direction while maintaining the original settings let’s explore how this works as you’ve just discovered the cross filter function changes the cross filter direction between two tables for a specific measure while maintaining the original settings in other words it specifies the cross filtering direction to calculate a relationship between two columns so how do you create a cross filter function a cross filter function can only be used within a DAX function that accepts a filter as an argument like the calculate function for example this means that the function receives two arguments the name of the table you want to filter along with the required column and the direction in which you want to filter let’s explore an example the syntax begins with the cross filter function the arguments are then placed in parentheses the arguments are the name of each table followed by the names of the required columns in square brackets the first column name is typically the many side of the relationship and the second is the one side finally add the filter direction for example Adventure Works could filter between both sides of the relationship on its sales and products table using the product key columns common to both you might be familiar with cross filter directions from earlier in this course here’s a quick recap of the possible directions in which you can filter the relationships in your model you could use none which means that no cross filter occurs within the relationship there’s also the one-way direction filters applied on one side of a relationship propagate to the other however you can’t use the one-way option with a one-to-one relationship next is one-way right filters left in this instance filter propagation occurs from the right side to the left
side of the relationship and finally there’s one-way left filters right in which filter propagation occurs from the left side to the right side of the relationship let’s review an example of how adventure Works can make use of the cross filter function in the adventure Works data model the sales fact table is related to the dimension tables via one-to-many relationships and single cross filter direction this means that filters propagate from the product table to the sales table but not in the other direction so when Adventure Works analyzes products sold by year the results aren’t accurate because the model can’t filter the results correctly you could try to resolve this issue by changing the cross filter direction between the tables to both but this also changes how the filters work for all data between these tables instead you can create a cross filter function using DAX to change the filter only for the current measure create a new product-by-year measure that computes the total number of products sold the distinct count function calculates the number of distinct values in the product key columns between the sales and products tables and the cross filter function alters the cross filter direction from single to both based on this column once Adventure Works analyzes the measure based on the year column from the date table the results are accurate according to the business analytical needs you should now be familiar with the cross filter function and how it works cross filter is a useful function to change the direction of a relationship without changing the relationship itself this function creates visualizations with custom filtering depending on the business needs you’ll often create measures that generate answers to specific data questions but what if you need your measure to answer another question you can use the calculate function to refocus your measure in this video you’ll learn how the calculate function can alter the filter context for measures adventure Works
needs to analyze its total sales for all its products it also needs to generate more granular data including sales of bikes blue colored products and sales within the US region it can calculate the total sales for all products using a standard measure but insights into the other data will require more specific filters adventure Works can use the calculate function to change the filter context and generate these insights let’s learn more about how quick measures work in PowerBI so you can help Adventure Works changing the context of a filter means changing the data that the filter must analyze for example Adventure Works needs to create a calculation or measure that analyzes its total sales for all its products this is the original filter context once this calculation is completed the company needs to explore its data in more granular detail by identifying how many bicycles it has sold it can combine the original total sales measure with a new bike sales measure that generates insights into how many bicycles have been sold so the filter context changes from all products to all bikes before you review some examples let’s review the syntax of the calculate function the calculate function can be invoked with an expression as its first argument a set of filters in square brackets then follows the expression these filters are defined or modified by expressions to find out more about how this works let’s explore how Adventure Works make use of the function adventure Works first needs to calculate its total sales the company can create the total sales measure using the sumx function the measure must multiply the sales table quantity and unit price columns this measure uses row context and iterates over each row of the sales table to compute the total sales of products for Adventure Works adventure Works can continue to use this measure in all the other calculations it needs to complete now that Adventure Works has a generic measure of total sales it can refocus its filters 
to generate insights into bike sales adventure Works can create a new measure called bike sales that uses calculate to analyze the sales of products in the bikes category when the category bikes is executed the formula calls the total sales measure again however this time it adds the bikes product category as an additional filter in the filter context in other words the filter context changes from all products to all bikes next Adventure Works needs to analyze all blue colored products in each category the company can write a new measure called sales of blue products when executed the expression incorporates the blue color from the product color column as an additional context for this calculation it calculates the total sales of blue color products from the entire data set you can also specify multiple filters in the same calculate function all the filters intersect regardless of the order in which they appear for example Adventure Works can create a measure called sales of blue products in USA that computes the total sales of blue products in the USA region this measure calculates the total number of blue products sold only in the United States by adding the country column from the region table in the overall filter context of the calculation but what if you’ve already created filters on these columns any existing filters will be overridden by those in your calculate function so how do you retain both sets of filters you can use calculate modifiers to keep the behaviors that already exist in your columns an example of a calculate modifier is keep filters you can add keep filters before your argument while placing the argument in parenthesis this ensures that existing active filters on your columns are not overridden or merged with new filters other examples of calculate modifiers include cross filter all and use relationship you’ll explore these modifiers in more detail later in this lesson you should now be able to use the calculate function to alter the filter 
context of your measures so you can create measures to generate insights into your data and modify your measures filters to ask and answer other questions about your data as a data analyst unlocking fresh insights requires exploring data from multiple angles with role-playing dimensions you can explore your data from different perspectives and eliminate the need for redundant data structures through active and inactive relationships in this video you’ll explore the concept of role-playing dimensions and active and inactive relationships adventure Works receives thousands of orders from all over the world and it’s important that the company continually analyzes its orders to avoid delayed or mistaken deliveries it can use multiple dimensions to explore its order related data from multiple angles let’s find out more about role- playinging dimensions by exploring how Adventure Works makes use of it in the context of PowerBI dimensions represent the various attributes or business entities used to organize data role-play dimensions are instances of the same dimension used multiple times in a data model each instance plays a unique role by representing different aspects of the data this provides the flexibility to analyze data from different viewpoints without duplicating data tables let’s demonstrate this with an example from the Adventure Works database adventure Works sales and shipping departments operate in sequence first new sales are recorded in the sales data set as order date then the orders shipping date is recorded in the sales data set finally the system automatically generates a delivery date when the customer receives the product so in Adventure Works sales data set the date dimension is used three times for new sales shipping dates and receipt dates adventure Works can analyze sales performance by order and shipping date without needing separate tables optimizing delivery time by delivery date analysis this helps the business to analyze sales performance 
based on order date and shipping date without creating separate tables for each date type when Adventure Works queries its data the role of the date dimension is based on the fact column used to join the tables for example the table join relates to the sales order date column when analyzing sales by order date an important part of role-playing dimensions is active and inactive relationships an active relationship is a relationship between two tables used for analysis reporting and visualization an inactive relationship is a valid relationship not being actively used in the current analysis to differentiate between active and inactive relationships PowerBI marks active relationships with a solid line and inactive relationships with a dotted line let’s examine an example from Adventure Works in the Adventure Works data model the date and the sales tables have three relationships however there can only be one active relationship between two PowerBI model tables all remaining relationships must be set to inactive a single active relationship means there is a default filter propagation from the date to the sales table the active relationship is set to the most common filter used by the company’s reports which is the order date relationship you can utilize the inactive relationship for specific analytical needs using the DAX use relationship formula so how do active and inactive relationships relate to role-playing dimensions here’s a quick demonstration of how these concepts function in the Adventure Works database let’s begin with creating a role-playing dimension after importing sales and date tables you can create two relationships between them one for order date and another for shipping date by default the first relationship is active and the second is inactive the date table serves as a role-playing dimension for both order and shipping date any analysis reporting and visualization you require can make use of this active relationship occasionally you’ll need to
analyze data from a unique perspective for example Adventure Works needs to calculate its total sales based on the shipping date however the shipping date is an inactive relationship so using this calculation requires a measure to create such a measure an inactive relationship needs to be employed this is where the DAX function use relationship comes in to use the shipping date’s inactive relationship create a measure using use relationship for instance to calculate the total sales based on the shipping date you can create a DAX formula calculate is used here to alter the filter context of the entire measure sum is summing up the sales amount column of the sales table as the sales table is connected to the date table via the order date column by default each DAX calculation is based on the relationship between the tables the use relationship function in DAX overrides the relationship and establishes a temporary relationship based on the shipping date column of the sales table the inactive relationship the relationship becomes active only for the current calculation this formula forces PowerBI to use the inactive shipping date relationship for the calculation role-playing dimensions and active and inactive relationships in PowerBI create an efficient data model for comprehensive analysis although it might take some time to get used to these concepts they will prove invaluable as you navigate your PowerBI journey as a data analyst you’ll often encounter table relationships that are difficult to perform analysis with fortunately you can alter or manipulate table relationships to facilitate more efficient analysis using the use relationship function over the next few minutes you’ll explore the use relationship function its syntax and its application Adventure Works needs to analyze its sales data based on the shipping date it could create a calculated table for the shipping date and relate it to the sales table this might work well for a smaller data set but Adventure Works has
millions of shipping records a more effective approach is for Adventure Works to use the use relationship function to create a measure that utilizes the inactive relationships between the tables before we explore how Adventure Works can analyze its sales data let’s find out more about the use relationship function the use relationship function is used within the calculate function it forces the inactive relationship between the tables to be used for the considered calculation this lets you switch contexts within your data model without changing the default relationship between the tables it’s most useful when there are multiple relationships between two tables the function allows you to create context-aware calculations that can analyze data based on different date dimensions or adjust analysis based on a different category of products the advantage of use relationship is that it enables you to perform analyses using different relationships available between the related tables without affecting the overall structure of the data model now that you’ve explored how the use relationship function works let’s review the syntax begin with the function and then place your argument in parentheses the argument is the names of the required tables and their respective columns that define the relationship the order of the columns doesn’t matter for an accurate calculation this function doesn’t return a value but modifies the context of a calculation this changes the table relationships meaning that there is no scalar value or table returned as the function is executed instead it changes the context by overriding the relationship between tables let’s return to the Adventure Works data model to explore the syntax in action as you discovered earlier the Adventure Works data model has a sales fact table and a date dimension table the data model’s current active relationship is from the sales table’s order date column and the date table’s date column as no shipping date dimension table exists
in the data model Adventure Works needs to create an additional relationship between the sales fact table and the date dimension table using the sales table’s shipping date column by default the active relationship is utilized for any analysis and visualization however there may be a requirement to calculate the total sales using the shipping date to do this it can use the use relationship function within the calculate function first Adventure Works creates a sales by shipping date measure then it inputs the calculate function followed by the required argument in parentheses in this argument the sum expression calculates the total of the sales amount column from the sales table the use relationship function changes the context of this calculation by switching the active relationship from the sales table’s order date column and the date table’s date column to the sales table’s shipping date column and the date table’s date column when executed this calculation results in multiple relationships between these tables an active relationship with the order date and an inactive relationship with the shipping date this affects only the calculate function where it’s used it won’t permanently alter the active relationship let’s review some important points to remember when working with use relationship use relationship only works within the calculate and calculate table functions if you try to use it elsewhere you will receive an error the use relationship function can be used multiple times within a single calculate function to switch multiple relationships the relationship must exist in the data model but it doesn’t have to be active the use relationship function provides flexibility to derive insights from different perspectives within a data model this provides a layer of flexibility to PowerBI making it an essential function for data analysts to master it can be challenging for a data model to handle various roles for a single dimension so analysts deploy the use
relationship function in their calculations to configure role-playing dimensions in this video you’ll learn how to configure a role-playing dimension in PowerBI using calculate and use relationship Adventure Works wants to analyze its sales data based on the shipping date instead of creating a separate date dimension table it can use the use relationship function in DAX to role-play dimensions help the company achieve this by launching PowerBI desktop and loading the Adventure Works data set the data model contains two tables called sales and date the sales table tracks Adventure Works recent sales data access PowerBI’s model view to view the sales and date tables however after loading the data the model is missing the relationships you can establish the relationships between the sales and date tables in the model view of PowerBI select and drag the order date column from the sales table to the date table this is the active relationship between these two tables next select and drag the shipping date field from the sales table to the date column of the date table this is an inactive relationship represented by a dashed line you can validate the relationship by selecting the connector line between the tables and double-clicking it this opens the edit relationship dialogue box you can observe that the checkbox make this relationship active is unchecked next you need to create the measure total sales by shipping date in the home tab of data view select new measure from the calculations group this opens the DAX formula bar write DAX code in the formula bar that uses the use relationship function to create a custom relationship between the date column of the date table and the shipping date column from the sales table press enter to execute the code a new total sales by shipping date measure appears under the sales table in the data pane on the right hand side of the PowerBI interface you can use this new measure in any report or visualization to analyze monthly sales data based on the
shipping date you should now be familiar with the process for configuring a role-playing dimension in PowerBI using calculate and use relationship by now you should be familiar with methods for generating insights into your data but the most powerful and effective data insights you can generate are time-based in this video you’ll explore the concept of time intelligence and discover its importance by reviewing some scenarios where it can be applied over at Adventure Works the company is preparing its sales strategies and marketing campaigns for the year ahead as part of its preparation it needs to generate insights into time related data like seasonal trends annual growth and specific sales periods Adventure Works can generate insights into these time-related aspects of its business by using time intelligence functions as the Adventure Works scenario suggests time intelligence functions refer to methods and processes that aggregate and compare data over time data analysts can deploy time intelligence functions to analyze data based on time related dimensions time related dimensions include dates weeks months quarters and years you can also generate comparisons of time related data over annual periods and year-to-date or YTD so why do data analysts view time intelligence as important time intelligence provides the ability to analyze data within the context of time this enables a more in-depth understanding of trends and patterns as the earlier Adventure Works example demonstrates this data plays a significant role in a business’s ability to generate insights to help with its planning forecasting and decision-making processes let’s explore a few other benefits of time intelligence time intelligence is useful for trend analysis identifying trends in past business performance is crucial for future decisions for example Adventure Works can use time intelligence data to examine historical sales trends and recognize if certain products sell better at specific times
of the year identifying trends in past business performance is crucial for future decisions insights derived from time intelligence also help with forecasting and predictive analysis Adventure Works can forecast future trends and plan activity based on historical trends it can make informed predictions about sales and demand which helps with resource planning budgeting and risk management for instance if the data shows a consistent increase in mountain bike sales every spring the company can ensure adequate inventory before the season starts time intelligence also enables real-time performance monitoring this is possible by creating dynamic measures like year-to-date or YTD and month-to-date or MTD Adventure Works can use these measures to monitor real-time performance against key performance indicators the company can then use these insights to respond quickly to changing conditions time intelligence calculations facilitate comparative analysis an example of this is year-over-year or YoY functions Adventure Works can compare its current growth rate sales performance and other metrics against data from previous years to analyze its progress time intelligence also facilitates the optimization of sales and marketing strategies Adventure Works can analyze its sales trends and the impact of its marketing efforts over time it can then use the results of these analyses to fine-tune its marketing strategies and sales tactics to improve its results now that you know its benefits your next question might be how do I use time intelligence implementing time intelligence involves creating calculated fields and measures to analyze data over time you can use PowerBI’s automatic time intelligence features or deploy DAX formulas to create quick measures PowerBI offers an auto date time feature that allows easy data analysis by year quarter month and date this is useful for smaller data models PowerBI automatically creates one date table for each date column in the data model to
analyze data by different date attributes this table is hidden from the user because PowerBI handles it automatically you can also use custom DAX calculations to shape your data model and implement time intelligence calculations with more complex and non-standard requirements time intelligence is essential for understanding and visualizing time related trends and patterns in data as a PowerBI developer mastery of time intelligence calculations is key to generating meaningful information from your data summarizing data over a specific period is a key skill for data analysts time-based data can generate temporal insights and trends within data in this video you’ll review the importance of using DAX based time intelligence functions to summarize data over time over at Adventure Works the company needs to generate insights into its recent sales trends the insights it requires include revenue growth seasonal sales patterns and the impact of marketing campaigns Adventure Works can generate these insights by using time intelligence functions in DAX to summarize its data over time so what does it mean to summarize data over time at its core summarizing data over time is identifying trends patterns and anomalies in business performance over a specific period like sales per quarter or annual growth you can generate these insights by using time-based data summarization functions some frequently used examples of these functions include total year-to-date dates year-to-date and dates between each function generates insights into different aspects of your data the functions are written by stating the function name and the required arguments in parentheses this basic structure is similar across all functions but the syntax for the arguments varies and some functions must be combined with calculate and other functions let’s begin with the year-to-date calculation the year-to-date calculation or YTD aggregates values from the beginning of the year to the specified date for example all sales from January 1st
of that year to the specified date the total year-to-date function requires two mandatory and two optional arguments expression is the first mandatory argument it calculates the total sales from the sales table dates is the date column we use PowerBI’s default date dimension in the current lesson filter and year-end date are optional parameters for example Adventure Works wants to evaluate its real-time sales performance call the expression sales year-to-date and add the total year-to-date function after the equals operator in your first parameter reference the total sales column from the sales table and aggregate the values using sum in the second parameter reference the order date column from the sales table then add another date field in square brackets when you type the date field PowerBI allows you to select a field from the table next let’s review the dates year-to-date function this function returns a running total in the form of a single column table containing year-to-date or YTD dates in the current filter context this function is part of a group that also includes the dates MTD and dates QTD DAX functions for month-to-date also called MTD and quarter-to-date or QTD you can pass these functions as filters into the calculate DAX function the syntax contains two arguments the first is dates the column containing the required dates and the second is the year end date an optional parameter while the total YTD function is simple it limits the filter expression to only one filter if you need to apply multiple filter expressions within year-to-date values use the calculate function then pass the dates YTD function as one of the filter expressions for example Adventure Works needs a running total that calculates its year-to-date sales on a month-by-month basis based on the order date column from the sales table it can calculate this by creating an expression called sales year-to-date method 2 the expression does not refer to any separate date table instead the dates YTD function is
combined with the calculate function so Adventure Works can incorporate further filters when executed the expression returns a calculated table with the required running monthly total the next function is dates between this function returns a table that contains all dates between a specified start date and an end date the syntax contains three arguments dates is the column containing dates start date is the date expression for the start of the calculation end date is the date expression for the last date of the calculation Adventure Works wants to evaluate its total sales over the summer season so it must create a measure using the dates between function in DAX the DAX code computes the total sales between June 1st and August 31st 2023 the calculate function computes the values of the total sales column of the sales table and dates between defines the period for which the sales values are to be computed when executed the expression returns a calculated table with the required total sales figures as these examples have shown your data model requires a date table or dimension before you can use time intelligence functions however you can use PowerBI’s auto date time intelligence if you’re missing the date dimension or you can create a date dimension in PowerBI using Power Query or DAX as you’ve just discovered DAX-based time intelligence functions provide valuable flexibility in summarizing and analyzing time-based data you can use these functions with other DAX functions to build powerful and insightful data models as a data analyst it’s important to be able to compare data sets particularly those from different periods like previous years or months in this video you’ll learn how to use DAX for comparison over time using time comparison functions like date add parallel period and same period last year Adventure Works is preparing its marketing campaign for the holiday season as part of its preparations it needs to analyze and evaluate campaigns from previous years
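The summarization measures described above might be sketched in DAX as follows. This is a minimal sketch; the table and column names (Sales, Sales[SalesAmount], Sales[OrderDate], 'Date'[Date]) are assumptions based on the transcript, not the exact course files.

```dax
-- Method 1: TOTALYTD with a single filter expression
Sales Year to Date =
    TOTALYTD ( SUM ( Sales[SalesAmount] ), Sales[OrderDate] )

-- Method 2: DATESYTD passed as a filter to CALCULATE,
-- leaving room for additional filter expressions
Sales Year to Date Method 2 =
    CALCULATE ( SUM ( Sales[SalesAmount] ), DATESYTD ( Sales[OrderDate] ) )

-- DATESBETWEEN: total sales for the summer season
-- (June 1st to August 31st 2023, as in the transcript's example)
Summer Sales =
    CALCULATE (
        SUM ( Sales[SalesAmount] ),
        DATESBETWEEN ( 'Date'[Date], DATE ( 2023, 6, 1 ), DATE ( 2023, 8, 31 ) )
    )
```

Method 2 is usually preferred when further filters are needed, because CALCULATE accepts multiple filter arguments while TOTALYTD accepts only one.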
Adventure Works can implement DAX time intelligence comparison functions to identify trends and patterns from marketing campaigns from previous years it can then use these insights to inform its current campaign before you can help Adventure Works let’s find out more about comparison over time comparison over time means as the term suggests comparing sets of data over specific periods for example comparing sales from this month to last month these comparisons are generated using time intelligence functions in DAX like same period last year date add and parallel period the basic syntax for each function is to state the function name followed by the required arguments in parentheses however the rest of your syntax can vary according to the function’s requirements and your analytical needs when executed the functions return insights in the form of a table let’s explore an example of each function from the Adventure Works database to learn more about how they work the same period last year function returns a table that contains a column of dates these dates are shifted one year back in time from the dates in the specified dates column in the current context in other words it compares the current period against the same period from last year the syntax for this function requires one argument in the form of specific dates Adventure Works can use this function to evaluate its sales from the previous year to compare them against the sales team’s performance from this year it first creates a measure called revenue previous year then it defines var as the variable for the previous year’s revenue calculate computes the total revenue based on the same period last year function which takes the date column from the sales table as its parameter in this instance we are using PowerBI’s autogenerated date dimension finally the return function displays the value of the entire expression next Adventure Works wants to evaluate its year-over-year change in sales it can modify the measure
it just created to calculate the change ratio it first creates a new measure called revenue year-on-year percentage the variables used in the expression enhance code readability and query performance in addition to the previous calculation the divide function computes the change ratio of the sales amount by dividing the difference by the previous year’s revenue the results of both measures can be visualized in table format the following table extract compares revenue for July and August over a three-year period next let’s look at the date add function the date add function returns a table containing a column of dates shifted either forward or backward in time by the specified number of intervals from the dates in the current context the syntax contains three arguments dates is the column containing the required dates the number of intervals is the integer value that defines the number of intervals to add or subtract from the date interval is the unit of time by which to shift the date the unit can be a year quarter or month for example Adventure Works can use the date add function to compare this month’s sales with the previous month’s sales the calculate function computes the total revenue based on the filter arguments previously computed in the revenue measure the date add function takes the order date column from the sales table as a date reference one represents the number of intervals and the negative sign indicates that the intervals are back in time month represents the unit of time you can also use day quarter or year the results of this measure can be visualized in table format the following table extract compares sales revenue for August to October over a 2-year period comparing data over time is a powerful method for deciphering business trends and growth patterns mastering this skill will enable you to provide valuable insights for your organization to help it strategize and grow when working with time-oriented values your date table must be correctly formatted and
configured to avoid issues with your analysis in this video you’ll explore the process for setting up and the benefits of a common date table the Adventure Works data model has multiple fact tables tracking different aspects of its business like sales products and resellers but the data model doesn’t contain a date table this means there’s a risk that the different fact tables might represent dates differently without a common date table this makes it difficult to compare or relate data from diverse sources let’s find out more about the role of a common date table then help Adventure Works to add one to its data model a common date table or date dimension is a prerequisite for time intelligence calculations you can’t execute them without a date dimension the date dimension must meet the following requirements there must be one record per day there must be no missing or blank dates and it must start from the minimum date and end at the maximum date corresponding to the fields in your parameters but what if your data model is missing a date dimension in this instance you can use PowerBI’s auto date time intelligence you can also create a date dimension in PowerBI using either Power Query or DAX this is useful when working on large data sets with complex calculations you can create a date dimension with DAX using the calendar and calendar auto functions both functions return a calculated table with a single date column and a list of date values when executed Adventure Works could use the calendar function to create its date dimension the company can use the calendar function as a calculated table called date it can then include its required period’s start and end dates as its arguments it can also use calendar auto the calendar auto function scans the data model for the date column it takes the start and end date from the order date column from the Adventure Works sales table the fiscal year end month is an optional parameter defined for a different end of year month for
example if you specify three the year starts on April 1st and ends on March 31st if not specified PowerBI takes the default year-end month which is December now that you’ve explored the basics of a common date table let’s help Adventure Works build one in its data model begin by launching PowerBI desktop and loading the Adventure Works data set the data model contains five tables: sales salesperson products reseller and region the sales table tracks Adventure Works sales data the data model has no date dimension table so you’ll need to create one navigate to the home tab and select new table in the formula bar that appears on screen write the DAX code using the calendar function to create the date dimension table this table must calculate all date values between the 1st of January 2017 and the 31st of December 2021 when executed the DAX code creates a table with a single column containing the dates specified in your code the date values in the column also have timestamps format the column as date format to remove the timestamps select an appropriate format from the drop-down list in the format section navigate to the home tab and select new table to populate the common date table you need to write more DAX code using the date related functions like year month week number and weekday these functions extract the relevant information from the date columns of the other tables next you need to mark the common date table as the date table navigate to the data pane select the ellipses to the right of the date table and select mark as date table from the drop-down list of options this opens the mark as date table dialogue box select the date option from the date column drop-down menu if these steps are completed successfully a validation message appears select okay this action overrides PowerBI’s autogenerated date dimension for all time intelligence and date-based calculations in DAX within the data model finally access the model view of PowerBI and establish the new
one-to-many relationship with a single cross-filter direction between the date table and the sales fact table drag the date column from the date table to the order date column in the sales table the model is now configured for time intelligence calculations Adventure Works can use the model to generate its time-based reports and visualizations you should now be familiar with configuring and formatting a common date table in your data model a common date table makes the data analysis process more accurate and efficient it’s an essential part of every data analyst’s toolkit to execute time intelligence functions your data model must contain a common date table in this video you’ll explore the process for setting up a common date table using M language in Power Query Adventure Works must execute time intelligence functions but its data model lacks a common date table let’s help Adventure Works by creating a date table using M language in Power Query M is a PowerBI development language used in Power Query to create new dimensions and tables within a data model it provides a much more visual approach to creating dimension tables to assist Adventure Works load the data tables into the PowerBI data model select transform data in PowerBI desktop to open the Power Query Editor access the Home tab and select new source select blank query from the drop-down list of options add the required M language code to create the date dimension table in the editor the List.Dates function generates the dates in this code based on the provided date range in this instance you’re creating a 5-year table starting from January 1st 2017 the syntax 365*5 represents all the possible dates within this 5-year range and duration specifies the duration of the period with one equaling one day once you execute the code PowerBI generates a list of dates these dates must be converted to a common date table navigate to the top left side of the Power Query editor in the transform tab and select to
table this action converts the list of dates to a table with a column named list by default rename the column as date next you must change the column’s data type to the date data type right click to open the drop-down list and select change type select the date option from the list now you need to populate the table with the related columns select the table’s date column and navigate to the add column tab of Power Query Editor select the date section to expand the drop-down list of options select the following columns to add to the table from the drop-down list year month name of month name of day and week of year access the properties name field in the query settings and rename the query as date then select close and apply to return to the PowerBI interface finally select the ellipses next to the date table from the data pane and mark the table as a date table select the date column from the dialogue box then select okay to confirm finally establish the required relationships between the data model’s date table and other tables the model is now configured for creating time intelligence measures using DAX and for creating reports and visualizations in this video you learned how to set up a common date table using M language in Power Query this video is a short introduction to M language and Power Query you’ll learn more about M language as you continue your PowerBI studies meet Tina Adventure Works in-house expert on using time intelligence calculations in DAX Adventure Works is looking to optimize all aspects of its business from sales and deliveries to financial planning using time intelligence calculations in DAX the company suggests that Tina analyze its data in these areas and generate insights that reveal where improvements could be made to the business first Tina focuses on sales she performs time-based trend analyses using year-to-date functions to analyze trends and patterns in sales over time her analyses reveal seasonal spikes and downward trends in sales of
certain products over different months and quarters Adventure Works can use these insights to forecast demand for its products this means the company better understands what products customers purchase and when they will most likely buy them it can design and implement marketing strategies targeting consumers during the months they’re most likely to purchase specific products Tina’s insights into sales trends also help Adventure Works to manage its inventory better by identifying what kinds of bicycles customers are likely to buy and when Adventure Works can then ensure that these products are in stock in time for busy sales periods Tina can also use time intelligence functions to track sales team performance she can compare current and past performance data to prepare for the upcoming sales period the insights generated from her comparisons are then used to set realistic targets for the team and identify the high performers the upcoming sales period also requires large investments in inventory and marketing fortunately time intelligence is also a useful budgeting and financial planning tool Tina can compare actual financial data with budgeted values over different periods assess financial performance and track spending the company’s finance team can use these insights to make budget adjustments time intelligence functions can also identify issues and their root cause for example Adventure Works anticipated a high volume of sales of mountain bikes over the holiday sales period but sales declined over the season Tina can use time intelligence functions to drill into the related data and isolate these sales anomalies to analyze the root cause of the slowdown in sales for example the decline in sales might indicate a shift in customer behavior that needs to be addressed time intelligence in PowerBI is an important tool that businesses can use to harness the power of time dimensions in data analysis through the insights generated by time intelligence businesses like
Adventure Works can generate valuable insights that drive informed decision-making and help resolve issues. Congratulations on reaching the end of the second week in this course on data modeling in PowerBI. This week, you've explored how to use Data Analysis Expressions, or DAX, in PowerBI. Let's take a few minutes to recap what you've learned in this week's lessons. You began the first lesson by learning about DAX. DAX is a programming language that adds new information about existing data. It consists of a library of functions, operators, and constants; these are used in formulas, or expressions, to add information missing from the original data model. A key element of formulas is functions. Functions are reusable logic used in a DAX formula to perform tasks like aggregation or calculations. Commonly used DAX formulas and functions include CALCULATE, SUM, and AVERAGE. You then explored the syntax of a formula: a formula begins with the name of your new calculated column or table, followed by an operator, typically an equals sign. You then write the name of your DAX function and parentheses that contain the logic of your formula. You then learned about row and filter context. DAX computes formulas within a context; the evaluation context of a DAX formula is the surrounding area of the cell in which DAX evaluates and computes the formula. Row context refers to the table's current row being evaluated within a calculation, while filter context refers to the filter constraints applied to the data; this determines which rows or subsets should be included in or excluded from the calculation. You were then introduced to calculated tables and columns. A calculated table is a new table created within a data model based on data from different sources, and a calculated column is a new column added to an existing table that presents the results of a calculation. You then completed the lesson by putting your new skills to the test by assisting Adventure Works with its use of DAX in the exercise and completing a knowledge check. In the second lesson, you received an introduction to measures. You learned that a measure is a calculation or metric that generates meaningful insights from data. Measures are an important aspect of data analysis and play a lead role in creating calculated tables and columns. There are three different types of measures: additive, semi-additive, and non-additive. Which type of measure is used depends on the needs of your data and its dimensions. A key element of measures is statistical functions. Statistical functions calculate values related to statistical distributions and probability to reveal information about your data. Several common statistical functions are used in measures, like AVERAGE, MEDIAN, and COUNT. You learned how to build statistical functions into your syntax and explored how to use common functions, like using the AVERAGE function to calculate the average of a data set. You then discovered how context impacts DAX measures: you reviewed Adventure Works business scenarios in which the context of measures influenced the company's business decisions. Finally, you tested your new skills with a knowledge check and explored additional learning material in the additional resources. In the third lesson, you expanded your understanding of measures. You began by learning how to create quick measures in PowerBI using common calculations instead of DAX code. You then explored techniques for creating more complex custom measures with DAX. Next, you learned how to use the CROSSFILTER function. You can use the CROSSFILTER function to change the cross-filter direction between tables for a specific measure while maintaining the original table settings. The CROSSFILTER function can only be used within a DAX function that accepts a filter as an argument, like CALCULATE. You can use CALCULATE and its related modifiers to combine filters and generate more granular insights into your data. You then tested your new skills by adding a measure to an Adventure Works data set in the
exercise, and you tested your understanding of the topics in a knowledge check. In the fourth lesson, you explored how DAX is used with table relationships. You began the lesson by learning about role-playing dimensions: instances of the same dimension used multiple times in a data model. Each instance plays a unique role by representing different aspects of the data. This allows analysts to analyze data from different viewpoints without duplicating data tables in a data model. Relationships between tables are either active or inactive. You can configure these relationships using the USERELATIONSHIP function alongside the CALCULATE function to force the use of an inactive relationship. You completed this lesson by helping Adventure Works to add a role-playing dimension between two tables in its data model. You then tested your understanding of the topics in a knowledge check and explored further learning material in the additional resources. In this week's final lesson, you explored time intelligence in DAX. You learned that time intelligence functions refer to methods and processes that aggregate and compare data over time. These functions can be used in PowerBI through the Auto date/time feature or DAX. DAX can summarize data over time by identifying trends, patterns, and anomalies over a specific period, or it can be used for comparison over time by comparing data sets over specific periods. These insights are generated using summarization and comparison functions that return the required insights, and there are also more complex functions that can be used with time intelligence. A prerequisite for using time intelligence functions is a common date table, or date dimension. If this isn't present in your data model, you can build one using the CALENDAR function or the CALENDARAUTO function; both functions return a calculated table with a single Date column and a list of date values. You also learned how to generate a calculated date table using M language in Power Query. You then explored a real-world scenario where time intelligence played an important part in a business's decision-making process. During this lesson, you helped Adventure Works use time intelligence calculations in DAX during an exercise and activity. You've now reached the end of this module summary. It's time to move on to the discussion prompt, where you can discuss what you learned with your peers. You'll then be invited to explore additional resources to help you develop a deeper understanding of the topics in this lesson. Best of luck; we'll meet again during next week's lessons. Imagine you're a data analyst at Adventure Works, a thriving multinational bike manufacturing company. Your role is significant: it involves digging deep into the vast array of data, sifting through it, and translating it into meaningful, actionable insights. Decision makers in Adventure Works rely heavily on your PowerBI dashboards, which provide a window into Adventure Works' vast data landscape. These dashboards, through your analysis, guide the company and reveal its successes, challenges, and opportunities. However, over time you start noticing an issue: as the data volume grows, the reports are slowing down. Simple queries that used to take seconds now take many minutes, even hours. This bottleneck is frustrating staff, delaying decisions, and even starting to undermine the value of data-driven solutions. There is an urgency to fix the situation, and you must act before the issue escalates further. That's when you realize the need for performance optimization. This video covers the importance of performance optimization in PowerBI and how it affects the overall performance of data models, reports, and dashboards. By the end of this video, you'll understand the benefits of PowerBI performance optimization, such as enhanced speed and efficiency, informed decision-making, improved user experience, resource efficiency, and timely report generation. Over the next few minutes, you'll learn about the challenges Adventure Works faces
due to growing data volume and how performance optimization in PowerBI can address these issues. In the context of PowerBI, optimization refers to the process of modifying, tuning, or streamlining your data models, reports, and dashboards to achieve the best possible performance. At its core, it's all about making sure your reports and dashboards run as smoothly and quickly as possible. When you're dealing with small volumes of data, performance isn't typically a concern, but as your data grows, the performance of your PowerBI solutions can start to deteriorate. This might manifest as slow report loading times, sluggish response times when interacting with dashboards, or even timeouts and errors. Performance issues can arise due to a variety of factors, including inefficient data models, complex DAX calculations, and inappropriate visuals. However, regardless of the cause, performance issues can have a significant negative impact on the user experience and the usefulness of your PowerBI solutions. That's where performance optimization comes in: by understanding and applying optimization techniques, you can improve the performance of your PowerBI solutions, ensuring they continue to deliver value as your data grows. Now let's dive into some of the benefits provided by performance optimization. First, enhanced speed and efficiency. Adventure Works manages enormous volumes of data, from sales records, production statistics, and customer interactions to employee information. This data holds valuable insights that guide strategic decision-making. By optimizing your PowerBI report and data model, you can significantly cut down the loading and processing time of large data sets, allowing you to execute queries faster. This means the different teams at Adventure Works, from sales to production to management, can quickly access the data they need, reducing wait times and enhancing overall productivity. The next benefit of performance optimization is informed decision-making. The ability to make timely and informed decisions at Adventure Works is critical to its success. If there's a sudden drop in sales of a specific bike model, or if a new bike accessory becomes a hot seller, company decision makers must know about it as soon as possible to adjust their strategies accordingly. With an optimized PowerBI data model, reports load swiftly, enabling faster analysis of trends and thereby leading to more prompt, informed decisions. Next, let's look at the improved user experience of optimizing performance in PowerBI. At Adventure Works, numerous team members rely on PowerBI reports for their tasks. Slow-loading reports can lead to frustration, loss of time, and lower productivity. In contrast, an optimized PowerBI system can dramatically improve the user experience by ensuring reports load smoothly and swiftly. This way, team members can focus on deriving insights instead of waiting for reports to load. As Adventure Works continues to expand, the data it manages grows as well, requiring more computing resources. In this situation, they need more efficient use of resources. An optimized PowerBI data model can make more efficient use of resources, handling larger volumes of data without a noticeable drop in performance. This is crucial, as it allows Adventure Works to handle its growth and the accompanying increase in data without requiring excessive increases in computing resources. Lastly, there is timely report generation. Different teams at Adventure Works may require regular reports to function efficiently: the sales team might need weekly sales reports, while the manufacturing team might require daily production reports. With an optimized PowerBI data model, these reports can be generated and distributed in a timely manner, facilitating smooth operations across the company and ensuring each team has the data it needs, when it needs it. By embracing the power of performance optimization in PowerBI, you're not just enhancing the speed and efficiency of reports and dashboards, you're helping Adventure Works to make
better decisions, faster. Remember: every second saved in loading a report, every query executed faster, every frustration eliminated by a smoothly loading dashboard, these are victories in your quest to unlock the full potential of data. So continue to explore, optimize, and innovate, for it's through these actions that you make a difference in organizations, industries, and the world. You are the data pioneer, and the future is in your hands. Imagine it's your first day at Adventure Works, a multinational manufacturing company renowned for its premium bicycles. As a newly hired data analyst, you have an enormous challenge: to analyze the constant stream of data generated by the company's diverse operations. Every sale in North America, every accessory produced in Asia, and every customer interaction in Europe sends ripples through the vast ocean of data that Adventure Works amasses every day. This data is a disorganized treasure trove, filled with critical insights that can drive strategic decision-making and fuel the company's continued growth. But how do you extract these precious insights from an unoptimized data set? That's where your secret weapon comes in: the effective combination of optimization techniques and PowerBI. This video aims to assist you in understanding the fundamental concept of optimization in PowerBI using a relatable scenario set in the context of Adventure Works. By the end of this video, you'll understand the various optimization techniques, such as sorting, filtering, indexing, and data transformation, and how they contribute to enhancing the efficiency and accuracy of data analysis. Over the next few minutes, you'll learn the importance of optimization in decision-making and strategy formulation. To recap, optimization in the context of PowerBI is the process of transforming, cleaning, and organizing your data sets to achieve the best possible data performance. Optimization involves techniques like filtering, sorting, and indexing, which can make your data more manageable and your
searches faster, improving overall efficiency. Adventure Works operates in a data-intensive environment. This includes sales data from diverse markets, manufacturing data from various plants, product management data on hundreds of items, human resource data on employees from different regions, and much more. To help understand this, let's put ourselves in the shoes of Lucas Pereira, an assistant data analyst at Adventure Works. Lucas is tasked with understanding the sales performance of their different bike models across North America. The sales data in front of Lucas is vast, filled with information about bike models, sales dates, customer details, and regions. This is where optimization becomes a vital tool in Lucas's arsenal. There are four tools that will help Lucas with his task: sorting, filtering, indexing, and data transformation. In PowerBI, sorting is an optimization technique that allows Lucas to organize his data alphabetically by bike model. This seemingly straightforward step is like putting on a pair of glasses: it sharpens the focus on the sales patterns and performance of each bike model, making the data set much easier to read and interpret. The benefits of sorting go beyond simplicity and aesthetics; it sets the stage for faster and more efficient data processing. By grouping similar data, the search operation is enhanced, thereby saving time. It allows Lucas to identify trends, patterns, and outliers more quickly, leading to quicker insights and decision-making. In the competitive environment that Adventure Works operates in, this speed can translate into significant business advantages. Lucas then moves on to filtering his data to focus on his area of interest, North America. Filtering data enhances clarity and relevance; it eliminates unnecessary noise, making the data more manageable. Lucas removes all irrelevant data related to other regions. Filtering leaves him with a data set that focuses exclusively on North American sales, and by doing so, Lucas can conduct more precise and targeted analyses, leading to more relevant insights and strategies. It also reduces the processing time and computational load, making the overall process more efficient. If filtering takes place during the transformation stage, it also reduces the amount of data stored within PowerBI. Like using a well-laid-out map to reach a destination faster, indexing enhances the data analysis process by providing faster access to specific data points. Lucas creates an index on bike models and regions. This allows him to quickly locate the data for a particular bike model in a specific region without having to sift through the entire data set. It saves time and makes the analysis process more efficient, enabling Lucas to respond faster to queries or generate reports more quickly, thereby enhancing the decision-making process. Finally, Lucas applies data transformation to standardize the sales dates, which are in multiple formats. The key benefit of data transformation is the improvement in data consistency, which facilitates more accurate and meaningful analyses. Standardizing the dates allows Lucas to conduct a proper date-related analysis, enabling him to track and forecast sales patterns accurately. It helps eliminate potential errors in the analysis due to inconsistent data. The cumulative effect of these optimization techniques turns data sets into a powerful instrument of insight. Lucas's journey through the data set of Adventure Works demonstrates that, by streamlining and simplifying the data set, optimization makes the data more accessible and manageable. By applying optimization techniques, businesses like Adventure Works can harness the true power of their data, turning information into actionable business strategies. As you've seen through Lucas's journey, data is more than just numbers on a screen. It's a mosaic, a narrative, a path that can lead you to new insights, strategies, and victories. But to interpret data effectively, you must refine it, shape it, and most importantly, understand it. That's what
optimization techniques do: they're the compass, the map, and the light that guide you through the maze of data. So step up to the challenge; use the power of optimization in PowerBI to create your own stories of success. Imagine it's a Monday morning at Adventure Works headquarters, and sales data from the previous quarter has just arrived. As a newly appointed data analyst, you're eager to dive in and extract meaningful insights from the data pouring in from several stores and customer orders worldwide. In addition, there's data from various suppliers and manufacturers who deliver essential parts for Adventure Works' diverse bicycle product line. For this report, you are tasked with tracing the journey of a specific component from the Adventure Works suppliers data set to the products data set. As you start loading the data into PowerBI, things begin to slow down. Queries that should take seconds are taking minutes, and some aren't loading at all. You notice that the performance issues intensify when dealing with relationships between the different tables in your data model, specifically many-to-many relationships. This video helps you to understand how to identify data model performance issues in relationships and how to resolve them by adjusting the cross-filter direction. By the end of this video, you'll understand how to edit relationships and optimize the performance of your data model using PowerBI. Over the next few minutes, you'll learn how to balance accuracy and performance in your data model by applying bidirectional filters only where necessary. To understand the issue, let's first dive into what a many-to-many relationship entails in a data model. Relationships in data models represent how data tables connect and interact with each other. The simplest form is a one-to-one relationship, where one row in a table corresponds to one row in another. However, real-world data isn't always that simple; often, one record can correspond to multiple records in another data set, and vice versa. This is where you can encounter many-to-many relationships. In the context of Adventure Works, consider the relationship between the Products and Suppliers tables. Each product at Adventure Works is made up of various components from multiple suppliers, and each supplier can provide components for multiple products. This mutual relationship, where each entity can relate to multiple entities on the other side, is what we call a many-to-many relationship. Now let's dive into the cause of many-to-many performance issues and how you can resolve them. Your focus is on the model view, so select the bottom icon to open the model view. Your tables are represented as boxes with field lists, and lines connecting these boxes represent the relationships between these tables. Find and select the specific relationship you wish to edit; in this case, you are interested in the relationship between the Products and Suppliers tables. If your model has many tables and relationships, you might need to drag the tables around or zoom in and out using the scroll wheel or the zoom slider at the bottom right of the screen. Now that you've located the relationship, it's time to edit it. Double-click on the line connecting the Products and Suppliers tables. This action opens a new dialog box titled Edit relationship. The cross-filter direction between the Products and Suppliers tables is causing performance issues in the data model. Since you want to trace the journey of a specific component from the Adventure Works Suppliers table to the Products table, a one-way filter would be appropriate, limiting the Products data to only those products that involve the chosen component. In the Edit relationship dialog box, locate the option labeled Cross filter direction. The current setting is Both, meaning filters can flow from the Products table to the Suppliers table and vice versa. To change the cross-filter direction and reduce this complexity, select the drop-down menu for Cross filter direction and select Single, so that Suppliers filters Products.
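The same one-way behavior can also be applied per measure with DAX's CROSSFILTER function, leaving the model-wide relationship setting untouched. A minimal sketch, assuming hypothetical Products and Suppliers tables related on a SupplierKey column:

```dax
Products From Supplier =
CALCULATE (
    // Count products, restricting cross-filtering between the
    // hypothetical Suppliers and Products tables to a single
    // direction for this measure only
    COUNTROWS ( Products ),
    CROSSFILTER ( Products[SupplierKey], Suppliers[SupplierKey], OneWay )
)
```

Because CROSSFILTER is a filter argument, it can only appear inside a function that accepts filters, such as CALCULATE; other measures and visuals keep the relationship's original cross-filter setting.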
Now that you've made the desired changes, it's time to save them. At the bottom right of the Edit relationship dialog box, select the OK button. This action will close the dialog box and apply your changes to the data model. By changing the direction of its filter, you've simplified the data model; this simplicity has made it more efficient and resolved the performance issues. You're a newly hired data analyst at Adventure Works. Your first task is to source, prepare, and analyze data to aid the marketing initiatives. As you're delving into the data, you start to encounter an issue: you notice that your PowerBI reports, usually swift and reliable, have started to slow down. You discover that this is due to high levels of cardinality. In this video, you'll explore the impact of cardinality on performance and how high cardinality affects your data analysis tasks. By the end of this video, you'll have the practical knowledge to reduce cardinality to improve the performance of your PowerBI reports. Over the next few minutes, you'll learn how to identify high cardinality, explore strategies to reduce cardinality, such as summarization and fixed decimals, and consider the implications of these changes on your data. As you might already be aware, cardinality in the context of PowerBI refers to the number of distinct values in a column. For example, imagine analyzing a data set containing a column called Product Category. Within this column, you might find several different categories. Each of these unique categories represents a distinct value, and the total count of these unique items determines the cardinality of the Product Category column. A column with a high number of distinct values has high cardinality. When you have high cardinality, it can increase the size of your data model and the time taken to process queries, slowing down your PowerBI reports. Imagine trying to find a specific book in a library that doesn't have a categorization or indexing system; that's essentially what happens when cardinality is high. The PowerBI engine must sift
through more unique values, slowing down the process. While high cardinality can slow down the performance of your PowerBI reports, identifying high-cardinality columns and modifying them appropriately can enhance your report's performance. PowerBI itself is a high-performance system that can handle large volumes of data with high cardinality; however, there are always trade-offs in system design, and reducing cardinality can help when dealing with truly large data sets. Let's explore some methods for reducing high cardinality. One strategy to reduce cardinality is through summarization during transformation. This step is similar to moving from a detailed view to a summary view of your data: instead of looking at individual transaction data, you can group it by categories such as product category, order date, or delivery date. In Adventure Works, instead of analyzing every unique bike sale, you could aggregate sales data on a product category basis. However, that's not the only method to reduce high cardinality. A second strategy is to reduce cardinality by changing decimal columns to fixed decimals. High-precision decimal values can significantly increase cardinality. For instance, consider the Product Weight column in Adventure Works' sales table, responsible for tracking the weight of each bike to the microgram. The variation in bike weights is very large, leading to high cardinality. By rounding these weights to a fixed decimal point, you can significantly reduce cardinality. Now that you've learned how to identify high cardinality, let's look at how you can reduce it. As you just discovered, you can reduce the cardinality of Adventure Works' data model through summarization. Once you have located the column you want to summarize, in this case Product Category, select the column's header to select the entire column, then go to the Transform tab on the top menu bar. In the Transform toolbar, select Group By. A new Group By window will appear. In this window, you can specify the column you want to group by and the aggregation function you want to apply, like Sum, Count, or Average, based on the nature of your data. After specifying these settings, select OK. This form of summarization lowers the cardinality, leading to improved performance. And as the second strategy demonstrated, you can also reduce cardinality using fixed decimals. To do this, locate and select the header of the decimal column you want to modify, in this case the Product Weight column, then select the Transform tab on the top menu bar. In the Transform toolbar, select Data Type. A drop-down menu will appear with a list of different data types; from this list, select Fixed Decimal Number. After this, the column's data type will be changed, and it should now contain fewer unique values, effectively reducing its cardinality. By following these steps, you can reduce the cardinality of your data, thereby improving the performance of your PowerBI reports. However, remember that reducing cardinality might also result in less granular data, so always take into consideration the requirements of your analysis before you decide to reduce cardinality. As you continue exploring the world of data, always remember that it's not about having less data or more data; it's about having the right data. And when you master this, you can turn raw numbers into insightful stories, make informed decisions, and create impactful change. Data enthusiasts are often required to look for real-time insights and dynamic visualizations to make informed decisions. DirectQuery in PowerBI enables you to dive into vast amounts of data with auto-refresh functionality. Though DirectQuery connectivity has several benefits, it comes with its own set of behaviors and limitations. Let's walk through these elements of DirectQuery as a data connectivity option in PowerBI. Adventure Works has expanded its operations in recent years to various regions across the world. The company wants to build a real-time sales dashboard to monitor sales performance across various regions, categories, and products. Adventure Works has a
massive transactional database that records sales data in real time. The company also wants to implement data security to ensure data access permissions are defined within the database and users only have access to the data they are authorized to view. To meet the requirements of Adventure Works, you need to establish a DirectQuery connection in PowerBI to retrieve and analyze the data. Let's explore what DirectQuery is and how it can help you to connect to your data sources. DirectQuery is a data connectivity option in PowerBI that allows analysts to connect directly to data sources without importing data into the PowerBI model. Instead of loading data into memory, DirectQuery sends queries directly to the sources to retrieve data for real-time analysis. Although it is best practice to import data into the PowerBI model, there are times when using DirectQuery is inevitable. Let's review some of the benefits that DirectQuery offers. DirectQuery allows you to execute queries in real time. For example, in a multinational retail corporation where new sales transactions are added to the database every hour, this ensures that the sales dashboard always displays the latest data. Large data set imports into PowerBI models can cause performance problems and high memory consumption; by using DirectQuery, PowerBI avoids loading an entire data set into the model, optimizing memory usage. DirectQuery also respects data source-level security, ensuring that only authorized users have access to the data. The data access permissions defined in the underlying database are enforced, providing a secure and controlled data access environment. Let's examine the behavior of DirectQuery connections. When you establish a connection in PowerBI Desktop via DirectQuery, if the connection is made to a relational database like SQL Server, you can select a set of tables from the database that will return a set of data. For example, at Adventure Works, you can select data from the central SQL data warehouse via a DirectQuery connection to perform real-time sales analysis. Data loading in PowerBI only loads the schema, not the actual data; reports and visuals send queries to the underlying database to retrieve the necessary data, and the visual refresh time depends on the performance of the underlying data source. The tables you selected for Adventure Works are not imported into the PowerBI model, only the schema is; therefore, the data refresh cycle sends the query to the central database. Once added information is recorded in the source database, the reports and visuals do not reflect the updated data immediately; you will need to refresh the report to display the latest data. For instance, each new sale record of Adventure Works saved to the database will be reflected on the dashboard after you refresh the report. If you publish a PowerBI report to the PowerBI service, it displays the same behavior as with imported data, except there is no data imported. All the report elements can be used in creating a dashboard. The dashboard tiles are automatically refreshed as per the refresh frequency that you can configure, and dashboard visuals will show data from the latest refresh when opened. For example, if your manager asks you to present the most recent dashboard every morning, then you can set up the refresh time an hour before the presentation time. The use of DirectQuery can have negative implications, and the limitations vary depending on the specific data source that is being used. It is always faster to query data from memory (imported data) rather than querying it from the server (DirectQuery); the performance depends on the size of the data, the database server specifications, the network connection speed, and optimizations to the data source. You must understand these performance implications before deciding to use DirectQuery for your data analysis in PowerBI. With DirectQuery, you can apply some data transformations in the query editor of PowerBI; however, not all transformations are supported, and this also depends on the data source.
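Whether a transformation is supported in DirectQuery mode generally comes down to whether Power Query can fold the step back into the source query. A hedged sketch in M, with an invented server name, database, and FactResellerSales table purely for illustration; both steps below typically fold into a single SQL statement:

```m
let
    // Assumed connection details; replace with a real server and database
    Source = Sql.Database("sql-server-name", "AdventureWorksDW"),
    Sales = Source{[Schema = "dbo", Item = "FactResellerSales"]}[Data],
    // Row filtering and column selection are usually query-foldable,
    // so they execute in the source database rather than in PowerBI
    Filtered = Table.SelectRows(Sales, each [SalesAmount] > 1000),
    Result = Table.SelectColumns(Filtered, {"OrderDate", "ProductKey", "SalesAmount"})
in
    Result
```

A step that cannot be folded has to be applied in the underlying data source instead.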
For example, a SQL Server source supports some transformations, while SAP Business Warehouse doesn't support any transformations in the query editor; in the latter case, you need to apply transformations in the underlying data source. Data modeling and DAX are also limited in DirectQuery mode. For example, PowerBI's default date hierarchy is not available in DirectQuery, and some DAX functions, such as parent-child functions, are also not available. Complex DAX measures also cause performance issues, so it is advisable to start by building simple aggregation measures and test the performance before moving to more complex calculations in DAX. When using DirectQuery mode, almost all the reporting capabilities that you have with imported data are also supported for DirectQuery models, provided that the underlying source offers a suitable level of performance. However, when you publish your PowerBI report to the PowerBI service, the quick insights and Q&A features of the service are not supported in DirectQuery mode, and DAX measures and filters can cause performance implications in reports of DirectQuery models. DirectQuery offers an alternative way to connect PowerBI to data sources, but it has some limitations; data analysts must understand the behavior, benefits, and limitations of DirectQuery before deciding to use it for their analytical and business needs. DirectQuery models demand consistent performance across all layers of the solution. Fortunately, there are several optimization and query reduction strategies that you can use to help you along the way. Over the next few minutes, you will learn how to optimize the underlying data source for better query performance. Adventure Works is experiencing poor report performance: it is taking too long for pages to load in the reports, and table and matrix visuals are not refreshing quickly enough when certain elements of the report are selected. While reviewing the data model, you discover that the model is using DirectQuery to connect PowerBI to the source
data resulting in the poor report performance you will need to act in order to optimize the performance of the direct query model in direct query mode performance optimization is needed at each layer of the solution the first layer of the solution to be optimized is the data source you’ll need to tune the source database any optimization done to the underlying source database will enhance the direct query connection which will improve your PowerBI reports the following standard database practices apply to most situations avoid using complex calculated columns because the calculation expression will be embedded into the source queries review the indexes and verify that the current indexing is correct if you need to create new indexes ensure that they are appropriate powerbi desktop provides you with the option to reduce the number of queries sent to the database in direct query mode in PowerBI the default behavior of a filter or slicer is that when you select an item in that slicer or filter the other visuals of the report will be filtered automatically in direct query mode this will send multiple queries to the database for every selection within a filter or slicer these multiple queries will reduce the performance of your report for example you want to select multiple items but when you select the first item five queries are sent to the underlying database on selecting the second item another five queries are sent to the database this will result in a further slowdown of speed this is especially true when you have a multis select slicer or filter you can optimize the number of queries sent to the database in PowerBI desktop the optimization of performance through query reduction requires effective strategies and techniques aggregations allow for pre-calculated summary values that can be imported and stored in the memory engine of PowerBI an optimized data model can lead to efficient query processing simplifying relationships eliminating unnecessary columns and 
avoiding complex DAX expressions wherever possible can enhance query optimization by reducing the number of queries sent to the underlying data source you can limit the number of visuals and filters in a PowerBI report while working with direct query connectivity for example you can reduce the number of visuals on the report page or reduce the number of fields that are used in a visual in direct query mode performance optimization is vital to deliver a smooth and responsive user experience implementing query reduction strategies and focusing on query performance enhancements allows you to maximize the benefits of real-time data connectivity in PowerBI as a data analyst you’ll often need to optimize the query performance of direct query connectivity fortunately configuring the table storage will improve data retrieval speed and reduce the query workload on the data source over the next few minutes you’ll learn direct query performance optimization with table storage adventure Works is experiencing slow data retrieval speeds while trying to build its reports upon further investigation you discover that the cause of the slow retrieval speed is due to the query workload on the data source you will need to use table storage to reduce the query workload and improve the retrieval speed let’s explore what storage modes are and how they can be used to optimize the performance of your direct query data sets storage modes in PowerBI determine where the data of that table is stored and how queries will be sent to the data sources you can specify the storage mode of the table individually within your data model the storage mode lets you control whether PowerBI desktop catches table data in memory for reports storage modes in PowerBI offer the following benefits as users interact with visuals in PowerBI reports DAX queries are submitted to the underlying data set caching data into memory by properly setting the storage mode can boost the query performance and interactivity of 
your reports. Tables that are not cached don't consume memory for caching, so you can enable interactive analysis over datasets that are too large or expensive to cache completely in memory, choosing which tables are worth caching and which aren't. You can also reduce refresh time by importing only the tables necessary to meet your business and analytical requirements, optimizing data refresh time and frequency.

Now that you're familiar with what storage modes are, let's examine the three storage modes that Power BI supports. If a table uses the Import storage mode, its data is stored in Power BI's in-memory storage; every query against that data is answered from the in-memory structure, not the data source. For instance, if Adventure Works sources a sales table from SQL Server but uses the Import storage mode, a copy of the data is stored in Power BI's in-memory engine, and whenever you refresh a report in Power BI Desktop, it queries that in-memory structure instead of sending queries to the SQL Server data source. Tables using the DirectQuery storage mode keep the data in the data source: if Adventure Works' sales data is stored in SQL Server and a report is created with this storage mode, Power BI sends SQL queries to the data source to retrieve the results, and because the table uses DirectQuery, you can use SQL Profiler at the same time to view, manage, and optimize those queries. With the Dual storage mode, one table can act as either DirectQuery or Import, depending on its relationships to other tables: some queries are fulfilled from imported data, while others are fulfilled by executing an on-demand query to the data source, for example SQL Server.

Let's find out how the various storage modes work in Power BI Desktop when connecting in DirectQuery mode. Launch Power BI Desktop and connect to SQL Server via DirectQuery.
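Before walking through the Power BI Desktop steps, the behavior of the three storage modes described above can be sketched as a small, purely illustrative Python model. This is an analogy, not Power BI internals: the `Table` class and its names are invented for this sketch. Import answers every query from an in-memory copy, DirectQuery always goes to the source, and Dual can do either depending on the query context.

```python
# Toy analogy of Power BI storage modes (NOT Power BI internals).
# All names here are invented for illustration.

class Table:
    def __init__(self, name, storage_mode, source_rows):
        self.name = name
        self.storage_mode = storage_mode  # "import", "directquery", or "dual"
        self._source = source_rows        # stands in for the SQL Server table
        # Import and Dual keep an in-memory copy; DirectQuery does not.
        self._cache = (
            list(source_rows) if storage_mode in ("import", "dual") else None
        )

    def query(self, prefer_cache=True):
        """Return (answered_from, rows).

        Dual answers from the in-memory cache when the query can be
        satisfied there, otherwise it falls back to the source."""
        if self.storage_mode == "import":
            return "in-memory", self._cache
        if self.storage_mode == "directquery":
            return "source", self._source
        # dual: either path, depending on the query context
        if prefer_cache:
            return "in-memory", self._cache
        return "source", self._source

rows = [("2023", 100.0), ("2024", 120.0)]
print(Table("Sales", "import", rows).query()[0])       # in-memory
print(Table("Sales", "directquery", rows).query()[0])  # source
print(Table("Date", "dual", rows).query(prefer_cache=False)[0])  # source
```

The point of the sketch is the trade-off the transcript describes: cached tables cost memory but answer queries instantly, uncached tables cost a round trip to the source, and Dual tables can serve both roles.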
Navigate to Get Data and select SQL Server from the drop-down list of options. You'll be presented with a SQL Server database dialog box; enter the server name and database name. Import mode is selected by default, so select DirectQuery and then OK. This directs you to the SQL Server instance containing an Adventure Works database named AdventureWorksDW2022, where you can select the tables to load into the Power BI model. Select the Internet Sales fact table and the Product, Customer, and Sales Territory dimension tables. Navigate to Model view and expand the Properties pane. Select the sales table, scroll down in the Properties pane, and expand Advanced. Open the Storage mode drop-down menu to view the three storage modes, and select the Import storage mode for the Internet Sales fact table. Once you select Import, a dialog box appears warning that setting the storage mode to Import is irreversible: you will not be able to switch back to DirectQuery. Select OK. You have now optimized the storage mode of the fact table in the Adventure Works database, and you can further leverage this feature to decide which tables of the schema to import and which to keep in DirectQuery connectivity, depending on your analytical requirements. In DirectQuery mode, performance optimization is vital to delivering a smooth, responsive user experience; by implementing query reduction strategies to optimize the number of queries sent to the underlying database and focusing on query performance enhancements, you can maximize the benefits of real-time data connectivity in Power BI.

Aggregations in Power BI are a great method of generating fast query performance and interactivity in your reports and visuals: they enable you to dive deeper into your data without compromising the speed and performance of your queries in DirectQuery connections. Power BI not only provides a potential
solution for small datasets, but it also has the potential to handle large datasets by switching to DirectQuery. Because DirectQuery does not store data in memory, Power BI sends queries to the underlying data source for every page of the report. However, DirectQuery mode can be slow depending on the number of visuals in a report and the number of users interacting with the data at a given time. For example, imagine your report contains four visuals: every time you apply a filter to the data, Power BI sends queries to the data source, and sending queries with each interaction makes DirectQuery quite slow.

Fortunately, Power BI has a solution for the slow response of DirectQuery, called composite mode. Composite mode allows you to use part of the model as DirectQuery, which for larger tables is typically a fact table, and to import data for the smaller tables, usually the dimension tables. This approach achieves better performance when you work with the smaller tables, since they are querying Power BI's in-memory storage, but the tables on the DirectQuery connection are still slow to respond. This is where a useful feature within composite mode called aggregations comes into play.

In Power BI, aggregation refers to summarizing or consolidating large volumes of data into more manageable summary tables to improve query performance by condensing detailed information into simpler, higher-level values. Aggregations are the solution to speed up the DirectQuery-connected tables within a composite model: with their help, you can create layers of pre-aggregated values that are stored in Power BI's in-memory storage for faster performance.

Let's consider these concepts in a scenario. Adventure Works wants to analyze the last five years of sales across all its products and regions. The fact table might contain tens of millions of rows, making it a huge dataset relative to Power BI's import file-size limit. In this example, the objective is to query sales values by year, region, product, or customer category; in short, you are querying the fact table by aggregations of the dimension tables. Therefore, creating and managing aggregations of the fact table will help you reduce the file size of the sales table and optimize query performance for Adventure Works. For example, suppose you aggregate sales data by calendar year: the aggregated table pre-calculates the sum of the sales amount for every calendar year, so you have only five rows of data, one per year, far smaller than the original fact table. This pre-calculated aggregation can be imported into Power BI's memory and queried efficiently for everyday analysis. Furthermore, even if you analyze data at a finer granularity, such as the daily level, the total number of rows is still tiny compared with the millions of rows in the fact table. And because dimension tables are typically smaller than the fact table, aggregated tables are always smaller than the fact table.

Before you create aggregations in Power BI, you need to decide the granularity of the analysis you want to perform on them, for example evaluating sales amount at the day level. Once you decide on the grain, the next step is to create the aggregations, which you can do in one of three ways: create a table with the aggregations at the database level (for instance, in a SQL Server database) and import it into Power BI, if you have access to the data source; create a view of the aggregation (again, for example, in SQL Server) and import the view, if you have access to the data source; or use Power Query Editor in Power BI to create the aggregations.

Aggregations in DirectQuery have several benefits; let's explore three specifically. When you are handling a large dataset, aggregations provide faster, optimized query performance and assist you in analyzing the data. They also reveal insights without querying the underlying data source, which is slower to respond and, in the worst case, may time out. If users at Adventure Works are experiencing slow report refresh times in Power BI, you can create aggregations to speed up the refresh process: the smaller size of the aggregated tables imported into memory reduces refresh time, enabling a better user experience. And with Adventure Works anticipating growth in sales volume in the upcoming year, you can create and manage aggregations as a proactive measure to future-proof the solution, enabling a smooth scale-up of the company.

Aggregations are a game-changing feature of Power BI for optimizing speed and performance when dealing with huge volumes of data: with their help, you have layers of pre-calculated tables stored in Power BI's memory, always ready to respond to queries when users interact with the data in reports. Power BI's aggregation feature is useful for creating a seamless bridge between raw data and meaningful analytics.

In this video, you'll learn how to create and manage aggregations in the Power Query Editor of Power BI. First, you need to load the required tables. Launch Power BI Desktop and connect to SQL Server via DirectQuery: navigate to Get Data and select SQL Server from the drop-down list of options, and in the SQL Server database dialog box, enter the server name and database name. Import mode is selected by default, so select DirectQuery and then OK. This directs you to the SQL Server instance containing the Adventure Works database, and Power BI opens the Navigator window with the list of tables. Select the Internet Sales fact table and the Customer and Date dimension tables. Once the tables are loaded, Power BI automatically establishes the relationships between them; in this instance, you only need to review the relationship between the Date and Internet Sales tables and delete any inactive relationships between these tables.

Next, you need to create the aggregations using Power Query Editor. From the Home tab, select Transform data to open the editor. You'll create an aggregated table based on the Internet Sales fact table; note that aggregating directly would convert the existing table into an aggregated table, so to keep the original table intact, the first step is to reference the fact table. Select the Internet Sales table in the Queries pane, right-click, and select Reference from the drop-down list; this duplicates the Internet Sales table. Rename the query, for example as the aggregated sales table. Next, from the Home tab of the editor, select Choose Columns to open the Choose Columns dialog box. For the current aggregations, you'll aggregate using the Order Date Key and Customer Key columns: first unselect all columns, then select Order Date Key, Customer Key, Unit Price, and Sales Amount, and select OK. Next, select Group By from the Transform tab to open the Group By dialog box. Basic is selected by default; choose Advanced. The first section presented is the grouping; because you've selected two columns for grouping, select Add grouping to add another field, then select Order Date Key and Customer Key as the first and second grouping columns, respectively. The second section is the aggregations: define the new column name, then the mathematical operation for the aggregation (such as sum, count, or average), and finally select which column the calculation should be based on. For the current lesson, add the following aggregations: Sum Sales Amount, based on the Sales Amount column; Sum Unit Price, based on the Unit Price column; and Order Count, which uses the Count Rows operation and does not require a column reference. Select OK. After adding and defining the aggregations, this action adds an aggregated table to the data model, and the new aggregated table is much smaller than the original. You have now created an aggregation based on the Internet Sales fact table, keeping
the original table intact, and the table is added to the data model. Next, you need to establish the relationships between the aggregated sales table and the Customer and Date dimension tables: build the relationships using the Customer Key and Date Key columns. Finally, set the storage mode of the aggregated table to Import. Navigate to Model view and expand the Properties pane, select the aggregated sales table, expand Advanced in the Properties pane, and select Import from the Storage mode drop-down list of options. This opens a dialog box with the warning that setting the storage mode to Import is an irreversible operation, meaning you will not be able to switch back to DirectQuery. There is another recommendation in the dialog box: the number of weak relationships can be reduced by setting the Customer and Date dimension tables to Dual. The checkbox "Set affected tables to dual" is checked by default; leave it checked and select OK. This action imports the aggregated sales table into Power BI's memory and converts the storage mode of the dimension tables to Dual. The reason is that both dimension tables are connected to the original fact table, which is DirectQuery-sourced, and to the aggregated sales table, which uses Import mode, so the dimension tables are set to Dual storage mode and can act either way depending on the situation. Select the dimension tables and check the storage mode option in the bottom right-hand corner of the visualization pane to confirm that Dual storage mode is selected. In this video, you learned how to create and manage aggregations in the Power Query Editor of Power BI.

Congratulations on reaching the end of the third week of this course on data modeling in Power BI. This week you've explored optimizing a model for performance; let's take a few minutes to recap what you've learned in this week's lessons.

You began the week with an introduction to what optimization is and why it is necessary. You learned about Power BI dashboards and how they can provide access to large volumes of data to generate insights on successes, challenges, and opportunities. You then explored query lag, and how simple queries that used to take seconds can begin to take many minutes, even hours. You investigated the challenges that growing data volumes can bring, how performance optimization can address those issues, and the benefits of performance optimization in Power BI for the overall performance of data models, reports, and dashboards. You then further examined optimization: what it is, and how performance issues can arise from a variety of factors, including inefficient data models, complex DAX calculations, and inappropriate visuals. You explored how optimizing your Power BI report and data model can significantly cut the loading and processing time of large datasets, allowing you to execute queries faster. Next, you examined how performance optimization informs decision-making: the ability to make timely, informed decisions is critical to an organization's success, and with an optimized Power BI data model, reports load swiftly, enabling faster analysis of trends and more prompt, informed decisions. You then explored the user experience and how an optimized Power BI system can dramatically improve it by ensuring reports load smoothly and swiftly. Next, you learned about resource efficiency and how an optimized Power BI data model can make more efficient use of resources, handling larger volumes of data without a noticeable drop in performance. You explored optimization by example, analyzing a constant stream of data, and you examined optimization techniques such as filtering, sorting, and indexing, which can make your data more manageable and your searches faster, improving overall efficiency.

You were introduced to four tools that help you understand vast amounts of data: sorting, filtering, indexing, and data transformation. You learned how sorting makes datasets much easier to read and interpret, how filtering reduces processing time and computational load, how indexing allows you to quickly locate the data for a specific region without sifting through the entire dataset, and how data transformation facilitates more accurate and meaningful analyses. Next, you moved on to resolving performance issues in data models, where you explored the different types of relationships, such as one-to-one and many-to-many, and learned how to identify and reduce cardinality levels: identifying high-cardinality columns and modifying them appropriately can enhance your reports' performance.

You learned about the behavior and limitations of DirectQuery connections: DirectQuery is a data connectivity option in Power BI that allows analysts to connect directly to data sources without importing data into the Power BI model. You explored the benefits of DirectQuery, which are real-time updates, reduced memory usage, and data security, and you then investigated its negative implications, which are its impact on performance, its limited support for data transformations, its limitations in modeling and DAX, and its reporting limitations. You explored optimizing DirectQuery performance with query reduction: in DirectQuery mode, performance optimization is needed at each layer of the solution, and Power BI Desktop provides the option to reduce the number of queries sent to the database. You learned effective query reduction strategies and techniques, including aggregations, optimizing the data model, and report optimization. You then explored optimizing DirectQuery performance with table storage: storage modes in Power BI determine where a table's data is stored and how queries are sent to the data sources, and you can specify the storage mode of each table individually within your data model. You examined the benefits of storage modes, which are query performance, larger tables, and data refresh optimization. You then reviewed Import mode, in which a table's data is stored in Power BI's in-memory storage; DirectQuery mode, in which tables keep the data in the data source; and Dual mode, in which one table can act as either DirectQuery or Import with respect to its relationships to other tables.

You then moved on to aggregations in Power BI, which enable you to dive deeper into your data without compromising query speed and performance in DirectQuery connections. You explored composite mode, which allows you to achieve better performance with smaller tables, since they are querying in-memory storage, and you learned that in Power BI, aggregation refers to summarizing or consolidating large volumes of data into more manageable summary tables to improve query performance by condensing detailed information into simpler, higher-level values. You identified the three ways to create aggregations: creating a table with the aggregations at the database level (for instance, SQL Server) and importing it into Power BI, if you have access to the data source; creating a view of the aggregation (for example, in SQL Server) and importing the view, if you have access to the data source; or using Power Query Editor in Power BI. Finally, you learned about the benefits of aggregations: when handling a large dataset, they provide faster, optimized query performance and assist you in analyzing the data, and they also reveal insights without querying
the underlying data source, which is slower to respond and, in the worst case, may time out. If users are experiencing slow report refresh times in Power BI, you can create aggregations to speed up the refresh process, since the smaller aggregated tables imported into memory reduce refresh time and enable a better user experience. You can also create and manage aggregations as a proactive measure to future-proof the solution, enabling a smooth scale-up of the company.

You've now reached the end of this module summary. It's time to move on to the discussion prompt, where you can discuss what you've learned with your peers. You'll then be invited to explore additional resources to help you develop a deeper understanding of the topics in this lesson. Best of luck; we'll meet again during next week's lessons.

You're nearing the end of this course on data modeling in Power BI. You've put great effort into this course by completing the videos, readings, quizzes, and exercises, and you should now have a stronger grasp of the foundations of data modeling: basic concepts of data modeling, using DAX for analysis, and optimizing a model for performance. You're now ready to apply your knowledge in the exercise and the final course assessment. In the exercise, you'll build and optimize a data model, putting everything you've learned into practice. This is followed by the course assessment, a graded quiz of 30 questions on topics you covered throughout the course.

But before you start, let's recap what you've learned. In the first week of this course, you discovered that data modeling is the process of creating visual representations of your data in Power BI. You can use these representations to identify or create relationships between data elements, and by exploring these relationships, you can generate new insights into your data to improve your business. Microsoft Power BI is a fantastic tool for creating data models and generating insights, and you don't need an IT-related qualification to begin using it. During your exploration of Power BI, you learned how to create data models using schemas and relationships, analyze your models using DAX (Data Analysis Expressions), and optimize a model for performance. You also explored key concepts related to data modeling: identifying different types of data schemas, such as flat, star, and snowflake; creating and maintaining relationships in a data model using cardinality and cross-filter direction; and forming a model using a star schema.

In the second week of this course, you focused on DAX, or Data Analysis Expressions, the syntax used to create elements and perform analysis in Power BI. You began by writing calculations in DAX to create elements and analyses, then explored the formulas and functions used in DAX and used DAX to create and clone calculated tables. You were then introduced to the concept of measures: where measures are used and what types are available. You worked with measures to create calculated columns and measures in a model, and you learned about the importance of context in DAX measures. Finally, you performed useful time intelligence calculations in DAX for summarization and comparison and learned how to use these techniques to set up a common date table.

In the third week of this course, you learned how to optimize a model for performance in Power BI. You began by learning how to identify the need for performance optimization, which means analyzing your data models to determine how they can perform more efficiently. You then learned how to optimize your Power BI models for performance, exploring different techniques and methods for ensuring that you're running efficient models, and you also learned how to optimize performance using DAX queries. Now that you've built a solid understanding of the fundamentals of data modeling, you're ready to test your knowledge by undertaking the exercise and the final course assessment. Best of luck!

Congratulations, you've made it to the end of the Data Modeling in Power BI course. Your hard work and dedication have paid off. You're making great progress on your data analysis learning journey, and you should now have a thorough understanding of the basic concepts of data modeling, using DAX for analysis, and optimizing a model for performance. Think about everything you can do with this new knowledge, and well done for taking the first steps toward your future data analysis career. By successfully completing all the courses in this program, you'll receive a Coursera certification. This program is a great way to expand your understanding of data analysis and gain a qualification that will allow you to apply for entry-level jobs in the field.

This program will also help you prepare for the PL-300 exam; by passing the exam, you'll become a Microsoft Certified Power BI Data Analyst, which will help you start or expand a career in this role. This globally recognized certification is industry-endorsed evidence of your technical skills and knowledge. The exam measures your ability to prepare data for analysis, model data, visualize and analyze data, and deploy and maintain assets. To complete the exam, you should be familiar with Power Query and with writing expressions using Data Analysis Expressions, or DAX, two concepts that you've explored in detail in this course and will continue to learn more about in future courses. You can visit the Microsoft certifications page at http://www.learn.microsoft.com/certifications to learn more about the Power BI Data Analyst Associate certification and exam. This course has enhanced your knowledge and skills in the fundamentals of data modeling in Power BI, but what comes next? There's more to learn, so it's a good idea to register for the next course. Whether you're just starting out as a novice or you're a technical professional, completing this program demonstrates your knowledge of data modeling in Power BI. You've done a great job so far, and you should be proud of your progress. The experience you've gained will show potential employers that you are motivated, capable, and not afraid to learn new things. It's been a pleasure to embark on this journey of discovery with you; best of luck in the future.

Welcome to Data Analysis and Visualization in Power BI. In this course, you'll discover the power of visualization in Microsoft Power BI to create data-driven stories and solve real-world business problems. Data analysis and visualization are not only essential skills for data analysts to uncover and communicate data insights; they are vital for organizations across different industries to flourish in today's data-driven world. From healthcare to finance, data analysis and visualization play a critical role in informing decision-making and driving success. With its extraordinary visuals, Power BI is a data analytics and visualization tool you can use to transform data into intuitive visualizations, empowering you to present data in a visually appealing way that stakeholders can understand, facilitating data-driven decisions.

You are currently on a path of discovery centered on data analysis in Power BI, exploring the skills, tasks, and processes that enable data analysts to create compelling data stories with Power BI. So what can you expect from this part of your learning journey? You'll start by diving into creating reports in Power BI and exploring the various visualizations available to you and their potential to solve different business problems. You'll learn how to format these visuals and add them to reports and dashboards, the powerful mediums through which you can provide stakeholders with insights. You'll master the art of designing reports and dashboards that are not just visually appealing but accessible, user-friendly, and interactive, and you'll discover how to share your carefully crafted reports with stakeholders, ensuring your hard work reaches the right audience. And the journey doesn't end there: you can look forward to learning how to use visualizations and other features, like AI, to perform data analysis. You'll closely examine the data in your Power BI reports, discovering how to extract meaningful insights and value by using Power BI's analytical tools and performing advanced analytics.

By the end of this course, you'll be able to recognize different types of visualizations in Power BI, add visualizations to reports and dashboards, apply formatting choices to visuals, incorporate useful navigation techniques into Power BI reports, design accessible reports and dashboards, and use visualizations to perform data analysis. To complete the course successfully, you'll need to apply the skills and knowledge you gain to a practical graded assignment, in which you'll build reports and dashboards based on a real-world business scenario involving Adventure Works, a fictional bicycle manufacturing company you may have encountered before in this program. You'll also need to complete a final graded quiz demonstrating your understanding of the key concepts in data analysis and visualization. But there's no need to worry: the videos, readings, exercises, and quizzes in this course will gradually guide you through the learning material, preparing you thoroughly for your assessment. You have the flexibility to recap and revisit items as you need, so watch, pause, rewind, and re-watch the videos until you are confident in your skills; the readings, knowledge checks, and quizzes will help you consolidate your knowledge and measure your progress. Ultimately, this course is about more than just gaining knowledge and skills in data analysis and visualization in Power BI; it's about setting yourself up for a career in data analysis. By completing all the courses in this program, you'll earn a Coursera certificate to showcase your job readiness to your professional network.
plus the program prepares you for exam PL-300 which leads to a Microsoft PowerBI data analyst certification globally recognized evidence of your real-world skills so are you ready to add data analysis and visualization skills to your data analyst toolbox well this course will equip you to recognize use and format different visualizations strategically design accessible and beautiful reports and dashboards and extract more value from your data using visualizations and advanced analytics best of luck as you embark on this learning journey Renee Gonzalez the marketing director at Adventure Works walks into her office and finds a report on her desk the report is packed with data sales figures marketing campaign results regional statistics customer feedback and more but as she flips through the pages the strings of numbers and text seem to blend failing to convey any meaningful story it's like trying to decipher an alien language can she make informed decisions based on this data probably not data on its own is often meaningless but here's the game changer when you apply the tools of data visualization and analysis the data starts to weave a story patterns emerge from the chaos trends become evident and the confusing jumble of numbers transforms into insights that can guide business decisions this is the power of business intelligence in this video you'll explore the basics of business intelligence or BI specifically focusing on data visualization and analysis and the role it plays in making complex data accessible and understandable you'll discover how business intelligence and data analysis go beyond data visualization providing deeper insights and forming the backbone of informed decision-making in its simplest terms business intelligence or BI is a technological approach to convert raw unprocessed data into meaningful actionable information for business analysis the heart of business intelligence is to create an environment where data informs strategic business
decisions it's about leveraging data to improve operations increase efficiency and boost financial performance BI uses several tools and methodologies to achieve these objectives including data mining analytical processing querying and reporting but two of the most critical tools in this toolbox are data visualization and data analysis data visualization is a graphical representation of information and data think charts graphs maps or any other visual format that makes complex data more understandable accessible and usable to grasp the power of data visualization let's revisit the scenario at Adventure Works say the marketing director is examining the sales figures for different products in the last month the spreadsheet is dense with rows and columns of information you'd be hard-pressed to discover any significant insights just by glancing at the raw data but imagine if you could take these numbers and transform them into a visually compelling line graph suddenly the sales trends are immediately visible it's easier and quicker to identify high-performing and underperforming products which can inform strategic planning and data-driven decision-making it may also provide insights into seasonality and the effect of marketing initiatives on income visualization is a powerful transformative tool used to spot patterns and anomalies identify trends and grasp complex data sets at a glance in addition to visualization another critical aspect of BI is data analysis while data visualization provides a graphical representation of your data data analysis dives deeper into these visualizations to uncover the reasons behind the trends and patterns data analysis is like the detective work of BI it sifts through data asks critical questions and uncovers the truth to illustrate the importance of data analysis let's explore another term from BI profit margins the profit margin is a critical financial metric that provides insights into a company's profitability you can calculate this
by subtracting the cost of goods sold from sales revenue and dividing the result by the sales revenue but just knowing this profit margin figure isn’t enough let’s say for example that Adventure Works has a profit margin of 20% what does this figure tell you on its own not much but when you analyze this figure in relation to other factors the story begins to unfold for example to determine whether the margin is good or bad you can compare it across different periods or to the company average historical data or industry benchmarks you may also want to analyze the contribution of different products to profitability likewise you can also analyze the profit margin in relation to other financial metrics like sales revenue and expenses or external factors like market trends for a more comprehensive view of the financial health of Adventure Works data analysis helps you understand not just what is happening but also why it’s happening it allows you to diagnose problems spot opportunities and make informed decisions data analysis can also be pivotal in predictive analytics an aspect of BI that uses current and historical data to forecast future events behaviors and trends let’s imagine Adventure Works is planning to launch a new product line by analyzing past sales data customer behaviors and market trends you can predict how well customers might receive this new product its potential sales and even what type of marketing might be most effective this type of predictive insight can be instrumental in crafting successful business strategies as you embark on your own journey in the world of business intelligence remember that you’re not just a data analyst you’re a storyteller each strand of data is a part of your narrative and it’s up to you to assemble these strands into a narrative that guides a business to success remember data is just data it’s what you do with it that counts with data analysis and visualization you can transform data into actionable intelligence imagine 
a stakeholder at Adventure Works is handed a spreadsheet with numbers representing sales production and human resources data trying to draw conclusions or make decisions using these rows and columns is as challenging as navigating a dense forest with a paper map although the map may have all the information you need it isn't easy to understand and interpret but what if there was a way to examine this data that's immediately understandable and meaningful data visualizations can act like a navigation system with a clear interactive display that demonstrates how to navigate the forest of vast and complex data in this video you'll learn about data visualization including its role in business intelligence and how data flows and is represented in visualizations in Microsoft PowerBI at its most basic a visualization is a graphical representation of data however visualizations are much more than just common graphical depictions converting raw data into a visual format using PowerBI can help you identify patterns trends and insights that might not be apparent in text-based data for example suppose Adventure Works wants to track the performance of its different bike types across various regions the data comes from several sources ranging from sales and regional reports to customer feedback in a spreadsheet this data would be complex and hard to digest however you can use PowerBI with its many ways to visualize data which you'll learn about later to transform the data into a compelling interactive and easily digestible format visualizing data for business intelligence is crucial particularly in complex and dynamic business environments like Adventure Works let's explore how data visualization in PowerBI can enhance business intelligence at an organization like Adventure Works the data generated from its operations is vast and complex visualizing this data simplifies the complexity transforming large intricate data sets into intuitive easy to understand graphical
representations data visualizations can reveal patterns trends and correlations hidden in raw data for example Adventure Works could use a bar chart to visualize sales data demonstrating geographic regions where sales are the highest they could also use a scatter plot to identify correlations between marketing spend and sales performance powerbi's interactive visualizations allow companies to dive deep into their data they can drill down into specific areas of interest such as analyzing sales trends for a particular product in a specific market over a given period leading to more precise data-driven decision-making visualizations make data more accessible to a broader audience not everyone at organizations like Adventure Works will be comfortable interpreting raw data but most stakeholders can understand a well-designed chart or graph as a result more stakeholders can engage with the data and contribute to data-driven decision-making visualizations are a powerful communication tool and can tell a compelling story with data making the insights more memorable and persuasive to demonstrate the success of a new product line to stakeholders at Adventure Works you could use visualizations to highlight key performance metrics in a visually engaging way now that you know more about the importance of visualizing data for business intelligence let's explore how creating visualizations works in PowerBI creating visualizations in PowerBI begins with connecting to your desired data sources these can range from Excel spreadsheets to SQL databases once connected you can use Power Query to extract transform and load the data into PowerBI these transformations include renaming columns changing data types filtering rows and combining data from multiple sources you can then load this refined data into PowerBI's data model for further manipulation using data analysis expressions or DAX a formula language for creating custom calculations the next stage of the workflow involves representing this
processed data in visualizations powerbi provides a wide variety of visualization types such as bar charts scatter plots pie charts and even geographical maps after selecting a visualization type you map the data elements to different aspects of the visualization from adding values to the axes or fields to the color scheme PowerBI allows you to add slicers which are visual filters that allow viewers to segment and filter the data in real time to enhance the usefulness and interactivity of these visualizations the final step in the workflow involves arranging the visualizations on a report page and then sharing the report with other stakeholders the PowerBI service allows you to publish these reports enabling a broader audience to interact with them online even on mobile devices visualizations don't only present data in a more understandable form they also enable real-time data analysis for example as sales figures at Adventure Works are updated the visualizations in PowerBI will update automatically this provides companies like Adventure Works with up-to-date accurate insights and enables them to react more quickly to changes in their business environment data analysts must carefully craft visualizations to communicate the right insights effectively this includes ensuring you select the correct type of visualization for the data you want to represent for example while pie charts are appropriate for displaying parts of a whole line graphs are more suitable for displaying trends over time an inappropriate choice of visualization can lead to misunderstandings or even misinformation visualizations are not only advantageous but essential in today's data-rich business environments rather than simple graphical representations of data used correctly visualizations are like keys to insights transforming the way stakeholders understand and engage with data and journey through the complex world of
business intelligence with PowerBI you can guide stakeholders to strategic decision-making uncovering valuable insights and knowledge as a new data analyst at Adventure Works you're overwhelmed with the vast amount of sales customer and manufacturing data you know the data contains invaluable insights about commerce customer behavior production efficiency and more but how do you translate it into meaningful information that stakeholders can understand and act upon you have a powerful solution PowerBI visualizations in this video you'll learn about commonly used visualizations in Microsoft PowerBI you'll discover their purpose and versatility in relation to data representation and interpretation you learned that data visualization is the graphical representation of data a method to uncover patterns trends and insights that may not be apparent in raw data visualizations communicate complex data sets in an intuitive and accessible way creating an approachable narrative that encourages data-driven decision-making let's explore some of the common visualization types available in PowerBI and their practical uses in the context of Adventure Works the first visualization type is the column chart column charts are a clear straightforward way to compare different categories in a vertical orientation they can demonstrate data changes over time or illustrate comparisons among items column charts are generally used when there are fewer than 10 categories on the x-axis the horizontal axis at the bottom of the chart Adventure Works could use a column chart to compare the sales of different bicycle models over the past year each column would represent a different product category and the height of the columns would indicate the sales figures allowing stakeholders to compare and contrast sales performance across models quickly bar charts are another powerful visualization for comparing different categories unlike column charts however bar charts are a horizontal representation of
data the length of each bar corresponds to the quantity of the data it represents bar charts are useful for comparing larger quantities or categories with lengthy labels long labels are inappropriate for column charts as their vertical orientation means the labels appear sideways which can be challenging to read you can also use bar charts to display comparisons among discrete categories or non-continuous distinctly separate groups of data such as different payment methods for example Adventure Works could use a bar chart to compare the number of order transactions per payment category this clear and straightforward visual would make it easy for stakeholders to compare the performance of the different payment methods identify opportunities for payment option optimization and gain insight into customer behavior and preferences a further common visualization type in PowerBI is the line chart line charts are best suited for showing trends over time they connect individual numeric data points forming a line this visual is useful when you have a large data set and are interested in visualizing trends patterns or fluctuations in your data over time it's particularly effective when used to represent many data points Adventure Works could use line charts to track sales trends over time they might compare the monthly sales figures of different bicycles for the past five years to identify when sales peak and when they are slow helping inform strategic decisions about promotions and inventory powerbi also offers area charts which are in essence line charts except color or texture fills the area beneath the line these charts help compare two or more quantities and show part-to-whole relationships over time or across categories representing how individual segments contribute to an entire data set for example in an area chart for Adventure Works based on sales data each product type like mountain bikes or road bikes would be in an area on the chart showing its sales as a portion
of the total sales this can help stakeholders understand how each product contributes to total sales and how this relationship changes over time now let's explore pie charts pie charts are circular graphics divided into slices to illustrate numerical proportions this visualization type is ideal when you want to show a data set as a proportion of a whole each slice of the pie represents a category of data and the size of each piece is proportional to the quantity it represents from the whole Adventure Works might use a pie chart to illustrate the proportion of sales made up by each product category each slice would represent a different product category and the size of each slice would be proportional to the revenue generated by that category this visual would enable stakeholders to understand which products contribute most to overall sales at a glance keep in mind that pie charts become less effective when there are too many categories to compare resulting in a high number of small slices in this case a bar chart might be better for clear visualization the last visualization type you'll learn about in this video is the table tables in PowerBI are a way to view raw detailed data and exact numbers they display information in columns and rows providing a comprehensive numerical view of your data while they don't offer the same visual impact as other chart types tables can display additional details that might be critical to stakeholder understanding of your data Adventure Works could use a table to display a detailed monthly sales breakdown for each product category by region this would allow the relevant stakeholders to examine exact sales figures and make precise comparisons supporting detailed nuanced analysis in this video you discovered a range of common visualizations available to you in PowerBI each visualization type plays a unique role in data storytelling by understanding and effectively using the visuals in PowerBI you can transform raw data into a
masterpiece that conveys insightful actionable information driving more thoughtful decision-making and improving business outcomes in a complex organization like Adventure Works sales reports are indispensable in coordinating sales efforts across regions and product lines let's explore how to apply visualization items to a basic sales report once you've imported your data using get data on the home ribbon and cleaned and transformed it using the power query editor you can start adding visualizations to your report canvas first let's add a column chart to visualize how sales are distributed among various product categories helping Adventure Works gain insight into the performance of different products from the visualizations pane select the clustered column chart button this will create an empty chart on your report page now that you have an empty clustered column chart it's time to fill it with data you can find your data fields in the fields pane also referred to as the data pane or data section typically located on the far right side of the PowerBI interface these fields correspond to the columns in your data source find and select the product category field on your sales data source while holding the field drag it over to the x-axis box under the visualizations pane releasing it will drop the field into the box by placing the product category field in the x-axis well or input box you're telling PowerBI to use the unique values from this field to create individual columns on the chart the next field you need to add to your chart is the order total field select and drag the order total field to the y-axis box as you did with the product category field and the x-axis when you drop a field into the y-axis box PowerBI will perform a calculation on that field for each category in this case it will calculate the sum of the order total for each product category and display this data in the respective column with this column chart stakeholders at Adventure Works can
identify trends opportunities and challenges in product performance that can guide product development marketing campaigns and pricing strategies next let's create a pie chart to represent sales distribution by different payment methods visually a pie chart will make it possible for stakeholders to determine how much of the total each payment method represents to start creating your chart select the pie chart button in the visualizations pane this will add an empty pie chart to your report page to start populating the chart with data find the payment method field in the fields pane and drag it into the legend well in the visualizations pane by putting the payment method field in the legend well you're telling PowerBI to create a different slice of the pie for each payment method in your data after that find the order total field in the fields pane and drag it into the values well when you drop a field into the values well PowerBI performs a calculation on that field for each category by default PowerBI calculates the sum so it will calculate the sum of the order total for each payment method this pie chart can help Adventure Works understand key revenue streams and customer payment preferences and even guide decisions around payment processing partnerships finally let's add a line chart visualization to the report line charts are effective for showcasing trends or changes over time for example this chart can help stakeholders recognize and understand the patterns and cycles in their sales data and identify any anomalies to create the line chart identify the line chart button from the visualization pane and select it this will generate an empty line chart on your new page to fill your empty line chart with data locate the order date field representing time and drag it into the x-axis field well located in the visualizations pane by doing this you're instructing PowerBI to use time as the x-axis of your line chart which forms the basis for the trend analysis then locate
the order total field and drag this field into the y-axis field well by default PowerBI will calculate the sales sum for each date and plot it as a data point on the line chart this offers stakeholders a practical way to visualize and understand sales trends over time stakeholders can use the line chart to inform strategic decision-making and drive business growth remember that PowerBI may make certain assumptions about your date data when creating line charts for example if your order date field includes specific times PowerBI might plot every unique timestamp to ensure PowerBI aggregates data according to your preferences select the drop-down arrow next to order date and choose your desired level of detail for example by year quarter month or day after creating your visualizations the next step is to save your report to ensure you don't lose any of your work to save your report select the file option located in the upper left corner of the PowerBI interface a drop-down menu will appear from this menu select save a window will open asking you to name your report name it something descriptive to help you and others understand what the report is about such as Adventure Works Sales Analysis Report in this window select save again to finalize the process and there you have it you've learned how to apply visualization items to a basic report in PowerBI the sales analysis report complete with visualizations holds valuable insights for Adventure Works and will support data-informed decision-making imagine you are a data analyst at Adventure Works working with vast amounts of information daily while innovative and interactive charts can be flashy and captivating there are moments when your audience wants simplicity a straightforward no-frills presentation Microsoft PowerBI's table visualization is useful when you want to employ the classic clear-cut style of tables to ensure your audience can grasp the essence of the data quickly it elegantly presents refined data allowing
viewers to immediately consume critical information and insights in this video you will learn more about the table visualization in PowerBI and how to configure it when you load a raw data set into PowerBI like an Adventure Works sales report with data from February March and April it is tough to pinpoint details quickly for instance figuring out the monthly sales for each region becomes a challenge and if you are trying to dive even deeper aiming to identify specifics like the number of orders that were either cancelled or shipped extracting this information from this raw format is a difficult task the table visualization in PowerBI can summarize all these insights and still present them in tabular format the same sales data is now presented using a table visualization the table displays summarized insights which is much more user-friendly to work with you can even customize the table visualization to improve its aesthetic appeal or aid engagement and comprehension now that you know more about the table visualization in PowerBI let's learn how to configure this visualization once you load your data in PowerBI using a table visualization is quite straightforward open your report view and select the table visual from the visualizations pane this will instantly place this visual in the report area you can resize this visual by dragging the corners or sides while keeping this visual selected select as many data fields as you want for example you can select month and order total on the data pane this will give you an insight into monthly total sales if you want to break down the sales by different regions simply add the product region field from the data pane and the table visual will display monthly sales for each region adding another field order quantity to this visual gives you more insight into how many items were shipped cancelled or still under processing the visual even calculates the totals automatically displaying them at the bottom of the visual what if you
want to see the order status in this table just select the order status field from the data pane notice how the table visual summarizes valuable information like order quantity and order total for each row you can sort any of these columns by selecting the column header for example selecting the product region column header sorts it in ascending order another click on the same header will sort it in descending order you can change the sequence of these columns by dragging the fields up or down on the visualizations pane let's drag the order status after the product region notice how the visual changed the way it's displaying the data it now shows the order status column right after the product region column you have the option to format this table visual and change its appearance by customizing various options available in the format tab expand the style presets option and select any preset from the available drop-down the appearance of your table will change instantly you can also further customize the table by expanding other sections for example you can display horizontal grid lines by expanding the grid section and selecting your desired color and width you can also change the table header font size color and other options by expanding the column headers section there are many other options to format the appearance and feel of the table whether to reflect your brand colors or to increase its visual appeal for your audience using raw data can feel like looking for a needle in a haystack it can be overwhelming messy and confusing but using table visuals in PowerBI is like sorting that haystack into neat manageable piles making it easier to find what your audience is looking for with data neatly laid out row by row and column by column table visualizations present insights clearly and are an invaluable tool for bridging raw data and actionable intelligence your manager asks you to present a sales report to key stakeholders during a business meeting later in the week
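Under the hood, the summarizing that the table visual performs is a group-and-aggregate: rows are grouped by the fields you select, numeric fields like order quantity and order total are summed per group, and selecting a column header sorts the result. The following plain-Python sketch illustrates that logic outside PowerBI; the region names and order figures are hypothetical, not taken from any Adventure Works data set.

```python
# Sketch of what a table visual does behind the scenes: group rows by a
# selected field, sum numeric fields per group, then sort by a column.
# Sample rows below are invented for illustration only.
from collections import defaultdict

orders = [
    {"region": "Europe",  "quantity": 10, "total": 1200.0},
    {"region": "Pacific", "quantity": 4,  "total": 550.0},
    {"region": "Europe",  "quantity": 6,  "total": 800.0},
]

# Group by region and sum the numeric fields, as the visual does by default
summary = defaultdict(lambda: {"quantity": 0, "total": 0.0})
for row in orders:
    summary[row["region"]]["quantity"] += row["quantity"]
    summary[row["region"]]["total"] += row["total"]

# Sorting by region name ascending mirrors a click on that column header
for region in sorted(summary):
    print(region, summary[region])
```

The grand totals shown at the bottom of the visual are just the same sums taken over all groups, so no extra machinery is needed beyond this aggregation step.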
imagine you receive an Excel file containing all the Adventure Works sales data for the current year the sales department wants an appealing report that offers a comprehensive view of the company's monthly sales volume and the number of processed orders and cancellations so what is your strategy for completing this task this is where Microsoft PowerBI's bar and column charts can make you shine in this video you'll discover the different bar and column charts in PowerBI that can help you efficiently represent your data you will also learn about the four field wells you can use to customize these charts axis legend values and tool tips previously you learned that bar and column charts are popular types of visualizations to display data in a clear and organized way they are beneficial for showcasing categorical data or data that can be organized into distinct groups bar charts display data horizontally whereas column charts display data vertically the simplicity and intuitive nature of bar and column charts make them effective tools for presenting data and identifying patterns or trends over time with six different types of bar and column charts in PowerBI you can convert raw data into visually appealing and meaningful insights let's explore each of these chart options their features and how to add and configure them in PowerBI it can be difficult to identify patterns or insights when working with raw data sets containing text and numbers in this data set sales volume across different regions and the order status such as shipped or cancelled are organized into various columns let's examine how to visualize this data using the different bar and column charts available in PowerBI to create a bar or column chart that demonstrates the number of orders by status and month select the month order quantity and order status data fields from the data pane with the relevant data fields selected let's start by placing a bar chart on the report area you can do this by selecting the
stacked bar chart icon on the visualizations pane you can resize it as needed by dragging its edges with this chart stakeholders can quickly compare and gain insight into the number of orders shipped cancelled or processed during February and March this is much easier to interpret compared to working with the raw data set you have the option to visualize this data using the variety of bar and column charts available to you to change the chart type select the chart you placed and then select the relevant icon from the visualizations pane such as the stacked column chart a stacked column chart is like a stacked bar chart but data is displayed as columns instead of horizontal bars another option for visualizing the data is a clustered bar chart in a clustered bar chart the values are displayed in individual bars instead of a group in the next option the clustered column chart the data is shown in individual columns the last two options are the 100% stacked bar chart and the 100% stacked column chart in both charts important insights are displayed on the tool tips for example if you hover your mouse over any of these bars or columns PowerBI displays the percentage and value of any grouped item such as the order quantity in PowerBI you can select any of the chart's individual bars or columns to highlight them the other items fade making the selected items more prominent this is useful for highlighting specific areas or insights of interest now let's explore four essential field wells in these charts the legend the x-axis and y-axis and tool tips the field wells represent different sections of your chart that you can customize according to your requirements the first field well is called a legend it displays under the title or on the side of a chart the legend field controls the color coding or grouping of the bars or columns in your chart it helps to differentiate between different categories or subgroups within the data the legend makes it easier to understand which color in the
chart represents which item you can hide the legend by turning it off in the format tab on the visualizations pane you can hover your mouse over the bar or column to display the data if the legend is not shown the next field wells are the x and y axes each axis represents the data points you want to compare or analyze for bar charts the x-axis shows the values like order quantity and total sales and the y-axis shows the categories like month or product regions for column charts this is reversed the x-axis shows the category and the y-axis shows the values like order quantity or total sales the final field well is called tool tips a tool tip displays data or extra information when you hover over the data points of a chart understanding the different types of bar and column charts in PowerBI such as stacked clustered and 100% stacked charts allows you to present your data in visually engaging and meaningful ways by using the four field wells axis legend values and tool tips you can create customized visualizations that are informative and insightful Adventure Works is preparing for their annual sales conference your team leader has tasked you with presenting a report that portrays the direction of sales trends the report must also incorporate monthly information regarding delivered pending and cancelled orders this is where Microsoft PowerBI's line and area charts become instrumental in this video you'll explore line and area charts when to use them and how to add them to your reports learning to use these charts is essential for creating attractive reports that empower stakeholders to make informed and effective decisions a line chart uses a line to connect individual data points it is the perfect tool for illustrating a sequence of values or displaying trends over a time period for example a line chart can help Adventure Works understand how sales are progressing month-to-month or year to year a line chart with multiple lines can show sales across different regions
over time and help the stakeholders understand the trend or sales performance while a line chart focuses on trends an area chart emphasizes the magnitude of changes it can display the part to whole relationships among your data making it easier to compare quantities for example regional sales represented by an area chart can help stakeholders intuitively understand and compare the degree each product region contributed to total sales for each month there’s a variant of the area chart called a stacked area chart where the data points from multiple categories are stacked on top of one another this can be useful when emphasizing the total across several categories for example you could use a stacked area chart to illustrate the total orders over a period and demonstrate how each product region contributes to the total so how do you decide when to use bar or column charts which you learned about previously or line and area charts when presenting a few items bar and column charts can be visually appealing and effective however when dealing with many data points these charts can become cluttered and difficult to read each bar or column takes up a certain amount of space and the chart can become overcrowded if there are too many to plot unlike bar and column charts area charts are effective for visualizing changes in multiple values over time both line and area charts are effective in visualizing the changes in values of multiple categories particularly over time while line charts are useful for identifying trends area charts offer a further benefit they help us interpret the magnitude of the values they also effectively illustrate the cumulative impact of the data points over the selected time providing an overall picture of the data trends now that you’ve been introduced to line and area charts let’s take a moment to explore how you can create them in PowerBI start by importing the Adventure Works quarterly sales data set file to a new PowerBI project in PowerBI the 
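A stacked area chart layers each category's values on top of the running total of the categories beneath it, so the topmost boundary traces the overall total. A rough sketch of that stacking, with made-up monthly order quantities for three regions (all values are illustrative):

```python
regions = ["Asia", "Europe", "North America"]
monthly_orders = {
    "Asia": [30, 35, 40],           # hypothetical order quantities for three months
    "Europe": [50, 60, 70],
    "North America": [40, 45, 50],
}

# Each region's band is drawn from the previous running total up to the new one,
# so the last region's boundary equals the total orders per month.
running = [0, 0, 0]
bands = {}
for region in regions:
    running = [base + value for base, value in zip(running, monthly_orders[region])]
    bands[region] = list(running)

print(bands["North America"])  # [120, 140, 160] -- the chart's top boundary
```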
line chart area chart and stacked area chart icons are available in the visualizations pane to create a line chart select the line chart icon from the visualizations pane and place it on the report section open the data pane and select two fields month and order quantity the x-axis of the visualization is sorted by descending order quantity to modify it to ascending order navigate to the visual settings and select sort axis and sort ascending a line chart is handy for illustrating trends for example this line chart displays the total sales from February to April it clearly demonstrates an upward trend in sales for the quarter the sales team at Adventure Works may also want to compare the performance and trends of different regions across the quarter to do this select the line chart open the data pane and select the product region the line chart now indicates that although there appears to be a general upward trend in sales in all regions the European region outperformed both Asia and North America in February March and April as you discovered earlier you can display your data another way using area charts and stacked area charts to create a new area chart select the area chart icon from the visualizations pane place it on the report section and select the month and order quantity fields from the data pane using the visualization settings change the ascending order quantity to descending order in the x-axis to highlight the increase again for a more nuanced understanding of the number of orders for the quarter you may want to display the data by individual regions to do this select the product region field from the data pane while keeping this chart selected the sales team can get a better idea of how the regions contributed to the order quantity in February March and April you can also display the values in a stacked manner you can do this by selecting the visual and then selecting the stacked area icon on the visualizations pane this allows you to display the individual values as well as the total on a single chart in all these charts you can hover over the data points to display the values in a tool tip for example a tool tip could display the exact sales figure for a specific month this tool tip is one of the four essential field wells available in many visualizations in PowerBI the other three important field wells are the legend the X and the Y-axis you can configure the titles of these axes colors and other details by selecting the paintbrush icon on the visualizations pane this
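The upward trend a line chart reveals can also be checked numerically as month-over-month differences. A small sketch with invented sales figures for the quarter (the numbers are placeholders, not Adventure Works data):

```python
# Hypothetical total sales per month for the quarter.
sales = {"February": 42000, "March": 48500, "April": 53000}

months = list(sales)
# Difference between each consecutive pair of months.
deltas = [sales[b] - sales[a] for a, b in zip(months, months[1:])]
upward_trend = all(delta > 0 for delta in deltas)

print(deltas, upward_trend)  # [6500, 4500] True
```

Every positive delta corresponds to a rising segment on the line chart.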
will open the format tab where you can make any necessary changes line area and stacked area charts are potent tools in PowerBI that can convert complex data into easily understandable visuals learning to use these visualizations and their essential field wells can equip you to deliver effective PowerBI reports that present clear and compelling comparisons of data over time and across different categories the sales manager at Adventure Works wants a comprehensive overview of how order quantity relates to overall sales performance for the past few months while bar charts can easily display the sales or the order quantity juggling these metrics on one chart could be a visual challenge likewise line charts offer an excellent way to track changes over time but won't show the difference between sales and order quantities by visualizing the order quantity and total sales metrics for the past few months simultaneously the sales manager can quickly identify any patterns or trends and make strategic decisions to boost sales performance this is where combination charts referred to as combo charts in Microsoft PowerBI can help in this video you'll learn more about these charts including how to create and format them in PowerBI a combo chart is a dynamic combination of a line and a column chart allowing you to visually represent two different yet interconnected data points PowerBI offers two types of combo charts a line and a stacked column chart and a line and a clustered column chart a line and stacked column chart is helpful for displaying a total across the series of data and how each individual part contributes to the total for example you could create a line and stacked column chart for the sales team using columns to visualize total monthly sales each stacked by different product regions the line represents a different but related factor order quantity on the other hand line and clustered column charts are excellent for comparing several sets of data side by side this
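A combo chart plots two related measures from the same rows: one aggregated into columns, the other into a line. A sketch of that split in Python, with invented rows of (month, region, order quantity, order total) standing in for the sales data:

```python
from collections import defaultdict

# Hypothetical sales rows: (month, product region, order quantity, order total).
rows = [
    ("February", "Europe", 20, 5200),
    ("February", "Asia", 12, 3100),
    ("March", "Europe", 25, 6400),
    ("March", "Asia", 15, 3900),
]

column_series = defaultdict(int)  # order total per month -> column y-axis
line_series = defaultdict(int)    # order quantity per month -> line y-axis
for month, region, quantity, total in rows:
    column_series[month] += total
    line_series[month] += quantity

print(column_series["February"], line_series["February"])  # 8300 32
```

The two dictionaries correspond to the column y-axis and line y-axis field wells: same categories on the shared axis, different measures on each scale.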
can be useful to track and compare different metrics over the same period for instance you might have columns representing the sales of each product region by month with a line indicating the average order quantity across all regions as a PowerBI analyst combo charts are one of the many essential visualization tools in your toolbox so let's delve into the process of adding and setting up a combo chart in PowerBI suppose you need to create a combo chart in PowerBI using an Adventure Works data set containing sales data the purpose of the chart is to provide the sales team with insights into orders for February March and April including the overall performance of each month and each sales region to create this combo chart you'll need four data fields: month order quantity order total and product region let's start by placing a line and stacked column chart on the report area from the visualizations pane you can resize the visualization by dragging its edges select the chart while keeping it selected open the data pane on the visualizations pane and select the month order quantity and order total fields in the column y-axis field in the visualizations pane order quantity and order total appear together select the order quantity field and drag it to the line y-axis field both the line and column visuals now appear on the inserted chart now let's add one more field from the data pane product region the chart now has a stacked look with each colored segment representing the contribution of each product region to the order total stakeholders can now not only compare the sales performance over the quarter but also compare the performance of each region month-to-month you can also sort the chart in ascending order to do this select the three dots on the top right corner of the chart followed by sort axis from the drop-down menu and sort ascending you can change this chart to a line and clustered column chart by selecting the chart and then selecting the line and clustered
column chart icon on the visualizations pane let's briefly explore some of the key field wells for the chart the x-axis or shared axis for the line and columns displays the categories in this chart month is used as the category the line y-axis is where you place the data to be displayed as a line like sum of order quantity the column y-axis is where you place the data to show as columns like order total and finally the legend is used to add categorical fields to the chart for example the product regions when you hover over a data point with your mouse some default values for the data point display if you'd like to add additional information to this displayed data select the appropriate fields from the data pane and drag them to the tool tip area combo charts in PowerBI are yet another tool in your data analytics toolbox with your knowledge and understanding of these charts and their functionalities you can present complex and related data points seamlessly and in a visually compelling way at Adventure Works your recent report made quite an impact your manager asks you to create another Microsoft PowerBI report adding visualizations other than the area charts you used previously your team suggests using pie and donut charts which can offer similar critical insights to area charts but are clearer when many items have the same data range as it can be difficult to identify these items correctly in an area chart this is where pie and donut charts can be helpful in this video you will learn about these charts and how to use them in your PowerBI reports pie and donut charts are two types of visualizations available in PowerBI these charts which are circular and cut into slices provide a way to represent data proportionally while pie and donut charts are useful for comparing different categories they become less effective when comparing large numbers of categories as the slices can become too small and difficult to distinguish between choosing between a pie and a
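A pie chart's slices are just each category's fraction of the grand total, converted to an angle out of 360 degrees. A minimal sketch with hypothetical monthly order quantities (the values are invented for illustration):

```python
# Hypothetical order quantities per month; a slice's angle is its share of 360 degrees.
order_quantity = {"February": 200, "March": 300, "April": 500}
total = sum(order_quantity.values())

slice_angles = {month: 360 * value / total for month, value in order_quantity.items()}
print(slice_angles["April"])  # 180.0 -- half the pie
```

A donut chart uses the same proportions; only the hollow center differs.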
donut chart depends on the specifics of your data and your report requirements let's explore each type of chart starting with a pie chart in a pie chart each slice of the pie corresponds to a unique category from your data set the size of each slice is directly proportional to the quantity it represents suppose you have a quarterly sales data set with a pie chart you can visually compare the contribution of each month to the total sales the larger the slice the higher the sales for that month providing your audience with an immediate and intuitive understanding of the distribution of sales like a pie chart a donut chart's segments are proportional to the data they represent the difference between a pie and a donut chart is that the donut chart is ring-shaped with a circular central space you can use this space to provide context for the surrounding segments returning to the sales data example you could use the donut chart's center to highlight total sales average sales or any other key metric you'll learn more about this later in the course when choosing between a pie and a donut chart to represent parts of a whole the donut chart may be a better choice if you'd like to display additional information in the space in the center having explored pie and donut charts let's uncover the steps for adding and configuring them in PowerBI imagine you need to create a pie chart using a quarterly sales data set from Adventure Works for the pie chart you need to specify at least two data fields let's start by placing a pie chart on the report area from the visualizations pane and resizing it by dragging its edges select the pie chart and while keeping it selected open the data pane and select two fields month and order quantity ensure that month goes to the legend field and the order quantity goes to the values field you can add more data to create a more detailed pie chart or illustrate additional insights for example you may want to examine the total order quantity
by region to do this select the product region field from the data pane and ensure that it goes to the details field now the pie chart slices display the total order quantity sold in February March and April for Asia Europe and North America you can sort this chart by order quantity to display the slices in size order to do this select the three dots in the top right corner of the chart select sort axis and then sort ascending you can also visualize this data using a donut chart which also shows the relationship of parts to a whole to convert the pie chart to a donut chart select the pie chart while it is still selected select the donut chart icon on the visualizations pane unlike a pie chart the center of the donut chart is blank this allows space for additional information that can provide context for the surrounding segments to make your charts more interactive and display more data when presenting them to your audience you can enable drill mode for example select product category from the data pane and then select the drill down icon to turn on the drill mode ensure that product category goes to the legend field there is no visual change if you add the product category field when the drill mode is off once you turn on drill mode you can display the additional details by selecting each slice for example if you select the slice that displays the total sales in April more information is displayed to return to the main chart select the drill up icon in the dynamic world of data analytics the correct visualization can make all the difference pie and donut charts offer clean effective ways to visualize and compare proportions to illustrate the relationships within your data by using these visualizations in PowerBI you can deliver clear and engaging presentations you've been exploring the range of visualizations that Microsoft PowerBI offers one of these is a tree map chart like a pie or donut chart tree maps are another helpful tool in PowerBI for illustrating
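Drill mode is essentially a nested grouping: one level of totals shown by default, with a finer breakdown revealed when you drill into a slice. A sketch of that two-level aggregation with invented (month, product category, order quantity) rows:

```python
from collections import defaultdict

# Hypothetical rows: (month, product category, order quantity).
rows = [
    ("April", "Bikes", 120),
    ("April", "Accessories", 80),
    ("March", "Bikes", 90),
]

# Outer level: totals per month (what the chart shows before drilling down).
# Inner level: per-category breakdown revealed when drilling into a slice.
drill = defaultdict(lambda: defaultdict(int))
for month, category, quantity in rows:
    drill[month][category] += quantity

top_level = {month: sum(categories.values()) for month, categories in drill.items()}
print(top_level["April"], drill["April"]["Bikes"])  # 200 120
```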
your proportional data however instead of circles tree maps use rectangles to display your data you might be wondering why do I need another chart if they serve a similar purpose using different chart types can enable you to make the best use of space in your reports and add variety by displaying data in new and exciting ways in this video you'll become familiar with tree map charts understand their applications and how to craft them in PowerBI to create insightful presentations a tree map is a unique visual used to display hierarchical data or data that's organized in a treelike structure as nested rectangles the entire chart represents the total data set or tree and each rectangle or branch represents a portion of the whole tree each rectangle's size corresponds to the value or size of the data it represents while pie and donut charts are familiar and widely used to represent data proportionally they have limitations for example pie and donut charts can become cluttered and difficult to read when dealing with many categories or variables or when the differences between data points are small however the design of a tree map chart allows for easier visualization and interpretation of larger data sets its rectangular nested structure means it can handle more data points without becoming overly complex to illustrate this pie chart represents sales at Adventure Works across Asia Europe and North America for one quarter when you convert the same chart to a tree map it becomes less cluttered and the information is presented in a more readable way now let's create a tree map chart using a quarterly sales data set from Adventure Works let's start by placing a tree map chart from the visualizations pane on the report area you can resize it as required by dragging the edges to create a tree map chart you need three fields to add data fields select the chart while keeping it selected open the visualizations pane and select month order total and product region from the data
pane this visual automatically directs the selected data fields to the appropriate field wells month to the category well product region to the details well and the sum of the order total to the values well if you are not satisfied with this automatic selection of the field wells you can manually drag the data fields to the appropriate field well let's compare this tree map chart to a pie chart created using the same data there is a legend in the pie chart which is absent in the tree map chart because the month names are already displayed in each branch inside the tree a separate legend is not required also the pie chart displays the data values by default which are missing from the tree map chart you can enable the data values in a tree map chart to do this select the chart and open the format tab on the visualizations pane select data labels to turn on the data values now the tree map chart displays the values beside the month and the region name similar to a pie and donut chart you can add more fields to the tree map chart and enable drill mode to add more data fields select the data field order status from the data pane while keeping the tree map chart selected a drill down arrow icon appears on the top right hand corner of the chart select the drill down icon to enable the drill mode then select any branch to display the detailed information making it interactive if you'd like to return to the main less detailed visual you can select the drill up arrow icon you can also customize your tree map by changing the font size of the category and data labels and colors of the categories to do this open the format tab on the visualizations pane then open the data and category labels section here you have the option to change colors and the font sizes of your chart as needed tree map charts offer a unique approach to displaying hierarchical data allowing for efficient use of space clear comparisons and effective handling of larger data sets while pie and donut charts
are popular knowledge of tree map charts provides an added layer of flexibility and depth to your reports you now know what a tree map is and how it can elevate your data storytelling and presentation skills well done imagine you are in a sales meeting presenting a chart focusing on employee turnover rates at Adventure Works while this chart may help management understand why employees are leaving the company or make resourcing decisions it is not useful in the context of the sales department that's because the chart is not representing a key performance indicator relevant to the sales department such as total sales revenue previously you discovered the importance of creating targeted charts to help stakeholders make informed decisions these charts are tailored based on the key performance indicators or KPIs relevant to different departments in this video you'll learn more about visualizing KPIs by exploring the elements available in PowerBI to display KPIs in an engaging way KPIs differ from regular charts and metrics because they align directly with strategic business objectives instead of simply presenting raw data KPIs offer insight into how that data impacts overall business goals and progress a well-designed KPI visual helps stakeholders clearly understand organizational or departmental goals and the metrics that signify progress by providing a concise summary of complex data KPI visuals make it easier and more efficient for stakeholders to comprehend a business's overall performance progress and key metrics this empowers stakeholders to make informed decisions and implement data-driven strategies to promote successful business performance Microsoft PowerBI offers a range of visualizations to display KPIs including cards multirow cards gauges and the KPI visual let's explore each of these visuals and their uses the card visualization displays one value or a single data point this type of visualization is ideal for representing essential statistics you want to
track on your PowerBI dashboard or report for example you could use a card visual in a sales dashboard to provide a snapshot of the total sales revenue enabling stakeholders to gain instant insight into overall financial performance next is the multirow card visualization that displays one or more data points with one data point for each row another visualization you can use is the radial gauge this visual is a circular arc that displays a single value measuring progress toward a goal or target or indicates the health of a single measure although radial gauges can highlight critical insights in a visually appealing engaging way they take up a lot of space compared to the insights they provide let's examine the structure of this visual PowerBI spreads all the data values evenly along the arc from the minimum leftmost value to the maximum rightmost value the default maximum value is double the actual value you should specify the target minimum and maximum values using the corresponding field wells in the visualizations pane to create a realistic gauge chart that represents your data the shading in the arc represents the progress towards your target and the value underneath the arc represents the progress value lastly the KPI visual in PowerBI is a powerful tool for tracking the performance of a metric against a target the KPI visual also includes a trend line or chart to show the data's trajectory over time in this case the chart is showing the daily sales trend against the target of $10,000 it displays an indicator that shows whether the performance is above or below the target for example this KPI visual clearly indicates that the total sales amount on the last day is falling behind the target the KPI visual usually has three field wells indicator which is the primary measure you are tracking trend axis which shows how the indicator is performing over time and target goals which represents the benchmarks you are trying to achieve you'll place the relevant measures or
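The gauge behavior described above (values spread evenly along the arc, with a default maximum of double the current value) can be sketched as a value-to-angle mapping. This is an illustrative model of the described behavior, not PowerBI's internal code:

```python
def gauge_angle(value, minimum=0, maximum=None, sweep_degrees=180):
    """Map a value onto a gauge arc that sweeps from 0 to sweep_degrees.

    When no maximum is supplied, mimic the described default:
    the gauge maximum becomes double the current value.
    """
    if maximum is None:
        maximum = 2 * value
    fraction = (value - minimum) / (maximum - minimum)
    return fraction * sweep_degrees

print(gauge_angle(5000))            # 90.0 -- default max of 10000 lands mid-arc
print(gauge_angle(7500, 0, 10000))  # 135.0 -- 75% of the way to an explicit target
```

This is why specifying explicit target, minimum, and maximum values matters: without them the needle always sits at the midpoint.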
fields into these field wells to represent your data accurately and comprehensively with the chart key performance indicators act as a health checkup for a business providing stakeholders with insights into their progress toward reaching business goals by using PowerBI's card multi-row card gauge and KPI visuals you can make KPIs quick and easy to understand that means stakeholders can make informed decisions and reach their goals faster suppose you're a data analyst at Adventure Works as the financial year ends you need to provide management with a report analyzing sales trends and financial performance across regions throughout the year ribbon and waterfall charts in Microsoft PowerBI can help you achieve this goal in this video you will learn about these specialist charts and how to use them in your PowerBI projects a ribbon chart is a form of stacked chart for visualizing data that changes over time and has a clear ranking order these charts stack the highest ranked series at the top of the chart making it easy to track shifts in the rankings over time they are also helpful for comparing the performance of different categories across distinct time intervals in the Adventure Works scenario management wants to understand the sales ranking of various regions throughout the year this ribbon chart effectively conveys how the different sales regions performed compared to each other and how their sales rankings varied from February to April waterfall charts show a running total as PowerBI adds and subtracts values these charts are useful for understanding cumulative effects in data analysis and visualization cumulative effects refer to how an initial value is affected by a series of positive or negative sequential factors events or changes over time for example a waterfall chart can be used in financial analysis to visualize how a company's net income results from a cumulative effect of various financial elements including revenue costs and other factors like taxes this
waterfall chart depicts how Adventure Works' sales total changed from February to April for the different product regions showing a general upward trend with this visual stakeholders can intuitively grasp the overall sales performance as well as easily compare and contrast the contributions of each month and the regions to the sales total over time now let's take some time to explore how to configure ribbon and waterfall charts in PowerBI you can start with a blank PowerBI file this data set contains sales data for Adventure Works across different regions over time let's place a ribbon chart from the visualizations pane on the report area you can resize it as needed the aim of the ribbon chart is to demonstrate the change in sales value and ranking changes in categorical data like product regions and month so you'll need to include three data fields to display the data properly while keeping the chart selected open the data pane and select the relevant fields month product region and order total ensure that month goes to the x-axis field product region to the legend field and order total to the y-axis field none of these fields is optional when creating a ribbon chart you can sort the category fields by selecting the three dots on the top right corner of the chart followed by sort axis let's select sort ascending to ensure the months are sorted in the correct order note that each month has two distinct areas on this chart first is the actual sales value for each region the other shaded area shows how that region performed compared to the previous month's data for example by hovering over this shaded area for Europe in April the tool tip reveals that Europe's sales rank changed from second in March to first in April you can create a waterfall chart using the same process as you followed with the ribbon chart alternatively you can convert the ribbon chart you created by selecting it and then selecting the waterfall chart icon from the visualizations pane there are four
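A waterfall chart is a running total over signed changes, with decreases flagged separately (the red areas mentioned below in the transcript). A small sketch with invented month-over-month sales changes:

```python
# Hypothetical month-over-month changes in the sales total.
changes = [("February", 12000), ("March", 4500), ("April", -2000)]

running = 0
steps = []
for month, delta in changes:
    running += delta
    # Negative deltas are the decreases a waterfall chart renders in red.
    steps.append((month, running, "decrease" if delta < 0 else "increase"))

print(steps[-1])  # ('April', 14500, 'decrease')
```

Each step starts where the previous running total ended, which is exactly how the chart's floating columns connect.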
field wells in this waterfall chart category breakdown y-axis and tool tips ensure that month goes to the category field which defines the x-axis and shows the individual positive and negative values then ensure the product region goes to the breakdown field which represents different segments in the category however unlike ribbon charts this field is optional in waterfall charts lastly ensure the order total goes to the y-axis field this field denotes the y-axis values to calculate the running total if there is a decrease in the sales total the waterfall chart displays red areas to observe this you can sort the chart in descending order by selecting the three dots in the top right corner then selecting sort axis and sort descending each month shows the total sales and how these regions are performing compared to the previous month's data you can find out additional information about this performance using the tool tips field by hovering over any of the red or green areas you learned about two specialized charts in PowerBI ribbon and waterfall charts ribbon charts help represent rankings and their shifts over time which is ideal for sales performance analysis across categories waterfall charts on the other hand are perfect for breaking down the cumulative effects of various factors providing clear insights into financial performance these charts are impactful visualizations for complex data sets the sales manager at Adventure Works has noticed a recent decline in online sales despite continued marketing efforts and website traffic concerned that marketing strategies may not be converting leads into sales the marketing team asks you to create a visualization that represents the customer journey from lead or interest in the product to actual sales they'd like to gain insight into drop-off rates between the stages and identify areas they can improve their marketing strategies to improve sales performance funnel charts in PowerBI are one type of visualization you can
use to represent the progression of data through different stages like a sales workflow in this video you'll learn about funnel charts and how to implement them in PowerBI the funnel visualization displays a linear process that has sequential connected stages where items flow sequentially from one stage to the next funnel charts are commonly used in business or sales contexts they are well suited to visualizing data that's sequential and moves through at least four stages where you expect a greater number of items in the first stage than in the final stage the charts can help reveal bottlenecks such as where a significant number of items are being lost or are not moving forward in linear processes in addition you can use them to calculate a potential outcome by stages such as revenue sales or deals and track conversion and retention rates these rates relate to how many potential customers move through each stage of the sales process and stay in the process similarly you can use them to track the progress and success of click-through advertising campaigns now let's take a moment to examine an example funnel chart representing the stages of a sales workflow each bar in the chart represents a stage the customer goes through during the sales process it begins with the lead stage at the top of the funnel representing customers interested in a product or service the qualify solution and proposal stages follow where these leads are evaluated for their potential presented tailored solutions and then sent formal sales proposals lastly the finalized stage is where the lead agrees to the proposal closing the sales deal each stage in the chart decreases as the lead conversion process progresses creating a funnel shape the narrowest part of the funnel represents the leads that resulted in actual sales now that you know more about funnel charts and their uses let's explore how to create and configure a sales funnel chart in PowerBI for the sales team at Adventure Works you'll start
with a blank PowerBI file the data set contains sales data including information about the lead conversion stages let's start by placing a funnel chart on the report area from the visualizations pane you can resize it as needed keeping the chart selected open the data pane and select two fields sales ID and conversion stage ensure that conversion stage goes to the category field well and sales ID to the values field well category defines the stages of the process and values assigns the numeric data to each stage notice the shape of the funnel the highest value is displayed on the top gradually displaying the lower values each of the horizontal bars in a funnel chart is called a stage as mentioned before this is the typical pattern of the sales conversion process many people are identified as potential leads in the first stage but the number gradually decreases as they finally become customers if you hover your mouse over each stage it displays information comparing it to its previous stage and to the highest or first stage you can use the tool tips field well for providing this additional information when hovering over a specific stage you can format the colors of each stage whether to reflect your brand colors or improve readability and aesthetic appeal to do that go to the format tab on the visualizations pane and open the colors section then turn on show all and select the color for each stage you can also sort funnel charts in reverse order where the lowest value shows at the top and the highest value at the bottom you can do that by selecting the three dots icon at the top right corner of the chart then sort axis and sort ascending funnel charts are an invaluable tool for presenting sequential or staged data these charts provide a clear and concise visualization of various stages of a process such as a sales pipeline or customer journey enabling you to identify trends bottlenecks and opportunities by incorporating funnel charts into your PowerBI
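The stage-to-stage comparison a funnel tooltip shows is a pair of ratios: each stage's count against the previous stage and against the first stage. A sketch with hypothetical counts for the five stages described earlier:

```python
# Hypothetical counts for each conversion stage, largest first.
stages = [("Lead", 1000), ("Qualify", 600), ("Solution", 350),
          ("Proposal", 200), ("Finalized", 120)]

first_count = stages[0][1]
rates = []
for i, (name, count) in enumerate(stages):
    previous_count = stages[i - 1][1] if i > 0 else count
    rates.append({
        "stage": name,
        "vs_previous": round(100 * count / previous_count, 1),  # conversion from prior stage
        "vs_first": round(100 * count / first_count, 1),        # retention from the top
    })

print(rates[-1])  # {'stage': 'Finalized', 'vs_previous': 60.0, 'vs_first': 12.0}
```

A sharp drop in `vs_previous` between two adjacent stages is the numeric signature of the bottlenecks a funnel chart makes visible.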
reports you can provide stakeholders with a comprehensive view of essential data supporting more informed and strategic decision-making suppose Adventure Works has been facing a steady decline in its profitability for some months marketing has invested heavily in advertising across multiple platforms and has run several promotional campaigns to boost sales the company is struggling to understand the relationship between its advertising spend and its sales revenue in this video you will learn about scatter charts their purpose and configuring them in PowerBI scatter charts are a powerful tool in data visualization they use dots to represent values obtained for two variables in a data set plotting these two numeric variables along two axes scatter plots help illustrate how one factor is affected by another representing correlations between the variables the relationship between the variables can be linear following a straight line nonlinear following a curved line or random scatter charts can help you identify trends patterns and perhaps most importantly anomalies like outliers in your data anomalies refer to deviations from the general pattern of the data outliers are a type of anomaly where valid data points significantly differ from other observations deviating from the general data trend they tend to lie far away from other data points in a scatter chart for example in a scatter chart representing the relationship between sales revenue and advertising spend at Adventure Works you might expect the data points to show a positive correlation where higher advertising spend is associated with more sales an outlier would be a data point representing unusually high sales revenue and low marketing spend this data point is worth investigating as it may indicate an effective marketing strategy able to generate revenue beyond what is expected based on the amount of money spent on marketing a keen eye for outliers is essential because they can dramatically skew statistical
measures and data distributions though they might seem problematic at first outliers often carry vital information about the process under investigation or the data gathering mechanism they can help businesses gain valuable insight into potential issues or areas for improvement and optimization let’s help Adventure Works investigate the relationship between their advertising spend and sales revenue by creating a scatter chart the company can also explore any outliers using this chart enabling them to quickly identify issues areas for improvement and exceptional successes let’s use an imported data set containing Adventure Works sales and advertising expenditure data for this task to understand how various advertising media are performing with their advertising budget against the sales revenue you need to compare two fields the sales revenue and profit margin you need to identify each of these items via their campaign ID and platform type start by opening the report view place a scatter chart in the report area by selecting the scatter chart icon from the visualizations pane and resize accordingly while keeping the chart selected open the data pane and select these four fields campaign ID profit margin sales revenue and platform the campaign ID should go to the values field these represent your individual data points the profit margin goes to the x-axis field the sales revenue goes to the y-axis field and the platform goes to the legend field the x and y-axis field wells contain the data fields to compare against each other to display more data when hovering over a data point drag the advertising spend field from the data pane to the tool tips field well now hover over any data point to see the updated tool tip this scatter chart is visualizing the correlation between marketing spend and sales the data points or markers are shown as dots you can manually change the size of these markers if needed by opening the format tab and the markers section the data points behaving as
expected are closely gathered in the chart creating a cluster there are three outliers instantly evident this makes it easy to investigate these data points and gain insight into what caused the deviations from the expected pattern the data point in the leftmost corner represents a campaign that has an unusually high advertising spend compared to its sales revenue this is not in line with the trend seen in the other campaigns where a higher advertising spend usually correlates with a higher sales revenue marketing can use this insight to make decisions around resourcing for example reallocating the advertising budget to campaigns that are not underperforming in contrast the data point in the middle represents a campaign demonstrating a substantial deviation from the expected trend with a low advertising spend yet an unusually high sales revenue likewise for the data point on the top right corner sales revenue is exceptionally high given its relatively low advertising spend this campaign outperforms all others in terms of sales despite the minimal investment in advertising stakeholders can investigate these outliers to gain insight into the successful strategies and optimize other campaigns two additional field wells for scatter charts in PowerBI are worth noting the size field enables you to change each marker size dynamically it provides insight into how additional factors are affecting the data points for example let’s drag the advertising spend data field to the size field on the visualizations pane notice how the size of the data points changes with the dot in the leftmost corner being the largest and the dot in the top right corner being the smallest the size of these points is now representing the advertising expense you can also add animation to your chart by adding a data field to the play axis for example let’s drag the advertising spend field to the play axis the chart now displays like a video player with a play button when you play it will animate
each data point and display advertising spend in the top right corner this is useful for engaging audiences during presentations in this video you discovered scatter charts in PowerBI a type of visualization you can use to represent the relationship between two variables scatter charts are a powerful data visualization tool for uncovering outliers providing insights into trends and patterns and assisting data-driven decision-making they are an essential part of any data analyst’s toolkit congratulations you’ve completed the first module of this course creating reports in Microsoft PowerBI this week you were introduced to the different types of visualizations in PowerBI and how to add them to reports and dashboards with an emphasis on the significance of visualizations in presenting valuable insights to stakeholders you started the week by exploring the course overview and structure as part of your course introduction you set up your PowerBI environment and online account preparing you for the course exercises you also explored the importance of visualization and analysis in the context of business intelligence using real world scenarios and terms to enrich your understanding next you were introduced to visualizations in PowerBI starting with an overview of their importance in business intelligence you discovered the power of visualizations to simplify vast and complex data uncover patterns and trends enable detailed investigations of data make data accessible to and engaging for all kinds of stakeholders and communicate your analysis insights effectively you also explored creating visualizations in PowerBI a process that involves connecting to your data sources extracting transforming and loading your data selecting your visualization types and mapping data elements to different aspects of the visuals arranging the visualizations on the report page and finally sharing your report you learned how to apply visualization items to a basic report and were introduced
to some common business reports you then familiarized yourself with the visualizations pane in PowerBI gaining hands-on experience in creating your own business report a sales report for Adventure Works you also explored how to pin visualizations in PowerBI in order to empower stakeholders to access key insights quickly encourage collaboration and promote a data-driven culture in your third lesson you delved deeper into basic visualizations in PowerBI you explored bar and column charts line and area charts combo charts pie and donut charts and tree map charts you not only learned how to create these different charts but also when and how to use them for maximum impact and effective data representation you also had the opportunity to practice your new skills by completing various activities and tasks using different chart types plus you discovered how important it is to target your data visualizations based on the needs of your audience with the basic visualizations covered you moved on to some of the specialist visualizations in PowerBI you learned about key performance indicators which are measurable metrics linked to an organization’s objectives and their vital role in business you were introduced to cards multi-row cards gauges and KPI visuals visualization types in PowerBI that you can use to represent KPIs in business reports KPI visualizations provide stakeholders with a snapshot insight into overall performance and progress towards goals you also learned about ribbon waterfall funnel and scatter charts including their different purposes and how to configure each of them in PowerBI you then had the opportunity to put your knowledge to good use by creating a performance report for the marketing team at Adventure Works configuring visualizations that showcased relevant KPIs and answering real-world questions about performance over time you are now equipped with essential data visualization techniques and report creation skills in PowerBI you will build on your
learning thus far discovering how to enhance the user experience and accessibility of your reports keep up the momentum and ensure you use the quizzes and additional resources to further consolidate your learning you’re a data analyst at Adventure Works a company that relies heavily on data analytics for decision-making the company recently added some talented individuals to its sales team including Logan who is visually impaired and uses screen reading software to access digital content soon after joining the team Logan realizes that the Microsoft PowerBI reports he receives are not entirely compatible with his screen reader he finds it difficult to interpret the visuals and graphics and there are some components that he cannot access recognizing the potential impact on Logan’s performance and the ability of the sales team to make data-driven decisions his manager immediately alerts the data analytics team while their reports are comprehensive and visually appealing the team has neglected the critical aspect of accessibility in this video you’ll learn about accessibility in data and reporting its importance in the business context and designing PowerBI reports that are accessible and inclusive to all in the context of digital systems accessibility refers to products applications websites and tools designed to allow all users to use them effectively regardless of whether they have any disabilities accessibility practices cover a wide variety of elements to ensure the usability and inclusivity of digital content this includes enabling digital content compatibility with assistive technology or AT which is used to increase maintain or improve the functional capabilities of people with disabilities such as Logan’s screen reader PowerBI supports many accessibility standards that help ensure your PowerBI experiences are accessible to as many people as possible among these standards are the web content accessibility guidelines commonly known as WCAG that help ensure web
content is accessible to people with disabilities according to key principles of these guidelines web content including information user interface components and navigation should be perceivable operable understandable and robust that is interpretable by a wide range of user agents including assistive technology implementing accessibility features in PowerBI reports can enhance the audience’s experience and comprehension of your reports in several ways firstly accessible reports promote inclusivity by designing PowerBI reports with accessibility in mind you ensure everyone can interact with and understand the data regardless of any limitations this results in a more inclusive and equal environment accessible reports also improve usability the practices used in creating accessible reports such as providing clear and concise titles adding alternative text descriptions for visuals and implementing keyboard navigation typically result in a better user experience for everyone in addition you can cater to different user learning and processing preferences by using various channels or methods to present information like text visuals audio and tool tips multimodal presentation can enhance comprehension and engagement for a wider audience accessibility features can also promote a clear interpretation of the data presented using techniques such as tool tips or descriptive titles can provide more context and reduce the chances of misinterpretation of the data finally accessible reports ensure compliance with various jurisdictional laws and regulations regarding digital content accessibility this keeps your organization within the legal framework and builds trust with your audience to promote accessibility which is vital in data and reporting PowerBI offers a variety of features for designing accessible reports PowerBI visuals are fully keyboard navigable and compatible with screen readers facilitating user interaction and navigation PowerBI also supports high contrast themes
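the contrast idea behind high contrast themes can be made concrete with a short sketch this is the contrast-ratio math published in the WCAG 2.x guidelines themselves not a PowerBI API and the colors used are illustrative

```python
# Minimal sketch of the WCAG 2.x contrast-ratio formula behind "enough
# contrast" guidance; this is the published standard's math, not a
# PowerBI feature. Colors are (R, G, B) tuples in the 0-255 range.
def _linearize(channel):
    """Convert one sRGB channel to its linear value (WCAG 2.x definition)."""
    c = channel / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

def relative_luminance(rgb):
    r, g, b = (_linearize(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(color_a, color_b):
    """Ratio between 1:1 (identical colors) and 21:1 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(color_a), relative_luminance(color_b)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)

# Black text on a white background hits the maximum ratio of 21:1;
# WCAG level AA asks for at least 4.5:1 for normal-size text.
print(round(contrast_ratio((0, 0, 0), (255, 255, 255)), 1))
```

checking candidate text and background colors against the 4.5:1 threshold is one quick way to vet a report theme before publishing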
ensuring better readability plus users can use focus mode to expand visuals improving visibility and view data in a screen reader friendly tabular format with the show data table option for users with difficulty with color like color blindness you can use markers to convey different series in visuals like line or area charts similarly PowerBI supports pattern fills in visuals like pie or bar charts which you can use in addition to or instead of solid colors it also has some built-in report themes that consider accessibility guidelines when choosing colors and themes you need to ensure that there is enough contrast between text and background colors and be aware of color combinations that are difficult to distinguish you can add alt text which refers to alternative text descriptions to the visuals in your reports to make them more accessible alt text conveys essential insights even if users cannot see your visuals adding descriptive titles and labels to your visuals also enhances their accessibility as well as their understandability and usability finally some users may have motor difficulties and rely on assistive technologies that for example use keyboard commands for reading and interacting with your report content you can set the tab order of reports to help keyboard users navigate them in an order that matches the way other users visually process the report visuals in this video you discovered the importance of making PowerBI reports easy to use for all users and how to design accessible PowerBI reports which you’ll explore in more detail as you progress through the course accessibility ensures you follow the rules about being fair and inclusive makes your reports easier to use and helps everyone understand your data the usability and understandability of your reports play a vital role in communicating analysis insights and ultimately for stakeholders like Logan to apply data insights to decisions in the business context knowing the importance of accessible 
reports you need to include features that make your Microsoft PowerBI reports accessible to everyone in this video you’ll learn how to configure and format visualizations to improve accessibility let’s start by adding alt text or an alternative text description to a pie chart visual in an existing report for Adventure Works this is especially useful for people with visual impairment because screen readers can read this text when they select a visual to provide alt text for any object in a PowerBI desktop report start by selecting the object in the visualizations pane select the format section expand general scroll to the bottom and fill in the description in the alt text text box this text box has a limit of 250 characters alt text should include information about the insight that you would like the report consumer to take away from a visual because screen readers read out the title and type of visual you only need to add a description related to the data and main point of the visual for example alt text for this pie chart could be sales figures for February March and April in Europe North America and Asia combined next let’s explore how to set up tab order to improve accessibility by ensuring easy keyboard navigation navigate to the tab order page of the report to set the tab order select the view tab in the top ribbon in the show panes panel select selection in the selection pane choose tab order to display the current tab sequence for your report you can select an object then use the up and down arrow buttons to move the object in the hierarchy you can also select an object with your mouse and drag it to the position you’d like in the list now let’s move on to working with titles and labels to increase accessibility for visuals in your reports make sure that any titles axis labels legend values and data labels are easy to read and understand let’s navigate to the titles and labels page of the report and compare the two line chart visuals the visual on the left
has no legend or axis labels this makes it difficult to comprehend the insights the chart is meant to convey by including a legend the report consumer now knows which line in the chart corresponds to which product region and including the axis labels of February March and April makes it easier to interpret the trends in the data over time you can also add data labels to your charts to do that select the visual select the format section and find the data labels toggle and turn it to on turning data labels on for this chart displays the order total amount for each month along the lines representing the product regions this makes it easier for the user to interpret the visual at a glance with data labels you can even choose to turn on or off the labels for each series in your visual as well as position them above or below a series while PowerBI does its best to place data labels above or below a line sometimes it isn’t clear for example in this visual the data labels are jumbled and not easy to read to change the default position expand the data labels menu and select above or under from the position drop-down list positioning your data labels above or below your series can help ensure clarity especially if you’re using a line chart with multiple lines with a few adjustments the data labels are now clearer you learned that markers can also help to convey information in visuals like line area combo scatter and bubble charts adding markers improves accessibility by not relying only on color for users to interpret your visual and distinguish between data points for example different series in a line chart to turn markers on select the visual then the format section in the visualizations pane next expand the shape section scroll down to find the show markers toggle and turn it to on the line chart is now displaying markers to change the shape of the markers for each line separately select the format tab and expand markers from there select any series from the series
dropdown and change the shape and size of the markers from the shape section lastly let’s explore the focus mode and show data option in PowerBI when a report consumer is examining a visual in a dashboard they can expand it to fill up more of their screen by selecting the focus mode icon in the context menu of the visual this displays only the selected visual allowing for better presentation and focus to return to the main report area select the back to report button to view the data in a visual in a tabular format select the three dots icon on the top right corner of the visual followed by the show data table in the visual context menu this displays the data in a table that is screen reader friendly you can also switch the layout to vertical or horizontal by selecting the layout button on the top right corner of the visual in this video you learned how to format visuals to improve accessibility and use various accessibility features in PowerBI integrating accessibility features improves inclusivity by ensuring users can access and interact with your content and can enhance the overall comprehension and usability of your reports your manager Adio asked you to design a report highlighting critical data within a table visual he wanted you to display data bars with sales figures for immediate recognition and to differentiate specific rows based on their data values for increased readability to implement this request you discovered PowerBI’s useful feature conditional formatting this feature enables the customization of charts based on diverse data criteria enhancing report readability and user engagement in this video you’ll learn about the conditional formatting feature in PowerBI and how to apply it to visualizations conditional formatting is a feature that allows you to apply specific formatting to cells or rows in a table or matrix based on specific conditions this feature is significant when you have vast amounts of data and want to highlight certain elements 
that meet specific criteria for example if the total profit displayed in a table was a negative value indicating a loss you could highlight this by using conditional formatting to change the value to a red color other visuals also support conditional formatting for example you can format a bar chart so that if the sales target for a specific product category goes beyond a certain threshold that category’s bar will change color conditional formatting offers many benefits it provides immediate insights allowing users to quickly spot trends anomalies and focal points without going through a vast amount of data one by one a more visually appealing report particularly one with colored data or data bars in a table can enhance user engagement making the information more accessible and readable in addition relying solely on manual analysis can result in users missing crucial details however with conditional formatting vital data points are automatically highlighted significantly reducing the potential for errors now let’s explore how to add conditional formatting to a table visual which offers excellent support for conditional formatting select the table visual from the visualizations pane you can resize it as needed in the report view now select the month product region order status order quantity and order total fields from the data pane from the format tab expand style presets and select the alternating rows preset from the drop-down menu if you’d like to resize the columns you can drag the column corners as needed you can also change the column headers by double-clicking the fields in the column well on the visualizations pane let’s rename sum of order quantity to order quantity and sum of order total to order total now let’s show data bars using conditional formatting data bars display on columns with numerical values like order total or order quantity in this table to show the data bars right-click the order total field in the column well on the visualizations pane
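the logic behind these two conditional-formatting features can be sketched outside PowerBI the status names mirror the order status example in this report while the hex colors and the min-to-max bar scaling are illustrative assumptions rather than PowerBI defaults

```python
# Sketch of rule-based conditional formatting: a value-to-color lookup
# for cell backgrounds plus a data-bar length calculation. Hex colors
# are illustrative choices, not PowerBI defaults.
STATUS_RULES = {
    "Shipped": "#C6EFCE",     # green background
    "Cancelled": "#FFC7CE",   # red background
    "Processing": "#FFEB9C",  # amber background
}

def background_color(status, rules=STATUS_RULES, default="#FFFFFF"):
    """Return the background color for a cell value, keeping the
    default when no rule matches."""
    return rules.get(status, default)

def data_bar_fraction(value, column_values):
    """Fraction of the cell width a data bar fills, scaling the column's
    minimum to 0 and its maximum to 1 (one common scaling choice)."""
    lo, hi = min(column_values), max(column_values)
    return 0.0 if hi == lo else (value - lo) / (hi - lo)

totals = [120, 480, 300, 480]
print([background_color(s) for s in ["Shipped", "Pending"]])
print([round(data_bar_fraction(v, totals), 2) for v in totals])
```

the key design point is that each rule is evaluated per cell against the cell’s own value which is why adding or removing a rule in the dialogue box updates the table instantly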
select conditional formatting and select data bars this will display the data bars dialogue box in this data bars dialogue box you can select a color for positive and negative bars positive bars will display when the value is positive and negative bars when the value is negative select the colors and select okay the data bars will display in the order total field with your selected colors you can also change the background color of a cell using conditional formatting let’s try this with the order status column say you want to change the background color when the values are shipped cancelled and processing respectively to do that right-click the order status field in the column well on the visualizations pane select conditional formatting then background color this will show the background color dialogue box where you can set the conditions to apply specific formatting type shipped in the value text field and change the background color then select the plus new rule button to add a new rule in this new rule type cancelled and change the background color add one more rule and type processing and change the background color select the okay button and the table will update with the new conditional formatting instantly remember that you can add as much conditional formatting to each field as you want in this video you discovered how to implement conditional formatting in a table visual conditional formatting in PowerBI is an effective feature that you can use to enhance the clarity and usability of your visualizations making your data easily accessible and increasing visual appeal and user engagement during a recent project review you presented a report you carefully designed to the Adventure Works marketing team the presentation went smoothly engaging the audience with crucial data insights however Renee the marketing director noticed that the visual elements of the report didn’t align with the company’s brand colors and style guide Renee asked you to update the design
elements of the report to reflect the company’s brand aesthetics as you started selecting each individual item and manually adjusting their colors it was clear that this would be a tedious time-consuming task luckily your manager stepped in demonstrating how themes in Microsoft PowerBI could simplify the task at hand and save you a lot of time and effort in this video you will learn more about themes in PowerBI and working with them in your reports themes in PowerBI are predefined sets of colors fonts and visual styles that you can apply to your reports easily and quickly they ensure visual consistency across different reports and can save significant time that would be otherwise spent customizing individual items you can customize themes to align with company color schemes and design guidelines this can help enforce a strong brand identity in your reports and create a more impactful and professional appearance using themes in PowerBI can enhance accessibility in a variety of ways PowerBI offers theme customization options you can use to cater to specific accessibility needs such as high contrast themes for users with visual impairments you can also enhance readability by using themes that employ distinct and consistent colors assisting users in differentiating between various data points and categories plus PowerBI provides built-in themes to help make your report more accessible for example by offering themes with colors that are easy to distinguish and visible to colorblind users this can broaden the accessibility of your reports to a more diverse audience not to mention a well-designed theme ensures that reports are user-friendly and easier to interpret let’s take a moment to explore how you can apply these themes in PowerBI you can choose report themes by going to the view ribbon in the themes section select the drop-down arrow and then select the theme you want to apply to your report these themes are similar to themes seen in other Microsoft products such as
Microsoft PowerPoint here you can also find accessible themes which you can utilize to create accessible reports select a theme to apply it to your report instantly if you would like to customize the appearance of your PowerBI reports in the future changing the theme allows you to update all your visuals at once for more options you can also browse the collection of themes created by members of the PowerBI community by selecting theme gallery from the themes drop-down menu this opens the themes gallery in your browser in the themes gallery you can select any theme then scroll down and download the JSON file for the theme to install the downloaded file select browse for themes from the themes drop-down menu go to the location where you downloaded the JSON file and select it to import the theme into PowerBI desktop as a new theme this theme will instantly apply to your current report you can customize a theme directly in PowerBI Desktop to do this select a theme that is close to what you’d like you can then customize the theme by making any necessary adjustments to customize a theme from the view ribbon select the themes drop-down button and select customize current theme a dialogue appears where you can make changes to the current theme you can then save your settings as a new theme there are customizable theme settings in various categories you can name your custom theme and define color settings customize text settings such as font family size and color and visual settings which cover background border header and tool tips and adjust page elements like wallpaper and background as well as filter pane settings including background color transparency font and icon color size and filter cards after you make your desired changes select apply to save your theme you can now use the theme in your current report it will also be available in the custom themes section in the themes drop-down menu in this video you learned about themes in PowerBI using themes can
significantly enhance the efficiency consistency and accessibility of your reports enabling you to effortlessly maintain a uniform look that aligns with brand guidelines learning how to use and customize themes is an essential skill that’ll help you make visually appealing easy to understand and professional reports quickly you need to present this quarter’s sales data to the Adventure Works management team the data you’re dealing with is multifaceted and includes information like product categories regions stores periods and various performance metrics like total sales average sales and profit margin you include various charts and graphs that visually represent the overall sales trends regional performance and product category performance in a dashboard for management however the team also wants more granular and contextual information like store specific performance and individual product performance within categories due to the dashboard’s high-level design displaying all these detailed data points could clutter the dashboard and overwhelm users you can use PowerBI’s tool tip feature to deal with this in this video you will learn about how this feature can improve the accessibility of your PowerBI reports and how to add custom tool tips you learned that tool tips in PowerBI display additional information about the data being displayed in your visuals when users hover over different data points you can create custom tool tips by adding extra items to the tool tips field well for a visual tailoring the content to the needs of your report users tool tips can contribute to improved accessibility of PowerBI reports and dashboards in various ways tool tips allow you to provide an extra layer of detailed information without cluttering the dashboard for example hovering over a specific region in a regional performance chart could show the top performing and bottom performing stores within that region this can make complex charts and graphs more accessible to all users including
those with cognitive disabilities you can customize tool tips to provide context-specific details for instance when a user hovers over a bar representing a product category in a bar chart the tool tip can display the top three best-selling products within that category for visually impaired users descriptive tool tips can provide crucial information that might not be readily accessible from the visualization screen readers can read out tool tips making the data more understandable for those with visual impairments tool tips are included in the show data table option for every visual tool tips can also support users who find it challenging to distinguish between different segments or lines in a chart based on color such as colorblind users detailed tool tips can help these users by providing the necessary information when they hover over parts of the visualization even if they cannot visually distinguish between the colors users can discover new insights and patterns with tool tips in turn they may facilitate users who need additional support to interpret the visualizations and ensure insight clarity you can also use tool tips to explain or define the metrics and measures used in the visualizations enhancing users’ understanding of the data a further benefit of interactive features like tool tips is that they can make the data exploration process more engaging increasing user engagement lastly tool tips can help maintain a clean minimalist design in the dashboard by minimizing visual distractions tool tips ensure you don’t overwhelm the dashboard with additional details this allows users to focus on high-level trends and patterns and explore details when necessary aiding their overall comprehension of relevant insights now that you know more about tool tips and how they can support report accessibility let’s explore how to configure and customize them in PowerBI if you hover over this ribbon chart PowerBI displays a tool tip that contains contextual information useful for
understanding the visual for example hovering over this faded area shows various performance indicators for the Europe sales region such as monthly order totals and rankings the tool tip can also display other information related to this data point if you hover over the solid color it provides the month region name and the sum of order total you can customize this tool tip say for example some stakeholders want additional information related to order quantity and product stock to add this information select the visual open the visualizations pane and scroll to the tool tips field well drag order quantity from the data pane to this well PowerBI will automatically convert it to sum of order quantity you can further customize a tool tip by selecting an aggregation function select the arrow beside the field in the tool tips well then select from the available options like sum average minimum maximum and many others as per your requirement you can repeat this process for product stock once tool tips are added to the tool tips well hovering over the same data point on the visualization also displays values for the sum of order quantity and sum of product stock you can also change the position of these fields in the tool tip by dragging them in the tool tips field well in this video you discovered how to add tool tips in PowerBI and how they can make your reports more user-friendly and accessible ultimately tool tips help add extra details without cluttering your dashboards and reports this feature can improve clarity and data comprehension and ensure all users including those with cognitive disabilities or visual impairments can access vital information the sales team at Adventure Works wants a comprehensive overview of their bicycle sales performance from overall company performance down to specific product models and different sales representatives setting up a hierarchy in a Microsoft PowerBI data model is a neat way to organize and explore related data from a general 
view to specific details in this video you'll discover more about hierarchies in reports and how to create well-structured hierarchies in PowerBI so that users can easily explore data at various levels of detail in your reports data hierarchies are a way to organize and structure your report data and visuals in PowerBI hierarchies group related data items by hierarchical relationships while you do not need to organize your data in PowerBI using hierarchies it can make it easier for users to understand the data and the connections between different components hierarchies in PowerBI also support data exploration making it possible for users to navigate from high-level data overviews to more detailed information these hierarchies enable drill mode in your visuals empowering users to drill down into detail within the same visualization or report for example PowerBI automatically creates a date hierarchy when importing date columns from data arranging dates from more general to more specific such as year quarter month and day in a data set with time-based sales data a hierarchy like this enables users to explore the sales totals from a broader point of view such as yearly sales to a more detailed one such as sales on a particular day let's explore hierarchies further by considering the example of an Adventure Works data set containing sales records you can create a hierarchy by organizing the data points into a structured framework that starts with bike as the main category and further breaks down into subcategories which you can break down further into specific product names this way stakeholders can understand the overall sales of bikes at a glance and explore the data at a more detailed level such as the sales performance of mountain bikes versus road bikes or the sales performance of individual products similarly for a data set containing geographical sales data you can structure the data according to the hierarchy of continent country city area this way report users can drill down into the data by geographic level from exploring global trends to examining local successes or difficulties so how can you create hierarchies like these in PowerBI let's take a moment to explore the process you can start by importing your data set in this case the Adventure Works sales data set into a blank PowerBI report you don't need to transform any data then select the sales table followed by the load button if you open the data pane you will notice that PowerBI has automatically created a hierarchy with all the date fields such as estimated delivery date and order date for example if you expand order date then date hierarchy it shows the dates organized according to year quarter month and day how can you create a hierarchy of your own let's create a hierarchy for product-related data using the product category product subcategory color and product name fields imagine how this hierarchy should be constructed the product category should be the overarching or main category at the top right-click the product category field in the data pane and select create hierarchy from the context menu this will immediately create a new item in the data pane called product category hierarchy if you expand this item the product category field is nested inside it to add more fields to this hierarchy right-click on a field for example the product subcategory and select add to hierarchy from the context menu then select the newly created product category hierarchy the product subcategory field will be added to the product category hierarchy following the same process let's add product color and product name fields to this hierarchy you can remove any field from the hierarchy by right-clicking on it and selecting delete from model you can instantly add a table visual to your report area by checking the check box before the hierarchy on the data pane you can resize this visual as needed alternatively you can create a visual and then apply the hierarchy to it 
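Conceptually, a hierarchy like this is just a series of increasingly specific group-by keys, and each drill-down level adds the next field to the grouping. The following is a minimal pandas sketch of that idea outside PowerBI; the DataFrame, its column names, and the sample values are invented for illustration and are not the actual Adventure Works schema:

```python
import pandas as pd

# Hypothetical sales records mirroring the category > subcategory > product
# hierarchy described above (column names are assumptions for this sketch).
sales = pd.DataFrame({
    "product_category": ["Bikes", "Bikes", "Bikes", "Bikes"],
    "product_subcategory": ["Mountain Bikes", "Mountain Bikes",
                            "Road Bikes", "Road Bikes"],
    "product_name": ["Mountain-200", "Mountain-100", "Road-650", "Road-250"],
    "order_quantity": [10, 5, 8, 3],
})

# Top of the hierarchy: totals per category (the "at a glance" view).
by_category = sales.groupby("product_category")["order_quantity"].sum()

# One drill-down level: add the subcategory to the grouping key.
by_subcategory = sales.groupby(
    ["product_category", "product_subcategory"])["order_quantity"].sum()

# Deepest level: category, subcategory, and individual product.
by_product = sales.groupby(
    ["product_category", "product_subcategory", "product_name"])["order_quantity"].sum()

print(by_subcategory)
```

Drilling down in a PowerBI visual corresponds to regrouping by one more field from the hierarchy, which is why the same hierarchy definition can serve every level of detail in a report.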
select the tree map visual from the visualizations pane and resize it as needed while keeping it selected mark the checkbox of the product category hierarchy in the data pane now select the order quantity field the tree map visual will be ready with drill down mode instantly and you can dig down into as many levels of data as you want you can turn the drill down mode on by selecting the down arrow on the top right corner of this visual and make the report interactive understanding report hierarchy enables you to organize data for yourself and the stakeholders working with the report you’re creating hierarchies facilitate an understanding of how different data fields relate making the data less confusing and more userfriendly with hierarchies users can start with the bigger picture and smoothly zoom into different levels of detail as needed empowering them to make a range of informed decisions imagine you are asked to design an interactive visual for a report that displays crucial information while allowing users to delve into any chart element and engage more deeply with the associated data points users should have the flexibility to navigate through multiple layers and return to the main report as needed while drill down only allows users to navigate from a broader to more detailed level within the same visualization with PowerBI’s drill through feature users can navigate from a visualization to a separate detailed report page focused on the selected data point in this video you’ll learn how to configure the drill through feature in a PowerBI report for Adventure Works let’s start with a pie chart displaying total sales figures by month this visual provides stakeholders with a way to compare monthly order totals at a glance suppose you want to direct users who require more detail about sales performance to a separate page that displays the sales data broken down by region and order status you can add a new page to your report by selecting the plus icon at the 
bottom to add a page title double-click on this new page title and type regional sales add a table visual to the page and resize it accordingly then select month from the order date hierarchy order quantity order status and product region the table is now displaying all of this data at once so how can you have users land on this new page because the pie chart displays total sales by month you can link the table to the chart using the shared month field while keeping the table selected drag the month field from the order date hierarchy to the drill through field well notice how a back button is added above the table visual you can now press the control key on the keyboard and select this button to return to the main report returning to page one in our report area when you right-click on any slice of the pie chart for example April a new item in the context menu called drill through displays select regional sales and notice how the table is now showing only the sales records for April returning to the main report if you right-click on the March slice and select drill through followed by regional sales you are shown the regional sales table for only March's sales data suppose some stakeholders also want insights into the performance of different categories of bikes let's create a new page that displays the data by bike categories sold in every month and link it to the main chart using the drill through feature add a new page and rename it bike categories select a card visual resize it as needed and select month from the order date hierarchy on the data pane dragging it into the fields well next select a multi-row card and resize it as needed select the order quantity and product category fields on the data pane drag the month field to the drill through well to link the new page to the main chart now let's return to the main page and explore the new addition if you select any slice for example March there are two items available under the drill through menu in the context 
menu if you select bike categories you will be taken to the bike categories page but now data is showing for only March you can add as many pages as you need and link them to other report pages using the drill through feature in PowerBI in this video you learned how to use the drill through feature in PowerBI this feature is essential for professional and real-life business data visualization enabling you to create multi-page reports with easy navigation allowing users to dive deeper into the data as needed without sacrificing clarity in reporting and visualization sorting and filtering functions can help users better understand the data presented in reports highlight patterns and trends and focus on information that's relevant to them in this video you'll discover how to apply and manage sorting and filtering features in PowerBI with PowerBI you can sort or order the data in your report visuals based on different data fields in ascending or descending order for example in a report on sales performance sorting a column chart depicting sales performance by region in ascending order makes it easier for stakeholders to identify the lowest and highest performing sales regions an unsorted visual can create confusion and make the visual unreadable and difficult to understand consider this line chart showing sales trends for the quarter the chart is sorted by sales amount by default and the months are not presented in logical chronological order if you do not sort the visuals by month users might have difficulty understanding or misinterpret sales performance over time as at a glance it seems like sales are declining however when properly sorted by month it is clear that sales increased in all three regions over time there are also many filtering options available to you when creating your reports filtering enables you to select specific data points or subsets of data as needed to ensure the data presented is relevant and clear this is helpful for excluding certain values when representing your data with different visuals for example this report displays the combined total of orders from different sales regions it includes all types of orders including canceled orders or those still being processed in this example you may want to use filtering to exclude these data fields if you add an order status filter to show only the numbers for orders that have been shipped the picture changes dramatically by filtering out canceled orders and orders still being processed stakeholders can focus on completed orders and gain a better overall picture of actual sales performance in the different regions now that you know more about the sorting and filtering features let's explore how to use them in PowerBI you can sort any chart in PowerBI by data fields in a variety of orders depending on your needs to sort select the three dots on the top right corner of the visual followed by your preferred sorting method some visuals like this line chart give you the option to sort the legend as well arranging the different categories presented in the legend in a particular order other visuals like this pie chart offer only sort axis which refers to sorting data points along the horizontal or vertical axes in a particular order from the axis you can select various data fields and then also select to sort them in ascending or descending order let's sort the stacked column chart in the bottom left corner of the report by month currently it is sorted by order quantity in ascending order select the three dots on the top right corner of this chart select sort axis then month followed by sort ascending the chart is now sorted by month in ascending order beyond sorting PowerBI offers powerful filtering capabilities there is a filters pane that you can use to apply different filters to the whole report page as well as individual charts let's filter the line chart in this report to show the order total for the shipped orders only notice the filters on this visual section in the filters pane here you can select relevant fields and apply filtering for example you can exclude Asia from this line chart by selecting the product region and then checking every region excluding Asia the line chart will update instantly it now displays sales data for Europe and North America only you can also add other filters like order status here drag the order status field from the data pane to the add data fields here box now check shipped the line chart will update and display the order total for only shipped orders instead of individually applying filters you can apply filters on all chart items at once from the filters pane unselect any chart item by selecting a blank area on the page and open the filters pane if it's not opened yet notice the section called filters on this page this is where you can drag the relevant data fields and set filters for all visuals on the report page let's drag the order status field from the data pane to this section and check shipped notice how all visuals on this page reflect this change instantly if you have a multi-page report you can apply filters to all pages by dragging any field to the filters on all pages section in the filters pane and then by setting the filters you can also remove a filter anytime by selecting the field you want to remove in the filter pane followed by the cross or X icon in the top right corner in this video you explored sorting and filtering discovering how these can provide stakeholders with a clearer picture of their data these features are fundamental to data analysis and reporting in PowerBI applying sorting and filtering to your visualizations makes it possible for stakeholders to focus on the vital relevant data points enabling faster data-driven decision-making imagine you're presenting a report to key decision makers at Adventure Works one visual displays sales across a 
quarter while another portrays product categories arranged in descending order based on the number of orders the stakeholders request more interactivity in the report for example by selecting a specific month on the sales chart they wish to see corresponding product categories emphasized in the other chart this provides clarity on which products sold the most during a particular month Microsoft PowerBI's cross filter and cross highlight functionalities make it possible for you to emphasize related data across multiple charts or remove unrelated data in this video you'll learn about these exciting features and how to use them in your PowerBI reports cross filtering refers to the practice of selecting an item or data point on one visual which in turn filters out unrelated data in another visual it creates a relationship between two separate visuals such that a selection in one visual affects the data shown in another for example with cross filtering selecting the mountain bikes column in a report will filter the table visual to display only sales data related to this product category the other product categories are no longer shown with cross highlighting when you select a data point in one visual it highlights the related data in other visuals instead of filtering out unrelated data this is the default behavior for most visuals in PowerBI to illustrate with cross-highlighting selecting the mountain bikes column in one chart highlights the sales of mountain bikes in February March and April for each region in the stacked bar chart unlike cross-filtering it still displays unrelated data however it's dimmed or faded let's take a moment to explore these cross filter and cross highlight features in PowerBI in this report there are four different visuals displaying various sales data let's start by examining how default cross highlighting works in PowerBI using the stacked bar chart in the top left corner if you select any region for example Europe it highlights the bar related to Europe and dims the other bars notice how all other charts instantly reflect your selection and highlight data that is related to your selection in the stacked bar chart the bright areas represent data related to Europe and the dim areas represent data from other regions you can press the shift key on the keyboard and select multiple regions or even multiple units in the stacked bar chart every time your selection changes the other charts respond automatically by highlighting the related data take note that the table visual behaves differently rather than fading the irrelevant data it hides it based on your selection this is called cross filtering to clear your selection you can select the selected item again to return to normal view if you select data points on any of the charts on this page the other charts will cross highlight based on your selection instantly for example if you select mountain bike on the stacked column chart in the top right corner the other charts respond just remember that cross-highlighting means irrelevant data will remain visible but dimmed and cross-filtering means irrelevant data will be hidden you can change the default behavior of interaction in PowerBI reports from cross-highlighting to cross-filtering to do that select the file menu options and settings and then options this opens the options dialog box from here select the report settings from the left sidebar and then check change default visual interaction from cross highlighting to cross filtering in the visual options section and select okay now if you select mountain bike on the stacked column chart notice how the stacked bar chart on the left reacts it is not showing the dimmed areas anymore and is displaying data related to the mountain bikes only in other words cross filtering hides all sales data unrelated to mountain bikes based on your selection in the other visual cross-filtering and cross-highlighting are powerful features in PowerBI that can enhance the clarity and effectiveness of your reports by enabling one chart to influence another you offer a more interactive and intuitive experience for report users this approach not only makes your report more dynamic but also simplifies the data analysis process as you create more interactive reports for your audience filtering data becomes increasingly important at Adventure Works the CEO asks you to set up a sales report that she can use in a presentation with the company's shareholders next week you want to make this report as useful as possible for the CEO but unfortunately her schedule is busy between now and the presentation you know she will be filtering data but cannot predict every filter she will apply however you know that she'll most likely filter the data by region and product this is a perfect scenario to use a slicer in Microsoft PowerBI in this video you'll learn what a slicer is how it works and how to apply slicers to your reports a slicer is a great way to apply common filters to a report page quickly when added to a report you can use the slicer to display a list of commonly used or most important filters the slicer can be displayed in multiple formats depending on the field on which the slicer is filtering for example if you apply the slicer to a field with text data type the slicer can display as a list of unique entries in that field similarly if you apply the slicer to a field with a date type the slicer can be displayed as a date range selector however no matter which format the slicer is displayed in the underlying behavior is the same the slicer provides a list of filters that users can apply to the visualizations in the report when a filter is selected the visualizations will immediately update to reflect the filtered data it is important to note that you do not need to connect every visualization in a report to the slicer as a PowerBI data analyst you can configure which visualizations are impacted by the slicer's selected 
filters you can also synchronize multiple slicers so that when a slicer applies a filter other slicers on different pages are updated to reflect the selected filter this is useful when filtering through multiple layers of data for example if you had one slicer for regions on a sales page and another slicer for regions on a costs page when you select a specific region the region is selected on both slicers this helps improve the user experience as filtering remains consistent as you navigate multiple pages of the report now let's explore how to configure a slicer in a PowerBI report let's begin with an existing sales report for Adventure Works the report has two pages sales summary and sales detail on the sales summary page you need to apply two slicers one for region and one for products let's start by adding the region slicer navigate to the visualizations pane and select the slicer icon then select the slicer in the report and navigate to the data pane in the data pane select the region field in the region table notice that the slicer now lists all of the sales regions of Adventure Works if you select the entry for France in the slicer this will apply a filter for sales data belonging to France notice that when you apply the filter the visualizations update immediately next let's add the slicer for products again navigate to the visualizations pane and select the slicer icon select the slicer in the report and navigate to the data pane this time select the product field in the product table the slicer now displays the list of all products now let's confirm that each visualization is connected to the slicers to do this navigate to the format option in the ribbon menu and select edit interactions each visualization will show a filter icon indicating that filters are being applied if you want to disconnect the slicer select the none icon in the visualization remember that you can synchronize the slicers across pages to reflect the current filter context let's configure two slicers to synchronize with each other first I'll create the same region slicer in the second page of the report by adding the slicer visualization and again applying the region field from the data pane next navigate to the view menu and select sync slicers this opens the sync slicers view select the region slicer in the report it is now displayed in the sync slicers view expand the advanced options drop-down menu enter the name of a group you want this slicer to belong to for this scenario let's name the group region there are two additional options here sync field changes to other slicers and sync filter changes to other slicers for this report you need to select both options as you want to sync the slicers with each other when the viewer interacts with them and also for maintainability purposes so that if you change the filtered field in the data pane both slicers will update now select the region slicer in the first page and navigate to advanced options again once again enter the group name region while you can enter any name for the group you must name it consistently if you misspell the group name on a slicer it won't synchronize correctly again select sync field changes to other slicers and sync filter changes to other slicers now it's time to test the report when applying a filter using the region slicer for example by selecting France the visualizations on the first page update now when you navigate to the second page the region slicer on this page is already set to France and the data is filtered you learned about adding slicers to PowerBI reports in this video slicers are a dynamic tool that you can use to enhance the interactivity of your reports while also improving the user experience as you design reports for different audiences it is essential to consider their filtering needs and identify common or important filters to apply the world of apps has rapidly expanded over the past decade from apps on your mobile phone to apps in the web 
browser on your desktop with people already familiar with the app experience what if you could make your reports more app-like this could improve the user experience for your target audience immensely and encourage them to interact with and use the reports you build Microsoft PowerBI comes with a built-in set of buttons that you can add to your reports to increase interactivity from navigation between pages to quickly applying filters in this video you'll discover more about buttons and how they're invaluable in your toolkit for building interactive reports buttons in PowerBI come with many configurable options the two most common configurations you will work with are the visual style and the action you can change the visual style of buttons to different shapes such as rounded rectangles pill-shaped and arrows you can also change the colors of the buttons and their text if the business you work for already has other applications these options help you align with potential existing app and user experience guidelines the action of the button is how it behaves when a user interacts with it let's explore the different options available back returns the users to the previous page of the report this action is useful for drill through pages bookmark allows users to capture or bookmark a particular state in the report it presents the report page that's associated with a bookmark that is defined for the current report you'll learn more about this later drill through navigates the user to a drill through page filtered to their selection without using bookmarks page navigation also involves navigation without using bookmarks it navigates the user to a different page within the report Q&A opens a Q&A explorer window when your report readers select a Q&A button the Q&A explorer opens and they can ask natural language questions about your data apply all slicers and clear all slicers buttons apply all the slicers or clear all the slicers on a page lastly web URL opens a web page in a browser these buttons provide different means through which users can engage with your reports let's explore how to enhance the interactivity of a report by adding buttons this PowerBI sales report has two pages sales summary and sales detail on the sales summary page there are slicers available let's start by configuring buttons for page navigation to add a button navigate to the insert tab in the ribbon select the buttons drop-down and choose right arrow position the arrow in the top right corner of the report select the button in the report to open the format pane the format pane allows you to configure the different options of the button for now let's expand the action section in the format pane in the action section first select the off button so that it changes to on enabling the action next select page navigation as the type and then choose the sales detail page as the destination now let's navigate to the second page of the report again navigate to the insert tab in the ribbon select the buttons drop-down and choose left arrow in the action section select page navigation and then the sales summary page as the destination finally position the arrow in the top left corner of the report page you can test the buttons by holding the control key and selecting the buttons given that there are slicers on the sales summary page you can ensure a good user experience by allowing the report viewer to clear the slicers quickly to do this navigate to the insert tab in the ribbon select the buttons drop-down and choose clear all slicers let's position the clear all slicers button beside the slicers on the report page for ease of access now when the viewer applies a filter using the slicers they can select the clear all slicers button to reset the state of all the slicers these simple changes will help improve the user experience of the report buttons are a useful way to improve the user experience for your target audience when building your next report consider how you can 
use buttons to simplify navigation add filtering and provide access to the Q&A feature as you progress with your learning you'll explore how this feature is particularly useful when building reports for mobile devices at the end of the last financial year Adventure Works conducted a customer survey to determine how happy customers were with the way the company handled product orders and deliveries unfortunately a common complaint was that it took too long for orders to arrive after being placed to investigate the possible causes of this delay you have created a report in Microsoft PowerBI that tracks data from different sources including storefront orders warehouse fulfillment and courier delivery because you plan on sharing this report with multiple departments you know each department will want to filter the data specifically to align with their responsibilities rather than expecting users to apply complex filters they are unfamiliar with to isolate the data they're looking for your manager suggests using the bookmarks feature to make this data easily accessible to them in the next few minutes you'll learn what bookmarks are and how to add them to your reports in PowerBI bookmarks in PowerBI are a way to capture the current state of the report you are viewing and share this state with other viewers for example if you apply filters to a report you can save the filtered state as a bookmark viewers can then select the bookmark and the report will change to the filtered state you established when adding a bookmark there are four state options that you can save data properties such as filters and slicers display properties such as visualization highlighting and visibility current page which presents the page that was visible when you added the bookmark and whether the bookmark applies to all visuals or selected visuals in the Adventure Works example bookmarks will enable different users to focus on different parts of the data without setting up filters every time you can also highlight specific insights and create customized views relevant to the different departments by default all states are saved for all visuals if you modify a report after you create a bookmark any visualizations not present when you created the bookmark will appear in a default state so remember if you change a report you should make sure to update your bookmarks to reflect the changes given that bookmarks in PowerBI are excellent for creating tailored interactive reports that users can easily navigate and extract crucial insights from it's essential to know how to create them let's take a moment to find out let's start by filtering data in an existing sales report in PowerBI with two pages sales summary and sales details let's filter data related to the France sales region by selecting France in the region slicer next let's filter further by selecting the Mountain 200 Black 38 model in the product slicer now that the report is in a filtered state let's create a bookmark to do this select view in the ribbon menu and then bookmarks this opens the bookmarks panel to create the bookmark select the add button this saves the state and creates a new bookmark with a default name to rename the bookmark select the three dots beside its name and select rename for this bookmark let's rename it to France if you don't want the bookmark to open the current page you can select the three dots beside the bookmark again note that current page has a check mark beside it indicating that it is enabled for the bookmark to disable it select current page now let's test the bookmark clear all slicers so that the report is reset if you open the bookmark panel again and select the bookmark you can observe the filters reapplied to the report bookmarks in PowerBI empower you to streamline data exploration and customize and tailor reports based on user needs by capturing states of reports such as data and display properties bookmarks allow different users to filter and 
focus on specific aspects of the data easily bookmarks are also a valuable tool for enhancing interactivity and creating tailored user-friendly reports that can support data-driven decision-making adventure Works has embraced the data-driven decision-making unlocked by Microsoft PowerBI however as you’ve continued building and updating various reports you’ve identified a significant time cost to maintaining them and when you need to add new visualizations to the company’s many reports moving all the existing individual visualizations is very time-consuming the lead data analyst suggests grouping the visualizations to make maintenance easier this video will demonstrate how to group and layer visuals to improve maintainability let’s start with an existing Adventure Works sales report the report has four visualizations sales revenue by region sales revenue by month sales units by region and sales units by month to make maintenance more manageable let’s create two groups one for the sales revenue visualizations and one for sales units visualizations to do this first select the sales revenue visualizations by holding down the control key and selecting the two visualizations then navigate to the format tab in the ribbon menu and select group next select the two sales units visualizations by holding down the control key and selecting them again navigate to the format tab in the ribbon menu and select group notice that now when you select and move the sales revenue by region visualization the sales revenue by month visualization moves too this is because they are grouped you can view all existing visualizations and groups using the selection pane to open the selection pane navigate to the view tab in the ribbon menu and select the selection button the groups created in this video are listed under the layer order tab in the selection pane inside each group are the visualizations that belong to the group to improve maintainability let’s rename the groups let’s double-click the
first group’s name and rename it sales revenue similarly double-click the second group’s name and rename it sales units the ordering of groups and visualizations is important in the pane as this determines how the elements are layered for example moving the sales revenue group to overlap the sales units group results in this group displaying under the sales units group visually to change the visual order you can select the revenue group in the selection pane and select the upward arrow so that it moves above the units group in the layer order now suppose after reviewing the groups with a colleague you conclude that managing the visualizations as a single group would be better in the selection pane you can select and drag both sales units visualizations in the units group to the revenue group notice that the units group is automatically removed as there are no more visualizations belonging to it let’s add a title to the report page which is now more maintainable through its grouped visualizations and descriptive group name select the insert tab in the ribbon followed by text box in the text box add the text sales detail then select all the text in the text box and change the font size to 24 now let’s organize the layout of the report select and drag one of the visualizations and the group will move move the group to the bottom of the report page then move the report title to the top of the report and adjust its sizing as more pages are added to a report and future updates are made time is saved by organizing visualizations into groups in this video you discovered how to group and layer visuals in PowerBI grouping visualizations is a crucial activity for improving the maintainability of reports make sure to consider the benefits of grouping visualizations and how to implement groups effectively when designing reports in PowerBI data analysis expressions or DAX is a powerful language for creating custom calculations however DAX is context-sensitive so it’s important to
understand how context influences the reports you build with it in this video you’ll explore how visualizations impact DAX context adventure Works is analyzing its total annual revenue the company needs to identify its total revenue based on different product categories as part of its analysis once the analysis is completed the results must be delivered to management as a visual presentation adventure Works can use DAX filter context in visualizations to perform its analysis and create its reports let’s begin with a recap of what we mean by the term context in Microsoft PowerBI in data analysis context comes in two primary forms row context and filter context row context refers to the table’s current row being evaluated within a calculation whereas filter context refers to the filter constraints applied to the data before it’s evaluated by the DAX expression in other words you can determine which of your report’s rows or subsets should be included or excluded from the calculation the interaction between DAX evaluation context and visualization is crucial for creating dynamic and interactive reports and dashboards each time you interact with the data like selecting a portion of a chart or an item in a slicer you alter the filter context let’s consider an example to find out more about how this works adventure Works can create a DAX measure of profit margin and then create a visual in the report canvas from this measure the visualization displays the profit margin of the entire data set because that is the current context let’s learn more by exploring how Adventure Works makes use of DAX filter context in its visualizations adventure Works begins its analysis of its product categories by creating a DAX formula that calculates the sum of the quantity of each product sold multiplied by the unit price in the sales table when executed the formula computes the sum of all sales amounts the result of this formula is that Adventure Works has sold $3.5 million worth of goods
over the past year however when this measure is added to a PowerBI report as a visual like a bar chart for example it isn’t very engaging it offers limited insight into the sales data by displaying only the total revenue the visuals become more engaging and display meaningful insights when used with filter context for example Adventure Works could generate more useful insights by comparing or contrasting total sales revenue across product categories by comparing sales of bicycles to other categories Adventure Works discovers that bicycles outsell all other products by a considerable amount adventure Works can still view the total revenue but each of these revenue figures now has a meaning which is the total revenue for each product category powerbi is displaying the sum of all sales within a specific product category but now it’s computing different values for different cells because of the evaluation or filter context in this case total sales by category adventure Works can enhance these visuals further by using the year category from the date table as another filter context or attribute once this context is applied a new visualization is generated each table cell shows a different value even if the formula is always the same you can place multiple fields in both rows and columns this is because both the row and column sections of the table define the context as you discovered earlier the interaction between the DAX evaluation context and the visualization alters the filter context this interaction affects DAX calculations and alters the results in the visualizations let’s explore this process using an Adventure Works data set now that Adventure Works has calculated its annual total sales it creates two slicers in its report one for the region and the second for the month when a specific region is selected the profit margin measure recalculates and the chart dynamically adjusts adventure Works can also select a month to implement month as an additional filter on top of region the
measure now displays the profit margin value for a specific region in a specific month the context-sensitive nature of DAX is a powerful feature it enables dynamic calculations based on the context in which DAX computes the formula by understanding how context impacts DAX you can create more accurate insightful and dynamic reports tailored to specific business scenarios congratulations on completing the navigation and accessibility module of the data analysis and visualization with PowerBI course this module taught you essential skills for creating accessible well-structured and interactive reports let’s recap what you accomplished you started with how to design accessible reports you discovered the significance of accessibility and the many benefits of implementing accessibility features in PowerBI such as improving your reports’ inclusivity usability and understandability you learned about some of the PowerBI features that can support the accessibility of your reports including keyboard navigation and tab order screen reader compatibility accessible themes and high contrast support focus mode and displaying data in a screen reader friendly table format markers and pattern fills and alt text titles and labels you explored how to enhance accessibility by formatting and configuring your visualizations using these accessibility features learning how to design reports that cater to a diverse audience who can all access and comprehend the information you present conditional formatting was a key focus empowering you to apply dynamic rules to your visualizations that enhance their clarity and usability you also engaged with themes in PowerBI and the ways they can enhance the accessibility of your reports such as enhancing readability in addition to other benefits such as visual consistency and enhancing clarity and brand identity in the process you learned how to apply configure and customize themes in PowerBI to further guide your journey you were introduced to best
practices for designing accessible reports you then put your newfound knowledge of accessibility into action by applying formatting themes and design best practices to create an accessible report for Adventure Works you went on to learn how to enhance the accessibility of your reports even further by adding custom tooltips to your visualizations you also explored the many ways tooltips can improve accessibility in your reports such as making the data more accessible to users with visual impairments as tooltips are screen reader compatible and making complex charts more understandable to users including those with cognitive disabilities next you focused on report navigation and filtering you began by comprehending the concept of report hierarchies and learned how to configure them effectively in your reports these hierarchies empower users to drill down into your data as needed encouraging user interaction and engagement and enhancing user understanding you also learned how to configure PowerBI’s drill through feature which empowers users to navigate from a visualization to a separate detailed report page focused on the data point they select another key area of exploration was sorting and filtering data which are fundamental to data analysis and reporting in PowerBI you gained proficiency in applying and managing these techniques in PowerBI reports to enhance data presentation and exploration and highlight relevant insights you were then introduced to the concept of cross-filtering and cross-highlighting providing you with the knowledge to configure interaction behaviors for visualizations improving the interactivity of your reports whereas cross-highlighting highlights the related data in other visuals when a user selects a data point in one visual cross-filtering filters out or removes the unrelated data from the other visuals you applied your skills by sorting and filtering marketing data in a report emphasizing and contextualizing the importance of sorting
and filtering in the real world after that you took your PowerBI reporting skills to the next level with an in-depth exploration of creating highly interactive reports you discovered the dynamic nature of slicers and how they can contribute to enhanced report interactivity plus you explored using buttons to add more interactivity to your reports and learned how to customize them to suit your needs you learned how to improve user experience and storytelling in your reports by adding bookmarks as well as how to add URLs to enrich your PowerBI reports further grouping and layering visuals provided a way to efficiently manage the visuals in your reports making report maintenance more efficient you put your skills into action by creating an interactive report demonstrating your proficiency in using the drill through button slicer and bookmark features finally you recapped the importance of filter context in DAX measures and how it impacts visualizations throughout this module knowledge checks were strategically placed to assess your understanding of key concepts covered in relation to designing accessible reports navigating and filtering data effectively and creating interactive reports keep up the excellent work and get ready to explore designing accessible dashboards and data sharing bringing you closer to becoming a proficient PowerBI data analyst and visualization expert the marketing director at Adventure Works receives an overwhelming number of data reports monthly sales numbers customer demographics market trends and product performance metrics all need to be analyzed and interpreted and she needs your help doing this luckily you know about dashboards a tool in Microsoft PowerBI that can help transform this data into valuable insights but what is a dashboard and how does it differ from a report in this video you’ll explore the concept of dashboards in a business context you’ll discover their importance functionalities and how they serve as key tools in data
analysis and decision-making processes let’s start by exploring what a dashboard is consider the dashboard of a car it presents critical data like speed fuel level and engine temperature in a consolidated visually understandable way this information allows you to make necessary decisions while driving similarly in the business context a dashboard visualizes the critical information required to accomplish specific objectives skillfully arranged and consolidated on one screen for example a sales dashboard for Adventure Works might display total sales sales by region top selling products and trends over time dashboards can present data from different sources in various forms making it easier for stakeholders to understand they are interactive and real time allowing users to in essence have a conversation with their data and drill down into specific details when needed say you notice an unusual sales spike in one region at Adventure Works with an interactive dashboard you can delve deeper into the data inspecting the specifics of the sales transactions identifying the products involved and even the key customer demographics contributing to this sudden surge dashboards play an important role in today’s competitive business world where informed decision-making is vital to success with dashboards you can transform raw data into actionable insights providing a comprehensive view of business performance at a glance dashboards can serve as an essential navigational tool for tracking various aspects of business performance for example for Adventure Works dashboards can bring the different threads of data on sales trends production efficiency customer behavior and market dynamics together presenting a comprehensive view of the overall health and trajectory of the business suppose there’s a sudden drop in sales in a specific sales region without a dashboard recognizing this issue would require sifting through vast amounts of sales data a time-consuming process with the 
potential for oversight however a well-designed dashboard can quickly highlight this anomaly triggering a timely investigation and corrective action dashboards also play a vital role in promoting a culture of transparency and accountability within an organization they act as unbiased data-backed mirrors that reflect the true performance of different business units against set targets and benchmarks by doing so dashboards can foster a sense of ownership and accountability among team members encouraging continuous improvement dashboards make data accessible to everyone break down barriers and encourage data sharing between teams as well as promote a shared understanding of business performance across departments but what is the difference between a dashboard and a report though often used interchangeably dashboards and reports serve different purposes in Microsoft PowerBI a report in Microsoft PowerBI is highly interactive users can slice and dice the data drill down into details apply filters and explore various facets of the data within the report itself in essence a PowerBI report provides an in-depth interactive multi-perspective view of a specific data set or topic it’s like an exploratory journey through your data a dashboard on the other hand is like a summary or highlight reel of one or more reports it’s a one-page overview of the most important metrics or KPIs selected from the various pages of one or more reports a useful way to consider the difference between a dashboard and a report is to compare it to a news bulletin versus an in-depth news article the news bulletin or dashboard provides key highlights summarizing the most essential points if a particular news point catches your attention you can read the full news article or report for a more detailed understanding as you continue your data analysis journey remember that the true power of data lies not in its volume but in its usability both dashboards and reports are vital navigation tools in the sea of
data they provide visibility drive accountability facilitate understanding and ultimately inform decision making Adio your manager at Adventure Works asks you to create a dashboard in Microsoft PowerBI that highlights key performance indicators and insights from a sales analysis report you and your team created this screencast will explore how to create and configure a dashboard in Microsoft PowerBI as well as how to configure the mobile view for the dashboard and customize themes previously you learned that a dashboard is a consolidated display of multiple visualizations reports and other data in a single layout to create a dashboard open your Microsoft PowerBI service and navigate to your workspace in the left navigation pane then from your available workspaces select the Adventure Works workspace let’s create a new canvas where you can pin your visuals in the top left corner select new and then select dashboard a popup appears asking you to name your dashboard let’s name it Adventure Works Sales Dashboard after typing the name select create once you have created your dashboard you can start adding visuals return to your workspace and open the sales report you and your team created each visualization in your report has a pin icon in the top right corner select the pin icon for the total sales by product category bar chart this opens a dialogue box where you can choose where to pin this visual select your newly created Adventure Works Sales Dashboard from the drop-down menu the bar chart is a good starting point for your dashboard as it provides a broad overview of sales distribution by product category then pin the monthly sales trends line chart this chart shows the sales pattern over time which is critical for identifying seasonal trends or growth patterns in the modern business landscape having mobile accessible data is key with PowerBI’s mobile layout feature you can configure your Adventure Works Sales Dashboard to be mobile friendly ensuring stakeholders
can access insights on the go to switch to mobile view go to the main navigation bar find and select the edit menu from the drop-down options select mobile layout to switch the view from desktop to mobile once you select the mobile layout your screen adjusts to replicate a mobile device’s screen size now instead of a wide canvas it displays a vertical layout this canvas is blank but don’t worry all your visuals are safe and where you left them you just need to decide which visuals to show on the mobile layout and where to place them a list of all the visualizations in your dashboard is displayed on the right side of your screen each visualization has a pin icon next to it to select the visuals you’d like to appear in the mobile layout select the relevant pin icons you can select and drag each visualization to move it around on the canvas you can also resize each visualization by dragging its edges finally let’s explore how to change the theme for the Adventure Works Sales Dashboard start by navigating to the Adventure Works Sales Dashboard you just created in the upper menu find and select the edit menu this opens a drop-down list of view options select dashboard theme another drop-down list appears select switch theme a popup window displays various pre-made themes you can apply to your dashboard choose a theme that you feel best visually represents the data and select it then select save the theme is now applied to your dashboard and you’ll immediately observe the changes in color and style applied across all your visualizations and there you have it you now know how to create a dashboard configure the mobile view and customize your dashboard theme foundational knowledge that is vital to using dashboards in PowerBI and conveying key insights from your reports with its large scale of operation Adventure Works generates immense data volumes daily as a data analyst your role involves
harnessing this data making sense of it and transforming it into insights that inform strategic decision-making but with such a large mass of data where do you start microsoft PowerBI has the answer its quick insights and Q&A features over the next few minutes you’ll discover how to optimize the usability of your PowerBI dashboards by adding quick insights and utilizing the Q&A feature you’ll also learn how to set up quick insights and integrate the Q&A feature into your dashboards quick insights is a feature in PowerBI that automatically searches data sets to discover and visualize potential insights it identifies patterns trends outliers and other useful insights that may not be immediately obvious for example uncovering sales patterns to help the marketing team at Adventure Works target their campaigns more effectively quick Insights not only presents the insights in an easy to understand format but also explains how it arrived at these insights this way even if you’re new to data analysis you can follow along and gain a solid understanding of the data let’s explore the steps to set up and use the quick insights feature in PowerBI open your Microsoft PowerBI service and navigate to your workspace on the left hand side of the screen here different data sets and reports shared with you are displayed select the data set or report you want to analyze select the ellipsis menu and then get quick insights to initiate the automated analysis powerbi starts an automatic scan of your data during this process the function applies various machine learning algorithms and statistical functions to your data set it searches for potential patterns trends correlations outliers and other interesting attributes this process can take a few minutes depending on the size and complexity of your data set after the scan you can access the insights by selecting view insights this will lead you to a new page filled with cards each insight card visually represents a particular pattern or
trend in your data hover over the visuals or select them to display more details this is where your data interpretation skills come into play in this case you have to understand what each of these visuals represents and how it relates to the Adventure Works business context if you find any insight particularly useful or wish to share it with others in your team you can pin it to a dashboard to do this hover over the card and select the pin icon in the top right corner of the card then select the dashboard you want to pin it to or create a new one now let’s move on to the Q&A feature the Q&A feature is a natural language processing tool in PowerBI it allows you to ask questions about your data in plain English and provides answers in the form of charts graphs or simple numeric results this feature is invaluable in the business context because it allows users of all levels to interact with their data and find specific answers without requiring deep technical knowledge the key advantage of the Q&A feature is its flexibility you can ask questions ranging from simple ones like “What was the total revenue last quarter?” to more complex ones such as “Which product had the highest sales growth rate last year?” The more you use the Q&A feature the more it learns and adapts to your question style offering even more relevant and precise answers over time let’s explore how to set up and use the Q&A feature in PowerBI at the top of your dashboard there’s a field ask a question about your data this is the Q&A box place your cursor in the box to ask your question type your question in normal conversational language as you type PowerBI Q&A will start offering suggestions and autocomplete options based on the data in your dashboard for instance if you’re interested in sales trends you could type “What were the total sales last month?” or “Show sales by product category.” As soon as you finish typing your question PowerBI Q&A generates an answer in the form of a data visual such as
a bar chart line graph or table this visualization is based on the best interpretation the Q&A can make of your question if the interpretation is not what you intended you can rephrase or refine your question the PowerBI Q&A tool uses machine learning so it becomes smarter and more accurate the more you interact with it if the visual answer to your question is particularly useful and you want to keep it handy you can pin it to your dashboard to do this locate and select the pin icon at the top right of the visual choose the existing dashboard where you want to pin it or create a new one with quick insights and Q&A you are well equipped to bridge the gap between data and decision-making these features simplify complex data analysis enabling you to deliver actionable insights faster and more accurately imagine you’ve prepared stunning visuals in Microsoft PowerBI for Renee Gonzalez the marketing director at Adventure Works showcasing sales trends across different product categories you’ve pinned these visuals to your dashboard for easy reference but as you start digging deeper into the data exploring trends and cross-filtering data you come across a snag the pinned visuals are static snapshots they don’t interact or update you realize you’ve hit a roadblock that prevents you from extracting the full potential of your data analysis frustrating right you’re not alone as that’s a common issue with pinned visuals in PowerBI in this video you’ll explore the limitations of pinned visuals in PowerBI and how to overcome these limitations by setting up and pinning live reports to your PowerBI dashboard in PowerBI a pinned visual is a snapshot of a specific piece of data or chart from a report that is attached or pinned to a dashboard you can pin various things like a line chart showing sales trends over time a bar chart comparing the performance of different product lines a gauge displaying progress towards a goal or even a simple card displaying a single important number
like total sales or total customers pinned visuals provide an at-a-glance overview of specific insights however they have certain limitations
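The snapshot-versus-interactive distinction can be pictured in code. The short Python sketch below is purely an analogy, not how PowerBI works internally, and the regions, categories, and revenue figures are hypothetical: a pinned visual behaves like an aggregate computed once and frozen, while an interactive visual behaves like a function that is re-evaluated for each slicer selection.

```python
# Hypothetical sales rows standing in for an Adventure Works data set
# (illustrative values only, not from the course).
sales = [
    {"region": "France",  "category": "Bikes",   "revenue": 1200},
    {"region": "France",  "category": "Helmets", "revenue": 300},
    {"region": "Germany", "category": "Bikes",   "revenue": 900},
    {"region": "Germany", "category": "Helmets", "revenue": 150},
]

def revenue_by_category(rows):
    """Aggregate revenue per category, like a bar chart visual would."""
    totals = {}
    for row in rows:
        totals[row["category"]] = totals.get(row["category"], 0) + row["revenue"]
    return totals

# A pinned visual is like a frozen aggregate: computed once, then static.
pinned_snapshot = revenue_by_category(sales)

# An interactive visual is like a function of the current filter state:
# each interaction (e.g. a region slicer selection) re-evaluates the visual.
def live_tile(region=None):
    rows = sales if region is None else [r for r in sales if r["region"] == region]
    return revenue_by_category(rows)

print(pinned_snapshot)       # totals across all regions, fixed
print(live_tile("France"))   # recomputed for a France slicer selection
```

In this analogy, `pinned_snapshot` can never answer Renee's "filter by region" question, whereas `live_tile` recomputes on demand, which is the behavior live report tiles preserve.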

    the main limitation is their lack of interactivity you can’t cross-filter or drill through data using pinned visuals which prevents you from exploring data trends in greater detail for example imagine Renee is studying a pinned visual showcasing sales trends for different bicycle product categories as she scans the data she wants to filter it by region to understand which categories are more popular in certain regions this could provide valuable insights for regional marketing strategies however the static nature of pinned visuals prevents her from cross-filtering or drilling through the data leading to incomplete insights and potentially missed opportunities for data-driven strategies so is there a way around these limitations absolutely the solution lies in pinning live reports to your dashboard instead pinning a live report means attaching an entire report page to your dashboard as a live tile unlike standard visuals pinned to a dashboard live report tiles are dynamic and maintain the interactivity of the original report this includes the ability to drill through data cross-filter and view tooltips which provides a more immersive data exploration experience directly from the dashboard pinned live reports retain the original report layout and formatting making the visuals aesthetically consistent the interaction between visuals within live reports reveals relationships and patterns that isolated visuals cannot while pinned visuals offer a quick view of specific data points pinning live reports significantly enhances data exploration and analysis capabilities providing a comprehensive interactive view of your data now let’s explore how to set up and pin live reports the first step is to select the report you want to pin to your dashboard if you’re starting from scratch you will need to create a new report once you have opened your report select the reading view button on the ribbon directly above your report then select the ellipsis on the far right of the
ribbon followed by pin to dashboard from the drop-down menu the pin live page feature lets you pin an entire report page as a live tile on the dashboard this means the tile will continually update and allow interaction something a simple pinned visual cannot do a dialogue box asks you to choose a destination for your pinned live report you can select an existing dashboard or create a new one by typing a new name into the text box after you’ve selected the destination select the pin live button in the bottom right corner to pin your live report to the selected dashboard to view your newly pinned live report navigate to your chosen dashboard by selecting the workspaces button on the left-hand navigation bar and selecting the dashboard where you pinned the live report now a live interactive report is directly accessible from your dashboard it retains all its interactive capabilities in the report view allowing you to filter and drill down into the data directly from the dashboard any changes you make to the original report will be reflected in the live report on your dashboard ensuring real-time data updates by using live reports you not only enrich your data storytelling but also create opportunities for deeper more insightful analysis pinning live reports to your dashboard can help you turn static one-dimensional visuals into dynamic insightful narratives your manager Adio asked you to create a comprehensive report on the sales of Adventure Works product lines across different regions you have cleaned and analyzed the data and created a final report that is visually appealing and informative now you need to share the data and insights contained in the report with key decision makers in Adventure Works this is where Microsoft’s PowerBI publishing reports feature comes into play over the next few minutes you’ll discover the process of publishing reports in PowerBI let’s start by exploring what publishing reports in PowerBI means when you publish a report you move it
from your local PowerBI desktop and upload it to the more accessible and collaborative online platform PowerBI service publishing a report connects you with decision makers allowing you to share your reports with colleagues your whole organization or external stakeholders who need to draw insights from the data in data analysis the purpose of creating reports is to assist with decision-making guide strategies and provide insights into business operations and for that to happen you need to publish and share the reports for example you can publish and share your report with the regional sales managers at Adventure Works this enables them to access the report through the PowerBI service where they can identify bestselling and underperforming products analyze sales patterns such as seasonal trends and then plan and focus marketing efforts accordingly furthermore a published report is not static you can set up automatic data refreshes so the report is always up to date with the latest data let’s explore how to publish reports in PowerBI publishing a report to PowerBI service from PowerBI desktop involves a series of steps let’s work through these steps the first step is to save the report since PowerBI will not allow you to publish unsaved reports select file in the top left corner of the PowerBI desktop interface and then save as to save the report choose a location on your computer and give it a descriptive name like Adventure Works product sales report select save once you’ve saved the report the publish option becomes available in the home tab of the ribbon of PowerBI desktop select publish and a new dialogue box pops up in this dialogue box indicate where you want to save the report in PowerBI service select Adventure Works as your workspace and then the select button for larger projects or collaborations you can create and select different workspaces once you’ve selected the destination PowerBI starts publishing the report a loading dialogue appears indicating 
that the report is being published. Depending on the size of the report and your internet connection, this could take a few moments. Once your report is published, a new window pops up to confirm: it says Success and gives you two options. You can either open the report in the Power BI service or cancel and open it later; in this case, let’s select Open. Selecting Open launches the default web browser on your computer and takes you directly to your report in the Power BI service. The report now displays as it will appear to other users. While data analysis is about facts and numbers, it’s also about communication. Publishing reports in Power BI is a crucial part of the data analysis storytelling process: as a data analyst, your reports are pivotal in driving data-informed decisions and a vital link in the chain of business intelligence.

As a data analyst at Adventure Works, you are tasked with reviewing and sharing sales data. Since Adventure Works is a multinational company, the final report contains large amounts of information, which you need to present in a format that is more manageable for stakeholders. Microsoft Power BI allows you to paginate and export reports; as a result, you can break down complex sets of results into smaller, more digestible parts and share them easily. In this video, you will learn how to create multiple pages of content in a Power BI report and navigate between them. You will also learn how to export these pages to a PDF file.

In Power BI, you can organize and present your data across multiple pages within a single report, which is known as pagination. A page in a Power BI report is like a page in a book: pages make it easier for the reader to navigate and understand the content. For example, if you have a large dataset with numerous visuals, presenting all of them on a single page can make the report difficult to read and interpret. By dividing your report content into multiple pages, you make your report more organized and easier to navigate.

Let’s discover how to configure pagination and export reports in Power BI Desktop. With Power BI Desktop open, navigate to the File menu located in the top left corner of the application’s home screen. Once you select File, a side menu appears; select Open report and then Browse reports to open a dialog box. Navigate to the location on your computer where your Power BI report file is stored, select the file, and then select Open to load the report. Now that your report is loaded, you need to make sure you’re in the right view to paginate your report. A vertical pane on the left of the screen contains the three views in Power BI: Report, Data, and Model. Select Report; this choice is now highlighted. On the bottom left of the report view screen is a tab with the name of the current page. To add a new page, select the plus sign, which is the New page option. To rename this page appropriately to represent the data it contains, right-click on the page name and select Rename page. You can then move visuals and report elements by cutting and pasting them from your main report to these newly created pages, and you can navigate between pages by selecting the tabs. This allows you to organize the data in your report and makes it easier to review and understand.

If you need to present this report in a meeting or share it with colleagues who don’t use Power BI, you can export it to PDF format. Select File in the top left corner of your Power BI Desktop screen. On the menu that opens, select the Export option; a side menu opens with the different export formats available. Select the To PDF option to begin exporting your Power BI report as a PDF document. Depending on the complexity and size of your report, this may take a few seconds to a few minutes. Once the export is completed, the PDF file opens automatically to display the result. Creating multiple pages and exporting to PDF can help you produce effective Power BI reports: pagination and exporting in Power BI help you break down and categorize data clearly, enhance understanding, and easily share insights that can drive informed decisions.

You’ve spent hours working on a sales report for the management team at Adventure Works and are confident that it will not only meet but exceed their expectations. The feedback, unfortunately, is not about the insights your report offers; it’s about the loading time. Your sales stats visuals load at a sluggish pace, causing the stakeholders to become impatient. Despite your effort in creating the report, its slow loading time overshadows its merits. Sounds like a nightmare, right? But it doesn’t have to be: this is where Microsoft Power BI’s Performance Analyzer comes into the picture. Over the next few minutes, you’ll learn about the vital role of Power BI’s Performance Analyzer in optimizing the performance of your reports. By the end of this video, you will understand why it’s important to measure current performance before implementing changes using the Performance Analyzer. So let’s get started.

The Performance Analyzer, a tool in Power BI, is designed to help you understand the load time for each visual element in your report. This functionality is crucial in scenarios where a report has various visuals, filters, and calculations, each of which can potentially impact the overall performance of the report. It is critical to measure current performance before making changes to a report: in data analysis, just as you wouldn’t make business decisions without first analyzing relevant data, you shouldn’t implement changes to your Power BI report without understanding the current performance situation and identifying any problem areas. With insights from the Performance Analyzer, you can take targeted actions, improve the performance of the lagging visuals, and transform your report into a fast-loading, efficient tool. The Performance Analyzer doesn’t just highlight what’s wrong; it also shows you what’s right. Not all visuals or filters in your report will be problematic; many of them might be well optimized and load swiftly. Recognizing these efficient components allows you to learn from them and apply those best practices to other reports or visuals.

Now let’s dive into the interface and discover how to activate the Performance Analyzer in Power BI Desktop. After your report is open and loaded, select the View tab, then find and select the Performance Analyzer option at the top middle of the screen. A new pane titled Performance Analyzer opens on the right side of your screen, displaying buttons for starting and stopping recording, refreshing visuals, and exporting data. The Performance Analyzer pane has a button labeled Start recording; to begin gathering performance data for your report, select this button. Once activated, the Performance Analyzer starts monitoring any actions taken on the report, capturing useful performance metrics for each visual element on the page. Now that the recording has started, you need to generate the actions you want to analyze. This could involve refreshing a report page to load all the visuals, or navigating through different report pages if the report spans multiple pages. You can manually refresh the page by selecting the Refresh visuals button in the Performance Analyzer pane; this action causes Power BI to reload all visuals on the page, and the Performance Analyzer records the performance data for each visual during this process. The performance data displays in a list in the Performance Analyzer pane, with each visual on a separate row. This list contains information such as the name of the visual, how long the visual took to render, the time it took to run the DAX query for the visual, and more. This information can help you understand how long it takes for each visual to load and render, and identify any potential bottlenecks in your report. Expanding a row by selecting the plus icon reveals more granular details about the performance of that visual, including a breakdown of the time it took for each operation, such as the DAX query execution, visual display
rendering, and any other operations, as well as the actual DAX query that ran, and more. By default, the Performance Analyzer lists visuals in the order they were rendered on the page; however, this order may not always be the most useful when diagnosing performance issues. You can reorder the list by selecting the Duration column header. This sorts the visuals by the time taken to render, allowing you to quickly identify which visuals are taking the longest and could be potential targets for optimization. Once you’ve gathered the performance data you need, you can stop the Performance Analyzer recording: select the Stop button in the Performance Analyzer pane to conclude the data capture. You can always start a new recording session by selecting the Start recording button again.

As a data analyst, your task isn’t just to ensure that your reports are accurate and comprehensive, but also that they’re efficient. A well-optimized report can mean the difference between insights that sit on a virtual shelf gathering dust and insights that spark change and propel a business forward. In the world of data, speed isn’t just a convenience: it can enhance the impact of your reports, lead to better decision-making, and drive business success.

Imagine you’re a data analyst at Adventure Works, working through streams of data, finding patterns, making connections, and uncovering insights that could improve business performance. You’re in the middle of an exciting project where you’ve created a new, complex DAX query to analyze sales performance and uncover trends. But as you load your Power BI report, you’re not met with a rush of insights but rather a slow loading screen that seems to drag on forever. This isn’t just frustrating; it’s a barrier between you and the crucial insights needed to drive Adventure Works forward. As these performance issues make your data exploration and analysis frustratingly slow, you remember a helpful tool: the Performance Analyzer. In this video, you’ll discover the role of the Performance Analyzer tool in diagnosing and resolving DAX performance issues. You’ll become familiar with the process of identifying whether a DAX query is causing a delay and learn how to optimize it for improved performance.

At the heart of Power BI’s data modeling is DAX, or Data Analysis Expressions. As you may recall, DAX encompasses a wide range of functions, operators, and constants that you can combine to create different formulas and expressions. The power of DAX lies in its flexibility: with DAX, you can build custom calculations within data models, allowing you to analyze data in unique and powerful ways. However, just like a powerful vehicle, it requires skill and care to operate effectively and efficiently. While DAX has immense analytical power, it can sometimes run into performance issues. These issues arise when the DAX queries generated from your formulas and visual configurations become complex, making the engine work harder and longer to return the results. For example, suppose you are dealing with large Adventure Works sales tables that need to be sifted through: your DAX formulas might be complex and inefficient, or your data model might be improperly structured. Regardless of the cause, these issues can lead to slow report loading times, sluggish interactions, and an overall frustrating user experience. To help identify and resolve these performance issues, Power BI has a built-in tool called the Performance Analyzer. This tool provides detailed timing breakdowns of the various components and processes that occur when your report is refreshed, helping you spot which visuals, fields, or DAX calculations are taking up the most time and hence slowing your report down.

Let’s explore how to identify and resolve DAX query performance issues using the Performance Analyzer. Once you’ve loaded your Power BI sales report, you first need to open the Performance Analyzer. On the ribbon at the top of your Power BI report, locate and select the View tab; within the View tab, find and select the Performance Analyzer option. In the Performance Analyzer pane, locate and select the Start recording button. Now it’s time to refresh your report. You can accomplish this in two ways: either by selecting the Refresh button situated in the Home tab of the ribbon, or by directly interacting with the report. Interactions could be in the form of changing a filter, selecting a slicer, or simply navigating to a different page of the report. As you interact with the report while the Performance Analyzer is recording, it will track and document the time taken to load each individual visual item; this data is crucial for diagnosing performance issues. Once the report has finished refreshing, review the Performance Analyzer pane. You’ll see a list of all the visual items in your report and their respective load times. Pay special attention to any visual items that take significantly longer to load than others. For the visuals with slower load times, you can drill down into the details by selecting the arrow beside each visual’s name. This provides a detailed breakdown of the DAX query time and the visual rendering time, helping you understand where the bottleneck lies. If the DAX query time is high, then your effort should be directed towards optimizing the DAX measures.

In this case, it appears that the average sales by product category visual is slowing down the report’s performance, as it has a considerably larger DAX loading time. Locate the average sales field in the Data pane on your right and select it to view the underlying DAX formula. The FILTER and ALL functions used in this formula iterate over the entire data table to calculate the average sales for each product across all stores; this operation becomes particularly slow when working with larger datasets. To simplify the DAX formula, eliminate the FILTER and ALL functions and instead use the AVERAGEX function. AVERAGEX is a function that evaluates an expression for each row of a table and then returns the average result. Since it operates directly on the data context, which is already filtered based on the report’s current context, it avoids the need to iterate over the entire data table. Finally, rerun the Performance Analyzer to test whether the optimization was successful. The advantage of applying an optimized formula is that it simplifies the calculations and reduces the computational load by avoiding the iteration over the whole data table, leading to a significant speed-up in query execution.

You’ve now seen how seemingly simple tasks, like generating a sales report at Adventure Works, can become complex. It’s in these complexities that you, as the data analyst, can create value. By optimizing your DAX queries and delivering faster, smoother reports, you can empower stakeholders to make quick and informed decisions. Remember, data analysis isn’t about delivering vast amounts of information; it’s about delivering the right information, in the right format, at the right time. Each time your report loads a little faster or your DAX query runs a little smoother, you’re not just improving a technical process; you’re contributing to better, faster, and more informed business decisions. You are now better equipped to find the hidden inefficiencies in your DAX queries, confront them head-on, and turn them into opportunities for learning and growth.

Adventure Works has a rich set of data, from manufacturing to sales. The data is vast, and you are responsible for developing a comprehensive dashboard that compiles all these data sources into meaningful insights. You start creating a report in Microsoft Power BI and use DAX, the formula language in Power BI. As you create complex DAX expressions, you realize that the report starts to lag: the calculations are getting more complex and time-consuming, and you wonder if there’s a more efficient way to handle all this data without sacrificing performance. In your search for solutions, you discover DAX variables, which are said to have the power to make Power BI dashboards more efficient.
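Before moving on, here is a hedged sketch of the FILTER/ALL-to-AVERAGEX rewrite described earlier. The table and column names (Sales, Sales[SalesAmount]) and the filter condition are assumptions for illustration, not the actual Adventure Works model:

```dax
-- Hypothetical "before": FILTER + ALL re-scan the entire Sales table on
-- every evaluation, ignoring the visual's existing filter context.
Average Sales =
CALCULATE (
    AVERAGE ( Sales[SalesAmount] ),
    FILTER ( ALL ( Sales ), Sales[SalesAmount] > 0 )
)

-- "After": AVERAGEX iterates only over the rows already in the current
-- filter context, avoiding the full-table scan.
Average Sales Optimized =
AVERAGEX ( Sales, Sales[SalesAmount] )
```

Note that removing a filter condition can change the result; in a real model you would confirm the two measures return the same values before swapping one in for the other.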
Could using DAX variables be the answer to improving your report’s performance? In the next few minutes, you’ll discover DAX variables and their importance in Power BI. You’ll also learn how to effectively implement DAX variables to optimize the performance, readability, and accuracy of your Power BI reports.

DAX, or Data Analysis Expressions, is a formula language that includes functions, operators, and values you can combine to construct formulas and expressions in Power BI and in Power Pivot in Excel. In programming and formula languages, a variable acts as a storage container: you can put something into it, like a number, a string, or even the result of a more complex expression. Once you’ve assigned a value to a variable, you can reference that variable by its name elsewhere, saving you the need to recompute or refetch that stored value. In DAX, variables serve a similar role, but with a twist catering to its analytical nature: instead of thinking of them as simple storage containers, think of them as computational snapshots. When dealing with complex datasets, like the multi-layered operations at Adventure Works, recalculating the same values or expressions can be resource-intensive, especially if done multiple times in a single report or visualization. This is where using variables in DAX for Power BI is beneficial.

Let’s explore the benefits of using DAX variables in more depth. Variables allow for storing intermediate results: complex calculations done multiple times can be stored in a variable and referenced thereafter, saving computational effort and time. This optimization leads to faster report rendering and enhanced performance, especially with large datasets. DAX formulas can sometimes become quite lengthy and complex; by breaking down these formulas and storing parts of them in variables, the main formula becomes more streamlined and easier to read, improving readability. Also, once a value or result is stored in a variable, it remains consistent throughout the formula. This ensures consistency and no variation due to repeated calculations, leading to more accurate results. In addition to ensuring consistency, reusing variables in multiple expressions within a formula means you don’t have to recalculate or redefine commonly used values or results, and it provides flexibility in formula construction. Should there be an error or an unexpected result in your report, having your formula broken down into variables makes it easier to pinpoint where things might have gone wrong: instead of sifting through a long, complex formula, you can check variable values individually, making debugging easier. Lastly, breaking down complex expressions into smaller parts held within variables makes your formulas more transparent and easier to understand. This reduced complexity can be immensely beneficial when working in teams, where other data analysts or report developers might need to decipher or modify your DAX expressions. For example, suppose you calculate the total sales for Adventure Works in the last year and then use that figure in multiple parts of your DAX formula. Without variables, the same total sales value might get recalculated every single time it’s referenced; this redundancy isn’t just a waste of computational resources, it’s a drain on performance. By using a variable, you compute the value once, store it as a snapshot, and then reference this snapshot wherever needed in your formula, ensuring both clarity and improved performance.

Now let’s examine how to use a variable in DAX to improve report performance in Power BI. Start by opening the existing Adventure Works sales Power BI report. Once your report is open, you’ll notice various panes on the screen; on the right side, you’ll find the Data pane, which lists all the tables your report is connected to. Select the Sales table that contains the empty sales measure. Upon selecting the sales measure, the formula bar opens, where you can start writing your DAX formulas. Begin the formula with the VAR keyword; this is the starting point for declaring a variable. After typing VAR, add a space and then name your variable. It’s good practice to name your variable something meaningful: for instance, if you’re calculating total sales for the last 12 months, you might name your variable Sales12Months. Next, you’ll provide the DAX expression that calculates the value for the variable: after the equals sign, write out the DAX formula you want the variable to hold. This expression calculates the sum of sales amounts over the last 12 months. After defining all necessary variables, the next step in your DAX measure is using the RETURN keyword; this keyword indicates the final output of your DAX measure after performing calculations using your variables. Once you’ve written out your measure, press Enter. With the measure saved to your table, you could use the variable you created to quickly compare the last year’s sales figures across different product categories or regional markets; by leveraging the pre-calculated variable, the report would render these comparative visualizations much more quickly.

Using variables in DAX within Power BI offers a streamlined approach to handling complex calculations and improving report performance. As you get more accustomed to this feature, you’ll find yourself employing variables more often to make your DAX measures both efficient and maintainable. Using variables to optimize your data models can ensure not only quick results but more accurate insights. Every line of DAX you write, every measure you create, and every insight you derive has the potential to influence decisions, shape strategies, and drive success.

Adventure Works has seen soaring sales this year, with mountain bikes especially flying off the racks like never before. But as you sift through your Power BI dashboard, a nagging feeling settles in: the mountain bike sales data for the past 12 months, which you have been visualizing through a complex DAX formula, isn’t tallying with the raw sales numbers.
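The measure under suspicion resembles the VAR/RETURN pattern from the walkthrough above. The following is a hedged sketch only; the table and column names (Sales[SalesAmount], 'Date'[Date]) are assumptions for illustration:

```dax
Sales_12Months =
-- Store the 12-month total once; the name after VAR is the variable.
VAR Sales12Months =
    CALCULATE (
        SUM ( Sales[SalesAmount] ),
        DATESINPERIOD ( 'Date'[Date], MAX ( 'Date'[Date] ), -12, MONTH )
    )
-- RETURN defines the measure's final output; the stored snapshot can be
-- referenced here (and reused elsewhere) without being recalculated.
RETURN
    Sales12Months
```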
Questions whirl through your mind: is there a missing link? An error in the formula, maybe? The weight of potential inaccuracies presses on you. Mistakes mean mistrust in data, and mistrust in data can lead to poor business decisions. In this video, you’ll learn how to use variables in DAX to troubleshoot issues like this one.

To recap, a variable in DAX lets you store a value or a table to be used later in your formula. Think of variables as placeholders or temporary storage units for your data. By breaking down your DAX formula into smaller pieces and storing parts of the calculation in variables, you can keep track of each step, making the process more comprehensible and easier to debug. Returning to the earlier Adventure Works example, suppose you’re faced with a formula representing the sales for the last 12 months. Given the vast amount of data and the interconnectedness of the business processes, ensuring accuracy in the formula is paramount. So let’s help Adventure Works troubleshoot their mountain bike sales data for the past 12 months.

Before you can do any troubleshooting, understanding the overall structure and components of the formula is essential. Without a comprehensive grasp of what the formula consists of, determining what might be causing an issue becomes like finding a needle in a haystack. Once you have opened your Power BI report, on the right side of the interface you’ll notice the Fields pane. Within the Fields pane, scroll until you locate the DAX measure you wish to troubleshoot; in this case, the measure to troubleshoot is Sales12Months. Upon selecting the measure, a formula bar appears above the report canvas, allowing you to view the DAX expression. While carefully examining the expressions present, you can identify components like the CALCULATE function, the SUM aggregation, and the DATESINPERIOD function, as each of these plays a role in the calculation.

Once you identify each component of the measure, it’s time to create variables for each part. By breaking down the formula into smaller parts and assigning them to variables, you can address each segment separately; this modular approach helps you determine which part of the formula might be behaving unexpectedly. On the upper ribbon, select the Modeling tab and then the button named New measure; this indicates you’re creating a new formula or metric that isn’t present in your data. Upon selecting New measure, the formula bar becomes active for you to define the logic of your formula and break it down into variables. Start by typing VAR, which stands for variable, followed by a space; then provide a name for your variable, like CurrentDate. Using the equals sign, assign the function TODAY to this variable and return the result. Now let’s create a new measure and add a variable called LastYearSales for the DATESINPERIOD section. With variables holding specific parts of the formula, analyzing them individually allows for isolated testing. By evaluating each variable separately, you can confirm its correctness, ensuring that each foundational block of the formula is sound before the whole formula is put together. Finally, let’s create variables for the product category and subcategory to return the result for each.

On the right-hand side, locate the Visualizations pane and select the card icon to place a blank card onto your report canvas. A card visual is useful because it displays a single, prominent value, ideal for scrutinizing individual variables. Once the card is active, you’ll notice the field areas in the Visualizations pane. Locate your CurrentDate measure in the Fields pane, then select, hold, and drag it to the values area of the card; the card will now dynamically showcase the current date. As you continue the troubleshooting process, create new card visuals on the canvas and drag the sales filtered by category and sales filtered by subcategory measures to the cards to provide a snapshot of the isolated categories.
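Putting these troubleshooting steps together, the isolated variables and their recombination might be sketched as follows. The table and column names ('Date'[Date], 'Product'[Category], Sales[SalesAmount]) and the category values are illustrative assumptions, not the actual model:

```dax
Mountain Bike Sales =
-- Each variable is a snapshot you can surface on its own card visual
-- (or as a temporary measure) to verify it in isolation first.
VAR CurrentDate = TODAY ()
VAR Last12Months =
    DATESINPERIOD ( 'Date'[Date], CurrentDate, -12, MONTH )
RETURN
    -- CALCULATE combines the verified pieces: the date window plus the
    -- category and subcategory restrictions.
    CALCULATE (
        SUM ( Sales[SalesAmount] ),
        Last12Months,
        'Product'[Category] = "Mountain Bikes",
        'Product'[Subcategory] = "Cross-Country"
    )
```

If a card showing this measure disagrees with another visual, a visual-level filter on that other visual is a likely culprit, as in the scenario described here.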
After assessing individual variables, it’s crucial to observe how they interact together. Sometimes, even if variables are correct in isolation, they may not interact as expected when combined; this step ensures that the overall logic of combining the variables is correct. Let’s create a new measure called Mountain Bike Sales to weave these variables together with the CALCULATE function. CALCULATE modifies or extends the context in which a calculation occurs, so combining these variables essentially tells Power BI to consider only sales amounts of mountain bikes in the cross-country subcategory for the last 12 months. To visualize the combined logic, drag the newly made measure, Mountain Bike Sales, onto a new card visual. If everything is functioning correctly, this should vividly illustrate the mountain bike sales restricted to the last 12 months for the cross-country subcategory. You notice that the sales filtered by subcategory card is significantly different in value from the Mountain Bike Sales card. Through your troubleshooting, you uncover that while the technical logic of your DAX calculation is correct, a pre-existing filter applied to the sales filtered by subcategory card skewed your calculation, showing sales for the past 6 months. To resolve this, select the sales filtered by subcategory card visual and clear the applied filter.

In this video, you learned how to use variables for troubleshooting. You discovered the importance of breaking down a DAX formula piece by piece, understanding each element and its interactions, and how this modular approach provides a systematic method for troubleshooting. You also explored the process of defining DAX variables and combining them to ensure their interactions produce accurate results.

Imagine you’re a captain navigating the seas of business data. Your compass is your understanding of key performance indicators, your sails are your dashboards, and your map is Microsoft Power BI. The winds of analytics fill your sails, pushing you towards better
informed decision-making this module bringing data to the user has equipped you with the navigational skills needed to sail through the waters of business analytics you’ve not only discovered the pivotal role of dashboards in steering organizational decisions but also ventured into report navigation and publishing configuring mobile views fine-tuning report performance and sharing leveraging features like quick insights and Q&A and optimizing reports using DAX variables let’s recap key concepts including dashboards in business decisionmaking including how to create and customize them sharing information with stakeholders such as PowerBI workspaces publishing reports and optimizing pageionation for better navigation and user experience and the usage of the analyze in Excel feature in PowerBI and optimizing reports using DAX variables thereby making your report easier to debug and more efficient you started with a deep dive into creating dashboards you explored the concept of dashboards in the business context their importance functionalities and how they serve as key tools in data analysis and decision-making processes much like a car’s dashboard that shows critical data like speed and fuel level you learned that a business dashboard provides a consolidated real-time visual display of key performance indicators or KPIs such as sales trends and customer behavior while they share similarities with reports dashboards differ in that they offer a one-page summary of the most important metrics in contrast reports provide a more indepth multi-perspective view you also recognize the need to understand the visual and interactive nature of dashboards their role in promoting transparency and accountability within organizations and how they aid in breaking down barriers to information sharing your exploration continued to how to build a simple dashboard configure the mobile view and change themes you started by creating a new report dragging and dropping various data fields to 
make visual charts like bar graphs and line charts once you had your visuals you combined them into a single dashboard for a comprehensive view of important metrics to elevate your data analysis capabilities you explored how to optimize the usability of your PowerBI dashboards by adding two key features its quick insights and Q&A features you also discovered the limitations of pinned visuals in PowerBI how their static nature can prevent deep data exploration and how to overcome these limitations by setting up and pinning live reports next you delved into sharing reports with stakeholders you learned about PowerBI workspaces and their importance alongside the stepby-step process of creating a simple workspace workspaces are essential as containers that hold various components such as dashboards reports workbooks and data sets you explored the step-by-step process of publishing reports in PowerBI as well as the concept of pageionation and why it’s beneficial for creating organized reports publishing reports serves as a bridge connecting you the data analyst with decision makers and team members who need to draw insights from the data pagionation affirmed that dividing your report content into multiple pages makes your report more organized and easier to navigate akin to chapters in a book your journey then led you to understand the different elements of report page properties including page information canvas settings canvas background and wallpaper report page properties let you customize your report pages giving you control over how your report is presented influencing aspects like page size view and background enhancing overall readability and effectiveness you also learned how to use the analyze in Excel feature in PowerBI to take your reports and further analyze them combining the visual capabilities of PowerBI with the analytical depth of Excel it provides a live connection from an Excel pivot table to the data in PowerBI so when data in PowerBI is updated you 
can simply refresh your Excel report to see the new data. You also explored the practical aspects of tuning report performance. You grasped the role and function of the PowerBI Performance Analyzer: the process of activating it, starting a recording, refreshing visuals, analyzing performance data, and exporting data for further analysis. The Performance Analyzer helped you identify the parts of your report slowing things down by providing a detailed breakdown of loading times for each visual. You also identified if a DAX query was causing the delay and took the necessary actions to optimize it for improved performance. The process of simplifying a DAX formula involves reducing the complexity of the formula, which might include eliminating unnecessary calculations, using more efficient functions, or avoiding iterating over large tables. This can make the formula more efficient and less demanding on the DAX engine, reducing the computational load. In the final part of our journey, you explored the importance of DAX variables, how to use variables to enhance the performance and accuracy of your PowerBI reports, and the steps to effectively implement them for optimal performance. Using variables in DAX formulas enhances readability by breaking down complex and lengthy expressions into smaller, more digestible parts. Variables act as named references for parts of these formulas, making the main expression streamlined and easier to interpret. Throughout this module, you journeyed from understanding the foundational significance of dashboards to the details of optimizing DAX formulas. At every step, you've gained skills and techniques that empower you to bring data to the user, a fundamental aspect of data analysis and visualization. These skills and techniques aren't just tools; they're instruments of change that can drive organizations like Adventure Works towards innovation, efficiency, and success. The marketing director at Adventure Works, Renee, was captivated by the Microsoft PowerBI reports you
produced. Recognizing their value in the company's decision-making process, Renee wants to delve deeper into the data, introduce statistical results, categorize data patterns, and make predictions about future trends. Although these tasks have been vital for businesses for decades, immensely helping their decision-making, they were traditionally complex and time-consuming. However, the analytics in PowerBI has changed this. PowerBI offers a versatile and user-friendly toolbox to tackle analytical tasks effortlessly, making these processes much more efficient and accessible. But how can you use the analytics in PowerBI in your reports? Over the next few minutes, you'll be introduced to the concept of analytics and explore the analytics capabilities offered by PowerBI. Analytics refers to systematically using data, statistical and quantitative analysis, and predictive modeling techniques to uncover meaningful patterns, insights, and trends within data sets. An essential part of analytics involves interpreting and visualizing data to extract valuable information, resulting in actionable insights for informed and strategic decisions. PowerBI empowers you to transform raw data into meaningful insights through its various advanced tools and functionalities. Analytics in PowerBI unlocks many ways to enrich your visualizations, adding significant value to your reports. As you progress through this course, you'll explore the many ways analytics in PowerBI can enhance and elevate your reports. For now, let's explore some of the PowerBI features available for analytics. Leveraging the statistical summary tool, you can easily add functions to your visualizations, like calculating averages and middle (median) values. You will also learn how to use top N analysis in a visualization to highlight critical data points, saving you time from repetitive tasks and manual calculations. Another feature you'll learn
about is DAX measures, which can enhance PowerBI's visualizations to find unusual data points called outliers. With grouping and binning data for analysis, you can classify two or more associated data points into groups or separate them into equal-sized groups, respectively. Mastering organizing your data into meaningful categories can reveal trends and patterns in your data, helping you make smarter decisions. Applying clustering techniques empowers you to discover another way of associating similar data points in a subset of your data. Using the clustering algorithm, a straightforward feature that identifies similarities and dissimilarities in the attribute values, your data gets divided into subsets called clusters, unveiling valuable patterns in your data. PowerBI empowers you to conduct time series analysis, or time-based data analysis. Analysis with time series involves exploring trends and patterns occurring over a range of time. As you explore this feature further, you'll learn how to predict future trends using time series forecasting and discover captivating visuals to support your time-associated data, like the play axis, an advanced visual containing a dynamic playback of data over time. PowerBI also offers the analyze feature. This powerful feature automatically detects relationships and connections in your data, revealing valuable insights that might have gone unnoticed. With the press of a button on any data point, PowerBI runs a rapid analysis to provide users with automatically generated insights. You can leverage advanced analytics custom visuals to create exceptional reports. There are a variety of custom visuals in PowerBI called advanced analytics custom visuals, or AI visuals. PowerBI leverages machine learning algorithms to provide insights on the data you provide on the chart. Visuals like key influencers and decomposition tree will take your data reports to a new level. Another AI-powered feature of the PowerBI service, Quick Insights, generates valuable information from your data
sets in the form of a dashboard with the press of a button. This will save you time and help stakeholders make better decisions faster. Plus, you can uncover predictive and prescriptive insights with PowerBI's AI capabilities. You can generate AI insights with functionalities like sentiment analysis, which visualizes emotions or attitudes in data, and key phrase extraction, which identifies phrases in text data. These AI capabilities empower you to forecast future trends and stakeholders to make data-driven decisions with confidence. You've now been introduced to the PowerBI features available for analytics. In upcoming videos, you will delve deeper into each one of the features and witness their magic at work. Exploring the powerful tools of analytics in PowerBI unlocks a world of possibilities for you to drive data-driven decision making with your reports. By harnessing the power of analytics in PowerBI, you can help organizations optimize their strategies and stay ahead in today's dynamic business landscape. Adio, your manager at Adventure Works, just imported the company's sales data for quarter 1 into a Microsoft PowerBI report. There is an air of anticipation as your team brainstorms ways to extract valuable insights from this information. Despite the raw nature of the data set, only containing product details, order dates, and the total order amount, the team sees immense potential to build upon. The aim is to create a report that can answer crucial questions like: what was the total order amount per product category? What were the average and median amounts per product category? Did the early March ad campaign have any impact on sales? Adio is confident that PowerBI's statistical summary capabilities can easily transform these questions into an insightful report. In this video, you will learn about these capabilities, exploring the process of integrating a statistical summary into a PowerBI report. Data and statistics are closely intertwined, as statistics serve as the essential language to
articulate and analyze your data. PowerBI captures the power of statistics, offering a comprehensive range of statistical functions. You may already be familiar with some of the functions commonly used in data analysis, such as sum for totals, average for mean calculations, and median, minimum, and maximum to find the middle, smallest, and largest values in a data set. PowerBI not only provides rich features to seamlessly incorporate these functions into your visualizations and reports, but also utilizes the DAX language that encompasses all of these statistical capabilities. This powerful combination is referred to as the statistical summary in PowerBI. Using the Adventure Works sales data set, let's examine two different ways of adding the average statistical function to a visualization. This will help the sales team identify which product category accumulates the highest average order amount. In addition to identifying whether Adventure Works' early March ad campaign impacted orders, the marketing team also needs to retrieve the number of orders per day from the data set. As you are learning to integrate a statistical summary in a report, let's extract and utilize just three columns of Adventure Works' sales data: product category, order date, and order total, which is the total order amount. To prepare for our statistical summary exploration, let's create a few simple graphs to work with. First, let's create a clustered column chart and select product category first, to represent it on the x-axis, and order total second, as its y-axis, to visualize the total amount of orders for each product category. Adjust the visual to the screen and click on an empty space of the canvas to deselect the bar chart. Then create the second visualization, a line graph right below the column chart, which will contain the order date on its x-axis, using just order date without the date hierarchy, and then the order total again as its y-axis. This visualization depicts the total order amount of each date. Lastly, let's create a
table graph in the right corner of the screen. Add product category as its first column and order total as its second column. This will provide a better view of the numerical data. When adding a numeric column to a visual, the default function displayed is the sum, or total, of the amount. However, there are numerous built-in functions that you can apply to your graph. These functions display on the pop-up menu in the visualizations pane, directly at the right of your column, such as average, median, and deviation. To better understand how this works, let's add the order total column again in the same graph and adjust the function to calculate the average order amount of each product category instead. You can also create your own calculations using DAX expressions, which include a rich set of statistical functions. Let's produce a similar result using a straightforward DAX measure. In the ribbon's Home tab, select New measure, assign the measure a name, and use the median function, specifying the order total column for the calculation. Lastly, modify the column chart to a line and column chart, add your measure to the y-axis, and observe the result. Now let's explore the time series data. Let's add the number of orders for each day to the line graph. To do this, drop the order total column into the secondary y-axis and use the count statistical function. This is a helpful function that counts table rows in the graph based on the filter context it is given. In this case, where each row represents a single order, the count function counts the number of orders. By using statistical summary in PowerBI, you explored how you can effortlessly calculate statistical measures and add them to your visualizations. All the critical questions were answered in the report, as it displays the average and median value of each product category and even displays the impact of the ad campaign in March, when the count of orders doubled. With just three columns as your data source, you unlocked the power of analytics in PowerBI.
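The measures built in this walkthrough can be sketched in DAX. This is a minimal sketch, not the exact formulas from the course: the table name Sales and column name Order Total are assumed placeholders, so adjust them to match your own model.

```dax
-- Hedged sketch of the statistical summary measures discussed above.
-- 'Sales' and [Order Total] are assumed names, not from the actual data set.
Median Order Amount = MEDIAN ( Sales[Order Total] )
Average Order Amount = AVERAGE ( Sales[Order Total] )

-- COUNTROWS counts the rows visible in the current filter context;
-- with one row per order, it returns the number of orders per day/category.
Order Count = COUNTROWS ( Sales )
```

Each measure respects the filter context of the visual it is placed in, so adding Median Order Amount to the column chart yields one median per product category, exactly like the built-in median aggregation.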
With the aid of statistical summary, many business requirements can be met and questions answered with ease, thanks to the array of statistical features tailor-made for data analysts by PowerBI. Renee, the marketing manager at Adventure Works, has just finished a critical meeting with other marketing team leads to discuss new approaches and strategies for attracting new customers. After the meeting, she promptly reached out to the data analytics team to discuss the implementation of these approaches in their reports. During the meeting, the marketing leads for North America and Europe decided to take different approaches for each continent's market. This requires grouping country orders by continent, a task that hasn't been implemented in the existing data set. Additionally, the marketing team agreed on launching ad campaigns in 10-day intervals. Microsoft PowerBI's visualization options already include automatic monthly and weekly breakdowns, but the challenge is to figure out how to assemble orders into 10-day groups. The data analytics team quickly searches for a solution and discovers that you can address both these problems using analytics in PowerBI, particularly the grouping and binning data features. These features both associate data points with each other in their respective ways. Grouping in PowerBI gives you the ability to manually divide data points into separate groups of your choice. On the other hand, binning automatically separates data points into segments referred to as bins, giving you two options to do so: you provide the number of outcome bins, with PowerBI splitting the data points between them, or you provide the size of bins, and PowerBI splits the data points into however many bins are required to fit your data into the specified-size bins. Now the question is: how can they effectively implement these features in the customer report? In this video, you'll be introduced to the concepts of grouping and binning, and you will learn how to differentiate between the two. You will
also learn how they can be effectively implemented in a PowerBI report to clarify information and provide easy-to-understand deliverables. Let's start by helping Adventure Works group the orders from each country by continent. To visually highlight orders for Europe and North America, you need to group them in the report. First, let's select a stacked bar chart and set the country on the y-axis and the sum of order total on the x-axis. Hold down the Shift key and select in the visual all the countries that belong to North America, including USA, Mexico, and Canada. While still holding the Shift button down, right-click on the visual and select Group data from the drop-down menu. This action automatically creates a group and assigns it to the legend field, resulting in a different color for the countries that were grouped together. Now let's explore how to edit the group created earlier. The new group appears as a new column in the table, with an icon on the left side indicating that it is a super group of another column. Right-click on this new group and select Edit groups from the menu to open a new window. Now you have the option to rename the existing group; let's change Canada, Mexico, and USA to North America. Similarly, you can select all European countries while holding the Ctrl key, select Group, and create a new group called Europe. Once you are done, select OK. In addition to highlighting categories of data, you can also use the newly created groups as an axis in your visuals. To do this, create a doughnut chart and add the sum of order total to the values field, then add country groups to the details field. This will help you visualize the distribution of the order amounts between North America, Europe, and the other regions. The doughnut chart clearly represents how the orders are distributed among these different groups, making it easier to analyze the data at a glance. To create bins based on the 10-day campaign interval, right-click on the order date column and select New group. Select
bin as the group type and size of bins as the bin type. For the bin size, select the 10-day interval to align with the campaign requirement and select OK. Next, create a line chart and use the new bin on the x-axis and the sum of order total on the y-axis. This creates a visualization of the 10-day ad campaign intervals. By using this technique, the marketing team can effectively analyze the data based on the 10-day intervals, gaining valuable insights into the trends and patterns within the data set. As you know by now, grouping and binning data has always been crucial in data analysis, as it organizes data points into similar, meaningful categories, uncovering patterns hidden within them. PowerBI introduces this capability in its engine, allowing you to seamlessly group or bin columns in a simple manner, without the hassle of delivering the result in code. To fully grasp the power of this feature, let's compare it with the complexity of using DAX code to achieve the same bin technique. With just a few clicks, the data analytics team publishes the report quickly, leaving Renee astonished by the powerful capabilities of groups and bins in PowerBI. The marketing team can now easily identify trends within the groups of North America and Europe, enabling them to make immediate comparisons with the rest of the countries. Moreover, they can analyze and assess the 10-day campaigns effortlessly, gaining insight into critical information on their performance. Well done! The sales team at Adventure Works is so impressed by your Microsoft PowerBI report that they ask you to add more analytics to the data set. The team wants to analyze if there is a trend in the order amount, identify the largest order of each day by order amount, and determine the top 10 best and worst sales days for the business. You can accomplish this by including a histogram in the report and using the top N analysis feature. But what is a histogram, and how do you add top N analysis? In the next few minutes, you'll
learn how to identify and build histograms, as well as filter data points into a top N analysis, showcasing only the most significant data. A histogram is a way to visualize a top N data query result, while the TOPN function in PowerBI is a built-in DAX function that retrieves the top N records from a data set based on specific criteria. It compares the parameters provided and returns the corresponding rows from the data source. The N in top N refers to the number of values at the top or bottom. Data points are grouped into ranges, or bins, making the data more understandable. A histogram is a great way to illustrate the frequency distribution of your data. As you already know, a typical chart visual relates two data points, a measure and a dimension, incorporating them on its x- and y-axis respectively. Adventure Works has an existing bar chart to track the total order quantity for different product categories, but they would like to know how often quantities occur. To do this, they would create a histogram of the quantities: the x-axis contains the quantity groups and the y-axis contains the frequency with which these groups occur. The most used charts for histograms are bar charts and area charts. Sorting a field in ascending or descending order is a relatively common process in data analysis reporting, but what happens when there are so many attributes that the columns completely cover the canvas area, hiding the crucial information? Top N analysis prevents this by sorting the data to display according to a category's best or worst data points. This enables stakeholders to quickly identify the top or bottom values in the data and make data-driven decisions efficiently. Now let's explore how to create histograms to analyze sales data and visualize the top 10 dates and sales by implementing top N analysis in a visualization for the Adventure Works sales team. Let's start creating a histogram to analyze trends in order amounts. The first step in creating a histogram is to create a bar
chart and to add order total to the x- and y-axes. Ensure you select the sum of order total, and not the count. Resize the chart by dragging its edges so it's clearly visible. Notice that having numerous data points on the x-axis may make it difficult for users to interpret the analysis. Histograms directly address this issue by grouping x-axis data points into groups. To achieve this, use the bin technique you learned about previously. Right-click the order total column and select New group from the drop-down menu. Select bin as the group type and number of bins as the bin type. For the bin count, enter 20 and then select OK to create the new bin on the order total column. Now place the new bin on the x-axis in place of the standard column in both charts. Congratulations, you have now created your first histogram! Bar charts are one of the most common histogram charts, with area charts being a close second. While the visualization is selected, select the area chart to modify it. Using histograms, the distribution of order amounts per amount range is clearly visible, with the most revenue being accumulated through orders that were just over the $2,250 mark. Now let's explore how you can visualize the top N data points of a column. To achieve this, you need an attribute and a sorting column. The sorting column will be used to create ascending or descending order on the attribute column before the attribute column is filtered to its top N values. Let's observe a top N analysis implementation, creating a chart to highlight the top 10 days by sales amount. Create a funnel chart, which is one of the most popular top N charts, and add order date, without hierarchy, to the category and order total to the values. To limit the chart to a top 10 analysis, navigate to the filter pane, select the arrow on order date, and select Top N as the filter type. Select top 10 to display the best days (you would select bottom for the worst days) and add the total amount to the By value to sort by this amount.
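The same top-10 result can also be expressed with the TOPN DAX function mentioned earlier. This is only a hedged sketch: the Sales table and its Order Date and Order Total columns are placeholder names, not the course's actual model, and the Top N filter in the filter pane achieves the same effect without any code.

```dax
-- Sketch: combined order total of the 10 best-selling days.
-- Table and column names are assumptions; adapt them to your model.
Top 10 Days Total =
VAR DailyTotals =
    SUMMARIZE (
        Sales,
        Sales[Order Date],
        "@DayTotal", SUM ( Sales[Order Total] )
    )
VAR TopDays =
    -- Keep the 10 rows with the highest daily total (DESC);
    -- ASC would return the 10 worst days instead.
    TOPN ( 10, DailyTotals, [@DayTotal], DESC )
RETURN
    SUMX ( TopDays, [@DayTotal] )
```

Using variables for the intermediate tables mirrors the readability advice from earlier in the course: each step of the query has a name, so the final RETURN expression stays short.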
You now have a better understanding of the capabilities and potential of histograms and top N analysis in PowerBI. By working through this lesson, you discovered how to construct histograms, transforming data into visualizations that uncover distribution patterns. Furthermore, you've practiced your top N analysis skills to isolate key data points to inform actionable insights. During a recent strategy meeting at Adventure Works, stakeholders discussed adjusting prices to align with the business strategy. However, the current sales data set seems disconnected and lacks cohesion, making it difficult to use. Recognizing the importance of optimizing the company's product offerings, you'd like to apply advanced analytics to categorize products based on order details and pricing. Your goal is to establish meaningful connections between the products to enable data-driven pricing decisions. Having explored groups and bins in Microsoft PowerBI, you've learned to organize data points hierarchically with groups or into equal-sized bins. But what if you want to group data points based on similarities in their values? That's where the clustering technique in PowerBI comes into play. This video aims to equip you with all the relevant knowledge needed to apply the clustering technique to a data set, including how to cluster data in scatter charts and identify outliers with clustering. Clustering is a powerful feature that enables you to discover groups of similar data points within your data set efficiently. It is enabled in scatter plot visualizations, as they are the optimal charts for analyzing data dispersion and identifying outliers. By analyzing your data, the clustering technique identifies similarities and dissimilarities in attribute values and then separates the similar data into distinct subsets known as clusters. These clusters provide valuable insights and aid in understanding patterns and relationships within your data. To uncover the valuable insights that clustering can offer, using the
earlier example as a practical demonstration, let's begin exploring patterns in the Adventure Works products based on their sales data. Launching a new PowerBI report with the sales data set imported, select the scatter chart icon on the visualizations pane and resize it on the screen for better visibility. Add product name in the values field, as this is the field you want to separate into clusters. For the axes, use product price as the x-axis and order total as the y-axis. Ensure the sum function is correctly applied to both as the default aggregation. With this setup, you can now apply the clustering technique to gain valuable insights from the data. With the dots scattered across the graph, let's apply analytics to identify similarities between these data points that would group them into categories. Select the ellipsis in the top right corner of the chart to see the visualization options, then select the automatically find clusters option. A pop-up window on your screen provides various clustering options you can adjust. Name the cluster group "product cluster", and for the description use "clusters for product name based on product price and order total". Then you have to choose how many clusters you want the data points separated into, or even let PowerBI automatically choose the number. For our example, let's input three as the number of clusters and select OK. The clustering technique has divided the product data points into three clusters. The first cluster comprises products with low prices, leading to low order amounts. The second cluster includes products with high prices but relatively lower order totals compared to cluster three, where high product prices also resulted in high order totals. Continuing with the clustering analysis, you can leverage the newly formed clusters as axes for additional visuals, allowing you to gain further insights based on clustering patterns. Select a horizontal clustered bar chart and set product category as the y-axis and sum of order total as the
x-axis. Adjust the chart size to cover the right part of the canvas from top to bottom. To add the new data grouping into the analysis, add product cluster as the small multiple. To do this, navigate to Format in the visualizations pane, then small multiples, and select three rows and one column to compare these multiples easily. Lastly, include the product name in the tooltips of the visualizations. By analyzing the clusters in both graphs, you can directly gain insights from your data set. While most e-bikes and road bikes appear to belong to the high-performing cluster 3, there are some exceptions in the lower-performing cluster 2. Hovering over these product categories allows you to display the product names that belong to this category, providing valuable information for future business decisions. By clustering the products, you helped the pricing department make crucial decisions to improve the promotion of specific products and embrace data-driven strategies at Adventure Works. By analyzing products belonging to the low-performing categories, they adjusted their prices strategically, aiming to achieve better results and optimize the overall market performance. In this video, you have gained valuable skills in using the clustering algorithm in your scatter plots to group data points effectively. By applying clustering, you learned how to identify hidden relationships and patterns within your data, making it possible to optimize various aspects of business, such as product pricing, promotions, and overall strategies. You received a new report requirement this morning: your task is to build a customer demographic analysis, leveraging the sales and customer data sets to derive valuable insights about the customers. To fulfill the business needs for visualizations based on country, customer age, and order dates, you will have to use both axis categories, categorical and continuous axes. But what are these categories, and how do you decide which one to use in each visualization? Over the next few minutes,
you'll be introduced to categorical and continuous axes and learn how to differentiate between them. You'll also explore how to configure these axes in Microsoft PowerBI. Let's start by exploring categorical axes. You can use a categorical axis to represent discrete, non-numeric data points. It organizes data into distinct categories, such as names. Categories are groups with no inherent numerical order. Common examples of categorical data include product names, geographic regions, and employee roles. When you use a categorical axis, PowerBI automatically arranges data points in the order they appear in the data set. Categorical axes are best suited for displaying qualitative information and facilitating comparisons between distinct entities or categories. Bar charts, stacked bar charts, pie charts, and categorical line charts are common visualizations that use categorical axes. On the other hand, a continuous axis is designed to represent numerical data points with an inherent order that can be measured along a continuous scale. These data points are typically represented by real numbers and can be integers or decimal values. Examples of continuous data include sales revenue, temperature, time, and age. Continuous axes are ideal for visualizing quantitative information, allowing users to identify trends, patterns, and correlations within the data. Common visualizations that use continuous axes are line charts, area charts, scatter plots, and histograms. Now let's explore how to use these two axes in your reports. Using a real-case scenario, let's explore both axes to understand their use better. Open a new report with the sales and customer data sets imported. The first visualization you're going to work on is sum of order total by order date. Add a clustered column chart and insert order date on the x-axis, without date hierarchy, and order total on the y-axis. Resize the visual by dragging the edges. The visual displays spaces with no data for the dates that held no orders. This is because PowerBI
automatically selects the continuous axis type when given a date column in its axis field. By selecting the categorical axis, the bar chart displays no space, removing the depiction of dates with zero order total. Keep in mind that there is no right or wrong way to visualize the data, and there are no numeric differences between the two axes; the choice of axis type should be the one that best addresses the business need. To explore the categorical axis, let's create a second visualization using sum of order total by location. To do this, insert a clustered bar chart and add location on its y-axis and sum of order total on the x-axis. Move the visualization to the right part of the screen and resize it so it fits the screen top to bottom. Location has no inherent order, so PowerBI automatically implements a categorical axis and turns off the option of turning it into a continuous axis. For the last graph, let's explore another possibility of a continuous axis. Customer age is a column with an inherent numerical order, so when you add a line chart and insert age on the x-axis and order quantity on the y-axis, PowerBI uses the continuous type of axis. You can observe a major difference between the two axes: if you try to access the visualization sorting method through the ellipsis, you will notice that a continuous axis doesn't allow you to use a sorting other than the one inherited from the numeric column. To change the default sorting, you need to use a categorical axis. Understanding categorical and continuous axes and their roles in data visualization will enable you to select the correct axis based on the nature of the data you're analyzing. With this knowledge, you can create more effective and informative visualizations, making it easier to compare discrete categories or identify trends and patterns within numerical data. Renee, the marketing manager at Adventure Works, relies heavily on analytics using Microsoft PowerBI to equip herself for important executive meetings.
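Continuous values like customer age are often easier to read when grouped. Besides the point-and-click bin dialog, a comparable 10-year age bin can be sketched as a DAX calculated column; this is a hedged illustration, and the Customer table and Age column names are assumptions rather than the course's actual model.

```dax
-- Sketch: a 10-year age bin as a calculated column.
-- 'Customer' and [Age] are placeholder names for your own model.
-- FLOOR rounds each age down to the nearest multiple of 10,
-- so ages 40 through 49 all land in the 40 bucket.
Age Group = FLOOR ( Customer[Age], 10 )
```

Because the result is still numeric, the new column keeps its inherent order and can be plotted on either a continuous or a categorical axis.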
As part of her preparation for a high-level meeting with the company's executives, Renee has created several reports and presentations based on the results of the most recent marketing campaigns run by her department. Renee takes great care when preparing the analysis; however, she worries that there could be essential data insights that she and her team have overlooked. Seeking expert advice, she turns to Lucas, the data analyst, for guidance. Lucas suggests using the analyze feature in PowerBI. With this feature, they can examine the data from different perspectives and ensure that no valuable aspects have been missed. But what is the analyze feature, and how can it be added to reports? The analyze feature provides you with advanced analytics to automatically detect patterns, trends, and anomalies in your data. In this video, you'll explore the analyze feature and how it can be used to identify trends and patterns. Now let's help Renee to examine her data from different perspectives. With the customer and sales data sets imported, let's create a new report and add visualizations. First, you'll create a line chart and insert the order date on the x-axis, without the date hierarchy, and then the sum of order total on the y-axis. You will also add an area chart next to it, with the age field as the x-axis and the sum of the order total field as the y-axis. Finally, on the bottom of the page, you'll add a clustered column chart with the product category as the x-axis, the sum of order quantity as the y-axis, and the order status as the legend, then resize it to fit the screen. Now let's start using the analyze feature on each of these visualizations to discover what insights it can add to your analysis. Starting with the line chart, it is obvious that the biggest order was placed on the 7th of March. To explore this further, select this specific date, right-click, and select Analyze. Now you can select the Explain the increase option. Once this is selected, a variety of different visualizations appear. These
analyze the increased order figure on this day based on factors such as product size payment method product categories and others clusters that were created manually in the table will also be included in the analysis by scrolling through these automatically generated visuals you can gain a clear picture of the factors that caused the increase in the order amount now let’s run the analyze feature on the second visualization the area chart since using distinct ages isn’t very informative for analysis you’ll first create bins to group the age data to do this right-click on the age column and choose the option new group apply size of bins as the bin type with 10 as the bin size and select okay to create the age groups separated by decade then drag and drop this new bin to add it on the x-axis and use the x button on the previously used age column to remove it from the chart to investigate further with the analyze feature let’s select the first bin with decreasing values right-click on it select analyze and then explain the decrease just as with the analysis in the first visualization this action causes a number of visuals to appear these help us to identify all relevant aspects that might have contributed to the decrease in the age group above 40 years now let’s explore another useful aspect of the analyze feature in the bar chart which shows product category and status you may notice that road bikes have an unusually high number of canceled orders to investigate what might have caused this right-click on the blue canceled bar for road bikes and select analyze if you select find where this distribution is different a variety of visualizations are generated these illustrate the factors that played a significant role in the large number of cancellations of orders for road bikes this feature can highlight contributing factors such as country and location product cluster and more every visualization generated by the analyze feature includes a thumbs up and a thumbs down
option on the upper right corner this allows you to provide feedback to PowerBI regarding the usefulness of its analysis for your report when you are using the explain the increase or explain the decrease features you have the flexibility to select different visualizations to display the results that best suit your analysis requirements finally if the analyze feature provides an insightful visual that you’d like to include in your report you can quickly add it to the report by selecting the plus sign button in the top right of the visualization in this video you explored how to generate valuable insights from your data using the analyze feature in Microsoft PowerBI in this demonstration you learned how to work with diverse visualizations and interpret the results effectively the analyze feature provides you with advanced analytics automatically generating visualizations from your data sets aiding you to automatically detect patterns trends and anomalies in your data time series analysis involves analyzing a series of data in chronological order to identify meaningful information and reveal trends in this video you will create an insightful report analyzing Adventure Works sales data over a period of 3 years in your PowerBI report three Adventure Works data sets have already been imported these are sales product and date you will now add four visualizations as the basis for the time series analysis first add a simple card visualization with sales amount as its field second add a horizontal clustered bar chart with product in its y-axis and sales amount in its x-axis using the filter pane add a top 10 analysis on the visualization by sales amount so the highest selling products are highlighted line charts and
scatter plots are the two most common visualizations used in the time series analysis with the first two basic visualizations already created let’s add these two types of graphs to the report add a line chart and include the date field from the date table in the x-axis this should not include the date hierarchy use sales amount from the sales table in the y-axis add a fourth visualization which is a scatter plot use the sum of total product cost from the sales table in the x-axis add the sales amount from the sales table to the y-axis include the category field from the product table in the legend section and the sum of sales amount from the sales table in the size section resize and move all the visuals so that they are better placed on the page now that the visualizations are created let’s explore how time series analysis can give you different perspectives on these visuals before you can create a time series analysis you must first import a custom animation visual from Microsoft AppSource microsoft AppSource is an online store offering custom visualizations that are built by industry-leading software providers to access Microsoft AppSource first select the ellipsis in the visualizations pane and then select the get more visuals option this will take you directly to the PowerBI custom visuals in Microsoft AppSource search for the term play axis to find the certified play axis dynamic slicer visualization when you have located it choose add you should now have the play axis button imported into the visualizations pane now let’s explore how to use the play axis button as a dynamic filter in the report the play axis button automatically filters all the other visuals using the chronological order of the date field that is added to it first select the new play axis visualization in the visualization pane add month from the date table as a field this will ensure that the play axis visualization will filter the report in a month-by-month sequence in the
format your visual section there are three different formatting options that you may use specifically for the play axis visual first there is animation settings it is possible to set the animation to auto start or to run on a loop for a specified time frame the second option is the time which you can use to modify the rate of filter transition here you will set it at 750 milliseconds which is a smooth transition speed the next format option relates to the color of the visual and specifically the color of each action of the play axis button in this area you can specify colors for play pause stop previous and next actions the last format option is enable captions if you set this feature to on the button shows the value of the field that you have inserted and how it changes during the animation press play on the play axis button to watch the sales data change month by month the play axis button makes the report interactive by updating all the visuals simultaneously this provides a dynamic picture of the data outcomes over time and provides a more detailed analysis of the trends in Adventure Works sales you now know how to do a time series analysis and implement the play axis visualization decision makers in all areas of businesses require answers to very similar questions typical questions asked of the data analyst might be can we compare daily sales against the sales average is there a way to uncover trends in order quantity within our visualizations can we manually add a sales target threshold into our visualizations the senior management at Adventure Works consult with their data analyst Lucas they would like to see key information such as trends or averages to be clearly visible on certain visualizations lucas identifies reference lines as the key Microsoft PowerBI feature which will fulfill this requirement a reference line is an additional element that can be added to a visualization to draw
attention to a key insight or piece of information powerbi offers a variety of reference lines that can be added to a visualization to include an additional measure for comparison with the data points the implementation of the line is based on internal calculations for the line type you’ve selected or on settings which you can customize let’s explore the different types of reference lines an average line represents the average value of a data series it is useful for identifying how individual data points relate to the overall average a median line shows the median value of a data series it is particularly helpful when dealing with skewed data distributions a percentile line identifies a specific percentile value such as the upper percentile within a data set helping you understand data distribution an x-axis or y-axis constant line is a straight line that represents a constant value on a visualization it is used to indicate a fixed threshold target or benchmark value for comparison a trend line reference line helps to identify trends or patterns in data different types of trend lines can be added to capture relationships in data it’s important to note that each visual within PowerBI supports its own set of reference lines this means that not every reference line type might be available for every type of visual powerbi intelligently offers reference lines that are contextually relevant to the type of data and visualization you’re working with for instance certain reference line types like trend line and average line are more applicable to line charts or scatter plots where data trends are easier to discern other reference lines like min line and max line are often used in bar charts to quickly visualize data ranges in some visualizations such as maps reference lines are disabled due to their limited interpretability within the visual context in the next few minutes you will be able to follow a practical demonstration on how to implement reference lines in PowerBI
reporting this PowerBI report has two data sets already imported customer and sales you will create three graphs and add reference lines to them as another layer of visual information first create an area chart add age bins as the x-axis value and sum of order quantity as the y-axis value and resize it on the screen next add a line chart use order date as the x-axis value without using the date hierarchy and order total as the y-axis value resize this visual finally add a horizontal bar chart include location as the y-axis value and order total as the x-axis value and resize it to fill the screen now let’s add reference lines once you have selected the area chart a magnifying glass icon appears in the visualizations pane selecting this opens the analytics pane this pane lists the types of reference lines that can be added to the visualization add a trend line by selecting the on button a reference line appears which depicts the trend of order quantity over age groups it shows that older people order significantly less than younger people you can use the options below the trend line to adjust the line color transparency and style so that it stands out more for the next example select the line chart you will now add an average line which will help identify the days where the order total amount was above or below the daily average in the analytics pane select average line and add line when the average line appears the choices underneath can be used to format it or to add a data label lastly in the bar chart it is important to easily identify the locations which are over a minimum target threshold select the bar chart in the analytics pane select constant line and add line add 3000 as the constant line value format the line if required it is now obvious that three locations Chicago Shanghai and Buenos Aires are below the target threshold of order total when choosing visualizations keep in mind that they do not all support reference lines for example if you change the
bar graph to a map you can see that the line disappears in the analytics pane the message analytics features aren’t available for this visual appears you’ve now explored how adding reference lines to visualizations can highlight trends in data sets and simplify comparative analysis between data points adding reference lines to your report extends the capabilities of visual customization and allows you to meet the diverse demands of different business scenarios planning for the future is crucial for all businesses one business may need to plan for seasonal fluctuations in orders or revenue another may need to plan for growth and/or expansion what is critical in either situation is that key decision makers have reliable data and information and that they also have a realistic picture of future outcomes data analysts use forecasting to examine previous trends and patterns in business to predict whether they will continue and how they can affect future outcomes microsoft PowerBI contains a forecasting tool which can assist in this process renee at Adventure Works is currently formulating a 2-year development plan for the department she manages she has already been impressed by the reports that she has seen in PowerBI she approaches Lucas the data analyst to see if there are any visualizations available that could apply predictive models and forecast results lucas informs her that one of the core charts in nearly every report is already equipped with forecasting capabilities she’s excited to find out more the forecasting tool in PowerBI is directly built into line charts and it allows analysts and business users to predict future trends and values based on historical data they can make informed decisions and plan more effectively users can tailor their predictions to align with specific business needs and data patterns with forecasting options let’s look at three important concepts confidence interval in forecasting is the range of values within which the actual
future outcomes are likely to fall with a certain level of confidence it quantifies the uncertainty associated with a forecast for example a 95% confidence interval indicates that there’s a 95% likelihood that the actual future values will fall within the forecasted range this helps decision makers understand the potential variability in the predicted values seasonality refers to recurring patterns or cycles that appear at regular intervals in time series data patterns could be daily weekly monthly or yearly they often result from external factors like holidays or seasons or economic cycles recognizing and accounting for seasonality allows forecasting models to capture the expected fluctuations in data that repeat over time lastly ignore the last is a feature that allows users to selectively exclude the most recent data points from the historical data set when generating forecasts in PowerBI anomalies or abrupt changes in the data may occur in the latest periods which might distort the forecasted results by ignoring the last few data points users can focus the forecasting model on the more stable and representative patterns in the earlier data now let’s step through a practical example of including forecasting results in a line chart forecasting in PowerBI starts with a line chart Adventure Works sales and date data sets have been imported into a new report in the visualizations pane select a line chart to add it to the canvas add date from the date table to the x-axis do not add the date hierarchy add sales amount from the sales table to the y-axis this basic configuration is all you need to apply forecasting to access the forecasting capabilities select the line chart then select the magnifying glass to open the analytics pane of the visuals select forecast in the list and turn it on a predictive section has already been added in the line chart select the arrow on the left to open the forecast settings options is the first and most important section here you
can define the rules for how the forecasting line will be drawn units is set to points points refers to the date unit currently used in the visualization in forecast length you can specify a number of these date units and this will determine the length of the forecasting line in this case to forecast a whole year of values select 365 points for confidence level select 90% confidence interval and select apply the forecast line also contains options to customize the line select the forecast line select a blue color so that it is similar to the actual line with the style option you can choose a dashed dotted or solid forecast line adjusting the transparency setting changes the visibility of the forecasted plot the confidence band choices allow you to customize the style of the upper and lower bounds changing it from fill to line the none choice will display no confidence bounds at all the forecasting feature in Microsoft PowerBI can create predictions of future trends from historical data adding these to your reports can provide you with valuable insights you are now familiar with using forecasting in a line chart and with concepts such as confidence intervals seasonality and ignore the last you’ve learned how to capture recurring patterns and how to allow for uncertainty these skills will allow you to design reports containing accurate forecasting the accurate anticipation of future outcomes will drive informed decision-making understanding the forces driving sales trends is a continuous concern for businesses advanced analytics tools are an accessible avenue to understanding these forces this is precisely the avenue your team proposes to navigate within Adventure Works sales data set with the robust capabilities of Microsoft PowerBI’s key influencers visuals you aim to identify all primary factors contributing to the rise and fall of sales figures in this video you’ll discover the power of the key influencer visualization an advanced
analytics visualization in PowerBI you’ll learn how to include it in a PowerBI report and use it properly to obtain valuable information the key influencer visualization is one of the main advanced analytics visualizations in PowerBI it uses advanced algorithms to uncover relationships buried within data shedding light on the influential factors behind specific outcomes whether you want to understand the triggers behind a surge in sales or the reasons for a sudden decline the key influencers visual offers a concise snapshot of what truly matters now let’s explore the capabilities of the key influencers visual let’s start with an empty report with imported Adventure Works sales data select the key influencer icon on the visualization pane to add it to the canvas your aim is to apply AI insights to analyze the factors behind increases and decreases in the sales amount to do this drag and drop the sales amount field from the sales table in the analyze field the key influencer visual is now indicating that there are no fields in explain by and requesting any number of fields relevant to the sales amount to initiate the analysis an AI analysis on all those factors will take place locating which of them are the main contributors behind sales amount surges and decreases to ensure the visualization provides insightful results you can add various relevant fields to the analysis for example let’s add the country region field from the customer table and the color and subcategory fields from the product table notice that as you add fields the visualization is already running a background analysis on the correspondence between the sales amount with all fields added in the explain by section let’s observe the results the top influencers affecting the sales amount are displayed on the visual’s left side you can view the analysis results in detail by selecting any of them let’s select the red color influencer to delve deeper into the analysis when you select an influencer bar chart with
a color field an analysis of sales amount compared to the average of sales per color displays you can observe the influence the red and silver products have on the sales total at a glance in contrast with the multi and white colors that barely made any sales to analyze the factors behind low sales amounts select the what influences sales amount box to change it to decrease apart from highlighting the key influencers affecting the sales these advanced visuals also group these influencers showcasing segments of influencers that played a significant role in sales increases or decreases select the top segments option in the upper border of the visual and in the field when is sales amount more likely to be choose high to identify the segments that perform well in sales now select the largest circle to view the results red road bikes have the biggest impact on sales with mountain bikes in the second position in this video you’ve explored the key influencers visualization an advanced analytics feature in PowerBI in just a few minutes with the support of AI algorithms powering the key influencer visual you extracted insights from your data set shedding light on the driving factors behind sales trends whether positive or negative you can also incorporate advanced analytics into your reporting process elevating the quality and depth of your analytical insights the marketing team at Adventure Works was fascinated by the impact the previous advanced visualization key influencers had on their data set they are now eager to explore what other advanced visualizations can accomplish your manager Addio wants to introduce decomposition trees another specialized analytics tool in Microsoft PowerBI if you’re wondering where and how to include the decomposition tree visual in a report this video is for you in the next few minutes you’ll be introduced to the decomposition tree and how to use this visual to navigate through data hierarchy levels which refer to the arrangement of data 
points in a structured format where elements are organized into levels or tiers based on their relationships you’ll also learn how to activate its AI potential letting the visual guide you through the critical factors behind outcomes but first what are decomposition trees the decomposition tree visual in PowerBI lets you visualize data across multiple dimensions it automatically aggregates data and enables drilling down into your dimensions in any order it is the optimal solution when analyzing the hierarchical structure of data being an AI visual it can also leverage the hierarchical graphical representation of the visualization to automatically explore dimensions based on certain criteria here is an example of how the decomposition tree breaks down Adventure Works sum of sales amount into hierarchical groups referred to as branches to analyze the distribution of the amount in its subcategories the user can navigate through the branches manually by selecting any data point or enable the AI capabilities of the visual to automatically navigate through the branch based on the most influential components to start our journey with decomposition trees let’s launch a new report using the Adventure Works sales date and product data sets locate and select the decomposition tree visual in the visualizations pane to add it to the report readjust the visual so it fits the whole screen add the sales amount into the analyze field before looking into its AI powered capabilities let’s explore the basic functions of decomposition trees decomposition trees excel at analyzing data structured in a hierarchical fashion so let’s find a structure built like this in the data set navigate to the data view of the report and to the product table you can see that each product belongs to successively higher-level groupings which follow the sequence product model subcategory and category let’s add this hierarchy to the decomposition tree to utilize its basic features add all four components of the
hierarchical structure into the explain by field in any order a plus sign appears just right of the sales amount bar navigate through the hierarchy components in the order they are being used in the data set to get a complete breakdown of the sales amount between products in the data set although you can use the plus sign in any order you want utilizing the hierarchy sequence will give the best decomposition possible hit X anytime to remove a column from the decomposition tree and use the lock button to prevent a user from removing it now that you have a basic understanding of the decomposition tree let’s look at its AI capability to explore this potential let’s remove the model and product fields and add two other dimension fields to the chart color from the product table and year from the date table start at the first level of decomposition the category and select the plus sign you can now see that besides the columns added on the explain by field there are two more options high value and low value with a light bulb on their left side by selecting either one of them the decomposition tree will automatically choose the main driving factor between all fields added in the explain by section and highlight it for you to look at its capability select the high value of accessories to identify that the helmet subcategory was the driving factor of the accessory sales while in the clothing category the main reason behind the accumulation of the high amount was the superb clothing sales of 2019 on the other hand by removing the generated column and selecting a low value in the bikes category you can identify that blue colored bikes were the lowest performing attribute in bike sales with each lowest point being in 2020 in this video you learned about the capabilities of the decomposition tree an advanced visualization in PowerBI the decomposition tree is a unique tool for ad hoc exploration and root cause analysis of the factors behind any outcome in a data set combining 
both basic features with advanced AI capabilities it can convert information into valuable insights and contribute to business decision making by providing a deeper understanding of the underlying insights in a data set in the modern age of technology where information is all around us imagine you could uncover a map that reveals the hidden pathway that leads to success this is the exciting world of identifying patterns and trends in Microsoft PowerBI a journey that transforms raw data into secrets for success and numbers into opportunities this module gave you the experience of a modern-day explorer equipped not with a compass but with PowerBI’s analytical tools so let’s briefly recap some of the key concepts covered in the identifying patterns and trends module your foundation of identifying patterns and trends was laid through an introduction to analytics in PowerBI and its statistical summary capabilities you were equipped with the knowledge needed to incorporate a range of statistical functions into your reports supported by practical examples and a detailed cheat sheet of available statistical functions within DAX language you learned the importance of grouping similar data points into segments to highlight hidden patterns to empower you in this concept you explored PowerBI’s grouping binning and clustering techniques which helped match the precise needs of your analysis covering histograms top N analysis and continuous and categorical axes you gained even more tools to include analytics in your data sets advancing and focusing on trend identification you engaged with the exceptional tools of the analytics pane including reference lines error bars and forecasting these tools significantly enhance chart information depth enabling not just data point comparison but also future trend prediction moreover you gained an
initial glimpse into PowerBI’s ability to generate insightful visualizations via the analyze feature automatically these tools have the capacity to explain data fluctuations providing a variety of insightful visuals that you can instantly add to your reports lastly your introduction to AI visuals in PowerBI completed the picture you learned how to conduct root cause analysis within your reports using specialized visualizations like key influencers and decomposition trees these visuals are invaluable for uncovering key drivers behind data set fluctuations you also explored the Q&A visualization a powerful tool capable of transforming any business user into a data analyst formulating queries and crafting visualizations this natural language processor empowers you to translate language into graphs with remarkable efficiency ultimately your journey through the identifying patterns and trends in PowerBI has equipped you with a multi-dimensional toolkit from mastering statistical functions to unraveling hidden insights through segmentation and powerful analytics techniques you’ve become a data explorer skilled at revealing the story within the numbers with the ability to predict trends and harness AI powered visuals you are now better prepared to translate data into strategic decisions imagine yourself as an explorer in a maze of data surrounded by a vast and complex landscape of information somewhere deep within beyond the twists and turns lie pathways to hidden insights and uncharted opportunities awaiting discovery navigating through this data maze without proper guidance or tools could mean missing out on these hidden treasures entirely microsoft PowerBI serves as your modern-day explorer’s toolkit equipped with advanced mapping techniques helpful clues and expert data navigation it helps you cut through the noise interpret the data patterns and go directly to the heart of the insights buried within during this course you’ve transformed from a curious data
wanderer into a skilled navigator prepared to guide businesses like Adventure Works toward newfound opportunities and business success using data analysis and visualization in this video you’ll consolidate critical lessons from your journey through this data analysis and visualization with PowerBI course you’ll have a refreshed understanding of creating visually engaging dashboards and reports you’ll also recall concepts related to making your PowerBI dashboards and reports more user-friendly accessible and inclusive sharing your dashboards and reports with users and optimizing reports using DAX language and using visualization and AI in PowerBI to perform data analysis and identify patterns and trends your journey began with a foundational understanding of PowerBI acting as your compass you delved into the details of PowerBI service PowerBI desktop and PowerBI mobile in this part of the course you were introduced to choosing between PowerBI Pro and PowerBI Premium the limitations and advantages of each and how these choices impact data storage sharing and collaboration capabilities you also became well-versed in the administrative interface getting to grips with workspace creation and data set management this was like understanding the maze’s structure and its very pathways setting the course for your data journey you learned how permissions and roles in PowerBI can influence the accessibility and security of your data much like how an explorer’s team is structured based on roles and expertise in navigation you gained insight into diverse visualization forms from simple bar charts to more complex waterfall and funnel charts your journey went beyond surface level exploration introducing you to the DAX language for calculated columns and measures to make your visuals more dynamic and informative you also explored advanced customization options such as using slicers for real-time data manipulation or conditional formatting to highlight key metrics these tools became
guiding tools for precise data interpretation you also picked up the importance of visual hierarchy and storytelling along the way realizing that a well structured report can convey a narrative that empowers decision makers making your insights both accessible and inclusive became your next focus you learned how to make your PowerBI dashboards and reports accessible to users with disabilities this involved implementing high contrast color schemes adding alt text to visuals and ensuring tab navigation compatibility moreover you explored the built-in translation features of PowerBI ensuring minimal data language barriers these strategies ensure your data exploration is inclusive and reachable for all additionally you covered how to create mobile responsive reports understanding that accessibility also pertains to the variety of devices used to access data navigating through advanced functionalities was your next challenge here you deepened your knowledge of PowerBI’s more robust features such as using drill down and drill through functionalities to navigate between different layers of your data you also tackled data modeling understanding how to create relationships between various tables and sources your expedition delved deeper to uncover query parameters and their role in making your reports dynamic and interactive these tools enable you to interpret the data in the maze precisely without losing sight of the broader context you even ventured into APIs and custom connectors expanding the realms of data sources you can bring into PowerBI finally you were introduced to PowerBI’s AI capabilities like text analytics and the integration of machine learning models you explored time series analysis to forecast trends and discovered how to generate predictive models understand correlation and create data simulations this makes it possible to
predict and prepare for future trends much like an experienced explorer reading signs from the environment to prepare for what lies ahead you were guided through the process of automated machine learning in PowerBI making it possible to create predictive models without indepth programming knowledge like finding shortcuts and secret pathways within the maze as you conclude this course take a moment to reflect on your expedition you began as a budding explorer and now stand as the guide of others navigating through the intricate and sometimes bewildering maze of data analytics with confidence you’ve mastered the navigational tools and the instruments at your disposal with PowerBI and learn the art of reading and interpreting data in its deepest forms remember the world of data is vast and the technology that helps us navigate it is ever evolving you’ve acquired the skills strategies and insights to embark on countless more adventures but the maze remains boundless with every question you answer you’ll discover new ones that provoke your curiosity and challenge your understanding that’s the beauty and the challenge of data analytics embrace the ongoing quest for knowledge wisdom and growth with optimism in your heart and curiosity as your guide the best adventures still await congratulations on completing the data analysis and visualization with PowerBI course your dedication and hard work have paid off and you’ve gained knowledge skills and tools that will help set you on a path to excel in the world of data analysis you have successfully covered the following topics adding visualizations to reports and dashboards applying formatting choices to visuals adding useful navigation techniques to reports designing accessible reports and dashboards and using visualizations to perform data analysis you should now be well grounded in data analysis and visualization with Microsoft PowerBI you’ve learned how to use the power of data visualization and reporting in PowerBI to 
create compelling data stories and use formatting navigation and filtering to create interactive user-friendly and accessible reports that are engaging and informative from using visualizations and AI features to uncover data trends and patterns to sharing your insights effectively you are now better positioned to support businesses like Adventure Works in making datadriven decisions and driving business success but remember this is just one step on your data analysis journey by completing all the courses in this program you’ll receive the Microsoft PowerBI Analyst Professional Certificate from Corsera this program is an excellent opportunity to enhance your proficiency in data analysis in PowerBI and gain a qualification that opens doors to entry-level positions in the data analytics field this program will also help you prepare for exam PL300 Microsoft PowerBI data analyst by successfully completing the PL300 exam you’ll earn the Microsoft Certified PowerBI data analyst certification which will position you well to begin or advance your career in this role this globally recognized certification is industry endorsed evidence of your technical skills and knowledge the exam measures your ability to prepare data model data visualize and analyze data and deploy and maintain assets to complete the exam you should be familiar with Power Query and the process of writing expressions using data analysis expressions or DAX which you will learn about throughout the program to learn more about the PowerBI data analyst certification and exam visit the Microsoft Certifications page at http://www.learn.microsoft.com/certifications your journey through this course has not only provided you with essential skills in data analysis but also has laid the groundwork for your future endeavors your ability to recognize different visualizations apply formatting choices design accessible reports and dashboards and perform data analysis using PowerBI will undoubtedly set you apart in the 
world of data professionals but there’s still more to learn and room to grow so why not register for the next course in the program whether you’re a novice in the data analysis field or an experienced technical professional completing the entire program will showcase your knowledge of and proficiency in analyzing data with PowerBI your dedication to learning and growing in the world of data analysis is commendable and you should be proud of your progress and accomplishments your commitment will show prospective employers that you are capable motivated and driven and eager to learn it’s been a pleasure to be part of your educational journey wishing you all the best as you continue to explore the endless possibilities that data analysis with PowerBI has to offer congratulations once again and best of luck hello and welcome to the creative design in PowerBI course businesses and organizations obtain data from many sources these include government financial economic health and scientific data to name just a few as a data analyst it might be your job to extract insight from this large pool of data you could use Microsoft PowerBI to import this data and create data models but how will you then present the results of your work would you agree that a more creative presentation approach is required especially when dealing with large volumes of data you might aim for a more userfriendly presentation of the data so we’ve designed this course to give you the skills you need to visually share your data insights with your intended audience in this course you will learn how to creatively design dashboards reports and charts you’ll make visuals that the audience can quickly understand and you’ll know when and how to include specialist elements such as videos streaming data and QR codes as part of your business intelligence presentations you’ll be introduced to the theory and practice of visualization and design this includes the design principles of data display and visualization 
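One of those design principles, ensuring enough contrast between foreground and background so every viewer can read your visuals, can actually be checked numerically. The sketch below is a minimal Python implementation of the WCAG 2.x contrast-ratio formula; it is illustrative (the function names are my own, not part of PowerBI), but the constants follow the published standard:

```python
def relative_luminance(rgb):
    """WCAG relative luminance of an sRGB color given as 0-255 integers."""
    def linearize(c):
        c /= 255.0
        # Inverse of the sRGB gamma curve, per the WCAG definition.
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (linearize(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg, bg):
    """Contrast ratio between two colors; WCAG AA asks for >= 4.5 for body text."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)

# Black text on a white background: the maximum possible contrast, 21:1.
print(round(contrast_ratio((0, 0, 0), (255, 255, 255)), 1))  # → 21.0
```

A quick check like this is useful when you pick a custom color palette for a report: if a text/background pair scores below 4.5:1, some of your audience will struggle to read it.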
Let's now quickly summarize the course material to give you an overview of all you'll study. In this course, you'll begin by learning how to create a cohesive report design based on the characteristics of your target audience. You will identify key information so that you can produce audience-focused reports. In week two, you'll learn how good design enhances the comprehension of data in your reports. You'll apply visual clarity, use multi-dimensional visualizations, insert map visualizations, and implement custom visualizations such as Python-based visualizations. With these methods, you can design powerful report pages that improve the end-user experience. Then it's time to visit the concepts of dashboard design and storytelling. You'll compare the design of a dashboard with the design of a report, and you'll explore the principles of data storytelling. Advanced dashboard features, such as embedding media and QR codes, are part of your studies this week.

During the course, you can watch, pause, rewind, and re-watch the videos until you're confident in your skills. Consolidate your knowledge by consulting the course readings, and measure your understanding by completing knowledge checks and quizzes. In addition, the course discussion prompts allow you to share and chat with other learners; by connecting with your classmates during discussions, you can help grow your network of contacts.

Your studies prepare you for a final project and a graded assessment that you'll undertake in the last week of this course. In the project, you'll get a pre-made Adventure Works data set and model in PowerBI. Your challenge is to use the data to prepare reports for the sales team and the executive board. You'll need to use data storytelling and cohesive design, and you'll also be asked to use the data to highlight new business opportunities. After this hands-on learning, you will complete a final graded assessment. Be assured that everything you need to complete the assessment is included in the course, and of course, as part of your preparation, you can always review the content of any lesson to revise the relevant videos, readings, exercises, and quizzes.

Businesses need data sourcing, preparation, and analysis. Presenting the insights gained is often the last part of this data processing, and it's a key factor in ensuring that the benefits of the analysis are understood by all stakeholders. Is this course for you? Hopefully, the outline of the course content and topics will help you decide. You don't need an IT-related background to take this course; it's for anyone who likes using technology and has an interest in presenting the results of data analysis. Whatever your background, to complete this course you need access to some resources: a laptop or desktop computer with a recommended 4 GB of RAM, an internet connection, and a Windows operating system, version 8.1 or later. It should have .NET Framework version 4.6.2 or later installed, along with a subscription to Microsoft Office 365. You will also need to install PowerBI Desktop, available as a free download.

The courses in this program prepare you for a career in data analysis. When you complete all the courses in the Microsoft PowerBI Analyst Professional Certificate, you'll earn a Coursera certificate to share with your professional network. Taking this program not only helps you become job ready but also prepares you for Exam PL-300: Microsoft PowerBI Data Analyst. In the final course, you'll recap the key topics and concepts covered in each course, along with a practice exam. You'll also get tips and tricks, testing strategies, useful resources, and information on how to sign up for the exam. Finally, you'll test your knowledge in a mock exam mapped to the main topics in this program and the Microsoft certification exam PL-300, ensuring you're well prepared for certification success. Earning a Microsoft certification is evidence of your real-world skills and is globally recognized. It showcases your skills and demonstrates your commitment to keeping pace with rapidly changing technology, and it positions you for increased skills, efficiency, and earning potential in your professional roles. The topics covered in the practice exam include: prepare data; model data; visualize and analyze data; and deploy and maintain assets. In summary, this course introduces you to how a data analyst using Microsoft PowerBI applies data design techniques to create compelling stories through reports and dashboards. I hope you are ready to start creating compelling and cohesive reports and dashboards, using the best visual techniques to optimize audience focus.

I don't have to tell you that a social media photograph gets way more likes and shares than a message that contains text only. We choose to look at images first; your brain processes visual data thousands of times faster than text. That's the main reason we prefer visual communications, and it's also why, right now, all over the world, people are using data visualization software to make sense of large, complex data. Of course, humans communicated visually long before we had technological power. Let's check in on how we progressed from using just numbers for data presentations. Prepare to understand the real meaning behind the numbers. As our understanding of the impact of visuals increased, the approach to creating visualizations changed, and in 1933 Harry Beck created the London Underground map. Inspired by electrical circuit diagrams, it simplifies a complex layout by focusing not on rail-line geography but on how a commuter uses the rail system. It's a visual style still used today to make data easier to understand. Visualizations that successfully connect with users have a lasting impact on how we communicate data. Let's say you want to use data visualization to illustrate a much larger rail network; it could be 10 times bigger or a thousand times bigger. Scale it to 100,000 times, and you have an idea of the data volumes now available. Data visualization tools help us
understand big data in the world around us. Just compare older 2D maps to how satellite mapping reveals a different vision: we can zoom in for more detail to give a granular understanding of an area, zoom further into a city's layout to reveal data insights with visual markers, while always being able to place our insight in the context of a global landscape. Businesses benefit from data visualization by understanding the impact of their decisions; with it, they can create better products and services that improve the lives of their customers. But data visualization is not just for business. It improves data accessibility for governments, organizations, and citizens. For the first time, we all have access to detailed and accurate data about the planet. Professor Hawkins from the University of Reading created the "warming stripes" graphic: a simple visual with no text and no numbers, but its message about the danger of global warming is clear. Despite technological advances, the goal of data visualization remains the same: to make data accessible and easier to understand. Imagine a world where large-scale decisions are better understood through visualizations of this data. You can use data visualization tools to enhance your communication skills, reveal insights on a global scale, and help build a better world.

How do you choose an outfit from your wardrobe? When choosing which clothes to mix and match, it's important to know what colors go well together; after all, you want to look your best. The same goes for your reports and dashboards: to look their best, they need the best mix of colors and shades. That's why you are now being introduced to color theory. In this video, you'll explore color theory, its basic concepts, and how it assists you in creating presentations and data graphics. Color theory is the collection of design rules and guidelines used to communicate with users through effective color schemes. It involves the meaning and use of colors and how to pick the best colors in different situations to build harmonious and visually captivating color combinations.

As a data analyst, understanding the principles of color theory is essential for creating visually captivating and effective designs. Colors can evoke emotions, convey messages, and enhance the impact of reports. Color theory is a practical guideline for the visual effects of color combinations. It includes the color wheel, color harmony, color psychology, and color symbolism, giving you a powerful toolkit to create visually pleasing and meaningful designs. The color wheel represents the relationships between colors. It consists of primary colors (red, blue, yellow); secondary colors, which are mixes of primary colors (orange, green, and purple); and intermediate or tertiary colors, which are mixes of primary and secondary colors. The color wheel guides your choice of colors, leading to color schemes that create harmonious compositions.

Color harmony is another important concept. Color harmony refers to an arrangement of colors in a design that is visually pleasing to the viewer. You create visual balance and enhance the overall impact of your design by choosing the correct color combination. Here are a few methods used to combine colors into a color scheme: complementary colors, a system that uses opposite hues on the color wheel; analogous colors, which uses groups of colors that are next to each other on the color wheel; triadic, a concept that uses a three-pointed triangle selection of colors from the color wheel; and monochromatic combinations, which use several variations of the same color.

The psychology of color is one of the most important aspects to consider during your design, because colors can evoke emotions and influence behavior. For instance, when designing marketing materials for Adventure Works outdoor adventure products, incorporating vibrant and energetic colors like orange and yellow can evoke feelings of excitement and enthusiasm. Colors can also carry symbolic meanings and cultural associations. Different cultures may interpret colors differently, so it's important to consider cultural context when selecting colors for global designs. For instance, while red may symbolize luck in East Asian cultures, it can represent danger in some Western cultures. By understanding color symbolism, you can ensure that your designs effectively convey the intended message across different cultural backgrounds.

Given the importance of color theory, it's crucial to consider accessibility when working with color in design, as not all individuals perceive colors in the same way. Color blindness is a condition where individuals have difficulty distinguishing certain colors or perceiving color differences. The most common type is red-green color blindness, where individuals have trouble differentiating between shades of red and green. To ensure that your designs are accessible to individuals with color blindness, use color combinations that have sufficient contrast. This means avoiding color combinations that may appear similar to individuals with color blindness; it's recommended to use high-contrast pairings, such as black text on a white background, to improve readability. Additionally, providing alternative ways of conveying information beyond color is crucial. For example, if you're using color to indicate different categories or data points, consider also using patterns, labels, or symbols to supplement the color coding. This ensures that individuals with color blindness can still understand and interpret the information accurately. By considering color theory and accessibility together, you can create designs that are not only visually appealing but also inclusive and accessible to a wider range of individuals. Mastering color theory is a vital skill for any artist, designer, or creative professional. By understanding the principles of the color wheel, color harmony, color psychology, and color symbolism, you can create visually captivating designs that
effectively communicate messages and evoke emotions in your audience. As you embark on your colorful journey at Adventure Works, let color theory be your guide in transforming ordinary designs into extraordinary visuals.

If I tell you that the temperature is very hot, what color comes to mind? Most people answer in the range of orange to red. Color is a crucial design element for business intelligence dashboards and reports, making them visually intuitive and understood by all viewers. By the end of this video, you will understand how colors evoke psychological associations and convey symbolic meanings. Let's explore the science of color in communicating data-driven stories. In business communication, colors serve as navigational tools, directing users' attention and facilitating efficient information access. Here are some roles colors can play in designing your reports and dashboards. The background color applies to your report or dashboard background, or the background of an individual visual within the report; use low-saturation colors (colors that are not too vivid, rich, or intense) so the background doesn't distract users from the main story. The dominant or primary color gives viewers the first impression of the color theme; it's typically used in many elements to create contrast within your report. An accent color is used for the focal points of your report, capturing users' immediate attention; examples include call-to-action buttons, alerts, and warning messages. Semantic colors are colors that have an actual meaning, and they aid seamless comprehension; for example, commonly employed colors for alerts are red for bad, orange for average, and green for good. Semantic colors are usually used for conditional formatting on text and charts.

Once you choose colors for your reports, you can create a color palette. PowerBI can upload a color palette as a JSON file to define a custom theme for your reports and visualizations. By using a JSON file, you can create a report theme file that standardizes your charts and reports, making it easy for your organization's reports to be consistent. Use these colors to amplify insights: identify certain values or groups within your data that are good or bad; use contrasting colors to differentiate between different values; use shades of the same color to demonstrate strength, weakness, or various grades (for instance, shades of the same color in a geographical visual to represent ascending or descending sales values); and use a dull color for something less important and a bright color for crucial information.

At Adventure Works, you must create a report showing a table of sales data with profit margins. The profit margins will be emphasized using effective color combinations while considering accessibility requirements. Let's explore color selection in data visualization. Launch Microsoft PowerBI Desktop and open the project salesbyear.pbix. Navigate to the report view of PowerBI Desktop, to the report containing a table with sales and profit margin values and a column chart emphasizing the profit margin. To remove the "Sum of" prefix from the column titles, go to the visualization pane and, in the columns list, double-click on the column name and delete the "Sum of" text; this can be done for all columns that need to be renamed. To change the theme of the visualization, navigate to the View tab of PowerBI and select the Accessible City Park theme from the theme drop-down list. This will change the entire color
combination for the current report; the theme contains colors that satisfy accessibility requirements. To ensure accessibility for the broadest range of consumers, you can increase the font size and change the font color throughout the report to maximize visibility and contrast. For instance, increase the font size of the table values to 18 point: select the table and navigate to Format visual > Visual, expand the Values section and change the font size to 18, then expand the Column headers section and change the font size to 18. Then, to accommodate the new size of the table, move and resize the two visuals.

The next task is to highlight the most valuable information in the table. The profit is the most important information for the executives, and you can use color psychology to emphasize this section of the visual. Select the table visual and go to the visualization pane. In the columns list, select the drop-down arrow beside the profit margin column and move the cursor to Conditional formatting; this opens a submenu of the drop-down list. Font color is what is needed from this list, and selecting it opens the font color format dialog box for the profit margin column values. Select Rules from the Format style drop-down menu and select Values only from the Apply to section. Profit margin is already selected under the "What field should we base this on?" section; leave this column selected. Next, define the rules. For the first rule, select the greater-than-or-equal-to symbol and enter zero for the value, then select Number from the drop-down list. After the "and" part of the rule, select less than, write "Max" in the value field, and select Number from the drop-down list. Finally, in the "then" part of the rule, select the green color from the theme color selection. To set up the second rule, select the plus icon to add a new rule to the list. In the first control, select greater than or equal to from the drop-down list and remove the zero; it will automatically select "Min". Then select Number from the drop-down list. After the "and" of this rule, select less than, write zero in the value field, and select Number from the drop-down list. Finally, select a red color from the theme colors and select OK. The conditional formatting will change the color of the text to red if the profit margin is in the negative range. This is the format that the company executives expect; it allows them to quickly assess this part of the report.

To colorize the column chart representing the profit margin, select the chart and, in the visualization pane, navigate to the Format visual tab and expand the Columns section, where you can assign individual colors to each column. Select a red color for financial year 2022 and keep the green for 2020 and 2021. Finally, change the text size of the column chart to 12 point; this means, in Format visual, changing the font size for the x-axis values, y-axis values, title, and data labels. That example transformed a report lacking clear visuals and Adventure Works branding into an attention-grabbing report through the intelligent use of colors. As a report designer, understanding the key role of color is crucial to creating visually compelling and impactful work.

You get a report or a page of information on your screen; how do you decide if the content is important enough for you to read? Many designers include headlines, subheadings, and other design devices such as callouts. Elements like these highlight key parts of the information, allowing you to decide faster whether the content is relevant to you. You'll use similar tactics in your Microsoft PowerBI report and dashboard designs. Over the next few minutes, you will be introduced to the concepts of positioning and scaling. By strategically placing and sizing visual elements such as charts, tables, and text, you guide the viewer's attention and indicate the level of importance of the information. Let's say you are asked to create a complex report for Adventure Works to present the company's annual revenue
growth by region. To achieve effective positioning and scale, you place a bar chart in the middle of the report, clearly displaying revenue figures for each region. To provide additional context, you position a map visualization alongside the bar chart, showing the geographic distribution of revenue growth. By placing the two different visual elements together, you enable viewers to make connections between regions and their respective revenue performance. For the most effective delivery, you must plan your report: think about the positioning of different portions of data, use scaling techniques, and create a good user experience.

In your report, positioning is the strategic placement of visual elements to guide the viewer's attention and convey key information. It's essential to consider the flow of information and the logical sequence in which the audience will consume it, because the placement of data and insights can significantly impact how they are perceived. For example, when presenting sales figures for Adventure Works' latest product line, you would position the most important metrics, such as revenue and units sold, at the top of the report. This ensures that viewers immediately grasp the success of the product line before diving into further details. Additionally, you must pay attention to the logical flow of information: arrange the sections of the report in a way that follows a natural progression, enabling viewers to navigate easily through the data. Supporting details, such as product specifications or regional sales performance, are strategically positioned below the main metrics, providing contextual information to support the overall narrative.

Now let's explore scaling. Scaling refers to the relative size and proportions of visual elements within a report, and finding the right scale is crucial for ensuring readability and visual clarity. Headings and titles are carefully sized to be larger and bolder, drawing the viewer's attention to important sections. For instance, when showcasing the company's quarterly sales performance, you can use a larger font size for the title to make it stand out and capture the viewer's interest. In contrast, data labels and annotations are scaled down to avoid overwhelming the viewer with unnecessary information. Additionally, the scale of charts and graphs should be carefully considered to represent the data accurately. Axis labels, tick marks, and legends should be appropriately sized and positioned for easy interpretation. By maintaining consistency in the scale of measurement across multiple charts and graphs in your reports, you enable viewers to make meaningful comparisons and draw insights effectively. Overall, the positioning and scale of information in report design should aim to create a visually pleasing and intuitive experience for your audience. By effectively organizing and presenting data, you can enhance understanding, facilitate analysis, and effectively convey your message. For report design, mastering the art of positioning and scale is vital. By considering the logical flow, emphasizing key information, and balancing scale, you create visually compelling and informative reports that captivate viewers. As a data analyst, adopting these principles can elevate your report designs and effectively communicate insights to your audience.

Adventure Works has a salesperson performance Microsoft PowerBI report with total sales and quantity sold. However, the visuals are randomly positioned, and the information is overwhelming. The task is to redesign the report to present the data better. Let's explore how this is done. The report contains a clustered column chart showing total sales by year and salesperson, a clustered chart showing quantity by salesperson, a card showing the top three salespersons, and the company logo. The first issue with the current report is the density of information presented in a single visual; for example, the column chart of total sales by year and salesperson
is busy with too much information. The second is that all the visuals are randomly located on the report canvas. To begin the redesign, in the View tab, activate the Accessible City Park theme from the theme drop-down options. Themes are standardized color schemes that can be applied to your entire report to maintain consistency. The accessibility support in this theme includes a color palette that provides contrast between content, background, and adjacent colors, so the text and graphics are legible. To ensure accessibility for the broadest range of consumers, increase the font size and change the font color throughout the report to maximize visibility and contrast. To make the text color of the axis titles and labels consistent throughout the report, customize the theme: navigate to the View tab and, in the Themes drop-down, select Customize current theme. The Customize theme dialog appears; select Advanced from the middle pane, select a black color for the second-level elements, and select Apply.

Then select the total sales by year and salesperson column chart. In Visualizations > Build visual, scroll down and remove the salesperson field from the Legend section; the legend is busy with too much information in a small area. The primary objective of the chart is to show the total sales per salesperson, and by removing the salesperson field and creating a slicer, we can present the same information with better, clutter-free visuals. Resize the column chart and drag it to the left of the canvas. Then navigate to Visualizations > Format visual > Visual, expand the X-axis, and scroll down to change the Title toggle to the off position. Move the second chart out of the way for now. For the first chart, go to Visualizations > Format visual > Visual, expand the Columns section, and select fx to open the conditional formatting dialog box. In the dialog box, select total sales from the drop-down of the "What field should we base this on?" section, then select a black color for the lowest value, check "Add a middle color" and select a green color for the middle value, and select a darker green color for the highest value. Then select OK to finish setting up conditional formatting. The conditional formatting converts the columns to the shades of green and black that you specified, with each shade based on the column value. It also adds a color legend to the column chart; the legend is an unnecessary element here and can be deleted to make the design cleaner. To remove it, go to Visualizations > Format visual > Visual > Legend and turn the toggle to the off position. Finally, change the text size of the chart's x-axis, y-axis, and data labels to 12 point.

As the original visual was created to represent the salesperson's performance, add a salesperson slicer to the report. To do this, from the Data pane, bring the salesperson field from the salespersons table to the report canvas and select the slicer option from the visualization pane. With the slicer selected, go to Visualizations > Format visual > Visual > Slicer settings > Options, and from the Style drop-down list, select the Dropdown choice. Resize the slicer and drag it to the top-right position of the report canvas. Next, select the sum of quantity by salesperson column chart and replace the salesperson field on the x-axis with the year field from the order date column of the sales table. The reason for this change is that we have a salesperson slicer, and we can create consistency between it and this chart by having year on the x-axis; the salesperson slicer will then interactively present the sales generated by each salesperson in each year. In Visualizations > Format visual > General, expand Title and rename it "Quantities sold", rename the y-axis label "Quantity sold", and then remove the x-axis title. Apply conditional formatting to the column colors, remove the color legend, and change the text size. The column chart is resized to the same size as the previous one and dragged to position it parallel with
the previous visual. Next, resize and drag the Top Three Salespersons card below the slicer and adjust the position and size accordingly for better visibility and accessibility. Change the text size and color of the salespersons' names on the card: go to Format visual, Visual, expand the card section, change the title font size to 18 and the color to black. Finally, drag the Adventure Works logo to the top left of the canvas and add a report title of Salespersons' Performance. The report now has a structured layout with a logical flow of all the information originally presented. This report demonstrates that proper positioning and information density adjustments improve comprehension and engagement. Placing visual elements, optimizing scale, and ensuring clarity of labels allows organizations to effectively communicate insights and make data-driven decisions. In the realm of report design, the organization and presentation of information plays a crucial role in capturing the attention of viewers. In this video, you will explore the concept of cohesive pages and the importance of striking the right balance between chaos and cohesion in report design. Drawing inspiration from Adventure Works, you will delve into how thoughtful design choices contribute to cohesive pages that effectively convey information and captivate audiences. Before going into the dynamics of chaotic versus cohesive pages, let's recap the significance of cohesion in report design. In previous videos, you learned how elements such as color, positioning, and visual hierarchy contribute to cohesive designs. By utilizing consistent color palettes, strategic positioning of elements, and clear visual hierarchy, designers can create reports that are visually appealing, easy to navigate, and convey a unified message. Consider that your company, Adventure Works, needs to showcase its product lines' performance across different regions in a report. To create a cohesive page, you need to employ a clean and structured layout. You have to utilize
consistent color schemes, such as using brand colors to highlight important information and differentiate regions. Graphs and charts are thoughtfully positioned, aligned, and scaled to facilitate easy interpretation. In this scenario, a chaotic page would feature disorganized graphs, overlapping text, and a mix of unrelated colors, leading to confusion and a lack of clarity. Chaotic pages suffer from a lack of structure, coherence, and intentionality. They are characterized by cluttered layouts, conflicting color schemes, and elements positioned inconsistently. Chaos not only hampers visual appeal but also creates confusion and hinders effective communication of information. In an Adventure Works report, a chaotic page may include confusing graphs, overlapping text, and inconsistent use of color, making it challenging for viewers to understand the intended message. When working for Adventure Works, you recognize the significance of cohesive pages and strive to create designs that engage and inform viewers effectively. By adopting cohesive design principles, you ensure that your reports are visually appealing, organized, and easy to navigate. For example, when presenting quarterly sales performance, you carefully arrange key metrics in a logical flow, utilizing a consistent color palette that aligns with the brand identity. This approach creates a cohesive page that guides viewers through the information in a structured and comprehensible manner. Adventure Works demonstrates how thoughtful design choices contribute to cohesive pages. You ensure that fonts, colors, and other visual elements align with the brand identity, creating a consistent and recognizable aesthetic throughout your reports. By utilizing white space effectively, you allow elements to breathe and improve readability. Clear headings and subheadings, along with intuitive navigation elements, further enhance the overall cohesion and user experience. By incorporating these steps into your report design process, you can improve cohesiveness and
create visually appealing reports that effectively communicate information. Cohesiveness is not just about aesthetics but also about facilitating understanding and engagement for the intended audience. Creating a clear visual hierarchy is essential for guiding viewers through the report and highlighting key information. Use font size, color, and formatting to differentiate between headings, subheadings, and body text, and ensure that the most important elements stand out and draw the viewer's attention. Adopting a consistent color scheme throughout the report enhances cohesiveness and strengthens brand identity. Choose a color palette that aligns with the company's branding guidelines and use it consistently across charts, graphs, text boxes, and other visual elements. This consistency helps to establish visual harmony and reinforces the overall design aesthetic. Pay attention to the positioning of elements within the report. Ensure that related information is grouped together logically and presented in a sequential manner. Use alignment and spacing techniques to create a sense of order and structure. Avoid cluttering the page with unnecessary elements and maintain sufficient white space to enhance readability and visual appeal. Utilize grids and guides as design aids to achieve precise alignment and spacing. Grids help maintain consistency and alignment across different sections of the report, while guides assist in positioning elements accurately. These tools provide a framework for maintaining cohesiveness and ensuring that elements are visually aligned. Consistency in typography is crucial for creating a cohesive look and feel. Choose fonts that are legible and align with the overall design style. Use a limited number of font styles and sizes to maintain consistency throughout the report. Consider the readability of the chosen fonts and ensure that they are suitable for the target audience. Regularly review and refine your report design to identify areas for improvement. Seek feedback from
colleagues or stakeholders to gain fresh perspectives. Analyze the report's effectiveness in communicating the intended message and make necessary adjustments to enhance cohesiveness. Continuous improvement is key to achieving optimal results in the dynamic world of report design. Finding the balance between chaos and cohesion is essential for creating engaging and impactful pages. By recapping the importance of cohesion, exploring chaotic examples, and showcasing best practices, you have gained insights into how color, positioning, and other design elements contribute to the creation of cohesive pages. As you embark on your own report design journey, remember the value of cohesive pages. Thoughtful design choices, including consistent color schemes, strategic positioning, and attention to visual hierarchy, can elevate your reports and captivate your audience. By creating designs that balance order and clarity, you will effectively communicate your message, empower viewers with valuable insights, and leave a lasting impact. Let's take a poorly designed sales performance report and redesign it into a cohesive report. The report view of Power BI Desktop displays a sales performance report called Adventure Works Sales.pbix. The report is poorly designed, with randomly placed visuals, and lacks coherence. The redesign will change colors, reposition and scale visuals, and format text. The report contains two line charts, one funnel chart, two card visuals, a logo, and a report title. The first step is to change the theme: from the theme drop-down, activate the Accessible City Park theme to ensure accessibility and impose a consistent style. The theme contains colors that satisfy accessibility requirements. Customize the theme to enhance the label and accent colors to ensure accessibility for the broadest range of consumers: increase the font size and change the font color throughout the report to maximize visibility and contrast. Now drag the company logo to the top left of the report canvas; also drag
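As an aside, theme customizations like these are normally captured in a Power BI theme JSON file that can be imported into the report. The sketch below generates a minimal theme programmatically in Python; the theme name, the specific hex colors, and the font sizes are invented for illustration and are not the real Accessible City Park values, though "name", "dataColors", and "textClasses" are standard theme keys.

```python
import json

# A minimal, hypothetical Power BI report theme expressed as a dict.
# The colors and sizes below are placeholders, not real theme values.
theme = {
    "name": "Accessible City Park (customized)",
    "dataColors": ["#1E6F41", "#74A892", "#0F3D5C", "#2C6E91"],
    "textClasses": {
        # Enlarged label and title text for visibility and contrast
        "label": {"fontSize": 12, "color": "#000000"},
        "title": {"fontSize": 18, "color": "#000000"},
    },
}

def save_theme(theme: dict, path: str) -> str:
    """Serialize the theme to a JSON file that Power BI can import."""
    text = json.dumps(theme, indent=2)
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    return text

print(save_theme(theme, "custom_theme.json"))
```

A file produced this way can then be imported through the Themes drop-down in Power BI Desktop, giving the same customized look described above without clicking through the formatting panes by hand.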
the title box to align with the logo. Let's change the color of the title to black and make the text bold to align with the color palette of the theme. Select the Sum of Total Sales card visual and rename the title as Revenue to match the intent of the data. In Format visual, General, Effects, change the background to theme color 2. Both cards will have the same background color, differentiating them from the report background and letting the viewer know that they both hold related data and contain the most valuable information. In Format visual, Visual, Callout value, change the font size to 32 and change the color to white to indicate the importance of this item. For Category label, change the color to white and the font size to 18 for better visibility against the new background. Then repeat these steps for the Sum of Quantity card visual and rename that visual as Units Sold. Now reposition both card visuals to the top right of the canvas and make sure they are the same size, because they are of equal importance; you can rescale a card by selecting and dragging any side of the visual. Next, select the Sum of Total Sales by Month line chart and rename it to a more appropriate title of Revenue by Month. Remove the x-axis title by turning the title toggle to the off position: navigate to Format visual, Visual, expand X-axis, and scroll down to turn the title toggle to the off position. The x-axis represents monthly sales with the month name, so the month title on the axis does not add any relevant information. Rename the y-axis to Total Sales USD to clarify the sales details and currency. Now to add grid lines to the line chart: in Format visual, Visual, Gridlines, select dashed as the style and black as the color. Next, select the Sum of Total Sales by Month and Country line chart and change its title to Revenue by Country. Remove its x-axis title as done in the previous chart and rename the y-axis as Total Sales USD. Next, to format the legend, navigate to
Format visual, Visual, and scroll down to the Legend section. In the Legend section, turn the title toggle button to the off position, change the text size to 12 points, and select the top right position from the Position drop-down list of options. The legend title is redundant because the country names provide sufficient information. Add grid lines to match the other visuals. Ensuring items such as titles, legends, axis values, and font size are formatted consistently for all the visuals helps report cohesion. Select the funnel chart and rename the title to Revenue by Category. In Format visual, Visual, turn the Conversion rate labels toggle to off, as this is not relevant to the sales. Go into Format visual, Visual, expand the Color section, and select fx to open the conditional formatting dialog. In the dialog, select Total Sales from the drop-down of the What field should we base this on section. Then select a blue color called theme color 5 for the lowest values, check Add a middle color and select mid green, theme color 1, for the mid-value section, and select the dark blue color, theme color 2, for the highest value section. Select OK to apply. Conditional formatting converts the bars to shades of blue in descending order of sales amount; dark blue represents the highest sales values. Next, change the text size of the funnel chart to 14 points for better accessibility and visibility. Likewise, change the font size of the axis titles and labels of both line charts to 12 points. Finally, rescale and reposition the visuals, making sure the distance between the visuals is equal to maintain design integrity. Adjust the position by dragging and rescale by selecting and dragging any side of the visual. It's good practice to review your work and possibly invite comments from colleagues; a quick review right now suggests some slight improvements. For instance, to finish, increase the size of the titles on each chart to 18 points. That's a demonstration of how to create
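Conceptually, a lowest/middle/highest color scale like the one configured in the fx dialog is linear interpolation between three color stops. This Python sketch illustrates the idea; the RGB values and thresholds are invented for the example, and Power BI's actual gradient algorithm may differ in detail.

```python
def lerp(a: int, b: int, t: float) -> int:
    """Linear interpolation between two color-channel values."""
    return round(a + (b - a) * t)

def scale_color(value, vmin, vmid, vmax, c_low, c_mid, c_high):
    """Map a value onto a three-stop gradient, mirroring the
    lowest/middle/highest color settings in the fx dialog."""
    if value <= vmid:
        t = (value - vmin) / (vmid - vmin)
        lo, hi = c_low, c_mid
    else:
        t = (value - vmid) / (vmax - vmid)
        lo, hi = c_mid, c_high
    t = min(max(t, 0.0), 1.0)  # clamp values outside the range
    return tuple(lerp(lo[i], hi[i], t) for i in range(3))

# Example: sales mapped from light blue through green to dark blue
low, mid, high = (173, 216, 230), (60, 140, 90), (15, 61, 92)
sales = [100, 550, 1000]
colors = [scale_color(s, 100, 550, 1000, low, mid, high) for s in sales]
```

The lowest value lands exactly on the low stop, the midpoint on the middle stop, and the maximum on the high stop, which is why the formatted bars shade smoothly from one chosen color to the next.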
cohesion in a report: by applying and customizing an accessible theme, ensuring consistent formatting for all visuals, and scaling and positioning visuals in a logical, hierarchical way to deliver a coherent data story. Imagine you're planning a musical performance, but you are playing for two different audiences: one a group of classical music enthusiasts and the other a crowd of young, energetic music lovers. Satisfying both audiences is a challenge. It's like the challenge you have when presenting data: understanding your target audience is crucial, and catering to their unique needs is the key to success. It's impossible to please everyone, but the data must be readily understood by the majority, with essential insights highlighted for your specific audience. A key visualization success factor is understanding the audience. You must tailor presentations to the specific needs and preferences of the target audience, that is, the specific group of people that your content is intended to reach. It is the group of individuals most likely to be interested in or benefit from your data. Identifying and understanding the target audience is essential for communication and allows tailored strategies that can connect with this specific group's preferences, needs, and characteristics. Every audience has unique characteristics, including their level of technical expertise, roles and responsibilities, demographic information, and other specific needs. In this video, you will explore the importance of knowing the audience and how the characteristics of your target audience influence the creation of your data presentation. Because of their characteristics, you may be able to identify an audience's needs: an executive board needs high-level summaries and key performance indicators, while a marketing team wants detailed customer insights and marketing analytics. When considering the target audience for a report or presentation, assess some factors. This will help identify the audience's characteristics and needs,
enabling you to tailor your design to meet their specific requirements. Here are some key factors to consider. Identify the different roles or job functions of the potential users: for example, are they executives, analysts, marketers, or sales representatives? Each role may have distinct data requirements and preferences. Determine the audience's level of expertise and familiarity with the subject matter or the software being used: are they beginners, intermediate users, or advanced professionals? This helps you gauge the complexity of the information and the level of detail needed. Understand the goals and objectives of the audience: what specific information or insights are they seeking? For example, executives may be interested in high-level performance summaries, while analysts may require more detailed data for in-depth analysis. Determine the specific information needs of the audience: what kind of data or metrics are most relevant to their decision-making process? For instance, marketing teams may focus on customer demographics and campaign performance; in contrast, finance teams may require financial metrics and profitability analysis. Consider the preferred communication style of the audience: some individuals prefer visual representations and charts, while others prefer textual reports or interactive dashboards. Adapting your content to their preferred format enhances engagement and understanding. Assess cultural and demographic factors influencing the audience's preferences and understanding: this includes language preferences, cultural nuances, and accessibility considerations. Recognize the time constraints of the audience: are they busy executives who require concise and summarized information, or do they have more time for in-depth exploration? Tailoring the level of detail and presentation format can ensure that the information is effectively conveyed within the available time frame. By considering these factors, you can gain valuable insights into the target audience and align your report
or software design to meet their specific needs. Once the target audience is identified, the next step is to use data visualization techniques to address audience requirements. It's important to find the right balance between providing the required data and ensuring that it is understood by most of the audience. When creating for diverse audiences, it is crucial to simplify complex concepts and avoid jargon or technical terms that may be unfamiliar to non-technical stakeholders. Adventure Works, for instance, may use clear and concise language to explain intricate manufacturing processes or market trends that your internal team would be familiar with. However, if presenting to external partners or users from outside the company, they may be unfamiliar with manufacturing processes, and therefore the technical terms should be avoided. It's important to identify and highlight the most relevant insights for the target audience. For instance, when presenting to the executive board, the focus may be on financial performance, market share, and strategic initiatives; on the other hand, when presenting to the marketing team, you can focus on customer behavior, campaign effectiveness, and market segmentation. By tailoring the content to the specific interests of each audience, data presentations become more engaging and actionable. Incorporating examples and scenarios that your audience is familiar with can help them connect with the data. When presenting to the executive board, a case study on the success of a recent product launch or a comparison of sales performance across different geographic regions can provide valuable insights. Similarly, presenting market research findings or customer feedback to the marketing team can help them fine-tune their strategies and campaigns. Knowing the audience is vital in creating impactful data presentations. By understanding the target audience's needs, preferences, and roles within the organization, data analysts can tailor their presentations to ensure maximum
impact and understanding. Focusing on simplifying complex concepts, highlighting relevant insights, and using real-world examples specific to the audience can significantly enhance the effectiveness of data presentations. Balloons are great fun at every party; they brighten the room and raise the celebration mood. But the same balloons that you used at a retirement function you don't expect to work as well at a kid's birthday party; for that party, you'll have balloons in different shapes and colors. It's the same situation when it comes to presenting data: designing with the end user in mind is the key to success in data visualization. The age range of the target audience is a vital consideration. Age-related design considers the unique needs, preferences, and capabilities of different age groups. In this video, you'll explore the significance of age-related design in Microsoft Power BI and discover specific considerations when designing visualizations for younger children aged 5 to 12, teenagers, adults aged 18 to 64, and older adults aged 65 and above. Before exploring age-related design considerations, let's briefly revisit the fundamentals of color theory. Color plays a crucial role in data visualization, evoking emotions, conveying meaning, and aiding comprehension. When designing for different age groups, it's important to select colors that are visually appealing to the group, easily distinguishable, and aligned with the intended message. Now let's examine age-related design in detail. Designing for younger children requires a simplified and engaging approach. Use vibrant and engaging colors: younger children are attracted to bright and bold colors, and a visually stimulating color palette can capture their attention and enhance their engagement. Use simple and intuitive icons: complex visual elements can overwhelm young children, so choose simple and recognizable icons that are easy to interpret. Interactive features such as buttons or draggable elements make the experience more interactive and
enjoyable for young users. Incorporate playful illustrations and characters: for example, Adventure Works could use animated bicycle characters or friendly animal mascots in their visualizations to make the content more relatable. Tell a story through the data to capture the imagination of younger children: Adventure Works could create a virtual journey, such as showcasing different bicycle models in colorful and visually appealing environments. For adults, use a clean and professional design. Choose a visual style that meets the target audience's expectations and avoid excessive use of playful elements or overly casual designs. Ensure the visual elements have sufficient contrast and use clear, readable typography for easy comprehension. Use text that is clear, legible, and easily readable; choose appropriate font sizes, typography, and contrast to enhance readability. Adults appreciate a clear and intuitive user interface, so use logical navigation structures like menus and breadcrumbs to help users quickly navigate the content, streamline the user interface, and minimize complex interactions. Consider the audience's needs for efficient data analysis and decision-making: design dashboards and reports that provide relevant information quickly and concisely. Incorporate advanced visualizations appropriately: consider using advanced charts, graphs, and interactive elements to provide deeper insights and facilitate data exploration. Allow users to personalize their dashboards or reports according to their preferences and priorities; providing customization options can enhance user engagement and satisfaction. Designing for older adults requires additional focus on clarity, legibility, and ease of use. Use large and well-spaced elements: aging eyes may struggle with small text or densely packed visuals, so enlarge fonts and provide ample spacing between elements to enhance readability and prevent visual clutter. Designing for different age groups requires consideration of their unique characteristics and needs. By
incorporating age-related design principles, you can create Microsoft Power BI visualizations that cater to the specific requirements of groups like younger children and older adults. From vibrant colors and interactive elements for children to clear typography and simplified interactions for older adults, every design decision should prioritize the target audience's ease of understanding and engagement. Age-related design is one important aspect of creating inclusive and compelling visualizations; continually exploring and understanding the needs of diverse user groups will help you focus the features of Power BI to deliver impactful and accessible data visualizations for all. Imagine you're preparing a delicious meal, carefully selecting the finest ingredients; your focus is on the flavors that will make the meal great. In a similar way, when presenting data, focusing on the key details is crucial. Much like those food ingredients, your audience craves the most relevant and impactful insights, and prioritizing key information ensures your message fulfills and satisfies the audience. Understanding the needs and preferences of your audience allows you to focus on the most relevant data points, highlight outliers, and provide the right level of detail for effective communication. In this video, you will explore the importance of prioritizing key information in Microsoft Power BI and how it can enhance data insights for your audience. Before exploring the details of prioritizing, it is vital to know your audience and their specific needs. For instance, presenting to the executive board requires a high-level overview with emphasis on the big picture and key insights, while presenting to a sales team may require more detailed information about performance evaluation. Consider a report for the executive board with an overview of quarterly sales and an emphasis on product categories; the data also indicates that the executives need to focus on France and the United Kingdom for their marketing efforts. By
understanding your audience, you can tailor the presentation to their specific needs, ensuring that the key information is appropriately highlighted. It allows you to customize the content, format, and level of detail in your presentation. By adapting the presentation to the preferences, knowledge level, and goals of the sales team, you increase the chances of delivering a compelling message that meets their needs. When presenting data, it is essential to capture the attention of your audience quickly. By focusing on headlines, the most important findings and trends, you can convey the main message effectively. In the case of the Adventure Works annual sales report, key headlines may include overall revenue growth, top-selling product categories, and regions with significant sales increases. By highlighting these headlines, you provide a clear and concise overview that immediately grabs the audience's attention. In any data set, there are often outliers, data points that deviate significantly from the norm. These outliers can provide valuable insights or indicate areas that require attention. By highlighting them visually, such as using color or annotations, you draw the audience's focus to these critical data points. For example, Adventure Works may have a particular product that experienced a sudden spike in sales or a region that underperformed compared to others; by highlighting these outliers, you prompt further exploration and discussion, ensuring that the audience does not overlook essential information. While headlines and key findings are crucial, it is also essential to provide access to detailed information for closer inspection when appropriate. Different audience members may have different levels of expertise or specific questions that require a deeper dive into the data. In tailoring presentations, the availability of detailed information for closer inspection should be carefully considered, aligning with the needs and preferences of the specific audience. For instance, in an annual
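One simple way to decide which data points deserve outlier highlighting is a z-score test: flag values that sit far from the mean in units of standard deviation. In this Python sketch, the monthly sales figures and the two-standard-deviation threshold are illustrative choices, not a prescribed rule.

```python
from statistics import mean, stdev

def find_outliers(values, threshold=2.0):
    """Return (index, value) pairs whose z-score exceeds the threshold."""
    mu = mean(values)
    sigma = stdev(values)
    return [
        (i, v)
        for i, v in enumerate(values)
        if abs((v - mu) / sigma) > threshold
    ]

# Hypothetical monthly unit sales; one month shows a sudden spike
monthly_sales = [120, 130, 125, 118, 122, 127, 480, 124, 121, 126, 119, 123]
outliers = find_outliers(monthly_sales)  # → [(6, 480)]
```

A list like this can then drive the visual emphasis, for example by assigning a contrasting color or an annotation to just the flagged points rather than formatting every bar the same way.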
sales report from Adventure Works, presenting to the executive board may emphasize high-level trends, revenue figures, and strategic directions, while a presentation to the sales team might delve into granular details like regional performance, customer segments, and sales targets. Adapting the level of detail ensures that each audience receives the information that aligns with their decision-making requirements, optimizing the impact of the presentation. Microsoft Power BI allows for interactive exploration, where users can drill down into specific data points or filter the information based on their interests. By providing this level of detail, you enable further analysis and empower your audience to extract insights relevant to their specific needs. The definition of significant information can vary across different audiences: what may be crucial for one group may not be as relevant to another. Therefore, it is crucial to adapt your presentation to align with the preferences of your audience. For example, the executive board may prioritize overall revenue and market share, while the sales team may be more interested in product-specific details or customer segmentation. By understanding these preferences, you can ensure that the key information presented is meaningful and resonates with your audience. Prioritizing key information in Microsoft Power BI is a critical skill for effective data visualization and communication. You can enhance data insights by understanding your audience, focusing on headlines, highlighting outliers, providing access to detailed information, and adapting to audience preferences. The key to successfully prioritizing information is understanding your audience and tailoring your presentation to meet their specific needs. Picture a vault where your most valuable possessions are stored; now imagine that this vault doesn't have a strong lock, leaving your treasures vulnerable to theft. Just as you'd prioritize security for your valuables, safeguarding data is paramount in our
digital age. Data, the lifeblood of modern organizations, is subject to a range of threats: cyber attacks, breaches, and unauthorized access. Ensuring the security of this digital gold mine isn't just a choice, it's a necessity. Let's explore the world of data security, where the keys to protection lie in understanding the risks, implementing robust measures, and fostering a culture of vigilance. In the world of data visualization, ensuring the security of data is of utmost importance. From protecting sensitive information to maintaining data integrity, incorporating robust security measures is crucial. In this video, you will explore the significance of security in data visualization and discuss key considerations for safeguarding data throughout the visualization process. Adventure Works, a fictional multinational bicycle manufacturer, is used as an example to illustrate the concept of data security in practice. Data visualization often involves working with sensitive information such as customer data, financial records, or proprietary business insights. Ensuring the security of this data is essential to maintain trust, comply with regulations, and protect against unauthorized access or data breaches. Let's examine the key aspects of security in data visualization. Controlling access to data is vital to ensure that only authorized individuals can view or interact with specific data sets. By implementing role-based access control, data can be restricted or served in a controlled manner to the individuals who need to access it. This helps protect sensitive information and reduces the risk of unauthorized data exposure. Additionally, access logs and audit trails can be implemented to track and monitor data access, providing accountability and visibility into data usage. In Adventure Works, you implement role-based access control to ensure that sensitive data is accessible only to authorized individuals in data visualization processes. For instance, the finance team has access to financial data, while the
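At its core, role-based access control with an audit trail is a mapping from roles to the datasets they may view, plus a log of every access attempt. The roles and dataset names in this Python sketch are hypothetical, and a production system would back the mapping with a directory or identity service rather than an in-memory dict.

```python
# Hypothetical role-to-dataset permissions for a reporting workspace
PERMISSIONS = {
    "finance": {"financial_data", "sales_totals"},
    "marketing": {"customer_demographics", "campaign_results"},
    "executive": {"financial_data", "sales_totals", "customer_demographics"},
}

AUDIT_LOG = []  # simple audit trail of (role, dataset, allowed) tuples

def can_access(role: str, dataset: str) -> bool:
    """Check whether a role may view a dataset, logging the attempt."""
    allowed = dataset in PERMISSIONS.get(role, set())
    AUDIT_LOG.append((role, dataset, allowed))
    return allowed
```

For example, `can_access("finance", "financial_data")` is allowed while `can_access("marketing", "financial_data")` is refused, and both attempts land in the audit trail for later review.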
marketing team can view customer demographics for targeted campaigns. This granular access control prevents unauthorized individuals from accessing data beyond their scope, safeguarding sensitive information. Anonymizing data is an effective technique for protecting privacy and confidentiality. By removing personally identifiable information or replacing it with pseudonyms, the data can be used for analysis and visualization while preserving privacy. Anonymization techniques such as generalization, suppression, or noise addition ensure that individuals cannot be identified from the data. Generalization involves simplifying or aggregating data to a higher level of abstraction, often to protect privacy or reduce complexity. Suppression is the deliberate removal of certain data elements to prevent identifying individuals or sensitive information. Noise addition introduces controlled random variation into the data to make it more challenging to deduce specific details about individuals or confidential data. These techniques are commonly used in data anonymization and privacy preservation to strike a balance between sharing useful information and safeguarding sensitive details, ensuring data remains useful while reducing the risk of privacy breaches. Organizations should follow best practices and guidelines for data anonymization, considering factors such as the nature of the data, regulatory requirements, and the intended use of the visualizations. In Adventure Works, you conduct market research and collect customer feedback; to protect customer privacy, you employ data anonymization techniques when visualizing the data. Personal information such as names, addresses, and contact details is replaced with pseudonyms or aggregated to preserve anonymity. This allows Adventure Works to analyze and present valuable insights without compromising the privacy of customers. Maintaining data integrity is crucial to ensure the accuracy and reliability of the visualized information. Data integrity aspects
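The three anonymization techniques described above, generalization, suppression, and noise addition, can be shown in a few lines. The sample record, the decade-bucket rule for ages, and the noise magnitude in this Python sketch are all invented for illustration.

```python
import random

# Hypothetical customer record for illustration
record = {"name": "Jane Doe", "age": 37, "city": "Lyon", "spend": 1250.0}

def generalize_age(age: int) -> str:
    """Generalization: replace an exact age with a decade bucket."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"

def anonymize(rec: dict, rng: random.Random) -> dict:
    return {
        # Suppression: the identifying "name" field is dropped entirely
        "age_band": generalize_age(rec["age"]),   # generalization
        "city": rec["city"],
        # Noise addition: perturb the sensitive numeric value
        "spend": rec["spend"] + rng.gauss(0, 25.0),
    }

anon = anonymize(record, random.Random(42))
```

The anonymized record still supports aggregate analysis, charting spend by age band and city, while no longer exposing who the individual is or their exact figures.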
include data validation, error detection, and consistency checks. Data validation involves verifying the accuracy and integrity of input data to ensure it meets predefined criteria. Error detection focuses on identifying mistakes or anomalies in data, helping prevent erroneous information from causing problems. Consistency checks ensure that data conforms to established standards or matches other related data, maintaining a reliable and cohesive data set. These practices collectively help maintain data quality, minimize errors, and ensure that information is reliable and useful for decision-making and analysis. Implementing data validation rules and performing regular audits help identify and rectify any anomalies or inconsistencies in the data, ensuring the visualizations reflect accurate and reliable insights. Furthermore, employing data encryption techniques can prevent unauthorized modifications and tampering of the data, maintaining its integrity throughout the visualization process. In Adventure Works, you prepare quarterly reports on sales performance, which are shared with the executive board. To ensure data integrity, you implement data validation checks to detect any anomalies or errors in the sales data. By cross-referencing the data with your customer relationship management system, or CRM, and performing consistency checks, Adventure Works ensures the accuracy and reliability of the visualized sales information. This data integrity provides the board with confidence in making informed decisions based on reliable insights. When transferring data between different systems or sharing visualizations with stakeholders, it is essential to prioritize secure data transmission. Using encrypted connections such as HTTPS or SSL/TLS ensures that data is encrypted during transit, making it difficult for unauthorized individuals to intercept or manipulate the data. HTTPS, Hypertext Transfer Protocol Secure, is a protocol that provides secure communication for website connections, allowing user data
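A minimal version of the validation and consistency checks just described might look like the following sketch. The field names and rules, non-negative quantities, a fixed set of region codes, and totals reconciled against a hypothetical CRM figure, are invented for illustration.

```python
VALID_REGIONS = {"FR", "UK", "DE", "US"}  # hypothetical region codes

def validate_row(row: dict) -> list:
    """Data validation: return error messages for one sales row."""
    errors = []
    if row.get("quantity", -1) < 0:
        errors.append("quantity must be non-negative")
    if row.get("region") not in VALID_REGIONS:
        errors.append(f"unknown region: {row.get('region')}")
    return errors

def consistency_check(rows, crm_total, tolerance=0.01):
    """Consistency check: do our totals match the CRM system's figure?"""
    our_total = sum(r["amount"] for r in rows)
    return abs(our_total - crm_total) <= tolerance

rows = [
    {"region": "FR", "quantity": 10, "amount": 1200.0},
    {"region": "XX", "quantity": -2, "amount": 300.0},  # deliberately bad row
]
bad = [e for r in rows for e in validate_row(r)]
ok = consistency_check(rows, crm_total=1500.0)
```

Running checks like these before a report is refreshed means an anomalous row is caught and reported rather than silently distorting the visualized totals.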
to be transmitted in an encrypted manner. This encryption relies on security protocols such as Secure Sockets Layer (SSL) or Transport Layer Security (TLS). SSL/TLS is used to ensure privacy and integrity during data transmission over the internet, protecting user data from malicious attacks. These protocols enhance users' online experience by providing a more secure environment for conducting online transactions and sharing sensitive information.

Additionally, organizations should consider secure file sharing methods: using virtual private networks (VPNs) for connections, two-factor authentication (2FA) for authenticating users, Microsoft OneDrive for Business, Google Workspace, or Dropbox Business for enterprise-level cloud storage, and secure protocols like Secure File Transfer Protocol (SFTP). They should also utilize secure cloud-based platforms for distributing visualizations, ensuring data remains protected throughout its journey. Adventure Works collaborates with external partners and distributors, sharing visualizations and sales data for joint business planning. To ensure secure data transmission, you utilize encrypted connections such as SSL/TLS when sharing sensitive information over the internet. This encryption protects the data from unauthorized access during transit, maintaining the confidentiality and integrity of the shared visualizations and data.

Data visualization often involves working with data that is subject to legal and regulatory requirements, such as the General Data Protection Regulation (GDPR). Compliance with these regulations is crucial to protect individuals' rights and maintain legal obligations. Data visualization practices should adhere to the relevant regulations, including obtaining appropriate consent, anonymizing data when necessary, and implementing the necessary safeguards. Organizations should stay informed about evolving data protection regulations and
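One common anonymization technique that supports the compliance practices mentioned here is replacing direct identifiers with salted hashes, so records can still be joined and analyzed without exposing personal details. The sketch below is a simplified illustration; the salt value and field names are assumptions, and a production system would manage the salt as a protected secret.

```python
import hashlib

# A minimal pseudonymization sketch: replace a direct identifier with a
# salted hash. The salt and field names are illustrative assumptions.
SALT = b"example-secret-salt"  # in practice, store and rotate this securely

def pseudonymize(value: str) -> str:
    return hashlib.sha256(SALT + value.encode("utf-8")).hexdigest()[:16]

record = {"email": "kim@example.com", "country": "USA", "total_sales": 1250.0}
anon = {**record, "email": pseudonymize(record["email"])}

# The identifier is no longer readable, but hashing is deterministic,
# so the same customer maps to the same pseudonym across data sets.
```

Note that salted hashing is pseudonymization rather than full anonymization under GDPR terminology: the data is only anonymous as long as the salt remains secret.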
ensure their data visualization processes align with the correct legal frameworks. Adventure Works operates in various regions with different data protection regulations. When visualizing data, they ensure compliance with relevant regulations such as GDPR: they obtain appropriate consent from customers, anonymize data where necessary, and implement the necessary security measures to protect personal information. This ensures that Adventure Works adheres to the legal requirements and maintains the privacy rights of individuals.

Security is a fundamental aspect of data visualization, ensuring the confidentiality, integrity, and availability of data. By implementing robust security measures such as access control, data anonymization, maintaining data integrity, secure data transmission, and compliance with data regulations, organizations can build trust, protect sensitive information, and deliver reliable insights to their stakeholders. As the importance of data continues to grow, prioritizing security in data visualization is essential for maintaining the confidentiality and integrity of information in today's data-driven world.

Kim grew up in a small town in rural America. The town had seen better days: the region's economy was in decline, and there were few career prospects for a young woman. Kim had to stay in her hometown and take whatever jobs she could find. Luckily, she was an avid social media fan with a recent smartphone. The phone allowed her to connect online even though the town's wired internet connections were slow and often failed completely. She vented her career and life frustrations on social media, and very soon she got many suggestions for alternative careers and educational paths. Kim explored the opportunities available to her, taking advantage of the low barrier to entry offered by the internet. She used her phone and computer to take online courses and to research business ideas. She had an eye for fashion and makeup, an affinity for emerging styles, and an ambition to succeed. That
combination led her to establish a business venture offering a few products online. Luckily for Kim, the launch of her online business coincided with the upgrade of the town's broadband to fiber connectivity. Yes, you can work from anywhere with an internet connection, but if you're at all competitive, it's nice to be somewhere that has fast internet speeds.

The world is now a global village. The internet is at the heart of this transformation and is an integral part of our everyday lives. That's why the need for better speeds and greater coverage has been felt around the world. In the USA, average connection speeds increased from 25 megabits per second in the past to over 100 megabits per second in recent times. This is largely due to the widespread adoption of fiber-optic technology, which gives us faster speeds and improved coverage.

Kim started slowly, but her business grew as more and more people in her small town began to connect to and use the internet because of its better speed. Her business expanded as the world grew more connected through fast internet connections. Kim started to use data from her customers to visualize and identify preferences and grow her business further. Despite the lack of local resources, Kim was able to run a global business from her small town. People in both rural and urban areas can access the internet easily, with predictable costs and 24/7 access, thanks to new technologies such as mobile broadband connections on 4G and 5G. When traveling, Kim can run her business using her smartphone connected to a cellular network, or using one of the many Wi-Fi hotspots supplied by cities across the world. The rise of global internet connectivity allowed Kim to access a wide array of resources. With fast access to a global network, she was able to stay up to date with the latest trends in international business. She made connections with professionals in other countries and was soon collaborating on new business deals and markets she couldn't have considered
before. What was once an impossibility is now a reality for Kim. She continues to explore global internet connectivity and use customer data analysis to expand her international business and explore new opportunities.

Welcome to this high-level recap of the lessons covered this week. This summary will help you revise the concepts of visualization and design. During the course, various Adventure Works scenarios were used as real-life simulations of a multinational bicycle retailer operating in multiple countries. These scenarios are designed to facilitate understanding and provide relatability, and will be mentioned again in this recap as you review color theory; positioning; scale and density of information; chaotic versus cohesive pages; knowing the audience; age-related design; prioritizing key information; and security in data.

Color theory is a crucial guideline for mixing colors and understanding the visual impact of specific color combinations. It includes concepts like the color wheel, color harmony, color psychology, and color symbolism. By grasping these principles, you gain a powerful toolkit for crafting visually appealing and meaningful designs. The color wheel illustrates the relationships between colors, including primary, secondary, and tertiary colors, enabling you to navigate various color schemes for harmonious compositions. Color harmony focuses on arranging colors pleasingly in a design, achieved through complementary, analogous, triadic, or monochromatic combinations, enhancing balance and impact. Color psychology explores how colors evoke emotions and influence behavior, helping you use colors strategically for specific messages; for example, yellow and orange can often evoke vibrant and energetic emotions. Symbolic meanings and cultural associations of colors are also essential, ensuring effective communication across diverse cultural backgrounds. Mastering color theory empowers designers to create captivating designs, effectively convey messages, and evoke desired emotions, making
color theory a guiding force in transforming ordinary designs into extraordinary reports and dashboards.

Color is a fundamental component in report design and data visualization, impacting the quality and effectiveness of reports. Color influences emotions, perceptions, and the overall visual impact of your data visualization. Each color holds unique psychological associations and symbolic meanings, generating diverse emotional responses. For example, warm colors like red and orange convey energy, passion, excitement, and attention or warning, while cool colors like blue and green evoke calmness, serenity, and harmony. By skillfully selecting and combining colors, designers can effectively convey the intended emotional message in report design, while also considering cultural interpretations for global designs.

Positioning in report design involves strategically placing visual elements to guide the viewer's attention and convey essential information. Adventure Works recognizes the importance of this, ensuring key data points like revenue and units sold are prominently placed at the top of a report. The logical flow of information is also considered, with supporting details arranged beneath the main metrics, creating a natural narrative for easy navigation. Scaling information in report and dashboard design is also crucial for clarity, visual hierarchy, and emphasis. Proper scaling optimizes space, ensures responsiveness, and reduces cognitive load. Chart selection plays a pivotal role in optimizing the scale of information: for example, bar charts are used for presenting nominal and ordinal scales, while line charts work with interval and ratio scales. Once an appropriate chart is selected, all associated elements can be scaled proportionately according to the degree of emphasis. Overall, mastering the art of positioning and scale enhances report designs, creating engaging, informative reports that effectively communicate insights to the audience.

Positioning in design involves arranging visual elements
to guide attention and convey messages effectively. Adventure Works understands this importance, ensuring key data is presented clearly and avoiding overcrowding. Techniques like grouping related information, consistent spacing, and visual hierarchy are employed to enhance information density, while white space prevents clutter, allowing viewers to focus their attention. Aligning elements guides the narrative and helps the flow of information. Proper positioning and information density are crucial in data visualization for comprehension and engagement, enabling organizations to communicate insights efficiently.

Cohesive page design is crucial, contrasting with chaotic layouts that lack structure and coherence. Cohesive designs engage viewers, utilize clear visual hierarchies, and maintain a consistent color scheme aligned with the brand identity. Thoughtful positioning, effective use of white space, and strategic typography contribute to organized, visually appealing reports. The incorporation of grids, guides, and regular reviews will refine the design, ensuring a cohesive presentation of information. By mastering these principles, you create compelling reports that communicate effectively and leave a lasting impact on your audience.

The crucial first step in creating a successful report or presentation is identifying the target audience's unique characteristics, such as their roles, expertise, goals, information needs, and preferred communication style. Adventure Works, for instance, uses clear language and visualization elements to explain complex concepts while highlighting relevant insights for different groups, such as the executive board or marketing team. Where possible, incorporate real-world examples and scenarios to help the audience connect with the data. This targeted approach ensures data presentations effectively convey meaningful insights and contribute to the business success of Adventure Works. To optimize data visualization, designing with the end user in mind is crucial, and age-related
design is a significant aspect to consider. Designing for all age groups requires understanding their unique needs. By following age-related design principles, Microsoft Power BI users can create visually appealing and engaging visualizations that cater to the specific requirements of different age groups. The goal is to prioritize ease of understanding and engagement for the target audience.

Prioritizing key information is a crucial aspect of data presentation. By understanding your audience, you can tailor your presentation to meet their specific needs, ensuring that the most relevant data points are appropriately highlighted. When presenting data, capturing attention quickly is essential. Identifying outliers and important data points is another critical strategy. Providing access to detailed information for closer inspection is essential for those in your audience who need to drill down to reveal more data; that's part of adapting to your audience's preferences. Prioritizing key information in Microsoft Power BI is a critical skill that enhances data visualization and communication. By considering your audience, focusing on headlines, highlighting outliers, providing detailed access, and accommodating audience preferences, you can drive more meaningful decision-making based on data insights.

During your data visualization work, security has vital importance when dealing with sensitive information. This includes data such as customer data, financial records, or proprietary business insights. Ensuring proper data security is crucial for maintaining trust, complying with regulations, and preventing unauthorized access or breaches. By implementing robust security measures such as access control, data anonymization, maintaining data integrity, secure data transmission, and compliance with data regulations, organizations build trust, protect sensitive information, and deliver reliable insights to their stakeholders. Access control involves controlling who can access specific data sets, reducing the risk
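The role-based access control idea behind this can be sketched in a few lines. The roles, data set names, and permission table below are hypothetical, for illustration only; in Power BI itself this is handled through workspace roles and row-level security rather than code like this.

```python
# A minimal role-based access control (RBAC) sketch. The roles, data sets,
# and permission table are hypothetical, for illustration only.
PERMISSIONS = {
    "executive": {"sales_summary", "finance", "customer_data"},
    "analyst":   {"sales_summary", "customer_data"},
    "viewer":    {"sales_summary"},
}

def can_access(role: str, dataset: str) -> bool:
    """Grant access only when the role is known and the data set is allowed."""
    return dataset in PERMISSIONS.get(role, set())

print(can_access("analyst", "finance"))       # False
print(can_access("executive", "finance"))     # True
print(can_access("intern", "sales_summary"))  # False: unknown roles get nothing
```

The key design point mirrors the lesson: access is denied by default, and only roles explicitly granted a data set can see it.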
of unauthorized exposure. You can implement role-based access control, granting access only to authorized individuals and ensuring that sensitive data is protected. Data anonymization preserves privacy by removing identifiable information, allowing analysis and visualization without compromising personal details. Maintaining data integrity is crucial to ensure the accuracy and reliability of the visualized information; data integrity aspects include data validation, error detection, and consistency checks. Compliance with data regulations, such as the General Data Protection Regulation (GDPR), is essential: you can obtain consent from customers, anonymize data as needed, and implement security measures to comply with the relevant regulations.

During this week, you explored color theory; positioning; scale of information and information density; chaotic versus cohesive pages; knowing the audience; age-related design; prioritizing key information; and security in data. By applying these techniques, you will have more control over data visualization and design in Microsoft Power BI.

The difference between insight and noise is clarity: is the message of your report clear to the viewer, or is the insight hidden by the noise in your presentation? Crafting compelling visualizations in Power BI is a necessity. In this video, you will learn to transform raw data into captivating stories where charts and graphs are not just shapes; they bring essential clarity to your story. Data visualization helps convey complex information in a way that is easy to grasp and interpret. Microsoft Power BI offers a wide range of visualization options, from simple bar charts to intricate custom visuals, allowing you to tailor your presentations to your audience and data. However, the true impact of data lies not just in its presentation but also in the clarity and visual appeal of the visualization. When considering the importance of clarity, charts, data, and visuals are all crucial components. Clear and visually appealing charts make it
easier for stakeholders to understand complex data. The right chart type can simplify complex information, making it accessible to broader audiences. Data is only valuable when it communicates an insight and supports a decision. Visual impact ensures that your data presentation is engaging and persuasive. Cluttered visuals can lead to misinterpretation and therefore erroneous conclusions; visual clarity in your reports reduces the risk of drawing incorrect insights.

Let's explore some best practices to create visual clarity and impact. Selecting an appropriate visual to present the data is critical for ensuring clarity, as it helps to display data accurately. For instance, a pie chart can be used to present a data set showing parts of a whole; this might be a breakdown of total sales by each product category. But what if you have 20 product categories? Pie charts will get cluttered and difficult to read. If the data set is too complex, break it down into smaller, more digestible parts: you can create summarization and aggregation measures within your data model, and you can employ the drill-down functionality of Power BI to present details about your data. Although you can use colors to highlight key data points, overuse of color can lead to confusion. You need to include clear and concise data labels for the data points in your chart. Avoid overcrowding the chart axis, as this creates clutter in your chart and makes the overall report unreadable. You need to maintain formatting consistency across all charts of your report pages; you can use and customize report themes to ensure a cohesive look. Data quality also contributes to the visual clarity of the report: visualizations are only as good as the data they represent, so you need to make sure the data is clean, accurate, and properly formatted.

When choosing a chart for your report, consider key elements such as the data type, the message, the context, and the audience. Understand the nature of your data: is it numerical, categorical, or
geographical? This helps you decide the appropriate chart type. Determine the data story you want to convey in your report: are you showing comparison, trends, distribution, or proportions? This influences the chart selection. Evaluate how your visualization will be used: dashboards, presentations, and interactive reports require distinct types of charts and visuals. Consider your audience's familiarity with data visualizations and select a chart type that connects with their experience. Although Power BI provides the tools and flexibility to create stunning visuals, it's up to you as a data analyst and report designer to use them to eliminate clutter and impart visual appeal. By prioritizing clarity, selecting an appropriate chart, and following best practices, you can transform your data into captivating and meaningful stories that deliver insights.

In the dynamic world of data visualization, creating visually appealing and compelling reports is essential for effective communication and decision-making. However, as you design these reports, you must not forget about accessibility. In the context of data reporting and visualization, accessibility refers to the design and implementation of reports that can be easily used and understood by all individuals, including those with disabilities. This involves creating reports in a way that accommodates various needs, such as providing alt text for visuals, ensuring sufficient color contrast, enabling keyboard navigation, and providing compatibility with screen readers. Ensuring that your reports are inclusive and accessible to all users, regardless of their abilities, is a crucial aspect of responsible and user-centric report creation. Because of its global operations, Adventure Works' executive management wants its reports and dashboards to be usable by a broader audience. Therefore, as a data analyst, your task is to consider the accessibility features of Power BI before you plan and execute data analysis and design reports and dashboards. Now let's explore a
project file in Power BI to learn how to create reports that are user-friendly and accessible to all audiences. The project file contains three data tables: Sales, Products, and Region. The first task is to create a line chart by dragging the total sales, month, and country fields from these tables into the respective wells of the line chart visual. Next, create a donut chart representing total sales by product category; select the total sales and category fields to add to the chart.

For users with visual impairments, these visuals may not be accessible, so add alt text to make your reports inclusive. Select the line chart and access Visualizations, Format visual, then General, and scroll down to the Alt text box. Enter the following descriptive text for the line chart: "Monthly regional revenue analysis for Adventure Works." This description acts as a text alternative that screen readers can access, letting users understand the content even if they cannot see it. Your users can also expand a specific visual from the report or dashboard: select the line chart, then select the focus mode icon in the top right corner of the visual, and the chart fills the entire screen. Select "Back to report" to exit focus mode. You can also view the data in a tabular format that is more screen-reader friendly: from the visual's context menu, select "Show as a table" from the drop-down list. This displays the line chart alongside a data table.

Visual and report page titles are important accessibility features that serve as reference points, so let's add some. Access Visualizations, select General, then select the chart title and provide a descriptive title for the chart, like "Monthly sales by country." Next, you need to name your report pages: select the page tab and rename the page to better represent the data. Both the X- and Y-axis titles should also be readable and provide sufficient information. In the line chart, a color on its own might not be sufficient to convey information; use markers to help distinguish the different data sets used in the
visual. Select the line chart and turn the Markers toggle to the on position, then select a different marker shape for each country; you can configure the marker shape, size, and color for each line.

Power BI's tab order feature provides a way to arrange all visual elements logically to accommodate keyboard users, ensuring a natural order of visuals that keyboard shortcuts can access. Navigate to the View tab of Power BI Desktop and access the Selection pane from the Show panes group. This opens a Selection pane with two tabs: Layer order and Tab order. In the Tab order tab, you can rearrange the order of visuals in your report. You must ensure screen readers effectively interpret and convey visuals and text; this way, the report is properly interpreted and conveyed to users with screen readers. Finally, choose an appropriate accessibility theme and the high-contrast Windows option from the View tab to help ensure report accessibility. This generates contrasting text and background colors to help make the content readable for users with visual impairments or color blindness. If you use a high-contrast mode in Windows, Power BI Desktop automatically detects which high-contrast theme is being used and applies those settings to your reports. Lastly, test your reports with diverse users, including those with disabilities, to gather feedback and identify accessibility issues; real-world feedback helps you improve your report design. There are accessibility features available in Power BI to help you create a report design that can be accessed by a wide range of consumers. Integrating Power BI's accessibility features into your workflow is not a limiting factor in designing compelling reports and dashboards; it is the correct way to generate reports usable by a broader audience, including those with disabilities.

You created a canvas of charts and graphs in Microsoft Power BI to visualize your data, but as you review your report, it seems incomplete. It's as if one piece of the
puzzle is missing. That critical piece is the assessment of its clarity and impact. A report is not just a collection of individual charts; its clarity and its impact come from combining these visual elements into a compelling narrative. This video will explore strategies and best practices to ensure your Power BI reports are not just a canvas of information but are visually compelling, engaging, and impactful.

Guidelines for creating an impactful report include deciding on the report objective, establishing a visual hierarchy, using branding and themes, carefully composing the report, employing storytelling techniques, and optimizing report performance for the best user experience. What do you intend to communicate in your report, and who is your target audience? Having a clear understanding of these aspects guides your design decisions. The use of visual cues such as size, color, and placement builds the visual hierarchy to emphasize key insights or data points and assist navigation. Use branding and themes to help create a professional report design; brand guidelines enforce a consistent style that adds credibility to your reports. When composing your report, consider layout and composition factors such as white space, alignment, and screen real estate optimization. White space means ensuring proper spacing between report elements like headings, visuals, and brand elements. Alignment is about aligning report elements to create a structured layout and a sense of order that emphasizes the data story. Screen real estate refers to the available space on the Power BI report canvas; finding the right balance between presenting enough data to get your message across while avoiding overwhelming your audience is crucial. When dealing with a lot of data points, think about incorporating interactive elements like tooltips, slicers, and drill-through; such features keep the main visual clear but allow users to expand specific data points. Telling a story with your data significantly enhances the
engagement and impact of your Power BI report. Sequence items on the report canvas to create a natural storytelling flow: for example, a clear introduction, key insights, supporting details, and finally a conclusion. Slow loading or unresponsiveness leads to a poor user experience that can diminish the impact of a report. Optimize report performance by eliminating unnecessary data, minimizing complex DAX logic, and aggregating data.

Choosing an appropriate chart type based on the data type is critical in designing a clear and impactful report. We will now explore the use cases, strengths, and limitations of some commonly used chart types. Bar charts compare discrete categories or values, displaying rankings and trends over time. They are easy to interpret and useful for displaying data with few categories; the bars display horizontally in a bar chart and in a vertical orientation in a column chart. They are not suitable for continuous data and can become cluttered with too many categories. Display trends and patterns over time with a line chart to identify changes in data over a continuous scale; line charts are excellent for visualizing time-series data and for displaying multiple series for comparison, but they are less effective for comparing individual data points and not suitable for categorical data. Pie and donut charts display the composition of a whole, showing parts as percentages and emphasizing relative proportions. They are easy to understand and work well with a small number of categories, but are not suitable for use beyond eight categories. Scatter plots are great for visualizing the relationship between two numerical values, identifying outliers, and spotting correlations; they reveal patterns, clusters, and trends and are effective for displaying high-density, multi-dimensional data, though the visual may be overwhelming with too many categories. A gauge chart displays a single value in relation to a predefined target, such as a key performance indicator (KPI), and provides a visual representation of performance against a
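The chart-selection guidance covered here can be condensed into a small lookup helper. This is a deliberate simplification for illustration; the category names and thresholds (such as the eight-category limit for pie charts) follow the lesson, but real chart selection also weighs context and audience.

```python
# A simplified chart-selection helper condensing the lesson's guidance.
# The rules are an illustrative approximation, not an exhaustive mapping.
def suggest_chart(data_kind: str, goal: str, n_categories: int = 0) -> str:
    if goal == "trend" and data_kind == "time-series":
        return "line chart"
    if goal == "composition":
        # Pie/donut charts stop working well beyond roughly eight categories
        return "pie or donut chart" if n_categories <= 8 else "bar chart"
    if goal == "comparison" and data_kind == "categorical":
        return "bar chart"
    if goal == "relationship" and data_kind == "numerical":
        return "scatter plot"
    if goal == "target":
        return "gauge or KPI visual"
    return "table"  # fall back to a plain table when nothing fits

print(suggest_chart("time-series", "trend"))            # line chart
print(suggest_chart("categorical", "composition", 20))  # bar chart
```

Encoding the rules this way also makes the trade-offs explicit: the same data (20 categories, composition goal) routes away from a pie chart precisely because of the clutter limitation described above.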
goal; it is not suitable for displaying multiple data points. A treemap is ideal for visualizing hierarchical data structures, showing the proportions of categories within a whole and conveying hierarchical relationships through effective use of space and color coding. It may not be suitable for non-hierarchical data, and it becomes complex when there are deep hierarchies. A strategic approach to report design in Microsoft Power BI can create a clutter-free and engaging data story. By having a clear objective, maintaining a visual hierarchy, implementing consistency, and adhering to best practices in all design choices, such as chart selection, you can create a report that makes the best impression on the audience.

Data is not just numbers; it is a compass that guides you through the maze of business performance, highlighting exactly where you underperform and where opportunities await. A key performance indicator (KPI) chart is one way to transform numbers into insights and stories, and to uncover hidden messages in raw data. Often used for sales, marketing, and customer service, KPIs act as performance benchmarks, measuring progress and identifying trends. A KPI visual typically displays a single metric and its performance against a target or baseline, making it easier for viewers to quickly judge performance and identify problems. Microsoft Power BI has a built-in KPI visual, but gauge charts and bullet charts can also be used to present KPI values. A KPI measures a value and shows its trend and status. The value is the main measure that you want to evaluate, for instance current sales. The target is the element you want to compare the value with, for example the sales target. The trend is how the value performs over time; for example, are the sales values going upward or downward? The KPI visual can be adjusted from a desktop design to a version that works well on mobile devices. To optimize a KPI chart for mobile devices, keep the chart's layout uncluttered, use appropriate font sizes and contrasting colors, focus on presenting
the essential data points, and avoid excessive decorative elements. Adventure Works wants insight into sales figures and an assessment of sales targets, so let's design a sales performance KPI visual in Power BI Desktop and optimize it for mobile devices. First, launch Power BI Desktop and open the Adventure Works sales report. To create a KPI chart that tracks sales performance against the target, drag the total sales and target fields from the Sales table to the report canvas. Power BI automatically generates a column chart from these values. You don't need this chart, so select the KPI visual from the Visualizations pane to convert it to a KPI. This action results in an empty chart with no data. Hover the cursor over the information icon; it indicates that both value and trend axes are needed for this chart. The three elements of the KPI chart are in the Build visual tab of the Visualizations pane: value, target, and trend. To compare the sales values with the target, add the total sales measure to the value section of the visual. For the trend axis, add months to view monthly sales trends: remove the target values and drag the month field from the order date hierarchy to the trend axis. This generates a KPI visual that charts sales values by month; it's like creating an area chart with month as the axis and sales as the values. The main value indicated in the visual is sales, but is this total sales or a filtered value? The value represented at the center of the KPI visual is the last data point shown in the trend axis. This means that if the trend is by month, then this is the last month's sales only; in this report, it's the sales for December 2018. If the data set contains sales for multiple years, then the value indicates the sales for December of all years; if the data set contains values for the full year, then it's for December. But what if you only have sales for certain months? Access the Visualizations tab, then Format visual, Visual, and Date, and turn on the Date toggle to display
the value's date. You've presented the sales data but must compare the value to the target. Drag the target measure from the Sales table to the target section of the KPI visual. Adding the target generates color coding in the visual, turning the value and the area chart red. An exclamation mark appears beside the value, indicating that the sales values are behind the target; the target is represented as the goal. By default, the percentage difference between the sales and the target is displayed in parentheses, which is minus 6.59% in the current report. If the sales values meet or exceed the target, the color of the value and area chart turns green with a check mark. Next, you can format the chart using font style and size, changing colors, or adding a background color; for instance, you can choose the sentiment color "red as bad" or "red as good" based on the nature of the value. Lastly, optimize the KPI visual for mobile devices: navigate to the View tab and select Mobile layout, then drag the KPI visual from the Page visuals pane to the mobile layout page, positioning and rescaling the visual to adjust it. The visual is now optimized for mobile devices. A KPI chart represents the sales trend against the target value. With the help of KPI visuals, Adventure Works can identify which product, region, or sales representative is underperforming and, as a result, devise strategic decisions for performance improvement.

The key to revealing insights from raw data is using the appropriate visualization. Techniques have emerged that use specific data types and analytical methods to produce tailored visualizations. The dot plot is one such visualization, popular when presenting categorical data in relation to a numerical value. To display the relationship between two numeric variables, you can create a scatter plot that shows the correlation between the variables. A variation of the scatter plot is the bubble chart, which can display the relationship between three variables, the third variable represented in the size of a
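The KPI status logic described above (value versus target, with the percentage difference shown in parentheses) can be sketched in a few lines. The sample figures below are hypothetical, chosen to reproduce a −6.59% gap like the one in the report; this is an illustration of the calculation, not Power BI's internal implementation.

```python
# A sketch of the KPI status logic: compare a value with its target and
# report the percentage difference. Figures are hypothetical.
def kpi_status(value: float, target: float):
    pct_diff = (value - target) / target * 100
    status = "behind target" if value < target else "meets or exceeds target"
    return status, round(pct_diff, 2)

status, pct = kpi_status(value=934_100, target=1_000_000)
print(status, pct)  # behind target -6.59
```

The sign of the percentage difference drives the visual cues the lesson describes: a negative value corresponds to the red coloring and exclamation mark, a non-negative one to the green check mark.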
bubble a dot plot is like a scatter chart but instead of numeric data you use categorical information on the x-axis dot plot charts are a simple yet effective data visualization technique used to display the distribution of data points along a single axis in a dot plot chart each data point is represented by a dot and dots are stacked vertically above the corresponding data values on the axis this makes dot plots especially useful for visualizing the distribution and frequency of categorical data PowerBI does not have any visual named dot plot or dot chart but you can create a dot plot by converting a scatter chart to a dot plot however there are certain custom visuals available in the PowerBI marketplace that are used to directly create dot plots in PowerBI let's quickly check on a few reasons dot plots make such a useful chart type a dot plot chart is easy to use it is easy to interpret for non-technical users it's particularly useful when visualizing categorical data giving a clear comparison between categories it displays the distribution and patterns in the data it can visualize a large amount of multi-dimensional data and it's a compact chart that's cell phone friendly Adventure Works needs insights into regional product category sales performance they need to know the quantity sold for each category and the revenue per country the challenge is the number of variables to be presented in a single visual as a PowerBI analyst you can deploy a dot plot to present categorical information such as category or country on the x-axis sales on the y-axis and quantity as the size of the dot let's jump into PowerBI and use a dot plot to analyze and visualize the Adventure Works information open the Adventure Works sales project the PowerBI core visualization pane has no dot plot or dot chart visual so you need to begin with the scatter chart and convert it into a dot plot Adventure Works must present sales quantities country and category data drag the sales and total
quantity sold measures from the key measures table to the report canvas PowerBI autogenerates a column chart select the scatter chart from the visualization pane to convert the column chart to a scatter chart PowerBI autofills the x-axis section with sales and the y-axis field with total quantity sold this is your scatter chart the sales data is numeric but you need to bring categorical data to the x-axis drag the country column from the region table to the x-axis field of the visual and move the sales data to the y-axis next drag the category column from the product table to the visual's legend section when I hover the cursor on a single dot in the chart a tooltip appears displaying the country category and sales amount for the category in that country to add more data drag the quantity sold measure from the key measures table to the visual's size section the dot size changes in proportion to the quantity sold the tooltip now displays quantity information in addition to the previous data the chart still resembles a bubble chart to change it navigate to the format visual tab and expand markers in the shape drop-down list select the square dot you could also select distinct shapes for each category the dot size can also be adjusted here next format the aesthetics first add a chart title and description then adjust the legend position legend title and font size format the axes to display clear labels and titles add and format the grid lines then add background color to improve the report's accessibility select different shapes for each category finally you must add analytics lines select analytics in the visualization pane represented by a magnifying glass icon to display a range of different analytical lines expand the average line drop-down and select add line to add an average line to the chart format the line color and toggle the data label button to the on position to add average sales value data other analytical lines can be added to the chart as required adventure
Works' analytical needs were fulfilled by presenting categorical data in a single visual the dot plot chart allows you to visualize multi-dimensional data with more than two variables and categorical information instead of numerical values on the x-axis of the chart interactive visualizations breathe life into data revealing hidden patterns and relationships between variables PowerBI's core visualization pane offers a visual where numbers are transformed into dynamic bubbles bubble charts can depict multi-dimensional data in a single view making intelligent use of space in addition to the X and Y axes a third dimension of data is represented through the size of each bubble this approach enables you to highlight complex relationships between variables and identify patterns that might not be immediately evident in traditional two-dimensional scatter plots the bubble chart's ability to convey multiple data dimensions simultaneously gives analysts and decision makers deeper insights into their data these insights can lead to more informed choices and strategies across a range of applications such as market analysis financial planning sales performance evaluation and resource allocation one example of applying a bubble chart effectively is in market analysis suppose you are analyzing the performance of various products within different markets the X and Y axes can represent market share and revenue while the bubble size corresponds to the total number of units sold by examining this data in a bubble chart you can discern valuable insights such as which products are dominant in specific markets based on market share and revenue and how sales volume relates to these factors high-density data refers to data sets containing a substantial number of data points which can lead to visual clutter and hinder effective data interpretation with bubble charts you visualize data point density and use sampling techniques to manage data representation on the chart by adjusting the size of
the bubbles or employing dynamic filtering options you can focus on specific areas of interest and maintain a clear and coherent chart despite the data's complexity Adventure Works wants to get insight into their data about the performance of different product colors the correlation between total revenue and profit margin the management wants to know the number of units sold of each product color sales profit margin product color and quantity together make the analysis and visualization challenging you can utilize a bubble chart in Microsoft PowerBI desktop to give all the required information in a single visual let's transform those raw numbers into dancing bubbles of information and help Adventure Works make data-driven decisions about product colors the data model displays information on total sales and profit margin measures the product table has product color information to begin visualizing profit margin and sales select scatter chart from the visualization pane to add a placeholder visual to the canvas drag the sales and profit margin measures from the key measures table on the data pane to the x and y axes this generates a scatter chart with a single data point to make the chart more interesting bring a third data dimension to the chart fields this converts the scatter chart to a bubble chart then drag the color column from the product table to the legend field of the visual the tooltip now displays information about the total sales amount of a specific color product and the profit margin associated with that product color Adventure Works needs to know the units sold so bring the quantity sold measure from the key measures table to the size section of the visual another important feature of bubble charts is the play axis which you can use to animate your visuals drag the year field from the order date hierarchy from the sales table to the play axis now you can also analyze the data by year select play on the left side of the axis PowerBI animates the bubbles
to represent the variations in sales quantities and profit margins over the years next navigate to the analytics tab represented by a magnifying glass in the visualizations pane add a median line based on sales and another for profit margin these chart lines provide analytics on the median sales and profit values the analytics pane provides interesting insights about the data now you need to format the chart first change the bubble shape and size to convey additional information and insights select visualizations then format visual then visual and then markers in the shape dropdown change the shape of an entire series or individual categories in the size section adjust the size you can apply further formatting by changing the font style size and color adding background color and so on Adventure Works can now visualize dense and multi-dimensional data in a compelling visualization to draw meaningful insights for future strategic plans in this video you discovered how a bubble chart delivered an engaging visualization to Adventure Works about the correlation between profit margin and sales based on the product color units sold and year you also explored the analytical capabilities of the bubble chart by adding the median and average lines to the chart to convey additional insights about the data you are working with a large data set when you discover that no one is interested in the data that's a big surprise to you then you realize that it's the insights people want presented not the data when dealing with data sets containing an abundance of data points presenting the information without overwhelming the viewer is vital in this video you will explore advanced display techniques in Microsoft PowerBI techniques such as presenting high-density data using maps drills and 3D visualizations in PowerBI high-density data is where you have a large amount of data points or values within a small area on a visual it often leads to visual clutter and makes it challenging to accurately
interpret the visual some techniques to handle high-density data include use aggregations and summarization drill through and drill down color coding such as heat maps and geographical maps and using 3D and custom visualizations let's check some PowerBI visualizations that use these techniques and evaluate their potential for use in reports the first one to explore is heat maps heat maps are a powerful tool for visualizing the density and distribution of data across geographical regions or grids using color gradients to represent values heat maps allow viewers to quickly identify patterns trends and hotspots within large data sets for example imagine you are analyzing sales performance across various regions for Adventure Works a heat map could represent the sales figures using a color spectrum highlighting regions with the highest sales in vibrant hues while cooler shades indicate lower sales the heat map visualization is not available in the PowerBI core visualization pane you can import a heat map from the PowerBI marketplace you can also use a Python-based heat map visualization in PowerBI you will learn about that option later in the course another visual to consider for high-density data is called tree maps tree maps are ideal for displaying hierarchical data and comparing the proportions of data points across different levels in a tree map each rectangle represents a category and its size correlates with the proportionate value it represents this technique allows viewers to analyze the overall composition and the data point breakdown in a single visual for instance you can use a tree map to display the distribution of sales by product categories and subcategories within Adventure Works now let's explore the functionality of drill through and drill down where analysts and viewers can dig deeper into the data a drill down in PowerBI allows users to move from a higher level of detail to a more granular level while a drill up does the reverse for example Adventure
Works sales data is plotted on a time scale the viewers can use drill down to look at the sales data on a date hierarchy that goes from a year to each quarter to month and all the way down to a daily level there are two drill through situations to explain chart drill through lets users explore additional detail within a visual by clicking on specific data points for example in a bar chart representing sales figures for various products at a summary level selecting a specific bar say product 3 can trigger a drill through action revealing a detailed report highlighting sales trends in various regions product details and customer information related to that specific product page drill through allows users to navigate to a different page with associated information this advanced technique is especially valuable for creating summary pages with high-level insights while two-dimensional visualizations are more popular 3D visualizations can offer a new dimension of insights for instance a 3D scatter plot can showcase the distribution of products within a three-dimensional space revealing potential correlations and patterns such as a presentation of a product's performance based on three parameters: price sales volume and customer satisfaction a 3D map can present data points in an interactive three-dimensional map space 3D mapping adds a sense of depth and realism to geographical data making it easier for users to identify spatial trends and analyze data use Microsoft PowerBI's advanced display techniques to extract insight from large complex data sets while considering end-user requirements master high-density data display drill through capabilities and the world of 3D visualization to improve your PowerBI reports and deliver impactful insights do you only access your social media accounts from a desktop computer no like most of us you probably spend most of your internet time on a mobile device accessing data on the go has become the norm decision makers expect to be able to
access critical information anytime anywhere as a report creator you must be able to optimize report layouts for mobile devices that way you ensure your insights appear on smaller screens without losing clarity and usability creating a mobile friendly report layout involves careful consideration of visual placement font sizes and content organization to do that use the tools and settings in the mobile layout canvas of Microsoft PowerBI when optimizing a report for mobile one of the key considerations is responsive design a responsive layout automatically adjusts to fit different screen sizes and orientations ensuring that the report looks and functions optimally on various mobile devices such as tablets and smartphones this adaptability is crucial as mobile devices come in various screen sizes it ensures report access without the user needing to zoom or scroll horizontally another critical aspect of mobile optimization is the selection of visuals and data presentation not all visuals are suitable for mobile viewing due to their complexity or size you must choose visuals that convey essential insights while maintaining readability on smaller screens simplified visuals such as line charts bar charts and KPI cards are often preferred for mobile layouts as they can present data clearly font sizes play a crucial role in mobile optimization text that appears legible on a desktop monitor might become challenging to read on a smaller mobile screen use appropriate font sizes that ensure readability without straining the user's eyes headers and labels should be clear and concise while data points should have sufficient spacing to avoid clutter in addition to visual elements interactivity is another aspect to consider when optimizing for mobile devices some interactions such as tooltips and drill through actions may work fine on desktops but might not translate well to
touch-based mobile devices test and adjust interactions to ensure a smooth and intuitive mobile user experience as a best practice testing your mobile optimized report on various devices is crucial to identify potential issues and ensure consistency across different platforms emulating different mobile devices or using responsive design testing tools can help verify the report's performance and appearance on various devices Adventure Works executive management wants to visualize its product sales summary it must be a mobile friendly sales summary dashboard so that it can be accessed anytime anywhere let's use PowerBI desktop to optimize the Adventure Works sales summary report for mobile viewing before optimizing a report for mobile it is essential to review its current layout and design you need to identify elements that may not translate well to smaller screens and those that require adjustments to maintain readability and user friendliness let's optimize the Adventure Works sales summary report for mobile devices the report contains one column chart representing the yearly sales amount a donut chart displaying sales by country or region and two card visuals showing sales and profit to begin navigate to the view tab and select mobile layout the mobile layout page has three panes: visualizations page visuals and mobile layout the page visuals canvas displays all the visual elements of the original report the mobile canvas has a precise grid layout for rescaling and repositioning the visuals on the screen with snap to grid functionality additionally you can select the checkbox lock objects from the view ribbon's page options this action locks the visual elements in place to avoid any accidental movement use this once you are satisfied with the position and scale of your visual next drag all visual elements from the page visual pane and drop them to the mobile canvas one at a time first move two card visuals to the mobile canvas align both cards to the top side by side
of the mobile screen now the main values on the card visuals are no longer visible so navigate to visualizations then visual expand the callout and in the value section change the font size to 18 in the label section change the font size to 12 in the spacing section change the vertical spacing to five pixels you can adjust font size independently for mobile and desktop versions of reports repeat this formatting for the second visual make some fine adjustments in positioning and scaling of the cards to optimize the readability and design next drag and drop the column chart to the mobile canvas enlarge the chart to fill the screen size and align it below the two card visuals finally move the donut chart to the mobile canvas enlarge it to fill the screen below the column chart in the mobile layout the donut chart legend values are not completely visible a small arrow is visible on the right end of the legend this suggests navigating for more information navigate to visualizations visual and expand legend in the position drop-down menu select center left you can also adjust the font size if necessary this changes the position of the legend from the top to the left all values are now visible without further navigation you can perform more adjustments for scaling the visuals and aligning them in the mobile layout screen the Adventure Works sales summary report is ready for anytime anywhere access on mobile devices optimizing report layouts in Microsoft PowerBI for mobile devices is an essential step in meeting the needs of today's on-the-go business environment the world of data visualization continues to evolve and Microsoft PowerBI is at the forefront of introducing innovative ways to present and interpret data one of the latest additions to PowerBI's visualizations is the shape map a feature that allows users to create geographic visualizations to uncover insights from geographical data in this video you will delve into the concept of shape map visuals their purpose
and cover a step-by-step guide on how to add and configure them in your PowerBI reports Adventure Works has recently expanded into territories across the globe as an analyst you realize the traditional table and chart visuals might not effectively communicate the geographical aspects of analysis you can use shape map visuals in PowerBI to better represent geographical and sales data and to showcase topics such as population density competitor location and market demand across different regions a shape map visualization empowers users to tell stories using geographical data unlike traditional map visuals that plot data on a geographical map shape maps go a step further by enabling users to work with custom regions or shapes such as countries states or provinces sharing your report with a PowerBI colleague requires that you both have individual PowerBI paid licenses or that the report is saved in premium capacity PowerBI Premium provides extra features like the ability to store more data cloud features and improved performance for PowerBI workspaces you can also use it to deploy reports and data sets and share content with users reliant on free licenses let's help Adventure Works to craft a shape map visual to better present their performance across various geographical territories the shape map visual is only available in PowerBI desktop and in preview mode since it is in preview it must be enabled before you can use it to enable the shape map you need to select file options and settings options global preview features then select the shape map visual checkbox followed by OK you will then need to restart PowerBI desktop after making this selection now you need PowerBI to display the Adventure Works shape map visual the data set contains two fields sales and states these fields contain state names and corresponding sales amounts in PowerBI desktop after the shape map visual is enabled you select the shape map icon from the visualizations pane to add a
shape map placeholder to the report canvas after adding the shape map to your report canvas you should add data to the data fields drag the state field to the location well and drag the sales field to the color saturation well of the map visual you can select the view tab to change the color scheme to a more accessible one such as Accessible City Park if you have an additional data set like product category or product color you can move them into the legend well to create divergent colors in this case as there is no category available in the data set you can apply gradient colors to the map go to format visual visual fill colors and turn the gradient toggle to the on position then add light blue for the minimum purple for center and black for the maximum you can also change the border color to black and the width to three now you need to display the map keys select the map settings dropdown then view map type key this action opens a dialogue that lists the map keys these keys are for US states you can change the map type to view keys for other countries if required the next option in this menu is projection you can use this option to present a 3D object on a 2D map PowerBI selects the Albers USA map style by default but three other options are available one option is equirectangular this is a cylindrical projection that converts the globe into a grid each cell in the grid has the same size shape and area Mercator is another option this is a cylindrical projection with the equator depicted as the line of tangency polar areas are more distorted than in equirectangular projections and finally there's orthographic this is a projection from an infinite point as if from deep space it gives the illusion of a three-dimensional globe next you'll access the zoom dropdown and toggle on the zoom on selection and manual zoom options these options allow you to zoom in on states when selected finally to format the chart title access the general tab then expand the title drop-down and use the
design effect options to change the title's properties as required in this video you learned about shape map visuals discovered their purpose and explored a step-by-step guide on how to add and configure them in your PowerBI reports you specifically learned how to create a shape map visual with color coding to represent the sales amount for Adventure Works choropleth maps also known as filled maps stand out as a powerful tool for representing and analyzing spatial patterns by color coding geographical regions based on data values choropleth maps offer a compelling way to visualize variations in data across different locations in this video you will explore the fundamental aspects of choropleth maps their use cases and examples of the type of data best suited for this visual format Adventure Works executive management realizes that simply looking at raw data in a tabular or columnar format is not sufficient to comprehend the regional distribution of sales they need a visual that instantly communicates the variations in sales across various geographic regions as an analyst you can resolve this issue by employing the choropleth map visual in PowerBI which allows you to present sales data on a geographical map with color-coded regions to indicate sales performance across various territories a choropleth map is a geographic representation in which areas such as countries states or regions are shaded or patterned to illustrate quantitative data values each region on the map is assigned a color or pattern that corresponds to a specific data value allowing viewers to identify patterns and trends instantly the intensity of the color or pattern represents the magnitude of the data value enabling easy comparisons and highlighting regional disparities choropleth maps are most effective when the data being visualized has clear geographic boundaries when designing a choropleth map it is crucial to carefully select colors or patterns that are easy to interpret and
distinguish using a color scale that smoothly transitions between values can enhance readability it is also essential to provide a clear legend or data scale to help users understand the relationship between colors or patterns and the corresponding data values now let's consider some detailed use cases for choropleth maps choropleth maps are ideal for visualizing population distribution across different regions by shading regions based on population density or total population you can quickly identify densely populated areas and areas with sparse populations choropleth maps are widely used to showcase various economic indicators such as GDP per capita unemployment rates or poverty levels across different geographic regions this helps policymakers and economists in understanding the economic disparities and making informed decisions choropleth maps are valuable in displaying health and education related metrics such as disease prevalence vaccination rates literacy rates and school enrollment levels they provide insights into regional health and education challenges and aid in resource allocation choropleth maps can effectively display environmental data such as air quality temperature variations or levels of pollution these maps help environmentalists and policy makers in assessing environmental conditions and devising appropriate conservation strategies but how can a choropleth map best help Adventure Works in their business activities one example is to break down sales performance data per country as well as per state within those countries in this example of the United States states with higher sales are represented by darker shades while lighter shades indicate lower sales choropleth maps offer a captivating way to explore and comprehend data patterns through geographic visualization their ability to showcase variations in data across different regions makes them a popular choice for a wide range of use cases from health and economic indicators to environmental
data and population distribution with choropleth maps data analysts researchers and policy makers can gain valuable insights and make data-driven decisions with geographical context as an essential tool in the data visualization toolkit choropleth maps assist in deeper understanding of the world around us choropleth maps have become an essential tool in data visualization for representing and analyzing data in a spatial context choropleth maps also known as filled maps are particularly effective in displaying quantitative data across geographical regions in this video you will explore the steps to create and utilize filled maps in PowerBI focusing on a scenario involving the Adventure Works company by the end of this video you will have the skills to configure and display data on a choropleth map allowing you to transform complex data sets into insightful visualizations before diving into creating a choropleth map it's crucial to know how to select the appropriate data for analysis in the context of Adventure Works let's consider a scenario where the company wants to understand the sales performance across different regions in a specific country the data should include at least two columns one representing the geographical regions and the other containing the relevant quantitative data such as total sales revenue or profit corresponding to each region in PowerBI creating an effective data model is the foundation of any compelling visualization the data should be structured in a way that PowerBI can understand the relationship between the geographical regions and the quantitative data you must ensure that the columns representing regions are in text format and contain matching names or codes for the regions present in the map data visualization similarly the quantitative data should be in numerical format for accurate analysis with the data model ready it's time to create a choropleth map visual in PowerBI to achieve this you can navigate to the visualizations pane
and select the filled map option and PowerBI will automatically detect the columns representing the geographical regions and the quantitative data and position them on the respective fields to enhance the visualization and make it more meaningful you can customize the choropleth map further PowerBI offers several customization options to help you fine-tune the visual representation for example you can adjust the color scale to highlight different intensity levels of the data making it easier to interpret variations additionally you can format the map's title legend and other visual elements to suit your report's aesthetics and readability let's apply the steps mentioned above to a specific scenario involving Adventure Works a multinational bicycle manufacturer the company wants to analyze its sales performance across various states in the United States and identify regions with the highest and lowest sales for the very first step map and filled map visuals are disabled you must enable them by accessing file options and settings options global then security then check use map and filled map visuals the Adventure Works data set contains two relevant columns state for the geographical regions and sales for the quantitative data representing sales revenue in each state you must ensure that the state column is formatted as text and each state name matches the corresponding states in the map data visualization similarly the sales column should be in numerical format in this instance you will format it as currency you can select the visualizations pane and click on the filled map icon drag the state field to the location well and sales to the tooltip well of the visual to apply the color coding to the map visual go to visualizations format visual and then visual select fill colors and then select the FX icon to apply conditional formatting in the conditional formatting dialogue box add three rules for the color coding of the map based on sales values based on the data
the maximum sales value is $400,000 and the minimum value is $81,000 so you can define the following rules rule one all sales values between $80,000 and $149,000 must be color-coded yellow rule two all sales values between $150,000 and $249,000 must be red rule three all sales values between $250,000 and the maximum value must be purple you then expand the map settings in the style drop-down list you will select a map style PowerBI has five styles: Aerial Dark Light Grayscale and Road you will select the Aerial map style expand the controls option and turn auto zoom to the off position turn the zoom buttons and lasso tool to the on position this gives you control over zooming into a specific area of the map to make the choropleth map more informative you can customize the color scale to represent varying sales levels across states regions with higher sales revenue can be displayed in darker shades while regions with lower sales values can be represented in lighter colors formatting the map title and adding a meaningful legend will help convey the information more effectively lastly you can access the general tab's title dropdown to format the title of the visual and apply other effects as required choropleth maps are powerful tools that empower businesses to visualize and understand data across geographical regions with their ability to display data variations using color intensity these maps provide valuable insights into spatial patterns and trends by following the steps outlined in this video and applying them to a scenario involving Adventure Works you can master the art of configuring and displaying data on a choropleth map in PowerBI in the ever evolving landscape of data visualization map visuals have emerged as powerful tools for presenting geographical data in an engaging and informative manner PowerBI Microsoft's robust business intelligence platform offers a range of features to create compelling map visualizations that can reveal insightful patterns and
trends in this video you will explore essential tips and tricks to optimize your map visualizations in PowerBI ensuring that you leverage the full potential of your geographical data map visualizations hold the potential to unlock a wealth of insights from your data especially when dealing with geographical information however it's essential to optimize these visuals to effectively communicate your insights to your audience Adventure Works operates multiple stores across different cities and states the North American sales manager asks you to present a report of sales for various states and cities as a PowerBI analyst your task is to create a comprehensive analysis of sales across various regions using map visuals a single layer of analysis in a map visual might only provide a summary level of information about sales to dig deeper into states and cities you need to create a geo hierarchy and map visual in PowerBI let's go through the Adventure Works sales data and create a geo hierarchy using filled map visuals in PowerBI launch PowerBI and open the project AdventureWorks Sales.pbix report the report contains two data tables a fact internet sales table and a geography table in map visualizations defining a precise location is especially important this is because some designations are ambiguous due to the presence of one location name in multiple regions for example there is a Southampton in England Pennsylvania and New York adding longitude and latitude coordinates solves this issue but if the data set does not have this information you will need to make sure to format the geographical columns as the appropriate data category select the country column from the geography table and navigate to column tools then properties in the data category dropdown select country format the data category for the state province name and city columns as state or province and city respectively a global icon appears before the field name this tells PowerBI that this is a geographical data type
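The disambiguation problem described above can also be handled in data preparation; here is a minimal pandas sketch that builds an unambiguous location string from city and state columns. The column names (City, StateProvinceName) are assumptions for illustration, not the exact Adventure Works schema, and this is an alternative to (not a replacement for) setting the Data Category or supplying latitude and longitude.

```python
# Hedged sketch: making ambiguous place names explicit before mapping.
# Column names below are illustrative assumptions, not the course schema.
import pandas as pd

geography = pd.DataFrame({
    "City": ["Southampton", "Southampton", "Southampton"],
    "StateProvinceName": ["England", "Pennsylvania", "New York"],
})

# Combining city and state/province gives a geocoder an unambiguous
# location string, similar in effect to formatting the columns with the
# appropriate data category in PowerBI.
geography["FullLocation"] = geography["City"] + ", " + geography["StateProvinceName"]
print(geography["FullLocation"].tolist())
```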
you will collapse the geography table and expand the fact internet sales table you then select the sales amount column from the fact internet sales table and format the data type as currency with two decimal places select the filled map icon from the visualization pane to place a map placeholder in the report canvas you can then enlarge the placeholder to create the geo hierarchy drag the country state province name and city columns from the geography table to the location field of the map visual make sure the order of the fields is country then state province name and finally city next drag the sales amount field from the sales table to the tool tip field of the map visual to differentiate the states based on the sales you should color code the map open the conditional formatting dialogue box by selecting the FX icon from the fill colors in the conditional formatting dialogue box select yellow for minimum red for center and purple for maximum the data set contains sales data of various countries but you only want to present sales data for the United States expand the filter pane and under the country option select United States adding depth to map visualizations leverages geo hierarchies you can drill down from country to state state to city and so on at the top right corner of the map visual in the report canvas are arrow icons these arrows represent the drill down functions used to access the hierarchy of the data first select the downward arrow to turn on the drill down function when the drill down mode is on the arrow is highlighted with a black background now select the downwards double parallel arrow to go to the next level of the hierarchy in the current example selecting the double arrows takes us to the US country level alternatively you can also select the country on the map to go to the next level of the hierarchy you can then hover the cursor over California the tool tip displays the sales value for the entire state in the tool tip is a drill up and a
drill down text with icons you can select these icons to either go one step up or one step down in the hierarchy select drill down to access the city level it is important to note that the color of the drill down will be the same color as the higher level view so it may need to be modified for accessibility purposes at the city level the tool tip displays all data from country to city with relevant sales amounts there's no drill down option because city is the last level of the hierarchy in this report however you can create a more granular hierarchy by adding postal code and stores to the location save the project to your local computer making sure to apply all changes before exiting PowerBI you should now understand how to use data to create geo hierarchies PowerBI map visualizations are a powerful and dynamic tool for data analysts seeking to explore understand and communicate geographic data in this video you'll learn to explore the map visuals interface and display and configure a map Adventure Works has created a filled map visual with geo hierarchy let's help the company format this map by exploring the control options PowerBI offers you launch PowerBI and open the file AdventureWorks Sales.pbix go to visualizations and select format visual then visual then expand the map settings dropdown in the style dropdown you can select from the five map styles supported by PowerBI road style is selected by default let's select aerial from the drop-down list expand the control section to reveal the three zoom options auto zoom zoom buttons and the lasso button auto zoom is automatically turned on you must also turn the zoom and lasso buttons to the on position this provides more control over the map to highlight a specific region the last option in map settings is geocoding culture by default PowerBI sets it to auto leave it as it is to further format the colors of the map visual open the conditional formatting dialogue box where you can modify the colors as needed
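The yellow/red/purple banding used throughout this scenario can be expressed as a small function. This is a sketch only, with thresholds taken from the earlier Adventure Works rules; it mirrors what the conditional formatting dialogue configures and is not PowerBI code.

```python
# Sketch of the three conditional-formatting rules from the Adventure Works
# scenario: yellow for the lowest band, red for the middle band, purple for
# the highest. Thresholds come from the transcript; the function itself is
# an illustration, not a PowerBI API.
def fill_color(sales: float) -> str:
    """Return the fill color a region would receive for a sales value."""
    if sales >= 250_000:
        return "purple"   # rule three: $250,000 up to the maximum
    if sales >= 150_000:
        return "red"      # rule two: $150,000 to $249,000
    if sales >= 80_000:
        return "yellow"   # rule one: $80,000 to $149,000
    return "none"         # below the lowest band: no fill applied

print(fill_color(81_000), fill_color(200_000), fill_color(400_000))
```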
with the current selection these colors represent the sales data across various states and cities yellow represents the states with the lowest sales values purple represents the states with the highest sales values next you can rename the labels and titles to make the visual clutter-free and help users identify specific places on the map double click on the state province name field in the location well of the map visual and rename it as state in the tool tip field rename sum of sales amount to sales go to visualizations format visual and then general change the title of the map visual to a more descriptive title like sales distribution by location you can configure and format the information that appears when you hover over a specific region on the map expand the tool tips option scroll down to the background and change the color to light green you can use the other options to further format the style and size of the data displayed on the tool tip you have now created a filled map with geo hierarchy and explored the various control and formatting options in PowerBI remember presenting information alone is not sufficient you must also use formatting and design to create engaging dashboards and reports in PowerBI in this video you learned how to explore the PowerBI map interface and display and configure a map PowerBI offers various visualization options to display geographical data effectively two popular choices for mapping data are shape maps and filled maps also known as choropleths both of these visualizations enable users to present geographic data in a visually engaging and informative manner in this video you will delve into the key differences between these two map types exploring their unique features use cases and the data they utilize as a business analyst working at Adventure Works you need to present regional sales data across different countries in PowerBI you have two options to choose from: filled maps or shape maps a filled map allows you to display
color-coded regions based on a metric like sales for various geographical areas while shape maps provide more flexibility for customization the final selection should be based on the visualization requirements shape maps provide a platform for users to create their own custom visualizations by importing geographic data in the form of vector files the vector files used in shape maps are typically in the TopoJSON format which is a file format used for storing geographic data TopoJSON files allow for compact and efficient data representation as they reduce the data size and loading times in web applications and visualizations with shape maps users can visualize regions countries states or even custom territories by utilizing their own data sets there are three key features of shape maps to consider: customization precision and data complexity through customization users have the flexibility to use their data and design custom regions based on unique geographical boundaries or territories with precision shape maps can accurately represent non-standard geographic regions that are not predefined in standard geographical data sets by handling data complexity since users provide their geographic data shape maps are ideal for visualizing intricate boundaries and smaller regions filled maps or choropleths are a type of map visualization that leverages predefined geographical boundaries provided by PowerBI's built-in mapping capabilities users can assign data values to regions represented by the map's predefined shapes filled maps use color shading to represent data values allowing users to visualize data distribution across various regions the key features of choropleth maps are simplicity quick insights and Bing Maps integration with simplicity filled maps offer a straightforward approach to map visualization as they utilize predefined shapes without requiring additional custom data sets with quick insights filled maps let users quickly gain insights into data distribution and patterns across various regions with Bing Maps integration filled maps benefit from Bing Maps' extensive geographic database providing accurate and up-to-date boundary information there are four main differences between shape and filled maps let's consider these differences and how this would impact your decisions when working with geographical data the primary distinction between shape maps and filled maps lies in their data sources and customization options while shape maps allow users to import their custom geographic data filled maps utilize predefined geographical boundaries from Bing Maps this difference impacts the level of customization and the ability to visualize specific non-standard regions imagine Adventure Works wants to visualize its complex sales territories each with unique boundaries defined by the company's specific business needs in this scenario shape maps will be a better choice with shape maps Adventure Works can import its custom geographic data creating precise and granular visualizations that accurately represent their sales territories the ability to use custom-defined administrative boundaries ensures that Adventure Works can tailor the map to its unique requirements making shape maps the perfect choice for this task shape maps represent data by associating values with custom regions created by users offering precise and granular visualizations filled maps use color gradients to represent data values within predefined
regions providing a more generalized view of data distribution across larger geographic areas Adventure Works wants to show its sales densities across different regions they want to get a quick high-level overview of how sales are distributed with filled maps Adventure Works can quickly assess sales densities by country or region using color gradients providing insights without the need for custom-defined boundaries shape maps are best suited for scenarios that require complex geographic representation such as visualizing sales territories customer distribution or custom-defined administrative boundaries filled maps with their simplicity and quick insights are ideal for showcasing high-level data patterns such as population densities sales performance by country or regional sales growth filled maps benefit from Bing Maps' geographical database which ensures accurate and up-to-date boundary information this integration simplifies the process of creating visualizations especially for users who do not have access to specialized geographic data sets Adventure Works faces a challenge they want to showcase sales performance by country highlighting regional sales growth but they also want to maintain a level of precision here's where the choice between shape maps and filled maps becomes crucial shape maps with their custom regions could offer the precision needed to visualize specific sales trends however if a more generalized view is acceptable filled maps can quickly provide insights across larger geographic areas striking a balance between detail and simplicity in conclusion shape maps and filled maps are two valuable map visualization options in PowerBI each catering to different use cases and data requirements in the realm of data visualization geospatial information can be a game-changer the ability to visualize data on maps not only adds context but also unlocks new layers of insights PowerBI offers a range of map visualizations and one standout feature is its integration with
Azure Maps Azure Maps are part of the broader Azure location-based services family also called Azure LBS they provide a comprehensive platform for building geospatial solutions including mapping searching routing and traffic services the Azure Maps visual provides a rich set of data visualizations for spatial data on top of a map it connects to a cloud service hosted in Azure to retrieve location data such as map images and coordinates that are used to create the map visualization it has several advantages compared to other map visualizations including seamless integration with Azure services advanced geospatial features scalability performance enterprise-grade security and developer friendliness details about the area are sent to Azure to retrieve images needed to render the map canvas also known as map tiles data in the location latitude and longitude buckets may be sent to Azure to retrieve map coordinates a process called geocoding in this video you will delve into what Azure Maps are how to add them in PowerBI and a step-by-step guide to set up and configure an Azure map for Adventure Works competitor analysis by state now you will learn about Azure Maps and their usage in PowerBI reports you are working as a data analyst in the Adventure Works company and you have public sales report data from a competitor you will configure an Azure map for Adventure Works competitor analysis by state you can enable the Azure Maps PowerBI visual by selecting the Azure Maps icon from the visualizations pane a disclaimer text appears on the screen regarding Azure Maps' use of data access model view to view the data model tables the data model contains three data tables a reseller sales fact table a geography table and a reseller dimension table all these tables are related by one to many relationships you return to report view drag the country field from the geography table to the location well of the Azure map visual then drag the reseller measure from the reseller dimension table to
the size well of the Azure map visual the bubble size proportionally represents the number of resellers in each region to further analyze the resellers for each product line of Adventure Works drag the product line field from the reseller dimension table to the legend well of the visual this adds color coding to the bubbles and displays the number of resellers for each product line in each country you can create a geo hierarchy by bringing other fields from the geography table to analyze the granular data further however in this video let's just focus on the country level next let's explore some formatting and control settings go to visualizations format visual visual and then map settings you can select the style of the map from the style dropdown select road from the available options in the bubble layer section you can configure the size shape and color of the bubbles the bubbles' minimum size is very small so let's change the size to 15 pixels in the size option of the bubble layer change the color of each bubble slice based on the product line you will also add category labels to the map for accessibility let's increase the font size to 12 and reduce transparency to 25% lastly you can format the Azure map title color text style and so on by following the steps outlined in this lesson you can seamlessly add configure and utilize Azure Maps to perform advanced analysis as you continue to explore the possibilities of Azure Maps and PowerBI you'll be empowered to create compelling visual narratives that go beyond numbers helping you make informed decisions driven by location intelligence cycling is a peaceful and calming leisure activity that anyone can enjoy many people use their bicycles to get outdoors and enjoy the countryside or to go on camping trips with friends but in the business of bicycle manufacturing it's a constant battle to grow sales and find new markets one way Adventure Works seeks new opportunities is by using data analysis it recently conducted some
competitor analysis and that data tells an interesting story its main competitor is performing really well in specific European regions that's an intriguing insight but the big question is what is the reason for that success what is it about the market that makes it different from elsewhere and is it something that Adventure Works can learn from does it have a product to satisfy the demand in this region the Adventure Works team does some more research to figure out what their competitor is doing right they check on sales volumes the products that do well and the areas of Europe that are supplied by competitors an analysis of competitor marketing tactics reveals that they're selling to a specific young female demographic in particular regions they're using a lot of focused social media marketing to get their message to the target audiences the findings point to the frustrations that young female cyclists have with their choice of bike types for city and suburban commuting to bring more depth to the data insights Adventure Works decides to analyze city demographic data where its competitors are most successful focusing efforts on these areas leads to the discovery that there are market demographics that are a perfect match for some Adventure Works products so what can Adventure Works do to compete in the identified regions and markets to find out more the team dives further into the demographic and marketing data the data analysis team then uses the data discoveries to create geographical visualizations the visualizations identify patterns and trends that can lead them toward the development of a new marketing strategy finally it's time to present the new market plan to the company's management team examining the new report of the targeted regions it compares the data to its own target audience for bike ranges Adventure Works uses the collected data to design its own strategy to target a similar demographic the marketing staff brainstorm ideas for social media
adverts influencers and other marketing tactics in areas where the target audience is spending most of their time Jamie the CEO believes it has the potential to be very successful and is confident that this plan will help the company compete with its rivals in these regions data analysis is a powerful tool to help discover new business markets creative use of chart visuals and map visualization can help identify new opportunities and grow business through sales data analysis and competitor data analysis Adventure Works identified a market that it had not yet entered but where competitors were already performing well through the visual analysis of data it found market segments that matched its product line this was valuable insight and led it to new customers and new regions that have a high potential for continued growth PowerBI offers several core visuals readily available on the visualization pane but what if the type of visualization you require doesn't exist in PowerBI you can create it with custom visualizations in this video you'll explore what custom visualizations are why they matter and how to create them Adventure Works needs a visualization to explore its sales data however none of the existing visualizations in PowerBI are appropriate so Adventure Works needs a custom one find out more about custom visualizations then help Adventure Works build its own so what are custom visualizations custom visualizations are user-defined visual elements that extend the capabilities of PowerBI beyond the built-in visual options they enable you to create unique tailor-made visuals that cater to specific business and visualization requirements enhancing data's clarity and impact but why do visualizations matter because of their ability to help address unique needs every organization has its unique analytical requirements with custom visualizations you can create visuals that directly resonate with your organization's specialized needs custom visuals also offer insights that standard visuals
might not be able to convey as effectively this can help you uncover the trends and patterns hidden within your data for example through its custom sales data visuals Adventure Works might discover that it sells more bicycle repair equipment in the winter months custom visualizations can be installed in PowerBI from different sources you can import custom visuals created by developers from the PowerBI marketplace certified PowerBI visuals are available in AppSource Microsoft or its partners develop these visuals which can be downloaded from PowerBI Desktop you can create custom visualizations in PowerBI using the Python or R programming languages these visualizations are imported from a file on your local computer you can also develop PowerBI visuals to meet your analytical or aesthetic needs if developing in R or Python then it's recommended that you use an integrated development environment or IDE such as Visual Studio Code also known as VS Code Python is a powerful open-source programming language often used for data analytics it's very versatile and offers a rich ecosystem it's beginner-friendly and backed by community support making it a great language for data professionals it also offers pre-written code bundles or libraries for creating visualizations like Seaborn and matplotlib using R or Python to develop your own PowerBI visuals or to customize existing ones is an optional expertise you may wish to pursue it if you have a coding background a familiarity with Python or want to extend your skill set into this area before creating a visualization you need to load some data for it luckily Python has built-in data set examples that can be imported and can be used to create new data sets for this demonstration Python has already been installed in PowerBI and the relevant libraries and data sets have been imported so the first step I need to take in PowerBI Desktop is to enable Python scripting I navigate to file and select options and settings then select options
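In the demonstration that follows, a script is pasted into PowerBI's Python script dialogue to import data. A minimal sketch of such a script is shown here; the column names and values are invented for illustration, and it assumes the documented behavior that PowerBI loads any pandas DataFrame defined at the top level of the script as a table.

```python
# Minimal sketch of a script for PowerBI's "Python script" connector.
# PowerBI picks up pandas DataFrames defined at the top level of the script
# as loadable tables; the columns and values here are invented examples.
import pandas as pd

sample_dataset = pd.DataFrame({
    "Name": ["Alice", "Bob", "Carol"],
    "Age": [34, 41, 29],
    "Weight": [62.0, 81.5, 70.2],
})
print(sample_dataset.shape)
```

In PowerBI the navigator window would then list sample_dataset as a table that can be loaded into the data pane.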
this opens options where I can select Python scripting always ensure PowerBI has detected the Python installation path under detected Python home directories if you need to you can copy and paste the path from your Python installation I select okay now I am ready to use Python in PowerBI Python in PowerBI is used in two ways the first purpose is to import data the second is to create custom visualizations let's explore the first method and import some data Python libraries contain sample data sets that you can import to PowerBI I navigate to the get data dropdown and select more this opens the get data dialogue in the search bar I write Python the Python script option appears on the right side of the window I select Python script and then select connect a Python script dialogue box appears on screen from here you can write a Python script to import sample data from Python libraries for instance I can write a Python script to import a data set into PowerBI Desktop the code creates a data frame by importing the pandas package of Python with the required columns and associated values once I execute the code PowerBI opens the navigator window with a data set named sample data set the data set appears under the data pane on the right side of the PowerBI interface when I select load to load the data set it can now be used to create visualizations in PowerBI PowerBI offers a wide range of core visualizations custom visualizations provide several unique advantages that contribute to more effective data communication improved insights and tailored solutions Python with its rich set of libraries and ability to handle data manipulation visualization and machine learning tasks makes it an essential tool for data professionals as a data analyst it's important to be able to extract the insights you need from your data and engagingly present them integrating Python with PowerBI allows you to explore your data more deeply to reveal further insights and present the data through
sophisticated visualizations in this video you'll learn how to add a Python-based visualization to PowerBI Desktop Adventure Works is analyzing its data sets and realizes that the core PowerBI visuals don't provide a comprehensive view of its data you can help the company generate a more sophisticated analysis by leveraging a Python-based visualization in PowerBI let's learn more about adding a Python-based visualization then help Adventure Works Python is a powerful scripting language that relies on libraries these libraries like matplotlib and Seaborn can be integrated with PowerBI to create dynamic and sophisticated custom visualizations although Python provides useful features and libraries it still has a few limitations and it's important to be aware of these limitations before designing visuals Python's data set size is limited to 150,000 rows and has an input limit of 250 megabytes all data fields from different tables must have defined relationships between them or you'll encounter an error Python visuals refresh after each update filter or highlight external Python scripts might raise security concerns using R or Python to develop your own PowerBI visuals or to customize existing ones is an optional expertise you may wish to pursue it if you have a coding background a familiarity with Python or want to extend your skill set into this area to get you more familiar with custom visualizations let's demonstrate a Python custom visualization in PowerBI Desktop for this demonstration Python has already been installed in PowerBI and the relevant libraries and data sets have been imported so the first step is to create a visualization using the imported sample data set I navigate to the visualization pane and select the Python visual icon this opens a dialogue called enable script visuals select enable a placeholder for a Python visual image appears in the report canvas and a Python script editor appears at the bottom of the report page a Python script can only use
fields added to the value section by creating a data frame you can add or remove fields while you work on your Python script PowerBI Desktop automatically detects field changes as I select or remove fields from the value section supporting code in the Python script editor is automatically generated or removed I drag all the fields from the sample data set table to the value section of the Python visual based on the selection the Python script editor generates the code the editor creates a DataFrame called dataset with the fields I added to the value section duplicate rows are removed from the data and the fields are grouped the first visual will be a scatter plot graph that generates insights between the age and weight fields of the sample data set in the Python script editor I write the code to draw a scatter plot graph that measures age on the x-axis and weight on the y-axis I execute the code to import the matplotlib Python library which creates the plot finally I select run from the top right corner of the Python script editor title bar to generate the Python visual on the report canvas next to generate another Python visual using Adventure Works data I open the Adventure Works Sales PowerBI project the data model contains four related data tables: sales products salesperson and region I make sure the data tables relate to each other using appropriate relationships without these relationships you cannot use the fields from the different tables to create Python visuals the visual required for Adventure Works is a bar chart of total sales by each country to create this visual drag the total sales field from the sales table and the country field from the region table to the value section of the Python visual the editor creates a DataFrame called dataset with the fields I added to the values section duplicate rows are removed from the data and the fields are grouped to create a column chart I write the Python script under the paste or type your script code here prompt
then I run the script the script draws a plot with total sales on the y-axis and country on the x-axis the script imports the matplotlib visualization library which generates the bar chart you can customize the visuals for color size data values and other attributes by modifying the Python code or importing other libraries that's an example of creating Python-based visuals in PowerBI both by importing and with the Adventure Works sales data set integrating Python with PowerBI helps to move from a sophisticated data analysis to a compelling presentation however even though Python-based visualizations expand the capabilities of PowerBI they also have some limitations to consider such as Python's limited data set size and they do require specialist expertise to implement in PowerBI welcome to this high-level recap of the concepts and techniques covered this week this summary will help you revise the lessons on the design of powerful report pages during the course simulations of Adventure Works scenarios were used in videos and exercises these scenarios are designed to facilitate understanding and provide relatability the items we will review are clarity and visual impact accessibility considerations for Microsoft PowerBI creating and formatting KPI and dot plot charts how to visualize high-density multi-dimensional data map visuals such as choropleth and shape maps and custom visualizations including adding a Python-based visualization in the first lesson on visual clarity in reports you learned to transform raw data into a story using charts and graphs that expressed the essential narrative of your data charts data and visuals are all crucial components of the clarity and visual appeal of data visualization selecting the correct chart type simplifies complex information making it easier for stakeholders to understand your presentation design with your audience in mind consider how familiar they are with data visualizations and then select visuals and chart types that are
appropriate for their background and experience you must use your design ability to create visual impact and clarity one technique to use to do this is to eliminate clutter when building reports and visualizations don't neglect accessibility produce reports that can be easily used and understood by all individuals including those with disabilities production should include alt text for visuals sufficient color contrast keyboard navigation and compatibility with screen readers the key areas of impactful report creation include deciding on the report objective establishing a visual hierarchy using branding and themes carefully composing the report employing storytelling techniques and optimizing the report performance for the best user experience when deciding on an appropriate chart type consider recommended use cases for the chart its strengths and its limitations by having a clear objective maintaining a visual hierarchy implementing consistency and adhering to best practices in all design choices such as chart selection you can create a report that makes the best impression on the audience KPI charts are often used to illustrate performance benchmarks measure progress and identify trends you can use the Microsoft PowerBI built-in KPI visual or use gauge charts and bullet charts to present KPI values dot plot charts are used to visualize the distribution and frequency of categorical data by displaying data points along a single axis for instance you can use a dot plot to represent category information on the x-axis sales on the y-axis and sales quantity as the size of the dot bubble charts depict multi-dimensional data in a single view for instance to analyze the performance of various products in different markets the x and y axes represent market share and revenue while the size of the bubble is related to the total number of units sold with bubble charts you visualize data point density and use sampling techniques to manage data representation on the chart when
creating reports PowerBI has many built-in capabilities that support ease of use and help your productivity they include app navigation ribbon navigation and navigation and key panes such as the visualization pane and the selection pane as a designer should you have any other disabling factors you have accessibility options that allow you to operate and design in Microsoft PowerBI you explored advanced display techniques in Microsoft PowerBI such as techniques to present highdensity data and the use of maps drills and 3D visualizations for instance you could use a heat map to illustrate sales figures using a color spectrum a tree map to display hierarchical data and compare data point proportions for sales data plotted on a time scale users can use drill down to look at the sales data on a data hierarchy that goes from a year to each quarter to month and all the way down to a daily level powerbi gives you the ability to use chart drill through and page drill through is a technique for creating summary pages with highle insights 3d visualization such as 3D mapping adds a sense of depth and realism to data making it easier to identify trends and analyze data as a report creator you must optimize report layouts for mobile devices to ensure reports display properly on mobile screens one of the key techniques to optimize a report for mobile devices is the use of responsive design powerbi’s shape map visualization reveals insights from geographical data cororoplathth maps visualize variations in data across different locations by color-coding geographical regions based on data values a popular use case for cororopath maps is to display environmental data such as air quality temperature variations or pollution levels for any PowerBI map visual it is vital to properly prepare the data this includes cleaning formatting handling missing values and optimizing for performance one key feature of PowerBI map visualizations is its integration with Azure maps azure maps are part 
of the broader Azure location-based services family, also called Azure LBS. Custom visualizations are user-defined visual elements that can create unique, tailor-made visuals for specific visualization requirements. Custom visuals created by developers can be imported from the PowerBI marketplace; certified PowerBI visuals are available in AppSource, and they can be downloaded from PowerBI Desktop. You can also create custom visualizations in PowerBI using the Python or R programming languages. To help you design powerful report pages, you explored various features this week, such as clarity and visual impact for charts and reports, accessibility considerations for Microsoft PowerBI, creating and formatting KPI and dot plot charts, how to visualize high-density multi-dimensional data, map visuals such as choropleth and shape maps, and custom visualizations, including adding a Python-based visualization. By applying these techniques, you will be better able to create powerful report pages in Microsoft PowerBI. Data is a treasure, and with Microsoft PowerBI's analytical powers you can explore it in a variety of ways. But what do you need to explore this treasure: a treasure map to see the big picture, or a magnifying glass to analyze the details? That's the difference between a dashboard and a report. Your dashboard provides a high-level analysis of the data in one centralized place; dashboards are a simplified overview of the big picture, designed to highlight key metrics for quick monitoring and decision-making. Reports are comprehensive and analytical, designed to dive deep into data; within your report you can analyze the finer details of this data and add filters, slicers, and drill-through functions. In this video, you will learn more about the key differences between PowerBI dashboards and reports, discovering their use cases along the way. Jamie, the Adventure Works CEO, needs to visualize an overview of the company's performance, including sales, marketing, customers, and so on. The sales and marketing directors need to explore more granular data to identify trends, outliers, and anomalies within the data. As a principal PowerBI analyst, you need to decide on a dashboard design that will work perfectly to present to the CEO with summary-level visualizations, but for each of the directors you need to create detailed reports about sales and marketing. Now let's delve into the primary differences between dashboards and reports. Both PowerBI dashboards and reports serve distinct purposes and have unique design considerations. Before exploring design approaches, let's try to understand the fundamental differences between them, starting with some key characteristics of PowerBI dashboards. PowerBI dashboards are concise, summarized displays of underlying reports in PowerBI. They typically contain a single canvas or page, offering a high-level view of metrics and key performance indicators, also called KPIs. Dashboards are designed for quick decision-making and monitoring, and they can include visuals, tiles, and widgets from different reports. When it comes to creating and designing a dashboard, you can only do it in the Microsoft PowerBI service. The Microsoft PowerBI service, sometimes referred to as PowerBI online, is the software-as-a-service part of PowerBI. You generate a dashboard in PowerBI service using visual elements and tiles, and you can also pin an entire page of a report to your dashboard. First, you have simplicity and focus: dashboards are concise and focus on key metrics; they avoid clutter and unnecessary visual elements and prioritize the most critical information for quick decision-making. Next, you have visual hierarchy: visuals need to be arranged in a logical sequence, and the use of size, color, and placement emphasizes the significance of the information presented. Lastly, there is mobile responsiveness: you must ensure your dashboard is responsive and visually appealing on a variety of devices, such as tablets and
mobile phones; it is important to use responsive design principles to adapt to all screen sizes. Now let's turn our attention to PowerBI reports. PowerBI reports are detailed, structured documents, often consisting of multiple pages or tabs. They are designed for in-depth analysis and exploration of data, containing tables, matrices, and visuals that provide detailed insights, and they support filtering, drill-through, and slicers for interactive exploration. To maximize report impact for all types of viewers, you must consider three major areas of design: layout and structure, interactivity, and storytelling. Let's start with layout and structure: you need to use a clear, logical structure to guide report users through the data, utilizing page numbers, titles, sections, and headers to improve report navigation. Next, you have interactivity: in the report design, you must consider adding slicers, filters, and drill-down and drill-through functionality to access granular data. Finally, storytelling: reports are designed to tell a data-driven story, so use text boxes, annotations, and narratives to explain valuable insights, and arrange visual elements in a logical sequence to guide users through the introduction, main body, and conclusion of the story. Before exploring an example of using dashboards and reports, let's touch on charts in PowerBI and how they interact with dashboards and reports. Appropriate chart selection to match the type of data being presented is essential to designing both reports and dashboards in PowerBI. Chart selection is critical in data visualization, as it directly impacts the effectiveness of data communication; the choice of chart will determine how your audience understands and interprets data. Because a dashboard is based on your underlying reports, it is essential to make the correct chart selections for the data in your reports. For your task for Adventure Works, you need to create multiple dashboards for the CEO as well as the sales and marketing directors.
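The chart-selection guidance here, matching the kind of data and the analytical goal to a visual, can be sketched as a simple lookup. This is an illustrative heuristic only: the pairings below reflect common visualization practice discussed in this course, and the function and its names are hypothetical, not part of any PowerBI API.

```python
# Hypothetical helper: map (data kind, analysis goal) to a suggested
# chart type. The pairings are common visualization heuristics, not
# PowerBI rules.
def suggest_chart(data_kind: str, goal: str) -> str:
    """Return a suggested chart type for a (data kind, goal) pair."""
    suggestions = {
        ("categorical", "comparison"): "bar chart",
        ("time series", "trend"): "line chart",
        ("part-to-whole", "composition"): "donut chart",
        ("multi-dimensional", "relationship"): "bubble chart",
        ("single metric", "progress"): "KPI or gauge",
    }
    # Fall back to a plain table when no heuristic applies.
    return suggestions.get((data_kind, goal), "table")

print(suggest_chart("time series", "trend"))        # -> line chart
print(suggest_chart("single metric", "progress"))   # -> KPI or gauge
```

A real report designer would weigh audience, data density, and accessibility as well, but a lookup like this captures the core idea that the chart should follow from the data and the question being asked.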
Let's start with the CEO, Jamie, with a tailored dashboard presenting data to meet their specific needs. With this dashboard, you should focus on emphasizing high-level insights, key performance indicators, and strategic information in a visually appealing layout. Based on this, a typical dashboard layout often includes these six categories. First is an executive summary: this section may include KPIs in the form of card visuals, such as revenue, profit margin, year-over-year growth, and market share. Next up is sales performance, which may include charts showing revenue, expenses, profit trends, and time comparisons. The third category is market overview, which represents market share trends and competitive analysis. The fourth category, customer metrics, can include customer retention and acquisition rate charts. The fifth category is operational performance; here, production output, customer satisfaction, and departmental performance visuals can be included. Finally, you have strategic initiatives: completion status for key initiatives, in the form of progress bars and charts illustrating project timelines and milestones, can be presented in this section. For the sales director, you need to design reports with drill-down and drill-through modes for detailed, granular data analysis. For the drill-down and drill-through modes to work, you can break down the report into individual pages: sales performance overview, geographical analysis, product analysis, salesperson performance, and time-based analysis. Each of these pages needs to be designed with appropriate structure and chart selection based on the data you want to present. Lastly, let's consider what is required for the marketing director's report. The marketing director will need to see data related to Adventure Works' marketing channels, how campaigns are performing, and a categorization of customers. For the marketing director, the report content should contain an overview, marketing channel analysis, campaign performance, customer segmentation, and recommendations and insights. This will provide the marketing director with a good starting point to begin assessing their department. And that concludes our summary of dashboard versus report design in Microsoft PowerBI. Designing a dashboard and designing a report are distinct processes with unique objectives: reports offer in-depth analysis and exploration of granular data, while dashboards provide a high-level overview for quick decision-making and monitoring of key metrics. Consider a PowerBI dashboard that feels like it was designed just for you, precisely delivering the insights you need to drive your decisions. This dashboard is designed to optimize your experience as the end user, making your work easier. Creating user-centric dashboards in PowerBI is not about displaying a collection of charts and graphs; it is about solving specific problems for your users, with important data indicators prioritized high on the page, trends and performance comparisons further down, and general information towards the bottom. In this video, you will learn about getting a better understanding of your audience and creating user-centric dashboards, as well as exploring some examples of these dashboards. So how can you better understand your audience when designing your PowerBI dashboards? You will likely have a baseline of knowledge depending on the products or services your company offers, but what else can be done to help understand your target audience? Let's look at four methods you can use: identifying the end users, defining user needs, establishing users' data literacy, and identifying the preferred devices of users. Let's begin by identifying the end users. End users are the individuals or groups who will be interacting with and generating insights from your dashboards; identifying your audience helps tailor the dashboard to their specific needs and preferences. Next, you must define user needs. Each user group may have distinct data
requirements and objectives, so you need to work closely with each user group to determine the specific data they work with and how you can visualize it. You can do this by identifying the key metrics relevant to their roles, allowing you to select what is presented on their dashboard. Having established the end users and their needs, you must now consider their level of data literacy: are they data savvy, or do they need a simplified data interface? For example, a sales team will need the most accessible data they are used to working with, as opposed to a finance team that may be used to more complex data sets and charts. Lastly, you must consider the device preferences of your audience. Consider the devices they use most frequently: are they accessing dashboards on laptops, tablets, or mobile devices? This will help you make selections optimized for device-specific dashboards. Let's consider an example where this is put into practice. The Adventure Works sales director received a sales performance dashboard that she did not like, as it was difficult to comprehend the visuals on the dashboard. Realizing she is unable to use the current dashboard to assist in decision-making, she passed the dashboard and underlying reports to you to make the necessary improvements. When you open the dashboard, you look to identify the issues. The dashboard might look impressive at first glance, but there are many problems. Remember, a dashboard should be understandable and actionable, but currently this dashboard is neither; there are data shortcomings as well as design shortcomings. The data shortcomings include: the area chart displaying sales by category is not appropriate here; the donut chart shows sales by country without any legend; the treemap used to display sales by product subcategory is too busy, with too many colors; and the top five products by sales column chart is not relevant to the sales dashboard. With regard to the design, there are a similar number of issues: the sales by year column chart has a negative value but is the same color as the positive numbers; key metrics of the dashboard, such as revenue, units sold, and profit, are not presented appropriately; and overall there is no color and style uniformity in the entire dashboard. Based on a brief analysis, it can easily be concluded that the dashboard is neither understandable nor actionable. Your task is to redesign the dashboard, focusing on key metrics, including the relevant information for salespeople, and visually appealing colors and charts. Let's redesign this dashboard by following these steps. Select visuals that effectively convey your intended message; when you design user-specific dashboards, you might want to import custom visuals in PowerBI to meet the specific needs of your audience. Next, place the most critical information at the top of the dashboard based on the requirements gathered, and use key performance indicator tiles to highlight key metrics. Maintain consistency in your design, including the color schemes, fonts, and layouts; if you choose a color to convey positive figures, ensure it is consistent across all graphs and charts. Ensure you employ responsive design techniques when designing your dashboard: many end users access dashboards from their mobile devices, so you need to make sure the dashboard is visually appealing and functional on smaller screens. Create a narrative flow within your dashboard; text boxes, card visuals, and annotations can guide users through the data visualization. If you implement these best practices to redesign the dashboard, you will create a dashboard that is understandable and actionable: concise, relevant to the sales manager, consistent in theme and color palette, with all charts appropriate for the data type presented. Let's finish this example by outlining some user-specific dashboards you would design for other departments in Adventure Works. For the marketing team, your dashboard would monitor marketing campaign
effectiveness, visualize social media engagement, provide demographic and geographic insights about the target audience, and display competitor analysis on various product lines. If you were tasked to develop a dashboard specific to the customer support team, you would track customer support ticket data, display customer satisfaction scores, provide a real-time view of open tickets and escalations, and highlight frequently reported problems. These are just guidelines; in real-life situations, you need to tailor your dashboard according to your user requirements. Once you have crafted and designed a user-specific dashboard, it is essential to conduct testing and receive user feedback to ensure that the dashboard meets users' needs and expectations; user feedback can especially add value to improved iterations of your dashboard. Creating user-centric dashboards is about two things: is it understandable, and is it actionable? To achieve this, you need to identify your target audience and understand their needs, their data literacy, and the devices they use to engage with dashboards. You should now understand the effective use of visuals, how to remain consistent in your color selection, and how to select the most appropriate data for your audience. Imagine you are working for Adventure Works when you receive a request from your manager, Adio Quinn, who is traveling abroad for a business meeting. They need an up-to-date overview of the company's sales performance in a dashboard format. Adio may not be able to access the dashboard on a large device, such as a computer or laptop, while traveling; therefore, your primary goal is to create and optimize the dashboard so Adio can access the required information on the go using their mobile device. In this video, you will learn how you can optimize dashboards for mobile phones in Microsoft PowerBI. Mobile optimization of PowerBI reports and dashboards is not just a trend; it is a necessity in modern business intelligence applications. There are three reasons in particular why mobile optimization is so important: accessibility, real-time decision-making, and enhanced user experience. First, mobile-optimized dashboards ensure that actionable insights are accessible to users who rely on smartphones as their primary device. The second reason is real-time decision-making: executives, directors, and managers need up-to-date information at their fingertips to make strategic decisions on the go. Lastly, you have enhanced user experience: a well-optimized dashboard improves the user experience, making it easier for users to interact with and understand data. Let's explore how you can optimize the Adventure Works sales dashboard for mobile devices. A dashboard is a single canvas of data visualization displaying the current state of the business, based on underlying reports in PowerBI service. You want to optimize a sales summary dashboard for mobile devices. Log into your PowerBI service; all reports, data sets, and dashboards are listed in My workspace. Select My workspace from the left navigation pane of the PowerBI canvas and select the sales summary dashboard to open it. This is an existing dashboard created from a report published from PowerBI Desktop; in My workspace, dashboards are distinguished by clock icons. Once the dashboard is open, select the arrow beside Edit from the top menu, and then select Mobile layout from the drop-down options. This opens the phone dashboard edit view. The phone layout screen has two panes: Edit mobile layout and Unpinned tiles. The Unpinned tiles pane contains all tiles that are unpinned from the dashboard. You can resize and rearrange any tiles to fit the phone view; the desktop version of the dashboard will not change. You can also unpin any tile from the phone view if it does not fit or is not needed. In the Edit mobile layout screen, the tiles of the sales summary dashboard are not in the correct order. You can resize, reposition, and rearrange the tiles in the mobile layout. Once you drag and resize a tile, other tiles in the
dashboard adjust their position automatically. Instead, select Unpin all tiles from the top menu bar; this will unpin all tiles and move them to the Unpinned tiles pane, allowing you to start the design from scratch. You can now pin individual tiles to the mobile layout pane and resize them in sequence. The three card visuals contain a snapshot of information about sales and profit, so pin these three card visuals to the top of the mobile layout screen; select the pin icon in the top right corner of each tile to pin the visual to the mobile screen. Next, pin the yearly profit tile to the mobile screen below the card tiles. You can pin the sales by year and sales by category tiles side by side below the yearly profit tile, and then pin the sales by country and sales by salesperson tiles below the existing tiles. You can enlarge the sales by salesperson tile to display the entire data set. The top five products tile is not related to the sales summary dashboard and is not needed on the mobile screen, so you can leave that tile in the Unpinned tiles pane. You can resize and rearrange the tiles according to your analytical and audience requirements. If you are still unhappy after you have completed these changes, you can either reset tiles or unpin all tiles: Reset tiles returns the dashboard to its original state, while Unpin all tiles moves all tiles from the phone screen to the Unpinned tiles pane. When you're satisfied with the phone dashboard layout, you can switch to web view by selecting Web layout from the top menu bar; PowerBI automatically saves the mobile layout. Once a dashboard has been completed, you can view it on your cell phone: you will need to download and install the PowerBI mobile app and log into your account, where all dashboards are listed in My workspace. The ability to access and act on data insights while on the move is an essential element of today's fast-paced business landscape. By ensuring your mobile dashboards are accessible, enable real-time decision-making, and enhance the user experience, you will set yourself up for success. Optimizing PowerBI dashboards for mobile devices ensures that decision makers have access to the data they need, when they need it, leading to better and faster decisions. Given the number of data sources available, a single dashboard can never display all of the available data, so as a data analyst you must manage multiple dashboards and reports in Microsoft PowerBI. Let's say you need to design multiple, similar dashboards, for example for managers in different countries. Designing each dashboard from the beginning each time is not good practice. In this video, we will explore features in the Microsoft PowerBI service that can accelerate your workflow when creating and managing multiple dashboards. There are two different workflow approaches you can use in PowerBI service: making a copy of a dashboard, and pinning elements from one dashboard to another. There are many occasions when a copy of a dashboard helps your workflow. These include using a dashboard as a template, testing dashboard versions, making regional versions of a dashboard, and working with databases that have the same data structures and types. You can use an existing dashboard as a kind of template to create a new dashboard; use this technique when you work on scenarios that closely resemble each other in terms of structure and flow of information. The procedure is to build the first dashboard, copy it, rename it, and then edit this copy, modifying it to reflect the second data scenario. To test dashboard performance, create a duplicate of a dashboard, modify it, and test its performance against the original version. For global operations, you may need to create slightly different versions of a dashboard to match the culture, language, or norms of various countries or regions. And when you get a new database that has the same data structure and types as an existing data set, you can duplicate the original dashboard and
use it as a template for the new data set. The second technique for handling multiple dashboards in PowerBI service is copying a visual element between dashboards. For example, imagine you have a custom visual tile in one dashboard that you want to include in another dashboard in your workspace. You can simply pin the tile from one dashboard to another without navigating back to the original report. The source of the tile does not change, meaning the pinned tile links back to the original source report where it was created; if the original content changes, all dashboards the tile is pinned to will also be updated. To create and copy dashboards, you must use the Microsoft PowerBI service. You can view dashboards in the Microsoft PowerBI service and in Microsoft PowerBI mobile, but dashboards are not available in PowerBI Desktop; therefore, you need to publish all your reports to PowerBI service before creating and managing dashboards. To create a copy of a dashboard, you must be the creator of the dashboard: if someone on your team shared a dashboard with you, you cannot duplicate it, and you cannot pin tiles from dashboards shared with you, only from dashboards you created. Let's open PowerBI service and explore some techniques for managing multiple dashboards. To duplicate a dashboard, log into your PowerBI service and open the workspace that contains your dashboard. Select the dashboard to duplicate from My workspace, navigate to File, and select Save a copy from the drop-down. A Duplicate dashboard dialog opens, where you need to give an appropriate name for the duplicated dashboard; then select Duplicate. The duplicated dashboard is saved in the same workspace as the original. Now the dashboard can be opened and modified to satisfy the analytical requirements: some of the tasks you can perform include moving, resizing, and deleting tiles; adding or pinning new tiles; and sharing your dashboard with colleagues and team members. The next task is to pin a tile from one dashboard to another. Open the product sales dashboard from My workspace and hover the cursor over the tile to pin, then select More options and select Pin tile from the drop-down. In the Pin to dashboard dialog, select from the drop-down either an existing dashboard to pin to, or create a new dashboard and pin the tile to that. When you select Pin, a success message appears in the top right corner, indicating the visualization has been pinned to the selected dashboard. Open the dashboard to check the pinned visual; further operations can now be performed on the pinned visualization, like resizing, renaming, and moving. You can duplicate a dashboard and pin a tile from one dashboard to another in the Microsoft PowerBI service. In real-world data analysis, working on many dashboards and reports is frequent practice, and being able to quickly replicate a dashboard and copy visual elements between dashboards is a valuable addition to your skill set. Content with a visual always attracts more viewers than non-visual content. Visually rich media, such as photos, images, videos, and animations, significantly contribute to the impact of content: eye-catching visuals help to onboard and engage viewers, while informative visuals enable them to focus on and understand your message. In this video, you'll discover media elements you can integrate into your dashboard and explore the benefits they bring to your workflow. Microsoft PowerBI service supports many media types in a dashboard, including text boxes, images, videos, web content, and live streaming or real-time data. There are many benefits to using media elements, such as their ability to enhance data context, create engagement, reinforce branding, provide instructions, and present a summary. Visual content such as images and videos provides context for data; for example, you can use images to display product photos, company logos, and location maps, and use video footage, such as a manufacturing or promotional video clip, to help users understand the data being presented. Still images and motion graphics make dashboards more engaging and assist
effective storytelling; videos or animations, for instance, can be included to narrate the story behind the data, making it more relatable and impactful. Reinforce an organization's branding by including company logos and product images in your dashboard; animations and video clips about a company's corporate culture, manufacturing process, or marketing campaigns are some examples of what can be included. The use of short video clips containing instructions on how to navigate dashboards and interact with data effectively is another helpful application of media in dashboards. Images and icons can be used to present a visual summary of data, making it easier to quickly grasp key insights. You can also include live streaming as a media element in a dashboard: PowerBI's real-time streaming updates your dashboard data automatically and constantly, and any PowerBI visual or dashboard can be used to display and update real-time data and visuals. The streaming data that feeds your updates can come from social media; from sensors, such as a point-of-sale terminal or sensors detecting changes in light, heat, or motion; from service usage, such as metering the consumption of power or other utilities; or from any other time-sensitive data. There are three types of data sets designed for display on real-time dashboards and tiles: push data sets, streaming data sets, and PubNub streaming data sets. A push data set is where the data is pushed to PowerBI service from a live streaming source, such as SQL Server. When the data set is created, the PowerBI service automatically creates a new database in the service to store the data. With a push data set, you can create visuals, reports, and dashboards as with any other data set; because the data is stored in PowerBI service, you can pin any visual from your report to the dashboard, where the visuals are updated in real time whenever the data is updated. PowerBI only stores data from a streaming data set in temporary caches, which expire quickly. With a streaming data set, the data is also pushed to PowerBI service from a source that is constantly updating, like SQL Server, Amazon Web Services, Oracle, and so on. A streaming data set is not stored in PowerBI memory; as a result, it has no underlying data set physically saved in PowerBI. That means you cannot use regular report functionality in PowerBI, like filters and slicers, for drill-down and interactivity. The only way to use a streaming data set is to add a tile to your dashboard and use the streaming data set as a data source, called custom streaming data in PowerBI service; the tile is then optimized to quickly display real-time data. You can choose any visual you want on the tile, and the benefit of a streaming data set is that the visual always displays live data. You can also use something called the PubNub streaming data set. PubNub is a platform for building real-time applications. It works with a minimum of delay, which is called low latency; this is because no data is pushed to PowerBI, and all real-time data is live-streamed from PubNub. It is a solution that has high reliability and is scalable, meaning that its reliability and performance are retained as your audience grows. This is a vital feature, since your audience will expect real-time changes to be instant regardless of how many viewers are online; PubNub manages this by being scalable over globally distributed data centers. PubNub is compatible with platforms across web, mobile, and the Internet of Things, and PowerBI is one of the platforms that can read an existing PubNub data stream. The PowerBI web client uses the PubNub software development kit, or SDK, to read an existing PubNub data stream; the PowerBI service stores no data because the web client makes this call directly. You must list traffic from your network to PubNub as allowed. Like a streaming data set, PowerBI does not store the data, so you cannot use any report-building functionality; you visualize a PubNub streaming data set by adding a tile to your dashboard and configuring a PubNub data stream as the data source. Tiles based on a PubNub data source are optimized to quickly display real-time data. PubNub is a streaming service, meaning it is a platform that helps build and operate real-time interactivity for mobile, web, and Internet of Things applications; it is useful for real-time use cases that require security, scalability, and reliability. To summarize, the three types of data sets you can use to display real-time data in PowerBI are push data sets, streaming data sets, and PubNub streaming data sets. With a push data set, you can create reports and visuals as you usually do with an imported data set, and then pin the visual to the dashboard. Streaming data sets and PubNub streaming data sets are not stored in PowerBI memory and therefore do not allow you to create report visuals; to use those, you create a dashboard tile and connect the live streaming data set directly to the visual on the tile. Choosing a streaming method depends on factors such as where the data set is hosted, what the analytical requirements are, and what infrastructure your organization has available. Live streaming brings many benefits: live updates enable users to access current data in real time, which is especially valuable for monitoring rapidly changing metrics or critical data points; dashboards with live updates can include alert mechanisms that trigger notifications when specific conditions are met; live data streaming allows organizations to respond quickly to market changes, operational disruptions, or emerging trends; team communication is improved through real-time collaboration; and live data updates enable organizations to adjust forecasts and strategies based on the most recent data. Incorporating media elements like still images, motion graphics, and live streaming updates helps transform your PowerBI dashboard with dynamic, engaging, real-time visuals. These visuals not only enhance the user experience but also empower users to respond
quickly and make decisions about changing business conditions a sales summary dashboard that you created has all the required sales data but it fails to engage the audience the addition of media elements can help in this video you’ll learn how to add and format dashboard media elements to help enhance user experience powerbi service allows you to incorporate media elements such as still images and motion graphics into your dashboard log into a PowerBI service account open the sales summary dashboard from my workspace we’ll add three media elements to the dashboard a text box a still image and a video clip you need to add a tile to your dashboard to place an image text box or video select add a tile from the edit drop-down the add a tile dialogue appears where you can select the media type to add a dashboard heading select the text box and select next the add a text box tile window appears on the right side of the screen where the title and description can be added add text to the content section such as this dashboard displays the most up-to-date sales information of Adventure Works next format the text to increase the size color and indentation change the font size to 16 bold the color to black and center it tick the check box to display the title and subtitle of the tile you can also set a custom link and add either an external link or a link to another PowerBI dashboard or report from my workspace hyperlinks can also be added to the content section of the text box next let’s add the Adventure Works logo to the dashboard if you want to place your company logo or any other image to your dashboard you need to publish the image online and create a URL link with http:// or https:// you must also make sure that security credentials are not enabled to access the image you cannot add SVG file types to a PowerBI dashboard from the add tile window select image and then next in the detail section to display the title and
subtitle checkbox when placing something like the Adventure Works logo you don’t need to enable the title and subtitle now to enter the image URL the Adventure Works logo is already published to Google Drive and the URL was generated without any security credentials which is added here to the URL section to hyperlink the tile select set custom link and then select external link you need to enter the URL of the external source to make the tile a hyperlink select apply and a logo image is added to the dashboard and you can rescale and reposition the tile within the dashboard the last media element to add is a video only YouTube and Vimeo links are supported from the add tile window select video a video information window appears where you need to add information about the video to display the title and subtitle of the video tick the check box display the title and subtitle we will leave the title and subtitle off for this demonstration add a video URL to a clip hosted on YouTube or Vimeo to add the hyperlinks tick the check box set custom link under functionality select external link and add the video URL you can add the video link to open in a new browser tab or add a link to an entire playlist viewers can watch the video on the dashboard tile and also select a hyperlink to navigate to the entire playlist to watch further videos in the same tab select the no option from the open custom link to open the custom video link in a new tab select apply a video tile is added to the dashboard and you can resize and reposition the tile as needed once you add a media tile to your dashboard you can go back and make any changes to the text box change the video URL and so on to make changes select the tile and hover the cursor on more options indicated by three dots on the top right corner of the tile and select edit details then the edit tile window opens where you can make and apply changes to the media tile you should now be familiar with adding media elements to the
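The image-URL rules described above (published at an http/https link, no security credentials, no SVG files) can be pre-checked before you paste a link into the tile. The helper below is an illustrative sketch, assuming a simple scheme-and-extension check rather than any official PowerBI validation:

```python
from urllib.parse import urlparse

def is_usable_tile_image_url(url):
    """Rough pre-check mirroring the rules above: the image must be
    published at an http(s) URL, and SVG files are not supported on
    dashboard image tiles. Credential checks cannot be done from the
    URL alone, so they are out of scope here."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    return not parsed.path.lower().endswith(".svg")

# Hypothetical example URLs, not real image locations.
print(is_usable_tile_image_url("https://example.com/adventure-works-logo.png"))  # True
print(is_usable_tile_image_url("https://example.com/logo.svg"))                  # False
```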
dashboard and formatting them to help create an engaging and captivating user experience with the help of images and videos you can transform your dashboard into an immersive and informative tool you don’t ever want your end users to have to type in a URL they may not type it at all because it’s too much effort or worse still they may type it incorrectly fail to reach your site and give up a QR code is a better solution that avoids the end user having to type in anything it’s short for quick response code a QR code is a two-dimensional barcode that contains information in a machine-readable format qr codes consist of black squares arranged on a white square grid typically in a square shape qr codes can store different types of data including text URLs contact information phone numbers and more qr codes are a valuable addition to PowerBI dashboards and reports they enhance user interactivity and data accessibility qr codes are useful in PowerBI dashboards because codes can be generated for specific reports and dashboard tiles in Microsoft PowerBI service users can scan the QR code using their mobile devices to instantly access the associated content without any manual navigation this feature is especially useful for on-the-go access to critical information external web sources or documents can be linked to QR codes providing users with additional context or supporting information related to dashboard data qr codes can be used to gather user feedback or conduct surveys directly from the dashboard since QR codes are mobile friendly they align with the growing trend of mobile business intelligence users can scan codes using their smartphones making data consumption more convenient and accessible the marketing department can use QR codes for instance linking to promotional materials or campaigns related to the data presented on the dashboard you can create a QR code for a dashboard tile in PowerBI service or for a PowerBI report to better understand the use of QR codes
consider this scenario to help manage sales reporting and streamline order placement Renee the Adventure Works marketing manager wants to have quick and easy access to key sales metrics she also wants to share the measures with the sales team to track the sales progress using PowerBI service you can fulfill her analytical needs by adding the power of a QR code Renee can share the QR codes among her team members and any stakeholders to give them quick access to relevant data let’s explore PowerBI service and discover how to generate a QR code for a report or dashboard tile in PowerBI service you can generate QR codes for either the entire report that you published from PowerBI desktop or for an individual tile of a dashboard you can create a QR code in the PowerBI service for tiles in any dashboard even in dashboards that you cannot edit let’s check both processes log into PowerBI service and open the sales summary dashboard in the dashboard there is a tile representing sales by salesperson you can generate a QR code for this visual element of the dashboard select the more options from the upper right corner of the tile represented by three dots and select open in focus mode from the drop-down powerbi opens the visual in a full screen in focus mode select more options from the upper right corner of the menu bar and choose generate QR code from the dropdown a dialogue with the QR code appears from here you can scan the QR code or download it as an image which can be shared by email or print to display it in an office or a public place where colleagues can access the information if you want to print the QR code make sure to print it at 100% or actual size if the data in the tile is updated the sales manager can monitor the sales performance you can select exit focus mode to go back to the dashboard next to generate a QR code for the entire PowerBI report open the Adventure Works PowerBI report from my workspace select file and choose generate QR code from the
drop-down a dialogue with the QR code appears and you can use the QR code as mentioned previously you can scan the QR code from the PowerBI app on a phone to directly access the visualization qr codes can be generated using the built-in capabilities of Microsoft PowerBI both for a dashboard tile and an entire PowerBI report strategic integration of QR codes and PowerBI can streamline the workflow leverage the power of mobile technologies and enhance the user experience whether it is for efficient data access or engaging user interaction QR codes are a valuable addition to your PowerBI dashboards and reports have you ever accidentally started watching a film halfway through remember how confused you felt and how many questions you had to ask the other viewers before you finally understood the character and the plot if a Microsoft PowerBI report or dashboard does not tell a cohesive story then the employees and stakeholders who view them can feel a similar confusion transforming raw data into a meaningful narrative is a vital skill for the data analyst effective data storytelling serves as a bridge between the analysis of the data and communication of the results it combines the art of storytelling with the science of analytics to convey insights and findings in a compelling way with a multinational organization like Adventure Works where employees and stakeholders are spread across different regions effective data storytelling is particularly important in this video you will explore the main components of data storytelling and discover the benefits of a good data story data storytelling is the art of using data and visuals to build compelling narratives which helps to convey a message highlight trends and engage a wide audience at its core it involves presenting data in a way that captures attention facilitates understanding and informs decision-making you can achieve effective storytelling by combining three distinct components in a well scripted way which can lead 
the report users to the insights produced by your analysis let’s explore those components at the core of data storytelling is the data itself this includes raw information facts and statistics that you have collected when the data has been processed and analyzed you can then identify the primary message you want to convey the use of a business analytic tool such as PowerBI can help to provide the context throughout your data story in addition the data provides the context that the audience needs to interpret the analysis presented to them next you design the journey the audience will take towards your primary message identifying the start and end points and any key data points along the way a narrative provides structure context and meaning to your data a well-crafted narrative explains the significance of data outlines the key findings and guides the audience through the story’s progression it might include explanations interpretations and implications based on data insights data visualization is the representation of data using charts graphs maps and other visual elements by choosing appropriate and effective data visualizations you allow viewers to quickly grasp information viewers can identify the trends patterns and insights that might be challenging to discern from raw data alone in the context of data storytelling visual elements educate your audience on your proposed theory by creating a connection between the visual elements and your narrative you can engage the audience and present both detailed and summarized data points these three components work together to create a data-driven story that communicates information and insights effectively and can even create an emotional response the data provides evidence substance and context visualizations aid in comprehension and the narrative ties everything together into a cohesive and compelling data story effective data storytelling can have a positive impact on the stakeholders directly involved and your
organization as a whole some benefits of successful data storytelling include engagement engaging stories capture and hold the audience’s attention this engagement is vital for conveying critical messages next is enhanced understanding good data storytelling simplifies complex information and highlights key points making it accessible to a broader audience the visualizations and narratives help them to understand data-driven insights without requiring them to have advanced technical knowledge to capitalize on this you need strong communication data storytelling ensures that analysis is not limited to data analysts or data scientists it facilitates communication between different departments and disciplines within an organization fostering collaboration at the heart of data-driven stories is the purpose of solving problems data-driven stories help identify problems and opportunities by revealing patterns and trends it also encourages proactive problem solving through business analytic tools lastly there is effective reporting whether you are working in research business or academia data storytelling enhances the effectiveness of reports and presentations it transforms dry data into engaging narratives that captivate audience attention and involvement data storytelling is a transformative approach to data analysis and communication you can leverage the power of narrative data and visualization to convey insights effectively by mastering data storytelling you can add value to your data and insights and offer value to your audience and industry when you think about data and the story it can tell you need to think of it as a traditional story that you’ve read in books or watched in movies it contains the same elements of traditional stories like a setting characters a situation of conflict overcoming this conflict and a resolution to the story as an analyst you need to build your data story around these traditional storytelling methods by the end of this video you will
have explored how elements of traditional storytelling can be translated to your data story in Microsoft PowerBI data contextualization establishes the environment and background against which the data-driven story unfolds your setting includes the details about the data sources the time frame and the broader context in which the analysis takes place for instance if you are analyzing sales data for a specific year in Adventure Works the setting would include details about the industry the market conditions and the company’s current financial status next up are the characters of your data story these are the individuals involved in the analytical process this includes data analysts data scientists and other stakeholders such as business leaders collaborators and external partners in a data story each character plays a unique role data analysts are the main characters who explore and interpret the data the main audience of your analysis such as CEOs or directors are supporting characters to the data story stakeholders are impacted by the insights derived from the data like many great stories conflict is central to your data story in this context the conflict is the business problem or data challenge it is the central issue that the data analyst aims to resolve for example your problem could be a decline in sales a drop in customer satisfaction or any other business issue determined through data analysis the conflict sets the stage for your analysis and drives the story towards the resolution finally there’s the resolution to the data story the resolution in the data story is the result of the analysis where insights are presented and actionable recommendations are made the resolution should provide a clear path of action based on data-driven insights and findings for example if the conflict is declining sales the resolution might involve strategies to boost sales like targeting specific customer segments launching a season-specific marketing campaign and so on let’s
explore how as a Microsoft PowerBI data analyst you would implement story elements to address a real-world challenge at Adventure Works the story unfolds at Adventure Works headquarters where the company’s CEO Jaime is meeting with leadership to discuss the declining sales of Adventure Works products threatening the company’s future as a PowerBI data analyst and report designer you are the main character of this data story you are determined to uncover insights and anomalies from the data that will lead the company out of its sales slump a secondary character is the Adventure Works CEO Jaime jaime is considered a visionary CEO known for her adventurous spirit and belief in the company’s potential she is eager to make strategic decisions based on your analysis to move the company towards new heights the challenge facing Adventure Works is a steady decline in sales over the past two years the decline is causing concern among various stakeholders of the company including Jaime the executive leadership recognizes the company needs a data-driven solution to identify the reason for the decline and devise strategies to reverse the trend as the principal analyst you explore the company’s sales data from this 2-year period you investigate customer demographics seasonal trends and product performance through effective data visualization you uncover three significant insights first the sales of mountain bikes have outperformed other products in the same subcategory during the spring and summer months secondly by delving into customer feedback you discovered a compelling pattern of customers consistently praising the durability and quality of Adventure Works mountain bikes lastly you revealed a correlation between decreased marketing efforts and the months of declining sales based on your results it became clear that the company’s reputation for producing rugged and durable products is a hidden gem that can be capitalized on and that a consistent and effective marketing
campaign is the missing piece of the puzzle to increase sales now you have reached the resolution of this data story after working on data visualization and exploration you presented your report to the executive meeting and the CEO the committee decides to immediately address the identified issues based on your findings the marketing team drafts a roadmap to focus their efforts on promoting the durability and quality of their mountain bikes based on these findings the CEO Jaime provides a directive to the marketing director to increase the campaigns by targeting the competitive advantage Adventure Works has over their competition reliability with a data-driven strategy in place Adventure Works can now embark on a new journey as the company emphasizes the durability of its bikes and expands into new markets Adventure Works reignites their essence of exploration and sales begin to rise once more you have crafted a data-driven story of transformation for Adventure Works through data analysis and storytelling the company identifies outliers correlations and patterns to their problem this insight helps the company to rediscover its core strength and plan its future efforts accordingly a collection of numbers and charts on a report canvas in Microsoft PowerBI does not always tell a captivating story however with the science and art of data storytelling you can turn data context into your story setting turn stakeholders into characters and frame a business problem into a conflict and resolution the data storytelling process is an integral part of presenting data analysis it involves transforming data-driven insights into a narrative that is engaging and informative and leads to action and resolving the conflict in this video you will delve into the full process of data storytelling and how you can relate it to the data analysis process let’s start by outlining the eight steps you will cover they are goal data collection and preparation data analysis and exploration data
visualization audience consideration communication feedback and iterations and actions and decision-making the data analysis process typically begins with defining a clear goal and a hypothesis of what you expect to uncover in your analysis analysts theorize about the relationship between the variables in the analysis and what they expect to discover from the data connecting this to data storytelling it is crucial to understand what message or insight you want to convey through the data this end goal guides the entire storytelling process data is collected from a source cleaned transformed and prepared for analysis as you learned in previous lessons this process might include merging data sets removing errors and duplicates handling missing values and so on in data storytelling your work begins with prepared data therefore it is essential to have a well-structured data set that aligns with the goal of your story this ensures that the story is based on accurate and relevant information the data analysis and exploration stage involves statistical analysis hypothesis testing and data exploration techniques to uncover patterns trends and relationships in the data these findings are the heart of data storytelling you need to select the most critical insights that align with your goal such as key trends correlations anomalies or any other significant findings visualization is the key component of data analysis allowing you to explore and communicate data patterns effectively it plays a significant role in determining how receptive your audience is to receiving complex information to create effective visuals to support the goal of your story you need to choose the appropriate chart type relevant to your data effective visualization can help to reveal patterns trends and findings from your data provide context interpret results and articulate insights streamline data so your audience can process information and improve audience engagement you need to create a dashboard
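The preparation step above (merging data sets, removing errors and duplicates, handling missing values) can be sketched in plain Python; the row structure and the fill rule here are illustrative assumptions, not part of the course material:

```python
def prepare_rows(rows, fill_value=0):
    """Drop exact duplicate rows and replace missing (None) values,
    mirroring the clean-and-prepare step described above."""
    seen = set()
    cleaned = []
    for row in rows:
        # Handle missing values first so duplicates are compared after filling.
        filled = {key: (fill_value if value is None else value)
                  for key, value in row.items()}
        signature = tuple(sorted(filled.items()))  # hashable row identity
        if signature not in seen:
            seen.add(signature)
            cleaned.append(filled)
    return cleaned

# Made-up sample rows for illustration.
raw = [
    {"region": "EMEA", "sales": 100},
    {"region": "EMEA", "sales": 100},   # exact duplicate, removed
    {"region": "APAC", "sales": None},  # missing value, filled with 0
]
print(prepare_rows(raw))
# → [{'region': 'EMEA', 'sales': 100}, {'region': 'APAC', 'sales': 0}]
```

In practice this step is usually done in Power Query or pandas rather than hand-written loops, but the logic is the same: fill, deduplicate, keep only rows that support the story's goal.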
using data visualization tools in PowerBI to present these findings a data dashboard is used to manage information and for business intelligence a dashboard provides a single canvas to organize and present valuable information in a logical sequence the dashboard is the single location where the audience can understand the connections between the data story and the hypothesis you made initially data storytelling places a strong emphasis on the audience you need to tailor your story to your audience’s background their knowledge of the topic and business requirements the narrative is designed to resonate with the audience data storytelling involves dynamic and engaging communication this includes presentations interactive reports and dashboards you need to collect feedback from team members and other stakeholders which helps you refine your narrative visuals and overall storytelling approach to better meet your audience’s needs data storytelling is not just about providing information it aims to inspire actions having established your goal at the start of the storytelling process it should link back to the actions and decisions the compelling visuals and narrative aim to motivate stakeholders to make informed decisions backed up by accurate data and insights presented data storytelling is changing the way we consume information storytelling with data imparts a human dimension to often complex and cryptic data sets filled with numbers and statistics crafting a narrative plays a role in this process but the ability to comprehend and convey information is crucial for constructing a compelling narrative and leading to effective decisions congratulations on completing dashboard design and storytelling in Microsoft PowerBI you learned about using design principles to improve the visual impact of a dashboard and tailoring the design to the users interacting with the dashboard you also explored data storytelling and how it is a compelling way of transforming raw data into a 
data narrative that informs engages and inspires action let’s recap what you learned and the key takeaways from each topic you began by learning about improving dashboard and report design in Microsoft PowerBI dashboards are created in PowerBI service and are based on underlying reports dashboards are typically a single canvas of information presenting the current state of the business reports are designed from a variety of data sources in PowerBI desktop and typically contain multiple pages reports support the use of slicers and filters to enhance interactivity for users having established your knowledge of dashboards and reports you then learned about how to identify and focus on the end users in an Adventure Works scenario reports generated with data from various sources may contain information about the company’s inventory or sales the growth of the company in different regions about salesperson performance or best performing product categories the purpose of your analysis is a dashboard that contains only the relevant information needed by your target audience for example if you want to design a dashboard for the finance department you first need to identify the relevant data from the available data set you must visualize and present the information necessary for the finance team with all irrelevant data omitted when creating a user-centric dashboard your ability to prioritize and visualize relevant data is a major step in engaging your audience you then learned about optimizing dashboards for mobile phones in the lesson you learned how to optimize dashboards for cellular devices how to allow for accessibility considerations and how to create dashboards for real-time decision-making and an enhanced user experience keep in mind though you need to be the owner of the dashboard to make any changes having completed the lesson on improving dashboard design you then learned about other dashboard elements you learned about working with multiple dashboards
specifically how to duplicate a dashboard duplicating dashboards is especially important when you need to test the performance of a new dashboard with slight variations or to distribute a slightly different dashboard for other departments or regions another tool that you learned about is pinning a specific tile from one dashboard to another you can pin the tile from one dashboard to another without navigating back to the original report the source of the tile does not change meaning that the pinned tile links back to the original source report where it was created you then learned about incorporating media elements such as images videos animations and text boxes to your visualization you learned about types of media which can positively impact the dashboard and its engagement with the audience you learned in this lesson how to add and edit various media files to the dashboard from PowerBI service you also learned what factors you must consider to ensure they work correctly for example an image file can only be displayed when it is published online with a URL without security credentials lastly in this lesson you gained hands-on experience in creating QR codes for various dashboard tiles and entire reports in PowerBI service a QR code is a feature that enables you and business users to access the most critical information on the go this can also be used to collect feedback conduct surveys and add external web links to your dashboard the last lesson in this module covered the principles of data storytelling data visualization and narrative are the three fundamental components of data storytelling effective data storytelling can have a positive impact on the overall analytical process benefits of data storytelling include engagement enhanced understanding communication problem solving and effective reporting next you went through an example of data storytelling for Adventure Works you learned about the principles of setting a stage identifying the conflict
assigning the roles to various characters of the story and conflict resolution throughout the storytelling process then you learned about the storytelling process via eight steps they are goal data collection and preparation data analysis and exploration data visualization audience consideration communication feedback and iterations and actions and decision-making in the context of data analysis these steps cover the entire process from data collection and cleaning to data-backed decision-making in real-world scenarios you will come across examples of poor storytelling which need to be improved before they are presented to your audience choosing the wrong chart type designing a random dashboard canvas and inconsistent use of colors are all common mistakes you need to avoid while crafting a dashboard for your data story you should now have a better understanding of how to optimize your dashboard visuals and how to incorporate data storytelling best practices to create effective dashboards and reports the skills you’ve learned over these weeks will enable you to create data stories that capture user attention enable them to recognize the goals of your data analysis and generate effective solutions for your business congratulations on completing this course on creative design in Microsoft PowerBI microsoft PowerBI is not just an analytical tool it provides opportunities to implement creativity into your reports and designs to better engage dashboard and report users let’s recap what you have learned over the last few weeks reflecting on the key takeaways you started your learning journey by exploring color theory and the key role of color in building reports color theory is the collection of design rules and guidelines used to communicate with users through color schemes you applied color theory and the role of color principles to improve a report for Adventure Works following on from this you explored appropriate positioning and scale of information while designing your
PowerBI reports strategic placement of visual elements such as charts and graphs in a logical sequence within reports increases their user impact in addition consistent scaling within various chart types in accordance with the data type and structure also ensures the effectiveness of design next you learned how to avoid chaos in your PowerBI reports maintaining cohesion and consistency in your report building you also implemented the principles of chaos and cohesion practically to generate a cohesive design in PowerBI throughout this course you learned that the key to successful visualization is knowing your audience you must tailor your PowerBI presentations to meet the needs and preferences of those interacting with and using them during this lesson you learned how several factors such as job role user objectives information needs and cultural considerations influence your audience’s requirements you then switched to another crucial factor that plays a pivotal role in report design and that is age differences in your audiences colors are significant when designing PowerBI visualizations for various age groups appropriate formatting of a report that reflects the analytical message concisely while maintaining the design principles is key in report design and finally an important aspect of working with data is data security you learned about keeping data secure through data anonymization and how it can be achieved now let’s turn our attention to visual clarity in reports visual clarity at both chart level and report level affects the impact of your reports in this lesson you explored how to choose the correct chart type for the type of data you are visualizing you learned the data type the message and the audience all play a role when selecting a chart type branding visual hierarchy and the business objective are some of the factors that impact your visual clarity at report
level next you covered both theoretical and practical aspects of accessible report design in Microsoft PowerBI many built-in tools can be employed to consider people with visual impairments while retaining an engaging and compelling report design following this you gained a thorough understanding of important chart types in PowerBI you gained hands-on experience in designing a key performance indicator or KPI chart a dot plot chart and a bubble chart a KPI chart is significant as you can visualize the current values against a predefined target value with trend axis in place a scatter plot chart along with its variations dot plots and bubble charts are of special significance because of their ability to display multi-dimensional and high-density data in a single visual with these charts you can visualize categorical information on the chart’s x-axis having delved into the topic of charts you also explored advanced tools within PowerBI desktop to display complex data structures like tree maps heat maps and drill through and drill down functionalities of PowerBI to conclude this section on visual clarity in reports you learned how to optimize your PowerBI reports for mobile devices joining the wave of dynamic mobile business intelligence geographical data is the part of every business that requires special visual needs powerbi has various map visuals to visualize the location-based information you explored various map visuals through examples and with a hands-on experience shape maps and choropleth maps also called filled maps are the two most common map visuals azure maps is a new map visual within PowerBI that offers more control and formatting options through map layers to accomplish the growing need to combine visualizations with complex data structures sometimes PowerBI core visuals are unable to fulfill your analytical requirements this is where you can leverage custom visualizations the PowerBI app source provides a range of custom visuals that are developed by
partners and tested by Microsoft for quality and accuracy. You learned how to download, install, and format a custom visual in your core PowerBI visualization pane. You also gained a thorough understanding of everything from installing Python to using it for custom visualization; Python, along with its rich and versatile visualization libraries such as Matplotlib and Seaborn, provides an entirely new avenue of dynamic and interactive visualization within PowerBI.

Having learned about designing powerful report pages, you turned your attention to dashboard design and storytelling. The dashboard is a distinct component of the Microsoft PowerBI ecosystem. You began by exploring the differences between a PowerBI dashboard and a report, as both offer several benefits and serve distinct purposes. A PowerBI dashboard represents a snapshot of information displaying the current state of the business: a single canvas of visualization with key insights and KPIs. A report is designed for granular data analysis and might consist of multiple pages with drill-through and drill-down functionalities. You learned how to publish your report to PowerBI service, create a dashboard, and optimize your dashboard for mobile phones. Remember, you can only create and optimize dashboards in PowerBI service.

The reports you generated using data from various sources might contain information about inventory, sales regions, company growth, salesperson performance, and best- and worst-performing product categories. The product of your analysis is a dashboard that must contain only the relevant information needed by your target audience. In the real world, you need to work on multiple reports and dashboards simultaneously; in this context, you explored ways to streamline your workflow by duplicating a dashboard and pinning a visual element from one dashboard to another.

Media elements are an integral component of a dashboard in the digital era. Adding images, text boxes, and videos to your dashboard can have a significant impact on audience engagement, and you gained practical experience in integrating media elements such as images and videos into your dashboard. The fast-paced business landscape requires continual access to up-to-date data, and PowerBI's live streaming capabilities allow you to integrate real-time data into your dashboard for faster, on-time decision-making. You learned that PowerBI service supports three types of live streaming data sets: the push data set, the streaming data set, and the PubNub streaming data set. Only the push data set is physically stored in PowerBI memory, allowing you to build reports on top of the data set.

Effective data storytelling serves as a bridge between the analysis of the data and the communication of the results; it combines the art of storytelling with the science of analytics to convey insights and findings in a compelling way. You gained a thorough understanding of the components of data storytelling, the narrative, the data used, and the visualization, and how these elements weave a data story. Next, you learned the elements and the process of data storytelling through the Adventure Works scenario, crafting an engaging data story for Adventure Works with the eight-step process. The eight steps of data storytelling are: goal; data collection and preparation; data analysis and exploration; data visualization; audience consideration; communication; feedback and iterations; and actions and decision-making. Lastly, you learned that effective data storytelling can have a positive impact on the overall analytical process; its benefits include engagement, enhanced understanding, communication, problem solving, and effective reporting.

As you have now finished your recap of this course, take a moment to reflect on your learnings before embarking on the final project, assessment, and course quiz. Be sure to recap your learnings, additional resources, and previous quizzes, and best of luck as you complete your journey. Congratulations on completing the Creative Design in PowerBI
course! Your hard work and dedication have paid off. You've made significant progress on your data analysis learning journey, and you should now have a thorough understanding of the theory and practice of visualization and design, including the design principles of data display and visualization. This course provided you with a strong creative design foundation in Microsoft PowerBI, allowing you to modify your report designs to build cohesive reports and to produce audience-focused reports aimed at target audiences. You learned that to enhance the comprehension of data and improve the end-user experience, you can apply visual clarity, use multi-dimensional visualizations, insert map visualizations, and implement custom visualizations. Exploring the concepts of dashboard design and storytelling, you compared the design of a dashboard with the design of a report, examined the common steps involved in data storytelling, and discovered advanced dashboard features such as embedding media and QR codes. Your PowerBI knowledge of visualization and design will help you to create better reports and dashboards. Well done for completing another step in your data analysis education!

By passing all the courses in the program, you'll earn a Microsoft PowerBI Analyst Professional Certificate from Coursera. This program is a great way to expand your understanding of data analysis and gain a qualification that will allow you to apply for entry-level jobs in the field, and it will help you prepare for the PL-300 exam. By passing the exam, you'll become a Microsoft Certified PowerBI Data Analyst, which will also help you to start or expand a career in this role. This globally recognized certification is industry-endorsed evidence of your technical skills and knowledge. The exam measures your ability to prepare data for analysis, model data, visualize and analyze data, and deploy and maintain assets. To complete the exam, you should be familiar with Power Query and the process of writing expressions using Data Analysis Expressions, or DAX. You can visit the Microsoft certifications page at http://www.learn.microsoft.com/certifications to learn more about the certification and exam.

This course has enhanced your knowledge and skills in the fundamentals of creative design in Microsoft PowerBI, but what comes next? There's more to learn, so it's a good idea to register for the next course. Whether you're just starting out as a novice or you're a technical professional, completing this program demonstrates your knowledge of data modeling in PowerBI. You've done a great job so far, and you should be proud of your progress. The experience you've gained will showcase your willingness to learn, your motivation, and your capability to potential employers. It's been a pleasure to embark on this journey of discovery with you. Wishing you all the best as you continue to pursue your studies and develop your career.

Working with PowerBI involves working with many different assets, like reports and dashboards. Managing all of these can be a difficult challenge, so we've designed this course to equip you with the skills you need to deploy and maintain PowerBI assets. During this course, you'll explore the role of PowerBI in business, deploying assets in a PowerBI workspace, and the role that security and monitoring play in safeguarding reports and dashboards in PowerBI. Let's take a few minutes to preview what you'll learn.

You'll begin with an introduction to the role of PowerBI in business, with a focus on data flow. Data flow in business refers to the movement of information within an organization. This movement, or flow, occurs in the following stages: collection, processing, analysis, and decision-making. Once gathered, the data is cleansed and standardized; it's then transformed, and data analysts use the refined data to generate insights. The data is analyzed using PowerBI service. This software offers many advantages for analysts: it's accessible, it's scalable, and it offers collaboration tools and data backup and recovery
features. The data analyst is the central figure in this process, possessing important skills and expertise in extracting valuable insights from data. An important skill that all data analysts must possess is an understanding of Structured Query Language, or SQL. Data analysts use SQL to interact with the SQL databases that store the data, and they can connect to a SQL database using Import or DirectQuery modes: Import mode loads data directly into PowerBI, while DirectQuery mode connects PowerBI directly to the source database. An analysis is presented in the form of a report, which can be static or dynamic. A dynamic report explores multiple areas of interest, and its results are presented in the form of visuals. These reports also facilitate what-if parameters that permit interactive adjustments to modify visualizations and generate insights into potential scenarios.

Next, you'll explore how to deploy assets in a workspace. A workspace is a specialized area in PowerBI that holds important assets. There are two types of workspaces in PowerBI: the first is a personal workspace, which you can use to store your content; the second is a shared workspace, where a team can collaborate on reports and dashboards. Workspace roles determine how individuals can interact with workspaces; these roles include Viewer, Contributor, Member, and Admin, and you can manage them using PowerBI's manage access feature. Next, you'll learn how to monitor workspaces. By monitoring a workspace, you can measure its impact and make changes to increase its usefulness.

You'll also explore the topic of data sets and gateways in PowerBI. A data set must contain the latest available information: you can use a scheduled or incremental refresh to ensure accurate data, and you can promote and certify data sets to inform your team where to access the most current and reliable data. You'll also explore establishing a secure, reliable connection between your on-premises data and PowerBI service using data gateways. There are three types of gateways in PowerBI: the on-premises data gateway, the on-premises data gateway (personal mode), and the Azure Virtual Network (VNet) data gateway. Which type of gateway you choose depends on the setup of your organization and its specific data management and security requirements.

You'll also learn how PowerBI deployment pipelines move content through the following life cycle stages: development, testing, and staging or production. Another useful feature for maintaining your workspace is the lineage view, which shows the data journey from source to destination with all the connections in between. Impact analysis shows how changes to your data can affect different assets in your workspace.

Next, you'll explore the role that security and monitoring play in safeguarding reports and dashboards in PowerBI. You'll first explore how to share information safely and identify sensitive data. Sensitive data is essential information that, if leaked, could damage the company's reputation, finances, or privacy. You can safeguard data using PowerBI's authentication tools; you can also use sharing links to control who you share information with, and sharing permissions to determine what they can do with the data. Sensitivity labels are another useful method of safeguarding data. Access to data sets is governed by data permissions, which ensure that only authorized individuals can access data, and you can configure these permissions in PowerBI to safeguard your data. You'll also review row-level security for safeguarding data. Row-level security, or RLS, controls which individuals can view data based on predefined roles and rules. There are two types: static RLS restricts users to specific data, while dynamic RLS uses Data Analysis Expressions (DAX) to adjust real-time data access based on user roles.

Finally, you'll explore subscriptions and alerts in PowerBI. You can subscribe to reports and dashboards; a PowerBI subscription is an automated delivery system that provides daily data snapshots.
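Stepping back to the dynamic row-level security just described: in PowerBI it is typically expressed as a DAX role filter such as comparing a table column to USERPRINCIPALNAME(). The sketch below simulates that idea in plain Python so the mechanics are visible; the user names, role table, and sales rows are all invented for illustration, and this is not PowerBI code.

```python
# Conceptual sketch of dynamic row-level security (RLS): narrow a dataset
# to only the rows a signed-in user is allowed to see. In PowerBI this is
# a DAX filter in a role definition; here we simulate it with a role table.

# Hypothetical role table: user principal name -> regions they may view.
USER_REGIONS = {
    "lucas@adventure-works.example": {"Europe", "North America"},
    "intern@adventure-works.example": {"Europe"},
}

# Hypothetical sales rows the RLS filter is applied to.
SALES = [
    {"region": "Europe", "amount": 1200},
    {"region": "North America", "amount": 900},
    {"region": "Asia", "amount": 700},
]

def rows_visible_to(user: str) -> list:
    """Return only the sales rows the user's role permits (dynamic RLS)."""
    allowed = USER_REGIONS.get(user, set())  # unknown users see nothing
    return [row for row in SALES if row["region"] in allowed]
```

The key property to notice is that the data itself never changes; only the filter applied per user does, which is exactly what distinguishes dynamic RLS from maintaining separate static data subsets per role.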
These snapshots arrive as emails or notifications, and you can use the subscriptions pane in PowerBI to manage your subscriptions. As well as subscriptions, PowerBI also offers data alerts: automatic, customizable notifications that inform users when specific conditions or thresholds have been met or exceeded.

You'll also complete exercises in which you'll put your new skills into practice by helping Adventure Works with PowerBI, knowledge checks that will test your understanding of these topics, and additional resources in which you'll consult Microsoft Learn articles to explore these topics in more detail. In the final week of this course, you'll undertake a project and graded assessment. In the project, you'll prepare, configure, design, and develop a data model for a fictitious online company called Tailwind Traders. Finally, you'll have a chance to recap what you've learned and focus on areas you can improve upon.

Throughout the course, you'll engage with videos designed to help you build a solid understanding of data modeling in PowerBI. Watch, pause, rewind, and rewatch the videos until you are confident in your skills. Then consolidate your knowledge by consulting the course readings, and measure your understanding of key topics by completing the different knowledge checks and quizzes. This will set you on your way toward a career in data analytics and form part of your preparation for the PL-300 Microsoft PowerBI Data Analyst exam. By the end of the course, you'll be equipped with the necessary skills to work effectively with data models in PowerBI. Good luck as you start this exciting learning journey!

Data is integral to business success, but how that data arrives at the business is also important. In this video, you'll learn about the flow of data in business and how it can be managed to help generate insights. Lucas is helping Adventure Works to develop its latest business plan. This requires collecting all available data about the business to ensure that the Adventure Works plan is as informed as possible. It involves exploring what kind of data Adventure Works can analyze, how it makes its way to the business, and the techniques the company can use to prepare it for analysis.

But first, let's begin with the question: what is data flow? Data flow in business refers to the movement of information within an organization. This movement occurs in stages. The first stage is collection, where data is gathered from various sources such as Excel spreadsheets and SQL databases. The second stage is processing, where data is cleansed and transformed to prepare it for meaningful analysis. During the next stage, analysis, advanced analytics and algorithms are applied to the processed data to uncover trends, patterns, and insights that inform business strategies. The last stage is decision-making, during which informed decisions are made based on the analyzed data, guiding actions and adjustments within the business to optimize processes and achieve objectives. There are also processes within a business that govern aspects of data, like how it is acquired, stored, manipulated, and shared to support business operations and objectives.

Let's begin with the first stage, data collection. At Adventure Works, data is collected from a variety of valuable sources. Firstly, the Adventure Works e-commerce platform acts as a primary source, capturing customer transactions, web store browsing behavior, and purchase history. This platform integrates seamlessly with the customer relationship management, or CRM, system, which compiles customer insights and interactions. The point-of-sale systems in Adventure Works physical stores provide real-time data on in-store purchases and customer foot traffic. The company collaborates with suppliers who share inventory and sales data, ensuring a streamlined supply chain. Social media platforms serve as another essential source, offering insights into customer sentiment, engagement, and trends. Once the data is collected, it then needs to be processed. This vast amount of data
is managed through SQL databases that securely store these records in tables. You'll learn more about SQL later in this course; for now, you just need to know that the SQL database is the center of Adventure Works data operations. It links all aspects of the business and provides an overview of business operations and customer interactions, empowering Adventure Works to make informed decisions for continued success. With such a vast amount of information flowing through the system, ensuring the accuracy and reliability of the data is paramount. The two main steps in this stage of the process are data cleansing and transformation. Let's explore these steps more closely.

Data cleansing is the process of examining, correcting, and standardizing incoming data. This removes inconsistencies from the data, ensuring that it's reliable and accurate. For instance, Adventure Works can standardize customer addresses at the data source by ensuring all addresses are collected and stored in the same format, using consistent data types; this provides a consistent foundation for shipping and billing. This process not only refines the quality of the data but also establishes a solid foundation for subsequent analysis. Once cleansed, the data then flows through pipelines where transformation steps come into play. The process of data transformation involves working with aggregations, applying calculations, and enhancing data. For example, Adventure Works can aggregate sales figures from different locations for an overview of regional performances. These pipelines act as a bridge, allowing the data to undergo a series of carefully designed transformations before it's ready for analysis and reporting. This stage of the process ensures that the insights derived from Adventure Works data are precise and actionable, helping to drive informed decisions for the company's continued success. After cleansing and transformation, the refined data is now ready for analysis.
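The two processing steps just described, cleansing then transformation, can be sketched in a few lines of Python. The records, field names, and regions below are invented sample data, not Adventure Works' actual pipeline; the point is only to show standardization followed by aggregation.

```python
# Sketch of the processing stage: cleanse inconsistent incoming records
# into one standard format, then aggregate sales by region.

raw_orders = [
    {"city": " seattle ", "region": "West", "sales": 250.0},
    {"city": "SEATTLE",   "region": "West", "sales": 125.5},
    {"city": "boston",    "region": "East", "sales": 300.0},
]

def cleanse(order: dict) -> dict:
    """Standardize the city field: trim whitespace and use title case."""
    fixed = dict(order)
    fixed["city"] = order["city"].strip().title()
    return fixed

def aggregate_by_region(orders: list) -> dict:
    """Sum sales per region: a simple transformation/aggregation step."""
    totals = {}
    for order in orders:
        totals[order["region"]] = totals.get(order["region"], 0.0) + order["sales"]
    return totals

clean_orders = [cleanse(o) for o in raw_orders]
regional_totals = aggregate_by_region(clean_orders)
```

After cleansing, the two differently formatted Seattle records collapse into one consistent spelling, and the aggregation step produces the kind of regional-performance overview the transcript describes.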
The results of this analysis form the foundation for insightful reporting. For example, Adventure Works can generate sales insights from its regional sales data. These insights then form the basis of a report that offers a clear business snapshot. Now that Lucas has generated the required insights, he passes the report on to management. Once Adventure Works management obtain a copy of this report, they can use its insights to make decisions about the business. The report indicates low sales of the new mountain bike model; based on this insight, Adventure Works might try a new marketing campaign for this model to help improve its sales.

Beyond Adventure Works, various industries harness data in unique ways to drive their operations. For example, the public transportation sector uses data from its routes, travel times, and ticket sales to optimize schedules, allocate resources efficiently, and enhance the overall commuting experience for passengers. Other sectors that make use of data include food companies: those dealing with perishable goods are impacted by weather and temperature, so they must collect and analyze meteorological data. Cold storage facilities rely on real-time temperature monitoring to preserve the quality and safety of products, and they might also increase production in anticipation of a heat wave. These examples illustrate how different sectors leverage data to make informed decisions, enhancing their efficiency and competitiveness in the market. You should now be familiar with the flow of data within a business and how this data is used to generate insights and make decisions. An effective data flow is essential for generating the insights that support informed decision-making in today's data-driven world.

The ongoing management of data is crucial for businesses to make informed decisions, enhance efficiency, and gain a competitive edge. In this video, you'll learn how a company like Adventure Works can leverage its data assets using PowerBI service to become a data-driven enterprise, and the importance of the continued maintenance of these assets. Adventure Works has set a goal of becoming a data-driven enterprise by the end of the year. To achieve this goal, the company must make the most of its data assets, so its data analysts have configured custom reports and dashboards in PowerBI to monitor inventory levels, track customer preferences, analyze market trends, and assess product performance. Let's explore how the company can leverage and manage these assets to drive strategic decision-making.

In a data-driven enterprise like Adventure Works, data isn't just information: it guides strategic choices, informs resource allocation, and maps the pathway for future growth. During this transition to a data-driven mindset, PowerBI service is used to deploy and maintain data assets. As you've previously learned, PowerBI service is a cloud-based platform used for data analysis. It's a centralized hub where teams can collectively work on reports and dashboards, ensuring that everyone has access to the most up-to-date information. This ensures that insights remain current and relevant, and it empowers Adventure Works to make informed decisions swiftly and accurately. Unlike its desktop counterpart, the service offers the following advantages: it's accessible for remote teams, offering flexibility and collaboration across geographic distances; Adventure Works can use the service to scale up or down to accommodate changing business needs, with teams easily adding or reducing resources without extensive hardware and infrastructure investments; PowerBI service also offers real-time collaboration features for documents and projects, improving productivity and teamwork; and it provides data backup and recovery, reducing the risk of data loss due to hardware failures or other unforeseen events.

Now that you're more familiar with its advantages, let's explore how the Adventure Works data analysis team makes use of PowerBI service. As you discovered earlier, Adventure Works can deploy PowerBI service assets like reports and dashboards to monitor inventory
levels, track customer preferences, analyze market trends, and assess product performance, all in real time. Let's find out more about the insights PowerBI service can generate in these areas. PowerBI service can help to monitor inventory data: data analysts can track inventory turnover rate, order fulfillment accuracy, shipping and delivery times, and return rates. Adventure Works can track existing and emerging customer preferences; this information can be used to differentiate its product offerings and stay ahead of competitors. Adventure Works can also use data to analyze market trends: the company can identify opportunities for new product development or enhancements to existing products, ensuring Adventure Works remains relevant, and it can study trends in pricing to adjust costs to stay competitive and maximize profits. Other areas of the business that Adventure Works can monitor include product performance. PowerBI service can deliver information on the performance of individual product lines, including the best- and lowest-selling products, and data from online product engagement and product recommendation effectiveness can guide decisions for the purchasing and marketing teams. This ensures Adventure Works maintains a competitive advantage in a dynamic market.

It's not just retailers like Adventure Works who use PowerBI service. In today's data-driven landscape, businesses and organizations across various industries rely on the continuous maintenance of data assets to help guide decision-making. For instance, in the health care sector, accurate and up-to-date patient records are critical for providing quality care: a hospital's ability to access a patient's medical history in real time can be a matter of life and death. In the finance industry, investment firms require accurate data on stock prices and market trends to make timely investment decisions. And as the Adventure Works examples demonstrated, understanding customer behavior and preferences is vital for online retailers to tailor their offerings and marketing strategies effectively. As these examples show, data assets help to inform every sector of enterprise. You should now be familiar with how a company like Adventure Works can leverage its data assets using PowerBI service to become a data-driven enterprise, and the importance of the continued maintenance of these assets. Whether it's optimizing supply chains, fine-tuning logistics, or tailoring marketing strategies, the need for continuously maintained data assets is universal. Deploying and maintaining assets is not just an advantage but a prerequisite for success in today's business world.

Data analysis is essential, and data analysts are central players in the data analysis process, extracting invaluable insights from raw information. In this video, you'll explore the pivotal role of a data analyst and the profound impact they have on organizational success. Adventure Works relies heavily on data analysts to help make sense of its data and generate insights to drive business success, and there are certain skills and traits a company like Adventure Works looks for in its analysts. Let's find out more about the skill sets Adventure Works values and the contribution that its analysts make to the company. A data analyst is expected to possess specialized skills in statistics, math, and programming. They use advanced tools to analyze big data and find hidden trends and anomalies that others might miss. A data analyst creates reports and visualizations that combine complex information into simplified insights; these reports and summaries help decision-makers to navigate the business landscape. They spot opportunities for improvement, automation, and cost reduction, helping to make processes more efficient and boost the organization's competitiveness. Data analysts also enforce data protection rules: they detect and fix weaknesses, safeguarding organizations from harmful breaches and data leaks. Now that you're familiar with the skills a data analyst must
possess let’s examine some examples of where a data analyst can offer invaluable insights and solutions a data analyst at Adventure Works can employ advanced analytics to segment customers based on behavior demographics and preferences for instance a data analyst might identify a segment of Adventure Works customers who prefer outdoor gear by tailoring marketing messages and promotions to this group the company can increase sales for outdoor related products this enables targeted marketing for higher sales conversion and enhanced customer loyalty data analysts can also use past sales data trends and seasonality to forecast product demand and optimize stock accordingly a data analyst may discover that certain products have a seasonal demand spike by adjusting inventory levels and promotions accordingly Adventure Works can prevent overstocking and reduce carrying costs this leads to higher profitability because Adventure Works can avoid the risk of excess stock data analysts can also generate insights into sales by studying the purchasing patterns of customers to discover which products sell together most effectively through market basket analysis a data analyst might find that customers who purchase hiking boots often also buy outdoor gear adventure Works can use this insight to create bundled promotions that encourage customers to purchase these items together these insights help Adventure Works to meet the needs of its customers and increase its sales in an online industry stopping fraud is vital data analysts use realtime checks to spot suspicious transactions keeping Adventure Works safe financially and protecting its reputation a data analyst may set up alerts for transactions that deviate significantly from a customer’s typical behavior for instance if a customer suddenly makes a high-V value purchase after a history of smaller transactions it could trigger a fraud alert you should now be familiar with the pivotal role of a data analyst and the profound impact 
Data analysts have a profound effect on organizational success and are essential for helping businesses drive insights and progress. As the examples you've just explored demonstrate, data analysts help organizations make informed decisions, improve operations, drive innovation, and reduce risks.

SQL, or Structured Query Language, is a powerful language with many advantages for data analysts working with large enterprise databases. In this video, you'll learn about the importance of SQL, how it helps with data storage and queries, and how it integrates with Microsoft PowerBI. Adventure Works has just hired some new trainee data analysts. It needs these analysts to generate insights from its SQL databases, but several of them are unfamiliar with this tool. Let's explore the answers to some of their questions about SQL to discover how it helps enterprises like Adventure Works.

The first question these new trainees have is: what's a SQL database? At its core, a SQL database is a system for organizing and storing data in a structured format. When we refer to a structured format, we mean that data is structured, or organized, so it can be located quickly when required for analysis. A SQL database excels in handling structured data: its framework is built of tables, rows, and columns. This means that all data is stored in specific categories, and analysts can find the data they need with minimal effort. For example, if Adventure Works needs to retrieve bicycle data for a report, it can create a SQL query that accesses the product category column in the products table, where a list of all bicycle types in stock can be found. As this example shows, a strong business case can be made for SQL databases through their structured and reliable framework. Another advantage of SQL databases is that they facilitate complex queries for quickly extracting specific subsets of data, which is important for generating reports and insights. Data sets are also constantly expanding, which requires scalability, and a larger data set requires more complex methods of data retrieval; you can retrieve data from large databases using techniques like partitioning and indexing. Finally, SQL databases can be accessed by multiple users or applications at the same time: an entire team of Adventure Works data analysts can access the SQL database simultaneously without causing a conflict or slowdown, an important advantage for a business.

As we've discovered, a main advantage of a SQL database is its storage capabilities, so the next question the new data analysts have is: how does this storage work? SQL databases store data using a method called normalization, which you might be familiar with from previous courses. Normalization divides data into multiple related tables, each with a specific purpose; it's like tidying a room by putting similar things in separate boxes. As you discovered earlier, SQL databases also use indexing. Indexing is the technique of assigning a unique number to each row in a table; it acts like a table of contents in a book, making it easier to locate information. As a data analyst, it's also important to understand that the real power of SQL isn't just its storage capabilities. The ultimate benefit of a SQL database is its ability to return information through SQL queries. SQL queries are statements written in SQL; they instruct the database to perform a specific operation, like returning all records in a table or just a specific subset. So you must study the syntax and structure of SQL statements carefully to extract the necessary insights as efficiently as possible. For example, the Adventure Works data analysis team has created a SQL query that returns all bike data from the products table; however, they can also create a more complex SQL query that returns data only on bikes that cost $1,000 or more. The new data analysts are now more familiar with the basics of SQL, so their final question is: how does a SQL database relate to PowerBI? Just like PowerBI, SQL databases are used by businesses of every size.
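The query pattern just described, returning only bikes that cost $1,000 or more, can be demonstrated end to end with Python's built-in sqlite3 module. The table, product names, and prices below are invented sample data standing in for the Adventure Works database, which isn't available here; the SELECT / FROM / WHERE structure is standard SQL.

```python
# Demonstrate a table, an index, and a filtered SELECT using an in-memory
# SQLite database as a stand-in for an enterprise SQL Server.

import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE products (name TEXT, category TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [
        ("Road-150", "Road Bikes", 3578.27),
        ("Mountain-200", "Mountain Bikes", 2319.99),
        ("Touring-3000", "Touring Bikes", 742.35),
        ("Water Bottle", "Accessories", 4.99),
    ],
)
# An index on the category column, like the indexing described above,
# lets the database locate matching rows without scanning every record.
conn.execute("CREATE INDEX idx_products_category ON products (category)")

# Return only bikes costing $1,000 or more, most expensive first.
expensive_bikes = conn.execute(
    "SELECT name, price FROM products "
    "WHERE category LIKE '%Bikes' AND price >= 1000 "
    "ORDER BY price DESC"
).fetchall()
```

Running this returns just the two rows that satisfy both conditions, showing how the WHERE clause narrows a whole table down to the subset an analyst actually needs.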
They help businesses manage and organize data, and by integrating SQL databases with PowerBI, data analysts can use these tools to create compelling visualizations and reports that turn raw data into actionable insights. Having explored the basics of SQL alongside the new Adventure Works data analysts, you should now be familiar with the importance of SQL, how it helps with data storage and queries, and how it integrates with PowerBI. SQL is an essential tool for data analysts, helping to generate the insights businesses need; develop a good understanding of SQL and you'll be an asset to any enterprise.

PowerBI is a powerful tool for extracting data, and it can also be integrated with a SQL (Structured Query Language) database to generate even greater insights into your data. In this video, you'll explore the structure of a SQL database, the steps to connect it to PowerBI, and some examples of connection modes. Adventure Works has recently migrated its data sets to a SQL database, and the company has tasked Lucas with connecting this database to PowerBI so that it can begin to analyze its data. Let's explore the basics of integrating PowerBI and SQL databases, then follow along with Lucas as he establishes the connection.

To begin, here's a quick overview of SQL Server. SQL Server is a relational database management system, or RDBMS, developed by Microsoft. It provides a secure and scalable platform for storing, managing, and retrieving data. SQL servers organize data into structures called databases, where data is stored in tables with rows and columns, making it easy to retrieve and work with specific data sets. Users can interact with SQL databases by creating SQL queries that send instructions to the database.

Your next question might be: how do I connect to a SQL database? Establishing a connection between PowerBI and a SQL database requires three pieces of information: the name of the server, the database name, and your credentials. Here's how these pieces of information work together to provide access: the server name identifies the location of the database server, the gateway to your data; the database name is the database within the server you intend to access; and the credentials are typically the username and password that grant access permission to the server. These details provide a secure and efficient foundation for linking your analytical tools.

There are two primary modes available for connecting your data in PowerBI: Import mode and DirectQuery. In Import mode, data is loaded directly into PowerBI for fast and responsive visualizations; however, the data is static, so it might need to be refreshed to reflect
real-time updates on the other hand direct query mode connects PowerBI directly to the source database this enables real-time analysis but potentially leads to slower performance due to continuous queries to the database which one you choose depends on your business needs when making your decision balance factors like data size update frequency and performance requirements to communicate with this infrastructure you need to construct queries written with SQL for example Lucas can use a basic select SQL query to retrieve sales data from the database the select command initiates the retrieval of data from the database in other words you're instructing the database to select specific data in this query the asterisk signifies that we want to retrieve all columns from the specified table the from clause specifies the table from which we want to retrieve the data or the source of the information we're interested in in this instance we need the rows and columns from the Adventure Works sales table finally the where clause adds a condition that filters the resulting table rows based on specified criteria in this query product category road bikes indicates that we're interested in records in the product category column that match the road bikes value now that you're up to speed with the basics let's work with Lucas to establish a connection between PowerBI and the Adventure Works SQL database select get data from the home ribbon tab to import data from any PowerBI source a pop-up window with all available data source connectors appears type SQL in the search bar to locate the SQL Server database connector identify the required connector and select connect this opens the SQL Server database window where you must input the database details the SQL server is the server's IP address containing the database or its identifying name in this instance the Adventure Works server name is FG7N373 and the database name is MSDB next ensure that import is selected as the data connectivity
mode to load the table in the PowerBI file memory these settings should suffice for your connection to all database tables the next step is to create a SQL query to retrieve the required data set expand the advanced options then input a SQL select query to retrieve all road bike data from the product category column in the Adventure Works sales table finally press okay next you must provide credentials to connect to the required database and extract the sales data select the database tab and input your database credentials make sure the correct database level is selected then select connect to establish a connection between PowerBI and the database table a warning appears stating that an encrypted connection to the database is missing we can ignore the warning for this example scenario and select okay however it's good practice to use an encrypted connection in a real-world PowerBI environment a preview of the data set appears on screen you can select transform data to interact with the data set in power query editor or select load to connect to it directly in this instance we'll select load once the required rows are loaded navigate to data view if your loaded data is present as a table then this confirms that the connection has been established successfully you've now explored the structure of a SQL database the steps to connect it to PowerBI and some examples of connection modes by integrating PowerBI and SQL you can greatly enhance the power of your data analysis powerbi generates static reports that offer a snapshot of data at a fixed point in time however it can also generate dynamic reports which adapt and respond to your business needs in this video you'll explore the basics of dynamic reports an overview of PowerBI parameters and how to generate dynamic reports using parameters over at Adventure Works Lucas is preparing sales reports however instead of generating a new static report for each aspect of the business he wants to
create one report that can serve several different purposes dynamic reports are the perfect solution up to this point you should have experience working with static reports these offer fixed snapshots of data like total sales revenue over January however dynamic reports can be adapted and transformed based on user specifications dynamic reports can be modified using parameters to change how they display information as the data analyst you can decide which parameters inform the report this means that its content is always aligned with your business needs you can also adapt your parameters for different scenarios or you can switch between data sources in real time with this alignment an organization gains more value from one single report this saves time optimizes resources and leads to more efficient and effective reporting practices as you’ve just learned dynamic reports are created using parameters in the context of PowerBI parameters are dynamic variables that influence the data displayed in the report parameters are like dials and switches on a control panel if you update your parameters your report updates accordingly there are many different examples of parameters including numerical values text inputs and boolean or true false settings parameters also accept default values or free form text there are many options for customizing your parameters for example Lucas is developing a sales report that must analyze monthly sales data in North America he can set up a parameter to analyze sales on a continual month-by-month basis or input a custom date range he can also set parameters to filter data by region so that the report focuses only on North America or he could set up a custom region name to focus on a specific area of interest like monthly sales data for states on the West Coast powerbi parameters are the cornerstone of dynamic reporting empowering users like Lucas to customize their data views let’s explore a few more examples of how parameters can be used 
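The way a report parameter drives the displayed data can be sketched outside PowerBI too. Below is a minimal Python illustration (not PowerBI itself) in which an in-memory SQLite table stands in for the Adventure Works sales data and a function argument plays the role of the region parameter; all table names, column names and figures here are hypothetical:

```python
import sqlite3

# A hypothetical stand-in for the Adventure Works sales table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (product_category TEXT, product_region TEXT, order_total REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("Road Bikes", "North America", 1200.0),
        ("Road Bikes", "Europe", 950.0),
        ("Mountain Bikes", "North America", 800.0),
    ],
)

def regional_sales(region):
    """Like a report parameter: the chosen region value drives the WHERE clause."""
    cur = conn.execute(
        "SELECT product_category, SUM(order_total) FROM sales "
        "WHERE product_region = ? GROUP BY product_category",
        (region,),
    )
    return dict(cur.fetchall())

print(regional_sales("North America"))
```

Changing the argument, like changing a parameter value in a dynamic report, changes the result set without touching the query itself.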
with dynamic reports you can use parameters to explore high levels of data granularity with dynamic data selection and filtering for example as you've just discovered Lucas can analyze specific areas of interest in his data using custom ranges this helps to deliver greater insights for adventure works parameters also enable dynamic data source connections with parameters you can switch between data sources like databases files or application programming interfaces also known as APIs this is great for dealing with evolving data environments or multiple data repositories parameters can be used to analyze existing business situations or create new what-if scenarios for example Lucas can create financial forecasts by inputting growth rates expense projections and revenue assumptions as his parameters this generates a range of potential revenue outcomes for Adventure Works leveraging PowerBI parameters through scenarios helps Adventure Works to explore multiple outcomes and make data-driven business decisions you should now be familiar with the basics of dynamic reports PowerBI parameters and how to generate dynamic reports using parameters by using dynamic reports you can align your data more closely with the needs of your business and gain maximum value from one single report dynamic reports are an interactive user-friendly way of viewing and analyzing data and offer much more powerful insights than traditional static reports in this video you'll learn how to create a dynamic report using a SQL database and PowerBI parameters lucas must generate a dynamic report for Adventure Works that analyzes the company's sales data across multiple regions the report must extract data from a sales table in a SQL database it then needs to use parameters to alter the displayed region according to user selections the first step is to create a connection select get data from the home ribbon tab select SQL Server from the list of options the SQL Server database dialogue box
appears on screen input the server name in the server field and the database name in the database field ensure that the import mode option is checked for data connectivity mode import mode should be selected by default next you need to retrieve and load the data for your report expand advanced options input a SQL select query that retrieves all table columns from the Adventure Works sales table containing data or values for sales in Asia select okay to execute the query input your database username and password credentials to access the SQL server select connect then okay on the encryption warning finally select load to load the database table into your report the table shows data from sales in Asia as specified in the where clause of the SQL select query the next step is to format the table and visualize the data the table’s default name is query one rename the table to sales now you need to visualize the sales as a table graph select the table visualization then expand the columns of the sales table select the product category product region and order total columns finally you need to increase the size of the text to make it more visible navigate to the format pane of the visualization increase the table’s values to 15 point font size increase the column headers to 16 point font size resize the table to fit the values and center it on screen next you need to create parameters to make the connection dynamic navigate to the transform tab on the home ribbon to access power query editor once in power query you can view the data set table you’ve connected to you can now create a new parameter to access the dialogue box for creating new parameters access the home tab select manage parameters then new parameter these actions open the manage parameters window you can configure your parameter as follows name it region parameter select text as the data type ignore suggested values as it’s not required for this project finally add Asia with single quotes as the current 
value select okay to create the parameter now you need to apply your parameter by adjusting your SQL query right click on your sales query in the query editor then select advanced editor your code appears on screen in the advanced editor dialogue box replace Asia in your code with the ampersand symbol and the region parameter name check the bottom left-hand corner of the dialogue box to ensure no syntax errors have been detected then select done you need to grant permission for this query to run select edit permission and then run select close and apply to return to report view now you need to test that the report is dynamic select transform data from the home ribbon and select edit parameters change Asia to Europe select okay then select apply changes to refresh the data set select run to enact your changes the data set modifies itself to display sales in Europe adventure Works now has a dynamic report that it can use to explore its sales data across multiple regions and you should now be familiar with the process steps for creating a dynamic report using a SQL database and PowerBI parameters a dynamic report typically offers insight into one area of interest at a time however with a multivalue dynamic report you can explore several areas of interest at once in this video you'll learn how to create a multivalue dynamic report in PowerBI adventure Works needs to transform its current dynamic sales report into a multivalue dynamic report that offers insight into its sales data across multiple regions simultaneously let's create this report for the company using PowerBI the first step is to create a spreadsheet containing the required values to be passed to the SQL query it must use single quotes for text values however to include a single quote at the beginning of your text in Excel you need to use double quotes this indicates to Excel that you're typing a single quoted text access the transform data option to open Power Query Editor select and import the product region
selection Excel spreadsheet check the box for sheet one and select okay to add it to the editor once the sheet is loaded in the editor rename column one to region selection now you need to create a function to match the database table rows with the user selection in the spreadsheet select the sales query from the queries menu right click on the query and select create function from the list of options in the create function window type the following function name get sales data from regions select okay power query creates a folder that contains all parts of the function the next step is to invoke your custom function this ensures that the database table records match the spreadsheet column values in other words you import only the relevant data select the other queries folder and select sheet one then access the add column ribbon tab and select invoke custom function this action opens the invoke custom function window name the new column invoked function data select the get sales data from regions function query and select region selection as your region parameter then select okay your data set shows a new invoked function data column containing the required sales regions you can use the double arrow button on the top of the new invoke function data column to expand the data avoid using the original column name as a prefix this would make the column names too long it should only be used if combining multiple columns of the same name in the same function might cause confusion select okay to load the data this loads the database table columns and rows with a product region that matches the spreadsheet selections double click on sheet one in the queries pane and rename it to sales function select close and apply to return to the report view access the visualization pane and select the table icon select the following columns from the data pane product category product region and order total as you select these columns the table visualization is populated with the data 
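The multivalue pattern described above — one query per selected value, with the results combined — can be sketched in plain Python as well. In this illustration (not Power Query itself) a list stands in for the spreadsheet's region selection column and a function plays the role of the invoked custom function; all names and figures are hypothetical:

```python
import sqlite3

# A hypothetical stand-in for the database sales table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (product_category TEXT, product_region TEXT, order_total REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("Road Bikes", "Asia", 700.0),
        ("Road Bikes", "Europe", 950.0),
        ("Helmets", "North America", 60.0),
    ],
)

def get_sales_data_from_region(region):
    # Plays the role of the custom function: one query per selected value.
    cur = conn.execute("SELECT * FROM sales WHERE product_region = ?", (region,))
    return cur.fetchall()

# Stand-in for the spreadsheet column of selected regions, one row per value.
region_selection = ["Asia", "Europe"]

# "Invoking" the function for every selection row and combining the results.
rows = [row for region in region_selection for row in get_sales_data_from_region(region)]
```

Only rows whose region matches a value in the selection list survive, which mirrors how the invoked custom function restricts the loaded data to the spreadsheet's choices.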
from each one next open the format pane of the visualization increase the font size of the table's values to 15 point and the column headers to 16 point for greater visibility then resize the table return to the spreadsheet and change Asia to Europe then save the document return to PowerBI and select the refresh option from the home tab the new multivalue region selection from the spreadsheet is shown in the database table results your multivalue dynamic report is now ready to present to Adventure Works this report lets the company select and analyze sales from multiple regions for greater insight you should now be familiar with the process of creating a multivalue dynamic report in PowerBI dynamic reports show information on your current data but with what-if parameters you can dynamically alter reports to observe hypothetical outcomes or scenarios in this video you'll explore the concepts of what-if parameters and scenario-based analysis and you'll review the process steps for applying these concepts to your reports adventure Works has raised its monthly order amount target lucas its data analyst must determine the target to meet next month's sales goals lucas can use what-if parameters to forecast scenarios and identify the required sales target before we explore how Lucas can carry out this task let's review the basics of what-if parameters a what-if parameter is a custom-defined variable that can make interactive adjustments within a PowerBI report you can adjust your parameters to change your visualizations and generate insights into future scenarios the main purpose of what-if parameters is to enable dynamic scenario analysis this means users can explore various hypothetical scenarios without the need for complex calculations or creating multiple versions of the same report instead a single report can be transformed into a versatile tool capable of adapting to various business contexts for example Adventure Works can use what-if parameters to create sales
forecasts the company's data analysts can tweak variables like sales growth rates seasonality factors or marketing budgets they can then instantly observe how these adjustments affect projected revenue sales and revenue targets this level of interaction empowers users to make informed decisions based on real-time insights while what-if parameters offer tremendous flexibility it's important to recognize when and where they can be most effective they're most effective in scenarios with many variables that can significantly impact outcomes and where it's important to be able to quickly assess these outcomes what-if parameters can be applied across a range of industries organizations and use cases for financial analysts they facilitate stress testing of financial models and evaluation of risk scenarios marketing professionals can use them to optimize advertising budgets and forecast campaign outcomes supply chain managers can simulate various demand scenarios to fine-tune inventory levels once you have the available data the possibilities of what-if parameters are near endless now that you're more familiar with what-if parameters let's help Lucas perform a scenario-based analysis for Adventure Works lucas must create a what-if parameter to forecast the sales required in February to reach the new monthly target of 70,000 using the data from the sales report to help him first navigate to the modeling tab select new parameter and numeric range from the drop-down menu the parameters dialogue box appears on screen input the details as follows name the new parameter forecasted increase assign it a decimal data type input one as the minimum amount and two as the maximum then input 0.1 as the increment this creates 10 steps between one and two and set the default to one finally check add slicer to this page and select create a slicer is added to the page expand its settings on the visualization tab select vertical list as the style and turn on single select so a value is always
selected resize the visual to fit the left side of the report navigate to the data pane and expand the forecasted increase table to identify what has been created by the what-if parameter first there is the column that's currently being used in the slicer which contains a list of numbers based on the parameter settings this was created by the generate series function secondly a measure contains the option selected in the slicer captured by the selected value measure you also need a third measure to handle the desired calculation to create it select new measure from the ribbon and name it forecast amount add the sum of order total column multiplied by the forecasted increase value measure now you need to add this measure to the analysis navigate to the column chart and access the build visual settings add the measure to the y-axis of the visualization since the parameter is set to one the forecasted result of the calculation is the exact same number as the current total you can cycle through the options to view more scenarios the what-if parameter dynamically modifies the visualization one forecast shows that a 1.6 multiplier on the total amount is enough to reach the monthly target you should now understand the concepts of what-if parameters and scenario-based analysis and the process steps for applying these concepts to your reports what-if parameters in PowerBI offer a transformative approach to data analysis by providing the ability to dynamically adjust variables and instantly visualize the impact they empower users to make more informed decisions data scientists and data analysts at big tech companies already use SQL and other languages for advanced data analysis this gives leadership valuable insights into overall productivity and what the weak spots may be leading to evidence-based strategic decisions they can create comprehensive customer profiles to better understand their customers' needs leading to targeted marketing initiatives businesses can look at
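The arithmetic behind the what-if setup above — a generated series of multipliers and a measure that multiplies the order total by the selected value — is easy to mirror outside PowerBI. This Python sketch uses made-up figures (the video's own dataset will differ), holding the multiplier steps as tenths so the comparison against the target stays exact:

```python
# Hypothetical figures standing in for the report's numbers:
current_total = 50_000   # current monthly order total
target = 70_000          # new monthly target

# Mirror of GENERATESERIES(1, 2, 0.1): the slicer's candidate multipliers,
# held as tenths (10..20) so the arithmetic avoids floating-point drift.
steps_in_tenths = range(10, 21)

def forecast_amount(tenths):
    # Mirror of the forecast amount measure: SUM(order total) * multiplier.
    return current_total * tenths / 10

# The smallest multiplier whose forecast reaches the target.
needed = next(t / 10 for t in steps_in_tenths if forecast_amount(t) >= target)
print(needed)
```

Cycling the slicer in PowerBI performs the same scan by hand: each selected multiplier recomputes the measure until the forecast clears the target.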
supply chain analytics to figure out where production delays or bottlenecks happen but what impact can data science have on a larger global scale some cities are already using data analytics to inform decisions about urban planning to lead to a better quality of life for their inhabitants ultimately working toward being recognized as a smart city singapore Oslo New York and Paris the list goes on imagine a city planned entirely based on data analysis a city that takes the innovations all those cities already use and incorporates them into one place what would that look like welcome to Datatopia during its inception urban planners and data scientists work together to develop an exact ratio of residents to schools to shops to restaurants to healthcare facilities to green spaces and so on ensuring that all these amenities are accessible to all residents all the time there are no traffic jams in Datatopia real-time data analytics and predictive models provide timely and actionable insights to traffic management centers using cameras sensors and GPS data from vehicles this is used to adjust traffic lights dynamically and reduces congestion by improving the efficiency of intersections digital signs display real-time traffic information to drivers suggesting alternate routes when congestion is about to occur real-time analytics automatically detect traffic incidents and alert authorities leading to quick response times to minimize disruptions and improve safety datatopians don't have to worry about overflowing waste bins all bins have been fitted with sensors that detect when they are nearing full capacity triggering timely waste collection and preventing overflows landfill usage and recycling rates are carefully monitored using real-time analytics this data is used to inform sustainability initiatives water use cleanliness of public spaces and energy use are also monitored in Datatopia street lights dim when roads are empty to reduce energy consumption green energy
systems power the city and smart grids optimize power distribution predictive analytics have shown that 38% of Datatopians will be over 65 in the next 10 years health care measures such as hospital capacity and resource allocation are carefully managed to accommodate the aging population data analytics identifies trends and patterns within the population to target preventive interventions and improve overall health outcomes this includes identifying at-risk populations and tailoring interventions to specific groups education is very important in Datatopia educators can analyze attendance records coursework completion rates and other data to identify at-risk students early in the academic year early warning systems can trigger interventions to prevent dropouts and improve student success analytics are also used to recognize high achievers who may benefit from advanced coursework statistical algorithms are used to predict student outcomes this drives decisions in allocation of university course offerings in the city data science is used in resilience planning in Datatopia predictive analytics ensure that the city has resilience strategies in place to cope with various challenges such as cyber threats economic downturns or natural disasters this data is used to improve emergency response times and the deployment of emergency services during a crisis datatopia seamlessly integrates information and technology to create a healthy and sustainable urban ecosystem we may not quite live like the people of the imagined Datatopia just yet whether it seems like a dream or a nightmare to you it's clear that with the ever-evolving landscape of the practical application of data analytics we may be getting one step closer every day congratulations on reaching the end of these lessons on PowerBI in enterprise during these lessons you explored data's role in large enterprises let's take a few minutes to recap what you learned in these lessons you first learned how data flows through
an enterprise you discovered that data flow refers to data movement within an enterprise this movement occurs in the following stages: collection processing analysis and decision-making in a large enterprise data flows in from a variety of sources its flow is governed by processes influencing how it is acquired stored manipulated and shared once gathered the data must be cleansed and transformed to prepare it for analysis data cleansing is the act of standardizing data so that it is reliable and accurate data transformation is the act of converting data into the required format or structure as it flows through pipelines once cleansed and transformed the refined data is ready to inform strategic decisions as its insights are revealed through PowerBI reports organizations use these reports' insights to become data-driven enterprises data isn't just information it guides strategic choices and helps to map a pathway to growth powerbi service is used by many businesses to generate data-driven insights this is because of the advantages that it offers it's accessible for remote teams it scales to meet data growth it offers real-time collaboration and data backup and recovery and it's you who helps organizations to take advantage of these benefits the data analyst is the figure that plays a central role in extracting valuable insights from this data a data analyst brings several important skills to an enterprise they provide analytical expertise they create reports and visualizations with data that drive decision-making they generate insights that identify room for innovation and they help to identify and mitigate risks next you learned about SQL and its role in enterprise sql or structured query language is used by data analysts to interact with SQL databases data is stored in a SQL database in a structured format this means data is organized so that it can be located quickly when required sql databases also store information using normalization and indexing to make it easier to locate data sql
databases offer many advantages for enterprises they're great for storing data they facilitate complex queries they can scale to meet the demands of a growing business and they can be accessed by multiple users at the same time sql databases return information through SQL queries data analysts must be familiar with SQL syntax to create queries that extract the required data to connect to a SQL database you must identify the location of the server and the database on the server that you need you then need to provide credentials to gain access you can connect your data using import mode or direct query mode import mode loads data directly into PowerBI direct query mode connects PowerBI directly to the source database you can communicate with this infrastructure using SQL queries for example Adventure Works can use SQL select queries to extract information on bicycles sql databases and PowerBI servers also facilitate the use of dynamic reports dynamic reports can switch between views based on user selection you can also create multivalue dynamic reports that simultaneously explore several areas of interest within your data sets both can be modified using parameters to change how they display information this provides more value than standard reports as a data analyst you can decide which parameters inform the report once they align with your business needs you must connect PowerBI and a SQL server to create a dynamic report you then need to create a SQL query to retrieve and load the data from the SQL database once loaded you need to visualize the data typically in graph format finally you must configure parameters to analyze the data multivalue dynamic reports are more difficult to create this is because they require the use of custom functions to be invoked in a data set powerbi reports also make use of a what-if parameter a what-if parameter is a custom-defined variable that can be used to make interactive adjustments within a PowerBI report you can adjust your parameter
variables to change your visualizations and generate insights into future scenarios they’re most effective in scenarios with many variables that can significantly impact outcomes that must be assessed quickly throughout these lessons you also completed several knowledge checks that tested your understanding of the concepts and processes you explored you also encountered additional resources which presented you with links to further reading materials that you can use to enhance your understanding of the role of PowerBI in enterprise you’ve now reached the end of this summary it’s time to move on to the module quiz where you can test your knowledge of these topics this is followed by the discussion prompt where you can discuss what you’ve learned with your peers you’ll then be invited to explore additional resources to help you develop a deeper understanding of the topics in this lesson best of luck working with PowerBI service requires managing many different reports dashboards and data sets keeping track of these can be a demanding task fortunately you can use the workspace feature to manage your data assets in this video you’ll explore PowerBI service workspaces their advantages the types of workspaces available and best practices to follow when using them lucas has been tasked with managing several different reports and dashboards for Adventure Works he can use PowerBI service workspaces to keep all these data assets in one place using personal and shared environments let’s explore how workspaces can help Lucas manage Adventure Works assets powerbi service workspaces act like specialized rooms in a house each workspace hosts distinct data sets reports and dashboards this is great for data analysts because it helps with organized and efficient data management several features of workspaces make them useful for data analysts these include organization access control collaboration and streamlined updates let’s explore these features beginning with organization 
workspaces offer data analysts great organizational potential each workspace is a unique container for related reports dashboards and data sets this helps keep your data tidy and easy to locate workspaces also provide access control safeguard your data from unauthorized users with your workspace's access control features depending on the workspace you can determine who can see or edit the content for example Lucas can configure his workspace so that only other members of the data analysis team can view it this is especially useful when working on confidential data or collaborating with specific teams workspaces also enable collaboration between teams shared workspaces are like conference rooms they're spaces where Lucas and the data analysis team can discuss and refine data insights it's not just about storing reports but building them together workspaces help keep content updated with workspaces you can streamline updates to your projects updating or modifying data is much easier with everything in its right place whether pulling in new data or revising visualizations having a structured workspace ensures consistency and clarity now that you know more about workspaces and their advantages let's explore the different types available there are two main types of workspaces these are personal and shared workspaces both serve a different purpose let's review their differences to find out more a personal workspace is like a private room in your house it's your space where you can arrange things to your liking and work on projects privately here you're in total control outsiders don't have a key ensuring your work remains confidential and undisturbed shared workspaces let team members collaborate they can bring together their individual data insights and blend them into a collective narrative it's a space designed for collaboration allowing multiple users to add edit and refine reports and dashboards simultaneously how you manage and utilize your workspace is crucial for
effective data analysis adopting certain best practices can significantly enhance your efficiency and output one important best practice involves regular cleanup periodically review and remove outdated reports or data sets from your workspace this proactive approach ensures optimal performance and prevents potential confusion from irrelevant information you must also establish clear naming conventions for your data assets consistency is key when naming your reports dashboards and data sets this practice aids easy retrieval and benefits all users especially in shared workspaces you must also frequently review your access controls assign access levels based on roles and responsibilities to maintain data security and prevent unintended modifications for example over at Adventure Works Lucas must continually monitor who can access his team’s shared workspace to ensure only data analysts can view its assets in the digital realm safeguarding your work is paramount ensure that you back up your work regularly regular backups protect against unexpected data losses ensuring continuity in your projects on a large team like Lucas’ frequent backups are vital it only takes one mistake from one team member to lose important data and finally you should also encourage open discussion and collaboration with your team members you can do this by fostering a culture of continuous feedback you can refine data visualizations optimize reports and foster a more collaborative environment by actively seeking and implementing suggestions adhering to these best practices ensures efficient data management and creates a conducive environment for team collaboration you should now be familiar with PowerBI service workspaces their advantages the types of workspaces available and best practices to follow when using them as you’ve discovered through Lucas and his team workspaces can greatly benefit your data analysis projects as a PowerBI data analyst you’ll frequently collaborate with others in 
shared workspaces so it’s important that you understand how to create and manage these workspaces in PowerBI service in this video you’ll explore the process steps for creating a workspace and learn how to keep its content updated over at Adventure Works Lucas needs to create a collaborative workspace for his data analytics team a PowerBI service shared workspace is the perfect solution let’s help Lucas create and manage this workspace log into PowerBI service navigate to the lefthand sidebar to access the platform’s tools select workspaces to display the available workspaces for now Lucas only has access to my workspace his personal space select my workspace to access the space and reveal its contents the workspace contains reports dashboards and data sets however other team members need to collaborate on these assets to create a shared workspace for the team navigate to workspaces and select new workspace the create a workspace dialogue box appears on screen in this dialogue box you can input a workspace name assign a domain for your workspace and upload an image you can also use advanced settings to assign members for now let’s just input Adventure Works Sales as the workspace name then select apply now that we’ve created the workspace we must upload some content select upload then select a PowerBI report the report and its data set and dashboard are uploaded to the workspace and ready to share however if any changes are made to the report in PowerBI desktop it will need to be uploaded again to the shared workspace to ensure these changes are reflected for all other users to demonstrate this let’s open the report in PowerBI desktop and make a quick change in the report select the order total by product color visualization select the ellipsis symbol then select sort axis and modify the order by sort ascending all values on the x-axis are now sorted by ascending order total save the report and return to PowerBI service open the report again in the workspace 
screen this version does not reflect the change we made in PowerBI desktop so we’ll have to upload it again return to the workspace screen and select upload select browse and locate the updated report a warning appears stating that a data set with the same name already exists select replace and upload the new version of the report once the new version of the report is uploaded you can open the report and view your changes the updated chart is now visible in the report indicating a successful upload you should now be familiar with the process steps for creating a workspace and keeping its content updated by knowing how to build and manage shared workspaces in PowerBI service you can work effectively with your teams to generate insights and help drive business success running a shared workspace involves managing a lot of different people everyone must be assigned the correct roles and permissions to ensure the team works together effectively in this video you’ll explore workspace roles and the different types available and learn how to configure them lucas has created a new shared workspace for his Adventure Works colleagues to collaborate on the company’s latest reports he now needs to identify who requires access to the workspace and assign the correct roles to everyone let’s work with Lucas to assign roles to the team just as you wouldn’t let everyone in a company have the keys to every room roles determine who can do what in digital workspaces these roles ensure that each person has only the access required to do their part of the job nobody is granted unnecessary permissions that could lead to accidental disruptions or security risks in PowerBI service workspace roles are the backbone of efficient and secure collaboration workspace roles include viewer contributor member and admin let’s explore these roles in more detail beginning with viewer viewers are the audience they can look but can’t touch in other words they can view content without modifying or managing 
anything lucas can assign this role to managers stakeholders or anyone else who needs to be in the loop without directly impacting the workspace next is contributors contributors are there to add and modify content but they can’t adjust access permissions or delete items lucas should assign this role to those focused on adding content they can contribute content but don’t need to make bigger workspace adjustments workspaces also host members members can contribute to the content by adding and editing assets they can also add other members or collaborators with lower permissions however they cannot delete the workspace or manage user roles lucas can assign this role to regular team members who need to work on data or perform analysis and might also need to add others to the project and finally there’s admins admins oversee the workspace they have full control from adding editing and deleting content to managing user access and even deleting the workspace lucas can assign the role of admin to himself or another individual tasked with overseeing the entire project or workspace the chosen admin can keep the project running smoothly while ensuring everyone else performs their roles as required now that you’re more familiar with workspace roles let’s help Lucas to manage the roles in his shared PowerBI workspace lucas has uploaded the project’s report data set and dashboard in the Adventure Works Sales workspace however roles must be assigned before the team can collaborate on this workspace first select manage access from the workspace environment all team members with access to the workspace are listed here for now it’s only Lucas who has access to add a new team member to the workspace and assign a role select add people or groups a brief information box appears stating that viewers cannot edit content in the workspace to add a team member search for their name or email in the search box for the first example let’s add Adio our fellow data analyst assign 
Adio the contributor role so he can collaborate on the content and press add adio is now added to the workspace next let’s add Renee the marketing manager as a viewer this role lets her access the workspace to view insights without making any changes lastly the IT department must be assigned the role of admin this role grants full permissions from content management to user access control locate the admin account in the search box select the admin role and add it to the workspace all roles have now been assigned select the back arrow to view the roles that everyone has been assigned select the down arrow on their permission to modify a role and alter it to another role for example Renee needs to be able to add users from her team to the workspace reassign her role to member to grant her these permissions having helped Lucas and his team organize their workspace you should now be familiar with workspace roles and the different types available and how they’re configured always configure workspace roles correctly to ensure your project runs smoothly and set your team up for success workspaces are useful for storing and collaborating on content but it’s important to keep this content organized and easily accessible workspace apps are a great way of organizing your content efficiently to be located quickly and easily in this video you’ll explore the basics of workspace apps their advantages and learn how to create one in Adventure Works each department accesses its reports and dashboards through PowerBI however navigating this content on PowerBI is complex and time-consuming as a solution Adventure Works wants to create department-specific apps so that each department can access its reports and dashboards quickly and efficiently let’s find out more about apps in PowerBI service and how adventure works can incorporate them an app in PowerBI is a collection of important assets like dashboards reports and data sets packaged together for ease of access these assets can be 
bundled together under a workspace they can then be published to the PowerBI service this enables a streamlined sharing and distribution mechanism for PowerBI content there are a few reasons why businesses like Adventure Works prefer to use apps to access content on PowerBI service one reason is ease of access with apps users don’t have to search through numerous reports and data sets everything they need is in one package this makes it quick and easy to locate content apps also facilitate version control when an app is updated users automatically see the latest version this ensures that everyone is on the same page apps also help with security apps maintain the same level of data security as individual reports access can be restricted to authorized users only and data can be secured at row level so users can only view what you want them to view these security measures are great for protecting your data finally apps can also be customized apps can be tailored for specific departments or roles within an organization for example Adventure Works can customize the app to show marketing data for the marketing department sales data for the sales department or financial data for the accounting department this makes Workspace apps incredibly flexible tools for data distribution now that you’re more familiar with PowerBI apps let’s explore the process for creating an app in PowerBI service adventure Works has created a workspace called Adventure Works Sales this workspace holds all content related to the company’s sales like reports and dashboards to create an app for this workspace select the create app option this opens the build your app window the window contains three tabs setup content and audience in the setup tab you must input key information about your app this includes the name description logo and color scheme you can also add contact information for publishers or other important individuals name the app Adventure Works Sales and add sales app as the description 
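The app idea described above, a named bundle of reports with its own audience, can be sketched in a few lines of Python. This is purely illustrative: the `Report` and `WorkspaceApp` classes, the report names, and the audience members are made-up stand-ins for the Adventure Works example, not part of any Power BI API.

```python
from dataclasses import dataclass, field

@dataclass
class Report:
    name: str
    department: str  # which department's app should carry this report

@dataclass
class WorkspaceApp:
    name: str
    description: str
    audience: set                              # users/groups allowed to open the app
    contents: list = field(default_factory=list)

    def add_content(self, report: Report) -> None:
        self.contents.append(report)

    def can_open(self, user: str) -> bool:
        # mirrors the audience tab: only listed users can access the app
        return user in self.audience

# bundle only the sales reports into a department-specific app
workspace_reports = [
    Report("Orders Report", "Sales"),
    Report("Product Sales Report", "Sales"),
    Report("Campaign Reach", "Marketing"),
]

sales_app = WorkspaceApp(
    name="Adventure Works Sales",
    description="sales app",
    audience={"Lucas", "Adio"},
)
for report in workspace_reports:
    if report.department == "Sales":
        sales_app.add_content(report)

print([r.name for r in sales_app.contents])  # ['Orders Report', 'Product Sales Report']
print(sales_app.can_open("Renee"))           # False: Renee is not in the audience
```

The point of the sketch is the separation the transcript describes: which content is packaged (the contents list, filtered per department) is decided independently of who may open the package (the audience set), which is exactly what the content and audience tabs control in the real build your app window.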
once you’ve input the required information select add content to move to the next tab in this tab select the add content option to add reports to the app adventure Works requires the orders report and product sales report select and add the reports once added the reports appear in the left sidebar you can preview the reports or adjust their order select the symbol to the left of the orders report and drag it to the bottom so it appears last in the app you can also select the down arrow on the right of add content to add separate sections to your apps let’s link to the Adventure Works site select add new section to add a new section the new section appears in the list rename it Adventure Works internal site press the down arrow again select add link name the link Adventure Works website and add the link in the opening field box select content area then in the section field box select Adventure Works website select add to add the link to the app then select next add audience to move to the next section the audience tab you can use the audience tab to manage access to your application anyone who can access the workspace can access the app by default you can add more users or groups from the search box or you can share your app with the entire organization for now let’s restrict access to workspace users select publish app to complete the process it might take a few minutes for the app to publish once it’s ready select go to app to view it the app is ready to use with the Adventure Works website as its landing page you can use the sidebar on the left to navigate its contents you should now be familiar with Workspace apps their advantages and how to create them in PowerBI service as you continue to work with PowerBI service use Workspace apps as useful tools to organize your content for quick access and more efficient projects workspaces are a useful tool for developers but how do you determine how widely used or effective your reports are with PowerBI workspace metrics 
features you can monitor the usage and effectiveness of your workspace content in this video you’ll learn about the importance of monitoring workspace and report usage utilizing the current report metrics and the new preview feature and you’ll explore how usage metrics enhance report and workspace efficiency lucas is responsible for monitoring the performance of his team’s PowerBI workspace and its content a strong understanding and efficient deployment of usage metrics will help Lucas monitor the effectiveness of his workspace and reports let’s explore these topics in more depth and find out how they can help Lucas monitoring workspace usage in PowerBI involves tracking how reports and dashboards are accessed used and shared within a workspace it provides a window into the effectiveness and reach of the deployed data solutions the insights gathered from this data enable data analysts to make informed decisions on optimizations security and resource allocation it’s important to understand how your content is used to measure its impact and effectively guide your efforts usage metrics act as feedback showing how reports and dashboards are accessed within the organization for example you might discover that your team references several reports daily or a certain dashboard isn’t receiving the number of views it should you can use these data-driven insights to improve the performance of these assets monitoring report performance ensures relevance efficiency and responsiveness aligning your work with organizational needs and user preferences monitoring is mainly performed using the PowerBI service’s usage metrics reports or monitoring reports you can enable these reports for every workspace giving insights into how frequently users access them the initial usage report in PowerBI primarily focuses on individual report metrics providing details such as the number of views shares and user interactions on a per report basis for example Adventure Works evaluates the performance 
of its global marketing reports by tracking views and user interactions the company also measures how the report has been shared to gauge engagement across its worldwide workforce the usage metrics report is instrumental in understanding the performance and user engagement of your workspace reports powerbi service offers its users the option to switch to a preview version of the new workspace metrics feature this new feature expands monitoring from individual reports to the entire workspace providing additional insights into report performance some of these insights include aggregated metrics which encompass all KPIs analyzed in the old usage reports and add report performance information this feature compiles all of Adventure Works’ previously analyzed KPIs and integrates report performance data to provide a comprehensive set of metrics other insights include the typical opening time of the report with daily and weekly breakdowns lucas uses this data to track the average report loading times to help ensure a smooth user experience and this feature also provides information on all workspace reports instead of a specific one lucas uses this data to understand how his reports are performing so he can improve their content you can also access a detailed FAQ article containing all relevant capabilities and a description of this rich new feature to run and access the usage metrics data you’ll require the following prerequisites you need a PowerBI Pro or premium per user PPU license to run and access the usage metrics data however the usage metrics feature captures usage information from all users regardless of the license they’re assigned to access usage metrics for a report you must have edit access to the report and finally your PowerBI admin must enable usage metrics for content creators your PowerBI admin may have also enabled collecting per user data in usage metrics ensure these prerequisites are established before running or accessing the usage metrics data in 
this video you’ve learned about the importance of monitoring workspace and report usage utilizing the current report metrics and the new preview feature and you explored how usage metrics enhance report and workspace efficiency monitoring workspace usage with PowerBI’s workspace metrics preview feature improves our understanding of data usage across the organization aligning with informed decision-making and resource efficiency as a data analyst your role includes tracking how users engage with your data with the workspace usage report you can review insights into workspace activity and user engagement you can then use these insights to optimize your data and reports in this video you’ll learn how to enable the workspace usage report feature in PowerBI generate and navigate a usage metrics report for a specific workspace report and interpret key metrics to gauge user engagement and report interaction lucas has uploaded a product sales report to his workspace he needs to check that his data analytics team has reviewed this report lucas can use the usage metrics and workspace usage reports to monitor the team’s engagement with his product sales report let’s help Lucas achieve his goal by guiding him through this process the usage metrics report in PowerBI is important for understanding how individuals interact with reports and dashboards it is an insightful report that can be launched and viewed on any workspace report the new workspace usage report feature enhances this by providing even more detailed insights it allows a closer look at how workspaces are used not just individual reports thanks to these reports users can now view an enhanced overview of basic report metrics the report usage tab lets users better understand each report’s performance with more detailed usage metrics that provide data on topics like views and users the report performance tab provides a breakdown of a report’s effectiveness with detailed insights into specific report interactions and 
their impact users can also use the report list tab to explore how all the reports in the workspace are performing making it easy to compare their performance and success and the FAQ tab provides easily accessible answers and guidance adventure Works can use the new workspace usage report feature to align resources and strategies with actual user interaction and needs enhancing their performance and user experience now that you’re more familiar with usage reports and the new workspace usage report feature let’s create one for Lucas from the PowerBI home screen navigate to workspaces and select the Adventure Works Sales workspace here you can view the content uploaded to this workspace to enable the usage metrics report on the product sales report hover over the report item and select the ellipsis symbol to access the report’s options locate and select the view usage metrics report option to launch the monitoring report if this is your first time accessing the usage report PowerBI will need a few moments to create it in the usage metrics report you can find information on report views and unique views by day total report views and a list of all users who access the report there are also slicers available for your data that can filter the usage report based on distribution method this feature highlights users that the report was shared to or workspace users who access the report you can also slice based on the platform the users use to access the report either from a browser or mobile lastly you can even filter by viewing the usage of separate report pages to enable the new monitoring feature toggle the new usage report to on this transforms the usage report to the new workspace usage report this new feature contains four separate pages with monitoring tools on the first page report usage you can identify metrics like the old report with updated visualizations and separate graphs instead of slicers for example you can see that 100% of report access has been conducted 
through PowerBI.com instead of mobile also selecting pages on the bottom right visualization shows that the order report page takes up 57% of the views on the second page report performance you can see the loading time of the report based on date user country of browsing and the internet browser used this is a significant page when troubleshooting long loading times on reports on the third page report list the new usage report feature allows users to monitor the usage of every workspace report from this single view you can see the familiar tools from the old usage monitoring report now enabled through all workspace reports the fourth and last page FAQ contains a detailed guide on all metrics and terminology used in this new monitoring feature it explains the usage of every tool in detail all this information can easily be exported to Excel and analyzed making monitoring and reporting on the workspace usage easier than ever in this video you’ve learned how to enable the workspace usage report feature in PowerBI generate and navigate a usage metrics report for a specific report within a workspace and interpret key metrics to gauge user engagement and report interaction with these reports you can optimize your workspace and its reports so that they meet the needs of your team by now you’re familiar with generating insights into data insights are generated from data sets and these data sets in turn rely on timely accurate data flow from different sources over the next few minutes you’ll learn about the basics of data sets in PowerBI service explore the relationship of data sets to data flows and reports and compare scheduled and incremental refreshes in data sets adventure Works data sets are dynamic they’re continually updating as they receive new data from different sources the company must ensure that its reports capture this latest data so they’ve tasked Lucas with integrating its data sets and data flows let’s take a closer look at how data flows into data sets a 
data set in PowerBI is a collection of data you import or connect to this data can come from a single source or multiple sources once captured it forms the basis for your reports and dashboards every data set’s unique structure and metadata influences the analysis you can perform let’s break down this relationship further as the previous example shows data sets act as a bridge between data flows and reports in PowerBI data flows collect and transform data from various sources like SQL databases and Excel files these data sources are then loaded into data sets these data sets a collection of processed data feed into the reports this enables analysts to derive insights effortlessly the symbiotic relationship ensures a streamlined data flow from extraction to visualization let’s look at an example of how the Adventure Works sales department can use data flows to consolidate and prepare data for analysis an adventure works data flow may collect sales data from different regions using a complex network of data sources it then cleans this data by removing duplicates and transforming the remaining data into a unified format once this process is complete the cleansed and transformed data is loaded into a data set data analysts can use this data set to create a report to analyze sales trends compare regional performance and identify growth opportunities it’s important to remember that all data sets must be frequently refreshed to include updated data this is to ensure that your insights are as current as possible you can manually refresh your data set any time but with PowerBI you can also plan a refresh to occur automatically there are two main ways to automatically refresh your data in PowerBI service a scheduled refresh and an incremental refresh both refresh mechanisms are vital for maintaining the accuracy and relevance of data in the PowerBI service let’s take a closer look at these methods a scheduled refresh is a set routine where the entire data set is refreshed at 
specific intervals for example Lucas has scheduled a daily refresh for 2 a.m each morning in the Adventure Works Sales workspace to ensure data remains current however be careful when using scheduled refresh it could be resource-intensive for large data sets an alternative more resource-efficient method is to use incremental refresh unlike a scheduled refresh an incremental refresh only updates the parts of the data set that have changed as you saw in the previous example Lucas sets a scheduled refresh at 2 a.m daily for the primary sales data set to capture the previous day’s data however he can also set an incremental refresh every hour for the continuously updated online sales data set this incremental refresh captures new sales data without reprocessing the entire data set this way Lucas efficiently keeps data sets current ensuring reliable analysis and reporting at Adventure Works both refresh methods help Lucas keep his reports timely and actionable you should now be familiar with the basics of data sets their relationship with data flows and reports and understand the difference between a scheduled and incremental refresh data sets are central to PowerBI and they’re a valuable part of your analytical toolkit leverage data sets effectively for greater insights and informed decision-making powerbi is a fantastic service for data analysis however to get the most out of it you must ensure it has a secure and stable connection to your data with PowerBI gateways you can create a strong safeguarded bridge between PowerBI services and your on-premises data over the next few minutes you’ll discover how to connect data with PowerBI gateways explore the different types and uses of gateways and learn how to set up and manage gateways adventure Works stores large amounts of data on premises lucas and his data analytics team must connect to this data securely and reliably using PowerBI the team can leverage PowerBI gateways to establish a secure and reliable connection 
between on-premises data and PowerBI service so why are PowerBI gateways a solution for Adventure Works powerbi gateways establish a secure and reliable connection or bridge between your on-premises data and the PowerBI service on Microsoft’s cloud this connection allows PowerBI service to access and retrieve data from on-premises data sources this enables organizations to keep their data secure while benefiting from the PowerBI service’s cloud-based analytics and sharing capabilities powerbi gateways interact with on-premises data in two ways the first is a data refresh gateways facilitate the scheduled refresh of data sets pulling the latest data from the source to PowerBI for example Lucas can use the gateway to schedule a daily refresh of adventure works on-premises sales data this ensures that the sales team has the latest figures ready for analysis in PowerBI every morning the second type of interaction is query execution gateways help execute queries against the data source to retrieve updated data lucas opens the latest iteration of Adventure Works sales data report and executes a query to identify yesterday’s total sales the gateway helps Lucas to execute the query against the sales report there are three main types of gateways in PowerBI each suited to different scenarios the on-premises data gateway the on-premises data gateway personal mode and the Azure virtual network or V-Net data gateway which type of gateway you choose depends on the setup of your organization and its specific data management and security requirements let’s find out more about each type beginning with the on-premises data gateway the on-premises data gateway suits multiple users sharing and refreshing data across many Microsoft services including PowerBI it’s very versatile which makes it useful for diverse organizational setups the gateway supports all types of connections from PowerBI like import data scheduled refresh direct query and live connection quick access to and 
support for these connections is important in real-time data interaction for example each Adventure Works department requires access to different data sets stored on premises these data sets can be managed centrally with an on-premises data gateway this setup lets multiple users refresh and access the data they need across different Microsoft services next let’s review the on-premises data gateway personal mode the personal mode is tailored for single user scenarios it supports connections to local data sources such as SQL Server and Excel which is useful for individual users or analysts it’s also designed to be easy to set up and once setup is complete the gateway requires no additional configurations for data sources this offers a much less complex solution for business analysts who want to publish and refresh PowerBI reports with minimal hassle however this gateway supports only one type of connection import data or scheduled refresh and it’s designed only for PowerBI so it doesn’t support other applications lucas can use the personal mode of the on-premises data gateway to manage data sets he doesn’t want to share with the rest of the team with this straightforward setup he can refresh the data without going through the central gateway and finally there’s the Azure virtual or V-Net data gateway the Azure virtual network or V-Net data gateway best suits complex organizational setups by offering enhanced security and data management features within a virtual network it helps cut the costs or overheads of installing updating and monitoring on-premises data gateways by virtually bridging PowerBI to supported Azure data sources this gateway securely communicates with the data source executes queries and transmits results to the PowerBI service as Adventure Works grows it requires better security and data management a V-Net is a great solution it enables secure data transfer and the ability to manage the data environment it provides a secure pathway for data that 
adheres to the company’s organizational security policies and it keeps data refreshed and readily available for analysis in PowerBI you should now understand how to connect data with PowerBI gateways the different types and uses of gateways and how to set up and manage gateways with a strong understanding of gateways you can establish an efficient and secure connection between your on-premises data and PowerBI impactful insights depend on access to the latest data an analysis based on outdated data isn’t of much use to anyone configuring a regular PowerBI data refresh ensures your reports and dashboards are consistently synced with the latest data by the end of this video you’ll understand the importance of configuring a data set refresh and know how to configure a scheduled on-demand and incremental refresh adventure Works needs daily updates on its marketing campaigns and sales so Lucas must ensure that the reports and dashboards his team relies on for analysis contain the latest available data let’s help Lucas configure a data set refresh so his team is working with up-to-date information first access the Adventure Works Sales workspace the workspace contains a new report on marketing campaigns access the report settings to plan a scheduled refresh select schedule refresh from the settings to navigate to the data set refresh settings the last refresh failed because the credentials weren’t entered when the data set was uploaded to the cloud navigate to the data source credentials category and select edit credentials this report is connected to the Adventure Works SQL database so input your Adventure Works SQL database username and password then select sign in next navigate further down the menu and expand the refresh settings toggle the setting on to activate the scheduled refresh check that the refresh is configured daily between 6:00 a.m and 1:00 p.m coordinated universal time or UTC the scheduled refresh is now ready navigate back to the workspace once the 
credentials are set you can manually refresh the data set whenever needed to demonstrate let's refresh the orders report hover over the report and select the circular arrow this is the refresh icon selecting this icon performs an on-demand manual refresh of the data set next let's configure an incremental refresh on the sales transaction report navigate to Power Query Editor on the PowerBI desktop to issue an incremental refresh you must now create two parameters one that determines when the refresh begins and another that states when it should end select manage parameters then new parameter in the manage parameters dialogue box name the first parameter range start assign it a date time parameter type and provide January 1st 2000 as the current value right click the parameter and select duplicate to create a copy this copy is now your second parameter rename it range end next select the sales table and identify the order date column select the column's down arrow access date time filters then custom filter in this window keep the rows where order date is after or equal to select parameter and input range start for the and option select before parameter and range end on the second row your configuration is now ready select OK then close and apply to return to PowerBI desktop right click on the sales table and select incremental refresh toggle the incremental refresh on configure the settings to archive data older than two years and incrementally refresh data from the last seven days each data set refresh will now remove transactions that occurred over two years ago and they'll refresh only transactions that occurred in the last seven days note that as the info box states the report must be uploaded to the PowerBI service for the refresh policies to take effect apply your changes and save your report lucas and his team are now working with the latest data and you should now understand the importance of configuring a data set refresh and how to configure a scheduled on
demand and incremental refresh great work analyzing data involves working with many different data sets so it’s important to distinguish reliable data sets from unreliable or misleading ones to ensure your insights are accurate with PowerBI you can endorse promote and certify reliable data sets to clarify which ones you and your team should work from in this video you’ll understand the importance of data set endorsement differentiate between promoting and certifying data sets and learn how to promote a data set in the PowerBI workspace over at Adventure Works the sales workspace is cluttered with many data sets it’s difficult for Lucas and his team to determine which ones to work with lucas decides to identify and endorse reliable data sets to help his team maintain data integrity in their workspace let’s discover more about endorsing data sets then use our new knowledge to help Lucas and his team endorsing data sets involves identifying and marking reliable data sources in your workspace to ensure your team works with quality content you can endorse data sets in PowerBI from the endorsement and discovery menu data set endorsement in PowerBI comprises two levels promoting and certifying promoting a data set indicates that you trust its content and view it as ready for organizational use when you promote a data set a promoted icon appears next to it in the workspace when a data set is flagged as trusted it becomes easily discoverable and the team knows it’s reliable you can also certify a data set this is a higher level of endorsement it symbolizes that the data set meets the company’s stringent quality and compliance standards however content certification is a big responsibility only authorized users can certify content so this option is typically only available to workspace owners over at Adventure Works Lucas is the workspace owner that means he is the only team member who can certify data sets next let’s review the process for endorsing content in PowerBI by 
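Before moving to the walkthrough, the promote/certify distinction just described can be modelled in a few lines. A minimal sketch, assuming hypothetical class and function names (this is not a PowerBI API); the role names follow PowerBI workspace roles.

```python
# Illustrative model of the two endorsement levels: anyone who trusts the
# content can promote it, but only authorized users (e.g. workspace
# owners/admins, like Lucas) can certify it.

class Dataset:
    def __init__(self, name: str):
        self.name = name
        self.endorsement = None  # None, "Promoted" or "Certified"

def promote(dataset: Dataset) -> None:
    # Promotion signals the content is trusted and ready for
    # organizational use.
    dataset.endorsement = "Promoted"

def certify(dataset: Dataset, role: str) -> None:
    # Certification is the higher level of endorsement and is
    # restricted to authorized users.
    if role != "admin":
        raise PermissionError("only authorized users can certify content")
    dataset.endorsement = "Certified"
```

The point of the sketch is the asymmetry: promotion is a statement of trust anyone can make, while certification carries a compliance guarantee and is therefore gated by role.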
helping Lucas promote reliable data sets access the Adventure Works sales workspace to view all available data sets select filter then data set the team has been using the marketing campaigns report a lot recently it's filled with high quality data that has delivered many great insights lucas has decided it can be endorsed as trustworthy content to begin the endorsement process hover over the data set to reveal the ellipsis symbol select the ellipsis then settings in settings locate and expand the endorsement and discovery section check the promoted option then check make discoverable so other users can identify the endorsed data set select apply to finish configuring the settings select Adventure Works sales from the navigation pane to return to the workspace navigate to the right of the workspace the marketing campaigns report data set is now marked as promoted the promoted flag draws the attention of the workspace users to the report and lets them know it's suitable for analysis great work you've helped Lucas identify and endorse a reliable report that his team can use for analysis and you should now understand the importance of data set endorsement be able to differentiate between promoting and certifying data sets and know how to promote a data set by endorsing data sets you ensure your team works with and draws insights from reliable and consistent data anna oversees quality at the Spiro Car Company today she has a big meeting with senior leadership spiro has been manufacturing electric vehicles for the last 8 years and business is booming or at least it was lately there have been concerns about manufacturing time and quality business has slowed sales have dropped and morale is low and Anna unsurprisingly is worried luckily one thing Anna never worries about is statistics they never lie each machine in the assembly line reports statistics to a central database in the manufacturing facility unfortunately dumping data on her managers' desks won't solve the
problem this time she has heard her colleagues discuss using PowerBI for analyzing data but Anna prefers the old ways and stores everything locally on a central database but what if she could somehow convert her data stack into a coherent interactive visual if so she would be one step closer to figuring out where quality is slipping and more importantly providing the leadership team with the answers they need she meets with Dennis and outlines her predicament he explains the on premises gateway to her this gateway will bridge the gap between Anna's on premises data and PowerBI and best of all the data transfer is completely secure this means that she can access all the features of PowerBI using the data stored locally on her laptop a great solution after a quick guide through the basics from Dennis and a chat with IT about requirements Anna is ready first she installs the gateway on the database server and signs in with her work account to register the gateway anna can now connect all the data she stores locally to reports and dashboards in PowerBI she can even configure a refresh schedule or perform an on-demand refresh she starts running reports building rich data visualizations and identifying interesting business insights she discovers that the main issue in the Spiro manufacturing supply chain process is a delay in delivering the car's high-capacity battery packs the supplier also fails to deliver enough batteries which leads to further delays the quality slips as the assembly team tries to make up for these delays anna can't believe how straightforward it was to convert her on premises data using the gateway and the best thing about it she doesn't have to say goodbye to her older methods of storing her data locally anna arrives at the leadership meeting with an interactive dashboard to outline her findings and a plan to resolve the issue senior leadership decide to use Anna's data analysis to develop a remediation strategy spiro switches to a more reliable
supplier for their battery packs and they put better measures in place to review quality analytics so they can act before another issue occurs thanks to Anna Spiro’s business once again is booming when deploying content in PowerBI it’s important to ensure the data is safe and that the change is handled efficiently that’s why analysts make use of structured deployment over the next few minutes we’ll explore PowerBI’s deployment pipelines for streamlined project management in this video you’ll learn about PowerBI’s deployment pipelines recognize the importance of separate environments and explore how to enhance data security through structured development over at Adventure Works Lucas has been tasked with using PowerBI service to improve the company’s development process he must ensure that the data of all new content deployed to the workspaces remains accurate and secure during the report development stages let’s help Lucas achieve this deployment pipelines in PowerBI help content move smoothly through development testing and production stages this allows for controlled testing and validation of content before it reaches end users let’s explore these three stages of deployment in more detail first we’ll examine the development environment here developers can add new content without changing current reports this is the first step in the deployment process this is where developers can create and modify PowerBI reports any errors or issues at this stage have no impact on the existing production data for example Lucas improved a sales report by adding a new visual in the development stage ensuring it matched branding guidelines next let’s explore the test environment this is where a small group of testers review and test new reports for issues before they’re used in production providing feedback and checking for bugs and data problems here reports are validated for accuracy performance and any potential bugs before moving to the production environment for example Lucas 
can move his new visual from development to the testing phase this will allow the testing team to check the accuracy and performance of the new visual lastly we'll investigate the production environment once new reports and features are tested they're ready to be used by the end users in the production environment this is the last step in the process for example once Lucas' new visual has been validated through testing it is moved to the production environment once in the production environment users and stakeholders will be able to use the new feature however not all three development environments must be included in a deployment pipeline for example the testing phase could be excluded if it's not considered necessary there are several benefits of a structured development life cycle by having distinct environments you can ensure that unvetted changes do not corrupt the production data a structured life cycle allows for comprehensive testing ensuring that the data remains accurate and reliable and deployment pipelines provide a streamlined process for managing changes enabling better control over the development process let's find out how a structured development process helped Adventure Works in a real-world example lucas improved a sales report by adding a new visual in the development stage ensuring it matched branding guidelines after moving it to the test environment and thorough validation the report went to production this example showcases how PowerBI's deployment pipelines ensure a smooth and accurate transition of content benefiting data accuracy and decision-making at Adventure Works using PowerBI's deployment pipelines for a structured development process ensures safe data handling in this video you've learned about PowerBI's deployment pipelines the importance of separate environments and enhanced data security through structured development with PowerBI's deployment pipelines you can effectively manage changes with separate environments allowing
for accurate and secure sales data while reducing risks and improving control and efficiency it's important to catch potential errors in your pipelines to ensure your data is accurate for end users with PowerBI deployment pipelines you can catch these errors and ensure a smooth transition from development to production in this video you'll learn how to access and configure a PowerBI service deployment pipeline how to allocate existing workspaces to their respective environments and how to oversee and monitor deployment history and settings a minor error in PowerBI report development could mislead end users lucas needs to use deployment pipelines to ensure changes are tested to enhance reliability and efficiency let's guide Lucas through this process access the deployment pipeline icon on the left navigation pane on the PowerBI service homepage on smaller screens you might need to select the more ellipsis button in the navigation pane to locate and select the deployment pipelines an introductory screen with the pipeline capabilities appears select create a pipeline to begin streamlining the data processes the create a deployment pipeline window appears on the screen enter sales pipeline as the pipeline name and sales reports deployment pipeline as the description then select next three default environments appear on the screen you can add more environments by selecting the add button and naming them you can also remove environments by selecting the bin icon for this example let's keep only the development and production environments of PowerBI we're now on the deployment pipeline page note that the workspaces assigned to the environments must be created beforehand in this case the main workspace we've been using has been renamed to Adventure Works Sales Development highlight it in the development environment and select assign workspace next select the newly created Adventure Works Sales Workspace in the production environment and assign it after assigning both a
warning pop-up appears indicating differences in content between the two environments select deploy in the test environment to confirm that the changes made by users in development have been approved they can now be deployed in the production environment where end users have access select deploy to begin the process a green tick appears at the end indicating that the two environments are now synced and no new changes are to be deployed for now several important features of the pipelines appear in the top ribbon you can adjust the pipeline settings from the ribbon manage access to the environment and view the deployment history the history contains necessary information such as the deployment user the number of items deployed and the final process status lucas has improved Adventure Works sales reports you can do the same by setting up a deployment pipeline to ensure smooth transitions from development to production minimizing errors and enhancing data integrity in this video you learned how to access and configure a PowerBI service deployment pipeline allocate existing workspaces to their respective environments and oversee and monitor deployment history and settings maintaining a workspace often requires updating its components however an update to one component could affect multiple others with lineage view and impact analysis you can understand how your components are related and how changes impact the workspace in this video you’ll learn about the core concepts of data lineage and impact analysis the functionality and benefits of the lineage view and you’ll also explore the impact analysis feature and its role in data management over at Adventure Works Lucas needs to update the SQL server his workspace depends on however several other workspaces also depend on this same server lucas must determine what components rely on this server and how they’ll be impacted by the changes he makes to it you can help Lucas by working with him to incorporate lineage view and 
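The deployment flow just configured can be summarised as a small model: content moves stage to stage, and each deployment is logged with the user, item count and status, echoing the deployment history pane. The class below is an illustrative sketch, not the PowerBI deployment pipelines API.

```python
# Hypothetical model of a deployment pipeline with named environments.
class DeploymentPipeline:
    def __init__(self, stages):
        self.stages = stages                       # e.g. ["development", "production"]
        self.content = {stage: set() for stage in stages}
        self.history = []

    def deploy(self, source: str, target: str, user: str) -> set:
        """Copy items that exist in `source` but are not yet in `target`."""
        diff = self.content[source] - self.content[target]
        self.content[target] |= diff
        # Record the deployment, mirroring the history pane's details.
        self.history.append({"user": user, "items": len(diff),
                             "status": "completed"})
        return diff
```

Once the two environments hold the same content, a further deploy moves zero items, which corresponds to the green tick indicating the environments are synced.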
impact analysis into his workflow let's begin by understanding what these terms mean lineage view simplifies data tracking by showing its journey from source to destination it visually connects data elements by revealing the relationships between data sets data flows reports and dashboards these data elements are presented using a parent child relationship the parent child relationship shows how data elements are connected in a sequence parents are the starting points and children follow as subsequent steps in the data journey this helps to provide a clear picture of the connections between the data in your workspace lucas can use lineage view to manage his workspace by identifying and updating outdated data sets this ensures that his team works from the most recent and accurate reports another valuable tool in PowerBI is impact analysis impact analysis complements lineage view it helps you to understand how changes in your workspace affect different components it provides an overview of how data is used this feature helps you to make informed decisions when modifying data your data sets are intertwined with your reports workspaces and dashboards a change to one asset can affect multiple others once you understand how changes impact your workspace you can inform the rest of the team and ensure everyone can use the updated data effectively now that you're more familiar with lineage view and impact analysis let's explore how Lucas can incorporate them into his workflow when you log into a workspace you are presented with the default list view this view displays workspace items such as reports and dashboards to switch to the lineage view select the lineage view icon this view is only available to the admin contributor and member roles in lineage view you can explore the relationships between all your workspace's content for example in the Adventure Works sales workspace a SQL server database serves as the data source for both data sets in the workspace reports have also
been created for both data sets additionally both reports have visualizations pinned to a single dashboard the sales dashboard selecting any component brings up a window with its details on the right hand side of the screen select the SQL server as this is the component to be modified selecting this component brings up information such as the server and database name the privacy and authentication methods and the status of the gateway which indicates that the connection is currently active select the X icon to close the window data sets also display their last refresh date and time you can refresh a data set on demand by selecting the refresh button this is the basic lineage flow in a workspace workspaces with larger data pools are more complex various reports could stem from a single data set this generates numerous end dashboards the show lineage button on every component is helpful in these situations you can select the arrow to highlight the entire lineage flow the most important feature of the lineage view is impact analysis select the screen icon on any lineage component to open the impact analysis window in this instance select the Adventure Works SQL Server data source the impact analysis window displays all components a SQL Server data source change affects the affected components are referred to as child items the asset you modify is the parent item in this instance modifying the Adventure Works server the parent item would impact six child items spread across three different workspaces you can also view the list of child items by type or workspace by selecting the buttons on the right before you modify the server you need to notify all team members impacted by your actions you can use the notify contacts feature to message all affected individuals you can also add a note to describe the impact in this video you learned about the core concepts of data lineage and impact analysis the functionality and benefits of the lineage view and the impact analysis 
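Impact analysis can be thought of as a graph walk over the parent child relationships described above: starting from a parent item, collect every downstream child. A minimal sketch; the lineage below loosely mirrors the example workspace (a SQL Server feeding two data sets, two reports and one dashboard), but the item names are illustrative.

```python
from collections import deque

# Parent -> children map of a hypothetical workspace lineage.
LINEAGE = {
    "sql server": ["sales dataset", "marketing dataset"],
    "sales dataset": ["sales report"],
    "marketing dataset": ["marketing report"],
    "sales report": ["sales dashboard"],
    "marketing report": ["sales dashboard"],
}

def impacted_items(parent: str) -> set:
    """Return every child item affected by a change to `parent` (BFS)."""
    seen, queue = set(), deque([parent])
    while queue:
        for child in LINEAGE.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen
```

Modifying the SQL Server (the parent item) here impacts five child items, which is exactly the kind of list the impact analysis window surfaces before you notify affected contacts.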
feature and its role in data management lineage view and impact analysis in PowerBI boost data management you can easily track data history keep data updated and understand changes and effects these features make decision making smarter and data management smoother you interact with many different assets in your workspace and it's important that they can be accessed quickly however some assets like reports can take longer to load the more you use them luckily PowerBI offers a caching feature you can use to optimize your workspace's performance in this video you'll learn about the fundamentals of query caching in PowerBI how caching interacts with import mode and the application of caching adventure Works data analysis team has been using the marketing campaign report heavily as a result of all this use the report takes longer to load each time it's accessed the team needs to make use of caching to improve the report's performance let's find out how caching is the process of temporarily storing query results this enhances performance by minimizing the time and resources required to fetch data accessed regularly for example the analytics team queries the marketing campaign report hundreds of times daily each query involves retrieving and processing significant data from the database this can strain the system and slow down the reporting process caching helps by saving frequently requested data like the marketing campaign report so it doesn't need to be fetched from the database every time this speeds up the analytics process and reduces strain on the system there are many benefits to query caching first it offers faster performance with caching you can return reports and queries faster especially for frequently used static data sets it also preserves bookmarks and filters so that they don't need to be reapplied or reset each time a query is run caching also offers personalized data access each user receives their own cached query results for a personalized
experience query caching also follows all security rules which means that caching maintains data security without compromising compliance and lastly caching reduces the computing load on your workspace saving resources however query caching has certain limitations it is exclusive to import mode and not applicable for direct query and live connection modes not all users have access to query caching it is only available with a PowerBI premium or embedded subscription there are also other potential limitations clearing the cache when switching from on to off can cause a brief delay for on-demand queries and finally during data set refreshes the query cache updates and may impact performance with high query volumes now that you're more familiar with query caching let's help the Adventure Works data analytics team make use of this feature to improve their report's performance first open the Adventure Works sales data set where the report is located this report is used often which affects its loading speed so it's a good candidate for query caching to use query caching hover over the marketing campaigns report data set select the ellipsis symbol and choose settings from the options in the settings menu navigate to and expand the query caching options query caching is turned off by default to enable query caching select on and then select apply this caches all bookmarks and filters on the initial report page the report will now open faster if you try to disable query caching a pop-up appears this pop-up warns that turning off query caching will result in saved queries being deleted the next time someone opens the report they may experience a slight delay during their first use this applies to both options with query caching disabled in this video you've learned about the fundamentals of query caching in PowerBI how caching interacts with import mode and the application of caching using query caching in PowerBI improves report speed and resource efficiency streamlining your
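The caching behaviour described above, per-user results, repeat queries skipping the database, and the cache clearing on refresh, can be illustrated with a tiny cache. This is a teaching sketch with assumed names, not PowerBI Premium's caching engine.

```python
# Minimal per-user query cache.
class QueryCache:
    def __init__(self, execute):
        self._execute = execute   # the expensive query function
        self._store = {}

    def run(self, user: str, query: str):
        key = (user, query)       # each user gets their own cached results
        if key not in self._store:
            self._store[key] = self._execute(query)
        return self._store[key]

    def invalidate(self):
        self._store.clear()       # e.g. when the data set refreshes
```

Note how the key includes the user: two analysts running the same query each trigger one database hit, matching the personalized data access point above, and invalidating on refresh explains the brief first-use delay the pop-up warns about.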
data analytics journey it’s a smart way to optimize performance maintaining uninterrupted service connectivity in PowerBI is important for timely and accurate data analysis by understanding the most common connectivity challenges and how to troubleshoot them you can perform analysis without issue in this video you’ll learn about the most common connectivity issues in PowerBI how to rectify refresh failures caused by credential modifications and the process of configuring notification settings for multiple users over at Adventure Works Lucas has been alerted to a supply chain optimization project report that failed to update because of a credential change to troubleshoot this issue he must fix the schedule he also needs to add another team member to the notifications in case the updates fail again when he’s unavailable let’s help Lucas fix the report and ensure that Adio is notified the next time there’s a problem but before we do let’s learn more about troubleshooting service connectivity issues powerbi service connection problems can lead to data set refresh failures with various causes to fix this a clear troubleshooting plan is needed this involves checking the gateway configurations resolving data refresh issues and ensuring data source settings are correct by following this process users can improve service connectivity leading to smoother data analysis in PowerBI it’s also important to correctly set up notification settings to alert the right people about refresh failures this ensures quick action can be taken to resolve any issues let’s start by exploring some of the most common connectivity issues as you’ve just learned most connectivity issues in PowerBI fall under the umbrella of three main categories the first of these we’ll explore is gateway configuration the first step is to check the gateway connectivity status by verifying that a gateway connection is active and running on your data sources the next step is to ensure you’ve selected the correct 
gateway choosing the correct gateway facilitates a reliable connection to your data sources this ensures that your reports and dashboards have the most accurate and up-to-date information and you must also check that you're using the latest gateway version an updated gateway ensures a solid connection between PowerBI and your data sources another category is data refresh issues this can include issues like unsupported data sources that do not support refresh operations understanding the nuances of these data sources and rectifying such issues is essential for ensuring that your reports reflect the most current data it's also important to perform a scheduled refresh check testing the accurate configuration of the scheduled refresh is vital in preventing data latency a well-configured scheduled refresh guarantees that your data is updated regularly and that the insights derived from your reports are based on the latest available data finally there are also data source settings an example of this is data source misconfigurations addressing any misconfigurations in your data source settings promptly ensures uninterrupted data retrieval a malfunctioning data source may prevent the connection with PowerBI blocking the refresh processes and there's also credential verification verifying the credentials for your data sources helps prevent unauthorized access and resolve connectivity issues ensuring the credentials are accurate and up-to-date is fundamental for maintaining a secure and reliable connection to your data sources let's discover how these issues can be solved by taking a few moments to help Lucas troubleshoot his PowerBI connection navigate to the supply chain optimization project workspace to address the data set that failed to refresh a red exclamation mark next to the refreshed column indicates that the refresh has failed to complete select the warning icon to view details of the error in the report settings menu immediately when opening the settings Lucas
identified that the last scheduled refresh failed this resulted in the refresh being disabled by PowerBI so the error resulted from this failed refresh let's troubleshoot this error scroll down and check the gateway and cloud connection options verify that the personal gateway is running on the database and does not pose an issue with the connection between the data source and PowerBI the next set of options data source credentials states that the data source failed due to incorrect credentials this is the cause of the connection issue select edit credentials to fix this and enter the new login credentials leave the rest of the settings as they are and select sign in the connection has now been reactivated scroll down to the refresh settings expand the options and select on to enable a daily refresh in the next section check the these contacts box to add Adio to the contacts list adio will now be notified if a refresh failure occurs again in the future in this video you learned about the most common connectivity issues in PowerBI how to rectify refresh failures caused by credential modifications and the process of configuring notification settings for multiple users by rectifying credential errors reconfiguring scheduled refreshes and ensuring the right individuals are notified about refresh failures you'll ensure the accuracy and timeliness of your data congratulations on reaching the end of these lessons in deploying assets during these lessons you explored creating monitoring connecting to and maintaining workspaces and data sets in PowerBI let's take a few minutes to recap what you've learned so far you began the first lesson by exploring the concept of a workspace you learned that a workspace is a specialized area in PowerBI that holds important assets like data sets reports and dashboards its advantages are that it helps to organize assets for easy management provides security through access control as only permitted users can access workspaces a workspace
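The troubleshooting checklist above, verify the gateway, then the credentials, then notify the configured contacts on failure, can be sketched as a small decision function. All names and the result layout are assumptions for illustration; in practice these failures surface in the data set's refresh settings.

```python
# Hypothetical refresh attempt following the troubleshooting order above.
def attempt_refresh(gateway_running: bool, credentials_valid: bool,
                    contacts: list) -> dict:
    if not gateway_running:
        reason = "gateway offline"          # check the gateway first
    elif not credentials_valid:
        reason = "invalid credentials"      # then verify credentials
    else:
        return {"status": "success", "notified": []}
    # Refresh failed: alert everyone on the notification list.
    return {"status": "failed", "reason": reason, "notified": list(contacts)}
```

With Lucas and Adio both on the contacts list, a credential failure notifies them both, so the issue gets handled even when Lucas is unavailable.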
also enables collaboration teams can use them to build reports and workspaces let analysts update or modify data quickly there are two types of workspaces in PowerBI the first is a personal workspace which you can use to store your own personal content the second is a shared workspace where a team can collaborate on reports and dashboards always follow best practices in your workspace like performing regular cleanups establishing clear naming conventions safeguarding your data regularly backing up your work and seeking feedback from your team on improvements that could be made to the workspace the process of creating a workspace is very straightforward a workspace can be created by selecting the new workspace option from the workspaces tab in PowerBI when creating a new workspace you must consider workspace roles workspace roles determine who can perform each task workspace roles include the following viewers can view content but can’t modify it contributors can add and modify content members can alter content and add new members and admins have full control over the workspace assets and its members you can manage these roles using PowerBI’s manage access feature during this lesson you also created a shared workspace for Adventure Works where Lucas’ team could collaborate on reports in the next lesson you learned how to monitor workspaces this involves tracking how reports and dashboards are accessed used and shared within a workspace by monitoring a workspace you can measure its impact and make changes to increase its usefulness monitoring is performed through usage metrics and monitoring reports these reports provide details like how a report was used or an overview of a report’s performance you can create a usage metrics report in a workspace from a reports options list there are also slicers for your data that can filter report data powerbi automatically creates a usage metric report data set when you create a usage metric report the credentials for accessing 
this report must be carefully managed so that it can be refreshed and accessed as required in the third lesson you explored the topic of data sets and gateways in PowerBI a data set is a collection of data you import or connect to it can come from one or multiple sources the captured data forms the basis of your reports the captured data must be the latest available information this ensures that your reports are accurate you can use a data refresh to ensure accurate data a scheduled refresh is a routine that refreshes an entire data set at specified intervals you can configure a refresh by selecting the scheduled refresh feature from your reports options ensure you enter the correct details and credentials so PowerBI can access the report an incremental refresh updates only the parts of the data set that have changed this is a more resource efficient alternative you can configure an incremental refresh from Power Query Editor this involves creating two parameters determining when the refresh begins and when it ends promoting and certifying data sets lets you inform your team where to access the most current and reliable data promoting a data set indicates you trust its content and it's ready for use certifying a data set states that it meets the company's highest standards you can promote and certify data sets from PowerBI's endorsement and discovery menu you also explored establishing a secure reliable connection between your on premises data and PowerBI service using data gateways these gateways enable you to perform a data refresh or query execution securely there are three types of gateways in PowerBI the on premises data gateway the on-premises data gateway personal mode and the Azure virtual network or V-Net data gateway which gateway you choose depends on your organization's setup and its data management and security requirements you also practiced your new skills with an exercise in which you configured a data set for Adventure Works you also worked
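To make the incremental refresh summary above concrete, here is a sketch of the window logic using the Adventure Works settings (archive data older than two years, re-query only the last seven days). The boundary names echo the Power Query parameters, but the row handling is a simplified illustration, not what the service does internally.

```python
from datetime import date, timedelta

def classify_row(order_date: date, today: date) -> str:
    """Place a transaction into the incremental refresh policy's windows."""
    range_start = today - timedelta(days=2 * 365)   # archive boundary
    refresh_start = today - timedelta(days=7)       # incremental window
    if order_date < range_start:
        return "drop"      # older than the archive period: removed
    if order_date >= refresh_start:
        return "refresh"   # re-queried on every refresh
    return "keep"          # stored unchanged, not re-queried
```

Only the seven-day "refresh" slice hits the source on each run, which is why incremental refresh is the more resource-efficient alternative.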
through a knowledge check which tested your knowledge of these topics and an additional resources item in which you explored Microsoft learn articles on data sets and gateways in the fourth and final lesson you learned how to maintain workspaces and data sets you began the lesson with an overview of development life cycles powerbi contains deployment pipelines that help move content through the following life cycle stages development in which new content is added testing in which content is reviewed for issues before it’s used in production and production when reports and features are deployed to end users the benefits of a structured development life cycle include data safety data integrity and efficiency and control you can access the deployment pipeline in PowerBI from the navigation pane this feature can create customize and manage pipelines or environments another useful feature for maintaining your workspace is the lineage view this simplifies data tracking by showing the data journey from source to destination with all the connections in between impact analysis helps you understand how changes to your data can impact or affect different assets in your workspace you can alternate between these views in PowerBI you’ve now reached the end of this summary it’s time to move on to the module quiz where you’ll test your knowledge of the topics you’ve covered best of luck data analysts often find themselves working with sensitive data as such they often need to think about the responsibility of handling such information safely in this video you’ll learn how to identify sensitive data and review measures that can be taken to protect data at Adventure Works a data breach could lead to legal trouble loss of trust and a competitive disadvantage safeguarding sensitive data is important for protecting its reputation and success data analysts must handle sensitive data with care so how do we tell the difference between regular data and sensitive data sensitive data 
contains important information about a business or its stakeholders that, if mishandled, could cause harm or misuse. Here's a simple rule: if it's information that could damage the company's reputation, finances, or stakeholder privacy, it's sensitive data. For example, general sales figures for a particular region might be considered regular data, but a detailed list that breaks down customer details, financial records, employee information, or even proprietary business knowledge is sensitive data. Any information that offers intimate knowledge that isn't meant for circulation can be classified as sensitive.

Mishandling sensitive data can have multiple serious consequences, at both the business and employee level. For example, suppose an email containing sensitive product designs for Adventure Works' next big launch is inadvertently sent to an external vendor. Such a mishap could give competitors an advantage or lead to legal problems if the designs were patented. Also think about the impact of an employee's personal data leak: this could breach privacy laws, resulting in fines, and harm trust between employees and management. One mistake can bring financial losses, legal troubles, and brand damage.

As you navigate the world of data, it's important to be equipped with a security toolkit, so let's explore the various measures that can be implemented to ensure data remains in safe hands. Before a user can access a report, they need to prove that they are who they say they are. Adventure Works operates globally, so everyone accessing the Power BI platform must be verified. An authentication system requires users to input a unique identifier that ensures only authorized personnel can access data. Once a user is authenticated, the system determines what data they are permitted to access; this protects Adventure Works from internal leaks and unauthorized external breaches. In Power BI, you can define roles for users, as each role has specific permissions tied to it. Since employees within Adventure Works have varied job functions, Power BI allows roles to be customized, ensuring data is distributed on a need-to-know basis. For instance, a product management analyst role might be permitted to see inventory levels reports, while the human resources analyst can access employee reports. Regularly reviewing and updating these roles is essential to ensure they align with organizational needs and changes.

Another measure used to protect sensitive data is row-level security. Row-level security, or RLS, is like a detailed filter where users can view only the data rows they are supposed to, based on their role or identity. For example, a regional manager for North America at Adventure Works might only need to view sales data for North America and not Europe. RLS ensures specific rows of data in Power BI are shown only to authorized users, safeguarding regional strategies and preventing potential conflicts of interest.

Another measure used to safeguard data is encryption. Adventure Works' intellectual properties, such as proprietary bicycle designs and vendor contracts, are invaluable, and the company can use encryption to ensure that only authorized individuals can read this data. As data moves between systems or across the internet, it is susceptible to interception; encrypting this data ensures that even if someone gains unauthorized access, they can't decipher the information. This helps protect business interests. As a global company, Adventure Works' data is often accessed from around the world, and encrypting data while it's being transmitted ensures it can't be accessed and misused.

Finally, there's also data masking. Data masking allows you to work with obscured versions of sensitive data, enabling you to verify transactions without risking financial security; it strikes a balance between transparency and security for Adventure Works. Sometimes you might need to work with data without knowing the exact details, and in these instances you'll need to use the technique of data masking. For instance, you might need to verify the last four digits of a customer's credit card without seeing the whole number.

Data is powerful, but it carries great responsibility. In Power BI, every data point represents Adventure Works' commitment to its global community. You should now know how to describe sensitive data and understand the measures that can be taken to protect it. Protecting data preserves trust in the company's vision; your choices today shape tomorrow's outcomes.

As a data analyst, you'll often need to send very large files to other people. Fortunately, you can use Power BI's link sharing feature to grant access to reports without transferring large files or losing their interactivity. In this video, you'll explore sharing a URL in Power BI service, the different types of links, and how to generate a URL or link to share a report. At Adventure Works, data analysts are constantly building useful, dynamic reports, and Power BI's link sharing feature allows them to quickly distribute these reports to multiple teams with a simple link. Let's find out more about how this works. In Power BI, when you share a link, you're essentially giving someone a URL to access your report or dashboard directly in a web browser. A link is fast, efficient, and doesn't require downloading large files; however, it does pose security risks, which means that access must be carefully managed.

Power BI offers different sharing options for links. Let's explore some of these. The first category is people in your organization. For example, you've built a report on Adventure Works' yearly sales trends and want to share it with the whole sales team. When you select people in your organization, anyone with an Adventure Works email can open the report using the link; this means only those within the organization can view those insights. The next category is people with existing access. Say you've shared a report with the product management team, perhaps containing confidential info about a new touring bike prototype. When you use the people with existing access option, only those you've
already permitted can view the report; others at Adventure Works won't be able to view it even if they find the link. The final category is specific people. In certain situations, a specific person may need access to a report tailored to their project. By using the specific people option, you can ensure that only the individuals you explicitly mention can view the report; other individuals can't access it unless you permit them.

However, configuring who can access the link is just as important as configuring what the individual can do with the data provided by the link. Configuring data protection is vital; failure to do so could result in unauthorized access to sensitive customer and employee data, leading to legal issues, privacy breaches, and a tarnished reputation. Sharing permissions are a vital tool for protecting data: permissions safeguard your data by determining who can access it. In large companies like Adventure Works, these protections are crucial. Let's explore two common sharing permissions in Power BI: reshare and build permissions. Data and insights must move between departments in big companies like Adventure Works. Reshare permissions let people share with others, which can be great for distributing important information quickly, but it can also cause problems: each time content is shared again, the original context can get lost, leading to misunderstandings or the wrong people accessing the data. Build permissions let others use the data you've shared. Recipients with build permissions can merge data as needed for richer analyses, but they can't change the core data. However, using this power wisely is essential to avoid cluttered, less useful reports.

Now let's demonstrate an example of how you can generate a link to share using Power BI. First, start by navigating to Power BI service. On the left sidebar, select Workspaces, and select the specific workspace where your desired report is located. Browse through the list of reports and select the title of the report you wish to share. This opens the report and provides a live, interactive view of its contents; it's always good practice to review the report before sharing to ensure it's the correct one. Towards the top left corner of the screen, locate and select the share icon, which resembles an arrow. The share button provides different mechanisms for report distribution. In the window that opens, just above the email address field, select the people in your organization with the link can view and share option. Choose the people in your organization permission level from the available options. Ensure you uncheck the option allow recipients to share your report; by toggling this option off, you ensure that the content is only viewed by its intended audience. Once you have selected the desired permission level, select the apply button. Near the bottom of the send link window is the copy link button, depicted by a paperclip icon. When you opt to share via a link, Power BI generates a unique URL that directs users to your report; by copying this link, you're grabbing the address of the live version of your report. Once copied, you can paste and share this link just like any other web link. When a user clicks on it, provided they have the required permissions, they'll be directed to the report on Power BI service, where they can interact with it live. Remember always to consider the sensitivity of the data when selecting an option.

Next, let's configure build permissions for the report's data set. Access your data set from the workspace, hover over the record, select the ellipsis (three dots) to the right of the data set's name, and select manage permissions. In the manage permissions pane, select add user, and then input the names or email addresses of the users or groups you want to grant build permissions to. In the permissions dropdown, select allow recipients to build content with the data associated with this data set; this allows users to create new reports or visuals based on this data set. Coupling it with reshare ensures they can distribute their creations to others; to restrict resharing, simply uncheck the reshare option. After configuring the permissions as desired, select the grant access button.

Having explored sharing via links, you should now be familiar with sharing a URL in Power BI service, the different link types, and generating a URL or link to share a report. Links and their related permissions are instrumental for sharing your reports safely.

In the business world, data is power, but it must be handled responsibly. Data analysts often work with sensitive client and employee data, which must be safeguarded carefully. Fortunately, they can use Power BI's data sensitivity labels to protect this information. In this video, you'll learn how to identify data sensitivity labels and how to work with them. At Adventure Works, customer and employee information needs to remain confidential. Lucas has just completed a new sales report; this data is confidential, so it's important that he labels the report correctly. Let's learn more about data sensitivity labels and how Lucas can use them to categorize data. Power BI's data sensitivity labels allow you to categorize data and safeguard the company's reputation and trust. They act like digital tags, showing the level of confidentiality data requires, and they guide users on how to handle data responsibly. These labels are part of a security system across Microsoft's products; when you apply them in Power BI, you set the data sensitivity level. Properly using these labels ensures data protection, especially when sharing or exporting. There are six different categorizations of data sensitivity labels used in Power BI: personal, public, and general, and there's also confidential, highly confidential, and restricted. Let's learn more about these labels by exploring how Adventure Works makes use of them in Power BI. From the left sidebar of Power BI, select Workspaces, then select the workspace that contains the report or dashboard you wish to configure. In this instance, you need to
configure Lucas' sales report. Inside the workspace, choose the sales report. With the report open, select the title at the top of the screen. In the drop-down menu, access the sensitivity label dropdown. If you haven't applied a label before, you might find that the label reads none or no label in a faded gray color, signaling its dormant state. Select the sensitivity label drop-down to show the range of available options, and select confidential for the current report.

Let's take a moment to review these labels. The personal sensitivity label denotes data linked to specific individuals but not intended for the wider organization. For example, a junior data analyst might share information with a senior data analyst; this information is valuable but doesn't need to go to the entire company. Adventure Works often creates content for a wide audience, including customers, stakeholders, and the public; this content is labeled as public. For example, a brochure showcasing Adventure Works' new bike range for an exhibition is intended for wide distribution without any restrictions. The general sensitivity label is for information meant for the broader internal audience without specific sensitivities, like Adventure Works' monthly newsletters, which cover company events and other general news. This information is for all employees, not external stakeholders, and the general label keeps it freely accessible within the company. The confidential label deals with sensitive information across departments. This label is for data that needs careful handling: valuable data that's not intended for everyone, like Power BI reports shared between data analysts. The highly confidential label safeguards Adventure Works' critical innovations. It's for essential, sensitive data, like research into new products or markets; this label ensures limited access, protecting valuable information for project insiders. At the highest level of data sensitivity is the restricted label. For Adventure Works, it means maximum secrecy and caution. It's for data that requires extensive protection, like top executives discussing mergers, acquisitions, or critical contracts. The restricted label keeps this monumental data secret, accessible only on a need-to-know basis.

Now that you know the different labels, let's label the sales report. Select confidential for the current report. The selected label appears near the report's name at the top of the screen, signifying that you've successfully labeled your report. In this video, you learned how to identify sensitivity labels and how to work with them. Not all data is the same; certain data must be treated more carefully than others. Use tools like data sensitivity labels to protect the integrity and confidentiality of your data.

Many people think sensitive data leaks only happen because of a targeted attack from cyber criminals, but sometimes unintentional internal leaks can be just as damaging. Meet Daniel. Daniel has been part of the Adventure Works team for the last three years as an IT specialist. Daniel's life is busy and, with his first kid on the way, increasingly expensive. While he's happy at Adventure Works, he sometimes wonders if he could earn more working elsewhere. One day, Daniel answers an IT help desk call from Maya on the payroll team. Daniel has never met Maya, but he's happy to help when she reports a problem opening Microsoft Excel attachments. After a few minutes of troubleshooting, Daniel has no success, so he asks Maya to send him an example of one of the attachments so he can check if it works from his side. Maya is anxious to get the issue resolved, and without thinking, she sends him the top email from her inbox, which happens to be from HR. When Daniel opens the attachment, he discovers that it's a complete list of salaries for all Adventure Works employees. He's a bit surprised to see this, but he closes it down and helps Maya to adjust some of her trust center settings. She verifies that this resolved the issue, and they end their call. Daniel
continues his work, but before he logs off for the day, curiosity gets the better of him. He knows he shouldn't, but he reopens the attachment he received earlier from Maya. He accesses the tab labeled IT department. He sees his name and salary: no surprises there. He spots some names from the management team, and he's shocked by what some of them earn. Maybe he should consider management. Then he notices some other names: colleagues on the same team as him, friends. He can't resist looking at their salaries. Some are on a pretty similar pay scale to him, but other team members earn significantly more per month. He's got no idea why this might be, but he's not happy. He closes the spreadsheet, logs off, and heads home. Later that night, Daniel can't stop thinking about the salaries he saw. It seems so unfair that people doing the same work as him earn more, and some just joined Adventure Works in the past year; Daniel has been there over three years. However, the spreadsheet's information is limited and doesn't tell the full story: the people on the list with higher salaries hold advanced qualifications that justify their higher pay, and Daniel is in line for a promotion and a sizable salary increase next month in recognition of his hard work. He has a bad night's sleep and is not in a good mood when he arrives at the office the next day. While he's grabbing a much-needed cup of coffee, he bumps into Katie and confides in her about the salary information he saw the day before. Katie is annoyed too. Later that day, she tells Caleb, who then tells Sam, and so it continues. Word is spreading, and employee engagement has taken a hit. Daniel and Sam decide they've had enough of feeling undervalued, and they accept slightly better paid positions with another company. Katie, Caleb, and the others have stayed where they are, but they are not feeling very motivated. With reduced headcount and disengaged staff, the rest of the company has noticed that the quality of service from the IT help desk is slipping. Such a simple mistake could have been avoided if HR had used sensitivity labels with encryption settings on their sensitive files. Even if Maya had still inadvertently shared the Excel file with Daniel, he would have been denied access to the file due to insufficient permissions. Life at Adventure Works would have carried on normally, and Daniel would have received his much-deserved promotion.

Data helps businesses generate insights, make decisions, and succeed; however, not everyone in the business needs access to all its data. Sensitive data must be safeguarded with data permissions. In this video, you'll learn about the risks of sensitive data and how to evaluate and safeguard against these risks. Adventure Works relies heavily on data from sales reports to make decisions around its product lines; however, some of the Adventure Works sales reports also contain sensitive information on profit margins, which should be visible to senior leadership only. Let's look at how Power BI data set permissions can be used to restrict data access to only those who need it to perform their roles. First, let's define what we mean by Power BI data set permissions. At the core of every data-driven organization lies its data sets, and data set permissions are the gatekeepers to these data sets: like a series of digital locks and keys, they ensure that the right individuals have the necessary keys to access specific data. They strike a balance between accessibility and security. All employees of Adventure Works have their own designated roles, and data permissions act as boundaries, ensuring that everyone has access only to the data they need for their role. The available permission types are read, build, reshare, write, and owner. The first permission type we'll explore is the read permission. The read permission in Power BI grants users the ability to view and understand data sets without altering the original content. For example, the marketing team at Adventure Works may need to look
at the product sales report to analyze the effectiveness of marketing campaigns and promotions, but they don't need to alter this report. In this case, the read permission is sufficient: it permits access while minimizing the risk of unintentional data modifications, preserving data integrity.

Next, we'll explore the build permission. The build permission enables users to construct visuals, Power BI reports, and dashboards based on the available data without modifying the source data itself. At Adventure Works, the finance team, responsible for creating and maintaining the sales data sets, often finds that sales representatives and product managers, who have legitimate reasons to access the data, are unintentionally changing key financial figures while exploring the reports. This not only leads to incorrect financial analysis but also disrupts the finance team's workflow. By utilizing the Power BI build permission, the sales and product teams can format the data for analysis without the risk of inadvertently altering it.

Sharing information is central to collaborative environments like Adventure Works. The reshare permission enables users to distribute specific data sets or reports to other users or teams permitted to access this information. Before a product launch at Adventure Works, the finance team can use the reshare permission to share a tailored, read-only data set with the marketing team. This means the marketing team can optimize their advertising campaigns based on real-time sales data, while the finance team is able to safeguard the integrity of their financial reports.

Now we'll examine the write permission. The write permission in Power BI allows users to alter data: users with this permission have the authority to make modifications to the actual data sets. Adventure Works' product development and marketing teams need access to the company's sales and customer data. Granting the write permission allows the teams to not only view the data but also make specific updates and additions to the data set; for example, they can record customer feedback, update product specifications, and add marketing campaign results. This permission, when used cautiously, ensures that Adventure Works' data remains current and relevant; however, it comes with the caveat that any
modification should be made with caution to prevent misinformation. Finally, we'll explore the owner permission. Much like the CEO overseeing every aspect of Adventure Works, having an owner of the business data ensures centralized data governance. The owner permission grants comprehensive control over data sets, encompassing the capabilities of all other permissions: owners can modify, share, build, and even restrict access to data. Owners ensure that the correct data is available to the correct people, safeguarding sensitive information while also fostering a culture of openness where needed. With overarching control, they are the custodians of data's trajectory, ensuring it aligns with the broader vision of the organization. In this video, you've learned about the risks of sensitive data and how to evaluate these risks and safeguard data. These permissions promote data governance and integrity by ensuring that users only access the data relevant to their roles, leading to more accurate analyses and informed decision-making.

As a data analyst, you must ensure that your data sets are accessed only by relevant individuals and at the required permission levels, so it's important that you can configure data set permissions effectively. In this video, you'll learn how to add and manage permissions for a data set in Power BI. Adventure Works must share its sales report with the wider data analytics team; however, some team members must be assigned different data set permissions than others. Let's help Adventure Works assign permissions as required. Upon successful login, navigate to the icons on the left-hand navigation pane and select the Workspaces icon, then select the Adventure Works workspace. The Workspaces pane is where all your current and future workspaces reside. Browse through the data sets to find the Adventure Works product sales data set; remember, each data set can represent different departments or analytical perspectives. Once selected, a new view appears on screen. This screen provides useful details about the data set, such as the current storage location and the last date refreshed, as well as existing reports and dashboards that currently use the data set. Find and select the file drop-down in the top left corner. When this option is selected, additional options appear, such as download this file and manage permissions. From the drop-down, select manage permissions; this option lets you oversee who can view or edit this data set. A links section appears on screen. These are shareable URLs that have been generated for this data set; they act as direct gateways for users to access the data set without navigating the entire Power BI interface. Each link outlines its creator, who has access, and the type of permissions assigned, allowing you to maintain a clear shared data record and ensuring that old links can be retired or renewed as needed. Next to the links tab, select direct access. The direct access tab enables you to grant direct access to a specific individual or group within Adventure Works; here you will find the names of people and groups with access, their email addresses, and the type of permissions assigned. Select the add user button to add a new user. You can input email addresses or names, and Power BI will suggest matches from your organization. In this case, you need to provide Adio, another data analyst, access to the report. Once you've selected Adio, you must assign permission levels: check the box that corresponds to the desired permission level. For now, you just need Adio to be able to read the data set, so assign read permissions. You can add a personalized message explaining the reason for granting this access. Once you have selected grant access, an email notification is sent to the user, and a new record appears in people and groups with access, indicating that the user has been successfully granted access.

Next, you must remove access for the employee Kai, as he's no longer part of the project. To remove access for a user or a group, first locate their name in the people and groups with access section. Each name is followed by details such as the permission level and the date the access was granted. Next to each name is an ellipsis, or three vertical dots, which reveals additional options when selected. Within this menu, locate the remove access button. A confirmation pop-up appears; select remove access. It's crucial always to be sure when revoking access to a data set, as it can result in delays in accessing critical reports and dashboards. Upon removal, the user's name disappears from the people and groups with access list; this immediate feedback confirms that the revocation was successful. Finally, you need to grant write access to Lucas. Identify his name in the list, select the ellipsis to bring up the menu, and select add write to assign the write permission. It's important only to assign write access to people with the necessary understanding and responsibility. You should now understand the process of granting and removing access to specified users with Power BI. These permissions help keep data in check and accurate by letting users access only the data they need for their roles, improving analysis and decision-making.

Data analysts often share sensitive data with people outside of the organization. This means the correct permissions must be assigned when sharing links to this information to keep it secure. In this video, you'll discover how to maintain data security and integrity when sharing information outside of your organization. Adventure Works needs you to share a Power BI sales report with a new partner. To prepare you for this task, let's explore the importance of maintaining the security and integrity of the data when sending it to outside stakeholders. When sharing Power BI reports externally, it's essential to protect sensitive data and respect privacy boundaries to prevent potential harm to the company and its stakeholders. This involves carefully controlling what information is shared and maintaining strict security measures. You can control this
information using techniques like user licensing sharing permissions and rowle security or RLS there’s also data masking and anonymization report embedding and external sharing settings let’s explore these techniques in more detail when sharing PowerBI reports with external partners or vendors it’s important to ensure they have the right PowerBI Pro licenses for smooth access an Adventure Works admin can assign and oversee these licenses through the Microsoft 365 admin center requiring ongoing monitoring to maintain compliance and prevent violations next is the use of rowle security or RLS using rowle security is crucial especially when sharing sales data with external vendors adventure Works can ensure vendors see only relevant table data this technique keeps other sensitive information in the same table safe and inaccessible we’ll explore this more in a later lesson next let’s examine data masking and anonymization to protect sensitive data Adventure Works uses data masking and anonymization techniques this involves replacing real data with fake or pseudonmous data in Power Query allowing external partners to analyze trends without accessing Adventure Works sensitive information another technique is report embedding when Adventure Works shares PowerBI reports externally they choose secure embedding methods like publish to web or embed code they use these options carefully considering the data sensitivity before deciding which one to use this is important to keeping data confidential and limiting report access to the right people these embedding methods allow you to add reports to external platforms while keeping control over who can see and access the data next is external sharing settings to enable external sharing Adventure Works adjusts their PowerBI service settings controlled by the PowerBI admin these adjustments include various configurations to maintain the company’s security standards such as authorizing users or groups for external sharing and setting 
content restrictions. They can also control a link's expiration time and mandate authentication for external users to access shared content. Lastly, let's examine the use of sharing links. Adventure Works boosts report security by creating safe links with clear permissions, making them a safer sharing choice. These links can have expiration dates and be limited to specific users, reducing the chance of unauthorized access. You can use these features to share a sales report with the new partner so that it can view only the required data. In this video, you discovered how to maintain data security and integrity when sharing information outside your organization. As you explore and share data, always be sure that you retain its integrity and confidentiality.

Data analysts are often required to share sensitive data with multiple teams and departments. This can pose a problem if the wrong individual accesses specific data. Fortunately, you can use row-level security, or RLS, to ensure that your data remains both accessible and protected. In this video, you'll learn about the importance of maintaining data integrity, how to evaluate and safeguard against these risks, and how RLS regulates data access. Adventure Works needs your help to manage data access for its global team of employees and customers effectively. You can use row-level security in PowerBI to tailor data access by region and role, ensuring data integrity and confidentiality company-wide. Let's explore the basics of row-level security and how you can use it to help Adventure Works. We'll begin with an explanation of what we mean by row-level security. Row-level security, or RLS, ensures that only authorized individuals can access the right data. This helps to preserve the security and integrity of your overall data sets. In other words, row-level security controls who sees what data based on predefined roles and rules. It's especially important when many different actors are interacting with the same data. Essentially, it ensures that each person can view only the data they
need, and that sensitive information is safeguarded. Let's explore some of the advantages of implementing row-level security. Row-level security gives you precise control over who views what. This helps prevent accidental data leaks by safeguarding sensitive data from unauthorized users. As an organization expands, its data scales and increases in complexity; RLS makes it easier to handle these more complex data access needs. You can use RLS to establish new rules for accessing data without starting from scratch. Compliance and auditing play a vital role in any organization. RLS helps companies comply with data privacy regulations, and it simplifies auditing by keeping track of who can access what. For companies like Adventure Works, data breaches pose a significant threat, and RLS reduces that risk: even if someone unauthorized gets into a PowerBI report, they can't see data they aren't assigned to. This adds a layer of security against data breaches. While there are many benefits to row-level security, there are also several potential issues you could encounter if it's not managed correctly. Using security layers, especially dynamic RLS, can slow down data retrieval because it filters data in real time. Monitor performance, especially with big data sets, to keep things running smoothly. Row-level security often requires maintenance; that's why regular checks and updates are important as roles and access needs change. Periodically review the RLS settings to make sure they still work well for your organization and that the correct access is given to the correct individuals. When you set up RLS, test it thoroughly to ensure the rules work and grant the right access. Regular testing helps prevent data leaks and keeps everything working as expected. Next, let's explore the two kinds of row-level security: static and dynamic. Static row-level security in PowerBI creates predefined rules to control data access based on user roles. It restricts users to specific data, ensuring that they only see
information relevant to their roles. For example, a new hire on your team has been tasked with analyzing sales of mountain bikes in North America. This means they should not have access to sales data for other products or regions. With static row-level security, you can establish clear rules that ensure they can only access data related to sales of mountain bike products in North America. Dynamic row-level security in PowerBI adjusts data access in real time based on user roles. This permits users to view only the data that's relevant to them at any given moment. Dynamic row-level security uses DAX, or Data Analysis Expressions, formulas and user roles in PowerBI to filter data based on specific conditions. These conditions could include user attributes or affiliations stored in a database. For example, your new hire has successfully analyzed sales of mountain bikes in North America, so they've been tasked with analyzing sales of mountain bikes in other regions. This means that PowerBI can now grant them access to data for those other regions. With dynamic row-level security, the system can adjust access so the new hire can view sales data for specific regions as required. In this video, you've learned about the importance of maintaining data integrity, how to evaluate and safeguard against these risks, and how RLS regulates data access. You should now be familiar with the basics of row-level security and how it ensures that data remains accessible and protected. By using row-level security, you can ensure that each entity gets the correct data in the right situation.

As a data analyst, it's important to control access to your data so that others can only view information relevant to their roles. A useful method of safeguarding data is configuring security at the table row level. In this video, you'll learn how to configure static row-level security on a data set in PowerBI. Your team member Adio Quinn needs access to the latest sales reports to analyze sales data from North America. Let's configure static row-level security so Adio can
only view the data required to complete his task. To begin, select the Modeling tab, then choose the Manage roles option. In the Manage roles section, you need to create a new role with the relevant permissions for Adio. Select the Create button to add a new role. Right-click on the new role and choose Rename; rename the role Marketing North America to maintain structured and organized role management. Next, select the table you want to filter, in this case the Sales table. Then right-click on the table name and select Add filter to specify which data rows this role can view. Choose the Region field from the drop-down list and add it to the table filter DAX expression area. The table filter DAX expression is where you define the limits of each role's data view; it's crucial to be precise about the data accessible to users in this role. Select the Region field and input a relevant DAX expression stating that the region's value should equal North America. This DAX expression ensures that Adio can only view North American data. To verify that the expression works as intended, select the check mark icon in the top right corner of the Manage roles window. After creating your DAX expression, select Save to confirm your changes and establish clear visibility boundaries. Now you need to ensure that everything works correctly. Select View as and test the configuration: choose the Marketing North America role and select OK to view the data from a user's perspective and verify its accuracy. Once you've completed your check, select Stop viewing to exit the View as roles feature. Be sure to save your settings. After saving your role definition, go to the Home tab and select Publish. In the Publish to PowerBI dialog box, choose Adventure Works, the current PowerBI workspace you're working in, and click the Select button. PowerBI publishes the report to your chosen destination; the time required for this process may vary based on the report size and your internet connection. A new dialog box confirms your report's
successful publication. Access the Adventure Works workspace and locate the newly published report and data set. Identify the data set with the same name as your report; it's now available in the PowerBI service and can be adjusted for user access. Select the ellipsis next to your data set name to open a list of options, then choose Security from the list to display the row-level security settings. From here you can assign user roles. In the row-level security settings, locate the role you created in PowerBI Desktop, Marketing North America. Then access the Members area and enter Adio's email address. This action assigns Adio to the role as a member and grants him access to North American marketing data. Next, select Add, then select Save to enforce the role assignments, locking in the user access levels. If Adio attempts to access data outside of North America, he will see blank visuals, as he only has access to marketing data related to the North American region. You should now be familiar with the process steps for configuring static row-level security on a data set in PowerBI. As a data analyst, it's your job to keep data safe and accurate, so make sure that you always configure static row-level security as required.

During a project, the roles and needs of your users may often change, which requires constant updating of data access permissions. That's a lot of work if you're using static row-level security; however, with dynamic row-level security you can adjust data access automatically as roles change. In this video, you'll learn how to configure dynamic row-level security, or RLS, on a data set in Microsoft PowerBI, and how to assign, validate, and publish a report secured with dynamic RLS. Access PowerBI and open the Adventure Works product sales report. Locate and select the Modeling tab in the ribbon area at the top of the screen. On the Modeling tab, locate the Security group. In this group, select the Manage roles choice. A dedicated Manage roles window opens; this is the area where you can define and manage roles.
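For reference, the static filter configured above and the dynamic filter defined in the next walkthrough both reduce to one-line DAX predicates entered in the table filter DAX expression area. This is a minimal sketch: the Sales table and its Region and Email columns follow the narration, so substitute your own model's table and column names as needed.

```dax
-- Static RLS ("Marketing North America" role):
-- only rows for one region are visible to members of the role
[Region] = "North America"

-- Dynamic RLS ("Dynamic Sales Access" role):
-- USERPRINCIPALNAME() returns the signed-in user's email/UPN,
-- so each member sees only the rows matching their own address
[Email] = USERPRINCIPALNAME()
```

Each predicate filters the table it is defined on; related tables are then filtered through the model's relationships.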
Create a new role using the Manage roles dialog box and name the new role Dynamic Sales Access. Now you need to apply filters. Select the role you just created, then locate and select the table you wish to apply a filter to, in this case the Sales table. Next, right-click on the table name and select Add filter. Select the Email field from the drop-down list to add it to the table filter DAX expression area. This area establishes visibility boundaries for each role, determining what data each user can view. You must now formulate a DAX expression that equates data from the table's Email column to the USERPRINCIPALNAME function. The USERPRINCIPALNAME function fetches the user's email address; the expression then filters data dynamically by limiting the user to rows that match their email address. For instance, Lucas, who works in sales and marketing, can only access data relevant to his marketing campaigns. This ensures he can't access confidential data from other business areas. To verify the syntax of your DAX expression, select the check mark icon on the top right side of the Manage roles window. If the expression is correct, select Save in the bottom right to confirm the change to the role. Once the role has been created and configured, it must be tested to ensure it works as required. Select the View as choice on the Modeling tab; this opens a View as roles dialog box. Select the Other user choice, enter Lucas's email address, then select OK. You can now view the data as if you were Lucas. If you are content with the validation, exit the View as roles mode by locating and selecting Stop viewing at the top of the window. Save your changes to ensure your created role is not lost; this ensures that all your configurations are stored securely. After saving the role definition, select the Home tab and select Publish. In the Publish to PowerBI dialog box, choose your current workspace and then the Select button. Depending on the size of the report and your internet connection, the
publication process could take a few moments. A new dialog box confirms that your report has been published successfully. Next, locate the newly published report and data set. The data set can now be configured for user access. Select the ellipsis next to the data set name, then select Security from the list. This displays the row-level security settings of the report. The role you created in PowerBI Desktop is displayed in the left pane. Once the role is selected on the left, email addresses can be added in the Members pane on the right. Type in Lucas's email to assign him to that role and give him access to specific data areas. Next, select Add and Save to enforce the role assignments, locking in the user access levels. You can repeat this process for other users as required. Adventure Works can now distribute the report with the knowledge that its data is safeguarded, and you should now understand how to configure dynamic row-level security and how to assign, validate, and publish an RLS-configured report.

Searching for daily reports in PowerBI can be a time-consuming task. Wouldn't it be great if they arrived automatically in your inbox at a set time each day? Thankfully, you can configure this setup with report and dashboard subscriptions. Over the next few minutes, you'll learn how to set up subscriptions to your reports and dashboards and review the advantages of this setup. Every morning, Lucas reviews his PowerBI workspace for new reports and dashboards. This is a time-consuming process; by configuring subscriptions, he could have these assets delivered directly to his email. Subscribing to reports and dashboards in PowerBI offers a wide array of advantages. Let's take a closer look at those benefits. A PowerBI subscription is an automated delivery system that sends scheduled snapshots of your reports and dashboards as an email or a notification. This turns a tedious manual process into a seamless and automatic one. One of the main benefits of subscribing to reports and dashboards is
quick access to data. Once there's a new update, you and all other subscribers receive an instant update or alert. This ensures that decision makers always operate with the most current data. With a subscription, Lucas can ensure that his sales and marketing insights are always drawn from the most recent reports and dashboards. Subscriptions also boost efficiency and productivity. Manually pulling up the same report day after day is a tedious task, but you can automate this process with subscriptions. Your teams can prioritize more important tasks and dedicate more resources to analysis and insight instead of wasting time fetching reports. With a subscription to the weekly sales dashboard, Lucas could receive the latest sales and marketing data every Monday at 6:00 a.m. sharp. Receiving regular reports fosters a sense of routine and consistency in data consumption; with set delivery intervals, users can create structured time slots dedicated to data-driven assessments. A shared understanding is key to effective collaboration. When multiple team members or teams subscribe to the same reports, it establishes uniformity in the information they base their decisions on: everyone is working from the same version of each report. Now that you're more familiar with the benefits and uses of subscriptions in PowerBI, let's configure a subscription for Lucas so he has quick access to the most up-to-date data. All your reports, dashboards, and data sets are listed in your workspace. Select the report you're interested in to open it. Once the report loads, navigate to the top toolbar and select the ellipsis next to the Edit button to open more options in a drop-down menu. From these options, select Subscribe to report. The Subscriptions pane appears on screen; you can use this pane to configure your subscription as follows. First, give your subscription a memorable name, especially if you plan to set up multiple subscriptions. Decide how often you want to receive this report; for example, should it be daily, weekly, or
even monthly? Depending on your chosen frequency, set the specific time you'd like the report sent. If you want other colleagues to receive this subscription, add their email addresses here; remember, recipients also need access to the report to view it. You can also add a custom message to the email received when the report is sent. Once you've set up your subscription, select Save and close, or Save, to activate it. You'll then receive confirmation that the subscription is now active. Depending on your settings, you'll begin receiving the report via email based on your selected frequency. Select an existing subscription to view its details; you can modify, pause, or cancel your subscription from this menu. Lucas now has daily, automated access to sales and marketing reports and dashboards. This gives him more time to analyze data and generate insights, and you should now know how to set up subscriptions to your reports and dashboards, as well as the advantages of this setup. With PowerBI subscriptions you'll work more efficiently, consistently, and faster, leaving you more time and opportunities to generate insights that help your organization achieve its goals.

Much of your daily work as a data analyst involves analyzing data to generate insights. But what if PowerBI could generate and deliver these insights to you? With PowerBI data alerts, you can receive automated insights that save time and effort. In this video, you'll explore the benefits of data alerts and learn how to set up an alert in PowerBI. At Adventure Works, Lucas monitors and analyzes data for events like a spike in sales or a slowdown in production or shipping times. However, manually uncovering these insights takes time. It would be much more efficient to configure data alerts that flag these events automatically. Let's find out more about data alerts and how Lucas can use them for more efficient monitoring. Data alerts are essentially automatic notifications set up within PowerBI. They inform users when specific conditions or thresholds in a
dashboard are met or exceeded, and these alerts can be customized to cater to a range of business needs. There are many different benefits to data alerts. A major benefit is real-time decision-making. Data alerts notify data analysts immediately when specific metrics reach a predefined threshold. This instantaneous awareness means decisions can be made quickly, and organizations can adapt to real-time changes in the business environment. At Adventure Works, Lucas can use data alerts to monitor sales spikes in Europe for marketing campaigns. This real-time insight allows the European sales team to quickly adjust strategies for maximum impact. Data alerts also help with efficiency and time-saving. Manually analyzing data takes time; by configuring data alerts that monitor important conditions, data analysts can direct their attention elsewhere, confident they'll be notified if something requires their attention. For example, Lucas previously spent hours checking website traffic following the launch of new marketing campaigns. Now, thanks to data alerts, he's instantly informed of significant traffic changes, which frees his time for other tasks. Instead of discovering issues after they've occurred and then seeking solutions, data alerts can notify stakeholders of potential problems before they escalate. For instance, an alert can be triggered if a manufacturing process at Adventure Works starts to slow; the company can intervene immediately, before the slowdown impacts the wider production line. This proactive approach can mitigate risks and prevent minor issues from becoming major problems. Data alerts also ensure that all relevant parties are notified about important data-driven insights. For example, if Adventure Works launches a new marketing campaign in Germany, data alerts can notify the marketing and IT teams of surging website traffic. This synchronization ensures greater collaboration: the marketing team can assess the campaign's success while the IT team can scale server resources. And finally, data
alerts are highly customizable. This lets different teams or individuals set alerts based on what's most important to their role or department. A sales manager might set alerts related to sales metrics, while a supply chain manager might focus on inventory levels. This personalized approach ensures that each stakeholder receives the most relevant data instead of unnecessary information. Now that you're more familiar with data alerts, let's help Lucas set up alerts in PowerBI. In your workspace is a list of reports, dashboards, and data sets. Select the report you're interested in to open it. Once the report loads, navigate to the KPI visual you wish to create an alert for. It's important to note that PowerBI differentiates between reports and dashboards. Dashboards are a collection of tiles, each representing a specific visual or piece of information. Alerts can be set on tiles pinned from report visuals or PowerBI Q&A, and only on gauges, KPIs, and cards. To pin the visual from your report to a dashboard, hover over it and select the pin icon. This action opens the Pin to dashboard menu, where you can select the dashboard to which you want to pin the visualization and even change its theme. A confirmation message appears once you've pinned the visualization. Select the message's Go to dashboard option to view your pinned visualization. Move your cursor over the tile of interest; an ellipsis appears at the top right corner. Select it to reveal a drop-down menu with additional options for that tile. Select Manage alerts from the drop-down menu. This opens the core settings for alerts related to this tile. On the alerts menu, select Add alert rule. You can now define a new condition for the alert. A clear, descriptive name for an alert, like Drop in shipping time, provides clear context. Next, choose a condition parameter, like Above or Below, and set a numeric value. This value becomes your trigger point; for instance, if shipping times drop below a set number, it'll trigger the alert. You can decide the alert's notification
frequency depending on how critical the data is. If it's a vital metric like manufacturing uptime, you might set up hourly alerts; for less urgent data, every 24 hours might suffice. Once you've configured the alert to your satisfaction, select Save. This activates your alert. It's good practice to review your alerts regularly. To access your active alerts, just select Manage alerts again; you can view and manage your existing alerts from the Manage alerts menu. Frequently reviewing your alerts ensures that they're still relevant to your organization's goals. Outdated alerts might cause unnecessary distractions or lead you to miss out on critical insights. You should now understand the benefits of PowerBI data alerts and be familiar with the setup process. Data alerts are a great tool for delivering automated, actionable insights that save you time, increase your productivity, and help you and your organization succeed.

Emily is the CEO, IT specialist, designer, head of HR, delivery driver, and chief coffee maker at Ecocraft Furniture. You name it, Emily does it, along with a small but close-knit team of other craftspeople. Ecocraft specializes in producing high-quality, sustainable furniture. Founded just two years ago, the company is already exporting its products to various countries across North America and Europe. The raw materials for Ecocraft's furniture, such as sustainably sourced wood and eco-friendly paints, are imported from different countries. This means transactions often take place in multiple currencies, which has been one of the biggest challenges for Emily and Ecocraft: fluctuations on the currency markets can significantly impact production costs and profit margins. The company needs a system to issue alerts when rates are favorable for making large purchases or setting prices for overseas markets. This would help Emily and Ecocraft manage budgeting and financial forecasting. PowerBI is the perfect solution for Emily. She can use it to track important business metrics: sales,
supply chain status, and currency exchange rates. Emily decides to set up alerts in PowerBI for currency exchange rate changes. This will give her the information she needs to make sound financial decisions. The first step is to collect data. Emily enlists the help of her tech-savvy friend Alex, who helps her create a robust data pipeline. Together, they source real-time and historical exchange rate data for the currencies of the countries from which they import raw materials. They also collect data on their purchase orders and expenses related to each supplier. Next, they create a dashboard to monitor various key performance indicators. The dashboard will also identify patterns and potential risks associated with currency fluctuations. The exchange rate data and other vital metrics, like sales and supply chain status, are displayed in real time. Emily configures PowerBI to send custom alerts whenever currency pairs, like the US dollar to Canadian dollar or the US dollar to euro, cross thresholds that impact the company's financials. She sets these alert levels based on historical data and current business needs. For instance, if the exchange rate for the euro increases by more than 5% in a week, Emily will receive an alert. Armed with these alerts, Emily is better prepared to mitigate currency risk. When an alert triggers, she can immediately assess the potential impact on her production costs and take the necessary actions. This could include renegotiating contracts with suppliers, hedging currency exposure, or seeking alternative suppliers from more stable regions. Shortly after setting up the PowerBI dashboard, an alert indicates that the US dollar to euro exchange rate has dropped to a favorable level. Based on this information, the team orders raw materials from the European suppliers, saving thousands of dollars. As Emily continues to use PowerBI and respond to alerts, she gains deeper insights into her business. She can analyze which suppliers are more cost effective based on currency trends and
adjust her sourcing strategy accordingly. These data-driven insights help the company make more informed decisions, save money, improve the overall efficiency of its supply chain, and ultimately increase profitability. Over time, the currency alerts become integral to Emily's business, providing the stability she needs to pursue her mission of creating beautiful, eco-friendly furniture for years to come. The company plans to extend the PowerBI platform's capabilities to other business areas, solidifying data as a core component of its growth strategy. Emily's journey with PowerBI is a testament to the power of data-driven decision-making.

Congratulations on reaching the end of these lessons on security and monitoring in PowerBI. During these lessons, you explored the role that security and monitoring play in safeguarding reports and dashboards in PowerBI. Let's take a few minutes to recap what you learned. You first explored how to share information safely and identify sensitive data. Sensitive data is essential information that, if leaked, could damage the company's reputation, finances, or privacy. If the information is employee related, a leak could damage the relationship between an organization and its workforce. Fortunately, you can safeguard data in PowerBI using the following methods: authentication and authorization systems ensure that those accessing the data are who they say they are; assigning clear roles and permissions ensures that individuals can only access certain data; row-level security, or RLS, filters data so that individuals can only access relevant elements of data sets; data encryption prevents data from being intercepted during transmission; and data masking lets you work with obscured versions of data so that you can view only the information required to complete your task. You also learned that sensitive information can be shared using links. These links offer sharing options so you can control who views the data. These options include people in your organization who
need the data, people with existing access to the data, or specific people that you include directly. You can also decide what recipients can do with the data using the following sharing permissions: they can reshare the data with others, or make use of the data to perform analysis. Another method of safeguarding data is the use of sensitivity labels. These labels let you categorize data, making it clear who can access it. These categories include Personal, which denotes data linked to specific individuals; Public, which is data for a wider audience; and General, meaning information meant for a wider internal audience. There are also categories that govern more sensitive data: the Confidential label means the information is sensitive and requires careful handling; Highly confidential relates to sensitive data on critical business innovations; and the Restricted label is used for data that must be treated with maximum secrecy and caution. You then demonstrated your understanding of sharing information in PowerBI by applying sensitivity labels to an Adventure Works data set. In the next lesson, you explored the topic of organizations and permissions. You discovered that access to data sets is governed by data permissions, which ensure that only authorized individuals can access data. PowerBI offers the following permission types: the owner permission grants a user complete control of a data set; the read permission permits users to view but not alter data; the reshare permission permits users to reshare data; the build permission lets users utilize the data for analysis; and the write permission enables users to alter data. You then learned how to configure these permissions in PowerBI using the Manage permissions option. This option lets you create and manage URLs for data access that can be shared with your team. You also learned that data can be shared outside of an organization; however, it's important to consider which safeguards are most appropriate to ensure the data remains confidential. You
completed this lesson with a knowledge check in which you tested your understanding of data permissions, and you reviewed additional resources to help you learn more about PowerBI and data permissions. In the third lesson, you reviewed row-level security for safeguarding data. Row-level security, or RLS, controls which individuals can view data based on predefined roles and rules. Some of the benefits of RLS include granular control over data, the ability to scale as your data grows, assistance with compliance and auditing, and a reduced risk of data breaches. However, RLS also gives rise to several potential issues: it can impact performance by slowing down data retrieval, it requires regular maintenance, and it must be tested frequently. There are two types of row-level security. The first is static: static RLS restricts users to specific data so they can only view information relevant to their roles. The other type is dynamic: dynamic RLS uses Data Analysis Expressions, or DAX, to adjust data access in real time based on user roles. You completed this lesson by undertaking a knowledge check focused on row-level security, and you reviewed some additional resources on this lesson's main topics. In the fourth and final lesson, you explored the topic of subscriptions and alerts in PowerBI. You can subscribe to reports and dashboards: a PowerBI subscription is an automated delivery system that provides scheduled data snapshots as emails or notifications. The advantages of subscriptions include timely access to information, a boost in productivity because more tasks are automated, consistency in data consumption, and enhanced collaboration, as teams can now work from the same data sets. You can configure subscriptions using the Subscriptions pane in PowerBI. With this feature, you can name your subscription, decide how often you receive it, and even include other colleagues. You can also modify, pause, or cancel your subscription as needed. As well as subscriptions, PowerBI also offers data alerts. These automatic,
customizable notifications inform users when specific conditions or thresholds have been met or exceeded. Some of the benefits of data alerts include real-time decision-making, efficiency through automation, proactive problem solving, enhanced collaboration, and customization and personalization. You can configure data alerts in PowerBI: the Manage alerts feature lets you set the conditions and thresholds that determine when you receive alerts. Finally, you demonstrated your understanding of these topics by undertaking an exercise in which you configured a data alert for Adventure Works. You've now reached the end of this summary. It's time to move on to the discussion prompt, where you can discuss what you've learned with your peers. You'll then be invited to explore additional resources to help you develop a deeper understanding of the topics in this lesson. Congratulations on everything you've achieved so far.

You've now reached the capstone project. During this course, you explored the role of PowerBI in business, deploying assets in a PowerBI workspace, and the role that security and monitoring play in safeguarding reports and dashboards in PowerBI. Let's take a few minutes to recap what you've learned so far. You began with an introduction to the role of PowerBI in business, with a focus on data flow. Data flow in business refers to the movement of information within an organization. This movement, or flow, occurs in the following stages: collection, processing, analysis, and decision making. Once gathered, the data is cleaned and standardized; it's then transformed. Data analysts use the refined data to generate insights. The data is analyzed using the PowerBI service. This software offers many advantages for analysts: it's accessible and scalable, and it offers collaboration tools and data backup and recovery features. The data analyst is the central figure in this process; they possess important skills and expertise in extracting valuable insights from data. An important skill that all data analysts must possess
is understanding Structured Query Language, or SQL. Data analysts use SQL to interact with the SQL databases that store the data. Analysts can connect to a SQL database using import or direct query modes: import mode loads the data directly into Power BI, while direct query mode connects Power BI directly to the source database.

An analysis is presented in the form of a report, which can be static or dynamic. A dynamic report explores multiple areas of interest, and its results are presented in the form of visuals. These reports also facilitate what-if parameters that permit interactive adjustments to modify visualizations and generate insights into potential scenarios.

Next, you explored how to deploy assets in a workspace. A workspace is a specialized area in Power BI that holds important assets. There are two types of workspaces in Power BI: the first is a personal workspace, which you can use to store your own content; the second is a shared workspace, where a team can collaborate on reports and dashboards. Workspace roles determine how individuals can interact with workspaces; they include Viewer, Contributor, Member, and Admin. You can manage these roles using Power BI's Manage access feature.

In the next lesson, you learned how to monitor workspaces. By monitoring a workspace, you can measure its impact and make changes to increase its usefulness. You also explored the topic of data sets and gateways in Power BI. A data set must contain the latest available information: you can use a scheduled or incremental refresh to ensure accurate data, and you can promote and certify data sets to show your team where to access the most current and reliable data. You also explored establishing a secure, reliable connection between your on-premises data and Power BI service using data gateways. There are three types of gateways in Power BI: the on-premises data gateway, the on-premises data gateway (personal mode), and the Azure virtual network (VNet) data gateway. Which type of gateway you choose depends on the
setup of your organization and its specific data management and security requirements.

You also learned how Power BI deployment pipelines move content through the following life cycle stages: development, testing, and staging or production. Another useful feature for maintaining your workspace is the lineage view. This view shows the data journey from source to destination, with all the connections in between. Impact analysis shows how changes to your data can impact, or affect, different assets in your workspace.

Next, you explored the role that security and monitoring play in safeguarding reports and dashboards in Power BI. You first explored how to share information safely and identify sensitive data. Sensitive data is essential information that, if leaked, could damage the company's reputation, finances, or privacy. You can safeguard data using Power BI's authentication tools. You can also use sharing links to control who you share information with, and use sharing permissions to determine what they can do with the data. Sensitivity labels are another useful method of safeguarding data. Access to data sets is governed by data permissions, which ensure that only authorized individuals can access data. You can configure permissions in Power BI to safeguard your data.

You also reviewed row-level security for safeguarding data. Row-level security, or RLS, controls which individuals can view data based on predefined roles and rules. There are two types of row-level security: static RLS restricts users to specific data, while dynamic RLS uses Data Analysis Expressions, or DAX, to adjust data access in real time based on user roles.

Finally, you explored subscriptions and alerts in Power BI. You can subscribe to reports and dashboards: a Power BI subscription is an automated delivery system that provides daily data snapshots as emails or notifications. You can use the Subscriptions pane in Power BI to manage your subscriptions. As well as subscriptions, Power BI also offers data alerts. These automatic, customizable notifications
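The static/dynamic RLS distinction can be illustrated with two short DAX rule expressions. These are hedged sketches rather than rules from the course materials: the Region and Email column names are assumptions.

```dax
-- Static RLS: a hypothetical "France" role pins every member of the
-- role to a single, hard-coded slice of the data.
[Region] = "France"

-- Dynamic RLS: a single rule serves all users by comparing an assumed
-- Email column against the signed-in user's identity at query time.
[Email] = USERPRINCIPALNAME()
```

With the dynamic pattern, adding a new salesperson requires no new role; the `USERPRINCIPALNAME()` comparison adapts automatically.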
inform users when specific conditions or thresholds have been met or exceeded.

During these lessons, you also completed exercises in which you put your new skills into practice by helping Adventure Works with Power BI, knowledge checks that tested your understanding of these topics, and additional resources in which you consulted Microsoft Learn articles to help you explore these topics in more detail. You've now reached the end of this recap. It's time to move on to the capstone project, which will test your understanding of these concepts through a series of exercises. Best of luck!

You've reached the next stage of the capstone project. You've worked hard to get to this stage and made good progress. Let's recap what you've achieved so far. In the previous set of scenarios, you prepared sales data, configured data sources, and designed and developed a data model.

You'll begin this next stage of the capstone by configuring aggregations for Tailwind Traders. These aggregations will help the company generate insights into its financial performance. As part of this scenario, you'll calculate sales and profits data and record the performance of visuals using the Performance Analyzer. These aggregations will help generate insights that inform the company's strategic decisions for the upcoming business year. By completing this exercise, you'll demonstrate your ability to create time-based summaries, determine median sales volumes, and utilize the Performance Analyzer tool.

Next, you'll transform the insights you generated from configuring aggregations into a sales report. Tailwind Traders needs a report that helps to inform sales decisions, and the company needs your help to generate such a report using its sales data. To generate this report, you'll complete the following tasks: create charts and cards to visualize your data, and add a slicer to your report. Aside from the sales report, Tailwind Traders also requires a report that displays key insights into its profits. Creating this
report will be your next task. You'll generate this report by creating charts and cards to visualize the data, creating a KPI, and adding a slicer. Through this and the previous scenario, you'll demonstrate your ability to create different kinds of charts to display sales data and to display important sales metrics using cards and KPIs.

In the next capstone scenario, you'll help Tailwind Traders create an executive dashboard. Tailwind Traders will use the dashboard to generate insights into its global performance. The dashboard must focus on sales and profits and be accessible on mobile devices. You'll create this dashboard by pinning sales and profits card visualizations and KPIs to the dashboard and configuring mobile view for the cards, KPI visuals, and core visualizations. By completing this scenario, you'll show that you can create an executive dashboard in Power BI, display sales summaries, highlight profit metrics, use card visualizations for quick insights, and configure a dashboard that's mobile friendly.

In the final scenario, you'll need to help Tailwind Traders generate quick and actionable insights into its data. You can carry out this task using Power BI's subscriptions and alerts features. You'll complete this task by creating daily alerts for key metrics and creating subscriptions for the sales and profits overview tabs. By successfully helping Tailwind Traders generate quick and actionable insights, you'll prove that you can configure subscriptions and set up proactive alerts.

If you encounter any difficulty with these scenarios, remember that you can refer to previous learning materials, like videos and readings, for guidance. You've already completed similar tasks in the other exercise items in this course, so you're more than capable of working through these scenarios. Best of luck!

Congratulations on completing the capstone project. It's been a lot of work, but you finally reached the end. Your completed capstone Power BI environment should contain sales and profits reports,
visualizations of the key metrics in your reports pinned to an executive dashboard, and you should also have configured alerts and subscriptions. Let's take a few moments to recap the exercises you've completed by reviewing examples of what the completed dashboard should look like. Don't worry if these examples don't quite match your dashboard; you can review these best-practice examples in more detail when you access the exemplars.

In the first exercise, you configured aggregations using DAX. You created measures to calculate the following: yearly profit margin, quarterly profit, and median sales. You then assessed the performance of these reports.

In the second exercise, you created a sales report and visualized its data using charts: a bar chart for loyalty points by country, a column chart for quantity sold by product, a pie chart for median sales distribution by country, and a line chart for median sales over time. You also created cards for stock, quantity purchased, and median sales.

In the third exercise, you created a profit report and visualized its data using charts: a bar chart for net revenue by product, a donut chart for yearly profit margin by country, and an area chart for yearly profit margin over time. You then created cards for year-to-date profit and net revenue USD, set up a KPI for gross revenue USD, and added a slicer to your profit report. Finally, you saved and published the report.

Once your profits and sales reports were completed, your next task was to create an executive dashboard. To do this, you created a dashboard called Tailwind Traders Executive Dashboard. You then pinned the following sets of visualizations to the dashboard: sales overview core visualizations, sales overview card visualizations, profit overview core visualizations, and profit overview card and KPI visualizations. Once you finished pinning your visualizations, you configured the mobile view for the cards, KPI visuals, and
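Measures like the ones in the first exercise can be sketched in DAX. This is a hedged illustration, not the exemplar solution: the Sales table, its SalesAmount, Profit, and Revenue columns, and the 'Date' table are assumed names, and "quarterly" and "yearly" are interpreted here as quarter-to-date and year-to-date.

```dax
Median Sales =
MEDIAN ( Sales[SalesAmount] )

-- Quarter-to-date profit, assuming a marked date table named 'Date'.
Quarterly Profit =
CALCULATE ( SUM ( Sales[Profit] ), DATESQTD ( 'Date'[Date] ) )

-- DIVIDE avoids divide-by-zero errors when revenue is blank or zero.
Yearly Profit Margin =
DIVIDE (
    CALCULATE ( SUM ( Sales[Profit] ), DATESYTD ( 'Date'[Date] ) ),
    CALCULATE ( SUM ( Sales[Revenue] ), DATESYTD ( 'Date'[Date] ) )
)
```

Using `DIVIDE` rather than the `/` operator is the usual defensive choice for ratio measures, since it returns BLANK instead of an error when the denominator is zero.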
core visualizations.

In the final exercise, your main task was configuring the dashboard's alerts and subscriptions. You first created a daily alert for gross revenue USD that informs Tailwind Traders when gross revenue drops below $400 US. Next, you created and activated a weekly subscription for the sales overview tab, ensuring it could be viewed and shared in Power BI. You then created and activated a weekly subscription for the profit overview tab, ensuring it could be viewed and shared in Power BI. You're now ready to begin working through the exemplars, where you can compare your Power BI environment against the best-practice examples in more detail.

Congratulations! You've reached the end of this capstone project course. You've worked hard to get here and developed many new skills, and you've made great progress on your Power BI journey. This course, and all you have achieved, is the culmination of all the previous courses you've completed in this specialization. Having completed this course, you now understand the basics of Power BI's relationship with business; you're familiar with the process steps for creating, monitoring, and maintaining workspaces; you can connect data sets and gateways; you can securely share information with your team and the wider organization; and you can manage subscriptions and alerts in your workspaces.

With this course, you were able to reinforce and demonstrate the learning and practical development skill set you have gained throughout this program. This was achieved through hands-on, guided practice configuring a Power BI workspace for Tailwind Traders. The graded assessment further tested your knowledge of Power BI.

After completing the final project, it's a great time to pause and reflect on your journey. You can reflect on the completed course from several vantage points. You could consider the links between this course and the previous ones you've completed, or you could reflect on the process of completing the project. For example, what were the hardest parts of the project?
What was the easiest? What experience did you gain from the project, and would you benefit from revisiting previous courses? Whether you're just starting out as a technical professional, a student, or a business user, this course-end project proves your knowledge of the value and capabilities of database systems. The project consolidates your abilities with a practical application of your skills. But the project also has another important benefit: it means you have a fully operational Power BI workspace to reference within your portfolio. This serves to demonstrate your skills to potential employers. Not only does it show employers that you are self-driven and innovative, but it also speaks volumes about you as an individual and your newly obtained knowledge.

You've completed all the courses in this specialization and earned your certificate in Power BI. The certificate can also be used as a progression to other role-based certificates: you may go deep with advanced role-based certificates or take other fundamental courses, depending on your goals. Certifications provide globally recognized, industry-endorsed evidence of mastering technical skills. You've done a great job and should be proud of your progress. The experience you've gained shows potential employers that you are motivated, capable, and not afraid to learn new things. Thank you; it's been a pleasure to embark on this journey of discovery with you. Best of luck in the future!

Welcome to the Microsoft PL-300 Exam Preparation and Practice course, a significant milestone on your journey toward becoming a certified Microsoft Power BI data analyst. If you're motivated to set yourself up for a career in the world of data analytics, you're on the right track. Your learning journey in data analytics with Microsoft Power BI has culminated in this course, carefully designed to equip you with the knowledge, skills, and competencies you need to excel in the Microsoft PL-300 exam. As you delve into this course, you'll navigate key Power BI features and
concepts that are integral to the PL-300 exam. These concepts encompass a broad spectrum, including data preparation, modeling, visualization, and asset deployment. Plus, by the end of the course, you won't just be well prepared for the PL-300 exam; you'll also be equipped with valuable insights into your future career prospects in data analytics with Power BI.

Your course journey begins with a comprehensive review of fundamental concepts associated with data preparation and loading in Power BI. You'll cover a range of essential topics, such as the journey from exam preparation to Microsoft certification, mastering the art of acquiring data from diverse sources, and data profiling and cleaning, as well as the intricacies of data transformation and loading.

The next part of your course journey involves a detailed recap of core data modeling concepts in Power BI, representing another crucial step in your preparation for the PL-300 exam. This will entail a thorough recap of designing effective data models and creating model calculations using DAX, or Data Analysis Expressions. Additionally, you'll delve into implementing well-structured data models and optimizing data performance for efficient and seamless analysis.

Following your refresher in data modeling, you'll revisit essential concepts linked to data visualization and analysis, more essential components of your PL-300 exam readiness. This part of the course encompasses creating impactful reports and enhancing those reports to boost usability and storytelling. Plus, you'll also focus on developing your skills in recognizing patterns and trends within data, which is invaluable in data analytics.

After covering these critical content areas, you'll shift your focus to the deployment and maintenance of assets within Power BI. Here, you'll refresh your understanding of pivotal topics like establishing and managing workspaces and assets. You'll also work on your proficiency in the efficient handling of data sets, a
skill that's fundamental to the work of a data analyst.

To complete this course successfully, you'll have the opportunity to apply the skills and knowledge you have gained to a practice exam specially designed to simulate the conditions of the PL-300 exam. This practical, hands-on assessment will allow you to assess your readiness and identify areas that may require further attention or improvement. Furthermore, you'll receive additional study resources and materials to further enhance your preparation. You'll also have the opportunity to explore the different roles and career prospects that will be accessible to you once you've successfully completed the exam and obtained your Microsoft certification.

In sum, the objective of this course is to prepare you for the PL-300 exam and support you in taking the next steps toward a career as a Power BI data analyst. The course is structured to prepare you thoroughly for assessment and guide you in recapping and consolidating the concepts you've acquired throughout the program. It aims to increase your confidence in your competence and ensure you are truly exam ready. As with the other courses in this program, the videos, readings, activities, and quizzes will help you consolidate your knowledge and serve as a way for you to measure your progress.

Beyond preparing for the PL-300 exam, this course holds a much larger promise. It's about more than just gaining knowledge and skills in data analysis in Power BI; it's about taking an important step in setting yourself up for a career in data analysis, a field filled with opportunities and potential. By completing all the courses in the program, you'll earn a Coursera certificate, which you can use to proudly showcase your job readiness to your professional network. Furthermore, the program, with an emphasis on this exam preparation and practice course, will prepare you for the Microsoft Exam PL-300, which leads to a Microsoft Power BI data analyst certification: globally recognized evidence of your
real-world skills. So, are you ready to achieve exam readiness and take a leap toward a career in data analytics with Power BI? Congratulations on reaching the home stretch of this program, and all the best as you embark on the exciting and promising learning journey that lies ahead.

This is the final course in the Microsoft Power BI data analyst professional certificate, which will guide you through taking the PL-300 exam and earning the associated Microsoft certification. By obtaining the Microsoft PL-300 certification, you can unlock various career opportunities, enhance your knowledge and skills, and cultivate a competitive edge in the job market. Exams are nothing new; it's likely that you've encountered similar challenges earlier in your career. Just like before, it takes preparation to make the most of it, and the more effective your preparation, the more benefits you will reap from all your effort. This video provides a quick overview of what you can expect from the PL-300 exam, the logistics around taking the exam, and the steps you need to take to prepare for success.

You can take the PL-300 exam online at your home or office through Pearson VUE online, or you can take your exam with Pearson VUE at one of their worldwide test centers. Pearson VUE is a global leader in computer-based testing and assessment services, and their OnVUE platform employs several security measures to ensure a fair and secure testing experience. You can schedule your exam for a specific date and time on the Pearson VUE website. There are a few important things to do before the day of the exam: a system check, making sure your ID document meets the specified requirements, and choosing an appropriate space in which to take the exam. The PL-300 exam is a proctored exam, which means that you are monitored by a live proctor, or exam supervisor, through your webcam during the exam. The proctor ensures that you follow the exam guidelines and don't engage in any prohibited activities. The proctor will also give you
certain instructions during the check-in process on the day of your exam. There are very strict rules about what items and actions are allowed while taking the exam, which you'll learn about in greater depth later. It's critical to understand these policies, because failing to adhere to them will result in the termination of the exam session.

Let's move on to the topics covered in the exam. To succeed in the PL-300 exam, you should be proficient at using Power Query and writing expressions using DAX, or Data Analysis Expressions. You should know how to assess data quality, as well as understand data security, including row-level security and data sensitivity. The PL-300 exam measures your ability to accomplish the following technical tasks: data preparation; data modeling; data analysis and visualization; and asset deployment and maintenance. A certain percentage of the exam questions relates to each of these categories. Knowing these percentages can help you focus your study schedule on the categories that carry the most weight and help you prepare in the most effective way. You can look forward to exploring the specific ways in which the skills related to each of these categories might be assessed later, and you can also consult the detailed exam skills outline provided by Microsoft.

Effective exam preparation not only requires a lot of dedication; you also need to consider effective strategies for use during the exam. For instance, you should consider the types of questions you might get and how to approach them. Some helpful strategies include reading every option before choosing a final answer and following a process of elimination when you are unsure. You will learn more about these and other strategies later. One of the best forms of preparation is to take a practice test before the exam. This way, you can monitor your progress and identify the areas that might require a little more attention. Later in this course, you will take two mock exams. Each one will focus on the topics and key concepts
covered in the previous courses and the skills measured in the PL-300 exam.

This video gave you a bird's-eye view of how the PL-300 exam works, what it tests, and some core elements of an effective exam preparation strategy. You've already put in a lot of hard work by engaging with the course materials, exercises, and assessments during this program, so you are in a good position for the final preparation before taking the exam. The information and materials in this lesson will help you focus your preparation in this final stage toward earning the Microsoft Power BI data analyst certification.

Data-driven enterprises rely on data analysts to provide them with accurate and insightful analysis. As you've learned, finding the correct data sources is essential for data analysts to help businesses achieve their goals. In this video, you'll recap the importance of identifying the right data sources and connecting to data sources with Microsoft Power BI. As you begin the data analysis process, identifying what data is required, and which sources can provide it, is the first step toward a successful analysis outcome. For example, when looking to increase sales, your social media accounts and popular search engines become your key data sources for analyzing marketing data. Similarly, if you're looking to improve customer satisfaction, tracking the volume of support requests and resolution times from your customer support system is the key data source.

Fortunately, Power BI comes with over 100 connectors that allow you to tap into the different data sources available to you. These include spreadsheet sources such as Microsoft Excel, user directory services such as Microsoft Active Directory, SQL databases such as Microsoft Azure SQL databases, and text files in various formats such as XML, JSON, and CSV. Plus, Microsoft continues to add new connectors and update existing connectors each year.

Now, let's explore how to connect to a data source in Power BI. In Power BI Desktop, select Get data, followed by Excel Workbook. When the
file browser opens, navigate to the folder that your Excel file is in, select the Excel file, then select Open. The Navigator window will open, displaying all the available sheets within the workbook. Select the check boxes beside the sheets that you want to import. At the bottom of the Navigator window are three buttons: Load, Transform Data, and Cancel. Selecting Load will load the data directly, without cleaning or transforming it. For this example, let's select Transform Data to open the Power Query Editor and inspect the data. Power BI will begin loading the data; note that this may take a few minutes, depending on your computer and the size of the worksheet.

Once the data is loaded, the Power Query Editor will open. Power Query allows you to apply transformation operations to the data before loading it into Power BI. On the left side of the editor is the Queries pane, where each table is listed. Selecting a table will allow you to clean and transform its data; each row of data in the table is listed in the main working view. On the right side of the editor is the Applied Steps list. This lists each of the transform operations applied to the data and the order in which they are applied. Note that if you need to change the source of the data query, you can select the cog icon beside the Source step. This opens a window where you can change the file from which the data is loaded. If you're satisfied with the existing data source, you can close the window by selecting OK.

In this example, let's use the data as is, without cleaning and transforming it. Select the Close & Apply button in the top-left corner of the editor to finish transforming the data and load it into Power BI. Power BI will begin loading the data with the transformations applied to it; again, this may take a few minutes, depending on your computer and the size of the worksheet. Once the data is loaded, you can begin working with it to build reports and dashboards. If you want to inspect the data after loading, select the table icon on
the left side of the interface to open the table view, also known as the data view. In this view, you can inspect each table and each row of data.

Working with data sources is an important aspect of the role of a data analyst. This video revisited the importance of identifying the right data sources, as well as how to connect to an Excel data source, load its data using the Power Query Editor, and configure the data source settings by selecting the cog wheel next to the Source step in the Applied Steps pane. As you solve business challenges, unlock new opportunities, and optimize existing processes, consider which data sources can provide the data you need to achieve your objectives. Power BI, with its more than 100 connectors, makes it possible for you to harness these sources to their fullest potential.

With hundreds of connectors in Microsoft Power BI, it should be no surprise that a wide range of options is available when using these connectors. Previously, when you used an Excel worksheet as the data source, the data was imported into Power BI. But for larger volumes of data, importing may become a resource-intensive operation. This is where choosing a different storage mode, like direct query, comes in. In this video, you'll revise the different storage modes available in Power BI.

Power BI Desktop supports three different storage modes, also known as connectivity modes or data set modes in Power BI service: import mode, direct query mode, and dual mode. When you use import mode, data is copied from the data source to Power BI. This allows quick access to the data locally; however, if the data source is updated after importing, you must refresh the data source. Fortunately, you can configure Power BI to schedule refreshes at specific intervals, such as daily or weekly. When you use import mode, consider how up to date the data must be for stakeholders to make data-driven decisions effectively. Another consideration when using import mode is the required storage space. If you are working with an extensive data set, storing all
the data on your local device may not be possible; in today's data-driven world, it is not uncommon to see data sets consuming several gigabytes of storage.

So what about data sources with significantly large volumes of data, a scenario where import mode may be unsuitable? By changing to direct query mode, Power BI will query the data source directly for data rather than importing it. This means that when a report is displayed in Power BI, each visualization sends a query to the data source to request the required data. To determine what connectivity mode is supported, you can refer to Microsoft's documentation for your chosen connector. One disadvantage of using direct query is that it requires transferring query results from the data source every time a query is made. Depending on the volume of data, this may take some time, slowing down visualizations and reports.

To improve the user experience, Power BI also provides a dual mode. This mode is a combination of the direct query and import modes. Depending on the query and data source, Power BI will store a local copy of query results and refresh the copy as needed. This helps improve the responsiveness of visualizations and reports without importing all the data into Power BI.

As you build data models in Power BI, connecting to multiple data sources is common. When your data model connects to multiple sources, it is known as a composite model. With composite models, you can configure the storage mode for each table in the model. For example, let's say you have two tables in your data model: products and sales. In a niche business, the product data set might be a small Excel spreadsheet and the sales data a large data set stored in a SQL database. In this scenario, it would make sense to use import mode for the products table and direct query or dual mode for the sales table. This would help ensure there's no slowdown in your reports and that viewers have a good user experience.

But what about connecting to a data set on Power BI service? Power BI features a
type of connector called a live connection, which allows you to use direct query with data sets published to Power BI service. Power BI service becomes an important data source for building reports and dashboards as an organization grows. Hosting data in Power BI service allows the organization to have one source of truth, maintaining consistency and accuracy in reporting. The benefit of using a live connection is that security rules can be applied to the data, ensuring that company data remains protected from unauthorized viewers.

In this video, you recapped the import, direct query, and dual storage modes to help you choose between them. Choosing the right storage mode is important to ensuring a good user experience for different stakeholders. If data retrieval is slow, reports and dashboards will also be slow, which may result in stakeholders not utilizing the insights unlocked by your data analysis. As you proceed through the data analysis process, carefully consider which storage modes are suitable for different data sources and how they should be configured.

Query parameters are a useful feature in Microsoft Power BI for simplifying a dynamic element of your data, for example, changing between a test data source and a production data source, or filtering data from your data source. In this video, you'll revise how to configure query parameters and the values that they use. In the Power Query Editor, there's an Excel data source loaded containing stock orders for different business regions. Because the data set is quite large, let's use query parameters to filter the data needed. To do this, select the Manage Parameters button in the Home tab of the ribbon menu. This opens the Manage Parameters window. To filter the data by country, you need to add a country parameter. In the Manage Parameters window, select New. In the Name field, enter Country. In the Description field, let's add a note that this parameter filters the stock order data by country. Ensure that the Required option is enabled, so that report
users must specify a value for this parameter. For the Type field, let's change the type to Text, as the country values are text values. Also, since there's a fixed list of countries in the data, let's change Suggested Values to List of values. In the list of values, add the three countries present in the data: the United States, France, and Germany. For the Default Value, select United States; this will be the default value for users of this data set. For the Current Value, also select United States, then select OK. This adds the parameter to the Queries pane.

To ensure that the data source query utilizes the parameter, select the stock orders query in the Queries pane, then select the filter button in the Country column, followed by Text Filters and Equals, which opens the Filter Rows window. In the Filter Rows window, change the filter value button to Parameter. This changes the Equals filter to utilize the previously defined Country parameter. You can then select OK. Note how the data set is now filtered by the Country parameter. In the Home tab of the ribbon menu, select Close & Apply to load the data set. To confirm that the parameter has been applied, select the table view button, also known as the data view. In this view, it is clear that the data set contains only stock orders for the United States. This matches the current value specified for the Country parameter earlier.

To visualize how this parameter is used, let's create a simple report containing a card visualization. Navigate to the report view. In the Visualizations pane, select the card visualization; the visualization is then added to the report. Now select the visualization in the report. In the Data pane, also known as the Fields pane, let's select the Unit Price field. This applies the Unit Price field to the visualization. In the Visualizations pane, in the data field, right-click Sum of Unit Price and then select Average. The visualization now displays the average value of the Unit Price field in the data set. To change the
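Behind these UI steps, Power Query records both the parameter and the filter as M code, which you can inspect in the Advanced Editor. The following is a hedged sketch of roughly what that generated M might look like; the file path, sheet name, and Country column are placeholders, not values taken from the course files.

```m
// The Country parameter, roughly as Manage Parameters generates it.
Country = "United States" meta [
    IsParameterQuery = true,
    Type = "Text",
    IsParameterQueryRequired = true
]

// The stock orders query, with the filter step referencing the parameter.
let
    Source = Excel.Workbook(File.Contents("C:\Data\StockOrders.xlsx"), null, true),
    Sheet = Source{[Item = "StockOrders", Kind = "Sheet"]}[Data],
    Promoted = Table.PromoteHeaders(Sheet, [PromoteAllScalars = true]),
    // Equals filter bound to the Country parameter instead of a literal value
    FilteredRows = Table.SelectRows(Promoted, each [Country] = Country)
in
    FilteredRows
```

Because the filter step references the parameter by name rather than a hard-coded string, changing the parameter in Edit Parameters re-evaluates the query without editing any M code.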
parameter you can select the drop-down of the transform data button in the home tab of the ribbon menu then select edit parameters in the edit parameters window let's change the country parameter to France then select okay PowerBI now displays a notification that there are pending query changes if you select apply changes the parameter change will be applied note that the average value in the visualization has changed this is because the data set has now been filtered for only stock orders in the France business region to confirm this let's select the table view button in this view it is clear that the data set contains only stock orders for France in this video you recapped how to change the values in a parameter query parameters are a great way to filter your data queries dynamically as you begin building reports and working with more extensive and multiple data sets consider how you can use query parameters to reduce the scope of data being retrieved by PowerBI optimizing your reports and providing a better user experience as a business continues to grow so does the challenge of managing large volumes of data and ensuring that the data is well-formed and ready for analysis Microsoft PowerBI's data flows help to solve this issue by creating reusable data transformation logic in this video you'll explore what a data flow is how it works and how to connect to one in PowerBI desktop maintaining a single source of truth is important in a data-driven enterprise it ensures consistent analytical conclusions are obtained from the underlying data one method of ensuring a single source of truth is by creating data flows in PowerBI service a data flow is a collection of tables that exist within PowerBI service you can add and edit tables in your data flow apply transformations and manage data refresh schedules directly from the workspace in which your data flow was created each table consists of columns and rows and each table in a data flow is also known as an entity data flows promote the
reusability of underlying data elements preventing the need to create separate connections with your cloud or on-premises data sources if you want to work with large data volumes and perform the extract transform and load or ETL process at scale data flows with PowerBI premium scale more efficiently data flows act as data sources for your data sets in both PowerBI service and PowerBI desktop data flows can also act as data sources for other data flows however when using a data flow there are important considerations and limitations to keep in mind if a data flow links to another data flow the maximum number of linked data flows in the chain is 32 this is known as the maximum depth you need a PowerBI premium subscription in order to refresh more than 10 data flows across the workspace data flows are managed individually this means that there is limited visibility into dependencies between data flows in PowerBI data flows you can use parameters but you can't edit them unless you edit the entire data flow when creating a data set in PowerBI desktop and then publishing it to the PowerBI service ensure the credentials used in PowerBI desktop for the data flows data source are the same credentials used when the data set is published to the service previously in this course you walked through how to create a data flow let's take a moment to explore how to connect this data flow to PowerBI desktop launch PowerBI desktop and select more from the get data drop-down list of options in the get data dialogue box that appears select Power Platform from the left column and select data flows from the right column of the dialogue box then select next if you are connecting to the data flow for the first time a dialogue box opens where you need to sign into your PowerBI service account after you enter your login credentials select connect a navigator window appears displaying the workspace and the data flow you created previously expand the workspace and data flow to display the
available tables the two tables that you imported during the creation of the data flow are available here select both tables fact internet sales and dim date followed by load the tables are loaded into the PowerBI model a process you may be familiar with you can establish relationships between the data tables and create reports and visualizations as you typically do with any data set once the data is updated in the source data set you need to go back to PowerBI service and refresh the data flow or configure its scheduled refresh you will revise scheduled refresh later data flows are a powerful feature that enable you to centralize your data as a single source of truth as an organization grows data flows help to encourage consistency and reuse of data leading to effective decision-making within the organization businesses operate with many data sources from SQL databases to Excel spreadsheets but with multiple data sources come varying degrees of quality some sources may be perfect and ready for analysis but others require quality checks cleaning and transformation in this video you'll revise the importance of inspecting data before loading it for analysis before loading a data source into PowerBI it is essential to evaluate whether the data source will provide the data that you require and if the format is compatible with PowerBI utilizing the wrong data for analysis can lead to incorrect conclusions being drawn or even worse wrong business decisions being made once you're satisfied that the data is suitable the next step is to load it into PowerBI when you first load a data source PowerBI inspects the first 1,000 rows of data of each table to determine the data types of each column PowerBI supports multiple data types such as numeric types date and time types text and true or false in most scenarios PowerBI will automatically determine the correct type however while this automatic feature is useful it is important to inspect the results of it in the data
view also known as the table view of Power Query Editor incorrect data types can cause significant issues later when writing DAX queries building reports and analyzing the data if you need to change the data type use the Power Query editor to perform the transformation once the correct column types are established it is important to evaluate the statistical distribution of the columns in PowerBI this is done using three data profiling tools column quality column distribution and column profile let's revisit each of these profiling tools starting with column quality column quality displays the percentage of data that is valid in error and empty in an ideal situation you want 100% of the data to be valid column distribution displays the distribution of the data within the column and the counts of distinct and unique values distinct values are all the different values in a column including duplicates and null values distinct tells you the total count of how many values are present on the other hand unique values do not include duplicates or nulls unique tells you how many of those values only appear once lastly column profile provides a more in-depth look into the statistics within the columns for the first 1,000 rows of data this tool provides several different values including the count of rows which is important when verifying whether you imported your data successfully for example if your original database had 100 rows you could use this row count to verify that 100 rows were in fact imported correctly additionally this row count will show how many rows PowerBI has deemed as being outliers empty rows and strings and the min and max which will tell you the smallest and largest value in a column respectively this distinction is particularly important in the case of numeric data because it will immediately notify you if you have an anomaly in your data such as a maximum value that is beyond what your business identifies as a maximum now let's recap how to access
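The distinct versus unique distinction above is easy to get wrong, so here is a minimal Python sketch of how the two counts differ (the quantity values are hypothetical, chosen only to make the counts diverge):

```python
from collections import Counter

# Hypothetical quantity column: the values 1, 2, 3, 4 all occur,
# but only the value 1 occurs exactly once.
quantity = [1, 2, 2, 3, 3, 3, 4, 4]

counts = Counter(quantity)
distinct = len(counts)                                  # all different values
unique = sum(1 for v, c in counts.items() if c == 1)    # values appearing once

print(distinct, unique)  # 4 distinct values, 1 unique value
```

This mirrors what column distribution reports: distinct counts every different value present (duplicates counted once), while unique counts only those values that never repeat.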
these profiling tools in the Power Query Editor a sales data set has just been loaded in the Power Query Editor the data set contains the transaction ID product ID quantity sales amount and other related data to inspect each column's data type navigate to the transform tab in the ribbon menu to display the data type in the ribbon menu select the column and inspect its data type the data type is currently set to text for each column as the data in the first four columns are numeric update the first four columns to the whole number data type by selecting each column and changing the type in the ribbon menu note that when the data type is changed a new step is added to the applied steps list remember you can edit remove and reorder the steps in this list next let's update the sales amount column to the decimal number data type and finally update the transaction date column to the date data type next you have to evaluate the column quality distribution and profile to do this navigate to the view tab in the ribbon menu enable the column quality column distribution and column profile options in the menu the view now updates with the corresponding statistics each column is 100% valid meaning there are no errors or empty values in the quantity column there are four distinct values and zero unique this means that among this data there are four values that occur in the quantity column but none of them are unique in the column statistics panel the count is 52 since there are 52 rows of data this is the correct number the minimum and maximum values for the quantity column are within the expected range for the business if there were any issues with this data further transformation would be required to clean the data you will learn more about transformation later in this course the data is ready for import navigate to the home tab in the ribbon menu and select close and apply profiling your data is important for ensuring accurate results later in the data analysis process without
accurate data businesses can't unlock the insights that they're seeking remember accurate and consistent data is a requirement for a successful data-driven enterprise as you know by now data-driven organizations rely on data to make informed decisions and drive innovation however the effectiveness of such decisions is greatly dependent on the quality and consistency of the data poor quality data and inconsistencies can lead to expensive mistakes missed opportunities and damaged reputations in this video you'll explore resolving inconsistencies and issues in your data let's start by exploring the question what is data quality data quality refers to the accuracy completeness and reliability of the data as a future data analyst a key responsibility of your role is ensuring that data is of high quality before it is used stakeholders and decision makers rely on accurate data to assess performance and build strategies inaccurate or incomplete data can lead to inaccurate reports and misguided decisions such decisions could have significant effects on the business if the business is operating in a regulated industry such as pharmaceuticals the wrong decision could lead the business to fall out of compliance with regulation and be subject to fines or legal proceedings for example duplicate entries in your marketing data could lead management to overstock certain products increasing costs and negatively impacting the finances of the business the common types of inconsistencies and quality issues that can occur are duplicate rows empty or missing values and errors or invalid values fortunately PowerBI comes with tools to help analyze the quality of your data and resolve inconsistencies and errors previously you learned how to use data profiling tools to analyze a column's quality distribution and profile which helps identify irregularities in your data you also learned how to ensure that the column has the correct data type now let's revisit how to use the Power Query editor to
resolve other data quality issues and inconsistencies here in Power Query is a data set that contains several data quality issues the first issue is that every row is duplicated to resolve this navigate to the home tab on the ribbon menu then select the remove rows button and select remove duplicates Power Query has now removed the duplicates and added a step to the applied steps list for removing duplicates next there are some values in the transaction date column that are null the sales team has informed you that there was an error on their system and the date was the 1st of January 2023 to fix this select the replace values button under the home tab the replace values dialogue box appears here specify null as the value to find and 1st of January 2023 as the value to replace with select okay and the changes are applied again note that a new step is added to the applied steps list in the sales amount column one of the values is spelled out in words instead of the number 500 to fix this use the replace values dialogue again this time specifying the spelled-out words as the value to find and the number 500 as the value to replace with select okay to apply the changes now that the quality issues are resolved return to the home tab in the ribbon menu and select close and apply to apply the changes maintaining data quality is a key aspect of being a data analyst by regularly evaluating and auditing your data you can help maintain the accuracy of your analysis and help organizations make effective decisions that will lead them to success data comes in different forms a telephone number is not the same as a block of text therefore ensuring these different forms are correctly represented and stored in table columns is important for accurate and consistent data collection and analysis in this video you'll revise how to identify and transform column data types and how to create a new calculated column in PowerBI a table consists of one or more columns of data as
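The three cleanup steps recapped in that walkthrough (remove duplicates, replace null dates, replace a text value with a number) can be sketched in a few lines of Python; the rows and values here are hypothetical stand-ins for the data set in the video:

```python
from datetime import date

# Hypothetical rows reproducing the three quality issues described:
# duplicated rows, a null transaction date, and an amount entered as words.
rows = [
    {"id": 1, "sales_amount": "five hundred", "transaction_date": None},
    {"id": 1, "sales_amount": "five hundred", "transaction_date": None},  # duplicate
    {"id": 2, "sales_amount": 250, "transaction_date": date(2023, 2, 1)},
]

def clean(rows):
    # Step 1: remove duplicates, keeping the first occurrence (like remove rows > remove duplicates).
    seen, deduped = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            deduped.append(dict(row))
    # Step 2: replace null dates with 1 January 2023 (like replace values on null).
    # Step 3: replace the spelled-out amount with the number 500.
    for row in deduped:
        if row["transaction_date"] is None:
            row["transaction_date"] = date(2023, 1, 1)
        if row["sales_amount"] == "five hundred":
            row["sales_amount"] = 500
    return deduped

cleaned = clean(rows)
print(len(cleaned))  # 2 rows remain after duplicate removal
```

Each function body line corresponds to one applied step in Power Query, which is why the applied steps list grows by one entry per fix in the video.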
you add data to the table a new row is created in the table with a value in each column each column has a specified data type which determines how the data in the column is represented which calculations are available and how the data can be used in visualizations you're already familiar with the different types of data in PowerBI including numeric types date and time types text and true or false once your data is loaded into a table you may identify missing data for example suppose you are working with a table of products consisting of two columns cost and sale price for the report you're building you also need to display the profit per product sold since the data is not provided by the data source you can use a calculated column to derive the value required calculated columns use a data analysis expressions or DAX formula to create new values for each row in the table as in the previous example these calculated columns will often use values from existing columns to derive their values based on the example the formula to create a profit column would be profit equals sale price minus cost this is a simple example but DAX is a powerful expression language that you can use to create complex formulas to derive insights from your data now let's take a moment to review how to identify a column's data type transform the column and create a new calculated column in PowerBI load and open the sales data set in the Power Query Editor as you've previously learned PowerBI automatically determines the data type based on the first 1,000 rows of the data set however it is best practice to inspect the data type of each column before importing to do this select the first column in the main working view in the home tab of the ribbon menu the data type is specified as whole number inspect each column noting that all columns except the last one are set to the whole number data type the last column transaction date is set to date data type all types are correct except the sales amount
column since a currency amount can have numbers after the decimal place you need to change this column's data type to fixed decimal number to do this select the column then select the data type in the ribbon menu and select fixed decimal number in the drop-down note that this can also be done in the transform tab of the ribbon menu a prompt appears asking if you want to replace the existing change type step in the applied steps list or add a new step for this example select add new step a new change type step is added to the applied steps list now that the data types for each column are correct you need to add a new calculated column the data set is missing the sale price per unit which is calculated as the sales amount divided by the quantity to do this select the add column tab in the ribbon menu and then select custom column the custom column prompt appears for the new column name enter sales amount per unit next you need to complete the custom column formula PowerBI provides a list of available columns on the right side of the prompt first select sales amount and select insert this adds the sales amount column to the custom column formula in the custom column formula type space then forward slash and then space forward slash is the division operator then select the quantity column in the available columns list and select insert on the bottom left of the prompt note that PowerBI has detected no syntax errors then select okay Power Query has now added the calculated column to the table select the column to inspect its data type the column has been created as an any type change the column to a fixed decimal type and the data set is now ready in the home tab on the ribbon menu select close and apply to begin importing the data into PowerBI as you work with large data sets consider how correct data types and calculated columns can help optimize the visualization of your data saving calculation time during visualization will improve the user experience and drive
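The custom column built above simply evaluates sales amount divided by quantity for every row; a rough Python sketch of that row-by-row evaluation (hypothetical rows and column names):

```python
# Hypothetical sales rows; the new column mirrors the custom column formula
# sales amount / quantity evaluated once per row.
sales = [
    {"quantity": 2, "sales_amount": 50.0},
    {"quantity": 4, "sales_amount": 100.0},
]

for row in sales:
    row["sales_amount_per_unit"] = row["sales_amount"] / row["quantity"]

print([row["sales_amount_per_unit"] for row in sales])  # [25.0, 25.0]
```

Because the division can produce fractional values, the resulting column needs a fixed decimal type rather than whole number, which is why the video changes the column's type right after creating it.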
engagement with the reports you are building as you begin working with multiple data sources keeping track of the different queries can grow in complexity very quickly this is where PowerBI's query pane and reference queries become crucial to a data analyst in this video you'll learn about the query pane and how to effectively manage queries using it in PowerBI when you connect to a data source it creates a query in the query pane as you begin applying transformations these exist within the context of the query however if you are working with large data you may need to apply multiple transformations inserting data into tables at different stages doing this with a single query can become difficult to maintain very quickly this is where duplicate and reference queries come in in the query pane you can duplicate a query to create a copy and perform different transformations on it from the original query this allows you to transform data into different formats and insert it into different tables for example let's say you have a sales data set that contains the following columns sales date item quantity shipment address and shipment country you need to build a table for sales and a table for countries the sales table can be imported from the data set but unfortunately you don't have a separate countries data set so you need to build a table from the sales data set in this scenario you can duplicate the query rename it to countries and apply the necessary transformations to remove all columns except shipment country remove duplicates and import the data into a countries table you now have a table containing all countries that sales have shipped to in this scenario duplicate queries make sense as you have two completely different sets of transformations and resulting tables if there are common transformations this creates an issue for maintainability let's work through an example where duplicate queries could create problems again let's say you have a sales data set that
contains the following columns: sales date item quantity shipment address and shipment country you need to build a table for sales and a table for countries however in both tables you need to rename the shipment address column to address and shipment country column to country if you duplicate the query you will need to apply this transformation in both queries and if you need to update this transformation later you will need to do it in both queries while this is a simple example if you had a series of more complex transformations maintaining these in two different queries could easily result in mistakes and human error this is where reference queries are important reference queries allow you to use another query as the base of a new query using the previous example you can apply the column rename transformations in one query and then create two new queries which reference the first query to perform the subsequent operations to create the sales and country tables now if you update anything in the first query the dependent queries will be automatically updated this reduces the complexity and effort of maintaining queries minimizing the risk of human error it also increases the efficiency of PowerBI as PowerBI can pipeline results from the first query as input to the dependent queries instead of repeating transformations multiple times on multiple queries when importing very large data sets efficient queries can be the difference between a few minutes and a few hours of importing data duplicate and reference queries require much consideration when working in PowerBI identifying when efficiency and maintainability are needed is an important skill to develop as you progress in your career as a data analyst and can help you perform effectively in your role as you work with multiple data sources you'll discover that the data is often disjointed and needs to be combined and transformed into a data model that is suitable for analysis in this video you'll explore how
merge and append queries in PowerBI can combine multiple data sources into single tables suitable for visualization and analysis in later stages of the data analysis process it is common to encounter data that is broken down into multiple files or data sources for example sales data might be stored in one Excel file per month or perhaps sales data was originally stored in Excel files but later moved to a SQL database however to effectively analyze this data you require it to be contained in a single table in PowerBI fortunately the Power Query Editor contains the append queries feature which allows you to append multiple sources into a single table using the earlier example let's say you have one Excel file containing sales for January the file contains the columns sales date product name and sales amount you then have a SQL database containing a table with sales for February with the same columns as the Excel file using an append query you can combine the data from these two data sources into a single table containing sales for both January and February but what happens if the columns are different suppose that the SQL table contains an additional column named discount when the append query executes it will insert null values in the discount column for rows that originate from the Excel file append queries work well when the columns in the data source are well aligned and the desired resulting table should match the format of the data sources however you may encounter more complex scenarios requiring the merging of data from different sources this is where merge queries come in let's say you have a table of customers named customers from a customer relationship management or CRM system you then have a table of sales orders from a SQL database named sales you want to prepare a single table containing the most common cities where orders are delivered to to do this you'll need to merge the tables from the two data sources using a merge query to merge two tables you
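The append behavior described above, including null filling for mismatched columns, can be sketched in plain Python; the January/February rows and the `append_queries` helper are hypothetical illustrations of what Power Query's append does:

```python
# Hypothetical monthly sales: February carries an extra 'discount' column,
# so appended January rows receive None (null) for it, as described above.
january = [
    {"sales_date": "2024-01-05", "product_name": "Widget", "sales_amount": 20.0},
]
february = [
    {"sales_date": "2024-02-03", "product_name": "Widget", "sales_amount": 25.0, "discount": 5.0},
]

def append_queries(*tables):
    """Union all column names across tables, filling missing cells with None."""
    columns = []
    for table in tables:
        for row in table:
            for col in row:
                if col not in columns:
                    columns.append(col)
    return [{col: row.get(col) for col in columns} for table in tables for row in table]

combined = append_queries(january, february)
print(combined[0]["discount"])  # None: the January source had no discount column
```

This is why append works best when the sources are well aligned: any column missing from one source becomes a column of nulls for that source's rows.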
need to tell the merge query which type of join you would like to use the join type informs PowerBI how to merge the two tables a join requires that there is a common column between the two tables in our previous example the sales table contains a unique customer ID which is present in the customers table this is known as the join key once the join key is determined the join type must be chosen PowerBI supports the following join types left outer right outer full outer inner left anti-join and right anti-join let's explore each join type and the way it combines data from multiple tables based on matching criteria to understand the join types picture two tables one on the left side named sales and one on the right side named customers the sales table contains the columns sales ID customer ID and sales amount the customers table contains the customer ID country and name columns the customer ID column in both tables will act as the join key with a left outer join the resulting table will contain all rows and columns from the left table merged with all matching rows and columns from the right table this results in a table with the columns sales ID customer ID sales amount country and name if the sales table has a customer ID that does not exist in the customers table the name and country columns for that row will contain null values in a right outer join the resulting table will contain all rows and columns from the right table merged with all matching rows and columns from the left table this results in a table with the columns sales ID customer ID sales amount country and name if the sales table contains customer ids that are not present in the customer's table these rows are excluded from the results a full outer join simply merges all rows and columns from both tables into the resulting table if the sales table contains rows that do not match the customer's table null values will be inserted for the country and name columns if the customer table
contains rows that do not match the sales table null values are inserted for the sales ID and sales amount columns in an inner join the resulting table only contains the matching rows from both left and right tables a left anti-join will keep rows from the left table that do not have matching rows in the right table note that this will still include columns from the right table but since there is no match in the right table every row will have a null value in these columns a right anti-join will keep rows from the right table which do not have matching rows in the left table again note that this will still include columns from the left table but will have null values for these columns in each row merge and append queries are valuable tools in your data analysis toolkit they allow you to combine tables from multiple data sources into a format that aids rather than hinders the data analysis process as you continue through the data analysis process designing a schema to represent your data is a key step before diving into the analysis itself this video will explore table relationships and how to identify appropriate keys for establishing relationships a table relationship is how two tables are connected to each other let's say you have two tables sales and products the sales table contains the following columns sales ID sales amount and product ID the products table contains the columns product ID product name and product category in the products table the product ID column is what's known as a primary key each value in the product ID column is unique that is if one row has the ID of 11 no other rows in that table will have that ID therefore a primary key uniquely identifies a row in the table in the sales table the product ID column is what's known as a foreign key it's not the primary key of the table but instead it establishes a relationship to the products table this means that each row in the sales table is associated with a specific row in the products table if
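The join behaviors walked through above can be made concrete with a small Python sketch over the same hypothetical sales and customers tables (the rows, helper functions, and key values are illustrative, not Power Query's actual implementation):

```python
# Hypothetical sales/customers tables matching the join walkthrough:
# customer 99 has no matching row in customers.
sales = [
    {"sales_id": 1, "customer_id": 11, "sales_amount": 100},
    {"sales_id": 2, "customer_id": 12, "sales_amount": 200},
    {"sales_id": 3, "customer_id": 99, "sales_amount": 50},
]
customers = [
    {"customer_id": 11, "country": "US", "name": "Ada"},
    {"customer_id": 12, "country": "DE", "name": "Max"},
]

def left_outer_join(left, right, key):
    """All left rows; unmatched right columns become None (null)."""
    lookup = {row[key]: row for row in right}
    right_cols = [c for c in right[0] if c != key]
    out = []
    for row in left:
        match = lookup.get(row[key])
        merged = dict(row)
        for c in right_cols:
            merged[c] = match[c] if match else None
        out.append(merged)
    return out

def inner_join(left, right, key):
    """Only rows whose key appears in both tables."""
    keys = {row[key] for row in right}
    return [row for row in left_outer_join(left, right, key) if row[key] in keys]

def left_anti_join(left, right, key):
    """Left rows with no match; right columns still present but all None."""
    keys = {row[key] for row in right}
    right_cols = [c for c in right[0] if c != key]
    return [{**row, **{c: None for c in right_cols}}
            for row in left if row[key] not in keys]

joined = left_outer_join(sales, customers, "customer_id")
print(joined[2]["name"])  # None: customer 99 has no match
```

Right outer and right anti-joins are the same operations with the table roles swapped, and a full outer join is the union of the left outer result with the unmatched right rows.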
a row in the sales table has a value of 11 in the product ID column it is therefore associated with the row in the product table which has a primary key of 11 for primary and foreign keys the whole number data type is most commonly used however there are scenarios where a non-numeric identifier may be used for example if you are analyzing country-based data you could use the two-letter standard identifier for each country such as US for the United States and DE for Germany and so on now that you know how to establish a relationship between two tables the next important aspect is the cardinality of the relationship in PowerBI there are three types of cardinality one-to-one many-to-one or one-to-many and many-to-many to explain these cardinalities let's say that you have two tables table A and table B a one-to-one relationship would mean that each row in table A is directly related to only one row in table B and vice versa for example if table A contained countries and table B contained capital cities the relationship would be one-to-one as each country has only one capital and each capital belongs to only one country a many-to-one relationship would mean that multiple rows in table A can be related to a single row in table B the relationship from table B to table A is a one-to-many relationship that is each row in table B is related to multiple rows in table A our earlier sales and products example was an example of a many-to-one relationship multiple rows in the sales table are associated with one product in the products table a many-to-many relationship would mean that each row in table A is related to many rows in table B and each row in table B is related to many rows in table A for example if you had a table of books and a table of authors a book can be written by multiple authors and an author can write multiple books establishing relationships is an important aspect of building a schema for your data model you will learn more about schemas and data modeling later table relationships
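A quick way to see the primary key/foreign key distinction is to test a column for uniqueness: a column can only serve as a primary key (the "one" side of a relationship) if no value repeats. A minimal Python sketch over hypothetical product and sales rows:

```python
# Hypothetical tables: product_id is unique in products (primary key)
# but repeats in sales (foreign key, the 'many' side).
products = [{"product_id": 11}, {"product_id": 12}, {"product_id": 13}]
sales = [
    {"sales_id": 1, "product_id": 11},
    {"sales_id": 2, "product_id": 11},
]

def is_primary_key(rows, column):
    """A column qualifies as a primary key only if every value is unique."""
    values = [row[column] for row in rows]
    return len(values) == len(set(values))

print(is_primary_key(products, "product_id"))  # True: valid 'one' side
print(is_primary_key(sales, "product_id"))     # False: the 'many' side
```

Running this kind of check before configuring a relationship helps catch cases where PowerBI would otherwise infer a many-to-many cardinality because the intended key column contains duplicates.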
are an important consideration when modeling your data in PowerBI using incorrect relationships or cardinality can lead to wrong insights and results in the data analysis process as a data analyst it is your responsibility to ensure correctness in the data model so that a successful analysis outcome can be achieved congratulations on completing the first part of the Microsoft PL300 exam preparation and practice course designed to help you achieve your PL300 certification you've discovered much about the PL300 exam and honed your data preparation skills and knowledge within Microsoft PowerBI to ensure your success let's recap some key takeaways and insights you've covered so far you began with an overview of the course and how it will prepare you for your certification journey you explored the syllabus course structure and helpful tips for success you delved into all things Microsoft certification as part of your exam preparation you identified key knowledge and skills measured in this course's mock exam and the PL300 exam learning how to plan your study time effectively the steps to register and schedule the proctored exam were outlined offering a clear road map to taking the exam you also discovered more about the administration of the PL300 exam so you know what to expect you explored testing strategies and the advantages of practice assessments and mock exams you also had the opportunity to discuss exam preparation with your fellow learners armed with more knowledge about the PL300 exam you moved on to reviewing exam content focusing on data preparation in Microsoft PowerBI you began by revisiting the practicalities of getting data from various sources you learned the importance of choosing the right data sources and were reminded of PowerBI's extensive range of connectors you were guided through connecting to an Excel data source and loading data via the Power Query Editor and you explored configuring data source settings you also explored the difference between
local and shared data sets the pros and cons of import direct query and dual modes and choosing different storage modes you gained hands-on experience setting up and configuring a data set reviewing the advanced query capabilities of Power Query and using query parameters in Power Query expanded your toolkit you covered connecting to a data flow recapping data flows and creating them in a workspace you also explored the difference between data flows and Microsoft Dataverse enriching your expertise then you focused on the critical task of profiling and cleaning data you covered evaluating data statistics and column properties reviewing why data evaluation is crucial Power Query's profiling capabilities and different evaluation methods through an interactive activity you practiced analyzing a data set for anomalies and statistical irregularities preparing you for real world scenarios as a PowerBI data analyst you also explored data inconsistencies unexpected or null values and data quality issues you may encounter as a PowerBI data analyst as well as resolving data import errors next you explored transforming and loading data you reviewed creating and transforming columns understanding the importance of selecting appropriate column data types and how to transform columns and create calculated columns in Power Query you brushed up on shaping and transforming tables and applying query steps to shape the data exploring reference queries you recapped when to use reference or duplicate queries you also unpacked the differences between merge and append queries and explored the different types of joins finally you reviewed how table relationships work identifying appropriate keys for relationships and configuring data loading for queries in a PowerBI project you now have detailed insight into what taking the PL300 exam entails and have boosted your skills and knowledge in data preparation with PowerBI and that's not just good for the exam it'll also contribute to
your success in the world of data analytics.

Previously, you covered how to establish table relationships. Building on this, you will explore how to design a schema that contains facts and dimensions. When deciding on the data schema you plan to use for your analysis, the most common schema types are star and snowflake schemas. You may recall that in these schemas, data is broken down into fact and dimension tables. Fact tables represent a business process's measurements, metrics, or facts. They can contain several repeated values; for example, one product can appear multiple times in multiple rows, sold to different customers on different dates. These values are used to create aggregations during visualization. Dimension tables store contextual data, or descriptive attributes, about the facts. These tables are connected to the fact table via key columns, and you can use them to group or filter data in the fact table during visualization in Microsoft Power BI.

In the context of an Adventure Works dataset with sales and product tables, the Sales table is the fact table, as it contains transactional information about the sales process. The Product table is the dimension table, as it contains the contextual information about the product sold in each sale. In the star schema, the most common data model, a single fact table is typically related to one or more dimension tables. The snowflake schema further normalizes the dimension tables; for example, the Product table is broken down into product category and product subcategory tables based on category ID and subcategory ID.

Now let's revisit how to create and configure a star schema in Power BI. Launch Power BI Desktop and load the data from the Excel workbook containing Adventure Works sales data. The dataset contains four data tables: one fact table, the Sales table, and three dimension tables: Product, Region, and Salesperson. Navigate to the Model view, where you can create and configure the data model and build a star schema. Once you load the data, Power BI auto-detects the relationships between the data tables based on the key columns. You can disable this function from Options and settings to create and control the nature of relationships between your data tables yourself.

You can establish the relationships between the fact and dimension tables in two ways to build a star schema; remember, in a star schema the fact table is at the center of the star. The first method is simply dragging the key column between the fact and dimension tables. In the current dataset, drag the ProductKey column from the Product table and drop it on the ProductKey column in the Sales table. If there are no duplicate values in the ProductKey column of the Product table, Power BI automatically establishes a one-to-many relationship with a single cross-filter direction. Repeat the same process for the Region and Salesperson tables to relate these dimension tables to the Sales fact table.

Let's delete the relationships to explore the second way to build the star schema: right-click on the connector line and select Delete the relationship. Then select Manage relationships from the Home ribbon; a Manage relationships dialog box appears on screen. Here you can select either Autodetect or New. With the Autodetect selection, Power BI identifies the key columns and establishes relationships in your data, similar to when you load data into the Power BI data model. For the current exercise, let's select New. A Create relationship dialog box opens; select the tables, cardinality, and cross-filter direction for all data model tables, one at a time. Your star schema is ready to use for your analysis and visualizations.

Practically, in a star schema, dimension tables are typically positioned above the fact table to give it a waterfall-like structure. These dimension tables are used for filtering the fact table, meaning the typical direction of the filter is like the flow of water from a waterfall. In this video, you explored how to build and configure the star schema from the Adventure
Works dataset. Data modeling is a key skill that you need to master on your journey to become a successful Power BI analyst and succeed in the Microsoft PL-300 exam.

Role-playing dimensions enable data to function dynamically and facilitate better-informed decision-making. This involves assuming the perspective of your data to play multiple roles and uncover insights that might remain hidden to the untrained eye. In this video, you'll recap role-playing dimensions and the USERELATIONSHIP function in Microsoft Power BI.

In business intelligence, a role-playing dimension is a single dimension that can be used for different purposes in the same data model. Using an Adventure Works example, you might have a date dimension table that connects to various fact tables like sales, purchases, and inventory. This date dimension could play distinct roles, acting as the order date when examining sales data, the purchase date when working with purchases, or the inventory check date for inventory-related analyses.

Previously, you encountered a practical scenario involving role-playing dimensions: a single sales table that contained multiple date-related fields, such as order date, shipping date, and delivery date. In this case, the date dimension table in your model can be related to the sales fact table via multiple relationships to accommodate the different date roles. However, remember that only one relationship can be active at a time; the remaining relationships must be inactive. You can switch the active relationship manually from Manage relationships in the Power BI Model view.

Continuing with the previous example, you would need to import Adventure Works sales data into Power BI Desktop to implement the role-playing dimension and start building the relationships between the date dimension and the sales fact table. The date dimension table is the role-playing dimension in this scenario and is used for the entire analysis and visualization in Power BI.

In a real-world environment, you often need to analyze data and present information from a distinct perspective. For example, Adventure Works might need information about its sales values based on shipping or delivery dates, while the data model currently contains only one date dimension, which is role-playing. One way to achieve this is to duplicate the date dimension and rename it Shipping Date, although this is not a practical approach. Fortunately, Power BI's formula language, DAX, provides the solution with its USERELATIONSHIP function: creating a measure using USERELATIONSHIP temporarily switches an inactive relationship to active.

Let's break down the DAX formula to create a measure that calculates sales values based on shipping date. The code defines a new measure, or calculation, called Total Sales Orders Shipped. In this formula, the CALCULATE function alters the filter context of the entire measure. Within CALCULATE, the SUM function sums up the Sales Amount column of the Sales table. The default relationship between the Sales table and the Date table is based on the Order Date column, and each DAX calculation is based on the relationships between the tables. The USERELATIONSHIP function overrides this and establishes a temporary relationship between the Date column of the Date table and the Shipping Date column of the Sales table; this inactive relationship becomes active only during the current calculation.

When using USERELATIONSHIP, there are some essential points to consider. You can only use USERELATIONSHIP within DAX functions that take a filter as an argument, for example CALCULATE, CALCULATETABLE, and TOTALYTD. When row-level security is defined for a data table, you cannot use the USERELATIONSHIP function; otherwise, Power BI will return an error. You must also first define relationships in your data model, because USERELATIONSHIP relies on existing relationships, and the column used as the argument in the formula must be part of the relationship.
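Putting that description together, a sketch of the measure might look like this in DAX. The table and column names (Sales[Sales Amount], 'Date'[Date], Sales[Shipping Date]) follow the transcript's Adventure Works example and may differ in your own model:

```dax
Total Sales Orders Shipped =
CALCULATE (
    -- Aggregate the sales amount as usual
    SUM ( Sales[Sales Amount] ),
    -- Temporarily activate the inactive shipping-date relationship
    USERELATIONSHIP ( 'Date'[Date], Sales[Shipping Date] )
)
```

Because the relationship switch applies only inside this CALCULATE, the model's default order-date relationship stays active everywhere else.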
If it is not, an error message will display on screen. You can nest up to 10 USERELATIONSHIP functions in a single expression. Lastly, in a one-to-one relationship, USERELATIONSHIP can only activate a relationship in one direction, meaning filter propagation will be in one direction only; to activate bidirectional filter propagation, you need to use two USERELATIONSHIP functions within the same expression.

Mastering the creation of custom measures using the USERELATIONSHIP function and implementing role-playing dimensions are two methods you can use to handle inactive relationships in your data model. These skills will not only help you succeed in your Microsoft PL-300 exam but will also be valuable in practice as a Power BI data analyst.

By now, you have an idea of evaluation context and how it works in DAX calculations: all DAX calculations compute measures under row and filter context. CALCULATE, along with its companion CALCULATETABLE, is the only DAX function that can alter the filter context during a DAX calculation. In this video, you'll revise how to use CALCULATE to manipulate filters.

At Adventure Works, the management team wants to analyze granular levels of sales data. For example, suppose the sales manager needs information about the sales of mountain bikes in Europe only, a product specialist is interested in the performance of a specific color product that the company recently launched, and the United States countrywide manager wants to filter out the sales amount for a newly hired salesperson. All this granular information is easy to compute using DAX measures in Power BI. You can filter the entire sales measure for a specific product color, a particular region, a salesperson, and so on using CALCULATE; this changes the filter context of the measure from all to the filtered arguments. Let's examine the syntax of CALCULATE and how it impacts the filter context of calculations in a DAX formula that calculates the total sales of red
products. The DAX code uses the CALCULATE function and specifies a filter condition where the Product table's Color column is equal to red. When you use this measure in a matrix or table visual, the filter over product color is added to the filter already placed by the matrix itself on the month column. In the first column, the month is the filter context, filtering sales for each month; the total sales measure computes the sales amount for each month for all products, this time adding product color equals red as an additional filter context.

In this syntax, a condition is used to apply the filter over product color. However, in the DAX engine, the filter arguments of CALCULATE are tables, so the same calculation can be achieved by a formula where the DAX engine converts the previous shorter syntax of CALCULATE to a longer syntax. Let's explore this behavior from another perspective. If you visualize total sales by color in a matrix, the filter context is filtering the product color. The presence of the ALL function in the longer expression means the outer filter over product color is ignored and replaced by the new filter introduced by CALCULATE. In the matrix, the sales values for red products are repeated in all the rows: for each row, the filter introduced by the matrix is the corresponding color, and the red-product sales measure imposes a new filter, forcing red to be visible. This means the new filter introduced by CALCULATE overrides the existing filter, so the sales values are computed within a filter context that filters only red products.

Let's say the European sales manager of Adventure Works needs the sales amount of red products in Europe only. You need to introduce another filter argument within the CALCULATE expression. This expression applies two filters to the overall filter context of the calculation: the product color filter to include only red products, as in the previous example, and the region group as an additional filter to specify Europe as the region.
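The shorter syntax, its expanded equivalent, and the two-filter variant described above might be sketched as follows. Column names such as 'Product'[Color] and Region[Group] are assumptions based on the transcript's examples:

```dax
-- Shorter syntax: a Boolean filter argument over the color column
Red Sales =
CALCULATE (
    SUM ( Sales[Sales Amount] ),
    'Product'[Color] = "Red"
)

-- The equivalent longer syntax the engine expands it to:
-- ALL removes any outer filter on the color column first
Red Sales Expanded =
CALCULATE (
    SUM ( Sales[Sales Amount] ),
    FILTER ( ALL ( 'Product'[Color] ), 'Product'[Color] = "Red" )
)

-- Two filter arguments: red products sold in Europe
Red Sales Europe =
CALCULATE (
    SUM ( Sales[Sales Amount] ),
    'Product'[Color] = "Red",
    Region[Group] = "Europe"
)
```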
The measure then presents the sales of red products in Europe for the various months. Likewise, you can perform further granular analysis to compute the sales amount for individual product categories, salespersons, resellers of the company, and so on.

From these examples, you have learned that CALCULATE modifies the outer filter context by applying new filters, either by overriding the existing filters or by combining new filters with the existing ones. The evaluation context and the CALCULATE function are the foundation of the DAX language, making these fundamental skills any Power BI analyst should master to pass the PL-300 exam and to handle real-world analytical challenges.

Previously, you learned that multiple data tables constitute a data model, for instance a star or snowflake schema, and that relationships exist between the data tables. Why do these relationships exist? A model relationship propagates filters applied to one column of a table to another model table.

A filter can only propagate if there is a relationship path to follow, which may involve multiple model tables. This video will cover the cardinality types and cross-filter directions that exist between data tables in Microsoft Power BI.

In a model relationship, two columns are involved from two different tables: one from the "from" side and one from the "to" side of the relationship. Both of these columns must be of the same data type. At its core, cardinality defines the nature of the connection between two data tables: it tells you how many values in one table correspond to how many values in another. Each relationship involves two tables, the "from" side and the "to" side of the relationship; the column on the "one" side of the relationship must contain unique values, while the column on the "many" side can have duplicate values. Power BI supports four types of cardinality: one-to-one, one-to-many, many-to-one, and many-to-many.

When you establish relationships between tables by dragging the key column from one table to another, Power BI automatically detects and sets the cardinality type by sending queries to investigate which columns contain unique values. However, sometimes Power BI's auto-detected cardinality is not correct, so it is recommended to check the cardinality type before starting analysis and visualization.

Let's start by reviewing the one-to-one relationship. One-to-one cardinality means both related columns contain unique values; this is not a common type of relationship in data modeling. Consider an example where Adventure Works has two dimension tables, Product and Product Category. Each table has a SKU, or stock keeping unit, column, and all fields in these columns contain unique values. A one-to-one relationship exists between these two tables based on the SKU column because it's common to both. This means that when a SKU filters the Product Category table, the Product table will be filtered for the products associated with that SKU. Next are the
one-to-many and many-to-one cardinality types. These two types are essentially the same: each value in one table's column is related to multiple values in another. This is also the most common type of cardinality in Power BI data models. It enables slicing and dicing data, allowing for drill-down analyses that uncover granular insights. For example, in an Adventure Works dataset, the Sales table (the fact table) is related to the Region table (a dimension table). Both tables have a SalesTerritoryKey column, which establishes a one-to-many relationship between the tables. In the Region table, the SalesTerritoryKey field contains a unique value in each row, as each region exists only once in the table. Each region can have multiple sales, so its SalesTerritoryKey may be repeated in multiple rows of the Sales table.

A many-to-many relationship means both related columns can contain duplicate values. This type of relationship is used when designing a complex data model; typically, it's used to relate two dimension tables or two fact tables. For example, consider the relationship between a financial corporation's customers and the various financial products they hold: a customer can hold many financial products, and each financial product can be held by many customers. A many-to-many relationship supports the duplicate customer ID data in both tables.

Now that you've covered the cardinality types in Power BI, let's delve into how these cardinality types influence the cross-filter direction. You may recall that cross-filter direction refers to the direction of filter propagation between two related model tables. It dictates how data in one table influences the data in another table, enabling relational analysis without resorting to complex queries or manual data consolidation. Single cross-filter direction means the filter propagates unidirectionally from one table to the other within the relationship, and both means the filter can propagate in both directions. A relationship that filters
in both directions is commonly described as bidirectional. The cross-filter direction depends on the cardinality type. One-to-one relationships support only the both cross-filter direction. One-to-many and many-to-one relationships support both types of cross-filter direction, single and both. Many-to-many relationships can have a single cross-filter direction, where table A filters table B or table B filters table A, or both of these single cross-filter directions simultaneously.

Although you can set and configure the cross-filter direction in Power BI Desktop's Model view, in real-world scenarios it's often necessary to answer business questions that require changing the direction of filter propagation, and manually adjusting the cross-filter direction for each such requirement is not practically feasible. DAX provides the solution with its CROSSFILTER function, which lets you change the cross-filter direction for a specific measure while maintaining the original settings. The syntax of the CROSSFILTER function takes three arguments. In the first argument, the table one name refers to the name of the first table, and column name one refers to an existing column within that table, usually representing the many side of the relationship to be used. Similarly, in the second argument, the table two name refers to the name of the second table, and column name two refers to an existing column within that table, this time usually representing the one side of the relationship. Finally, the filter direction represents the cross-filter direction to be used; you can define this as none, single (OneWay), or both.

Both cardinality and cross-filter direction are key analytical concepts in data modeling and analysis. As businesses continue to rely on data-driven decision-making, mastering key skills in data modeling and DAX will set you on a path to becoming a functional and influential analyst.
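As a sketch of the three-argument syntax just described, a measure that temporarily filters in both directions might look like this. The table and key names are assumed from the earlier Adventure Works example:

```dax
-- Override the configured cross-filter direction for this measure only
Region Sales Both =
CALCULATE (
    SUM ( Sales[Sales Amount] ),
    -- Many side first, one side second, then the direction keyword
    CROSSFILTER ( Sales[SalesTerritoryKey], Region[SalesTerritoryKey], BOTH )
)
```

Outside this measure, the relationship keeps whatever cross-filter direction is configured in the Model view.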
You have just imported a dataset for analysis, and upon careful investigation, you've realized that some information required to address business questions is missing from the dataset. Creating calculated columns to add the missing information to your data tables is a concept you've learned before, and it will be covered briefly in this video.

Calculated columns are custom data columns created within a Microsoft Power BI data model using the Data Analysis Expressions, or DAX, language. Unlike standard columns, which store data directly from imported datasets, calculated columns contain formulas that derive values from existing data. Once you add a calculated column to your data model by defining a DAX expression, you can use this column in any report and visualization, just like the standard columns. Calculated columns are stored at the data model level and therefore consume memory, so be careful not to use too many of them. The standard columns of a data model are populated with the imported data, whereas you need to define a DAX expression to populate a calculated column from the existing data; the data can be taken from multiple columns and tables of the data model, which you must define in the DAX script. Remember, calculated columns can be created from the Report view, Data view, or Model view of Power BI Desktop and are based on the data you have already loaded into your data model. For instance, if you have a customer data table with two distinct columns containing the first and last names of the customer, and you want to combine them into a single column containing the customer's full name, you can use a DAX expression to concatenate the two columns into a single calculated column. One of the most common examples of populating a data table with calculated columns is creating a date dimension table; previously, you populated a date table with various calculated columns like year, month name, month number, and so on.

Now let's briefly recap the DAX syntax for defining calculated columns. The syntax starts with the name of your calculated column, followed by an equals operator. Then write the names of the tables to be referenced in single quotation marks and their respective column names in square brackets, and include a relevant operator or any other expression. For example, at Adventure Works you are creating a sales report based on geographical information, and in the Geography table, city and state information is available in separate columns. Displaying only the city name in a visualization might create ambiguity, because the same city name can occur in multiple regions of the globe. You can solve this by creating a calculated column in your Geography table called City and State: after the equals operator, reference the table name in single quotation marks, then City in square brackets, the concatenation operator, and then State in square brackets. You also learned that if you want to include data from two different tables of the model, you first need to make sure the tables have an appropriate relationship, and secondly, you need to use the RELATED DAX function in your formula.

Let's now recap the benefits of using calculated columns. The first benefit is enhanced data transformation: calculated columns help you transform raw data into meaningful information; for instance, you can convert currency values, calculate percentages, and so on. The next benefit is dynamic and interactive reports: you can use calculated columns to introduce slicers and filters that make your report interactive and dynamic. Another benefit is consistency: by embedding calculated columns within the data model, you ensure consistency in your reports, and changes in source data are reflected instantly in calculated columns, reducing the risk of errors. The last benefit is complex analysis: whether it's time-based calculations, statistical analysis, or forecasting, calculated columns with the power of DAX allow you to tackle intricate data challenges.
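The geography example just described might be written as follows; the 'Geography' table name and the ", " separator are assumptions for illustration:

```dax
-- Calculated column combining city and state to avoid ambiguity
City and State = 'Geography'[City] & ", " & 'Geography'[State]

-- Pulling a value from a related table instead requires RELATED
-- (hypothetical relationship to the Geography table)
Customer City = RELATED ( 'Geography'[City] )
```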
Calculated columns are indispensable tools in Power BI, offering a means to shape and analyze data effectively. They enhance your data model by introducing new information based on an already loaded dataset, allowing you to reveal the hidden insights in your data. The key lies in mastering the art of crafting calculated columns with DAX to extract valuable information.

As a data analyst, you receive data from different sources, you clean and transform the data, and you build an effective data model for accurate and effective analysis. To ensure an accurate and effective analysis, you need to put on your DAX magnifiers to see the hidden information in your data. Data Analysis Expressions (DAX) can be used to build calculated tables, calculated columns, and measures. Measures are of special significance, as they do not take space in your Power BI memory and are not stored in the model; measures are executed dynamically and can thereby integrate any filter context that applies when they are evaluated. Measures in Power BI are calculations that summarize, aggregate, or perform complex calculations on data, ranging from simple sums to intricate analyses. With measures, you can go beyond basic data visualization: they allow you to derive insights, make data-backed decisions, and unearth patterns and trends within your dataset that are not noticeable at first glance.

You can create measures in Power BI in two ways: quick measures, and custom measures using DAX. In a previous lesson, you covered how to create quick measures in Power BI for Adventure Works' time-based analysis. To recap briefly, Power BI supports the following types of calculations in quick measures. The average-per-category calculation lets you create the average, variance, minimum, and maximum for each category, and you can apply some fundamental filters in this category of calculations. Power BI allows you to create some basic time intelligence calculations, like year-to-date (YTD), month-to-date (MTD), and year-over-year (YoY). With the totals calculation category, you can calculate the running total or the total for each category. Basic mathematical operations like addition and subtraction are used to create quick measures, and simple concatenation can be done for your measures.

Although you can create a handful of quick measures in Power BI to get some quick insights, the real analytical power of measures lies within DAX logic. DAX allows you to write complex logic in the form of formulas and expressions. Custom measures are user-defined calculations or metrics created using DAX to generate insights about the data through aggregations, calculations, time intelligence functions, and so on. For example, suppose Adventure Works needs to analyze its sales and profit data for each product category and sales region. You can compute DAX measures to calculate total sales, total profit, and profit margin percentage separately. These measures can be visualized in your report, and you can integrate any filter the company needs to evaluate the total profit and profit margin for each product category and region. As mentioned earlier, measures compute values on the go; for example, when you apply a filter for the bikes category, the profit measure will use the product category Bikes as the filter during the calculation and only display the profit margin values for that category. This way, you can help Adventure Works generate the insights it needs.

Let's explore the DAX syntax to create simple measures for sales, profit, and profit margin that address Adventure Works' needs. For sales, create a measure called Sales, then add the SUMX function after the equals operator; in the first parameter, reference the Sales table, and in the expression, multiply the Quantity column from the Sales table by the Unit Price column. To calculate profit, create a measure called Profit, and after the equals operator, subtract the total cost measure from the total sales measure. You can use
a measure inside other measures: in the Profit measure, both total sales and total cost are pre-calculated measures used to compute total profit. Next, for profit margin, start by creating a measure called Profit Margin, and after the equals operator, divide the previously created Profit measure by the total sales measure. Make sure to format the measure as a percentage so that it displays the percentage profit in your visualization. Remember to format each measure appropriately; for example, profit and sales measures can be formatted as currency with two decimal places, while a profit margin measure needs to be formatted as a percentage with two decimal places.

Measures created with DAX provide a way to summarize, calculate, and compare data across various dimensions based on specific criteria and business requirements; they serve as a microscope to discover the hidden messages in your data. Mastering DAX is a key skill for any data analyst, and you will receive a considerable number of questions about DAX in your Microsoft PL-300 exam.

Time is the dimension that underpins virtually all data analysis, and for this reason, time intelligence functions hold a position of paramount importance. Time intelligence functions are specialized functions designed to work with date and time data, enabling users to perform advanced temporal analysis and gain deeper insight into historical data. Previously, you covered the theoretical foundations of time intelligence functions and gained significant hands-on experience in creating them to summarize and compare data over time. In this video, you'll recap the important benefits of time intelligence functions and how you can implement them in the aggregation and comparison of data values.

Let's start with the benefits of time intelligence functions. Temporal comparison functions make it easy to compare data across different time periods: you can create measures using DAX to compute year-over-year or quarter-over-quarter trends.
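Before recapping time intelligence, the three Adventure Works measures described above might be sketched as follows. DIVIDE is used for the margin as a division that handles a zero denominator gracefully; the exact table, column, and measure names are assumptions from the transcript:

```dax
-- Row-by-row quantity times unit price, summed over the Sales table
Total Sales = SUMX ( Sales, Sales[Quantity] * Sales[Unit Price] )

-- Measures can reference other measures
-- (assumes a Total Cost measure already exists in the model)
Profit = [Total Sales] - [Total Cost]

-- Format this one as a percentage with two decimal places
Profit Margin = DIVIDE ( [Profit], [Total Sales] )
```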
Such trends allow you to track growth, seasonality, and performance. The next benefit of time intelligence functions is that they allow you to compute moving averages. Moving averages are a valuable tool for smoothing out fluctuations in data and identifying trends and patterns over time, which is particularly important in scenarios where noisy or erratic data is a challenge; with time intelligence DAX functions, you can compute moving averages to enhance your data model and analysis. Time intelligence functions also facilitate the creation of cumulative totals, which help in understanding the progression of values over time; these measures are crucial for tracking key metrics such as cumulative revenue, profit, or customer acquisition. They likewise facilitate period-to-date calculations, simplifying the process of calculating values from the beginning of a time period to a specific date; this is a valuable set of DAX measures for computing metrics like year-to-date and month-to-date values. Finally, parallel period functions make it straightforward to compare data with previous or future periods, which is vital for identifying trends and seasonality and making data-driven decisions.

With the benefits of time intelligence functions refreshed, it's time to recap a few important time intelligence DAX functions. The first is TOTALYTD. Let's say, for example, that Adventure Works wants to compute the real-time sales performance of its various product categories. You can calculate year-to-date from the Sales table's total sales column or measure. The DAX expression to compute YTD is a measure called Sales Year To Date, followed by the TOTALYTD function after the equals operator. In the first parameter, reference the total sales column from the Sales table and aggregate the values using SUM; in the second parameter, reference the Order Date column from the Sales table. The date in square brackets represents the Date column of the date hierarchy.
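A sketch of that YTD measure, with the dotted date-hierarchy reference as described (column names assumed):

```dax
-- Year-to-date sales; [Date] selects the date level of the hierarchy
Sales Year To Date =
TOTALYTD (
    SUM ( Sales[Total Sales] ),
    Sales[Order Date].[Date]
)
```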
Power BI IntelliSense provides the option to select other fields of the date hierarchy, such as year or month, but to create time intelligence measures you need to select the date.

One of the main product categories of Adventure Works is bikes, and the company wants to evaluate the sales trends of bikes over the summer months. You can use the DATESBETWEEN DAX time intelligence function to compute a measure called Summer Sales; after the measure is executed, you can add the Bikes category as an additional filter to the measure to answer management's question. The DAX code for the measure should be a measure called Summer Sales, followed by the CALCULATE and SUM functions to compute the values of the total sales column of the Sales table; then insert the DATESBETWEEN function, which takes the Order Date column from the Sales table as a date reference, followed by the starting date and another date referencing the end date.

Now let's say the marketing executive of Adventure Works wants to evaluate the impact of her recent marketing campaign. The original duration of the campaign was three months, and after a month, its impact should be evaluated. You can create a DAX time intelligence measure using the DATESINPERIOD function to compute a measure for the last month's sales. Create a measure called Last Month Sales, followed by the equals operator and the CALCULATE and SUM functions to compute the values of the total sales column of the Sales table. Next, add the DATESINPERIOD function, which takes the Order Date column from the Sales table as a date reference, followed by the TODAY function, which takes today's date as the starting point; 30 represents the number of intervals, and finally, DAY represents the unit of time.

Adventure Works' CEO wants a side-by-side comparison of the company's sales for the current and previous years, which will provide her with insights into the necessary improvements to sales and marketing strategies. You can create such a measure using the SAMEPERIODLASTYEAR DAX time intelligence function.
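Sketches of the two window measures just described; the summer date range is a hypothetical illustration, and the interval is written as -30 so that DATESINPERIOD counts backwards from today:

```dax
-- Sales between two fixed dates (example range assumed)
Summer Sales =
CALCULATE (
    SUM ( Sales[Total Sales] ),
    DATESBETWEEN ( Sales[Order Date], DATE ( 2023, 6, 1 ), DATE ( 2023, 8, 31 ) )
)

-- Sales for the 30 days ending today (negative interval looks back)
Last Month Sales =
CALCULATE (
    SUM ( Sales[Total Sales] ),
    DATESINPERIOD ( Sales[Order Date], TODAY (), -30, DAY )
)
```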
DAX time intelligence function, as follows: a measure called revenue previous year; then define a VAR as the variable for the previous year's revenue, followed by the equals operator and CALCULATE, which computes the previous year's revenue by filtering the revenue measure based on SAMEPERIODLASTYEAR; finally, RETURN displays the value of the entire expression. A sales forecast is a vital component of an analysis, and Adventure Works' sales executive wants a report, based on historical sales values, that predicts the future growth of the company in terms of revenue and profitability. You can use the DATEADD function in DAX either to compare the current period's sales with a previous period or to shift to a future period; period here refers to year, quarter, or month. For instance, to compare the current month's sales with the previous one, the DAX script should be a measure called sales comparison, followed by an equals sign. Then CALCULATE computes the measure by filtering the revenue measure, followed by DATEADD, which takes the order date column from the sales table as a date reference; 1 represents the number of intervals, with the negative sign indicating that the intervals are back in time; this is followed by MONTH, representing the unit of time. You can modify the code to compare against a different period by changing MONTH to another time period, like YEAR or QUARTER. Time intelligence DAX functions in Power BI are indispensable for analyzing historical trends, forecasting future outcomes, and understanding the impact of time on your data. These measures uncover the insights hidden in your raw data, and you need to master this skill to excel as a Power BI modeler, pass your Microsoft PL-300 exam, and become a certified Power BI analyst. As data-driven businesses evolve, so do business analytical tools. Microsoft Power BI stands out as a formidable business intelligence ecosystem, offering profound insights through its rich array of features. Central to the effectiveness of Power BI
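The previous-year and month-over-month comparisons just described might look like this in DAX; the [Revenue] measure and the sales table's column names are assumptions taken from the transcript's narration:

```dax
-- Previous year's revenue using a variable and SAMEPERIODLASTYEAR
Revenue Previous Year =
VAR PrevYearRevenue =
    CALCULATE ( [Revenue], SAMEPERIODLASTYEAR ( Sales[Order Date] ) )
RETURN
    PrevYearRevenue

-- Shift the filter context back one month with DATEADD;
-- swap MONTH for QUARTER or YEAR to compare other periods
Sales Comparison =
CALCULATE ( [Revenue], DATEADD ( Sales[Order Date], -1, MONTH ) )
```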
are measures, which serve as building blocks for data calculations and visualizations. Previously, you covered measures in detail; in this video, you'll recap the three main types of measures with scenarios. Measures are essential for performing quantitative analysis and deriving meaningful insights from the data. They provide a way to summarize, calculate, and compare data across various dimensions based on specific criteria and business requirements. Measures can be categorized into three types: additive, semi-additive, and non-additive. Let's recap each of these types of measures. Additive measures are the workhorses of data analysis and provide the easiest summation. Additive measures behave as you expect: they can be summed up or aggregated across various dimensions without losing their meaning. Adventure Works has a sales analysis report that displays the sales amount and quantity sold for individual transactions, and each transaction is tracked with a specific customer, region, product category, and date. As a data analyst, you can create simple additive measures to sum up the attributes across all given dimensions. This will help the Adventure Works team visualize total sales and total quantity by product category, region, salesperson, and time. The next type is semi-additive measures. These measures introduce a layer of complexity: they can be summed across some dimensions but not all, and the crux of the matter is time. Think of inventory on hand as a simple example. While it is meaningful to sum the inventory by product or warehouse, it makes no sense to sum it over time. Semi-additive measures are often seen in scenarios where time plays a crucial role. You can handle these using DAX in Power BI by specifying which dimensions are suitable for summation and which are not. This flexibility makes it possible to create insightful reports while leveraging the power of DAX in Power BI. Let's explore an inventory balance example. If a warehouse has 35 mountain bikes in stock at the
end of September and 62 mountain bikes at the end of October, it is not accurate to say the warehouse had 97 mountain bikes for the two months together. You handle these measures using DAX functions like LASTDATE, LASTNONBLANK, and others you'll review later. Finally, let's cover non-additive measures. These measures lead you to advanced analytics. Non-additive measures defy straightforward summation across any dimension. Consider, for example, the profit margin measure. While it is tempting to sum profit margin across products or time periods, the results do not make sense; you cannot add percentages in this manner. You need to perform complex calculations to handle non-additive measures like percentages or ratios and produce meaningful summaries. DAX functions like AVERAGEX, SUMX, and DIVIDE provide you with the toolkit to work with non-additive data, thereby allowing you to craft sophisticated calculations that provide valuable insights. Let's delve a bit deeper into the profit margin example. Profit margin is a percentage that represents the profitability of a business and is calculated by dividing profit by revenue. For example, let's say Adventure Works has four product categories, and the profit margins of the individual product categories are 9% for bikes, 5.5% for accessories, 10% for components, and 2% for clothes. If you sum up the profit margins of these product categories, you'll get a total profit margin of 26.5%. However, this result is incorrect because it does not reflect the true overall profit margin of Adventure Works; you need to employ other DAX functions to compute these types of complex calculations. The skill to distinguish and handle additive, semi-additive, and non-additive measures is the key to generating accurate and actionable insights from your data. The use of appropriate DAX functions from its rich library empowers you to compute each type of measure with precision and to reveal the story hidden within your raw data. As a data analyst, you import data
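The semi-additive and non-additive patterns discussed above can be sketched as follows; the table, column, and measure names here are assumed for illustration only:

```dax
-- Semi-additive: report the closing balance instead of summing over time
Units In Stock =
CALCULATE ( SUM ( Inventory[Units] ), LASTDATE ( 'Date'[Date] ) )

-- Non-additive: recompute the ratio at every level rather than adding percentages
Profit Margin =
DIVIDE ( [Total Profit], [Total Revenue] )
```

DIVIDE is preferred over the division operator because it returns blank on division by zero, which keeps totals free of errors.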
from disparate sources into your data model. The imported data, however, may not contain the information you need to visualize, and the key to any analytical work is to reveal hidden insights, trends, and opportunities. You may need to add tables to your data model to accomplish this. In this video, you'll explore the types of calculated tables and scenarios where creating these tables is necessary. At Adventure Works, the executive management team needs answers to specific business questions based on a specific data set. After careful investigation, you realize the required information can be visualized from the provided data set, but it may require more time and resources. A quick way to accomplish the task is to create additional tables in the data model using DAX calculations. Say, for example, the sales table contains several columns but you only need to present a summary; or the date table is missing from the data model and you need to perform time-based calculations; or you want to perform some analysis but keep the original table intact for other analytical needs. All these are scenarios where you must create calculated tables. Previously, you learned that cloned tables are exact copies of an existing table in the data model. Clone tables are important when you need to manipulate data without affecting the original table. For example, Adventure Works wants to analyze sales data without altering the original sales table, as they want to keep it as a reference. You can simply create a cloned version of the sales table by writing the following DAX expression: clone table name equals ALL of the original table name, or, more specifically, sales cloned equals ALL of sales. You can also create calculated tables using DAX expressions that take data from multiple sources. Some examples of calculated tables include: combining specific data fields from the sales and product tables to compare various product categories and associated sales values; normalizing a dimension table: for instance, the product
table contains categories and subcategories with information you need to separate from the product table, which you typically do by creating a snowflake dimension; and creating a common date dimension table for a data model using DAX to perform advanced time intelligence calculations. The last example of a calculated table is combining two tables with the same structure while keeping the original tables unaltered. For example, suppose you received two different tables with the same structure for Adventure Works customers, one for Eastern States customers and one for Western States customers, and you need to combine them into a single customers table. You can also use measures to create calculated tables in Power BI. For example, consider a scenario where you've created a measure called sales for Adventure Works; this measure displays all sales across countries. You can use this measure to create a calculated table displaying the individual sales for each country using the following DAX expression: country sales is the name of the new calculated table; sales in single quotes is the name of the original sales table; sales in square brackets is the DAX measure used to create the calculated table; and total sales in double quotes is the name of a new column added to the calculated table. Creating calculated tables from pre-calculated measures is especially useful when you want to create a summary table from large data sets or when you want to create a table with data that does not exist in the original tables. This can enhance data analysis and visualization capabilities in Power BI. Now let's explore the syntax of a few common DAX table functions. You can use the ADDCOLUMNS function to add calculated columns to a given table or table expression. Here is the syntax for using the ADDCOLUMNS function: type ADDCOLUMNS and, within the parentheses, specify the table name from which you want to retrieve data. Follow this with the name of the new column enclosed in double quotes, and then provide the DAX expression for
the calculation. You can add more column names and expressions as needed, but these additional pairs are optional. The SUMMARIZE function returns a summary table for the requested totals over a set of groups. The DAX syntax for SUMMARIZE is as follows: type SUMMARIZE and, inside the parentheses, first input the name of the table you wish to summarize; next, include the columns to group by. You can also add new column names in double quotes followed by their respective DAX expressions for calculated values; adding these additional columns and expressions is not mandatory but can be done based on your data analysis requirements. FILTERS returns the values that are directly applied as filters to a column: with FILTERS, inside the parentheses, simply specify the name of the column for which you want to retrieve the filters currently applied in the context. TOPN returns the top N rows of the specified table: for TOPN, within the parentheses, start by specifying the number of top items to return, follow this with the name of the table from which to retrieve these top items, and conclude by indicating the column to sort by and, optionally, the order of sorting, ascending or descending. And lastly, UNION creates a union, or combined table, from a pair of tables: when using UNION, inside the parentheses, list the tables you wish to combine, ensuring each table name is separated by a comma. The tables should have the same number of columns, and corresponding columns should have compatible data types. By using DAX to generate calculated tables, you can combine data from multiple tables into a single table, and that opens a whole new door of analysis. In practice, you will encounter situations where creating calculated tables is the only solution to certain data challenges. The skills you've gained will help you tackle these real-world analytical tasks efficiently. Data is often like a complex puzzle with pieces scattered across various dimensions. Microsoft Power BI
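The calculated-table patterns and DAX table functions recapped above, gathered into one hedged sketch; the table and column names are illustrative assumptions, and the exact expression the transcript describes for the country sales table may differ:

```dax
-- Clone of the sales table, leaving the original untouched for reference
Sales Cloned = ALL ( Sales )

-- Union of two customer tables with the same structure
All Customers = UNION ( 'Eastern Customers', 'Western Customers' )

-- Per-country summary table built from an existing [Sales] measure
Country Sales =
ADDCOLUMNS ( VALUES ( Customer[Country] ), "Total Sales", [Sales] )

-- Grouped summary with SUMMARIZE
Sales By Category =
SUMMARIZE ( Sales, 'Product'[Category], "Category Sales", SUM ( Sales[Total Sales] ) )

-- Top five products by the [Sales] measure
Top Products = TOPN ( 5, 'Product', [Sales], DESC )
```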
offers a way to unravel this mystery by creating a data hierarchy. Hierarchies provide a structured way to organize and visualize data, allowing users to uncover hidden insights and tell a compelling story. Adventure Works, a multinational company, sells its products across the globe. The product department heads need not only an overview of the sales but also a deeper understanding of the location of customers and the category and subcategory of products sold. You can provide this information by creating a hierarchical visualization of the data; Power BI provides a way to display information where managers can drill down to view granular details about customers and products. In Power BI, a data hierarchy comprises interconnected fields from the data set, organized to present data elements in ranked order. It represents a structured relationship between data attributes, typically organized from an overview level down to the most granular. The hierarchical structure simplifies data exploration and analysis by allowing users to focus on specific aspects of the data at different levels. For instance, in a sales data set you might have a hierarchy that starts with year, drills down to quarter, then month, and finally day; in certain cases you can also drill down to hourly details. Product, geography, and organizational hierarchies are some other examples of data hierarchies in Power BI. In a hierarchical structure, the first level, sometimes called the parent level, is ranked over the others, sometimes referred to as child levels. This way, report users can drill down in order from the parent level, presenting the highest level of information, to the lower levels. Power BI allows a maximum of five levels to be added to a hierarchy. Using a hierarchical structure to create your visualization enhances the user's experience in understanding the data and provides a more comprehensive analysis. Common visualizations that can be used to visualize hierarchies include bar or column
charts, line charts, heat maps, and map visuals. Power BI provides several options for using a hierarchy in visualizations. For example, you can enable inline hierarchy labels to sort data by hierarchy levels. You can use the PATH DAX function to add a column containing the entire path, which is important when you are working with an organizational hierarchy. You can also create DAX calculations to determine the path length of the hierarchy, which helps you determine the shortest and longest paths. Now let's explore how you can create a data hierarchy in Power BI to help Adventure Works analyze granular data. Launch Power BI Desktop and load the data from the Excel workbook containing Adventure Works sales data. The data set contains two tables: a fact internet sales table and a geography dimension table. The geography dimension table of the model contains geographical information; therefore, it is advisable to generate a geographical hierarchy. The first step is to format the location-based data with an appropriate data category. To do this, select the country field and then select country from the data category drop-down list. Now format state province name, city, and postal code as state or province, city, and postal code. A globe icon appears before each field name, which tells Power BI that this is geographical information. Let's visualize the sales data by geography in the report view of Power BI Desktop. To do this, select the column chart from the visualizations pane and bring the sales amount field from the sales table to the y-axis well of the visual. On the x-axis, a geographical hierarchy is needed to display the sales data at various levels of location, so bring the country, state or province, and city fields to the x-axis in that order. A set of arrows appears in the top right corner of the visual, indicating the drill-down functionality; to turn on drill down, select the second down arrow. If you hover the cursor over any data point, for example the United States, a drill-down icon displays on the
tooltip. To go to the next level of the hierarchy, select the drill-down icon; in our example, the next hierarchy level is states. From here you can either drill up or down to the next level. Alternatively, you can create a hierarchy in the data pane. Select the country field from the data pane and select more options, which is represented by three dots. A drop-down list appears, where you select create hierarchy. A new country hierarchy field appears in the geography table, with country as the highest level of the hierarchy. You can now add related fields to the newly created hierarchy, one at a time. To do this, select state or province, and from the drop-down option select add to hierarchy. Next, you need to select the hierarchy where you want to add the field; in the current project there is only one hierarchy available, country hierarchy. Select the country hierarchy, and the field is added as the second level. Repeat the process for city and postal code. You can test the country hierarchy by creating a new visual. Remember to format your reports using appropriate font styles and colors. Data hierarchies are indispensable tools for effective and granular data analysis and reporting in Power BI. They provide structure and context to your data, making it easier to navigate, surfacing trends, and letting your audience gain a deeper understanding of the information at hand. In fast-paced analytics, where every business is turning into a data-driven organization, performance is everything. Businesses rely on business analytics tools such as Microsoft Power BI to turn vast amounts of data into actionable insights. But what happens when too many users interact with your reports and you need to optimize the speed and efficiency of your reports and dashboards? The performance analyzer helps you evaluate the performance of various elements of your Power BI reports and dashboards. Adventure Works uses Power BI as a business intelligence tool to create stunning reports and visualizations. However, as the data
sets grow with the growth of the company and reports become more complex, there is a need to make sure the reports perform optimally. You can use Power BI's performance analyzer to evaluate the performance of individual report elements, such as visuals and DAX measures. You may recall that the performance analyzer is a built-in tool of Power BI that allows users to diagnose and optimize the performance of their reports and dashboards. It provides insights into query execution time, data model performance, and visual rendering, enabling analysts to pinpoint bottlenecks and fine-tune their creative work. Slow-responding reports and dashboards hinder productivity and may lead to customer dissatisfaction; with the performance analyzer, you can identify and rectify slow-performing report components. Not only is speed critical, efficiency also matters: by identifying and optimizing inefficient elements of your reports, you can reduce resource consumption and enhance the user experience. A healthy data model is the foundation of your analytical work, and the performance analyzer offers insights into your data model's performance, helping you to maintain and enhance it. The tool does not stop at query diagnostics; it also helps to analyze visual rendering. This means you can identify problematic, slow-rendering visuals and optimize them for faster loading. Now let's review how to use the performance analyzer. You need to launch your Power BI report and access the performance analyzer from the view ribbon of the report view. Upon selection, the performance analyzer displays on the right side of the report canvas. The performance analyzer records the processing time required to update or refresh each report element. For instance, when a user interacts with a slicer to modify the visual, a query is sent to the underlying data model and visuals are updated according to the interaction. You need to select start recording to begin recording with the performance analyzer. The performance analyzer inspects
and collects performance measures in real time each time you interact with a report element, and displays performance results in its pane. Once you finish recording, select stop, and the performance analyzer will display information about queries, data models, and visuals in a user-friendly interface. The information log contains the time spent completing the following tasks. DAX query: if your report has DAX calculations, the duration between the query being sent to the data source and the results being retrieved is displayed in the pane. Visual display: the time needed by a visual to display on the report canvas, which also includes the time to retrieve web data. Other: the time the visual requires for preparing queries, waiting for other visuals to complete, or performing other background processing. Evaluated parameters: if your report visual contains field parameters, the time spent on these will be displayed in this category (this is in preview mode). The performance analyzer records duration in milliseconds, and the values indicate the difference between the start and end of any operation. Once you stop the recording, you can save the results to your local computer. Now you can identify areas that need optimization and make the necessary adjustments to your DAX logic, visual elements, and data model to improve overall performance. Having reviewed how to use the performance analyzer, let's briefly explore some of its real-life applications. When working with large data sets, the performance analyzer helps you optimize reports to ensure they remain responsive. In the case of complex data models, this tool assists you in maintaining efficient performance. In addition, you can use the performance analyzer to fine-tune reports, visual elements, and queries for faster performance where you have many report users. The performance analyzer in Power BI is your handy tool for faster, more efficient, yet visually appealing reports and dashboards. To succeed both in Microsoft PL-300 and as an
efficient data analyst, you need to master the skill of diagnosing issues through the performance analyzer and optimizing your reports accordingly. In the dynamic landscape of data, the sheer volume of data itself is not a threat to meaningful analysis; the key lies in how you handle the data, transform it, and create visually appealing and analytically insightful reports. But often the masterpiece you put so much effort into creating doesn't perform according to expectations, due to the slow responsiveness of visuals and queries. This highlights the significance of performance optimization, which is just as important as creating reports and dashboards. In this video, you'll review how to improve report performance via cardinality and summarization in Microsoft Power BI. Imagine Adventure Works' Microsoft Power BI reports, meticulously designed to dissect sales trends, monitor inventory levels, and analyze customer behavior, are encountering a challenge: with a colossal volume of transactional data streaming in daily, the reports are performing sluggishly. You may recall that you can improve performance by reducing data. Although the Power BI engine effectively handles extensive data, minimizing the volume of data loaded into your data model is still crucial. This is especially important when working with larger data volumes or anticipating substantial data growth over time. There are many reasons to minimize the data volume loaded into the Power BI model. Your current Power BI capacity may not support larger volumes of data; for instance, Power BI shared capacity can host a model of at most 1 GB in size. Smaller data models reduce resource contention by using fewer resources, like memory and processing power, increasing efficiency. Keeping models loaded for a longer period helps reduce the eviction rate, meaning the data is removed from memory less frequently; this can result in faster queries, as the data sets do not need to be reloaded into memory. Smaller data models also tend to
refresh more efficiently, resulting in decreased time to generate and deliver reports with up-to-date data, or lower report latency. Finally, fewer rows in a data table can lead to faster calculations and improved query performance. Power BI supports many techniques for reducing the data loaded into the Power BI data model; in this video you will review two methods, reducing cardinality and aggregation, or summarization. Let's begin with reducing cardinality. Previously, you learned about the types of cardinality between data tables. Throughout the development of the data model, you either establish or modify relationships between tables, and you need to ensure the data types of the fields participating in the relationship are the same; you cannot create a functional relationship where the data types of the columns differ. For example, a key column might be set to a text data type; if the column contains only numeric values, you should change the data type to whole number, which performs better than the text data type in the Power BI model. Changing the decimal number data type to a fixed decimal number also improves performance. As you learned in the previous DAX lessons, when you create a DAX calculation in your data model, the default data type is decimal number, or general. This means the results of the calculation display unlimited places after the decimal, which hinders optimal performance; you need to define a distinct data type with specified decimal places for best performance. Changing to fixed decimal places reduces storage requirements, enhancing model performance. The next technique is reducing data via aggregations. Aggregation refers to summarizing large volumes of data into more manageable summary tables to improve query performance by condensing detailed information into simpler, higher-level values. Consider an example where you have a large data set containing a record of each transaction, but for reporting you're
analyzing only the yearly or monthly sales, or sales by region. You can create aggregated tables that are imported into the data model; in the current example, you can generate aggregated tables from the sales table, grouped by region or month according to your requirements. This pre-calculated aggregation can be imported into Power BI's memory and will be more efficient for querying in daily analysis. Power BI also supports three storage modes to handle large data sets, where you can define the storage mode of each data table; for example, a large fact table with millions of rows can be set to DirectQuery, while smaller tables can be imported into the model for improved performance. Aggregations offer several benefits that can help you improve model performance. If you are handling a vast data set, aggregations provide faster, optimized query performance; they assist you in analyzing the data and revealing insights without importing the entire data set into the model. If users are experiencing slower refresh times for reports in Power BI, you can create aggregations to help speed up the refresh process; the smaller size of aggregated tables imported to memory reduces the refresh time, enabling a better user experience. Lastly, suppose your company is anticipating growth in sales volume by expanding its operations to new regions or adding new products to its inventory; you can leverage Power BI to create and manage aggregations as a proactive measure to future-proof the solution, enabling a smooth scale-up. Optimization of your data model in Power BI is not just a technical endeavor; it is a strategic imperative for organizations and an analytical challenge for you as an analyst. Power BI's performance optimization unlocks a new door of analysis, ensuring that every decision is not just data-driven but empowered by the speed and efficiency necessary to thrive. Congratulations on completing the data modeling section of this course, a prerequisite to analyzing data and creating reports and dashboards
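To close out the aggregation discussion above, a minimal sketch of a pre-aggregated summary table created with DAX; every table and column name here is an assumption for illustration:

```dax
-- Monthly summary table condensing transaction-level rows;
-- querying this small table avoids scanning the full Sales table
Monthly Sales Agg =
SUMMARIZE (
    Sales,
    'Date'[Year],
    'Date'[Month],
    "Sales Amount", SUM ( Sales[Sales Amount] )
)
```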
in Microsoft Power BI. Let's recap the key takeaways. You began with a journey into designing data models, starting with a recap of schema design principles. You reviewed the star and snowflake designs, the two major types of schemas used in Power BI, and worked through a hands-on activity building a star schema for Adventure Works by understanding the fact and dimension tables. You explored how to handle inactive relationships between two data tables by implementing a role-playing dimension and using the DAX USERELATIONSHIP function. As DAX and the evaluation context are fundamental to data analysis in Power BI, you recapped using the CALCULATE function to alter the filter context of your calculations. You also explored cardinality, the nature of the relationship between data tables, types of cardinality, and the different cross-filter directions in Power BI; you can select either single or both cross-filter directions, determining the filter propagation in one or both directions of the related tables. Next, you moved on to creating model calculations using DAX. You recapped calculated columns, the custom data columns you create in your data model using DAX. You gained a detailed overview of the conceptual foundations and practical skills related to creating and managing measures using a library of DAX functions; measures hold the hidden information in your raw data, empowering users to gain meaningful insights. You reviewed the SUM, SUMX, and CALCULATE functions to compute aggregation measures, the most common calculations used for analysis in any data-driven business. You also explored implementing time intelligence measures, as the time dimension is the foundation of any business analysis requiring historical analysis and future predictions. DAX offers a rich library of time intelligence functions to aggregate and compare data over time, such as DATESYTD and TOTALYTD. By using time intelligence functions, you can compute things like moving averages, temporal analysis, and cumulative totals to
gain insight into the overall performance and growth of the organization. You also recapped the types of measures, including additive measures, like total sales or total cost; non-additive measures, for example profit margin; and semi-additive measures, such as inventory level and current account balance. You gained hands-on insight into replacing an implicit measure with an explicit one and creating a semi-additive measure. After that, your focus shifted to implementing a data model. You started by identifying the need for calculated tables, such as when a data model lacks a common date dimension table, and how to create them in Power BI. You gained a solid understanding of the DAX functions that you can use to create and manipulate tables in Power BI. You then explored creating hierarchies, including date, product, and geographical hierarchies. Creating a hierarchy is a significant feature of Power BI, allowing you to build a hierarchical structure to analyze both the overview and the granular details of data within the same visual by using drill-down functionality. Further, you explored how you can add a hierarchy to slicers in addition to the standard Power BI visuals. You reviewed Power BI's Q&A feature, which uses natural language processing to answer business-specific and user-defined questions in visual form. This feature is significant in real-world data-driven environments, making it possible for individuals, regardless of technical expertise or department, to use and gain insights into the data from your reports and dashboards. You learned that Power BI allows you to teach Q&A, customizing the review questions, synonyms, and relationships to help Power BI better understand your business needs. Finally, you focused on optimizing model performance. This began with a review of Power BI's performance analyzer, a robust diagnostic tool within the Power BI ecosystem that allows you to monitor and evaluate the performance of your report visuals, data model health, and DAX queries. You can use the information the
performance analyzer provides to optimize slow-responding report components and enhance the user experience. You explored improving report performance by choosing optimal data types and summarizing data. You learned that Power BI offers several techniques to reduce data size and volume, which is important for avoiding slow reports; reducing cardinality and creating aggregated tables are the two most important techniques you can employ as data reduction strategies to enhance model performance in Power BI. Building and managing a healthy, functional data model is the key to performing any analytical work in Power BI and gaining meaningful insights from your data. Understanding schemas, DAX logic, and performance optimization can help you become a certified Power BI analyst via the Microsoft PL-300 exam, as well as handle complex real-world data challenges. Visualizations act as a bridge between raw data and actionable insights. Microsoft Power BI offers a wide array of visualization options for reports, empowering analysts to create compelling data narratives. In this video, you'll explore the analytical background of visuals in Power BI to help you identify and implement the appropriate visual to address the business need. The management of Adventure Works requested a comprehensive sales report for the past year; the challenge is to select the right visuals that align with the data and the analysis objectives, ensuring a clear and insightful presentation of the sales performance. Power BI features a broad spectrum of visualizations, each tailored for specific data representation needs. The visualizations in Power BI can be broadly categorized into general-purpose visuals and specific-purpose visuals. General-purpose visuals include visuals like tables and KPI cards that are versatile and can be employed across various analysis scenarios. Specific-purpose visuals include a range of visualizations, each designed to cater to specific analytical needs like time series and geospatial analysis, among
others the general purpose visuals in PowerBI are tables and matrices which effectively display data in a structured tabular format allowing for easy comparison and analysis across multiple dimensions cards and KPIs or key performance indicators which are instrumental in highlighting critical metrics immediately enabling decision makers to quickly grasp the performance indicators that are crucial for their business objectives and lastly slicers which act as interactive filters allowing users to filter the data being displayed dynamically thus enabling a focused analysis powerbi offers numerous visuals each tailored for specific types of analysis used daily in modern enterprises the key to effective data visualization lies in aligning the visual with the analysis goal thus enabling a clear insightful and engaging data narrative let’s explore the various categories of analysis specific visualizations and the PowerBI visuals most suited for each time series analysis is a method to analyze time-ordered data to discern the structure or functionalities underlying them it is an essential analysis in forecasting monitoring and anomaly detection the optimal charts for time series analysis are line charts and area charts line charts are the ideal and most common way of visualizing a time series analysis while area charts are suitable for tracking quantity over time while emphasizing the magnitude the next analysis type categorical analysis deals with data that can be segregated into multiple categories but have no inherent order or priority categorical analysis helps you to understand the distribution and relation of data across different categories the optimal charts for categorical analysis are bar and column charts and pie and donut charts bar and column charts are effective for comparing the magnitude of categories and easily identifying the differences among them pie and donut charts are best for representing the proportions of categories especially when dealing with a small
number of categories to prevent visual clutter correlation analysis aims to find a relationship between two or more variables understanding correlations is foundational for prediction causation analysis and trend discernment the optimal charts for correlation analysis are scatter charts and bubble charts scatter charts are suitable for spotting relationships between two variables and understanding the strength and direction of the relationship bubble charts extend scatter charts by adding a dimension through bubble size allowing for an additional layer of analysis the next type of analysis is distribution analysis this type of analysis observes how values of a variable are spread or clustered over a range it’s crucial for statistical analysis allowing comprehension of data variability and central tendencies the optimal charts for distribution analysis are histograms which group values into bins to reveal how data is spread or clustered across its range next there’s part to whole analysis this type of analysis examines how individual parts contribute to the aggregate it’s a widely used analysis in understanding composition analyzing contribution and comparing individuals to the total waterfall charts are the most widely used for part-to-whole analysis as it’s highly effective in showing the cumulative effect of sequential positive and negative values the last type of analysis is geospatial analysis geospatial analysis examines data in terms of geographical or spatial relationships it’s instrumental in finding patterns understanding spatial distributions and making geographically informed decisions powerbi offers a variety of different map visuals including shape maps choropleth or filled maps and ArcGIS maps shape and choropleth or filled maps support external geographical files to draw a map ArcGIS maps are rich in map visualization features the array of visualizations in PowerBI provides a powerful tool set for analysts to convey data narratives effectively the right
choice of visualization based on the analysis need is crucial mastering the art of selecting the right visual in PowerBI is a valuable skill that significantly augments the data storytelling process of analysts to ensure Microsoft PowerBI visuals are of a professional standard it is important to explore both general and visual formatting settings in this video you’ll explore the available formatting options in PowerBI and how to implement formatting options lucas is tasked with enhancing an Adventure Works sales report with two visualizations let’s help Lucas explore all general visual and conditional formatting techniques in PowerBI launch the sales categorical analysis PowerBI file in this report two commonly used categorical analysis visualizations have been used column and pie charts lucas is tasked to investigate all available formatting and configuration options that could enhance this report select the column chart and navigate to the visualizations pane select the format visual tab this is where the formatting options for every visual reside the formatting options are split into two categories visual and general visual contains chart specific settings and general contains settings shared by all visualizations even the text box and shape visualizations share these settings let’s select the column chart and general options again to view them in detail the properties section is used to adjust the size position or padding of the visual it’s helpful when slight adjustments are necessary like moving the visual to the right the title section focuses on formatting the title of the visualization and provides numerous setting options like font size color background color alignment subtitles and even a divider lastly the effects section includes settings to format the visualization background visual borders and shadows when you navigate to the visual formatting settings the column chart specific settings appear here you can view settings for both axes modifying their
range of values font or axis title you can even change the y-axis to logarithmic to display the results on a different scale when settings like legend and small multiples appear disabled make sure that the respective visual field slots are populated the next settings allow you to add grid lines on your visual a zoom slider to magnify specific axis ranges modify the color of your columns and add data labels when you select the table visual note that the visual settings are adjusted to fit this visualization here you have style presets to easily modify the table some grid options as well as options to change the appearance of cell values column headers and the total finally to add conditional formatting to your chart you can enable it on your table visual columns by selecting any field and then selecting conditional formatting in PowerBI you can format the background and font colors you can also add data bars icons or even links to web URLs selecting a font color for example the conditional formatting window appears here you can format the font color of the table visualization this formatting can be conditional based on a custom rule that you can apply the specific value of any field in the data set or even a gradient based on a value powerbi keeps adding conditional formatting on various visualization aspects for example select the column chart navigate to the columns field and expand it a button with a function symbol appears on the right of the color field this indicates that conditional formatting can be applied to the columns dynamically altering the color based on specific criteria when you select this button the conditional formatting window appears indicating that these visualization columns can be formatted based on specific rules field values or with a gradient color just like for the table in this video you learned how to explore all the available formatting options in PowerBI and implement formatting options navigating through large data sets to find
important insights is a common task in data analysis microsoft PowerBI helps ease this task with its robust slicing and filtering features in this video you’ll explore the available slicing and filtering options available in PowerBI these features are essential for data analysis projects making it easier for users to focus on specific data subsets and uncover meaningful insights in their reports the management team at Adventure Works requested interactivity to be added to the sales categorical analysis report enabling them to dynamically apply filtering in the report the ability to sift through extensive data sets focusing on specific data points is important when building business intelligence reports slicing and filtering for this reason is an essential tool for a PowerBI analyst facilitating interactivity in reports that offer a dynamic and engaging data analysis experience let’s explore slicing and filtering in PowerBI in more detail to identify the three main methods of filter applications slicers the filter pane and visual filters the first way of slicing and filtering a report is by using slicers slicers are visualizations that act as filters enabling a user to make selections that filter data within reports to add a slicer to the sales categorical analysis report select the slicer icon on the visualizations pane and adjust it by dragging its edges drag date into the field box the slicer visualization automatically identifies the field as a date field and selects the between slicer style the second way of slicing and filtering a report is through the filters pane the filters pane is a central location where users can apply and manage filters to their reports at three different levels visual page or report level visual level filters apply to a single visual page level filters apply to all visuals on a page and all pages or report level filters apply to all visuals within a report add country region to the filters on the page section and select Canada
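conceptually these three filter levels are just predicates applied to the rows behind each visual the following plain Python sketch illustrates the idea only the rows field names and figures below are hypothetical and not taken from the Adventure Works data set

```python
# Hypothetical rows standing in for the report's semantic model.
sales = [
    {"country_region": "Canada", "category": "Bikes", "sales_amount": 1200.0},
    {"country_region": "United States", "category": "Bikes", "sales_amount": 800.0},
    {"country_region": "Canada", "category": "Clothing", "sales_amount": 450.0},
    {"country_region": "France", "category": "Accessories", "sales_amount": 300.0},
]


def page_filter(row):
    # A page-level filter acts like one predicate shared by
    # every visual on the page; a visual-level filter would
    # apply a predicate like this to a single visual only.
    return row["country_region"] == "Canada"


page_rows = [row for row in sales if page_filter(row)]

# Each visual on the page then aggregates only the filtered rows.
total_sales = sum(row["sales_amount"] for row in page_rows)
print(len(page_rows), total_sales)  # 2 1650.0
```

a report-level filter would simply be the same kind of predicate applied to the rows behind every visual on every page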
this will immediately filter the report to display only the data for the table rows with Canada in the country region field an important aspect of the filters pane is the hide and lock features it provides to the right of the filter you just added a lock filter button is visible this feature prevents report users from changing this filter the hide filter button hides the filter and prevents users from knowing that a filter is applied finally the third method of filtering is through visualization filters visual filters are a direct method of filtering allowing users to interact with the visuals on a report to filter the data for instance selecting the blue color on the tree map will filter the rest of the report based on the selected segment this feature is what makes PowerBI stand out as a highly interactive business intelligence tool as all page visualizations are constantly interacting with each other with a click of a button understanding slicing and filtering is key to unlocking the full capabilities of PowerBI they not only simplify the process of creating interactive reports and focusing on specific data segments but also empower data analysts to quickly identify valuable insights imagine effortlessly navigating through vast oceans of data in Microsoft PowerBI just like a seasoned captain navigating a ship through turbulent waters with page navigation tools you can unlock your report’s full potential for you and report users in this lesson you will cover the core features related to navigation and sorting you will learn about how page navigation effectively streamlines the flow and readability of multi-page reports effectively utilizing bookmarks capturing and sharing specific report states exploring the sorting functionalities in PowerBI to visually organize data enhancing clarity impact and insights lucas is a data analyst with Adventure Works and has been tasked with enhancing the interactivity and user experience of the company’s sales categorical
analysis report in PowerBI as this report is crucial for monthly sales meetings the report requires navigation improvements to help the sales team navigate data more efficiently and gain quicker insights lucas’ objectives are to streamline the report’s navigation across multiple pages create bookmarks for key data points to enhance presentations and apply sorting techniques for clearer data visualization page navigation in PowerBI is a feature used to create multi-page reports that are user-friendly and easy to navigate it allows users to move between different pages of a report and is essential for organizing information logically across multiple pages the implementation of page navigation in PowerBI involves setting up interactive elements like buttons or links that users can select to move to different report pages it provides a guided experience beyond clicking on tabs as it directs users through the report in a structured user-friendly way especially in complex reports page navigation is integral for assisting users through a report’s narrative especially in complex data sets or presentations there are several benefits to using page navigation in PowerBI reports they include an enhanced user experience these features collectively improve the navigation and understanding of reports making them more user-friendly and accessible for instance in a financial report the first page might provide an overall summary and subsequent pages delve into specific areas like revenue by region or departmental expenses all interconnected through intuitive page navigation the second benefit is dynamic data presentation bookmarks and page navigation enable dynamic storytelling with data allowing for interactive and engaging presentations for example in a market analysis report bookmarks can allow users to switch between different market segments time periods or product categories making the presentation interactive another benefit of page navigation is improved data organization
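a sorted view of this kind simply reorders the same underlying rows the short plain Python sketch below illustrates the idea the product names and revenue figures are hypothetical and not taken from the Adventure Works data

```python
# Hypothetical product revenue rows, standing in for a sales table visual.
products = [
    {"product": "Road Bike", "revenue": 52000.0},
    {"product": "Helmet", "revenue": 4100.0},
    {"product": "Jersey", "revenue": 2600.0},
    {"product": "Mountain Bike", "revenue": 48000.0},
]

# Sorting by revenue in descending order surfaces top performers first,
# mirroring what a sorted table or bar chart shows visually.
by_revenue = sorted(products, key=lambda row: row["revenue"], reverse=True)
top_products = [row["product"] for row in by_revenue[:2]]
print(top_products)  # ['Road Bike', 'Mountain Bike']
```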
sorting mechanisms help in structuring data effectively leading to better comprehension and quicker insights for example sorting mechanisms can be applied to a sales table to organize data by revenue allowing users to quickly identify top performing products when utilizing page navigation it often leads to increased efficiency this is due to streamlining the process of exploring and analyzing large data sets saving time and effort for both report creators and viewers for instance bookmarks can be combined with sorting mechanisms creating different sorted views of a data set like sorting customers by purchase frequency or sales by region this allows for quick comparisons and analysis saving time for both report creators and viewers the final advantage to using page navigation tools is the flexibility in analysis navigation offers flexibility in how data is viewed and analyzed accommodating a variety of analytical approaches and styles bookmarks can be used to switch between different data filters or visualizations even on the same page accommodating various analytical approaches bookmarks in PowerBI are a powerful feature that can enhance report interactivity and storytelling bookmarks allow users to save specific views and states of a report enabling quick navigation to these points during presentations or analysis they are particularly useful in highlighting changes or comparisons in data over time creating bookmarks involves selecting and saving the current state of a report including filters slicers and the visibility of visuals where visualizations can be hidden or left in view in cases where specific report configurations and filters are used in a report they can be saved as bookmarks to easily navigate back to them without having to reconfigure the report these bookmarks can then be linked to buttons or other interactive elements allowing for a seamless transition between different views within the report sorting data in PowerBI reports is a fundamental
feature that organizes data within visualizations making it easier to interpret and analyze it brings clarity to reports by arranging data in a logical order whether ascending descending or based on specific criteria sorting helps present data in a structured manner aiding in the quick identification of trends outliers or specific data points it’s essential for making reports more intuitive and insightful powerbi allows sorting of data in various visualizations like tables charts and graphs users can sort data based on different attributes such as alphabetical order numerical values or custom criteria to suit the specific needs of their analysis in this video you explored essential features in PowerBI that elevate the functionality and user experience of reports you learned how page navigation streamlines the flow of multi-page reports how bookmarks offer dynamic presentation capabilities and sorting mechanisms bring order and clarity to data visualizations these tools are invaluable for analysts like Lucas at Adventure Works as they make reports not only more interactive and engaging but also more insightful and easier to navigate by effectively utilizing these features PowerBI users can transform their reports into powerful tools for storytelling and data analysis driving more informed decision-making in Microsoft PowerBI the interactions between visuals in a report are a fundamental aspect that enhances data exploration and analysis this is due to the fact that all visualizations can filter one another over the next few minutes you will discover how visuals utilize and share data and how they can be configured to interact with one another you will explore the key interaction types filter highlight and none and their impact on overall report dynamics understanding these interactions and how to choose between them depending on the specific business need at hand is crucial for creating cohesive and informative reports that allow users to delve into data with greater
clarity and context there are three key topics you will learn about in this video specifically you will learn how to grasp the basics of visual interactions specifically how visualizations interact with a PowerBI report explore interaction types specifically filter and highlight and how they can be applied and lastly you will gain insights into the non-interaction setting and when it is appropriate to use it in a report lucas the data analyst at Adventure Works encounters a challenge with a report called sales categorical analysis the sales team has reported an issue where selecting a data point in a column chart unexpectedly wipes out the data in the tree map visualization realizing this is a visual interactions problem Lucas is tasked with troubleshooting and resolving it he discovers that the current setting is likely a filter interaction causing the column chart selections to overly restrict the data displayed in the tree map the way visualizations interact within a report is crucial for a comprehensive data analysis experience these interactions determine how selecting or hovering over data in one visual affects the data displayed in another there are three primary types of interactions filter highlight and none let’s start with filter interaction when you select a data point in one visual it acts as a filter for the other visuals in the report for example selecting a specific category in a bar chart will filter the data in all other visuals to show only data related to that category filter interactions are essential for drilling into specific subsets of data and analyzing them in the context of the whole report filter interactions provide a focused view allowing users to isolate and analyze specific data points across different visuals next is the highlight interaction instead of filtering out non-selected data the highlight interaction dims it maintaining the overall context selecting a data point in one visual will highlight related data in other visuals
while dimming the rest a highlight is used when the context of the entire data set is required even while focusing on a specific section the highlight interaction helps to understand the relationship of one part to the whole providing a broader perspective of the data the last interaction type is none this option disables interaction between visuals where selecting a data point in one visual has no effect on others this interaction is useful when visuals are meant to function independently without influencing each other’s displayed data it is crucial for reports with visuals that represent different data dimensions or when independent data exploration is required understanding these interactions is necessary for effective report design in PowerBI by applying these interaction types you can create reports that not only present data in an organized manner but also offer intuitive and insightful data exploration experiences in the upcoming video let’s assist Lucas in configuring the interactions within the sales categorical analysis report let’s start by launching the sales categorical analysis report to identify the interactions between visualizations we know that the bike category contributes almost entirely to the total of sales amount which might prove to be an issue for interaction between visualizations selecting the bikes column of the column chart the tree map boxes are almost unchanged then selecting accessories and clothing categories you notice that those categories are such a small percentage that they are barely visible when filtered the reason this occurs is that there is a highlight interaction type from the column chart to tree map chart highlighting just the percentage of each category this makes it difficult for users to comprehend the filtering of the report so you need to modify the interaction to access the interactions between visualizations select any visualization for example the column chart the format tab will now appear on the ribbon select format and enable edit
interactions this is an on-off button which is now enabled it shows the interactions of a selected visualization towards all other objects in the report having selected the column chart notice the icons above the tree map these are the three interaction options: filter highlight and none select filter to change the interaction type and select the columns of the column chart to notice the modification the users can now clearly see the color of products with the most amount in sales for each category remember that it’s a good practice to always disable the edit interaction button when completing your modifications on interactions as it takes up a lot of memory and might reduce the performance of PowerBI desktop the strategic use of visual interactions in PowerBI filter highlight and none plays a pivotal role in crafting engaging and insightful data stories by understanding and applying these interaction types report designers can guide users through a more nuanced and comprehensive data exploration journey imagine you are a data analyst for Adventure Works creating multi-page reports and you have implemented slicers on some pages when you change a slicer on one page it doesn’t change on the others currently you are recreating the same filter over and over which can be tiring for you and with so many changes to implement any mistake will lead to poor user experience how can your workload be improved and lead to a better chance of a strong user experience in this video you will learn about the fundamentals of synced slicers in Microsoft PowerBI learning how to implement this feature and gain insights into the enhanced storytelling capabilities and improved user experience provided by synced slicers adventure Works wants to analyze their bicycle sales performance across multiple regions they’ve created a comprehensive PowerBI report with pages dedicated to sales data customer demographics and seasonal trends however a challenge arises in maintaining consistent analysis
across these pages when users want to focus on specific regions or time frames this is where implementing synced slicers comes into play enabling a seamless unified view of data through the entire report project slicers serve as an effective method for narrowing down information enabling you to concentrate on a particular segment of the semantic model slicers provide the flexibility to choose precisely which values are shown in your PowerBI visuals there may be instances where you require a slicer to be active on a single page of your report while at other times applying the slicer across multiple pages might be more appropriate utilizing the sync slicers feature allows any selection made via slicer on one page to influence the visualizations across all the pages you’ve synchronized synced slicers are not just a cosmetic addition they are a functional necessity for creating cohesive and user-friendly reports here’s why they are essential first is navigation consistency synced slicers ensure that when a user makes a selection on one page it reflects across all other pages this consistency eliminates confusion and enhances the user’s ability to analyze data coherently the second necessity of sync slicers is time efficiency by avoiding the need to repeatedly set the same filters on each page synced slicers save time and streamline the data exploration process lastly is improved data storytelling in reports where data storytelling is crucial synced slicers help maintain the narrative flow they allow the story to unfold effortlessly across different pages without jarring interruptions or resets in filters now let’s explore how you can sync slicers across pages in PowerBI reports let’s get the slicers in sync for the current report the report is split into two pages the first page shows sales by product category and color and the second page details sales data for all products from the last two months at the top left corner of both pages there’s a slicer if you pick a country on the
product category and color page it only changes the data on this page the details page hasn’t changed however if you activate slicer sync the same filter will apply to both pages here’s how to do it in the view tab of the ribbon select sync slicers this brings up the sync slicers pane on the right now select the slicer on the first page and in the sync slicers pane select the sync checkbox for both product category and color and details now whenever you select a country on the slicer in the product category and color page it’ll also update the details page with the same filter to check if it’s working properly I’ll select a country on the first page when I open the second page I notice that the selected country in the slicer remains as selected this is how you can quickly synchronize slicers on various pages in a PowerBI report the sync slicers feature in PowerBI is a critical tool for enhancing the coherence and usability of reports by allowing slicers to synchronize across multiple pages it ensures that filter selections are consistent thus providing a smoother and more intuitive experience for the user you are part of a team working on sales reports for the stakeholders at Adventure Works you’ve noticed that the way the designers arrange the visuals is causing confusion making it hard to spot related items as well as this there’s no consistency in how visuals have been named everyone’s been labeling them however they please which makes it even harder to locate the essential elements using the selection pane you can organize and group these visuals making everything much easier to manage and understand in this video you are going to learn how to name visuals group the related visuals and properly organize by layering them on top of one another grouping and layering visuals in Microsoft PowerBI simplifies report creation and management by organizing data in a user-friendly way enhancing the user experience through clear logical presentation the first step towards 
enhancing user experience in PowerBI is to clearly name your visuals this involves assigning each visual a name that is meaningful and relevant ensuring quick identification following this organizing the visuals in your report by grouping related visuals to create a report that is both well structured and user-friendly the next crucial aspect is layering these groups effectively this technique is about strategically arranging your data to guide the viewer’s attention ensuring that the most important information stands out first lastly the culmination of these skills is evident in the way you manage the visibility of various report elements the control over what and when information is displayed allows you to direct your audience’s focus to essential data significantly enhancing the overall experience in your PowerBI reports now let’s explore how this works in Microsoft PowerBI naming grouping and layering in PowerBI is done from the selection pane to open the selection pane go to view on the ribbon and select selection the selection pane will appear on the right side of the PowerBI desktop editor displaying all items on the current page you can select any name in this pane to identify which visual it refers to it’s important that you name these visuals properly to organize them in an appropriate way this is especially useful when you have many visuals on a page for example if I select text box it will highlight the report heading I can rename it as heading by double-clicking the item and entering the updated name this can be done on any of these titles when I double click on any item it enables me to edit the name in this selection pane you can also change the layering of the items meaning you can rearrange the order in which visuals appear to better understand this select the insert tab on the ribbon select buttons and then blank from the listed options this will place a new button on the report page notice the new button item that now appeared in the selection
pane I drag the new button next to the date slicer I select this button in the selection pane and using the up and down arrows I can change its order for example if I send it below the slicer it disappears from the report because it is under the slicer visual using this method you can bring any item to the front or send them back using the selection pane you can also group items from this pane let’s group the heading and the underline below this heading named shape select the shape item from the selection pane I then press the control key on the keyboard and select heading notice how these two items are now highlighted now I right click on either item select group and then group again this will create a new group of these two items to ungroup right click on this newly created group then select group and choose ungroup this way you can use the selection pane to change the item names group them and layer them on top of or below each other by grouping and layering visuals effectively you’re not just tidying things up you’re making the whole experience smoother and more intuitive for anyone seeing your reports use these techniques in your next PowerBI project to create reports that are not just visually appealing but also user-friendly and coherent in today’s fast-paced business environment the ability to access and analyze data on the go is increasingly important with a significant shift towards mobile device usage optimizing Microsoft PowerBI reports for mobile viewing becomes an asset for any organization this video highlights the importance of adjusting reports for mobile view and explores the capabilities of Microsoft PowerBI’s mobile layout view offering a strategic advantage in data accessibility by the end of this video you’ll be able to understand the significance of mobile optimized PowerBI reports explore the features and benefits of PowerBI’s mobile layout view and identify best practices for designing mobile-friendly reports lucas a data analyst with
Adventure Works is tasked with creating PowerBI reports that are easily accessible and readable on mobile devices his challenge is to ensure that these reports provide a seamless user experience maintaining readability and functionality across various mobile platforms lucas aims to make these reports not just accessible but also as informative as possible for his team who often rely on quick data insights while on the move the way users interact with data has fundamentally changed mobile devices with smaller screens and touch-based navigation require a different approach to data visualization compared to traditional desktop displays recognizing this shift PowerBI introduced a dedicated feature for the unique demands of mobile platforms the mobile layout view the PowerBI mobile layout view is a feature within PowerBI desktop that allows creators to design and customize reports specifically for mobile devices this view addresses the unique challenges posed by smaller screens and touch interfaces key aspects include mobile optimized layout this layout differs from the standard view focusing on simplicity and readability on mobile devices it allows users to rearrange visuals to fit a vertical layout which is more suitable for mobile devices interactivity and functionality despite the change in layout the mobile view retains the interactivity and functionality of the desktop reports users can still filter slice and interact with the data in meaningful ways customization and flexibility powerbi provides flexibility in designing these reports users can choose which visuals to include how to arrange them and even create different views for different devices consistency in data representation while the layout changes the data and its representation remain consistent with the desktop version this ensures that users get the same insights regardless of the device they use preview and testing powerbi allows creators to preview how their reports will look on various devices 
helping them make necessary adjustments before publishing. Let's look at an example of adjusting the sales categorical analysis report for mobile navigation using the PowerBI mobile layout view. To access the PowerBI mobile layout view, you select the phone screen button on the bottom left of the page. Using this button enables you to switch between the desktop and mobile layout views. The mobile layout view appears on screen. It features the mobile layout canvas, a grid layout where you adjust the visualizations to fit any mobile screen; the Page visuals pane, where all the report's visualizations are listed; and Visualizations, where the format settings of any selected visual will appear. To adjust the report for mobile platforms, drag and drop any visualization from Page visuals to the canvas, such as the date slicer and the treemap, fitting both to the screen. You can use the Visualizations pane to format the visualizations, such as enabling data labels for the treemap chart. These changes won't reflect on the desktop layout view. The sales categorical report will now appear with these configurations when launched through PowerBI Mobile, ensuring the seamless navigation of the report using any kind of mobile device. When designing reports for mobile devices using PowerBI's mobile layout view, it's important to be aware of certain considerations and limitations that can impact the user experience. These include the following. Tooltips availability: while tooltips are not active in the mobile layout canvas during the design phase, they become accessible to users when viewing the report through the PowerBI mobile app. Metric visuals interaction: on the mobile layout canvas, metric visuals are set to be non-interactive, which means users cannot interact with these visuals in the same way they might in a desktop report. Slicer selections consistency: slicer selections made in the mobile layout do not transfer when switching to the web layout; conversely, if you switch from the web layout back to the mobile
layout, the slicer selections will reflect those changes. Additionally, when a report is published, any slicer selections displayed will be those set in the web layout, regardless of whether the report is viewed in a desktop or mobile-optimized view. Optimizing PowerBI reports for mobile devices is a strategic step towards enhanced data accessibility and decision making in today's mobile-centric world. This feature is instrumental in ensuring that valuable data insights are always at the fingertips of decision makers, regardless of their location or the device they use. Have you ever noticed numbers in your data that seem unusual and just don't seem to fit? The data analysts in Adventure Works have. In their recent sales report, some unusual figures stand out and need investigation. These odd numbers might be a coincidence, or they might be indicators of hidden issues in the Adventure Works data or in the business as a whole. They might also be clues that can lead the Adventure Works team to deeper business insights. These odd numbers are referred to as anomalies and outliers. In this video, you will learn what anomalies and outliers in data are. You will also discover how these odd figures can reveal deeper insights and information about your data, and how you can use them to inform smarter business decisions. The Microsoft PowerBI sales report prepared by the Adventure Works analytics team shows a profit downturn for a month in the middle of the cycling season; typically, this is a time associated with peak sales. Profits rose in another month without a corresponding increase in sales volume. The team needs to understand why these numbers are appearing to determine if any action needs to be taken. Let's explore the terms anomalies and outliers and discover some examples of each. Anomalies are data points that occur outside the expected range of values and which cannot be explained by the base distribution. The base distribution is the normal pattern that data follows. Anomalies are often caused
by invalid data. Outliers are data points significantly different from the rest of the data; they are often values that deviate from the other values in a data set. However, outliers can be explained by the base distribution. The main difference between an anomaly and an outlier is that an anomaly is often an error or a rare, unexpected event, whereas an outlier is an extreme but expected value that still belongs to the pattern of the data. So how would you recognize an anomaly? Let's step through some examples: a sudden spike in website traffic that cannot be explained by any known marketing campaigns or events; a sudden drop in sales for a product that has been consistently selling well; a sudden increase in the number of errors in a system that has been running smoothly; or a customer who is aged 200 years old. Now let's step through some examples of outliers: a top student who scores 100% on a test while the class average score is 70%; a house that is significantly larger and more expensive than the other houses in a neighborhood; a stock that experiences a sudden price change that is not in line with the rest of the market; or a customer who is aged 99. Let's explore how to use a scatter chart visualization in PowerBI to identify anomalies and outliers in a data set. This data set contains advertising spending and profits based on the same campaign in different media over several months. It looks problem free, but we can't be sure until we process this data with some visuals, like scatter charts, to visually spot outliers and anomalies. We've plotted this data set using a scatter chart on this report page, placing campaign ID in the values, advertising spend on the x-axis, sales revenue on the y-axis, and platform on the legend. There are some data points which stand out in the scatter chart. Some of these data points demonstrate a slight variation, while others diverge significantly. These unusual data points might be anomalies or outliers. The orange data points represent
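The spotting of unusual points described here can also be made concrete numerically. The sketch below is illustrative only: the campaign IDs and spend figures are hypothetical, not the Adventure Works data set, and it uses the common 1.5 × IQR fence, which is one of several reasonable ways to flag extreme values. The code only flags extreme points; as explained above, domain knowledge then decides whether a flagged point is an explainable outlier or an unexplained anomaly.

```python
# Hypothetical campaign spend figures (illustrative only).
campaigns = {
    "C001": 12000, "C002": 11500, "C003": 12800,
    "C004": 2100,   # underperformed, explainable (site outage): an outlier
    "C005": 12300,
    "C023": 41000,  # unexplained so far: a potential anomaly
}

def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences using quartiles and the 1.5 * IQR rule."""
    xs = sorted(values)
    def quantile(q):
        # simple linear-interpolation quantile
        pos = q * (len(xs) - 1)
        lo, hi = int(pos), min(int(pos) + 1, len(xs) - 1)
        return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)
    q1, q3 = quantile(0.25), quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

lower, upper = iqr_bounds(campaigns.values())
flagged = {cid for cid, v in campaigns.items() if v < lower or v > upper}
print(sorted(flagged))  # -> ['C004', 'C023']
```

The statistics identify which points deviate; only the follow-up investigation (the website outage, the unpopular print medium) can classify each one.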
the social media campaigns. The majority of them did well, and the chart shows that when the advertising spend increased, sales also increased. The C004 campaign is an exception to this; however, this will not be considered an anomaly, because you know that the Adventure Works website was down on that day despite the ads continuing to run on social media. Because you can define a reason why C004 performed badly, you can define it as an outlier. Another campaign, C006, didn't perform well despite its high advertising spend. This was a print media campaign, and on further investigation you found that that type of media was not popular, and this is why the C006 campaign failed. The campaign is also considered an outlier because you can explain the reason why it varies so much from the other campaigns. The online campaign C023 also stands out as different from the other data points in its category. In this case, the reason why this campaign has performed so differently has not yet been identified. Until you have the exact reason why this campaign performed exceptionally well, you would consider this an anomaly and not an outlier. Anomalies and outliers in data are critical indicators of deviation from the norm. While outliers can be explained within the context of existing data, anomalies hint at underlying issues or exceptional occurrences that demand deeper analysis. Identifying these can lead to improved strategies and more informed decision-making processes in business operations. Orders at Adventure Works have increased recently as more of their customers are enjoying outdoor pursuits. The data analysis team are kept busy analyzing data related to the large volume of orders being processed and shipped, and creating reports to present the results. Their reports contain many of Microsoft PowerBI's bright and colorful visuals. It's a large amount of data, and the team wants to ensure that viewers of the report can quickly spot patterns and insights. Two PowerBI features, grouping and binning, will help
them to create visuals that are concise, organized, and easier to draw conclusions from. In this video, we will explore what groups and bins are in PowerBI and how they can help you to organize your visuals to deliver information and insights more effectively. As a data analyst at Adventure Works, you're part of the team creating a sales report which will provide a summary of the current order fulfillment situation. Your first task is to compare the number of items that have been shipped with those that have a status of processing or cancelled. The management team particularly wants to be able to easily access information on shipped orders. Data grouping will allow you to group orders according to their status; that will make the order fulfillment status more visible and make the data as a whole more coherent. The management team also want to know the overall number of shipped orders in different value ranges. The data binning process will be invaluable for this: it will enable you to organize the results based on the order value ranges, and this in turn will allow the management team to assess the pattern of which orders were more valuable. Let's explore how the grouping and binning techniques work. Grouping refers to the process of combining data rows based on specific column values in PowerBI. This technique allows you to create a new column that represents aggregated data. The purpose of grouping is to simplify and streamline your data visualization by categorizing similar data points together. You can group data related to product categories, regions, or customer segments, making it easier to analyze and present summary information. For instance, you can group states into regions like East Coast, West Coast, and Central, or you could group products by categories such as electronics, clothing, and home appliances to understand combined sales numbers. Binning involves dividing a numeric column into ranges, or bins. Binning is useful when you want to analyze data in discrete intervals. By categorizing
numeric values into bins, you can gain insights into the distribution of data and identify patterns. For instance, you could bin ages into ranges such as 1 to 18, 19 to 30, 31 to 45, and so on. If you're monitoring website performance, you could bin website load times into categories like fast (less than 1 second), average (1 to 3 seconds), slow (3 to 5 seconds), and very slow (5 plus seconds) to identify user experience issues. Let's explore how you can use grouping and binning to help Adventure Works display the order status and the value range of the orders. Let's begin by applying data grouping to a visual. This clustered bar chart shows orders across multiple product regions. It includes all shipped orders, as well as orders that were cancelled or are still showing as processing. Let's group those orders which have a status of cancelled or processing. To do that, right-click on the order status field in the Legend well and select New group. When the group pop-up appears, press the Control key on the keyboard and select Cancelled and Processing, then select the Group button and finally select OK. The clustered bar chart updates with this new group data instantly. Now the orders with a status of cancelled or processing are displayed in the same group, and you can see the total value for these orders summed up together. The management team asked you to display the orders in different value segments; you can use the bin feature to achieve this. Create a new report page and add a clustered bar chart. Select the product region and order status fields from the Data pane. Ensure that the product region is placed on the y-axis and order status on the x-axis. Ensure that the clustered bar chart visual on the report page is still selected. Open the Filter pane. Drag the order status field from the Data pane into the Filter pane. In the Filter pane, select the order status filter box and then select Shipped from the drop-down checklist. The visual updates to show only the shipped orders, as requested by the management
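The two ideas behind these UI steps, combining status values into one group and slicing order totals into fixed-width ranges, can be sketched in a few lines of code. This is illustrative only: the order data below is hypothetical, and the bin labels mirror (but are not guaranteed to match exactly) how PowerBI floors each value to the start of its 5,000-wide range.

```python
from collections import Counter

# Hypothetical order data (illustrative only, not the Adventure Works data set).
orders = [
    {"status": "Shipped", "total": 12750},
    {"status": "Processing", "total": 3200},
    {"status": "Shipped", "total": 8900},
    {"status": "Cancelled", "total": 450},
    {"status": "Shipped", "total": 15100},
]

# Grouping: fold "Cancelled" and "Processing" into one combined status,
# like the new group created from the Legend well.
def status_group(status):
    return "Cancelled & Processing" if status in ("Cancelled", "Processing") else status

# Binning: a 5,000 bin size assigns each order total to the range it falls in,
# e.g. 12750 -> "10000-14999".
def bin_label(total, size=5000):
    lo = (total // size) * size
    return f"{lo}-{lo + size - 1}"

by_group = Counter(status_group(o["status"]) for o in orders)
by_bin = Counter(bin_label(o["total"]) for o in orders if o["status"] == "Shipped")
print(by_group)  # order counts per combined status
print(by_bin)    # shipped-order counts per 5,000-wide value range
```

Grouping reduces the number of categories a visual has to show, while binning turns a continuous column into countable buckets; both are summarizations computed before the chart is drawn.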
team. They also wanted to have the orders displayed in order value ranges, so let's create bins to achieve this. In the Data pane, right-click on the order total field and select New group from the shortcut menu. In the new pop-up, enter 5,000 as the bin size and select OK. A new entry appears in the Data pane, called order total (bins). Drag this new entry to the Legend well. Now the data is properly binned; you can hover on any bar to see how many orders are in each of these bins. In this video, you explored what the grouping and binning features are and how to apply them in your data set. By using these two features to organize the results displayed in the PowerBI visual, you made the visual clearer and more concise. The use of grouping and binning in the chart visuals has enabled additional analysis to be implemented. Artificial intelligence, commonly referred to as AI, has revolutionized the world of data analysis and visualization, making it easier for businesses to uncover insights and make informed decisions. Microsoft PowerBI, Microsoft's popular business analytics tool, has embraced AI with a range of AI visuals that empower users to delve deeper into their data. In this video, you will explore three key AI visuals available in PowerBI: key influencers, decomposition trees, and forecasts. You will learn how these AI visuals are applied in PowerBI and how they are utilized by data analysts to discover the key factors behind business results, gain a detailed overview of data breakdown, and predict future trends. The Adventure Works management team has noticed a concerning trend: a significant drop in bicycle sales despite a surge in interest in outdoor activities. They want to identify the reasons behind it. The management team need to discover why the results for this product range are not as good as expected. They also want to identify the product ranges that are performing well and predict if the current trends in sales will continue. The
data analysis team in Adventure Works can use AI visuals to provide this information. They begin with the key influencers visual. The key influencers visual helps users identify the factors that influence a particular outcome or metric in their data. The visual uses machine learning to analyze and identify the factors that have the most impact on a selected outcome. As the name suggests, the key influencers visual examines potential influencers, ranks them based on their impact, and presents these insights in an interactive, easy-to-understand format. It helps business users to understand what drives specific results or why events occur. By using the key influencers visual, the data analysis team can identify the Adventure Works products and product categories that are not performing well. They can also obtain key insights on how to reverse the current downward trend in bicycle sales. Key influencers visuals in Microsoft PowerBI offer many benefits. First, they help to identify causal factors: key influencers help you pinpoint the variables or factors that have the most significant impact on your chosen outcome, allowing you to make data-driven decisions. Second, key influencers visuals offer intuitive visualization: the visual representation of insights is easy to interpret, making it accessible to both technical and non-technical users. Key influencers visuals also incorporate drill-down capability: you can drill down into specific features to gain deeper insights and an understanding of how different values within those features affect the outcome. Lastly, there is statistical significance: the tool calculates statistical significance, ensuring that the relationships it uncovers are robust and reliable. The data analysis team uses another AI visual, called a decomposition tree, to help the management team optimize their product lines. The decomposition tree visual is an AI-powered visual in PowerBI that allows users to break down a measure into its underlying
components. A measure in PowerBI is an aggregated, combined, or calculated value. The decomposition tree visual is particularly useful when you want to understand the factors contributing to a particular metric. It offers a structured approach to dissecting data hierarchies and providing clarity in identifying the most influential components. This type of information and insight can be crucial for optimizing strategies and resource allocation. The management team at Adventure Works wants to gain a clear understanding of sales trends, and the data analysis team uses the decomposition tree visual to provide information on how revenue breaks down by product. Decomposition trees in PowerBI offer many benefits. They are ideal for breaking down complex measures into their underlying components, making data more digestible and actionable. A decomposition tree is a hierarchical visualization: it allows users to explore the contribution of different factors at various levels of detail. This visual also allows for interactive exploration: users can drill down into each component for deeper insights and perform ad hoc analysis. The tool calculates statistical significance, ensuring that the relationships it uncovers are robust and reliable. Now that the management team at Adventure Works has a clearer idea of the factors influencing low sales in one product range, and of the patterns and breakdown of their revenue, they want to move on to forward planning. Their goal is to proactively adjust production plans with the appropriate models to stay ahead of the competition by capturing emerging markets and effectively meeting future customer demands. The data analysis team can facilitate this by using AI features in PowerBI to forecast future bicycle demand trends. The forecasting feature in PowerBI leverages AI to predict future values based on historical data. This is vital for businesses that want to make data-driven predictions and anticipate future trends. Forecasting provides three important
benefits. Forecasting enables you to predict future trends: the forecasting tool helps organizations anticipate future values based on historical data, aiding in proactive decision-making and planning. Another key benefit of using forecasting is scenario analysis: users can explore different forecasting scenarios, adjusting parameters to discover how changes impact future predictions. Lastly, forecasting allows users to use data-driven planning: businesses can use forecasts to optimize inventory management, resource allocation, and budgeting. Microsoft PowerBI's AI tools, including key influencers, decomposition trees, and forecasting, make complex data easy to understand. They do this by analyzing patterns and trends in the data, which assists businesses in planning and decision-making. These tools turn complicated data into useful information, helping companies respond to today's needs, prepare for the future, and stay ahead in their fields. Some viewers of your report still have difficulty quickly absorbing the core data insights. You've learned a lot about working with data in Microsoft PowerBI, and you've created your reports according to best practices. Your reports use appropriate visualizations, and they look great. Is there anything else you can do to help the viewers of your report focus on the key points? Yes: you can use reference lines and error bars to insert further analytical visuals. This video will explore the concept of reference lines in PowerBI and the application of different types of reference lines in data visualization. You'll also learn about error bars and use different types of error bars to represent data variability and uncertainty. By the end of the video, you should be able to recognize appropriate scenarios and visuals where you can effectively use reference lines and error bars. Renee Gonzalez is the marketing director at Adventure Works. She asks you to enhance a Microsoft PowerBI sales report. She wants to add an average reference line to display a clear sales performance
benchmark. She also wants to incorporate percentage error bars into a sales by product chart to give the sales managers a better understanding of sales fluctuations. Reference lines are used to highlight significant data points or trends. These lines serve as benchmarks or guides to make data easier to interpret. A reference line allows viewers to quickly identify key points like averages, medians, or specific thresholds. They play a crucial role in highlighting deviations, understanding distributions, and setting performance targets. There are several types of reference line in PowerBI; choose the one that best interprets your data. An average line marks the average value across a data set; this is useful to compare individual data points against the overall average. A median line indicates the median, or middle, value, a feature that is especially helpful in skewed distributions. Percentile lines display a specific percentile, giving a better understanding of the data spread. A constant line, or x-axis/y-axis line, represents a fixed value; it is often used for benchmarks or targets. Min and max lines are used in charts to highlight the lowest and highest values in a data set, providing a clear visual reference for understanding the range and distribution of the data. And a trend line helps identify patterns or trends in data, aiding in understanding data movements over time. Error bars are used to represent variability or uncertainty in data visualizations. An error bar extends from a central point in a chart, such as a specific line of a line chart or a bar of a bar chart. The error bar visually demonstrates the potential range of values around a data point, with the specific lower and higher bound highlighted in the tooltip. This feature is particularly important in conveying precision, reliability, and potential errors in data. In addition to displaying a range of values, error bars also provide context and depth to the data points, allowing for a more nuanced understanding of the data. For
instance, in a financial report, error bars can illustrate the potential fluctuation in revenue forecasts, helping investment managers grasp the level of risk or uncertainty involved. There are different error bar types; choose the type you need depending on how they should be calculated and applied over the visualization. The By field type of error bar allows you to specify a particular field in your data set to determine the range of the error bars; it is useful when you have specific error values for each data point. With By percentage, the error bars use a percentage to calculate the error range; this is particularly helpful when you want to display a consistent percentage error across all data points. The By percentile type will provide insight into the distribution of data points by displaying the range within a specific percentile. For example, a 25th to 75th percentile error bar indicates the interquartile range, covering the middle 50% of data points; these error bars help in understanding the central trend and spread of the data. And the Standard deviation type calculates the error range based on the standard deviation of your data; it's commonly used to indicate the variability of the data around the mean. Let's discover how you can use the power of reference lines and error bars to add data insights in PowerBI. The sales report contains two column charts: the one on the left distributes the dollar sales amount over the customer country field, and the other one distributes it over the product color field. Let's explore how reference lines and error bars can help us interpret this data. Let's start with the sales amount by customer country column chart. Select it and navigate to the Visualizations pane, then to the Analytics pane component, which is located below the icon of a chart in a magnifying glass. The Analytics pane has all the analytics metrics that PowerBI can apply to your visualization. To add a horizontal line giving the average of the sales amount value, select Average
line, choose Add line, and turn on the Data label section, also expanding its options. Adjust the horizontal position to Right so that the average value will be visible on the visualization, and modify the style to Both so the users are clear about what's being depicted with the reference line. Moving to the other sum of sales amount by product color visualization, let's add error bars to showcase the potential fluctuation of sales based on color. Select the visualization and navigate to its settings in the Analytics pane once again. Error bars are at the bottom of the Analytics pane. Expand this section and toggle the options box on. Then, directly below, expand the Type option to select an error bar type to be applied on the column chart. Select By percentage and modify the upper and lower bounds to be 5%. The error bars are applied to your visualization; you can hover over any column to display how the figures of any color will be modified based on a 5% increase or decrease in the sales amount. This video highlighted the importance of reference lines and error bars in PowerBI. Both are key tools for enhancing data visualization. Reference lines aid in identifying and comparing key data points, while error bars provide crucial insights into data variability and precision. In summary, reference lines serve as benchmarks or indicators, helping to highlight key data points. Error bars offer a visual representation of the variability or uncertainty within the data. Adventure Works has streamlined its data analysis thanks to Microsoft PowerBI. To keep making data-driven business decisions, Adventure Works needs to be able to visualize performance tracking. This is a crucial business metric: for instance, how is its customer satisfaction rating, how close is it to the required goal, and how can Adventure Works compare satisfaction ratings across different regions? Metrics and scorecards are the answer. They are PowerBI tools that Adventure Works can use to track, measure, and report on key
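The two calculations the Analytics pane performs in this walkthrough, an average reference line and 5% by-percentage error bars, are simple enough to check by hand. The sketch below uses hypothetical sales-by-color figures (not the Adventure Works data set) to show exactly what those settings compute.

```python
# Hypothetical sales amounts per product color (illustrative only).
sales_by_color = {"Black": 52000, "Red": 34000, "Silver": 41000, "Yellow": 23000}

# Average reference line: one horizontal value drawn across the whole chart.
average_line = sum(sales_by_color.values()) / len(sales_by_color)

# By-percentage error bars: each column gets its own band, here 5% below
# and 5% above the column's value, as set in the upper and lower bounds.
error_bars = {
    color: (amount * 0.95, amount * 1.05)
    for color, amount in sales_by_color.items()
}

print(average_line)          # the value the average line sits at
print(error_bars["Black"])   # the lower and upper bound shown in the tooltip
```

Note the difference in scope: the reference line is one value for the whole visual, while the error bars are computed per data point.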
business goals and outcomes. In this video, you will explore the fundamentals of metrics and scorecards. You'll also learn how to create and customize metrics and discover how to build effective scorecards. Adventure Works needs a scorecard in PowerBI service to track the company's ambitious sales target. Jamie, the CEO, wants a real-time updating metric that accurately reflects the progress towards the sales goal. This metric is the focal point of the scorecard, which will also encompass other key performance indicators. Metrics in PowerBI are quantifiable measures that serve as key indicators of business performance. Essentially, they are data-driven benchmarks used to track and assess the efficiency and success of an organization's processes, initiatives, or strategies. Metrics in PowerBI are not just static numbers; they are dynamic and interactive elements that update in real time, reflecting the latest data. The real-time tracking capability of metrics means that businesses can respond promptly to changes. Metrics can be customized to suit specific business needs, such as tracking sales targets, monitoring customer satisfaction levels, or measuring operational efficiency. Scorecards in PowerBI are a step further in data visualization and analysis. Scorecards display a collection of related metrics on a single, comprehensive dashboard, providing a broad view of business performance. This consolidated view is vital for managers and decision makers: it encapsulates critical data points and trends in an easily digestible format that can reveal how business areas interconnect and impact each other. Scorecards in PowerBI are highly customizable. Organizations can tailor the information to align with their strategic objectives and key performance indicators, or KPIs. This includes the ability to set and track goals, visualize progress, and identify areas needing attention or improvement. Let's create a scorecard with metrics for Adventure Works to track its sales amount target. Sign into PowerBI
service with your credentials. Navigate to the left sidebar of the platform and locate the Metrics icon. Select Metrics to go to the metrics page. On the top right, select + New scorecard. A new scorecard opens, which you can start populating with metrics. On the right of Untitled scorecard, select the edit pencil to rename the scorecard to Adventure Works Sales Goals. All scorecards are saved in My workspace by default, but you can move it to another workspace by selecting File and then Move scorecard. Select the Adventure Works Sales workspace to move the scorecard to, and select Continue. The scorecard is now ready for the first metrics. To create one, select New metric. Name it Sales Amount Goal and assign the admin account as the owner, together with yourself. On the current value field, select Set up to provide an actual figure from your data set instead of a manual number. Choose Connect to data, select the All reports tab, and search for the sales report. Select the sales report, then select Next to move to the next step. The report is previewed in the metrics window. On the report, there is a card visualization showcasing the total amount of sales. Select it to confirm the measure being used, the current value, as well as the filters and slicers affecting this value. Select Connect to drive this measure onto your metric. On the next field box, Final target, input 30 million as the goal for the total sales amount; a small box appears as you type the number, aiding you in formatting the figure. Add a status on the metric, which could be On track, since the sales team is close to hitting the required goal. Let the start date be the default date given, and assign a due date for the team to hit the target; for instance, this could be the end of the year. All metric settings are now configured, so you can select Save to add the new metric to the scorecard. The scorecard is now ready for users to access. To share the scorecard and its metric goals with other Adventure Works members, on the top menu of the
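The reasoning behind choosing the On track status above is just a progress ratio against the 30 million final target. In PowerBI the status is assigned by a person (or a status rule), so the thresholds below are purely illustrative, as is the current sales figure.

```python
# Hypothetical current value; the 30 million target matches the walkthrough.
final_target = 30_000_000
current_value = 27_600_000  # illustrative figure from the connected report

progress = current_value / final_target

# Illustrative thresholds only; PowerBI metric statuses are set manually
# or via user-defined status rules, not by these fixed cutoffs.
if progress >= 0.9:
    status = "On track"
elif progress >= 0.6:
    status = "At risk"
else:
    status = "Behind"

print(f"{progress:.0%} of target -> {status}")
```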
scorecard, select Share and, for instance, select Renee, the marketing manager, to share the scorecard with her. This video explored metrics and scorecards in Microsoft PowerBI, illustrating their critical role in tracking and achieving business goals. Metrics in PowerBI provide quantifiable indicators that reflect the success of, or progress toward, specific objectives. Scorecards give a comprehensive view, combining multiple metrics into a holistic view of performance. Using these tools can empower organizations to align their strategies with data-driven insights, ensuring that decisions are informed and goal oriented. Congratulations on completing Visualizing and Analyzing Data in Microsoft PowerBI. During these lessons, you've gained insights into key data analysis concepts and tools in PowerBI and worked through practical activities for a deeper knowledge of these topics. Let's recap what you learned and the key takeaways from each topic. You began by learning more about the wide choice of visualizations available in PowerBI. General-purpose visualizations such as tables and matrices, cards, KPIs, and slicers are versatile, as they can be used in a variety of analysis scenarios. PowerBI also offers many visuals that are tailored for specific types of analysis, and this lesson explored which visualization is appropriate for specific analysis types. For example, categorical analysis is best displayed in bar and column charts or pie and donut charts, while scatter and bubble charts are more appropriate for correlation analysis. Histograms, waterfall charts, and maps were also discussed. This lesson also examined the specific and general formatting settings that enhance the appeal and readability of visualizations in your reports. Modifying the size or position of visual elements, or applying format changes such as font size and color to titles and data labels, can add clarity and impact to visualizations. You also learned about conditional formatting, which can be used to dynamically highlight critical data
points and add visual variety. The slicing and filtering features in PowerBI allow you to dynamically adjust visuals and focus on specific data points. Slicers allow for intuitive selections and enable you to refine the data represented in all the visuals on a report page. The filtering feature can be applied in the Filter pane, which manages filters at different levels: visual-level filters apply to a single visual, page-level filters apply to all visuals on a page, and report-level filters apply to all visuals within a report. You also had an opportunity to learn about the tools in PowerBI that business users can use to export data for further analysis or presentation. For example, the Analyze in Excel feature allows them to work with PowerBI data sets directly in Excel; this offers a familiar environment for in-depth analysis and custom report creation. Another feature, paginated reports, is ideal for creating print-friendly formats. These reports are designed for easy reading on paper or PDF, and they can accommodate detailed data and complex layouts. You then learned how to enhance reports for usability and storytelling. This lesson began by exploring how smooth page navigation can improve readability and flow in multi-page reports. The use of buttons or interactive links creates a seamless transition between different pages and guides users through the report's narrative. Bookmarks capture specific report views and states, enabling quick access during presentations and highlighting data changes over time. Sorting organizes data within visualizations, making it easier to identify trends and insights. The way that multiple visualizations within a PowerBI report interact with each other enhances data exploration and analysis. Filter interactions cause a change in one visual to filter data on another; this refines the displayed data based on the selection and allows users to isolate and analyze specific data points across different visuals. Another option, highlight interactions, does not
filter out non- selected data instead it emphasizes selected data in connected visuals while the unselected data is dimmed and not filtered out this provides a clear view of how parts relate to the whole lastly there is an option none which completely disables the interaction between visuals doing this keeps the visuals independent without any interaction which can be useful for standalone data presentations you learned that syncing slicers in PowerBI reports improves the user experience with synchronized slicers a selection made on one page applies to all other pages this streamlined approach reduces confusion saves time and maintains the narrative flow you are also introduced to the selection pane where you can manage the report elements here you can clearly name individual visuals to ensure quick and easy identification you can use the selection pane to group visuals and provide structure to the report the selection pane also allows you to layer these groups this helps you to guide the report viewer through the data by controlling the order in which the visuals appear finally this lesson focused on how to adapt a report for mobile use the PowerBI mobile layout view it demonstrated how to modify the visual elements and layout for better readability and interaction on a smaller screen size in the final lesson you learned about the features in PowerBI which help you identify and analyze patterns and trends in your data it demonstrated how to recognize anomalies and outliers you were provided with examples of both and shown how to use scatter charts to identify them in PowerBI recognizing these types of discrepancies is essential for uncovering underlying issues or exceptional events and leads to smarter business decisions and strategy improvements the lesson continued with an explanation of grouping and binning in PowerBI grouping consolidates similar data points into categories which facilitates efficient summary visualizations bidding in contrast segments numeric 
data into ranges aiding in distribution analysis finally you learned about PowerBI’s AI tools which provide insights that can inform planning and decision-m key influencers to identify critical factors affecting outcomes decomposition trees to break down complex metrics and forecasting to predict future trends from historical data you should now have a powerful tool set in PowerBI for creating reports the first item in this tool set is the wide array of charts offered by PowerBI which you can use to convey insights features such as bookmarks grouping and layering visuals offer a way to create a smooth narrative for the viewer filtering and slicers help them to drill down to deeper insights techniques such as detecting outliers and anomalies data grouping and binning and using AI visuals provide a solid foundation for accurate data analysis in the world of data and reports having a centralized location where teams can work together is beneficial for all involved that’s where Microsoft PowerBI workspaces come in workspaces are more than simple folders they are special team rooms where analysts can add and share their charts reports and data in this video you will learn about what Microsoft PowerBI workspaces are and how they can benefit your work you will explore the different roles people can have in these workspaces and learn how these roles can make teamwork in PowerBI smooth and efficient at Adventure Works you are responsible for

creating and managing reports for a variety of teams. The sales team requires regular updates on their performance metrics, the marketing team tracks campaign results, and the customer service department looks for feedback on user behavior. Each team creates its own set of data visualizations, often leading to a collection of reports scattered across different platforms. However, using the Power BI workspace feature, you can set up workspaces for each of the sales, marketing, and product teams. Each team will then have its own centralized room to create, share, and discuss its specific reports.

First, let's explore what Power BI workspaces are. Power BI workspaces are places to collaborate with colleagues and create collections of dashboards, reports, data sets, and paginated reports. Power BI provides two types of workspaces: personal and shared. Your personal workspace is a private area for individual tasks, while shared workspaces are designed for team collaborations where members can jointly develop and fine-tune reports. Workspaces can contain a maximum of 1,000 data sets, or 1,000 reports per data set.

Workspaces offer a feature called roles, which helps to manage access control on these resources. Understanding and properly utilizing the roles within Power BI workspaces is important to ensure effective collaboration and content management, and assigning the correct role to each user is vital to maintain data integrity, security, and an efficient workflow. Power BI offers four types of roles: admin, member, contributor, and viewer.

Let's start with the admin role, the most powerful role. The admin has full control over the workspace, including content creation, member management, and workspace settings adjustments. They can add or remove members, change roles, and even delete the workspace. Next, you have the member role. Members have the privilege to add, modify, and delete content in the workspace. They can collaborate with others and share the workspace content, but cannot change workspace-level settings. After the member role is the contributor. This role is slightly more restricted than the member role: contributors can add and modify content but cannot delete items from the workspace. They also cannot share content with others. Lastly, we have the viewer role, which represents the most limited level of access within a workspace. Viewers are primarily consumers of content, and their permissions are confined to viewing the materials available within the workspace. They do not possess the right to modify or delete any content, making them ideal for scenarios where read-only access is required.

Having established your understanding of workspace roles, let's consider workspace role capabilities. When an individual belongs to a user group, they receive the role you have designated. If a person is part of multiple user groups, they inherit the highest level of permission from the roles they have been assigned. In the Power BI service, a user group refers to a collection of users who are grouped together based on certain criteria, roles, or purposes. These groups can be leveraged for various functionalities, including content sharing and permission management.

Power BI's workspace offers a unique and powerful feature: the ability to create template apps. These are preset, customizable structures that serve as a foundation for building specific data visualization applications. Once created, they can be shared not just within the organization but also externally. This external sharing capability enhances the utility of template apps: rather than confining data visualizations and reports within organizational boundaries, businesses can distribute these template apps to customers, partners, or other stakeholders. The usefulness of these template apps lies in their flexibility. When customers receive a template app, they aren't just locked into viewing static, predefined data; instead, they can connect these templates to their own data sets.

Now that you've learned about Microsoft Power BI's workspace tools, you can explore
ways to help your teams collaborate and use data more efficiently. From setting roles that decide who can do what to offering ready-to-use templates, it streamlines many tasks.

Imagine you're tasked with presenting multiple reports and data sets to teammates across various departments. It would be convenient to bundle everything neatly together and offer it as a unified online package. This not only simplifies your presentation process but also enhances accessibility for a wider audience. This is precisely the type of solution that Microsoft Power BI workspace apps look to provide, streamlining and enhancing your data sharing capabilities. In this video, you are going to learn about Power BI workspace apps: what they offer, and how to create and share them with your audience.

Adventure Works faces a data sharing hurdle. Different departments need various Power BI dashboards and reports to operate effectively: the finance team requires sales data, the marketing team is keen on customer insights, and the supply chain team wants to view inventory levels. Sharing this data separately would be challenging. This is where Power BI workspace apps can assist you in generating these dashboards and reports. Using this feature, the data analysis team can group related content into specific apps. For instance, all sales-related reports and dashboards go into one app, while customer insights go into another. These apps are then published to the appropriate teams, ensuring everyone has access to the relevant information. This improves workflow and efficiency for you and the data analysis team.

In Power BI, you can create official packaged content and then distribute it as an app. These apps can be distributed to a wide audience, such as an entire organization, or to specific groups or people. Apps are created in workspaces: you can choose a selection of reports, dashboards, and data sets from a workspace to distribute as an app, and then publish the finished app to large groups of people in your organization.

To create or update an app, you need a Power BI Pro or Premium Per User (known as PPU) license. For app consumers, there are two options: either the workspace for the app is not in a Power BI Premium capacity, or it is. If the app is not in a Power BI Premium capacity, all business users need Power BI Pro or Premium Per User licenses to view your app. If the workspace for the app is in a Power BI Premium capacity, business users in your organization without Power BI Pro or Premium Per User licenses can view app content; however, they can't copy the reports or create reports based on the underlying data sets.

Let's consider how you create apps. You can start the app publishing process when your workspace has content. When you enter your workspace, you will notice a Create app button, which will be your starting point. You'll be taken to the application settings area, where you can set the name of your application, add a description, choose a logo, and select the theme color for your application. After that, you can select which content you want to include in your app, and you can sort the content as you please. Once you are happy with the content selection, you must select the audience for the application.

Having created your app, you must create and manage the audiences engaging with it. An app audience is the group of people you choose to share your app with. The Audience tab is a centralized place to decide who has access to your app, and to what extent. Think of it as your control room, where you can set up different audience groups for your app. You might want to give access to everyone in your company, or you might want only a specific group or certain individuals to have access. With Power BI apps, you can create multiple audiences for your app and show or hide different content for each audience. You can also set some advanced options, such as whether your audience can share the data set or build new content with the data set in the app.

Once you have the audience and the content they can engage with, it is time to publish your app. Once the app is published, it can be accessed by your intended audience. You can come back to the app and update the settings, and the published app will reflect the changes in a few minutes. Once published, the app can be accessed via its URL or by searching for it in the app marketplace. App consumers in the Power BI service and in the Power BI mobile apps only see content based on the access permissions for their respective audience groups. By default, consumers see the All tab view, which is a consolidated view showing all the content that they have access to. In this video, you've learned about the process of setting up audiences in Power BI, deciding on the content visibility for each group, and the steps to effectively publish and share your app.

Microsoft Power BI subscription and alert features enable users to remain informed about significant shifts in their data. With data alerts, users can establish notifications that activate when dashboard data surpasses predefined limits. Along with data alerts, subscriptions ensure users consistently receive updates on their reports and dashboards. In this video, you will learn about the Microsoft Power BI subscription and alert features that keep you consistently informed about crucial data changes, and how to utilize them effectively.

The newly appointed director of the strategic planning department at Adventure Works is eager to make a measurable impact. With the recent launch of e-bikes at Adventure Works, it's essential for the director to have a firm grasp on the daily sales figures. However, being new to the company's Power BI setup, navigating through the Power BI dashboards can be time-consuming. To streamline this, the business intelligence team establishes a Power BI subscription focused on e-bike sales metrics. Every day, the director receives an email snapshot of the prior day's sales, enabling immediate data-driven strategic discussions. Power BI subscription and alert features are tools that redefine the
way businesses approach data analytics. It is important to note that to activate subscriptions and alerts, the content must reside in a Premium capacity or be tied to a Premium Per User license. To support near-real-time data flows, data sets must be configured for scheduled refreshes or DirectQuery connections. With data alerts, users can establish notifications that activate when dashboard data surpasses predefined limits, while subscriptions ensure users consistently receive updates on their reports and dashboards.

Let's first explore subscriptions. With subscriptions, timely delivery and tailored report dissemination become seamless, eliminating a laborious manual process and ensuring that stakeholders are always informed. There are many benefits of using subscriptions in Microsoft Power BI. With subscriptions, you can schedule automatic delivery of reports on a recurring basis, email or chat digests of key report pages to stakeholders, set different schedules like daily, weekly, or monthly delivery, customize data views with parameters and row-level security, and eliminate the need to manually distribute reports. Users can set up to 24 subscriptions per report or dashboard, with unique recipients, times, and frequencies for each subscription. Subscriptions can include a snapshot of, and link to, the report or dashboard, or a full attachment of the report or dashboard. You can also create dynamic per-recipient subscriptions, which are designed to simplify distributing a personalized copy of a paginated report to each recipient of an email subscription.

Now let's turn our attention to alerts. Alerts in Power BI notify users when data meets defined conditions, such as surpassing sales targets, dropping below inventory thresholds, or any other measurable value set within the system. Alerts shift users from passive data monitoring to proactive and timely decision-making, allowing businesses to harness real-time data intelligence effectively. The benefits of using alerts in Power BI include getting real-time notifications when data meets thresholds, responding quickly to insights instead of passively monitoring, receiving dynamic metric alerts that account for data variability, getting ingestion alerts that notify you of data set refreshes, receiving push notifications via email, mobile, and Microsoft Teams chat, and shifting from reactive to proactive data analytics. With subscriptions and alerts, Microsoft Power BI analysts can build out robust notification strategies, ensuring stakeholders always have visibility into the data they care about. This keeps them informed of critical metrics and enables proactive responses to data trends and anomalies.

In today's data-driven world, how can data analysts discern between trustworthy Microsoft Power BI content that holds reliable information and content whose accuracy hasn't been tested? Microsoft Power BI's features for promoting and certifying content hold the answer. Promoting and certifying content in Power BI can elevate the credibility of your data and ensure it is trusted as reliable content. In this video, you will learn about the differences between promoting and certifying Power BI content, their respective use cases, and the implications of each method for content creators and consumers.

The marketing team at Adventure Works detects a noteworthy increase in sports bike sales in Europe. After compiling the data, a Power BI report is generated highlighting the sales trends and key insights. Recognizing its value, the report is promoted within the European sales division, and given its potential relevance to global strategies, upper management deems it fit for company-wide sharing. Before its wider distribution, the central Power BI team thoroughly reviews the report, ensuring it aligns with global standards. Once certified, this report will be accessible across all regions, and its certification badge becomes an assurance of its
precision and significance, influencing strategic decisions throughout Adventure Works' global operations.

Promoting content in Power BI is like giving it a stamp of approval. When content is marked as promoted, it signifies that it aligns with specific organizational benchmarks for accuracy and reliability. However, it is crucial to note that while it has met these preliminary checks, it has not been subjected to an exhaustive vetting process. When content like a report or data set is promoted, it is made available for a wider audience to discover and consume: promoted content appears in content packs and curated content lists in the Power BI service. Promoting makes the content visible to more users but does not validate or endorse it, and any user with edit access to a workspace can promote content from it.

Certifying content is more specific and detailed than promoting content. It requires setting up a content certification policy and process with designated reviewers, who validate content to ensure it meets standards and best practices before officially certifying it. Certification offers a greater level of trust and validation: when content is certified, it means it has passed through a rigorous scrutiny process, adhering to the standards set by the organization. This is often a testament to its quality, accuracy, and overall trustworthiness. There are four key aspects of certifying content in Power BI: the review process, meaning expert validation of data quality and adherence to best practices; governance, meaning strict organizational standards are implemented while certifying content; visibility, meaning certified content is marked with a badge for easy recognition; and trust, indicating a high level of approval and reliability for all users in the organization. Certifying content requires admin setup of content certification policies, and certified status expires unless the content is recertified within the policy period.

Let's explore the key differences between promoting and certifying content. When it comes to level of trust, promoted content signifies the content is trusted by the creator and might have undergone peer review; certified content implies organizational approval, often by a central team or authority, indicating the highest level of trust. With visibility, promoted content appears in shared and recommended sections for end users, while certified content stands out with a distinct badge in the service, ensuring users can instantly recognize its elevated status. With regards to governance, promoted content allows for decentralized governance, where individuals or departments can decide the criteria; certified content typically requires centralized governance, with strict criteria that content must meet to achieve certification. Next, we have the audience: promoted content is ideal for departmental or team-level sharing, where the audience knows the creator and trusts their expertise; certified content is best for organization-wide sharing, where the audience might not be familiar with the creator but trusts the centralized certification process. Lastly is the review process: promoted content might involve peer reviews or departmental checks, while certified content often involves strict review by experts or a central BI team, including checks on data sources, calculations, and visualizations. In this video, you've learned about content promotion and certification in Microsoft Power BI and the key distinctions between each process. These two methods are vital for distinguishing trustworthy data and ensuring its credibility.

Some of your data is in cloud-based storage, but your other data sources are on premises. Do you have to move the on-premises data to the cloud to be able to combine and analyze all your data? No: Microsoft Power BI connects to many data sources. Microsoft Power BI data gateways are used to connect Power BI's cloud-based data analysis technology and the on-premises data sources. The gateway is responsible for creating the connection and passing data through. In this video, you will discover what Power BI
gateways are and how they can help organizations manage on-premises data that will later be shared with different types of users.

Adventure Works operates across North America, Europe, and Asia. It uses its global data sources to analyze market trends and make smart business decisions. Effective decision-making depends on up-to-date reports based on the latest data. That's why the team needs a solution to synchronize on-premises data sources like SQL Server, Excel files, and Microsoft Dynamics CRM with the Microsoft Power BI service. With the gateway in place, every morning when a regional manager logs in, they get a dashboard showing not just their own store sales from their on-premises sources, but also data from other branches across the world. Despite originating from a server thousands of miles away, the data is up to date and ready for use. Managers can compare their sales with other regions, identify trends, and adjust their local strategies accordingly.

A Power BI data gateway is an application that connects Power BI's cloud-based data analysis technology and on-premises data sources such as SQL Server databases or Excel spreadsheets. It is required whenever Power BI must access data that isn't accessible directly over the internet. Gateways are responsible for creating the connection and passing data through, and they can be installed on any server in the local domain running Windows Server 2012 R2 or later.

There are three types of gateways available: personal mode, standard (or on-premises) mode, and the virtual network data gateway. With a personal mode gateway, only one user connects to data sources, and sources can't be shared with others. This mode can only be used with Power BI and is ideal when one person creates reports and doesn't need to share data sources. The standard, or on-premises, mode gateway allows multiple users to connect to multiple data sources. This mode is well suited to complex scenarios in which multiple people access multiple data sources. The virtual network data gateway facilitates secure connections for multiple users to various data sources protected by virtual networks. As a Microsoft-managed service, it eliminates the need for manual installation. The virtual network data gateway is particularly effective in handling intricate situations where numerous individuals need access to diverse data sources simultaneously.

Who is the gateway for, and for what type of user? With personal mode, individual analysts manage their own reports and sync personal data sources with the cloud, whereas with the on-premises mode, admins set up and configure the gateway, and the BI team uses it to get up-to-date data for their reports. What is the connection type? You can use personal mode to import data or schedule refreshes; the standard mode is used for scheduled refreshes or to run DirectQuery. How is the data managed? Each user handles their own data in personal mode, while in the standard mode, the company manages data centrally for all users. What is happening with data supervision? Can we oversee the data? There is no supervision in personal mode; users are on their own. In the standard mode, there's a central system to watch over all the data. The final factor to consider is compatibility: personal mode works only with Power BI, while the standard mode works with Power BI, various apps, flows, and more.

The gateway is responsible for creating the connection with the Power BI online service and syncing the local data. Let's examine some of the gateway details. The gateway is installed on a server in the local domain; during installation, credentials are stored locally and in the Power BI service. Credentials entered for the data source in Power BI are encrypted and then stored in the cloud, and only the gateway can decrypt them. The gateway controls access to the local data: when an online tool wants data, it asks the gateway, and the gateway checks who is asking and, if they have permission, grants access. The gateway doesn't store data; it just connects and transfers. When data in Power BI needs updating, the gateway passes the request to the local data source, and once the data source responds, the gateway sends the updated information back to Power BI. One of the standout features of the gateway is the ability to set up scheduled refresh. This means that, at specified intervals, the gateway will automatically fetch the latest data, ensuring that online reports and dashboards are always updated.

Finally, let's check some business use cases for Power BI data gateways. Organizations with multiple locations or teams spread across different regions can face challenges in accessing a centralized data source; the data gateway ensures all teams have uniform access to the same data source. Data can change rapidly; for instance, there can be continual updates in global markets. For businesses to make informed decisions, they need real-time access to data, and the gateway ensures that the data in online reports and analyses is always up to date. A security consideration to remember is that when you use a data gateway, direct connections to the on-premises data sources are minimized: only the gateway communicates with the data source, providing an added layer of security. All data transferred is encrypted, and the established connection is outbound, which reduces the risk of security vulnerabilities. In this video, you learned about Microsoft Power BI gateways. Gateways help organizations keep databases and other data sources on their on-premises networks, yet allow secure use of that on-premises data in cloud services.

Organizations have a lot of data, but not everyone needs to access all of it all the time, and some data is sensitive in nature, so access to it should be restricted. Row-level security, or RLS, is a powerful data governance capability in Power BI that enables you to control access to the organization's data at a granular level. It allows you to restrict data visibility for different users or groups, ensuring that each user can only access the data they are authorized to view. In this video,
you will explore different types of row-level security and roles, and how to configure them in Power BI.

The BI team at Adventure Works is working on quarterly reports and forecasts. As their data grows, they often need to protect their reports and control access among teams. In a report, they want to grant certain teams access to specific visuals while restricting access to those visuals for others. This security challenge led Adventure Works to implement row-level security. RLS allows them to precisely manage who can view data and particular visuals within a report, providing a tailored and secure experience for each team.

Row-level security controls the data viewable by users based on predefined roles and rules. A role is like a group the user belongs to, and the rules can be designed based on columns of the data set. There are two types of row-level security: static RLS and dynamic RLS. Static RLS is the row-level security method to use when you have a fixed set of users and roles, such as when you have predefined roles like manager, product lead, customer, marketing lead, and so forth in your team. You can create these types of roles and apply filters within Power BI Desktop using its row-level security editor. Static RLS is suitable when you have a small, fixed list of users and simple RLS logic in the report. Dynamic RLS is a more flexible approach because it operates with user attributes and conditions stored in the data itself. It uses a centralized role assignment table containing user attributes like role assignments, user IDs, and filter conditions. Relationships between this table and the primary data tables are established, and DAX expressions are used to dynamically filter data based on the user's role and attributes. Dynamic RLS is ideal for scenarios where user access is based on varying criteria, such as region-specific data access or complex role assignments. Whatever row-level security you create, you must always test your configurations rigorously to guarantee accurate and secure data visibility across users. Testing might mean you simply open your report as a specified user and check the data visibility: on the Modeling ribbon, there is an option called View as that allows you to simulate a user login and check whether the RLS is working as expected.

Let's create some static and dynamic RLS in the Adventure Works reports, starting with static RLS. This is the Adventure Works world sales report. On the Modeling ribbon, select Manage roles and create a new role called Manager Europe. We want people in this role to view data from Europe only. Select the Sales table, select More options (the three dots next to it), and select the region field. Now, in the table filter DAX expression box, add this DAX expression: [Product Region] = "Europe", and select Save. This DAX expression means that any user who belongs to the Manager Europe role will only view sales data related to the Europe region.

To test whether the static role settings are working properly, return to the report view in your Power BI editor and check that you can view sales data for every region. Now, on the Modeling ribbon, select View as, check the Manager Europe role, and select OK. This immediately applies the RLS restrictions on the report, and you get sales data for only the Europe region; all other regional sales data is hidden. You can exit this restricted view by selecting Stop viewing. Since everything is working as expected, publish the report to your workspace and add some users to the Manager Europe role. Go to your workspace, select the data set named world sales report, choose More options (the three dots next to it), and from the drop-down select Security. In the row-level security dialog, select the Manager Europe role, add users to this role, and then select Save. With this static row-level security setup, when users in this role view the world sales report, they will be able to view sales data related to Europe but will be unable
to view sales data from other regions for a more flexible filtering approach you can create dynamic row security return to your PowerBI editor to start applying dynamic RLS for example inside the PowerBI editor model view of your report you can have a table with all the regional managers email addresses and the product regions they belong to this table is related to the sales table using the product region field if you create a dynamic RLS when the managers view this report they will get only sales data related to their corresponding regions return to the modeling ribbon and select manage roles let’s delete the previously created manager Europe role and create a new one named managers this time select the sales table and add this DAX expression sales open square bracket product region close square bracket equal to lookup value open parenthesis managers open square bracket product region close square bracket comma managers open square bracket email close square bracket comma user principal name open parenthesis close parenthesis comma managers open square bracket product region close square bracket comma sales open square bracket product region close square bracket close parenthesis when finished select save this DAX expression checks the currently logged in user’s email against the manager table then filters the product region based on the product regions this user belongs to to test if the security settings are working properly return to the report view and check if you can view sales data for every region on the modeling ribbon select view as check the newly created manager role you also need to check the other user role and input one of the manager’s email addresses from the manager table notice how the report view changed and you are viewing sales data only for the regions assigned to this manager you can select stop viewing to return the report to the normal unfiltered view return to the home ribbon and publish this report to your workspace then open your 
workspace in the PowerBI service area and go to the security setting of your data set. Add as many users as you want to this new Managers role. The dynamic role security is active for this report, so when users view the report, based on their email address and assigned regions in the PowerBI data set, they will view only relevant sales data. This way, users will have access to filtered data dynamically, based on their email and product regions. Row-level security, or RLS, is a powerful feature in PowerBI to filter data based on various conditions and roles. By establishing the right relationships and using appropriate DAX expressions, PowerBI can filter data based on various conditions, ensuring that each user sees only the data relevant to their specific permissions. Always test your RLS configurations rigorously to ensure users' data visibility is accurate and secure. Team collaboration is crucial for proper data analysis. The challenge presented by collaboration is to ensure the correct distribution of data within your organization. Discover how PowerBI's robust permission management settings can help you maintain control over critical data sets at Adventure Works, ensuring data integrity while enabling effective collaboration. In this video, we'll explore aspects of permission management for data sets and workspace apps. You work as a Microsoft PowerBI data analyst at Adventure Works, and there are occasions when you need to share certain data sets with your colleagues. Your colleagues can either reshare these data sets or create new reports based on them. However, some of these data sets hold significant importance for the organization, and even though they are shared among users, you do not want anyone to modify the data set. In addition to standard sharing, there are times when you also need to share all items in a specific workspace with other users or teams as workspace apps. Nevertheless, you still require precise control over some of these items, like reports or data sets, ensuring that
various teams can only access relevant items. The Microsoft PowerBI service offers various permission management settings for data sets and workspace apps, which can be incredibly helpful in this context. Let's quickly review some key terms. Data sets are the core collections of data that you work with in PowerBI, often representing various aspects of your organization's data. Workspace apps in PowerBI allow you to share entire workspaces, including data sets, dashboards, and reports. A workspace app is a full data package that can be shared with specific users or teams, ensuring a comprehensive sharing experience. Now, to briefly review the topic of permissions: with data set level permissions, the PowerBI service enables you to assign specific permissions to data sets while sharing. You can ensure that although colleagues can access and utilize the data, they cannot make changes to it. This ensures the sanctity of vital data sets. Then there are workspace app permissions. In some cases, you need to share all files within a particular workspace with other users or teams using workspace apps. With PowerBI's permission management, you can maintain granular control over who sees which reports. This means different teams can access only the reports that are relevant to their needs, keeping your data organized and secure. To check how many workspaces, reports, or dashboards are affected by a data set, you can perform what is known as impact analysis. To do this, you go to your workspace and hover on a data set, then select more options (the three dots next to it) and select show lineage. This opens the lineage view for your workspace items, where you can view which items are connected to each other. On the right side of the screen, it also shows the impacted workspaces, reports, and dashboards for this data set. You can always perform impact analysis by selecting show impact across workspaces under each data set. To exit lineage view, on the top right corner in your workspace, you select source view. This will take
you back to the previous list view, where you can view all the items in this workspace as a list. Let's experiment with permissions in PowerBI service. To begin, open your workspace. To set permissions for a data set, select more options (the three dots next to the data set) and select manage permissions. From here, you can add users to your data sets. At the top, select add user. In this grant people access dialogue, you can type the username or email address and then select the appropriate permission level using the check boxes. For example, if you don't want this user to make any changes to this data set, uncheck the allow recipients to modify this data set checkbox. Once added, all users will be shown in this permission view. You can make further changes by selecting more options (the three dots next to a user) and removing or granting permissions. You can also fine-tune permissions for your new or existing workspace apps. We have already discussed how to create an app and select an audience in previous lessons. Let's discover how to update the audience for an existing workspace app. Open your workspace, and at the top, select update app. Select the audience tab. Here you can fine-tune all the settings related to the audience for an app. On the right side, in edit audience, you can modify the current audience. For example, currently this app is shared with all users in the entire organization. You can change it to some specific users by selecting specific users or groups, then typing their names and selecting update app. Alternatively, you can select new audience and choose other users with different permissions. For example, you may want to share it with some other user, but this time you want to allow them to share the data set among the users in this audience group. You can select advanced settings, then check allow people to share the data set in this app audience. You can also select allow people to build content with the data set in this app audience, just in case you want to allow the creation of
new reports based on this data set. To complete, select update app, select update again on the confirmation popup, and finally close the published popup. That is a demonstration of how you can manage permissions for a specific data set or for workspace apps inside your PowerBI service area. PowerBI's permission management settings offer a robust framework for maintaining data integrity while facilitating effective collaboration at organizations like Adventure Works. Whether you're safeguarding critical data sets or sharing workspaces, these tools help you to apply access control to your data. Congratulations on reaching the end of these lessons in deploying and maintaining assets. You explored creating, monitoring, connecting to, and maintaining workspaces, data sets, and dashboards in Microsoft PowerBI. Let's recap what you've learned so far. You began the first lesson by exploring the concept of a workspace. You learned that a workspace is a specialized area in PowerBI that holds important assets like data sets, reports, and dashboards. Its advantages are that it helps to organize assets for easy management, provides security through access control (only permitted users can access workspaces), enables collaboration (teams can use workspaces to build reports), and allows analysts to update or modify data quickly. When creating a new workspace, you must consider workspace roles. Workspace roles determine who can perform each task: viewers can view content but can't modify it, contributors can add and modify content, members can alter content and add new members, and admins have full control over the workspace assets and its members. During this lesson, you learned how to share workspace assets as an app. Creating an app requires a PowerBI Pro or Premium Per User license. The technical process of creating apps in PowerBI was outlined, beginning with selecting create app in the workspace, leading to an application settings area where one can name the app, add a description, set a logo, and choose a theme
color. Content can be selected and sorted for inclusion in the app, which is followed by selecting and managing the audience. PowerBI allows the creation of multiple audience groups for an app, enabling tailored access and content visibility. You also learned how to manage assets in a workspace. You can import assets directly into a workspace by uploading them or publishing them from your PowerBI desktop. When changes are made, you can always publish them again, which will update the previously published reports and data sets. In addition, you learned about setting up subscriptions and alerts in PowerBI service, which allows users to receive regular updates and notifications based on data changes. These tools enhance user engagement by automating the distribution of insights and ensuring timely awareness of critical metrics. The lesson continued by exploring the steps required to promote and certify content in PowerBI. Promoting and certifying are crucial for establishing trust and standardizing data quality across the organization, thereby enabling users to identify and rely on the most accurate and relevant business intelligence assets. The lesson ended with a detailed guideline on various global options for files within PowerBI, such as data load and report visualization. Knowing how to configure these settings is important because it allows for more tailored and efficient data processing, enhances visual representation, and ensures a more seamless and intuitive user experience. The next lesson started with the concept of a data gateway and how it can help PowerBI data analysts and organizations. A data gateway serves as a bridge between PowerBI's cloud services and on-premises data sources, such as SQL databases or Excel files. Whether you are a data analyst working on your own or working for an organization, you can sync your data with data sets hosted in PowerBI service using these data gateways and always keep these data sets up to date by setting up scheduled refresh. There are
three types of data gateway. Personal mode is for single-user use, and this is suitable for individual report creators. The standard mode, also known as on-premises mode, supports multiple users and data sources, and it's used for complex access scenarios. Lastly, the virtual network data gateway allows multiple users to connect to various data sources within virtual networks without any installation, as it is managed by Microsoft. This lesson also discussed details of row-level security, or RLS, in PowerBI service, a feature that allows for more granular control over access to data. RLS enables creators to define permissions on data rows so that users will only view data relevant to them, enhancing both security and user experience. This is particularly useful in organizational scenarios where data access needs to be restricted based on user roles or departments, ensuring that sensitive information remains confidential while still providing valuable insights to authorized personnel. Finally, this lesson covered the management of permissions for data sets and workspace applications. Effective permission management enables selective sharing of data sets and workspace apps, allowing the designated individuals to access the data sets and create reports from these data sets. The workspace audience management tools allow for sharing with the entire organization or customizing access for users. Additionally, impact analysis tools are available to determine the connectivity and potential effects on workspaces, reports, and dashboards when there are updates to a data set. You've reached the end of our summary on deploying and maintaining assets. Keep practicing your practical skills with sample data sets, reports, and dashboards, and remember you can always revisit any item in the course to revise a topic by playing a video, viewing a document, or engaging with an activity. Best of luck with your studies. The Microsoft PL-300 exam is a professional certification in Microsoft PowerBI for aspiring analysts. The exam tests
your knowledge and skills in the technical and business requirements of data modeling, analysis, and visualization in PowerBI. In this video, you'll discover the recommended strategy to maximize your chances of passing the exam PL-300: Microsoft PowerBI Data Analyst. A successful exam with a good grade is achievable if you are well prepared and practice some basic strategies. One of the best ways to prepare is to take a practice test before the exam. This way, you can monitor your progress and identify the areas requiring more study or attention. You have taken knowledge checks and graded quizzes and completed exercises throughout this course. These are designed to help you monitor your progress while preparing for the real exam. You'll be able to complete the PL-300 mock exam a little later, focusing on topics and key skills measured in the proctored exam. The topics include preparing the data, modeling the data, visualizing and analyzing the data, and deploying and maintaining assets. During this program, you have covered the skills measured in the PL-300 exam and gained significant hands-on experience using the real-world data set of Adventure Works. Now it's time to practice what you've learned. The PL-300 mock exam is based on a similar style and format to the proctored exam. You can revisit any lesson to revise a concept if you need to review anything. This practice exam is intended to provide an overview of the style, wording, and difficulty of the questions that you are likely to experience on this exam. These questions may differ from those you could encounter in the exam, and the practice exam is not illustrative of the length of the official exam or its complexity. For example, you may encounter additional question types such as drag and drop, build list, order, and case studies. You'll also encounter exhibit and active screen questions, like drop-down menus, option boxes, and complete-a-statement questions. These questions are examples to provide insight into what to expect on the exam and help you determine
if additional preparation is required. Review some possible exam formats and question types from the Microsoft documentation to get a feel for an exam. In the reading Preparing for the exam, you can access Microsoft's exam sandbox environment, which was created to demo the interface that hosts exams. To protect exam security, Microsoft does not specify exam formats or question types before the exam. Microsoft continually introduces innovative testing technologies and question types and reserves the right to incorporate either into exams at any time without advance notice. In the mock exam, you'll have 150 minutes to complete the final practice exam, which consists of 50 questions. On completion of the exam, you'll be presented with your overall score and the questions you answered correctly. Once you've completed the PL-300 mock exam, it's time to focus on the real exam. A good exam strategy for the PL-300 exam can be summarized with a checklist of what to do on the test day. When test day arrives, you should follow these tips to prepare. Ensure that you are well rested and nourished: eat a meal or a snack, and try not to drink too much water so you don't need the bathroom during the exam. Give yourself enough time to get set up; the last thing you want is to feel hurried or be late for the exam. Remember to bring your current government-issued ID, which must match the name on your Microsoft certification profile. Use your phone to capture the required headshot and ID. If you're unsure and require more details, check the official documentation from Microsoft and Pearson VUE; you'll find links to these resources in the reading Preparing for the exam. The PL-300 is a closed-book exam, meaning you cannot bring any study or exam materials to the examination. A score of 700 or greater is required to pass. When it comes to answering the exam questions, you can use these strategies. Keep calm and read the entire question before checking the answer options. If multiple answer options exist, try eliminating
those you know are incorrect. By using this process of elimination, you can cross off all the incorrect answers. Read every answer option before choosing a final answer; don't rush and pick the first answer. If you're having difficulty with a question, move on and return after you've answered all the questions you know. Try not to spend too much time on only one question. Ensure that you have enough time to attempt all the questions before checking them at the end. You may be unable to change some of your answers, so ensure you answer questions correctly. Avoid second-guessing yourself and changing your answer; this can often be counterproductive. You can complete the PL-300 mock exam later, focusing on the topics and key concepts. This exam does not employ negative marking, so if you're unsure of a question, try making the best educated guess possible. The important thing to always remember is that a successful blend of preparation, test strategy, and exam technique will help you maximize your chances of obtaining certification. Best of luck. On a brisk Monday morning, you step into your office, ready to tackle the terrain of data as a seasoned PowerBI specialist. Your manager stops by your desk, her expression a mix of excitement and anticipation. She places a challenge before you: "I need you to explore Microsoft Copilot in Bing, a powerful artificial intelligence, or AI, tool. It's designed to revolutionize problem solving and enhance productivity. I believe it's quite transformative, and I want your insights on it." As you switch on your computer, the weight of opportunity settles in. Your mind races with possibilities. Could Copilot streamline the development process and uncover new insights that haven't been considered yet? Instead of reacting to market changes, now there's an opportunity to proactively shape them. It's more than just analyzing data; it's stepping into the future of generative AI. Microsoft Copilot is a powerful AI tool that enhances how users interact with data and digital content across
various platforms. With its design deeply integrated into Microsoft's ecosystem, including Bing and Microsoft Edge, Copilot serves as an everyday AI companion that simplifies tasks, boosts productivity, and enhances creative processes. Copilot is accessible directly through the Bing website or the Microsoft Edge browser. It employs advanced AI to provide a dynamic interaction model where you can ask questions, generate content, and receive detailed answers directly related to the task you are performing. This is useful in scenarios like getting suggestions on generating a color palette from a company logo, understanding and troubleshooting data analysis expressions (also known as DAX) formulas, or even answering specific contextual questions about improving a report interface. In the ever-changing digital landscape, proficiency with advanced tools like Copilot is crucial for adapting swiftly to new technologies and maintaining a competitive edge. Now that you know what Microsoft Copilot is, let's explore its core capabilities and features. Copilot transforms traditional search capabilities by providing comprehensive, context-aware responses to complex queries. Whether you're asking for the benefits of using DirectQuery or wanting travel advice on attending a data conference, Copilot generates text-based answers, images, additional links, and more, delivering a rich, detailed response. Copilot excels in creating text for a variety of needs, including drafting emails, writing user manuals, and generating creative content like marketing posts. This feature allows users to input prompts, and Copilot crafts the necessary text in seconds, tailored to the desired tone and format. Integrated with DALL-E 3 technology, the designer feature in Copilot enables users to generate images on demand. This tool is accessible directly through the Bing interface and creates visual content ranging from social media posts to custom event invitations. Copilot extends its functionality to the Edge browser, offering insights
within the sidebar. Additional information, links, and suggestions enrich the browsing experience, helping to discover new content and access relevant data quickly. Copilot supports various multimodal interactions, which means it can handle tasks combining different data input and output types, such as text and images. This enhances the flexibility and depth of user interactions with the tool. Having covered Microsoft Copilot's vast capabilities and features in Bing, let's explore how its varied modes adapt to an individual's needs. These modes (creative, balanced, and precise) enhance the experience by shaping the AI's responses to fluently match the context of queries. Creative mode is suitable for tasks requiring a high degree of creativity, such as composing poetry and images or crafting engaging narratives. It enhances responses with stylistic elements like wordplay, providing more elaborate and detailed communication. For instance, creative mode can be used in the retail industry to develop unique marketing campaigns that captivate customers. Consider a clothing brand wanting to launch a new line. Using creative mode, they can generate inventive product descriptions, engaging storytelling around the brand's journey, and eye-catching promotional materials that differentiate their offerings from competitors and attract more customers. Balanced mode is the default configuration, providing a compromise between creative mode's detailed expressiveness and precise mode's succinct nature. It aims to deliver factually correct responses, yet includes a slight creative twist to enhance engagement. This mode is well suited for regular inquiries that require clear and accurate information but are enriched by a creative element to maintain interest and readability. In the manufacturing sector, balanced mode can be used to write user manuals that are not only informative and precise but also easy to understand and engaging. This helps ensure that technical documentation, while accurate, is also accessible to
users, enhancing customer satisfaction and reducing errors in product use. Precise mode focuses on delivering brief and accurate responses when precision and conciseness are critical. This mode ensures that responses are direct and to the point, concentrating solely on factual content without additional creative additions. It is ideal for straightforward questions where timely and accurate information is needed, or when a concise summary is required to quickly grasp the essential facts. For example, precise mode is essential for developers and data professionals when troubleshooting complex formulas. This mode provides straightforward, accurate responses that help individuals quickly understand errors in their code or apply the best techniques to optimize their queries without sifting through irrelevant information. By harnessing the power of Microsoft Copilot, you embark upon infinite digital possibilities. With each query you explore and insight you uncover, you're not only keeping up with new-age technology, you begin driving it. As a data analyst, your agenda consists of creating a series of PowerBI reports that accurately capture the company's performance over the past quarter. You have gathered the necessary data and spent hours planning the data flow. However, as you explore the data set, you encounter familiar roadblocks. Some of the formulas in your reports are returning errors, disrupting the flow of your analysis. Moreover, ensuring the aesthetics of the reports align with your company's theme is proving to be more time consuming than anticipated. You often find yourself pondering the hours spent each week on similar tasks, time that could otherwise be directed towards deeper analysis that could propel the company forward. The potential of integrating Copilot with PowerBI becomes apparent in moments like these. As a data analyst, your daily work is fraught with challenges that can perplex even the most experienced professionals in the field. Each step presents obstacles, from data
collection to report delivery. One of the primary issues data analysts face regularly is formula errors. These errors can range from simple syntax mistakes to more complex logical problems that can skew the analysis and lead to incorrect conclusions. Such issues not only delay the reporting process but also jeopardize the accuracy and reliability of the information presented to decision makers. Maintaining consistency in color usage that reflects the company's theme across all reports requires meticulous attention to detail and in-depth knowledge of branding guidelines. These design challenges often consume a substantial amount of time and can divert one's focus from core analytical responsibilities. Copilot, paired with PowerBI, transforms the way you navigate these challenges. You can ask Copilot questions about techniques to improve your report's interface or instruct it to troubleshoot data analysis expressions, or DAX, formulas. For instance, you might say, "Explain this DAX formula and why it results in an error." Copilot immediately interprets your request and generates the relevant explanation and corrected DAX formula without you manually troubleshooting it. Moreover, Copilot's machine learning, or ML, aspect continuously learns from the data it processes and its interactions with you. This enables Copilot to become more adept, understanding your specific needs over time. For example, imagine you are working on a series of financial reports, and Copilot has resolved DAX errors for these formulas earlier in the chat session. Copilot then recognizes these patterns in your query history and personalizes future interactions to ensure the chat context remains relevant. This saves you time by reducing the need to copy and paste formulas repeatedly and helps ensure accuracy in your analysis by minimizing the potential for errors. Now that you understand how Copilot leverages cutting-edge artificial intelligence technologies, let's explore the advantages this powerful tool offers for
data analysts. These features not only enhance the efficiency of workflows but also elevate the quality and impact of reports. Copilot excels in troubleshooting and optimizing DAX formulas, which are central to data manipulation and analysis in PowerBI. If you're struggling with a formula's performance or accuracy, Copilot provides suggestions for optimization. It can also explain the logic behind DAX functions in simple terms, making it easier for you to understand and effectively use them in your reports. From an aesthetic standpoint, Copilot can analyze images of your current reports and even suggest improvements to the layout. For example, if you upload an image of a report you're currently working on, Copilot can analyze the placement of elements and suggest a more streamlined or visually appealing arrangement that enhances readability and viewer engagement. When you upload an image representing a company's branding, like a logo or marketing material, Copilot can analyze the colors and generate a color palette that matches the branding. This feature ensures that all reports maintain a consistent visual style that aligns with a company's identity, enhancing the professional quality of your presentations. Copilot can also serve as a creative assistant by generating images that inspire the design of your reports. For example, if you need to create a report on sustainability, Copilot can generate images that evoke themes of sustainability. You can use these images as a reference to design your own report visuals, ensuring your reports are not only informative but also aesthetically aligned with the topic. It is clear that Copilot is not just a tool but an assistant that brings out the best in your analysis efforts. Remember, every report you create, every DAX formula you solve, and every insight you derive contributes to the decisions that drive the company forward. As you continue to leverage the power of PowerBI, redefine the boundaries of what you can achieve with data, and let Copilot guide
you to a new horizon of possibilities. It's early Monday morning, and your manager has assigned you a critical task whereby you must develop a report for the upcoming quarterly review. Your manager expects the report to embody the company's new logo and color scheme. To add to the challenge, the task now is not only to present data but to do so in a way that reflects the company's updated brand identity. Feeling the weight of this responsibility, you take a deep breath, sip your coffee, and get to work. You are confident you can complete this task well with your trusty ally, Microsoft Copilot. When designing a report, matching colors to a company's logo and branding isn't just about aesthetics but also about communication and consistency. Using artificial intelligence (AI) assisted tools like Microsoft Copilot enables you to easily integrate a new color palette, aligning your report with the updated company branding. This AI-driven approach enhances productivity by automating the once time-consuming task of manual color matching. So let's unpack how you can achieve this. First, open Microsoft Edge and select the Copilot icon next to the search bar. This access point is part of Microsoft's integrated experience, merging the functionalities of Bing and Copilot. Ensure that you are signed in with your Microsoft account; you'll be prompted to create an account if you don't have one. Once signed in, select the more creative button to activate creative mode. Creative mode is recommended for highly creative tasks like developing unique concepts or exploring artistic elements such as images. Now focus towards the bottom left of the interface, next to ask me anything, and select add an image, followed by upload from this device. Next, in the file explorer, navigate to the location where the logo image is saved, select the image file, and confirm the selection by selecting the open button to upload it. The selected image then begins to upload to Copilot. Type the instructions in the text box depending on
what you need Copilot to do with the image. In this instance, let's create a color palette by inputting "generate a color palette based on this logo". Upon selecting the submit button, Copilot uses its AI technology to analyze the uploaded logo image. It examines the logo's colors and uses algorithms designed to identify and extract predominant and accent colors. Based on the analysis, Copilot presents the color palette in hex codes, which is the standard for color representation. If the initial palette isn't satisfactory or lacks some colors, you can modify your prompt to specify your needs further. For instance, if the company branding includes the color blue, which wasn't present in the logo, you can amend your prompt to include shades of blue in the palette. With your generated color palette, it's time to integrate these colors into your PowerBI report. Open the report and select the view tab. Now select the themes drop-down to expand the theme gallery. Upon selecting customize current theme, input the hex codes provided by Copilot via the drop-down buttons for each color setting, such as first level and second level. These hex codes represent the colors identified from the logo. After inputting the new colors, select apply to update the report with a new theme. There you have it. You can now confidently use Microsoft Copilot to enhance your report design. You achieved maximized productivity and reduced the time you spent on the task. Remember, partnering with an AI tool such as Microsoft Copilot makes managing complex tasks and deadlines easier, so enjoy the journey as you embrace and explore its powerful capabilities. As a senior data analyst, you've spent weeks crafting a PowerBI dashboard for the company's quarterly review. However, as you run through the last data validations, a series of errors cascades through critical data analysis expressions, or DAX, formulas. These aren't simple fixes; they involve complex nested IF statements within CALCULATE functions that you had previously tested. In
this critical moment, you recall that Microsoft Copilot in Bing is the solution you need. In this video, you'll discover the importance of mastering DAX for data manipulation and analysis in PowerBI and learn how Copilot can be a valuable tool for addressing formula issues. Mastering DAX is essential to turn complex data into compelling business insights. However, even the most skilled data analysts can encounter errors when navigating through its syntax and functionalities. Understanding these common issues can help you write more robust and efficient DAX code. Let's explore these and how to resolve them using Microsoft Copilot in Bing. When applied over large data sets, the FILTER function can be computationally expensive and slow report performance. For instance, imagine using FILTER to identify all sales transactions above a certain value across the sales database. The row-iterative nature of FILTER would examine each transaction individually, causing delays in loading the report. Here, Copilot can help optimize the formulas to enhance performance and assist in correcting any logical errors by refining the filter criteria. Let's examine how to achieve this. Begin by opening your PowerBI desktop report and navigate to the table containing the FILTER formula you intend to refine. Next, select the formula bar where the FILTER statement is displayed, and copy its contents. With your formula copied, launch Microsoft Edge and select the Copilot icon in the sidebar to access the integrated Copilot in Bing. Upon loading Copilot, select the more precise button to activate precise mode. Locate the ask me anything text box and paste the slow FILTER formula, providing Copilot with context. Now type the specific query for assistance on a new line in the same prompt window. In this instance, to optimize performance, you can type, "How can I optimize this FILTER function to improve performance when handling large data sets?" Select the submit button to send the query to Copilot. Once you
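As a sketch of the kind of rewrite such a prompt might produce, the measure names and the Sales[Amount] threshold below are hypothetical; the point is the FILTER-versus-column-filter pattern:

```dax
-- Slower: FILTER materializes a row-by-row scan of the whole Sales table.
High Value Sales Slow =
CALCULATE (
    SUM ( Sales[Amount] ),
    FILTER ( Sales, Sales[Amount] > 1000 )
)

-- Faster: a simple boolean filter on a single column lets the engine
-- filter Sales[Amount] directly instead of iterating every row.
High Value Sales Fast =
CALCULATE (
    SUM ( Sales[Amount] ),
    Sales[Amount] > 1000
)
```

The boolean form is shorthand for FILTER over all values of the single column, which is generally far cheaper than iterating the full table.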
press submit Copilot processes your input using its artificial intelligence commonly referred to as AI capabilities once you have a revised filter formula and are satisfied copy this directly from the copilot interface by selecting the copy button upon navigating to your PowerBI report select the table where you want to apply the updated formula then select the formula bar and paste the updated formula make sure to replace the old formula completely to avoid conflicts or errors select enter to commit the formula in PowerBI and observe how it executes one of the most powerful yet tricky aspects of calculate is its ability to modify the filter context of a calculation suppose you want to use calculate to sum sales for all countries but as a result it returns total sales for only the United States microsoft Copilot in Bing can help guide you through the correct structuring of calculate formulas suggest how to perform dynamic aggregations and even detect and suggest fixes to syntax errors in the ask me anything text box paste the calculate formula you need to troubleshoot on a new line in the same prompt window type how can I modify this calculate formula to sum sales for all countries once you select the submit button Copilot returns an explanation and a corrected calculate formula with a requested context after reviewing the initial results you can ask some additional questions to deepen your understanding or refine your formula further for instance can you suggest ways to avoid common syntax errors in this calculate formula this followup empowers you to grasp common mistakes and learn best practices in writing DAX formulas once you are satisfied with the response from Copilot select the copy button finally paste the results in Microsoft PowerBI to assess whether the suggestions improve the formula’s functionality deeply nested if statements can become difficult to manage and troubleshoot imagine using nested if statements to categorize sales into different classes 
based on the column amount the complexity of checking multiple conditions can easily lead to mistakes and logic copilot can simplify this by suggesting straightforward alternatives or helping restructure these nested conditions into manageable components now in the ask me anything text box paste the if formula that requires troubleshooting on a new line in the same prompt window enter can you suggest a simpler alternative to this nested if statement for better manageability upon selecting the submit button Copilot generates suggestions to simplify or improve the efficiency after reviewing the feedback provided by Copilot select the copy button finally navigate to PowerBI desktop paste the revised if statement into the formula bar and select enter to apply the formula as your journey through mastering DAX comes to a close reflect on the transformative power of blending AI with your analytical skills as you move forward equipped with the knowledge of DAX and the support of AI remember that each challenge overcome is not just a step toward progression but a leap toward mastering PowerBI congratulations on completing the Microsoft PL300 exam preparation and practice course your dedication has given you the skills and tools for success when writing the Microsoft PL300 exam you have now achieved all the PowerBI milestones in this program this course gave you opportunities to practice your exam technique and refresh your knowledge of all the key areas assessed in the Microsoft PL300 exam you tested your knowledge in a series of practice exams mapped to all the main topics covered in the Microsoft PL300 exam to help you prepare for certification success you also got tips and tricks testing strategies useful resources and information on how to sign up for the Microsoft PL300 proctored exam now that you have successfully completed this professional certificate you are ready to schedule the Microsoft PL300 exam through Pearson View through a mix of videos readings and 
exercises you’ve learned about the expectations for the learning content by starting with an introduction to the course following this you were provided with information about the Microsoft certification here you explored an introduction to preparing for the exam how to prepare for the procedurate examination how the exam is administered topics covered in the PL300 exam and testing strategy next you reviewed what you learned about getting data from data sources here you revisited how to identify and connect to a data source using a shared data set or local data set direct query import and dual mode parameter values how to set up a data flow how to connect to a data flow the Microsoft data versse and how to get data from data sources you then investigated how to profile and clean data this included consolidating your knowledge of evaluating data data statistics and column properties how to resolve inconsistencies and data quality issues and an indepth dive into profiling and cleaning data after that you explored the process of transforming and loading data where you covered how to create and transform columns identify when to use reference queries how to merge and append queries table relationships and an in-depth view of transforming and loading data next you explored modeling data where you revised key concepts related to modeling data in PowerBI here you reviewed designing data models where you learned about how to design a schema implement role playing dimensions use calculate to manipulate filters and configure cardality and cross filter direction next you explored how to create model calculations using DAX this is where you explored calculated columns and single aggregation measures as well as how to implement time intelligence measures you also reviewed the differences between additive semi-additive and non-additive measures later you reviewed how to implement a data model this is where you explored calculated tables and data hierarchies you also covered how 
to optimize model performance this included reviewing important topics like using the performance analyzer and how to improve performance via cardality and summarization you reviewed data visualization and analysis techniques in PowerBI to help you prepare for the PL300 exam in this section you revisited the process of report creation this included reviewing important topics like using appropriate visualizations configuring and formatting visualizations applying slicing and filtering and exporting and printing reports you re-examined how to enhance reports for better usability and storytelling this included reviewing report navigation and sorting interactions between visuals sync slicers group and layer visuals by using the selection pane and how to design reports for mobile devices following that you explored how to identify patterns and trends you revisited how to detect outliers and anomalies grouping and binning data AI visuals reference lines and error bars and scorecards and metrics you then moved on to deploying and maintaining assets this is where you revised creating and managing workspaces and assets you reviewed key concepts such as workspaces and workspace roles workspace apps how to publish import or update assets in a workspace subscriptions and data alerts how to promote or certify PowerBI content and global options for files next you reviewed how to manage data sets this section provided you with a summary of data gateways rowle security and granting access to data sets to round off your learning you took a mock exam that has been set up in a similar style to the industry recognized Microsoft PL300 exam by passing the exam you’ll become a Microsoft certified PowerBI data analyst it will also help you to start or expand a career in this role this globally recognized certification is industry endorsed evidence of your technical skills and knowledge the exam measures your ability to perform the following tasks prepare data for analysis model data 
visualize and analyze data and deploy and maintain assets to complete the exam you should be familiar with Power Query and the process of writing expressions using data analysis expressions or DAX you’ve done a great job so far and you should be proud of your progress the experience you’ve gained will showcase your willingness to learn your motivation and your capability to potential employers it’s been a pleasure to embark on this journey of discovery with you best of luck in the future the Microsoft PowerBI Analyst program is an excellent resource to start your career whether you’re a beginner or a seasoned professional looking to improve your skills data is the driving force behind this everchanging modern world shaping and developing industries and society it has transformed the way institutions operate from banks and hospitals to schools and supermarkets and for businesses data is everything it informs decisions and helps create value for customers content streaming services analyze data to decide what content to promote social media services analyze data to determine what products their customers are interested in and your local supermarket gathers and analyzes data to ensure the products you want are available the result of having all this data is that professional analysts are required to process and sort it to gain the insights that drive both the business and social worlds are you intrigued by this career field and wondering how to get started let’s meet two other students who have just begun their careers in entry- levelvel positions discover how and why they’ve chosen to embark upon career paths in this field with Microsoft and Corsera lucas a recent information technology graduate is currently searching for his first IT job he is eager to secure a position in the IT sector that offers good earning potential and a quick career progression he wants to work full-time in data analysis as he feels this career would offer both benefits during his degree he 
found working with and analyzing cloud-based data to be the most enjoyable element hence his focus on this career path lucas currently works shifts in a warehouse environment so he will need the flexibility of self-paced learning his earnings are low so he wants to achieve the qualification using the same basic laptop he relied upon as a student despite being a beginner Lucas has already mapped out his career and certification path and has enrolled in the Microsoft PowerBI analyst program he plans to apply for an entry- levelvel position as a data analyst once he has successfully completed the program and passed the PL300 exam as a data analyst he will inspect data identify key business insights for new business opportunities and help solve business problems amelia has been working as an administrative assistant in sales and marketing since leaving high school now that a few years have passed she is ready to embark upon a new career path in her current role Amelia has seen PowerBI reports and dashboards created by colleagues and shared with the team she was impressed at how the information was used to shape and focus the sales campaigns this sparked an interest in a career in data analysis amelia’s job requires her to work long hours so the ability to structure her own learning path is vital she also has a long commute so would like to access e-learning through her smartphone or tablet pursuing the PowerBI analyst qualification will showcase her dedication and help her apply for more senior roles in the department in the short term amelia doesn’t have a scientific background but she finds IT concepts logical and easy to understand so she’s embarking on the Microsoft PowerBI analyst program as it doesn’t assume a pre-existing high level of technical knowledge in the long term she hopes to secure an entry-level role as a PowerBI analyst as a PowerBI analyst she will be responsible for building data models creating data assets like reports and dashboards and ensuring 
data requirements are met you may be in a similar position to Lucas and Amelia and possess an interest in this exciting field of data analysis like them you can begin your career in this field by enrolling in the Microsoft PowerBI analyst program this will be the start of your new adventure good luck with your learning journey

    By Amjad Izhar
    Contact: amjad.izhar@gmail.com
    https://amjadizhar.blog

  • Database Engineering, SQL, Python, and Data Analysis Fundamentals

    Database Engineering, SQL, Python, and Data Analysis Fundamentals

    These resources provide a comprehensive pathway for aspiring database engineers and software developers. They cover fundamental database concepts like data modeling, SQL for data manipulation and management, database optimization, and data warehousing. Furthermore, they explore essential software development practices including Python programming, object-oriented principles, version control with Git and GitHub, software testing methodologies, and preparing for technical interviews with insights into data structures and algorithms.

    Introduction to Database Engineering

    This course provides a comprehensive introduction to database engineering. A straightforward description of a database is a form of electronic storage in which data is held. However, this simple explanation doesn’t fully capture the impact of database technology on global industry, government, and organizations. Almost everyone has used a database, and it’s likely that information about us is present in many databases worldwide.

    Database engineering is crucial to global industry, government, and organizations. In a real-world context, databases are used in various scenarios:

    • Banks use databases to store data for customers, bank accounts, and transactions.
    • Hospitals store patient data, staff data, and laboratory data.
    • Online stores retain profile information, shopping history, and accounting transactions.
    • Social media platforms store uploaded photos.
    • Work environments use databases for downloading files.
    • Online games rely on databases.

    Data in basic terms is facts and figures about anything. For example, data about a person might include their name, age, email, and date of birth, or it could be facts and figures related to an online purchase like the order number and description.

In a database, data is organized systematically, often in a form resembling a spreadsheet or table. This systematic organization means that every piece of data has elements, features, or attributes by which it can be identified. For example, a person can be identified by attributes like name and age.

    Data stored in a database cannot exist in isolation; it must have a relationship with other data to be processed into meaningful information. Databases establish relationships between pieces of data, for example, by retrieving a customer’s details from one table and their order recorded against another table. This is often achieved through keys. A primary key uniquely identifies each record in a table, while a foreign key is a primary key from one table that is used in another table to establish a link or relationship between the two. For instance, the customer ID in a customer table can be the primary key and then become a foreign key in an order table, thus relating the two tables.
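The primary/foreign key relationship described above can be sketched with Python's built-in `sqlite3` module (the source course uses MySQL; the table and column names here are illustrative, not taken from it):

```python
import sqlite3

# In-memory database for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled

conn.execute("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,   -- uniquely identifies each customer
        name        TEXT NOT NULL
    )
""")
conn.execute("""
    CREATE TABLE customer_order (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,      -- foreign key linking back to customer
        description TEXT,
        FOREIGN KEY (customer_id) REFERENCES customer (customer_id)
    )
""")

conn.execute("INSERT INTO customer VALUES (1, 'Ada Lovelace')")
conn.execute("INSERT INTO customer_order VALUES (100, 1, 'First purchase')")

# The relationship lets us retrieve a customer's details alongside their orders.
row = conn.execute("""
    SELECT c.name, o.description
    FROM customer c
    JOIN customer_order o ON o.customer_id = c.customer_id
""").fetchone()
print(row)  # ('Ada Lovelace', 'First purchase')
```

The same `customer_id` value plays both roles: primary key in `customer`, foreign key in `customer_order`.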

    While relational databases, which organize data into tables with relationships, are common, there are other types of databases. An object-oriented database stores data in the form of objects instead of tables or relations. An example could be an online bookstore where authors, customers, books, and publishers are rendered as classes, and the individual entries are objects or instances of these classes.

    To work with data in databases, database engineers use Structured Query Language (SQL). SQL is a standard language that can be used with all relational databases like MySQL, PostgreSQL, Oracle, and Microsoft SQL Server. Database engineers establish interactions with databases to create, read, update, and delete (CRUD) data.

    SQL can be divided into several sub-languages:

    • Data Definition Language (DDL) helps define data in the database and includes commands like CREATE (to create databases and tables), ALTER (to modify database objects), and DROP (to remove objects).
    • Data Manipulation Language (DML) is used to manipulate data and includes operations like INSERT (to add data), UPDATE (to modify data), and DELETE (to remove data).
    • Data Query Language (DQL) is used to read or retrieve data, primarily using the SELECT command.
    • Data Control Language (DCL) is used to control access to the database, with commands like GRANT and REVOKE to manage user privileges.

    SQL offers several advantages:

• It requires very little coding skill to use, consisting mainly of keywords.
    • Its interactivity allows developers to write complex queries quickly.
    • It is a standard language usable with all relational databases, leading to extensive support and information availability.
    • It is portable across operating systems.

    Before developing a database, planning the organization of data is crucial, and this plan is called a schema. A schema is an organization or grouping of information and the relationships among them. In MySQL, schema and database are often interchangeable terms, referring to how data is organized. However, the definition of schema can vary across different database systems. A database schema typically comprises tables, columns, relationships, data types, and keys. Schemas provide logical groupings for database objects, simplify access and manipulation, and enhance database security by allowing permission management based on user access rights.

    Database normalization is an important process used to structure tables in a way that minimizes challenges by reducing data duplication and avoiding data inconsistencies (anomalies). This involves converting a large table into multiple tables to reduce data redundancy. There are different normal forms (1NF, 2NF, 3NF) that define rules for table structure to achieve better database design.
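One way to see why normalization matters is to compare an unnormalized table with its normalized split; this `sqlite3` sketch (hypothetical tables, not from the source) shows how redundancy creates an update anomaly that normalization removes:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Unnormalized: the customer's email is repeated on every order row, so
# changing it means updating many rows (an update anomaly).
conn.execute("CREATE TABLE orders_flat (order_id INTEGER, customer TEXT, email TEXT)")
conn.executemany("INSERT INTO orders_flat VALUES (?, ?, ?)",
                 [(1, 'Sam', 'sam@old.example'), (2, 'Sam', 'sam@old.example')])

# Normalized: customer facts live in one place; orders just reference them.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")
conn.execute("INSERT INTO customers VALUES (1, 'Sam', 'sam@old.example')")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 1), (2, 1)])

# One UPDATE now fixes the email for every order at once.
conn.execute("UPDATE customers SET email = 'sam@new.example' WHERE id = 1")
emails = {row[0] for row in conn.execute(
    "SELECT c.email FROM orders o JOIN customers c ON c.id = o.customer_id")}
print(emails)  # {'sam@new.example'}
```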

    As databases have evolved, they now must be able to store ever-increasing amounts of unstructured data, which poses difficulties. This growth has also led to concepts like big data and cloud databases.

    Furthermore, databases play a crucial role in data warehousing, which involves a centralized data repository that loads, integrates, stores, and processes large amounts of data from multiple sources for data analysis. Dimensional data modeling, based on dimensions and facts, is often used to build databases in a data warehouse for data analytics. Databases also support data analytics, where collected data is converted into useful information to inform future decisions.

    Tools like MySQL Workbench provide a unified visual environment for database modeling and management, supporting the creation of data models, forward and reverse engineering of databases, and SQL development.

    Finally, interacting with databases can also be done through programming languages like Python using connectors or APIs (Application Programming Interfaces). This allows developers to build applications that interact with databases for various operations.
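Python database connectors generally follow the DB-API 2.0 pattern (connect, get a cursor, execute, fetch, commit, close). The `sqlite3` module ships with Python and follows this interface; drivers for other databases, such as MySQL's connector, expose the same pattern and differ mainly in their `connect()` arguments. A minimal sketch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a server driver would take host/user/password
cur = conn.cursor()
cur.execute("CREATE TABLE greeting (text TEXT)")
# Use parameter placeholders rather than string formatting to avoid SQL injection.
cur.execute("INSERT INTO greeting (text) VALUES (?)", ("hello, database",))
conn.commit()
cur.execute("SELECT text FROM greeting")
message = cur.fetchone()[0]
print(message)  # hello, database
conn.close()
```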

    Understanding SQL: Language for Database Interaction

SQL (Structured Query Language) is a standard language used to interact with databases. It's commonly pronounced as “sequel”. Database engineers use SQL to establish interactions with databases.

    Here’s a breakdown of SQL based on the provided source:

    • Role of SQL: SQL acts as the interface or bridge between a relational database and its users. It allows database engineers to create, read, update, and delete (CRUD) data. These operations are fundamental when working with a database.
    • Interaction with Databases: As a web developer or data engineer, you execute SQL instructions on a database using a Database Management System (DBMS). The DBMS is responsible for transforming SQL instructions into a form that the underlying database understands.
    • Applicability: SQL is particularly useful when working with relational databases, which require a language that can interact with structured data. Examples of relational databases that SQL can interact with include MySQL, PostgreSQL, Oracle, and Microsoft SQL Server.
    • SQL Sub-languages: SQL is divided into several sub-languages:
    • Data Definition Language (DDL): Helps you define data in your database. DDL commands include:
    • CREATE: Used to create databases and related objects like tables. For example, you can use the CREATE DATABASE command followed by the database name to create a new database. Similarly, CREATE TABLE followed by the table name and column definitions is used to create tables.
    • ALTER: Used to modify already created database objects, such as modifying the structure of a table by adding or removing columns (ALTER TABLE).
    • DROP: Used to remove objects like tables or entire databases. The DROP DATABASE command followed by the database name removes a database. The DROP COLUMN command removes a specific column from a table.
    • Data Manipulation Language (DML): Commands are used to manipulate data in the database and most CRUD operations fall under DML. DML commands include:
    • INSERT: Used to add or insert data into a table. The INSERT INTO syntax is used to add rows of data to a specified table.
    • UPDATE: Used to edit or modify existing data in a table. The UPDATE command allows you to specify data to be changed.
    • DELETE: Used to remove data from a table. The DELETE FROM syntax followed by the table name and an optional WHERE clause is used to remove data.
    • Data Query Language (DQL): Used to read or retrieve data from the database. The primary DQL command is:
    • SELECT: Used to select and retrieve data from one or multiple tables, allowing you to specify the columns you want and apply filter criteria using the WHERE clause. You can select all columns using SELECT *.
    • Data Control Language (DCL): Used to control access to the database. DCL commands include:
    • GRANT: Used to give users access privileges to data.
    • REVOKE: Used to revert access privileges already given to users.
    • Advantages of SQL: SQL is a popular language choice for databases due to several advantages:
    • Low coding skills required: It uses a set of keywords and requires very little coding.
    • Interactivity: Allows developers to write complex queries quickly.
    • Standard language: Can be used with all relational databases like MySQL, leading to extensive support and information availability.
    • Portability: Once written, SQL code can be used on any hardware and any operating system or platform where the database software is installed.
    • Comprehensive: Covers all areas of database management and administration, including creating databases, manipulating data, retrieving data, and managing security.
    • Efficiency: Allows database users to process large amounts of data quickly and efficiently.
    • Basic SQL Operations: SQL enables various operations on data, including:
    • Creating databases and tables using DDL.
    • Populating and modifying data using DML (INSERT, UPDATE, DELETE).
    • Reading and querying data using DQL (SELECT) with options to specify columns and filter data using the WHERE clause.
    • Sorting data using the ORDER BY clause with ASC (ascending) or DESC (descending) keywords.
    • Filtering data using the WHERE clause with various comparison operators (=, <, >, <=, >=, !=) and logical operators (AND, OR). Other filtering operators include BETWEEN, LIKE, and IN.
    • Removing duplicate rows using the SELECT DISTINCT clause.
    • Performing arithmetic operations using operators like +, -, *, /, and % (modulus) within SELECT statements.
    • Using comparison operators to compare values in WHERE clauses.
• Utilizing aggregate functions (not detailed in this initial overview, but covered later in conjunction with GROUP BY).
    • Joining data from multiple tables (mentioned as necessary when data exists in separate entities). The source later details INNER JOIN, LEFT JOIN, and RIGHT JOIN clauses.
    • Creating aliases for tables and columns to make queries simpler and more readable.
    • Using subqueries (a query within another query) for more complex data retrieval.
    • Creating views (virtual tables based on the result of a SQL statement) to simplify data access and combine data from multiple tables.
    • Using stored procedures (pre-prepared SQL code that can be saved and executed).
    • Working with functions (numeric, string, date, comparison, control flow) to process and manipulate data.
    • Implementing triggers (stored programs that automatically execute in response to certain events).
    • Managing database transactions to ensure data integrity.
    • Optimizing queries for better performance.
    • Performing data analysis using SQL queries.
    • Interacting with databases using programming languages like Python through connectors and APIs.
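Several of the operations listed above (WHERE filtering, ORDER BY sorting, SELECT DISTINCT, an aggregate with GROUP BY and a column alias, and a subquery) can be tried out in one place; this `sqlite3` sketch uses a made-up sales table, not data from the source:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    (1, 'North', 120.0), (2, 'South', 80.0),
    (3, 'North', 200.0), (4, 'South', 40.0),
])

# Filtering (WHERE) and sorting (ORDER BY ... DESC).
big = conn.execute(
    "SELECT id FROM sales WHERE amount > 100 ORDER BY amount DESC").fetchall()
print(big)  # [(3,), (1,)]

# Removing duplicate rows with SELECT DISTINCT.
regions = conn.execute(
    "SELECT DISTINCT region FROM sales ORDER BY region").fetchall()
print(regions)  # [('North',), ('South',)]

# Aggregate function with GROUP BY, plus a column alias.
totals = conn.execute("""
    SELECT region, SUM(amount) AS total
    FROM sales GROUP BY region ORDER BY region
""").fetchall()
print(totals)  # [('North', 320.0), ('South', 120.0)]

# A subquery: sales above the overall average amount.
above_avg = conn.execute("""
    SELECT id FROM sales
    WHERE amount > (SELECT AVG(amount) FROM sales) ORDER BY id
""").fetchall()
print(above_avg)  # [(1,), (3,)]
```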

    In essence, SQL is a powerful and versatile language that is fundamental for anyone working with relational databases, enabling them to define, manage, query, and manipulate data effectively. The knowledge of SQL is a valuable skill for database engineers and is crucial for various tasks, from building and maintaining databases to extracting insights through data analysis.

    Data Modeling Principles: Schema, Types, and Design

    Data modeling principles revolve around creating a blueprint of how data will be organized and structured within a database system. This plan, often referred to as a schema, is essential for efficient data storage, access, updates, and querying. A well-designed data model ensures data consistency and quality.

    Here are some key data modeling principles discussed in the sources:

    • Understanding Data Requirements: Before creating a database, it’s crucial to have a clear idea of its purpose and the data it needs to store. For example, a database for an online bookshop needs to record book titles, authors, customers, and sales. Mangata and Gallo (mng), a jewelry store, needed to store data on customers, products, and orders.
    • Visual Representation: A data model provides a visual representation of data elements (entities) and their relationships. This is often achieved using an Entity Relationship Diagram (ERD), which helps in planning entity-relational databases.
    • Different Levels of Abstraction: Data modeling occurs at different levels:
    • Conceptual Data Model: Provides a high-level, abstract view of the entities and their relationships in the database system. It focuses on “what” data needs to be stored (e.g., customers, products, orders as entities for mng) and how these relate.
    • Logical Data Model: Builds upon the conceptual model by providing a more detailed overview of the entities, their attributes, primary keys, and foreign keys. For mng, this would involve defining attributes for customers (like client ID as primary key), products, and orders, and specifying foreign keys to establish relationships (e.g., client ID in the orders table referencing the clients table).
    • Physical Data Model: Represents the internal schema of the database and is specific to the chosen Database Management System (DBMS). It outlines details like data types for each attribute (e.g., varchar for full name, integer for contact number), constraints (e.g., not null), and other database-specific features. SQL is often used to create the physical schema.
    • Choosing the Right Data Model Type: Several types of data models exist, each with its own advantages and disadvantages:
    • Relational Data Model: Represents data as a collection of tables (relations) with rows and columns, known for its simplicity.
    • Entity-Relationship Model: Similar to the relational model but presents each table as a separate entity with attributes and explicitly defines different types of relationships between entities (one-to-one, one-to-many, many-to-many).
    • Hierarchical Data Model: Organizes data in a tree-like structure with parent and child nodes, primarily supporting one-to-many relationships.
    • Object-Oriented Model: Translates objects into classes with characteristics and behaviors, supporting complex associations like aggregation and inheritance, suitable for complex projects.
    • Dimensional Data Model: Based on dimensions (context of measurements) and facts (quantifiable data), optimized for faster data retrieval and efficient data analytics, often using star and snowflake schemas in data warehouses.
    • Database Normalization: This is a crucial process for structuring tables to minimize data redundancy, avoid data modification implications (insertion, update, deletion anomalies), and simplify data queries. Normalization involves applying a series of normal forms (First Normal Form – 1NF, Second Normal Form – 2NF, Third Normal Form – 3NF) to ensure data atomicity, eliminate repeating groups, address functional and partial dependencies, and resolve transitive dependencies.
    • Establishing Relationships: Data in a database should be related to provide meaningful information. Relationships between tables are established using keys:
    • Primary Key: A value that uniquely identifies each record in a table and prevents duplicates.
    • Foreign Key: One or more columns in one table that reference the primary key in another table, used to connect tables and create cross-referencing.
    • Defining Domains: A domain is the set of legal values that can be assigned to an attribute, ensuring data in a field is well-defined (e.g., only numbers in a numerical domain). This involves specifying data types, length values, and other relevant rules.
    • Using Constraints: Database constraints limit the type of data that can be stored in a table, ensuring data accuracy and reliability. Common constraints include NOT NULL (ensuring fields are always completed), UNIQUE (preventing duplicate values), CHECK (enforcing specific conditions), and FOREIGN KEY (maintaining referential integrity).
    • Importance of Planning: Designing a data model before building the database system allows for planning how data is stored and accessed efficiently. A poorly designed database can make it hard to produce accurate information.
    • Considerations at Scale: For large-scale applications like those at Meta, data modeling must prioritize user privacy, user safety, and scalability. It requires careful consideration of data access, encryption, and the ability to handle billions of users and evolving product needs. Thoughtfulness about future changes and the impact of modifications on existing data models is crucial.
    • Data Integrity and Quality: Well-designed data models, including the use of data types and constraints, are fundamental steps in ensuring the integrity and quality of a database.
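The constraints discussed above (NOT NULL, UNIQUE, CHECK, FOREIGN KEY) can be exercised in a small `sqlite3` session; the schema is illustrative, and each deliberately invalid insert is rejected by the database rather than silently corrupting the data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for FK enforcement in SQLite

conn.execute("""
    CREATE TABLE client (
        client_id INTEGER PRIMARY KEY,
        email     TEXT NOT NULL UNIQUE,      -- must be present and unduplicated
        age       INTEGER CHECK (age >= 18)  -- domain rule on legal values
    )
""")
conn.execute("""
    CREATE TABLE client_order (
        order_id  INTEGER PRIMARY KEY,
        client_id INTEGER NOT NULL REFERENCES client (client_id)
    )
""")

conn.execute("INSERT INTO client VALUES (1, 'a@example.com', 34)")

# Each violation raises IntegrityError, protecting data accuracy and reliability.
violations = 0
for stmt in (
    "INSERT INTO client VALUES (2, NULL, 40)",             # NOT NULL
    "INSERT INTO client VALUES (3, 'a@example.com', 25)",  # UNIQUE
    "INSERT INTO client VALUES (4, 'b@example.com', 12)",  # CHECK
    "INSERT INTO client_order VALUES (10, 999)",           # FOREIGN KEY
):
    try:
        conn.execute(stmt)
    except sqlite3.IntegrityError:
        violations += 1
print(violations)  # 4
```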

    Data modeling is an iterative process that requires a deep understanding of the data, the business requirements, and the capabilities of the chosen database system. It is a crucial skill for database engineers and a fundamental aspect of database design. Tools like MySQL Workbench can aid in creating, visualizing, and implementing data models.

    Understanding Version Control: Git and Collaborative Development

    Version Control Systems (VCS), also known as Source Control or Source Code Management, are systems that record all changes and modifications to files for tracking purposes. The primary goal of any VCS is to keep track of changes by allowing developers access to the entire change history with the ability to revert or roll back to a previous state or point in time. These systems track different types of changes such as adding new files, modifying or updating files, and deleting files. The version control system is the source of truth across all code assets and the team itself.

    There are many benefits associated with Version Control, especially for developers working in a team. These include:

    • Revision history: Provides a record of all changes in a project and the ability for developers to revert to a stable point in time if code edits cause issues or bugs.
    • Identity: All changes made are recorded with the identity of the user who made them, allowing teams to see not only when changes occurred but also who made them.
    • Collaboration: A VCS allows teams to submit their code and keep track of any changes that need to be made when working towards a common goal. It also facilitates peer review where developers inspect code and provide feedback.
    • Automation and efficiency: Version Control helps keep track of all changes and plays an integral role in DevOps, increasing an organization’s ability to deliver applications or services with high quality and velocity. It aids in software quality, release, and deployments. By having Version Control in place, teams following agile methodologies can manage their tasks more efficiently.
    • Managing conflicts: Version Control helps developers fix any conflicts that may occur when multiple developers work on the same code base. The history of revisions can aid in seeing the full life cycle of changes and is essential for merging conflicts.

    There are two main types or categories of Version Control Systems: centralized Version Control Systems (CVCS) and distributed Version Control Systems (DVCS).

    • Centralized Version Control Systems (CVCS) contain a server that houses the full history of the code base and clients that pull down the code. Developers need a connection to the server to perform any operations. Changes are pushed to the central server. An advantage of CVCS is that they are considered easier to learn and offer more access controls to users. A disadvantage is that they can be slower due to the need for a server connection.
    • Distributed Version Control Systems (DVCS) are similar, but every user is essentially a server and has the entire history of changes on their local system. Users don’t need to be connected to the server to add changes or view history, only to pull down the latest changes or push their own. DVCS offer better speed and performance and allow users to work offline. Git is an example of a DVCS.

    Popular Version Control technologies include Git and GitHub. Git is a Version Control System designed to help users keep track of changes to files within their projects. It offers better speed and performance, reliability, free and open-source access, and an accessible syntax. Git is used predominantly via the command line. GitHub is a cloud-based hosting service that lets you manage Git repositories from a user interface. It incorporates Git Version Control features and extends them with features like access control, pull requests, and automation. GitHub is very popular among web developers and acts like a social network for projects.

    Key Git concepts include:

    • Repository: Used to track all changes to files in a specific folder and keep a history of all those changes. Repositories can be local (on your machine) or remote (e.g., on GitHub).
    • Clone: To copy a project from a remote repository to your local device.
    • Add: To stage changes in your local repository, preparing them for a commit.
    • Commit: To save a snapshot of the staged changes in the local repository’s history. Each commit is recorded with the identity of the user.
    • Push: To upload committed changes from your local repository to a remote repository.
    • Pull: To retrieve changes from a remote repository and apply them to your local repository.
    • Branching: Creating separate lines of development from the main codebase to work on new features or bug fixes in isolation. The main branch is often the source of truth.
    • Forking: Creating a copy of someone else’s repository on a platform like GitHub, allowing you to make changes without affecting the original.
    • Diff: A command to compare changes across files, branches, and commits.
    • Blame: A command to look at changes of a specific file and show the dates, times, and users who made the changes.

    The typical Git workflow involves three states: modified, staged, and committed. Files are modified in the working directory, then added to the staging area, and finally committed to the local repository. These local commits are then pushed to a remote repository.

    Branching workflows like feature branching are commonly used. This involves creating a new branch for each feature, working on it until completion, and then merging it back into the main branch after a pull request and peer review. Pull requests allow teams to review changes before they are merged.

    At Meta, Version Control is very important. They use a giant monolithic repository for all of their backend code, which means code changes are shared with every other Instagram team. While this can be risky, it allows for code reuse. Meta encourages engineers to improve any code, emphasizing that “nothing at Meta is someone else’s problem”. Due to the monolithic repository, merge conflicts happen often, so engineers try to write smaller changes and add gatekeepers to easily turn off features if needed. git blame is used daily to understand who wrote specific lines of code and why, which is particularly helpful in a large organization like Meta.

    Version Control is also relevant to database development. It’s easy to overcomplicate data modeling and storage, and Version Control can help track changes and potentially revert to earlier designs. Planning how data will be organized (schema) is crucial before developing a database.

    Learning to use git and GitHub for Version Control is part of the preparation for coding interviews in a final course, alongside practicing interview skills and refining resumes. Effective collaboration, which is enhanced by Version Control, is a crucial skill for software developers.

    Python Programming Fundamentals: An Introduction

    Based on the sources, here’s a discussion of Python programming basics:

    Introduction to Python:

    Python is a versatile, high-level programming language available on multiple platforms. Created by Guido van Rossum and released in 1991, it was designed to be readable, with a syntax similar to English and mathematics, making it intuitive for beginners while remaining powerful and adaptable for experienced programmers. Since its release it has gained significant popularity and a rich selection of frameworks and libraries, and it is widely used in areas such as web development, data analytics, business forecasting, artificial intelligence, and machine learning. Python often requires less code than languages like C or Java, and its simplicity allows developers to focus on the task at hand, making it potentially quicker to get a product to market.

    Setting up a Python Environment:

    To start using Python, it’s essential to ensure it works correctly on your operating system with your chosen Integrated Development Environment (IDE), such as Visual Studio Code (VS Code). This involves making sure the right version of Python is used as the interpreter when running your code.

    • Installation Verification: You can verify if Python is installed by opening the terminal (or command prompt on Windows) and typing python --version. This should display the installed Python version.
    • VS Code Setup: VS Code offers a walkthrough guide for setting up Python. This includes installing Python (if needed) and selecting the correct Python interpreter.
    • Running Python Code: Python code can be run in a few ways:
    • Python Shell: Useful for running and testing small scripts without creating .py files. You can access it by typing python in the terminal.
    • Directly from Command Line/Terminal: Any file with the .py extension can be run by typing python followed by the file name (e.g., python hello.py).
    • Within an IDE (like VS Code): IDEs provide features like auto-completion, debugging, and syntax highlighting, making coding a better experience. VS Code has a run button to execute Python files.

    Basic Syntax and Concepts:

    • Print Statement: The print() function is used to display output to the console. It can print different types of data and allows for formatting.
    • Variables: Variables are used to store data that can be changed throughout the program’s lifecycle. In Python, you declare a variable by assigning a value to a name (e.g., x = 5); Python assigns the data type automatically behind the scenes. There are conventions for naming variables; Python’s style guide (PEP 8) recommends snake case (e.g., my_name). You can assign a single value to multiple variables (e.g., a = b = c = 10) or perform multiple assignments on one line (e.g., name, age = "Alice", 30). You can also delete a variable using the del keyword.
    • Data Types: A data type indicates how a computer system should interpret a piece of data. Python offers several built-in data types:
    • Numeric: Includes int (integers), float (decimal numbers), and complex numbers.
    • Sequence: Ordered collections of items, including:
    • Strings (str): Sequences of characters enclosed in single or double quotes (e.g., "hello", 'world'). Individual characters in a string can be accessed by their index (starting from 0) using square brackets (e.g., name[0]). The len() function returns the number of characters in a string.
    • Lists: Ordered and mutable sequences of items enclosed in square brackets (e.g., [1, 2, “three”]).
    • Tuples: Ordered and immutable sequences of items enclosed in parentheses (e.g., (1, 2, “three”)).
    • Dictionary (dict): Unordered collections of key-value pairs enclosed in curly braces (e.g., {"name": "Bob", "age": 25}). Values are accessed using their keys.
    • Boolean (bool): Represents truth values: True or False.
    • Set (set): Unordered collections of unique elements enclosed in curly braces (e.g., {1, 2, 3}). Sets do not support indexing.
    • Typecasting: The process of converting one data type to another. Python supports implicit (automatic) and explicit (using functions like int(), float(), str()) type conversion.
    • Input: The input() function is used to take input from the user. It displays a prompt to the user and returns their input as a string.
    • Operators: Symbols used to perform operations on values.
    • Math Operators: Used for calculations (e.g., + for addition, - for subtraction, * for multiplication, / for division).
    • Logical Operators: Used in conditional statements to determine true or false outcomes (and, or, not).
    • Control Flow: Determines the order in which instructions in a program are executed.
    • Conditional Statements: Used to make decisions based on conditions (if, else, elif).
    • Loops: Used to repeatedly execute a block of code. Python has for loops (for iterating over sequences) and while loops (repeating a block until a condition is met). Nested loops are also possible.
    • Functions: Modular pieces of reusable code that take input and return output. You define a function using the def keyword. You can pass data into a function as arguments and return data using the return keyword. Python has different scopes for variables: local, enclosing, global, and built-in (LEGB rule).
    • Data Structures: Ways to organize and store data. Python includes lists, tuples, sets, and dictionaries.
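
    The concepts in the list above can be condensed into one short, runnable sketch; all names and values below are illustrative, not taken from the course material.

    ```python
    # Variables: assigning a value to a name declares it; the type is set automatically.
    a = b = c = 10                        # one value assigned to several names
    name, age = "Alice", 30               # multiple assignment on one line
    temp = 5
    del temp                              # variables can be removed with del

    # Built-in data types
    greeting = "hello"                    # str: indexable from 0
    items = [1, 2, "three"]               # list: ordered, mutable
    point = (1, 2, "three")               # tuple: ordered, immutable
    person = {"name": "Bob", "age": 25}   # dict: key-value pairs
    unique = {1, 2, 3}                    # set: unordered, unique, no indexing

    # Typecasting: explicit conversion with int(), float(), str()
    age_num = int("30") + 1

    # Control flow: a conditional inside a reusable function
    def describe(n):
        """Label a number using if/elif/else."""
        if n < 0:
            return "negative"
        elif n == 0:
            return "zero"
        return "positive"

    labels = []
    for n in (-1, 0, 5):                  # a for loop iterating over a sequence
        labels.append(describe(n))
    print(labels)
    ```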

    This overview provides a foundation in Python programming basics as described in the provided sources. As you continue learning, you will delve deeper into these concepts and explore more advanced topics.
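
    One concept above that benefits from a concrete example is the LEGB rule mentioned under functions. The sketch below (function and variable names are invented for illustration) shows the order in which Python resolves a name: Local, then Enclosing, then Global, then Built-in.

    ```python
    x = "global"                  # G: module-level (global) scope

    def outer():
        x = "enclosing"           # E: enclosing scope for the nested functions
        def inner_local():
            x = "local"           # L: a local binding is found first
            return x
        def inner_enclosing():
            return x              # no local x here, so the enclosing x is used
        return inner_local(), inner_enclosing()

    print(outer())                # ('local', 'enclosing')
    print(x)                      # the global x is untouched: 'global'
    print(type(len))              # B: len is resolved from the built-in scope
    ```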

    Database and Python Fundamentals Study Guide

    Quiz

    1. What is a database, and what is its typical organizational structure? A database is a systematically organized collection of data. This organization commonly resembles a spreadsheet or a table, with data containing elements and attributes for identification.
    2. Explain the role of a Database Management System (DBMS) in the context of SQL. A DBMS acts as an intermediary between SQL instructions and the underlying database. It takes responsibility for transforming SQL commands into a format that the database can understand and execute.
    3. Name and briefly define at least three sub-languages of SQL. DDL (Data Definition Language) is used to define data structures in a database, such as creating, altering, and dropping databases and tables. DML (Data Manipulation Language) is used for operational tasks like creating, reading, updating, and deleting data. DQL (Data Query Language) is used for retrieving data from the database.
    4. Describe the purpose of the CREATE DATABASE and CREATE TABLE DDL statements. The CREATE DATABASE statement is used to create a new, empty database within the DBMS. The CREATE TABLE statement is used within a specific database to define a new table, including specifying the names and data types of its columns.
    5. What is the function of the INSERT INTO DML statement? The INSERT INTO statement is used to add new rows of data into an existing table in the database. It requires specifying the table name and the values to be inserted into the table’s columns.
    6. Explain the purpose of the NOT NULL constraint when defining table columns. The NOT NULL constraint ensures that a specific column in a table cannot contain a null value. If an attempt is made to insert a new record or update an existing one with a null value in a NOT NULL column, the operation will be aborted.
    7. List and briefly define three basic arithmetic operators in SQL. The addition operator (+) is used to add two operands. The subtraction operator (-) is used to subtract the second operand from the first. The multiplication operator (*) is used to multiply two operands.
    8. What is the primary function of the SELECT statement in SQL, and how can the WHERE clause be used with it? The SELECT statement is used to retrieve data from one or more tables in a database. The WHERE clause is used to filter the rows returned by the SELECT statement based on specified conditions.
    9. Explain the difference between running Python code from the Python shell and running a .py file from the command line. The Python shell provides an interactive environment where you can execute Python code snippets directly and see immediate results without saving to a file. Running a .py file from the command line executes the entire script contained within the file non-interactively.
    10. Define a variable in Python and provide an example of assigning it a value. In Python, a variable is a named storage location that holds a value. Variables are implicitly declared when a value is assigned to them. For example: x = 5 declares a variable named x and assigns it the integer value of 5.

    Answer Key

    1. A database is a systematically organized collection of data. This organization commonly resembles a spreadsheet or a table, with data containing elements and attributes for identification.
    2. A DBMS acts as an intermediary between SQL instructions and the underlying database. It takes responsibility for transforming SQL commands into a format that the database can understand and execute.
    3. DDL (Data Definition Language) helps you define data structures. DML (Data Manipulation Language) allows you to work with the data itself. DQL (Data Query Language) enables you to retrieve information from the database.
    4. The CREATE DATABASE statement establishes a new database, while the CREATE TABLE statement defines the structure of a table within a database, including its columns and their data types.
    5. The INSERT INTO statement adds new rows of data into a specified table. It requires indicating the table and the values to be placed into the respective columns.
    6. The NOT NULL constraint enforces that a particular column must always have a value and cannot be left empty or contain a null entry when data is added or modified.
    7. The + operator performs addition, the - operator performs subtraction, and the * operator performs multiplication between numerical values in SQL queries.
    8. The SELECT statement retrieves data from database tables. The WHERE clause filters the results of a SELECT query, allowing you to specify conditions that rows must meet to be included in the output.
    9. The Python shell is an interactive interpreter for immediate code execution, while running a .py file executes the entire script from the command line without direct interaction during the process.
    10. A variable in Python is a name used to refer to a memory location that stores a value; for instance, name = "Alice" assigns the string value "Alice" to the variable named name.

    Essay Format Questions

    1. Discuss the significance of SQL as a standard language for database management. In your discussion, elaborate on at least three advantages of using SQL as highlighted in the provided text and provide examples of how these advantages contribute to efficient database operations.
    2. Compare and contrast the roles of Data Definition Language (DDL) and Data Manipulation Language (DML) in SQL. Explain how these two sub-languages work together to enable the creation and management of data within a relational database system.
    3. Explain the concept of scope in Python and discuss the LEGB rule. Provide examples to illustrate the differences between local, enclosed, global, and built-in scopes and explain how Python resolves variable names based on this rule.
    4. Discuss the importance of modules in Python programming. Explain the advantages of using modules, such as reusability and organization, and describe different ways to import modules, including the use of import, from … import …, and aliases.
    5. Imagine you are designing a simple database for a small online bookstore. Describe the tables you would create, the columns each table would have (including data types and any necessary constraints like NOT NULL or primary keys), and provide example SQL CREATE TABLE statements for two of your proposed tables.

    Glossary of Key Terms

    • Database: A systematically organized collection of data that can be easily accessed, managed, and updated.
    • Table: A structure within a database used to organize data into rows (records) and columns (fields or attributes).
    • Column (Field): A vertical set of data values of a particular type within a table, representing an attribute of the entities stored in the table.
    • Row (Record): A horizontal set of data values within a table, representing a single instance of the entity being described.
    • SQL (Structured Query Language): A standard programming language used for managing and manipulating data in relational databases.
    • DBMS (Database Management System): Software that enables users to interact with a database, providing functionalities such as data storage, retrieval, and security.
    • DDL (Data Definition Language): A subset of SQL commands used to define the structure of a database, including creating, altering, and dropping databases, tables, and other database objects.
    • DML (Data Manipulation Language): A subset of SQL commands used to manipulate data within a database, including inserting, updating, deleting, and retrieving data.
    • DQL (Data Query Language): A subset of SQL commands, primarily the SELECT statement, used to query and retrieve data from a database.
    • Constraint: A rule or restriction applied to data in a database to ensure its accuracy, integrity, and reliability. Examples include NOT NULL, UNIQUE, CHECK, and FOREIGN KEY.
    • Operator: A symbol or keyword that performs an operation on one or more operands. In SQL, this includes arithmetic operators (+, -, *, /), logical operators (AND, OR, NOT), and comparison operators (=, >, <, etc.).
    • Schema: The logical structure of a database, including the organization of tables, columns, relationships, and constraints.
    • Python Shell: An interactive command-line interpreter for Python, allowing users to execute code snippets and receive immediate feedback.
    • .py file: A file containing Python source code, which can be executed as a script from the command line.
    • Variable (Python): A named reference to a value stored in memory. Variables in Python are dynamically typed, meaning their data type is determined by the value assigned to them.
    • Data Type (Python): The classification of data that determines the possible values and operations that can be performed on it (e.g., integer, string, boolean).
    • String (Python): A sequence of characters enclosed in single or double quotes, used to represent text.
    • Scope (Python): The region of a program where a particular name (variable, function, etc.) is accessible. Python has four main scopes: local, enclosed, global, and built-in (LEGB).
    • Module (Python): A file containing Python definitions and statements. Modules provide a way to organize code into reusable units.
    • Import (Python): A statement used to load and make the code from another module available in the current script.
    • Alias (Python): An alternative name given to a module or function during import, often used for brevity or to avoid naming conflicts.

    Briefing Document: Review of “01.pdf”

    This briefing document summarizes the main themes and important concepts discussed in the provided excerpts from “01.pdf”. The document covers fundamental database concepts using SQL, basic command-line operations, an introduction to Python programming, and related software development tools.

    I. Introduction to Databases and SQL

    The document introduces the concept of databases as systematically organized data, often resembling spreadsheets or tables. It highlights the widespread use of databases in various applications, providing examples like banks storing account and transaction data, and hospitals managing patient, staff, and laboratory information.

    “well a database looks like data organized systematically and this organization typically looks like a spreadsheet or a table”

    The core purpose of SQL (Structured Query Language) is explained as a language used to interact with databases. Key operations that can be performed using SQL are outlined:

    “operational terms create add or insert data read data update existing data and delete data”

    SQL is further divided into several sub-languages:

    • DDL (Data Definition Language): Used to define the structure of the database and its objects like tables. Commands like CREATE (to create databases and tables) and ALTER (to modify existing objects, e.g., adding a column) are part of DDL.
    • “ddl as the name says helps you define data in your database but what does it mean to Define data before you can store data in the database you need to create the database and related objects like tables in which your data will be stored for this the ddl part of SQL has a command named create then you might need to modify already created database objects for example you might need to modify the structure of a table by adding a new column you can perform this task with the ddl alter command you can remove an object like a table from a”
    • DML (Data Manipulation Language): Used to manipulate the data within the database, including inserting (INSERT INTO), updating, and deleting data.
    • “now we need to populate the table of data this is where I can use the data manipulation language or DML subset of SQL to add table data I use the insert into syntax this inserts rows of data into a given table I just type insert into followed by the table name and then a list of required columns or Fields within a pair of parentheses then I add the values keyword”
    • DQL (Data Query Language): Primarily used for querying or retrieving data from the database (SELECT statements fall under this category).
    • DCL (Data Control Language): Used to control access and security within the database.
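
    As a rough sketch of how the sub-languages divide the work, the following uses Python's built-in sqlite3 module standing in for a full DBMS (an in-memory connection replaces CREATE DATABASE College;, and the column names are invented for illustration):

    ```python
    import sqlite3

    conn = sqlite3.connect(":memory:")  # stands in for creating a database

    # DDL: define the structure
    conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute("ALTER TABLE student ADD COLUMN email TEXT")  # modify the structure

    # DML: manipulate the data
    conn.execute("INSERT INTO student (id, name, email) "
                 "VALUES (1, 'Maya', 'maya@example.com')")
    conn.execute("UPDATE student SET email = 'm@example.com' WHERE id = 1")

    # DQL: query the data back out
    rows = conn.execute("SELECT id, name, email FROM student").fetchall()
    print(rows)
    ```

    DCL statements (GRANT, REVOKE) are omitted because SQLite has no user accounts; in a server DBMS such as MySQL they would control who may run the statements above.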

    The document emphasizes that a DBMS (Database Management System) is crucial for interpreting and executing SQL instructions, acting as an intermediary between the SQL commands and the underlying database.

    “a database interprets and makes sense of SQL instructions with the use of a database management system or dbms as a web developer you’ll execute all SQL instructions on a database using a dbms the dbms takes responsibility for transforming SQL instructions into a form that’s understood by the underlying database”

    The advantages of using SQL are highlighted, including its simplicity, standardization, portability, comprehensiveness, and efficiency in processing large amounts of data.

    “you now know that SQL is a simple standard portable comprehensive and efficient language that can be used to delete data retrieve and share data among multiple users and manage database security this is made possible through subsets of SQL like ddl or data definition language DML also known as data manipulation language dql or data query language and DCL also known as data control language and the final advantage of SQL is that it lets database users process large amounts of data quickly and efficiently”

    Examples of basic SQL syntax are provided, such as creating a database (CREATE DATABASE College;) and creating a table (CREATE TABLE student ( … );). The INSERT INTO syntax for adding data to a table is also introduced.

    Constraints like NOT NULL are mentioned as ways to enforce data integrity during table creation.

    • “the creation of a new customer record is aborted the not null default value is implemented using a SQL statement a typical not null SQL statement begins with the creation of a basic table in the database I can write a create table Clause followed by customer to define the table name followed by a pair of parentheses within the parentheses I add two columns customer ID and customer name I also Define each column with relevant data types int for customer ID as it stores”

    SQL arithmetic operators (+, -, *, /, %) are introduced with examples. Logical operators (NOT, OR) and special operators (IN, BETWEEN) used in the WHERE clause for filtering data are also explained. The concept of JOIN clauses, including SELF-JOIN, for combining data from tables is briefly touched upon.
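
    A small illustration of these operators, again via sqlite3 rather than the MySQL environment the source describes; the product table and its values are made up for the example.

    ```python
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE product (name TEXT, price REAL, qty INTEGER)")
    conn.executemany("INSERT INTO product VALUES (?, ?, ?)", [
        ("pen", 2.5, 10), ("book", 12.0, 3), ("lamp", 30.0, 7),
    ])

    # Arithmetic operators in the select list
    totals = conn.execute("SELECT name, price * qty FROM product").fetchall()

    # BETWEEN and IN as special operators in the WHERE clause
    mid = conn.execute(
        "SELECT name FROM product WHERE price BETWEEN 2 AND 15").fetchall()
    chosen = conn.execute(
        "SELECT name FROM product WHERE name IN ('pen', 'lamp')").fetchall()

    # Logical operators combining conditions
    cheap_or_scarce = conn.execute(
        "SELECT name FROM product WHERE price < 5 OR qty < 5").fetchall()
    print(totals, mid, chosen, cheap_or_scarce)
    ```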

    Subqueries (inner queries within outer queries) and Views (virtual tables based on the result of a query) are presented as advanced SQL concepts. User-defined functions and triggers are also introduced as ways to extend database functionality and automate actions. Prepared statements are mentioned as a more efficient way to execute SQL queries repeatedly. Date and time functions in MySQL are briefly covered.
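
    A minimal sketch of a subquery and a view, using sqlite3 with invented data (the orders table is not from the source):

    ```python
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)",
                     [(1, 10.0), (2, 50.0), (3, 90.0)])

    # Subquery: the inner query supplies a value to the outer WHERE clause
    above_avg = conn.execute("""
        SELECT id FROM orders
        WHERE amount > (SELECT AVG(amount) FROM orders)
    """).fetchall()

    # View: a virtual table defined by a stored query
    conn.execute("CREATE VIEW big_orders AS "
                 "SELECT * FROM orders WHERE amount >= 50")
    via_view = conn.execute("SELECT id FROM big_orders").fetchall()
    print(above_avg, via_view)
    ```

    Triggers, prepared statements, and MySQL's date and time functions mentioned above are engine-specific and are not sketched here.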

    II. Introduction to Command Line/Bash Shell

    The document provides a basic introduction to using the command line or bash shell. Fundamental commands are explained:

    • PWD (Print Working Directory): Shows the current directory.
    • “to do that I run the PWD command PWD is short for print working directory I type PWD and press the enter key the command returns a forward slash which indicates that I’m currently in the root directory”
    • LS (List): Displays the contents of the current directory. The -l flag provides a detailed list format.
    • “if I want to check the contents of the root directory I run another command called LS which is short for list I type LS and press the enter key and now notice I get a list of different names of directories within the root level in order to get more detail of what each of the different directories represents I can use something called a flag flags are used to set options to the commands you run use the list command with a flag called L which means the format should be printed out in a list format I type LS space Dash l press enter and this Returns the results in a list structure”
    • CD (Change Directory): Navigates between directories using relative or absolute paths. cd .. moves up one directory.
    • “to step back into etc type cd etc to confirm that I’m back there type pwd and enter if I want to use the other alternative you can do an absolute path type in cd forward slash and press enter Then I type pwd and press enter you can verify that I am back at the root again to step through multiple directories use the same process type cd etc and press enter check the contents of the files by typing ls and pressing enter”
    • MKDIR (Make Directory): Creates a new directory.
    • “now I will create a new directory called submissions I do this by typing mkdir which stands for make directory and then the word submissions this is the name of the directory I want to create and then I hit the enter key I then type in ls -l for list so that I can see the list structure and now notice that a new directory called submissions has been created I can then go into this”
    • TOUCH: Creates a new empty file.
    • “the Parent Directory next is the touch command which makes a new file of whatever type you specify for example to build a brand new file you can run touch followed by the new file’s name for instance example dot txt note that the newly created file will be empty”
    • HISTORY: Shows a history of recently used commands.
    • “to view a history of the most recently typed commands you can use the history command”
    • File Redirection (>, >>, <): Allows redirecting the input or output of commands to files. > overwrites, >> appends.
    • “if you want to control where the output goes you can use a redirection how do we do that enter the ls command enter Dash L to print it as a list instead of pressing enter add a greater than sign redirection now we have to tell it where we want the data to go in this scenario I choose an output.txt file the output dot txt file has not been created yet but it will be created based on the command I’ve set here with a redirection flag press enter type LS then press enter again to display the directory the output file displays to view the”
    • GREP: Searches for patterns within files.
    • “grep stands for Global regular expression print and it’s used for searching across files and folders as well as the contents of files on my local machine I enter the command ls-l and see that there’s a file called”
    • CAT: Displays the content of a file.
    • LESS: Views file content page by page.
    • “press the q key to exit the less environment the other file is the bash profile file so I can run the last command again this time with .profile this tends to be used more for environment variables for example I can use it for setting”
    • VIM: A text editor used for creating and editing files.
    • “now I will create a simple shell script for this example I will use Vim which is an editor that I can use which accepts input so type vim and”
    • CHMOD: Changes file permissions, including making a file executable (chmod +x filename).
    • “but I want it to be executable which requires that I have an X being set on it in order to do that I have to use another command which is called chmod after using this command the file is executable within the bash shell”

    The document also briefly mentions shell scripts (files containing a series of commands) and environment variables (dynamic named values that can affect the way running processes will behave on a computer).

    III. Introduction to Git and GitHub

    Git is introduced as a free, open-source distributed version control system used to manage source code history, track changes, revert to previous versions, and collaborate with other developers. Key Git commands mentioned include:

    • GIT CLONE: Used to create a local copy of a remote repository (e.g., from GitHub).
    • “to do this I type the command git clone and paste the https URL I copied earlier finally I press enter on my keyboard notice that I receive a message stating”
    • LS -LA: Lists all files in a directory, including hidden ones (like the .git directory which contains the Git repository metadata).
    • “the ls-la command another file is listed which is just named dot git you will learn more about this later when you explore how to use this for source control”
    • CD .git: Changes the current directory to the .git folder.
    • “first open the dot git folder on your terminal type CD dot git and press enter”
    • CAT HEAD: Displays the reference to the current commit.
    • “next type cat head and press enter in git we only work on a single Branch at a time this file also exists inside the dot git folder under the refs forward slash heads path”
    • CAT refs/heads/main: Displays the hash of the last commit on the main branch.
    • “type CD dot git and press enter next type cat refs forward slash heads forward slash main press enter after you”
    • GIT PULL: Fetches changes from a remote repository and integrates them into the local branch.
    • “I am now going to explain to you how to pull the repository to your local device”

    GitHub is described as a cloud-based hosting service for Git repositories, offering a user interface for managing Git projects and facilitating collaboration.
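
    The .git internals described above can be inspected locally; this minimal sketch assumes git is installed, uses git init with an empty commit in place of a network git clone, and demo-repo is an illustrative name:

```shell
# Create a repository locally; git clone <url> would do this from GitHub.
git init demo-repo
cd demo-repo
git -c user.name=demo -c user.email=demo@example.com \
    commit --allow-empty -m "initial commit"

# The hidden .git directory holds the repository metadata.
ls -la

# HEAD stores a reference to the current branch, e.g. "ref: refs/heads/main".
cat .git/HEAD

# The file under refs/heads holds the hash of the last commit on that branch.
cat ".git/$(sed 's/^ref: //' .git/HEAD)"
```

    With a remote configured, git pull would then fetch changes and integrate them into this local branch.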

    IV. Introduction to Python Programming

    The document introduces Python as a versatile programming language and outlines different ways to run Python code:

    • Python Shell: An interactive environment for running and testing small code snippets without creating separate files.
    • “the python shell is useful for running and testing small scripts for example it allows you to run code without the need for creating new DOT py files you start by adding Snippets of code that you can run directly in the shell”
    • Running Python Files: Executing Python code stored in files with the .py extension using the python filename.py command.
    • “running a python file directly from the command line or terminal note that any file that has the file extension of dot py can be run by the following command for example type python then a space and then type the file”

    Basic Python concepts covered include:

    • Variables: Declaring and assigning values to variables (e.g., x = 5, name = “Alice”). Python automatically infers data types. Multiple variables can be assigned the same value (e.g., a = b = c = 10).
    • “all I have to do is name the variable for example if I type x equals 5 I have declared a variable and assigned as a value I can also print out the value of the variable by calling the print statement and passing in the variable name which in this case is X so I type print X when I run the program I get the value of 5 which is the assignment since I gave the initial variable Let Me Clear My screen again you have several options when it comes to declaring variables you can declare any different type of variable in terms of value for example X could equal a string called hello to do this I type x equals hello I can then print the value again run it and I find the output is the word hello behind the scenes python automatically assigns the data type for you”
    • Data Types: Basic data types like integers, floats (decimal numbers), complex numbers, strings (sequences of characters enclosed in single or double quotes), lists, and tuples (ordered, immutable sequences) are introduced.
    • “X could equal a string called hello to do this I type x equals hello I can then print the value again run it and I find the output is the word hello behind the scenes python automatically assigns the data type for you you’ll learn more about this in an upcoming video on data types you can declare multiple variables and assign them to a single value as well for example making a b and c all equal to 10. I do this by typing a equals b equals C equals 10. I print all three… sequence types are classed as container types that contain one or more of the same type in an ordered list they can also be accessed based on their index in the sequence python has three different sequence types namely strings lists and tuples let’s explore each of these briefly now starting with strings a string is a sequence of characters that is enclosed in either a single or double quotes strings are represented by the string class or Str for”
    • Operators: Arithmetic operators (+, -, *, /, **, %, //) and logical operators (and, or, not) are explained with examples.
    • “example 7 multiplied by four okay now let’s explore logical operators logical operators are used in Python in conditional statements to determine a true or false outcome let’s explore some of these now first logical operator is named and this operator checks for all conditions to be true for example a is greater than five and a is less than 10. the second logical operator is named or this operator checks for at least one of the conditions to be true for example a is greater than 5 or B is greater than 10. the final operator is named not this”
    • Conditional Statements: if, elif (else if), and else statements are introduced for controlling the flow of execution based on conditions.
    • “The Logical operators are and or and not let’s cover the different combinations of each in this example I declare two variables a equals true and B also equals true from these variables I use an if statement I type if a and b colon and on the next line I type print and in parentheses in double quotes”
    • Loops: for loops (for iterating over sequences) and while loops are introduced with examples, including nested loops.
    • “now let’s break apart the for Loop and discover how it works the variable item is a placeholder that will store the current letter in the sequence you may also recall that you can access any character in the sequence by its index the for Loop is accessing it in the same way and assigning the current value to the item variable this allows us to access the current character to print it for output when the code is run the outputs will be the letters of the word looping each letter on its own line now that you know about looping constructs in Python let me demonstrate how these work further using some code examples to Output an array of tasty desserts python offers us multiple ways to do loops or looping you’ll Now cover the for loop as well as the while loop let’s start with the basics of a simple for Loop to declare a for loop I use the for keyword I now need a variable to put the value into in this case I am using I I also use the in keyword to specify where I want to Loop over I add a new function called range to specify the number of items in a range in this case I’m using 10 as an example next I do a simple print statement by pressing the enter key to move to a new line I select the print function and within the brackets I enter the name looping and the value of I then I click on the Run button the output indicates the iteration Loops through the range of 0 to 9.”
    • Functions: Defining and calling functions using the def keyword. Functions can take arguments and return values. Examples of using *args (for variable positional arguments) and **kwargs (for variable keyword arguments) are provided.
    • “I now write a function to produce a string out of this information I type def contents and then self in parentheses on the next line I write a print statement for the string the plus self dot dish plus has plus self dot items plus and takes plus self dot time plus Min to prepare here we’ll use the backslash character to force a new line and continue the string on the following line for this to print correctly I need to convert the self dot items and self dot time… let’s say for example you wanted to calculate a total bill for a restaurant a user got a cup of coffee that was 2.99 then they also got a cake that was 4.55 and also a juice for 2.99. the first thing I could do is change the for Loop let’s change the argument to kwargs by”
    • File Handling: Opening, reading (using read, readline, readlines), and writing to files. The importance of closing files is mentioned.
    • “the third method to read files in Python is read lines let me demonstrate this method the read lines method reads the entire contents of the file and then returns it in an ordered list this allows you to iterate over the list or pick out specific lines based on a condition if for example you have a file with four lines of text and pass a length condition the read lines function will output all the lines in your file in the correct order files are stored in directories and they have”
    • Recursion: The concept of a function calling itself is briefly illustrated.
    • “the else statement will recursively call the slice function but with a modified string every time on the next line I add else and a colon then on the next line I type return string reverse Str but before I close the parentheses I add a slice function by typing open square bracket the number 1 and a colon followed by”
    • Object-Oriented Programming (OOP): Basic concepts of classes (using the class keyword), objects (instances of classes), attributes (data associated with an object), and methods (functions associated with an object, with self as the first parameter) are introduced. Inheritance (creating new classes based on existing ones) is also mentioned.
    • “method inside this class I want this one to contain a new function called leave request so I type def leave request and then self and days as the variables in parentheses the purpose of the leave request function is to return a line that specifies the number of days requested to write this I type return the string may I take a leave for plus Str open parenthesis the word days close parenthesis plus another string days now that I have all the classes in place I’ll create a few instances from these classes one for a supervisor and two others for… you will be defining a function called D inside which you will be creating another nested function e let’s write the rest of the code you can start by defining a couple of variables both of which will be called animal the first one inside the D function and the second one inside the E function note how you had to First declare the variable inside the E function as non-local you will now add a few more print statements for clarification for when you see the outputs finally you have called the E function here and you can add one more variable animal outside the D function this”
    • Modules: The concept of modules (reusable blocks of code in separate files) and how to import them using the import statement (e.g., import math, from math import sqrt, import math as m). The benefits of modular programming (scope, reusability, simplicity) are highlighted. The search path for modules (sys.path) is mentioned.
    • “so a file like sample.py can be a module named Sample and can be imported modules in Python can contain both executable statements and functions but before you explore how they are used it’s important to understand their value purpose and advantages modules come from modular programming this means that the functionality of code is broken down into parts or blocks of code these parts or blocks have great advantages which are scope reusability and simplicity let’s delve deeper into these everything in… to import and execute modules in Python the first important thing to know is that modules are imported only once during execution if for example your import a module that contains print statements print Open brackets close brackets you can verify it only executes the first time you import the module even if the module is imported multiple times since modules are built to help you Standalone… I will now import the built-in math module by typing import math just to make sure that this code works I’ll use a print statement I do this by typing print importing the math module after this I’ll run the code the print statement has executed most of the modules that you will come across especially the built-in modules will not have any print statements and they will simply be loaded by The Interpreter now that I’ve imported the math module I want to use a function inside of it let’s choose the square root function sqrt to do this I type the words math dot sqrt when I type the word math followed by the dot a list of functions appears in a drop down menu and you can select sqrt from this list I passed 9 as the argument to the math.sqrt function assign this to a variable called root and then I print it the number three the square root of nine has been printed to the terminal which is the correct answer instead of importing the entire math module as we did above there is a better way to handle this by directly importing the square root function inside the scope of the project 
this will prevent overloading The Interpreter by importing the entire math module to do this I type from math import sqrt when I run this it displays an error now I remove the word math from the variable declaration and I run the code again this time it works next let’s discuss something called an alias which is an excellent way of importing different modules here I assign an alias called m to the math module I do this by typing import math as m then I type cosine equals m dot I”
    • Scope: The concepts of local, enclosed, global, and built-in scopes in Python (LEGB rule) and how variable names are resolved. Keywords global and nonlocal for modifying variable scope are mentioned.
    • “names of different attributes defined inside it in this way modules are a type of namespace name spaces and Scopes can become very confusing very quickly and so it is important to get as much practice of Scopes as possible to ensure a standard of quality there are four main types of Scopes that can be defined in Python local enclosed Global and built in the practice of trying to determine in which scope a certain variable belongs is known as scope resolution scope resolution follows what is known commonly as the legb rule let’s explore these local this is where the first search for a variable is in the local scope enclosed this is defined inside an enclosing or nested functions Global is defined at the uppermost level or simply outside functions and built-in which is the keywords present in the built-in module in simpler terms a variable declared inside a function is local and the ones outside the scope of any function generally are global here is an example the outputs for the code on screen shows the same variable name Greek in different scopes… keywords that can be used to change the scope of the variables Global and non-local the global keyword helps us access the global variables from within the function non- local is a special type of scope defined in Python that is used within the nested functions only in the condition that it has been defined earlier in the enclosed functions now you can write a piece of code that will better help you understand the idea of scope for an attributes you have already created a file called animalfarm.py you will be defining a function called D inside which you will be creating another nested function e let’s write the rest of the code you can start by defining a couple of variables both of which will be called animal the first one inside the D function and the second one inside the E function note how you had to First declare the variable inside the E function as non-local you will now add a few more print statements for 
clarification for when you see the outputs finally you have called the E function here and you can add one more variable animal outside the D function this”
    • Reloading Modules: The reload() function for re-importing and re-executing modules that have already been loaded.
    • “statement is only loaded once by the python interpreter but the reload function lets you import and reload it multiple times I’ll demonstrate that first I create a new file sample.py and I add a simple print statement named hello world remember that any file in Python can be used as a module I’m going to use this file inside another new file and the new file is named using reloads.py now I import the sample.py module I can add the import statement multiple times but The Interpreter only loads it once if it had been reloaded we”
    • Testing: Introduction to writing test cases using the assert keyword and the pytest framework. The convention of naming test functions with the test_ prefix is mentioned. Test-Driven Development (TDD) is briefly introduced.
    • “another file called test addition dot py in which I’m going to write my test cases now I import the file that consists of the functions that need to be tested next I’ll also import the pytest module after that I Define a couple of test cases with the addition and subtraction functions each test case should be named test underscore then the name of the function to be tested in our case we’ll have test underscore add and test underscore sub I’ll use the assert keyword inside these functions because tests primarily rely on this keyword it… contrary to the conventional approach of writing code I first write test underscore find string dot py and then I add the test function named test underscore is present in accordance with the test I create another file named file string dot py in which I’ll write the is present function I Define the function named is present and I pass an argument called person in it then I make a list of names written as values after that I create a simple if else condition to check if the past argument”
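
    Several of the Python concepts above can be combined into one short sketch. The bill figures (2.99, 4.55, 2.99) come from the transcript; the function and class names here are illustrative:

```python
# **kwargs collects keyword arguments into a dict (item name -> price).
def total_bill(**kwargs):
    return round(sum(kwargs.values()), 2)

print(total_bill(coffee=2.99, cake=4.55, juice=2.99))  # 10.53

# A minimal class with an attribute, a method (self first), and inheritance.
class Employee:
    def __init__(self, name):
        self.name = name

    def greet(self):
        return "Hello, " + self.name

class Supervisor(Employee):  # Supervisor inherits from Employee
    def leave_request(self, days):
        # str() converts the number so it can be concatenated into the string
        return "May I take a leave for " + str(days) + " days"

sup = Supervisor("Sam")
print(sup.greet())           # Hello, Sam
print(sup.leave_request(3))  # May I take a leave for 3 days
```

    The same assert keyword used interactively here is what a pytest test function would rely on.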

    V. Software Development Tools and Concepts

    The document mentions several tools and concepts relevant to software development:

    • Python Installation and Version: Checking the installed Python version using python –version.
    • “prompt type python dash dash version to identify which version of python is running on your machine if python is correctly installed then Python 3 should appear in your console this means that you are running python 3. there should also be several numbers after the three to indicate which version of Python 3 you are running make sure these numbers match the most recent version on the python.org website if you see a message that states python not found then review your python installation or relevant document on”
    • Jupyter Notebook: An interactive development environment (IDE) for Python. Installation using python -m pip install jupyter and running using jupyter notebook are mentioned.
    • “course you’ll use the Jupyter IDE to demonstrate python to install Jupyter type python -m pip install jupyter within your python environment then follow the jupyter installation process once you’ve installed jupyter type jupyter notebook to open a new instance of the jupyter notebook to use within your default browser”
    • MySQL Connector: A Python library used to connect Python applications to MySQL databases.
    • “the next task is to connect python to your mySQL database you can create the installation using a purpose-built python Library called MySQL connector this library is an API that provides useful”
    • Datetime Library: Python’s built-in module for working with dates and times. Functions like datetime.now(), datetime.date(), datetime.time(), and timedelta are introduced.
    • “python so you can import it without requiring pip let’s review the functions that Python’s datetime library offers the date time Now function is used to retrieve today’s date you can also use date time date to retrieve just the date or date time time to call the current time and the time Delta function calculates the difference between two values now let’s look at the Syntax for implementing date time to import the datetime python class use the import keyword followed by the library name then use the as keyword to create an alias of… let’s look at a slightly more complex function time Delta when making plans it can be useful to project into the future for example what date is this same day next week you can answer questions like this using the time Delta function to calculate the difference between two values and return the result in a python friendly format so to find the date in seven days time you can create a new variable called week type the DT module and access the time Delta function as an object instance then pass through seven days as an argument finally”
    • MySQL Workbench: A graphical tool for working with MySQL databases, including creating schemas.
    • “MySQL server instance and select the schema menu to create a new schema select the create schema option from the menu pane in the schema toolbar this action opens a new window within this new window enter mg underscore schema in the database name text field select apply this generates a SQL script called create schema mg schema you are then asked to review the SQL script to be applied to your new database click on the apply button within the review window if you’re satisfied with the script a new window”
    • Data Warehousing: Briefly introduces the concept of a centralized data repository for integrating and processing large amounts of data from multiple sources for analysis. Dimensional data modeling is mentioned.
    • “in the next module you’ll explore the topic of data warehousing in this module you’ll learn about the architecture of a data warehouse and build a dimensional data model you’ll begin with an overview of the concept of data warehousing you’ll learn that a data warehouse is a centralized data repository that loads integrates stores and processes large amounts of data from multiple sources users can then query this data to perform data analysis you’ll then”
    • Binary Numbers: A basic explanation of the binary number system (base-2) is provided, highlighting its use in computing.
    • “binary has many uses in Computing it is a very convenient way of… consider that you have a lock with four different digits each digit can be a zero or a one how many potential past numbers can you have for the lock the answer is 2 to the power of four or two times two times two times two equals sixteen you are working with a binary lock therefore each digit can only be either zero or one so you can take four digits and multiply them by two every time and the total is 16. each time you add a potential digit you increase the”
    • Knapsack Problem: A brief overview of this optimization problem is given as a computational concept.
    • “three kilograms additionally each item has a value the torch equals one water equals two and the tent equals three in short the knapsack problem outlines a list of items that weigh different amounts and have different values you can only carry so many items in your knapsack the problem requires calculating the optimum combination of items you can carry if your backpack can carry a certain weight the goal is to find the best return for the weight capacity of the knapsack to compute a solution for this problem you must select all items”
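
    The datetime functions described above can be sketched as follows; the alias dt mirrors the import style mentioned in the transcript:

```python
import datetime as dt

now = dt.datetime.now()    # today's date and time
today = now.date()         # just the date portion
clock = now.time()         # just the time portion

# timedelta: what date is this same day next week?
week = dt.timedelta(days=7)
next_week = today + week
print(next_week - today)   # 7 days, 0:00:00
```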

    This document provides a foundational overview of databases and SQL, command-line basics, version control with Git and GitHub, and introductory Python programming concepts, along with essential development tools. The content suggests a curriculum aimed at individuals learning about software development, data management, and related technologies.

    By Amjad Izhar
    Contact: amjad.izhar@gmail.com
    https://amjadizhar.blog

  • MySQL Full Course for Beginners with Practical | Learn MySQL in 3 Hours

    MySQL Full Course for Beginners with Practical Learn MySQL in 3 Hours

    YouTube Video

    MySQL Full Course for Beginners with Practical [FREE] | Learn MySQL in 3 Hours


  • Beginning Oracle Database 12c Administration

    Beginning Oracle Database 12c Administration

    This book, “Beginning Oracle Database 12c Administration, 2nd Edition,” is a comprehensive guide to Oracle database administration. It covers fundamental database concepts, SQL and PL/SQL, Oracle architecture, and essential administrative tasks such as user management, data loading, backups, and recovery. The text also emphasizes practical work practices and problem-solving methodologies, including the importance of proper planning and licensing. Finally, it highlights the broader IT context of database administration, emphasizing communication and the role of the DBA within an organization.

    Oracle Database Administration Study Guide

    SQL and PL/SQL

    Subqueries

    A subquery is a SELECT statement that is embedded within another DML statement (SELECT, INSERT, UPDATE, or DELETE) or within another subquery. Subqueries are always enclosed in parentheses and can return a single value, a single row, or multiple rows of data.

    There are three main types of subqueries:

    1. Inline view: This type of subquery appears in the FROM clause of a SELECT statement. It acts like a temporary table, allowing you to select from the results of the subquery.
    2. Scalar subquery: This type of subquery returns exactly one data item from one row. It can be used wherever a single value is expected, such as in a SELECT list, a WHERE clause, or a HAVING clause.
    3. Correlated subquery: This type of subquery depends on the outer query for its values. It is executed repeatedly, once for each row processed by the outer query.
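
    Each of the three types can be sketched briefly; the emp table and its columns here are hypothetical:

```sql
-- 1. Inline view: a subquery in the FROM clause acts like a temporary table
SELECT d.dept, d.avg_sal
FROM (SELECT deptno AS dept, AVG(sal) AS avg_sal
      FROM emp
      GROUP BY deptno) d;

-- 2. Scalar subquery: returns exactly one value
SELECT ename, sal
FROM emp
WHERE sal > (SELECT AVG(sal) FROM emp);

-- 3. Correlated subquery: re-evaluated once for each row of the outer query
SELECT e.ename, e.sal
FROM emp e
WHERE e.sal > (SELECT AVG(i.sal)
               FROM emp i
               WHERE i.deptno = e.deptno);
```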

    Types of SQL

    SQL is a powerful language for managing and manipulating relational databases. It is divided into two main categories:

    1. Data Manipulation Language (DML): Used to retrieve, insert, update, and delete data in a database.
    • SELECT: Retrieves data from one or more tables
    • INSERT: Adds new rows into a table
    • UPDATE: Modifies existing data in a table
    • MERGE: Combines INSERT and UPDATE operations based on a condition
    • DELETE: Removes rows from a table
    2. Data Definition Language (DDL): Used to define the structure of the database, including creating, altering, and dropping database objects like tables, views, indexes, and users.
    • CREATE: Creates a new database object
    • ALTER: Modifies the structure of an existing object
    • DROP: Removes an existing object
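
    A minimal sketch showing both categories in sequence (the customers table is illustrative):

```sql
-- DDL: define the structure
CREATE TABLE customers (id NUMBER PRIMARY KEY, name VARCHAR2(50));

-- DML: manipulate the rows
INSERT INTO customers (id, name) VALUES (1, 'Alice');
UPDATE customers SET name = 'Alicia' WHERE id = 1;
DELETE FROM customers WHERE id = 1;

-- DDL again: remove the object
DROP TABLE customers;
```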

    Railroad Diagrams

    Oracle uses railroad diagrams to illustrate the syntax of SQL commands. These diagrams provide a visual representation of the different clauses and options available for each command, showing both mandatory and optional elements.

    Database Architecture

    Data Files

    Data files are the physical files that store the actual data of an Oracle database. They are organized into logical units called tablespaces.

    Key points about data files:

    • Each data file belongs to one tablespace.
    • Data files are typically named with a descriptive name and a .dbf or .ora extension.
    • Space within data files is divided into data blocks, also called pages.
    • Each data block contains data from only one table.
    • A contiguous range of data blocks allocated to a table is called an extent.

    Server Processes

    Oracle uses server processes to manage connections and execute user requests. There are two main types of server architectures:

    1. Dedicated Server Architecture: A dedicated server process is created for each user connection. This process handles all requests from the connected user.
    2. Multithreaded Server (MTS) Architecture: A pool of shared server processes is used to handle user connections. Dispatcher processes route user requests to available shared servers. MTS is less commonly used than the dedicated server architecture.

    Software Installation

    The software installation process involves setting up the operating system environment, installing the Oracle software, and configuring the listener.

    Key considerations:

    • Setting up appropriate user accounts and permissions
    • Configuring the network listener to allow client connections
    • Setting up firewalls to secure the database server

    Database Creation

    The Database Configuration Assistant (DBCA) is a graphical tool that simplifies the process of creating and configuring an Oracle database.

    Key parameters:

    • db_block_size: Specifies the size of data blocks
    • db_name: Defines the name of the database
    • db_recovery_file_dest: Sets the location for recovery files
    • memory_target: Sets the total amount of memory allocated to the SGA and PGA
    • processes: Defines the maximum number of processes that can connect to the database
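
    In a text initialization parameter file these settings might look roughly as follows; the values are illustrative examples only, not recommendations:

```
# Illustrative initialization parameters (pfile style)
db_name = orcl
db_block_size = 8192                 # size of each data block, in bytes
db_recovery_file_dest = /u01/app/oracle/fast_recovery_area
memory_target = 2G                   # combined SGA and PGA memory target
processes = 300                      # maximum processes that may connect
```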

    Physical Database Design

    Physical database design focuses on the efficient storage and retrieval of data within the database.

    Partitioning

    Partitioning is a technique for dividing large tables and indexes into smaller, more manageable pieces called partitions.

    Types of partitioning:

    • List partitioning: Divides data based on a list of discrete values.
    • Range partitioning: Divides data based on ranges of values.
    • Interval partitioning: Automatically creates new partitions based on specified intervals.
    • Hash partitioning: Distributes data randomly across partitions using a hashing function.
    • Reference partitioning: Partitions a child table based on the partitioning scheme of its parent table.
    • Composite partitioning: Combines different partitioning methods to create subpartitions within a partition.
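
    As one sketch, a range-partitioned table on a date column; the table, column, and partition names are illustrative:

```sql
-- Range partitioning: rows are placed by ranges of sale_date values
CREATE TABLE sales (
  sale_id   NUMBER,
  sale_date DATE,
  amount    NUMBER
)
PARTITION BY RANGE (sale_date) (
  PARTITION p2023 VALUES LESS THAN (DATE '2024-01-01'),
  PARTITION p2024 VALUES LESS THAN (DATE '2025-01-01'),
  PARTITION pmax  VALUES LESS THAN (MAXVALUE)
);
```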

    Partition Views

    Partition views combine data from multiple partitioned tables to present a unified view of the data to the user. They provide transparency to the user, hiding the underlying partitioning scheme.

    User Management and Data Loading

    User Management

    Key commands for managing user accounts:

    • CREATE USER: Creates a new user account in the database.
    • ALTER USER: Modifies an existing user account, such as changing passwords, assigning quotas, or setting default and temporary tablespaces.
    • DROP USER: Removes a user account from the database.
    • GRANT: Assigns privileges to a user, allowing them to perform specific actions in the database.
    • REVOKE: Removes privileges from a user.
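
    A typical account lifecycle using these commands can be sketched as follows; the appuser name, password, and quota are illustrative:

```sql
-- Create the account with a default tablespace and a space quota
CREATE USER appuser IDENTIFIED BY "a_strong_password"
  DEFAULT TABLESPACE users
  QUOTA 100M ON users;

-- Grant the privileges needed to log in and create tables
GRANT CREATE SESSION, CREATE TABLE TO appuser;

-- Modify the account, e.g. force a password change at next login
ALTER USER appuser PASSWORD EXPIRE;

-- Take a privilege back, then remove the account and its objects
REVOKE CREATE TABLE FROM appuser;
DROP USER appuser CASCADE;
```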

    Data Loading

    Key methods for loading data into an Oracle database:

    • Data Pump: A high-speed utility for exporting and importing data. The expdp and impdp commands provide a wide range of options for controlling the data loading process.
    • Export/Import: An older utility for data loading. The exp and imp commands are still available but are less efficient than Data Pump.
    • SQL*Loader: A command-line utility for loading data from external files. It uses a control file to define the format of the input data and map it to the database columns.
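
    A SQL*Loader control file for comma-separated input might look roughly like this; the file, table, and column names are illustrative:

```
LOAD DATA
INFILE 'employees.csv'
INTO TABLE employees
FIELDS TERMINATED BY ','
(emp_id, emp_name, hire_date DATE 'YYYY-MM-DD')
```

    It would then be invoked from the command line with something like sqlldr userid=scott control=employees.ctl.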

    Quiz

    Instructions: Answer the following questions in 2-3 sentences each.

    1. What are the three main types of subqueries, and how do they differ?
    2. Explain the difference between DML and DDL and provide examples of each.
    3. How do railroad diagrams help in understanding SQL syntax?
    4. What are data blocks and extents in the context of data files?
    5. Compare and contrast the dedicated server and multithreaded server architectures.
    6. What are some key considerations during the software installation process for Oracle Database?
    7. Explain the concept of database partitioning and list at least three different partitioning methods.
    8. What is the purpose of a partition view?
    9. Describe the steps involved in creating a new user account and granting them privileges to access database objects.
    10. List and briefly explain three different methods for loading data into an Oracle database.

    Answer Key

    1. The three main types of subqueries are inline views, scalar subqueries, and correlated subqueries. Inline views act like temporary tables in the FROM clause, scalar subqueries return a single value, and correlated subqueries depend on the outer query for their values.
    2. DML (Data Manipulation Language) is used for manipulating data within a database, while DDL (Data Definition Language) is used for defining the database structure. Examples of DML include SELECT, INSERT, UPDATE, and DELETE, while examples of DDL include CREATE, ALTER, and DROP.
    3. Railroad diagrams provide a visual representation of the syntax of SQL commands, showing both mandatory and optional elements. They help to understand the order and relationships between different clauses and options.
    4. Data blocks (also called pages) are the units of storage within data files, with a fixed size. Extents are contiguous ranges of data blocks allocated to a specific table.
    5. A dedicated server architecture assigns a separate process to each user connection, while a multithreaded server (MTS) architecture uses a pool of shared server processes to handle multiple connections. MTS can be more efficient for handling many concurrent connections but is less commonly used than the dedicated server architecture.
    6. Key considerations during Oracle Database software installation include setting up appropriate user accounts and permissions, configuring the network listener, and setting up firewalls. These steps ensure security and allow clients to connect to the database server.
    7. Database partitioning involves dividing large tables and indexes into smaller pieces called partitions. This improves manageability and performance. Different partitioning methods include list partitioning (based on discrete values), range partitioning (based on value ranges), and hash partitioning (based on a hashing function).
    8. A partition view combines data from multiple partitioned tables into a single logical view. This allows users to query the data transparently without needing to know about the underlying partitioning scheme.
    9. To create a new user account, use the CREATE USER command, specifying a username and password. Use the GRANT command to assign privileges to the user, allowing them to perform actions like creating tables, selecting data, or modifying data.
    10. Three methods for loading data into Oracle Database are Data Pump (using expdp and impdp commands), Export/Import (using exp and imp commands), and SQL*Loader (using a control file to define the data format). Data Pump is the most efficient method for large datasets.

    Essay Questions

    1. Discuss the advantages and disadvantages of using different partitioning methods in Oracle Database. Provide real-world scenarios where each method would be most appropriate.
    2. Explain the concept of read consistency in Oracle Database. How is it achieved, and what are its benefits and limitations?
    3. Describe the different types of database backups available in Oracle Database. Discuss best practices for implementing a comprehensive backup and recovery strategy.
    4. Explain the importance of database monitoring and performance tuning. Describe the tools and techniques available in Oracle Database for monitoring performance and identifying bottlenecks.
    5. Discuss the role of the Oracle Data Dictionary in database administration. How can the Data Dictionary be used to obtain information about database objects, users, and privileges?

    Glossary of Key Terms

    • Data Block: The fundamental unit of storage within an Oracle data file, with a fixed size. Also called a page.
    • Extent: A contiguous range of data blocks allocated to a table or index.
    • Tablespace: A logical grouping of data files. Tablespaces help to organize and manage database storage.
    • Dedicated Server Process: A server process dedicated to handling requests from a single user connection.
    • Multithreaded Server (MTS): A server architecture that uses a pool of shared server processes to handle multiple user connections.
    • Partitioning: A technique for dividing large tables and indexes into smaller, more manageable pieces called partitions.
    • Partition View: A logical view that combines data from multiple partitioned tables, providing a unified view of the data.
    • Data Pump: A high-speed utility for exporting and importing data in Oracle Database.
    • SQL*Loader: A command-line utility for loading data into Oracle Database from external files.
    • Read Consistency: A feature of Oracle Database that ensures that all data read during a transaction is consistent with the state of the database when the transaction started.
    • Data Dictionary: A collection of metadata tables and views that store information about the structure and contents of an Oracle database.
    • System Global Area (SGA): A shared memory area used by all Oracle processes to store database data and control information.
    • Program Global Area (PGA): A private memory area allocated to each Oracle server process for its own use.
    • SQL Tuning Advisor: A tool that analyzes SQL statements and recommends changes to improve their performance.
    • Automatic Workload Repository (AWR): A repository that stores historical performance data about an Oracle database.
    • Statspack: An older tool that collects and reports performance statistics for Oracle databases.
    • Wait Interface: A set of dynamic performance views that provide information about the wait events experienced by Oracle processes.

    Briefing Document: Oracle Database 12c Administration

    This document reviews key themes and insights from excerpts of “Beginning Oracle Database 12c Administration, 2nd Edition,” focusing on database architecture, administration, maintenance, and tuning.

    I. Database Architecture

    • Data Storage: Oracle databases utilize data files organized into tablespaces. Data within these files is structured into equal-sized data blocks, typically 8KB. An extent is a contiguous range of data blocks allocated to a table when it requires more space.
    • “The space within data files is organized into data blocks (sometimes called pages) of equal size… Each block contains data from just one table… When a table needs more space, it grabs a contiguous range of data blocks called an extent” (Chapter 2).
    • Server Processes: Oracle employs a dedicated server process for each user connection. This process handles tasks like permission checks, query plan generation, and data retrieval.
    • “A dedicated server process is typically started whenever a user connects to the database—it performs all the work requested by the user” (Chapter 2).
    • Memory Structures: The System Global Area (SGA) is a shared memory region crucial for database operations. It includes the database buffer cache for storing frequently accessed data blocks, the redo log buffer for transaction logging, and the shared pool for storing parsed SQL statements and execution plans.
    • Background Processes: Essential for database functionality, background processes include:
    • DBWn (Database Writer): Writes modified data blocks from the buffer cache to data files.
    • LGWR (Log Writer): Writes redo log entries from the redo log buffer to redo log files.
    • CKPT (Checkpoint): Synchronizes data files and control files with the database’s current state.
    • SMON (System Monitor): Performs instance recovery after a system crash and coalesces free space in tablespaces.

    II. Database Administration

    • SQL Language: Oracle utilizes SQL for both data manipulation (DML) and data definition (DDL). Railroad diagrams, often recursive, are used to explain the syntax and structure of SQL statements. Subqueries, particularly inline views and scalar subqueries, play significant roles in complex queries.
    • User Management: The CREATE USER statement creates new users, defining their authentication, default and temporary tablespaces, and initial profile. ALTER USER modifies user attributes like passwords and tablespace quotas. GRANT and REVOKE commands control access privileges on database objects.
    • “The CREATE USER statement should typically specify a value for DEFAULT TABLESPACE… and TEMPORARY TABLESPACE” (Chapter 8).
    • Data Loading: Oracle provides several methods for importing data:
    • SQL*Loader: A powerful utility for loading data from external files.
    • Data Pump Export (expdp) and Import (impdp): Introduced in Oracle 10g, these utilities offer features like parallelism, compression, and encryption for efficient data transfer.

    III. Physical Database Design

    • Partitioning: A technique for dividing large tables into smaller, manageable pieces. Different partitioning strategies include range, list, hash, composite, and reference partitioning. Partitioning enhances query performance, backup and recovery, and data management.
    • Indexes: Data structures that speed up data retrieval. B*tree indexes are commonly used in OLTP environments, while bitmap indexes are suitable for data warehousing.
    • “Most indexes are of the B*tree (balanced tree) type and are best suited for online transaction-processing environments” (Chapter 17).
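
    The two index types mentioned above are created with different DDL. A minimal sketch, with illustrative table and column names:

    ```sql
    -- B*tree index: the default type, suited to selective OLTP lookups.
    CREATE INDEX emp_last_name_ix ON employees (last_name);

    -- Bitmap index: suited to low-cardinality columns in data warehouses.
    CREATE BITMAP INDEX sales_region_bx ON sales (region);
    ```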

    IV. Database Maintenance

    • Backups: Regular backups are vital for data protection and recovery. RMAN (Recovery Manager) is Oracle’s recommended tool for performing backups and managing backup sets. Strategies include full, incremental, and cumulative backups.
    • Recovery: Techniques for restoring a database to a consistent state after failures. Options include:
    • Data Recovery Advisor (DRA): An automated tool for diagnosing and repairing database corruption.
    • Flashback Technologies: Allow for quick recovery from logical errors or unintentional data modifications.
    • LogMiner: Enables analysis of archived redo logs to recover specific data changes.
    • Space Management: Monitoring tablespace usage and free space is crucial. Techniques like segment shrinking and coalescing free space can help optimize storage utilization.

    V. Database Tuning

    • Performance Monitoring: Tools like Statspack, AWR (Automatic Workload Repository), and dynamic performance views provide insights into database performance.
    • Statspack: Collects performance snapshots for analysis.
    • “Note that Statspack is not documented in the reference guides for Oracle Database 10g, 11g, and 12c, even though it has been upgraded for all these versions” (Chapter 16).
    • AWR: A more comprehensive and automated performance monitoring framework.
    • SQL Tuning: Identifying and optimizing inefficient SQL statements is crucial for improving overall database performance. Techniques include index creation and tuning, hint usage, and utilizing the SQL Tuning Advisor.
    • Wait Interface: Analyzing wait events helps pinpoint performance bottlenecks. Common wait events like db file sequential read and log file sync provide clues for optimization.

    VI. Key Takeaways

    • Understanding Oracle’s architectural components is fundamental for effective administration.
    • Proper planning for licensing, hardware sizing, and configuration is essential for a successful deployment.
    • Regular maintenance tasks like backups, recovery drills, and space management ensure database health and data integrity.
    • Proactive performance monitoring and SQL tuning are critical for achieving optimal database performance.
    • Utilizing Oracle’s various tools and features like RMAN, Data Pump, and the SQL Tuning Advisor simplifies administrative tasks and enhances efficiency.

    Oracle Database Administration FAQ

    What are the different types of subqueries in Oracle SQL?

    There are three main types of subqueries:

    • Inline views: These are subqueries used in the FROM clause as a table reference. They act like temporary views within a larger query.
    • Scalar subqueries: These subqueries return a single value and can be used wherever a single value is expected, such as in a SELECT list or WHERE clause.
    • Correlated subqueries: These subqueries depend on values from the outer query and are executed repeatedly for each row of the outer query.
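
    The three subquery types can be illustrated as follows, using the HR-style employees and departments tables as an assumed schema:

    ```sql
    -- Inline view: the subquery in the FROM clause acts like a temporary table.
    SELECT d.department_name, t.avg_sal
    FROM   departments d,
           (SELECT department_id, AVG(salary) AS avg_sal
            FROM   employees
            GROUP BY department_id) t
    WHERE  d.department_id = t.department_id;

    -- Scalar subquery: returns a single value in the SELECT list.
    SELECT e.last_name,
           (SELECT MAX(salary) FROM employees) AS top_salary
    FROM   employees e;

    -- Correlated subquery: re-evaluated for each row of the outer query.
    SELECT e.last_name, e.salary
    FROM   employees e
    WHERE  e.salary > (SELECT AVG(i.salary)
                       FROM   employees i
                       WHERE  i.department_id = e.department_id);
    ```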

    How is space organized within Oracle data files?

    Space in data files is structured in data blocks, also known as pages. Each data file has a fixed block size (e.g., 8KB) defined at the tablespace level. A block holds data for a single table. To accommodate growth, tables claim a contiguous series of data blocks, forming an extent.

    What are the main types of server processes in Oracle?

    Oracle primarily uses two types of server processes:

    • Dedicated server processes: A dedicated server process handles requests for a single user connection. This is the typical model.
    • Shared server processes (Multithreaded Server – MTS): In this model, a pool of shared server processes handles requests from multiple users. This approach can be more efficient for environments with many concurrent but mostly idle connections.

    What are the different types of partitioning available in Oracle?

    Oracle offers several partitioning methods:

    • Range partitioning: Data is divided into partitions based on a range of values for a specific column, typically a date or number.
    • List partitioning: Partitions are created based on lists of discrete values for a specific column.
    • Hash partitioning: A hashing function distributes data across partitions, aiming for even data distribution.
    • Interval partitioning: This is an extension of range partitioning where new partitions are automatically created based on a defined interval.
    • Reference partitioning: This method partitions a child table based on the partitioning key of a referenced parent table.
    • Composite partitioning: This approach combines multiple partitioning methods, allowing for partitions to be further divided into subpartitions.
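
    The two most common methods above translate into DDL like the following sketch; the table names, columns, and partition bounds are illustrative:

    ```sql
    -- Range partitioning on a date column:
    CREATE TABLE sales (
      sale_id   NUMBER,
      sale_date DATE,
      amount    NUMBER
    )
    PARTITION BY RANGE (sale_date) (
      PARTITION p2023 VALUES LESS THAN (DATE '2024-01-01'),
      PARTITION p2024 VALUES LESS THAN (DATE '2025-01-01')
    );

    -- List partitioning on discrete region codes:
    CREATE TABLE customers (
      customer_id NUMBER,
      region      VARCHAR2(10)
    )
    PARTITION BY LIST (region) (
      PARTITION p_east VALUES ('NY', 'MA'),
      PARTITION p_west VALUES ('CA', 'WA')
    );
    ```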

    How can I export and import data in Oracle?

    Oracle provides multiple utilities for data export and import:

    • Data Pump (expdp and impdp): This is the preferred method in modern Oracle versions, offering features like parallelism, compression, and encryption.
    • Original Export/Import (exp and imp): Although less commonly used now, these utilities are still available and offer various options for data export and import.
    • SQL*Loader: This utility loads data from external files into Oracle tables, using a control file to define the data format and loading rules.

    What is the purpose of the Oracle Data Dictionary?

    The Data Dictionary is a collection of metadata tables and views containing information about the structure and objects within an Oracle database. It stores details about tables, indexes, users, privileges, and other database components. It is crucial for understanding the database’s structure and troubleshooting issues.

    What are some tools for monitoring an Oracle database?

    Several tools help monitor an Oracle database:

    • Oracle Enterprise Manager: A comprehensive suite with web-based interfaces for monitoring and managing various aspects of the database.
    • Statspack: A lightweight performance monitoring tool capturing snapshots of database activity for analysis.
    • Automatic Workload Repository (AWR): Built into the database, AWR automatically collects performance data and generates reports.
    • Dynamic Performance Views: Real-time views providing detailed information about database activity.
    • Third-party tools: Tools like Toad and DBArtisan provide extensive monitoring and management features.

    What are some techniques for tuning SQL queries in Oracle?

    Effective SQL tuning involves a multi-faceted approach:

    • Understanding the Execution Plan: Analyze the query plan to identify bottlenecks and areas for optimization.
    • Using Indexes Appropriately: Create and utilize indexes effectively to speed up data retrieval.
    • Rewriting Queries for Efficiency: Optimize query structure, consider using hints, and avoid unnecessary operations.
    • Collecting Statistics: Ensure up-to-date statistics are available for the optimizer to make informed decisions.
    • Using the SQL Tuning Advisor: Employ the advisor to identify and implement potential optimizations.
    • Considering Materialized Views: Pre-calculate and store query results to improve performance for frequently used complex queries.
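
    The first step, examining the execution plan, is commonly done with EXPLAIN PLAN and the DBMS_XPLAN package; the query itself is illustrative:

    ```sql
    -- Generate the optimizer's plan for a statement without running it:
    EXPLAIN PLAN FOR
      SELECT last_name FROM employees WHERE employee_id = 100;

    -- Display the most recently explained plan in readable form:
    SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
    ```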

    Oracle 12c Database Administration

    Timeline of Events:

    This text excerpt does not present a narrative with a sequence of events. Instead, it offers technical information and instructions related to Oracle Database 12c administration. The provided content focuses on aspects like:

    • SQL fundamentals: Introduction to SQL language, different types of SQL statements (DML and DDL), and the use of railroad diagrams for understanding SQL syntax.
    • Database Structure: Explanation of data files, tablespaces, data blocks, and extents within Oracle databases.
    • Server Processes: Description of dedicated server processes and the multithreaded server model.
    • Software Installation: Instructions for software installation including setting up iptables firewall rules.
    • Database Creation: Details about setting database parameters, data files, and tablespace sizes during database creation.
    • Physical Database Design: Exploration of different partitioning techniques like list, range, interval, hash, reference, and composite partitioning for efficient data organization.
    • User Management and Data Loading: Guidance on user creation, granting and revoking privileges, managing tablespaces, and using utilities like exp/imp and expdp/impdp for data loading and export.
    • Database Support: Introduction to data dictionary views and their importance in database administration, and brief mention of third-party tools.
    • Monitoring: Overview of monitoring database activity through alert logs, checking CPU and load average, understanding listener issues, and using tools like AWR and Statspack for performance monitoring.
    • Fixing Problems: Troubleshooting scenarios related to unresponsive listeners and data corruption using tools like DRA and RMAN.
    • Database Maintenance: Tasks like archiving, auditing, backups, purging, rebuilding, statistics gathering, and user management as part of regular database maintenance.
    • SQL Tuning: Understanding the role of indexes, interpreting query execution plans, and utilizing tools like SQL Tuning Advisor for optimizing SQL statement performance.

    Therefore, it’s not feasible to create a timeline based on the provided content.

    Cast of Characters:

    This technical text excerpt doesn’t feature individual characters in a narrative sense. It primarily focuses on technical concepts and instructions related to Oracle Database 12c administration.

    However, we can identify some key entities mentioned:

    • Oracle: The company developing and providing the Oracle Database software.
    • DBA (Database Administrator): The individual responsible for managing and maintaining the Oracle database.
    • Users: Individuals accessing and utilizing the Oracle database. Specific examples appear in the user management and data loading sections, such as the accounts “ifernand” and “hr” and the role “clerical_role.”

    Instead of character bios, we can highlight their roles:

    • Oracle: Provides the software, documentation, and support for Oracle Database.
    • DBA: Performs tasks like installation, configuration, security management, performance tuning, backup and recovery, and user management.
    • Users: Utilize the database for various purposes, depending on their assigned roles and privileges.

    This information clarifies the roles of entities involved in Oracle database administration, even though traditional character bios are not applicable in this context.

    Oracle Database Administration

    The most concrete aspect of a database is the files on the storage disks connected to the database host [1]. The location of the database software is called the Oracle home [1]. The path to that location is usually stored in the environment variable ORACLE_HOME [1]. There are two types of database software: server and client software [1]. Server software is necessary to create and manage the database and is required only on the database host [1]. **Client software is necessary to utilize the database and is required on every user’s computer. The most common example is the SQL*Plus command-line tool** [1].

    Well-known configuration files include init.ora, listener.ora, and tnsnames.ora [2]. Data files are logically grouped into tablespaces [2]. Each Oracle table or index is assigned to one tablespace and shares the space with other tables assigned to the same tablespace [2]. Data files can grow automatically if the database administrator wishes [2]. The space within data files is organized into equally sized blocks; all data files belonging to a tablespace use the same block size [2]. When a data table needs more space, it grabs a contiguous range of data blocks called an extent [2]. It is conventional to use the same extent size for all tables in a tablespace [2].

    Oracle records important events and errors in the alert log [3]. A detailed trace file is created when a severe error occurs [3]. Oracle Database administrators need to understand SQL in all its forms [4]. All database activity, including database administration activities, is transacted in SQL [4]. Oracle reference works use railroad diagrams to teach the SQL language [5]. SQL is divided into Data Manipulation Language (DML) and Data Definition Language (DDL) [5]. DML includes the SELECT, INSERT, UPDATE, MERGE, and DELETE statements [5]. DDL includes the CREATE, ALTER, and DROP statements for the different classes of objects in an Oracle database [5]. The SQL reference manual also describes commands that can be used to perform database administration activities such as stopping and starting databases [5].

    Programs written in PL/SQL can be stored in an Oracle database [6]. Using these programs has many advantages, including efficiency, control, and flexibility [6]. PL/SQL offers a full complement of structured programming mechanisms such as condition checking, loops, and subroutines [6].

    When you stop thinking in terms of command-line syntax such as create database and GUI tools such as the Database Creation Assistant (dbca) and start thinking in terms such as:

    • security management
    • availability management
    • continuity management
    • change management
    • incident management
    • problem management
    • configuration management
    • release management
    • and capacity management,

    the business of database administration begins to make coherent sense, and you become a more effective database administrator [7]. These terms are part of the standard jargon of the IT Infrastructure Library (ITIL), a suite of best practices used by IT organizations throughout the world [7].

    Every object in a database is explicitly owned by a single owner, and the owner of an object must explicitly authorize its use by anybody else. The collection of objects owned by a user is called a schema [8, 9]. The terms user, schema, schema owner, and account are used interchangeably [8].

    A database is an information repository that must be competently administered using the principles laid out in the IT Infrastructure Library (ITIL), including:

    • security management
    • availability management
    • continuity management
    • change management
    • incident management
    • problem management
    • configuration management
    • release management
    • and capacity management [10].

    The five commands required for user management are CREATE USER, ALTER USER, DROP USER, GRANT, and REVOKE [9].

    Form-based tools also simplify the task of database administration [11]. A workman is as good as his tools [11].

    Enterprise Manager comes in two flavors: Database Express and Cloud Control. Both are web-based tools. Database Express is used to manage a single database, whereas Cloud Control is used to manage multiple databases [12]. You can accomplish most DBA tasks—from mundane tasks such as password resets and creating indexes to complex tasks such as backup and recovery—by using Enterprise Manager instead of command-line tools such as SQL*Plus [12].

    SQL Developer is primarily a tool for software developers, but database administrators will find it very useful. Common uses are examining the structure of a table and checking the execution plan for a query [13]. It can also be used to perform some typical database administration tasks such as identifying and terminating blocking sessions [13].

    Remote Diagnostic Agent (RDA) is a tool provided by Oracle Support to collect information about a database and its host system. RDA organizes the information it gathers into an HTML framework for easy viewing [13]. It is a wonderful way to document all aspects of a database system [13].

    Oracle stores database metadata—data about data—in tables, just as in the case of user data. This collection of tables is called the data dictionary. The information in the data dictionary tables is very cryptic and condensed for maximum efficiency during database operation. The data dictionary views are provided to make the information more comprehensible to the database administrator [14].
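
    In practice, an administrator queries the data dictionary views rather than the underlying tables. A few representative queries (the grantee name is illustrative):

    ```sql
    -- Tables owned by the current user, with their tablespaces:
    SELECT table_name, tablespace_name FROM user_tables;

    -- All accounts visible to a DBA, with their default tablespaces:
    SELECT username, default_tablespace FROM dba_users;

    -- System privileges granted to a particular account:
    SELECT privilege FROM dba_sys_privs WHERE grantee = 'IFERNAND';
    ```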

    The alert log contains error messages and informational messages. The location of the alert log is listed in the V$DIAG_INFO view. The name of the alert log is alert_SID.log, where SID is the name of your database instance [15]. Enterprise Manager monitors the database and sends e-mail messages when problems are detected [16]. The command AUDIT ALL enables auditing for a wide variety of actions that modify the database and objects in it, such as ALTER SYSTEM, ALTER TABLESPACE, ALTER TABLE, and ALTER INDEX [16]. The AUDIT CREATE SESSION command causes all connections and disconnections to be recorded [16]. Recovery Manager (RMAN) maintains detailed history information about backups. RMAN commands such as list backup, report need backup, and report unrecoverable can be used to review backups. Enterprise Manager can also be used to review backups [16].
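
    Two of the points above can be shown directly in SQL. This is a minimal sketch; V$DIAG_INFO and the AUDIT statement are standard Oracle features, but the exact auditing policy an installation needs will vary:

    ```sql
    -- Locate the diagnostic directory that holds the alert log:
    SELECT name, value FROM v$diag_info WHERE name = 'Diag Trace';

    -- Record all connections to and disconnections from the database:
    AUDIT CREATE SESSION;
    ```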

    Database maintenance is required to keep the database in peak operating condition. Most aspects of database maintenance can be automated. Oracle performs some maintenance automatically: collecting statistics for the query optimizer to use [17].

    Competency in Oracle technology is only half of the challenge of being a DBA. If you had very little knowledge of Oracle technology but knew exactly what needed to be done, you could always find out how to do it—there is Google, and there are online manuals aplenty [18]. Too many Oracle DBAs don’t know what to do, and what they have when they are through is “just a mess without a clue” [18].

    Any database administration task that is done repeatedly should be codified into an SOP. Using a written SOP has many benefits, including efficiency, quality, and consistency [19].

    The free Oracle Database 12c Performance Tuning Guide offers a detailed and comprehensive treatment of performance-tuning methods [20].

    Perhaps the most complex problem in database administration is SQL tuning. The paucity of books devoted to SQL tuning is perhaps further evidence of the difficulty of the topic [21]. The only way to interact with Oracle, to retrieve data, to change data, and to administer the database is SQL [21]. Oracle itself uses SQL to perform all the work that it does behind the scenes. SQL performance is, therefore, the key to database performance; all database performance problems are really SQL performance problems, even if they express themselves as contention for resources [21].

    Relational Databases and SQL

    A relational database is a database in which the data is perceived by the user as tables, and the operators available to the user are operators that generate “new” tables from “old” ones. [1] Relational database theory was developed as an alternative to the “programmer as navigator” paradigm prevalent in pre-relational databases. [2] In these databases, records were connected using pointers. To access data, you would have to navigate to a specific record and then follow a chain of records. [2] This approach required programmers to be aware of the database’s physical structure, which made applications difficult to develop and maintain. [3]

    Relational databases address these problems by using relational algebra, a collection of operations used to combine tables. [4] These operations include:

    • Selection: Creating a new table by extracting a subset of rows from a table based on specific criteria. [5]
    • Projection: Creating a new table by extracting a subset of columns from a table. [5]
    • Union: Creating a new table by combining all rows from two tables. [5]
    • Difference: Creating a new table by extracting rows from one table that do not exist in another table. [6]
    • Join: Creating a new table by combining rows from two tables based on matching column values. [6]
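
    Each relational operation above has a direct SQL counterpart. A sketch using assumed tables (employees, departments, current_staff, former_staff, contractors):

    ```sql
    -- Selection: rows meeting a condition.
    SELECT * FROM employees WHERE department_id = 10;

    -- Projection: a subset of columns.
    SELECT last_name, salary FROM employees;

    -- Union: all rows from two compatible tables.
    SELECT employee_id FROM current_staff
    UNION
    SELECT employee_id FROM former_staff;

    -- Difference (MINUS in Oracle): rows in one table but not the other.
    SELECT employee_id FROM current_staff
    MINUS
    SELECT employee_id FROM contractors;

    -- Join: rows combined on matching column values.
    SELECT e.last_name, d.department_name
    FROM   employees e
    JOIN   departments d ON e.department_id = d.department_id;
    ```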

    One of the significant advantages of relational databases is that they allow users to interact with the data without needing to know the database’s physical structure. [3] The database management system is responsible for determining the most efficient way to execute queries. [7] This separation between the logical and physical aspects of the database is known as physical data independence. [8]

    SQL (Structured Query Language) is the standard language used to interact with relational databases. [9] SQL allows users to perform various operations, including:

    • Retrieving data.
    • Inserting, updating, and deleting data.
    • Managing database objects such as tables and indexes.

    Despite its widespread adoption, SQL has been criticized for some of its features, including the allowance of duplicate rows and the use of nullable data items. [10, 11] However, SQL remains the most widely used language for interacting with relational databases, and it is an essential skill for database administrators. [11]

    SQL and PL/SQL in Oracle Databases

    SQL (Structured Query Language) is the primary language used to interact with Oracle databases, encompassing all database activities, including administration. [1] Database administrators need to be well-versed in SQL due to its extensive capabilities and functionalities. [1] The significance of SQL is evident in the sheer volume of the Oracle Database 12c SQL Language Reference, which spans nearly 2,000 pages. [1]

    SQL offers a powerful set of features, including:

    • Data Manipulation Language (DML): This subset of SQL focuses on modifying data within the database. DML statements include SELECT, INSERT, UPDATE, MERGE, and DELETE. [2, 3]
    • Data Definition Language (DDL): DDL statements handle the creation, modification, and removal of database objects, such as tables and indexes. Common DDL statements include CREATE, ALTER, and DROP. [2, 4]

    Oracle’s reference manuals utilize railroad diagrams to illustrate the syntax and numerous optional clauses of SQL statements. [5] These diagrams provide a visual representation of the structure and flow of SQL commands. [5] A notable aspect of railroad diagrams is their ability to incorporate subdiagrams and even reference themselves recursively, adding to the complexity and power of SQL. [6]

    PL/SQL (Procedural Language/SQL) extends the capabilities of SQL by providing procedural programming constructs within the Oracle database. [7] PL/SQL empowers developers to create sophisticated programs that interact with the database, leveraging features such as:

    • Condition checking: Implementing decision-making logic within PL/SQL programs. [7]
    • Loops: Enabling repetitive execution of code blocks for efficient processing. [7]
    • Subroutines: Encapsulating reusable code segments for modularity and code organization. [7]

    One of the prominent applications of PL/SQL is the creation of triggers, which automatically execute predefined actions in response to specific database events. [7] For instance, the HR schema employs a trigger to log historical job changes whenever the job_id in the employees table is modified. [8] Triggers enhance data integrity, security, and auditing capabilities within the database. [9]
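    A minimal sketch of such a trigger is shown below. It is simplified for illustration: the HR schema's real trigger delegates to a stored procedure, and the exact column list here is assumed:

    ```sql
    CREATE OR REPLACE TRIGGER log_job_change
      AFTER UPDATE OF job_id ON employees
      FOR EACH ROW
    BEGIN
      -- Record the job the employee is leaving in the history table.
      INSERT INTO job_history (employee_id, start_date, end_date, job_id, department_id)
      VALUES (:OLD.employee_id, :OLD.hire_date, SYSDATE, :OLD.job_id, :OLD.department_id);
    END;
    /
    ```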

    Storing PL/SQL programs within the database offers several advantages, such as:

    • Enhanced efficiency: Reduced communication overhead between client and server, resulting in improved performance. [9]
    • Improved control: Streamlined enforcement of business rules through triggers. [9]
    • Increased flexibility: Empowering SQL statements with the added power and versatility of PL/SQL functions. [9]

    The combined capabilities of SQL and PL/SQL make them essential tools for Oracle database administrators, enabling them to manage data, enforce rules, and optimize database operations effectively.

    Database Backup and Recovery Strategies

    Database backups are crucial for protecting against data loss due to user error, operator error, or hardware failure. Backups are essentially snapshots of a database or a portion of a database taken at a specific point in time. If a database is damaged, these backups can be used to restore it to a functional state. Additionally, archived logs, which contain records of all transactions performed on the database, can be used in conjunction with backups to replay modifications made after the backup was created, ensuring a complete recovery. [1]

    Determining the appropriate backup strategy requires careful consideration of various factors, including the business needs, cost-effectiveness, and available resources. Several key decisions need to be made: [2]

    • Storage Medium: Backups can be stored on tape or disk. Tapes offer advantages in terms of cost and reliability, while disks provide faster access and ease of management. A common approach is to create backups on disks initially and then copy them to tapes for long-term storage. [2-4]
    • Backup Scope: Full backups capture the entire database, while partial backups focus on specific portions, such as changed data blocks or read-only tablespaces. [5]
    • Backup Level: Level 0 backups are full backups, while level 1 backups, also known as incremental backups, only include data blocks that have changed since the last level 0 backup. This approach balances backup frequency with resource consumption. [6]
    • Backup Type: Physical backups create exact copies of data blocks and files, while logical backups represent a structured copy of table data. Logical backups are generally smaller but cannot be used to restore the entire database. [7]
    • Backup Consistency: Consistent backups guarantee a point-in-time representation of the database, while inconsistent backups may contain inconsistencies due to ongoing modifications during the backup process. The use of redo logs can address inconsistencies in physical backups. [8]
    • Backup Mode: Hot backups, or online backups, allow database access and modifications during the backup operation, while cold backups, or offline backups, require the database to be unavailable. [9]
    • Backup Management: Oracle-managed backups utilize Recovery Manager (RMAN), which offers numerous advantages such as ease of use, history data storage, and advanced features like incremental backups and corruption detection. User-managed backups employ alternative methods, such as snapshot technology, which can be integrated with RMAN for enhanced capabilities. [10-12]
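
    As an illustration of how some of these decisions combine, a simple level 0/level 1 strategy could be scripted for RMAN along these lines (a sketch, not a complete production script):

    ```
    # Weekly base backup: a level 0 (full) incremental backup.
    BACKUP INCREMENTAL LEVEL 0 DATABASE;

    # Nightly: a level 1 incremental, copying only blocks changed since level 0.
    BACKUP INCREMENTAL LEVEL 1 DATABASE;

    # Review the backup history RMAN has recorded.
    LIST BACKUP SUMMARY;
    ```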

    Recovery, the process of repairing a damaged database, often follows a restore operation, which involves replacing damaged or missing files from backup copies. Different types of recovery cater to specific situations: [13, 14]

    • Full Recovery: Restoring the entire database to a functional state. [14]
    • Partial Recovery: Repairing only the affected parts of the database without impacting the availability of other parts. [14]
    • Complete Recovery: Recovering all transactions up to the latest point in time. [15]
    • Incomplete Recovery: Intentionally stopping the recovery process at a specific point in time, often used to reverse user errors. [15]
    • Traditional Recovery: Using archived redo logs to replay transactions. [16]
    • Flashback Recovery: Utilizing flashback logs to quickly unwind transactions, offering faster recovery times than traditional methods. [16]

    Data Recovery Advisor (DRA) simplifies the database repair process by automating tasks and providing recommendations. By analyzing failures and generating RMAN scripts, DRA streamlines the recovery process for DBAs. [17]

    Testing recovery procedures is crucial for ensuring their effectiveness and validating backup usability. RMAN offers the DUPLICATE DATABASE command, allowing DBAs to create a copy of the database for testing purposes without affecting the live environment. [18]

    Documenting recovery procedures in standard operating procedures (SOPs) is vital for consistent and efficient execution, especially in stressful situations. SOPs should outline the steps involved in backups, recovery, and other critical database management tasks. [18, 19]

    Database Performance Tuning: A Five-Step Approach

    Database performance tuning is a critical aspect of database administration, aimed at optimizing the database’s efficiency and responsiveness in handling workloads. Tuning involves a systematic approach to identify performance bottlenecks, analyze their root causes, and implement solutions to improve overall performance.

    One of the primary focuses of database tuning is on DB time, which represents the total time the database spends actively working on user requests. Analyzing DB time allows administrators to pinpoint areas where the database is spending excessive time and identify potential bottlenecks. The Statspack and AWR reports provide comprehensive insights into DB time distribution across various database operations, helping to isolate performance issues. [1, 2]

    A widely recognized method for database tuning is the five-step approach, encompassing: [1, 3]

    1. Define the problem: This crucial initial step involves gathering detailed information about the perceived performance issue, including specific symptoms, affected users, and any recent changes in the environment that might have contributed to the problem. Accurately defining the problem sets the foundation for effective investigation and analysis.
    2. Investigate the problem: Once the problem is clearly defined, a thorough investigation is conducted to gather relevant evidence, such as Statspack reports, workload graphs, and session traces. This step aims to delve deeper into the problem’s nature and collect data for analysis.
    3. Analyze the collected data: The evidence collected during the investigation is scrutinized to identify patterns, trends, and potential root causes of the performance issue. For example, examining the “Top 5 Timed Events” section of a Statspack report can reveal specific database operations consuming significant DB time. [4]
    4. Solve the problem: Based on the analysis, solutions are formulated to address the identified performance bottlenecks. This step may involve adjusting database configuration parameters, implementing indexing strategies, optimizing SQL queries, or considering hardware upgrades.
    5. Implement and validate the solution: The proposed solutions are implemented in the database environment, and their impact on performance is carefully monitored and validated. This step ensures the effectiveness of the implemented changes and verifies the desired performance improvements.

    Tools like Statspack and AWR play a crucial role in database performance tuning, providing rich data for analysis and insights into database behavior. These tools offer comprehensive reports, customizable queries, and historical data collection, enabling DBAs to track performance trends over time and identify areas for improvement. [1] SQL Developer, another essential tool, enables DBAs to examine table structures, check the execution plan for queries, and even pinpoint blocking sessions that may be hindering performance. [5, 6]

    Database tuning often involves addressing various factors contributing to performance issues. Some common areas of focus include:

    • I/O Performance: Optimizing disk I/O operations can significantly impact database performance. Techniques may involve using faster disks, configuring RAID arrays for optimal performance, or tuning the database buffer cache to minimize disk reads. [7]
    • Memory Management: Efficient memory allocation and utilization are essential for database performance. Tuning may involve adjusting the sizes of the shared pool, buffer cache, and other memory structures to optimize resource allocation. [8, 9]
    • SQL Tuning: SQL queries that consume excessive resources can significantly degrade performance. SQL tuning involves analyzing query execution plans, identifying inefficient operations, and optimizing queries through techniques like indexing, rewriting queries, or using hints to influence the optimizer’s choices. [10-12]
    • Contention: When multiple processes compete for the same resources, such as locks or latches, performance can suffer. Identifying and resolving contention issues may involve optimizing application logic, tuning concurrency settings, or implementing appropriate locking strategies.
    • Workload Management: Analyzing and managing the database workload can help distribute resources effectively and prioritize critical operations. Techniques may include scheduling resource-intensive tasks during off-peak hours, implementing resource limits, or using features like Oracle Resource Manager to control resource allocation.
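
    For the SQL-tuning case, a common first step is to look at a statement's execution plan. In Oracle this might look like the following (the query itself is illustrative, using HR-style tables):

    ```sql
    EXPLAIN PLAN FOR
      SELECT e.last_name, d.department_name
      FROM   employees e
      JOIN   departments d ON e.department_id = d.department_id
      WHERE  e.salary > 10000;

    -- Display the optimizer's chosen plan for the statement above.
    SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
    ```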

    Monitoring database performance is an ongoing process, crucial for detecting and addressing performance degradation proactively. Tools like Enterprise Manager provide dashboards and alerts, enabling DBAs to stay informed about database health and performance metrics. By regularly reviewing performance data and identifying trends, DBAs can anticipate potential issues and take corrective actions before they impact users. [13-15]

    Effective database performance tuning requires a deep understanding of database concepts, available tools, and a systematic approach to problem-solving. By leveraging these elements, DBAs can ensure that databases operate optimally, meeting the performance demands of their users and supporting business objectives.

    By Amjad Izhar
    Contact: amjad.izhar@gmail.com
    https://amjadizhar.blog

  • SQL Full Course for Beginners (30 Hours) – From Zero to Hero

    YouTube Video


  • Backend Full Course NodeJS ExpressJS PostgreSQL Prisma & Docker Full Stack Backend and Database Development

    The text is from a programming tutorial focused on building a backend application with Node.js and related technologies. It guides the learner through creating server endpoints, handling HTTP requests (GET, POST, PUT, DELETE), and managing authentication. The tutorial covers setting up a database (SQLite and PostgreSQL), using an ORM (Prisma), and containerizing the application with Docker. Emphasis is put on building a full-stack application, managing user data, and securing endpoints using middleware and JSON Web Tokens (JWT). The process begins with a simple server and scales up to a production-ready application. Specific tasks include creating REST APIs, interacting with databases, and deploying the application in isolated environments.

    Back-End Server Study Guide

    Quiz

    Answer each question in 2-3 sentences.

    1. What is a callback function in the context of the listen function for a server?
    2. Why is it important to kill the execution of the server during development?
    3. Explain the purpose of npm run dev in the context of the source material.
    4. What is a developer dependency, and how is it installed using npm?
    5. What is the significance of “localhost:8383” (or a similar address) in server development?
    6. Explain the difference between HTTP verbs (e.g., GET, POST, PUT, DELETE) and routes/paths in server requests.
    7. Explain the difference between the “require” syntax and “import” syntax used for adding a javascript package.
    8. What is an environment variable and why is it useful in server configuration?
    9. What is an ORM and why is it useful?
    10. What is a Docker container and what is it used for?

    Quiz Answer Key

    1. A callback function is a function passed as an argument to another function (in this case, listen), to be executed after the first function has completed its operation. In the context of the server’s listen function, the callback is executed once the server is up and running, usually to log a message indicating that the server has started.
    2. Killing the server execution during development is important to reflect changes made to the server files. Without restarting the server, the changes won’t be implemented, and debugging becomes difficult.
    3. npm run dev is a command defined in the package.json file to start the server using a script, often involving tools like Nodemon. This automates the server startup process and can include additional commands beyond just running the server file.
    4. A developer dependency is a package needed only during development, not in production. It is installed using npm install --save-dev <package_name>, which adds the package to the devDependencies section of package.json.
    5. “localhost:8383” is the address (URL) used to access the server running on the local machine. localhost refers to the local machine’s IP address, and 8383 specifies the port number the server is listening on for incoming requests.
    6. HTTP verbs define the action the client wants to perform (e.g., GET to retrieve data, POST to send data to create a resource, PUT to update a resource, DELETE to remove a resource). Routes/paths are the specific locations (URLs) on the server where these actions are directed (e.g., /, /dashboard, /api/items).
    7. The “require” syntax is the older CommonJS way of loading a JavaScript package, written as const express = require('express'). The “import” syntax is the more modern ES module equivalent: import express from 'express'.
    8. An environment variable is a key-value pair stored outside the application code, often in an .env file or system settings, used to configure the application’s behavior. They’re useful for storing sensitive information (like API keys or database passwords) and for configuring different environments (development, production).
    9. An ORM is an Object Relational Mapper, a tool that allows developers to interact with a database using an object-oriented paradigm. It simplifies database interactions by mapping database tables to objects, reducing the need to write raw SQL queries.
    10. A Docker container is a lightweight, standalone, executable package that includes everything needed to run a piece of software, including the code, runtime, system tools, system libraries, and settings. It ensures consistency and portability across different environments.

    Essay Questions

    1. Discuss the evolution of server setup throughout the source material. Compare and contrast using node server.js, using npm scripts, and using Nodemon. What are the advantages and disadvantages of each approach?
    2. Explain the role and implementation of middleware in the context of authenticating users for specific routes (e.g., to-do routes). How does the middleware intercept and process incoming requests, and what actions does it take based on the request’s authentication status?
    3. Describe the process of setting up and interacting with a database using SQL queries. Explain the purpose of each table, the columns within each table, and the relationships between the tables.
    4. Explain the process of containerization with Docker. Be sure to discuss Dockerfiles and Docker Compose, and describe the benefits of using Docker containers for application development and deployment.
    5. Discuss the importance of security in back-end development as illustrated in the source material. Describe the techniques used to protect user passwords and to authorize users to access certain data.

    Glossary of Key Terms

    • Port: A virtual communication endpoint on a device that allows different applications or services to send and receive data over a network.
    • Callback Function: A function passed as an argument to another function, to be executed after the first function has completed its operation.
    • npm (Node Package Manager): A package manager for Node.js that allows developers to easily install, manage, and share JavaScript packages and libraries.
    • Script (package.json): A set of commands defined in the package.json file that can be executed using npm run <script_name>.
    • Developer Dependency: A package required only during development, not in production, and specified using the --save-dev flag during installation.
    • Localhost: The standard hostname given to the address of the local computer.
    • URL (Uniform Resource Locator): A reference to a web resource that specifies its location on a computer network and a mechanism for retrieving it.
    • IP Address: A numerical label assigned to each device connected to a computer network that uses the Internet Protocol for communication.
    • HTTP Verb: A method used to indicate the desired action to be performed on a resource (e.g., GET, POST, PUT, DELETE).
    • Route/Path: A specific location (URL) on the server that corresponds to a particular resource or function.
    • Endpoint: A specific URL on the server that represents a particular function or resource, and that listens for incoming network requests.
    • REST (Representational State Transfer): An architectural style for designing networked applications, based on transferring representations of resources.
    • API (Application Programming Interface): A set of rules and specifications that software programs can follow to communicate with each other.
    • JSON (JavaScript Object Notation): A lightweight data-interchange format that is easy for humans to read and write, and easy for machines to parse and generate.
    • Environment Variable: A variable whose value is set outside the application code, often in an .env file or system settings, used to configure the application’s behavior.
    • Middleware: Functions that intercept and process incoming requests before they reach the final route handler.
    • bcrypt: A password-hashing function that is used to securely store passwords.
    • JWT (JSON Web Token): A compact, URL-safe means of representing claims to be transferred between two parties.
    • ORM (Object-Relational Mapping): A technique that lets you query and manipulate data from a database using an object-oriented paradigm.
    • Docker Container: A lightweight, standalone, executable package that includes everything needed to run a piece of software, including the code, runtime, system tools, system libraries, and settings.
    • Dockerfile: A text document that contains all the commands a user could call on the command line to assemble an image.
    • Docker Compose: A tool for defining and running multi-container Docker applications.
    • SQL (Structured Query Language): A standard language for accessing and manipulating databases.

    Backend Development: Node, Express, PostgreSQL, and Docker

    Briefing Document: Backend Development Concepts and Project Setup

    Overview:

    This document summarizes key concepts and steps involved in setting up and developing a backend application, using Node.js, Express, and transitioning from SQLite to PostgreSQL with Prisma. It covers topics such as server initialization, routing, middleware, database management, authentication, and containerization using Docker. The main focus is on creating a to-do list application with authentication and data persistence.

    Key Themes and Ideas:

    1. Server Initialization and Basic Routing (Chapter 2 & 3):
    • The initial setup involves creating a Node.js server using Express.js to listen for incoming network requests on a specified port.
    • A simple server can be created with minimal code (after requiring Express and creating the app):

    ```js
    const express = require('express');
    const app = express();

    const port = 8383;

    app.listen(port, () => {
      console.log(`Server has started on Port ${port}`);
    });
    ```

    • npm scripts (defined in package.json) are used to manage server startup and development processes.
    • nodemon automatically restarts the server upon file changes during development, improving the workflow: install it with npm install nodemon --save-dev, then adjust the scripts in package.json to use nodemon server.js.
    2. Handling HTTP Requests and Responses:
    • Servers need to be configured to interpret incoming requests, including HTTP verbs (GET, POST, PUT, DELETE) and routes/paths (e.g., /, /dashboard).
    • The server uses a callback function for each route to handle the request and send an appropriate response.
    • A 404 error indicates that the server could not find a route that matches the requested URL.
    • The server can send back files, such as index.html, to serve a website to the client.
    3. Project Structure and Modularization:
    • Organizing the project into folders like routes, middleware, public, and src for better code management and separation of concerns.
    • The public folder contains static assets (CSS, HTML, JavaScript) that are served to the client.
    • The routes folder contains separate files for handling different types of routes (e.g., API routes, website routes).
    • Using middleware for handling authentication and other request processing tasks.
    4. Modular Syntax and Package Management:
    • Adopting the newer JavaScript import syntax (import Express from 'express') instead of the older require syntax.
    • Configuring the package.json file with type: "module" to enable the new syntax.
    5. Database Management (Chapter 3):
    • Using SQLite as a simple SQL database for storing user data and to-do items.
    • Creating database tables (e.g., users, to-dos) with specific columns and data types using SQL commands.
    • SQL databases use “tables”, much like spreadsheet sheets, for managing different kinds of data.
    • Example: CREATE TABLE users (id INTEGER, username TEXT UNIQUE, password TEXT)
    • The primary key is used to enable communication between tables (e.g., associating a to-do item with a user).
    • Using database.execute() to execute SQL commands.
    6. Authentication and Security (Chapter 3):
    • Implementing user registration and login functionality.
    • Encrypting passwords using bcrypt to protect user data: “bcrypt has all the code for encrypting the passwords and creating a truly secure application”.
    • Generating JSON Web Tokens (JWT) for user authentication.
    • Using middleware to verify JWTs and protect routes that require authentication.
    7. Client-Side Emulation and Testing:
    • Using a REST client (e.g., a VS Code extension) to emulate browser network requests and test backend endpoints.
    • Defining different emulations for various functionalities, such as registering a user, logging in, and creating to-dos.
    8. Transitioning to PostgreSQL and Prisma (Chapter 4):
    • Upgrading from SQLite to PostgreSQL for better scalability and reliability in a production environment.
    • Using “Prisma as an ORM to interact with our PostgreSQL database as if it were a JavaScript entity”.
    • Prisma simplifies database interactions and provides additional advantages.
    9. Dockerization (Chapter 4):
    • Containerizing the backend application using Docker for easy deployment and portability.
    • Using a Dockerfile to define the steps for building a Docker image for the Node.js application; a “Dockerfile is essentially an instruction sheet for creating one Docker container”.
    • Using docker-compose.yml to orchestrate multiple Docker containers (e.g., the Node.js server and the PostgreSQL database).
    • Defining environment variables and port mappings in the docker-compose.yml file.
    • Using volumes to persist data and configuration settings across container restarts.
    10. Prisma Schema and Migrations (Chapter 4):
    • Defining the database schema using Prisma’s schema language (e.g., schema.prisma).
    • Using Prisma migrations to manage changes to the database schema over time.

    Code Snippets and Examples:

    • Creating a table in SQLite:

    ```js
    database.execute(`
      CREATE TABLE users (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        username TEXT UNIQUE,
        password TEXT
      )
    `);
    ```

    • Encrypting a password with bcrypt:

    ```js
    const hashedPassword = bcrypt.hashSync(password, 8);
    ```

    • Signing a JWT:

    ```js
    const token = jwt.sign({ id: result.lastInsertRowID }, process.env.JWT_SECRET, { expiresIn: '24h' });
    ```

    • Verifying a JWT in middleware:

    ```js
    jwt.verify(token, process.env.JWT_SECRET, (err, decoded) => {
      // ...
    });
    ```
    • Docker Compose (docker-compose.yml):

    ```yaml
    version: "3"
    services:
      app:
        build: .
        container_name: todo-app
        environment:
          DATABASE_URL: "postgresql://postgres:password@database:5432/todos?schema=public"
          JWT_SECRET: "your_jwt_secret_here"
          NODE_ENV: development
          PORT: 5003
        ports:
          - "5003:5003"
        depends_on:
          - database
        volumes:
          - .:/app
      database:
        image: postgres:13-alpine
        container_name: postgres-db
        environment:
          POSTGRES_USER: postgres
          POSTGRES_PASSWORD: password
          POSTGRES_DB: todos
        ports:
          - "5432:5432"
        volumes:
          - db_data:/var/lib/postgresql/data
    volumes:
      db_data:
    ```

    Conclusion:

    The source material covers a comprehensive guide to backend development, starting from basic server setup to advanced concepts like database management, security, and containerization. The progression from SQLite to PostgreSQL with Prisma, and the introduction of Docker, represents a significant shift towards production-ready backend applications. The key takeaway is the importance of structuring code, managing dependencies, and implementing security measures to build robust and scalable backend systems.

    Server-Side Development: Key Concepts and Practices

    ### 1. What is a port in the context of server development, and why is it important to define one?

    A port is a virtual communication endpoint on a device that allows different applications to listen for and receive network requests. Defining a port is crucial because it tells the server exactly where to listen for incoming connections. Common choices in development include 3000 and 8000, but any unused port number (up to 65535) can be used. Without a defined port, the server wouldn’t know where to “listen” for requests, and clients wouldn’t be able to connect to it.

    ### 2. What is a callback function in JavaScript, and how is it used in the context of creating a server?

    A callback function is a function that is passed as an argument to another function, to be executed at a later time. In server creation, a callback function is often used with the `listen` function. This callback is executed when the server is successfully started and is listening for incoming requests. It can be used to log a message to the console, indicating that the server is running and on which port.
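
    The mechanism in miniature (a sketch only; the real `listen` binds a network port before invoking the callback):

    ```javascript
    // A function that accepts a callback and invokes it once its own work is done.
    function listen(port, callback) {
      // ...imagine the server binding to the port here...
      callback();
    }

    let message = '';
    listen(8383, () => {
      message = 'Server has started on Port 8383';
    });

    console.log(message); // Server has started on Port 8383
    ```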

    ### 3. Why is it beneficial to use `npm` scripts for running a server, and how do they work?

    Using `npm` scripts, defined in the `package.json` file, offers a structured and repeatable way to run server commands. They allow you to define shortcuts for complex commands, making it easier to start, stop, or restart the server. `npm` scripts work by defining a key (e.g., “dev”) in the “scripts” section of `package.json`, and assigning a command string to that key (e.g., “node server.js”). To run the script, you use the command `npm run [key]`, which executes the associated command.

    ### 4. What is `nodemon`, and why is it used as a developer dependency?

    `nodemon` is a tool that automatically restarts the server whenever changes are made to the code. It’s used as a developer dependency (installed with `npm install --save-dev nodemon`) because it significantly improves the development workflow by eliminating the need to manually restart the server after each code modification. It’s not needed in production because the code shouldn’t be constantly changing.

    ### 5. What is the difference between a URL and an IP address, and how do they relate to a server?

    A URL (Uniform Resource Locator) is a human-readable address that points to a specific resource on the internet, often a server. An IP address (Internet Protocol address) is a numerical label assigned to each device connected to a computer network that uses the Internet Protocol for communication. Every URL is mapped to an IP address, allowing browsers to locate the server. A URL is easier for humans to remember, while the IP address is the actual address used for network communication.

    ### 6. What are HTTP verbs (methods) and routes (paths), and how are they used to handle incoming network requests?

    HTTP verbs (e.g., GET, POST, PUT, DELETE) define the action a client wants to perform on a resource. Routes (or paths) specify the specific location or “endpoint” on the server that the client is trying to access (e.g., “/”, “/dashboard”, “/api/users”). The server is configured to listen for specific HTTP verbs on specific routes. When a request arrives, the server examines the verb and route to determine how to handle the request and what action to perform.
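
    The verb-plus-route pairing can be sketched as a small lookup table in plain JavaScript (a toy illustration; a framework like Express does this dispatching for you, and the routes and handlers here are hypothetical):

    ```javascript
    // A toy router: each "VERB path" key maps to a handler function.
    const routes = {
      'GET /': () => 'home page HTML',
      'GET /dashboard': () => 'dashboard HTML',
      'POST /api/items': () => 'item created',
    };

    // Look up the endpoint formed by the HTTP verb plus the route.
    function handle(verb, path) {
      const handler = routes[`${verb} ${path}`];
      return handler ? handler() : '404 Not Found'; // no matching endpoint -> 404
    }

    console.log(handle('GET', '/dashboard')); // dashboard HTML
    console.log(handle('DELETE', '/nope'));   // 404 Not Found
    ```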

    ### 7. What is an environment variable, and why are they used in server-side applications?

    An environment variable is a key-value pair that stores configuration information outside of the application’s code. They are used to store sensitive information like API keys, database passwords, and other settings that might vary between development, testing, and production environments. Storing these values in environment variables keeps them secure and allows you to change configurations without modifying the application’s code.
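
    In Node, this means reading from `process.env`, often with a development fallback. A minimal sketch (the variable name `TODO_APP_PORT` is hypothetical, for illustration only):

    ```javascript
    // Resolve a config value from the environment, falling back to a default.
    // TODO_APP_PORT is a made-up variable name used only for this example.
    function envOr(name, fallback) {
      return process.env[name] !== undefined ? process.env[name] : fallback;
    }

    const port = envOr('TODO_APP_PORT', 8383);
    console.log(`Server will listen on port ${port}`);
    ```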

    ### 8. Explain the purpose and organization of the file structure created for a more sophisticated backend application (e.g., `src`, `routes`, `middleware`, `db.js`, `public`, `.env`, and `docker-compose.yaml`).

    This structure aims to separate concerns and improve code organization. Here’s a breakdown:

    * **`src`**: Contains the source code of the application.

    * **`routes`**: Holds files that define the different API endpoints and their associated logic (e.g., `auth-routes.js`, `todo-routes.js`).

    * **`middleware`**: Contains functions that intercept incoming requests and perform tasks like authentication or data validation before the request reaches the route handlers (e.g., `authMiddleware.js`).

    * **`db.js`**: Contains the logic for connecting to and interacting with the database, including the SQL queries.

    * **`public`**: Stores static assets like HTML, CSS, and JavaScript files that make up the front-end of the application. These files are served directly to the client.

    * **`.env`**: Stores environment variables (sensitive configuration information).

    * **`docker-compose.yaml`**: Defines the configuration for running multiple Docker containers together, such as the application server and the database server.
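
    The sources do not show the file's contents; as an illustration, a `docker-compose.yaml` wiring an application container to a database container might look like the following (image names, ports, and variable names are hypothetical placeholders):

```yaml
services:
  app:                        # the Node.js application server
    build: .
    ports:
      - "3000:3000"
    env_file: .env            # inject environment variables from .env
    depends_on:
      - db
  db:                         # the database server
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: ${DB_PASSWORD}
    volumes:
      - db-data:/var/lib/postgresql/data

volumes:
  db-data:
```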

    Routes, Endpoints, and HTTP Verbs: A Server-Side Guide

    Here’s a discussion of routes and endpoints, based on the provided source:

    • A route can be understood as a specific path or destination within a server that is set up to listen for incoming network requests. In the context of a URL, the route is the sub-portion that directs the request to a specific area. For example, in the URL http://www.youtube.com/dashboard, /dashboard would be the route.
    • HTTP verbs, also referred to as methods, describe the action that the network request intends to perform. Common HTTP verbs include:
    • GET: Used to retrieve information
    • POST: Used to create new data
    • PUT: Used to update existing data
    • DELETE: Used to delete data
    • Endpoints are created within the server-side application using HTTP verbs and routes. An endpoint is a specific location within the server that listens for incoming network requests matching a given verb and route, and that executes a body of logic to handle the request and respond appropriately.
    • Together, the route and the HTTP verb define the endpoint. For instance, a GET request to the /home route would define one endpoint, while a POST request to the /api/data route would define another. The verb conveys the nature of the request, and the route narrows it to a specific path within the server.
    • Types of endpoints:
    • Website endpoints: These send back HTML and are accessed when a user enters a URL in a browser.
    • API endpoints: These do not send back a website, but rather facilitate behind-the-scenes operations, like saving data when a user clicks “save”. They might send back data in JSON format.
    • Middleware can be used to handle authentication between the client and server side.
    • Client emulators, such as the REST client in VS Code, can be used to emulate the process of sending out a network request and testing endpoints.
    • Express.js is a framework that can be used to configure servers to listen to and interpret incoming requests.
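
    To make the website-versus-API endpoint distinction above concrete, here is a small illustrative sketch: two handler functions, one returning HTML for the browser and one returning JSON for front-end code. The routes and data are invented; in practice these would be Express route handlers.

```javascript
// Website endpoint handler: returns HTML for the browser to render.
function handleDashboardPage() {
  return { contentType: 'text/html', body: '<h1>Dashboard</h1>' };
}

// API endpoint handler: returns JSON data for front-end code to consume.
function handleUsersApi() {
  return {
    contentType: 'application/json',
    body: JSON.stringify({ users: ['ada', 'grace'] }),
  };
}

// A GET to /dashboard would invoke the first; a GET to /api/users the second.
console.log(handleDashboardPage().contentType); // text/html
console.log(handleUsersApi().contentType); // application/json
```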

    Database Interactions: Storage, Management, and Security

    Here’s a discussion of database interaction, as described in the sources:

    • Databases store data from user interactions. When a user performs an action like saving data, a network request is sent to the back end, which may then send another network request to the database to persist the information.
    • Backend applications are set up to listen for incoming network requests and can then interact with a database to store, retrieve, update, or delete data.
    • SQL (Structured Query Language) commands are used to interact with databases. These commands can create tables, define columns, and establish relationships between tables.
    • CRUD (Create, Read, Update, Delete) actions are the four basic operations that can be performed on data in a database. These actions correspond to specific HTTP verbs:
    • Create corresponds to POST
    • Read corresponds to GET
    • Update corresponds to PUT
    • Delete corresponds to DELETE
    • When building an application, it is important to configure endpoints to manage authentication and to ensure data is properly saved to the database when a new user registers.
    • Prepared statements can be used to safely insert values into SQL queries, preventing SQL injection vulnerabilities.
    • ORMs (Object-Relational Mappers) provide an abstraction layer that allows developers to interact with databases using a more object-oriented approach.
    • Migrations provide a way to manage changes to the database schema over time, ensuring that all instances of the database are updated to the most recent version.
    • Docker allows containerization of databases, making it easier to deploy and manage them.
    • Directly logging into the database via the command line provides the ability to modify it directly using SQL queries.
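
    To see why prepared statements matter, compare naive string concatenation with placeholder-based parameter passing. This is an illustrative sketch in plain JavaScript; the query-building functions are hypothetical, standing in for what a driver like `pg` does when you pass a query text and a values array separately.

```javascript
// UNSAFE: splicing user input directly into the SQL text.
function unsafeQuery(username) {
  return `SELECT * FROM users WHERE name = '${username}'`;
}

// SAFE: a prepared statement keeps SQL and values separate; the driver
// sends the query template and the parameters to the database independently.
function preparedQuery(username) {
  return { text: 'SELECT * FROM users WHERE name = $1', values: [username] };
}

const malicious = "x'; DROP TABLE users; --";

// The unsafe version now contains a second, destructive statement:
console.log(unsafeQuery(malicious));
// SELECT * FROM users WHERE name = 'x'; DROP TABLE users; --'

// The prepared version never mixes the input into the SQL text:
console.log(preparedQuery(malicious).text);
// SELECT * FROM users WHERE name = $1
```

    The malicious input rides along as an inert parameter value rather than being interpreted as SQL, which is exactly the injection protection prepared statements provide.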

    Authentication and Tokens: A Concise Guide

    Here’s a discussion of authentication and tokens, based on the provided sources:

    • Authentication is the process of verifying the identity of a user or client attempting to access a protected resource.
    • Authentication middleware can intercept network requests to verify the token associated with the request and ensure that the user is authorized to access the requested resource.

    Here are key aspects of authentication and tokens:

    • Hashing: Instead of storing passwords verbatim, they are hashed using libraries such as bcrypt. Bcrypt generates a hash for a given string, enhancing security: the original password cannot be recovered from the hash, and bcrypt salts each hash so that two users with the same password do not end up with the same stored value. At login, the submitted password is hashed and compared against the stored hash.
    • Tokens:
    • After successful registration or login, the server generates a unique token (typically a JSON Web Token or JWT) and sends it back to the client.
    • The client then stores this token (e.g., in local storage or cookies) and includes it in the headers of subsequent network requests.
    • JSON Web Tokens (JWT) are a standard for securely transmitting information between parties as a JSON object.
    • JWT Authentication: This involves the use of a JSON token as a secure credential associated with a user, used to authenticate them on subsequent requests without requiring repeated sign-ins.
    • Token validation:
    • When a network request is made to a protected endpoint, the server’s middleware extracts the token from the request headers.
    • The server then uses a secret key to verify the authenticity and integrity of the token.
    • If the token is valid, the middleware grants access to the requested resource. If the token is invalid or missing, the middleware rejects the request and returns an error response.
    • Token expiration: Tokens typically have an expiration time, after which they are no longer valid and the user must re-authenticate.
    • Environment variables: Secret keys used for token verification (e.g., JWT_SECRET) are stored as environment variables to prevent them from being exposed.
    • Benefits of tokens: They allow a user to authenticate once and then access protected resources for a specified period without needing to re-enter their credentials.

    Client Emulation: Testing Server Endpoints and Network Requests

    Here’s a discussion of client emulations, based on the provided sources:

    • Client emulators are tools that mimic the behavior of a client (such as a web browser) in order to test server endpoints and ensure they are functioning correctly.
    • A client emulator emulates the process of sending out a network request.
    • REST clients, such as the REST client in VS Code, are a commonly used type of client emulator. These tools allow developers to define and send HTTP requests to specific URLs and inspect the responses returned by the server.
    • Client emulators can be used to test different types of endpoints, including website endpoints and API endpoints.
    • When using a client emulator, it is possible to specify various parameters of the network request, such as the HTTP verb (e.g., GET, POST, PUT, DELETE), headers, and body.
    • Client emulations can be created to mirror user actions. This is analogous to running tests in JavaScript or another programming language.
    • To create a client emulation, you would specify the HTTP verb, URL, headers, and any data to be sent in the body of the request.
    • Client emulators can be particularly useful when developing APIs, as they allow developers to test endpoints and verify that they are functioning correctly without having to build a full user interface.
    • By predefining interactions, developers can emulate functionalities as if a user were using an application.
    • The process involves thinking of everything a user could possibly do and then creating those actions programmatically. This ensures that everything is working adequately.
    • Client emulations can be used to test authentication flows, such as user registration and login.
    • The REST client in VS Code uses a .rest file extension to emulate browser network requests.
    • The REST client separates all the template Network requests using triple pound signs (###).
    • The REST client displays the response from the server, including the status code, headers, and body.
    • Content type is a parameter of the network request that can be configured.
    • The authorization token is encoded into the request so that middleware can intercept and interpret it, and consequently authorize the client.
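
    As an illustration, a `.rest` file for the VS Code REST Client might look like the following, with the template network requests separated by triple pound signs (the URL, routes, body fields, and token value are hypothetical placeholders):

```http
### Register a new user (API endpoint, JSON body)
POST http://localhost:3000/auth/register
Content-Type: application/json

{ "username": "ada", "password": "correct-horse" }

### Fetch the user's todos, authorized with the token returned above
GET http://localhost:3000/todos
Authorization: Bearer <paste-token-here>
```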

    Server Configuration: Back End, Ports, and Middleware

    Here’s a discussion of server configuration, based on the provided sources:

    • Back end requests: The back end is all of the code that goes on behind the scenes to facilitate user experience of the internet. The back end consists of external devices all around the world that communicate through a network via network requests. These network requests are encoded with all the information that allows communication to happen and for both parties to contribute to someone’s experience of the internet.
    • Listening for requests: A back end application must be set up to listen to incoming Network requests; otherwise, no websites will load. The back end is just hardware running software that is connected to the internet and that listens to incoming requests to its IP address.
    • Full stack interaction: The moment the network request leaves a computer and goes into the network, everything on the other end of that equation is the back end. The full stack is the front end, which is on the client side, and the back end, which happens server side.
    • Ports: To tell an app to listen, one parameter that must be provided is a port, a numbered sub-address within the device's IP address that identifies which application should receive the incoming traffic.
    • Middleware:
    • Middleware is part of configuring a server.
    • It is configuration that is set in between an incoming request and the interpretation of that request.
    • It can be thought of as a protective layer.
    • A common type of middleware is authentication middleware which handles all of the authentication between the client and the server side.
    • File organization:
    • The specs file contains all the specifications for a project.
    • Modern project directories should contain source code, which is all the code that creates an application.
    • The server should be the hub of the application.
    • Node.js:
    • Node.js is a Javascript runtime.
    • With the experimental features available in the later versions of Node.js, the server reboots automatically when changes are saved.
    • Express.js:
    • Express.js is a minimalist web framework for Node.js.
    • It is commonly used to build back end applications.
    • Docker:
    • Docker allows containerization of applications.
    • Docker is an alternative to having software installed on a computer.
    • Environment Variables:
    • Environment variables are a storage of keys and values; the key is the lookup term, and the value is a potentially secret string of characters that needs to be referenced for the configuration of a project.
    • Any top-secret information goes in the .env file, which is kept out of version control so that it is not uploaded, for example, to GitHub.
    • File types:
    • .js is the Javascript file extension.
    • .rest or .http are extensions used for client emulators.
    • .env files are for environment variables.
    • .yaml files are used for configuration “instruction sheets”, such as docker-compose.yaml.
    • Setting up a server:
    • Setting up a basic server only takes about four lines of code.
    • The code must define a variable called Express and set it equal to the Express package.
    • The code defines the back end application.
    • The code configures the app and tells it to listen to incoming requests.
    Backend Full Course | NodeJS ExpressJS PostgreSQL Prisma & Docker

    The Original Text

    hello there my name is James and welcome to this backend full course where we’re going to go from being complete amateurs at backend development to absolute pros building all sorts of amazing backend applications for the internet now I am very excited to have this course live on the channel because I personally know how hard it can be to learn the art of backend development I went through that on my learn-to-code journey and now I’m very excited to have this course available because ever since that experience I’ve wanted to create a course that does three things first it’s super beginner friendly which means that even if you have absolutely no experience with backend development you will be able to complete this course we’ll start from the very beginning from scratch and build up from there the second thing the course does is it teaches you all of the core concepts and foundational knowledge you need to know all of the best practices the latest and greatest technologies that you need to know to go off and become these supreme wizards of backend development so that should be super cool and last but absolutely not least if you get to the end of the course you will be left with a portfolio of projects of the caliber needed to get you hired as a full stack developer backend developer software dev you name it these projects in your GitHub will knock the socks off prospective employers your family and friends and it will just be loads of fun now the course itself is broken down into four chapters chapter one is a theory lesson where I want you to sit back and open up your brain to the universe as I share with you some theory about how the internet works what the full stack is and consequently what the backend is what we can expect from it and how we can actually go about coding out some backend infrastructure now this is not for you to sit down and memorize everything that I’m saying or take some notes it’s just an exercise in gaining some familiarity with some of the
concepts that we will then put into practice in the latter three chapters in chapter 2 project number one we’re going to build a very rudimentary backend application that just demonstrates some of these core concepts in code form pretty simple doesn’t look the best but you know it serves a purpose as it allows us to dive into the last two projects which is super cool project number two chapter 3 is a complete backend application it’s kind of like a quick start backend application where we develop a very comprehensive backend server and have complementary database interactions we’ll be using the SQLite database which is a super lightweight fast-to-get-up-and-running database very popular maybe not so much for production but if you’re just looking to get your backend application up and running it is a great choice so we’ll learn how we can develop a backend application that serves up a front end uses authentication and has database interactions and then in the last project we’ll take that code base to the next level god-tier mode we’re going to be using Postgres for the database we’re going to be using an ORM which is an object-relational mapper and that’s going to be Prisma we’ll serve up a front end we’ll handle all of the authentication and database and then at the end we’ll dockerize all of these backend services so that we have this containerized application it should just be absolutely wicked a brilliant code base to have on your GitHub page for once again prospective employers to have a look at and be like yes this is the person we want to hire to develop our backend infrastructure now as for the prerequisites what do you need to know to complete the course well the list is pretty short all you need to know is some JavaScript if you’re looking for a course to brush up on your JavaScript I’ve got one linked in the description down below but everything else you need to know in this course will be taught to you so you just need some pretty you know
reasonable JavaScript skills and you will be absolutely sweet getting to the end of the course and last but not least what do you do should you get stuck at any point well I’ve got you covered there too first up we have some cheat sheet notes you can just keep these open if you want they’re available for JavaScript and they just cover all the basic JavaScript techniques that you should be aware of linked in the description down below and as for any questions or queries you may have if you head over to the GitHub page for this project that has all of my code there that you can look at and compare with and for all your questions you can head over to the issues tab click on issues and then just write your question and either myself or someone else will respond and help you understand whatever it is that you may have been struggling with so that should be absolutely wicked finally if you want to support the channel you can obviously become a channel member and unlock the Discord where I’m super active so you could ask any questions there too at the end of the day it should be an absolutely wicked course I’ve been so excited to release it I’ve worked on it for ages so proud of this material and I hope you thoroughly enjoy it and with that all said if you do enjoy the course don’t forget to smash the like and subscribe button so that I can continue to feed Doug a healthy diet all right it’s time to dive into chapter one which is a theory lesson about how the internet actually works now as I said in the intro I don’t want you to take any notes I just want you to sit back get comfortable and be a sponge for the information that I am about to share with you you don’t have to memorize all of it I just want you to be familiar with some of the terms and concepts so that when I refer back to them later in this course you can be like I know exactly what’s going on here and the first concept I’m going to introduce you to is known as the full stack now when you open up your
computer load up a browser and come to YouTube essentially the programming that goes into creating that experience is known as the full stack it’s the overarching body the culmination of all of these individual puzzle pieces working together to create that experience it’s kind of like a burger a burger is the end product just like you loading up youtube.com that’s the end product you experience YouTube and a burger has a whole lot of subcomponents that come together to create this experience of enjoying a burger and that’s the same with YouTube a programmer or lots of programmers have sat down and worked together creating all of these puzzle pieces that come together to create your experience of YouTube and as a collective they are referred to as the full stack it’s the full stack of things coming together to create that experience now the full stack can be broken down into two primary components one is known as the front end and the other is known as the back end obviously this is a backend course but to understand what the back end does we also have to understand what the front end is responsible for now the front end kind of is summarized in three core concepts one is the user you using YouTube are a user we’ve all got experience being a user the second concept is known as the client the client is the medium through which you interact with the internet in most cases it’s a browser so Google Chrome would be the client through which you interact with the internet whether that’s entering a URL or clicking a save button on a website that’s all a client side experience because it’s happening on your device it’s on the side of the equation that is associated with the user or the client now the last term that comes together to create the front end is actually the front end itself what is the front end well at the end of the day it’s pretty much just the website so when you load up youtube.com at its core that’s some HTML CSS and JavaScript that your browser
runs to create this visualization of the website and that is referred to as the front end so if the client is the medium through which you interact with the internet the front end is the legitimate interface that you can interact with to have this internet experience and so that is the side of the equation that is the user side it’s on the user’s device and that collectively creates the front end it’s the tangible side of this full stack experience now at the same time when you load up YouTube you’re probably connected to the internet what’s going on in the background to facilitate this experience well the answer is a lot of stuff there is so much magic going on behind the scenes to create your front-end experience that without it you just wouldn’t be able to enjoy the internet and that’s what this course is all about how can we program these systems now the way that I want to describe how the backend works is actually by step-by-step explaining to you what happens when you open up your computer load up a browser and enter www.youtube.com and in a split second get that website displaying on your screen and essentially how it actually works at a very technical level is that when you type in this URL http://www.youtube.com and hit enter your browser sends out what’s known as a network request now we can imagine that your computer doesn’t have every single website on the internet saved on it so that means that when you load these websites your browser is actually having to request these websites from an external network now there’s loads of examples of what a network could be it could be a cellular network for mobile phones it could be a Wi-Fi network there’s lots of different examples of networks but essentially your browser when you type in and hit enter on this URL your browser emits this electromagnetic wave that is encoded with the information that your browser needs access to so it is encoded with the URL the network request has this URL saved into it
and then it emits this network request into the network where the URL actually locates a destination now a lot of us might actually think of the internet as being this ethereal thing that exists around us and you know we magically pull the information out of the air when we type in these URLs but actually it’s slightly more complex than that the URL a lot of people might not know this but it actually is an address so when you type in http://www.youtube.com your network request is directed to an address in the network and that address doesn’t actually locate a website it actually locates another device connected to the internet now every device connected to the internet has what’s known as an IP address and that is its metaphysical address in the network so when we encode these addresses into the network request you know the network is set up to navigate and ultimately locate these devices now a URL is just a human friendly address so every device connected to the internet has an IP address not all of them have URLs but when you type in a URL that gets converted to an address via what’s known as the DNS which is the Domain Name System the domain is the sub portion of the URL so in the case of http://www.youtube.com the domain is youtube.com that gets converted into an IP address which is a sequence of numbers not very easy to remember which is why we have a URL and that’s where these network requests get directed to they get directed to these addresses these IP addresses in the network which are corresponding to another device connected to the network now that doesn’t really explain how you end up with a website on your screen but we will get there these devices at these IP addresses are set up to listen to incoming network requests so when we develop servers like we will do in this course we set them up we connect them to the internet and we set them up to listen to incoming network requests to their IP addresses now when a server receives our network request
which is asking for a website if you’ve entered the YouTube URL www.youtube.com it can then decode that network request see that oh this individual is looking for the HTML code to load up the YouTube homepage and it can then go to its little database where it stored all of this code or read some files that are available on this external device and it can then encode them into a response so these network requests that these servers receive also have a return-to-sender address so they interpret the intent of the network request they do any actions that they need to and then they respond with the appropriate information data or service and that all happens in the split second that it takes to load a website like youtube.com you hit enter on that URL your browser emits the network request it’s encoded with all of this information that describes the intent of that network request which in the case of entering a URL is to retrieve a website or gain access to a website it hits this server and then the response is sent back across the network as literally electromagnetic waves whether or not it’s through a fiber optic cable or you know the air your browser receives this response that is encoded with all of the appropriate information in the case of a website it’s HTML code it then interprets the HTML code and displays it on your screen and you get the website now that is a full stack interaction where the moment the network request leaves your computer and goes into you know the ether or the network everything on the other end of that equation is the back end the back end is all of that code that goes on behind the scenes potentially on other devices all around the world to facilitate your experience of the internet if we didn’t have backend applications set up to listen to these incoming network requests there would be no websites to load it would just be you know you’d send out a network request into oblivion and you would never get any response back and so
that’s what we’re going to be coding in this course today there is so much you can do with backend applications and we’re going to get a really solid understanding of some of the most core operations that you can expect from backend applications now there’s two more things I just want to throw on this theory lesson at the very end number one is what the front end actually does I’ve talked about that entering a URL sends out this network request to gain us access to the HTML code to display the website well the website itself is typically just an interface to make sending out these network requests even easier so when you hit save the same operation occurs to persist that data in a database a network request is sent out saying save this information to the database the server listening for these incoming network requests receives the network request and interprets that oh the user wants to save the data it then might even send out another network request to the database with that information and that then gets persisted in the database so the backend can be you know a whole lot of separate servers interconnected to facilitate your experience of the internet and the other thing I wanted to point out now is that you know nowadays we have a lot of modern solutions for developing backend applications such as cloud infrastructure or serverless backend infrastructure all of this is just more servers running more code set up to listen to incoming network requests so at the end of the day the full stack is just the front end which is on the client side on your computer where the user interacts with it they get access to a front end and the back end is everything that happens server side where the server is all of these external devices all around the world and they communicate through a network via network requests which are encoded with all the information that allows that communication to happen and for both parties to contribute to someone’s experience of the internet now I
know that’s a lot of information to swallow once again you don’t have to remember all of it just being familiar with the concept of a backend okay it’s this external device that’s connected to the internet that’s listening to incoming requests sent to its IP address a URL is just an IP address that’s written in a human friendly form the client side is everything that happens on the user’s device the client is the browser the front end is the website and they are the user just being familiar with these core concepts will really help give you a solid foundational understanding of all of the decisions we make throughout this course and just make you that much better of a backend programmer anyway that’s the theory lesson over well done on getting to the end of chapter 1 and with that all said it’s time to get our hands dirty with some code as we dive into our first project in chapter 2 of this backend full course all righty you know what they say with the theory done the fun may now begun that’s a good one for you now it’s time to dive into chapter 2 which is our first practical introduction to backend development and at this point we’re going to quickly introduce some of the technology you will need you probably already have it installed on your device to complete this backend full course there’s three particular installations you will be needing number one is a code editor we’re going to be using VS Code on my device the link to download Visual Studio Code is available in the description down below you can select your operating system and install it you’ve probably used it before if you know JavaScript but this is obviously what the window looks like when you open it up something a bit resembling this you might have a different theme but yes you will need Visual Studio Code a place to write all of the JavaScript and build out our backend infrastructure now the second installation is going to be Node.js where we have JavaScript we write JavaScript as
a set of instructions we need what is known as a JavaScript runtime a runtime interprets and executes your instructions and the runtime we’re going to be using is Node.js we are going to install it on our device and the link to download is available in the description down below you can either install it via package manager or pre-built installer once again choose your operating system as for the version I’d recommend the LTS version or current version we’re going to learn how we can meddle with the versions later in this course so at the end of the day it’s not that important and the last installation is going to be Docker now if you’re unfamiliar with Docker essentially what Docker does is it allows you to containerize your applications now the reason Docker is brilliant is because the code that we write is often dependent on a particular operating system and what Docker allows you to do is containerize your application and create a set of instructions for this container which is just a virtual environment that can be consistently run across all sorts of system architectures or operating systems and it just means that you don’t run into issues where one person can’t run your code but someone else can or you go to deploy your code and it’s complicated because you are you know locked into a particular operating system so Docker allows you to wrap your application inside of a virtual environment and then just define the set of instructions to configure the virtual environment and you’re absolutely sorted brilliant technology looks brilliant on a resume and this will come into play in chapter 4 our final project for this full course once again the link to install Docker is available in the description down below and when you boot it up you should end up with a window that looks a little bit like this now that we have all our installations done it’s time to jump into the code so what I’m going to do to begin is open up a blank window
of Visual Studio Code. With my blank window open, what we're going to do from within here is select this Open button and come to a folder — personally, this is where I keep all of my coding projects. In here you're going to create a new folder called backend-full-course. I've already created my folder, so I'm just going to go ahead and select it. Once you have created or selected that folder, select Open, and that opens the folder inside of Visual Studio Code — this is where our projects in this course are going to go. Over on the side here we can confirm that we are in the correct directory, because if we open up our Explorer it says backend-full-course, and I already have one folder in here: the chapter 1 folder, which contains a document that is just the written version of the theory lesson. If you want to refer back to that at any point, you can find this chapter 1 folder and download it to your project directory, or just refer to it in the GitHub repo, which is linked in the description down below — open up the chapter 1 directory and look for the theory markdown file. Totally not necessary; it's just there if you want it.

Now, the first thing we're going to do in here is create a folder for our chapter 2 project. I'm going to right-click and create a new folder — equally, you can use these action buttons right here — and I'm just going to call it chapter2. If I hit enter on that, it's created, and it's within this folder that we're going to house our very first backend project. When you're first getting started with a backend project using Node.js and JavaScript, there are a number of ways to go about configuring it. Typically, when developing backend code in JavaScript, we like to take advantage of what is known as the node package manager (npm) ecosystem — essentially a whole lot of different packages, or code libraries, that we can very easily gain access to and use inside our codebase to save us having to do everything from the very beginning. Super standard practice if you're developing backend applications in Node.js and JavaScript, and we're going to see how to do it — but essentially it requires us to initialize our project as a node npm project.

To do that is super simple. The first thing we're going to do is open up a terminal instance inside of Visual Studio Code; I like to use the Ctrl+backtick keys to toggle a terminal instance, just like that. If you want to know all the key shortcuts I use inside of VS Code to speed up my coding, there is a link to a website that explains all of them in the description down below, including the command to toggle your terminal — you can also open up a terminal instance from the folder options just as easily. With this terminal instance open, the first thing we note is that we are in the backend-full-course project directory; we actually want to be inside of chapter2, so the first command we're going to type in here is
the cd, or change directory, command, and then we just specify the folder we want to enter, which is the chapter2 directory. If I hit enter on that, we can see that the directory in our terminal has been updated to chapter2, so we're officially inside our folder, and this is where we're going to initialize our project. To be able to initialize a project, the first thing we have to do is ensure that we have Node installed on our device correctly, and there's a very easy way to check: you type node -v (the -v flag) into your terminal and hit enter, and if you have successfully installed Node.js on your device, you should get a version popping up right here. If you receive any type of error, that means your installation is either incomplete or incorrect. So that's just one thing we need to do to confirm that Node is installed. Once again, you don't have to worry about which version of Node you have installed; we're going to learn how we can meddle with the Node version later in this course — that will become important in chapters 3 and 4.

Now that we've confirmed Node is working, we should also be able to run the npm -v command, which will confirm that you have access to node package manager on your computer. Then what we can do is run the npm init command with the -y flag. This command is going to initialize our Node.js backend project inside of our chapter2 directory. If I go ahead and hit enter, we get an output in our terminal: we're told that we wrote to a particular file — the package.json file — and we're shown the JSON that was written to it. So now, if we come over to what was originally an empty chapter2 folder, we can see that we have a file inside called package.json. What is this file? It's a JSON file,
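Collected in one place, the terminal steps just described look like this (a sketch — the version numbers on your machine will differ):

```shell
# confirm Node.js and npm are installed correctly
node -v      # prints a version such as v20.x.x if the install worked
npm -v       # prints the npm version

# initialize the project; the -y flag accepts every default prompt
npm init -y  # writes a default package.json to the current directory
```

Run these from inside the chapter2 directory; if `node -v` prints an error instead of a version, the installation is incomplete or incorrect.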
so it’s basically just a glorified string object it essentially just qualifies exactly what our project is all about it’s kind of like a Specs or specification file so up here we can see we have some Fields the name of the project the version you can modify the version when you you know publish your project to production we can give our project a description so we could say a simple backend ser Ser underneath that we have a main that’s not really relevant to us we have a scripts field this is going to be very relevant to us we’ll see exactly how later and then we just have some other fields that at the moment aren’t really that important but the morel of the story is that this file is going to contain all the specifications for our project now I’m going to go ahead and create another file inside of this folder so I’m going to select that folder create a new file and this is going to be called server. JavaScript or. JS which is the JavaScript file extension now if I go ahead and click enter on that server.js that’s going to initialize that Javascript file and we can see in here I can open it up type a whole lot of code it’s all sorts of JavaScript uh and this is where for this project we are going to create our server application our backend project now when it comes to creating server side applications or backend applications inside of node.js a common framework or package that is used available within the node package manager ecosystem is called Express now you technically don’t need Express to create a Javascript file and set it up to listen to incoming Network requests and act as a server however it is infinitely easier using a package like Express because Express is designed specifically for that purpose as we can see here it’s a fast unopinionated minimalist web framework for node.js so we’re going to be using that inside of this course it’s incredibly common you’ll find most big uh Enterprise level backend applications built out of nodejs will use express or 
an equivalent package to basically allow them to build these backend applications so what we need to do now that we have a server is add this package to our project and we can see that you know in this website right here for this package they tell us exactly how we go about doing that and we just run this npm install Express command inside of our terminal so if I come back over to the code specifically the terminal what I’m going to do down here is clear everything out and I’m now going to run npm install which is how we install packages from the npm or node package manager ecosystem and I’m going to install the Express package so if I hit enter on that that’s going to go ahead and download everything we need to utilize that package inside of our project so here we can see it’s added a whole lot of packages in a very short time span and we can also see that a whole bunch of files and folders have been added to our chapter 2 project directory now we obviously had the server.js and we originally had a package.json after we initial ized our nodejs project if we come into the package.json we can see that one thing has changed we now have this dependencies field and within the dependencies field we have the Express package listed this is super important because once again our specs for the project need to specify what code packages our project is dependent on hence we list Express as a dependency we also specify what version we use for our project once again it’s not too important if your version isn’t exactly equal to mine as long as they’re approximately the same the code should all be equivalent now the other reason this dependencies field is important is because if someone else downloads your code base they need to be able to install all the necessary packages to run your code just as equally if you were to publish this to a production environment a live environment for the internet when you configure that environment it needs to know what dependencies to install to 
get your project up and running. As for where these packages have been downloaded to, everything is within the node_modules folder. If we open that up, it's got a whole lot of files and folders in there, and we will not be touching any of them — all of those packages just get thrown into node_modules, and that just sits there and we don't need to do anything about it. We can see we also have this package-lock.json; that's another complicated file that we're really just not going to be touching — it's not one you want to meddle with. The files that are important to us, specifically for now, are the package.json and server.js.

Now, to initialize a server inside a Node.js (JavaScript) file using Express literally only takes about four lines of code, so it's incredibly easy to get a server up and running. That doesn't mean the server is complete, but it's a good start, and that's exactly what we're going to do now. To initialize a server using Express, the first thing we need to do is define a variable called express, and what we set it equal to is requiring in the express package. Essentially, this line of code requires the express package and assigns whatever is in that pre-existing code — that package — to this variable, so that we can use it all throughout our project. So that basically imports Express into our codebase right here. The next thing we need to do, now that we have access to Express, is define our backend application, and we do that very simply by defining a variable called app and setting it equal to invoking express as a function. That creates our backend application for us. Then the last step is we say app — and if you recall from our theory lesson, the backend is just hardware running software that is connected to the internet and listens for incoming requests to its IP address. So we have this server app, and the last thing we always do — this line goes at the very bottom of our code — is configure it and then tell it to listen for incoming requests; that's what this line does.

When it comes to telling an app to listen, one parameter we need to provide is known as a port, which is basically just a subdirectory within the IP address: the IP address is the address of the device, and the port is a location within that device. So what we're going to do in here is define a PORT variable — I'm going to use all uppercase — and I personally like to use 8383. Typically it's a four-digit number; some common ones are 3000, 8000, and so on. I just like 8383 — those are my lucky numbers. Now that we have this port, we can pass it in as an argument to the listen function, and we're saying: all right, Mr. Server App, I now want you to listen for incoming requests to this IP address, specifically at this port. There's one other argument we can pass to this listen function, and that is a callback function. So in here I'm going to create an arrow function — and once again, just a reminder, if your JavaScript needs a bit of brushing up, there is a course linked in the description down below — but anyway, in here we have an arrow function, a callback to be executed when our server is up and running, and all we're going to do inside is console.log — we're going to log something to the console. It's going to be a template literal string that says "Server has started on", and I'm going to use the dollar sign and curly braces to inject the PORT variable into the template literal. With all that done, these are the four lines of code we need to create a server that is officially listening for incoming requests over the internet. Obviously, the next thing we need to do is actually run this file, and there are a number of ways we could go about that. One is very simply to tell node to read the
server.js file and execute it. If I hit enter on that, we can see right here I get this output: "Server has started on port 8383". Absolutely brilliant — our server is up and running. One other thing to note is that we never finished the execution of that file; it's kind of stuck in limbo, and that will remain the case while our server is indefinitely listening for these incoming requests. So it's a continued execution of the file — or basically, we never ended the initial execution; it's still running, still listening for incoming network requests. That's pretty neat. Congratulations: with four lines of code you have officially created a server that is technically connected to the internet, listening for incoming requests. That is a solid backend application that doesn't actually do anything, but it's a start.

Now, what I'm actually going to do at this point is kill the execution of the server, and I'm going to do that using the Ctrl+C keys. There — I typed Ctrl+C, that killed the execution of the file, and I now have access to my terminal once again and can write some additional commands. In this project, though, that's not how we're going to boot up our server; we're going to do everything via npm and via this package.json — basically through the specs file. In here we have a field called scripts, and we're going to add a script, where a script is just a set of instructions to get our server up and running. The first thing you do to add a new script is give it a title, or key — in this case it's going to be called dev — and then I set it equal to a string, adding a comma to the end of the previous field to keep the JSON happy, and in here I insert that command: node server.js. That's just the same command we ran earlier, and if you were only choosing between these two boot-up approaches it wouldn't really matter which one you did, but it's good to get into the practice of doing it via the package.json and the script methodology, because occasionally these startup commands become a lot more intricate and complicated. So now I've added this simple dev script right here, and I'm going to save the package.json file. What we can do to boot up our server now is tell npm to run that particular script: we type npm run dev, and that executes the command — we can see here it checked what the command to be executed was, node server.js, and it booted up our server once again to the same outcome.

That's still not quite the way we're going to boot up our server throughout this course, though, and the main reason is that if I now come in here and change something about my server — let's say I console.log "this is an extra line" — and save it, what's happened? Absolutely nothing. To get that change reflected in the server execution, I would have to kill my file and boot it up again, and now we can see we get that extra line. But if you're regularly making changes to your files, it's super annoying to have to restart your server every single time. So what we're going to do is once more kill our server execution using the Ctrl+C keys, and we're going to install one more package called nodemon — that's n-o-d-e-m-o-n. Nodemon is going to be what's known as a developer dependency, which means it's not something you would use in production; it's specifically for development, because that's when we're making a mass amount of modifications to our files and need the server to regularly update. So we're going to install it slightly differently: we write npm install, then use the --save-dev flag, and then name the nodemon package. When I hit enter on this command, it installs the package — all that code gets added to node_modules — but now, when I come into the package.json, we can see that it hasn't been added to the dependencies field; it has been added to the devDependencies field instead. That means when we publish this code to production, it won't worry about installing dependencies that are specifically quality-of-life improvements for developing the code.

With nodemon installed as a developer dependency, we can slightly adjust this script: instead of just node, it's now nodemon server.js. Then we save the package.json, and with all of that done we can run npm run dev. If I hit enter on that, we can see once again we've got "this is an extra line" logging out, our server has started, and it's a continued execution — however, now when I come in here, remove this line, and save the file, our server is automatically restarted to reflect the changes in the code. That is going to be infinitely more convenient than constantly restarting our server every time we make adjustments, so that's absolutely brilliant. We have now defined the code to initialize our server, and we've got it set up to be really easy to work on, modify, update, and extend with all the functionality we need to complete this project. So now that I've created a server that is listening for incoming requests across a network, does that mean I have a functional server? Well, let's go ahead and find out. Earlier, in the theory lesson, I mentioned that one way we interact with these servers is via their address — which means that right now, this server connected to the internet has an address we can send network requests to. From a technical standpoint, the
address of this server connected to the network is localhost — it's http://localhost:8383. This is the address, or URL, that is mapped to the IP address locating this server on the network. Let's call this one the URL. I said earlier that every URL is mapped to an IP address, so the IP equivalent is this series of numbers right here. If we were to go to a browser and enter this URL or this IP address, both of them locate our server across the local network, and that would be valid — though technically the IP version would also have to include 8383, because we have to specify the port within the address of the device. So why don't we go ahead and try it? Once again, if you want to copy these, they're available inside the GitHub code, in chapter 2's server.js.

If I come over to my browser, I should technically be able to copy this URL, paste it in, and hit enter — but now we can see we actually get an error response that says "Cannot GET". What we can do from within a browser is right-click, click Inspect — which opens up the Chrome developer tools — and come over to the Network tab. The Network tab lets us see what the client side is doing; keep in mind that this is the client side and all of that is server side, even though technically they're on the same device for the sake of development. When we hit enter on that URL, the client sends out a network request across the local network that reaches this backend code, which consequently responds — and that's something we can track our browser doing from within this Network tab. So if I refresh this whole process and hit enter on this URL, we can see right here that a network request was emitted from our client, from our browser. If we take a look at the headers — which are basically the properties or parameters that specify the intent of the network request — we can see the request URL, which is the address, and a method, which is the verb that describes the action of the request. In this case it's GET: when you enter a URL into the browser, it's typically to get access to a website. So this network request has gone out into the local network, found our server, and told the server that it wants a website, but it has received a response that says 404. That's a status code — it describes what actually happened in the response — and at the end of the day the summary is "not found": cannot get that website.

So the question becomes: what's actually happening here? One of the keys, or answers, is this little slash right here, in addition to the GET. Essentially, we need to configure our server to interpret these incoming requests, and some of the things we need it to interpret are known as the HTTP verbs and the routes, or paths. In this case, right up here, this slash is known as the path, and this GET is the HTTP verb. As far as a URL goes, if I add a slash here, that slash is the home path; if I were to have a dashboard, that would be the /dashboard path; if I were to have an /auth path, that's obviously an auth path; and then we could have /auth/dashboard, and so on. These are all routes, or paths, and they are part of the request URL. That is — once again, in addition to how the port works — a further subdirectory to which we need to navigate these incoming requests, and then we can define actions at each of these endpoints. They can be referred to as endpoints: specific sub-regions within our server — which is listening to all of these incoming network requests — where we can direct the code to a specific endpoint and execute a body of logic to respond appropriately. That's a whole lot of words right there; you'll see how it works very
shortly. But the other thing we have to throw into the mix here is that, obviously, we have these routes, where once again by default the route is the slash — we can see that gives us the exact same response. The second thing we need to do is configure our routes to listen for, or interpret, these specific verbs, which help us further understand the intention of the network request. Just before we do that, one other thing I want to point out is that if we use the IP instead of the URL, we get the exact same response — the URL is simply converted to an IP address, so we can skip that step and use the IP address directly, though that's obviously a lot harder for a human to remember.

Now, I've said the next step is to add in the HTTP verbs and routes — but how does that work? Essentially, we need to configure our app for these verbs and routes. This is kind of like anticipating what a user is going to do. For example, I could write app and assume that when a user comes to my website, they want to get information — that's a pretty standard response — so we invoke this get method and configure it. The first argument passed into the get method is the route, so in here I'm going to use the slash route, because up here we're being told we cannot GET that default slash (home) route; now I'm going to configure our server to handle incoming GET requests to this home route. Then we define some logic to run when we get these incoming network requests that want to get information at the home route, and the way we do that is by providing a second argument to the get method: an arrow function that takes two arguments, a request and a response. Now that I have access to the request and the response, in here I can define some code to be executed when our server receives incoming GET requests to the slash (home) route. If we look at this network request right here, we can see that the method is GET. So, just to summarize: the method informs the nature of the request, and the route is a further subdirectory — basically, we direct the request to the body of code that will respond appropriately, and these locations, or routes, are called endpoints. Technically, this is endpoint number one, and it's the slash route. Once again, this might be a little confusing initially, but as we do it more and more it will become more apparent and obvious.

So now we have some code in here for the incoming GET requests — GET being one of the HTTP verbs that describe the intent of a request (a verb is an action word; get is to get information) — and these requests are directed to the slash endpoint within our server. In here, I'm going to have a console.log that says "yay I hit an endpoint", and I'm going to go ahead and save that. Now that nodemon has restarted our server, I'm going to hit enter on this network request again, and we can see this time we actually got a different response — we didn't get an error immediately (we will soon), and the website is still loading. If we come over to our console — that earlier log wasn't from this request — we have absolutely nothing inside our client-side console, but if we come over to our server-side code, we can see that we actually executed this console.log, which means the incoming network request was heard by our server and this callback function was executed. So in here we can now define a whole lot of code to handle that incoming request. One thing I want to point out once again is that the app needs to be configured first, and then we tell it to listen second — so the listen line always needs to be at the very bottom.

To summarize what I just said: this endpoint has officially received this request; the way we understand exactly what its intention is is via this request argument right here; and the way we respond is using this response argument. So in here I can console.log the request.method — the method is the HTTP verb, so that's pretty straightforward; that could be kind of fun. And to respond, we're going to call res and learn our very first response type. Some responses send back HTML code, some send JSON data, some send all sorts of stuff, some send straight-up strings — we're going to send a status code of 200 (I'll explain what that means in just a second). If we save that, our code restarts, and now if I refresh this page we no longer get an error; if I look at the Network tab, we can see that we get a 200 response right here, we do in fact get something back, and we get this little text element that just says "OK". That is brilliant — we've officially removed all the errors from our application, and we now have a full-stack interaction where we can send out network requests from the client; they reach our server, which is listening for incoming requests at its IP address specified right here; and it can interpret them by reading the intention, or verb, of the incoming request and also the route — the path, or endpoint destination — which in this case is the home, or slash, route. As for what the status codes mean: whenever you have a network request, there is a set of response codes, or status codes, that are basically a shorthand determination of the outcome of that initial request. Any response code at the 200 level — so 200 to 299 — basically suggests a successful request; in this case we got back an "OK", Roger that, absolutely sweet. 300-level responses are a bit less common, as are 100-level; the most common are the 200s, so you could have 200, 201, 202, 203, and so on.
then there’s 400 level responses so we saw earlier that we had a 404 that means not found and typically what that means is that there was an error in communication so anything 400 level is kind of an error in communication so 403 means that it’s a fidden request which means that you’re not authorized to do that uh 500 level requ requ ests mean that there was an error on the server side so we received your request and something went wrong so there’s a whole lot of different status codes that we can associate with the response now this line right here doing something with this resz is absolutely critical to Define how your server is meant to respond when an incoming request hits that particular end point or this body of code so for the minute what I’m going to do is just send back a 2011 one status in this case I’m going to go ahead and hit enter on that and we can see that gave us back a created response and that’s because the 201 status code says that you have created something we can see that information right here in that Network request we can see that the headers which specify the intent of the network request were to get some information at this IP address right here we can preview it doesn’t look like anything and we can see the response which is a created status now what happens if I type in/ dashboard well we get a very similar error to before where we cannot get the slash dashboard route there’s nothing there and if we look at this error code we can see that now it’s to the/ dashboard URL or route within our entire URL and we got back a 404 status code which means that we can’t find any so the way that we would have to go about configuring our server to receive incoming requests at this URL is by telling our app to listen to these get requests at this particular route so we can do that very easily just underneath right here we can say app.get we can pass in the/ dashboard route and then we can pass in this call back function receiving the request and 
response arguments and in here I can say console.log ooh now I hit the slash dashboard endpoint uh and what I can also do is res. send I’m not sending a status in this case I’m just going to send the string high now just before I save that one other thing I want to point out is that here we consoled get and that’s the request. method so that’s an example of how we can interpret these requests we’re listening for these incoming requests we can see what the method is we can do all sorts of stuff to understand what exactly is contained within this incoming request anyway we’ve now defined a second endpoint it’s a get endpoint at the/ dashboard route and it’s going to console some different text and it’s just ultimately going to send back a response which is a string that says hi so if I go ahead and save that our code our server restarts and if I now rehit enter on that URL we can see that we get back the response high so that is super cool we can see that the default status code of this response was 200 so in this case we didn’t specify a status but it just defaulted to 200 cuz it was a successful request all the information is okay this is the URL that we sent it out to the headers are basically the properties or parameters of the network request we got back a 200 level response there’s loads of information in here and we can preview the response that says hi and we can look at it and it’s just a string the other cool thing you can do is you can typically uh see exactly how long it took so up here I believe that would be the length of time taken to complete that uh that response which about 40 milliseconds which is very fast and just like that we have officially defined two endpoints where both of these endpoints use the get HTTP verb and we’ve seen how we can navigate or direct these incoming requests to certain end points or routes and consequently how we can respond back to them using the express Jazz framework so that’s super cool and now the next steps are to 
The next step is to tidy up the code we have here so that it resembles a more traditional web server. One thing worth clarifying first: when you send out a network request from the client via the URL bar, the default method is GET. And HTTP verbs are the same thing as methods — the method is just the action — and together the verb and the route create the endpoint, which we can think of as literally the end destination of that network request before it's handled and responded to. These blocks are, technically speaking, how we create endpoints within our server-side application, within our Node.js backend, and we create loads of different endpoints to handle all sorts of incoming network requests — this one is obviously a GET endpoint on this particular route, and so on.

Now, in one of these cases I send back a status code, and in the other a string, but that still doesn't explain how we end up with a website on our screen. When I come to a web URL — in this case localhost:8383, but you can imagine it's google.com; localhost is just for local development — I hit enter and send out a network request across the network to the IP address associated with my backend server. So where's the website? Excellent question. What we're going to do now is learn how to convert these endpoints to actually send back a website. A website is HTML code, so in future we'll learn how to send back a whole HTML file, but if we simplify the whole process to its very core, all we're doing is responding: in res.send I'll create a string, and in that string an h1 tag — which is HTML — with its corresponding closing tag, and it's literally just going to say "this is actually our website (HTML code)". I save, the server restarts, I refresh the page, and this time we actually get back HTML code. (If I could just find the scrollbar and inspect this properly... the moral of the story is that the HTML code is in there, even if I can't scroll to it.) The point is that this is HTML: I could also send back an input after it — that's the HTML code for an input — and now when I refresh the page we actually get an input sent back and rendered. That is how you end up with a website on your screen: we go to this URL, it's a GET request for a website, the request hits this endpoint, our server recognizes it's a GET whose destination is this particular route, we know the user wants a website, and we literally send them one back.

That's one manner of communication — one type of network request is for getting websites — and this is where we really need to tidy up our server. For example, take this /dashboard route: if someone goes to the dashboard route, you can imagine they want the homepage for the dashboard, not some silly response that says "hi". But sometimes we literally do need to send responses that say "hi" and don't load websites. So inside my server I like to keep two kinds of endpoints: website endpoints and API endpoints. Website endpoints are specifically for sending back HTML, and they typically fire when a user enters a URL in a browser, like we have been doing. API endpoints are more like what happens when you type in your username and password and hit submit: that sends out the same kind of network request, it hits the backend, and it's routed to an API endpoint — except these ones obviously don't send back a website; that's where the magic happens behind the scenes, and we do something different with them. So I'm going to move these two endpoints into the first division, which I'll label "type 1: website endpoints", and below it "type 2: non-visual API endpoints". Then I'll change the first response so it sends back "homepage" — you can imagine this is a whole lot more HTML code that is literally the homepage for our website — and the second to send back "dashboard" (if I can spell that correctly). When I refresh the page, the root shows the homepage, which is what we'd expect, and if I go to the dashboard link we locate the dashboard. Absolutely wicked.

For the ones down here, I'm going to define another endpoint — a GET endpoint, except this one is at /api/data. In this case we don't have any data yet, but what if our website were, for example, a job board, and our server held all of these job listings that a user needs access to? The client would send out a network request to the server requesting all of that data, all of those job listings.
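To make the "endpoint = verb + route" idea concrete, here's a plain-JavaScript sketch of roughly what Express does for us behind the scenes: match the method-and-route pair to a handler, or fall back to a 404. The routes and responses mirror the ones in the video, but the dispatcher itself is a simplification for illustration, not Express's actual implementation:

```javascript
// Each key is a verb plus a route — together they form an endpoint.
const endpoints = {
  "GET /":          () => "<h1>Homepage</h1>",
  "GET /dashboard": () => "<h1>Dashboard</h1>",
  "GET /api/data":  () => JSON.stringify({ name: "James" }),
};

// Roughly what a framework's router does: look up the handler for
// this method+route pair, or answer 404 Not Found.
function handle(method, route) {
  const handler = endpoints[`${method} ${route}`];
  if (!handler) return { status: 404, body: `Cannot ${method} ${route}` };
  return { status: 200, body: handler() };
}

console.log(handle("GET", "/dashboard")); // { status: 200, body: "<h1>Dashboard</h1>" }
console.log(handle("GET", "/missing"));   // { status: 404, body: "Cannot GET /missing" }
```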
Because we're not sending back a website — it's not really a URL that gets entered into a browser — I like to start these routes with /api. The prefix just signifies that this endpoint isn't sending back HTML; it's more of a non-visual, behind-the-scenes network communication. It's still exactly the same shape: a (request, response) arrow function. Inside, we can console.log "this one was for data", and then what I'd do is res.send my data. For the minute I don't actually have any data, so up at the top I'll define some: let data equal an object, and that object will have a name equal to "James". This is our object; this is our data. So now we have a backend server that serves up HTML code when people type in those URLs, and we also have a means of letting someone get data.

The question at this point becomes: we've established that this endpoint is a GET request — they want to get data — but it's not really a website they're getting, so it doesn't make sense to type it into a URL or nav bar. How do we go about testing it? This is where a tool known as a client emulator becomes extremely helpful. We're going to come over to the Extensions tab and look up an extension called REST Client — it's the top one, by Huachao Mao, with over 5 million installations, and it's absolutely brilliant. Hit install, and once it's installed, inside our chapter 2 folder we'll create a new file called test.rest (the .http extension is also valid, but we're going with .rest, the extension we'll need here).

What this file lets us do is emulate the process of sending out a network request — basically, it emulates a browser, or a user. In it, we separate all the template network requests we want to send using the triple pound sign (###), and we can give each one a title. The first is "test get homepage website": I specify a GET request to http://localhost:8383, which is the URL. When I save, a tiny "Send Request" button appears above the request, and clicking it emulates the whole process of typing the URL into the browser — except instead of displaying the HTML as a website, we can inspect literally what the response is: a 200-level status, a header telling us it was powered by Express, and the HTML code. That was a successful request. I can do the same for "test get /dashboard website": GET http://localhost:8383/dashboard — we give the address, then the route, and since we've defined that endpoint, it's all good; I hit send, it took 7 milliseconds, and we get back the HTML code for the dashboard. Those are our website endpoints being tested, but I can also do it for a data endpoint: using the GET method — the GET verb — against http://localhost:8383/api/data, our most recent endpoint. Say a user comes onto our website and we need to ensure they can fetch all the job postings for our job board: we run this network request against that endpoint, and we get back JSON data, and the server logs "this one was for data". So that's a client emulator: it sent out a network request from this client, the request hit our server, the server directed it to this specific endpoint, and it interpreted the method of the request.
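Collected together, the three GET requests described above would look something like this in the test.rest file (the port and routes are the ones used in the video):

```http
### test get homepage website
GET http://localhost:8383

### test get /dashboard website
GET http://localhost:8383/dashboard

### test get data endpoint
GET http://localhost:8383/api/data
```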

The method was GET — the client wanted to read information — so the server located it, navigated it to this particular route, ran our console.log, and responded with the data. So that's super cool: it's a different kind of endpoint. It doesn't really show a website, and it's not something you'd typically type into a URL bar, even though technically you could — if I go to /api/data in the browser it shows me the data, which is JSON, but that is not a website. It's just another endpoint we've officially configured.

Now that we have these three endpoints, it's time we start looking at some of the different HTTP verbs, or methods. At the end of the day, most of these methods come under the umbrella term CRUD — I'm actually going to write it in here. CRUD stands for create, read, update, and delete, and these four actions basically cover all of data modification. Read is GET: to read information is to get information. To create, the associated HTTP verb is called POST — you post a parcel to someone else and it creates that package in their hands. To update information we use PUT, because we put something in the place of something that already exists: we create that place with a POST request, and then we put something there with a PUT request. And the delete functionality is literally associated with the DELETE method. In what I've written here we have the method and the CRUD action — method on the left and CRUD action on the right... actually I got that the wrong way around: it's the CRUD action on the left and the method on the right. So that's pretty cool. Now we're going to take this application to the next level by literally creating something that displays our data.
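The CRUD-to-verb pairing described above — CRUD action on the left, HTTP method on the right — can be jotted down as a tiny lookup table (an illustrative sketch, not code from the video):

```javascript
// CRUD action on the left, associated HTTP method on the right.
const crudToMethod = {
  create: "POST",   // posting a parcel creates the package in their hands
  read:   "GET",    // to read information is to get information
  update: "PUT",    // put something in the place of something existing
  delete: "DELETE", // literally the DELETE method
};

console.log(crudToMethod.read); // "GET"
```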
Right here I'm going to turn the homepage response into a template literal string, which allows us to inject some data. Template literals can span multiple lines, so I'll put the HTML on new lines, and we know HTML code needs a body — the visible part of the website goes within the opening and closing body tags. Inside the body I'll create a paragraph tag, and within it use the dollar-sign-curly-braces syntax to inject JSON.stringify(data). That injects our data into the template literal string and sends it over as part of the HTML code. So now if I refresh the page, we get back the JSON for our data rendered on the page. I'll also throw an h1 above it that just says "Data", and give the body a style attribute: background pink, color blue — that's going to look shocking. Save and refresh, and we have a website displaying our data, which is kind of fun. We added the style, sent back the HTML code, someone loaded the website — and that's how every website you ever go to on the internet actually works. Super cool.

Now let's get creative: how do we actually add and modify data? That's where we get to these funky API endpoints, and where we'll also use our client emulator. I'm going to create an endpoint that allows someone to add data. In this case I'll use the new method we just talked about — POST — because I expect someone is going to post some data, and I'll make sure it's on the /api/data route, since these are the routes responsible for handling that data. Here, once again, we define the function that handles the incoming request when it hits this endpoint, with the request and response as arguments. But now, if someone is actually sending information instead of just asking for it, I need to take a second to investigate what they're actually sending me — something we haven't had to do before. Fortunately, that's very easy to do with Express. There are a number of ways people can send information, but most commonly it's formatted as JSON. So I'll define a variable — const newEntry — and set it equal to request.body; I access the incoming request and look for its body. The body of the request is literally the data associated with that request. With the create (POST) and update (PUT) methods — and occasionally DELETE — you can typically expect a body to be associated with the request, as opposed to a plain request for information, which is more related to read (GET). In this case a user wants to create an account: say they click a sign-up button after entering their credentials, and their browser is wired up to send a network request to the server to handle that action. That's what this endpoint is for.

So let's take a second to program that from within our client emulator. Below our GET requests we add a new one with the triple pound: "data endpoint for adding a user". As we saw inside the server, this is a POST request, because we're creating a user, so we specify the verb and then the URL including the route where the request gets directed when it reaches our server: the domain and IP address locate the server, and the route locates the specific endpoint within the server that contains the logic needed to handle this request. Since we're actually posting information, we need to define that information, so we create a JSON object just below (you need an empty line above the object for formatting) with a name — and that's going to be "gilgamesh", because why not, let's support Gilgamesh. That is the data, formatted as JSON, that we want to send. The one last thing when sending data is to configure a parameter of the network request known as the Content-Type, which we set to application/json. That way, when our server receives this request from our client emulator, it knows it's looking at JSON, and it can consequently parse that body, interpret the JSON, and gain access to the values inside.

So now we have a POST endpoint where we can access the body. We currently haven't defined how we want to respond to it, but let's just go ahead and send this request from our client emulator anyway — remember, this is equivalent to a website where a user clicks "submit a new account", "add a user", "add a to-do". We send out the request... and it just waits indefinitely. A few things to note: first up, we're not getting a response, and that's because we haven't told this endpoint how to respond when it receives a request. So I cancelled the request, and we got nothing back. To respond, this is where we send the status code 201, which, if you recall, was associated with the "created" outcome — a user was created or added, so that would be a perfect response. Let's send the request again, and this time we get back a 201: a new user was created.
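The POST request described above would look something like this in the test.rest file — the port and route are from the video, and the exact spelling of the name is a guess from the audio:

```http
### data endpoint for adding a user
POST http://localhost:8383/api/data
Content-Type: application/json

{
    "name": "gilgamesh"
}
```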
So that's pretty cool. Now, what we're also going to do before we respond is console.log this newEntry. Restart, send the request again — and even though we tried to log the body of the request (request.body), we get undefined. That's because up where we configure our server there's one last thing we have to do. I'll create a section in here called middleware, and in it we just tell our app to use express.json(), invoking the json method. Middleware is part of configuring your server: basically, it's configuration that you slot in between the incoming request and your interpretation of that request — a protective layer the request passes through just before it hits these endpoints. This particular line configures our server to expect JSON data in incoming requests. Now that we've added it, we're telling our app to use express.json and expect to receive JSON, and when I send the request we actually log out the new data, our server responds to the client emulator, and we get a successful response code.

The one last thing I want to do here is actually add the data. I was going to reformat it and create a users field holding an array with the entry "james" — but to be fair, I can get rid of a lot of that and just literally make data an array containing the name "james". Now when I refresh the page, we see the array rendered on the screen: that's our data. To add a user, we come down to this particular endpoint where we're handling the data — it's an API endpoint, not for rendering a website — and say data.push, pushing newEntry.name: request.body is an object, we access the name key within that object, push it onto our data array, and then confirm that we have created the new entry. Save, refresh: we get our single entry inside the array rendered as our website. Then we come over to our client emulator and emulate the process of adding a new user by sending the POST request — Gilgamesh has been added, we've confirmed it — and when I refresh the page, Gilgamesh appears in the list. Absolutely wicked. You can see we're really developing a full-stack interaction: a client where we can actually visually see all of these backend interactions, which happen by telling our server to listen for incoming requests and defining endpoints that handle all sorts of different behaviors, expectations, and intentions. A big server might have a hundred different endpoints, all for different things, but at its very core all you're doing is creating different destinations, where each route and each verb together create an endpoint, and each endpoint has a specific utility.

If we wanted to take this one last step, we could say app.delete, specify the route, and define the arrow function to be executed when we receive an incoming request to this endpoint, with the request and response as arguments. In here, we just say data.pop — popping an entry off the end of the array (I can't remember exactly, but I think pop takes it off the end) — throw in a console.log saying we deleted the element off the end of the array, and then res.send a status code of, say, 203; I'm sure there's one associated with a successful delete, but I'm not sure what it is in this moment. Now we can emulate that request. I could copy and paste the earlier one, but I'll just create a new one with the triple pound — "delete endpoint" — a DELETE request to http://localhost:8383/api/data (we have to throw the port in there: 8383). This request doesn't need to contain any data, since it isn't specific to an entry; it just gets rid of whatever's at the end of the array. Run the request... and we get a 404, which means Not Found — because I specified the route incorrectly. You can see the request is looking for /api/data, whereas I configured the endpoint as /api/endpoint, because I'm a muppet. Change it to /api/data, save the file, re-execute the request, and we get back the 203 — which is definitely the wrong response code for this action, but we've confirmed on the server side that we did in fact hit that endpoint. Now when I refresh the page, we've actually gotten rid of both entries, so let's restart the server once more to refresh everything. Now we have "james"; then we add an entry, send that request — a user is created, and we have a new user; then we pop a value off the end of data. Let's emulate that behavior — a user clicks delete on a website, it's the same thing, we're just emulating it — we delete an entry, refresh the page, and now it works appropriately.
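The push/pop logic of those two handlers can be sketched in isolation as plain functions over the same in-memory array — an illustrative sketch, since the real handlers live inside the Express callbacks. (The 204 used here is the conventional "successful delete" code — 204 No Content — which the video couldn't recall in the moment.)

```javascript
// The same in-memory "database" the server mutates.
let data = ["james"];

// What the POST /api/data handler does: req.body is an object,
// and we push its name field onto the array, then answer 201 Created.
function addUser(body) {
  data.push(body.name);
  return 201;
}

// What the DELETE /api/data handler does: pop removes the entry
// at the END of the array; 204 No Content is the conventional reply.
function deleteUser() {
  data.pop();
  return 204;
}

addUser({ name: "gilgamesh" });
console.log(data); // data is now ["james", "gilgamesh"]
deleteUser();
console.log(data); // data is back to ["james"]
```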
So now we only have one entry left at the end of our array, and just like that you can really start to see our backend application coming along. The one last thing I want to do, to make this feel a little more like a website, is throw in some anchor tags with hrefs. On the homepage, the anchor tag gets an href to the /dashboard route and the text "dashboard", and then we close the anchor tag. Within the dashboard I want the same thing: I'll change that response from a plain quote to a template literal string, put the dashboard text on a new line inside opening and closing body tags along with the h1, do some quick text formatting, and throw in an anchor tag that takes the user back to the homepage with the text "home". With just these extra lines of code — a link to the dashboard page on the homepage, and a link home on the dashboard page — I save and refresh, and now we get that link: I can click it, it routes us to the dashboard page, we render that HTML code, and the home link takes us back. That is really cool, and you could absolutely take this to the moon. For example, I could add a button where the button actually adds a new user, so that instead of emulating that functionality it happens from within the HTML code itself. I could also throw in a script tag (closing it accordingly) that says console.log("this is my script"), and then up in the server handler throw in a console.log saying "user requested the homepage website". If I save this and refresh the page, we can see on the backend that we received an incoming request to this endpoint — a user requesting the homepage website — and that we responded with the HTML code (which I just cannot find in the panel... there it is). And in the browser console, the script executed — which is so cool, because it was just sent back as text, but our browser interpreted the HTML, executed the script, and Bob's your uncle. Absolutely wicked.

Now, one last small concept to demonstrate before we jump into the next project — which actually looks reasonable, whereas this one is obviously a whole hodgepodge of stuff that isn't very attractive but demonstrates a lot of important concepts. I want to show you an extra feature we can amend onto this response: we can chain a custom status code in front of the .send and specify whatever code we want. A 500 or a 599 wouldn't make any sense here, but if I now come back to the GET request for that endpoint and send it, we can see we get that status code in addition to the data. And just from within here I can add a new user, request the data again, and the data has been updated. It's super cool, because this little object has basically been a database for us — obviously a very simplified version of what you'd typically find in a production-level environment, but for all intents and purposes it's been a great little database storing users, letting us access new users, and all of it can be scaled to the moon based off the same core concepts. Anyway, that's it for chapter 2: our very introductory project, demonstrating how we can actually build a functional server and do some of the core things we talked about in our theory lesson. It obviously doesn't look great, but that's what chapters 3 and 4 are all about — they're absolutely brilliant, super exciting projects, and it's time we dive into them.

All right, we just wrapped up project number one in chapter 2 of this backend full course; it's now time to jump into the second project, which is chapter 3. This is a really cool project: we'll be using Node.js, as we did in the first example, to develop our web server, and we'll also learn how to take advantage of what are currently known as experimental features within Node — one specifically being the SQLite database. If you're unfamiliar with SQLite, it's just a very lightweight SQL database that is very popular, very commonplace, and very easy to get up and running with, and it's now built into the latest versions of Node, which we'll unlock by using the most recent versions of Node via something known as Node Version Manager. The backend application we'll build takes a lot of the beginner concepts we learned in the chapter 2 project and extends them, ultimately building out a more comprehensive backend application. We'll also learn how to tidy up our project directory — to develop a structure that sets us up for success as our projects become more complex. It should be loads of fun.

First on our list is upgrading our version of Node. When you downloaded Node you probably selected a version, and we can check what it is by typing the node -v command in our terminal — the same terminal we were using in chapter 2. I can see the Node version I'm using here is in the 20.x range, and 22 is the minimum version we're going to need to use these tools.
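Since the minimum Node version matters here, a quick programmatic check can confirm what node -v reports — an illustrative sketch using the built-in process.version string (which looks like "v20.10.0"):

```javascript
// process.version is a string like "v20.10.0"; strip the leading "v"
// and take the major number before the first dot.
const major = Number(process.version.slice(1).split(".")[0]);
const MIN_MAJOR = 22; // the minimum this chapter's experimental features need

if (major >= MIN_MAJOR) {
  console.log(`Node ${process.version} is new enough`);
} else {
  console.log(`Node ${process.version} is too old — run "nvm install 22" then "nvm use 22"`);
}
```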
version we’re going to need to use these tools if these features are available in the standard node.js version then you won’t have to add any of the experimental Flags but we’ll see what that’s all about shortly but yeah should be absolutely loads of fun now to kick off this process the first thing we’re going to do is get NVM node version manager up and running on our device now to do so what I’ve done is I’ve linked node version manager in the description down below and this tells you exactly how to configure it it might look a little bit overwhelming but at its very core all you have to do is come down to the installing and updating section which we can see right here here’s the install and update and when you see this curl command you just need to copy it and paste it into a terminal instance for example the one that we have open right here so I could literally paste that command and that should theoretically install no version manager on our device once it’s installed you can check that it’s installed uh using the NVM DV command so just in here if I type in NVM DV that will tell me what version of node version manager is installed on our device and if that command’s not working for you then there’s been an issue in your installation and all of that should be covered in this guide how to basically get it up and running once again this is linked in the description down below and if you have any challenges or the documentation isn’t very clear you’re welcome to once again post them as an issue on the GitHub repo for this back and full course which is also linked in the description down below and either myself or someone else can help you overcome that hurdle lastly chat GPT can actually be pretty good at helping you get these installations up and running as well but ultimately should just be copy and PAs in this one line and then you should be able to just type the NVM – V command and that should all be good to go now once nbm is installed on your device it’s so 
nvm is incredibly easy to use. Basically all you have to do is type nvm install or nvm use followed by the particular Node version you want. For our project the version we want is 22, but once again you can use 22 or later. So we type nvm install 22. If you already have a more recent version, which would be a higher number, you don't really have to worry about this part; it's only needed if you have an older version or you're watching this tutorial around the time of release. Hitting enter downloads the version of Node we need for this tutorial, essentially so we can take advantage of some of the more experimental features. Once that's installed, we can type nvm use 22, and that sets us up to use that version of Node.

Next, we can close chapter2 and create a new folder called chapter3 for our chapter 3 project. In there we're going to do absolutely nothing at first, because it all begins in the terminal. I'll type the clear command to clear the terminal out. Currently we're in the chapter2 directory from the previous project, so first we need to go up a directory level, from chapter2 to the backend-full-course level, which we do with the change directory (cd) command and the double period: cd .. jumps us up a level. Now we're in the backend-full-course directory, and from here we can change directory into the chapter3 folder and initialize our Node backend project. If you remember from chapter 2, the way we did that is by typing npm init with the -y flag. Hitting enter on that command once again creates that magical package.json file, the spec file for our project. If I click on it, we have a pretty rudimentary package.json, and that's how we initialize a backend project using Node.js, which lets us leverage the npm (Node package manager) ecosystem we learned a little about in chapter 2.

Now that we have the package.json, the next thing is to set up our folder directories and finish the configuration of the project so we can get our hands dirty with all of the code. First, within the chapter3 directory I'm going to create a file called server.js, with the .js JavaScript file extension, just like we did in chapter 2. Now we can set up some folders that will hold their own sub-files, which is going to keep our code a whole lot cleaner than the chapter 2 example. Folder number one is called middleware. If you remember, in chapter 2 we only had a very brief look at middleware; middleware exists in a lot of different shapes and forms, and in this chapter 3 project we're going to have some middleware that's all about authenticating a user, which is going to be super cool, so we need a folder to keep the files that maintain that code. The second folder we need inside chapter3 is called routes. In chapter 2 we had all of our routes set up inside one file, and you can see how it was already getting kind of long; that isn't ideal if you want a best-practice implementation of a server-side application. So in this case we're going to move all of the logic that sets up the different endpoints out of that one file.
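Summarized as a terminal session, the setup so far might look like this; it assumes nvm is already installed, and the exact folder names are as I've transcribed them, so yours may differ:

```shell
nvm install 22    # download Node 22 (or later, if that's what you have)
nvm use 22        # switch this shell to that version
cd ..             # up a level, from chapter2 to the course folder
cd chapter3       # into the new chapter 3 project folder
npm init -y       # generate the default package.json spec file
```

The -y flag just accepts all of npm init's defaults instead of prompting for each field.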
The endpoint logic is going to live in this new folder called routes. Now, after that we need another file, this one directly in the chapter3 directory, called db.js. This file is going to have all of the logic for the database, which is going to be SQLite. SQLite is a SQL database, where SQL stands for Structured Query Language, and SQL has to be the most popular type of database used globally. There are a lot of different examples of SQL databases: in this case we're using SQLite, another common one is MySQL, and then there's also PostgreSQL, which we'll be using in chapter 4; that's going to be loads of fun. But essentially this db.js file is where we'll have all the code to configure our extra-special database.

Then we're going to need another folder, called public; we'll learn all about what the public folder is for very shortly. And last but not least, we need two more files. One is going to be another .rest file, which is used for emulating the browser, and that's going to be called todo-app.rest. Lucky last is one called .env. .env files are for environment variables, and if you're unfamiliar with what an environment variable is, it's essentially just a stored key and value: the key is the lookup term, and the value is potentially a secret string of characters that needs to be referenced in the configuration of our project. Any top-secret information is going to be thrown in this .env file, and that way we can avoid uploading it, for example, to GitHub; it can stay local on our device, which means we don't end up in a situation where we accidentally share all of our secret passwords with the world. We'll learn more about how the .env file works shortly, but first let's finish up the configuration of the files for this chapter.

Another folder we need to create is called src, where src stands for source. I actually made a little oopsie here: all of the code we've created so far, with the exception of the .env file and todo-app.rest, needs to go inside the source folder. So server.js gets dragged into src, db.js goes into src, and the middleware folder and the routes folder go in there too. If we close the src directory, we should only have the package.json, the .env, and the .rest file, in addition to the public folder, as direct children of the chapter3 directory; within the src folder we have the middleware folder, the routes folder, db.js, and server.js. Once again, just a reminder that at any point you can compare your code to mine via the GitHub repo; the link is in the description down below, and if you do go check it out, starring the repo would be super appreciated, love that support.

Now we're almost done setting up the folder directories; the last thing we have to do is create the files that go within these folders. Within middleware we just have one file, called authMiddleware.js, which is where we'll write all the JavaScript to handle the middleware. For the routes, as you learned in the chapter 2 tutorial, we had two kinds of routes: API routes and website routes. There can actually be a plethora of different types of routes, and consequently API endpoints, so in this project we're just going to subdivide them into authRoutes.js and, finally, todoRoutes.js.

In this project we're going to create a full-stack application from a backend application: we serve up a website, where the website is essentially an authentication-protected to-do application. It looks absolutely excellent, and we'll see how to get it up and running shortly. The backend endpoints we'll need actually come in three types: one to serve up our website, some logic or endpoints to handle all of the authentication, and some logic or endpoints to handle all of the CRUD operations, where we're creating, reading, updating, and deleting different to-dos in our to-do list. That's what these files are for: all of our authentication routes go in one, and all of our to-do routes in the other. With all of those files created, our project configuration is 99% complete; we're not going to do the last 1% just yet. Essentially, to summarize what we've done within chapter 3: we've created two folders, public and src.
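Laid out as a tree, the structure we've just assembled looks roughly like this; the file and folder names are as I've transcribed them, so the exact spellings and casing may differ slightly in the repo:

```
chapter3/
├── package.json
├── .env              (secrets / environment variables)
├── todo-app.rest     (REST Client requests for testing endpoints)
├── public/           (front-end assets, added later)
└── src/
    ├── server.js
    ├── db.js
    ├── middleware/
    │   └── authMiddleware.js
    └── routes/
        ├── authRoutes.js
        └── todoRoutes.js
```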
We also initialized the package.json via the terminal, using the npm init -y command; that is the project specification for our chapter 3 project. We also defined a .rest file, which takes advantage of the REST Client VS Code extension; this file is going to be where we can emulate some browser network requests, emulate the client, so that's going to be handy when we test our endpoints later. For now the public directory is empty; we'll change that shortly. Within the src directory is the source code. If you're unfamiliar with that expression, the source code is basically all the code that creates our application, and that's what lives in the src folder. Directly within it we have two JavaScript files: server.js, which is going to be the hub of our application, and db.js, which is for all of the database configuration logic. Then finally we have two folders. One is called middleware; we're going to learn all about middleware very shortly, but essentially it's going to handle all of the authentication between the client and the server side, between the front end and the back end, and that logic goes in authMiddleware.js. And we have the routes folder, which just separates out the logic for the different types of API endpoints we'll need for this application. Now don't stress if that's a lot of information: as we work with these files it will all become very comfortable and familiar to you, especially as we jump into chapter 4, which is just the same thing but slightly more advanced. But do make sure you've configured this folder directory properly, because that will be important for linking between different files as we code out this project.

Now that we've set up all the folder directories, the next thing we need is to add all of the npm packages for this project. The list is actually not that long, but it's not as small as it was in the first example. For this project we need to install the Express package, so we'll use the npm install express command once again; however, in this case we have a couple of other packages we want to throw in, so we're going to type them all on the one line. So here we have npm install express, that's package number one. The next one we need is called bcryptjs, and that's the library responsible for encrypting data, specifically usernames and passwords, which is super important when you're developing your own authentication system. In this project we use an authentication system known as JWT authentication, or JSON Web Token authentication. As we get to it, I'm going to explain very explicitly how that system works to create a very secure authentication system, and consequently a secure full-stack application; for now, all you need to know is that we'll be needing the bcryptjs package specifically because we don't want to have to write all the code for these encryption algorithms ourselves. The last package we'll need is called jsonwebtoken, and that is once again just another package to facilitate our authentication system.

Now that I've typed out those three packages, I'm going to hit enter and npm is going to add them to our project. We can see that was super fast, and now we have a node_modules folder within the chapter3 directory. Once again, we're not going to go touching any of those folders, because someone else has coded them; we're just adding them to our project so we can leverage that code. If you wanted, you could check out the documentation for all of these packages to really understand how they work, but I'm going to cover it all in this course anyway. We can also see that these packages have been added to the dependencies list within our project specs, the package.json file. While I'm in here, I'm just going to change the description of the project: this is going to be a full-stack to-do application that uses a Node.js backend, a SQLite database, and JWT authentication. So that's pretty cool.

With those packages installed, you might remember that in chapter 2 we installed a developer dependency called nodemon. We actually don't need that in this project, because one of the experimental features available in the later versions of Node.js is a system that essentially does the same thing: it automatically reboots our server when we save or create adjustments to the codebase. So that's super exciting. With all of that done, the last thing we need to do before we boot up our application is create a script that defines exactly how npm, the Node package manager, should start it up. I'm going to call this the dev script; it goes in the scripts field (and the previous line needs a comma at the end), and essentially the command is going to be node followed by a handful of flags.
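For reference, after the install and the description tweak, the relevant package.json fields end up looking something like this; the name and the version ranges are placeholders, since npm will pin whatever versions are current when you run the install:

```json
{
  "name": "chapter3",
  "description": "A full-stack to-do application that uses a Node.js backend, a SQLite database, and JWT authentication",
  "dependencies": {
    "express": "^4.x",
    "bcryptjs": "^2.x",
    "jsonwebtoken": "^9.x"
  }
}
```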
The first flag we need is --env-file=.env. Historically, to use .env files, or environment variables, the secret protected keys, within a Node.js application, you needed a package called dotenv; in the later versions of Node it's built in, and this flag is how we tell Node where to look for our environment variables, namely inside that .env file. The second flag we need is --experimental-strip-types. Once again, this process is specifically for Node version 22, where these features are experimental; as Node officially releases them, you won't necessarily need the experimental flags (you'll probably still need the env-file one). After that we need another flag, --experimental-sqlite; as I said earlier, if you're using a more recent version of Node at some time in the future, SQLite will probably just be built into the official release. Then, last but not least, we specify the file we want Node to run, and that's going to be ./src/server.js, with the dot-slash because we have to enter the src directory and boot up the server.js file.

Now, I actually forgot one flag in here: the last flag tells Node to emulate the feature nodemon used to provide, automatically restarting everything, and that's just the --watch flag. With all of these flags making up this whole command, you can really understand why it's beneficial to have these scripts: technically I could write all of that out in the terminal every single time, but now if I save the package.json I can just run the script with one simple command every time and let the magic happen. So technically we could go ahead and run that command right now, and our application would be up and running, but there's nothing currently to run, so I'm going to kill it, which is pretty easy, and now we can get to the actual code.

The file we're going to start off with is our server.js. If you recall, in our previous project it only took about four lines to get us up and running with a server, and the very first thing we did was import the Express package. In chapter 2 we would have typed const express = require and required the Express package to bring it into our application. In this project we're going to look at a different way of importing files and packages into our application; this is a more modern syntax. In the newer versions of Node it's now best practice, instead of using the old require syntax, to use an almost more logical syntax where we just import express from the express package. This is actually one of the criticisms of Node.js, that it jumps between these different importing syntaxes; it's a whole can of worms that I'm not going to open up right here, but the moral of the story is that this project uses the newer import syntax.
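Put together, the scripts entry sketched above looks something like the following; I've also included the "type" field that comes up in the next step so the file parses as shown, and which experimental flags you need depends on your exact Node version:

```json
{
  "type": "module",
  "scripts": {
    "dev": "node --env-file=.env --experimental-strip-types --experimental-sqlite --watch ./src/server.js"
  }
}
```

With this in place, npm run dev expands to that full node command every time.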
It's my personal preference, and I think it's much easier to work with. To configure our Node.js application to work with this new syntax, though, we need to come to the spec file and make sure there's a little line inside it that configures our application accordingly. Underneath the main line in our package.json, I'm just going to add a field called type, and the value associated with it is going to be module. You might have noticed just there that two options came up: one called module and one called commonjs. commonjs is for the previous syntax we were using, and it's the default value; if you want to use the module syntax, you just have to specify this line right here. If we save that, we're all good to use the newer import syntax throughout our project.

So now that we've imported the package, we can go ahead and do what we always did, which is define our app by invoking express; that's pretty straightforward. We're going to need a port, so I can define one just here: const PORT, and in this one I'm going to use 5000. But we'll do something a little bit differently: I'm going to set 5000 as a backup, because what I want to do is provide a value from the environment variables instead, if it exists. When we define variables within these .env files (and we'll see how to do that later), we can read them into our application by typing process.env and then accessing the name of the environment variable, which will be PORT; once again, we'll see how to configure that shortly. Essentially, what this slightly improved syntax does is check whether there's a PORT environment variable: if there is, we use it; if there isn't, we default to 5000. Then, lucky last, we just tell our app to listen: app.listen at the port, passing in a function to be executed if our server boots up adequately. That'll just be an arrow function where we console.log a template literal string saying the server has started on the port, with the port injected via the dollar-sign-and-curly-braces syntax. So those are our few lines of boilerplate code needed to build our entire server.

With that done, I can save the file and run the npm run dev command, and that fails to boot up our application, because that port is already in use; I must have something running on my device at that port. So I'm going to change it to 5003. I'd expect port 5000 worked for you; if it has, that's totally sweet, just keep your backup port. Now we can see that our server has started on port 5003, and we can test that the watch flag is working: if I throw a console.log of hello world in there and save the file, we can see that our server automatically restarts due to that watch flag and prints out hello world. Just like that, we've done everything we need to set up a more modern project directory, which is going to make it much easier to develop a more sophisticated backend application. We've created the code we need to configure the beginnings of our server-side application, and from here we can really start to flesh out all of the endpoints and all of the features and functionality of our backend, starting off with serving a front end.

Currently, as with the previous project, we have a server started on our local network, at localhost on port 5003. So let's come across to our browser and look at localhost:5003 (or, if you're using port 5000, it will be port 5000). Just like with chapter 2, when I hit enter we get a "cannot GET", a 404, which means the browser sent the network request out to that address, and it may even have got there, but we didn't handle it.
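To make the port-fallback line from a moment ago concrete, here it is in isolation; resolvePort is a hypothetical helper name of mine for illustration, not code from the project, and I've used Number() so the comparison below works on numbers, whereas the server file keeps the raw string:

```javascript
// Mirrors the idea of `const PORT = process.env.PORT || 5000;` in server.js.
function resolvePort(env) {
  // Use the environment value when it's present, otherwise fall back to 5000.
  return Number(env.PORT) || 5000;
}

console.log(resolvePort({}));               // no PORT variable set, so 5000
console.log(resolvePort({ PORT: "5003" })); // PORT set in .env, so 5003
```

The || operator works here because Number(undefined) is NaN, which is falsy, so the default kicks in whenever the variable is missing.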
There was no endpoint with the method GET at that particular route to receive the request and respond, in this case with what we'd expect to be a website. So that's going to be step number one: sending back a website. We need to send back a front end that a user can interact with to have a full-stack experience, and that's going to be our authentication-protected to-do list.

Now the question becomes: where's the website? Well, I've got you covered: I built the website in advance, so the front end is completely developed, with all of its logic, and I'm going to copy it across into this public directory right here within chapter3. For you, you'll need to go over to the GitHub code, which is linked in the description down below, check out chapter 3, and copy across the files within the public directory; and while you're there, if you could star the repo, love that support, that would be super appreciated. So just here I'm going to copy three files into this public directory: fanta.css, index.html, and styles.css. FantaCSS is just my little child, it's like a design library, so that just styles everything. styles.css is all of the layout styles, so not so much the prettiness of the application as the functional layouts. And index.html is just some HTML code with some scripts and JavaScript at the bottom to handle all of the different CRUD actions and all that good stuff. If you want, you can totally go through and have a look at the code; there's a little bit of it, but at the same time not an infinite amount, and I've commented a lot of it, so it should be pretty self-explanatory. But once again, this is a backend full course, and the front end comes pre-completed; you just need to copy these files across. As for why we're copying them into a public directory: the public directory is the canonical folder from within which we serve up all of the assets for our project. In this case we need to serve up a front-end application, and consequently here is our front-end application. So now what we have to do is take these files and, when we get that network request, send them all back across to the browser, so the browser receives the HTML and the CSS sheets and loads the website. That should be relatively straightforward, and most of that logic is going to go within our server.js.

What I'm going to do is start off by getting rid of that little console.log, which reboots our server, and the first thing we need to do is define this endpoint, the end destination for the network request our browser emits when we go to the URL, which, when you deploy the application, could be to-do-app.com or whatever it might be. As we saw in the previous chapter, defining that particular endpoint is actually pretty easy. We can see that the method, or verb, is a GET, so we type app, accessing our server app, and use the get method to define the endpoint. The next thing we provide as an argument to the get method is obviously the route, and here the route is the slash route, so we're just going to have the slash route. The second argument is the callback function; that's going to be an arrow function receiving the request and the response as arguments, and I'll open it up onto a new line. Now, we saw in the previous chapter how we can send back some code, send back status codes, all that good stuff; if we want to send files, like we do in this instance with the index.html, we need res.sendFile.

In there we need a little bit of code to determine, or locate, the files we need to send. This might be a little bit complicated: we need to take advantage of a module known as path. Step number one is importing path into our project, and it's native to Node, so we just import path from the module called path. There's something else we have to import alongside it: after path we throw in a comma and destructure out this particular import with the curly braces, and the item we need is called dirname, for directory name, so that also needs to be imported. While we're up here, there's one other import we need that's also native: we import, again destructured, fileURLToPath from a module called url. So those are the imports we need, and they're going to enable our server.js file to look for the HTML files and consequently send them back as a response.

With those imports in place, the next thing we have to do, just above this endpoint, is get the file path from the URL of the current module. That's a slightly confusing sentence, but essentially it's just a configuration line that allows us to navigate, from within our code, the folder directory we set up. We need to define a variable called __filename (underscore underscore filename), and that's going to be equal to calling fileURLToPath and passing import.meta.url as the argument; that gives us access to the file name. Underneath that, we need to get the directory name from the file path, which basically tells the operating system: okay, this is the directory where the files can be found. So we need a variable, const __dirname (with the double underscore once again), equal to invoking dirname, which we imported above, passing in __filename. This is going to come in handy in a number of places, and we'll see all about that shortly, but the first thing we have to do is send this file. What path does, what all of this ultimately comes down to, is allowing our code to locate files and folders on our device, or whatever device it's running on; path allows us to construct the ultimate path to find those files and folders. In this case, this endpoint is for serving up the HTML file from the public directory, so with path.join we join the __dirname, which is basically the directory of our code, then the public directory, and then index.html, which is specifically the file name. (I think I had one too many underscores right there; there should only be two, so I'll just get rid of one.) Joining together the directory, the public folder, and consequently the file is how our body of code knows where to find the file it can then send back across the network, and that's literally all we need to do to send back the HTML file. So if I now go ahead and save that, our server restarts, and refreshing the page gives us one error: in the ultimately resolved path we can see that it's looking for the public folder in the wrong place.
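The __filename and __dirname dance is easier to see in isolation. In the real server.js the URL comes from import.meta.url; in this standalone sketch a literal, made-up file URL stands in for it so the snippet runs anywhere:

```javascript
import { fileURLToPath } from "url";
import path, { dirname } from "path";

// Hypothetical stand-in for import.meta.url; in server.js you would
// write: const __filename = fileURLToPath(import.meta.url);
const __filename = fileURLToPath("file:///home/user/chapter3/src/server.js");

// The directory that contains the current module (here, .../chapter3/src).
const __dirname = dirname(__filename);

// Joining with ".." climbs one level out of src/ to reach public/.
console.log(path.join(__dirname, "..", "public", "index.html"));
```

These two lines are needed at all because the module syntax doesn't provide the __filename and __dirname globals that the old CommonJS require world did.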
It expects the public folder within the src directory. So there's one last line we need to add, and it's known as middleware, which we kind of saw earlier; it's just a bit of configuration telling Express exactly where the public directory is, because currently it thinks public is at the same directory level as server.js, when it's actually one above. We're going to add a line that serves files from the public directory and also tells Express to serve all files from the public folder as static files; that's what I was talking about with the assets, the static files. This is important because any requests for the CSS files will then be resolved to the public directory, and we'll see exactly how that works in just a second. So we throw one little line in here, a little bit of middleware, part of the configuration for our app: we say app.use, and in this case we use an Express method, so we access express and tell it to use the static method. This basically answers the question of where we serve the static content from: the public directory. We call path.join once again to create the ultimate, the absolute, path for the public directory, going from __dirname again and adding ../public onto it. That basically says: okay, you can find the public directory, but it's actually not within the src directory, it's one up, and that's why we have the double dot, because that's how we go up a level of folders. If I now go ahead and save that once again and refresh the page, we can see we now actually get back the website. So the endpoint is what literally serves back the file, and the express.static line is the configuration that tells our code where to find the public directory, not quite where we are right now, but one level up, and the public directory is what serves up all of our assets. That's really cool: we've literally sent back a website, where in the previous chapter we just sent back some HTML code written as a string.

One thing to note: if we right-click, inspect, and come across to the network tab, we can see that when I hit enter a bunch of requests are sent. First is the localhost one, and this sends back the HTML file, which itself isn't styled; here you can see the styled equivalent of the web page, and what was sent back is all this HTML code. However, at the top we have these links, and when there's a link, essentially what happens is the browser goes out and fetches the information at that link. We can see there's a /styles.css and a /fanta.css sheet, so consequently our browser went out and sent those requests, fetching the CSS files. Here we have the styles.css sheet and the URL it was fetched from, and because of our static line, our app knows to serve these files from the public directory. So the browser got back all the CSS, could preview and apply it, and it did exactly the same for the fanta.css sheet; consequently we load a styled application. This is the authentication page, it's super responsive, and it looks pretty nice and neat.

So that's hunky-dory: we now have a website, a front end, being served up from our backend code. The front end is super cool because, once we can authenticate later, it's wired up to send out all sorts of network requests for all sorts of different interactions: logging in, registering a new user, and creating, reading, updating, and deleting different to-dos. That will let our browser send out all of those network requests to reach the different endpoints we'll code throughout this tutorial, which are going to go in these route files just here; but this code for serving up the home website is definitely some code we can have within our server.js. Now, the one other line I want to add before we move on to some of these other endpoints, or routes, is one that allows our server to receive JSON information when it receives network requests with the method POST, or potentially PUT. If you recall, that's something we did in chapter 2 to enable our endpoints to actually interpret that JSON data, which could be a username, a to-do, or anything, whenever the client is actually sending information instead of just asking for something via a GET request. It's just one other line of middleware, alongside the app.use line right here: app.use with express.json. That basically configures our app to expect JSON and consequently enables it to parse, or interpret, that information. So I'm going to throw that in there as well, and I'm actually just going to move it up directly under the middleware line above the other one, because it's specific to that configuration. Once again, if you're wondering how I magically moved that line around, I've got a
link to all of the VS Code shortcuts I use in the description down below; there's a website I made that basically tells you all about them. And just like that, we're almost done with our server.js; most of the code we write from here is going to be within all those other route files. Now that we've just about done all the logic for our server, the next file I want to get started on is the database, because we're going to need our database up and running if we want to do any authentication or have any data storage. We can get started on that by heading into db.js. The way we do that is we import, destructuring out this particular import, DatabaseSync, so a synchronous database, from the node:sqlite package. Once we have that imported, just like we created our app with Express, we create a database: a variable called db set equal to a new DatabaseSync, and in here we pass in a string. That string, as you can see in the documentation, can specify an in-memory database, which means we don't have to manage any external files; we just type :memory:, a colon, the word memory, and a colon. Now, this isn't what you'd use for a production database, and we'll see how to configure that in the last project, in chapter 4, but if you just want to get up and running with a SQL database, then :memory: will be more than adequate.

So now that we have our database, the next thing we need to do is basically set it up when we boot up our application, and for that we need to execute some SQL statements from strings. The way SQL, Structured Query Language, works, you can almost think about it as an Excel spreadsheet, where you have different columns and different tables, and a table is kind of like an Excel sheet, so you can have
different sheets for managing different data now unfortunately when we first create our database none of these sheets exist or none of these tables exist the table is the literal term for it so in this case we’re going to have two different tables where each table is like a sheet one sheet is or one table is going to handle our users and then the other is going to handle all of the to-dos and for every to-do it’s going to associate them with a user now to actually make this happen within the database we write command using the SQL language and using this node package we can get our JavaScript to execute these commands and configure our database so what we’re going to do is we use the database do ex uh execute method that’s going to execute a SQL command and act it upon our uh database now that takes a template literal string as that’s going to allow us to uh write some um strings across different lines now the SQL command to create a table where once again we need two tables where each table manages different data one table is specifically for users and once again you can just think of that as like a tab as a tab as a tabular database like an Excel spreadsheet we need to actually create it so we create a table called users just like that now after we create the table we need to specify some of the different columns in our table so we have these circular parentheses and we’re going to enter the circular parentheses on some new lines now in here we enter the different columns and we specify what kind of data type they’re going to be in addition to some other information so the first field is going to be an ID now the ID is going to be of type integer so that by itself is pretty straightforward after that we’re going to have a comma and then on the next line we’re going to need the username and that’s going to be a text field and it has to be unique so we can throw the unique key onto it and then lastly we have a password field and that’s also going to be of type text 
So this SQL command right here is going to be executed upon our database and will configure that table, so that our database is up and listening and ready to accept new users, where each user has a username and a password that gets saved to the database. Now, the second command we're going to need is for our second table, and that's going to be for all of our to-dos. So we're going to have another db.exec, and we want to execute a SQL command; I'm going to open that onto a new line. This one's going to be very similar: we're going to create another table, which once again is just like a sheet, and this one is going to be called todos. Then we have the circular braces, where we specify the different columns. It's pretty straightforward: the first one is also going to be a unique id, because an id is the best way of referencing different rows or elements in a table, and that is once again going to be an INTEGER field. Then we have a comma for the next column. The second one is going to be a user_id. This field is also going to be an INTEGER, but more importantly, it's going to be the field that associates a to-do with a particular user: every user is going to have an id right here, and the user_id field is going to keep track of which user a to-do is for. That's super important, so that when someone authenticates, they only receive to-dos that are specific to them. Now, to create this level of communication between tables, we need to configure a field to be what's known as a primary key. What that means is that for this id field here, since I'm saying that the users table needs to be able to be referenced from other tables, and we're going to reference it by the id, we need to set this key up to have superpowers and create it as a PRIMARY KEY. So we use the keywords PRIMARY KEY right there, and that sets up this id as a superpowered key that can be referenced within other tables, such as the todos table right here. Now, the last element I want to add onto this one is called AUTOINCREMENT, and that's because when we create a user, we're not going to specify an id; we want it to be automatically assigned to the new user, and we just want the ids to auto-increment: our first user is going to have the id of one, the second user an id of two, and so on. Now, all of the SQL stuff will become more and more clear to you as we continue to use it, as we actually look in the database while we create all of these interactions, and finally in Chapter 4, as we build out a more complex database and literally start interacting with it. Obviously, in both of these applications, using the application will save data to the database, but there are also, basically, hacker ways that you can overwrite it and work in the background, and we'll see how all of that works; it should be a really beneficial experience to help you understand exactly what's going on. But anyway, the moral of the story is that we have this user_id, which associates a to-do entry with a particular user, but to allow that communication we need to set up the users table's id as a primary key, so that it has superpowers and can be referenced from within other tables. So now we have this integer field, and this one right here does not need to be a primary key, because it just refers to that primary key. However, every to-do also has its own unique id, so the to-do's id is actually also going to be set up as a primary key, and we'll configure it so that it auto-increments. After the user_id field we obviously have the task, and that's just going to be a TEXT type. Then we have a completed status, which is going to be a Boolean, yes or no, so a to-do is either complete or it's not, and that tracks its status. That column gets a DEFAULT of 0, which is going to be false; we're going to use a numeric value to track the true or false state, so zero is false and one is true. And finally, we have a FOREIGN KEY, which is going to be the user_id, and that is just going to reference the users table's id field. So that's obviously quite a few SQL commands. Once again, this is a backend full course; we're definitely not giving SQL the attention it deserves. The SQL ecosystem is incredibly sophisticated, and you could spend a hundred hours looking into it and becoming more and more competent with SQL. But for now, we just need to configure the tables, and as I said earlier, when we start going behind the scenes and modifying our SQL databases using SQL commands, all of this will become much clearer and more apparent to you. For now, we just need to get them up and running. So if we go ahead and save that, that's the code we need to create our two database tables, set them up with some columns, and give them the means to communicate and reference one another. Now, the last thing we need to do in this file is have a default export of db, and this line is going to allow us to interact with this database variable from other folders and files, such as from within our server, our auth routes, our to-do routes, and our middleware. As you can see, it also allows us to keep a very tidy project directory: I've got 23 lines to configure an entire database, which is super sleek, and now I can quickly know where to reference that code; it's not all just jammed into one file, everything is compartmentalized. So with our database created, it's time we start setting up some endpoints to manage our authentication, which is going to be step one of getting this front end working properly with our database and backend. Now, just before we jump into our auth routes in the next section, I noticed there's one little error I made in this particular file. We've configured our database so that when we boot up our application, it creates these tables where we can then save all of our data; however, when we reference between the two tables, when we associate a to-do with a particular user, we're referencing the users table, but we only named it user. So that just needs to be pluralized, and we can go and save that, and that is now fixed. So for the next step, let's actually come back
to the application, which is now being served up. I refresh the page, this is what loads, and let's try to enter a user. I like using test@gmail.com, and I just do a password that's a couple of digits. Now let's see what happens when I click submit with the network tab open. So I click submit, and I get an error showing up, and if I look at the network tab, we sent out a request; let's take a look at the headers section here. We have the headers with the general information, including the URL, and this is the endpoint that we sent this network request out to, from the client to the backend. It's a POST request, which means it contains a payload, and that's got a username and a password, specified as JSON information. We did configure our server to parse that information with this line here; we said to our server, expect this JSON information. However, even after having done that, we got a 404, which basically says there was no response: we didn't hear anything back, no idea what happened. That's because we don't have an endpoint set up for this particular route, and that's why we got the 404. So what we need to do now, whether I want to log in a user or, let's say, sign up instead and submit to the registration endpoint, is create both of these endpoints, because right now we're getting back 404s because they don't exist; we haven't made them. They are the endpoints we're going to be creating within this authRoutes.js file. Now, obviously, it's super fun to use this interface to send out these network requests, but we're also going to do the exact same thing from our client emulator, which is this todo-app.rest, and we'll see how to do that very shortly. First, let's actually create the endpoints, because there's no point in emulating these network requests if there's no code to receive them. So the auth routes file is where we're going to define all of these endpoints for handling the authentication functionality. Now, in here we need to do a few things. One, we need to import express from the express package. Two, we need to import bcrypt from the bcryptjs package; if you recall, bcrypt has all the code for encrypting the passwords and creating a truly secure application, and as we come to that code and implement it, I'll explain a little bit about how the encryption algorithms work. We also need to import a package called jwt from jsonwebtoken, and that's just going to allow us to create a JSON Web Token, which is an alphanumeric key that is essentially a second password we can associate with a user to authenticate them when they make future network requests, without needing them to sign in again. So that's going to be important. And the last thing we need to import is our database, from our db.js; we import it from one file into another. So these are the four imports we're going to need. The database, obviously, because if we're registering a new user, we need to write that new user to the database, and if we're logging them in, we need to check the database to see if that user actually exists. Now, one new concept we're going to introduce here is how to configure endpoints, or routes, when you're not defining them in the original file. If we come back to Chapter 2 just here, I made a whole lot of endpoints in this server.js; it was pretty straightforward, we just called our app and configured the endpoint for the method and the route, and consequently wrote the logic to respond. Obviously that works; we've already done one example of that with this endpoint right here, this home GET endpoint that serves up our HTML website.
However, when we're subdividing, or compartmentalizing, our routes into these sub-files, we need one extra configuration layer. So what we're going to do is define a variable called router, and that's equal to express.Router(). The reason we do this is because what we do from here is export default router (and it needs to be a lowercase r), and then in our main application, within the server.js, we can define a section here called routes, and instead of writing out all of our endpoints, we just say app.use, and for any authentication routes, so any routes that are within this path, we just use authRoutes. Now, authRoutes is an import that we need, so we come up to the top just here and import authRoutes from './routes/authRoutes.js'. There are a few steps there, so let me go over them once again. Obviously we just got an issue right here, cannot find this particular module, ./routes/authRoutes; we'll get to that in a second. But anyway, the moral of the story is that inside of authRoutes we create this router, and it's to this router that we're going to assign all of these methods. It's basically like a subordinate app, or a subsection of our app, where we can create all these methods. So, for example, I'm going to have one method that is a register method right here: it's a POST request to the register endpoint, which, if we look just here, matches the POST request the front end sends to the register endpoint. We're not throwing the /auth route on the front, and we'll see why in a second; it's just the /register endpoint. Then, just as we have been doing, I can provide a second argument, the callback receiving the request and the response, and that is the function to be executed when something hits this endpoint. Now, when we export this router and import it into our server, ensuring that I save it, and then we use this line right here, it basically takes all of the routes that we define in the authRoutes and slams them on the end of the /auth route. It combines the paths, or the routes, and that defines this particular endpoint. So we've got the /auth, and then all of the endpoints we define within our authRoutes will just be sub-routes within that. Now, we've got an issue just here that says we cannot find the module db; that is within our authRoutes, and I think that just needs to be db.js instead, so let's save that, and now it's working perfectly. So inside of this authentication routes section we're going to create two endpoints. Instead of using the app, we're now using the router, and that just allows us to subdivide all of our endpoints and routes into these nice little files. We're going to have a second POST request, and this one is going to be for /login. That's once again just going to have a callback function that receives the request and the response as arguments, and it's going to contain the logic to log in a user when we hit that endpoint. Now, when we save that, these two endpoints that we've defined in here are added to this router, which is exported from this file; in our server.js we import all of that as authRoutes, and we just slam it on the end of any /auth request: we tell our app to use all of these authRoutes when we hit endpoints that contain the /auth route. The exact same thing is going to happen within the to-do routes, so we may as well configure that before we get into the nitty-gritty of defining all the routes. In this todoRoutes file, we're also going to import express from the express package; we'll also need our database, so we'll import the db from '../db.js'. Then we define the router, which allows us to create these specific routes within these sub-files, so router is equal to express.Router().
Then we can have a router.get method, and this is going to be to get all the to-dos for a user, so this one is "get all todos for logged-in user", and then we just have the request and response; set that up as an arrow function. Then we need another one; this is going to be to create a new to-do, and in here we're going to have router.post, because if we're creating a new to-do, we're not just asking for information, we're actually sending over what the new to-do is going to be. That's going to be some information entered into the front end that's sent over the network as a network request; our backend is going to receive that POST request to the / route, and then we'll have the function to handle it and consequently save it to our database. We'll need one to update a to-do, and this one is going to be a PUT method, which, if you recall, is for when the network request wants to put information in the place of an already existing thing. So POST is for creating and PUT is for modification, and this one also goes to the slash. Now, the route for the PUT is slightly more complex. If you recall, within the database here, when we create a to-do, every one of them gets an id. If we're updating a to-do, the way we do that is we update the to-do whose id matches the id of the one we're updating: we check the database, we match up the id, and then we make the modification specifically to that task. That means that when we send out this request, we actually need to specify the id. One way we could specify that id is by posting it as JSON, but another way is by using a dynamic route parameter. Essentially, what we do is use the colon and then provide the parameter, where, if I actually created the request, I could use an id of three in the place of this. We'll see how all of that works when we create the emulations of all of these network requests, but for now, this is just a dynamic id, which is going to allow us to identify exactly which to-do we need to make the modifications to. In here we still have the callback function, the function to be executed when our network request hits that endpoint; we'll still have a body of information sent over as JSON, but we specify the id in the path, so that's nice and neat. Then the last one we need is to delete a to-do, and that's going to be router.delete, also to the :id path, which is going to allow us to say: only delete the to-do entry that has this particular id. That's also going to have its callback. Now, with all of these endpoints done (obviously we'll come back and write the code for each of them later), I'm just going to export default the router; that assigns all of these endpoints to this router entity. Once we've exported it, we can save this file, come into our server.js, and import from the ./routes folder, and consequently the todoRoutes.js, and I'm going to import that as a variable called todoRoutes. Now, the name that I use for these imports doesn't have to match the one that was exported from the file: in both of these cases I've exported a variable called router, but when I'm importing them, I'm importing the value and assigning it to this name, essentially, so todoRoutes. And now what we can do is just duplicate this line right here, except in this instance it's going to be for routes that go to the /todos endpoint. Ultimately, we can see that this configuration allows us to have a whole lot more endpoints while subdividing them into their own files, which keeps everything a whole lot cleaner. So that's super nice and neat, and technically we're actually finished with our server.js file; everything from here is just filling out all of these routes and their functionality.
Actually, that's a lie: there's one last thing we need to do, and that is add some middleware to the to-do routes, because when we assign them, we have to throw in some middleware that authenticates a user before they can actually access those endpoints. We'll come back and make one little modification to this line later, but that should do for the minute. Now, one little error we just need to clean up real quick: in the server, I noticed that I meant to assign the todoRoutes to this /todos route, so let's fix that. Now that we have all of this code done, we're ready to go and fill out our endpoints. The next thing we're going to do is actually create the emulations for all of the functionality. The reason we do this is because, obviously, when a user uses our complete application, they're going to be able to do all of it from the user interface. However, while we're still developing our application, it can be useful to predefine all of these interactions; we'll set them up in here, and that way we can emulate these functionalities as if a user were using our application, and we can ensure that our backend is set up to handle everything. This process is kind of analogous to running tests in JavaScript, or any programming language, where testing works in a similar manner: essentially, you think of everything a user could possibly do, you create those actions programmatically, and then you can be sure that they are working adequately. The ones we're going to start off with are the authentication routes. The first thing a user would come in and do is register, so I think that's a good one to start with. What I'm going to do here is just add in some triple pounds inside of our todo-app.rest, and they're going to separate the different emulations. Now, I actually lied: the first one is going to be the GET / endpoint, and that's going to check that the website loads. When a user hits this endpoint, they are sending out a GET request to the HTTP localhost, and our app is on, for me, port 5003; for you it's potentially 5000 or whatever other port you specified. So they enter this URL and they expect to get back a website. We can now go ahead and test this endpoint, and we see that we do in fact get a successful response, a 200 response, which means success, excellent. And we can see just here we have all of the HTML code that, if a user did this via a browser, would be interpreted by the browser and rendered onto the screen, and all of the JavaScript would be run. So that's pretty cool: that's our first endpoint set up, and it's working. We've created an emulation for it, and we can see that the communication between the client emulator and the backend is successful. Now, the next one is for registration. To test exactly how the registration works, I'm going to come over to the index.html, and we're going to look at how the client, or front end, actually creates this registration network request from the browser. This is also going to be beneficial if you want to come and fiddle with this front-end code; it's pretty self-explanatory, but we'll run through it together. So let's look for the function that registers a user: here it is, authenticate. If we come down, we see this line of code just here. Obviously, there's a bunch of guard clauses up the top that basically say, if a user doesn't have a username or password, then let's not even bother sending out a network request. But we check to see if the status is registration, and if it's true, then this code right here creates that network request via the fetch API. Here we have the URL to which we send this network request, and we can see that the endpoint is /auth/register. The API base is just up here; in this case it would just be a slash, because it matches the host from which we serve up this front end. Now, if we look at the network request, we can see that the method is POST, we can see that we specify the content type, and we can see that we have a body containing some JSON, which is originally an object with a username and a password field. So what we're going to do is literally just recreate these three parts from within our client emulator. First up is the method: it's a POST request, and then we provide the URL, so that's going to be http://localhost:5003, and then the auth route and the register route; together they create that specific endpoint. If we check our server, we can see that all of our authRoutes go to the /auth route right here, and within our authRoutes we then have the register endpoint, and this is the endpoint we would expect to receive this network request. So that's the POST request set up. Now, if we're posting information, we need to actually create that information, and that's going to be a JSON object right here with two fields: a username, as we saw (I'm just going to leave that as an empty string for a second), and a password field, which is also going to be an empty string. That's set up like an object, except we ensure that all keys are stringified using the double quotations. Now, just for formatting reasons, it's important that we keep a blank line above the JSON that we're creating for this network request. The last thing we saw inside of our index.html is that we had some headers where we specify the content type to be JSON, so we have to do that as well: that's what our front end would do, so that's what our emulator has to do. In here we're just going to specify the Content-Type header, and that's going to be application/json, not JavaScript, just like that. So now we have set up our client emulator to emulate that network request as if a user were actually using the front end. And since we've created that endpoint, we should be able to send this request, and because, within that particular endpoint, we don't have any code to respond, what will actually happen is the network request will find this endpoint (the endpoint exists), and it will just wait indefinitely; it will wait for a response which it never gets. Typically there is a timeout associated with receiving a response, and if the timeout is reached, that is, we don't get a response within a period of time, the request will fail by default, and that will take a second. That's opposed to if we hadn't defined this endpoint: we would get an instant 404 saying the endpoint itself doesn't exist. So now, if I send that request, we can see that we just sit here waiting for a response; nothing's happening, and that's because the request has hit this endpoint, but we don't send a response back, so we sit waiting indefinitely. That's not what we actually want.
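For reference, the emulations built so far in todo-app.rest look roughly like this (VS Code REST Client syntax; the port and the test credentials are the ones used in the video, and the blank line before the JSON body is required by the format):

```http
### Check that the website loads
GET http://localhost:5003/

### Register a user
POST http://localhost:5003/auth/register
Content-Type: application/json

{
  "username": "test@gmail.com",
  "password": "123123123"
}
```

Each `###` line both separates requests and acts as a comment describing the emulation beneath it.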
But we have confirmed that we're reaching this endpoint, so we can cancel that little network request and start defining some of the logic to register a new user. First, what I'm going to do is just comment that this is a "register a new user" endpoint, that it's the /auth/register route, and that it's obviously a POST request. Now, the first thing we need to do when registering a new user is, when our backend receives this network request, figure out what the username and password associated with it are. We know the client is posting information, and when information is posted as JSON, it's always contained within the incoming request, associated with the body key. Once again, the app.use(express.json()) line is what allows us to read that JSON body of the incoming network request. So what we can do is say const body = req.body, and that gives us access to the JSON body of the incoming request. One thing I'm actually going to do is save ourselves a step right here: instead of creating this body variable, from within which we would have to access the username and password keys, I'm just going to destructure out the username and password directly. And so now I can console.log the username and the password, and, for the moment, we can end this communication cycle and just confirm that it's working with res.sendStatus, and we can send a 201, which would be typical for creating a new user. We can now test the completion of this communication cycle, from client to backend and back to the client. The reason I want to do this is to confirm that I can in fact access the username and password, which we will then need, and also to confirm that we can successfully respond to the network request. So let's go ahead and save that; that restarts our application, and I can now emulate this request. Now, notice just here I am printing a line, but I'm not actually getting anything out of it, and that's because I'm a numpty and I haven't associated any values. So let's throw in some values here: let's say the username is giles@gmail.com (I'd be so curious to see if that's someone's actual email), and I just like doing 123123123 as an easy password. We can see that we did in fact get back a successful response, but I really want to console out the username and password, so I'm going to rerun that, and we can see now that we do in fact log the username and password in the backend terminal, the backend console. That implies that we can access this data inside of our authRoutes and, soon, save it to register the user in our database, so that is super nifty, absolutely excellent, step one complete. Now, while we're here, I'm just going to do the exact same thing underneath this other pound key, except this one is going to be for the login route. That just means that once we've written the login backend code, we should be able to emulate the registration and then emulate the login action that quickly, as opposed to faffing around with a front end. And that is actually the beauty of API endpoints: when you get more accustomed to them, sometimes it can actually be convenient to cut out the front end altogether and just do it programmatically via an API.
application then it makes sense to have a front end to do that uh the one thing I want to do before I exit this file is just explain what these end points do so this is to register a user and that is a post to the/ orregister Route uh and I’ll do the same for the following one the triple hash key basically creates a code comment so this is going to be log in a user to the or/ login route so we’re finished our emulations for these routes let’s go and dive into the code so now that we’re ready to write the logic to register a user it’s time to get technical and let’s talk about Security in developing a full stack authentication protected application now one of the biggest oopsies companies make is they get into the habit of when a user creates you know a new account with a username and a password they save the username and the password to the database and the problem with this is that if they got hacked for any reason suddenly everyone’s password is exposed to the world now usernames aren’t as important as long as you don’t have both username and password so what we do instead of just verbatim saving people’s passwords as a string to a database is we Crypt them and the way that we’re going to do that is with this bcrypt package now the thing that becomes challenging as an outcome of encrypting every password is that when we go to login we look up the username and we check the password and we need to see if it matches the one that they’ve just entered the problem is when we look up in our database the password associated with the user we only get back the encrypted one now the problem with that you might say well why don’t we just decrypt it these encryption algorithms are one way now there’s a whole lot of technical information that explains exactly how that system works I’m not going to dive into it but the purpose is is that you can’t actually decrypt that password and that makes it so incredibly secure but it also means that we can’t match the password in the 
database with the password that's just been entered, or at least that's what you might think. To give you an example: let's say we get this username and password, we save gilgamesh@gmail.com, and for the password we end up with some long series of characters that just looks like mumbo jumbo. That's what we save to the database: the username and an irreversibly hashed password. Now, when a user logs in, we get their email and look up the password associated with it, but what comes back is the hash, which means we cannot compare it directly to the one they just typed. So what we do is one-way hash the password the user just entered, in exactly the same way. The hashing process is deterministic for a given salt: bcrypt stores the salt it used inside the hash string itself, so when the entered password is hashed with that same salt and the same algorithm, the same input arrives at exactly the same output, and then we can compare the two hashes. That's how we authenticate the user. Security is a whole big topic and this is really just a five-minute overview; as we practice and implement it, it will become more obvious. On the whole: when a user registers, we hash their password and save the hash to the database, so that if we ever get hacked the stored passwords are totally meaningless, because the hashing is irreversible. When a user logs in, we hash the password they've just entered using the same algorithm; because that process is deterministic, it produces the same hash, and if the two are exactly the same we know the password the user just entered must be the one they used when they registered their account, and therefore the user is the correct individual.

The way this works from a programming standpoint: the first step is to hash the password. We create a variable called const hashedPassword, equal to the bcrypt library's hashSync method (the synchronous one), and we pass in the password plus a second argument, the salt, which as we can see in the prompt is "the salt length to generate, or salt to use; defaults to 10"; it helps us synchronously generate a hash for the given string. In this case we're going to use the value 8, and we need to use that consistently. So now we have a hashed password, and I can console.log the hashedPassword here, remove this other line, and emulate the request to see what it looks like. Let's run that, and here you can see the hashed password we would save, the value we'd associate with a user. It's irreversible: if our database were hacked, no one could ever undo these hashes and figure out what the passwords originally were. That's what we securely save to the database instead of the plain old string a hacker could totally take advantage of. With that done, we can come back to our auth routes, where we now have a hashed password we can save to the database. One note: in production environments a database is typically a separate server entity, whereas here we're keeping everything within the same server. There's nothing wrong with that, it's great for development, and in chapter 4 we will separate them into their own backend entities, since in production we're essentially creating a new communication bridge between systems.
So now we have the front end, the server, and the database. I like to put this database code inside a try/catch block where we catch the error; that allows us to handle any errors we might encounter in the process, which is super important for a functional, robust backend. In here I actually like to write the catch case first: we console.log the error.message if we get one, and we respond to the user by sending a status of 503. If you remember, 500-level codes (between 500 and 599) mean the server has broken down somewhere, which is exactly what has happened if we fail to save a user to the database. Let me add a comment here: save the new user and hashed password to the DB. One important thing to note is that once we send back this status, we can't then send back another one; there can only be one status, one response, or you'll get an error. So we either send this one if the code bugs out, or, if the try block runs successfully, we send back a 201, or actually a token, as we'll see in a second.

As for the logic that saves to the database, we're going to run some more SQL queries. First we create a variable called const insertUser, equal to db.prepare(...). The prepare method is pretty similar to the exec method we used earlier, where we just run a SQL command, except prepare allows us to inject values into the SQL. So we write a SQL command here, and the command to add an entry to an existing table is INSERT INTO, then the table, which is the users table, and then in parentheses the columns we want to insert into: username and password. If you remember, when we configured the schema we defined the fields username and password; the id is assigned automatically, so we don't have to worry about that. So we insert into the users table, specifically the username and password columns, then we specify VALUES, and in parentheses, for the minute, that's just a question mark and a question mark. That prepares the SQL command. Then we define a second variable called const result. This logic is a little specific to the SQLite library in the Node ecosystem, but what we do is take the prepared query and call run as a method on it, passing in the values we want to save: the username, which we destructured out of the body of the incoming request, and the hashedPassword. To summarize those two lines: we first prepare a SQL query that inserts data into an existing table (INSERT INTO, then the table, then the exact columns, then VALUES left as placeholders), and then we run it, providing the actual values, which are injected into those placeholder positions and sent to the database. One thing I like to do here: when we register a new user and consequently create a new row for them, I want to give them a default to-do that will show up on screen.
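To make the prepare-then-run flow concrete without a real database, here's a sketch that swaps the SQLite driver for a toy stand-in. makeFakeDb is entirely my own; it just records the SQL string and the parameters a real driver would substitute into the ? placeholders.

```javascript
// A toy stand-in for the SQLite driver's prepare/run pair, purely to show
// how placeholder values flow into the SQL. In the tutorial, db comes from
// the actual SQLite library; everything here is illustrative.
function makeFakeDb() {
  const executed = [];
  return {
    prepare(sql) {
      return {
        run(...params) {
          // a real driver substitutes each ? with the matching parameter
          executed.push({ sql, params });
          return { lastInsertRowid: executed.length };
        },
      };
    },
    executed,
  };
}

const db = makeFakeDb();
const insertUser = db.prepare(
  "INSERT INTO users (username, password) VALUES (?, ?)"
);
const result = insertUser.run("giles@gmail.com", "$2b$08$...hashed...");

console.log(db.executed[0].params); // ["giles@gmail.com", "$2b$08$...hashed..."]
console.log(result.lastInsertRowid); // 1
```

The point of the two-step shape is that the SQL text and the user-supplied values stay separate, which is also what protects against SQL injection.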
That's just going to give them an entry in their to-do list to prompt them to create some more and understand how the application works. So now that we have a user, I want to add their first to-do for them. The default to-do I want to add is going to be called const defaultTodo, equal to a string that says "Hello! Add your first todo", and I'm actually going to add a smiley face; it's technically a to-do, they can complete it when it's done, and it prompts them to create some more. Now that we have that line, we create a variable called const insertTodo, equal to db.prepare(...), and we prepare another SQL command. Inside the template literal string we insert into the todos table, and then inside the parentheses we specify the columns we want to add information to. If we look at the schema for this table, the id is automatic, and the completed status defaults to incomplete when a new to-do is added, so the two fields we need to specify are the user id the entry is associated with and the actual text of the to-do. In this case we enter information into the user_id and task columns, and the VALUES are once again a question mark and a question mark. That prepares the SQL query, and then we go ahead and run it; in this case we don't need to assign it to a variable, we can just type insertTodo.run(...).

The first argument we provide to this run method is the user id, which can actually be found within the result of creating the user: we access the field called lastInsertRowid. What that does is check the id of the last row added to that table, which in this case is the id of the most recently inserted user. So we get that id, which is the id we want to associate the to-do with, and then we provide the second value, the defaultTodo, and that inserts the to-do. The last thing we need to do, now that we've added the user to the users table and created their very first to-do, is create a token. The token is super important, because once a user is logged in, they're in a position to create, update, and delete to-dos, but those to-dos are specifically associated with that user, and we can't let them modify everyone's to-dos, just theirs. So whenever they run those actions, whenever they try to add a new to-do, we need to attach a special token to that network request confirming they are in fact an authenticated user. It's kind of like an API key in a sense. The way we create the token is we say const token, equal to jsonwebtoken's sign method, and we pass in an object with a key id, set to the result's
lastInsertRowid, just as we had up here: the id of the most recently added user. The second value we provide to the sign method is an environment variable, a secret key: we say process.env, which reads from the environment variables file, and we access the JWT_SECRET key. Finally, the third argument is an object with the key expiresIn and the value "24h". That means the token a user attaches to their network requests will expire in 24 hours, at which point they'll have to reauthenticate to gain a new one. As for this JWT secret, we don't have it yet. It's a secret key that only we know, and because it's secret, our immediate first thought is to throw it in the environment variables, because if people gain access to this key they're one step closer to forging tokens and fraudulently acting on behalf of a user. So in our environment variables we create this variable called JWT_SECRET; the name needs to match whatever we put after process.env, so those have to be identical. We set it equal to a string, and it can be any particular string; I'm just going to say your_jwt_secret_key, and you could fill it out with anything, which will work for us for the second. One other value I want to set in here is PORT, which I'll set to 5003, meaning our server will read its port from the environment instead of falling back to the default of 5003 in the code. So now that we have that done, we've created this token, and we now have to send it back to the user.
We say res.json, which sends back some JSON as a response, and in there we provide an object using the key token; this syntax creates the key token and assigns it the value of the new token. So what happens when we emulate this request? We've finished the logic for this endpoint: we add a new user, assign them a default to-do, and create a special token we can use later to confirm they are in fact the correct user. Let's emulate it: I run that, and we can see we've added a new user to our database and gotten back the special token. It looks a little like the hashed password we saved, but it's actually a unique token containing all sorts of information. Essentially, if we close this and open up the index.html, we can see what the front end does: we receive the JSON, parse it, and assign it to a value called data (this is all within the front end), and if data contains a field called token, we save that token to localStorage, which is basically a client-side store; it's how data is persisted in a front-end-only system, a similar concept to a cookie if you've ever been asked whether you want to accept cookies. Saving to localStorage means we can consistently access the token even if we refresh the page or reopen it a day later. Then, if we have that token, we fetch all the to-dos associated with it. If I come across to my application and go into local storage, we don't yet have a token, but this is eventually where it will be saved.

If we come back to the index.html and scroll down to fetchTodos, we can see that when we fetch the to-dos, which will hit this particular endpoint in our todo routes, we get all the to-dos associated with the user. The fetch API is used to send out the network request; we send it to the /todos route and we attach some headers. In this particular fetch request I don't specify the method, because by default it's a GET request, since we're getting information. Within the headers, for the authorization key we add the token, and when the network request is sent out with that token attached, it reaches this endpoint right here; eventually, however, it will be intercepted by our authentication middleware, which will check that the token belongs to a valid user and only send back the to-dos associated with that particular user and token. Now, I recognize that's probably a mountain of jargon and a whole lot of new concepts; if it feels overwhelming, absolutely don't stress. As we code all of these systems out and get a good understanding of what does what and how it all comes together, it will become much, much clearer. Anyway, that's our first authentication endpoint done; the login one is fairly similar, and that's what we'll jump into now. We'll be able to check that everything works from within our client emulator: register a new user, then log that user in.

All right, so we just finished our registration route; it's all working, we tested it with our client emulator, and it looks absolutely excellent. What that route does is create a new user inside the users table with a username and password. Now that we can register a user, we can let them log in. To see how this works, we come into the index.html and look at our authenticate function.
Here we have the isRegistration check; we're obviously not registering anymore, we're logging in, so we hit this else case where we log in an account. You can see it's a POST to the /auth/login route, the content type is JSON, and we transmit a username and password as the body of the request. When that hits our endpoint, the first thing we have to do is destructure out the username and the password, because we need to check our database for an existing user matching that username, retrieve the hashed password, and compare the two to see whether they're valid. So, just like in the registration route, we destructure the username and password (and if you haven't picked up on it yet, the username and the email are equivalent here); they come from req.body, the body of the request, which is the information posted with the network request. As always when we interact with the database, we wrap this in a try/catch block, where we catch the error and console.log the error.message, so if we do have an error we can see what it is. In the error case we respond with res.sendStatus(503), indicating something broke on the server side.

The try block contains the logic that attempts to interact with the database, which can be a precarious operation, for example if our database is shut down, so we need to handle that potential error case. Anyway, time to interact with the database. The first thing we need to do is pull up the existing user, so we define a variable called const getUser, equal to db.prepare(...), a query that reads the database for this user. Inside the string, the SQL command we need to read an entry is SELECT, and then we use an asterisk to say we want every single column, so we read everything from the users table. Then we throw in a condition: we read all the data from users WHERE the username is equal to a placeholder. That's the SQL command to read entries from the users table while filtering out everything that doesn't match.

Now that we have the query prepared, we can run it: we assign it to a variable called user, and we call getUser.get(username), which gets the user matching that username. Essentially, this command injects the username into the question mark placeholder and reads everything from users where the username matches the one we pass in: a simple email lookup against a SQLite database. Now that we have this theoretical user, we need some conditional logic for the case where no user is returned; if someone tries to log in and they don't have an account, we need to reject them out of this process. So we add an if clause that says if (!user), and in that case we return out of the function, but we also need to respond to the network request telling them we couldn't find a user: res.status(404), could not find, and we send an object containing the key message with the value "User not found". If we save that, we can actually test it: I'm going to come into our todo-app.rest file.
and restart this code. I'll Ctrl+C and rerun npm run dev, which reboots our application, and the reason I'm doing that is that it empties our database; every time we restart our server, the database is emptied out. Now, if I log in, I'd expect us not to find a user, and since we've handled that case we should get an appropriate response: if we send that, we can see we do in fact get back a 404 containing the message "User not found". If I instead register a user, we get back the token, which creates an entry in the database, and I should then be able to log in. But when I hit the login, it actually doesn't respond; we're just stuck waiting, specifically because we haven't yet handled the case where we do find a user. I'll cancel that, but it confirms everything works for the case where we can't find one. The if (!user) check is what's known as a guard clause, because it guards the rest of the code, the success path for the case where we do have a user.

Next, we need to check that the password is valid. We define a variable called const passwordIsValid, and we use a bcrypt method: bcrypt.compareSync, the synchronous comparison, comparing the password (the one the user just entered) against user.password, which is the second argument. We can see what the method does right in the prompt: it synchronously tests a string against a hash. Essentially, as I described earlier, it hashes our password (reusing the salt stored inside the hash) and compares it to the stored hash to make sure they're equivalent, returning a boolean: true if the password is valid, false if the comparison fails. Because we're in the habit of using guard clauses, we first handle the case where the password is incorrect: if (!passwordIsValid), we return out of the function, breaking out of this code and ending execution, and we respond with a status code of 401 and an object whose message key is the string "Invalid password" (and that can be a lowercase p). So if the password is incorrect, we respond and basically say: nice try buddy, not getting in today. If we get past this guard clause, we have a successful authentication. Let me add some comments: here, if the password does not match, return out of the function; and up here, the earlier guard clause says if we cannot find a user associated with that username, return out of the function. Now we can handle the case where we've matched the password, found the user, and everything looks good. What do we do? Just like above, we sign a token and send it back; the code is nearly identical. We give them back the unique token associated with their account, which they can use to authenticate all of their CRUD actions and to-do updates, and whenever they make those actions we can verify that they are in fact the correct user.
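The guard-clause shape of the login route can be sketched on its own, with the database lookup and bcrypt comparison stubbed out so it runs standalone. findUser and passwordsMatch below are placeholders for getUser.get(username) and bcrypt.compareSync, and the status codes mirror the ones used in the tutorial.

```javascript
// Minimal Express-style response mock so the handler can run standalone.
function makeRes() {
  const res = {
    statusCode: 200,
    body: null,
    status(code) { res.statusCode = code; return res; },
    json(obj) { res.body = obj; return res; },
  };
  return res;
}

// Stubs standing in for the SQLite lookup and bcrypt.compareSync.
const users = { "giles@gmail.com": { id: 1, password: "hashed-123123123" } };
const findUser = (username) => users[username];
const passwordsMatch = (plain, hashed) => hashed === `hashed-${plain}`;

function loginHandler(req, res) {
  const { username, password } = req.body;
  const user = findUser(username);
  // guard clause: no such user, reject before touching anything else
  if (!user) return res.status(404).json({ message: "User not found" });
  // guard clause: password comparison failed
  if (!passwordsMatch(password, user.password))
    return res.status(401).json({ message: "Invalid password" });
  // happy path: sign and return a token (stubbed here)
  return res.json({ token: `token-for-user-${user.id}` });
}

let res = makeRes();
loginHandler({ body: { username: "nobody@x.com", password: "p" } }, res);
console.log(res.statusCode); // 404

res = makeRes();
loginHandler({ body: { username: "giles@gmail.com", password: "123123123" } }, res);
console.log(res.body); // { token: "token-for-user-1" }
```

The early returns are the whole trick: each guard clause ends the function before a second response could ever be sent, which is exactly the one-response rule mentioned earlier.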
First we get the token: we define const token, equal to the jsonwebtoken library's sign method. This takes an id like it did above; we provide the id of the user, except in this case, rather than going through result.lastInsertRowid, we can just access the user's id field. (What I might do here is console.log the user, so we can see what we're actually looking at when we run this request.) That's the first argument to sign; after that, once again, we use process.env to read the JWT_SECRET key from the environment variables file, and the last argument is an object containing the expiresIn key, once again set to "24h". That creates the token for us, and then the last thing we do is send it back: res.send, or res.json if you want to send JSON, and we just send the token back from our endpoint. Just like that, we have all the logic we need to successfully handle authenticating a user.

Now we can test it. Let's restart our code by Ctrl+C-ing out of it and running npm run dev again. We log in a user who won't exist and get back "User not found". Then I register a user by running the register client emulation, and we get back a token, super cool. Then we should be able to log in, but I'll use an incorrect password first: we log in and do in fact get "Invalid password". Now if I correct the password, we receive the token, so that works successfully, and we can see we also consoled the database entry for that user: an id of 1, their username, and the associated hashed password. Once again, this is the token we'll use to authenticate all of our to-do CRUD actions, and we'll see how to use it in just a second.

Before any of that, though, we have some CRUD endpoints to set up, so we come across to the todo routes and start filling out these endpoints. There are four in here: a GET for getting all of a user's to-dos, a POST, a PUT, and a DELETE. These endpoints are relatively straightforward. For example, to get all the to-dos associated with a user, we once again prepare a SQL query: const getTodos, equal to our database (which we imported just up here) dot prepare. Just like when we read from the users table, we say SELECT with the asterisk, which ensures we select all columns, and we select FROM the todos table. Because we only want to-dos associated with a particular user, we throw in a WHERE clause and match the user id.
The user_id has to equal the placeholder that we'll fill out shortly. To summarize how this query works: select all the columns from the todos table, but only where, so it's technically not every entry, just the rows where the user_id matches the value we pass in. Then we can say const todos equals the prepared query, and in this case we use the all method, because we want all of them, passing in req.userId. At this point you might say: but James, don't we have to read these values from the body of the request? That would be one way of doing it, but the request in this instance is slightly different because of some logic we're going to add to the middleware. Once again, the middleware intercepts the network request before the endpoint receives it; it gets there first, like a security layer. So we're going to finish this endpoint assuming the request does in fact have access to a userId (note the lowercase d in Id), and then we'll see how the authentication middleware works. Assuming we've fetched all the to-dos where the user_id matches the one from the request, we can just send back JSON containing the todos object, and that's this endpoint complete. But as I said, we need to build the middleware that authenticates a user and makes sure they're the correct person, and all of that happens inside our auth middleware, so we'll save this file and head over there.

The way this middleware works: we'll once again need the JWT package, so we import jwt from "jsonwebtoken". Notice how I've been signing all of these tokens and handing them to the user; if you come back to the index.html and look at any of the fetch calls, you can see the token is sent with the network request over the internet whenever a logged-in user makes any of the CRUD actions. They're logged in, they have a token that authorizes them, and that token is attached to every network request they make while creating, reading, updating, or deleting their to-dos. The purpose of the middleware is to intercept that network request, read the token, and verify that it's correct for that particular user. In here, we define a function called authMiddleware, and it receives three arguments. The first is the request and the second is the response, which are pretty standard: the request is super important because we need to access the token attached to it, and the response matters because if our authentication middleware intercepts the request and finds the user is not in fact legitimate, we can reject them, responding before the endpoint ever receives the information. The third is a parameter called next, which is new; we haven't seen it yet, and we'll see what it does very shortly. Anyway, I'll open up this function. The first thing we do is define a variable called token, read from the headers of the incoming request: const token equals req.headers, and we access the authorization field, because, if you remember from when we built the network request on the client, the token sits in the headers under the authorization key. I'll copy that key, since it has to match exactly, and that gives us access to the token on the incoming network request. Now we handle the case where no token is provided.
If we try to read the token and there's nothing there, we return out of this function and respond with res.status(401), which basically says the server doesn't have a problem, your request is the problem, and we use the json method to send back an object with a message key and the string "No token provided". That's a good little guard clause, and getting past that line guarantees a token has been provided, so now we can verify it. We use the JWT package's verify method, which takes a bunch of arguments. The first is the token itself, which is pretty straightforward. The second, if you recall, is the key we signed these tokens with, the JWT secret, which is highly sensitive and therefore lives in our .env environment variables file, so we read process.env.JWT_SECRET to gain access to that secret key. The third argument is a callback function: we verify the token, we get given some outputs, and this function runs, allowing us to say, in this case, do this. The callback receives two arguments: the first is the error, for the case where something goes wrong while verifying, and the second is a parameter called decoded. We open up this arrow function, and if we get an error, we once again return out of the function and respond with a status of 401 and JSON containing an object with the message key and the string "Invalid token". We tried to verify them and it didn't work, so we send back a response saying nice try buddy, you're not the right person, or potentially just that their token has expired and they need to log in again. That's the error case. The decoded argument, on the other hand, gives us access to the core parameters of the verified user, and we're going to assign them to the request. As much as you might think of the incoming network request as something we can't change, if we intercept it we can modify some of its parameters before it hits the endpoint, and it works just like a plain object.
user ID and that is going to be equal to the decoded ID which is the ID that we found from that user and then the last thing we do is we call the next method and that basically says okay now you’re good to head to the end point so we’ve modified the request and then when we call next we say you passed this checkpoint the security checkpoint you can now reach that endpoint where if they were trying to get to-dos we can now read the to-dos from the database and since we’ve added this user ID parameter to the request we can then access it from within this uh this endpoint from this request now the reason we don’t just do this process inside of the endpoint is because with middleware we can write this function once and then slam it in front of every single authentication protected endpoint anyway so now we have this code where we basically verify the token if we find out that they are indeed the correct person then we modify the incoming request to ensure that it also contains the ID of the user since we’ve verified them and then we tell them you may carry on to that particular endpoint so that’s the uh or middleware complete now how do we actually throw it in front of the endpoints well the first thing we need to do is export a default module called or middleware so we have to export it from this file and then if we come over to our server. JavaScript and come down to this particular app. 
use to-do’s endpoint all we do right here is we literally slam it in front of our to-do routes so in this case we would just throw our or middleware which I’m using the Auto Imports right here we can see it’s suggesting I import it and I’m going to throw it in between the to-do routes so we can kind of think of it as like okay we hit this endp point first we encounter the middleware and then every single to-do route endpoint is blocked by this middleware and that is imported just here and that is now available and if I go ahead and save that that should make sure that all of our to-do routes are protected by our or metalware where we have to confirm the token now two small things I wanted to clarify really quickly first is uh regarding the inside the or middleware specifically what this next does this next just says okay you may now proceed

to the endpoint; it's the final step before saying okay, we're done with the middleware, let's go on to the actual endpoint, which is one of these to-do routes, and that is all of them here. So we hit the auth middleware, we call next in the case where it's an actual verified user, and then we send the request through to the to-do routes having added the ID to the request, so that within the to-do routes we can read the ID from the request. Now as for the decoded ID: if you recall, when we originally create these tokens, when either we register or we log in, we create the token encoding the ID. The ID is what we associate with the user, so when we encode it into the token we can later decode it, and that's what this decoded is; consequently we can get out the ID and verify the user. So those were just two small clarifications I wanted to make. Now we're actually at a super cool point in our project, because although we'll eventually come down and write the client emulations for all of these endpoints, we can already register a user, log in a user, and fetch all of their to-dos, which for a brand new user should just be that one default entry we added when they register. Everyone who signs up to our app gets this default to-do, and the front end should be able to fetch all of those things, because we have created the complementary backend endpoints to facilitate that interaction. So what I'm going to do is once again restart my server completely; that's going to clean out our database, and then I can attempt to log in. Let's try a random user. I go ahead and sign in and it says failed to authenticate; that makes sense, because we don't actually have that user saved in the database. If I right-click and open Chrome developer tools by clicking inspect, we should be able to take a look at that network request right here. So if I refresh that page and try once more, we'll go for test@gmail.com, I'll type the favorite password, and submit. We can see we send out the request and get a 404 not found; here's the payload that got sent out as part of the network request from the client, the back end received it, and then we responded with the JSON that said user not found. So that all works perfectly. However, if I now try to sign up, I can submit that and we can see it's actually logged us in, and a few extra network requests were just run. One was this register endpoint: we got back a 200 OK, it was to the /auth/register endpoint, we sent over the username and password, and we got back a response containing the token. Now the way registration works in this application is that upon successful registration it also logs the user in. If we take a look at the front-end code and come up to the authenticate function, the registration and login functionalities both serve to gain access to a token, and once we have this token, if the data contains it, we save it, we cache it essentially in cookies or local storage, and then we load all the to-dos by calling this fetchTodos method, which, if we come down, sends out a network request saying okay, now we have access to this token, let's send it to the /todos endpoint. It's a GET request; the endpoint looks the to-dos up in the database once our middleware has authenticated the user via this token and responds with them, and when we get the to-dos back we display them on the screen. We render the to-dos and consequently end up with a dashboard right here. That's what it looks like: nice and mobile responsive, looks great, and we have our first to-do right here. Now currently, once again, we can't add to-dos, we can't edit to-dos, and we can't delete to-dos, because we haven't finished those endpoints. For example, if I try to mark one as done, we just get a failed network request; we don't get anything back, nothing happens. If we delete, we
also just get absolutely nothing back; it's not working, so we're going to have to program them in a second. But we can now log in, display a dashboard, register a user, and log in a user, and our token is working. We can confirm that by reloading this page: if I reload, it asks us to log in again, and now if I type test@gmail.com and our password and submit, we can authenticate once again, and we have this token. As for the token, we can come across to the Application tab right here and see that inside of local storage we have our token saved, so that is all hunky-dory and working brilliantly. So now what I'm going to do is start to code out the rest of these endpoints. The first one we have to code out is the endpoint that allows us to create a new to-do. The way this one works is not too dissimilar to the first endpoint, but it is a great opportunity for us to learn some more SQL queries, which is kind of fun, because at the end of the day we're writing these SQL queries to insert the data into our database tables. So what I'm going to do is once again log in; we submit that, and we're now in the dashboard. To add all of our information, let's first come into our index.html and find the function for adding a to-do. Just here we can see the network request from the client that tells our backend endpoint to add this new to-do to the database. We send out a network request, it's a POST method, which is typically for the creation of something, and we post information across with this network request to the /todos endpoint. We include the content type to say that it's JSON information we're encoding, and we authorize this request with the token so that our middleware can confirm we are the correct user. Then finally we send over the task, which is whatever is input into this field right here when we click this button. So our backend can expect to receive the task as part
of the body of the request. So if we come into our POST handler right here for creating a new to-do, the first thing we can do is define a variable, or better said, destructure the task out of request.body. Once we have access to that task, what we can do in here is define the SQL query. We say const insertTodo is equal to database.prepare, and we can now prepare the SQL query. We're going to use backticks, and in here we're going to say INSERT INTO, then specify the table, which is the todos table, and then provide the columns for which we have information, which are the user_id and task columns. The user_id specifies which user our task should be associated with, and that is something we can now access from within the request because our middleware authenticated the user. Once we specify the columns, we provide the VALUES, and those are just going to be two question marks as placeholders until we complete the query. Now that we've prepared the query, we can call insertTodo.run, pass in request.userId as the ID of the user that we want to associate the task with, and also pass in the task itself, and that is going to insert that to-do into the todos table of our database. Once we've inserted the to-do, the last thing we need to do is respond, so we're going to use the res.json method, and in here we're going to provide a few key-value pairs. The first is going to be the ID of the to-do, which we can access by going to insertTodo and getting the last inserted ID; that's going to give us access to the ID of the most recently added entry. Once we have that, we can also send back the task, along with a completed status of zero, because for the minute it's false; the zero represents the false Boolean state, because our to-do is not yet done, we've literally just added it. So we can now save that endpoint. What I want to do at this point is actually create the client emulations for both of these actions. Obviously we've got three so far: one to get the homepage, one to register a user, and one to log in a user. Now we need to emulate the fetch-all-to-dos endpoint, which is going to be a GET request to the /todos route, or path, better said, and I might also make a note that this one is protected. Essentially this is very simply a GET request to http://localhost:5003/todos, or whatever port you're using. Since this is a GET request, we don't need to provide any information as part of the request; we can just send it. But in its current state I haven't provided the token to associate us with a user, so if I just run this request we would expect to be blocked, because our middleware can't authenticate us. And indeed, if I run that, I get back a 401, which says that I'm unauthorized, and we get the message saying no token provided. Now the way we provide an authorization token from this REST client is very simply by writing Authorization, then a colon, and then the token. For the minute we don't have a token, so what I'm going to do is register a user, which is going to create a user with these credentials, and then I'm going to copy the token right here
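Assembled, the create endpoint from this walkthrough might look like the following sketch. It is written as a factory over the db object (assumed to follow the better-sqlite3 prepare/run API used in the project, with Express-style req/res) so it can be read and tested in isolation, and it already assigns run()'s return value to a result variable so the new row's id can be sent back.

```javascript
// Sketch of POST /todos: insert a new to-do for the authenticated user.
// `db` is assumed to expose the better-sqlite3 API (prepare / run).
function makeCreateTodo(db) {
  return (req, res) => {
    const { task } = req.body; // destructure the task out of the JSON body
    const insertTodo = db.prepare(
      `INSERT INTO todos (user_id, task) VALUES (?, ?)`
    );
    // req.userId was attached to the request by the auth middleware
    const result = insertTodo.run(req.userId, task);
    // run() returns an info object; lastInsertRowid is the new row's id,
    // and completed starts at 0 (the false Boolean state)
    res.json({ id: result.lastInsertRowid, task, completed: 0 });
  };
}
```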
and that's the token we're going to use for authorizing these to-do CRUD actions. So now that I've pasted in the token, I should be able to send this request, except the difference this time is that when we send out the request we have encoded the authorization token into it, so that our auth middleware can intercept it, interpret it, and consequently authorize us. So now when I send that out, we can see that I do in fact get back a to-do entry: it has an ID, it's the very first to-do, it's associated with the user with ID number one (that's our first added user), here we can see the task, and here we can see a completed status of zero, which is the false Boolean status, so that is an incomplete to-do. Now what I want to do is define a client emulation that creates a new to-do, and that is a POST to the /todos endpoint, which is also protected. That is going to be a POST to http://localhost:5003/todos. This one carries a bit more information. First up, we once again need to authorize the user, so we're going to copy and paste that token. One thing to note is that if you restart your server, that token will be invalid, as our database refreshes every time we reboot. That might seem counterintuitive, but in the fourth project we will learn how to persist that information when we create a third-party database; for now the moral of the story is that this token is only relevant for one login session or one registration session while that user is persisted, so we use the same token. What we also need is some JSON that contains a task, so we're going to add the task field right here with an associated string that says finish coding the projects. As always, because we're including JSON, we have to add the Content-Type header to this request, and that's going to be application/json. So now when I save that we should be
able to create this new entry, because we have also created that endpoint. So I'm going to go ahead and send that request, and we can see that we get back the response with the task and the completed status of zero. Now one thing I noticed is that we should have gotten back an ID as part of that response, and I think what we're missing just here is a variable: instead of just running this command, we actually have to assign the output to a variable called result, say const result, and then the ID should actually be result.lastInsertRowid. So now I'm going to run that and try once more. If I come back to the client emulator, we're going to have to run all of that again because our server has restarted. Well, actually, let's try logging in: yeah, user not found, so let's register a user. We get the key, and we're going to have to replace the authorization token in all of these emulations with the new token, and now I should be able to test them both. Let's get all the to-dos: we can see we have one to-do just there, that's the default entry when we register a new user. Now we create a to-do, and here we can see we get back the ID this time, so that has worked. It's important that we get back the ID, because in these emulations, when we start specifying which to-do we actually want to modify or delete, we'll do that by specifying the ID of the to-do to perform these actions on. Now if I get them again, I would expect to have two tasks inside of my database, and indeed I do: I have the default one and also the secondary one right here that we just added. So that is super cool, and now we can go on to the PUT entry. For the PUT entry, what I'm actually going to do is start off by creating the client emulation, so we can understand how these dynamic URL parameters actually work. We've now added two
to-dos, so we can go ahead and modify one of the added to-dos. In here we're going to create a new client emulation, and that's going to be called update a to-do. How did I specify the last one? It was POST to /todos; this one is PUT, if I can spell that correctly, PUT to /todos and then slash an ID, which is a dynamic parameter, so we throw the pound key in front, and this is also a protected route. When I say ID, this is just a demonstration; there are a hundred different update or modification actions you could make, for example you could change what the to-do actually says, but in this case we're just modifying the completed field. When we click the completed or done button just here, that counts as a modification, so that's what we're going to use to demonstrate this kind of endpoint. That is a PUT to the http://localhost:5003 todos endpoint; however, if we come to the actual endpoint, we can see that where those earlier ones were just /todos, it now has to be /todos/ followed by an ID entry. Since we saw earlier, when we sent out these requests, that we have an ID of one for the first to-do and an ID of two for the second, because they automatically increment, I'm going to modify the second to-do entry. So in here we specify the second entry by adding /2 on the end; that's the /:id, the ID of the to-do that we want to modify. We'll still need the Content-Type of application/json because we're sending data, and we're also going to need the authorization token, so we'll copy that from just there. Now I'm going to specify the JSON data and tell it that we want to change the completed field to a value of one: currently it's zero, we want it to be one, and that should complete that to-do. And what you could actually do, since we're running all these modifications, is also specify the task and change the data
associated with that task, but we're not going to do that; we're just going to modify its completed status. So if I save that, I should now be able to run it and update that entry. I run it and we can see that nothing happens, and that's because we haven't actually coded out that endpoint, but the moral of the story is you can see just here how these dynamic paths work. From within our to-do routes we can now figure out what this ID is. The first thing we're going to do in here is access the completed status from request.body, so we're going to destructure completed out of the body of the request. The second thing we need to do is access the ID, and the way we get the ID from the URL is we say const and destructure the id from request.params; those are the parameters of the request, of which the id is now one. Now, you can also get values from the query string of the request, which is specific to the URL, and we're actually not going to worry about that in this course, but it's a useful demonstration of the different ways we can send information via a network request: we can either send it in the body, or we can throw it into the actual URL. To show you quickly what the last example would look like: let's say here we had a question mark followed by task equals the updated text or something; what we could do is go to request.query and access the task field, where queries come after a question mark. A common example is page=4; that's a query associated with the request, and you can access the page number and consequently get the value associated with it. But we're not going to worry about that. Well, actually, why don't I just show you page=4: if I run that once again, it has the exact same output, which demonstrates that this URL still hits the exact same endpoint; however, what we would now do inside the to-do routes to access the page is say const and destructure page out of request.query. So we have now collectively demonstrated the three different ways that, should you want to, you can send information or parameters across via a network request: in the body, as URL parameters, or as a query entry. Once again, this last one is not really relevant to this course, it's just good to be aware of, and it's not going to change anything; you can just throw these queries onto the end of the URL. So now we have access to the ID and we have access to the new completed status, and what we can do in here is say const updatedTodo is equal to and prepare our SQL query, so that's database.
prepare, and the query here: we already have an entry inside the database, so we're not going to use INSERT to create a new one, we're going to UPDATE an existing one. The way that works is we say UPDATE todos, so we update the todos table, and we SET the completed field equal to a question mark. Now if we wanted to update the task as well, you would just say task equals question mark and comma-separate them; if you want two different columns modified, you throw them after the SET keyword, comma-separated, where each is the column name and the new value, with a question mark as a placeholder for the data we'll add in a second. In this case we're just going to modify the completed status, so we'll remove that, and we only want to set these new values WHERE the id is equal to a placeholder, which is going to be this id just up here, the ID associated with that to-do. Now that we have the SQL command prepared, we could say const result is equal to... well, actually, we don't need the result in this case because we don't have to send back an ID; we can just say updatedTodo.run, and we pass in the new completed status, which we destructured, and also the value for the second question mark, which is the id we got from the parameters as part of the URL. Once we've successfully updated that, we can just send back res.json with an object with the key message and the associated string to-do completed. So now if I save that, we should be able to run all of these emulations. First we have to register a user; that gives us the token, and we'll copy the token and paste it into the CRUD actions. First we're just going to get all the to-dos: that's our default entry, currently with a completed status of zero. Then we change the authorization token for adding a new entry, and we add this new entry right here: it is now added, we get back its ID of two, and it is also incomplete. Now we have the update emulation, and that is going to update the to-do with the ID of two; that's the dynamic parameter, and this is the query on the end, which, as I said earlier, doesn't change the actual endpoint we hit; it just specifies some further information that's not relevant here, and was only there to demonstrate the three different ways we can encode information into a network request: in the body, as a parameter, or as a query. Anyway, this hits the update endpoint. Have we updated the token? Let's just confirm with Ctrl+V: no, now it is updated. So we can run this update, and this should change the completed status of the to-do with the ID of two to complete. That is now completed, we get the success response, a 200-level status code, so that is perfect. And now if I get them all, we can see that the first entry is still incomplete but the second entry is now complete. Even cooler, if I refresh the page and log in (what are the credentials we're logging in with? gilgamesh@gmail.com, and we type in the password), this should still work because both clients are using the same server and database. We can see that we actually have two entries just here, and because one is actually complete, we can no longer click that complete button, and it is inside of the complete column.
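The update endpoint built above can be sketched the same way: a factory over a better-sqlite3-style db object, with Express-style req/res assumed. It pulls completed from the JSON body and id from the dynamic /todos/:id URL parameter.

```javascript
// Sketch of PUT /todos/:id: flip the completed flag on an existing to-do.
// `db` is assumed to expose the better-sqlite3 API (prepare / run).
function makeUpdateTodo(db) {
  return (req, res) => {
    const { completed } = req.body; // JSON body, e.g. { "completed": 1 }
    const { id } = req.params;      // the :id segment of the URL
    const updatedTodo = db.prepare(
      `UPDATE todos SET completed = ? WHERE id = ?`
    );
    updatedTodo.run(completed, id); // placeholders filled in order
    res.json({ message: "Todo completed" });
  };
}
```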
And we now have only one open to-do. If I click done on this particular entry, we can see that the network request is sent across just here, we get back a 200-level status code, and our application changes: it is now two complete entries and zero open entries, so everything is working perfectly. Now that that's all done, we have just one more endpoint left, and that is the delete endpoint. What I'm going to do for the delete endpoint is once again create the emulation first. In this case I'm actually just going to copy and paste the update one, because it's very similar, and this one is just going to be called delete a to-do. It's going to be a DELETE method, and I think that should just be lowercase in the note for consistency, and because we're modifying a particular to-do, in this case deleting it, we need the dynamic ID parameter again. We don't need the query in this case, we're just going to have the ID, we still have the same authorization token, and we don't need to send any information, but we do have to update the method to DELETE, and that is the emulation all done. Now one thing I would like to point out, and this is just FYI, for your information: if you're working in a big organization, what we're doing right now, which is known as a hard delete, is typically not recommended, because the data is permanently erased and you can't necessarily get it back very easily. So what a lot of companies will do, let's say you're managing lots of Google Docs for example, is, back where we create this database, add an additional field inside the relevant table called soft delete. That's once again just a Boolean value, where when a user deletes the entry you don't actually remove it from the database, you just change the soft-delete value to true, and that way it can be restored at a later date. It's kind of like a fake delete. That's just something interesting to be aware
of; we're not going to handle that in this case, we're going for a permanent delete, but I just thought I'd mention it. Anywho, now that we have this emulation going, let's fill out this last endpoint and wrap this project up so that we can dive into the more advanced version. This endpoint is once again relatively straightforward: it's going to delete a to-do, and the first thing we need is access to the ID, so we're going to destructure the id by saying const { id } equals request.params, because this is a dynamic parameter; that gives us access to the ID. Then we're just going to prepare the SQL query: that's going to be const deleteTodo equals database.prepare, and the SQL command to delete an entry is as follows. We say DELETE FROM, then we specify the table, which is the todos table, and then all we have to do is provide the condition: we say DELETE FROM todos WHERE the id is equal to a placeholder that we'll fill out, AND, a Boolean operator, the user_id is equal to a question mark, so both of these have to evaluate to true for the row to be deleted. The reason we're using a compound condition here is that we want to match both the to-do ID and the user ID; that's basically a double security check to ensure we're only deleting a to-do that is associated with the correct user. Now that we have prepared the query, we can run it by saying deleteTodo.run, and we pass in the id as the first field and the user ID as the second. We need access to the user ID, so we can either destructure the userId from the request, that would be one way to do it, or literally define a variable called userId and set it equal to request.userId. That gives us access to the user ID, these values are going to be injected into this query and consequently run, and that should complete our endpoint. So now if I save that, it's going to refresh our database, and let's run through this from the very beginning. I'm also going to refresh our app, and the other thing I'm going to do is delete the token in here and refresh, so that we have a blank application. First up, let's test the emulator. First we're going to register a user: that works perfectly, and now that we have registered a user, I can log the user in to confirm that that's working. We log in a user, we can see the user just down here with the hashed password, and we can copy this token. Now that we have the token, we can run all of these authentication-protected endpoints, so first I'm just going to change the token in all of them, and then we'll have a fiddle. First we want to fetch all the to-dos: we just get the default one, that's perfect. Now we're going to create a to-do, which adds a new to-do, and I'm actually going to do that twice, so now we have an ID of three, and if I get them all again we can see that I do in fact have three to-do entries. Now I'm going to run the PUT, which is a modification that serves to complete a to-do; if I run that, it should complete the to-do with the ID of two, this middle one. We run that and the to-do is now complete, and if I get them all again we can see that the to-do with the ID of two is in fact complete: this one state represents true, it's kind of like binary. Then what we can do with our last endpoint is delete the entry with ID two, since we've completed it. And that didn't work, and that's because we forgot to respond; I actually think that in the context of our database the delete will technically still have run.
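And the delete endpoint, sketched with the same assumptions (better-sqlite3-style db object, Express-style req/res), including the response line the walkthrough initially forgot:

```javascript
// Sketch of DELETE /todos/:id: remove a to-do, matching on both the
// to-do id AND the authenticated user's id so a user can only delete
// their own entries. `db` follows the better-sqlite3 API.
function makeDeleteTodo(db) {
  return (req, res) => {
    const { id } = req.params; // dynamic URL parameter
    const deleteTodo = db.prepare(
      `DELETE FROM todos WHERE id = ? AND user_id = ?`
    );
    deleteTodo.run(id, req.userId); // req.userId set by the auth middleware
    res.send("Todo deleted"); // the response we initially forgot
  };
}
```

The soft-delete alternative mentioned earlier would instead run something like UPDATE todos SET soft_delete = 1 WHERE id = ? AND user_id = ?, where the soft_delete column name is just illustrative.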
We will have hit the endpoint and run the logic, so if I fetch all the to-dos, we'll see that the entry is missing; we just need to respond inside the to-do endpoint. That's the last line that I totally forgot, so in here we'll just say res.send and send a message that says to-do deleted, very simple. And just like that, we have officially completed project number two and chapter number three: we now have a fully functional application. If I once again come over to the dashboard and try to sign in with gilgamesh and our password, that fails, so I'm going to sign up. I can now submit my sign-up, we add that default entry to the database, and I can say that we've done it (even though technically we haven't), which changes the tab it's shown in. I can add a new to-do, let's say go to the gym; add that, and it gets put in here. I can refresh the page and those are fetched back for us. I can say I've done that, and now it is in the complete tab. We have no open to-dos, so I can add one that says hello, that is now added, and I could delete the other entry, and then if I refresh the page everything is persisted and our project is complete. That is absolutely brilliant: massive congratulations! Once again, if you do want to support the channel, be sure to star the repo; I love that support. And with that project complete, it's now time to dive into chapter 4, our final project, where we're going to take this code base to the absolute moon. All righty, welcome to chapter 4 of this full course, where we're going to take our back-end programming skills to the absolute moon by building out the ultimate backend application. In this particular project we're not going to start the same way as the rest of the projects, where historically we've run that npm init -y command and built our project up from there, installing all of the modules from the npm ecosystem and creating all the files and folders. Instead, this chapter 4 project, project number three, is
actually going to be an evolution of chapter 3. Chapter 3 is like the beginner version of developing a complete backend application, and chapter 4 is going to be like what you'd find in a company or a massive tech organization — enterprise-level, absolute god-tier backend infrastructure. As for what exactly is being evolved, there are three things in particular that are going to change and upgrade from our previous project.

Number one is the database. In the previous project we used SQLite, which is a brilliant database, but if you're going to build a big production-level application then you want a more heavyweight SQL database such as MySQL or, in this case, PostgreSQL. PostgreSQL is my all-time favorite SQL database, and in this project we are going to put it to good use.

The second core difference between chapter 3 and chapter 4 is that in this project we are no longer going to be writing out these custom SQL queries. Instead we're going to use what is known as an ORM, an object-relational mapper. It acts like a middleman between our Postgres database and our JavaScript, where we can now interact with our SQL database as if it were a JavaScript entity — and that is thanks to the ORM, the middleman. In this case we're going to be using an ORM called Prisma. It's very popular; we're going to learn how to integrate it into our project with Postgres, and we're also going to learn about all the other advantages that come with using an ORM, because there are many.

Last, but absolutely not least, we're going to dockerize our entire project. In chapter 3 our database and our server were essentially the same entity; this project is going to be different. They're going to be two separate environments, which means our server is going to have to communicate with an external database, and both of these are going to be their own independent Docker environments. This is a much better practice, because if your server breaks down it doesn't mean your database has to completely restart itself, and it also means our database is going to be able to persist data that much more effectively. At the end of the day these are some absolutely massive changes, and there's also going to be a whole lot of other stuff we'll learn as a product of making these evolutionary changes, so it should be loads of fun.

As for how we're going to kick this project off, it's not going to be like the previous chapters either, where we previously ran npm init -y, installed all the packages from the npm ecosystem, and built up our file directory from there. In chapter 4 what we're actually going to do is create a duplicate of chapter 3 by right-clicking, hitting copy, and pasting that folder, and we'll end up with a duplicate that we can rename into chapter 4. So now we have a code base that we can rip to pieces — keep the core logic, keep the server, and make any necessary changes to create this evolved backend project.

The first thing I'm going to do in here is come into our package.json, because I'm going to change the name of our project, and I'm also going to change the description: this is instead going to be a dockerized full-stack application that uses a Node.js backend, a PostgreSQL database, the Prisma ORM, and JWT authentication. Those are the core changes made in this project. We'll start by installing the packages we need for these new technologies: we're going to npm install, number one, prisma; number two (we can space-separate them), @prisma/client; and number three, a package called pg — we'll learn what that does later, but essentially it's just a client for Postgres. So if we hit Enter,
that's going to install those packages — and I realize I'm a muppet: we actually need to first change directory into our new project. So I'm going to cd into chapter 4 and run that command again, and that will install them all; now inside of our package.json we have @prisma/client, pg, and prisma.

The second thing I'm going to do — and this is where we start making these modifications — is create the Prisma setup, and the way we kick that off is by typing the command npx prisma init. If we hit Enter on that, it creates a prisma folder inside of our chapter 4 project directory, and inside this folder we can see a file called schema.prisma. If you're unfamiliar with what a schema does, it's basically a file that specifies the structure of our database. Think back to chapter 3, when we created the database.js file and specified what we wanted our tables to look like inside of our SQL database — the schema does the same thing, except instead of using SQL commands we describe each table as if it were a slightly complicated object. That's what allows us to interact with the database as if it were some form of object, and it ensures our code stays much, much cleaner.

So we're going to open up schema.prisma — there are about 15 lines in here, and we can leave them all in place. A couple of lines in this file might be a bit confusing, and we'll only add about 14 lines ourselves; if you want to learn more you can check out the docs at this particular link, which explains everything you need, but you'll also learn by doing in this case. To create this conversion between our SQL database and JavaScript, we need to create models that allow our JavaScript to interpret the SQL tables, and in this case we need two. We say model and then define the name of the model, User — it's kind of like predefining what the table is going to look like. Inside it, just like before, we create the columns that will exist in that table. The first is an id, as we had before; I tab across and specify that it's of type Int, then tab again for some extra attributes: @id, plus @default(autoincrement()), which we call as a function. That basically says this is a default field — we don't need to specify it when we create a new user — and we want it to auto-increment as we add new users. The second field is the username, of type String, and we mark it @unique, because we can't have two users with the same name. Underneath that we have a password column, also a String — nothing special there. And last we have todos, which creates the relation between the two tables — it's just Todo as an array.

That's our User model, which we'll use to interact with our PostgreSQL database using JavaScript syntax. Now we define our second model, Todo — this is what a to-do is going to look like. Once again, we're almost done with this file, and if you want to learn more, all of the information is in the docs for this project; once we create this schema we just initialize our database from it and we're good to go. For the Todo, the first field is pretty similar: an id of type Int, also @id @default(autoincrement()), so that's pretty straightforward. The second field, as per the chapter 3 project, is the task, of type String. Underneath that we have a completed status of type Boolean, with a @default(false), because when we add a new to-do it probably hasn't been completed yet. Underneath that we have a userId, which associates the task with a particular user — that's an Int field. Finally we associate this table with the users table: we have user, of type User, with a @relation where fields is [userId] inside the square brackets and references is [id]. And that's the schema we need for our entire database. We can see up at the top it's already configured for a PostgreSQL database, and we have the client generator; this right here is the code that configures the user table, and here we have the predefined template for our todo table. So we can go ahead and save that schema.
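Putting those steps together, the finished schema file looks roughly like this — a sketch in which the generator and datasource blocks are the defaults produced by npx prisma init, and the model and field names follow the walkthrough above:

```prisma
generator client {
  provider = "prisma-client-js"
}

datasource db {
  provider = "postgresql"
  url      = env("DATABASE_URL")
}

model User {
  id       Int    @id @default(autoincrement())
  username String @unique
  password String
  todos    Todo[]
}

model Todo {
  id        Int     @id @default(autoincrement())
  task      String
  completed Boolean @default(false)
  userId    Int
  user      User    @relation(fields: [userId], references: [id])
}
```

Any time this file changes, the Prisma client has to be regenerated for the new schema to take effect.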
Now, one of the reasons an ORM is also absolutely brilliant is a concept known as migrations. With our previous project, if we deployed it live to the internet, created these tables, and then found ourselves in a production environment needing to change what the database looks like, that would be incredibly complicated — how do we go back through and apply those modifications to all of the entries already in the database? If you have 100 users on a primitive version of your database and 100 on a later version, version control of your database becomes incredibly complicated once loads of users rely on it daily. Using an ORM lets you easily introduce the concept of migrations: a migration is essentially a record of all the modifications that have been made to the database, and when you run your migrations, every instance of your database is updated to reflect those changes — it's always on the most recent version, while still supporting the legacy entries, since all the previous entities are updated to match. So eventually, when we create our PostgreSQL environment, we'll run our very first migration and it will format our database to match the schema. We'll see how that works in a second — I know it can be a little confusing.

The other file we'll need for our database is a Prisma client, and I'm going to create that inside of the source directory, in a new file called prismaClient.js. This file is pretty much the equivalent of the old database file without all the funny business down below: we're just going to create a Prisma entity through which we can interface with our PostgreSQL database. In here we import PrismaClient from @prisma/client — pretty straightforward — then we initialize it: we say const prisma = new PrismaClient(), and we invoke that, we
instantiate that class — and then finally we export default prisma, so that we can access this Prisma entity from anywhere inside of our project. That file is totally complete, and now that we have it, we can see how radically improved writing and interacting with our database can actually be. The database interactions we'll be modifying live in the auth routes and the to-do routes, and I'm going to start with the auth routes. Up here we can see the code we used to create a new entry in our user table, and likewise for the todos table, where we prepared the SQL query and then executed it. With Prisma it's a little different. I'll start by deleting the insert-user query, and instead I'll say const user = await — and I await because, now that our database is a third-party entity, the communication between the server and the database is an asynchronous process, so we have to make sure our endpoints are asynchronous. Then we say await prisma.user — accessing the User model we created — and all we do is call the create method, passing in an object with a data field, which is itself an object, where we provide the username and the password (the hashed password). Just like that we have created a user using JavaScript syntax — super easy. The second thing we have to do is insert a to-do. I can get rid of all these SQL lines and just await prisma.todo.create, passing an object with a data field containing the task (the default to-do) and a userId field, which is user.id.
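As a runnable sketch of that pattern — here with a stub standing in for the generated PrismaClient so it runs without a database (in the real project you'd import the client from prismaClient.js instead), and with hypothetical helper names like registerUser:

```javascript
// prismaClient.js in the real project is just:
//   import { PrismaClient } from '@prisma/client';
//   const prisma = new PrismaClient();
//   export default prisma;
// Below, a stub with the same call shape lets the example run standalone.
const prisma = {
  user: {
    async create({ data }) { return { id: 1, ...data }; },
    async findUnique({ where }) {
      return where.username === 'gilgamesh'
        ? { id: 1, username: 'gilgamesh', password: 'hashed-pw' }
        : null;
    },
  },
  todo: {
    async create({ data }) { return { id: 1, completed: false, ...data }; },
  },
};

// Registration: prisma.user.create replaces the hand-written INSERT,
// then we seed the default to-do against the new user's id.
async function registerUser(username, hashedPassword) {
  const user = await prisma.user.create({
    data: { username, password: hashedPassword },
  });
  await prisma.todo.create({
    data: { task: 'Hello! Add your first todo', userId: user.id },
  });
  return user;
}

// Login: findUnique with a `where` clause replaces SELECT ... WHERE username = ?
async function findUser(username) {
  return prisma.user.findUnique({ where: { username } });
}
```

The point of the stub is that the handler code is identical whether prisma is this fake object or the real generated client — that's the ORM acting as the middleman.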
It's super simple — this user is just what gets returned: it's essentially the model object we defined inside of the schema. And that's all we need to do — well, actually there's one more thing: we take this user and to-do and replace the code just down here, and we have now updated this file to use the Prisma ORM instead of manually writing out all of those SQL queries.

So that's registration done; let's look at the login. For the login we can get rid of these two lines and say const user = await — and since we're awaiting, we need to make this endpoint asynchronous too, so we just throw an async in front of the function. Then we await prisma.user and call findUnique, which takes an object where we specify a where clause — it's like the SQL logic of WHERE id = ?, except in this case it's where the username matches the username we entered just here. That is literally all the code we need to find our unique user, so I can save that file, and it's complete.

We're not quite ready to boot up the project, because we haven't instantiated our Postgres Docker environment yet — we'll get to that very shortly. First we're going to update these to-do endpoints so we can finish up our Prisma ORM configuration. First up, getting all the to-dos: we remove this code and say const todos = await — and we need to import prisma, which we also had to make sure we did inside of this file, and I actually didn't, so that's me being naughty. Let's import prisma from our prismaClient, save, and make sure it's imported inside the to-do routes as well. Then we await prisma.todo and call findMany — because we're getting a lot of them — and we provide an object where we say where the userId matches req.userId. Once again this needs to become an asynchronous endpoint, and that's all we need to access all of the to-dos whose userId matches the ID present in the request. Super straightforward. Once again, you can find the documentation for all of these methods at that link in the Prisma docs, but we're going to demonstrate how most of it works inside these endpoints.

For creating a new to-do — you might be starting to get the hang of this now — we throw the async keyword in front of the endpoint function, then say const todo = await prisma.todo.create, passing an object with a data field containing the task and the userId, which is req.userId. Super simple, and then we just send back the to-do: we don't even have to assemble all the fields manually, it's assigned to this variable and we get back an object representing the new to-do.

For the PUT endpoint, again straightforward: we get rid of all this logic, including the query, and say const updatedTodo = — this one also needs to be made asynchronous, so we throw the async keyword in there (and I'll throw it in front of the delete one while we're down here, so I don't forget). Then we await prisma.todo.update, which takes an object: a where clause, where the id equals parseInt of the id param (we convert it to a numeric value), and where the userId matches the req.userId field. That makes sure we only update to-dos where the to-do ID matches and we have the correct user. Underneath that we provide the new data, which is just the new completed field. I think the completed field currently comes through as a numeric value, so I'm going to throw a double exclamation mark in front of it, which converts it to a Boolean, and then we just send back the updated to-do. Done and done, super simple. To summarize this command: we update the to-do where all of the IDs match, to confirm it's the correct one, and we provide the new data, forcing the completed field into a Boolean with the double exclamation — that's a little secret hack that converts anything to its truthy or falsy state.

Finally we have the delete endpoint, again pretty straightforward: we get access to the user ID, then await prisma.todo.delete, which takes an object where we specify a where clause — exactly the same as the where clause up above, so I'll just pass that in, and we're done. That's literally all it takes to use the Prisma ORM — and actually, I think I can reuse that value instead. It's just so much tidier than having SQL commands all throughout your files. And just like that, we have configured all of our endpoints to use the Prisma ORM, and to be ready for Postgres, which is super important. Now that's done, we're ready to dockerize our environments and actually get our PostgreSQL database up and running, and then, last of all, we'll see how we can create a docker-compose.yaml file, which essentially configures everything so we can boot it all up in one command.
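The PUT handler's two details — parseInt on the route param and the double-bang coercion — can be sketched like this (again with a stub client so it runs standalone; in the real routes you'd call the imported prisma, and passing both id and userId in the where clause follows the walkthrough above):

```javascript
// Stub with the same call shape as prisma.todo.update / .delete,
// so the handler logic runs without a database.
const prisma = {
  todo: {
    async update({ where, data }) { return { ...where, ...data }; },
    async delete({ where }) { return { ...where }; },
  },
};

// PUT /todos/:id — mark a to-do complete (or not)
async function updateTodo(idParam, userId, completed) {
  return prisma.todo.update({
    // parseInt: route params arrive as strings, Prisma expects a number
    where: { id: parseInt(idParam, 10), userId },
    // !!: force 1/0 (or any truthy/falsy value) into true/false
    data: { completed: !!completed },
  });
}

// DELETE /todos/:id — same where clause, no data
async function deleteTodo(idParam, userId) {
  return prisma.todo.delete({ where: { id: parseInt(idParam, 10), userId } });
}
```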
All right, it is now time to get our hands dirty with Docker and containerize some of our infrastructure. The first thing we need to do is boot up Docker Desktop on our device — if you recall, we installed it earlier, and the link is in the description down below, so you can just go to that link, hit "Download Docker Desktop", and select your operating system. When you have it open it should look something like this particular screen, where all of our environments are referred to as containers: we contain our application in its own mini environment, where we configure how to set it up and, consequently, what code to run. Everything else is going to be done from the terminal, so we can move the app to the side — we don't necessarily need it open, but we do have to have it running, and there will be some advantages to having the client open later.

The way we go about containerizing our environments is by creating a Dockerfile. The Dockerfile is basically an instruction sheet on how to create an environment that has everything it needs to run our backend infrastructure, be that the database or our Node.js server. So inside of chapter 4 I'm going to create a new file, and it's just going to be called Dockerfile — it doesn't even have a file extension. Inside, as I said a second ago, it's pretty much an instruction sheet, and the first instruction — since we're obviously running a Node.js application — is to set up this environment with access to Node.js. I'll leave a comment to walk us through these steps: "Use an official Node.js runtime as a parent image." I remember when I first came across this term, image, I was just thinking, oh, it's like a picture — and that's kind of true, but in this context an image is actually more like a snapshot: a snapshot of a separate instruction sheet. When we eventually take this Dockerfile and build our container, what's happening is we're creating a snapshot of that environment, and whenever we run our container we just run that snapshot and get right back to where we were. We can also build off pre-existing images — in this case the official Node.js image — which takes a snapshot of the Node environment we specify and adds it to the new environment we're creating via this Dockerfile. So we use the FROM command and say node:22-alpine; that is the official Node.js image we need to throw into our new containerized environment. This Dockerfile just here is specifically for our Node.js application — we'll see how to get our PostgreSQL environment up and running very shortly.

The second line sets the working directory: we're creating this new environment, and we need to specify a folder for our project, so we set the working directory in the container using the WORKDIR command, and we say /app — that's where our working directory is going to be. Step three: now that we've got the Node.js base image and our working directory, we need to copy the files from our local project into this new environment — it's like setting up a new computer, you need to copy all your stuff across. First we copy the package.json and package-lock.json files to the container, and the command we use to copy things from our local device into our Docker container is the COPY command. We want to copy any file whose name starts with package, so then we're going
to use the little asterisk so that it selects both package-lock and package.json, ending in .json — that's the source, and the destination is the period, the current working directory, which is /app. So it copies those two files from our local device and slams them into the app folder of our Docker environment. Now that we have access to the package.json, we need to install all of the npm packages, or dependencies, our project needs — and since we have access to Node.js, and consequently the npm ecosystem, we can do that very easily. Traditionally we've typed npm install followed by the name of a package, but if we just want to install every dependency in our project, we can run npm install with no package names and it will simply read our dependencies list and install them all. Since the container has access to that file, we use the RUN command to run npm install inside of the Docker environment, and that installs all of the dependencies.

With the dependencies installed, we're good to copy the rest of the application across: another COPY from the source — the current folder, chapter 4 — to the destination, the /app working directory inside the Docker environment; that copies all the remaining source code. The reason we separate these commands is that a Docker image is built from the top down, and when we change some source code and rebuild, Docker only rebuilds the container from the layers whose files have changed. If all of this stuff up here is exactly the same — and it likely is, because we're probably not installing more packages, so our dependencies stay consistent — Docker is clever enough to cache all of that build information and rebuild the image only from the changed line. Technically we could copy everything in one go, but then any change to our source code would force us to re-copy that line and reinstall all the dependencies, which we avoid by copying package.json first, installing the dependencies, and only then copying the source — next time the source changes, Docker rebuilds from that line down. You don't have to do this, but if you want the process of building your containers to be slightly more efficient, this is a good way to do it. So this line copies our entire source code across to the container.

Now that everything is copied across, we need to expose the port the app runs on. When we create this Docker container and run our application inside of it, it's essentially walled off from the rest of the world; we need to open up its ports, mapping an external port to an internal port — we'll see how to do that later — but the point is we need to expose the port we run our application on. For consistency I'm going to use the same port our server already listens on, and we can see just here that the EXPOSE command exposes the port the container should listen on: it defines network ports for this container to listen on at runtime. Once again, to summarize this line: we tell the environment to open up this port to incoming network requests, from whatever source; without it, the container would be an impermeable barrier we couldn't send network requests into. It's like opening up a wormhole between our real environment and this Docker environment. And once we've exposed this port, we are now
good to boot up our application inside of this Docker container. We define the command to run the application with the CMD instruction, which takes an array of the strings, or words, needed to boot up our application. The way we typically run a file is node, then, as a separate element, the path where node can find the executable file, which in this case is ./src/server.js. You'll note that's fairly equivalent to what we see inside of our start script — node followed by the file — but we obviously had all of that jargon in the middle. Because in chapter 4 we're using Postgres, and we're not using any experimental features inside of Node.js, we can remove all of those flags, which is super handy: we can have a very simple startup script with no experimental features, and this is the command that boots up our application. Also, I actually think these are meant to be double quotation marks, not single quotation marks, so I'll just change that very quickly — now that's happy. That's going to boot up our application inside of this container, listening for incoming network requests on the exposed port. So that is officially the instruction sheet our container needs to get our application up and running.

This is super cool, because anyone on any operating system can suddenly start our application inside of this little environment. Another good example: if you wanted to run Postgres normally, you would have to install Postgres on your device before you could get it up and running. In this case, at no point have we installed Postgres, because our Postgres environment is going to run inside of its own little Docker container — we can just tell Docker to create an environment with Postgres installed, and we can skip installing it on our device entirely.
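Assembled in order, the instruction sheet described above looks like this — a sketch, where the exposed port 5000 is an assumption (the video's audio is ambiguous about the exact number), so match it to whatever port your server actually listens on:

```dockerfile
# Use an official Node.js runtime as a parent image
FROM node:22-alpine

# Set the working directory in the container
WORKDIR /app

# Copy package.json and package-lock.json so the install layer can be cached
COPY package*.json ./

# Install dependencies
RUN npm install

# Copy the rest of the application code
COPY . .

# Expose the port the app runs on (assumed value -- match your server)
EXPOSE 5000

# Define the command to run the application
CMD ["node", "./src/server.js"]
```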
That just makes it super easy to deploy your code to a different environment, for someone else to download your GitHub repo and run it on their device, and, at the end of the day, for us to have no software installed on our computer aside from Docker while still booting up these amazing applications. Docker is super handy, and it is ubiquitous — which is a good word.

Now that we have this instruction sheet, the next step is to actually build the container. One thing we need to do first, though: since we're no longer running those experimental flags, we need to make sure we're not involving node:sqlite anywhere in our project. Currently we still have the import line for our original SQLite database in our routes files, which means that when we boot up this container it would try to execute that database alongside our Postgres database — and since we haven't enabled the experimental flags, that would break our container. We don't have to delete the files; we just need to delete the imports of that database file from the auth routes and the to-do routes, and then it will sit there not doing anything, which is absolutely fine. So we remove those two lines from our code base, and now we're almost ready to build our container.

The one last thing we need to do before we actually build is finalize our Prisma setup. We do that from the terminal, by running a command that generates a config file for our Prisma client. The reason we have to generate this file is that it's specific to our schema — our database structure, essentially — and every time we change or modify the schema we need to rerun this command. The command can be found in the readme.md file if you're looking
for it later — there's a whole instruction sheet in there on how to get this up and running — but essentially, from inside the chapter 4 directory, now that we're finished with the schema, we just run npx prisma generate and hit Enter. That generates the Prisma client and saves it inside of our node_modules, so that's all done, and we are almost ready to build our containers. If we went ahead and built right now, we would build the Dockerfile for this Node.js application, which is brilliant — but the problem is that doesn't help us with our PostgreSQL database. To configure a Postgres database inside of a Docker container you essentially just need to run one command, and because it's just one line, we can actually do it from what's known as a compose.yaml file. Where a Dockerfile is an instruction sheet for creating one Docker container, when you have an application that uses numerous — potentially even tens of — different containers or environments, you need to define a configuration sheet to boot up all of those Docker environments. So what we're going to do is create a new file called docker-compose.yaml and hit Enter on that. And where the Dockerfile is the setup instruction sheet for a singular Docker container, the docker-compose.
yaml is a configuration sheet for our conglomerate of docka files or individual containers it’s kind of like a glorified specs sheet now there’s a few different lines in here and I like to think of it as kind of like a bullet point specification list so we just have a bullet point and some tab indentation of all of the different specs we need to get all of our containers up and running in one Fell Swoop so in here the first uh parameter we have to specify is the version which is going to be version three that line doesn’t really mean much underneath that we have a line that means a whole lot more it’s called Services now inside of services underneath that we’re going to tab across we’re going to indent it and this is where we Define the configuration for all of our different containers the first one is going to be the app container and that is going to be the nodejs docker file that we just created so in here I’m going to use a semicolon enter and then tab across once again it’s like indentation uh instead of using bullet points now this app needs what’s known as a build line now we use the build line when we have a Docker file to build that container

The path to that Dockerfile is just a period, meaning the current directory: Compose will look in the same directory as this yaml file, find the Dockerfile, and use it as the instruction sheet to build the container. Underneath that, at the same indentation, we give it a container name, todo-app. Next comes an environment parameter, which is for specifying environment variables; where previous projects used a .env file, here we can do it directly from the specs file, which is super handy. The first variable we need is DATABASE_URL, all uppercase. If you remember, inside schema.prisma the default code generated when we created that file reads the database URL from the environment variables under that key. In chapter 3 our server and our database were one unified entity; in this project one container holds the database and a separate one holds the server, so we need to give the server an address through which it can locate the database container, and that is DATABASE_URL. The value is a little complicated, so I'm just going to copy it across; you can find the line in the docker-compose.yaml file in the GitHub repo, and if you do check out the GitHub, be sure to star the project, love that support. This gives our Node server an address through which it can communicate with the database.

The second environment variable is JWT_SECRET, which we're familiar with from chapter 3. Here I'll just provide a random string, your_jwt_secret_here. It can be any string, a whole bunch of random mumbo jumbo or something specific to you, but environment variables are meant to be secret, so whatever characters you choose, make sure they're only available to you. Equally, for development on your own device it's not going to be a huge issue. The third variable is NODE_ENV, the Node environment. Typically there are two environments, sometimes three: development, which is the environment when you boot your application locally; production; and sometimes staging, which sits somewhere in between, the environment just prior to deploying your code to production. We specify this as an environment variable because sometimes we have code that runs only in development and other code that runs only in production; putting it in the environment lets us keep the same code base and change one line to declare which environment we're in. In this case it's development. Finally we specify the PORT variable; if you recall, inside server.js we read the port for our app to listen on from the environment under the PORT key, with a fallback of 5003, and in this case I'll use the same 5003. That's our environment variables complete.

The next parameter we specify for the app is ports. This is called port mapping: we map an external port to an internal port, so an external network request arriving at the container's port is matched to a port inside it. I like to keep them the same, so as a string we map 5003, the external port on our container, to 5003, the internal port. Under ports the next parameter is depends_on; our server obviously depends on our database, and even though we haven't configured the database service just yet, we simply declare the dependency here, which interconnects the two. The last field we need is volumes. A volume is essentially persistent storage for the container: if we didn't have one, every time we booted the container it would be a blank slate; with one, there's a place to save the container's previous state, so when we shut it down and boot it up again we can pick up where we were. The directory given here is the volume, and it persists any configuration, data, or information inside the container; it would only be erased if we actually deleted the container and rebuilt it from scratch. So it's important to have a volume so that your app can remember, essentially. That's our first app service complete.

The second service, at the same indentation as the app, is the database. It doesn't have a build line because we haven't written a Dockerfile for it; as I said earlier, we build it directly from an image, and the image we need for our Postgres database is postgres:13-alpine. This image line is indented one tab from the database service, which is indented one tab from the left-hand side; the indentation is super critical in this file, so if you're uncertain, compare yours against mine in the GitHub repo. The image is basically all the setup we need to create an environment with Postgres inside it. We give the container a name, postgres-db, and under that some environment variables, which are a little different this time. The first is POSTGRES_USER, all uppercase, and the username is postgres; we're essentially defining the login credentials for the database. POSTGRES_PASSWORD is also postgres, and finally the POSTGRES_DB database name is todo-app, and this name has to match the name at the end of the DATABASE_URL perfectly. With that done we can specify the port mapping; standard practice, or convention, for Postgres is to map port 5432 to port 5432, the external port on our container to the same port inside it, and if we look back at our DATABASE_URL we can see the port it uses is 5432. As I said, that's just convention, and we're sticking with it.

The last thing we need, and it's arguably even more important in the case of our database, is the volumes field, which once again gives this container data persistence. Without it, every rebuild or rerun would start from scratch, and that's not very convenient when you're working with a database; we need an environment that persists data until we literally delete the container off the face of the earth. The volume mapping is a little more complicated in this case: postgres-data:/var/lib/postgresql/data is where all the information is persisted, so that when we reboot the container we pick up right where we left off. Finally there's one more field, at the far left-hand side with no indentation, called volumes, containing postgres-data. With that, our specs file, the compose.yaml, is complete.

Now that we've configured this specs file, which is basically an instruction sheet telling Docker how to boot up every container our application needs, we can actually build these containers and finish our project. There are a few steps required to build them and get them up and running; one example is that once our Postgres container is working, we need to go into it and make sure the tables are created inside that database, and we'll see how that works shortly. All the commands I'm about to run are available in the Get Started section of the readme.md for chapter 4, so you can find them all there. It's a bit of a step-by-step process, but once it's done, it's complete, and you'll have this brilliant application.

With Docker open in the background, the first command we're going to run is docker compose build. This builds our containers from that compose.yaml file, and here we can see it starts by running all the Dockerfile commands for our app and then does the same for our Postgres database. With the containers built, we're ready to boot them up from the images that were created, the snapshots of these completed environments. Before we run them together, though, we need to make sure our database is updated to match the schema.prisma file, in other words that it has the tables our application needs. The command for that is a little more involved: it's docker compose run against our app, and inside the app we execute npx prisma migrate dev --name init. This runs a migration for our database: it reads our schema file and creates the first entry in the version history of modifications made to the database, which in this case takes us from a completely blank database to one with the tables we need. I'll hit enter on that command, and we can see it has run against our Postgres container, executed the command, and created the Todo table and the User table, which is excellent. We can see it also created a migrations folder inside our prisma directory; this is absolutely normal and not something you want to meddle with, it's just the record history of modifications made to our database.

Now that our database is set up with all the tables we need, we can boot up our two Docker containers with docker compose up. You can also add a -d flag, which boots them up in the background and gives you access to your terminal again, but I'm not going to use it here. If I hit enter, we can see we're now running two services: a container called postgres-db and another called todo-app, and the to-do app has executed the console.log we print when the app starts listening on port 5003. Even cooler, if we open Docker Desktop we can see chapter 4 with our two containers running and a nice combined log for the whole project; you can also look at the logs for each container specifically by clicking on it. And with that, our app is running inside these two containers.

So let's try it. I'm going to clear out any tokens we have saved so we can start from scratch: refresh the page, delete the token from local storage, and we have our blank application being served on port 5003, which is super cool because I'm not running npm run dev; the application is running inside this container and serving itself up. If I try to log in, it fails to authenticate, which means our backend endpoints are working. I can also come into our client emulator, todo-app.rest, and run the register command to emulate that client request; it works and gives us back a token. If I try to log in again, even though I haven't registered a user through the application, that client emulation created the entry in the database, so when I submit I'm actually logged in, and we can see we even have a to-do, which means the backend inside this container wrote to our Postgres database. I can add a new to-do, mark the first one complete, refresh the page, and see that this data is persisted in our database. And it gets even cooler: I can delete the token, refresh the page to log out, create a new account, test@gmail.com with a basic password, log in, and we now have a new user created.

Now that we've made all these entries, let's see how we can log directly into the database, which is a place where we can modify it directly using SQL queries. Up here we have all the logging from our Docker container, so I'll create a new terminal instance, which keeps that one running in the background while giving us a terminal for new commands. Logging directly into the database is a slightly complex command, once again available in the readme.md: we write docker exec -it, then the name of the database container, postgres-db, then psql, the Postgres SQL client, which is important to put in front of all these flags (I initially missed it out and had to correct myself), then the user flag -U with postgres, which if you recall is what we specified in the environment variables, and -d with the to-do app database name. Running that logs us directly into our database, where we can run SQL commands.

The first SQL command I'll introduce you to is \dt. Hitting enter on that shows all the tables available in our database, and here we can see three: one is a history of our migrations, the modifications made to the database over time, kind of like a version history; we also have a to-do table with all of our to-dos, and a users table. Next I can run a SELECT query to read all the entries in a table: select everything from the to-do table, wrapping the table name in double quotation marks so it matches the casing, which is super important, and always finishing the command with a semicolon. Hitting enter shows a table with all the data in our database. We can see three tasks: the first is "go to the gym", currently incomplete with a completed status of false; then "Hello! Add your first todo", which is true because, if you remember, I clicked complete when we added our first to-do; both of those are associated with our first created user, and the third to-do is associated with our second user. To exit the database, I just write the \q command, which gives me back access to my terminal. You could run a whole lot of CRUD actions using SQL commands directly on the database, and all of those changes would be reflected in the front end.

Ultimately, that is our backend application complete. We've seen how to build a server and set it up to listen for incoming requests on its port; how to add authentication and database interactions; how to serve up a front-end application that makes network requests between the front end and the back end, so it's ultimately a full-stack application; we've added middleware for authentication protection on a whole lot of our CRUD to-do endpoints; and we've created Docker containers for two different environments, our server and our Postgres SQL database, and seen how to boot them up and run them as a collective application for development. This is such a cool backend project to have, because a lot of the backend infrastructure you'd find at almost any company is just a slightly more sophisticated or equivalent version of what we've coded here in this full course. I'm super proud of you; well done for persisting to the very end, and congratulations, you should pat yourself on the back. Learning backend development takes a lot of time and practice, but now you have an absolutely amazing codebase to reference as you build out backend applications in future. Thank you guys so much for sticking with me throughout this course; I hope you've had a thoroughly good time, and if you have enjoyed it, don't forget to smash the like and subscribe buttons. I'll catch you guys later. Are you learning to code? If so, be sure to check out the learn-to-code roadmap or dive straight in with these videos.
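Pulling the walkthrough together, the finished compose file looks roughly like this. The exact DATABASE_URL string, the app volume path, and the database name (written as todoapp below) were copied from the course's GitHub repo in the video and are shown here as plausible assumptions, not verbatim values:

```yaml
version: "3"
services:
  app:
    build: .                      # Dockerfile lives in the same directory
    container_name: todo-app
    environment:
      # host "db" must match the database service name below (assumed values)
      - DATABASE_URL=postgresql://postgres:postgres@db:5432/todoapp
      - JWT_SECRET=your_jwt_secret_here
      - NODE_ENV=development
      - PORT=5003
    ports:
      - "5003:5003"               # external:internal port mapping
    depends_on:
      - db
    volumes:
      - .:/app                    # host path per the repo; an assumption here
  db:
    image: postgres:13-alpine     # built from an image, no Dockerfile
    container_name: postgres-db
    environment:
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=postgres
      - POSTGRES_DB=todoapp       # must match the end of DATABASE_URL
    ports:
      - "5432:5432"
    volumes:
      - postgres-data:/var/lib/postgresql/data
volumes:
  postgres-data:                  # named volume declared at top level
```

With this in place, docker compose build, the prisma migrate step, and docker compose up proceed as described above.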

    By Amjad Izhar
    Contact: amjad.izhar@gmail.com
    https://amjadizhar.blog

  • Data Science and Machine Learning Foundations

    Data Science and Machine Learning Foundations

    This PDF excerpt details a machine learning foundations course. It covers core concepts like supervised and unsupervised learning, regression and classification models, and essential algorithms. The curriculum also explores practical skills, including Python programming with relevant libraries, natural language processing (NLP), and model evaluation metrics. Several case studies illustrate applying these techniques to various problems, such as house price prediction and customer segmentation. Finally, career advice is offered on navigating the data science job market and building a strong professional portfolio.

    Data Science & Machine Learning Study Guide

    Quiz

    1. How can machine learning improve crop yields for farmers? Machine learning can analyze data to optimize crop yields by monitoring soil health and making decisions about planting, fertilizing, and other practices. This can lead to increased revenue for farmers by improving the efficiency of their operations and reducing costs.
    2. Explain the purpose of the Central Limit Theorem in statistical analysis. The Central Limit Theorem states that the distribution of sample means will approximate a normal distribution as the sample size increases, regardless of the original population distribution. This allows for statistical inference about a population based on sample data.
    3. What is the primary difference between supervised and unsupervised learning? In supervised learning, a model is trained using labeled data to predict outcomes. In unsupervised learning, a model is trained on unlabeled data to find patterns or clusters within the data without a specific target variable.
    4. Name three popular supervised learning algorithms. Three popular supervised learning algorithms are K-Nearest Neighbors (KNN), Decision Trees, and Random Forest. These algorithms are used for both classification and regression tasks.
    5. Explain the concept of “bagging” in machine learning. Bagging, short for bootstrap aggregating, involves training multiple models on different subsets of the training data, and then combining their predictions. This technique reduces variance in predictions and creates a more stable prediction model.
    6. What are two metrics used to evaluate the performance of a regression model? Two metrics used to evaluate regression models include Residual Sum of Squares (RSS) and R-squared. The RSS measures the sum of the squared differences between predicted and actual values, while R-squared quantifies the proportion of variance explained by the model.
    7. Define entropy as it relates to decision trees. In the context of decision trees, entropy measures the impurity or randomness of a data set. A higher entropy value indicates a more mixed class distribution, and decision trees attempt to reduce entropy by splitting data into more pure subsets.
    8. What are dummy variables and why are they used in linear regression? Dummy variables are binary variables (0 or 1) used to represent categorical variables in a regression model. They are used to include categorical data in linear regression without misinterpreting the nature of the categorical variables.
    9. Why is it necessary to split data into training and testing sets? Splitting data into training and testing sets allows for training the model on one subset of data and then evaluating its performance on a different, unseen subset. This prevents overfitting and helps determine how well the model generalizes to new, real-world data.
    10. What is the role of the learning rate in gradient descent? The learning rate (or step size) determines how much the model’s parameters are adjusted during each iteration of gradient descent. A smaller learning rate means smaller steps toward the minimum. A large rate can lead to overshooting or oscillations, and is not the same thing as momentum.
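The learning-rate behaviour described in question 10 can be seen in a tiny one-dimensional sketch (an illustration written for this guide, not from the course materials): with a small step size gradient descent converges smoothly, while an overly large one overshoots and diverges.

```python
# Minimize f(x) = (x - 3)^2 by gradient descent; the gradient is f'(x) = 2*(x - 3).
def descend(lr, steps=50, x=0.0):
    for _ in range(steps):
        x -= lr * 2 * (x - 3)  # parameter update: x <- x - lr * f'(x)
    return x

small = descend(lr=0.1)  # converges smoothly toward the minimum at x = 3
large = descend(lr=1.1)  # each step overshoots, so the error grows
print(small, abs(large - 3) > 1)
```

The update multiplies the distance to the minimum by (1 - 2*lr) each step, so any lr above 1 makes that factor exceed 1 in magnitude and the iterates oscillate away from the solution.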

    Answer Key

    1. Machine learning algorithms can analyze data related to crop health and soil conditions to make data-driven recommendations, which allows farmers to optimize their yield and revenue by using resources more effectively.
    2. The Central Limit Theorem is important because it allows data scientists to make inferences about a population by analyzing a sample, and it allows them to understand the distribution of sample means which is a building block to statistical analysis.
    3. Supervised learning uses labeled data with defined inputs and outputs for model training, while unsupervised learning works with unlabeled data to discover structures and patterns without predefined results.
    4. K-Nearest Neighbors, Decision Trees, and Random Forests are some of the most popular supervised learning algorithms. Each can be used for classification or regression problems.
    5. Bagging involves creating multiple training sets using resampling techniques, which allows multiple models to train before their outputs are averaged or voted on. This increases the stability and robustness of the final output.
    6. Residual Sum of Squares (RSS) measures error while R-squared measures goodness of fit.
    7. Entropy in decision trees measures the impurity or disorder of a dataset. The lower the entropy, the more pure the classification for a given subset of data and vice-versa.
    8. Dummy variables are numerical values (0 or 1) that can represent string or categorical variables in an algorithm. This transformation is often required for regression models that are designed to read numerical inputs.
    9. Data should be split into training and test sets to prevent overfitting, train and evaluate the model, and ensure that it can generalize well to real-world data that it has not seen.
    10. The learning rate is the size of the step taken in each iteration of gradient descent, which determines how quickly the algorithm converges towards the local or global minimum of the error function.
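The entropy measure from questions 7 above can be computed directly; this short standalone example (not from the course) shows that a maximally mixed two-class set has entropy 1 bit while a pure set has entropy 0.

```python
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels, as used for decision-tree splits."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * log2(p) for p in probs)

print(entropy(["a", "a", "b", "b"]))  # evenly mixed two classes: 1.0 bit
print(entropy(["a", "a", "a", "a"]))  # a pure subset: 0 bits of disorder
```

A decision tree picks the split that most reduces this quantity, moving from mixed parents toward pure children.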

    Essay Questions

    1. Discuss the importance of data preprocessing in machine learning projects. What are some common data preprocessing techniques, and why are they necessary?
    2. Compare and contrast the strengths and weaknesses of different types of machine learning algorithms (e.g., supervised vs. unsupervised, linear vs. non-linear, etc.). Provide specific examples to illustrate your points.
    3. Explain the concept of bias and variance in machine learning. How can these issues be addressed when building predictive models?
    4. Describe the process of building a recommendation system, including the key challenges and techniques involved. Consider different data sources and evaluation methods.
    5. Discuss the ethical considerations that data scientists should take into account when working on machine learning projects. How can fairness and transparency be ensured in the development of AI systems?

    Glossary

    • Adam: An optimization algorithm that combines the benefits of AdaGrad and RMSprop, often used for training neural networks.
    • Bagging: A machine learning ensemble method that creates multiple models using random subsets of the training data to reduce variance.
    • Boosting: A machine learning ensemble method that combines weak learners into a strong learner by iteratively focusing on misclassified samples.
    • Central Limit Theorem: A theorem stating that the distribution of sample means approaches a normal distribution as the sample size increases.
    • Classification: A machine learning task that involves predicting the category or class of a given data point.
    • Clustering: An unsupervised learning technique that groups similar data points into clusters.
    • Confidence Interval: A range of values that is likely to contain the true population parameter with a certain level of confidence.
    • Cosine Similarity: A measure of similarity between two non-zero vectors, often used in recommendation systems.
    • DBSCAN: A density-based clustering algorithm that identifies clusters based on data point density.
    • Decision Trees: A supervised learning algorithm that uses a tree-like structure to make decisions based on input features.
    • Dummy Variable: A binary variable (0 or 1) used to represent categorical variables in a regression model.
    • Entropy: A measure of disorder or randomness in a dataset, particularly used in decision trees.
    • Feature Engineering: The process of transforming raw data into features that can be used in machine learning models.
    • Gradient Descent: An optimization algorithm used to minimize the error function of a model by iteratively updating parameters.
    • Heteroskedasticity: A condition in which the variance of the error terms in a regression model is not constant across observations.
    • Homoskedasticity: A condition in which the variance of the error terms in a regression model is constant across observations.
    • Hypothesis Testing: A statistical method used to determine whether there is enough evidence to reject a null hypothesis.
    • Inferential Statistics: A branch of statistics that deals with drawing conclusions about a population based on a sample of data.
    • K-Means: A clustering algorithm that partitions data points into a specified number of clusters based on their distance from cluster centers.
    • K-Nearest Neighbors (KNN): A supervised learning algorithm that classifies or predicts data based on the majority class among its nearest neighbors.
    • Law of Large Numbers: A theorem stating that as the sample size increases, the sample mean will converge to the population mean.
    • Linear Discriminant Analysis (LDA): A dimensionality reduction and classification technique that finds linear combinations of features to separate classes.
    • Logarithm: The inverse operation of exponentiation, used to find the exponent required to reach a certain value.
    • Mini-batch Gradient Descent: An optimization method that updates parameters based on a subset of the training data in each iteration.
    • Momentum (in Gradient Descent): A technique used with gradient descent that adds a fraction of the previous parameter update to the current update, which reduces oscillations during the search for local or global minima.
    • Multicollinearity: A condition in which independent variables in a regression model are highly correlated with each other.
    • Ordinary Least Squares (OLS): A method for estimating the parameters of a linear regression model by minimizing the sum of squared residuals.
    • Overfitting: When a model learns the training data too well and cannot generalize to unseen data.
    • P-value: The probability of obtaining a result as extreme as the observed result, assuming the null hypothesis is true.
    • Random Forest: An ensemble learning method that combines multiple decision trees to make predictions.
    • Regression: A machine learning task that involves predicting a continuous numerical output.
    • Residual: The difference between the actual value of the dependent variable and the value predicted by a regression model.
    • Residual Sum of Squares (RSS): A metric that calculates the sum of the squared differences between the actual and predicted values.
    • RMSprop: An optimization algorithm that adapts the learning rate for each parameter based on the root mean square of past gradients.
    • R-squared (R²): A statistical measure that indicates the proportion of variance in the dependent variable that is explained by the independent variables in a regression model.
    • Standard Deviation: A measure of the amount of variation or dispersion in a set of values.
    • Statistical Significance: A concept that determines if a given finding is likely not due to chance; statistical significance is determined through the calculation of a p-value.
    • Stochastic Gradient Descent (SGD): An optimization algorithm that updates parameters based on a single random sample of the training data in each iteration.
    • Stop Words: Common words in a language that are often removed from text during preprocessing (e.g., “the,” “is,” “a”).
    • Supervised Learning: A type of machine learning where a model is trained using labeled data to make predictions.
    • Unsupervised Learning: A type of machine learning where a model is trained using unlabeled data to discover patterns or clusters.

    AI, Machine Learning, and Data Science Foundations

    Briefing Document: AI, Machine Learning, and Data Science Foundations

    Overview

    This document summarizes key concepts and techniques discussed in the provided material. The sources primarily cover a range of topics, including: foundational mathematical and statistical concepts, various machine learning algorithms, deep learning and generative AI, model evaluation techniques, practical application examples in customer segmentation and sales analysis, and finally optimization methods and concepts related to building a recommendation system. The materials appear to be derived from a course or a set of educational resources aimed at individuals seeking to develop skills in AI, machine learning and data science.

    Key Themes and Ideas

    1. Foundational Mathematics and Statistics
    • Essential Math Concepts: A strong foundation in mathematics is crucial. The materials emphasize the importance of understanding exponents, logarithms, the mathematical constant “e,” and pi. Crucially, understanding how these concepts transform when taking derivatives is critical for many machine learning algorithms: you need to know what a logarithm is at base 2, base e, and base 10, and how logarithms and exponents behave under differentiation.
    • Statistical Foundations: The course emphasizes descriptive and inferential statistics. Descriptive measures include “distance measures” and “variational measures.” Inferential statistics requires an understanding of theories such as the central limit theorem and the law of large numbers, along with the ideas of population versus sample, unbiased samples, hypothesis testing, confidence intervals, and statistical significance, and how to test different theories using these tools.
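These statistical ideas are easy to see empirically. As a small illustrative sketch (the die-roll simulation is not from the source material), the law of large numbers can be demonstrated with NumPy:

```python
import numpy as np

# Law of large numbers: as the sample size grows, the sample mean
# converges to the population mean (for a fair six-sided die, 3.5).
rng = np.random.default_rng(42)
for n in (10, 1_000, 100_000):
    rolls = rng.integers(1, 7, size=n)  # n simulated die rolls
    print(n, rolls.mean())  # sample mean; closest to 3.5 for large n
```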
    2. Machine Learning Algorithms:
    • Supervised Learning: The course covers various supervised learning algorithms, including:
    • “Linear discriminant analysis” (LDA): Used for classification by combining multiple features to predict outcomes, as shown in the example of predicting movie preferences by combining movie length and genre.
    • “K-Nearest Neighbors” (KNN)
    • “Decision Trees”: Used for both classification and regression tasks.
    • “Random Forests”: An ensemble method that combines multiple decision trees.
    • Boosting Algorithms (e.g., LightGBM, GBM, XGBoost): Another approach to improve model performance by sequentially training models, where each new model is trained using the previous models’ errors.
    • Unsupervised Learning: The course covers clustering techniques, including:
    • “K-Means”: A clustering algorithm for grouping data points. An example is customer segmentation by transaction history: one can apply K-Means, DBSCAN, or hierarchical clustering, evaluate the clustering algorithms, and select the one that performs best.
    • “DBSCAN”: A density-based clustering algorithm, noted for its increasing popularity.
    • “Hierarchical Clustering”: Another approach to clustering.
    • Bagging: An ensemble method used to reduce variance and create more stable predictions, exemplified through a weight loss prediction based on “daily calorie intake and workout duration.”
    • AdaBoost: An algorithm where “each stump is made by using the previous stump’s errors”, also used for building prediction models, exemplified with a housing price prediction project.
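As a minimal sketch of the customer-segmentation idea above, assuming scikit-learn is available and using made-up transaction features (number of purchases and average spend), K-Means separates two obvious customer groups:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical transaction features per customer: [n_purchases, avg_spend].
X = np.array([
    [2, 15.0], [3, 12.0], [1, 18.0],     # low-activity customers
    [40, 95.0], [38, 110.0], [45, 88.0], # high-activity customers
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)  # two clusters; which id each group gets is arbitrary
```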
    3. Deep Learning and Generative AI
    • Optimization Algorithms: The material introduces optimization techniques such as AdamW and RMSprop.
    • Generative Models: The course touches upon more advanced topics including variational autoencoders and large language models.
    • Natural Language Processing (NLP): It emphasizes the importance of understanding concepts like “n-grams,” “attention mechanisms” (both self-attention and multi-head self-attention), the “encoder-decoder architecture of Transformers,” and related models such as GPT and BERT. The sources stress that to understand how ChatGPT was invented and how GPT and BERT models work, you will need to get into the topic of language models.
    4. Model Evaluation
    • Regression Metrics: The document introduces the residual sum of squares (RSS) as a common metric for evaluating linear regression models. The formula is explicitly provided: RSS = Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)².
    • Clustering Metrics: The course mentions entropy, and the silhouette score, which is “a measure of the similarity of the data point to its own cluster compared to the other clusters”.
    • Regularization: The use of L2 regularization is mentioned, where lambda (λ ≥ 0) is the tuning parameter for the penalty, and “the Lambda serves to control the relative impact of the penalty on the regression coefficient estimates.”
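The RSS formula translates directly into code. A minimal sketch with toy actual and predicted values (the numbers are made up for illustration):

```python
import numpy as np

# Residual sum of squares: RSS = sum over i of (y_i - y_hat_i)^2
y     = np.array([3.0, 5.0, 7.0])  # actual values
y_hat = np.array([2.5, 5.5, 6.0])  # model predictions

rss = np.sum((y - y_hat) ** 2)
print(rss)  # 0.25 + 0.25 + 1.0 = 1.5
```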
    5. Practical Applications and Case Studies:
    • Customer Segmentation: Clustering algorithms (K-means, DBScan) can be used to segment customers based on transaction history.
    • Sales Analysis: The material includes analysis of customer types, “consumer, corporate, and home office”, top spending customers, and sales trends over time. There is a suggestion that “a seasonal Trend” might be apparent if a longer time period is considered.
    • Geographic Sales Mapping: The material includes using maps to visualize sales per state, which is deemed helpful for companies looking to expand into new geographic areas.
    • Housing Price Prediction: A linear regression model is applied to predict house prices using features like median income, average rooms, and proximity to the ocean. An important note is made about the definition of “residual” in this context: do not confuse the error with the residual. The error can never be observed or calculated; what you can do is predict the error, and that prediction is the residual.
    6. Linear Regression and OLS
    • Regression Model: The document explains that the linear regression model aims to estimate the relationship between independent and dependent variables. It emphasizes that the beta-zero term (β₀) is not a variable: it is called the intercept or constant, an unknown number that is one of the parameters the linear regression model must estimate.
    • Ordinary Least Squares (OLS): OLS is a core method to minimize the “sum of squared residuals”. The material states that “the OLS tries to find the line that will minimize its value”.
    • Assumptions: The materials mention an assumption of constant variance (homoscedasticity) for errors, and notes “you can check for this assumption by plotting the residual and see whether there is a funnel like graph”. The importance of using a correct statistical test is also highlighted when considering p values.
    • Dummy Variables: Categorical features must be transformed into dummy variables to be used in linear regression models, with the warning that “you always need to drop at least one of the categories” due to the multicollinearity problem. The process is outlined: the get_dummies function from pandas is used to go from the one categorical variable to a separate dummy variable for each of its five categories.
    • Variable Interpretation: Coefficients in a linear regression model represent the impact of an independent variable on the dependent variable. For example, the material notes that when the total number of rooms (total_rooms) increases by one additional unit, the house value decreases by 2.67.
    • Model Summary Output: The materials discuss interpreting model output metrics such as R-squared, which is “the metric that showcases the goodness of fit of your model”. They also cover how to interpret p-values.
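The dummy-variable step can be sketched with pandas’ get_dummies. The ocean_proximity values below are illustrative; drop_first=True drops one category so the remaining dummies are not perfectly collinear:

```python
import pandas as pd

# Made-up ocean_proximity values (column name follows the housing example).
df = pd.DataFrame({"ocean_proximity": ["NEAR BAY", "INLAND", "NEAR OCEAN", "INLAND"]})

# The dropped category becomes the implicit baseline in a regression.
dummies = pd.get_dummies(df["ocean_proximity"], drop_first=True)
print(dummies.columns.tolist())  # one column per non-baseline category
```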
    7. Recommendation Systems
    • Feature Engineering: A critical step is identifying and engineering the appropriate features, with the recommendation system based on “data points you use to make decisions about what to recommend”.
    • Text Preprocessing: Text data must be cleaned and preprocessed, including removing “stop words” and vectorizing using TF-IDF or similar methods. An example walks through building a binary keyword vector for a movie: each keyword that appears in the movie’s description gets a 1 and every other keyword gets a 0, yielding a vector such as 0 0 1 1 1 1 0 0 0.
    • Cosine Similarity: A technique to find similarity between text vectors. The cosine similarity is defined as “an equation of the dot product of two vectors and the multiplication of the magnitudes of the two vectors”.
    • Recommending: The system then recommends items with the highest cosine similarity scores, for example the top five movies, though “you can recommend many or 50 movies, that’s completely up to you”.
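The cosine-similarity definition above (dot product divided by the product of the vector magnitudes) is short enough to implement directly. The keyword-count vectors and movie names here are hypothetical:

```python
import math

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy binary keyword vectors for three movies.
movie_a = [0, 0, 1, 1, 1, 1, 0]
movie_b = [0, 0, 1, 1, 0, 1, 0]
movie_c = [1, 1, 0, 0, 0, 0, 1]

print(cosine_similarity(movie_a, movie_b))  # high: three shared keywords
print(cosine_similarity(movie_a, movie_c))  # 0.0: no keywords in common
```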
    8. Career Advice and Perspective
    • The Importance of a Plan: The material emphasizes the value of creating a career plan and focusing on actionable steps. The advice is that a plan makes you focus; without one, you can drift anywhere and lose your way.
    • Learning by Doing: The speaker advocates doing smaller projects to prove your abilities, especially as a junior data scientist: even work that seems boring or that does not appear to lead anywhere demonstrates what you can do.
    • Business Acumen: Data scientists should focus on how their work provides value to the business; a data scientist is someone who brings value to the business and supports its decision-making.
    • Personal Branding: Building a personal brand is also seen as important, with the recommendation that “having a newsletter and having a LinkedIn following” can help. Technical portfolio sites like “GitHub” are recommended.
    • Data Scientist Skills: The ability to show your thought process and motivation is important in data science interviews: what motivated you to do this kind of project, to write this kind of code, and to present this kind of result.
    • Future of Data Science: The future of data science is predicted to become “invaluable to the business”, especially given the current rapid development of AI.
    • Business Fundamentals: A business must address something people genuinely need, as in the example: “if my roof was leaking and it’s raining outside and I’m in my house and water is pouring on my head, I have to fix that whether I’m broke or not”.
    • Entrepreneurship: The importance of planning, which was inspired by being a pilot where “pilots don’t take off unless we know where we’re going”.
    • Growth: The experience at GE emphasized that “growing so fast it was doubling in size every three years and that that really informed my thinking about growth”.
    • Mergers and Acquisitions (M&A): The business principle of using debt to buy underpriced assets that can later be sold at a higher multiple for profit.
    9. Optimization
    • Gradient Descent (GD): The updated weight equals the current weight parameter minus the learning rate times the gradient; the same update is applied to the second parameter, the bias factor.
    • Stochastic Gradient Descent (SGD): SGD differs from GD in that it “uses the gradient from a single data point which is just one observation in order to update our parameters”. This makes it “much faster and computationally much less expensive compared to the GD”.
    • SGD With Momentum: SGD with momentum addresses the disadvantages of the basic SGD algorithm.
    • Mini-Batch Gradient Descent: A trade-off between the two, and “it tries to strike a balance by selecting smaller batches and calculating the gradient over them”.
    • RMSprop: RMSprop is introduced as an algorithm for controlling learning rates: for parameters with small gradients, it increases their effective learning rate to ensure the updates do not vanish.
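The gradient-descent update rule described above (parameter minus learning rate times gradient, applied to both the weight and the bias) can be sketched on a one-feature linear model. The toy data and hyperparameters are illustrative choices, not from the source:

```python
# Gradient descent on y_hat = w*x + b, minimizing mean squared error.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # true relationship: y = 2x with zero bias

w, b, lr = 0.0, 0.0, 0.01  # initial parameters and learning rate
for _ in range(5000):
    # Gradients of the MSE with respect to w and b.
    grad_w = sum(2 * ((w * x + b) - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum(2 * ((w * x + b) - y) for x, y in zip(xs, ys)) / len(xs)
    # Update rule: new parameter = current parameter - learning_rate * gradient.
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b)  # w approaches 2 and b approaches 0
```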

    Conclusion

    These materials provide a broad introduction to data science, machine learning, and AI. They cover mathematical and statistical foundations, various algorithms (both supervised and unsupervised), deep learning concepts, model evaluation, and provide case studies to illustrate the practical application of such techniques. The inclusion of career advice and reflections makes it a very holistic learning experience. The information is designed to build a foundational understanding and introduce more complex concepts.

    Essential Concepts in Machine Learning

    Frequently Asked Questions

    • What are some real-world applications of machine learning, as discussed in the context of this course? Machine learning has diverse applications, including optimizing crop yields by monitoring soil health, and predicting customer preferences, such as in the entertainment industry as seen with Netflix’s recommendations. It’s also useful in customer segmentation (identifying “good”, “better”, and “best” customers based on transaction history) and creating personalized recommendations (like prioritizing movies based on a user’s preferred genre). Further, machine learning can help companies decide which geographic areas are most promising for their products based on sales data and can help investors identify which features of a house are correlated with its value.
    • What are the core mathematical concepts that are essential for understanding machine learning and data science? A foundational understanding of several mathematical concepts is critical. This includes: the idea of using variables with different exponents (e.g., X, X², X³), understanding logarithms at different bases (base 2, base e, base 10), comprehending the meaning of ‘e’ and ‘Pi’, mastering exponents and logarithms and how they transform when taking derivatives. A fundamental understanding of descriptive (distance measures, variational measures) and inferential statistics (central limit theorem, law of large numbers, population vs. sample, hypothesis testing) is also essential.
    • What specific machine learning algorithms should I be familiar with, and what are their uses? The course highlights the importance of both supervised and unsupervised learning techniques. For supervised learning, you should know linear discriminant analysis (LDA), K-Nearest Neighbors (KNN), decision trees (for both classification and regression), random forests, and boosting algorithms like LightGBM, GBM, and XGBoost. For unsupervised learning, understanding K-Means clustering, DBSCAN, and hierarchical clustering is crucial. These algorithms are used in various applications like classification, clustering, and regression.
    • How can I assess the performance of my machine learning models? Several metrics are used to evaluate model performance, depending on the task at hand. For regression models, the residual sum of squares (RSS) is crucial; it measures the difference between predicted and actual values. Metrics like entropy and the Gini index are used for classification, and the silhouette score (which measures the similarity of a data point to its own cluster vs. other clusters) is used for evaluating clustering models. Additionally, concepts like the penalty term, used to control the impact of model complexity, and the L2 norm used in regression are highlighted as important for proper evaluation.
    • What is the significance of linear regression and what key concepts should I know? Linear regression is used to model the relationship between a dependent variable (Y) and one or more independent variables (X). A crucial aspect is estimating coefficients (betas) and intercepts which quantify these relationships. It is key to understand concepts like the residuals (differences between predicted and actual values), and how ordinary least squares (OLS) is used to minimize the sum of squared residuals. In understanding linear regression, it is also important not to confuse errors (which are never observed and can’t be calculated) with residuals (which are predictions of errors). It’s also crucial to be aware of assumptions about your errors and their variance.
    • What are dummy variables, and why are they used in modeling? Dummy variables are binary (0 or 1) variables used to represent categorical data in regression models. When transforming categorical variables like ocean proximity (with categories such as near bay, inland, etc.), each category becomes a separate dummy variable. The “1” indicates that a condition is met, and a “0” indicates that it is not. It is essential to drop one of these dummy variables to avoid perfect multicollinearity (where one variable is predictable from other variables) which could cause an OLS violation.
    • What are some of the main ideas behind recommendation systems as discussed in the course? Recommendation systems rely on data points to identify similarities between items to generate personalized results. Text data preprocessing is often done using techniques like tokenization, removing stop words, and stemming to convert data into vectors. Cosine similarity is used to measure the angle between two vector representations. This allows one to calculate how similar different data points (such as movies) are, based on common features (like genre, plot keywords). For example, a movie can be represented as a vector in a high-dimensional space that captures different properties about the movie. This approach enables recommendations based on calculated similarity scores.
    • What key steps and strategies are recommended for aspiring data scientists? The course emphasizes several critical steps. It’s important to start with projects to demonstrate the ability to apply data science skills. This includes going beyond basic technical knowledge and considering the “why” behind projects. A focus on building a personal brand, which can be done through online platforms like LinkedIn, GitHub, and Medium, is recommended. Understanding the business value of data science is key, which includes communicating project findings effectively. Also emphasized are creating a career plan and taking responsibility for your career choices. Finally, focusing on a niche or specific sector is recommended to ensure that one’s technical skills match the business needs.

    Fundamentals of Machine Learning

    Machine learning (ML) is a branch of artificial intelligence (AI) that builds models based on data, learns from that data, and makes decisions [1]. ML is used across many industries, including healthcare, finance, entertainment, marketing, and transportation [2-9].

    Key Concepts in Machine Learning:

    • Supervised Learning: Algorithms are trained using labeled data [10]. Examples include regression and classification models [11].
    • Regression: Predicts continuous values, such as house prices [12, 13].
    • Classification: Predicts categorical values, such as whether an email is spam [12, 14].
    • Unsupervised Learning: Algorithms are trained using unlabeled data, and the model must find patterns without guidance [11]. Examples include clustering and outlier detection techniques [12].
    • Semi-Supervised Learning: A combination of supervised and unsupervised learning [15].

    Machine Learning Algorithms:

    • Linear Regression: A statistical or machine learning method used to model the impact of a change in a variable [16, 17]. It can be used for causal analysis and predictive analytics [17].
    • Logistic Regression: Used for classification, especially with binary outcomes [14, 15, 18].
    • K-Nearest Neighbors (KNN): A classification algorithm [19, 20].
    • Decision Trees: Can be used for both classification and regression [19, 21]. They are transparent and handle diverse data, making them useful in various industries [22-25].
    • Random Forest: An ensemble learning method that combines multiple decision trees, suitable for classification and regression [19, 26, 27].
    • Boosting Algorithms: Such as AdaBoost, LightGBM, GBM, and XGBoost, build trees using information from previous trees to improve performance [19, 28, 29].
    • K-Means: A clustering algorithm [19, 30].
    • DBSCAN: A clustering algorithm that is becoming increasingly popular [19].
    • Hierarchical Clustering: Another clustering technique [19, 30].

    Important Steps in Machine Learning:

    • Data Preparation: This involves splitting data into training and test sets and handling missing values [31-33].
    • Feature Engineering: Identifying and selecting the most relevant data points (features) to be used by the model to generate the most accurate results [34, 35].
    • Model Training: Selecting an appropriate algorithm and training it on the training data [36].
    • Model Evaluation: Assessing model performance using appropriate metrics [37].
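The steps above can be sketched end-to-end with scikit-learn. The data and the 25% test split are illustrative choices, not from the source:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Toy data with a known linear relationship: y = 3x + 1.
X = np.arange(20, dtype=float).reshape(-1, 1)
y = 3.0 * X.ravel() + 1.0

# Data preparation: hold out 25% of the rows as a test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LinearRegression().fit(X_train, y_train)  # model training
r2 = model.score(X_test, y_test)                  # model evaluation (R^2)
print(r2)  # close to 1.0 on this noiseless data
```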

    Model Evaluation Metrics:

    • Regression Models:
    • Residual Sum of Squares (RSS) [38].
    • Mean Squared Error (MSE) [38, 39].
    • Root Mean Squared Error (RMSE) [38, 39].
    • Mean Absolute Error (MAE) [38, 39].
    • Classification Models:
    • Accuracy: Proportion of correctly classified instances [40].
    • Precision: Measures the accuracy of positive predictions [40].
    • Recall: Measures the model’s ability to identify all positive instances [40].
    • F1 Score: Combines precision and recall into a single metric [39, 40].
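The classification metrics above can be computed with scikit-learn on a hand-made set of binary predictions (the labels below are made up for illustration):

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Hand-made binary task: 1 = positive class.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]

print(accuracy_score(y_true, y_pred))   # 6 of 8 instances classified correctly
print(precision_score(y_true, y_pred))  # 2 of 3 predicted positives are right
print(recall_score(y_true, y_pred))     # 2 of 3 actual positives were found
print(f1_score(y_true, y_pred))         # harmonic mean of precision and recall
```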

    Bias-Variance Tradeoff:

    • Bias: The inability of a model to capture the true relationship in the data [41]. Complex models tend to have low bias but high variance [41-43].
    • Variance: The sensitivity of a model to changes in the training data [41-43]. Simpler models have low variance but high bias [41-43].
    • Overfitting: Occurs when a model learns the training data too well, including noise [44, 45]. This results in poor performance on unseen data [44].
    • Underfitting: Occurs when a model is too simple to capture the underlying patterns in the data [45].

    Techniques to address overfitting:

    • Reducing model complexity: Using simpler models to reduce the chances of overfitting [46].
    • Cross-validation: Using different subsets of data for training and testing to get a more realistic measure of model performance [46].
    • Early stopping: Monitoring the model performance and stopping the training process when it begins to decrease [47].
    • Regularization techniques: Such as L1 and L2 regularization, which help prevent overfitting by adding penalty terms that reduce the complexity of the model [48-50].
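A small sketch of the L2 idea, assuming scikit-learn and synthetic data: with two nearly duplicate features, ordinary least squares can produce large, unstable coefficients, while ridge regression (alpha plays the role of the lambda penalty) shrinks the coefficient norm:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Two near-duplicate features invite unstable OLS coefficients.
rng = np.random.default_rng(0)
x = rng.normal(size=50)
X = np.column_stack([x, x + rng.normal(scale=0.01, size=50)])
y = x + rng.normal(scale=0.1, size=50)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)  # L2 penalty shrinks coefficients

# The ridge coefficient norm is never larger than the OLS norm.
print(np.linalg.norm(ols.coef_), np.linalg.norm(ridge.coef_))
```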

    Python and Machine Learning:

    • Python is a popular programming language for machine learning because of its rich ecosystem of libraries, including:
    • Pandas: For data manipulation and analysis [51].
    • NumPy: For numerical operations [51, 52].
    • Scikit-learn (sklearn): For machine learning algorithms and tools [13, 51-59].
    • SciPy: For scientific computing [51].
    • NLTK: For natural language processing [51].
    • TensorFlow and PyTorch: For deep learning [51, 60, 61].
    • Matplotlib: For data visualization [52, 62, 63].
    • Seaborn: For data visualization [62].

    Natural Language Processing (NLP):

    • NLP is used to process and analyze text data [64, 65].
    • Key steps include: text cleaning (lowercasing, punctuation removal, tokenization, stemming, and lemmatization), and converting text to numerical data with techniques such as TF-IDF, word embeddings, subword embeddings and character embeddings [66-68].
    • NLP is used in applications such as chatbots, virtual assistants, and recommender systems [7, 8, 66].
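The cleaning steps above (lowercasing, punctuation removal, tokenization, stop-word removal) can be sketched in plain Python; the stop-word list here is a tiny illustrative subset:

```python
import string

STOP_WORDS = {"the", "is", "a", "an", "and", "of"}  # tiny illustrative list

def preprocess(text):
    # Lowercase, strip punctuation, tokenize on whitespace, drop stop words.
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [tok for tok in text.split() if tok not in STOP_WORDS]

print(preprocess("The plot of the movie is a masterpiece!"))
# ['plot', 'movie', 'masterpiece']
```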

    Deep Learning:

    • Deep learning is an advanced form of machine learning that uses neural networks with multiple layers [7, 60, 68].
    • Examples include:
    • Recurrent Neural Networks (RNNs) [69, 70].
    • Artificial Neural Networks (ANNs) [69].
    • Convolutional Neural Networks (CNNs) [69, 70].
    • Generative Adversarial Networks (GANs) [69].
    • Transformers [8, 61, 71-74].

    Practical Applications of Machine Learning:

    • Recommender Systems: Suggesting products, movies, or jobs to users [6, 9, 64, 75-77].
    • Predictive Analytics: Using data to forecast future outcomes, such as house prices [13, 17, 78].
    • Fraud Detection: Identifying fraudulent transactions in finance [4, 27, 79].
    • Customer Segmentation: Grouping customers based on their behavior [30, 80].
    • Image Recognition: Classifying images [14, 81, 82].
    • Autonomous Vehicles: Enabling self-driving cars [7].
    • Chatbots and virtual assistants: Providing automated customer support using NLP [8, 18, 83].

    Career Paths in Machine Learning:

    • Machine Learning Researcher: Focuses on developing and testing new machine learning algorithms [84, 85].
    • Machine Learning Engineer: Focuses on implementing and deploying machine learning models [85-87].
    • AI Researcher: Similar to machine learning researcher but focuses on more advanced models like deep learning and generative AI [70, 74, 88].
    • AI Engineer: Similar to machine learning engineer but works with more advanced AI models [70, 74, 88].
    • Data Scientist: A broad role that uses data analysis, statistics, and machine learning to solve business problems [54, 89-93].

    Additional Considerations:

    • It’s important to develop not only technical skills, but also communication skills, business acumen, and the ability to translate business needs into data science problems [91, 94-96].
    • A strong data science portfolio is key for getting into the field [97].
    • Continuous learning is essential to keep up with the latest technology [98, 99].
    • Personal branding can open up many opportunities [100].

    This overview should provide a strong foundation in the fundamentals of machine learning.

    A Comprehensive Guide to Data Science

    Data science is a field that uses data analysis, statistics, and machine learning to solve business problems [1, 2]. It is a broad field with many applications, and it is becoming increasingly important in today’s world [3]. Data science is not just about crunching numbers; it also involves communication, business acumen, and translation skills [4].

    Key Aspects of Data Science:

    • Data Analysis: Examining data to understand patterns and insights [5, 6].
    • Statistics: Applying statistical methods to analyze data, test hypotheses and make inferences [7, 8].
    • Descriptive statistics, which includes measures like mean, median, and standard deviation, helps in summarizing data [8].
    • Inferential statistics, which involves concepts like the central limit theorem and hypothesis testing, help in drawing conclusions about a population based on a sample [9].
    • Probability distributions are also important in understanding machine learning concepts [10].
    • Machine Learning (ML): Using algorithms to build models based on data, learn from it, and make decisions [2, 11-13].
    • Supervised learning involves training algorithms on labeled data for tasks like regression and classification [13-16]. Regression is used to predict continuous values, while classification is used to predict categorical values [13, 17].
    • Unsupervised learning involves training algorithms on unlabeled data to identify patterns, as in clustering and outlier detection [13, 18, 19].
    • Programming: Using programming languages such as Python to implement data science techniques [20]. Python is popular due to its versatility and many libraries [20, 21].
    • Libraries such as Pandas and NumPy are used for data manipulation [22, 23].
    • Scikit-learn is used for implementing machine learning models [22, 24, 25].
    • TensorFlow and PyTorch are used for deep learning [22, 26].
    • Libraries such as Matplotlib and Seaborn are used for data visualization [17, 25, 27, 28].
    • Data Visualization: Representing data through charts, graphs, and other visual formats to communicate insights [25, 27].
    • Business Acumen: Understanding business needs and translating them into data science problems and solutions [4, 29].

    The Data Science Process:

    1. Data Collection: Gathering relevant data from various sources [30].
    2. Data Preparation: Cleaning and preprocessing data, which involves:
    • Handling missing values by removing or imputing them [31, 32].
    • Identifying and removing outliers [32-35].
    • Data wrangling: transforming and cleaning data for analysis [6].
    • Data exploration: using descriptive statistics and data visualization to understand the data [36-39].
    • Data Splitting: Dividing data into training, validation, and test sets [14].
    3. Feature Engineering: Identifying, selecting, and transforming variables [40, 41].
    4. Model Training: Selecting an appropriate algorithm, training it on the training data, and optimizing it with validation data [14].
    5. Model Evaluation: Assessing model performance using relevant metrics on the test data [14, 42].
    6. Deployment and Communication: Communicating results and translating them into actionable insights for stakeholders [43].
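The missing-value step of data preparation can be sketched with pandas. The columns and values are made up, and the two options shown are dropping incomplete rows or imputing with column means:

```python
import numpy as np
import pandas as pd

# Toy frame with missing values (hypothetical columns).
df = pd.DataFrame({"income": [50.0, np.nan, 70.0, 60.0],
                   "rooms":  [3.0, 4.0, np.nan, 5.0]})

dropped = df.dropna()           # option 1: remove rows with any missing value
imputed = df.fillna(df.mean())  # option 2: impute with each column's mean

print(len(dropped))                   # 2 complete rows remain
print(imputed["rooms"].tolist())      # the missing room count becomes the mean, 4.0
```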

    Applications of Data Science:

    • Business and Finance: Customer segmentation, fraud detection, credit risk assessment [44-46].
    • Healthcare: Disease diagnosis, risk prediction, treatment planning [46, 47].
    • Operations Management: Optimizing decision-making using data [44].
    • Engineering: Fault diagnosis [46-48].
    • Biology: Classification of species [47-49].
    • Customer service: Developing troubleshooting guides and chatbots [47-49].
    • Recommender systems are used in entertainment, marketing, and other industries to suggest products or movies to users [30, 50, 51].
    • Predictive Analytics are used to forecast future outcomes [24, 41, 52].

    Key Skills for Data Scientists:

    • Technical Skills: Proficiency in programming languages such as Python and knowledge of relevant libraries, as well as expertise in statistics, mathematics, and machine learning [20].
    • Communication Skills: Ability to communicate results to technical and non-technical audiences [4, 43].
    • Business Skills: Understanding business requirements and translating them into data-driven solutions [4, 29].
    • Problem-solving skills: Ability to define, analyze, and solve complex problems [4, 29].

    Career Paths in Data Science:

    • Data Scientist
    • Machine Learning Engineer
    • AI Engineer
    • Data Science Manager
    • NLP Engineer
    • Data Analyst

    Additional Considerations:

    • A strong portfolio demonstrating data science projects is essential to showcase practical skills [53-56].
    • Continuous learning is necessary to keep up with the latest technology in the field [57].
    • Personal branding can enhance opportunities in data science [58-61].
    • Data scientists must be able to adapt to the evolving landscape of AI and machine learning [62, 63].


    Artificial Intelligence: Applications Across Industries

    Artificial intelligence (AI) has a wide range of applications across various industries [1, 2]. Machine learning, a branch of AI, is used to build models based on data and learn from this data to make decisions [1].

    Here are some key applications of AI:

    • Healthcare: AI is used in the diagnosis of diseases, including cancer, and for identifying severe effects of illnesses [3]. It also helps with drug discovery, personalized medicine, treatment plans, and improving hospital operations [3, 4]. Additionally, AI helps in predicting the number of patients that a hospital can expect in the emergency room [4].
    • Finance: AI is used for fraud detection in credit card and banking operations [5]. It is also used in trading, combined with quantitative finance, to help traders make decisions about stocks, bonds, and other assets [5].
    • Retail: AI helps in understanding and estimating demand for products, determining the most appropriate warehouses for shipping, and building recommender systems and search engines [5, 6].
    • Marketing: AI is used to understand consumer behavior and target specific groups, which helps reduce marketing costs and increase conversion rates [7, 8].
    • Transportation: AI is used in autonomous vehicles and self-driving cars [8].
    • Natural Language Processing (NLP): AI is behind applications such as chatbots, virtual assistants, and large language models [8, 9]. These tools use text data to answer questions and provide information [9].
    • Smart Home Devices: AI powers smart home devices like Alexa [9].
    • Agriculture: AI is used to estimate weather conditions, predict crop production, monitor soil health, and optimize crop yields [9, 10].
    • Entertainment: AI is used to build recommender systems that suggest movies and other content based on user data. Netflix is a good example of a company that uses AI in this way [10, 11].
    • Customer service: AI powers chatbots that can categorize customer inquiries and provide appropriate responses, reducing wait times and improving support efficiency [12-15].
    • Game playing: AI is used to design AI opponents in games [13, 14, 16].
    • E-commerce: AI is used to provide personalized product recommendations [14, 16].
    • Human Resources: AI helps to identify factors influencing employee retention [16, 17].
    • Fault Diagnosis: AI helps isolate the cause of malfunctions in complex systems by analyzing sensor data [12, 18].
    • Biology: AI is used to categorize species based on characteristics or DNA sequences [12, 15].
    • Remote Sensing: AI is used to analyze satellite imagery and classify land cover types [12, 15].

    In addition to these, AI is also used in many areas of data science, such as customer segmentation [19-21], fraud detection [19-22], credit risk assessment [19-21], and operations management [19, 21, 23, 24].

    Overall, AI is a powerful technology with a wide range of applications that improve efficiency, decision-making, and customer experience in many areas [11].

    Essential Python Libraries for Data Science

    Python libraries are essential tools in data science, machine learning, and AI, providing pre-written functions and modules that streamline complex tasks [1]. Here’s an overview of the key Python libraries mentioned in the sources:

    • Pandas: This library is fundamental for data manipulation and analysis [2, 3]. It provides data structures like DataFrames, which are useful for data wrangling, cleaning, and preprocessing [3, 4]. Pandas is used for tasks such as reading data, handling missing values, identifying outliers, and performing data filtering [3, 5].
    • NumPy: NumPy is a library for numerical computing in Python [2, 3, 6]. It is used for working with arrays and matrices and performing mathematical operations [3, 7]. NumPy is essential for data visualization and other tasks in machine learning [3].
    • Matplotlib: This library is used for creating visualizations like plots, charts, and histograms [6-8]. Specifically, pyplot is a module within Matplotlib used for plotting [9, 10].
    • Seaborn: Seaborn is another data visualization library that is known for creating more appealing visualizations [8, 11].
    • Scikit-learn (sklearn): This library provides a wide range of machine learning algorithms and tools for tasks like regression, classification, clustering, and model evaluation [2, 6, 10, 12]. It includes modules for model selection, ensemble learning, and metrics [13]. Scikit-learn also includes tools for data preprocessing, such as splitting the data into training and testing sets [14, 15].
    • Statsmodels: This library is used for statistical modeling and econometrics and has capabilities for linear regression [12, 16]. It is particularly useful for causal analysis because it provides detailed statistical summaries of model results [17, 18].
    • NLTK (Natural Language Toolkit): This library is used for natural language processing tasks [2]. It is helpful for text data cleaning, such as tokenization, stemming, lemmatization, and stop word removal [19, 20]. NLTK also assists in text analysis and processing [21].
    • TensorFlow and PyTorch: These are deep learning frameworks used for building and training neural networks and implementing deep learning models [2, 22, 23]. They are essential for advanced machine learning tasks, such as building large language models [2].
    • Pickle: This library is used for serializing and deserializing Python objects, which is useful for saving and loading models and data [24, 25].
    • Requests: This library is used for making HTTP requests, which is useful for fetching data from web APIs, like movie posters [25].
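    The cleaning tasks Pandas is used for can be illustrated in a few lines. The toy data and the outlier threshold below are invented for the example, not taken from the source.

```python
# Toy data illustrating common Pandas cleaning steps.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [25, np.nan, 31, 120, 28],        # one missing value, one outlier
    "income": [40000, 52000, np.nan, 61000, 45000],
})

# Handle missing values by imputing the column median.
df["age"] = df["age"].fillna(df["age"].median())
df["income"] = df["income"].fillna(df["income"].median())

# Flag outliers with a z-score rule; the 1.5 threshold is an arbitrary
# choice suited to this tiny sample (z-scores are bounded for small n).
z = (df["age"] - df["age"].mean()) / df["age"].std()
df_clean = df[z.abs() < 1.5]

print(df_clean)  # the age-120 row is dropped
```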

    These libraries facilitate various stages of the data science workflow [26]:

    • Data loading and preparation: Libraries like Pandas and NumPy are used to load, clean, and transform data [2, 26].
    • Data visualization: Libraries like Matplotlib and Seaborn are used to create plots and charts that help to understand data and communicate insights [6-8].
    • Model training and evaluation: Libraries like Scikit-learn and Statsmodels are used to implement machine learning algorithms, train models, and evaluate their performance [2, 12, 26].
    • Deep learning: Frameworks such as TensorFlow and PyTorch are used for building complex neural networks and deep learning models [2, 22].
    • Natural language processing: Libraries such as NLTK are used for processing and analyzing text data [2, 27].

    Mastering these Python libraries is crucial for anyone looking to work in data science, machine learning, or AI [1, 26]. They provide the necessary tools for implementing a wide array of tasks, from basic data analysis to advanced model building [1, 2, 22, 26].

    Machine Learning Model Evaluation

    Model evaluation is a crucial step in the machine learning process that assesses the performance and effectiveness of a trained model [1, 2]. It involves using various metrics to quantify how well the model is performing, which helps to identify whether the model is suitable for its intended purpose and how it can be improved [2-4]. The choice of evaluation metrics depends on the specific type of machine learning problem, such as regression or classification [5].

    Key Concepts in Model Evaluation:

    • Performance Metrics: These are measures used to evaluate how well a model is performing. Different metrics are appropriate for different types of tasks [5, 6].
    • For regression models, common metrics include:
    • Residual Sum of Squares (RSS): Measures the sum of the squares of the differences between the predicted and true values [6-8].
    • Mean Squared Error (MSE): Calculates the average of the squared differences between predicted and true values [6, 7].
    • Root Mean Squared Error (RMSE): The square root of the MSE, which provides a measure of the error in the same units as the target variable [6, 7].
    • Mean Absolute Error (MAE): Calculates the average of the absolute differences between predicted and true values. MAE is less sensitive to outliers compared to MSE [6, 7, 9].
    • For classification models, common metrics include:
    • Accuracy: Measures the proportion of correct predictions made by the model [9, 10].
    • Precision: Measures the proportion of true positive predictions among all positive predictions made by the model [7, 9, 10].
    • Recall: Measures the proportion of true positive predictions among all actual positive instances [7, 9, 11].
    • F1 Score: The harmonic mean of precision and recall, providing a balanced measure of a model’s performance [7, 9].
    • Area Under the Curve (AUC): The area under the Receiver Operating Characteristic (ROC) curve, used to assess the performance of binary classification models across decision thresholds [12].
    • Cross-entropy: A loss function used to measure the difference between the predicted and true probability distributions, often used in classification problems [7, 13, 14].
    • Bias and Variance: These concepts are essential for understanding model performance [3, 15].
    • Bias refers to the error introduced by approximating a real-world problem with a simplified model, which can cause the model to underfit the data [3, 4].
    • Variance measures how much the model’s predictions vary for different training data sets; high variance can cause the model to overfit the data [3, 16].
    • Overfitting and Underfitting: These issues can affect model accuracy [17, 18].
    • Overfitting occurs when a model learns the training data too well, including noise, and performs poorly on new, unseen data [17-19].
    • Underfitting occurs when a model is too simple and cannot capture the underlying patterns in the training data [17, 18].
    • Training, Validation, and Test Sets: Data is typically split into three sets [2, 20]:
    • Training Set: Used to train the model.
    • Validation Set: Used to tune model hyperparameters and prevent overfitting.
    • Test Set: Used to evaluate the final model’s performance on unseen data [20-22].
    • Hyperparameter Tuning: Adjusting model parameters to minimize errors and optimize performance, often using the validation set [21, 23, 24].
    • Cross-Validation: A resampling technique that allows the model to be trained and tested on different subsets of the data to assess its generalization ability [7, 25].
    • K-fold cross-validation divides the data into k subsets or folds and iteratively trains and evaluates the model by using each fold as the test set once [7].
    • Leave-one-out cross-validation uses each data point as a test set, training the model on all the remaining data points [7].
    • Early Stopping: A technique where the model’s performance on a validation set is monitored during the training process, and training is stopped when the performance starts to decrease [25, 26].
    • Ensemble Methods: Techniques that combine multiple models to improve performance and reduce overfitting. Tree-based ensembles include random forests and boosting techniques such as AdaBoost, Gradient Boosting Machines (GBM), and XGBoost [26]. Bagging is an ensemble technique that reduces variance by training multiple models and averaging the results [27-29].
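    The regression and classification metrics listed above can be computed directly with scikit-learn. The `y_true`/`y_pred` arrays here are made-up values for illustration.

```python
# Computing the regression and classification metrics from the text.
import numpy as np
from sklearn.metrics import (accuracy_score, f1_score, mean_absolute_error,
                             mean_squared_error, precision_score, recall_score)

# Regression: hypothetical true vs. predicted values.
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 3.0, 8.0])

mse = mean_squared_error(y_true, y_pred)   # average squared error
rmse = np.sqrt(mse)                        # same units as the target
mae = mean_absolute_error(y_true, y_pred)  # less outlier-sensitive
print(f"MSE={mse:.3f}  RMSE={rmse:.3f}  MAE={mae:.3f}")

# Classification: hypothetical binary labels.
c_true = [1, 0, 1, 1, 0, 1]
c_pred = [1, 0, 0, 1, 0, 1]

acc = accuracy_score(c_true, c_pred)
prec = precision_score(c_true, c_pred)   # TP / (TP + FP)
rec = recall_score(c_true, c_pred)       # TP / (TP + FN)
f1 = f1_score(c_true, c_pred)            # harmonic mean of the two
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f} F1={f1:.3f}")
```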

    Step-by-Step Process for Model Evaluation:

    1. Data Splitting: Divide the data into training, validation, and test sets [2, 20].
    2. Algorithm Selection: Choose an appropriate algorithm based on the problem and data characteristics [24].
    3. Model Training: Train the selected model using the training data [24].
    4. Hyperparameter Tuning: Adjust model parameters using the validation data to minimize errors [21].
    5. Model Evaluation: Evaluate the model’s performance on the test data using chosen metrics [21, 22].
    6. Analysis and Refinement: Analyze the results, make adjustments, and retrain the model if necessary [3, 17, 30].
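    Cross-validation compresses steps 1–5 into a single loop over folds. A k-fold sketch with scikit-learn, where the dataset and classifier are arbitrary stand-ins:

```python
# 5-fold cross-validation: each fold serves as the held-out set once,
# and the per-fold scores are averaged to estimate generalization.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)
print("per-fold accuracy:", scores.round(3))
print("mean accuracy:", scores.mean().round(3))
```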

    Importance of Model Evaluation:

    • Ensures Model Generalization: It helps to ensure that the model performs well on new, unseen data, rather than just memorizing the training data [22].
    • Identifies Model Issues: It helps in detecting issues like overfitting, underfitting, and bias [17-19].
    • Guides Model Improvement: It provides insights into how the model can be improved through hyperparameter tuning, data collection, or algorithm selection [21, 24, 25].
    • Validates Model Reliability: It validates the model’s ability to provide accurate and reliable results [2, 15].

    Additional Notes:

    • Statistical significance is an important concept in model evaluation to ensure that the results are unlikely to have occurred by random chance [31, 32].
    • When evaluating models, it is important to understand the trade-off between model complexity and generalizability [33, 34].
    • It is important to check the assumptions of the model, for example, when using linear regression, it is essential to check assumptions such as linearity, exogeneity, and homoscedasticity [35-39].
    • Different types of machine learning models should be evaluated using appropriate metrics. For example, classification models use metrics like accuracy, precision, recall, and F1 score, while regression models use metrics like MSE, RMSE, and MAE [6, 9].

    By carefully evaluating machine learning models, one can build reliable systems that address real-world problems effectively [2, 3, 40, 41].

    AI Foundations Course – Python, Machine Learning, Deep Learning, Data Science

    By Amjad Izhar
    Contact: amjad.izhar@gmail.com
    https://amjadizhar.blog

  • SQL Fundamentals: Querying, Filtering, and Aggregating Data

    SQL Fundamentals: Querying, Filtering, and Aggregating Data

    The text is a tutorial on SQL, a language for managing and querying data. It highlights the fundamental differences between SQL and spreadsheets, emphasizing the organized structure of data in tables with defined schemas and relationships. The tutorial introduces core SQL concepts like statements, clauses (SELECT, FROM, WHERE), and the logical order of operations. It explains how to retrieve and filter data, perform calculations, aggregate results (SUM, COUNT, AVERAGE), and use window functions for more complex data manipulation without altering the data’s structure. The material also covers advanced techniques such as subqueries, Common Table Expressions (CTEs), and joins to combine data from multiple tables. The tutorial emphasizes the importance of Boolean algebra and provides practical exercises to reinforce learning.

    SQL Study Guide

    Review of Core Concepts

    This study guide focuses on the following key areas:

    • BigQuery Data Organization: How data is structured within BigQuery (Projects, Datasets, Tables).
    • SQL Fundamentals: Basic SQL syntax, clauses (SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY, LIMIT).
    • Data Types and Schemas: Understanding data types and how they influence operations.
    • Logical Order of Operations: The sequence in which SQL operations are executed.
    • Boolean Algebra: Using logical operators (AND, OR, NOT) and truth tables.
    • Set Operations: Combining data using UNION, INTERSECT, EXCEPT.
    • CASE Statements: Conditional logic for data transformation.
    • Subqueries: Nested queries and their correlation.
    • JOIN Operations: Combining tables (INNER, LEFT, RIGHT, FULL OUTER).
    • GROUP BY and Aggregations: Summarizing data using aggregate functions (SUM, AVG, COUNT, MIN, MAX).
    • HAVING Clause: Filtering aggregated data.
    • Window Functions: Performing calculations across rows without changing the table’s structure (OVER, PARTITION BY, ORDER BY, ROWS BETWEEN).
    • Numbering Functions: Ranking and numbering rows (ROW_NUMBER, RANK, DENSE_RANK, NTILE).
    • Date and Time Functions: Extracting and manipulating date and time components.
    • Common Table Expressions (CTEs): Defining temporary result sets for complex queries.

    Quiz

    Answer each question in 2-3 sentences.

    1. Explain the relationship between projects, datasets, and tables in BigQuery.
    2. What is a SQL clause and can you provide three examples?
    3. Why is it important to understand data types when working with SQL?
    4. Describe the logical order of operations in SQL.
    5. Explain the purpose of Boolean algebra in SQL.
    6. Describe the difference between UNION, INTERSECT, and EXCEPT set operators.
    7. What is a CASE statement, and how is it used in SQL?
    8. Explain the difference between correlated and uncorrelated subqueries.
    9. Compare and contrast INNER JOIN, LEFT JOIN, and FULL OUTER JOIN.
    10. Explain the fundamental difference between GROUP BY aggregations and WINDOW functions.

    Quiz Answer Key

    1. BigQuery organizes data hierarchically, with projects acting as top-level containers, datasets serving as folders for tables within a project, and tables storing the actual data in rows and columns. Datasets organize tables, while projects organize datasets, offering a structured way to manage and access data.
    2. A SQL clause is a building block that makes up a complete SQL statement, defining specific actions or conditions. Examples include the SELECT clause to choose columns, the FROM clause to specify the table, and the WHERE clause to filter rows.
    3. Understanding data types is crucial because it dictates the types of operations that can be performed on a column and determines how data is stored and manipulated, and it also avoids errors and ensures accurate results.
    4. The logical order of operations determines the sequence in which SQL clauses are executed, starting with FROM, then WHERE, GROUP BY, HAVING, SELECT, ORDER BY, and finally LIMIT, impacting the query’s outcome.
    5. Boolean algebra allows for complex filtering and conditional logic within WHERE clauses using AND, OR, and NOT operators to specify precise conditions for row selection based on truth values.
    6. UNION combines the results of two or more queries into a single result set, INTERSECT returns only the rows that are common to all input queries, and EXCEPT returns the rows from the first query that are not present in the second query.
    7. A CASE statement allows for conditional logic within a SQL query, enabling you to define different outputs based on specified conditions, similar to an “if-then-else” structure.
    8. A correlated subquery depends on the outer query, executing once for each row processed, while an uncorrelated subquery is independent and executes only once, providing a constant value to the outer query.
    9. INNER JOIN returns only matching rows from both tables, LEFT JOIN returns all rows from the left table and matching rows from the right, filling in NULL for non-matches, while FULL OUTER JOIN returns all rows from both tables, filling in NULL where there are no matches.
    10. GROUP BY aggregations collapse multiple rows into a single row based on grouped values, while window functions perform calculations across a set of table rows that are related to the current row without collapsing or grouping rows.
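    The join behaviours described in answer 9 can be verified with Python's built-in sqlite3 module. The table and column names are invented for the demo; FULL OUTER JOIN is omitted because it needs SQLite 3.39+.

```python
# INNER JOIN vs. LEFT JOIN on a tiny in-memory database.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE customers (id INTEGER, name TEXT);
CREATE TABLE orders (customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
INSERT INTO orders VALUES (1, 10.0), (1, 20.0), (2, 5.0);
""")

# INNER JOIN: only customers with at least one matching order appear.
inner = con.execute("""
    SELECT c.name, o.amount FROM customers c
    INNER JOIN orders o ON o.customer_id = c.id
""").fetchall()

# LEFT JOIN: every customer appears; Cy has no orders, so amount is NULL.
left = con.execute("""
    SELECT c.name, o.amount FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
""").fetchall()

print(inner)  # Cy is absent
print(left)   # ('Cy', None) is present
```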

    Essay Questions

    1. Discuss the importance of understanding the logical order of operations in SQL when writing complex queries. Provide examples of how misunderstanding this order can lead to unexpected results.
    2. Explain the different types of JOIN operations available in SQL, providing scenarios in which each type would be most appropriate. Illustrate with specific examples related to the course material.
    3. Describe the use of window functions in SQL. Include the purpose of PARTITION BY and ORDER BY. Explain some practical applications of these functions, emphasizing their ability to perform complex calculations without altering the structure of the table.
    4. Discuss the use of Common Table Expressions (CTEs) in SQL. How do they improve the readability and maintainability of complex queries? Provide an example of a query that benefits from the use of CTEs.
    5. Develop a SQL query using different levels of aggregations. Explain the query and explain its purpose.

    Glossary of Key Terms

    • Project (BigQuery): A top-level container for datasets and resources in BigQuery.
    • Dataset (BigQuery): A collection of tables within a BigQuery project, similar to a folder.
    • Table (SQL): A structured collection of data organized in rows and columns.
    • Schema (SQL): The structure of a table, including column names and data types.
    • Clause (SQL): A component of a SQL statement that performs a specific action (e.g., SELECT, FROM, WHERE).
    • Data Type (SQL): The type of data that a column can hold (e.g., INTEGER, VARCHAR, DATE).
    • Logical Order of Operations (SQL): The sequence in which SQL clauses are executed (FROM -> WHERE -> GROUP BY -> HAVING -> SELECT -> ORDER BY -> LIMIT).
    • Boolean Algebra: A system of logic dealing with true and false values, used in SQL for conditional filtering.
    • Set Operations (SQL): Operations that combine or compare result sets from multiple queries (UNION, INTERSECT, EXCEPT).
    • CASE Statement (SQL): A conditional expression that allows for different outputs based on specified conditions.
    • Subquery (SQL): A query nested inside another query.
    • Correlated Subquery (SQL): A subquery that depends on the outer query for its values.
    • Uncorrelated Subquery (SQL): A subquery that does not depend on the outer query.
    • JOIN (SQL): An operation that combines rows from two or more tables based on a related column.
    • INNER JOIN (SQL): Returns only matching rows from both tables.
    • LEFT JOIN (SQL): Returns all rows from the left table and matching rows from the right table.
    • RIGHT JOIN (SQL): Returns all rows from the right table and matching rows from the left table.
    • FULL OUTER JOIN (SQL): Returns all rows from both tables, matching or not.
    • GROUP BY (SQL): A clause that groups rows with the same values in specified columns.
    • Aggregation (SQL): A function that summarizes data (e.g., SUM, AVG, COUNT, MIN, MAX).
    • HAVING (SQL): A clause that filters aggregated data.
    • Window Function (SQL): A function that performs a calculation across a set of table rows that are related to the current row.
    • OVER (SQL): A clause that specifies the window for a window function.
    • PARTITION BY (SQL): A clause that divides the rows into partitions for window functions.
    • ORDER BY (SQL): A clause that specifies the order of rows within a window function.
    • ROWS BETWEEN (SQL): A clause that defines the boundaries of a window.
    • Numbering Functions (SQL): Window functions that assign numbers to rows based on specified criteria (ROW_NUMBER, RANK, DENSE_RANK, NTILE).
    • ROW_NUMBER() (SQL): Assigns a unique sequential integer to each row within a partition.
    • RANK() (SQL): Assigns a rank to each row within a partition based on the order of the rows. Rows with equal values receive the same rank, and the next rank is skipped.
    • DENSE_RANK() (SQL): Similar to RANK(), but assigns consecutive ranks without skipping.
    • NTILE(n) (SQL): Divides the rows within a partition into ‘n’ approximately equal groups, assigning a bucket number to each row.
    • Common Table Expression (CTE): A named temporary result set defined within a SELECT, INSERT, UPDATE, or DELETE statement.

    SQL and BigQuery: A Comprehensive Guide


    Briefing Document: SQL and BigQuery Fundamentals

    Overview:

    This document summarizes key concepts and functionalities of SQL, specifically within the context of BigQuery. The material covers data organization, query structure, data manipulation, and advanced techniques like window functions and common table expressions. The focus is on understanding the logical order of operations within SQL queries and using this understanding to write efficient and effective code.

    1. Data Organization in BigQuery:

    • Tables: Data is stored in tables, which consist of rows and columns, similar to spreadsheets.
    • “Data in BigQuery and in SQL in general exists in the form of tables and a table looks just like this… it is a collection of rows and columns and it is quite similar to a spreadsheet…”
    • Datasets: Tables are organized into datasets, analogous to folders in a file system.
    • “In order to organize our tables we use data sets… a data set is just that it’s a collection of tables and it’s similar to how a folder works in a file system.”
    • Projects: Datasets belong to projects. BigQuery allows querying data from other projects, including public datasets.
    • “In BigQuery each data set belongs to a project… in BigQuery I’m not limited to working with data that lives in my project; I could also, from within my project, query data that lives in another project. For example, bigquery-public-data is a project that is not mine…”

    2. Basic SQL Query Structure:

    • Statements: A complete SQL instruction, defining data retrieval and processing.
    • “This is a SQL statement it is like a complete sentence in the SQL language. The statement defines where we want to get our data from and how we want to receive these data including any processing that we want to apply to it…”
    • Clauses: Building blocks of SQL statements (e.g., SELECT, FROM, WHERE, GROUP BY, ORDER BY, LIMIT).
    • “The statement is made up of building blocks which we call clauses, and in this statement we have a clause for every line… the clauses that we see here are SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY, and LIMIT…”
    • Importance of Data Types: Columns have defined data types which dictates the operations that can be performed. SQL tables can be clearly connected with each other.
    • “You create a table and when creating that table you define the schema the schema is the list of columns and their names and their data types you then insert data into this table and finally you have a way to define how the tables are connected with each other…”

    3. Key SQL Concepts:

    • Cost Consideration: BigQuery charges based on the amount of data scanned by a query. Monitoring query size is crucial.
    • “This query will process 1 kilobyte when run… this is very important because here BigQuery is telling you how much data will be scanned in order to give you the results of this query… the amount of data that is scanned by the query is the primary determinant of BigQuery costs.”
    • Arithmetic Operations: SQL supports combining columns and constants using arithmetic operators and functions.
    • “We are able to combine columns and constants with any sort of arithmetic operations. Another very powerful thing that SQL can do is to apply functions and a function is a prepackaged piece of logic that you can apply to our data…”
    • Aliases: Using aliases (AS) to rename columns or tables for clarity and brevity.
    • Boolean Algebra in WHERE Clause: The WHERE clause uses Boolean logic (AND, OR, NOT) to filter rows based on conditions. Truth tables help understand operator behavior.
    • “The way that these logical statements work is through something called Boolean algebra which is an essential theory for working with SQL… though the name may sound a bit scary it is really easy to understand the fundamentals of Boolean algebra now…”
    • Set Operators (UNION, INTERSECT, EXCEPT): Combining the results of multiple queries using set operations. UNION combines rows, INTERSECT returns common rows, and EXCEPT returns rows present in the first table but not the second. UNION DISTINCT removes duplicate rows, while UNION ALL keeps them.
    • “This command is called UNION, and not stack or something else, because this is set terminology… this comes from the mathematical theory of sets… and unioning means combining the values of two sets…”

    4. Advanced SQL Techniques:

    • CASE WHEN Statements: Creating conditional logic to assign values based on specified conditions.
    • “When this condition is true we want to return the value low which is a string a piece of text that says low… all of this that you see here this is the case Clause right or the case statement and all of this is basically defining a new column in my table…”
    • Subqueries: Embedding queries within other queries to perform complex filtering or calculations. Correlated subqueries are slower as they need to be recomputed for each row.
    • “SQL solves this query first, gets the result, and then plugs that result back into the original query to get the data we need… on the right we have something that’s called a correlated subquery and on the left we define this as an uncorrelated subquery…”
    • Common Table Expressions (CTEs): Defining temporary named result sets (tables) within a query for modularity and readability.
    • JOIN Operations: Combining data from multiple tables based on related columns. Types include INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL OUTER JOIN.
    • “A full outer join is like an inner join plus a left join plus a right join…”.
    • GROUP BY and Aggregation: Summarizing data by grouping rows based on one or more columns and applying aggregate functions (e.g., SUM, AVG, COUNT, MIN, MAX). The HAVING clause filters aggregated results.
    • “With HAVING you are free to write filters on aggregated values regardless of the columns that you are selecting…”.
    • Window Functions: Performing calculations across a set of rows that are related to the current row without altering the table structure. They use the OVER() clause to define the window.
    • “Window functions allow us to do computations and aggregations on multiple rows; in that sense they are similar to what we have seen with aggregations and GROUP BY. The fundamental difference between grouping and window functions is that grouping fundamentally alters the structure of the table…”
    • Numbering Functions (ROW_NUMBER, DENSE_RANK, RANK): Assigning sequential numbers or ranks to rows based on specified criteria.
    • “Numbering functions are functions that we use in order to number the rows in our data according to our needs, and there are several numbering functions, but the three most important ones are without any doubt ROW_NUMBER, DENSE_RANK, and RANK…”
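    ROW_NUMBER, RANK, and DENSE_RANK can be compared side by side using Python's built-in sqlite3 module (window functions require SQLite 3.25 or newer; the scores table is invented for the demo).

```python
# The three numbering functions on the same ordered window.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE scores (player TEXT, points INTEGER);
INSERT INTO scores VALUES ('a', 30), ('b', 20), ('c', 20), ('d', 10);
""")
rows = con.execute("""
    SELECT player,
           points,
           ROW_NUMBER() OVER (ORDER BY points DESC) AS row_num,
           RANK()       OVER (ORDER BY points DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY points DESC) AS dense_rnk
    FROM scores
""").fetchall()
for r in rows:
    print(r)
```

    Players b and c tie on 20 points: ROW_NUMBER still gives them distinct numbers, RANK gives both 2 and then skips to 4, and DENSE_RANK gives both 2 and continues with 3.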

    5. Logical Order of SQL Operations:

    The excerpts emphasize the importance of understanding the order in which SQL operations are performed. This order dictates which operations can “see” the results of previous operations. The general order is:

    1. FROM (Source data)
    2. WHERE (Filter rows)
    3. GROUP BY (Aggregate into groups)
    4. Aggregate Functions (Calculate aggregations within groups)
    5. HAVING (Filter aggregated groups)
    6. Window Functions (Calculate windowed aggregates)
    7. SELECT (Choose columns and apply aliases)
    8. DISTINCT (Remove duplicate rows)
    9. UNION/INTERSECT/EXCEPT (Combine result sets)
    10. ORDER BY (Sort results)
    11. LIMIT (Restrict number of rows)
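    The row-filter (step 2) versus group-filter (step 5) distinction can be observed directly. A minimal sketch using Python's sqlite3 as an illustrative stand-in (the `orders` table and its values are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (customer TEXT, amount INT);
INSERT INTO orders VALUES ('ann', 10), ('ann', 40), ('bob', 5), ('bob', 2);
""")

# WHERE runs before GROUP BY, so it filters individual rows;
# HAVING runs after aggregation, so it filters whole groups.
# SUM(amount) is repeated in HAVING because HAVING runs before SELECT
# assigns the alias.
rows = conn.execute("""
    SELECT customer, SUM(amount) AS total
    FROM orders
    WHERE amount > 3          -- drops bob's 2 before grouping
    GROUP BY customer
    HAVING SUM(amount) > 20   -- drops bob's remaining total of 5
    ORDER BY customer
""").fetchall()
print(rows)  # [('ann', 50)]
```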

    6. PostgreSQL Quirk

    Integer Division: When dividing two integers, PostgreSQL assumes you are doing integer division and returns an integer as well. To avoid this, at least one operand needs to be a floating-point number.
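    A quick way to see this behavior (SQLite, which ships with Python, divides two integers the same way; the values here are arbitrary):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two integer operands: the result is truncated to an integer.
int_div = conn.execute("SELECT 7 / 2").fetchone()[0]
# One floating-point operand: the result keeps the fractional part.
float_div = conn.execute("SELECT 7 / 2.0").fetchone()[0]
print(int_div, float_div)  # 3 3.5
```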

    Conclusion:

    The provided text excerpts offer a comprehensive overview of SQL fundamentals and advanced techniques within BigQuery. A strong understanding of data organization, query structure, the logical order of operations, and the various functions and clauses available is crucial for writing efficient and effective SQL code. Mastering these concepts will enable users to extract valuable insights from their data and solve complex analytical problems.

    BigQuery and SQL: Data Management, Queries, and Functions

    FAQ on SQL and Data Management with BigQuery

    1. How is data organized in BigQuery and SQL in general?

    Data in BigQuery is organized in a hierarchical structure. At the lowest level, data resides in tables. Tables are collections of rows and columns, similar to spreadsheets. To organize tables, datasets are used, which are collections of tables, analogous to folders in a file system. Finally, datasets belong to projects, providing a top-level organizational unit. BigQuery also allows querying data from public projects, expanding access beyond a single project.

    2. How does BigQuery handle costs and data limits?

    BigQuery’s costs are primarily determined by the amount of data scanned by a query. Within the sandbox program, users can scan up to one terabyte of data each month for free. It’s important to check the amount of data that a query will process before running it, especially with large tables, to avoid unexpected charges. The query interface displays this information before execution.

    3. What are the fundamental differences between SQL tables and spreadsheets?

    While both spreadsheets and SQL tables store data in rows and columns, key differences exist. Spreadsheets are typically disconnected, whereas SQL provides mechanisms to define connections between tables. This allows relating data across multiple tables through defined schemas, specifying column names and data types. SQL also enforces a logical order of operations, which dictates the order in which the various parts of a query are executed.

    4. How are calculations and functions used in SQL queries?

    SQL allows performing calculations using columns and constants. Common arithmetic operations are supported, and functions, pre-packaged logic, can be applied to data. The order of operations in SQL follows standard arithmetic rules: brackets first, then functions, multiplication and division, and finally addition and subtraction.
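    Those precedence rules can be checked with a one-line query (sketched here with Python's sqlite3; the expression itself is arbitrary):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Brackets first, then functions, then * and /, then + and -:
# ABS(-2) = 2, (1 + 3) = 4, so 2 * 4 - 6 / 2 = 8 - 3 = 5.
val = conn.execute("SELECT ABS(-2) * (1 + 3) - 6 / 2").fetchone()[0]
print(val)  # 5
```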

    5. What are Clauses in SQL, and how are they used?

    SQL statements are constructed from building blocks known as Clauses. Key clauses include SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY, and LIMIT. Clauses define where the data comes from, how it should be processed, and how the results should be presented. The clauses are assembled to form a complete SQL statement. The order in which you write the clauses is less important than the logical order in which they are executed, which is FROM, WHERE, GROUP BY, HAVING, SELECT, ORDER BY and LIMIT.

    6. How do the WHERE clause and Boolean algebra work together to filter data in SQL?

    The WHERE clause is used to filter rows based on logical conditions. These conditions rely on Boolean algebra, which uses operators like NOT, AND, and OR to create complex expressions. Understanding the order of operations within Boolean algebra is crucial for writing effective WHERE clauses. NOT is evaluated first, then AND, and finally OR.
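    The precedence of AND over OR matters in practice. An illustrative sketch with Python's sqlite3, assuming a made-up `characters` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE characters (name TEXT, class TEXT, level INT);
INSERT INTO characters VALUES
  ('ayla', 'mage', 2), ('brom', 'rogue', 9), ('cira', 'rogue', 3);
""")

# AND binds tighter than OR, so this condition reads as:
#   class = 'mage' OR (class = 'rogue' AND level > 5)
implicit = conn.execute(
    "SELECT name FROM characters "
    "WHERE class = 'mage' OR class = 'rogue' AND level > 5 ORDER BY name"
).fetchall()
print(implicit)  # [('ayla',), ('brom',)]

# Brackets change the grouping:
#   (class = 'mage' OR class = 'rogue') AND level > 5
explicit = conn.execute(
    "SELECT name FROM characters "
    "WHERE (class = 'mage' OR class = 'rogue') AND level > 5 ORDER BY name"
).fetchall()
print(explicit)  # [('brom',)]
```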

    7. What are set operations in SQL, and how are they used?

    SQL provides set operations like UNION, INTERSECT, and EXCEPT to combine or compare the results of multiple queries. UNION combines rows from two or more tables, with UNION DISTINCT removing duplicate rows and UNION ALL keeping all rows, including duplicates. INTERSECT DISTINCT returns only the rows that are common to both tables. EXCEPT DISTINCT returns rows from the first table that are not present in the second table.
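    These operations can be tried with Python's sqlite3 (note that SQLite spells them UNION / UNION ALL / INTERSECT / EXCEPT, without BigQuery's explicit DISTINCT keyword; tables `a` and `b` are invented for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (x INT); INSERT INTO a VALUES (1), (2), (2);
CREATE TABLE b (x INT); INSERT INTO b VALUES (2), (3);
""")

def run(sql):
    return [row[0] for row in conn.execute(sql)]

union     = run("SELECT x FROM a UNION SELECT x FROM b ORDER BY x")      # dedupes
union_all = run("SELECT x FROM a UNION ALL SELECT x FROM b ORDER BY x")  # keeps dupes
intersect = run("SELECT x FROM a INTERSECT SELECT x FROM b")             # common rows
except_   = run("SELECT x FROM a EXCEPT SELECT x FROM b")                # a minus b
print(union, union_all, intersect, except_)
# [1, 2, 3] [1, 2, 2, 2, 3] [2] [1]
```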

    8. How can window functions be used to perform calculations across rows without altering the structure of the table?

    Window functions perform calculations across a set of table rows related to the current row, without grouping the rows like GROUP BY. They are defined using the OVER() clause, which specifies the window of rows used for the calculation. Window functions can perform aggregations, ordering, and numbering within the defined window, adding insights without collapsing the table’s structure. Numbering functions include ROW_NUMBER, RANK, and DENSE_RANK. They are often combined with PARTITION BY and ORDER BY, which divide the data into logical partitions and define the order in which rows within each partition are numbered. Ranking functions used with PARTITION BY and ORDER BY can define a rank, for instance, for each race result, ordered fastest to slowest; the ranked rows can then be filtered further using a CTE (Common Table Expression).
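    The race example can be sketched end to end with Python's sqlite3 (a stand-in for BigQuery; the `results` table, races, and runners are invented, and window functions need SQLite 3.25 or newer):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE results (race TEXT, runner TEXT, seconds INT);
INSERT INTO results VALUES
  ('100m', 'ann', 12), ('100m', 'bob', 11),
  ('200m', 'ann', 25), ('200m', 'bob', 26);
""")

# Rank runners within each race (fastest first), then use a CTE to keep
# only the winners -- the rank itself cannot be filtered in WHERE directly.
winners = conn.execute("""
    WITH ranked AS (
        SELECT race, runner,
               ROW_NUMBER() OVER (PARTITION BY race ORDER BY seconds) AS pos
        FROM results
    )
    SELECT race, runner FROM ranked WHERE pos = 1 ORDER BY race
""").fetchall()
print(winners)  # [('100m', 'bob'), ('200m', 'ann')]
```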

    SQL Data Types and Schemas

    In SQL, a data model is defined by the names of the columns and the data type that each column will contain.

    • Definition: The schema of a table includes the name of each column in the table and the data type of each column. The data type of a column defines the type of operations that can be done to the column.
    • Examples of data types:
    • Integer: A whole number.
    • Float: A floating point number.
    • String: A piece of text.
    • Boolean: A value that is either true or false.
    • Timestamp: A value that represents a specific point in time.
    • Interval: A data type that specifies a certain span of time.
    • Data types and operations: Knowing the data types of columns is important because it allows you to know which operations can be applied. For example, you can perform mathematical operations such as multiplication or division on integers or floats. For strings, you can change the string to uppercase or lowercase. For timestamps, you can subtract a certain amount of time from that moment.
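    A small sketch of type-dependent operations, using Python's sqlite3 as a stand-in (SQLite's type system is looser than BigQuery's, and it represents timestamps as text, manipulated with the datetime() function):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Strings support text operations such as case changes.
upper = conn.execute("SELECT UPPER('wizard')").fetchone()[0]
# Integers and floats support arithmetic.
product = conn.execute("SELECT 3 * 2.5").fetchone()[0]
# Timestamps support time arithmetic, e.g. subtracting a span of time.
earlier = conn.execute(
    "SELECT datetime('2024-01-01 12:00:00', '-90 minutes')"
).fetchone()[0]
print(upper, product, earlier)  # WIZARD 7.5 2024-01-01 10:30:00
```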

    SQL Tables: Structure, Schema, and Operations

    In SQL, data exists in the form of tables. Here’s what you need to know about SQL tables:

    • Structure: A table is a collection of rows and columns, similar to a spreadsheet.
    • Each row represents an entry, and each column represents an attribute of that entry. For example, in a table of fantasy characters, each row may represent a character, and each column may represent information about them such as their ID, name, class, or level.
    • Schema: Each SQL table has a schema that defines the columns of the table and the data type of each column.
    • The schema is assumed as a given when working in SQL and is assumed not to change over time.
    • Organization: In SQL, tables are organized into data sets.
    • A data set is a collection of tables and is similar to a folder in a file system.
    • In BigQuery, each data set belongs to a project.
    • Table ID: The table ID represents the full address of the table.
    • The address is made up of three components: the ID of the project, the data set that contains the table, and the name of the table.
    • Connections between tables: SQL allows you to define connections between tables.
    • Tables can be connected with each other through arrows. These connections indicate that one of the tables contains a column with the same data as a column in another table, and that the tables can be joined using those columns to combine data.
    • Table operations and clauses:
    • FROM: indicates the table from which to retrieve data.
    • SELECT: specifies the columns to retrieve from the table.
    • WHERE: filters rows based on specified conditions.
    • DISTINCT: removes duplicate rows from the result set.
    • UNION: stacks the results from multiple tables.
    • ORDER BY: sorts the result set based on specified columns.
    • LIMIT: limits the number of rows returned by the query.
    • JOIN: combines rows from two or more tables based on a related column.
    • GROUP BY: groups rows with the same values in specified columns into summary rows.

    SQL Statements: Structure, Clauses, and Operations

    Here’s what the sources say about SQL statements:

    General Information

    • In SQL, a statement is like a complete sentence that defines where to get data and how to receive it, including any processing to apply.
    • A statement is made up of building blocks called clauses.
    • Query statements allow for retrieving, analyzing, and transforming data.
    • In this course, the focus is exclusively on query statements.

    Components and Structure

    • Clauses are assembled to build statements.
    • There is a specific order to writing clauses; writing them in the wrong order will result in an error.
    • Common clauses include SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY, and LIMIT.

    Order of Execution

    • The order in which clauses are written (lexical order) is not the same as the order in which they are executed (logical order).
    • The logical order of execution is FROM, WHERE, GROUP BY, HAVING, SELECT, ORDER BY, and finally LIMIT.
    • The actual order of execution (effective order) may differ from the logical order due to optimizations made by the SQL engine. The course focuses on mastering the lexical order and the logical order.

    Clauses and their Function

    • FROM: Specifies the table from which to retrieve the data. It is always the first component in the logical order of operations because you need to source the data before you can work with it.
    • SELECT: Specifies which columns of the table to retrieve. It allows you to get any columns from the table in any order. You can also use it to rename columns, define constant columns, combine columns in calculations, and apply functions.
    • WHERE: Filters rows based on specified conditions. It follows right after the FROM clause in the logical order. The WHERE clause can reference columns of the tables, operations on columns, and combinations between columns.
    • DISTINCT: removes duplicate rows from the result set.

    Combining statements

    • UNION allows you to stack the results from two or more tables. In BigQuery, you must specify UNION ALL to include duplicate rows or UNION DISTINCT to only include unique rows.
    • INTERSECT returns only the rows that are shared between two tables.
    • EXCEPT returns all of the elements in one table except those that are shared with another table.
    • For UNION, INTERSECT, and EXCEPT, the tables must have the same number of columns, and the columns must have the same data types.

    Subqueries

    • Subqueries are nested queries used to perform complex tasks that cannot be done with a single query.
    • A subquery is a piece of SQL logic that returns a table.
    • Subqueries can be used in the FROM clause instead of a table name.
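    A minimal sketch of a subquery in the FROM clause, using Python's sqlite3 (the `orders` table and its values are invented for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (customer TEXT, amount INT);
INSERT INTO orders VALUES ('ann', 10), ('ann', 40), ('bob', 5);
""")

# The subquery in FROM returns a table of per-customer totals, which the
# outer query then treats like any other table.
big_spenders = conn.execute("""
    SELECT customer
    FROM (SELECT customer, SUM(amount) AS total
          FROM orders
          GROUP BY customer) AS totals
    WHERE total > 20
""").fetchall()
print(big_spenders)  # [('ann',)]
```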

    Common Table Expressions (CTEs)

    • CTEs are virtual tables defined within a query that can be used to simplify complex queries and improve readability.
    • CTEs are defined using the WITH keyword, followed by the name of the table and the query that defines it.
    • CTEs can be used to build data pipelines within SQL code.
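    A small CTE pipeline sketched with Python's sqlite3 (illustrative only; the `orders` table is invented). Each CTE builds on the previous one, which is what makes CTEs useful for building data pipelines inside a single query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (customer TEXT, amount INT);
INSERT INTO orders VALUES ('ann', 10), ('ann', 40), ('bob', 5), ('cat', 30);
""")

# Stage 1 (totals) aggregates; stage 2 (big) filters the aggregated rows.
result = conn.execute("""
    WITH totals AS (
        SELECT customer, SUM(amount) AS total
        FROM orders
        GROUP BY customer
    ),
    big AS (
        SELECT customer, total FROM totals WHERE total > 20
    )
    SELECT customer FROM big ORDER BY customer
""").fetchall()
print(result)  # [('ann',), ('cat',)]
```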

    SQL Logical Order of Operations

    Here’s what the sources say about the logical order of operations in SQL:

    Basics

    • The order in which clauses are written (lexical order) is not the order in which they are executed (logical order).
    • Understanding the logical order is crucial for accelerating learning SQL.
    • The logical order helps in building a powerful mental model of SQL that allows tackling complex and tricky problems.

    The Logical Order

    • The logical order of execution is: FROM, WHERE, GROUP BY, HAVING, SELECT, ORDER BY, and finally LIMIT.
    • The JOIN clause is not really separate from the FROM clause; they are the same component in the logical order of operations.

    Rules for Understanding the Schema

    • Operations are executed sequentially from left to right.
    • Each operation can only use data that was produced by operations that came before it.
    • Each operation cannot know anything about data that is produced by operations that follow it.

    Implications of the Logical Order

    • FROM is the very first component in the logical order of operations because the data must be sourced before it can be processed. The FROM clause specifies the table from which to retrieve the data. The JOIN clause is part of this step, as it defines how tables are combined to form the data source.
    • WHERE Clause follows right after the FROM Clause. After sourcing the data, the next logical step is to filter the rows that are not needed. The WHERE clause drops all the rows that are not needed, so the table becomes smaller and easier to deal with.
    • GROUP BY fundamentally alters the structure of the table. The GROUP BY operation compresses down the values; in the grouping field, a single row will appear for each distinct value, and in the aggregate field, the values will be compressed or squished down to a single value as well.
    • SELECT determines which columns to retrieve from the table. The SELECT clause is where new columns are defined.
    • ORDER BY sorts the result of the query. Because the ordering occurs so late in the process, SQL knows the final list of rows that will be included in the results, which is the right moment to order those rows.
    • LIMIT is the very last operation. After all the logic of the query is executed and all data is computed, the LIMIT clause restricts the number of rows that are output.

    Window Functions and the Logical Order

    • Window functions operate on the result of the GROUP BY clause, if present; otherwise, they operate on the data after the WHERE filter is applied.
    • After applying the window function, the SELECT clause is used to choose which columns to show and to label them.

    Common Errors

    • A common error is to try to use LIMIT to make a query cheaper. The LIMIT clause does not reduce the amount of data that is scanned; it only limits the number of rows that are returned.
    • Another common error is to violate the logical order of operations. For example, you cannot use a column alias defined in the SELECT clause in the WHERE clause because the WHERE clause is executed before the SELECT clause.
    • In Postgres, you cannot use the labels that you assign to aggregations in the HAVING clause.

    Boolean Algebra: Concepts, Operators, and SQL Application

    Here’s what the sources say about Boolean algebra:

    Basics

    • Boolean algebra is essential for working with SQL and other programming languages.
    • It is fundamental to how computers work.
    • It is a simple way to understand the fundamentals.

    Elements

    • In Boolean algebra, there are only two elements: true and false.
    • A Boolean field in SQL is a column that can only have these two values.

    Operators

    • Boolean algebra has operators that transform elements.
    • The three most important operators are NOT, AND, and OR.

    Operations and Truth Tables

    • In Boolean algebra, operations combine operators and elements and return elements.
    • To understand how a Boolean operator works, you have to look at its truth table.

    NOT Operator

    • The NOT operator works on a single element, such as NOT TRUE or NOT FALSE.
    • The negation of p is the opposite value.
    • NOT TRUE is FALSE
    • NOT FALSE is TRUE

    AND Operator

    • The AND operator connects two elements, such as TRUE AND FALSE.
    • If both elements are true, then the AND operator will return true; otherwise, it returns false.

    OR Operator

    • The OR operator combines two elements.
    • If at least one of the two elements is true, then the OR operator returns true; only if both elements are false does it return false.

    Order of Operations

    • There is an agreed-upon order of operations that helps solve complex expressions.
    • The order of operations is:
    1. Brackets (solve the innermost brackets first)
    2. NOT
    3. AND
    4. OR

    Application in SQL

    • A complex logical statement that is plugged into the WHERE filter isolates only certain rows.
    • SQL converts statements in the WHERE filter to true or false, using values from a row.
    • SQL uses Boolean algebra rules to compute a final result, which is either true or false.
    • If the result computes as true for the row, then the row is kept; otherwise, the row is discarded.

    Example

    To solve a complex expression, such as NOT (TRUE OR FALSE) AND (FALSE OR TRUE), proceed step by step:

    1. Solve the innermost brackets:
    • TRUE OR FALSE is TRUE
    • FALSE OR TRUE is TRUE
    2. The expression becomes: NOT (TRUE) AND (TRUE)
    3. Solve the NOT:
    • NOT (TRUE) is FALSE
    4. The expression becomes: FALSE AND TRUE
    5. Solve the AND:
    • FALSE AND TRUE is FALSE
    6. The final result is FALSE
    Intuitive SQL For Data Analytics – Tutorial
    Data Analytics FULL Course for Beginners to Pro in 29 HOURS – 2025 Edition

    The Original Text

    learn SQL for analytics Vlad is a data engineer and in this course he covers both the theory and the practice so you can confidently solve hard SQL challenges on your own no previous experience required and you’ll do everything in your browser using big query hi everyone my name is Vlad and I’m a date engineer welcome to intuitive SQL for analytics this here is the main web page for the course you will find it in the video description and this will get updated over time with links and resources so be sure to bookmark it now the goal of this course is to quickly enable you to use SQL to analyze and manipulate data this is arguably the most important use case for SQL and the Practical objective is that by the end of this course you should be able to confidently solve hard SQL problems of the kind that are suggested during data interviews the course assumes no previous knowledge of SQL or programming although it will be helpful if you’ve work with spreadsheets such as Microsoft Excel or Google Sheets because there’s a lot of analogies between manipulating data in spreadsheets and doing it in SQL and I also like to use spreadsheets to explain SQL Concepts now there are two parts to this course theory and practice the theory part is a series of short and sweet explainers about the fundamental concepts in SQL and for this part we will use Google bigquery bigquery which you can see here is a Google service that allows you to upload your own data and run SQL on top of it so in the course I will teach you how to do that and how to do it for free you won’t have to to spend anything and then we will load our data and we will run SQL code and besides this there will be drawings and we will also be working with spreadsheets and anything it takes to make the SQL Concepts as simple and understandable as possible the practice part involves doing SQL exercises and for this purpose I recommend this website postest SQL exercises this is a free and open-source website where you 
will find plenty of exercises and you will be able to run SQL code to solve these exercises check your answer and then see a suggested way to do it so I will encourage you to go here and attempt to solve these exercises on your own however I have also solved 42 of these exercises the most important ones and I have filmed explainers where I solve the exercise break it apart and then connect it to the concepts of the course so after you’ve attempted the exercise you will be able to see me solving it and connect it to the rest of the course so how should you take this course there are actually many ways to do it and you’re free to choose the one that works best if you are a total beginner I recommend doing the following you should watch the theory lectures and try to understand everything and then once you are ready you should attempt to do the exercises on your own on the exercise uh website that I’ve shown you here and if you get stuck or after you’re done you can Watch How I solved the exercise but like I said this is just a suggestion and uh you can combine theory and practice as you wish and for example a more aggressive way of doing this course would be to jump straight into the exercises and try to do them and every time that you are stuck you can actually go to my video and see how I solved the exercise and then if you struggle to understand the solution that means that maybe there’s a theoretical Gap and then you can go to the theory and see how the fundamental concepts work so feel free to experiment and find the way that works best for you now let us take a quick look at the syllabus for the course so one uh getting started this is a super short explainer on what SQL actually is and then I teach you how to set up bigquery the Google service where we will load our data and run SQL for the theory part the second uh chapter writing your first query so here I explained to you how big query works and how you can use it um and how you are able to take your own 
data and load it in big query so you can run SQL on top of it and at the end of it we finally run our first SQL query chapter 3 is about exploring some ESS IAL SQL Concepts so this is a short explainer of how data is organized in SQL how the SQL statement Works meaning how we write code in SQL and here is actually the most important concept of the whole course the order of SQL operations this is something that is not usually taught properly and a lot of beginners Miss and this causes a lot of trouble when you’re you’re trying to work with SQL so once you learn this from the start you will be empowered to progress much faster in your SQL knowledge and then finally we get into the meat of the course this is where we learn all the different components in SQL how they work and how to combine them together so this happens in a few phases in the first phase we look at the basic components of SQL so these are uh there’s a few of them uh there’s select and from uh there’s learning how to transform columns the wear filter the distinct Union order by limit and then finally we see how to do simple aggregations at the end of this part you will be empowered to do the first batch of exercises um don’t worry about the fact that there’s no links yet I will I will add them but this is basically involves going to this post SQL exercises website and going here and doing this uh first batch of exercises and like I said before after you’ve done the exercises you can watch the video of me also solving them and breaking them down next we take a look at complex queries and this involves learning about subqueries and Common Table expressions and then we look at joining tables so here is where we understand how SQL tables are connected uh with each other and how we can use different types of joints to bring them together and then you are ready for the second batch of exercises which are those that involve joints and subqueries and here there are eight exercises the next step is learning 
about aggregations in SQL so this involves the group bu the having and window functions and then finally you are ready for the final batch of exercises which actually bring together all the concepts that we’ve learned in this course and these are 22 exercises and like before for each exercise you have a video for me solving it and breaking it apart and then finally we have the conclusion in the conclusion we see how we can put all of this knowledge together and then we take a look at how to use this knowledge to actually go out there and solve SQL challenges such as the ones that are done in data interviews and then here you’ll find uh all the resources that are connected to the course so you have the files with our data you have the link to the spreadsheet that we will use the exercises and all the drawings that we will do this will definitely evolve over over time as the course evolves so bookmark this page and keep an eye on it that was that was all you needed to know to get started so I will see you in the course if you are working with SQL or you are planning to work with SQL you’re certainly a great company in the 2023 developer survey by stack Overflow there is a ranking of the most popular Technologies out there if we look at professional developers where we have almost 70,000 responses we can see that SQL is ranked as the third most popular technology SQL is certainly one of the most in demand skills out there not just for developers but for anyone who works with data in any capacity and in this course I’m going to help you learn SQL the way I wish I would have learned it when I started out on my journey since this is a practical course we won’t go too deep into the theory all you need to know for our purposes is that SQL is a language for working with data like most languages SQL has several dialects you may have heard of post SQL or my sqil for example you don’t need to worry about these dialects because they’re all very similar so if you learn SQL in 
any one of the dialects you’ll do well on all the others in this course we will be working with B query and thus we will write SQL in the Google SQL dialect here is the documentation for Google big query the service that we will use to write SQL code in this course you can see that big query uses Google SQL a dialect of SQL which is an compliant an compliant means that Google SQL respects the generally recognized standard for creating SQL dialects and so it is highly compatible with with all other common SQL dialects as you can read here Google SQL supports many types of statements and statements are the building blocks that we use in order to get work done with SQL and there are several types of statements listed here for example query statements allow us to retrieve and analyze and transform data data definition language statements allow us to create and modify database objects such as tables and Views whereas data manipulation language statements allows us to update and insert and delete data from our tables now in this course we focus exclusively on query statements statements that allow us to retrieve and process data and the reason for this is that if you’re going to start working with big query you will most likely start working with this family of statements furthermore query statements are in a sense the foundation for all other families of statements so if you understand uh query statements you’ll have no trouble learning the others on your own why did I pick big query for this course I believe that the best way to learn is to load your own data and follow questions that interest you and play around with your own projects and P query is a great tool to do just that first of all it is free at least for the purposes of learning and for the purposes of this course it has a great interface that will give you U really good insights into your data and most importantly it is really easy to get started you don’t have to install anything on your computer you don’t 
have to deal with complex software you just sign up for Google cloud and you’re ready to go and finally as you will see next big query gives you many ways to load your own data easily and quickly and get started writing SQL right away I will now show you how you can sign up for Google cloud and get started with bigquery so it all starts with this link which I will share in the resources and this is the homepage of Google cloud and if you don’t have an account with Google Cloud you can go here and select sign in and here you need to sign in with your Google account which you probably have but if you don’t you can go here and select create account so I have now signed in with my Google account which you can see here in the upper right corner and now I get a button that says start free so I’m going to click that and now I get taken to this page and on the right you see that the first time you sign up for Google Cloud you get $300 of free credits so that you can try the services and that’s pretty neat and here I have to enter some extra information about myself so I will keep it as is and agree to the terms of service and continue finally I need to do the payment information verification so unfortunately this is something I need to do even though I’m not going to be charged for the services and this is for Google to be able to verify my my identity so I will pick individual as account type and insert my address and finally I need to add a payment method and again uh I need to do this even though I’m not going to pay I will actually not do it here because I don’t intend to sign up but after you are done you can click Start my free trial and then you should be good to go now your interface may look a bit different but essentially after you’ve signed up for Google Cloud you will need to create a project and the project is a tool that organizes all your work in Google cloud and essentially every work that you do in Google cloud has to happen inside a specific project now 
as you can see here there is a limited quota of projects but that’s not an issue because we will only need one project to work in this course and of course creating a new project is totally free so I will go ahead and give it a name and I don’t need any organization and I will simply click on create once that’s done I can go back back to the homepage for Google cloud and here as you can see I can select a project and here I find the project that I have created before and once I select it the rest of the page won’t change but you will see the name of the project in the upper bar here now although I’ve created this project as an example for you for the rest of the course you will see me working within this other project which was the one that I had originally now I will show you how you can avoid paying for Google cloud services if you don’t want to so from the homepage you have the search bar over here and you can go here and write billing and click payment overview to go to the billing service now here on the left you will see your billing account account which could be called like this or have another name and clicking here I can go to manage billing accounts now here I can go to my projects Tab and I see a list of all of my projects in Google cloud and a project might or might not be connected to a billing account if a project is not connected to a billing account then then Google won’t be able to charge you for this project although keep in mind that if you link your project with a billing account and then you incur some expenses if you then remove the billing account you will still owe Google Cloud for those uh expenses so what I can do here is go to my projects and on actions I can select disabled building in case I have a billing account connected now while this is probably the shest way to avoid incurring any charges you will see that you will be severely limited in what you can do in your project if that project is not linked to any billing account however 
you should still be able to do most of what you need to do in BigQuery, at least for this course, and we can get more insight into how that works by going to the BigQuery pricing page. This page gives us an overview of how pricing works for BigQuery. I will not analyze it in depth, but what you need to know is that when you work with BigQuery you can fundamentally be charged for two things. One is compute pricing, which basically means all the data that BigQuery scans in order to return the results you asked for when you write a query. The other is storage pricing, which is what you pay in order to store your data inside BigQuery. If I click on compute pricing, I reach the pricing table, where you can select the region that best reflects where you are located; I have selected Europe here. As you can see, you are charged $6.25 (at the time of this video) for scanning a terabyte of data; however, the first terabyte per month is free. So every month you can write queries that scan one terabyte of data and not pay for them, and as you will see in more detail, this is more than enough for what we will be doing in this course, and also for what you'll be doing on your own in order to experiment with SQL. If I go back to the top of the page and click on storage pricing, you can again select your region and see several pricing units, but the key point is that the first 10 GB of storage per month is free. So you can put up to 10 gigabytes of data in BigQuery and you won't need a billing account; you won't pay for storage, and this is more than enough for our needs in order to learn SQL. In short, BigQuery gives us a pretty generous free allowance for us to load data and play with it, and we should be fine. However, I do urge you to come back to this page and read it again, because things may have changed since I recorded this video. To summarize: go to the billing service, check out your billing account, and you have
the option to decouple your project from the billing account to avoid incurring any charges, and you should still be able to use BigQuery. But as a disclaimer, I cannot guarantee that things will work just the same at the time you are watching this video, so be sure to check the documentation, or maybe discuss with Google Cloud support, to avoid incurring any unexpected expenses. Please do your research and be careful in your usage of these services. For this course I have created an imaginary dataset with the help of ChatGPT. The dataset is about a group of fantasy characters, as well as their items and inventories. I then proceeded to load this data into BigQuery, which is our SQL system. I also loaded it into Google Sheets, which is a spreadsheet system similar to Microsoft Excel; this will allow me to manipulate the data visually and help you develop a strong intuition about SQL operations. I'm going to link a separate video which explains how you can also use ChatGPT to generate imaginary data according to your needs, and then load this data into Google Sheets or BigQuery. I will also link the files for this data in the description, which you can use to reproduce this data on your side. Next I will show you how we can load the data for this course into BigQuery. I'm on the homepage of Google Cloud, and in the search bar up here I can type "bigquery" and select it, and this will take me to the BigQuery page. There is a panel on the left side that appears if I hover (or it could be pinned), showing several tools that you can use within BigQuery, and you can see that we are in the SQL workspace, which is actually the only tool we will need for this course. If you're seeing this panel on the left, I recommend clicking the arrow in the upper left corner to collapse it and make more room for yourself. Now I want to draw your attention to the Explorer tab, which shows us where our data is and how it
is organized, so I'm going to expand it. Data in BigQuery, and in SQL in general, exists in the form of tables, and a table looks just like this customers table here: it is a collection of rows and columns, quite similar to a spreadsheet, so this will be familiar to you if you've ever worked with Microsoft Excel, Google Sheets, or any spreadsheet program. So your data actually lives in a table, and you can have as many tables as you need; in BigQuery there could be quite a lot of them. In order to organize our tables, we use datasets: for example, in this case my_data is a dataset which contains the tables customers and employee_data. A dataset is just that, a collection of tables, and it is similar to how a folder works in a file system; it is like a folder for tables. Finally, in BigQuery each dataset belongs to a project, so you can see here that we have two datasets, sql_course and my_data, and they both belong to this project, whose ID you see here: the ID of the project I'm working in right now. The reason the Explorer tab shows the project as well is that in BigQuery I'm not limited to working with data that lives in my project; I could also, from within my project, query data that lives in another project. For example, bigquery-public-data is a project that is not mine but is actually a public project by BigQuery, and if I expand it you will see that it contains a collection of several datasets, which are themselves collections of tables, and I would be able to query those tables as well. But you don't need to worry about that now, because in this course we will only focus on our own data that lives in our own project. This, in short, is how data is organized in BigQuery. Now, for the purposes of this course I recommend creating a new dataset, so that our tables can be neatly organized, and to do that I can click the three dots next to the
project ID over here and select Create data set. Here I need to pick a name for the dataset, so I will call it fantasy, and I suggest you use the same name, because if you do, the code that I share with you will work immediately. Then, as for the location, you can select multi-region and choose the region that is closest to you, and finally click Create data set. Now the dataset fantasy has been created, and if I try to expand it here, I will see that it is empty, because I haven't loaded any data yet. The next step is to load our tables. I assume that you have downloaded the zip file with the tables and extracted it on your local computer. Then we can click the action dots next to the fantasy dataset and select Create table. As the source I will select Upload, click Browse, and access the files that I have downloaded, selecting the first table, which is the characters table. The file format is CSV, and Google has already understood that. Scrolling down, I need to choose a name for my table, so I will call it just like the file, characters. Very important: under Schema I need to select Auto detect, and we will see what this means in a bit. That is basically all we need, so I will click Create table, and now you will see that the characters table has appeared under the fantasy dataset. If I click on the table and then go to Preview, I should be able to see my data. I will now do the same for the other two tables: again Create table, the source is Upload, the file is inventory, repeat the name and select Auto detect, and I have done the same with the third table. At the end of this exercise the fantasy dataset should have three tables, and you can select them and open Preview to make sure that the data looks as expected. Now our data is fully loaded and we are ready to start querying it within BigQuery. Let's take a look at how the BigQuery interface works. On the left here you can
see the Explorer, which shows all the data that I have access to. To get to a table in BigQuery, first of all you open the project, then you look at the datasets that are available within this project, you open a dataset, and finally you see a table such as characters. If I now click on characters, I will open the table view, where I will find a lot of important information about my table in these tabs. Let's look at the first tab, Schema. The Schema tab shows me the structure of my table which, as we shall see, is very important, and the schema is defined essentially by two things: the name of each column in my table, and the data type of each column. Here we see that the characters table contains a few columns such as id, name, guild, class, and so on, and these columns have different data types: for example, id is an integer, which means it contains whole numbers, whereas name is a string, which means it contains text. As we shall see, the schema is very important because it defines what you can do with the table. Next we have the Details tab, which contains a few things. First of all is the table ID, and this ID represents the full address of the table, which is made up of three components. First you have the ID of the project, which, as you can see, is the project in which I'm working, and it's the same one you see here on the left in the Explorer tab. The next component is the dataset that contains the table, and again you see it in the Explorer tab. Finally, you have the name of the table. This address is important because it's what we use to reference the table, and it's what we use to get data from this table. Then we see a few more things about the table, such as when it was created and when it was last modified, and here we can see the storage information: this table has 15 rows, and on disk it occupies approximately one kilobyte. If you work extensively with BigQuery, this
information will be important for two reasons: number one, it defines how much you are paying every month to store this table, and number two, it defines how much you would pay for a query that scans all the data in this table. As we have seen in the lecture on BigQuery pricing, these are the two determinants of BigQuery costs. However, for the purposes of this course you don't need to worry about this, because the tables we are working with are so small that they won't put a dent in your free monthly allowance for using BigQuery. Next we have the Preview tab, which is really useful for getting a sense of the data. It basically shows you a graphical representation of your table, and as you will notice, it looks very similar to a spreadsheet. You can see our columns, the same ones that we saw in the Schema tab: id, name, guild, and so on. As you remember, we saw that id is an integer column, so it can only contain numbers, and name is a text column. Then you see that this table has 15 rows, and because it's such a small table, all of it fits into this graphical representation; but in the real world you may have tables with millions of rows, and in that case the preview will show you only a small portion of the table, still enough to get a good sense of the data. Now, there are a few more tabs in the table view: we have Lineage, Data profile, and Data quality, but I'm not going to look at them now, because they are advanced features in BigQuery and you won't need them in this course. Instead, I will run a very basic query on this table, and this is not for the purpose of understanding queries, that will come soon; it is for the purpose of showing you what the interface looks like after you run a query. So I have a very basic query here that will run on my table, and you can see that the interface is telling me how much data this query will process. This is important, because this is the main determinant of cost in BigQuery: every query scans a certain amount of data, and you have to
pay for that. But as we saw in the lecture on BigQuery pricing, this table is so small that you could run a million or more of these queries and not exhaust your monthly allowance, so if you see 1 kilobyte, you don't have to worry about that. Now I will click Run, my query will execute, and here I get the query results view. This is the view that appears after you have successfully run a query. We have a few tabs here, and the first tab that you see is Results, which shows you graphically the table that was returned by your query. As we shall see, every query in SQL runs on a table and returns a table, and just as the Preview tab showed you a graphical view of your stored table, the Results tab shows you a graphical view of the table that your query has returned. This is really the only tab in the query results view that you will need in this course; the other ones show different or more advanced features that we won't look at, but feel free to explore them on your own if you are curious. What's also important in this view is the Save results button, which you can use to export the result of your query to several different destinations: Google Drive, local files on your computer in different formats, another BigQuery table, a spreadsheet in Google Sheets, or even copying the results to your clipboard so that you can paste them somewhere else. We shall discuss this in more detail in the lecture on getting data in and out of BigQuery. Finally, if you click on the little keyboard icon up here, you can see a list of shortcuts that you can use in the BigQuery interface, and if you end up running a lot of queries and you want to be fast, this is a nice way to improve your experience with BigQuery, so be sure to check these out. We are finally ready to write our first query, and in the process we will keep exploring the BigQuery interface. One way to get started is to click this plus symbol over here so that we can
open a new tab. To write the query, the first thing I will do is tell BigQuery where the data that I want lives, and to do that I will use the from clause. So I will simply write from, and my data lives in the fantasy dataset, in the characters table. Next I will tell SQL what data I actually want from this table, and the simplest thing to ask for is all of it, which I can do by writing select star. Now my query is ready, and I can either click Run up here or press Command+Enter on my Mac keyboard, and the query will run. Here I get a new tab which shows me the results, displayed as a table, just as we saw in the Preview tab of the table, and this is actually the whole table, because this is what I asked for in the query. There are also other ways to see the results, such as JSON, which shows the same data in a different format, but we're not going to be looking into that in this course. One cool option that the interface provides: if I click on this arrow right here in my tab, I can select Split tab to right, and now I have a bit less room in my interface, but I am seeing the table on the left and the query on the right, so I can look at the structure of the table while writing my query. For example, if I click on Schema here, I can see which columns I'm able to reference in my query, and that can be pretty handy. I could also click this toggle to collapse the Explorer tab temporarily if I don't need to look at those tables, so I can make a bit more room, and reactivate it when needed. I will now close this tab, go back to the characters table, and show you another way to write a query, which is to use this Query button over here. If I click it, I can select whether I want my query in a new tab or in a split tab; let me say in a new tab, and now BigQuery has helpfully written a template for a query that I can easily modify in order to get my data.
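As a minimal sketch, the first query described above looks like this (assuming you named your dataset fantasy and your table characters, as suggested earlier):

```sql
-- Retrieve every column and every row of the characters table.
-- The dataset.table shorthand works because we are querying
-- from within the same project where the data lives.
SELECT *
FROM fantasy.characters;
```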
To break down this template: we have the select clause that we used before, we have the from clause, and then we have a new one called limit. The from clause is doing the same job as before, telling BigQuery where we want to get our data, but you will notice that the address looks a bit different from the one that I had used. Specifically, I used the address fantasy.characters, which is a useful shorthand for the actual address of the table, and what BigQuery provided here is the actual full address of the table, or in other words the table ID. As you remember, the table ID indicates the project ID, the dataset name, and the table name, and importantly, this ID is usually enclosed in backticks, which are a quite specific character. Long story short: if you want to be 100% sure, you can use the full address of the table, and BigQuery will provide it for you; but if you are working within the same project where the data lives, so you don't need to reference the project, you can also use this shorthand to make writing the address easier. In this course I will use these two ways of referencing a table interchangeably; for now I will keep the address that BigQuery provided. The limit clause, as we will see, is simply limiting the number of rows that will be returned by this query: no more than 1,000 rows will be returned. Next to the select we have to say what data we want to get from this table, and like before I can write star, and now my query is complete. Before we run our query, I want to draw your attention to this message over here: "This query will process 1 KB when run." This is very important, because here BigQuery is telling you how much data will be scanned in order to give you the results of this query. In this case we are returning all the data in the table, therefore all of the table will be scanned, and limit does not change that: it has no influence on how much data is scanned.
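Sketching both addressing styles side by side (the project ID below is a placeholder; substitute your own):

```sql
-- Shorthand: dataset.table, valid when querying within the same project.
SELECT *
FROM fantasy.characters
LIMIT 1000;

-- Full table ID: `project.dataset.table`, enclosed in backticks.
-- Replace my-project-id with your actual project ID.
SELECT *
FROM `my-project-id.fantasy.characters`
LIMIT 1000;
```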
So this query will scan 1 kilobyte of data, and the amount of data scanned by a query is the primary determinant of BigQuery costs. As you remember, we are able to scan up to one terabyte of data each month within the sandbox program, and if we wanted to scan more data, we would have to pay. So the question is: how many of these queries could we run before exhausting our free allowance? To answer that, we can check how many kilobytes are in a terabyte, and the conversion is 1 terabyte = 10^9 kilobytes, which is one billion. Therefore we can run one billion of these queries each month before running out of our allowance. Now you understand why I've told you that as long as you work with small tables, you won't really run out of your allowance and you don't really have to worry about costs. However, here's an example of a query that will scan a large amount of data. What I've done here is take one of the public tables provided by BigQuery, which I've seen to be quite large, and ask BigQuery for all the data in that table, and as you can see, BigQuery says that 120 gigabytes of data will be processed once this query runs. You would need only about eight of these queries to get over your free allowance, and if you had a billing account connected, you could also be charged money for any extra work that you do. So be very careful about this: if you work with large tables, always check this message before running the query, and remember, you won't actually be charged until you actually hit Run on the query. And there you have it: we learned how the BigQuery interface works and wrote our first SQL query. It is important that we understand how data is organized in SQL. We've already seen a preview of the characters table, and we've said that this is quite similar to how you would see data in a spreadsheet: namely, you have a table, which is a collection of rows and columns.
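The back-of-the-envelope numbers above can be checked with a quick query, using decimal units (1 TB = 10^12 bytes, 1 KB = 10^3 bytes):

```sql
-- How many 1 KB scans fit in the 1 TB monthly free allowance,
-- and how many 120 GB scans it takes to use it up.
SELECT
  CAST(1e12 / 1e3 AS INT64) AS one_kb_queries_per_month,  -- one billion
  1e12 / (120 * 1e9)        AS large_scans_per_terabyte;  -- roughly 8.3
```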
In this case, every row holds a character, and for every character you have a number of information points, such as their id, their name, their class, their level, and so on. The first fundamental difference from a spreadsheet is that if I want to have some data in a spreadsheet, I can just open a new one and insert some data right here, say columns for id, level, and name. Then I could say that I have a character with id 1 who is level 10 and whose name is Gandalf, and this looks like the data I have in SQL, and I can add some more data as well: a new character with id 2, level 5, and the name Frodo. Now I save this spreadsheet, and some days later someone else comes in, let's say a colleague, and they want to add some new data, and they say: oh, the id is "unknown", the level is 20.3, the name goes here, and I also want to record their class, so I will just add another column here and write Mage. Spreadsheets are of course extremely flexible, because you can always add another column and write in more cells; you can basically write wherever you want. But this flexibility comes at a price, because the more additions we make to the data model that is represented here, the more complex it will get with time, and the more likely it is that we create confusion or mistakes, which is what actually happens in real life when people work with spreadsheets. SQL takes a different approach: in SQL, before we insert any actual data, we have to agree on the data model that we are going to use, and the data model is essentially defined by two elements, the names of our columns and the data type each column will contain. For example, we can agree that we will use three columns in our table, id, level, and name, and then we can agree that id will be an integer, meaning that it will contain whole numbers, that level will be an integer as well, and that name will be a string, meaning that it contains text. Now that we've agreed on this
structure, we can start inserting data into the table, and we have a guarantee that the structure will not change with time, so any queries that we write on top of this table, any sort of analysis that we create for it, will also be durable in time, because it has the guarantee that the data model of the table will not change. And if someone else comes in and wants to insert that problematic row, they will actually not be allowed to: first of all, because they are trying to insert text into an integer column, so they are violating the data type of the column; in level they are also violating the data type of the column, because this column only accepts whole numbers and they are trying to put a floating-point number in there; and finally, they are also violating the column definitions, because they are trying to add a class column that was not actually included in our data model and that we didn't agree on. So the most important difference between spreadsheets and SQL is that for each SQL table you have a schema, and as we've seen before, the schema defines exactly which columns our table has and what the data type of each column is. In this case, for the characters table, we have several columns, and here we can see their names, and each column has a specific data type; in fact, all the most important data types are represented here. Specifically, by integer we mean a whole number, and by float we mean a floating-point number; a string is a piece of text; a boolean is a value that is either true or false; and a timestamp is a value that represents a specific point in time. All of this information, the number of columns, the name of each column, and the type of each column, constitutes the schema of the table, and like we've said, the schema is taken as a given when working in SQL and is assumed not to change over time. In special circumstances there are ways to alter the schema of a table, but it is generally assumed as a given when writing queries, and we shall do the same in this course.
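A sketch of agreeing on the data model up front. To be clear, in this course the tables are loaded from CSV with schema auto-detect rather than written by hand, but BigQuery does support declaring a schema with DDL like this, and the demo table name here is hypothetical:

```sql
-- Declare the data model before inserting any data.
CREATE TABLE fantasy.characters_demo (
  id    INT64,   -- whole numbers only
  level INT64,
  name  STRING
);

-- This row respects the schema and is accepted.
INSERT INTO fantasy.characters_demo (id, level, name)
VALUES (1, 10, 'Gandalf');

-- This row would be rejected: 'unknown' is not an INT64,
-- 20.3 is not a whole number, and no class column exists.
-- INSERT INTO fantasy.characters_demo (id, level, name, class)
-- VALUES ('unknown', 20.3, 'someone', 'Mage');
```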
Why is it important to keep track of the data type? Why is it important to distinguish between integer, string, and boolean? The simple answer is that the data type defines the type of operations that you can perform on a column. For example, if you have an integer or a float, you can multiply the value by two, divide it, and so on; if you have a string, you can turn that string to uppercase or lowercase; if you have a timestamp, you can subtract 30 days from that specific moment in time, and so on. So by looking at the data type, you can find out what type of work you can do with a column. The second fundamental difference from spreadsheets is that spreadsheets are usually disconnected, but SQL has a way to define connections between tables. What we see here is a representation of our three tables, and for each table you can see the schema, meaning the list of columns and their types, but the extra information that we have here is the connections between the tables. You can see that the inventory table is connected to the items table and also to the characters table; moreover, the characters table is connected with itself. Now, we're not going to explore this in depth here, because I don't want to add too much theory; we will see this in detail in the chapter on joins. But it is a fundamental difference from spreadsheets that SQL tables can be clearly connected with each other. And that's basically all you need in order to understand how data is organized in SQL for now: you create a table, and when creating that table you define the schema, which is the list of columns with their names and their data types; you then insert data into this table; and finally you have a way to define how the tables are connected with each other.
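A hedged sketch of the type-dependent operations just mentioned. The level and name columns come from the course's characters table, but the created_at timestamp column is hypothetical, added only to illustrate timestamp arithmetic:

```sql
SELECT
  level * 2                                  AS doubled_level,   -- arithmetic on integers
  UPPER(name)                                AS shouted_name,    -- string manipulation
  TIMESTAMP_SUB(created_at, INTERVAL 30 DAY) AS a_month_earlier  -- timestamp arithmetic
FROM fantasy.characters;
```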
I will now show you how SQL code is structured and give you the most important concept that you need to understand in order to succeed at SQL. This is a SQL statement; it is like a complete sentence in the SQL language. The statement defines where we want to get our data from and how we want to receive that data, including any processing that we want to apply to it, and once we have a statement, we can click Run and it will give us our data. Now, the statement is made up of building blocks which we call clauses, and in this statement we have a clause on every line. The clauses that we see here are select, from, where, group by, having, order by, and limit, and clauses are really the building blocks that we assemble in order to build statements. What this course is about is understanding what each clause is and how it works, and then understanding how we can put these clauses together in order to write effective statements. The first thing that you need to understand is that there is an order to writing these clauses. You have to write them in the correct order, and there is no flexibility there: if you write them in the wrong order, you will simply get an error. For example, if I were to take the where clause and put it below the group by clause, you can see that I'm already getting an error here, which is a syntax error. But you don't have to worry about memorizing this now, because you will pick up this order as we learn each clause in turn. Now, the essential thing that you need to understand, and that slows down so many SQL learners, is that while we are forced by SQL to write clauses in this specific order, this is not actually the order in which the clauses are executed. If you've interacted with another programming language such as Python or JavaScript, you're used to the fact that each line of your program is executed in turn, from top to bottom, generally speaking, and that is pretty transparent to understand. But this is not what is happening in SQL. To give you a sense of the order in which these clauses are run: on a logical level, SQL first reads the from, then it does the where, then the group by, then the having, then it does the select part;
after the select part is done, it does the order by, and finally the limit. All of this just to show that the order in which operations are executed is not the same as the order in which they're written. In fact, we can distinguish three orders that pertain to SQL clauses, and this distinction is very important to help you master SQL. The first level is what we call the lexical order, and this is simply what I've just shown you: it's the order in which you have to write these clauses so that SQL can actually execute the statement and not throw an error. Then there's the logical order, and this is the order in which the clauses are actually run, logically, in the background; understanding this logical order is crucial for accelerating your learning of SQL. And finally, for the sake of completeness, I had to include the effective order, because what happens in practice is that your statement is executed by a SQL engine, and that engine will usually try to take shortcuts, optimize things, and save on processing power and memory, so the actual order might be a bit different, because the clauses might be moved around in the process of optimization. But like I said, I've only included it for the sake of completeness, and we're not going to worry about that level in this course; we are going to focus on mastering the lexical order and the logical order of SQL clauses. To help you master the logical order of SQL clauses, or SQL operations, I have created this schema, and this is the fundamental tool that you will use in this course. This schema, as you learn it progressively, will allow you to build a powerful mental model of SQL that will let you tackle even the most complex and tricky SQL problems. What this schema shows you is all of the clauses that you will work with when writing SQL statements, the building blocks that you will use in order to assemble your logic, and the sequence in which they are shown corresponds to the logical order in which they are actually executed.
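One way to internalize the difference between the two orders is to annotate a statement with its logical step numbers. This is a hedged sketch; the specific filter values are illustrative:

```sql
-- Lexical order: the order you must WRITE the clauses in.
-- The numbered comments show the LOGICAL order they run in.
SELECT guild, COUNT(*) AS members   -- 5. pick and compute output columns
FROM fantasy.characters             -- 1. source the data
WHERE level >= 5                    -- 2. filter individual rows
GROUP BY guild                      -- 3. form groups (aggregates computed here)
HAVING COUNT(*) > 1                 -- 4. filter the groups
ORDER BY members DESC               -- 6. sort the result
LIMIT 10;                           -- 7. cap the number of rows
```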
There are three simple rules for you to understand this schema. The first rule is that operations are executed sequentially, from left to right. The second rule is that each operation can only use data that was produced by the operations that came before it. And the third rule is that each operation cannot know anything about data that is produced by the operations that follow it. What this means in practice is that if you take any of these components, for example the having component, you already know that having will have access to data that was produced by the operations to its left: aggregations, group by, where, and from. However, having will have absolutely no idea of information that is produced by the operations that follow, for example window, or select, or union, and so on. Of course, you don't have to worry about understanding and memorizing this now, because we will tackle it gradually throughout the course, and we will go back to the schema again and again in order to make sense of the work we're doing and understand the typical errors and pitfalls that happen when working with SQL. Now, you may be wondering why there are these two cases where you actually see two components stacked on top of each other, that being from and join, and then select and alias: these are components that are tightly coupled together, and they occur at the same place in the logical ordering, which is why I have stacked them like this. In this section we tackle the basic components that you need to master in order to write simple but powerful SQL queries, and we are back here with our schema of the logical order of SQL operations, which is also our map for everything that we learn in this course. But as you can see, there is now some empty space in the schema, because to help us manage the complexity, I have removed all of the components that we will not be tackling in this section. Let us now learn about from and select, which are really
the two essential components that you need in order to write the simplest SQL queries. Going back now to our data, let's say that we wanted to retrieve all of the data from the characters table in the fantasy dataset. When you have to write a SQL query, the first question you need to ask yourself is: where is the data that I need? Because the first thing that you have to do is retrieve the data, which you can then process and display as needed. In this case it's pretty simple: we know that the data we want lives in the characters table. Once you've figured out where your data lives, you can write the from clause, and I always suggest starting queries with the from clause. To get the table that we need, we can write the name of the dataset, followed by a dot, followed by the name of the table, and you can see that BigQuery has recognized the table here. So I have written the from clause and I have specified the address of the table, which is where the data lives, and now I can write the select clause, in which I specify which columns of the table I want to see. If I click on the characters table here, it opens in a new tab in my panel, and as you remember, it shows me the schema of the table, which includes the list of all the columns. Now I can simply decide that I want to see the name and the guild, so in the select clause I will write name and guild, and when I run this, I get a table with the two columns that I need. One neat thing about this: I could write the columns in any order; it doesn't have to be the original order of the schema, and the result will reflect the order I wrote. And if I wanted to get all of the columns of the table, I could write them here one by one, or I could write star, which is a shorthand for saying "please give me all of the columns". This is the corresponding data to our table in Google Sheets, and if you want to visualize select in your mind, you can imagine it as vertically selecting the parts of the table that you need; for example, writing select guild and level is equivalent to taking those two columns over here and picking them out.
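Sketching the column-selection variants described above:

```sql
-- Just two columns, returned in the order we ask for them.
SELECT name, guild
FROM fantasy.characters;

-- Same columns, swapped: the result follows the order written here,
-- not the order in the table's schema.
SELECT guild, name
FROM fantasy.characters;

-- Star is shorthand for "every column".
SELECT *
FROM fantasy.characters;
```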
of the table that you need. For example, if I were to write SELECT guild and level, this would be equivalent to taking these two columns over here and selecting them. Let us now think of the logical order of these operations: first comes the FROM and then comes the SELECT, and this makes logical sense, right? Because the first thing you need to do is source the data, and later you can select the parts of the data that you need. In fact, if we look at our schema over here, FROM is the very first component in the logical order of operations, because the first thing that we need to do is get our data. We have seen that the SELECT clause allows us to get any columns from our table in any order, but the SELECT clause has many other powers, so let's see what else we can do with it. One useful thing to know about SQL is that you can add comments in the code. Comments are parts of text which are not executed as code; they're just there for you to keep track of things or explain what you are doing. So I'm going to write a few comments now, and the way we write comments is with dash dash. Now I'm going to show you aliasing. Aliasing is simply renaming a column, so I could take the level column and say AS character_level to provide a new name, and after I run this we can see that the name of the column has changed. One thing that's important to understand, as we now start transforming the data with our queries, is that any change we apply, such as renaming the column in this case, only affects our results; it does not affect the original table that we are querying. So no matter what we do here moving forward, the actual table fantasy.characters will not change; all that will change are the results that we get after running our query. Of course there are ways to go back to fantasy.characters and permanently change it, but that is outside the scope for us. Going back to our schema, you will see that ALIAS has its own component and it
happens at the same time as the SELECT component. This is important because, as we will see in a bit, it's a common temptation to use these aliases, these labels that we give to columns, in the phases that precede this stage, which typically fails because, as our rules say, a component does not have access to data that is computed after it. Something we will come back to. Now, another power of SELECT that we want to show is constants, which is the ability to create new columns that have a constant value. For example, let's say I wanted to implement a versioning system for my characters: right now all the characters I have are version one, but in the future, every time I change a character, I will increase that version, which will allow me to keep track of changes. I can do that by simply writing 1 over here in the column definition, and when I run this you will see that SQL has created a new column and put 1 in every row of that column. This is why we call it a constant column; if I scroll down, all of it will be 1. This column has a weird name because we haven't provided a name for it yet, but we already know how to do this: we can use aliasing to call it version, and here we go. So in short, when you write a column name in the SELECT statement, SQL looks for that column in the table and gives you that column, but when instead you write a value, SQL creates a new column and puts that value in every row. The next thing that SQL allows me to do is calculations, so let me call the experience column here as well and get my data. One thing I could do is take experience and divide it by 100, and what we see here is a new column which is the result of this calculation. Now 100 is a constant value, right? So you can imagine that in the background SQL has created a new column, put 100 in every row, and then done the calculation between experience and that new column, and we get
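Putting the three powers together, a sketch of one SELECT that uses aliasing, a constant column, and a calculation might look like this (column names as in the lecture; the underscore in character_level is an assumption, since aliases with spaces would need backticks in BigQuery):

```sql
SELECT
  name,
  level AS character_level,               -- aliasing: rename a column in the results
  1 AS version,                           -- constant column: the value 1 on every row
  experience / 100 AS experience_scaled   -- calculation combining a column and a constant
FROM fantasy.characters;
```

Remember that none of this changes the underlying table; it only shapes the result of this one query.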
this result. In short, we can do any sort of calculation we want, combining columns and constants as well. For example, although this doesn't make any sense, I could take experience, add 100 to it, divide it by character_level, and then multiply it by two. And we see that we got an error. Can you understand why we got this error? Pause the video and think for a second. I am referring to my column as character_level, but what is character_level really? It is a label that I have assigned over here. Now if we go back to our schema, we can see that SELECT and ALIAS happen at the same time, so this is the phase in which we assign our label, and this is also the phase in which we try to call our label. If you look at our rules, this is not supposed to work, because an operation can only use data produced by operations before it, and ALIAS does not happen before SELECT; it happens at the same time. In other words, this part over here, where we say character_level, is attempting to use information that was produced right here, when we assigned the label, but because these parts happen at the same time, it's not aware of the label. All this to say that the logical order of operations matters, and that what we want here is to actually call it level, because that is the name of the column in the table. Now when I run this I get a resulting number. So going back to our original point, we are able to combine columns and constants with any sort of arithmetic operations. Another very powerful thing that SQL can do is apply functions. A function is a prepackaged piece of logic that you can apply to your data, and it works like this: there is a function called SQRT, which stands for square root, which takes a number and computes the square root. You call the function by name, then you open round brackets, and in the round brackets you provide the argument. The argument can be a constant, such as 16, or it can be a column, such as level, and when I run this you can see that in this
case the square root of 16 is calculated as four, and this creates a constant column, and then here, for each value of level, we have also computed the square root. There are many functions in SQL, and they vary according to the data type which you provide. As you remember, we said that knowing the data types of columns, such as distinguishing between numbers and text, is important because it allows us to know which operations we can apply, and so there are functions that work only on certain data types. For example, here we see square root, which only works on numbers, but we also have text functions, or string functions, which only work on text. One of them is UPPER, so if I take UPPER and provide guild as an argument, what do you expect will happen? We have created a new column where the guild is shown in all uppercase. So how can I remember which functions there are and how to use them? The short answer is: I don't. There are many, many functions in SQL, and here in the documentation you can see a very long list of all the functions that you can use in BigQuery. As we said, the functions vary according to the data that they can work on, so if you look here on the left, we have array functions, date functions, mathematical functions, numbering functions, time functions, and so on. It is impossible to remember all of these functions, so all you need to know is how to look them up when you need them. For example, if I know I need to work with numbers, I could scroll down here and go to mathematical functions, where I have a long list of all the mathematical functions, and I should be able to find the square root function that I have shown you. The description tells me what the function does, and it also provides some examples. To summarize, these are some of the most powerful things you can do with a SELECT statement: not only can you retrieve every column you need in any order, you can rename columns according to your needs, you can
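The two functions demonstrated here can be sketched in a single query; both SQRT and UPPER are standard BigQuery functions, while the column names are taken from the lecture's table:

```sql
-- Functions are applied value by value; which ones apply depends on the data type
SELECT
  SQRT(16) AS sqrt_constant,    -- constant argument: the same result on every row
  SQRT(level) AS sqrt_level,    -- column argument: one result per row
  UPPER(guild) AS guild_upper   -- string function: only works on text
FROM fantasy.characters;
```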
define constant columns with a value that you choose, you can combine columns and constant columns in all sorts of calculations, and you can apply functions to do more complex work. I definitely invite you to go ahead and put your own data into BigQuery, as I've shown you, and then start playing around with SELECT and see how you can transform your data with it. One thing worth knowing is that I can also write queries that only include SELECT, without the FROM part, that is, queries that do not reference a table. Let's see how that works. After I write SELECT, I clearly cannot reference any columns, because there is no table, but I can still reference constants. For example, I could say hello, 1, and false, and if I run this I get this result. So remember: in SQL we always query tables and we always get back tables. In this case we didn't reference any previous table, we've just created constants, so what we have here are three columns with constant values, and there is only one row in the resulting table. This is useful mainly to test stuff. Let's say I wanted to make sure that the square root function does what I expect it to do: I could call it right here and look at the result. Let's use this capability to look into the order of arithmetic operations in SQL. If I write an expression like this, would you be able to compute the final result? In order to do that, you should be able to figure out the order in which all these operations are done, and you might remember this from arithmetic in school, because SQL applies the same order that is taught in school. We could define the order as follows: brackets go first, so you start by executing things that are within brackets; then you execute any functions that take a number as an argument; then you have multiplication and division; and finally addition and subtraction. So pause the video, apply these rules, and see if you can figure out what this expression will give us. Now let's do this operation and do
it in stages, like we were doing in school. First of all we want to deal with what's in the brackets, right? So I will now consider this bracket over here, and in this bracket we have multiplication and addition. Multiplication goes first, so first I will execute this, which gives me four, and then I have 3 + 4 + 1, which gives me 8. Next I will copy the rest of the operation, and here I reach another bracket. To execute what is in these brackets, I need to first execute the function: this is the power function, so it takes two and raises it to the power of two, which gives four, and then 4 minus 2 gives me two, and this is what we get. Now we can solve this line, and first we need to execute multiplication and division in the order in which they occur. The first operation that occurs here is 4 / 2, which is 2, and I will just copy this for clarity: 8 − 2 * 2 / 2. The next operation is 2 * 2, which is 4, so that leaves 8 − 4 / 2, and the next operation is 4 / 2, which is two, so I have 8 − 2, and all of this gives me six. Now, all of these lines are comments, and we only have one line of code here, and to see whether I was right I just need to execute this code, and indeed I get six. So that's how you can use the SELECT clause on its own to test your assumptions and your operations, and that was a short refresher on the order of arithmetic operations, which will be important for solving certain SQL problems. Let us now see how the WHERE statement works. Looking at the characters table, I see that there is a field called is_alive, and this field is of type Boolean; that means the value can be either true or false. If I go to the preview here and scroll to the right, I can see that for some characters this is true and for others it is false. Now let's say I only wanted to get those characters which are actually alive, and so to write my query I would first write the address of the table, which is fantasy.characters. Next I could
use the WHERE clause to get those rows where is_alive is true, and finally I could do a simple SELECT star to get all the columns, and here I see that I only get the rows where is_alive is equal to true. So WHERE is effectively a tool for filtering table rows: it only keeps rows where a certain condition is true and discards all of the other rows. If you want to visualize how the WHERE filter works, you can see it as a horizontal selection of certain slices of the table, like in this case, where I have colored all of the rows in which is_alive is true. Now the WHERE statement is not limited to Boolean fields; it's not limited to columns that can only be true or false. We can run the WHERE filter on any column by making a logical statement about it. For example, I could ask to keep all the rows where health is bigger than 50. This is a logical statement, health bigger than 50, because it is either true or false for every row, and of course the WHERE filter will only keep those rows where this statement evaluates to true. If I run this, I can see that in all of my results health will be bigger than 50. I can also combine smaller logical statements with each other to make more complex logical statements. For example, I could say that I want all the rows where health is bigger than 50 and is_alive is equal to true. Now all of this becomes one big logical statement, and again this will be true or false for every row, and we will only keep the rows where it is true. If I run this, you will see that in the resulting table the health value is always above 50 and is_alive is always true. In the next lecture we will see in detail how these logical statements work and how we can combine them effectively, but now let us focus on the order of operations and how the WHERE statement fits in. When it comes to the lexical order, the order in which we write things, it is pretty clear from this example: first you have SELECT, then FROM, and after FROM you have the WHERE
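As a sketch, the combined filter just described would look like this (health and is_alive are the column names used in the lecture):

```sql
-- WHERE keeps only rows where the whole condition evaluates to true
SELECT *
FROM fantasy.characters
WHERE health > 50
  AND is_alive = TRUE;
```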
statement, and you have to respect this order. When it comes to the logical order, you can see that the WHERE clause follows right after the FROM clause, so it is actually second in the logical order. If you think about it, this makes a lot of sense, because the first thing I need to do is get the data from where it lives, and the first thing I want to do after that is drop all the rows that I don't need, so that my table becomes smaller and easier to deal with. There is no reason why I should carry over rows that I don't actually need, data that I don't actually want, and waste memory and processing power on it, so I want to drop those unneeded rows as soon as possible. Now that you know that WHERE happens at this stage in the logical order, you can avoid many of the pitfalls that happen when you're just learning SQL. Let's see an example. Take a look at this query: I'm going to the fantasy characters table, getting the name and the level, and then defining a new column, which is simply level divided by 10, and I'm calling it level_scaled. Now let's say that I wanted to only keep the rows that have at least three as level_scaled, so I would go here and write a WHERE filter: where level_scaled bigger than three. If I run this, I get an error: unrecognized name. Can you figure out why we get this error? level_scaled is an alias that we assign in the SELECT stage, but the WHERE clause occurs before the SELECT stage, so the WHERE clause has no way to know about this alias. In other words, the WHERE clause is at this point, and our rules say that an operation can only use data produced by operations before it, so the WHERE clause has no way of knowing about the label which is assigned at this stage. So how can we solve this problem right here? The solution is to not use the alias and to instead repeat the logic of the transformation, and this actually works, because it turns out that when you write logical statements in the WHERE filter you can not only
reference the columns of the tables, but you can also reference operations on columns, and this way of writing operations on columns and combinations between columns works just as we have shown in the SELECT part. So that was all you need to know to get started with the WHERE clause, a powerful clause that allows us to filter out the rows we don't need and keep the rows we do need based on logical conditions. Now let's delve a bit deeper into how exactly these logical statements work in SQL, and here is a motivating example for you. This is a selection from the characters table, and we have a WHERE filter, and this WHERE filter is needlessly complicated. I did this intentionally, because by the end of this lecture you should have no trouble at all interpreting this statement and figuring out for which rows it will be true, and likewise you will have no problem writing complex statements yourself, or deciphering them when you encounter them in the wild. The way these logical statements work is through something called Boolean algebra, which is an essential theory for working with SQL, but also for working with any other programming language, and is indeed fundamental to the way computers work. Though the name may sound a bit scary, it is really easy to understand the fundamentals of Boolean algebra. Let's look back at so-called normal algebra, meaning the common form that is taught in schools. In this algebra you have a bunch of elements, of which in this case I'm only showing a few positive numbers, such as 0, 25, 100. You also have operators that act on these elements, for example the square root symbol, the plus sign, the minus sign, the division sign, or the multiplication sign. And finally you have operations, in which you apply the operators to your elements and get some new elements out of them. Here are two different types of operation: in one case we take this operator, the square root, and apply it to a single element, and out of
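The failing query and its fix can be sketched side by side; the point is that WHERE must repeat the expression rather than use the alias, since the alias is only assigned later in the logical order:

```sql
-- This fails with "unrecognized name": WHERE runs before SELECT/ALIAS,
-- so the alias level_scaled does not exist yet at that stage
-- WHERE level_scaled > 3

-- The fix: repeat the transformation inside the WHERE filter
SELECT
  name,
  level,
  level / 10 AS level_scaled
FROM fantasy.characters
WHERE level / 10 > 3;
```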
this we get another element. In the second kind of operation we use this operator, the plus sign, to actually combine two elements, and again we get another element in return. Boolean algebra is very similar, except that it's simpler in a way, because you can only have two elements: either true or false. Those are all the elements you are working with, and of course this is why, when there's a Boolean field in SQL, it is a column that can only have these two values, true and false. Just like normal algebra, Boolean algebra has several operators that we can use to transform the elements, and for now we will only focus on the three most important ones, which are NOT, AND, and OR. And finally, in Boolean algebra we also have operations, in which we combine operators and elements and get back elements. Now we need to understand how these operators work, so let us start with the NOT operator. To figure out how a Boolean operator works, we have to look at something called a truth table, so let me look up the truth table for the NOT operator, which in this Wikipedia article is available here under logical negation. First of all, we see that logical negation is an operation on one logical value. What does this mean? It means that the NOT operator works on a single element, such as NOT true or NOT false, and this is similar to the square root operator in algebra, which works on a single element, a single number. Next we can see how exactly this works: given an element that we call P, and of course P can only be true or false, the negation of P is simply the opposite value, so NOT true is false and NOT false is true. We can easily test this in our SQL code: if I say SELECT NOT TRUE, what do you expect to get? We get false, and if I do SELECT NOT FALSE, I will of course get true. Next let's see how the AND operator works. We've seen that the NOT operator works on a single element; on the other hand, the AND operator connects two elements, such as writing true and
false, and in this sense the AND operator is more similar to the plus sign here, which connects two elements. So what is the result of true AND false? To figure this out, we have to go back to our truth tables, and I can see it here in the logical conjunction section, which is another name for the AND operator. The AND operator combines two elements, and each element can either be true or false, so this creates the four combinations that we see here in this table. What we see is that only if both elements are true will the AND operator return true; in any other case it returns false. Going back here, if I select true AND false, what do you expect to see? I am going to get false, and it's only in the case when I do true AND true that the result here will be true. Finally we can look at the OR operator, which is also known as logical disjunction. It also combines two elements and also has four combinations, but in this case, if at least one of the two elements is true, then you get true, and only if both elements are false do you get false. So going back to our SQL, true OR true will of course be true, but even if one of them is false we will still get true, and only if both are false will we get false. So now you know how the three most important operators in Boolean algebra work. The next step is to be able to solve long and complex expressions, such as this one, and you already know how the operators work; the only information you're missing is the order of operations. Just like in arithmetic, we have an agreed-upon order of operations that helps us solve complex expressions, and the order is written here: first you solve for NOT, then you solve for AND, and finally for OR, and as with arithmetic, you first solve for the brackets. Let's see how that works in practice. Let us now simplify this expression. The first thing I want to do is deal with the brackets, so if I copy all of this part over here as a comment, so it doesn't run as
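The truth tables for all three operators can be verified directly with a table-less SELECT, as described earlier:

```sql
-- Testing the truth tables with a table-less SELECT
SELECT
  NOT TRUE        AS not_true,        -- false
  TRUE AND FALSE  AS true_and_false,  -- false: AND needs both sides true
  TRUE AND TRUE   AS true_and_true,   -- true
  TRUE OR FALSE   AS true_or_false,   -- true: OR needs at least one side true
  FALSE OR FALSE  AS false_or_false;  -- false
```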
code, you will see that this is the most nested bracket, the innermost bracket in our expression, and we have to solve for this. So what is true OR true? This is true, right? Now I can copy the rest of my expression up to here, and here I can solve the innermost bracket as well, so I can say true, and what I have here is false AND true, so this is false, right? Because when you have AND, both of them need to be true for it to return true; otherwise it's false, so I will write false. Moving on to the next line, I need to solve what's in the bracket, so I can copy the NOT, and now I have to solve what's in this bracket over here. There are several operators here, but we have seen that NOT has precedence, so I will copy true, and here I have NOT false, which becomes true, and then I can copy the rest of the bracket; I'm not going to do any more at this step, to avoid confusion. Then I have OR, and I can solve for this bracket over here: true AND false is actually false. Moving on, I can keep working on my bracket, and I have a lot of operations here, but I need to give precedence to the ANDs. The first AND that occurs is this one, which means I have to start with this expression over here: true AND true results in true. Moving on, I will copy the OR over here, and now I have another AND, which means I have to isolate this expression: false AND true results in false. Finally I can copy the last AND, because I'm not able to compute it yet, as I first need to compute the left side, and I can copy the remaining part as well. Moving on to the next line, I still need to do the AND, because AND takes precedence, so this is the expression I have to compute: I will say true OR, and then this expression, false AND true, computes to false, and then copy the rest. Now let me make some room over here and go to the next line, and I can finally compute this bracket: we have true OR false, which we know is true. Next I need to invert this value, because I have
NOT true, which is false, and then I have OR false, and finally this computes to false. Now for the moment of truth, pun intended: I can run my code and see if the result actually corresponds to what we got, and the result is false. So in short, this is how you can solve complex expressions in Boolean algebra. You just need to understand how these three operators work, and you can use truth tables like this one over here to help you with that; then you need to remember to respect the order of operations, and if you proceed step by step you will have no problem solving this. But now let's go back to the query with which we started, because what we have here is a complex logical statement that is plugged into the WHERE filter, and it isolates only certain rows, and we want to understand exactly how this statement works. Let us apply what we've just learned about Boolean algebra to decipher this statement. What I've done here is take the first row of our results, which you see here, and just copy the values into a comment, and then I've taken our logical statement and copied it here as well. Let us see what SQL does when it checks this row. The first thing we need to do is take all of these statements in our WHERE filter and convert them to true or false, and to do that we have to look at our data. Let us start with the first component, which is level bigger than 20: for the row that we are considering, level is 12, so this comes out as false. Next I will copy this AND, and here we have is_alive equals true; for our row, is_alive equals false, so this statement computes as false. Mentor_id is not null, with null representing absence of data: in our case mentor_id is one, so it is indeed not null, so here we have true. And finally, what we have here is class IN Mage, Archer. We have not seen this before, but it should be pretty intuitive: this is a membership test, looking at class, which in this case is Hobbit, and checking whether it can be found in this
list, and in our case this is false. So now that we've plugged in all the values for our row, what we have here is a classic Boolean algebra expression, and we are able to solve it based on what we've learned. Let us go and solve this. First I need to deal with the brackets, and what I have here is an AND and an OR; the AND takes precedence, so false AND false is false, and I will copy the rest. Here I have NOT false, which is true. Next we have false OR true, which is true, AND true, and in the end this computes to true. Now in this case we sort of knew that the result was meant to come out as true, because we started from a row that survived the WHERE filter, and that means that for this particular row the statement had to compute as true, but it's still good to know exactly how SQL has computed this and understand exactly what's going on. This is how SQL deals with complex logical statements: for each row it looks at the relevant values in the row, so that it can convert the statement into a Boolean algebra expression, and then it uses the Boolean algebra rules to compute a final result, which is either true or false. If this computes as true for the row, then the row is kept; otherwise the row is discarded. This is great to know, because this way of resolving logical statements applies not only to the WHERE component but to all components in SQL which use logical statements, and which we shall see in this course. Let us now look at the DISTINCT clause, which allows me to remove duplicate rows. Let's say that I wanted to examine the class column in my data, so I could simply select it and check out the results. What if I simply wanted to see all the unique types of class that I have in my data? This is where DISTINCT comes in handy: if I write DISTINCT here, I will see that there are only four unique classes in my data. Now what if I was interested in the combinations between class and guild in my data? Let me remove the DISTINCT for now
and add guild here, and for us to better understand the results I'm going to add an ordering, and here are the combinations of class and guild in my data. There is a character who is an Archer and belongs to Gondor, there are actually two characters who are Archers and belong to Mirkwood, there are many Hobbits from the Shire, and so on. But again, what if I was interested in the unique combinations of class and guild in my data? I could add the DISTINCT keyword here, and as you can see there are no more repetitions: Archer and Mirkwood occurs only once, Hobbit and the Shire occurs only once, because I'm only looking at unique combinations. Of course I could go on and add more columns, and expand the results to show the unique combinations between those columns; here Hobbit and the Shire has expanded again, because some Hobbits are alive and others unfortunately are not. At the limit, I could have a star here, and what I would get back is actually my whole data, all 15 rows, because what we're doing here is looking for rows that have the same value in all columns, rows that are complete duplicates, and there are no such rows in the data, so when I do SELECT star in this case, DISTINCT has no effect. So in short, how does DISTINCT work? It looks at the columns you've selected, only those which you have selected, and then it looks at all the rows; two rows are duplicates if they have the exact same values in every column you have selected, and then duplicate rows are removed and only unique rows are preserved. Just like the WHERE filter, DISTINCT is a clause that removes certain rows, but it is more strict and less flexible in a sense: it only does one job, and that job is to remove duplicate rows based on your selection. If we look at our map of SQL operations, we can place DISTINCT: it occurs right after SELECT, and this makes sense, because we have seen that DISTINCT works only on the columns you have selected, and so it has to wait for
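The progression just walked through can be sketched as follows, using the class and guild columns from the lecture:

```sql
-- Unique values of a single column
SELECT DISTINCT class
FROM fantasy.characters;

-- Unique combinations of two columns, with an ordering for readability
SELECT DISTINCT class, guild
FROM fantasy.characters
ORDER BY class, guild;

-- With star, DISTINCT compares every column; if no rows are complete
-- duplicates, it has no effect and all rows come back
SELECT DISTINCT *
FROM fantasy.characters;
```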
SELECT to choose the columns that we're interested in, and then we can de-duplicate based on those. For the following lecture on unions, I wanted to have a very clear example, so I decided to take the characters table, split it in two, and create two new tables out of it. Then I thought I should show you how I'm doing this, because it's a pretty neat thing to know and it will help you when you are working with SQL in BigQuery. So here's a short primer on yet another way to create a table in BigQuery: you can use your newly acquired power of writing SQL queries to turn those queries into permanent tables. Here's how you can do it. First, I've written a simple query here, and you should have no trouble understanding it by now: go to the fantasy characters table, keep only rows where is_alive is true, and then get all the columns. Next we need to choose where the new table will live and what it will be called, so I'm placing it in the fantasy dataset as well, and I'm calling it characters_alive. Finally I have a simple command, which is CREATE TABLE. Now, what you see here is a single statement in SQL, a single command that will create the table, and you can in fact have multiple statements within the same code and run all the statements together when you hit run. The trick is to separate them with this semicolon over here; the semicolon tells SQL: hey, this command over here is over, and next I might add another one. So here we have the second statement that we're going to run, and it looks just like the one above, except that our query has changed, because we're getting rows where is_alive is false, and we are calling this table characters_dead. I have my two statements, they're separated by semicolons, and I can just hit run, and I will see over here that BigQuery is showing me the two statements on two different rows, and you can see that they are both done now. So if I open my Explorer over here, I will see that I have two new tables: characters_alive
and characters_dead, and if I go here, for characters_alive, is_alive will of course be true on every row. Now what do you think would happen if I ran this script again? Let's try it. I get an error; the error says that the table already exists, and this makes sense, because I've told SQL to create a table, but SQL says that the table already exists and it cannot create it again. There are ways that we can tell SQL what to do if the table already exists, so that we specify the behavior we want instead of just getting an error. One way is to say CREATE OR REPLACE TABLE fantasy.characters_alive, and what this will do is that, if the table already exists, BigQuery will delete it and then create it again, or in other words it will overwrite the data. So let's write it down too, and let's make sure that this query actually works: when I run this, I get no errors, even though the table already existed, because BigQuery was able to remove the previous table and create a new one. Alternatively, we may want to create the table only if it doesn't exist yet and leave it untouched otherwise. In that case we could say CREATE TABLE IF NOT EXISTS, and what this will do is that, if the table already exists, BigQuery won't touch it and won't throw an error, but if it doesn't exist, it will create it. Let us write it down too and make sure that this query runs without errors, and we see that here too we get no errors. That, in short, is how you can save the results of your queries in BigQuery and make them into full-fledged tables that you can query at will, and I think this is a really useful feature if you're analyzing data in BigQuery, because any results of your queries that you would like to keep, you can just save and then come back and find later. Let's learn about unions now. To show you how this works, I have taken our characters table and split it into two parts, and I believe the names are quite self-descriptive: there is a separate table
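The three CREATE TABLE variants described here can be sketched as one script; BigQuery supports the CREATE TABLE ... AS SELECT form, and the semicolons separate the statements so they can run together:

```sql
-- Save a query result as a permanent table (errors if run twice)
CREATE TABLE fantasy.characters_alive AS
SELECT * FROM fantasy.characters WHERE is_alive = TRUE;

-- Overwrite the table if it already exists
CREATE OR REPLACE TABLE fantasy.characters_alive AS
SELECT * FROM fantasy.characters WHERE is_alive = TRUE;

-- Create only if missing; leave an existing table untouched, with no error
CREATE TABLE IF NOT EXISTS fantasy.characters_dead AS
SELECT * FROM fantasy.characters WHERE is_alive = FALSE;
```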
There is a separate table for characters who are alive and a separate table for characters who are dead; you can look at the previous lecture to see how I used queries to create the two new tables. This is exactly the characters table, with the same schema, the same columns, the same types, just split in two based on the is_alive column.

Now let us imagine that we do not have the fantasy.characters table anymore: the table with all the characters was deleted, or we never had it in the first place. Let's pretend we only have these two tables, characters_alive and characters_dead, and we want to reconstruct the characters table, a table with all the characters, out of them. How can we do that?

What I have here are two simple queries: SELECT * FROM fantasy.characters_alive and SELECT * FROM fantasy.characters_dead. These are two separate queries, but in BigQuery there are ways to run multiple queries at the same time, so let me show you that first. An easy way is to write your queries and add a semicolon at the end of each, so that what you have is a SQL script containing multiple SQL statements, in this case two. If you hit Run, all of them will be executed sequentially. When you look at the results, you're not just getting a table anymore, because more than a single query has been executed: you can see the two commands that ran, and for each of them you can click View Results to get to the familiar results tab. To see the other one, click the back arrow and then the other View Results. Another way to handle this is to select the query I'm interested in and click Run: BigQuery executes only the part I selected, and I see its results. I can likewise select the other query in my script, click Run, and see the results for that one. This is a pretty handy piece of functionality in BigQuery. It's also functionality that might give you some headaches if you don't know about it, because if for some reason you selected part of the code during your work and then you just want to run everything, you might hit Run and get an error, since BigQuery only sees the part you selected and cannot make sense of it.

But our problem has not been solved yet. Remember, we want to reconstruct the characters table, and what we have are two queries whose results we can only look at separately; we still don't have a single table with all the results. This is where UNION comes into play. UNION allows me to stack the results of these two queries. First I remove the semicolons, because this will become a single statement, and then in between the two queries I write UNION DISTINCT. When I run this, you can verify for yourself: we have 15 rows, and we have indeed reconstructed the characters table. What exactly is going on here? It's actually pretty simple: SQL takes all of the rows from the first query and all of the rows from the second query and stacks them on top of each other. You can really imagine the act of vertically stacking one table on top of the other to create a new table containing all of the rows of the two queries combined. That, in short, is what UNION does.

There are a few details we need to know when working with UNION, and to figure them out, let's look at a toy example. I've created two very simple tables, toy_1 and toy_2, and you can see how they look in these comments. They both have three columns, imaginatively called col_1, col_2, and col_3. Just like before, we can combine these tables by selecting everything from each and writing a UNION in between. Now, in BigQuery you're not allowed to write UNION without a further qualifier, a keyword, which has to be either ALL or DISTINCT. What is the choice about? With UNION ALL you get all of the rows in the first table and all of the rows in the second table, regardless of whether they are duplicates. With UNION DISTINCT you again get the rows from both tables, but only unique rows are kept: you will not get any duplicates. Notice that these two tables share a row that is identical, 1, true, yes, which appears in both. If I write UNION ALL, I expect the result to include this row twice, and indeed, when we run it, 1, true, yes appears here and again at the end, for a total of four rows, which is all the rows in the two tables. If instead I write UNION DISTINCT, I expect three rows, with the shared row appearing only once. (Again, make sure you're not selecting any little part of your script before you run it, so that the whole script runs.) And as you can see, we have three rows and no duplicates.

It's interesting that BigQuery forces you to choose between ALL and DISTINCT, because in many SQL systems, for your information, you can write UNION without any qualifier, and in that case it means UNION DISTINCT. In other SQL systems, when you write UNION, it is understood that you want UNION DISTINCT, and if you want to keep the duplicate rows, you explicitly write UNION ALL. In BigQuery, you always have to say explicitly whether you want UNION ALL or UNION DISTINCT.

Now, the reason this command is called UNION, and not STACK or something else, is that this is set terminology: it comes from the mathematical theory of sets.
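The toy-table comparison above can be reproduced with Python's sqlite3 module. This is a sketch, not BigQuery itself: in SQLite a bare UNION already means what BigQuery spells UNION DISTINCT, and the toy rows are the invented values from the lecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE toy_1 (col_1 INTEGER, col_2 BOOLEAN, col_3 TEXT);
    CREATE TABLE toy_2 (col_1 INTEGER, col_2 BOOLEAN, col_3 TEXT);
    INSERT INTO toy_1 VALUES (1, 1, 'yes'), (2, 0, 'no');
    INSERT INTO toy_2 VALUES (1, 1, 'yes'), (3, 1, 'maybe');  -- (1, 1, 'yes') is shared
""")

# UNION ALL stacks every row from both tables, duplicates included: 4 rows
union_all = conn.execute(
    "SELECT * FROM toy_1 UNION ALL SELECT * FROM toy_2").fetchall()

# Bare UNION deduplicates (BigQuery requires spelling it UNION DISTINCT): 3 rows
union_distinct = conn.execute(
    "SELECT * FROM toy_1 UNION SELECT * FROM toy_2").fetchall()

print(len(union_all), len(union_distinct))  # 4 3
```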
The mathematical theory of sets, which you might remember from school, treats a table as simply a set of rows. The first table here is a set of two rows, and so is the second, and once you have two sets, you can perform various set operations between them. The most common set operation in SQL is the union, and unioning means combining the values of two sets. You might remember the Venn diagram, a typical way to visualize relations between sets. In this simple Venn diagram we have two circles, A and B, representing two sets: in our case, A represents the collection of rows in the first table and B represents all the rows in the second table. What does it mean to union these sets? It means taking all of the elements that are in either set, that is, all of the rows that are in either table. And what is the difference between UNION DISTINCT and UNION ALL here? The rows of A are the A-only region plus the intersection, and the rows of B are the B-only region plus the intersection, so when we combine them, we are counting the intersection twice. What do you do with that double counting: keep it or discard it? If you do UNION ALL, you keep it, so rows that A and B have in common are duplicated and you see them twice. If you do UNION DISTINCT, you discard it, and you won't have any duplicates in the results. That's one way to think about it in terms of sets.

But we also know that union is not the only set operation. A very popular one is the intersection, which says: take only the elements that are in common between the two sets. Can we do that in SQL? Can we say, give me only the rows that the two tables have in common? Yes, we can: going back to our query, we write INTERSECT DISTINCT instead of UNION DISTINCT. What do you expect to see after I run this command? Take a minute to think about it. I expect to get only the rows shared between the two tables. There is one such row, the 1, true, yes row we have seen, and if I run this, I get exactly that row. So INTERSECT DISTINCT gives me the rows shared between the two tables. Note that I have to write INTERSECT DISTINCT; I cannot write INTERSECT ALL, because BigQuery does not support it, so it's not going to work.

Here's another set operation you might consider: subtraction. What if I said, give me all of the elements in A except the ones A shares with B? On the drawing, that means taking all of the elements in A except the intersection, because those are in A but also in B, and I don't want the elements shared with B. Yes, I can do that in SQL too: I can say give me everything from toy_1 EXCEPT DISTINCT everything from toy_2, meaning I want all of the rows from toy_1 except the rows shared with toy_2. What do you expect to see when I run this? I expect only the 2, false, no row, because the other row is shared with toy_2, and that is what I get. Again, you have to write EXCEPT DISTINCT; EXCEPT ALL is not supported in BigQuery and will not work. And keep in mind that unlike the previous two operations, union and intersection, the EXCEPT operation is not symmetric: if I swap the tables, I expect a different result. If I say give me everything from toy_2 except the rows shared with toy_1, I expect the 3, true, maybe row, and in fact that is what I get when I run it. So be careful: with EXCEPT, the order in which you put the two tables matters.

That was a short overview of UNION, INTERSECT, and EXCEPT, and I will link the BigQuery documentation here; they are officially called set operators. In real life you almost always see UNION; very rarely will you see somebody using INTERSECT or EXCEPT, and a lot of people don't even know about them. But I think it was worth briefly looking at all three, and it's especially good for you to get used to thinking about tables as sets of rows and about SQL operations in terms of set operations. That will also come in handy when we study joins.

Let us quickly go back to our toy example, because there are two essential prerequisites for doing a union, or any of these set operations: number one, the tables must have the same number of columns, and number two, the columns must have the same data types. As you can see, we are combining toy_1 and toy_2, and both have three columns: the first is an integer, the second a Boolean, and the third a string, in both tables. That is why we are able to combine them. What would happen if I selected only the first two columns from the first table and then tried to combine them? You guessed it: I would get an error, because I have a mismatched column count. If I want to select only the first two columns in one table, I need to select only the first two columns in the other, and then the union will work. And what would happen if I messed up the order of the columns? Say I select col_1 and col_3 here, and col_1 and col_2 there. If I run this, I get an error about incompatible types, string and bool. What's happening is that SQL is trying to take the values of col_3 from the first table and put them into col_2 of the second.
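The INTERSECT and EXCEPT behavior, including the asymmetry of EXCEPT and the column-count prerequisite, can be sketched with sqlite3 (where the operators are spelled without the DISTINCT qualifier that BigQuery requires):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE toy_1 (col_1 INTEGER, col_2 BOOLEAN, col_3 TEXT);
    CREATE TABLE toy_2 (col_1 INTEGER, col_2 BOOLEAN, col_3 TEXT);
    INSERT INTO toy_1 VALUES (1, 1, 'yes'), (2, 0, 'no');
    INSERT INTO toy_2 VALUES (1, 1, 'yes'), (3, 1, 'maybe');
""")

# INTERSECT keeps only the rows the two tables share
shared = conn.execute(
    "SELECT * FROM toy_1 INTERSECT SELECT * FROM toy_2").fetchall()

# EXCEPT is not symmetric: swapping the tables changes the result
only_1 = conn.execute(
    "SELECT * FROM toy_1 EXCEPT SELECT * FROM toy_2").fetchall()
only_2 = conn.execute(
    "SELECT * FROM toy_2 EXCEPT SELECT * FROM toy_1").fetchall()
print(shared, only_1, only_2)  # [(1, 1, 'yes')] [(2, 0, 'no')] [(3, 1, 'maybe')]

# A mismatched column count is rejected, just as in BigQuery
try:
    conn.execute("SELECT col_1, col_2 FROM toy_1 UNION SELECT * FROM toy_2")
except sqlite3.OperationalError as exc:
    print("error:", exc)
```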
So again: a string cannot go into a Boolean column, because as you know, SQL enforces strict types on columns, and this simply will not work. But of course I could select col_3 in the second table as well, and then a string column would again be going into a string column, and it would work. To summarize: you can UNION, INTERSECT, or EXCEPT any two tables as long as they have the same number of columns and the columns have the same data types.

Let us now illustrate a union with a more concrete example. We have our items table and our characters table: the items table represents magical items, while the characters table, which we're familiar with, represents actual characters. Let's say you are managing a video game and someone asks you for a single table containing all of the entities in that game, where entities include both characters and items. You want to create a table that combines these two tables into one. We know we can use UNION to stack all the rows, but we cannot directly union these two tables, because they have different schemas: a different number of columns, and columns with different data types. So let's analyze what the two tables have in common and how we might combine them. First of all, they both have an id, and in both cases it's an integer, so that's already pretty good. They both have a name, and in both cases it's a string, so we can combine that as well. The item type can be thought of as similar to the class. Each item has a level of power expressed as an integer, and each character has a level of experience expressed as an integer, and you can think of those as kind of similar. Finally, both have a timestamp field representing a moment in time: date_added and last_active. Looking at the columns the two tables have more or less in common, we can find a way to combine them.

Here's how to translate this into SQL. I went to the fantasy.items table and selected the columns I wanted, then went to the characters table and selected the columns to combine with those, in the right order: id with id, name with name, class with item_type, level with power, and last_active with date_added. With the columns in the right order and UNION DISTINCT in between, if I run this, you will see that I have successfully combined the rows from the two tables.

All the columns we've chosen so far have matching types, but what would happen if I wanted to combine two columns that are not the same type? Say we wanted to combine rarity, which is a string, with experience, which is an integer. As you know, I cannot do this directly, but I can get around it by either turning rarity into an integer or turning experience into a string; I just have to make sure both end up with the same data type. The easiest path is usually to turn the other data type into a string, since almost anything can be rendered as text. So for this demonstration we will take experience, which is an integer, turn it into a string, and combine that with rarity. I go back to my code, make some room, add rarity on the items side and experience on the characters side, and you can see I already get an error saying that the UNION DISTINCT has incompatible types, just as expected. What I want to do is turn experience into a string, and I can do that with the CAST function: CAST(experience AS STRING). This takes the values and converts them to strings, and if I run the query, you can see that it has worked. We have combined the two tables into one, and the result has a column called rarity. The reason it's called rarity is that the name is taken from the first table in the operation, but we could of course rename it to whatever we need. This is now a text column, because thanks to the cast we have combined a text column with another text column. What we see here are a bunch of numbers that originally came from the experience column of the characters table, now converted to text, and if I scroll down, I also see the original rarity values from the items table.

Finally, let us examine UNION in the context of the logical order of SQL operations. Our logical map looks a bit different than usual, and the reason is that we are considering what happens when you union two tables: blue represents one table and red represents the other. I wanted to show you that all of the ordering we have seen until now (first get the table, then filter with WHERE, then select the columns you want, then optionally use DISTINCT to remove duplicates) happens in the same order, separately, for each of the two tables you are unioning; this also applies to the other operations, like joining and grouping, which we will see later in the course. At first the two tables proceed on two separate tracks, with SQL performing all of these operations on each in this specific order, and only at the end, after all of these operations have run, comes the union, where the two tables are combined into one. Only after the tables have been combined do the last two operations, ORDER BY and LIMIT, apply. And nothing forces you to combine only two tables: you could union any number of tables.
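Stepping back to the CAST trick above, here is a minimal sqlite3 sketch. SQLite spells the target type TEXT where BigQuery says STRING, and the single row per table is an invented stand-in for the lecture's items and characters data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER, name TEXT, rarity TEXT, power INTEGER);
    CREATE TABLE characters (id INTEGER, name TEXT, experience INTEGER, level INTEGER);
    INSERT INTO items VALUES (1, 'Sting', 'rare', 40);
    INSERT INTO characters VALUES (10, 'Frodo', 3200, 12);
""")

# rarity (text) and experience (integer) cannot be unioned directly,
# so experience is cast to text; column names come from the first SELECT
rows = conn.execute("""
    SELECT id, name, rarity, power FROM items
    UNION  -- BigQuery: UNION DISTINCT
    SELECT id, name, CAST(experience AS TEXT), level FROM characters
""").fetchall()
print(rows)
```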
The logic doesn't change at all: all of these operations happen separately for each of the tables, and only when they are all done, when all of the tables are ready, are they combined into one. If you think about it, this makes a lot of sense: SELECT needs to have run so that the schema of the tables being combined is known, and DISTINCT needs to have run on each table so that we know which rows to combine in the union. And that is all you need to know to get started with UNION, this very powerful statement that allows us to combine rows from different tables.

Let us now look at ORDER BY. I'm looking at the characters table, and as you can see, we have an id column that goes from 1 to 15, assigning an id to every character, but the ids don't appear in any particular order. This is in fact a general rule for SQL: there is absolutely no order guarantee for your data. Your data is not stored in any specific order, and it is not going to be returned in any specific order. The reason for this is fundamentally one of efficiency: if the engine always had to keep the data perfectly ordered, that would add a lot of overhead to the engine that makes the queries work, and there's really no reason for it. However, we often do want to order our data when querying it, to control the way it is displayed, and this is what the ORDER BY clause is for.

Let's see how it works. I am selecting everything from fantasy.characters, and again I get the results in no particular order. Say I wanted to see them ordered by name: I write ORDER BY name, and the rows are now ordered alphabetically by name. I can invert the order by writing DESC, which stands for descending; for text, that means reverse alphabetical order, from the last letter of the alphabet to the first. I can of course also order by numeric columns such as level, and we see the level increasing; that too could be made descending. The corresponding keyword for the default direction is ASC, which stands for ascending, and since ascending is the default behavior, you get the same smallest-to-largest order even if you omit it.

I can also order by multiple columns. I could say ORDER BY class, level: the rows are first ordered by class, alphabetically, so Archer comes first and Warrior last, and then within each class the rows are ordered by level, from smallest to largest. I can invert the direction of one of them, for example class: in that case we start with Warriors, and within the Warrior class we still order level ascending. So for every column in the ordering, I can decide whether it is sorted in ascending or descending order.

Now let me remove this and select just the name and the class. Once again I get my rows in no particular order, showing the name and the class. I wanted to show you that you can also order by columns which you have not selected: I can order these rows by level even though level is not displayed, and it works all the same. Finally, I can also order by calculations. I could say: take level, divide it by experience, and multiply that by two, for some reason, and that would also work in the ordering, even though I don't see the calculation; it is done in the background and used for the ordering. I could in fact copy that expression, create a new column from it, and call it calc, for calculation. The results are not very meaningful, but you will see that they are in ascending order, because that is what we ordered by.

Sometimes you will also see a notation like ORDER BY 2, 1. What we've done here is order by class first, because we start with Archers and end with Warriors, and then within each class by name, also ascending. The numbers refer to the columns referenced in the SELECT: 2 means order by the second column you referenced, which in this case is class, and 1 means order by the first. It's basically a shortcut people sometimes use to avoid rewriting the names of columns they have already selected.

Finally, going back to the order of operations, we can see that ORDER BY happens at the very end of the whole process. As you will recall, I created this slightly more complex diagram to show what happens when we union tables together: all the operations run independently on each table, and then the tables get unioned. Only after all of this is done does SQL know the final list of rows to include in the results, and that is the right moment to order those rows; it would not be possible to do it before. So it makes sense that ORDER BY is located here.

Let us now look at the LIMIT clause. What I have here is a simple query: it goes to the characters table, filters for the rows where the character is alive, and gets three columns. Let's run it: the query returns 11 rows. Now let's say I only wanted to see five of those rows. This is where LIMIT comes into play: LIMIT looks at the final results and picks five rows out of them, reducing the size of my output, and here you can see that we get five rows. As we said in the lecture on ordering, by default there is no guarantee of order in a SQL system.
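The ordering behavior described above, and its interaction with LIMIT, can be sketched like this (sqlite3 again; the names, classes, and levels are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE characters (name TEXT, class TEXT, level INTEGER)")
conn.executemany(
    "INSERT INTO characters VALUES (?, ?, ?)",
    [("Gandalf", "Wizard", 30), ("Legolas", "Archer", 22),
     ("Pippin", "Hobbit", 3), ("Merry", "Hobbit", 3)])

# Multi-column ordering: class ascending, then level descending within each class
by_class = conn.execute(
    "SELECT name, class, level FROM characters ORDER BY class ASC, level DESC"
).fetchall()
print(by_class[0])  # ('Legolas', 'Archer', 22) -- Archer sorts first alphabetically

# Positional shortcut: ORDER BY 2, 1 means the 2nd, then the 1st selected column
positional = conn.execute(
    "SELECT name, class FROM characters ORDER BY 2, 1").fetchall()

# ORDER BY plus LIMIT: the lowest-level character first; note the tie at
# level 3 (Merry and Pippin), which makes LIMIT 1 silently drop one of them
lowest = conn.execute(
    "SELECT name, level FROM characters ORDER BY level LIMIT 1").fetchone()
print(lowest[1])  # 3
```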
So when you get all your data with a query and then run LIMIT 5 on top of it, you have no way of knowing which rows will be selected as those five; you are basically saying that you're okay with getting any five of all the rows in your result. Because of this, people will often use LIMIT in combination with ORDER BY. For example, I could say ORDER BY level and then LIMIT 5, and what I get is essentially the five least experienced characters in my dataset. Now suppose your problem is finding the least experienced character in your data, the character with the lowest level. Of course you could say ORDER BY level and then LIMIT 1, and you would get a character with the lowest level, and this works. However, it is not ideal; there is a problem with this solution. Can you figure out what it is? The problem becomes obvious once I go back to LIMIT 5 and notice that I actually have two characters sharing the lowest level in my dataset. In theory I should be able to return both of them, because they both have the lowest level; however, LIMIT 1 simply cuts the rows in my output and is unaware of the information sitting in that second row. In later lectures we will see how to solve this better and get more precise results.

If we look at the logical order of operations, LIMIT is the very last operation: all of the logic of the query is executed, all the data is computed, and only then, based on that final result, do we sometimes decide to output a limited number of rows rather than all of them. A common mistake for someone starting with SQL is thinking that LIMIT can be used to make a query cheaper. For example, you might say: this is a really large table, two terabytes of data, and it would cost a lot to scan the whole thing, so I will write SELECT * but add LIMIT 20, because I only want to see the first 20 rows; that means I will only scan 20 rows and my query will be very cheap, right? No, that is actually wrong; it doesn't save you anything, and you can understand why by looking at the map: all of the logic executes before you get to LIMIT. You are going to scan the whole table when you say SELECT *, and apply all of the logic, and LIMIT only changes the way your result is displayed, not the way it is computed. If you want to write your query so that it scans fewer rows, you should focus on the WHERE statement instead, because WHERE runs at the beginning, right after getting the table, and it actually eliminates rows, which usually saves you computation and money. That said, there are systems where writing LIMIT may actually turn into savings, because different systems are optimized in different ways and allow you to do different things with the commands; but as a rule, in SQL, LIMIT just changes how the result is displayed and doesn't change the logic of execution.

Let us now look at the CASE clause, which allows us to apply conditional logic in SQL. You can see here a simple query: I am getting the data from the characters table, filtering it so that we only look at characters who are alive, and then for each character getting the name and the level. When you have a column that contains numbers, such as level, one typical thing you do in data analysis is bucketing. Bucketing means that I look at the many values a column like level can take and reduce them to a smaller number of values, so that whoever looks at the data can make sense of it more easily. The simplest form of bucketing is the one with only two buckets.
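The two-bucket idea can be sketched with a Boolean expression in the SELECT list (sqlite3 renders true and false as 1 and 0, where BigQuery would show true and false; the characters are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE characters (name TEXT, level INTEGER)")
conn.executemany("INSERT INTO characters VALUES (?, ?)",
                 [("Gandalf", 30), ("Legolas", 22), ("Pippin", 3)])

# A logical expression as a new column: every row lands in one of two buckets
rows = conn.execute(
    "SELECT name, level, level >= 20 AS level_at_least_20 FROM characters"
).fetchall()
print(rows)
```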
Looking at level, our two buckets could be, for example: one bucket for values greater than or equal to 20, that is, characters whose level is at least 20, and the other bucket for characters whose level is less than 20. How could I define those two buckets? We know that we can define new columns in the SELECT statement, and that we can use calculations and logical statements to define those columns. So one thing I could do is write level >= 20 and call this new column level_at_least_20, for example. When I run this, I get my column. This is a logical statement, so for each row it is either true or false, and our new column shows true or false on every row. This is a really basic form of bucketing: level has eleven different values in our data, which can be complicated to look at all at once, and we've reduced those eleven values to two buckets, so the data is better organized and easier to read.

But there are two limitations to this approach. One: I might not want to call my buckets true and false; I might want to give them more informative names, such as experienced and inexperienced. Two: with this approach I can effectively only divide my data into two buckets, because a logical statement is either true or false, but often I want multiple buckets for my use case. Bucketing is a typical use case for the CASE WHEN statement, so let's see it in action.

Let me first write a comment, not actual code, where I define what I want to do, and then I'll implement it. I have written out the buckets I want to use to classify the characters' level: up to 15 they are considered low experience, between 15 and 25 they are considered mid, and anything above 25 we classify as super. Now let us apply the CASE clause to make this work. The CASE clause is always bookended by two parts, CASE and END: it starts with CASE and it ends with END. A typical error when you're getting started is to forget the END part, so my recommendation is to always write both parts first and then fill in the middle. In the middle we define all the conditions we're interested in. Each condition starts with the keyword WHEN, followed by a logical condition; our first logical condition is level < 15. Then we define what to do when the condition is true, using the keyword THEN: when this condition is true, we want to return the value 'low', a string, a piece of text that says low. Next comes the following condition: WHEN level >= 15 AND level < 25. If you have trouble understanding this logical statement, I suggest you go back to the lecture on Boolean algebra, but in short there are two smaller statements, level greater than or equal to 15 and level below 25, connected by AND, which means both have to be true for the whole statement to be true, which is what we want here; in that case we return the value 'mid'. And the last condition: WHEN level >= 25, THEN we return 'super'. All of this together, the CASE clause (or CASE statement), is defining a new column in my table, and since it's a new column, I can use an alias to give it a name: I'll call it level_bucket. Let's run this and see what we get: we have our level_bucket, the characters above 25 are super, then we have a few mids, and everyone under 15 is low. We got the results we wanted.

Now let us see exactly how the CASE statement works. Take Gandalf, who has level 30. I'll write level = 30, because we're looking at the first row and that is its value of level, and then I'll copy the conditions of the CASE statement we are examining as a comment. Because level equals 30 in this row, I substitute 30 for level everywhere. What we have now is a sequence of logical statements, and we have seen how to work with these in the lecture on Boolean algebra. Our job is to go through each of these logical statements in turn and evaluate them, and as soon as we find one that is true, we stop. The first one is 30 < 15, which is false, so we continue. The second is a more complex statement: 30 >= 15, which is true, AND 30 < 25, which is false, and we know from our Boolean algebra that true AND false evaluates to false; therefore the second statement is also false, and we continue. Next we have 30 >= 25, which is true: we have finally found a line that evaluates to true, which means we return the value 'super'. And as you can see, for Gandalf we have indeed gotten super. Let's look very quickly at one more example: Legolas, who is level 22. Once again I copy the whole thing as a comment and substitute 22 for every occurrence of level, since that's the row we're looking at. The first condition, 22 < 15, is false, so we proceed. The second condition: 22 >= 15 is true and 22 < 25 is also true, so true AND true evaluates to true, and we return 'mid'. And looking at Legolas, we indeed get mid.
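The bucketing logic just walked through can be sketched end to end with sqlite3 (same invented characters as before; the conditions mirror the lecture's low/mid/super buckets):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE characters (name TEXT, level INTEGER)")
conn.executemany("INSERT INTO characters VALUES (?, ?)",
                 [("Gandalf", 30), ("Legolas", 22), ("Pippin", 3)])

# CASE evaluates its WHEN conditions top to bottom for each row and
# returns the value of the first condition that is true
rows = conn.execute("""
    SELECT name, level,
           CASE
               WHEN level < 15                  THEN 'low'
               WHEN level >= 15 AND level < 25  THEN 'mid'
               WHEN level >= 25                 THEN 'super'
           END AS level_bucket
    FROM characters
""").fetchall()
print(rows)
```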
case when statement Works in short for each row you insert the values that correspond to your Row in this case the value of level and then you evaluate each of these logical conditions in turn and as soon as one of them returns true then you return the value that corresponds to that condition and then you move on to the next row now I will clean this up a bit and now looking at this statement now and knowing what we know about the way way it works can we think of a way to optimize it to make it nicer to remove redundancies think about it for a minute now one thing we could do to improve it is to remove this little bit over here because if you think about it this part that I have highlighted is making sure that the character is not under 15 so that it can be classified as meat but actually we already have the first condition that makes sure that if the character is under 15 then the statement will output low and then move on so if the character is under 15 we will never end up in the second statement but if we do end up in the second statement we already know that the character is not under 15 this is due to the fact that case when proceeds condition by condition and exits as soon as the condition is true so effectively I can remove this part over here and then at the second condition only make sure that the level is below 25 and you will see if you run this that our bucketing system works just the same and the other Improvement that I can add is to replace this last line with an else CL Clause so the else Clause takes care of all the cases that did not meet any of the conditions that we specified so the case statement will go condition by condition and look for a condition that’s true but in the end if none of the conditions were true it will return what the else Clause says so it’s like a fallback for the cases when none of our conditions turned out to be true and if you look at our logic you will see that if this has returned false and this has returned false all 
that’s left is characters that have a level which is either 25 or bigger than 25 so it is sufficient to use an else and to call those super and if I run this you will see that our bucketing works just the same for example Gandalf is still marked as super because in the case of Gandalf this condition has returned false and this condition has returned false and so the else output has been written there now what do you think would happen if I completely removed the else what do you think would happen if I only had two conditions but it can be the the case that none of them is true what will SQL do in that case let us try it and see what happens so the typical response in SQL when it doesn’t know what to do is to select the null value right and if you think about it it makes sense because we have specified what happens when level is below 15 and when level is is below 25 but none of these are true and we haven’t specified what we want to do when none of these are true and because we have been silent on this issue SQL has no choice but to put a null value in there so this is practically equivalent to saying else null this is the default behavior for SQL when you don’t specify an else Clause now like every other piece of SQL the case statement is quite flexible for instance you are not forced to always create a text column out of it you can also create an integer column so you could Define a simpler leveling system for your characters by using one and two else three for the higher level characters and uh this of course will also work as you can see here however one thing that you cannot do is to mix types right because what this does is that it results in one column in a new column and as you know in SQL you’re not allowed to mix types between columns so always keep it consistent when it comes to typing and then when it comes to writing the when condition all the computational power of SQL is available so you can reference columns that you are not selecting you can run 
calculations as I am doing here and you can change logical statements right Boolean statements in complex ways you can really do anything you want although I generally suggest to keep it as simple as possible for your sake and the sake of the people who use your code and that is really all you need to know to get started with the case statement to summarize the case statement allows us to define a new columns whose values are changing conditional on the other values of my row this is also called conditional logic which means that we consider several conditions and then we do have different behaviors based on which condition is true and the way it works is that in the select statement when you are mentioning all your columns you create a new column which in our case is this one and you bookend it with a case and end and then between those you write your actual conditions so every condition starts with a when is followed by a logical statement which needs to evaluate to true or false and then has the keyword then and then a value and then the case when statement will go through each of these conditions in turn and as soon as one of them evaluates to true you will output the value that you have specified if none of the conditions evaluate to true then it will output the value that you specify in the else keyword and if the lse keyword is missing it will output null and so this is what you need to use the case statement and then experience and exercise and coding challenges will teach you when it’s the case to use it pun intended now where does the case statement fit in our logical order of SQL operations and the short answer is that it is defined here at the step when you are selecting your columns that’s when you can use the case when statement to create a new column that applies your conditional logic and this is the same as what we’ve shown in the lecture on SQL calculations you you can use select statement not only to get columns which already exist but to Define 
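To make the bucketing logic described above concrete, here is a minimal runnable sketch. It uses SQLite through Python as a stand-in for BigQuery (the engine used in the lecture), and the table contents are a small invented sample; the names and levels follow the lecture's examples.

```python
import sqlite3

# Toy stand-in for the lecture's characters table (SQLite, not BigQuery).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE characters (name TEXT, level INTEGER)")
con.executemany(
    "INSERT INTO characters VALUES (?, ?)",
    [("Gandalf", 30), ("Legolas", 22), ("Pippin", 9)],
)

# CASE evaluates its conditions in order and stops at the first true one;
# ELSE is the fallback, and omitting it would yield NULL instead.
rows = con.execute("""
    SELECT name,
           CASE
               WHEN level < 15 THEN 'low'
               WHEN level < 25 THEN 'mid'
               ELSE 'super'
           END AS bucket
    FROM characters
""").fetchall()
print(rows)  # each name paired with its bucket
```

Note how the simplified second condition only checks `level < 25`: rows with `level < 15` never reach it, exactly as argued in the lecture.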
Now let us talk about aggregations, which are really a staple of any sort of data analysis. An aggregation is a function that takes any number of values and compresses them down to a single informative value. I am looking here at my usual characters table, but this is the version I have in Google Sheets, and as you know it has a level column containing the level of each character. If I select this column in Google Sheets, in the bottom right corner I can see a number of aggregations computed on it. Like I said, no matter how many values the level column contains, an aggregation compresses them down to one value, and here you see some of the most important aggregations you will work with: the sum, which adds all the values together; the average, which is the sum divided by the number of values; the minimum; the maximum; and the count. These are basically summaries of my column, and you can imagine, in cases where you have thousands or millions of values, how useful these aggregations can be for understanding your data.
Here is how I can get the exact same result in SQL: I simply use the functions that SQL provides for this purpose. As you can see, I am asking for the sum, average, minimum, maximum, and count of the level column, and you can see the same results down here. Of course I can also give names to these columns; for example, I could take this one and call it max_level, getting a more informative column name in the result, and I can do the same for all the columns. And of course I can run aggregations on any columns I want; for example, I could also get the maximum of experience and call it max_experience. I can also run aggregations on calculations that involve multiple columns as well as constants, so everything we have seen about applying arithmetic and logic in SQL applies here too.
Now, looking at the characters table, we know that our columns have different data types, and the behavior of the aggregate functions is sensitive to those types. Take the many text columns we have, such as class. Clearly not all of the aggregate functions we have seen will work on class, because how would you take the average of these values? It is not possible. However, some aggregate functions do work on strings. First there is COUNT, which simply counts the total number of non-NULL values (I will give you a bit more detail about the count functions soon). Then there are MIN and MAX: strings in SQL are ordered in something called lexicographic order, which is basically a fancy word for alphabetical order, so for MIN we get the text value that occurs earliest alphabetically, whereas "Warrior" occurs last. And finally, here is an interesting one called STRING_AGG. This is a function that takes two arguments: the first, as usual, is the name of the column, and the second is a separator. What it outputs is a single string, a single piece of text, in which all the other pieces of text have been glued together, separated by the character we specified, in our case a comma.
Now, if you go to the Google documentation you will find an extensive list of all the aggregate functions you can use in GoogleSQL, including the ones we have just seen, such as AVG and MAX, as well as a few others that we will not explore in detail here. Let us select one of them, such as AVG, and see what the description looks like. You can see that this function returns the average of all values that are not NULL. Do not worry about the expression "in an aggregated group" for now; just read it as "all the values that you provide to the function", all the values in the column. There is a bit about window functions, which we will see later, and in the caveats section there are some interesting edge cases: for example, what happens if you use AVG on an empty group, or if all values are NULL? In that case it returns NULL, and so on; you can see what the function does when it hits these edge cases. Perhaps the most important section is "Supported argument types", which tells you what types of columns you can use the aggregation function on. You can see that AVG works on any numeric input type, that is, any column that contains some kind of number, and also on INTERVAL. We have not examined INTERVAL in detail, but it is a data type that represents a span of time: an interval could express something like 2 hours, or 4 days, or 3 months; it is a quantity of time. Finally, in the "returned data types" table you can see what AVG will give you based on the input data type. If you pass an integer column it returns a float column, and that makes sense, because the average involves a division, and that division will usually produce floating-point values; for the other allowed input types, such as NUMERIC and BIGNUMERIC (all data types that represent numbers in BigQuery), AVG, as you can see here, preserves the data type. And finally there are some examples. So whenever you need an aggregate function, that is, whenever you need to take a sequence of many values and compress them down to one value, but you are not sure which function to use or how it behaves, come to this page, look up the function that interests you, and read the documentation to see how it works.
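As a runnable illustration of the aggregations above, here is a sketch using SQLite via Python rather than BigQuery. One assumption to flag: SQLite spells the string aggregation `group_concat`, where GoogleSQL has `STRING_AGG`; the sample rows are invented.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE characters (name TEXT, class TEXT, level INTEGER)")
con.executemany(
    "INSERT INTO characters VALUES (?, ?, ?)",
    [("Gandalf", "Mage", 30), ("Legolas", "Archer", 22), ("Gimli", "Warrior", 18)],
)

# Numeric aggregations: many values in, one value out.
sum_l, avg_l, min_l, max_l, cnt = con.execute(
    "SELECT SUM(level), AVG(level), MIN(level), MAX(level), COUNT(level) FROM characters"
).fetchone()
print(sum_l, min_l, max_l, cnt)  # 70 18 30 3

# String aggregations: MIN/MAX use lexicographic (alphabetical) order;
# group_concat glues all the values together with a separator,
# playing the role STRING_AGG plays in BigQuery.
min_c, max_c, joined = con.execute(
    "SELECT MIN(class), MAX(class), group_concat(class, ', ') FROM characters"
).fetchone()
print(min_c, max_c)  # Archer Warrior
```

Note that `AVG` comes back as a float (70 / 3 ≈ 23.33) even though the inputs are integers, matching the documentation behavior discussed above.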
Now here is an error that typically occurs when starting out with aggregations. You might say: I want the name of each character and their level, but I also want to see the average of all levels, because I want to compare each character's level with the average. So I write a query like this: go to the fantasy characters table and select name, level, and AVG(level). But as you can already see, this query is not functioning; it gives an error saying that the SELECT list expression references column name, which is neither grouped nor aggregated. What does this actually mean? To show you, I have gone back to my Google Sheets, where I have the same characters data, and copy-pasted our query. Now let us trace what this query does: it takes the name column, which I will copy here; it takes the level column, copied here as well; and it computes the average over level, which I can easily compute with a sheet formula by writing "=" and calling the function, which is actually called AVERAGE, then selecting all these values. This is the result that SQL computes, but SQL is not able to return it, and the reason is that there are three columns with mismatched numbers of values: these two columns have 15 values each, whereas this column has a single value. SQL cannot handle this mismatch, because as a rule every SQL query must return a table, and a table is a series of columns where each column has the same number of values. If that constraint is not respected, you get an error. We will come back to this limitation when we examine advanced aggregation techniques, but for now just remember: you can mix non-aggregated columns with other non-aggregated columns, such as name and level, and you can mix aggregated columns with other aggregated columns, for example AVG(level) with SUM(level). I could simply do that, and I would be able to return it as a table: as you can see, there are two columns, both with a single row, so the number of rows matches, and this is valid.
But you might ask: can't I simply take this single average value and copy it into every row, so that the average column has the same number of values as name and level, respecting the constraint, and return the whole thing as one table? Indeed, this is possible; you can totally do it, and it does solve the problem. However, it requires the use of window functions, a feature that we will see in later lectures.
Now, here is a special aggregation expression that you should know about because it is often used: COUNT(*), count star. COUNT(*) simply counts the total number of rows in a table, and as you can see, if I say FROM fantasy characters SELECT COUNT(*), I get the total count of rows in my results. This is a common expression used across all SQL systems to figure out how many rows a table has. You can also combine it with filters, with the WHERE clause, to get other kinds of measures: for example, I could say WHERE is_alive = true, and then the count becomes the count of characters who are alive in my data. So this is a universal way to count rows in SQL, although you should know that if you are simply interested in the total rows of a table and you are working with BigQuery, an easy and totally free way to get it is to go to the Details tab and look at the number of rows there. And that was all I wanted to tell you about simple aggregations for now.
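The COUNT(*) idea above, with and without a filter, can be sketched like this (again SQLite through Python as a stand-in for BigQuery; the `is_alive` values are invented sample data):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE characters (name TEXT, is_alive INTEGER)")
con.executemany(
    "INSERT INTO characters VALUES (?, ?)",
    [("Gandalf", 1), ("Legolas", 1), ("Boromir", 0)],
)

# COUNT(*) counts rows; adding a WHERE clause turns it into a count
# of only the rows that pass the filter.
total = con.execute("SELECT COUNT(*) FROM characters").fetchone()[0]
alive = con.execute(
    "SELECT COUNT(*) FROM characters WHERE is_alive = 1"
).fetchone()[0]
print(total, alive)  # 3 2
```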
One last question: why do we call them simple aggregations, simple as opposed to what? I call them simple because, the way we have seen them until now, an aggregation takes all of the values of a column and simply returns one summary value; for example, the SUM aggregation takes all the values of the level column and returns a single number, the sum of all levels. More advanced aggregations involve grouping our data. For example, a question we might ask is: what is the average level for mages, as opposed to the average level for archers, and for hobbits, and for warriors, and so on? Then you are computing aggregations not over your whole data but over groups that you find in your data, and we will see how to do that in the lecture on GROUP BY. But for now, you can already find out a lot of interesting things about your data by running simple aggregations.
Let us now look at subqueries and common table expressions, two fundamental functionalities in SQL. These functionalities solve a very specific problem, and the problem is the following: sometimes you just cannot get the result you require with a single query; sometimes you have to combine multiple SQL queries to get where you need to go. Here is a fun problem that will illustrate my point. We are looking at the characters table and we have this requirement: we want to find all those characters whose experience is between the minimum and the maximum value of experience. Another way to say this: we want characters who are more experienced than the least experienced character but less experienced than the most experienced character; we want to find the middle ground between the two extremes. Let us see how we could do that. I have a simple start here, where I am getting the name and experience columns from the characters table. Let us focus on the first half of the problem: find characters who have more experience than the least experienced character. Because this is a toy data set, I can sort of eyeball it: scrolling down, I can see that the lowest value of experience is Pippin's, with 2,100, so what I need to do is filter out of this table the rows with that level of experience. But apart from eyeballing, how would we find the lowest level of experience in our data? If you thought of aggregate functions, you are right: we saw in a previous lecture that aggregate functions take any number of values and spit out a single summary value, for example the mean, minimum, and maximum, and indeed we need a function like that for this problem.
Your first instinct might be to filter the rows like this: WHERE experience > MIN(experience). On the surface this makes sense: I am using an aggregation to get the smallest value of experience and then keeping only rows with a higher value. However, as you see from this red underline, this does not work; it tells us an aggregate function is not allowed in the WHERE clause. So what is going on here? If you followed the lecture on aggregation you might have a clue as to why this fails, but it is good to go back and understand exactly what the problem is. I am back in my Google Sheet, where I have the exact same data, and I have copied our current query below it. Let us see what happens when SQL tries to run this. SQL goes to the fantasy characters table, and the second step in the logical order, as you remember, is to filter it. For the filter it has to take the experience column, so let me copy that column down here, and it has to compute MIN(experience), so I will define that column here using the Google Sheets MIN function over those numbers, which gives me the minimum value of experience. Now SQL has to compare these columns, but this comparison does not work, because the two columns have a different number of rows: you cannot do an element-by-element comparison between a column that has 15 values and a column that has a single value, so SQL throws an error. You might say: wait, there is a simple solution, just take this single value and copy it down until the two columns have the same size, and then do the comparison. Indeed that would work; that is a solution, but SQL does not do it automatically. If you work with other analytics tools, such as pandas in Python or NumPy, you will find that in a situation like this the value would be copied down automatically, through a process called broadcasting, but SQL does not make that kind of assumption or take that kind of risk with your data: if it literally does not work, SQL will not do it. Hopefully you now have a better understanding of why this solution fails.
So how could we actually approach the problem? The insight is that I can run a different query, which I will open on the right, to find the minimum experience: I go back to the characters table and SELECT MIN(experience). This is simply what we learned to do in the lecture on aggregations, and I get the value here, the minimum value of experience. Now that I know it, I could simply copy this value and insert it into a WHERE filter, and if I run this, it actually works; it solves my problem. The issue, of course, is that I do not want to hardcode this value: first, it is not very practical to run a separate query and copy-paste the value into the code, and second, the minimum value might change someday, I might not remember to update it, and then this whole query would become invalid. To solve this problem I will use a subquery. I simply delete the hardcoded value, open round brackets, which is how you get started on a subquery, take the query I have on the right, and put it inside the brackets. When I run this, I get the result I need.
So what exactly is going on here? We are using a subquery, in other words a query within a query. When SQL looks at this code it says: all right, this is the outer query, and it has an inner query inside it, a nested query, so I have to start with the innermost query. SQL runs the nested query first and gets a value out of it, which in our case we know is 2,100; then SQL substitutes the computed value for that piece of code, and we know from before that this works as expected. To handle the other half of our problem, we want our character to have less experience than the most experienced character. This is just another condition in the WHERE filter, so I add an AND, copy the code over, and change it so that experience must now be smaller than the MAX(experience) of my table. You might know this trick: if you select only part of your code and then click Run, SQL will execute only that part. Doing so here shows us the actual maximum experience, which I will note in a comment. So we know that when SQL runs this query, the subquery will compute to 15,000, experience will be compared against that, and the query will work as intended. And here is the solution to our problem.
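The full between-min-and-max solution described above can be sketched in runnable form. This uses SQLite via Python as a stand-in for BigQuery; Pippin's 2,100 and the 15,000 maximum follow the lecture, while the other rows are invented for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE characters (name TEXT, experience INTEGER)")
con.executemany(
    "INSERT INTO characters VALUES (?, ?)",
    [("Pippin", 2100), ("Saruman", 8500), ("Gandalf", 10000), ("Sauron", 15000)],
)

# Each uncorrelated subquery is evaluated once; its single value then
# stands in for the bracketed code inside the WHERE clause, which is
# exactly the substitution step described in the lecture.
rows = con.execute("""
    SELECT name, experience
    FROM characters
    WHERE experience > (SELECT MIN(experience) FROM characters)
      AND experience < (SELECT MAX(experience) FROM characters)
""").fetchall()
print(rows)
```

Only the characters strictly between the two extremes survive the filter; the minimum and maximum holders themselves are excluded.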
Now here is a second problem, which shows another side of subqueries: we want to find the difference between a character's experience and their mentor's. Let us first solve it manually for one case in the characters table. Take this character here, Saruman, with ID 11 and 8,500 experience. Saruman has character ID 6 as their mentor, and if I look up ID 6 we have Gandalf (this is not very canon compared to the story, but let's just roll with it), who has 10,000 experience. If we subtract Saruman's experience from Gandalf's, there is a 1,500 difference between them, and this is what I want to find with my query.
Back to my query, I will first alias my columns to make them more informative, and this is a great trick for making problems clearer in your head: assign the right names to things. So instead of id I will call this mentee_id, here I have mentor_id, and instead of experience I will call this mentee_experience. I have just renamed my columns. The missing piece of the puzzle is the mentor's experience. For example, in the first row I know that character 11 is mentored by character 6, so how can I get the experience of character 6? Of course, I can open a new tab, split it to the right, go to fantasy.characters, filter for ID equal to 6, which is the ID of our mentor, and get their experience, in this case 10,000. This is the same example we saw before, but now I would have to write a separate query like this for each of my rows: I have already checked 6, but then I would need to check 2, and 7, and 1, and this is really not feasible. The solution, of course, is a subquery. So what I am going to do is open round brackets, and inside them write the code I need, which I can simply copy from the side query: get experience from the characters table where id equals 6.
The 6 is still hardcoded, because in the first row mentor_id happens to be 6. Avoiding that hardcoding has two components. The first is noticing that I am referencing the same table, fantasy.characters, in two different places in my code, which could get buggy and confusing; the solution is to give separate names to these two instances. What are the right names? If we look at the outer query, this is really information about the mentee: we have the mentee's ID, the ID of their mentor, and the mentee's experience, so I can simply call it the mentee table. As you can see, I can alias a table by simply writing the name after it, or I can add the AS keyword; it works just the same. The inner table, on the other hand, will give us the experience of the mentor; this is really information about the mentor, so we can call it the mentor table. Now we are not going to get confused anymore, because the two instances have different names. The second component: what do we want this ID to be, if we are not going to hardcode it? We want it to be this value over here, the mentor_id value from the mentee table, the mentee's mentor, and to refer to that column I write the table name, a dot, and the column name: this says "get the mentor_id value from the mentee table". Now that the subquery between the brackets defines a column, I can alias the result just like I always do and run this, and after making some room you will see that we have successfully retrieved the experience value of the mentor.
Now, I realize this is not the simplest process, so let us go back over the query and make sure we understand exactly what is happening. First of all, we go to the characters table, which contains information about our mentee, the person who is being mentored, and we label the table so we remember what it is about. We filter it, because we are not interested in characters that do not have a mentor. Then we get a few pieces of data: the ID, which in this case represents the ID of the mentee; their mentor_id; and the experience, which, again, since this is the mentee table, represents the mentee's experience. Our goal is to also get the experience of their mentor: seeing that we have mentor_id 6, we want to know that this mentor's experience is 10,000, and we do that with a subquery, a query within a query. In this subquery, which is an independent piece of SQL code, we go back to the characters table, but this is another instance of the table, so to make sure we remember that, we call it the mentor table, because it contains information about the mentor. And how do we make sure we get the right value, that we do not get confused between separate mentors? We make sure that, for each row, the ID of the character in the mentor table equals the mentor_id value in the mentee table; in other words, we plug this value, in this case 6, into the mentor table to get the right row, and from that row we take the experience value. All of this code defines a new column, which we call mentor_experience, and this is basically the same thing we did manually when we opened a table on the right, queried it, and copy-pasted a hardcoded value; the subquery is just the way to do it dynamically.
We are not fully done with the problem, because we wanted the difference between the character's experience and their mentor's, and the way to get it is a column calculation, just like the ones we have seen before. Given that this expression represents the mentor's experience, I can remove the alias here, subtract the mentee's experience from it, and a column minus a column gives me another column, which I can then alias as experience_difference. If I run this, I will see the value we originally computed manually, the difference between the mentor's and the mentee's experience. There is nothing really new about this, as long as you realize that this expression defines a column and this is a reference to a column, so you can subtract them and give the result a name, an alias.
And now we can look at our two examples of nested queries side by side and ask what they have in common and where they differ. What they have in common is that both are problems you cannot solve with a simple query, because you need values that have to be computed separately, values you cannot simply refer to by name the way we usually do with our columns. In the case on the left, you need to know the minimum and maximum values of experience; in the case on the right, you need to know the experience of a character's mentor. We solved that by writing a new, nested query, making sure SQL solves that query first, gets the result, and plugs the result back into the original query to get the data we need. There is, however, a subtle difference between these two queries that turns out to be pretty important in practice, and I can give you a clue by telling you that on the right we have what is called a correlated subquery, while on the left we have uncorrelated subqueries. What does this really mean? On the left, our subqueries compute the minimum and the maximum experience, and these are fixed values for all of our characters: it does not matter which character you are looking at, the whole data set has the same minimum and maximum experience. You could even imagine computing these values first, before running your query, saying min_experience is the minimum and max_experience is the maximum, and then replacing those values in the query. This will not actually work, because you cannot define variables like this in SQL, but on a logical level you can imagine doing it, because each value only needs to be computed once.
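The mentor lookup built above is the correlated case, and it can be sketched in runnable form. Again SQLite via Python stands in for BigQuery; Gandalf (ID 6) and Saruman (ID 11, mentor 6) follow the lecture's example, and the 1,500 difference matches the value computed manually.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE characters "
    "(id INTEGER, name TEXT, experience INTEGER, mentor_id INTEGER)"
)
con.executemany(
    "INSERT INTO characters VALUES (?, ?, ?, ?)",
    [(6, "Gandalf", 10000, None), (11, "Saruman", 8500, 6)],
)

# The subquery is correlated: mentee.mentor_id changes from row to row,
# so the inner query must (logically) be re-run for every mentee row.
rows = con.execute("""
    SELECT mentee.name,
           (SELECT mentor.experience
              FROM characters AS mentor
             WHERE mentor.id = mentee.mentor_id) - mentee.experience
               AS experience_difference
    FROM characters AS mentee
    WHERE mentee.mentor_id IS NOT NULL
""").fetchall()
print(rows)  # [('Saruman', 1500)]
```

Aliasing the two instances of the same table as `mentee` and `mentor` is exactly the disambiguation step described in the lecture.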
will revert this here so we don’t get confused on the other hand on the right you will see that the value that is returned by sub by this subquery needs to be computed dynamically for every row this value as you also see in the results is different for every row because every row references a different Mentor ID and so SQL cannot compute this one value here for for all rows at once it has to recompute it for every row and this is why we call it a correlated subquery because it’s connected to the value that is in each row and so it must run for each row and an important reason to distinguish between uncorrelated and correlated subqueries is that you can imagine that correlated subqueries are actually slow slower and more expensive to run because you have you’re running a SQL query for every row at least At The Logical level so this was our introduction to subqueries they allow you to implement more complex logic and as long as you understand it logically you’re off to a great start and then by doing exercises and solving problems you will learn with experience when it’s the case to use them in the last lecture we saw that we could use subqueries to retrieve singular values for example what is the minimum value of experience in my data set but we can also use subqueries and Common Table Expressions as well to create new tables all together so here’s a motivating example for that so what I’m doing in this query right here is that I am scaling the value of level based on the character’s class and you might need this in order to create some balance in your game or for whatever reason now what this does is that if the character is Mage the level gets divided by half or multiplied by 0.5 if the character is Archer or Warrior the level we take the 75% of it and in all other cases the level gains 50% so the details are not very important it’s just an example but the point is that we modify the value of level based on the character class and we do this with the case when 
statement that we saw in a previous lecture. And as you can see in the results, we get a new value of power level for each character. But now let's say that I wanted to filter my characters based on this new power level column, say that I wanted to only keep characters that have a power level of at least 15. How would I do that? Well, we know that the WHERE filter can be used to filter rows, so you might just want to go here and add a WHERE statement and say WHERE power level is greater than or equal to 15. But this is not going to work. We know it cannot work because we know how the logical order of SQL operations works: the CASE WHEN column that we create, power level, is defined here at the SELECT stage, but the WHERE filter occurs here at the beginning, right after we source our table. So, due to our rules, the WHERE component cannot know about this power level column that will actually get created later. The query that we just wrote violates the logical order of SQL operations, and this is why we cannot filter here. Now, there is actually one thing that I could do to avoid using a subquery and get around this error, and that would be to avoid using this alias, power level, which the WHERE statement cannot know about, and replace it with the whole logic of the CASE WHEN statement. This is going to look pretty ugly, but I'm going to do it, and if I run this you will see that we in fact get the result we wanted. In the WHERE lecture we saw that the WHERE clause doesn't just accept simple logical statements; you can use all the calculations and techniques that are available to you at the SELECT stage, including CASE WHEN statements, and this is why this solution actually works. However, it is obviously very ugly and impractical, and you should never duplicate code like this, so I'm going to remove this WHERE clause and show you how you can achieve the same result with a
subquery. So let me first rerun this query over here so that you can see the results. Now what I'm going to do is select this whole logic over here and wrap it in round brackets, and then up here I'm going to say SELECT * FROM, and when I run this new query, the data that I'm seeing over here should be unchanged. So let us run it, and you will see that the data has not changed at all. But what is actually happening here? Well, it's pretty simple. Usually we say SELECT * FROM fantasy characters, and by this we indicate the name of a table that our system can access. But now, instead of a table name, we are supplying a subquery, and this subquery is a piece of SQL logic that obviously returns a table. So SQL will look at this whole code and will say: okay, there is an outer query, which is this one, and there is an inner query, a nested query, which is this one; I will compute this one first, and then I will treat it as just another table that I can select from. And now, because this is just another table, we can actually apply a WHERE filter on top of it. We can say WHERE power level is greater than or equal to 15, and you will see that we get the result we wanted, just like before, but now our code actually looks better and the CASE WHEN logic is not duplicated. If you wanted to visualize this in our schema, it would look something like this. The flow of data is the following: first we run the inner query, which works just like all the other queries we've seen until now. It starts with the FROM component, which gets the table from the database, and then it goes through the usual pipeline of SQL logic that eventually produces a result, which is a table. Next, that table gets piped into the outer query. The outer query also starts with the FROM component, but now the FROM component is not reading directly from the database; it is reading the result of the inner query. And now the outer query goes through the usual pipeline of components and finally produces a table, and
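Put together, the derived-table version might look like this sketch (the `class` and `level` column names are my assumption; note that BigQuery accepts an unaliased derived table, while some other engines would require an alias such as `AS t`):

```sql
SELECT *
FROM (
  SELECT
    name,
    CASE
      WHEN class = 'Mage' THEN level * 0.5            -- Mages: half the level
      WHEN class IN ('Archer', 'Warrior') THEN level * 0.75
      ELSE level * 1.5                                -- everyone else gains 50%
    END AS power_level
  FROM fantasy.characters
)
-- Legal here: power_level already exists in the inner query's result.
WHERE power_level >= 15;
```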
that table is our result. And this process could have many levels of nesting, because the inner query could reference another query, which references another query, and eventually we would get to the database, but it could take many steps to get there. To demonstrate how multiple levels of nesting work, I will go back to my query over here, and I will go into my inner query, which is this one, and this is clearly referencing the table in the database. But now, instead of referencing the table, I will reference yet another subquery, which can be something like: from fantasy characters where is alive equals true, select star. I will now run this, and we have added yet another subquery to our code. This was actually not necessary at all; you could add the WHERE filter up here, but it is just to demonstrate the fact that you can nest a lot of queries within each other. The other reason I wanted to show you this code is that I hope you will recognize that this is also not a great way of writing code. It can get quite confusing, and it's not something that can be easily read and understood. One major issue is that it interrupts the natural flow of reading code, because you constantly have to interrupt a query when another nested query begins within it. You will read SELECT * FROM, and then here another query starts, and this one is also querying from another subquery, and after reading all of these lines you will find a WHERE filter that actually refers to the outer query that started many, many lines back. And if you find this confusing, well, I think you're right, because it is. The truth is that when you read code on the job, or in the wild, or when you see solutions that people propose to coding challenges, unfortunately this occurs a lot: you have subqueries within subqueries within subqueries, and very quickly the code becomes impossible to read. Fortunately, there is a better way to handle this, a way that I definitely recommend over this, which is to use
Common Table Expressions, which we shall see shortly. It is, however, very important that you understand this way of writing subqueries and familiarize yourself with it, because, whether we like it or not, a lot of code out there is written like this. We've seen that we can use the subquery functionality to define a new table on the fly just by writing some code, a new table that we can then query just like any other SQL table. What this allows us to do is run jobs that are too complex for a single query, and to do so without defining and storing new tables in our database. It is essentially a tool to manage complexity. And this is how it works for subqueries: instead of saying FROM and then the name of a table, we open round brackets and write an independent SQL query in there, and we know that every SQL query returns a table, and this is the table that we can then work on. What we do here is select star from this table and then apply a filter on the new column that we created in the subquery, power level. Now I will show you another way to achieve the same result, through a functionality called Common Table Expressions. To build a Common Table Expression, I will take the logic of this query right here and move it up, and next I will give a name to this table: I will call it power level table. Then all I need to say is WITH power level table AS, followed by the logic, and now this is just another table that is available in my query, defined by the logic of what occurs within the round brackets. So I can refer to it over here and query it just like I need, and when I run this, you see that we get the same results as before. And this is how a Common Table Expression works: you start with the keyword WITH, you give an alias to the table that you're going to create, you put AS, open round brackets, and write an independent query that will of course return a table under this alias, and then in your
code you can query this alias just like you've done until now for any SQL table. And although our data result hasn't changed, I would argue that this is a better and more elegant way to achieve the same result, because we have separated in the code the logic for these two different tables. Instead of putting this logic in between this query and sort of breaking its flow, we now have a much cleaner solution: first we define the virtual table that we will need (by virtual I mean that we treat it like a table, but it's not actually saved in our database; it's still defined by our code), and then below that we have the logic that uses this virtual table. We can also have multiple Common Table Expressions in our query. Let me show you what that looks like. In our previous example on subqueries, we added another part where, instead of querying the fantasy characters table directly, we queried a filtered version of the characters table, and it looked like this: we were doing select star where is alive equals true. So I'm just reproducing what I did in the previous lecture on subqueries. Now, you will notice that this is really not necessary, because all we're doing here is adding a WHERE filter, and we could do that in this query directly, but please bear with me, because I just want to show you how to handle multiple queries. The second thing I want to tell you is that, although this code actually works, and you can verify that for yourself, I do not recommend doing this, meaning mixing Common Table Expressions and subqueries. It is really not advisable, because it adds unnecessary complexity to your code. So here we have a Common Table Expression that contains a subquery, and I will rather turn this into a situation where we have two Common Table Expressions and no subqueries at all. To do that, I will take this logic over here and paste it at the top, and I will give it an alias: I will call it characters alive, but you can call it whatever is best for you. And then I will add the keyword
AS, and add some lines in here to make it more readable. Now, once we are defining multiple Common Table Expressions, we only need to write the WITH keyword once, at the beginning, and then we can simply add a comma (please remember this, the comma is very important), and then we have the alias of the new table, the AS keyword, and then the logic for that table. All that's needed now is to fill in this FROM, because we took away the subquery, and we need to query the characters alive virtual table here. This is what it looks like, and if you run it, you will get your result. So this is what the syntax looks like when you have multiple Common Table Expressions: you start with the keyword WITH, which you only need once, and then you give the alias of your first table, the AS keyword, and then the logic between round brackets. Then, for every extra virtual table that you want to add, for every extra Common Table Expression, you only need to add a comma, another alias, the AS keyword, and then the logic between round brackets. When you are done listing your Common Table Expressions, you will omit the comma; you will not have a comma there, because it would break your code. And finally you will write your main query. In each of these queries that you can see here, you are totally free to query real tables, material tables that exist in your database, as well as Common Table Expressions that you have defined in this code; in fact, you can see that our second virtual table here is querying the first one. However, be advised that the order in which you write these Common Table Expressions matters, because a Common Table Expression can only reference Common Table Expressions that came before it; it's not going to be able to see those that came after it. So if here, instead of from fantasy characters, I try to query from power level table, you will see that I get an error from BigQuery, because it doesn't recognize it, basically because that code is below. So
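Sketched in full, the two-CTE version might look like this (column names are the assumptions used earlier; note the single WITH, the comma between CTEs, and no comma before the main query):

```sql
WITH characters_alive AS (
  SELECT *
  FROM fantasy.characters
  WHERE is_alive = TRUE
),
power_level_table AS (        -- a comma separates CTEs; WITH appears only once
  SELECT
    name,
    CASE
      WHEN class = 'Mage' THEN level * 0.5
      WHEN class IN ('Archer', 'Warrior') THEN level * 0.75
      ELSE level * 1.5
    END AS power_level
  FROM characters_alive       -- a CTE may reference an earlier CTE, not a later one
)
SELECT *                      -- main query: no comma before it
FROM power_level_table
WHERE power_level >= 15;
```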
the order in which you write them matters. Now, an important question to ask is: when should I use subqueries and when should I use Common Table Expressions? The truth is that they have basically equivalent functionality; what you can do with a subquery, you can do with a Common Table Expression. My very opinionated advice is that every time you need to define a new table in your code, you should use a Common Table Expression, because they are simpler, easier to understand, cleaner, and they will make your code more professional. In fact, I can tell you that in the industry it is a best practice to use Common Table Expressions instead of subqueries, and if I were to interview you for a data job, I would definitely pay attention to this issue. But there is an exception, and this is the reason why I'm showing you this query, which we wrote in a previous lecture on subqueries. This is a query where you need to get a single specific value. If you remember, we wanted to get characters whose experience is above the minimum experience in the data and also below the maximum experience, so characters that are in the middle. To do this, we need to find dynamically, at the moment the query is run, what the minimum and maximum experience are, and a subquery is actually great for that. You will notice that here we don't really need to define a whole new table; we just need to get a specific value, and this is where a subquery works well, because it implements very simple logic and doesn't actually break the flow of the query. But for something more complex, like the power level table, this specific query we're using here, which takes the name, takes the level, then applies CASE WHEN logic to level to create a new column called power level, you could do it with a subquery, but I actually recommend doing it with a Common Table Expression. And here is a cool blog post on this topic by the company dbt. It talks about Common Table
Expressions in SQL, why they are so useful for writing complex SQL code, and the best practices for using Common Table Expressions. Towards the end of the article there's also an interesting comparison between Common Table Expressions and subqueries. You can see that CTEs, Common Table Expressions, are more readable, whereas subqueries are less readable, especially if there are many nested ones; a subquery within a subquery within a subquery quickly becomes unreadable. Reusability is a great advantage of CTEs (the article also mentions recursion, which we won't examine in detail): once you define a Common Table Expression in your code, you can reuse it in any part of your code. You can use it in multiple places: in other CTEs, in your main query, and so on. On the other hand, once you define a subquery, you can really only use it in the query in which you defined it; you cannot use it in other parts of your code, and this is another disadvantage. A less important factor is that when you define a CTE you always need to give it a name, whereas subqueries can be anonymous. You can see it very well here: we of course had to give a name to both of these CTEs, but the subqueries that we're using here are anonymous. However, I wouldn't say that's a huge difference. And finally, you have that CTEs cannot be used in a WHERE clause, whereas subqueries can, and this is exactly the example that I've shown you here, because this is a simple value that we want to use in our WHERE clause in order to filter our table. Subqueries are the perfect use case for this, whereas CTEs are suitable for more complex use cases, when you need to define entire tables. In conclusion, the article says CTEs are essentially temporary views that you can use. I've used the term virtual table, but temporary view works just as well; it conveys the same idea. They are great for giving your SQL more structure and readability, and they also allow reusability. Before we move
on to other topics, I wanted to show you what an amazing tool Common Table Expressions are for creating complex data workflows, because Common Table Expressions are not just a trick to execute certain SQL queries; they're actually a tool that allows us to build data pipelines within our SQL code, and that can really give us data superpowers. Here I have drawn a typical workflow that you will see in complex SQL queries that make use of Common Table Expressions. What we're looking at here is a single SQL query; it's, however, a complex one, because it uses CTEs. The query is represented graphically here and as a simple code reference here. The blue rectangles represent the Common Table Expressions, the virtual tables that you can define with the CTE syntax, whereas the red square represents the base query, the query at the bottom of your code that will ultimately return the result. A typical flow will look like this: you will have a first Common Table Expression, called T1, which is a query that references a real table, a table that actually exists in your data set, such as fantasy characters, and of course this query will do some work; it can apply filters, it can calculate new columns, and so on, everything that we've seen until now. Then the result of this query gets piped into another Common Table Expression, this one is T2, which takes the result of whatever happened at T1 and applies some further logic to it, some more transformations, and then again the result gets piped into another table where more transformations run, and this can happen for any number of steps until you get to the final query. In the base query we finally compute the end result that will then be returned to the user. So this is effectively a data pipeline that gets data from the source and then applies a series of complex transformations, and this is similar to the logical schema that we've been seeing for SQL, except that this is one level further, because in our usual schema
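A minimal sketch of that chained T1 → T2 → T3 → base-query pattern (the intermediate transformations here are placeholders of my own invention, not from the course):

```sql
WITH t1 AS (
  SELECT *                                     -- first step reads the real table
  FROM fantasy.characters
  WHERE is_alive = TRUE
),
t2 AS (
  SELECT *, level * experience AS raw_score   -- hypothetical transformation
  FROM t1                                     -- reads t1's result, not the database
),
t3 AS (
  SELECT name, raw_score
  FROM t2
  WHERE raw_score > 100                       -- another hypothetical step
)
SELECT *                                      -- base query: produces the final result
FROM t3
ORDER BY raw_score DESC;
```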
the steps are done by clauses, by these components of SQL queries, but here every step is actually a query in itself. Of course this is a very powerful feature: the data pipeline applies many queries sequentially until it produces the final result, and you can do a lot with this capability. You should also now be able to understand how this is implemented in code. We have our usual CTE syntax: WITH, and then the first table, which we call T1, and then here we have the logic for T1 within round brackets, and you can see that in the FROM we are referencing a table in the data set. Then, for every successive Common Table Expression, we just add a comma, a new alias, and the logic; comma, new alias, and the logic. Finally, when we're done, we write our base query, and you can see that the base query is selecting from T3, T3 is selecting from T2, T2 is selecting from T1, and T1 is selecting from the database. But you are not limited to this type of workflow. Here is another, maybe slightly more complex, workflow that you will also see in the wild. Here you can see that at the top we have two Common Table Expressions that reference the database: the first one is getting data from table one and transforming it, the second one is getting data from table two and transforming it. Next we have the third CTE, which is actually combining data from these two tables over here. We haven't yet seen how to combine data, except through the union (I wrote "join" here, which we're going to see shortly), but all you need to know is that T3 is combining data from these two parent tables. And then finally, the base query is not only using the data from T3 but also going back to T1 and using that data as well. You remember we said that a great thing about CTEs is that tables are reusable: you define them once and then you can use them anywhere. Well, here's an example with T1, because T1 is defined here at the top of the code, and then it is
referenced by T3, but it is also referenced by the base query. So this is another example of a workflow that you could have, and really the limit here is your imagination and the complexity of your needs; you can have complex workflows such as this one, which can implement very complex data requirements. So this was a short overview of the power of CTEs, and I hope you're excited to learn about them and use them in your SQL challenges. We now move on to joins, which are a powerful way to bring many different tables together and combine their information, and I'm going to start us off with a little motivating example. On the left here I see my characters table, and by now we're familiar with this table. Let's say that I wanted to know, for each character, how many items they are carrying in their inventory. You will notice that this information is not available in the characters table; however, it is available in the inventory table. So how exactly does the inventory table work? When you are looking at a table for the first time and you want to understand how it works, the best question you can ask is the following: what does each row represent? So what does each row represent in this table? If we look at the columns, we can see that for every row of this table we have a specific character ID and an item ID, as well as a quantity and some other information, such as whether the item is equipped, when it was purchased, and so on. Looking at this, I realize that each row in this table represents a fact: the fact that a character has an item. So I know, by looking at this table, that character ID 2 has item 101, and character ID 3 has item 6, and so on, and clearly I can use this in order to answer my question. So how many items is Gandalf carrying? To find this out, I have to look up the ID of Gandalf, which as you can see here is 6, and then I have to go to the inventory table and, in the character ID column, look for the ID of Gandalf.
Now, unfortunately, it's not ordered, but I can look for myself here, and I can see that at least this row is related to Gandalf, because it has character ID 6, and I can see that Gandalf has item ID 16 in his inventory. I'm actually seeing another one now, which is this one, 11, and I'm not seeing any other item at the moment. So for now, based on my imperfect visual analysis, I can say that Gandalf has two items in his inventory. Of course, our analysis skills are not limited to eyeballing stuff; we have learned that we can search a table for the information we need. So I could go here and query the inventory table in a new tab, and I could say: give me all the columns from the inventory table where character ID equals 6; this should give me all the information for Gandalf. And when I run this, I should see that indeed we have two rows here, and we know that Gandalf has items 16 and 11 in his inventory. We don't know exactly what these items are, but we know that he's carrying two items, so that's a good start. Okay, but what if I wanted to know which items Frodo is carrying? Well, again, I can go to the characters table and look up the name Frodo, and I find out that Frodo is ID 4, so going here I can just plug that number into my WHERE filter, and I will find out that Frodo is carrying a single type of item, which has ID 9, although in a quantity of two. Of course I could go on and do this for every character, but it is quite impractical to change the filter every time, and what if I wanted to know how many items each character is carrying, or at least which items each character is carrying, all at once? Well, this is where joins come into play. What I really want to do in this case is combine these two tables into one, and by bringing them together create a new table which will have all of the information that I need. So let's see how to do this. Now, the first question we must answer is: what unites these two tables,
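The one-character-at-a-time lookups described here might look like this, assuming the table lives at `fantasy.inventory` with a `character_id` column, as the course dataset suggests:

```sql
-- All inventory rows for Gandalf (character ID 6):
SELECT *
FROM fantasy.inventory
WHERE character_id = 6;

-- All inventory rows for Frodo (character ID 4):
SELECT *
FROM fantasy.inventory
WHERE character_id = 4;
```

Editing the literal ID for every character is exactly the chore that the join removes.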
what connects them, what can we use in order to combine them? Actually, we've already seen this in our example: the inventory table has a character ID field, which refers to the ID of the character in the characters table. So we have two columns, the character ID column in inventory and the ID column in characters, which actually represent the same thing: the identifier for a character. And this logical connection, the fact that these columns represent the same thing, can be used in order to combine these tables. So let me start a fresh query over here, and as usual I will start with the FROM part. Now, where do I want to get my data from? I want to get my data from the characters table, just as we've been doing until now; however, the characters table is not enough for me anymore. I need to join this table on the fantasy.inventory table. So I want to join these two tables. How do I want to join them? Well, we know that the inventory table has a character ID column which is the same as the characters table's ID column; like we said before, these two columns from the different tables represent the same thing, so there's a logical connection between them, and we will use it for the join. And I want to draw your attention to the notation that we're using here, because in this query we have two tables present, and so it is not enough to simply write the names of columns; it is also necessary to specify to which table each column belongs, and we do it with this notation. So the inventory.
character_id is saying that we are talking about the character ID column in the inventory table, and likewise the ID column in the characters table. It's important to write columns with this notation in order to avoid ambiguity when you have more than one table in your query. Until now we have used the FROM clause to specify where we want to get data from, and normally this simply meant specifying the name of a table. Here we are doing something very similar, except that we are creating a new table obtained by combining two pre-existing tables. We are not getting our data from the characters table, and we are not getting it from the inventory table, but from a brand new table that we have created by combining these two, and this is where our data lives. To complete the query, for now we can simply add a SELECT *, and you will see the result of this query. Let me actually make some room here and expand these results so I can show you what we got. As you can see, we have a brand new table in our result, and you will notice, if you check the columns, that this table includes all of the columns from the characters table and also all of the columns from the inventory table, combined by our join statement. Now, to get a better sense of what's happening, let us get rid of this star and actually select the columns that we're interested in, and once again I will write columns with this notation in order to avoid ambiguity. In selecting these columns, I will remind you that we have all of the columns from the characters table and all of the columns from the inventory table to choose from. What I will do is take the ID column from characters, and the name column from characters, and then I will want to see the ID of the item, so I will take the inventory table and the item ID column from that table, and from the inventory table I will also want to see
the quantity of each item. To make our results clearer, I will order my results by the character's ID and the item ID, and you can see here that we get the result that we needed: we have all of our characters here with their IDs and their names, and then for each character we can tell which items are in their inventory. You can see here that Aragorn has item ID 4 in his inventory, in a quantity of two, and he also has item 99, so because of this Aragorn has two rows. If we look back at Frodo, we see the information that we retrieved before, and the same for Gandalf, who has these two items. So we have combined the characters table and the inventory table to get the information that we needed. What does each row represent in our result? Well, it's the same as the inventory table: each row is a fact, namely that a certain character possesses a certain item. But unlike the inventory table, we now have all the information we want for a character, and not just the ID. Here we're showing the name of each character, but we could of course select more columns and get more information for each character as needed. Now, a short note on notation. When you see SQL code in the wild and a query is joining two or more tables, we programmers are usually quite lazy, and we don't feel like writing the name of the table all of the time, like we're doing in this case with characters. So what we usually do is add an alias to each table, like this: from fantasy characters, call it C, we will join on inventory, call it I, and then basically we use this alias everywhere in the query, both in the instructions for joining and in the column names, and the same with characters. So I will substitute everything here, and yes, maybe it's a bit less readable, but it's faster to write, and we programmers are quite lazy, so you'll often see this notation. You will often also see that in the code we omit the AS keyword, which can be, let's say, implicit in SQL code, and so we
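Here is a sketch of the full join with aliases, using the column names assumed throughout (`id`, `name` in characters; `character_id`, `item_id`, `quantity` in inventory):

```sql
SELECT
  c.id,          -- table aliases remove the need to repeat full table names
  c.name,
  i.item_id,
  i.quantity
FROM fantasy.characters AS c
JOIN fantasy.inventory AS i
  ON c.id = i.character_id   -- the logical connection between the two tables
ORDER BY c.id, i.item_id;
```

Dropping the optional AS keyword (`FROM fantasy.characters c JOIN fantasy.inventory i`) behaves exactly the same.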
write it like this: from fantasy.characters c join fantasy.inventory i, and then c and i refer to the two tables that we're joining, and I can run this and show you that the query works just as well. Now, we've seen why join is useful and what it looks like, but I want you to get a detailed understanding of how exactly the logic of join works, and for this I'm going to go back to my spreadsheet. What I have here is my characters table and my inventory table, just like you've seen them in BigQuery, except that I'm only taking four rows each, to make the example simpler. And what you see here is the same query that I've just run on BigQuery: a query that takes the characters table, joins it with the inventory table on this particular condition, and then picks a few columns from the result. So let us see how to simulate this query in Google Sheets. The first thing I need to do is build the table that I will run my query on, because, as we've said before, the FROM part is now referencing not the characters table, not the inventory table, but the new table built by combining these two, and so our first job is to build this new table. The first step is to take all of the columns from characters and put them in the new table, and then take all of the columns from inventory and put them in the new table as well. What we've obtained here is the structure of our new table, which is simply created by taking all of the columns of the table on the left along with all of the columns of the table on the right. Now I will go through each character in turn and consider the join condition. The join condition is that the ID of a character is present in the character ID column of inventory. So let us look at my first character: we have Aragorn, and he has ID 1. Now, is this ID present in the character ID column? Yes, I see it here in the first row, so we have a match. Given that we have a
match, I will take all of the data that I have in the characters table for Aragorn, and then I will take all of the data in the inventory table for the row that matches, and I have built my first row. Do I have any other row in the inventory table that matches? Yes, the second row also has a character ID of 1, so because I have another match, I will repeat the operation: I will take all of the data that I have in the left table for Aragorn, and I will add all of the data from the matching row of the right table. Now there are no more matches for ID 1 in the inventory table, so I can proceed, and I will proceed with Legolas. He has a character ID of 2; the question is, is there any row that has the value 2 in the character ID column? Yes, I can see it here, so I have another match. Just like before, I will take the information for Legolas and paste it here, and then I will take the matching row, which is this one, and paste it here. We move on to Gimli, because there are no other matches for Legolas. Now, Gimli has ID 3, and I can see a match over here, so I will take the row for Gimli, paste it here, and then take the matching row, character ID 3, and paste it here. Great. Finally we come to Frodo, character ID 4. Is there any match for this character? I can actually find no match at all, so I do nothing; this row does not come into the resulting table, because there is no match. And this completes the job of this part of the query over here: building the table that comes from joining these two tables. This is my resulting table, and now, to complete the query, I simply have to pick the columns that the query asks for. The first column is characters.id, which is this column over here, so I will take it and put it in my result. The second column I want is characters.
name which is this column over here the third column is the item id column which is this one right here and finally I have quantity which is this one right here and this is the final result of my query and of course this is just like any other SQL table so I can use all of the other things I’ve learned to run Logic on this table for example I might only want to keep items that are present in a quantity of two and so to do that I will simply add a wear filter here and I will refer uh the inventory table because that’s the parent table of the quantity column so I will say I will say i. quantity um bigger or equal to two and then how my query will work is that first it will build this table like we’ve seen so it will do this stage first and then it will run the wear filter on this table and it will only keep the rows where quantity is at least two and so as a result we will only get this row over here instead of this result that we see right here H except that um we will of course also have to only keep the columns that are specified in the select statement so we will get ID name um Item ID and quantity so this will be the result of my query after I’ve added a wear filter so let us actually take this and add it to B query and make sure that it works so so I have to add that after the from part and before the order by part right this is the order and after I run this I will see that indeed I get um Aragorn and Frodo is not exactly the same as in our sheet but that’s because our sheet has um less data but uh this is what we want to achieve and now let us go back to our super important diagram of the order of SQL operation and let us ask ourselves where does the join fit in in this schema and as you can see I have placed join at the very beginning of our flow together with the from because the truth is that the joint Clause is not really separate from the from CL Clause they are actually one and the same component in The Logical order of operations so as you remember the 
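The whole simulation — build the joined table, run the WHERE filter, then pick the SELECT columns — can be sketched in a few lines of Python. This is a toy stand-in for the spreadsheet, with made-up rows that mirror the example, not the real dataset:

```python
# Toy rows modeled on the examples in the text (values are illustrative).
characters = [
    {"id": 1, "name": "Aragorn"},
    {"id": 2, "name": "Legolas"},
    {"id": 3, "name": "Gimli"},
    {"id": 4, "name": "Frodo"},
]
inventory = [
    {"character_id": 1, "item_id": 4, "quantity": 1},
    {"character_id": 1, "item_id": 7, "quantity": 2},
    {"character_id": 2, "item_id": 2, "quantity": 1},
    {"character_id": 3, "item_id": 9, "quantity": 3},
]

# FROM/JOIN stage: build the combined table row by row.
joined = [
    {**c, **i}                       # all left-table columns, then all right-table columns
    for c in characters
    for i in inventory
    if i["character_id"] == c["id"]  # the join condition
]

# WHERE stage runs on the combined table...
filtered = [row for row in joined if row["quantity"] >= 2]

# ...and SELECT picks the columns last.
result = [(r["id"], r["name"], r["item_id"], r["quantity"]) for r in filtered]
print(result)  # → [(1, 'Aragorn', 7, 2), (3, 'Gimli', 9, 3)]
```

Frodo drops out exactly as in the walkthrough: his ID never appears in character_id, so no combined row is ever built for him.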
As you remember, the first stage specifies where our data lives — where do we want to get our data from? Until now we were content to answer with a single table name, because all the data we needed was in just one table. Now we're taking it a step further: our data lives in a particular combination of two or more tables, so let me tell you which tables I want to combine and how. The result is, of course, yet another table; that table is the beginning of my flow, and after that I can apply all the other operations I've come to know, just like in all our previous examples. So when you look at a SQL query that includes a join, you really have to see the JOIN as one and the same with the FROM part: together they define the source of your data by combining tables, and everything else you do is applied not to a single table, not to any of the tables you're combining, but to the resultant table that comes from the combination. This is why FROM and JOIN are really one component, and why they are the first step in the logical order of SQL operations.

Let's now look briefly at multiple joins, because sometimes the data you need is in three or four tables — and you can join as many tables as you want, or at least as many as your system allows before it becomes too slow. We have our example from before: for each character we have their name and we know which items are in their inventory, but we don't actually know what the items are, only their IDs. So if Aragorn has item 4, how can I know what item Aragorn actually has — what the name of this item is? This information is available in the items table on the right, which has a name column. Just like before, I can eyeball it: I'm looking for item ID 4, and when I find it I can see that this item is a Healing Potion. Now let's see how to get this with a join.

I go to my query and, after joining characters with inventory, I take that result and simply join it to a third table: JOIN fantasy.items it — I'll call it "it" as a short alias, because I am lazy, as all programmers are. Now I need to specify the join condition: the item_id column, whose parent is the inventory table (so I refer to it as i.item_id, using the short alias i for inventory), must equal the id column of the items table. With this condition added, the data I'm sourcing is now a combination of these three tables, and in my result I have access to the columns of the items table simply by referring to them: it.name, and something else — it.power. After I run this query I can see, for each item, its name and its power: Aragorn has a Healing Potion with a power of 50, Legolas has an Elven Bow with a power of 85, and so on.

Now, you may have noticed something a bit curious: name here is actually written as name_1. Can you figure out why? It's happening because there's an ambiguity: the characters table has a column called name, and the items table also has a column called name. Since BigQuery does not label result columns the way we refer to them — parent table, then column name — it would otherwise find itself with two identically named columns, so it distinguishes the second one by appending _1. The remedy is to rename the column to something more meaningful, for example AS item_name, which is a lot clearer for whoever looks at the result of our query — and as you can see, the name now makes more sense.

So the multiple join is actually nothing new: when we join the first time, like we did before, we combine two tables into a new one, and then this new table gets joined to a third table. It's simply the join operation repeated. But let's simulate a multiple join in our spreadsheet to make sure we understand it. Again I have our tables here, plus the items table we will combine, and I've written out our query: take the characters table, join it with inventory as we did before, then take the result and join it to items on this condition. The first thing to do is process the first join, which is exactly what we've done before, so let's do it quickly: the combined table's structure is all the columns of characters followed by all the columns of inventory, and then we match row by row. The first character, ID 1, has two matches, so his values go into two rows alongside the two matching inventory rows; Legolas has one match; Gimli also has one match; and finally Frodo has no match, so he is not added to the result.

Now that we have this new table, we can proceed with the next join, with items. The resulting table will be the result of our first join combined with items — and to show that the first join has already been computed and is now one table, I've added round brackets around it. The rules for joining are just the same: take all of the columns of the left-side table, then all of the columns of the right-side table, to get the resulting structure, and then go through every row checking the join condition: item_id must appear in the id column of items. First row: I see a match, so I take the left-side row and the matching right-side row and add them to the result. Second row: the item ID is 4 — a match, so I paste the row on the left and the matching row on the right. Third row: item ID 2 — no match, so I don't need to do anything. Final row: item ID 101 — I don't see a match either, so again nothing. And this is my final result. In short, a multiple join works just like a normal join.
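The repeat-the-join idea can be sketched directly: the result of the first join becomes the left table of the second. Toy rows only; the items columns are pre-renamed (item_key, item_name) because a Python dict merge would silently overwrite duplicate column names, where BigQuery instead appends _1:

```python
# Illustrative rows echoing the examples in the text.
characters = [{"id": 1, "name": "Aragorn"}, {"id": 2, "name": "Legolas"}]
inventory  = [{"character_id": 1, "item_id": 4},
              {"character_id": 2, "item_id": 2}]
items      = [{"item_key": 4, "item_name": "Healing Potion", "power": 50},
              {"item_key": 2, "item_name": "Elven Bow", "power": 85}]

def inner_join(left, right, cond):
    """Combine two row lists, keeping column-merged rows where cond holds."""
    return [{**l, **r} for l in left for r in right if cond(l, r)]

# Join #1: characters + inventory. Join #2: that result + items.
step1 = inner_join(characters, inventory, lambda c, i: i["character_id"] == c["id"])
step2 = inner_join(step1, items, lambda ci, it: ci["item_id"] == it["item_key"])
result = [(r["name"], r["item_name"], r["power"]) for r in step2]
print(result)  # → [('Aragorn', 'Healing Potion', 50), ('Legolas', 'Elven Bow', 85)]
```

Note that `inner_join` is called twice with nothing special about the second call — which is the whole point: a multiple join is just the same operation applied again to the previous result.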
Combine the first two tables, get the resulting table, and then keep doing this until you run out of joins. Now there's another special case of join: the self join. This is something people who are getting started with SQL tend to find confusing, but I want to show you that there's nothing confusing about it — it's just a regular join that works like all the other joins we've seen. Look at the characters table: for each character we have a mentor_id column. In a lot of cases this column is NULL, meaning there's nothing there, but in some cases there is a value. What it means is that this particular character — we're looking at number 3, Saruman — has a mentor, and all we know about the mentor is that their ID is 6. It turns out that the ID in this column refers to the id column of the characters table itself, so to find out who 6 is, I just look for the character with an ID of 6, and I can see that it is Gandalf. By eyeballing it, I know that Saruman has a mentor, and that mentor is Gandalf; Elrond also has the same mentor, Gandalf. So I can solve this by eyeballing the table — but how can I get a table that shows, for each character with a mentor, who their mentor is? It turns out that I have to take the characters table and join it on itself.

Let's see how that works in practice. I start a new query; my goal is to list every character in the table and also show their mentor, if they have one. I will of course get the characters table for this, and the first time I take it, it is simply to list all of the characters — to remind myself of that, I give it the label chars. Now, each character has a mentor_id value, but to find out the name of that mentor I need to look it up in the characters table again. So I join on another instance of the characters table — another copy, let's say, of the same data — but now used for a different purpose: not to list my characters, but to get the name of the mentor. I call this copy mentors to reflect that use. What is the logical connection between the two copies? Each character in my list has a mentor_id field, and I want to match it to the id field of my mentors table — that is the join condition. I add SELECT * to quickly complete the query and see the results.

The resulting table has all of the columns of the left table and all of the columns of the right table, which means the columns of the characters table are repeated twice, as you can see. On the left I simply have my list of characters — the first one is Saruman — and on the right I have the data about their mentor: Saruman has a mentor_id of 6, and then the mentor's data starts: ID 6, name Gandalf. So our self join has worked as intended. But this is a bit messy — we don't need all of these columns — so let's select only the ones we want: from my list of characters I want the name, and from the corresponding mentor I also want the name, and I'll label these columns so they make sense to whoever looks at the data: character_name and mentor_name. When I run this query, we get exactly what we wanted: the list of all our characters — at least the ones who have a mentor — and, for each character, the name of their mentor.

So a self join works just like any other join, and the key to avoiding confusion is to realize that you are joining two different copies of the same data — you're not actually joining on the same exact table. One copy of fantasy.characters we call chars and use for one purpose; a second copy we call mentors and use for another. Once you realize this, you see that you are simply joining two tables, and all the rules you've learned about normal joins apply — it just so happens that in this case the two tables are identical, because the data comes from the same source. To drive the point home, let's quickly simulate this in our trusty spreadsheet. As you can see, I have the query that I ran in BigQuery, and the important thing to see is that we're not actually joining one table to itself, although that's what it looks like: we're joining two tables that happen to look the same, one called chars and one called mentors, based on the labels we've given them. Once we join them, the rules are just the same as we've seen until now. To create the structure of the resulting table, take all the columns from the left and then all the columns from the right, then go row by row looking for matches based on the condition: mentor_id in chars must appear in the id column of mentors. First row: Aragorn has mentor_id 2 — is it in the id column? Yes, I see a match, so I take all the values from the left and from the matching row on the right and paste them together. Any other matches? No. Second row: we're looking for mentor_id 4 — yes, I see a match, so again I take all the values from the left and from the matching row on the right. The last two rows both have a mentor_id of NULL, which means they have no mentor, and for the purposes of the join we can basically ignore them: we are not going to find a match in these rows.
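The two-copies view of a self join can be sketched in Python with toy rows modeled on the Saruman/Elrond/Gandalf example (IDs here are illustrative):

```python
# A self join is a regular join where both sides happen to be the same data.
characters = [
    {"id": 3, "name": "Saruman", "mentor_id": 6},
    {"id": 5, "name": "Elrond",  "mentor_id": 6},
    {"id": 6, "name": "Gandalf", "mentor_id": None},
]

chars = characters    # copy #1: used to list the characters
mentors = characters  # copy #2: used only to look up mentor names

# Join condition: chars.mentor_id = mentors.id. A NULL mentor_id never
# matches, so Gandalf drops out of this inner self join.
result = [
    (c["name"], m["name"])
    for c in chars
    for m in mentors
    if c["mentor_id"] is not None and c["mentor_id"] == m["id"]
]
print(result)  # → [('Saruman', 'Gandalf'), ('Elrond', 'Gandalf')]
```

The join logic is identical to every join before it; only the source of the right-hand rows is unusual.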
In fact, as an aside: even if there were a character whose ID was NULL, a row with mentor_id NULL would still not match it, because in SQL, in a sense, NULL does not equal NULL — NULL is not a specific value but represents the absence of data. So in short, when mentor_id is NULL, we can be sure there will be no match, and the row will not appear in the join. Now that we have our result, we simply need to select the columns we want: the name that comes from the chars table, and the name that comes from the mentors table. And here is our result — that's how a self join works.

Until now, we have seen join conditions that are pretty strict and straightforward: there's a column in the left table and a column in the right table, they represent the same thing — typically an ID number, so one table has the item ID and the other table also has the item ID — and you look for an exact match between them. If there's an exact match, you include the row in the join; otherwise you don't. But what I want to show you here is that the join is actually much more flexible and powerful than that. You don't always need two columns that represent the exact same thing, or an exact match, to write a join condition. You can create your own complex conditions and combinations that decide how to join two tables, using the Boolean algebra magic we've learned about in this course and that we've been using, for example, in the WHERE filter. Let's see how this works in practice. I've tried to come up with an example that illustrates it: say we have a game — a board game, a video game, whatever — and we have our characters and our items.
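The NULL-never-matches rule deserves a tiny sketch, because Python's own `None == None` is True — which is exactly how SQL does *not* behave — so a faithful simulation has to check for NULL explicitly (the helper name here is hypothetical):

```python
def sql_eq(a, b):
    """SQL-style equality for a join condition: if either side is NULL
    (modeled as None), the comparison is not a match."""
    if a is None or b is None:
        return False
    return a == b

# NULL = NULL is not a match in SQL, even though None == None is True in Python.
print(sql_eq(6, 6))        # → True
print(sql_eq(None, None))  # → False
print(sql_eq(None, 6))     # → False
```

This is why rows with a NULL mentor_id can be safely skipped during the join: whatever is on the other side, the condition can never be satisfied.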
In our game, a character cannot simply use every item in the world: there is a limit to which items a character can use, based on the following rule, which I'll write here as a comment so we can then use it in our logic. A character can use any item whose power level is at most the character's experience divided by 100. This is just a rule that exists in our game. Now, say we want a list of all characters and the items they can use — clearly a case where we need a join, so let's write the query. I start by getting my data FROM fantasy.characters, which I call c as a shorthand, and I join on the items table (and here I add the shorthand i for items, which I'd forgotten at first). What is the condition of the join? That the character's experience divided by 100 is greater than or equal to the item's power level: c.experience / 100 >= i.power. This is the condition that reflects our rule. Out of the table I've created, I'd like to see the character's name, the character's experience divided by 100, the item's name, and the item's power, to make sure the join is working as intended. Let's run it and look at the result. It looks a bit odd because we haven't labeled the computed column, but I can see Gandalf, whose experience divided by 100 is 100, and he can use the item Excalibur, which has a power of 100 — satisfying our condition. Let me ORDER BY character name so that I can see in one place all of the items a character can use: Aragorn is first, his experience divided by 100 is 90 (the same in all of his rows), and then we see all of the items Aragorn is allowed to use, along with their power — and in each case the power does not exceed the value on the left. So the condition we wrote works as intended.

What we have here is a Boolean expression, just like the ones we've seen before: a logical statement that, when evaluated, comes out either true or false, and all the rules we've seen for Boolean expressions apply here as well. For example, suppose I decide this rule does not apply to mages, because mages are special: if a character is a mage, I want them to be able to use all of the items. How can I do this in this query? Pause the video and figure it out. What I can do is simply expand my Boolean expression by adding an OR, testing whether the character's class equals Mage: OR c.class = 'Mage' (let me check for a second that the column is called class and the value is Mage — it is, so this should work). If I run this and go through the result — I won't do it here, but you can verify it yourself — you'll find that a mage can use all of the items. This is just a Boolean expression with two statements connected by an OR: if at least one of the two is true, the whole statement evaluates to true, and so the row matches. If you have trouble seeing this, go back to the video on Boolean algebra, where everything is explained. And this is just what we did before when we simulated the join in the spreadsheet: imagine taking the left-side table, characters, going row by row, and for each row checking all of the rows in the right-side table, items — but this time, instead of checking whether an ID corresponds, you evaluate this expression to see whether there is a match. When the expression evaluates to true, you consider it a match and include the row in the join; when it evaluates to false, it is not a match and the row is left out.
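A minimal sketch of this non-equi join, with made-up rows and the rule (including the mage exception) expressed as an arbitrary Boolean condition:

```python
# Toy data; values and class names are illustrative.
characters = [
    {"name": "Aragorn", "experience": 9000, "class": "Ranger"},
    {"name": "Gandalf", "experience": 2000, "class": "Mage"},
]
items = [
    {"item_name": "Excalibur", "power": 100},
    {"item_name": "Dagger", "power": 10},
]

def can_use(c, i):
    # Any Boolean expression can serve as a join condition — here:
    # power <= experience / 100, OR the character is a mage.
    return c["experience"] / 100 >= i["power"] or c["class"] == "Mage"

result = [(c["name"], i["item_name"])
          for c in characters for i in items if can_use(c, i)]
print(result)  # → [('Aragorn', 'Dagger'), ('Gandalf', 'Excalibur'), ('Gandalf', 'Dagger')]
```

Aragorn (90) fails the Excalibur test, but Gandalf qualifies for everything via the OR branch — exactly the generalization from exact-match conditions described above.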
This is simply a generalization of the exact match, and it shows that you can use any condition to join two tables. Now, I've been pretending that there is only one type of join in SQL, but that's actually not true: there are a few different types of join we need to know, so let's see what they are and how they work. Here is the query we wrote before, exactly as we wrote it, and as you can see we simply specified JOIN. It turns out that what we were doing all along is called an INNER JOIN, and now that I've written it out explicitly, you can see that if I rerun the query I get exactly the same results. This is because the inner join is by far the most common type of join in SQL, so many dialects, including the one used by BigQuery, let you skip the specification and simply write JOIN, which is then treated as an inner join. When you want an inner join, you have the choice of spelling it out or just writing JOIN.

What I want to show you now is another type of join, called LEFT JOIN, and to see how it works, let's simulate the query in the spreadsheet again. This is very similar to what we've done before — I have the query I want to simulate (notice the LEFT JOIN) and my two tables. What is the purpose of the left join? In the previous examples, featuring the inner join, we've seen that the resulting table only keeps rows that have a match in both tables: we went through every row in the characters table, kept it if it had a match in the inventory table, and completely discarded it if it did not. But what if we wanted the resulting table to show all of the characters — to be sure our list of characters is complete — regardless of whether they have a match in the inventory table? This is what LEFT JOIN is for: it exists so that we can keep all of the rows of the left table, match or no match.

Let's see it in practice with a left join between characters and inventory. First, I determine the structure of the resulting table by taking all of the columns from the left table and all of the columns from the right table — nothing new there. Next, I go row by row in the left table and look for matches. We have Aragorn, who has two matches (by now we remember this): the two inventory rows whose character_id equals his ID, so I add both combined rows to the resulting table. Legolas has a match, so I take the row where he matches — it's only one row, actually. Gimli also has a single match, and I can check that I'm doing things correctly by comparing the id column with the character_id column: they have to be identical; if they're not, I've made a mistake. Finally we come to Frodo, who, as you will see, has no match in this table. Before, we discarded this row because it had no match; now, though, we are dealing with a left join, which means all of the rows in the characters table need to be included, so I don't have a choice: I take Frodo's row and add it. Now, what values will I put in the remaining columns? I cannot take anything from the inventory table, because there is no match, so the only thing I can do is put NULLs there — NULLs, of course, represent the absence of data, so they're perfect for this use case. And that basically completes the sourcing part of our left join.
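In Python terms, the left join logic just simulated looks like this (toy rows only, with None standing in for NULL):

```python
# Keep every left-table row, padding with NULL when there is no match.
characters = [
    {"id": 1, "name": "Aragorn"},
    {"id": 4, "name": "Frodo"},
]
inventory = [
    {"character_id": 1, "item_id": 4, "quantity": 1},
    {"character_id": 10, "item_id": 9, "quantity": 1},  # no character 10
]

result = []
for c in characters:
    matches = [i for i in inventory if i["character_id"] == c["id"]]
    for i in matches:
        result.append({**c, **i})
    if not matches:
        # Left join: the unmatched left row survives; right columns become NULL.
        result.append({**c, "character_id": None, "item_id": None, "quantity": None})

print([(r["name"], r["item_id"]) for r in result])  # → [('Aragorn', 4), ('Frodo', None)]
```

Notice the asymmetry: Frodo is kept with NULLs, but the inventory row pointing at the nonexistent character 10 never enters the result at all — that row is the subject of the next paragraph.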
Now, you may have noticed that there is an extra row in inventory which does not have a match: it refers to character_id 10, but there is no character with ID 10. The Frodo row also had no match, and we included it — so should we include this row as well? The answer is no. Why not? Because this is a LEFT join: we include all of the rows of the left table even when they have no match, but we do not include rows of the right table when they have no match. That is why it's called a left join. If you're still confused about this, don't worry — it will become clearer once we see the other types of join. For the sake of completeness, I can finish the query by selecting my columns — the character ID, the character name, the item ID, and the item quantity — and this is my final result. In Frodo's case we have NULL values, which tells us that this row found no match in the right table — in this case, that Frodo does not have any items.

Now that you understand the left join, you can also easily understand the RIGHT JOIN: it is simply the symmetrical operation. Whether you write characters LEFT JOIN inventory or inventory RIGHT JOIN characters, the result is identical — that's why I wrote here that table A LEFT JOIN B equals table B RIGHT JOIN A. Hopefully that's pretty intuitive. Of course, if I wrote characters RIGHT JOIN inventory, the results would be reversed: I would keep all of the rows of inventory, match or no match, and only the rows of characters that have a match. If you experiment on the data yourself, you will easily convince yourself of this result.

Let's now see the left join in practice. Remember the query from before, where we take each character and look up their mentor? This is the code exactly as we wrote it, and you now know that this is an inner join, because when you don't specify what type of join you want, SQL assumes an inner join — at least that's what the SQL in BigQuery does. You can see that if I write INNER JOIN explicitly (fixing a typo along the way), the result is absolutely identical. In this case we are only including characters who have a mentor; we are missing the characters whose mentor_id is NULL, because in the inner join there is no match for them, so they are discarded. But what would happen if I turned this into a LEFT JOIN? What I expect is to keep all of my characters — all the rows of the left-side table — regardless of whether they have a match, regardless of whether they have a mentor. Let's run it and see that this is in fact the case: I now have a row for each of my characters, including a row for Gandalf, even though Gandalf does not have a mentor, with a NULL value there. So the left join allows me to keep all of the rows of the left table.

We've now seen the inner join and the left and right joins, which are really the same thing, just symmetrical to each other. Finally, I want to show you the FULL OUTER JOIN — the last type of join I want to cover. A full outer join is like a combination of all the joins we've seen so far: it gives us all of the rows that have a match in the two tables, plus all of the rows of the left table with no match in the right table, plus all of the rows of the right table with no match in the left table. Let's see how it works in practice. Here is our usual query, but now, as you can see, specifying FULL OUTER JOIN, so let's simulate this join between the two tables. The first step, as usual, is to take all of the columns from the left table and all of the columns from the right table to get the structure of the resulting table, and then I go row by row in the left table. We have Aragorn and — you know what — I'm already going to copy him over, because even if there's no match I still have to keep this row: this is a full outer join, and I'm basically not discarding any row. Now that I've copied it, is there a match? I already know from the previous examples that two rows in the inventory table match, with character_id 1, so I copy them over, replicating Aragorn's values in the second row. Legolas: again I can paste him right away, because there's no way I'm going to discard the row, and of course we know he has a match. Moving quickly, since we've seen all this before: Gimli has a match as well. Then we come to Frodo: again I can copy him immediately, because I'm keeping all rows, but Frodo has no match, so, just like before with the left join, I keep the row and fill the columns that come from the inventory table with NULLs.

I've now been through all of the rows of the left table, but I'm not done yet with my join, because in a full outer join I have to also include all of the rows of the right table. So the question is: are there any rows in the inventory table that I have not considered yet? I can check by comparing the inventory IDs in my result (1, 2, 3, 4) with the IDs in the inventory table (1, 2, 3, 4, 5), and I realize that I have not included row number 5, because it was not selected by any match. Since this is a full outer join, I will add this row too, and because it has no correspondent in the left table, once again I insert NULLs. That completes the first phase of my full outer join. The last phase is always the same: pick the columns listed in the SELECT — the ID, the name, the item ID, and the quantity — and this completes my full outer join.
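The full outer join walkthrough can be sketched the same way as the earlier simulations — matched rows, plus unmatched left rows, plus unmatched right rows, with None standing in for NULL on whichever side is missing (toy rows only):

```python
characters = [{"id": 1, "name": "Aragorn"}, {"id": 4, "name": "Frodo"}]
inventory = [
    {"character_id": 1, "item_id": 4},
    {"character_id": 10, "item_id": 9},   # no character 10
]

matched, result = set(), []
for c in characters:
    rows = [i for i in inventory if i["character_id"] == c["id"]]
    for i in rows:
        matched.add(id(i))                # remember which right rows matched
        result.append((c["name"], i["item_id"]))
    if not rows:                          # unmatched left row → NULL right side
        result.append((c["name"], None))
for i in inventory:                       # unmatched right rows → NULL left side
    if id(i) not in matched:
        result.append((None, i["item_id"]))

print(result)  # → [('Aragorn', 4), ('Frodo', None), (None, 9)]
```

The first loop alone is a left join; adding the second loop, which rescues the leftover right-table rows, is what makes it a full outer join.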
listed in the select so you have the ID the name Item ID and quantity and this completes my full outer join so remember how I said that a full outer join is like an inner join plus a left join plus a right join here is a visualization that demonstrates now in the result the green rows are the rows in which you have a match on the left table and the right table right and these rows correspond to the inner join and if you run an inner join this this will be the only rows that are returned right now the purple row is including a row that is present in the left table but does not have any match in the right table so if you were to run a left join what would the result be a left joint would include all of the green rows because they have a match and and additionally they would also include the purple row because in the left joint you keep all of the rows from the left if on the other hand you were to run a right join and you wouldn’t like swap the names of the tables or anything right you would do characters right join inventory you would get of course all of the green rows because they are a match Additionally you would get the blue row at the end because this row is present in the right table even though there’s no match and in the right join we want to keep all the rows that are in the right table and finally in a full outer join you will include all of these rows right so first of all all of the rows that have a match and then all of the rows in the left table even though they don’t have a match and finally all of the rows in the right table even though they don’t have a match and these are the three or four types of joint that you need to know and that you will find useful in solving your problems now here’s yet another way to think about joints in SQL and to visualize joints which you might find helpful so one way to think about SQL tables is that a table is a set of rows and that joints correspond to different ways of uh combining sets and you might remember this 
from school: this is a Venn diagram. It represents the relation between two sets and the elements that are inside those two sets. You can take set A to be our left table, containing all of the rows from the left table, and set B to be our right table, with all of the rows from the right table. In the middle here you can see that there is an intersection between the sets; this intersection represents the rows that have a match, so these would be the rows that I colored green in our example over here. So what will happen if I select only the rows that are a match, only the rows that belong to both tables? Let me select this now, and you can see that this corresponds to an inner join, because I only want to get the rows that have a match. Then what would happen if I wanted to include all of the rows in the left table, regardless of whether they have a match or not? To what type of join does that correspond? I will select it here, and you can see that it corresponds to a left join: the left join produces a complete set of records from table A, with the matching records from table B, and if there is no match, the right side will contain NULL. Likewise, if I wanted to keep all of the rows in table B, including the ones that match with A, I would of course get a right join, which is just symmetrical to a left join. Finally, what would I have to do to include all of the rows from both tables, regardless of whether they have a match or not? If I do this, then I get a full outer join. So this is just one way to visualize what we've already seen. There is one more thing you can realize from this tool: in some cases you might want to get all of the records that are in A except those that match in B, so all of the records that A does not have in common with B. And you can see how you can actually do this: it is a left join with an added filter where the B key is NULL. So what does that mean? The meaning will be clear if I go back
to our example for the left join. You can see that this is our result for the left join, and because Frodo had no match in the right table, the ID column over here is NULL. So if I take this table and apply a filter where the inventory ID is NULL, I will only get this one result over here, and this is exactly the one row in the left table that does not have a match in the right table. This is more of a special case; you don't actually see it a lot in practice, but I wanted to show it briefly in case you try it and get curious about it. Likewise, the last thing you can do is get all of the rows from A and B that do not have a match, the set of records unique to table A and table B. This is actually very similar: you do a full outer join and check that either key is NULL, so either the inventory ID is NULL or the character ID is NULL, and if you apply that filter you get these two rows, which is the set of rows that are in A and only in A, plus the rows that are in B and only in B. Once again, I've honestly never used this in practice; I'm just showing it for the sake of completeness in case you get curious about it. Now, a brief but very important note on how SQL organizes data. You might remember from the start of the course that I told you that, in a way, SQL tables are quite similar to spreadsheet tables, but with two fundamental differences. One difference is that each SQL table has a fixed schema, meaning we always know what the columns are and what type of data they contain, and we've seen how this works extensively. The second was that SQL tables are actually connected with each other, which makes SQL very powerful, and now we are finally in a position to understand exactly how SQL tables can be connected with each other, which will allow you to understand how SQL represents data. So I came here to dbdiagram.io,
which is a very nice website for building representations of SQL data. The type of chart that we see here is also known as an ER diagram, as you can see me writing here, which stands for entity relationship diagram. It is basically a diagram that shows you how your data is organized in your SQL system, and so you can see a representation of each table. This is the example shown on the website, so you have three tables here: users, follows, and posts. For each table you can see the schema: the users table has four columns, one is the user ID, which is an integer, another is the username, which is a varchar (this is another way of saying string, so a piece of text), role is also a piece of text, and then you have a timestamp that shows when the user was created. The important thing to notice here is that these tables don't exist in isolation: they are connected with each other, through these arrows that you see here. And what do these arrows represent? Well, let's look at the follows table. Each row of this table is a fact showing that one user follows another, so in each row you see the ID of the user who follows and the ID of the user who is followed, as well as the time when this event happened. And what are these arrows telling us? They're telling us that the IDs in this table are the same thing as the user ID column in the users table, which means that you can join the follows table with the users table to get the information about the two users involved: the user who is following and the user who is followed. So, like we've seen before, a table has a column which is the same thing as another table's column, which means that you can join them to combine their data. This is how, in SQL, several tables are connected with each other: they are connected by logical correspondences that allow you to join those tables and combine their data
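As a minimal sketch of that users/follows relationship, here it is rebuilt with SQLite through Python's sqlite3 module. The table contents are made up, and the column names (user_id, following_user_id, followed_user_id) are my assumption about the dbdiagram example's schema; the point is only to show a join along the foreign-key arrows, plus the left-join-with-NULL-filter trick from earlier:

```python
import sqlite3

# Toy version of the users/follows tables from the ER diagram
# (in-memory SQLite; names and data are invented for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        user_id  INTEGER PRIMARY KEY,
        username TEXT
    );
    CREATE TABLE follows (
        following_user_id INTEGER,
        followed_user_id  INTEGER
    );
    INSERT INTO users VALUES (1, 'frodo'), (2, 'sam'), (3, 'gandalf');
    INSERT INTO follows VALUES (1, 2), (3, 2);
""")

# Join follows to users twice, once per arrow, to resolve both IDs
# into usernames.
rows = conn.execute("""
    SELECT a.username AS follower, b.username AS followed
    FROM follows
    JOIN users AS a ON a.user_id = follows.following_user_id
    JOIN users AS b ON b.user_id = follows.followed_user_id
    ORDER BY follower
""").fetchall()
print(rows)  # [('frodo', 'sam'), ('gandalf', 'sam')]

# The anti-join pattern from the Venn-diagram discussion:
# users that nobody follows = LEFT JOIN plus a filter on NULL.
lonely = conn.execute("""
    SELECT users.username
    FROM users
    LEFT JOIN follows ON follows.followed_user_id = users.user_id
    WHERE follows.followed_user_id IS NULL
    ORDER BY users.username
""").fetchall()
print(lonely)  # [('frodo',), ('gandalf',)]
```

Joining the same table twice under two aliases, as done here with a and b, is the standard way to resolve two foreign keys that both point at the same table.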
Likewise, you have the posts table: each row represents a post, and each post has a user ID, and what this arrow is telling you is that you can join on the users table using this ID to get all the information you need about the user who created the post. Now, of course, as we have seen, you are not limited to joining the tables along these lines; you can actually join these tables on whatever condition you can think of. But this is a guarantee of consistency between these tables that comes from how the data was distributed across them; it's a promise that you can get the data you need by joining on these specific columns. And that is really all you need to know in order to get started with joins and use them to explore your data and solve SQL problems. To conclude this section, I want to go back to our diagram and remind you that FROM and JOIN are really one and the same: they are the way for you to get the data that you need in order to answer your question. When the data is in one table alone, you can get away with just using the FROM and specifying the name of the table, but often your data will be distributed across many different tables, so you can look at an ER diagram such as this one, if you have it, to figure out how your data is organized, and then, once you've decided which tables you want to combine, you can write a FROM combined with a JOIN and so create a new table which is a combination of two or more tables, and all of the other operations that you've learned will run on top of that table. We are finally ready for an in-depth discussion of grouping and aggregations in SQL. And why is this important? Well, as you can see, I have asked ChatGPT to show me some typical business questions that can be answered by data aggregation. So let's see what we have here: what's the total revenue by quarter, how many units did each product sell last month, what is the average customer spend per transaction, and which region has the highest number of
sales. Now, as you can see, these are some of the most common and fundamental business questions that you would be asking when you do analytics, and this is why grouping and aggregation are so important when we talk about SQL. Now let's open our data once again in the spreadsheet and see what we might achieve through aggregation. I have copied here four columns from my characters table: guild, class, level, and experience, and I'm going to be asking a few questions. The first question, which you can see here, is: what are the level measures by class? So what does this mean? Well, earlier in the course we looked at aggregations, and we called them simple aggregations because we were running them over the whole table. You might remember that if I select the values for level here, I will get a few different aggregations in the lower right of my screen. What you can see here is that I have a count of 15, which means that there are 15 rows for level, that the maximum level is 40, the minimum is 11, and then I have an average level of 21.3, more or less, and if you sum all the levels you get 319. So this is already some useful information, but now I would like to take it a step further: I would like to know these aggregate values within each class. For example, what is the maximum level for Warriors, and what is the maximum level for Hobbits? Are they different? How do they compare? This is where grouping comes into play. So let us do just that: let us find the maximum level within each class, and let us see how we might achieve this. Now, to make things quicker, I'm going to sort the data to fit my purpose, so I will select the range over here, then go to Data, Sort range, and in the advanced options I will say that I want to sort by column B, because that's my class. And now, as you can see, the data is ordered by class, and I can see the different values for each class. Next, I will take all the different values for class and separate them, just like this: so first I have Archer,
then I have Hobbit, then I have Mage, and finally I have Warrior. So here they are, each in its own spot. Finally, I just need to compress each of these ranges so that each of them covers only one row. For Archer, I will take the value of the class, Archer, and then I will have to compress these numbers to a single number, and to do that I will use the MAX function. This is the aggregation function that we are using, and quite intuitively this function will look at the list of values, pick the biggest one, and reduce everything to that biggest value; you can also see it here in this tooltip. Doing the same for Hobbit: compress all of the values to a single value, and then compress all of the numbers to a single number by applying an aggregation function. I've gone ahead and done the same for Mage and Warrior, and all that's left to do is to bring all these rows together, and this is my result. This is doing what I asked for: I was looking to find the maximum level within each class, so I have taken all the unique values of class, and then, for the values of level within each class, I have compressed them to a single number by taking the maximum. So here I have a nice summary which shows me what the maximum level is for each class, and I can see that Mages are much more powerful than everyone and that Hobbits are much weaker, according to this measure. I've learned something new about my data. Now, crucially, and this is very important, in my results I have class, which is a grouping field, and then level, which is an aggregate field. So what exactly do I mean by this? Class is a grouping field because it divides my data into several groups: based on the value of class, I have divided my data as you see here, so Archer has three values, Hobbit has four values, and so on. Level is an aggregate field because it was obtained by taking a list of several values: so here we have three, here we have four, and in the wild we
could have a thousand, or a hundred thousand, or millions, it doesn't matter: it's a list of multiple values. I've taken these values and compressed them down to one value, I have aggregated them down to one value, and this is why level is an aggregate field. Whenever you work with groups and aggregations you always have this division: you have some fields that you use for grouping, for subdividing your data, and then you have some fields on which you run aggregations, aggregations such as, for example, looking at a list of values and taking the maximum value, or the average, or the minimum, and so on. Aggregations are what allow you to understand the differences between groups: after aggregating, you can say, oh well, the Mages are certainly much more powerful than the Hobbits, and so on. And if you work with dashboard tools like Tableau or other analytical tools, you will see that another way to refer to these terms is by calling the grouping fields dimensions and the aggregate fields measures. So I'm just leaving it here: you can say grouping field and aggregate field, or you can talk about dimensions and measures, and they typically refer to the same idea. Now let's see how I can achieve the same result in SQL. I will start a new query here, and I want to get data from fantasy.
characters, and after I've sourced this table, I want to define my groups. So I will use GROUP BY, which is my new clause, and here I will have to specify the grouping field, the field that I want to use in order to subdivide the data, and that field in this case is class. After that, I will want to define the columns that I want to see in my result, so I will write SELECT, and first of all I want to see the class, and then I want to see the maximum level within each class. So if I run this, you will see that I get exactly the same result that I have in Google Sheets. We have seen this before: MAX is an aggregation function, it takes a list of values and compresses them down to a single value, except that before, we were running it at the level of the whole table. So if I select this query alone and run it, what do you expect to see? I expect to see a single value, because it has looked at all the levels in the table and simply selected the biggest one; it has reduced all of them to a single value. However, if I run it after defining a GROUP BY, then it will run not on the whole table at once but within each group identified by my grouping field, and it will compute the maximum within that group, and so the result will be that I can see the maximum level for each group. Now I'm going to delete this, and note that I don't need to limit myself to a single aggregation: I can write as many aggregations as I wish. So I will put this down here, and I'll actually give it a label so that it makes sense, and then I will write a bunch of other aggregations, such as COUNT(*), which is basically the number of rows within that class; I can also look at the minimum level, and I can also look at the average level. So let's run this and make sure that it works. As you can see, we have our unique values for class as usual, and for each class we can compute as many aggregated values as we want, so we have the maximum level, the minimum level,
and, since we didn't give a label to this one, we can call it average_level, and then the number of values. n_values is not referring to level in itself; it's a more general aggregation which is simply counting how many examples I have of each class, so I know I have four Mages, three Archers, four Hobbits, and four Warriors by looking at this value over here. And here's another thing: I am absolutely not limited to the level column. As you can see, I also have the experience column, which is also an integer, and the health column, which is a floating point number, so I can get the maximum health and I can get the minimum experience, and it all works the same: all the aggregations are computed within each class. But one thing I need to be really careful about is the match between the type of aggregation that I want to run and the data type of the field on which I plan to run it. All of the fields shown here are number columns, either integers or floats. What would happen if I ran the average aggregation on the name column, which is a string? What do you expect to happen? You can already see that this is an error: no matching signature for aggregate function AVG for argument type STRING. It's saying this function does not accept the type string; it accepts integers, floats, and all types of number columns, but if you ask it to find the average of a bunch of strings, it has no idea how to do that. So I can add as many aggregations as I want within my grouping, but the aggregations need to make sense. These expressions can, however, be as complex as I want them to be: instead of taking the average of the name, which is a string and doesn't make sense, I could actually run another function inside of this one, namely LENGTH, and what I expect this to do is that for each name it will count how long that name is, and then, after all these counts, I can aggregate them, I can take their average, and what I get back is the average name length within each class.
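The grouped aggregations just described can be sketched outside BigQuery too. Here is a toy reproduction using SQLite through Python's sqlite3 module; the character names, classes, and levels are invented for illustration, and AVG(LENGTH(name)) mirrors the expression-inside-an-aggregate trick:

```python
import sqlite3

# Tiny made-up version of the characters table, in-memory SQLite.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE characters (name TEXT, class TEXT, level INTEGER);
    INSERT INTO characters VALUES
        ('Legolas', 'Archer', 30), ('Tauriel', 'Archer', 25),
        ('Frodo',   'Hobbit', 11), ('Sam',     'Hobbit', 12),
        ('Gandalf', 'Mage',   40), ('Saruman', 'Mage',   38);
""")

# One output row per class; every aggregate is computed within that class.
rows = conn.execute("""
    SELECT class,
           MAX(level)        AS max_level,
           MIN(level)        AS min_level,
           AVG(level)        AS avg_level,
           COUNT(*)          AS n_values,
           AVG(LENGTH(name)) AS avg_name_length
    FROM characters
    GROUP BY class
    ORDER BY class
""").fetchall()
for row in rows:
    print(row)
# ('Archer', 30, 25, 27.5, 2, 7.0)
# ('Hobbit', 12, 11, 11.5, 2, 4.0)
# ('Mage',   40, 38, 39.0, 2, 7.0)
```

One caveat if you try this yourself: SQLite is much more permissive about argument types than BigQuery, so the AVG-on-a-string error discussed above will not reproduce identically here, but the grouping behaviour is the same.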
That doesn't sound very useful as a thing to calculate, but this is just to show you that these expressions can get quite complex. Now, whatever system you're working with, it will have documentation somewhere which lists all the aggregate functions that you have at your disposal. Here is that page for BigQuery, and as you can see, we have our aggregate functions, and if you go through the list you will see some of the ones that I've shown you, such as COUNT, MAX, and MIN, and some others that I haven't shown in this example, such as SUM, which sums up all the values, ANY_VALUE, which simply picks one value (I think it happens at random), ARRAY_AGG, which actually builds a list out of those values, and so on. So when you need to do an analysis, you can start by asking yourself: how do I want to subdivide the data, what are the different groups that I want to find in the data? And after that, you can ask yourself: what type of aggregations do I need within each group, what do I want to know about each group? Then you can come here and try to find the aggregate function that works best, and once you think you've found it, you can go to the documentation for that function and read the description. So: returns the average of non-NULL values in an aggregated group; and then you can see what type of argument is supported, for example AVG supports any numeric input type, that is, any data type that represents a number, as well as INTERVAL, which represents a span of time. Now, in the previous example we used a single grouping field: if we go back here, we have our grouping field, which is class, and we only used this one field to subdivide the data. But you can actually use multiple grouping fields, so let's see how that works. What I have here is my items table, and for each item we have an item type and a rarity, and then for each item we know the power. So what would happen if we wanted to see the average power by item type and rarity
combination? One reason we might want to see this is that we might ask ourselves: within every item type, is it always true that, as you go from common to rare to legendary, the power increases? Is this true for all item types, or only for certain item types? Let us go and find out. What I'm going to do now is use two fields to subdivide my data: item type and rarity. As a first step I will sort the data to make things convenient, so I will go here and choose Sort range with the advanced range sorting options; first of all I want to sort by column A, which is item type, and I want to add another sort column, which will be column B, and you can see that my data has been sorted. Next, I'm going to take each unique combination of the values of my two grouping fields. So the first combination is armor and common, so I'm going to take this here and then write down all the values that fall within this combination; in this case we only have one value, which is 40. Next I have armor and legendary, and within this combination I only have one value, which is 90. Next I have armor and rare, and for armor and rare I actually have two values, so I'm going to write them here. Next we have potion and common, for which I actually have three values, so I'm going to write them here. So I've gone ahead and done this for each combination, and you can see that for each unique combination of item type and rarity I have now copied the relevant values. Now I need to get the average power within each of these combinations. So I will take the first one, put it over here, and then take the average of the values; this is quite easy because there's a single value, so I'll simply write 40. Next I will take the armor and legendary combination, where once again I have a single value. For armor and rare I have two values, so I will actually press equals and write AVERAGE to call the spreadsheet function, and then select the two values in here to compute the average.
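The unique-combination bookkeeping done by hand above is exactly what a two-field GROUP BY computes in one statement. A minimal sketch with SQLite via Python's sqlite3 (the items and their power values are made up; power is quoted defensively since, as discussed for BigQuery, it can collide with a built-in function name):

```python
import sqlite3

# Made-up items table: each unique (item_type, rarity) pair becomes
# one group, and AVG runs within each group.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (item_type TEXT, rarity TEXT, power INTEGER);
    INSERT INTO items VALUES
        ('armor',  'common',    40), ('armor',  'rare',      70),
        ('armor',  'rare',      78), ('weapon', 'common',    50),
        ('weapon', 'common',    60), ('weapon', 'legendary', 98);
""")

rows = conn.execute("""
    SELECT item_type, rarity, AVG("power") AS avg_power
    FROM items
    GROUP BY item_type, rarity
    ORDER BY item_type, rarity
""").fetchall()
print(rows)
# [('armor', 'common', 40.0), ('armor', 'rare', 74.0),
#  ('weapon', 'common', 55.0), ('weapon', 'legendary', 98.0)]
```

Here item_type and rarity are both grouping fields and avg_power is the aggregate field, mirroring the spreadsheet walk-through.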
And here we have it. I can go on like this: potion and common, get the average within these values; potion and legendary is a single value. So I've gone ahead and completed this, and this gives me the result of my query: here I have all the different combinations of the values of my two fields, item type and rarity, and within each combination the average power. So, to answer my question: is it true that within each item type the power grows with the level of rarity? For armor it goes from 40 to 74 to 90, so yes; for potion we don't have a rare potion, but it basically also grows from common to legendary; and for weapon we have 74, 87, and 98. So I would say yes, within each item type, power grows with the level of rarity. Now, what are these three fields in the context of my grouping? Well, item type is a grouping field, rarity is also a grouping field, and the average power within each group is an aggregate field. So I am now using two grouping fields to subdivide my data, and then I'm computing this aggregation within those groups. So let us now figure out how to write this in SQL. It's actually quite similar to what we've seen before: we take our data from the items table, and then we want to GROUP BY, and here I have to list my grouping fields. As I've said, I have two grouping fields, item type and rarity, so this defines my groups, and then in the SELECT part I will want to see my grouping fields, and then, within each group, the average of power. And here are our results, just like in the sheets. Now, as a tiny detail, you may notice that power here is colored in blue, and the reason for this is that POWER is actually a BigQuery function: if you run POWER(2, 3) you should get 8, because it calculates two to the power of three. So it can be confusing when power is the name of a column, because BigQuery might think it's a function, but there's an easy way to
remedy this: you can just use backticks, and that's your way of telling BigQuery, hey, don't get confused, this is not the name of a function, this is actually the name of a column. And as you can see, it works and doesn't create issues. Just like before, we could add as many aggregations as we wanted, for example the sum of power, and also on other fields, not just on power, and everything would be computed within the groups defined by the two grouping fields that I have chosen, as expected. Now let us see where GROUP BY fits in the logical order of SQL operations. As you know, a SQL query starts with FROM and JOIN: this is where we source the data, where we take the data that we need, and as we learned in the join section, we can either specify a single table in the FROM clause or specify a join of two or more tables. Either way, the result is the same: we have assembled the table where our data lives, and we're going to run our pipeline, all the next operations, on that data. Next, the WHERE clause comes into play, which we can use to filter out rows that we don't need, and then our GROUP BY executes. So the GROUP BY is going to work on the data that we have sourced, minus the rows that we have excluded, and the GROUP BY is going to fundamentally alter the structure of our table, because, as you have seen in our examples, the GROUP BY basically compresses down our values, or squishes them, as I wrote here: in the grouping field you will get a single row for each distinct value, and in the aggregate field you will get one aggregate value within each group. So if I use a GROUP BY, it's going to alter the structure of my table. After doing the GROUP BY, I can compute my aggregations, like you've seen in our examples, so I can compute minimum, maximum, average, sum, count, and all of that, and of course I need to do this after I have applied my grouping. And after that, once
I've computed my aggregations, I can select them, meaning I can choose which columns to see, and this will include the grouping fields and the aggregated fields (we shall see this in more detail in a second), and then finally there are all the other operations that we have seen in this course. And this is where GROUP BY and aggregations fit in our order of SQL operations. Now I want to show you an error that's extremely common when starting to work with GROUP BY, and if you understand this error, I promise you will avoid a lot of headaches when solving SQL problems. I have my items table here again, and you can see the preview on the right, and I have a simple SQL query: take the items table, GROUP BY item type, and then show me the item type and the average power within that item type. So far so good. But what if I wanted to see what I'm showing you here in the comments? What if I wanted to see each specific item: the name of that item, the type of that item, and then the average power for the type of that item? Let's look at the first item, chain mail armor: this is an armor type of item, and we know that the average power for armors is 69.5, so I would like to see this row. Then let's take the Elven bow: now, the Elven bow is a weapon, as you can see here, and the average power for weapons is 85.
58, and so I would like to see that. Now stop for a second and think: how might I achieve this, how might I modify my SQL query to achieve this? Oh, and there is an error in the column name over here, because I actually wanted to say name; but let's see how to do it in the SQL query. You might be tempted to simply go to your query and add the name field in order to reproduce what you see here, and if I do this and run it, you will see that I get an error: SELECT expression references column name which is neither grouped nor aggregated. Understanding this error is what I want to achieve now, because it's very important. So can you try to figure out on your own why this query is failing and what exactly this error message means? I'm going to go back to my spreadsheet and get a copy of my items table, and as you can see, I have copied the query that doesn't work over here. So let us now go ahead and reproduce this query. I have to take the items table, here it is, and then I have to group by item type, and as you can see, I've already sorted by item type to facilitate our work. Then, for each item type, we want to select the item type itself, so that would be armor, and we want to select the average power, and to find that, I can run a spreadsheet function, called AVERAGE, over the power values here. And then I am asked to get the name, so if I take the names for armor and put them here, this is what I have to add, and here you can already see the problem that we are facing: for this particular class, armor, there is a mismatch in the number of rows that each column is providing. As an effect of grouping by item type, there is now only one row in which the item type is armor, and as an effect of applying AVERAGE to power within the armor group, there is now only one row of power corresponding to the armor group. But name is neither present in the GROUP BY nor inside an aggregate function, and that means that in the case of name we still have four
values instead of one, and this mismatch is an issue: SQL cannot accept it, because SQL doesn't know how to combine columns which have different numbers of rows. In a way, it's like SQL is telling us: look, you told me to group the data by item type, and I did, I found all the rows that correspond to armor; then you told me to take the average of the power level for those rows, and I did; but then you asked me for name. Now, the item type armor has four names in it. What am I supposed to do with them? How am I supposed to combine them, how am I supposed to squish them into a single value? You haven't explained how to do that, so I cannot do it. And this takes us to a fundamental rule of SQL, something I like to call the law of grouping. The law of grouping is actually quite simple, but essential: it tells you what type of columns you can select after you've run a GROUP BY, and there are basically two types. One: grouping fields, that is, the columns that appear after the GROUP BY clause, the columns you are using to group the data. Two: aggregations of other fields, that is, fields that go inside a MAX function, a MIN function, a SUM function, a COUNT function, and so on. Those are the only two types of columns that you can select; if you try to select any other column, you will get an error, and the reason is illustrated here. After a GROUP BY, each value in the grouping fields appears exactly once, and for that value the aggregation makes sure that there's only one corresponding value in the aggregated field; in this case there's only one average power number within each item type. However, for any other field, if it's not a grouping field and you haven't run an aggregation on it, you're going to get all of its values, and then there's going to be a mismatch. The law of grouping is made to prevent this issue. Now, if we go back to our SQL, hopefully you understand better
why this error is happening, and in fact this error message makes a lot more sense after you've heard about the law of grouping: you are referencing a column, name, which is neither grouped nor aggregated. So how could we change this code so that we can include the name column without triggering an error? Well, we have two options: either we turn it into a grouping field, or we turn it into an aggregation. So let's try turning it into an aggregation: let's say, for example, that I wrote MIN(name). What do you expect would happen in that case? If I run this, you will see that I have my grouping by item type, I have the average power within each item type, and then I have one name; when you run MIN on a sequence of text values, what it does is give you the first value in alphabetical order, so we are in fact seeing the first name in alphabetical order within each item type. So we've overcome the error, but this field is actually not very useful: we don't really care to see the first name in alphabetical order within each type. But at least our aggregation is making sure that there's only one value of name for each item type, and so the law of grouping is respected and we don't get that error anymore. The second alternative is to take name and add it as a grouping field, which simply means putting it after item type in here. Now, what do you expect to happen if I run this query? These results as shown here are a bit misleading, because the name column is actually hidden, so I will also add it here, and as you can see, I can now reference the name column in the SELECT without an aggregation. Why? Because it is a grouping field. And what do we see here in the results? Well, we've seen what happens when you group by multiple columns: the unique combinations of these columns end up subdividing the data. So in fact our values for average power are not divided by item type anymore; we no longer have the average power for armor, potion, and weapon; instead we
Instead, we have the average power for the item of type armor called Chain Mail Armor, and there is only one such row, with power 70; likewise we have the average power for any item called Cloak of Invisibility of item type armor, and again there is only one example. So we've overcome the error by adding name as a grouping field, but we have lost the original division by item type, and we have subdivided the data to the point where it no longer makes sense. As you've surely noticed by now, we made the error disappear by including name, but we haven't achieved our original objective, which was to show the name of each item, the item type, and the average power within that item type. To be honest, my original objective was to teach you to spot this error and understand the law of grouping, but now you might rightfully ask: how do I actually achieve this? The answer, unfortunately, is that you cannot achieve it with GROUP BY, at least not in a direct, simple way. This is a limitation of GROUP BY, which is a very powerful feature but doesn't satisfy every requirement for aggregating data. The good news is that this can easily be achieved with another feature called window functions. Window functions are the subject of another section of this course, so I'm not going to go into depth now, but I will write the window function for you just to demonstrate that it can be done easily. I'll go down here and write a new query: take the items table, select the name and the item type, then get the average of power, again using backticks so BigQuery doesn't confuse the column with the function of the same name, and then say: take the average of power OVER, PARTITION BY item type. This is like saying the average of power within this row's item type, and I will call it average power by type.
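The window-function query just described can be sketched in runnable form; a minimal SQLite version (window functions require SQLite 3.25 or newer, bundled with recent Python builds) over invented data, where BigQuery's backticks are unnecessary:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE items (name TEXT, item_type TEXT, power INTEGER)")
con.executemany("INSERT INTO items VALUES (?, ?, ?)", [
    ("Chain Mail", "armor", 70), ("Cloak", "armor", 69),
    ("Healing Potion", "potion", 30), ("Mana Potion", "potion", 50),
])

# Every row keeps its own name, plus the average power of its item type:
# an aggregation that does not collapse the table.
rows = con.execute(
    "SELECT name, item_type, "
    "AVG(power) OVER (PARTITION BY item_type) AS avg_power_by_type "
    "FROM items"
).fetchall()
```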
If I select this and run the query, you will see that I get what I need: Chain Mail Armor is armor, and the average power for an armor is 69.5. So the original objective can be achieved, unfortunately not with grouping, but with window functions. Now I want to show you how to filter on aggregated values after a GROUP BY. What I have here is a basic GROUP BY query: go to the fantasy characters table, group it by class, then show the class and, within each class, the average experience of all the characters in that class; you can see the results here. Now, what if I wanted to keep only those classes where the average experience is at least 7,000? One instinct you might have is to add a WHERE filter; for example, I could say WHERE average experience is greater than or equal to 7,000. If I run this, I get an error: unrecognized name, average experience. The WHERE filter doesn't work here. Maybe it's a labeling problem; what if I use the logic instead of the label and say WHERE AVG(experience) >= 7000? Well, an aggregate function is not allowed in the WHERE clause, so this doesn't work either. What's happening here? If we look at the order of SQL operations, the WHERE clause runs right after sourcing the data, and according to our rules an operation can only use data produced before it; it knows nothing about data produced after it. So the WHERE operation cannot know about aggregations, which are computed later, after it runs and after the GROUP BY, and this is why aggregations are not allowed inside the WHERE filter. Luckily, SQL provides a HAVING operation, which works just like the WHERE filter except that it works on aggregations, and it can do so because it happens after the GROUP BY and after the aggregations. To summarize: you can source the table and then drop rows before grouping, which is what the WHERE filter is for.
Then you can do your grouping and compute your aggregations, and after that you get another chance to drop rows, with a filter that runs on your aggregations. Let's see how that works in practice. Let me first show what we had before: this is our actual result, and we want to keep only rows where average experience is at least 7,000. So after the GROUP BY I will write HAVING, then AVG(experience) >= 7000; let me remove the broken WHERE, run the query, and you can see we get what we need. You might be thinking: why do I have to write the function down again, can't I just use the label I've assigned? Let's try it and see, and the answer is that yes, this works in BigQuery. However, you should be aware that BigQuery is an especially user-friendly product; in many databases this is actually not allowed, in the sense that the database will not be kind enough to recognize your label in the HAVING operation, and you will have to repeat the logic, as I'm doing now. This is why I write it this way: I want you to be aware of this limitation. Another thing you might not realize immediately is that you can also filter by aggregated columns that you are not selecting. Say I wanted to group by class and get the average experience for each class, but keep only classes with a high enough average level. I am perfectly able to do that: I just write HAVING AVG(level) >= 20, and after I run this you'll see that instead of four values I get three; I've lost one value. Average level is not shown in the results, but I can of course show it, and you will see that all the remaining classes respect the condition: they all have an average level of at least 20. So in HAVING you are free to write filters on aggregated values, regardless of which columns you are selecting.
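The HAVING behavior, including filtering on an aggregate you don't select, can be sketched like this; the `fantasy_characters` table and its numbers are invented for illustration, and the aggregate expression is repeated inside HAVING rather than relying on the alias, since many databases won't accept the label there:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE fantasy_characters "
    "(name TEXT, class TEXT, experience INTEGER, level INTEGER)"
)
con.executemany("INSERT INTO fantasy_characters VALUES (?, ?, ?, ?)", [
    ("Aria", "mage", 9000, 25), ("Borin", "mage", 7000, 21),
    ("Cal", "rogue", 5000, 12), ("Dara", "rogue", 6000, 18),
])

# Keep only classes whose average experience is at least 7000.
high_exp = con.execute(
    "SELECT class, AVG(experience) FROM fantasy_characters "
    "GROUP BY class HAVING AVG(experience) >= 7000"
).fetchall()

# HAVING may also filter on an aggregate that is not selected at all.
high_level = con.execute(
    "SELECT class, AVG(experience) FROM fantasy_characters "
    "GROUP BY class HAVING AVG(level) >= 20"
).fetchall()
```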
To summarize once more: you get the data you need; you drop rows that are not needed; you can then GROUP BY, if you want, to subdivide the data and compute aggregations within those groups; having done that, you have the option to filter on the result of those aggregations; and finally you pick which columns you want to see and apply all the other operations we've seen in the course. We are now ready to learn about window functions, a very powerful tool in SQL. Window functions allow us to do computations and aggregations over multiple rows; in that sense they are similar to what we've seen with aggregations and GROUP BY. The fundamental difference between grouping and window functions is that grouping fundamentally alters the structure of the table. If I take this items table and group by item type, right now I'm looking at about 20 rows, but the grouped result would have only three rows, because there are only three types of items; grouping significantly compresses the structure of the table. In fact, we've seen with the basic law of grouping that after you apply a GROUP BY you have to work around this fundamental alteration in the table's structure: the items table has 20 rows, but after grouping by item type it has three, because there are three item types. The table is compressed, its structure changes, and the law of grouping teaches you how to work with that. It tells you that if you group by item type you cannot just select power as-is, because your result will have three rows but there are 20 values of power; you must instead select an aggregation on power, so those values are compressed to a single value for each item type.
The same goes for name: you cannot select it as-is either; you also have to apply some sort of aggregation, for example collecting the names into a list, an array, or similar. Window functions are different: they let us aggregate, let us work on multiple values, without altering the structure of the table, without changing its number of rows. Let's see how this works in practice. Imagine I wanted the sum of all the power values for my items, the total power across all of them. You should already know how to get just that sum in SQL: take the fantasy items table and select SUM(power). If I paste this query into BigQuery, I get exactly that. This is a typical aggregation: SUM has taken 20 different values of power and compressed them down to one value, and it has done the same to my table, squashing 20 rows down to one row. This is how aggregations work, as we've seen in the course. But what if I wanted to show the total power without altering the structure of the table; what if I wanted to show the total power on every row? In other words, I can take the sum of all the power values, the same number we saw in BigQuery, and put it on every row. Why would I want to do this? There are several things this setup enables. For example, take Phoenix Feather, which has power 100: I can take that 100, divide it by the total power in the same row, turn it into a percentage, and get approximately 6.5%. Thanks to this I can say: look, the Phoenix Feather accounts for about 6 or 7% of all the power in my items, of all the power in my game, and that might be useful information.
A more mundane concern: suppose this is your budget, the things you spend money on, with price instead of power. The total sum is maybe what you spent in a month, and you want to know what percentage of your budget going to the movies covered, and so on. Now, I'll delete this value because we're not going to use it, and let's see what we need to write in SQL to obtain this result. Once again we go to the fantasy items table, and we select SUM(power) just like before, except that now I add OVER followed by an open and a close round bracket, and this is enough to obtain the result. To be precise, when I write this in BigQuery I will also want to see a few columns: name, item type, and power, with a comma at the end, followed by SUM(power) OVER (), to which I'll give a label, just like in the spreadsheet. This is the query that reproduces what you see in the spreadsheet. How it works: the OVER keyword signals to SQL that you want a window function, which means you will do a calculation, an aggregation, but you are not going to alter the structure of the table; you simply take the value and put it in each row. That is what OVER signals to SQL. Because this is a window function, we also need to define a window. What exactly is a window? A window is the part of the table that each row is able to see. We will understand what this means in much more detail by the end of this lecture, so don't worry about it, but for now note that the place where we usually specify the window is inside the brackets after OVER. Here we have nothing, and that means the window for each row is the entire table.
table so that’s pretty simple right each row sees the entire table so to understand how the window function is working we always have to think row by row because the results can always be different on different rows so let us go row by row and figure out how this window function is working so now we have the first row and what is the window in this case meaning what part of the table does does this row see well the answer is that this row sees all of the table given that it sees all of the table it has to do the sum of power and so it will take this thing compute a sum over it put it in the cell now that was the first row moving on to the second row now what’s the window here what part of the table does this row see once again it sees all of the table given that it sees all of the table it takes power computes some over it gets the result and puts it in the cell now I hope you can see that the result has to be identical in every cell in every Row in other words because every row sees the same thing and every Row computes the same thing and this is why every Row in here gets the same value and this is probably the simplest possible use of a window function so let us now take this code and bring it to B query and make sure that it runs as intended and like I said in the lecture on grouping you will see that power is blue because bequer is getting confused with its functions so always be best practice to put it into back tis to be very explicit that you are referring to a column but basically what you see here is exactly what we have in our sheet and now of course we have this new field which shows me the total of power on every row and like I said we can use this for several purposes for example I can decide to show for each item what p percentage of total power it covers right that’s what I did before in the sheet so to do this I can take the power and I can divide by this window expression which will give me the total power not sure what happened there but let me 
Let me copy-paste that expression here, and I'll call this percent of total power. This is just a division, so to see a percentage I also have to multiply by 100, which we know how to do, and looking at the result, when we have power 100 we have almost 6.5% of the total power. This is the same thing we did before, and it goes to show that you can use these window fields in your calculations; as I said, if this were your budget, you could compute what percentage of the total is covered by each item, a handy thing to know. Now, why do I have to repeat all of this logic; why can't I just say power divided by sum power? As you know from other parts of the course, the SELECT part is not aware of the aliases, the labels, we assign within it, so when I try that it won't recognize the label; unfortunately, if I want to show both, I have to repeat the logic. And of course I'm not limited to SUM: what I have here is an aggregation function, just like the ones we've seen with simple aggregations and grouping, so instead of SUM I could use something like AVG, using the backticks, and remembering to add the OVER, otherwise it won't know it's a window function. I can give it a label, and now each row shows the same value: the average of power over the whole data set. You can use basically any aggregation function you need; it works all the same, with a few more backticks here and there to be precise, and the result is what we expect. Let us proceed with our exploration. I would now like to see a total power on each row, but not the total power of the whole data set; I'm interested in the total power by item type. If my item is an armor, I want the total power of all armors; if it's a potion, the total power of all potions, and so on.
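The percent-of-total idea, with the window logic repeated because the SELECT alias can't be reused in the same SELECT list, can be sketched like this (toy data again):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE items (name TEXT, power INTEGER)")
con.executemany("INSERT INTO items VALUES (?, ?)", [
    ("Excalibur", 100), ("Dagger", 65), ("Healing Potion", 30),
    ("Cloak", 69), ("Chain Mail", 70), ("Mana Potion", 50),
])

# The window expression appears twice: once to show the total, once
# inside the division, because SELECT can't see its own aliases.
rows = con.execute(
    "SELECT name, "
    "SUM(power) OVER () AS total_power, "
    "ROUND(100.0 * power / SUM(power) OVER (), 1) AS pct_of_total "
    "FROM items"
).fetchall()
pct = {name: p for name, _, p in rows}
```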
I want to compare items within their category; I don't want to compare every item with every item. How can I achieve this in the spreadsheet? Let's start with the first row. I need to check which item type I have, and conveniently I have sorted the data, so we can be quick. We have an armor, and I want the total power for armor, so I use the SUM function, being careful to select only rows where the item type is armor; this is what I get. The next step is simply to copy this value into all the rows that are armor, but again you have to be careful, because the spreadsheet wants to continue a pattern while what I want is the exact same number. Now all the rows with item type armor carry this value, because we're looking within the item type. I'll do the same for potion: the sum of power for all items that are potions, 239, making sure to copy the exact same value to all potions. Next we have weapons: the sum of power for all weapons, copied down; let's see if the spreadsheet tries to complete the pattern, it does, so I'll just go ahead and paste the fixed value, make it a bit nicer, and now I have what I wanted: each row shows the total power among the items of the same type as the one in that row. Now, how can I write this in SQL? Two parts of the query will stay the same, because we still want the items table and these columns, but we need to change how we write the window function. Once again I want SUM(power), but now I need to define a specific window. Remember, the window defines what each row sees. So what do I want each row to see when it takes the sum of power? For the first row, I want it to see only the rows whose item type is armor, in other words all the rows with the same item type.
I can achieve this in the window function by writing PARTITION BY item type. Adding a partition, defining the window as a partition by item type, means that each row will look at its own item type and then partition the table so that it only sees rows with the same item type. So this first row sees only these four rows; it takes the sum of power over them and puts it in the cell, and for the second, third, and fourth rows the result is the same, because they each see this same part of the table. When we come to potion, this row says: what is my item type? Potion. Then I'll look only at rows with item type potion; that is the window for these four rows, and within it I take power and sum it. Finally, for these last rows, starting with this one: it looks at its item type, weapon, and looks at all the rows sharing that type, so its window looks like this (let me color it properly); it takes the sum of the power values that fall inside the window and puts it in the cell; the second cell sees the same window, sums the same values, puts the result in its cell. This is how we get the required result, and this is how partitioning works in window functions. Let's go to BigQuery now and make sure this actually works. When I run it (I didn't add a label), you can see I'm basically getting the same result: weapons show one value, potions another, and armors a third. For each item I am now seeing the total power, not over the whole table, but within the item type. Next task: find the cumulative sum of power, which is this column over here.
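The per-type total can be sketched runnably with the same invented toy table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE items (name TEXT, item_type TEXT, power INTEGER)")
con.executemany("INSERT INTO items VALUES (?, ?, ?)", [
    ("Chain Mail", "armor", 70), ("Cloak", "armor", 69),
    ("Healing Potion", "potion", 30), ("Mana Potion", "potion", 50),
    ("Excalibur", "weapon", 100), ("Dagger", "weapon", 65),
])

# Each row sees only the rows sharing its item_type, so every item
# carries its category's total while the table keeps all six rows.
rows = con.execute(
    "SELECT name, item_type, "
    "SUM(power) OVER (PARTITION BY item_type) AS power_by_type "
    "FROM items"
).fetchall()
totals = {r[0]: r[2] for r in rows}
```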
What is a cumulative sum? It's the sum of this item's power plus the power of all the items that are less powerful. To do this in the spreadsheet I will first reorder the data, because I want to see it simply in order of power: I take the whole range, go to Data, Sort range, advanced options, say that the data has a header row so I can see the column names, and order by power ascending. As you can see, my records are now sorted by ascending power. Now, how do I compute the cumulative sum of power? In the first row all we have is 30, so the sum is 30. In the second row I have 40, plus the 30 before, so 70. In the next row I have 50, and the sum so far was 70, which I can see by adding the two cells above or, more simply, by looking at the last cumulative cell: 50 + 70 is 120. Proceeding like this I could compute the cumulative power down the whole column. For your reference, I have figured out the Google Sheets formula that computes this cumulative sum for our example, and I went ahead and filled it in for all our data. This is the formula right here; I'm not going to go in depth into it, because this is not a course on spreadsheets, but I'll show it in case you're curious. The SUMIF function takes the sum over a range while only considering values that satisfy a logical condition. The first argument is the range to sum over, the power column, and the criterion, what must be true for a value to be counted, is that the value is less than or equal to the power in this row. So the formula says: take the power level in this row, take all the values of power that are less than or equal to it, and sum them up. That is exactly what our window function does, so the formula reproduces it.
and look what’s the way to do a cumulative sum in Google Sheets or what’s the way to do a running total there are other Solutions but they do come with some um pitfalls they do come with some Corner cases so this is a Formula that’s actually reproducing the behavior of SQL now let us go back to actually SQL and see how we would write this so I’m going to take the fantasy items table and I’m still going to select the columns and now I have to write my window function now the aggregation is just the same so take the sum of power and now I have to Define my window now my window is not defined Now by a partition but it is defined by an ordering order by power and when I say order by power in a window function what’s implicit in this is the keyword ask for ascending so this means that the window will order power from the smallest to the biggest and I can choose to write this keyword or not because just like in order by in SQL when you don’t specify it the default value is ascending from smallest to biggest so how does this window work work let’s start with the first row and let’s say we need to fill in this value so I’m going to look at my power level it is 30 and then the window says that I can only see rows where the power level is equal or smaller and what are the rows where the power level is equal or smaller to 30 there’re these rows over here so effectively this this is the only part of the table that this window sees on the first row and then take the sum over power so sum over 30 is 30 move on to the second row the power level is 40 the window says I only see rows where the power level is smaller uh or equal and this includes these two rows over here now take the sum of power over here you get 70 put it in the cell third row I have power level 50 I’m only seeing these rows so take the sum of power over this it’s 120 put it in the cell and I can continue like this until I get to the highest value in my data set it’s 100 never mind that is not the last row because 
Never mind that it's not only the last row: both of the last two rows have this highest value. When you come to such a row and look at 100 and ask what the window is, what rows you can see, the answer is all rows where power is 100 or less, and that basically includes all of the table. So when you take the sum of power you get the total sum, and in fact the cumulative power here equals the total power we computed before, just as we would expect. This is easy to see here because we have ordered our data conveniently, but it works in any case. So what ORDER BY does in a window function is make sure that each row only sees the rows that come before it, given your ordering. If I order from the smallest power to the biggest, each row sees only rows that come before it in that ordering, rows with the same level of power or lower, never higher. Let's take it to BigQuery now and make sure it works as intended; I'll add an ordering by power, and here I see the same thing I've shown you in the spreadsheet. I notice that some numbers are different, these two items show 90 instead of 100, but never mind that: the logic is the same and the numbers make sense. I'm also able to change the direction of the ordering. Say I take this field and copy it, identical except that instead of ordering by power ascending I order by power descending. What do you expect to see in this case? Let's take a look. Now each item looks at its own level of power and considers only items that are just as powerful or more powerful: the exact same logic, reversed. When you look at the weakest item, a potion with 30, it is looking at all the items, because there is no weaker item, so it finds the total level of power in our data set.
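Both running totals can be sketched together in a minimal SQLite example with invented values; note that the default frame used when a window has ORDER BY includes peer rows, so tied power values would share one running total, matching the SUMIF behavior described above:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE items (name TEXT, power INTEGER)")
con.executemany("INSERT INTO items VALUES (?, ?)", [
    ("Healing Potion", 30), ("Mana Potion", 50), ("Dagger", 65),
    ("Cloak", 69), ("Chain Mail", 70), ("Excalibur", 100),
])

# Ascending: each row sums its own power plus every smaller power.
asc = con.execute(
    "SELECT name, SUM(power) OVER (ORDER BY power) FROM items "
    "ORDER BY power"
).fetchall()

# Descending: same logic reversed; each row sums equal-or-higher powers.
desc = con.execute(
    "SELECT name, SUM(power) OVER (ORDER BY power DESC) FROM items "
    "ORDER BY power DESC"
).fetchall()
```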
But if you go to the strongest item, like Excalibur with a power level of 100, there are only two items in the whole data set with that power level, itself and the Phoenix Feather, so summing power over them gives 200. You can see it's the exact same logic, except each row now sees only items with the same power level or higher. So when you order inside a window function, you can decide the direction of the ordering with DESC or ASC, or, if you're a lazy programmer, you can omit the ascending keyword and it works just the same, because that's the default. Finally, we want to compute the cumulative sum of power by type, and you might notice that it is, in a way, the combination of the previous two requirements. Let's see how to do that. First I want to sort our data to help us: I take the whole range, Sort range, advanced options, there's a header row, and I order first by type and then, within each type, by power. This is our data now. For each item I want to show the cumulative sum of power, just as I did before, except that now I only want to do it within the same item type. Looking at armor, it's already sorted: the first value is the smallest, so I just write it down; the next armor item adds its power to the running total of the armors before it, and so on down the armor rows, each cell being the sum of the armor powers up to and including that row. Then I'm done with armor, and a new item type begins, so I have to start all over again. Looking at potions: we start with 30, the smallest value; then we move to 50, so the running total sees 30 and 50, which is 80; add 60 to 80, which is 140; and finally we add 99 to 140.
Adding 99 to 140 is another way of saying we add up all the values for potion. So this is what we want: the cumulative sum of power within the item type; we compute it within each type, and when we find a new type we start over. To calculate it for weapon I could copy my formula from here, paste it at weapon, and then I would need to modify it: the range must only include the weapon rows, so it starts from C10, and the value to compare against must be C10 as well, because I want to start from the power level of the first weapon. For some reason the highlight is purple, but it should be correct: each cell should always be the sum of the previous values, so we start with 65, then 65 + 75, and so on. So this is our result: cumulative power within the item type. To write this in SQL, I take my previous query, and when we define the window we can simply combine what we've done before: the PARTITION BY with the ORDER BY, and you need to write them in this order, first the partition, then the ordering. I will PARTITION BY item type and ORDER BY power ascending, and this achieves the required result. For each row in this field, the window is defined as follows: first, partition by item type, so you can only see rows with the same item type as yours; then, within that partition, keep only rows whose power is equal to or smaller than yours. In the case of the first armor item, that means only its own row; likewise for the first potion item. The second armor item again partitions first, looking at all the items that are armor, then discards those with a bigger power than itself, so it looks at these two rows. And if, for example, we look at the last row over here, a weapon, it says: I'm a weapon, so I can only see the weapon rows.
And among those, it can only see rows with a power level equal to or smaller than its own, and that checks out: those are all the weapon rows, and in fact the sum here equals the sum of power by type, exactly as we would expect. So once again, let's verify that this works in BigQuery. I'll actually order by item type and power, so I have the same ordering as in my sheet, and I can see that within armor there is this growing cumulative sum, and then, once the item type changes, it starts all over: it starts again at the first value, it grows, it accumulates, and when we're done with potions we get weapons, where it starts again and grows all the way to the total sum of all powers in the weapon item type. Here is a summary of all the variants of windows we've seen; there are four. In all of them, for clarity, we've kept the aggregation identical, SUM over the power field, but of course you know that you can use any aggregate function here, on any column compatible with it. We have defined four different windows. The first is the simplest: there's actually nothing in the definition, just OVER (), which means every row looks at the whole table, so every row shows the total of power for the whole table, simple as that. The second window introduces PARTITION BY item type, which in practice means that each row looks at its own item type and considers only rows sharing that exact type; within those rows it calculates the sum of power. In the third window we have an ordering field: each row looks at its own level of power, since we're ordering by power, and sees only rows whose power level is equal or smaller.
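The combined window can be sketched runnably as well; a running total of power that restarts at each item type, over the same invented toy table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE items (name TEXT, item_type TEXT, power INTEGER)")
con.executemany("INSERT INTO items VALUES (?, ?, ?)", [
    ("Chain Mail", "armor", 70), ("Cloak", "armor", 69),
    ("Healing Potion", "potion", 30), ("Mana Potion", "potion", 50),
    ("Excalibur", "weapon", 100), ("Dagger", "weapon", 65),
])

# Partition first, then order: each row sees only same-type rows with
# equal-or-smaller power, so the cumulative sum resets per item_type.
rows = con.execute(
    "SELECT item_type, power, "
    "SUM(power) OVER (PARTITION BY item_type ORDER BY power) "
    "FROM items ORDER BY item_type, power"
).fetchall()
```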
The reason we look in that direction is that ORDER BY power implicitly means ORDER BY power ascending. If instead we ordered by power descending, it would be the same logic in the opposite direction: each row would look at its own level of power and consider only rows where power is equal or bigger. And finally we have the combination of the two: a window that uses both a partition and an ordering, meaning each row looks at its item type and discards every row that doesn't share it, and then, within the rows that remain, applies the ordering, considering only rows with the same level of power or less; simply a combination of the two conditions. And this is the gist of how window functions work. First thing to remember: window functions provide aggregation, but they don't change the structure of the table; they just insert a specific value at each row, so after applying a window function the number of rows in your table is the same. Second thing to remember: in the window definition you get to define what each row is able to see when computing the aggregation, so when you are thinking about a window function you should ask yourself: what part of the table does each row see, what is the perspective that each row has? And there are two dimensions you can work with to define these windows: the partition dimension and the ordering dimension. The partition dimension cuts up the table based on the value of a column: you keep only rows that have the same value. The ordering dimension cuts up the table based on the ordering of a field: depending on ascending or descending, depending on the direction you choose, a row looks at the rows that come before it in the ordering, or the rows that come after it. You can pick either of these, partitioning or ordering, or you can combine them.
By using these, you can define all of the windows you might need to get your data. As a quick extension, I want to show you that you're not limited to defining windows on single columns; you can list as many columns as you want. In this example I go to the fantasy characters table, get a few columns, and define an aggregation over a window: I take the level field, sum it up, and partition by two fields, guild and is_alive. What do you expect to happen? This is the exact same logic as grouping by multiple fields, which we saw with GROUP BY: the data is divided not by guild alone, nor by whether the character is alive, but by all the mutual combinations of these fields. So Mirkwood and true is one combination, and the characters in it fit together: we have two characters here, levels 22 and 26, and their sum is 48, so both get 48 for the sum of level. Likewise, when you look at Shirefolk and true, these three all end up in the same group, so they all share the same sum of level, which is 35. But Shirefolk and false is another group, and that character is alone: the level is 12, so the sum is 12. Again, when you partition by multiple fields, the data is divided into groups obtained from all the combinations of values those fields can take, and if you experiment a bit by yourself you should have an easier time convincing yourself of this. The same idea applies to the ORDER BY part of a window. Until now, for simplicity, we have ordered by one field, and to be honest, most times you will only need one; but sometimes you might want to order by several. In this example we define our ordering based on two fields: power, and then weight.
Based on that ordering, we calculate the sum of power, which is again a cumulative sum, but now the ordering is different. You will notice this if you look at the two most powerful items in our data, the last two, which are both at 100. When we ordered by power alone, these two rows had the same value in the window function, because ordering just by power makes them peers: they both have 100. But now we also order by weight, ascending, from the smallest weight to the biggest, so the Phoenix Feather comes first: although it has the same power as Excalibur, the Phoenix Feather is lighter, and because it comes first it gets a different value for the aggregation. And of course we have the power to say ascending or descending for each of the fields we order by. If I wanted to reverse this, I could simply write DESC after weight; be careful that in this case DESC refers only to weight, not to power. It is just as if I had written ASC on power explicitly: ASC can be omitted because it is the default, but I would write both to be clear. If I run this, the result is reversed: Excalibur comes first, because with weight descending the heavier item precedes, and the Phoenix Feather, which is lighter, comes last. Understanding this theoretically is one thing, but I do encourage you to experiment with your own data and exercises, and then you will internalize it. And now we are back to our schema for the logical order of SQL operations, and it is finally complete, because we have seen all of the components we can use to assemble a SQL query. The question is: where do window functions fit in? As you can see, we have placed them right here. What happens is that you get your data, and then the WHERE filter runs, dropping rows you don't need.
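Before moving on, the multi-column windows just described — PARTITION BY two fields, and ORDER BY two fields with a per-field direction — can be sketched in runnable form with Python's sqlite3; all table contents here are invented for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")  # needs SQLite >= 3.25 for window functions
con.executescript("""
CREATE TABLE fantasy_characters (name TEXT, guild TEXT, is_alive INTEGER, level INTEGER);
INSERT INTO fantasy_characters VALUES
  ('Aldo', 'Mirkwood',  1, 22),
  ('Bera', 'Mirkwood',  1, 26),
  ('Cody', 'Shirefolk', 1, 12),
  ('Dara', 'Shirefolk', 0, 12);
CREATE TABLE fantasy_items (name TEXT, power INTEGER, weight INTEGER);
INSERT INTO fantasy_items VALUES
  ('Staff', 60, 4), ('Phoenix Feather', 100, 1), ('Excalibur', 100, 9);
""")

# Partitioning by two columns groups rows by every (guild, is_alive) combination.
part_rows = con.execute("""
SELECT name, guild, is_alive,
       SUM(level) OVER (PARTITION BY guild, is_alive) AS group_level
FROM fantasy_characters
ORDER BY name
""").fetchall()

# Ordering by two columns; DESC applies only to the field it follows (weight).
ord_rows = con.execute("""
SELECT name,
       SUM(power) OVER (ORDER BY power, weight)      AS cum_light_first,
       SUM(power) OVER (ORDER BY power, weight DESC) AS cum_heavy_first
FROM fantasy_items
ORDER BY power, weight
""").fetchall()

print(part_rows)
print(ord_rows)
```

On this sample the two Mirkwood/alive characters share a group sum of 48, while the two equal-power items get different cumulative sums depending on whether the lighter or the heavier one comes first in the ordering.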
Then you have a choice whether to do a GROUP BY. If you do, you change the structure of your table: it will no longer have the same number of rows, but a number that depends on the unique values of your grouping field, or the unique combinations of values if you used more than one field. If you group, you will probably want to compute some aggregations, and then you may want to filter on those aggregations, dropping rows based on their values. And here is where window functions come into play: it is on this result that they operate. If you haven't done a GROUP BY, window functions work on your data after the WHERE filter runs; if you have done a GROUP BY, they work on the result of your aggregation. After applying the window function, you can select which columns to show and give them labels, and then all the other parts run: you can choose to drop duplicates from your result, meaning duplicate rows that have the same value in every column; you can stack different tables on top of each other; and finally, when you have your result, you can apply an ordering and also cut the result with a limit so you only show a few rows. This is where window functions fit into the big scheme of things. There are some other implications of this ordering. One interesting one is that if you have computed aggregations, such as the sum of a value within a class, you can actually use those aggregations inside a window function, so you can do a sort of aggregation of an aggregation. But that is, in my opinion, an advanced topic that doesn't fit into this fundamentals course; it may fit someday into a later, more advanced one. I want to show you another type of window function, very commonly used and very useful in SQL challenges and SQL interviews:
numbering functions. Numbering functions are functions we use to number the rows in our data according to our needs. There are several of them, but the three most important ones are without any doubt ROW_NUMBER, DENSE_RANK, and RANK. Let's see how they work in practice. What I have here is part of my inventory table: I'm showing you the item ID and the value of each item, and conveniently I have ordered the rows by value ascending. Now we are going to number the rows according to the value using these window functions. I've already written the query I want to reproduce: I go to the fantasy inventory table, select the item ID and the item value as you see here, and then use three window functions. The syntax is the same as in the previous exercise, except that now I'm not using an aggregation function over a field, as I did when computing the sum of power; I'm using another type of function, a numbering function. These functions don't take a parameter — as you can see, there is nothing between the round brackets — because I don't need to provide an argument; all I need to do is call the function. What's really important is to define the correct window, and as you can see, in the three examples the windows are all the same: I am simply ordering my rows by value ascending. This means that when the window function is computed, every row will look at its own value and say: I will only see rows where the value is the same or smaller; I cannot see rows where the value is bigger than mine. That is what the window does. The first row will only see the value of 30, the second row will see the first two, the third row these three, and so on, up to the last row, which sees itself and all the other rows.
Now let us start with ROW_NUMBER. ROW_NUMBER uses this ordering to number my rows, and it's as simple as putting 1 in the first row, 2 in the second, then 3, 4, and so on. If I extend the pattern, I get a number for every row, and that's all ROW_NUMBER does: it assigns a unique integer to every row based on the ordering defined by the window. You might think: big deal, why do I need this? Don't I already have row numbers here in the spreadsheet? Well, in SQL problems you often need to order things based on different values, and ROW_NUMBER allows you to do this; you can also have many different orderings coexisting in the same table, based on different conditions, and that comes in handy, as you will discover if you do SQL problems. Now let's move on to ranking, starting with DENSE_RANK. Ranking is another way of counting, but slightly different. Sometimes you just want to count things, as we did with ROW_NUMBER: say you are a dog sitter, you're given twenty dogs, you're getting confused between all their names, so you assign a unique number to every dog to identify them, and you can sort them by age, or by how much you're getting paid to dog-sit them. Other times you want to rank things, like when choosing which product to buy, or when expressing the results of a race. The difference between ranking and counting appears when you have ties. When you simply want to assign a different number to each element, as we did here, and two things have the same value, you don't really care: you arbitrarily decide that one of them will be number two and the other number three. But you cannot do the same for ranking.
If two students in a classroom get the best score, you can't randomly choose that one of them is number one and the other number two; they both have to be number one. And if two people finish a race at the same time, with the best time, you can't say that one won the race and the other didn't because one is arbitrarily number two: they both have to be number one; they have to share that rank. This is where ranking differs. So let's go in here and apply our rank. We are ordering by value ascending, which means the smallest value gets rank one, so 30 has rank one. Then we go to the second row, and again, with window functions you always have to think row by row: what does each row see, and what does each row decide? The row orders by value, so it only sees these values over here, and it has to decide its rank. It says: I'm not actually number one, because there is a value smaller than me, so I must be number two. Then we get to the third row, which sees all the values that come before it, equal or smaller, and it says: I'm not number one, because there's something smaller; but the value 50, which the previous row has, is rank two, and I have the same value, 50; we arrived in the same spot, so I must have the same rank. And this is the difference between ROW_NUMBER and RANK: identical values get the same rank, but they don't get the same row number. Now we come to the row with 60. It looks back and says: from what I see, 30 is the smallest, so it has rank one; then the two 50s share rank two; but I am bigger, so I need a new rank. What do I pick? Three, because it's the next number in the sequence. Then the next row picks four, and the next
one picks five, then six, and it proceeds the same way, so I'll do it quickly: 7, 8, 9, 10, 11 — and careful here, two rows share the same value, so they are both 11 — then 12, 13, again a shared value, so two rows share the 13th spot, then 14 for the item worth 1700, 14 again, then 15, and then 16. This is what we expect to see when we compute DENSE_RANK. And finally we come to RANK. RANK is very similar to DENSE_RANK, but there is one important difference. Let's do it again: the smallest value has rank one as before, then we have 50 at rank two, and the second 50 once more shares rank two. Now we move from 50 to 60, so we need a new rank, but instead of three we put four. Why four? Because the previous rank covered two rows: it sort of expanded and ate the three. So by the rules of plain RANK, we lose the three and put four here. This is just another way of managing ranking, and you will notice it conveys an extra piece of information compared to DENSE_RANK: not only can I see that this row has a different rank than the previous row, I can also see how many members were covered by the previous ranks. Since I'm at four already, the previous ranks must have involved three members, and that piece of information is not available with DENSE_RANK. Continuing: the next value gets rank five, then six, seven, eight, nine, ten, eleven; now rank twelve, shared again between two identical values; but because twelve has eaten up two spots, I can't use thirteen anymore — the second twelve has eaten the thirteen — so I jump straight to fourteen, then fifteen, fifteen again, then I jump to seventeen because fifteen took two spots, seventeen again, then I jump to nineteen, and finally twenty.
So you can see that the final number is 20 for RANK, just as with ROW_NUMBER, because RANK not only differentiates between ranks, it also counts how many elements came before: I can tell there are 19 rows in the previous ranks because of how RANK works. With DENSE_RANK we only got up to 16, so we sort of lost the information about how many records we have, and this may be one of the reasons why RANK, rather than DENSE_RANK, is the default method of ranking, even though DENSE_RANK seems more intuitive when you build the ranking by hand. We can now take this query — hopefully I've written it correctly — go to BigQuery, and run it. As you can see, we have our items sorted by value, and then our numbering functions. ROW_NUMBER goes from 1 to 20 with no surprises, since it just numbers the rows. DENSE_RANK has rank one for the first row, then these two share the same rank because they both have 50, and the next rank is three, just as I showed you in the spreadsheet; similarly, here you have 11, 11, and then 12. RANK starts off the same: the smallest value has rank one and the next two values have rank two, but after using up two and two, it's as if you've used up the three, so you jump straight to four; after 15 and 15 you jump straight to 17, after 17 and 17 you jump straight to 19, and the highest number here is 20, which tells you how many rows you're dealing with. Of course, what you see here are window functions, and they work just as I've shown you, so you could take RANK and order by value descending instead; then you would find the inverse of that ranking, in the sense that the highest-value item gets rank one and it goes from there, while the lowest-value item ends up with the biggest rank number.
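The spreadsheet walk-through above can be reproduced on a small invented inventory (values 30, 50, 50, 60) with Python's sqlite3, showing all three numbering functions side by side:

```python
import sqlite3

con = sqlite3.connect(":memory:")  # needs SQLite >= 3.25 for window functions
con.executescript("""
CREATE TABLE inventory (item_id INTEGER, value INTEGER);
INSERT INTO inventory VALUES (1, 30), (2, 50), (3, 50), (4, 60);
""")

rows = con.execute("""
SELECT value,
       ROW_NUMBER() OVER (ORDER BY value) AS rn,    -- unique integer per row, ties broken arbitrarily
       DENSE_RANK() OVER (ORDER BY value) AS drnk,  -- ties share a rank, next rank is consecutive
       RANK()       OVER (ORDER BY value) AS rnk    -- ties share a rank and "eat" the following spots
FROM inventory
ORDER BY value, rn
""").fetchall()

for row in rows:
    print(row)
# The two 50s share rank 2 in both ranking columns, but only RANK then jumps to 4 for 60.
```

The jump from 2 straight to 4 in the RANK column is exactly the "eaten spot" behavior described above.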
Rank is often used like this: the thing that has the most of what we want — the biggest salary, the biggest value, the most successful product — we rank so that it's rank one, like the first in our race, and everyone else goes from there. So in practice we often order by something descending when we calculate a rank. And because these numbering functions are window functions, they can also be combined with PARTITION BY if you want to cut the data into subgroups. Here's an example on the fantasy characters table: we are partitioning by class, meaning each row only sees the other rows that share its class — archers only care about archers, warriors only care about warriors, and so forth — and then within the class we are ordering by level descending, so the highest levels come first, and using that to rank the characters. If I go here, I can see that within the archers the highest-level archer has level 26, so they get the first rank, and all the others go down from there. Then we have our warriors: the highest-level warrior is 25, and they also get rank one, because they are being ranked within warriors. This is like a race with categories: many people arrive first, because they arrive first in their category; it's not that everyone competes with everyone. And so on: each class of character has its own dedicated ranking. You can check the BigQuery page on numbering functions if you want to learn more; you can see there the ones we've talked about — RANK, ROW_NUMBER, and DENSE_RANK — plus a few more, but these are the ones most commonly used in SQL problems.
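As a sketch, the per-class ranking just described — partition by class, order by level descending — looks like this in Python's sqlite3, with invented character names and levels:

```python
import sqlite3

con = sqlite3.connect(":memory:")  # needs SQLite >= 3.25 for window functions
con.executescript("""
CREATE TABLE fantasy_characters (name TEXT, class TEXT, level INTEGER);
INSERT INTO fantasy_characters VALUES
  ('Robin', 'Archer',  26),
  ('Wren',  'Archer',  14),
  ('Brand', 'Warrior', 25),
  ('Ulf',   'Warrior', 19);
""")

rows = con.execute("""
SELECT class, name, level,
       -- each class gets its own independent ranking, highest level first
       RANK() OVER (PARTITION BY class ORDER BY level DESC) AS class_rank
FROM fantasy_characters
ORDER BY class, class_rank
""").fetchall()

for row in rows:
    print(row)
# Both the top archer and the top warrior get rank 1 -- one winner per category.
```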
Because I know it can be a bit confusing to distinguish between ROW_NUMBER, DENSE_RANK, and RANK, here's a visualization that you might find useful. Say we have a list of values — these ones — ordered in descending order; you can see there is quite some repetition among them. Given this list, how would the different numbering functions behave? ROW_NUMBER is easy: it just assigns a unique number to each row, so it doesn't matter that the values are sometimes the same; you arbitrarily pick one to be 1, the other to be 2, then 3, and where you have 10, 10, 10 it doesn't matter, you just continue 4, 5, 6, and finally 7. DENSE_RANK actually cares about the values being the same: the two 50s both get 1, 40 gets 2, the 10s get 3, and 5 gets 4 — easy, the rank simply grows through consecutive integers. RANK also assigns rank one to both 50s, but it throws away the two, because two elements occupied that spot; the next value gets rank three, since the two has already been used; then the next batch, the 10s, gets rank four while burning five and six, and the last value can then only get rank seven. Those are the differences between ROW_NUMBER, DENSE_RANK, and RANK, visualized. We have now reached the end of our journey through the SQL fundamentals. I hope you enjoyed it and that you learned something new; hopefully you now have some understanding of the different components of SQL queries, the order in which they run, and how they come together to let us do what we need with the data. Of course, learning the individual components and understanding how they work is only half the battle; the other half is: how do I put these pieces together, and how do I use them to solve real problems? In my opinion, the answer to that is not more theory but exercises: go out there and do SQL challenges.
Do SQL interviews, find exercises, or even better, find some data you're interested in, upload it to BigQuery, and try to analyze it with SQL. I should let you know that I have another playlist where I solve 42 SQL exercises in PostgreSQL, and I think it can be really useful as the other half of the course: doing exercises and learning how to face real problems with SQL. I really like that playlist because I'm using a free website, one that doesn't require any sign-up or login — it just works — and you get a chance to go there and do exercises covering all the theory we've seen in this course. After trying each one yourself, you get to see me solve it, with my thought process and my explanation, and I think it can be really useful if you want to deepen your SQL skills. But in terms of how to put it all together, I do want to leave you with another resource I have created, which is this table. It shows the fundamental moves you will need whenever you do any type of data analytics, and I believe that every sort of analysis you might work on, no matter how simple or complicated, can ultimately be reduced to these few basic moves. What are they? They should actually be quite familiar to you by now. We have joining, where we combine data from multiple tables based on connections between columns; in SQL you do that with JOIN. Then we have filtering, where we pick certain rows and discard others — say, look only at customers that joined after 2022. How do you do that in SQL? There are a few tools you can use. The most important is the WHERE filter, which comes into action right after your data is loaded and decides which rows to keep and which to discard. HAVING does just the same, except that it works on aggregated fields.
That is, the fields you've obtained after a GROUP BY. QUALIFY we actually haven't seen in this course, because it's not a universal component of SQL — certain systems have it, others don't — but QUALIFY is basically also a filter, one that works on the result of window functions. And finally you have DISTINCT, which runs quite at the end of your query and removes all duplicate rows. Then of course you have grouping and aggregation, which we've seen in detail in the course: you subdivide the data along certain dimensions and calculate aggregate values within those dimensions — fundamental for analytics. How do we aggregate in SQL? We have GROUP BY, we have window functions, and for both of them we use aggregate functions such as SUM, AVG, and so on. Then we have column transformations, where you apply logic and arithmetic to transform columns, combine column values, and take the data you have in order to compute the data you need. We do this where we write the SELECT: we can write calculations that involve our columns, we have CASE WHEN, which allows a sort of branching logic deciding what to do based on conditions, and of course we have a lot of functions that make our life easier by doing specific jobs. Next we have UNION. UNION is pretty simple: take tables that have the same columns and stack them together, meaning put their rows together and combine them. And finally we have sorting, which can change how your data is ordered when you get the result of your analysis, and which can also be used in window functions in order to number or rank our data. These are really the fundamental elements of every analysis and every SQL problem you will need to solve. So one way to face a problem, even if you are finding it difficult, is to come back to these fundamental components and think about how you need to combine them to solve your problem, and how you can break your problem down into simpler
operations that involve these steps. Now, at the beginning of the course I promised that we would solve a hard SQL challenge together at the end, so here it is: let us try to solve this challenge applying the concepts from this course. As a quick disclaimer, I'm picking a hard challenge because it's sort of fun and gives us a playground to showcase several concepts we've seen in the course, and also because I'd like to show you that even big, hard, scary challenges — ones marked as hard that even have "advanced" in their name — can be tackled by applying the basic concepts of SQL. However, I do not intend for you to jump into these hard challenges from the very start. It is much better to begin with basic exercises, do them step by step, and make sure you are confident with the basic steps before moving on to more advanced ones. So if you have trouble approaching this problem, or even understanding my solution, don't worry about it: just go back to your exercises, start from the simple ones, and gradually build your way up. That being said, let's look at the challenge: "Marketing Campaign Success [Advanced]" on StrataScratch. First of all, we have one table to work with for this challenge: marketing_campaign. It has a few columns and actually looks like this: there's user_id, created_at, product_id, quantity, and price. When I look at a new table, the one question I must ask to understand it is: what does each row represent? Just by looking at this table I can form some hypotheses, but I'm not actually sure what each row represents, so I'd better go and read the text until I can get a sense of it. Let's scroll up and read: you have a table of in-app purchases by user. Okay, this explains my table. What does each row represent? An event — that is, a purchase. It means that user 10 bought product 101 in a quantity of 3 at a price of 55, and
created_at tells me when this happened: the 1st of January 2019. Great, now I understand my table, and now I can see what the problem wants from me. Let's go on and read the question. I have a table of in-app purchases by users. Users that make their first in-app purchase are placed in a marketing campaign, where they see calls to action for more in-app purchases. Find the number of users that made additional purchases due to the success of the marketing campaign. The marketing campaign doesn't start until one day after the initial in-app purchase, so users that made one or multiple purchases on the first day do not count; nor do we count users that, over time, purchase only the products they purchased on the first day. All right, that was a mouthful. On the first read this is actually a pretty complicated problem, so our next task is to understand this text and simplify it to the point that we can convert it into code. A good intermediate step before jumping into the code is to write some notes, and we can use SQL's commenting feature for that. What I understand from this text is that users make purchases, and we are interested in users who make additional purchases thanks to this marketing campaign. How do we define an additional purchase? The fundamental sentence is this one: users that made one or multiple purchases on the first day do not count — so an additional purchase happens after the first day. Nor do we count users that over time purchase only the products they purchased on the first day — so the other condition we're interested in is that it involves a product that was not bought on the first day. And finally, what we want is the number of these users. That should be a good start for us to begin writing the code. So let us look at the marketing_campaign table again, and I remind you that each row represents a
purchase. What do we need to find first in this table? We want to compare purchases that happen on the first day with purchases that happen on the following days, so we need a way to count days. And what do we mean by first day and following days? The first day the shop was open? No, we actually mean the first day that the user ordered, because the user signs up, makes their first order, and after that the marketing campaign starts. So we are interested in numbering days for each user, such that we know which purchases happened on their first day, which on their second day, third day, and so on. And what can we use to run a numbering per user? A window function with a numbering function. So I go to my marketing_campaign table and select the user ID, the date on which they bought something, and the product ID, for now. I said I need a window function, so let me start to define the window. I want to count days within each user, so I will need to partition by user_id, so that each row only looks at the rows that correspond to that same user. And then there is an ordering — a sequence from the first day on which the user bought something to the second, the third, and so on — so my window will also need an ordering, and the column in my table that provides it is created_at. Then, which numbering function do I need to use here? The way to choose is to ask: what should happen when the same user made two different purchases on the same date? Do I want my function to output two different numbers, like a simple count, or do I want it to output the same number? The answer is the same number, because all the purchases that happened on day one need to be marked as day one, all the purchases that happened on day two need to be marked as day two, and so on. The numbering function that allows us to achieve this is RANK.
If you remember, ranking works just like ranking the winners of a race: everyone who shares the same spot gets the same number, and this is what we want to achieve here. So let us see what this looks like, ordering by user_id and created_at, and let us now see our purchases. User 10 started buying on this day; they bought one product, and the rank is 1. Let's actually give this column a better name, so that it's not just "rank": we can call it user_day. So user 10 had their first user day on this date and bought one product; then at a later date they had their second user day and bought another product; and then a third. User 14 started buying on this date — this was their first user day — buying product 109, and then the same day product 107, which is also marked as user day one. This is what we want. Then on a later day they bought another product, and this is marked as user day three; remember, with RANK you can go from one to three, because the spot marked as one, covering two rows, has eaten the spot marked as two. That's not an issue in this problem, so we are happy with this. Now, if we go back to our notes, we see that we are interested in users who made additional purchases, where additional means it happened after the first day. And how can we identify purchases that happened after the first day? There's a simple solution: we can filter out rows that have a user_day of one, since all the rows where user_day is one represent purchases the user made on their first day; we discard those and keep only purchases that happened on the following days. But I don't really have a way to filter on this window function directly, because as you recall from the order of SQL operations, the window function runs here, and the WHERE filter runs before that, so WHERE cannot be aware of what happens in the window function; and HAVING also runs before it.
So I need a different solution to filter on this field: a common table expression, which lets me break this query into two steps. I'm going to wrap this logic into a table called t1 — or I can call it purchases, to be more meaningful. If I do SELECT * FROM purchases, you will see that the result does not change, but what I can do now is use the WHERE filter and require that user_day is bigger than one. And if I look here, you will see that I have all purchases that happened after each user's first day. But there is yet one last requirement to deal with: the purchase must happen after the first day, and it must involve a product that the user didn't buy on the first day. How can I comply with this requirement? For all the rows that represent a purchase, I need to drop the rows involving a product ID that the user bought on the first day; so if I find out that user 10 bought product 119 on day one, that purchase does not count, and I'm not interested in it. How can I achieve this in code? I'm already getting all the purchases that didn't happen on day one; now I want another condition, so I will say AND product_id NOT IN, followed by the products that this user bought on day one. It makes sense: these are all the filters I need to complete the problem — show me the purchases that did not happen on day one, and also make sure the user didn't buy this product on day one. What I need to do is add a subquery here, and before I do that, let me give an alias to this table, so that I don't get confused when I refer to purchases again inside the subquery. This first version of purchases we could call next_days, because we're only looking at purchases that happen after the first day, whereas in the subquery we also look at purchases, but we're interested in the ones that actually happened on day one, so we could call it first_day, and we can use a WHERE filter on it.
day needs to be equal to one so this is a way that we can use to look at the purchases that happened on the first day now when we make this list we need to make sure that we are use looking at the same user right and to do that we can say end first day user ID needs to be the same as next day’s user ID and this ensures that we’re looking at the same user and we’re not getting confused between users and finally what do we need from the list of first day purchases we need the list of products so let me first see if the query runs so it runs there’s no mistakes and now let us review the logic of this query we have purchases which is basically a list of purchases with the added value that we know if it happened on day one on day two on day three and so on and then we are getting all of these purchases the ones that happened after day one and we are also getting the the list of products that they this user bought on day one and we are making sure to exclude those products from our final list and this is a correlated subquery because it is a specific SQL query that provides different results for every row that must run for every row because in the first row we need to get the list of products that user ID 10 has bought on day one and make sure that this product is not in it um and then when we go to another row such as this one we need to get the list of all products that user 13 bought on day one and make sure that 118 is not in those products so this is why it’s a correlated subquery and the final step in our problem is to get the number of these users so instead of selecting star and getting all of the C columns I can say count distinct user ID and if I run this I get 23 checking and this is indeed the right solution so this is one way to solve the problem and hopefully it’s not too confusing but if it is don’t worry it is after all an advanced problem if you go to solution here I do think however that my solution is a bit clearer than what strata scratch provides 
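Putting the whole thing together, here is a minimal runnable sketch of the finished query, using Python's sqlite3 (whose window-function support, available since SQLite 3.25, is close enough to Postgres for this purpose). The table and column names (marketing_campaign, user_id, created_at, product_id) and the rows are my own illustrative assumptions, so the toy data produces its own small count rather than the 23 from the real StrataScratch data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Invented rows in a marketing_campaign-style table (names assumed, not from the video).
con.executescript("""
CREATE TABLE marketing_campaign (user_id INT, created_at TEXT, product_id INT);
INSERT INTO marketing_campaign VALUES
  (10, '2019-01-01', 101),
  (10, '2019-01-02', 119),   -- new product on a later day: counts
  (14, '2019-01-01', 109),
  (14, '2019-01-01', 107),   -- tied on day one, both get user_day = 1
  (14, '2019-01-05', 110),   -- new product on a later day: counts
  (20, '2019-01-01', 101);   -- user 20 never buys again: does not count
""")

query = """
WITH purchases AS (
  SELECT user_id, created_at, product_id,
         RANK() OVER (PARTITION BY user_id ORDER BY created_at) AS user_day
  FROM marketing_campaign
)
SELECT COUNT(DISTINCT next_days.user_id)
FROM purchases AS next_days
WHERE next_days.user_day > 1                  -- purchases after the first day...
  AND next_days.product_id NOT IN (           -- ...of a product not bought on day one
      SELECT first_day.product_id
      FROM purchases AS first_day
      WHERE first_day.user_day = 1
        AND first_day.user_id = next_days.user_id)
"""
print(con.execute(query).fetchone()[0])  # 2 (users 10 and 14)
```

One design caveat worth knowing: NOT IN behaves surprisingly if the subquery can return NULLs, so this pattern assumes product_id is never NULL.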
This is actually a bit of a weird solution, but that's ultimately up to you to decide, and I am grateful to StrataScratch for providing problems like this one that I can solve for free.

Welcome to PostgreSQL Exercises, the website that we will use to exercise our SQL skills. Now, I am not the author of this website or of these exercises; the author is Alisdair Owens, and he has generously created this website for anyone to use, for free — you don't even need to sign up, you can go there right away and start working on it. I believe it is a truly awesome website, in fact the best at what it does, and I'm truly grateful to Alisdair for making this available to all. The way the website works is pretty simple: you have a few categories of exercises, and you can select a section. Once you select a section, you have a list of exercises; you click on an exercise, and then in the exercise view you have a question that you need to solve, you see a representation of your three tables (we're going to go into this shortly), and then you see your expected results. In the text box over here you can write your answer and hit Run to see if it's the correct one; the results will appear in the lower quadrant. If you get stuck you can ask for a hint, there are also a few keyboard shortcuts that you can use, and after you submit your answer — or if you are completely stuck — you can go here and see the answers and discussion. And that's basically all there is to it.

Now let's have a brief look at the data. The data is the same for all exercises, and what we have here is data about a newly opened country club. We have three tables. Members represents the members of the country club: we have their surname and first name, their address, their telephone, the date at which they joined, and so on. Then we have the bookings: whenever a member makes a booking into a facility, that event is stored in this table. And finally we have a table of facilities, with information about each facility: in there we have some tennis courts, some badminton courts, massage rooms, and so on. As you may know, this is a standard way of representing how data is stored in a SQL system: you have the tables, for each table you see the columns, and for each column you see the name and then the data type. The data type is the type of data that is allowed into that column, and, as you know, each column has a single data type — you are not allowed to mix multiple data types within a column. We have a few different data types here, under their Postgres names. In Postgres, an integer is a whole number like 1, 2, 3, while a numeric is a floating-point number such as 2.5 or 3.2. Character varying is the same as a string — it represents a piece of text — and if you wonder about the number in round brackets, like 200, it is the maximum number of characters that you can put into this piece of text: you cannot have a surname longer than 200 characters. And then you have a timestamp, which represents a specific point in time. That's actually all the data types we have here.

Finally, you can see that the tables are connected. In the bookings table, every row represents an event where a certain facility ID was booked by a certain member ID at a certain time for a certain number of slots. The facility ID is the same as the facility ID field in facilities, and the member ID field is the same as the member ID field in members; therefore the bookings table is connected to both of these tables, and these logical connections will allow us to use joins to build queries that work on all three tables together. We shall see in detail how that works. Finally, we have an interesting arrow over here, which represents a self-relation.
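To make the three tables and their connections concrete, here is a rough sqlite sketch of that schema — Postgres types approximated with sqlite ones, and column names as they appear on pgexercises (this is my reconstruction from the description above, not code from the video):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE members (
  memid         INTEGER PRIMARY KEY,
  surname       TEXT,     -- character varying(200) in Postgres
  firstname     TEXT,
  address       TEXT,
  telephone     TEXT,
  recommendedby INTEGER REFERENCES members(memid),  -- the self-relation
  joindate      TEXT      -- timestamp in Postgres
);
CREATE TABLE facilities (
  facid              INTEGER PRIMARY KEY,
  name               TEXT,
  membercost         REAL,  -- numeric in Postgres
  guestcost          REAL,
  initialoutlay      REAL,
  monthlymaintenance REAL
);
CREATE TABLE bookings (
  facid     INTEGER REFERENCES facilities(facid),  -- links to facilities
  memid     INTEGER REFERENCES members(memid),     -- links to members
  starttime TEXT,                                  -- timestamp in Postgres
  slots     INTEGER
);
""")
```

The two REFERENCES clauses on bookings are exactly the logical connections that will let us join it to members and to facilities, and recommendedby is the arrow pointing from members back to itself.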
That arrow means the members table has a relation to itself, and if you look here, this is actually very similar to the example that I showed in my mental models course: for each member we can have a recommendedby field, which is the ID of another member — the member who recommended them into the club. This basically means that you can join the members table to itself in order to get, at the same time, information about a specific member and about the member who recommended them; we shall see that in the exercises.

Clearly, the exercises run on PostgreSQL, and Postgres is one of the most popular open-source SQL systems out there. PostgreSQL is a specific dialect of SQL which has some minor differences from other dialects, such as MySQL or the Google SQL used by BigQuery, but it is mostly the same as all the others; if you've learned SQL with another dialect, you're going to be just fine. PostgreSQL does have a couple of quirks that you should be aware of, but I will address them specifically as we solve these exercises.

Now, if you want to rock these exercises, I recommend keeping in mind the logical order of SQL operations. This is a chart that I introduced and explained extensively in my mental models course, where we actually start with the chart mostly empty and then add one element at a time, making sure we understand it in detail. I won't go in depth on it now, but in short, this chart represents the logical order of SQL operations. These are all the components we can assemble to build our SQL queries — they're like our Lego building blocks for SQL — and when assembled, these components run in a specific order. The chart represents this order from top to bottom: first you have FROM, then you have WHERE, and then all the others. And there are two very important rules: each operation can only use data produced above it, and an operation doesn't know anything about data produced below it. If you can keep this in mind, and keep this chart as a reference, it will greatly help you with the exercises; and as I solve them, you will see that I put a lot of emphasis on coming back to this order — actually thinking in this order — in order to write effective queries.

Let's now jump in and get started with our basic exercises. The first exercise is "Retrieve everything from a table". Here I have my question: how can I get all the information I need from the facilities table? As you know, all my data is represented here, so I can check where to find the data that I need. Now, as I write a query, I aim to always start with the FROM part. Why start with the FROM part? First of all, it is the first component that runs in the logical order: if I go back to my chart, I can see that the FROM component is first, and that makes sense, because before I do any work I need to get my data — I need to tell SQL where my data is. In this case, the data is in the facilities table. Next, I need to retrieve all the information from this table, which means I'm not going to drop any rows and I'm going to select all the columns, so I can simply write SELECT *. If I hit Run, I get the result that I need: here in this quadrant I can see my result, and it fits the expected results. The star is a shortcut for saying "give me all of the columns of this table" — I could have listed each column in turn, but instead I took a shortcut and used the star.

"Retrieve specific columns from a table": I want to print a list of all the facilities and their cost to members. As always, let's start with the FROM part: where is the data that we need? It's in the facilities table again. Now, the question is actually not super clear, but luckily I can check the expected results, and what I need are two columns from this table: name and membercost. To get those two columns I can write SELECT name, membercost, hit Run, and I get the result that I need. So if I write SELECT * I get all the columns of the table, but if I write the names of specific columns, separated by commas, I get only those columns.

"Control which rows are retrieved": we need a list of facilities that charge a fee to members. We know that we're going to work with the facilities table, and now we need to keep certain rows and drop others — we need to keep only the facilities that charge a fee to members. What component can we use to do this? If I go back to my components chart, I can see that right after FROM we have the WHERE component, and the WHERE component is used to drop rows that we don't need. So after getting the facilities table, I can say WHERE membercost is bigger than zero, meaning they charge a fee to members, and finally I can get all of the columns.

"Control which rows are retrieved, part two": like before, we want the list of facilities that charge a fee to members, but our filtering condition is now a bit more complex, because we need that fee to be less than one-fiftieth of the monthly maintenance cost. I copied over the code from the last exercise: we're getting the data from our facilities table and filtering for those where the member cost is bigger than zero. Now we need to add a new condition, which is that that fee — the member cost — is less than one-fiftieth of the monthly maintenance cost, so I can take monthlymaintenance over here and divide it by 50, and I have my condition. Now, when I have multiple logical conditions in the WHERE, I need to link them with a logical operator, so SQL can figure out how to combine them, because the final result of all my conditions needs to be a single value that is either true or false. In my mental models course I introduced the Boolean operators and how they work, so you can go there for more detail — but can you figure out which logical operator we need here to chain these two conditions, as suggested in the question? The operator that I need is AND, and what AND does is require both conditions to be true for the whole expression to evaluate to true and for the row to be kept; only the rows where both conditions are true will be kept, and all other rows will be discarded. Now, to complete the exercise, I just need to select a few specific columns, because we don't want to return all the columns here. I'll cheat a bit by copying them from the expected results, but normally you would look at the table schema and figure out which columns you need. And that completes our exercise.

"Basic string searches": produce a list of all facilities with the word Tennis in their name. Where is the data we need? It's in the cd.facilities table. Next question: do I need all the rows from this table, or do I need to filter some out? Well, I only want facilities with the word Tennis in their name, so clearly I need a filter, and therefore I need the WHERE statement. How can I write it? I need to check the name and keep only facilities which have Tennis in their name, so I can use LIKE here to say that the facility needs to have Tennis in its name. What the wildcards signify is that we don't care what precedes or follows Tennis — there could be zero or more characters before and after — we just want to check that the name contains Tennis. Finally, we need to select all the columns for these facilities, and that's our result. Beware, like I said before, of your use of quotes: what you have here is a string, a piece of text that allows you to do your match, therefore you need single quotes. If, as is likely to happen, you used double quotes, you would get an error, and the error tells you that the column Tennis does not exist — because double quotes are used to represent column names, not pieces of text. So be careful with that.

"Matching against multiple possible values": can we get the details of facilities with ID 1 and ID 5? Where is my data? In the facilities table. Do I need all the rows from this table, or only certain ones? Only certain rows — those with ID 1 and ID 5 — so I need a WHERE statement. What are my conditions? Facility ID equals 1, and facility ID equals 5. I have my two logical conditions; what operator do I need to chain them? I need the OR operator, because only one of them needs to be true for the whole expression to evaluate to true. In fact, only one of them can be true, because it's impossible for the ID of a facility to equal one and five at the same time; the AND operator would not work, and what we need is OR. Finally, we need all the data — all the columns — for these facilities, so I will use SELECT *. The problem is now solved, but let's imagine that tomorrow we need this query again and we have to include another ID, say 10. We could add OR facility ID equals 10, but this is becoming a bit unwieldy: imagine having a list of ten IDs and writing OR every time — it's not a very scalable approach. As an alternative, we can say facility ID IN, and then list the values, like 1 and 5. If I make this my condition, I again get the same result — the solution — but this is a more elegant approach, and it's also more scalable, because it's much easier to come back and insert other IDs into the list. So this is the preferred solution in this case. Logically, what IN does is look at the facility ID for each row and check whether that ID is included in the list: if it is, it returns true and the row is kept; if it's not, it returns false and the row is dropped.
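The OR version and the IN version of this filter really are equivalent; here is a quick sketch with sqlite standing in for Postgres, and a few invented facility rows:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Simplified, invented stand-in for cd.facilities.
con.executescript("""
CREATE TABLE facilities (facid INTEGER, name TEXT, membercost REAL);
INSERT INTO facilities VALUES
  (0, 'Tennis Court 1', 5.0),
  (1, 'Tennis Court 2', 5.0),
  (4, 'Massage Room 1', 35.0),
  (5, 'Massage Room 2', 35.0),
  (7, 'Snooker Table',  0.0);
""")

# Chaining conditions with OR works, but grows unwieldy as the list grows...
with_or = con.execute(
    "SELECT facid FROM facilities WHERE facid = 1 OR facid = 5"
).fetchall()

# ...while IN with a list of values is the equivalent, more scalable form.
with_in = con.execute(
    "SELECT facid FROM facilities WHERE facid IN (1, 5)"
).fetchall()

print(with_or == with_in)  # both keep exactly facilities 1 and 5
```

Adding a new ID later means editing one list, rather than appending yet another OR clause.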
And we shall see a bit later that the IN notation is also powerful for another reason: in this case we have a static list of IDs — we know we want IDs 1 and 5 — but in more advanced use cases, instead of a static list, we could provide another SQL query, a subquery, that dynamically retrieves a certain list, and then use that in our query. We shall see that in later exercises.

"Classify results into buckets": produce a list of facilities and label them cheap or expensive based on their monthly maintenance. So we want to get our facilities. Do we need a filter — do we need to drop certain rows? No, we actually don't: we want all the facilities, and then we want to label them. We need to select the name of the facility, and then here we need to provide the label. So what SQL statement can we use to produce a text label according to the value of a certain column? What we need here is a CASE statement, which implements conditional logic — a branching, similar to the if/else statements in other programming languages — because if the monthly maintenance cost is more than 100 then it's expensive, otherwise it's cheap. This calls for a CASE statement. I always start with CASE and end with END, and I always write these first so I don't forget them. Then, for each condition, I write WHEN. What is the condition I'm interested in? Monthly maintenance being above 100 — that's my first condition. What do I do in that case? I output a piece of text which says 'expensive' — and remember, single quotes for text. Next, I could write the other condition explicitly, but actually, if it's not above 100 then it must be 100 or less, so all I need here is an ELSE, and in that case I output the piece of text 'cheap'. Finally, I have a new column and I can give it a label: I can call it cost, and I get my result. Whenever you need to put values into buckets, or label values according to certain rules, that's usually when you need a CASE statement.
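Here is a minimal sketch of that CASE bucketing, again with sqlite standing in for Postgres and invented maintenance figures:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Simplified, invented stand-in for cd.facilities.
con.executescript("""
CREATE TABLE facilities (name TEXT, monthlymaintenance REAL);
INSERT INTO facilities VALUES
  ('Tennis Court 1',  200.0),
  ('Table Tennis',     10.0),
  ('Massage Room 1', 3000.0),
  ('Snooker Table',    15.0);
""")

# CASE labels each row according to a condition, like if/else in other languages:
# above 100 -> 'expensive', otherwise (ELSE) -> 'cheap', aliased as "cost".
rows = con.execute("""
    SELECT name,
           CASE WHEN monthlymaintenance > 100 THEN 'expensive'
                ELSE 'cheap'
           END AS cost
    FROM facilities
""").fetchall()

for name, cost in rows:
    print(name, cost)
```

Note that no row is dropped: every facility comes back, each carrying its new label.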
Working with dates: let's get a list of members who joined after the start of September 2012. Looking at these tables, where is our data? It's in the members table, so I will start writing this. Do I need to filter this table? Yes: I only want to keep members who joined after a certain time. How can I run this condition on the table? I can say WHERE joindate is bigger than '2012-09-01'. Luckily, in SQL and in Postgres, filtering on dates is quite intuitive: even though here we have a timestamp that represents a specific moment in time, down to the second, we can write the comparison against just a date — actually bigger or equal, because we also want to include those who joined on the first day — and SQL will fill in the remaining values, and the filter will work. Next, we want a few columns for these members, so I will copy the SELECT over, and this solves our query.

"Removing duplicates, and ordering results": we want an ordered list of the first ten surnames in the members table, and the list must not contain duplicates. Let's start by getting our table, which is the members table. Now we want to see the surnames, and if I write this, I will see that there are surnames shared by multiple members — there are duplicates here. What can we do in SQL to remove duplicates? We have seen in the mental models course that we have the DISTINCT keyword, and DISTINCT removes all duplicate rows based on the columns that we have selected. If I run this again, I don't see any duplicates anymore. Now, the list needs to be ordered alphabetically, as I see in the expected results, and we can do that with the ORDER BY statement. When you use ORDER BY on a piece of text, the default behavior is alphabetical order; if I were to use descending, it would be ordered in reverse alphabetical order, but that's not what I need — I need alphabetical order. So now I see that they are ordered. Finally, I want the first ten surnames. How can I return the first ten rows of my result? I can do that with the LIMIT statement: if I say LIMIT 10, I will get the first ten surnames, and since I have ordered alphabetically, I will get the first ten surnames in alphabetical order. This is my result.

Going back to our map: we have the FROM, which gets a table; we have a WHERE, which drops rows we don't need from that table; then, all the way down here, we have the SELECT, which gets the columns we need; and then we have the DISTINCT. The DISTINCT needs to know which columns we selected, because it drops duplicates based on those columns — in this example we're only taking a single column, surname, so the DISTINCT drops duplicate surnames. Then, at the end of it all, when all the processing is done, we can order our results, and finally, once the results are ordered, we can use LIMIT to cap the number of rows we return. I hope this makes sense.

"Combining results from multiple queries": let's get a combined list of all surnames and all facility names. Where are the surnames? They're in cd.members, and from cd.members I can select surname — this gives me the list of all surnames. And where are the facility names? They're in cd.facilities, and I could say SELECT name FROM cd.facilities and get a list of all the facilities. Now we have two distinct queries, and they both produce a list — a column of text values — and we want to combine them. What does that mean? We want to stack them on top of each other. How does that work? If I just run the query like this, I will get an error, because I have two distinct queries here and they're not connected in any way. But when I have two or more queries defining tables, and I want to stack them on top of each other, I can use the UNION statement.
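The duplicates exercise we just finished — DISTINCT, then ORDER BY, then LIMIT, in that logical order — can be sketched like this, with a handful of invented surnames:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Simplified, invented stand-in for cd.members: note the duplicate surnames.
con.executescript("""
CREATE TABLE members (surname TEXT);
INSERT INTO members VALUES
  ('Smith'), ('Jones'), ('Smith'), ('Baker'), ('Tracy'), ('Baker');
""")

# DISTINCT drops duplicate rows, ORDER BY sorts text alphabetically by default,
# and LIMIT caps how many rows come back -- in exactly that logical order.
rows = con.execute("""
    SELECT DISTINCT surname
    FROM members
    ORDER BY surname
    LIMIT 3
""").fetchall()

print([r[0] for r in rows])  # ['Baker', 'Jones', 'Smith']
```

Because ORDER BY runs before LIMIT, "the first three" reliably means the first three in alphabetical order, not three arbitrary rows.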
If I do UNION here, I get what I want: all the surnames are stacked vertically together with all the facility names, and I get a unique list containing both of these columns. Now, as I mentioned in the mental models course, plain UNION typically means UNION DISTINCT — in fact, other systems like BigQuery don't allow you to write just UNION, they want you to spell out UNION DISTINCT — and what this actually does is that, after stacking the two tables together, it removes all duplicate rows. The alternative is UNION ALL, which does not do this: it keeps all the rows. As you know, we have some duplicate surnames, so with UNION ALL we get them here, and the result doesn't fit our expected output; but if you write just UNION, it behaves as UNION DISTINCT and you won't have any duplicates. And if you look at our map for the logical order of SQL operations: we get the data from a certain table, filter it, do all sorts of operations on this data, then select the columns we need, then remove the duplicates from this one table — and what comes next is that we can combine this table with other tables. We can tell SQL that we want to stack this table on top of another table: this is where UNION comes into play, and only after we have combined all the tables — only after we have stacked them all on top of each other — can we order and limit the results. Also remember, and I showed this in detail in the mental models course: when I combine two or more tables with a UNION, they need to have the exact same number of columns, and the corresponding columns need to have the same data type. In this case, both tables have one column, and this column is text, so the UNION works; but if I were to add another column here — say an integer column — it would not work, because all queries in a UNION must have the same number of columns, and I would get an error.
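Here is a quick sketch of UNION versus UNION ALL, with a surname deliberately duplicated (invented data, sqlite standing in for Postgres — note that sqlite, like Postgres, accepts plain UNION as UNION DISTINCT):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE members (surname TEXT);
CREATE TABLE facilities (name TEXT);
INSERT INTO members VALUES ('Smith'), ('Smith'), ('Jones');
INSERT INTO facilities VALUES ('Tennis Court 1'), ('Snooker Table');
""")

# UNION stacks the two single-column results and removes duplicate rows;
# UNION ALL keeps every row, duplicates included.
union = con.execute(
    "SELECT surname FROM members UNION SELECT name FROM facilities"
).fetchall()
union_all = con.execute(
    "SELECT surname FROM members UNION ALL SELECT name FROM facilities"
).fetchall()

print(len(union), len(union_all))  # 4 5  (the duplicate 'Smith' survives only in UNION ALL)
```

Both queries work only because each side produces the same number of columns with the same data type — drop one of those requirements and the UNION errors out.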
However, if I were to add an integer column in the second position in both tables, it would work again, because once more I have the same number of columns, and the corresponding columns have the same data type.

"Simple aggregation": I need the signup date of my last member. I need to work with the members table, and we have a field here which is joindate, and I need to get the latest value of this date — the time when a member last joined. How can I do that? I can take my joindate field and run an aggregation on top of it. What is the correct aggregation in this case? It is MAX, because when it comes to dates, MAX takes the latest date, whereas MIN takes the earliest. I can label this as latest and get the result I need. Now, how do aggregations work? They are functions: you write the name of the function, and then in round brackets you provide the arguments; the first argument is always the column on which to run the aggregation. What the aggregation does is take a list of values — it could be ten, a hundred, a million, ten million, it doesn't matter — and compress that long list into a single value: as we've seen in this case, it takes all of the dates and returns the latest one. To place this in our map: we get the data from the table, we filter it, and then sometimes we do a grouping — which we shall see later in the exercises — but whether we group or not, here we have aggregations. If we haven't done any grouping, the aggregation works at the level of all the rows: in the absence of grouping, as in this case, the aggregation looks at all the rows in my table — except the rows that I filtered away — and compresses them into a single value.

"More aggregation": we need the first and last name of the last member who signed up, not just the date. In the previous exercise we saw that we can say SELECT MAX(joindate) FROM members, and we get the last join date — the date when the last member signed up. Given that I want the first and last name, you might think you can just add firstname and surname in here, but this actually doesn't work. It gives an error: the column firstname must appear in the GROUP BY clause or be used in an aggregate function. The meaning behind this error, and how to avoid it, is described in detail in the mental models course, in the GROUP BY section, but the short version is this: with this aggregation you're compressing joindate to a single value, but you're doing no such compression — no aggregation — for firstname and surname, so SQL is left with an instruction to return something like this: a single value for one column, but multiple values for these other columns. That does not work in SQL, because all columns must have the same number of values, so it throws an error. What we really need to do is take this maximum join date and use it in a WHERE filter, because we only want to keep the row which corresponds to the latest join date: take the members table, get the row where joindate is equal to the maximum joindate, and from that select the name and surname. Unfortunately, this also doesn't work. As we saw in the course, you are not allowed to use aggregations inside WHERE — you cannot use MAX inside WHERE — and the reason is actually pretty clear: aggregations happen at this stage in the process, and they need to know whether a GROUP BY has occurred or not. They need to know whether to run over all the rows in the table or only within the groups defined by the GROUP BY, and when we are at the WHERE stage, the GROUP BY hasn't happened yet, so we don't know at which level to execute the aggregations. Because of this, aggregations are not allowed inside the WHERE statement. So how can we solve the problem? Well, a sort of cheating solution would be this: if we knew the exact value of the join date, we could place it here, and then our filter would work — we're not using an aggregation — and we could put joindate in here to display it as well. That would work; however, it is a bit of a cheat, because the maximum join date is actually a dynamic value — it will change with time — so we don't want to hardcode it, we want to compute it. Since that is not allowed directly, what we actually need is a subquery: a SQL query that runs within a query to return a certain result. We can create a subquery by opening round brackets here and writing a query, and in this query we go to the members table and select the maximum join date. This is our actual solution. In this execution, you can imagine that SQL goes into the subquery, runs it, gets the maximum join date, places it in the filter, keeps only the row for the latest member who joined, and then retrieves what we need about that member.

Let's now move on to the joins and subqueries exercises. The first exercise: retrieve the start times of members' bookings. We can see that the information we need is spread across tables, because we want the start time for bookings — and that information is in the bookings table — but we want to filter to only the member named David Farrell, and the name of the member is contained in the members table. Because of that, we will need a join. If we briefly look at the map for the order of SQL operations, we can see that FROM and JOIN are really the same step. How this works is that in the FROM statement, sometimes all my data is in one table and I just provide the name of that table; but sometimes I need to combine two or more different tables to get my data, and in that case I use JOIN. Everything in SQL works with tables: when I take two or more tables and combine them, at the end all I get is just another table.
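Before we dive into joins, the last-member subquery from a moment ago can be sketched end to end (sqlite again, with three invented members; dates stored as ISO text so MAX compares them correctly):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Simplified, invented stand-in for cd.members.
con.executescript("""
CREATE TABLE members (firstname TEXT, surname TEXT, joindate TEXT);
INSERT INTO members VALUES
  ('Tim',    'Rownam', '2012-07-03'),
  ('Darren', 'Smith',  '2012-09-26'),
  ('Nancy',  'Dare',   '2012-08-10');
""")

# MAX() is not allowed directly inside WHERE, because aggregations run after
# WHERE in the logical order -- so we compute it in a subquery instead:
row = con.execute("""
    SELECT firstname, surname, joindate
    FROM members
    WHERE joindate = (SELECT MAX(joindate) FROM members)
""").fetchone()

print(row)  # ('Darren', 'Smith', '2012-09-26')
```

The subquery runs first, its single value replaces the hardcoded date we rejected as a cheat, and the outer WHERE then keeps exactly the row of the latest member.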
And that is why FROM and JOIN are actually the same component — the same step. So, as usual, let's start with the FROM part: we need to take the bookings table and join it to the members table, and I can give an alias to each table to make my life easier — I will call this one book and this one mem. Then I need to specify the logical condition for joining these tables, and the logical condition is that the member ID column in the bookings table is really the same thing as the member ID column in the members table. Concretely, you can imagine SQL going row by row through the bookings table, looking at the member ID, and checking whether that ID is present in the members table: if it's present, it combines the current row from bookings with the matching row from members, does this for all the matching rows, and then drops the rows which don't have a match. We saw that in detail in the mental models course, so I'm not going to go in depth into it. Now that we have our table, which comes from the join of members and bookings, we can properly filter it. What we want is that the firstname column — the one which comes from the members table — is David: so mem.firstname, indicating first the parent table and then the column name, and the surname is equal to 'Farrell' — and remember, single quotes when using pieces of text. This is a WHERE filter with two logical conditions, and we chain them with the operator AND because both of them need to be true. Now we have filtered our data, and finally we need to select the start time — and that's our query. Remember that when we write JOIN in a query, what's implied is an INNER JOIN. There are several types of join, but the inner join is the most common, so it's the default one. What the inner join means is that, from the two tables we're joining, it returns only the rows that have a match, and all the rows that don't have a match are dropped. So if there's a row in bookings with a member ID that doesn't exist in the members table, that row will be dropped; and conversely, if there's a row in the members table whose member ID is never referenced in the bookings table, that row will also be dropped. That's an inner join.

"Work out the start times of bookings for tennis courts": we need to get the facilities that are actually tennis courts, and then, for each of those facilities, we'll have several bookings, and we need to get the start time for those bookings — and they have to be on a specific date. We know that we need the data from these two tables, because the name of the facility is here but the data about the bookings is here. So I will write FROM cd.facilities JOIN cd.bookings ON — what are the fields that we can logically join on? Let me first give an alias to these tables: this one I will call facs, and this one I will call book. What I need to say is that the facility ID matches on both sides. Now we can work on our filters. First of all, I only want to look at tennis courts, and if you look at the result here, it means that in the name of the facility we want to see Tennis — and we can filter on string patterns, on text patterns, by using the LIKE command.
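The members-bookings join from the David Farrell exercise can be sketched like this — the data is invented, and note the booking with an unmatched member ID, which the inner join silently drops:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Simplified, invented stand-ins for cd.bookings and cd.members.
con.executescript("""
CREATE TABLE members (memid INTEGER, firstname TEXT, surname TEXT);
CREATE TABLE bookings (memid INTEGER, starttime TEXT);
INSERT INTO members VALUES (15, 'David', 'Farrell'), (22, 'Nancy', 'Dare');
INSERT INTO bookings VALUES
  (15, '2012-09-18 09:00'),
  (22, '2012-09-18 10:00'),
  (15, '2012-09-18 17:30'),
  (99, '2012-09-19 08:00');  -- memid 99 has no match: the INNER JOIN drops it
""")

# JOIN (an INNER JOIN by default) keeps only rows whose memid matches on both
# sides; the WHERE then keeps only David Farrell's bookings.
rows = con.execute("""
    SELECT book.starttime
    FROM bookings AS book
    JOIN members AS mem ON book.memid = mem.memid
    WHERE mem.firstname = 'David' AND mem.surname = 'Farrell'
    ORDER BY book.starttime
""").fetchall()

print([r[0] for r in rows])  # ['2012-09-18 09:00', '2012-09-18 17:30']
```

The aliases book and mem keep the ON condition and the mem.firstname filter unambiguous, since both tables carry a memid column.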
using the like uh command so I can take facilities name and get it like tennis and the percentage signs are um wild cards which means that tennis could be preceded and followed by zero or more characters we don’t care we just want to get those strings that have tennis in them but that’s not enough as a condition we also need the booking to have happened on a specific date so I will put an end here so end is the operator we need because we’re providing two logical conditions and they both need to be true so end is what we need and then I can take the start time from the booking table and um say that it should be equal to the date provided in the instructions because I want the booking to have happened in this particular date however this will not work so I can actually complete the query and show you that it will not work because here we get zero results so can you figure out why this um command here did not work now I’m going to write a few comments here and uh this is how you write them and they are just pieces of text they’re not actually executed as code and I’ll just use them to show you what’s going on so the value for start time looks like this so this is a time stamp and is showing a specific point in time but the date that we are providing for the comparison looks like this so as you can see we have something that is uh less granular because we we’re not showing all of this data about hour minute and uh and second now in order to compare these two things which are different SQL automatically fills in uh this date over here and what it does is that since there’s nothing there it puts zeros in there and now that it has made this um extension it’s going to actually compare them so when you look at this uh comparison over here between these two elements this comparison is false false because the hour is different now when we write this uh filter command over here SQL is looking at every single start time and then comparing it with this value over here which is 
the very first moment of that date but there’s no start time that is exactly like this one so basically this is always false and thus we get uh zero rows in our result so what is the solution to this before when we take a start time from the data before comparing it we can put it into the date function and if I take my example here if I put it into the date function it’s going to drop that extra information about hour minute and second and it’s only going to keep uh the information about the date so once I do this if I uh if I pass it to the date function before comparing it to my reference date now this one is going to become the result which is this one and then I’m going to compare it with my reference date and then this is going to be true so all this to say that before we compare start time with our reference date we need to reduce its granularity and we need to reduce it to its uh to its date so if I run the query now I will actually get my start times and after this I just need to add the name and finally I need to order by time so I need to order bu um book start time there is still a small error here so sometimes you just have to look at what you get and what’s expected and if you notice here we are returning data about the table tennis facility but we’re actually just interested in tennis court so what are we missing here the string filter is not precise enough and we need to change this into tennis court and now we get our results produce a list of all members who have recommended another member now if we look at the members table we have all these data about each member and then we know if they were recommended by another member and recommended by is the ID of the member who has recommended them and because of this the members table like we said has a relation to itself because one of its column references its ID column so let’s see how to put this in practice so to be clear I simply want a list of members who appear to have recommended another member 
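For reference, the completed tennis-court query from the previous exercise might be sketched like this. The table and column names (cd.facilities, cd.bookings, facid, starttime) are assumed from the pgexercises-style schema the walkthrough appears to use, and the date literal is only an illustrative placeholder, not necessarily the one from the exercise:

```sql
-- A sketch, not the author's exact code; schema names are assumed,
-- and the date below is an illustrative placeholder.
SELECT facs.name, book.starttime
FROM cd.facilities facs
JOIN cd.bookings book
  ON facs.facid = book.facid           -- inner join: keep only matching rows
WHERE facs.name LIKE '%Tennis Court%'  -- '%Tennis%' alone also matches Table Tennis
  AND date(book.starttime) = '2012-09-21'  -- reduce the timestamp to a date first
ORDER BY book.starttime;
```

The date() cast is what makes the equality succeed: comparing the raw timestamp against a bare date would only match bookings that start exactly at midnight.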
If I wanted just the IDs of these people, my task would be much simpler: I would go to the members table, SELECT recommendedby with a DISTINCT to avoid repetitions, and I would get the IDs of all members who have recommended another member. However, the problem doesn't want this: it wants the first name and surname of these people. To get those, I need to plug each ID back into the members table and fetch the data there. For example, if I went to the members table and selected everything WHERE the member ID is 11, I would get the data for that first member. But I need to do this for all members, so I'll have to take the members table and join it to itself. The first time I take the table, I'm looking at the members themselves; the second time, I'm looking at data about the recommenders of those members, so I'll call this second instance recs. Both instances come from the same table, but they're now two separate instances.

What is the logic for joining these two tables? The members table has the recommendedby field; we take the ID from recommendedby and plug it back into the table's member ID to get the data about the recommenders. Then we can go into the recommenders instance, which we got by plugging in that ID, and select their first name and surname. I want to avoid repetition, because a member may have recommended multiple members, so I'll add a DISTINCT to make sure I don't get any repeated rows at the end. Finally I can ORDER BY surname and first name, and I get my result. I encourage you to play with this and experiment a bit until it is clear; in my mental models course I go into depth on the self join and do a visualization in Google Sheets that makes it much clearer.

Produce a list of all members, along with their recommender. If we look at the members table, we have a few columns and then the recommendedby column: sometimes it holds the ID of another member who recommended this member (it can be repeated, because the same member may have recommended multiple people), and sometimes it is empty. When it's empty we have a NULL, which is the value SQL uses to represent the absence of data.

Now let us count the rows in members. To count the rows of a table we can do a simple aggregation, count(*), and we get 31. Let's make a note that members has 31 rows, because the result should list all members, so we must ensure we return 31 rows in our result.

I'm going to delete this SELECT, and as before, for each member I want to take the ID they have in recommendedby and plug it back into the table's member ID, so I can get the data about the recommender as well. I can do that with a self join: I take members and join it on itself, calling the first instance mems and the second recs, and the join logic is that recommendedby in mems is connected to the member ID in recs. This takes the ID in the recommendedby field and plugs it back into the member ID to get the data about the recommender. What do I want from this? The first name and surname of the member, and then the first name and surname of the recommender. Great, it's starting to look like the right result, but how many rows do we have here?

To count the rows, I can do SELECT count(*) FROM, and then enclose the previous query in brackets so it becomes a subquery. The subquery must have an alias, so I give it one, and I get 22. How this works is that SQL first computes the content of the subquery, which is the table we saw before. Note that we need to assign the subquery an alias, otherwise it doesn't work; this varies a bit by system, but in Postgres you need it, so we simply call it t1. Then SQL runs count(*) on that table to get the number of rows, and we see that the result has 22 rows. This is an issue, because we saw before that members has 31 rows and we want to return all of the members, so our result should also have 31 rows. Can you figure out why we're missing some rows?

The issue is that we are using an inner join. Remember, when we don't specify the type of join, it's an inner join, and an inner join keeps only rows that have matches. We saw that in members this field is sometimes empty: it has a NULL value, because maybe the member wasn't recommended by anyone, maybe they just applied themselves. When such a row goes through an inner join, the row for that member is dropped, because NULL cannot match any member ID, and so we lose it. That's not what we want, so instead of an inner join we need a LEFT JOIN. The left join looks at the left table, the table to the left of the JOIN keyword, and makes sure to keep all of its rows, even the rows that don't have a match; for those rows it doesn't drop them, it just puts NULLs in the values that correspond to the right table. If I run the count again, I get 31, so now I'm keeping all the members and I have the number of rows that I need. I can get rid of the counting scaffolding, since I know I have the right number of rows, and bring back my selection. It would help to make this a bit more ordered and assign aliases to the columns, so I'll follow the expected results and call them
mem firstname, mem surname, rec firstname, and rec surname. Now we have the proper labels, and you can see that we always have the name of the member, but some members weren't recommended by anyone, so for the first and last name of the recommender we simply have NULL values; this is what the left join does. The last step is to order: we want to order by the last name and the first name of each member, and we finally get our result. Typically you use the inner join, which is the default, because you're only interested in the rows from both tables that actually have a match; but sometimes you want to keep all the data from one table, and then you put that table on the left side and do a left join, as we did in this case.

Produce a list of all members who have used a tennis court. For this problem we need to combine data from all our tables, because we need to look at the members, look at their bookings, and check the name of the facility for each booking. As always, let us start with the FROM part and join all of these tables together: cd.facilities joined to cd.bookings on the facility ID, and then I also want to join on members. We can always join two or more tables; in this case we're joining three. How this works is that the first join creates a new table, and then this new table is joined with the next one: that's how multiple joins are managed.

Now I have my table, which is the join of all of these tables, and we're only interested in members who have used a tennis court. If a member has made no bookings, we're not interested in that member, so it's okay to have a plain join and not a left join. And for each booking we want to see the name of the facility, and if there were a booking with no facility name, we wouldn't be interested in that booking anyway, so this second join can also be an inner join and doesn't need to be a left join. This is how you can think about whether to use a join or a left join.

Now we want the booking to involve a tennis court, so we can filter this table: we look at the name of the facility and make sure it contains "Tennis Court", using the LIKE operator. Having filtered, we can select the first name and the surname of the member, as well as the facility name, and here we have a starting result. In the expected result, the first name and the surname are merged into a single string; in SQL you can do this with the concatenation operator, which basically takes two strings and puts them together into one string. If I do just that, the output looks a bit weird, so what I want is to add an empty space in between and concatenate again, and now the names look fine. I also want to label this column as member and the other column as facility, to match the expected results.

Next I need to ensure there is no duplicate data, so at the end of it all I add a DISTINCT to remove duplicate rows. Then I want to order the final result by member name and facility name: ORDER BY member, then facility. This works because the ORDER BY comes second to last in our logical order of SQL operations, so it is aware of the aliases, the labels I have put on the columns, and here I get the results I needed. Not a lot is happening here, to be honest: we're joining three tables instead of two, but it still works just like any other join; then we're concatenating the strings, filtering on the facility name, removing duplicate rows, and finally ordering.

Produce a list of costly bookings. We want to see all bookings that occurred on a particular day, how much each cost the member, and we want to keep the bookings that
cost more than $30. Clearly, in this case we also need information from all the tables: looking at the expected results, we want the name of the member, which is in the members table; the name of the facility, which is in the facilities table; and the cost, for which we will need the bookings table. So we need to start with a join of these three tables, and since we already did that in the last exercise, I have copied the code for that join (if you want more detail, go and check the last exercise), as well as the code that builds the member's name by concatenating strings, and the name of the facility.

Now we need to calculate the cost of each booking. How does it work, looking at our data? We have a list of bookings, and a booking is defined by a number of slots, where a slot is a 30-minute usage of the facility. We also have the member ID, which tells us whether the person is a guest or a member: if the member ID is zero, the person is a guest; otherwise they are a member. I also know which facility the person booked, and if I go and look at the facility, it has two different prices: one price for members, the other for guests, and the price applies per slot. So we have all the ingredients we need for the cost in our join, and to convince ourselves, let us actually select them: from bookings I can see the facility ID, the member ID, and the slots, and from facilities I can see the member cost and the guest cost, and that's really all I need to calculate the cost. As you can see, after the join I'm in a really good position, because each row has all of these values placed side by side; now I just have to figure out how to combine them in order to get the cost.

The way I can get the cost is to look at the number of slots and multiply it by the right cost, which is either the member cost or the guest cost. How do I know which of these to pick? It depends on the member ID: if the member ID is zero, I use the guest cost; otherwise I use the member cost. So back in my code, I want to take the slots and multiply them by either the member cost or the guest cost. How can I put logic in here that chooses one or the other based on this person's ID? Whenever I have such a choice to make, I need a CASE statement. I start the CASE statement and immediately write its END so that I don't forget it, and inside it I check: WHEN the member ID is zero, THEN use the guest cost; in all other cases, use the member cost. So I'm taking the slots and using this CASE WHEN to decide by which column to multiply them, and this is actually my cost.

Let's take a look, and I get an error: the column reference memid is ambiguous. Can you figure out why? What's happening is that I have joined multiple tables and the member ID column now appears twice in my join, so I cannot refer to it just by name: SQL doesn't know which column I want. I have to reference the parent of the column every time I use it, so here I'll say it comes from the bookings table, and now I get my result.

I can see that I have successfully calculated my cost, so let's sanity-check it. In the first row, the member ID is not zero, therefore it's a member, and here the member cost is zero, meaning this facility is free for members; so regardless of the slots, the cost will be zero. Now let's look at one who is a guest: this one is clearly a guest, they have taken one slot, and the member cost is zero (so it's free for members) but it costs five per slot for guests, so the total cost is five. Based on this sanity check, the cost looks good.

Now I need to actually filter my table, because we should only consider bookings that occurred on a certain day. After creating my joined table, I can write a WHERE filter to drop the rows I don't need: the start time needs to be equal to this date. We have seen before that this will not work, because the start time is a timestamp, which also carries hour, minutes, and seconds, whereas this is just a date, so the comparison would fail. Before doing the comparison, I need to reduce the start time to a date, so that I'm comparing apples to apples. A quick check that this didn't break anything: we should now have significantly fewer rows.

Next, we only want to keep rows that have a cost higher than 30. Can I just go here and add AND cost > 30? No: column cost does not exist. A typical mistake. If you look at the logical order of SQL operations, first comes the sourcing of the data, then the WHERE filter, and only later the logic by which we calculate the cost and apply the label cost. We cannot filter on the column cost, because the WHERE component has no idea it exists. What we can do is take all of the logic we've written so far, wrap it in round brackets, and introduce a common table expression: I write WITH t1 AS, then FROM t1, and now I can use my filter, WHERE cost > 30. I SELECT * from this table, and I'm starting to get somewhere: the cost has been successfully filtered. I still have a lot of columns in the result that I only used to help me reason about the cost; I want to keep the member and the facility, but none of the others.
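Putting the steps above together, the common-table-expression version might look roughly like this. Names are assumed from the pgexercises-style schema, and the date literal is an illustrative placeholder:

```sql
-- A sketch of the CTE approach described above; names and date are assumptions.
WITH t1 AS (
  SELECT mems.firstname || ' ' || mems.surname AS member,
         facs.name AS facility,
         -- pick the per-slot price by membership, then scale by slots booked
         book.slots * CASE WHEN book.memid = 0 THEN facs.guestcost
                           ELSE facs.membercost
                      END AS cost
  FROM cd.facilities facs
  JOIN cd.bookings book ON facs.facid = book.facid
  JOIN cd.members mems ON book.memid = mems.memid
  WHERE date(book.starttime) = '2012-09-14'  -- illustrative date
)
SELECT member, facility, cost
FROM t1
WHERE cost > 30      -- legal here: t1 already exposes "cost" as a column
ORDER BY cost DESC;
```

The filter on cost works only because it runs against t1, where cost already exists as a computed column; in the inner query it would fail for the reason just explained.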
Great. As a final step, I need to ORDER BY cost descending. There's actually an issue caused by copy-pasting code from the previous exercise: I kept a DISTINCT, and you have to be very careful with this, especially when you copy-paste code (for learning, it would be best to always write it from scratch). DISTINCT removes duplicate rows, and that can actually cause a problem here. When I remove the DISTINCT, I get the solution I want. If you look at the last two rows, you can see they are absolutely identical, so the DISTINCT would have removed one of them; but these are two bookings that happen to look the same in our data, and we want to keep them, not delete them. Having DISTINCT was a mistake in this case.

To summarize what we did here: first we joined all the tables so we could have all the columns we needed side by side; then we filtered on the date, which was pretty straightforward; then we took the first name and surname and concatenated them together, as well as the facility name; then we computed the cost, taking the number of slots and using a CASE WHEN to multiply them by either the guest cost or the member cost according to the member's ID; and at the end we wrapped everything in a common table expression, so that we could filter on this newly computed value of cost and keep only the bookings costing more than 30.

Now, I am aware that the question said not to use any subqueries. Technically I didn't, because this is a common table expression. If you look at the author's solution, it is slightly different from ours: they did basically the same thing we did to compute the cost, except that in the CASE WHEN they inserted the whole expression, which is fine and works just the same. The difference is that they added a lot of logic to the WHERE filter so that they could do everything in a single query. Clearly they couldn't use any columns added at the SELECT stage; they couldn't use cost, for example, because as we said, that wouldn't be possible. So they added the date filter in the WHERE, and alongside it a logical expression in which either one of two conditions needs to be true for the row to be kept: either the member ID is zero, meaning it's a guest, and the calculation based on the guest cost comes out bigger than 30; or the member ID is not zero, meaning it's a member, and the calculation based on the member cost comes out bigger than 30.

This works, but I personally think there's quite some repetition of the cost calculation, once in the WHERE filter and once inside the CASE WHEN. I think the solution we have here is a bit cleaner, because we calculate the cost only once and then simply reference it, thanks to the common table expression. If you look at the mental models course, you will see that I warmly recommend not repeating logic in code and using common table expressions as often as possible, because I think they make the code clearer and simpler to understand.

Produce a list of all members and their recommender, without any joins. We have already solved this problem, and we solved it with a self join: as you remember, we take the members table and join it on itself, so that we can take the recommendedby ID, plug it into the member ID, and see the names of both the member and the recommender side by side. But here we are challenged to do it without a join. Let us go to the members table and select the first name and the surname; we actually want to concatenate these two into a single string and call it member. Now, how can we get data about the recommender without a self join? Typically, when you have to combine data, you always have a choice between a join and a subquery.
What we can do is have a subquery that takes the recommendedby ID from this table, goes back to the members table, and gets the data we need. Let's see how that would look. We give an alias to the outer table and call it mems, and inside the subquery we go back to the members table and call it recs. Inside the subquery we again select the first name and surname, like we're doing outside. How do we identify the right row inside this subquery? We use a WHERE filter: we want the recs member ID to be equal to the mems recommendedby value. Once we have this value, we can call it recommender. Now we want to avoid duplicates, so after our outer SELECT we add DISTINCT, which removes any duplicates from the result; then we sort, by member and recommender, and here we get our result.

So we've replaced a join with a subquery: we go row by row in members, take the recommendedby ID, query the members table again inside the subquery, and use the WHERE filter to plug in that recommendedby and find the row whose member ID equals it; then, by getting the first name and surname, we get the data about the recommender. In the mental models course we discuss subqueries, and this particular case, which is a correlated subquery. Why is it correlated? Because you can imagine that the query inside runs again for every row: every row has a different recommendedby value, and I need to plug that value into the members table to get the data about the recommender. So it runs every time, and it is different for every row of the members table.

Produce a list of costly bookings, using a subquery. This is the exact exercise we did before, and as you will remember, we actually bent its instructions a bit: we used not a subquery but a common table expression.
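The correlated-subquery solution just described might be sketched as follows, with column names assumed from the pgexercises-style schema:

```sql
-- Sketch of the no-join version: a correlated subquery in the SELECT list.
SELECT DISTINCT
       mems.firstname || ' ' || mems.surname AS member,
       -- runs once per outer row, looking up that row's recommender by ID;
       -- returns NULL when recommendedby is NULL (no matching row)
       (SELECT recs.firstname || ' ' || recs.surname
        FROM cd.members recs
        WHERE recs.memid = mems.recommendedby) AS recommender
FROM cd.members mems
ORDER BY member, recommender;
```

Because the inner query references mems.recommendedby from the outer query, it is re-evaluated for every row of members, which is what makes it a correlated subquery.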
And for reference, here is the common-table-expression code we used for that exercise; it works here as well, and we get the result. You can go back to that exercise to see the logic behind this code and why it works. If we look at the author's solution, they are actually using a subquery instead of a common table expression: they have an outer query, SELECT member, facility, cost FROM, and then instead of naming a table after the FROM, they have all of the logic in a subquery, which they call bookings, and finally they add a filter and an ordering. This is technically correct and it works, but I'm not a fan of writing queries like this; I prefer writing them as a common table expression, and I explain this in detail in my mental models course. The reason I prefer it is that the CTE doesn't break queries apart: in my case, this is one query and that is another query, and it's pretty easy and simple to read. In the author's case, you start reading the outer query and then it is broken in two by another query; and when people do this, they sometimes go even further, and where the FROM should name a table you find yet another subquery, and it gets really complicated. Because these two approaches are equivalent, I definitely recommend going for a common table expression every time, and avoiding subqueries unless they are really compact and you can fit them in one row.

Let us now get started with aggregation exercises. The first problem: count the number of facilities. I go to the facilities table, and when I want to count the number of rows in a table, and here every row is a facility, I can use the count(*) aggregation, and we get the count of facilities. What we see here is a global aggregation: when you run an aggregation without having done any grouping, it runs on the whole table, so it takes all the rows of the table, no matter how many, and compresses them into one number, determined by the aggregation function. In this case we have a count, and it returns a total of nine rows. In our map of SQL operations, aggregation happens right here: we source the table, we filter it if needed, and then we might do a grouping, which we didn't do in this case; but whether we group or not, aggregations happen at this stage, and if no grouping happened, the aggregation is at the level of the whole table.

Count the number of expensive facilities. This is similar to the previous exercise: we go to the facilities table, but here we add a filter, because we're only interested in facilities whose guest cost is greater than or equal to 10. Once again I use the count(*) aggregation to count the rows of the resulting table. Looking again at our map, why does this work? With the FROM we source the table, and immediately after, the WHERE runs and drops the unneeded rows; then we can decide whether to GROUP BY or not (in this case we don't), and then the aggregations run. By the time the aggregations run, the WHERE has already dropped some rows, and this is why the aggregation sees only six rows, which is what we want.

Count the number of recommendations each member makes. In the members table we have the recommendedby field, holding the ID of the member who recommended the member the row is about. We want to take all these recommendedby values and count how many times each appears. So I go to my members table, and what I need to do is GROUP BY recommendedby. What this does is take all the unique values of the recommendedby column and then allow me to run an aggregation over all the rows in which each of those values occurs. In the SELECT I can name this column again, and if I run the query I get all the unique values of recommendedby, without any repetitions. Now I can add an aggregation like count(*).
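The three aggregation queries discussed so far might be sketched as follows; the last one is the in-progress version, before the NULL group is dealt with (names assumed from the pgexercises-style schema):

```sql
-- Global aggregation: one number for the whole table.
SELECT count(*) FROM cd.facilities;

-- The WHERE runs before the aggregation, so only the filtered rows are counted.
SELECT count(*)
FROM cd.facilities
WHERE guestcost >= 10;

-- One count per unique recommendedby value (NULLs form a group of their own).
SELECT recommendedby, count(*)
FROM cd.members
GROUP BY recommendedby
ORDER BY recommendedby;
```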
what this will do is that for recomend recomended by value 11 it will run this aggregation on all the rows in which recommended by is 11 and the aggregation in this case is Count star which means that it will return the number of rows in which 11 appears which in the result happens to be one and so on for all the values what I also want to do is to order by recommended buy to match the expected results now what we get here is almost correct we see all the unique values of this column and we see the number of times that it appears in our data but there’s one discrepancy which is this last row over here so in this last row you cannot see anything which means that it’s a null value so it’s a value that represents absence of data and why does this occur if you look at the original recommended by column there is a bunch of null values in this column because there’s a bunch of member that have null in recommended by so maybe we don’t know who recommended them or maybe they weren’t recommended they just applied independently when you group bu you take all the unique values of the recommended by column and that includes the null value the null value defines a group of its own and the count works as expected because we can see that there are nine members for whom we don’t have the recommended by value but the solution does not want to see this because we only want to see the number of recommendations each member has made so we actually need to drop this row therefore how how can I drop this row well it’s as simple as going to uh after the from and putting a simple filter and saying recommended by is not not null and this will drop all of the rows in which in which that value is null therefore we won’t appear in the grouping and now our results are correct remember when you’re checking whether a value is null or not you need to use the is null or is not null you cannot actually do equal or um not equal because um null is not an act ual value it’s just a notation for the 
absence of a value so you cannot say that something is equal or not equal to null you have to say that it is not null let’s list the total slots booked per facility now first question where is the information that I need the number of slots booked in the is in the CD bookings and there I also have the facility ID so I can work with that table and now how can I get the total slots for each facility I can Group by facility ID and then I can select that facility ID and within each unique facility ID what type of uh aggregation might I want to do in every booking we have a certain number of slots right and so we want to find all the bookings for a certain facility ID and then sum all the slots that are being booked so I can write sum of slots over here and then I want to name this column total slots uh looking at the expected results but this will actually not work because um it’s it’s two two separate words so I actually need to use quotes for this and remember I have to use double quotes because it’s a column name so it’s always double quotes for the column name and single quotes for pieces of text and finally I need to order by facility ID and I get the results so for facility ID zero we looked at all the rows where facility ID was zero and we squished all of this to a single value which is the unique facility ID and then we looked at all the slots that were occurring in these rows and then we compress them we squished them to a single value as well using the sum aggregation so summing them all up and then we get the slum the sum of the total slots list the total slots booked per facility in a given month so this is similar to the previous problem except that we are now isolating a specific time period And so let’s us think about how we can um select bookings that happened in the month of September 2012 now we can go to the bookings table and select the start time column and to help our exercise I will order by start time uh descending and I will limit our results 
You can see that starttime is a timestamp column, and it goes down to the second: year, month, day, hour, minute, second. So how can we check whether any of these timestamps corresponds to September 2012? We could add a logical check: starttime needs to be greater than or equal to September 1st, 2012, and strictly smaller than October 1st, 2012. This will work. As an alternative, there is a nice function we could use: date_trunc('month', starttime). What do you think this function does? As the name suggests, it truncates the date to a specific granularity that we choose, so all the timestamps are reduced to the very first moment of the month in which they occur. It is cutting the date, removing some information, reducing the granularity. I could of course use other values here, such as 'day', and then every timestamp would be reduced to its day, but I want 'month'. Now that I have this, I can set an equality and say that I want it to be equal to September 2012, and this works; I also think it is nicer than the range check we wrote before.

Now I've taken the code from the previous exercise and copied it here, because this one is pretty similar, except that after we get the bookings we need to insert a filter to isolate our time range, and we can use this logical condition directly. I'll delete all the rest, and now I need to change the ordering: I need to order by the total slots, and I get my result. To summarize: I take the bookings table, truncate the starttime timestamp because I am only interested in its month, make sure the month is the one I actually need, group by facid, select facid, and within each of those groups I sum all the slots.
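Put together, the query sketched above could read (column names assumed from the pgexercises schema):

```sql
-- Slots per facility in September 2012; date_trunc reduces each
-- timestamp to the first moment of its month before comparing.
SELECT facid, SUM(slots) AS "Total Slots"
FROM cd.bookings
WHERE date_trunc('month', starttime) = '2012-09-01'
GROUP BY facid
ORDER BY "Total Slots";
```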
all the slots and finally I’m ordering by this uh column list the total slots booked per facility per month in the year 2012 so again our data is in bookings and now we want to see how we how can we isolate the time period of the year 2012 for this table now once again I am looking at the start time column from bookings uh to see how we can extract the the year so in the previous exercise we we saw the date trunk function and we could apply it here as well so we could say date trunk start time um Year from start time right because we want to see it at the Year resolution and then we will get something like this and then we could check that this is equal to 2012 0101 and this would actually work but there’s actually a better way to do it what we could do here is that we could say extract Year from start time and when we look at here we got a integer that actually represents the year and it will be easy now to just say equal to 2012 and make that test so if we look at what happened here extract is a different function than date time because extract is getting the year and outputting it as an integer whereas date time is still outputting a time stamp or a date just with lower granularity so you have to use one or another according to your needs now to proceed with our query we can get CD bookings and add a filter here and insert this expression in the filter and we want the year to be 2012 so this will take care of isolating our desired time period next we want to check the total slots within groups defined by facility fac ID and month so we want a total for each facility for each month as you can see here in the respected results such that we can say that for facility ID zero in the month of July in the year 2012 we uh booked to 170 slots so let’s see how we can do that this basically means that we have to group by multiple values right and facility ID is easy we have it however we do not have the month so how can we extract the month from the start time over here 
Well, we can use the EXTRACT function, which we just saw. If we write it with MONTH, this function looks at the month and outputs it as an actual integer. The thing is, I can GROUP BY column names, but I can also GROUP BY transformations on columns; it works just as well. SQL will compute the expression, get the value, and then group by that value. When it comes to selecting columns, what I usually do after a GROUP BY is select the columns I grouped by, so I just copy what I had and add it to my query. Then, what aggregation do I want within the groups defined by these two columns? As in the previous exercise, I want to sum over the slots and get the total slots. I also take the month column and rename it month, and now I order by facid and month, and we get the data we needed.

So what did we learn with this exercise? We learned to use EXTRACT to get a number out of a date. We used grouping by multiple columns, which simply defines a group as each combination of the unique values of two or more columns; that is what multiple grouping does. We have also seen that you can group not only by a column name but also by an operation on a column, and you should then reference that same operation in the SELECT statement so you can get the value that was computed.

Next: find the count of members who have made at least one booking. Where is the data we need? It is in the bookings table, and for every booking we have the ID of the member who made it, so I can select that column and run a COUNT on it; COUNT returns the number of non-NULL values. However, this count, as you can see, is quite inflated. What is happening here is that a single member can make any number of bookings, and right now we are essentially counting all the bookings.
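Assembled, the per-facility-per-month query discussed above might look like this (a sketch under the assumed schema):

```sql
-- Total slots per facility per month in 2012; note the GROUP BY
-- on an expression, repeated in the SELECT list.
SELECT facid,
       EXTRACT(MONTH FROM starttime) AS month,
       SUM(slots) AS slots
FROM cd.bookings
WHERE EXTRACT(YEAR FROM starttime) = 2012
GROUP BY facid, EXTRACT(MONTH FROM starttime)
ORDER BY facid, month;
```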
But if I put DISTINCT in there, then I am only going to count the unique values of memid in my bookings table, and this gives me the total number of members who have made at least one booking. So COUNT gets you the count of non-NULL values, and COUNT(DISTINCT ...) gets you the count of unique non-NULL values.

Next: list the facilities with more than 1,000 slots booked. What do we need to do here? We need to look at each facility and how many slots it booked. Where is the data? As you can see, again in the bookings table. Now, I don't need any filter, so no WHERE statement, but I need to count the total slots within each facility, so I need a GROUP BY. I group by facid, and once I do that, I can select facid, and to get the total slots I simply do SUM(slots) and call it "Total Slots" (double quotes for a column name). Now I need to add the filter: I want to keep those with a sum of slots bigger than 1,000, and I cannot do it in a WHERE statement. If I were to write this in a WHERE statement, I would get "aggregate functions are not allowed in WHERE". If I look at my map, we have been through this: WHERE runs first, right after we source the data, whereas aggregations happen later, so WHERE cannot be aware of any aggregations I've done. For this purpose we have the HAVING clause. HAVING works just like WHERE: it is a filter, it drops rows based on logical conditions. The difference is that HAVING runs after the aggregations, and it works on the aggregations. So I get the data, do my first filtering, then do the grouping, compute an aggregation, and then I can filter again based on the result of the aggregation. I can now go to my query, put HAVING instead of WHERE, place it after the GROUP BY, and all that remains is to order by facid.
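The HAVING version being described could be sketched as:

```sql
-- HAVING filters on the aggregate, which WHERE cannot see.
SELECT facid, SUM(slots) AS "Total Slots"
FROM cd.bookings
GROUP BY facid
HAVING SUM(slots) > 1000
ORDER BY facid;
```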
And we get our result.

Next: find the total revenue of each facility; we want a list of facilities by name along with their total revenue. First question, as always: where is my data? The facility name is in the facilities table, but to calculate the revenue I need to know about the bookings, so I'll actually need to join both of these tables: FROM cd.bookings bks JOIN cd.facilities facs ON the facility ID. Next, I will want the total revenue of the facilities, but I don't even have the revenue yet, so my first priority should be to compute the revenue. Let us first select the facility name; here I now need to add the revenue, which is something like cost times slots, determining the revenue of each booking. However, I don't have a single value for cost: I have two values, membercost and guestcost, and as you remember from previous exercises, I need to choose each time which of them to apply. The way I choose is by looking at the member ID: if it is 0, I need to use the guest cost; otherwise, the member cost. So what can we use to choose between these two variants for each booking? The CASE statement. I write CASE and immediately close it with END, and inside I say WHEN bks.memid = 0 THEN facs.guestcost (I always reference the parent table after a join, to avoid confusion) ELSE facs.membercost. This allows me to get the cost dynamically, choosing between two columns, and I can multiply it by slots and get the revenue.

Now if I run this, I get the facility name and the revenue, but I need to ask myself: at what level am I working here? In other words, what does each row represent? Well, I haven't grouped yet, so each row here represents a single booking. Having joined bookings and facilities, and not having grouped anything, we are still at the level where every row represents a single booking.
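At the per-booking level, the CASE expression described above might read (the bks/facs aliases are assumptions following pgexercises conventions):

```sql
-- Revenue of each individual booking: pick guest or member cost
-- depending on who booked, then multiply by slots.
SELECT facs.name,
       CASE WHEN bks.memid = 0 THEN facs.guestcost
            ELSE facs.membercost
       END * bks.slots AS revenue
FROM cd.bookings bks
JOIN cd.facilities facs ON bks.facid = facs.facid;
```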
So to find the total revenue for each facility, I now need to do an aggregation: I need to group by facility name and then sum all the revenue. I can actually do this within the same query by saying GROUP BY name, and if I run this, I now get an error. Can you figure out why? I have grouped by the facility name, and I am selecting the facility name, and that works well, because that column has been squished, compressed, to show only the unique names for each facility. However, I am also selecting another column, revenue, which I have not compressed in any way, so it has a different number of rows. The general rule of grouping is: after I group by one or more columns, I can select only the columns that are in the grouping, plus aggregations; nothing else is allowed. So the name is good because it is in the grouping; revenue is not good because it is neither in the grouping nor an aggregation. To solve this, I can simply turn it into an aggregation by wrapping it in SUM, and when I run this, it actually works. Now all I need to do is sort by revenue: ORDER BY revenue gives me the result I need.

There are a few things going on here, but I can understand them by looking at my map. First I source the data, and I actually join two tables in order to create a new table where my data lives. Then I group by a column, the facility name, which compresses that column to all the unique facility names. Next I run the aggregation. The aggregation can be a sum over an existing column, but as we saw in the mental models course, it can also be a sum over a calculation; I can run logic in there, it is very flexible. If I had a revenue column here, I would just say SUM(revenue) AS revenue and it would be simpler, but I need to put some logic in there, and that logic involves choosing whether to take the guest cost or the member cost.
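With the CASE logic moved inside the aggregation, the full query sketched above becomes:

```sql
-- Total revenue per facility: SQL evaluates the CASE row by row,
-- then SUM adds up the per-booking results within each group.
SELECT facs.name,
       SUM(CASE WHEN bks.memid = 0 THEN facs.guestcost
                ELSE facs.membercost
           END * bks.slots) AS revenue
FROM cd.bookings bks
JOIN cd.facilities facs ON bks.facid = facs.facid
GROUP BY facs.name
ORDER BY revenue;
```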
I am perfectly able to put that logic inside the SUM: SQL will first evaluate the logic for each row, then sum up all the results and give me the revenue. Finally, after computing that aggregation, I select the columns that I need, and then I do an ORDER BY at the end.

Next: find facilities with a total revenue of less than 1,000. The question is pretty clear, but wait a second: we calculated the total revenue by facility in the previous exercise, so we can probably just adapt that code. Here is the code from the previous exercise (check that out if you want to know how I wrote it), and if I run this code, I do indeed get the total revenue for each facility. Now I just need to keep those with a revenue less than 1,000. How can I do that? It is a filter on this revenue column. I cannot use a WHERE filter, because revenue is an aggregation, computed after the GROUP BY, after the WHERE, so the WHERE would not be aware of that column. But as we have seen, there is a clause called HAVING, which does the same job as WHERE, filtering based on logical conditions, except that it works on aggregations. So I could say HAVING revenue < 1000. Unfortunately, this doesn't work; can you figure out why? In our query, we do a grouping, then we compute an aggregation, then we give it a label, and then we try to run a HAVING filter on this label. If you look at our map of the logical order of SQL operations: this is where the GROUP BY happens, this is where we compute our aggregation, and this is where HAVING runs. HAVING is trying to use the alias that is assigned at a later step, but according to our rules, HAVING does not know about the alias, because it hasn't happened yet. Now, as the discussion for this exercise says, there are in fact database systems that try to make your life easier by allowing you to use labels in HAVING.
But that is not the case with Postgres, so we need a slightly different solution here. Note that if I repeated all of my logic in the HAVING, instead of using the label, it would work: if I do this, I get my result, I just need to order by revenue, and you see that I get the correct result. Why does it work when I put the whole logic in there instead of the label? Once again, the logic happens before the HAVING, so the HAVING is aware of this logic having happened; it is just not aware of the alias. However, I do not recommend repeating logic like this in your queries: it increases the chances of errors, and it also makes them less elegant and less readable. So the simpler solution is to take this original query, put it in round brackets, and create a virtual table using a common table expression, calling all of this t1. Then we can treat t1 like any other table: FROM t1, SELECT everything WHERE revenue < 1000, then ORDER BY revenue, remove all the rest, and we get the correct answer.

To summarize: you can use HAVING to filter on the result of aggregations; unfortunately, in Postgres you cannot use the labels that you assign to aggregations inside HAVING. If it is a really small aggregation, like SUM(revenue), then it is fine to say HAVING SUM(revenue) < 1000; there is a small repetition, but it is not an issue. However, if your aggregation is more complex, as in this case, you don't really want to repeat it, and then you are forced to add an extra step to your query, which you can do with a common table expression.

Next: output the facility ID that has the highest number of slots booked. First of all we need to get the number of slots booked by facility, and we have actually done it before, but let's do it again. Where is our data? The data is in the bookings table, and we don't need to filter this table, but we do need to group by the facility ID.
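The CTE approach described above might be sketched as:

```sql
-- A common table expression makes the alias available
-- to an ordinary WHERE in the outer query.
WITH t1 AS (
  SELECT facs.name,
         SUM(CASE WHEN bks.memid = 0 THEN facs.guestcost
                  ELSE facs.membercost
             END * bks.slots) AS revenue
  FROM cd.bookings bks
  JOIN cd.facilities facs ON bks.facid = facs.facid
  GROUP BY facs.name
)
SELECT *
FROM t1
WHERE revenue < 1000
ORDER BY revenue;
```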
Once we do this, we can select the facility ID, which isolates all the unique values of the column, and within each unique value we can sum the number of slots and call it total slots; this gives us the total slots for each facility. Now, to get the top one, the quickest solution really would be to order by total slots and then limit the result to one. However, this would give me the one with the smallest number of slots, because ordering is ascending by default, so I need to turn this into descending, and here I have my solution.

But given that this is a simple solution and it solved our exercise, can you imagine a situation in which this query would not achieve what we wanted? Let us say there were multiple facilities tied for the top number of total slots. The top number of slots in our data set is 1404; suppose there were two facilities with this top number and we wanted to see both of them for our business purposes. Everything else would work correctly, and the ordering would work correctly, but inevitably one of them would get the first spot and the other the second, and LIMIT 1 always cuts the output to a single row. Therefore, this query would only ever show one facility ID, even if there were more that shared the top number of slots.

So how can we solve this? Clearly, instead of combining ORDER BY and LIMIT, we need to figure out a filter: we need to filter our table such that only the facilities with the top number of slots are returned. But we cannot really get the maximum of the sum of slots in this query: if I tried HAVING SUM(slots) = MAX(SUM(slots)), I would be told that aggregate function calls cannot be nested. If I go back to my map, I can see that HAVING can only run after all the aggregations have completed, but what we are trying to do here is add a new aggregation inside HAVING, and that basically doesn't work.
So the simplest solution here is to just wrap all of this into a common table expression, then take the table we have just defined and SELECT * WHERE the total slots equals the maximum number of slots, which we know to be 1404. However, we cannot hardcode the maximum number of slots: for one, we might not know what it is, and second, it will change over time, so this won't work when the data changes. What is the alternative to hardcoding? We actually need some logic to get the maximum value, and we can put that logic inside a subquery: the subquery goes back to my table t1 and finds the maximum of total slots from t1. First this subquery runs and gets the maximum, then the filter checks for that maximum, and I get the required result. And this won't break if there are many facilities sharing the same top spot: because we are using a filter, all of them will be returned. So this is a perfectly good solution.

For your information, you can also solve this with a window function, which is a sort of row-level aggregation that doesn't change the structure of the data; we have seen it in detail in the mental models course. What I can do here is use a window function to get the maximum value over the sum of slots. I write OVER to make it clear that this is a window function, but I won't put anything in the window definition, because I just want to look at my whole data set here, and I can label this max slots. If I look at the data, you can see that I get the maximum on every row; then, to get the correct result, I can add a simple filter saying that total slots should equal max slots, and I only return facid and total slots. So this also solves the problem. What is interesting to note here, for the sake of understanding window functions more deeply, is that the aggregation function for this window clause itself works over an aggregation.
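The subquery solution could be sketched as:

```sql
-- The scalar subquery computes the maximum, so nothing is
-- hardcoded and ties at the top are all returned.
WITH t1 AS (
  SELECT facid, SUM(slots) AS total_slots
  FROM cd.bookings
  GROUP BY facid
)
SELECT facid, total_slots
FROM t1
WHERE total_slots = (SELECT MAX(total_slots) FROM t1);
```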
Here we sum the total slots within each facility, and then the window function takes the maximum of all of those values; this is quite a powerful feature. If I look at my map, it makes perfect sense: here is where we group by facid, here is where we compute the aggregation, and the window comes later, so the window is aware of the aggregation and can work on it. A few different solutions here, and overall a really interesting exercise.

Next: list the total slots booked per facility per month, part two. This is a bit of a complex query, and the easiest way to get it is to look at the expected results. What we see is a facility ID, then within each month of the year 2012 we get the total number of slots, and at the end of it we have a NULL value, and for facility 0 what we get is the sum of all slots booked in 2012. Then the same pattern repeats with every facility: we have the total within each month, and finally the total for that facility over the year. So there are two levels of aggregation here, and if I go to the very end, there is a third level, the total for all facilities within that year. Three levels of aggregation, by increasing granularity: the total over the year, then the total by facility over the year, and finally the total by facility by month within that year. This somewhat breaks the mold of what SQL usually does, in the sense that SQL is not designed to return a single result with multiple levels of aggregation, so we will need to be a bit creative.

Let us start now with the lowest level of granularity, facility ID and month, and then we'll build on top of that. The table I need is bookings. First question: do I need to filter this table? Yes, because I am only interested in the year 2012.
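The window-function variant just described might be sketched as (the max_slots label is mine):

```sql
-- MAX(...) OVER () runs after the GROUP BY aggregation,
-- attaching the overall maximum to every row.
WITH t1 AS (
  SELECT facid,
         SUM(slots) AS total_slots,
         MAX(SUM(slots)) OVER () AS max_slots
  FROM cd.bookings
  GROUP BY facid
)
SELECT facid, total_slots
FROM t1
WHERE total_slots = max_slots;
```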
We have seen that we can use the EXTRACT function to get the year out of a timestamp, which here is starttime, and we can use this function in a WHERE filter: it goes to the timestamp, gets an integer out of it, and then we can check that this is the year we are interested in. Let's do a quick sanity check to make sure this worked: I get some bookings, and they are all in the year 2012. Next I need to define my grouping: I will need to group by facid, but I will also need to group by month. However, I don't actually have a column named month in this table, so I need to calculate it, once again with the EXTRACT function: EXTRACT(MONTH FROM starttime). Once again, this goes to the starttime and spits out an integer, which for this first row would be 7, and as you know, in the GROUP BY I can use a column, but I can also use an operation over a column, which works just as well. Now, after grouping I cannot do SELECT * anymore, but I want to see the columns I have grouped by, so let us do a quick sanity check on that: it looks pretty good, I get the facid and the month, which I can label month. Next I simply take the sum of the slots within each facility and within each month, and when I look at this, I have my first level of granularity; you can see that the first row corresponds to the expected result.

Now I need to add the next level of granularity, which is the total within each facility. Can you think of how to add that next level to my results? The key insight is to look at the expected-results table and to see it as multiple tables stacked on top of each other. One table is the one we have here, the total by facility and month. A second table we will need is the total by facility, and the third table we will need is the overall total, which you can see at the bottom.
And how can we stack multiple tables on top of each other? With a UNION statement: UNION stacks all the rows from my tables on top of each other. So now let us compute the table with the total by facility. I will actually copy-paste what I have here, and I just need to remove one level of grouping: I no longer group by month, and I no longer select the month. Once I do this, I get an error: each UNION query must have the same number of columns. Do you understand this error? I will write a bit to show you what is happening. How does it work when we union two tables? The first table, in our case, has facid, month, and slots; the second table, if you look here, has facid and slots. When you UNION these two tables, SQL assumes that you have the same number of columns and that the ordering is identical. Here we are failing because the first table has three columns and the second has only two. And not only are we failing because of the number mismatch: we are also mixing the values of month and slots. That might silently work, because they are both integers, so SQL won't necessarily complain, but it is logically wrong. So we need to make sure that when we union these two tables, we have the same number of columns and the same ordering. But how can we do this, given that the second table does indeed have one column less, that it genuinely has less information? What I can do is put NULL in the select list: SELECT NULL creates a column of a constant value, a column of all NULLs, and then the structure matches. Now when I UNION, first of all I have the same number of columns, so I am not going to see that error again, and second, facid is going to be mixed with facid, and slots is going to be mixed with slots.
That is all good, and then month is going to be mixed with NULL, which is what we want, because in some cases we will have the actual month and in some cases we won't have anything. So I have added NULL, and I am unioning the tables; if I run the query, I don't get any error anymore, and this is what I want. I can tell that this row is coming from the second table because it has NULL in the month column, so it shows the total slots for facility 0 across every month, whereas this row came from the upper table, because it shows the sum of slots for a facility within a certain month. So this achieves the desired result.

Next we want to compute the last level of granularity, which is the overall total. Once again I copy my query, and I don't even need to group by anymore, because it is the total number of slots over the whole year: I can simply say SUM(slots) AS slots and remove the grouping. Next I add the UNION as well, so that I can keep stacking these tables, and if I run this, I get the same error as before. Going back to our little diagram, we are now adding a third table, and this table only has slots; of course this doesn't work, because there is a mismatch in the number of columns. The solution here is again to add NULL columns, so that I have the same number of columns: slots gets combined with slots and everything else gets filled with NULL values. I can do it here, making sure that the ordering is correct: SELECT NULL, NULL, SUM(slots). If I run this query, I can see that the result works. The final step is to add the ordering, sorted by facid and month: at the end of all these unions I say ORDER BY facid, month, and I finally get my result. This is now the combination of three different tables stacked on top of each other that show different levels of granularity.
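Stacked together, the three aggregation levels described above might read:

```sql
-- Three SELECTs at decreasing granularity, padded with NULL
-- columns so UNION can stack them.
SELECT facid, EXTRACT(MONTH FROM starttime) AS month, SUM(slots) AS slots
FROM cd.bookings
WHERE EXTRACT(YEAR FROM starttime) = 2012
GROUP BY facid, EXTRACT(MONTH FROM starttime)
UNION
SELECT facid, NULL, SUM(slots)
FROM cd.bookings
WHERE EXTRACT(YEAR FROM starttime) = 2012
GROUP BY facid
UNION
SELECT NULL, NULL, SUM(slots)
FROM cd.bookings
WHERE EXTRACT(YEAR FROM starttime) = 2012
ORDER BY facid, month;
```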
As you can see in the schema, we added NULL columns to two of these tables just to make sure that they have the same number of columns and can stack up correctly. And if we look again at the whole query, we can see that there are actually three SELECT statements in it, meaning three tables which are calculated and then finally stacked with UNION, and all of them do some pretty straightforward aggregation: the first aggregates by facid and month after extracting the month, the second simply aggregates by facid, and the third gets the sum of slots over the whole data without any grouping; then we add the NULL constant columns to make the column counts match. It is also worth seeing this in our map of the SQL operations: this order actually repeats for every table. For each of our three tables, we get our data, run a filter to keep the year 2012, do a grouping, compute an aggregation, and select the columns we need, adding NULL columns when necessary; then it repeats all over, the same process for the second table and for the third, except that in the third we don't group by. When all three tables are done, the UNION runs and stacks them all up together, and instead of three tables I have only one. After the UNION has run, I can finally order my table and return the result.

Next: list the total hours booked per named facility. We want to get the facility ID, the facility name, and the total hours they have been booked, keeping in mind that what we have here is the number of slots for each booking, and a slot represents 30 minutes of booking. To get my data I will need both the bookings table and the facilities table, because I need both the information on the bookings and the facility name.
So I get the bookings table and the facilities table and join them together. Next, I don't really need to filter on anything, but I need to group by facility, so I group by facid, and then I also need to group by the facility name, otherwise I won't be able to use it in the SELECT part. Now I can select these two columns, and to get the total hours I need the sum of the slots within each facility, divided by two. Let's see what that looks like. Superficially this looks correct, but there is actually a pitfall here, and to see it I will also select SUM(slots) before dividing by two. You can see it already in the first row: the sum of slots divided by 2 does not quite match the value shown. What is happening? In Postgres, when you take an integer number, such as the sum of the slots (the SUM of an integer column is an integer), and you divide it by another integer, Postgres assumes you are doing integer division, and since you are dividing two integers, it returns an integer as well. That means the result is not exact if you are thinking in floating-point numbers. The solution is that at least one of the two numbers needs to be a floating-point number, so we turn 2 into 2.0, and if I run this, I now get the correct result. It is important to be careful with integer division in Postgres; it is a potential pitfall. Now I need to reduce the number of digits after the decimal point, so I need some sort of rounding, and for this I can use the ROUND function, a typical function in SQL. It takes two arguments: the first is a column, and here the column is actually this whole operation; the second is how many figures you want to see after the decimal point. Now I can clean this up a bit, label it total hours, and then I need to order by facid.
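The total-hours query with the integer-division fix and rounding might be sketched as:

```sql
-- Dividing by 2.0 forces floating-point division;
-- ROUND(..., 2) keeps two digits after the decimal point.
SELECT facs.facid, facs.name,
       ROUND(SUM(bks.slots) / 2.0, 2) AS "Total Hours"
FROM cd.bookings bks
JOIN cd.facilities facs ON bks.facid = facs.facid
GROUP BY facs.facid, facs.name
ORDER BY facs.facid;
```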
And I get my result. Nothing crazy here, really: we source our data from a join, which is this part over here, then we group by two columns, select those columns, sum over the slots, divide (making sure to avoid integer division, so one of the numbers becomes a floating-point number), and round the result of this column.

Next: list each member's first booking after September 1st, 2012. In order to get our data, where does our data live? We need the data about the members and about their bookings, so the data is in the members and bookings tables, and I will quickly join these tables; we now have our data. Do we need a filter on our data? Yes, because we only want to look after September 1st, 2012, so we can say WHERE starttime is greater than that date, and it should be enough to just provide the date like this. In the result we need the member's surname, first name, and member ID, and then we need to see the first booking in our data, meaning the earliest time, so again we have an aggregation here. In order to implement it, I need to group by all the columns I want to select: surname, firstname, and memid. Now that I have grouped by these columns, I can select them; I have grouped by each member, and I have all the dates of all their bookings after September 1st, 2012. Now, how can I look at all these dates and get the earliest date? What type of aggregation do I need? I can use the MIN aggregation, which looks at all the dates and compresses them to a single date, the smallest one, and I call this starttime. Finally I order by memid, and I get the result that I needed. This is actually quite straightforward: I get my data by joining two tables, I make sure I only have the data I need by filtering on the time period, I group by all the information I want to see for each member, and then within each member I use MIN to get the smallest, meaning the earliest, date.
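The first-booking query described above might be sketched as (the bks/mems aliases are assumptions):

```sql
-- Earliest booking per member after 2012-09-01:
-- MIN compresses each member's dates to the smallest one.
SELECT mems.surname, mems.firstname, mems.memid,
       MIN(bks.starttime) AS starttime
FROM cd.bookings bks
JOIN cd.members mems ON bks.memid = mems.memid
WHERE bks.starttime >= '2012-09-01'
GROUP BY mems.surname, mems.firstname, mems.memid
ORDER BY mems.memid;
```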
Then within each member I use min to get the smallest date, meaning the earliest date.

Now I want to give you a bit of insight into the subtleties of how SQL compares timestamps and dates, because the results can be a bit surprising. I wrote three logical expressions here, and your job is to guess whether each of them will be true or false. Take a look and try to answer. On one side we have a timestamp indicating September 1st at 8:00, whereas on the other we have simply the date, September 1st, and the values are the same in all three. My question is: are they equal? Is one greater? Is one smaller? I think the intuitive answer is: in the first case we have September 1st on one side and September 1st on the other, the same day, so this ought to be true; in the second we again have the same day on both sides, so one is not strictly bigger than the other, and this should be false; and it is also not strictly smaller, so the third would be false as well. Now let's run the query and see what actually happens. What we see is that we thought the first would be true, but it's actually false; we thought the second would be false, but it's actually true; and the third is indeed false. Are you surprised by this result, or is it what you expected? If you are surprised, can you figure out what's going on? What is happening is that you are comparing two expressions with different levels of granularity. The one on the left shows day, hour, minute, seconds, while the one on the right shows only the date. In other words, the value on the left is a timestamp, whereas the value on the right is a date: different levels of precision. To make the comparison work, SQL needs to convert one into the other; it has to do something known technically as implicit type coercion. What does that mean?
Type is the data type, in this case either timestamp or date. Coercion is when you take a value and convert it to a different type, and it's implicit because we haven't asked for it: SQL does it on its own, behind the scenes. So how does SQL choose which one to convert? The choice is: keep the one with the highest precision and convert the other. We have the timestamp, with the higher precision, on the left, so the date gets converted into a timestamp. This is how SQL handles the situation: it favors the higher precision. Now, to convert a date to a timestamp, what SQL does is add all zeros for the time part, so the result represents the very first second of the day of September 1st 2012. We can verify what I just showed you: I'll comment out this line and add another logical expression, taking the timestamp with all zeros and setting it equal to the date. What do we expect to happen? We have two different types, there will be a type coercion, and SQL will take the value on the right and turn it into exactly the value on the left; therefore, when I check whether they're equal, I should get true. It turns out this is true, but I need to add another step, which is to convert this explicitly to a timestamp, and after I do that I get what I expected: the comparison is true. What this notation does in PostgreSQL is perform the type coercion, taking the date and forcing it into a timestamp. I'll be honest with you: I don't understand exactly why I need to do this here; I thought it would work on its own, but I also need to tell SQL explicitly that I want a timestamp. Nonetheless, this is the insight we needed, and it lets us understand why the first comparison is actually false: because we are comparing the timestamp for the very first second of September 1st with the timestamp for the first second of the eighth hour of September 1st.
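The comparisons under discussion can be sketched directly in PostgreSQL, where `::timestamp` is the explicit cast mentioned above:

```sql
-- The date on the right is implicitly coerced to a timestamp at midnight,
-- the very first second of the day, so the first check fails:
SELECT timestamp '2012-09-01 08:00:00' = date '2012-09-01',  -- false
       timestamp '2012-09-01 08:00:00' > date '2012-09-01',  -- true
       timestamp '2012-09-01 08:00:00' < date '2012-09-01';  -- false

-- Making the coercion explicit shows what is really being compared:
SELECT timestamp '2012-09-01 00:00:00' = (date '2012-09-01')::timestamp;  -- true
```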
And so it fails. We can also see why, on the second line, the left-hand side is bigger than the right-hand side; and the third one did not actually fool us, so we're good with that. Long story short: if you're just getting started, you might not know that SQL does this implicit type coercion in the background, and this date comparison might leave you quite confused. Now I've cleaned the code up a bit, and the question is: what do we need to do so that the result matches our initial intuition, with the first line true, the second false, and the third still false? Well, since the implicit coercion turns the date into a timestamp, we actually want the opposite: to turn the timestamp into a date. So it will be enough to do the type coercion ourselves and cast the timestamp to a date, and when I run this new query I get exactly what I expected. Now I'm comparing at the level of precision, or granularity, that I wanted: I'm only looking at the date. I hope this wasn't too confusing, that it was a bit insightful, and that you have a new appreciation for the complexities that can arise when you work with dates and timestamps in SQL.

Produce a list of member names, with each row containing the total member count. Let's look at the expected results: we have the first name and surname of each member, and then every single row shows the total count of members; there are 31 members in our table. Now, if I want the total count of members, I can take the members table and select the count, and this will give me 31. But I cannot add first name and surname to this: I would get an error, because count(*) is an aggregation that takes all 31 rows and produces a single number, 31, while first name and surname are not being aggregated.
So the standard aggregation doesn't work here: I need an aggregation that doesn't change the structure of my table and that works at the level of the row, and for that I can use a window function. A window function looks like an aggregation followed by the keyword over and then the definition of the window. If I do this, I get the count at the level of the row, and to match the expected results I just need to change the column order a bit, and I get the result I wanted. So a window function has two main components: an aggregation and a window definition. In this case the aggregation counts the number of rows, and the window definition is empty, meaning our window is the entire table, so the aggregation is computed over the entire table and then added to each row. There are far more details about window functions and how they work in my mental models course.

Produce a numbered list of members ordered by their date of joining. I will take the members table and select the first name and surname, and to produce a numbered list I can use a window function with the row_number aggregation, so I'll say row_number over. row_number is a special aggregation that works only in window functions; it numbers the rows monotonically, giving a number to each row starting from one and counting up, and it never assigns the same number to two rows. In the window you need to define the ordering for the numbering, which in this case is the join date, and by default the order is ascending, which is what we want. We can call this row number, and we get the results we wanted. Again, you can find a longer and more detailed explanation of window functions and row_number in the mental models course.

Output the facility ID that has the highest number of slots booked. We've already solved this problem in a few different ways; let's see a new one.
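The two window-function queries just described can be sketched as follows (schema assumed from pgexercises; `joindate` is the assumed column name):

```sql
-- Total member count repeated on every row:
-- an empty OVER () makes the window the entire table.
SELECT count(*) OVER () AS count,
       firstname,
       surname
FROM members;

-- Numbered list of members by date of joining.
SELECT row_number() OVER (ORDER BY joindate) AS row_number,
       firstname,
       surname
FROM members;
```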
We can go to our bookings table and group by facility ID, then get the facility ID in our select, and sum over slots to get the total slots booked for each facility. Since we're dealing with window functions, we can also rank the facilities based on the total slots they have booked, which looks like rank over, order by sum of slots descending, and we can call this rk, for rank. If I order by the sum of slots descending, I can see that my rank works as intended. We've seen this in the mental models course: you can think of rank as deciding the outcome of a race. The facility that did the most gets rank one, and everyone else gets ranks two, three, four; but if two candidates got the same highest score, they would both get rank one, because they would both have won the race, so to speak. The rank here is defined over a window ordered by the sum of slots descending, which is what we need. Next, to get all the facilities with the highest score, we can wrap this into a common table expression, then take that table, select the facility ID, label the sum column as total, select total, and filter for where the ranking is equal to one, and we get our result. Aside from how rank works, the other thing to note in this exercise is that we can define the window based on an aggregation: in this case we are ordering the elements of our window by the sum of slots. And if we look at our map of SQL operations, we get the data, then the group by, then the aggregation, and then the window; the window follows the aggregation, so by our rules the window has access to the aggregation and is able to use it.

Rank members by rounded hours used. The expected results are quite straightforward: we have the first name and surname of each member, the total hours they have used, and then a ranking based on that.
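The rank-plus-CTE pattern just walked through might look like this (pgexercises-style schema assumed):

```sql
-- Facilities with the highest number of booked slots, ties included.
WITH t1 AS (
    SELECT facid,
           sum(slots) AS total,
           rank() OVER (ORDER BY sum(slots) DESC) AS rk
    FROM bookings
    GROUP BY facid
)
SELECT facid, total
FROM t1
WHERE rk = 1;  -- keep every facility tied for first place
```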
So where is the information for this result? It's in the members and bookings tables, so we will need to join them: members m join bookings b on the member ID, and that's our join. Now we need the total hours, so we can group by first name, and we also need to group by surname because we will want to display it, and now we can select these two columns. Next we need to compute the total hours. For each member we know the slots of every booking, so we get all those slots and sum them up, and since every slot represents a 30-minute interval, to get the hours we divide the value by two. And remember: if I take an integer like the sum of slots and divide by two, which is also an integer, I get integer division, so I lose the part after the decimal point, which is not what I want. So instead of dividing by 2, I divide by 2.0. Let's check what the data looks like. This is looking good now, but if we read the question, we want to round to the nearest ten hours: 19 should become 20, 115 should become 120 (because I think we round up at 15), and so on, as you can see in the expected result. How can we do this rounding? We have the nifty round function, which takes the column with the values as its first argument, and as the second argument we can specify how we want the rounding: to round to the nearest ten, you can put -1 here. Let's keep displaying the total hours alongside the rounded value to make sure we're doing it correctly; as you can see, we are indeed rounding to the nearest ten, so this is looking good. To explain why I used -1 here and how the rounding function works, I'll have a small section about it when we're done with this exercise; meanwhile, let's finish. So now I want to rank all of my rows based on this value I have just computed.
Since this is an aggregation, it will already be available to a window function, because in the logical order of operations aggregations happen first and windows happen afterward, with access to the aggregated data. So it should be possible to turn this into a window function; think for a moment about how. A window function has its own aggregation, in this case a simple rank, and then the over part, which defines the window. What do we want in the window? We want to order by our rounded hours, descending, so that the member with the most hours gets the best rank. But we clearly don't have a column called rounded hours; what we have is the logic, so I substitute the name with my actual logic, and I get my actual rank. Now I can delete the helper column I was just looking at and sort by rank, surname, first name. Small error here: I actually do need to show the hours as well, so I take that same logic again and call it hours, and I finally get my result. To summarize what we're doing in this exercise: we get our data by joining the two tables, then we group by the first name and surname of the member, sum over the slots for each member, divide by 2.0 to make sure we have an exact division, and use the round function to round to the nearest ten, which gives us the hours. Then we use the same logic inside a window function to build a ranking, so that the member with the most hours gets rank one, the one with the second most gets rank two, and so on, as you can see in the result. And I am perfectly able to use this logic to define the ordering of my window, because window functions can use aggregations, as seen in the logical order of SQL operations.
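The full query being assembled here might look roughly like this (pgexercises-style schema assumed; note the same rounded-hours logic appears both as a column and inside the window definition):

```sql
-- Rank members by hours used, rounded to the nearest ten.
SELECT m.firstname,
       m.surname,
       round(sum(b.slots) / 2.0, -1) AS hours,     -- -1 rounds to nearest 10
       rank() OVER (ORDER BY round(sum(b.slots) / 2.0, -1) DESC) AS rank
FROM members m
JOIN bookings b ON m.memid = b.memid
GROUP BY m.firstname, m.surname
ORDER BY rank, m.surname, m.firstname;
```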
Window functions occur after aggregations, and that's it: then we just order by the required values and get our results.

Now, here's a brief overview of how rounding works in SQL. Rounding is a function that takes a number and returns an approximation of it, usually easier to parse and easier to read. The round function works like this: the first argument is a value, which can be a constant, as in this case, or a column, in which case round is applied to every element of the column; and the second argument specifies how we want the rounding to occur. Here you can see the number we start from, and the first rounding we apply has an argument of 2, meaning we really just want to see two digits after the decimal point. We round down or up based on the next digit: values of five or greater round up, values smaller than five round down. In this first example the dropped digit is less than five, so we just get rid of it; then we have an eight, which is greater than five, so we have to round up, and when we round up, the 79 becomes an 80, which is how we get the first result. Then we have round with an argument of 1, which leaves one place after the decimal, giving the next result, and round without any argument, which is the same as providing an argument of 0, meaning we really just want the whole number. What's interesting to note is that the rounding function can be generalized to continue even after we've gotten rid of the whole decimal part, by providing negative arguments: round with an argument of -1 means round to the nearest 10, so 48,292 becomes 48,290.
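The behaviour described so far can be tried on an arbitrary value (the number below is purely illustrative, not the one used in the course):

```sql
-- round() with a positive, zero, and negative second argument.
SELECT round(1234.5678, 2),   -- 1234.57  (two decimal places)
       round(1234.5678, 1),   -- 1234.6   (one decimal place)
       round(1234.5678),      -- 1235     (same as an argument of 0)
       round(1234.5678, -1);  -- 1230     (nearest ten)
```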
Rounding with a value of -2 means going to the nearest hundred: the 290 part is closer to 300, so we have to round up, and we get 48,300. -3 means round to the nearest thousand: we have roughly 48,292, and the nearest thousand is 48,000. -4 means the nearest ten thousand: given that we have about 48,000, the nearest ten thousand is 50,000. And finally, round with -5 means round to the nearest hundred thousand, and given our value, the nearest hundred thousand is actually zero; from here on, as we keep going more negative, we will always get zero for this number. So this is how rounding works, in brief. It's a pretty useful function, and not everyone knows that you can give it negative arguments; actually, I didn't know, and when I did the first version of this course a commenter pointed it out, so shout out to him.

Find the top three revenue-generating facilities. We want a list of the facilities that have the top three revenues, including ties; this is important. If you look at the expected results, we simply have the facility name and, a bit of a giveaway of what we will need to use, the rank of these facilities. Now, there's another exercise we did a while back, find the total revenue of each facility, and from that exercise I have taken the code that gets us to the point where we see the name of each facility and its total revenue; you can go back to that exercise to see in detail how this code works. In brief: we join the bookings and facilities tables, group by facility name, select that facility name, and then within each booking compute the revenue by taking the slots and using a case when to choose whether to use the guest cost or the member cost. This is how we get the revenue for each single booking.
Given that we grouped by facility, we can sum all of these revenues to get the total revenue of each facility, and this is how we get to this point. Given this partial result, all that's left to do is rank these facilities based on their revenue, so what I need is a window function that implements this ranking, and it looks something like this: a rank. Why is rank the right function, even though they sort of gave it away? Because if you want the facilities with the top revenues including ties, you can think of it as a race: all facilities are racing to the top revenue, and if two or three or four facilities share that top position, you can't arbitrarily say one is first and the others second; you have to give them all rank one, recognizing that they are all first. These types of problems call for a ranking solution. So our window function uses rank as the aggregation, and then we need to define our window, which here means defining the ordering for the ranking: order by revenue descending, so that the highest revenue gets rank one, the next highest gets rank two, and so on. Now, this will not work as written, because I don't have a revenue column: I do have something here labelled as revenue, but the ranking part is not aware of that label. However, I do have the logic that computes the revenue, so I can take that logic, paste it in here, and add a comma. This is not the most elegant-looking code, but let's see if it works; we order by revenue descending to see it in action, and you can in fact see that the facility with the highest revenue gets rank one, and it goes down from there. Now I just need to clean this up a bit: first I remove the revenue column, and then I remove the ordering.
What I need for the result is to keep only the facilities that have a rank of three or smaller, that is, ranks one, two and three, and there's actually no way to do that in this query, so I have to wrap it in a common table expression, then take that table and say select star from t1 where rank is smaller than or equal to three, ordering by rank ascending, and I get the result I needed. So what happened here? We built upon the logic that gets the total revenue for each facility, which we saw in the previous exercise, and then we added a rank window function, and within this rank we order by the total revenue. This might look a bit complex, but you have to remember that when many operations are nested, you always start with the innermost operation and work your way out from there. The innermost operation is a case when that chooses between guest cost and member cost and multiplies it by slots; this inner operation calculates the revenue for each single booking. The next operation is an aggregation that takes that per-booking revenue and sums it up to get the total revenue for each facility. And finally, the outermost operation takes the total revenue for each facility and orders it descending, in order to figure out the ranking. The reason all of this works: going back to our map of SQL operations, after getting the table, the first thing that happens is the group by, then the aggregations, which is where we sum up the revenue, and after the aggregation is completed we have the window function, so the window function has access to the aggregations and can use them when defining the window. Finally, after we get the ranking, we have no way of isolating only the first three ranks within this query, so we need to do it with a common table expression.
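Putting the pieces together, the full query described above might look like this (pgexercises-style schema assumed; in that data set guest bookings conventionally have memid 0, which is an assumption here):

```sql
-- Top three revenue-generating facilities, including ties.
WITH t1 AS (
    SELECT f.name,
           rank() OVER (
               ORDER BY sum(b.slots *
                   CASE WHEN b.memid = 0 THEN f.guestcost
                        ELSE f.membercost END) DESC
           ) AS rank
    FROM bookings b
    JOIN facilities f ON b.facid = f.facid
    GROUP BY f.name
)
SELECT name, rank
FROM t1
WHERE rank <= 3     -- no filter is available after the window, hence the CTE
ORDER BY rank;
```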
Looking back at our map, this makes sense: what components do we have for filtering a table, keeping only certain rows? We have the where, which happens very early, and the having, and both happen before the window; after the window function there is no other filter, so you need to use a common table expression.

Classify facilities by value. We want to classify facilities into equally sized groups of high, average and low, based on their revenue, and in the expected result each facility is classified as high, average or low. The point is that we decided at the beginning that we want three groups, and this is arbitrary: we could have said two groups, or five, or six, or seven. We have three, and all of our facilities are distributed equally among these groups; because we have nine facilities, we get three facilities in each group. I can already tell you that there is a special function that will do this for us, so we will not go through the trouble of implementing it manually, which could be pretty complex. I have copied here the code that gets the total revenue for each facility, and we have seen this code more than once in past exercises, so if you're still not clear about how we get to this point, check those out. What we did in the previous exercise was rank the facilities based on the revenue, taking the rank window function and defining our window as order by revenue descending, except that we don't have a revenue column, but we do have the logic to compute the revenue, so we just paste that logic in; when I run this, I get a rank for each of my facilities, where the biggest revenue gets rank one. Now, the whole trick to solving this exercise is to replace the rank aggregation with an ntile aggregation.
We provide the number of groups we want to divide our facilities into, and if I run this you see that I get what I need: the facilities have been equally distributed into three groups, where group one has the facilities with the highest revenue, then we have group two, and finally group three with the facilities with the lowest revenue. To see how this function works, I will simply go to Google and search for "postgres ntile"; the second link is the PostgreSQL documentation, on the page for window functions. Scrolling down, I can see all the functions that can be used as window functions, and you will recognize some of our old friends: row_number, rank, dense_rank, and here we have ntile. What we see is that ntile returns an integer ranging from 1 to the argument value, and the argument value is what we pass in: the number of buckets, dividing the partition as equally as possible. So we call the ntile function with the number of buckets we want, and the function divides the data as equally as possible into those buckets; how the division takes place depends on the window definition, and in this case we are ordering by revenue descending. So we just need to clean this up a bit: I remove the revenue part, because that's not required of us, and call this column ntile, quite simply. Now I need to add a label on top of this ntile value, as you can see in the results, so I wrap this into a common table expression (and once I have a common table expression, I don't need the ordering inside it anymore), and then I select from the table I've just defined. What do I want from this table? The name of the facility, and the ntile value with a label on top of it, using a case when statement to assign the label: case when ntile equals 1 then 'high', when ntile equals 2 then 'average',
else 'low'; end the case and call it revenue. Finally I order by the ntile value, so the results show first high, then average, then low, and also by facility name, and I get the result I wanted. To summarize: this is just like the previous exercise, except that we use a different window function, ntile instead of rank, so that we can bucket our data. And in the window, as we said in the previous exercise, there are a few nested operations, which you can untangle by going to the deepest one and moving upwards: the first picks up the guest cost or member cost and multiplies it by slots, getting the revenue for each single booking; the next aggregates on top of this within each facility, so we get the total revenue by facility; then we order by this revenue descending, which defines our window, and this is what the bucketing uses to distribute the facilities into buckets based on their revenue; and finally we need another layer of logic, a common table expression, so that we can label our ntile with the required text labels.

Calculate the payback time for each facility. This requires some understanding of the business reality that this data represents. If we look at the facilities table, we have an initial outlay, which represents the initial investment put into the facility, and a monthly maintenance value, which is what we pay each month to keep the facility running; and of course each facility will also have a monthly revenue. So how can we calculate the amount of time each facility will take to repay its cost of ownership? Let's write it down so we don't lose track of it: we can get the monthly revenue of each facility, but what we're actually interested in is the monthly profit, and to get the profit we subtract the monthly maintenance for each facility. So revenue minus expenses
equals profit, and when we know how much profit each facility makes per month, we can take the initial investment and divide it by the monthly profit, which tells us how many months it will take to repay the initial investment. So let's do that. Once again, I have copied the code that calculates the total revenue for each facility; we have seen this in the previous exercises, so check those out if you still have questions about it. Now that we have the total revenue for each facility, we know we have three complete months of data so far, so how do we get to the monthly revenue? It's as simple as dividing all of this by three, and I will say 3.0 so we don't get integer division but proper division, and I can call this monthly revenue. The revenue column does not exist anymore, so I can remove the order by, and here I can see the monthly revenue for each facility. Now from the monthly revenue I can subtract the monthly maintenance, which will give me the monthly profit; but now we get this error. Can you figure out what it's about? Monthly maintenance does not appear in the group by clause. What we did is group by facility name and then select it, which is fine, and all the rest was aggregation; remember, as a rule, when you group by, you can only select the columns you have grouped by, plus aggregations, and monthly maintenance is not an aggregation. To make it work, we add it to the group by statement, and now I get the monthly profit. Finally, the last step is to take the initial outlay and divide it by everything else we have computed until now, and we can call this months, because it gives us the number of months we need to repay our initial investment. And again we get the same issue: initial outlay is not an aggregation and does not appear in the group by clause.
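For reference, the calculation this exercise is building toward might be sketched like this (pgexercises-style schema assumed; the memid-0 guest convention is an assumption):

```sql
-- Months for each facility to repay its initial outlay.
SELECT f.name,
       f.initialoutlay /
       -- parentheses matter: compute the monthly profit first,
       -- then divide the outlay by it
       ((sum(b.slots *
             CASE WHEN b.memid = 0 THEN f.guestcost
                  ELSE f.membercost END) / 3.0)   -- three months of data
        - f.monthlymaintenance) AS months
FROM bookings b
JOIN facilities f ON b.facid = f.facid
GROUP BY f.name, f.initialoutlay, f.monthlymaintenance
ORDER BY f.name;
```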
Easy solution: we can just add it to the group by clause. But something is pretty wrong here; the values look pretty weird. Looking at all the calculation we've done so far, can you figure out why? The issue is the order of operations: because we have no round brackets, the order of operations will be the following: initial outlay will be divided by the total revenue, then divided by 3.0, and then the monthly maintenance will be subtracted from all of that. That's not what we want to do: we want to take the initial outlay and divide it by everything else, which is the profit. So I add round brackets here and here, and now we get something that makes much more sense, because first we execute everything within the brackets, getting the monthly profit, and then we divide the initial outlay by it. Then we want to order by facility name, so I add that, and we get the result. Quite a representative business problem: calculating revenue, profit, and time to repay an initial investment. Overall it's just a series of calculations: starting from the group by that lets us get the revenue for each booking, we sum those revenues to get the total revenue for each facility, divide by three to get the monthly revenue, subtract the monthly expenses to get the monthly profit, and then take the initial investment and divide it by the monthly profit, which gives us the number of months it will take to repay the facility.

Calculate a rolling average of total revenue. For each day in August 2012, we want to see a rolling average of total revenue over the previous 15 days. Rolling averages are quite common in business analytics, and here is how they work: if you look at August 1st, this value is the average of daily revenue across all facilities over the past 15 days, including the day of August 1st itself.
Then this average rolls, or slides, by one day each time, so that the next average is the same one shifted by one day, now including the 2nd of August. So let's see how to calculate this. Here I have the basic code that calculates the revenue for each booking, taken from previous exercises (if you have any questions, check those out), and what we have is the name of each facility and the revenue for each booking; each row here represents a single booking. But if you think about it, we're not actually interested in seeing the facility name, because we're going to be summing over all facilities; we're not interested in the revenue by facility, but we are interested in the date on which each booking occurs, because we want to aggregate within the date. To get the date, I can take the start time field from bookings, and because it's a timestamp, showing hours, minutes and seconds, I need to reduce it to a date. What I have now is that each row is a booking, and for each booking I know the date on which it occurred and the revenue it generated. For the next step I need the total revenue across facilities within each date, which is a simple grouping: if I group by this date calculation, I can then select the date, and now I have compressed all the different occurrences of dates into unique values, one row for every date. I also need to compress the different revenues for each date into a single value, so I put the revenue logic inside a sum aggregation, as we have done before, and this gives me the total revenue across all facilities for each day, and we have it here. For the next step, my question for you is: how can I see the global average over all these daily revenues on each of these rows? That is a row-level aggregation that doesn't change the structure of the table, and that's a window function.
So I can have a window function here that gets the average of revenue OVER, and for now I can leave my window definition open, because I will look at the whole table. However, "revenue" will not work, because revenue is just a label I've given to this column, and this part here is not aware of the label; I don't actually have a revenue column at this point. But instead of saying revenue, I can copy this logic over here, and it works, because the window function runs after the aggregation has been computed, so the window function is aware of it. Now for every row I see the global average over all the revenues by day. For the next step I would like to order by date ascending, so we have it in order, and my next question for you is: how can we make this a cumulative average? Say our rows are already ordered by date; how can I get the average to grow by date? On the first day the average would equal the revenue, because we only have one value; on the second day it would be the average of those two values, all the values we've seen so far; on the third day it would be the average of the first three values, and so on. The way I can do that is to go to my window definition over here and add an ordering. I can order by date, but of course the column date does not exist, because that's a label assigned after all this part is done; the window function is not aware of labels. But again, the window function works great with logic, so I take the logic and put it in here, and now you can see I get exactly what I wanted: on the first row the average equals the revenue, and as it grows we only look at the current revenue and all the previous revenues to compute the average, not at all of the revenues. So on the second row we have the average between this revenue over here and this one over here, and then on
the third row we have the average between these three revenues, and so on. Now you will realize that we are almost done with our problem. The only piece that's missing is that right now, if I pick a random day within my data set, say this one, the average here is computed over all the revenues from the previous days: all the days in my data that lead up to this one get averaged. What I want to do to finish this problem is, instead of looking at all the days, only look 15 days back. I need to reduce the maximum length that this window can extend back in time, from unlimited to 15 days. Here is where it gets interesting. We need to fine-tune the window definition so it only looks 15 days back, and with window functions we do have that option. It turns out there is another element to the window definition which is usually implicit; it's usually not written explicitly, but it's there in the background, and it's the ROWS part. I will now write ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. What the ROWS part does is define how far back and how far forward the window can look, and what we see in this command is actually the standard behavior, the thing that happens by default, which is why we usually don't need to write it. It says: look as far back in the past as you can, based on the ordering, up to the current row. This is what we've been seeing until now, and if I run the query again after adding this part, you will see that the values don't change at all, because this is what we have been doing all along. Now, instead of UNBOUNDED PRECEDING, I want to look 14 rows back, plus the current row, which together makes 15. If I run this, my averages change, because I'm now averaging over the current row and the 14 previous rows, the last 15 values.
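
The two frame clauses discussed here can be written side by side. This is a sketch: `rev` and `date` stand in for the computed revenue and date expressions, which in the actual query are the full SUM and date-conversion logic.

```sql
-- Default, implicit frame: a cumulative average over everything seen so far
AVG(rev) OVER (ORDER BY date
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)

-- Fine-tuned frame: the current row plus the 14 rows before it, i.e. 15 values
AVG(rev) OVER (ORDER BY date
               ROWS BETWEEN 14 PRECEDING AND CURRENT ROW)
```

Because one row per date exists after the grouping, "14 rows preceding" and "14 days preceding" coincide here; with gaps in the dates, a RANGE frame would be needed instead.
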
And now what's left to do to match our result is to remove the plain revenue column over here and call this one revenue. Finally, we're only interested in values for the month of August 2012, so we need to add a filter. But we cannot add a filter in this table definition here, because if we added a WHERE filter isolating the period of August 2012, can you see what the problem would be? If my data could only see revenue starting from the 1st of August, it wouldn't be able to compute the rolling average here, because to get the rolling average for this value you need to look two weeks back, into July. You need all the data to compute the rolling revenue, so we must filter after getting our result. What that looks like is that we can wrap all of this into a common table expression; we won't need the ORDER BY within the common table expression anymore. Then, selecting from it, we can filter to make sure that the date fits in the required period: we truncate the date at the month level and make sure that the truncated value is equal to the month of August. We have seen how DATE_TRUNC works in the previous exercises. Then we select all of our columns and order by date. I believe we have an extra small error here, because I kept a partial WHERE statement; fixing that and running this, I finally get the result that I wanted. This query was a bit more complex; it was the final boss of our exercises. So let's summarize it. We get the data we need by joining bookings and facilities, and then we get the revenue for each booking: multiply slots by either the guest cost or the member cost, depending on whether the member is a guest or not. That gives the revenue within each booking. Then we group by date, which you see over here, and sum all of the revenues we computed, so that we get the total revenue within each day across all facilities. Then the total revenue for each day
goes into a window function, which computes an aggregation at the level of each row. The window function computes the average of these total revenues within a specific window, and the window defines an ordering based on time, the ordering by date. The default behavior of the window would be to average the current day and all the days that precede it, back to the earliest date. What we're doing here is fine-tuning the behavior of this function by saying: don't look all the way back in the past, only look at the 14 rows preceding, plus the current row, which means that given the time ordering we compute the average over the last 15 values of total revenue. Finally, we wrap this in a common table expression, filter so that we only see the rolling average for the month of August, and order by date. And those were all the exercises that I wanted to do with you. I hope you enjoyed it, and I hope you learned something new. As you know, there are more sections in here that go deeper into date functions, string functions, and how you can modify data; I really think you can tackle those on your own. These were the essential ones that I wanted to address. Once again, thank you to the author of this website, Alisdair Owens, who created it and made it available for free. I did not create this website; you can just go there and, without signing up or paying anything, do these exercises. My final advice for you: don't be afraid of repetition. We live in the age of endless content, so there's always something new to do, but there's a lot of value in repeating the same exercises over and over again. When I was preparing for interviews, when I began as a data engineer, I did these exercises, and altogether I did them maybe three or four times. I found it really helpful to do the same exercises repeatedly, because often I did not remember the solution and had to think it through all
over again, and it strengthened those learning patterns for me. So now that you've gone through all the exercises and seen my solutions, let it rest for a bit, then come back and try to do them again; I think it will be really beneficial. In my course I start from the very basics and show you in depth how each of the SQL components works. I explore the logical order of SQL operations, and I spend a lot of time in Google Sheets simulating SQL operations in the spreadsheet, coloring cells, moving them around, and making drawings in Excalidraw, so that I can help you understand in depth what's happening and build those mental models for how SQL operations work. This course was actually intended as a complement to that, so be sure to check it out.
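
To recap, here is a sketch of the final query from this walkthrough. Table and column names (cd.bookings, cd.facilities, starttime, slots, memid, guestcost, membercost) follow the pgexercises schema; treat this as an approximation of the query built up on screen, not a verbatim copy.

```sql
WITH rolling AS (
    SELECT
        bks.starttime::date AS date,
        -- the window function runs after the aggregation, so the SUM
        -- (the per-day total revenue) can sit inside the AVG
        AVG(SUM(bks.slots * CASE WHEN bks.memid = 0
                                 THEN facs.guestcost
                                 ELSE facs.membercost END))
            OVER (ORDER BY bks.starttime::date
                  ROWS BETWEEN 14 PRECEDING AND CURRENT ROW) AS revenue
    FROM cd.bookings bks
    JOIN cd.facilities facs ON bks.facid = facs.facid
    GROUP BY bks.starttime::date
)
SELECT date, revenue
FROM rolling
WHERE DATE_TRUNC('month', date) = '2012-08-01'  -- filter AFTER the window
ORDER BY date;
```

Computing the rolling average inside the CTE and filtering outside it is essential: a WHERE clause in the same query block as the window function would remove the July rows before the window ever saw them.
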

    By Amjad Izhar
    Contact: amjad.izhar@gmail.com
    https://amjadizhar.blog

  • Visual FoxPro 6 Help Documentation – Study Notes

    Visual FoxPro 6 Help Documentation – Study Notes

    Welcome to Microsoft Visual FoxPro. Visual FoxPro is the object-oriented relational database management system that makes it possible for you to create database solutions ranging from the desktop to the Web. Visual FoxPro provides powerful data handling capabilities, rapid application development tools for maximum productivity, and the flexibility needed to build all types of database solutions.

    Visual FoxPro 6 Language Reference Book on Archive.Org

    Guidelines for Using Visual FoxPro Foundation Classes

    The Visual FoxPro .vcx visual class libraries located in the \Ffc\ folder contain a variety of foundation classes for enhancing your Visual FoxPro applications with little or no programming. You can freely distribute the foundation classes with your applications. These foundation classes are contained in the Component Gallery. The Component Gallery provides a quick and easy way to learn more about the properties, events, and methods of each of the foundation classes.

    You can also open up a foundation class in the Class Designer or Class Browser to see its structure and code. This is a great way to learn how the foundation class works as well as offering excellent insights into programming with Visual FoxPro.

    The following guidelines provide information about how you can add the Visual FoxPro foundation classes to your applications.

    Class Types

    You need to know the Visual FoxPro base class of a foundation class before you can add the foundation class to your application. Certain foundation classes can be used only as visual objects on forms, while others are non-visual and can be run programmatically without being placed on a form. The Visual FoxPro Foundation Classes documentation indicates the base class of each foundation class so you can determine if the foundation class can be added to a form, or run programmatically in your application. Note that in the Component Gallery you can right-click a foundation class to display a shortcut menu. Choose Properties from the shortcut menu, and then choose the Class tab to display the base class.

    The following table lists the Visual FoxPro base classes and how they can be added to your applications.

    Category A – base classes that can be dropped onto a form
    Category B – base classes that can be dropped onto a form or run programmatically in your application
    Category C – base classes that can only be run programmatically in your application

    Category A      Category B   Category C
    Checkbox        Custom       Form
    Combobox        Container    Formset
    Commandbutton   Timer        Toolbar
    Commandgroup                 ProjectHook
    Editbox                      ActiveDoc
    Grid
    Hyperlink
    Image
    Label
    Line
    Listbox
    OLE Control
    Optionbutton
    Optiongroup
    Shape
    Spinner
    Textbox

    Adding Foundation Classes to Forms

    You will most often add foundation classes to forms. You can drag and drop foundation classes onto forms from the Component Gallery, the Class Browser, the Project Manager, and the Form Controls toolbar.

    Note   You can select a foundation class you’ve added to a form and then choose Class Browser from the Tools menu to display more information about the foundation class.

    Component Gallery   The Component Gallery provides the easiest way to add foundation classes to a form. For foundation classes with Category A and B base classes, you can drag the foundation class from the Component Gallery and then drop it on a form. You can also right-click a foundation class in the Component Gallery to display a shortcut menu, and then choose Add to Form to add the foundation class to the form.

    Some of the foundation classes have associated builders that are automatically launched to prompt you for more information needed by the foundation class.

    Class Browser   You can drag foundation classes with Category A and B base classes directly from the Class Browser to a form by using the drag icon in the upper left corner of the Class Browser. Select the foundation class in the Class Browser, click the icon for the foundation class in the upper left corner of the Class Browser, and then drag the icon over the form. Release the mouse button over the form where you’d like the foundation class to appear on the form.

    Foundation classes dragged from the Class Browser to a form do not launch the associated builder. However, you can launch the builder after the foundation class has been dropped on the form. Select the foundation class on the form, and then right-click to display the shortcut menu. Choose Builder from the shortcut menu to launch the builder.

    Project Manager   Foundation classes with Category A and B base classes can be dragged from the Project Manager and dropped on a form.

    Foundation classes dragged from the Project Manager to a form do not launch the associated builder. However, you can launch the builder after the foundation class has been dropped on the form. Select the foundation class, and then right-click to display the shortcut menu. Choose Builder from the shortcut menu to launch the builder.

    Form Controls toolbar   Foundation classes with Category A and B base classes added to the Form Controls toolbar can be added to a form.

    If the Builder Lock isn't on, foundation classes dropped from the Form Controls toolbar may launch an associated builder. If the Builder Lock is on, you can launch the builder after the foundation class has been added to the form. Select the foundation class on the form, and then right-click to display the shortcut menu. Choose Builder from the shortcut menu to launch the builder.

    Adding Foundation Classes to Projects

    When a form containing foundation classes is added to a Visual FoxPro project, the Project Manager automatically adds the visual class libraries containing the foundation classes to the project. However, there are other cases where you may need to add foundation classes to a project. For example, your application may run a Category C foundation class, so the foundation class must be added to the application’s project.

    Foundation classes can be added to a project from the Component Gallery, by dragging the .vcx visual class library containing the foundation classes from the Windows Explorer, or by choosing the Add button in the Project Manager.

    Adding Foundation Classes from the Component Gallery

    You can drag a foundation class from the Component Gallery to a project, or you can right-click the foundation class in the Component Gallery to display a shortcut menu, and then choose Add to Project to add the foundation class to the project. When you add a foundation class to a project, the Add Class to Project dialog box is displayed, prompting you with the following options:

    Add class to project   Choose this option to add the foundation class and its .vcx class library to the project. Again, this is done automatically for classes dropped onto a form (Categories A and B). For certain Category B and C classes that you plan to call programmatically from within your application, you will want to choose this option.

    Create a new class from selected class   Choose this option to create a new subclass of the foundation class you want to add to the project. This option makes it possible for you to enhance the functionality of the original foundation class, usually by adding additional program code.

    Create a new form from selected class   Choose this option for foundation classes with a Form base class (for example, the foundation classes in _dialogs.vcx). This option makes it possible for you to create a new form from the foundation class and enhance the functionality of the original foundation class.

    Adding Foundation Classes from the Windows Explorer

    A foundation class can be added to a project by dragging the .vcx visual class library containing the foundation class from the Windows Explorer to the Project Manager. The visual class library is added to the Class Libraries item in the Project Manager.

    Adding Foundation Classes from within the Project Manager

    A foundation class can be added to a project by selecting the Classes tab and then choosing the Add button. Select the class library from the \Ffc\ folder that contains the foundation class to add to the project.

    Incorporating Classes into your Application

    In many situations, most foundation classes don’t require additional programming to work with your application. However, you may need to provide additional program code for certain foundation classes (those of Category B and Category C non-visual base classes).

    Non-Visual Foundation Classes

    For example, foundation classes are often based on the Category B Custom class, and these require additional programming. These non-visual classes often perform common tasks such as checking information in the Windows registry, handling environment settings, managing application errors, and utilizing Automation with other applications, such as performing mail merge with Microsoft Word.

    You can drop these non-visual classes onto a form, but you will need to do some additional work in order for them to work with your application. In some cases, a builder is launched when you drag a foundation class onto a form.

    The following example demonstrates some of the program code typically necessary to use a non-visual foundation class in your application:

    1. Drag the File Version foundation class from the Component Gallery (Foundation Classes\Utilities folder) onto a form.
    2. Add a command button to the form and add the following code to its Click event:

       WITH THISFORM._FILEVERSION1
          .cFileName = HOME( ) + 'VFP7.EXE'
          .GetVersion( )
          .DisplayVersion( )
       ENDWITH
    3. Run the form and click the command button.

    You can incorporate a non-visual class in your application without dropping it on a form, as long as you include it in the project used to create the application. The following code illustrates how to execute this same File Version foundation class when the class is not dropped onto a form.

    LOCAL oFileVersion
    oFileVersion = NewObject('_fileversion', '_utilities.vcx')
    WITH oFileVersion
       .cFileName = HOME( )+ 'VFP7.EXE'
       .GetVersion( )
       .DisplayVersion( )
    ENDWITH

    Note   This example assumes that the code can locate the _utilities.vcx class library, or that it runs from a built .app file that contains _utilities.vcx.

    When you use a non-visual foundation class, you need to know how and when the class is used within your application so it can be scoped correctly. If only a single form uses the class, you can just drag the class onto the form. However, if the class is used by many forms or is used globally by the application, the foundation class should have a global scope in the application so it remains accessible throughout the application. A global scope may also improve performance.
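
    As a sketch of giving a class global scope (the class and library names here are illustrative placeholders, not taken from the documentation), an application's startup program might create the object once in a public variable:

    ```foxpro
    * Illustrative sketch only: _myglobal and _mylib.vcx are placeholder names
    PUBLIC goUtility
    goUtility = NEWOBJECT('_myglobal', '_mylib.vcx')
    * goUtility now stays in scope for every form until explicitly released
    ```
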

    Visual Foundation Classes

    You can also programmatically add visual foundation classes, such as those with Form base classes, to your application. The following example shows how you can add code to your application to display an About dialog box.

    LOCAL oAbout
    oAbout = NewObject('_aboutbox','_dialogs.vcx')
    oAbout.Show( )

    You can create a subclass of the dialog box foundation class for each of your applications so that you can customize the contents of the dialog box for each application. The following example demonstrates how you can subclass the Aboutbox foundation class:

    1. Drag and drop the Aboutbox class from the Component Gallery (Foundation Classes\Dialogs folder) to the project for your application.
    2. Select Create a new form from selected class in the Add Class to Project dialog box that is displayed, and enter a name for the form.
    3. Change the Caption property for the new form for your application. Save and close the form.
    4. Add program code (DO FORM FormName) to the procedure that runs the form, such as an About menu item procedure. -or-

    Drag the Run Form button class from the Component Gallery (Foundation Classes\Buttons folder) onto the form. A builder is launched, and you can specify the name of the form to execute.

    If you use the Visual FoxPro 7.0 Application Framework, the Application Builder automatically handles adding forms (both .scx and .vcx form classes). The new Application Wizard or the Component Gallery New Application item installs this framework in the projects they create. The Application Builder interacts directly with the framework and enables you to specify how and where the form is launched.

    By using a framework built with the Application Wizard, the Application Builder, and Component Gallery, you have a rich set of tools for creating entire applications with minimal manual coding.

    Class Naming Conventions

    The Visual FoxPro Foundation classes and their properties and methods use the following naming conventions.

    Classes and Class Libraries

    Most foundation classes are subclassed from classes in the _base.vcx visual class library, which you can also find in the \Ffc\ folder. The naming conventions for these classes reflect the base class used. For example, the subclass of the Custom class in _base.vcx is called _Custom. All classes in _base.vcx preface the class name with an underscore ( _ ).

    A few class libraries do not contain classes that are subclassed from _base.vcx because these classes are shared with other Visual FoxPro components such as wizards and builders. These classes are contained in class libraries without a preceding underscore, such as Registry.vcx.

    Methods and Properties

    Methods are usually named after an action, such as RunForm; if the name contains several words, capitalization reflects this. Properties are usually prefaced with a single letter characterizing the data type of that particular property. For example, cFileName indicates that the property is of character type. In addition, default values for properties are set to the appropriate data type. For example, a logical property can be initialized to false (.F.), while a numeric property can be initialized to 0.

    Properties of classes that shipped in earlier versions of Visual FoxPro do not strictly adhere to these property-naming conventions, and retain their earlier names to avoid compatibility conflicts with user code referencing these properties.

    Enhancing or Modifying FoxPro Foundation Classes

    You can enhance or modify the Visual FoxPro foundation classes to meet the needs of your application. However, we recommend that you do not modify the foundation classes themselves. The foundation classes may be periodically updated with new functionality.

    Subclassing the Foundation Class

    The source code is provided for the foundation classes, so you can subclass any foundation class to override or enhance properties and methods. This choice is common when the behavior of a particular foundation class varies between different applications. One application might use a foundation class directly, while another application uses a subclass of the foundation class.

    Updating _base.vcx

    If you want to make global changes to the Visual FoxPro foundation classes, you can modify _base.vcx. Since the foundation classes are subclassed from _base.vcx, changes to this class library are automatically propagated to the foundation classes. A common set of methods and properties is provided for all the classes in _base.vcx. You can alter the classes in _base.vcx if doing so adds desired behavior to your applications (unlike the foundation classes themselves, which we recommend you do not change).

    Instead of changing _base.vcx, however, you should redefine the classes in _base.vcx to inherit their behavior from your own custom base classes (rather than from the Visual FoxPro base classes currently used). If you already have a custom class library which subclasses the Visual FoxPro base classes, you can redefine the classes in _base.vcx to inherit from your custom classes so that when components use the _base classes they will inherit from your custom classes too. You can use the Class Browser to redefine the parent class for a particular class.

    Note   If you redefine the classes to inherit from your own custom base classes, you should add DODEFAULT( ) calls at appropriate locations if you desire that parent class method code be executed.

    If you replace the entire _base.vcx class with your own, make sure that you have the same set of named classes; otherwise the foundation classes will have missing links.

    See Also

    Visual FoxPro Foundation Classes A-Z | Visual FoxPro Foundation Classes | File Version | About Dialog Box Foundation Class | Run Form button

    Visual FoxPro 6 Commands

    & Command

    Performs macro substitution.

    & VarName[.cExpression]

    Parameters

    & VarName

    Specifies the name of the variable or array element to reference in the macro substitution. Do not include the M. prefix that distinguishes variables from fields. Such inclusion causes a syntax error. The macro should not exceed the maximum statement length permitted in Visual FoxPro.

    A variable cannot reference itself recursively in macro substitution. For example, the following generates an error message:

     Copy Code
    STORE '&gcX' TO gcX
    ? &gcX

    Macro substitution statements that appear in DO WHILE, FOR, and SCAN loops are evaluated only at the start of the loop and are not reevaluated on subsequent iterations. Any changes to the variable or array element that occur within the loop are not recognized.

    [.cExpression]

    The optional period (.) delimiter and .cExpression are used to append additional characters to a macro. cExpression appended to the macro with .cExpression can also be a macro. If cExpression is a property name, include an extra period (cExpression..PropertyName).

    Remarks

    Macro substitution treats the contents of a variable or array element as a character string literal. When an ampersand (&) precedes a character-type variable or array element, the contents of the variable or element replace the macro reference. You can use macro substitution in any command or function that accepts a character string literal.
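
    For instance (a minimal sketch):

    ```foxpro
    gcCommand = 'LIST'   && the variable holds a command keyword as text
    &gcCommand           && the macro expands, so this line executes: LIST
    ```
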

    Tip:
    Whenever possible, use a name expression instead of macro substitution. A name expression operates like macro substitution. However, a name expression is limited to passing character strings as names. Use a name expression for significantly faster processing if a command or function accepts a name (a file name, window name, menu name, and so on). For additional information on name expressions, see Name Expression Creation.

    While the following commands are acceptable:

     Copy Code
    STORE 'customer' TO gcTableName
    STORE 'company' TO gcTagName
    USE &gcTableName ORDER &gcTagName

    use a name expression instead:

     Copy Code
    USE (gcTableName) ORDER (gcTagName)

    Macro substitution is useful for substituting a keyword in a command. In the following example, the TALK setting is saved to a variable so the setting can be restored later in the program. The original TALK setting is restored with macro substitution.

    Note:
    Performing concatenation with a single ampersand (&) when attempting to include double ampersands (&&) in a string literal might produce undesirable results. For example, suppose you assign the string “YYY” to a variable, BBB. Performing concatenation using “AAA&” and “&BBB” replaces “BBB” with “YYY” so instead of getting the result “AAA&&BBB”, the result is “AAA&YYY”. For more information, see && Command.

    Example

     Copy Code
    STORE SET('TALK') TO gcSaveTalk
    SET TALK OFF
    * Additional program code
    SET TALK &gcSaveTalk && Restore original TALK setting

    See Also

    Concepts

    Commands (Visual FoxPro)

    STORE Command

    && Command

    Macro Substitution (Visual FoxPro)

    && Command

    Indicates the beginning of a nonexecuting inline comment in a program file.

    && [Comments]

    Parameters

    && [ Comments]

    Specifies inline comments that follow.

    Remarks

    Inserting inline comments to denote the end of the IF … ENDIF, DO, and FOR … ENDFOR structured programming commands greatly improves the readability of programs when including many such structures.

    Caution:
    Including double ampersands (&&) in a string literal, for example, "AAA&&BBB", generates an error. Instead, to include double ampersands, use concatenation as shown: "AAA&" + "&" + "BBB".
    Note:
    When using concatenation, use caution with placement of a single ampersand (&), which is used to perform macro substitution and thus might produce undesirable results. For example, suppose you assign the string “YYY” to a variable, BBB. Performing concatenation using “AAA&” and “&BBB” replaces “BBB” with “YYY”, so instead of getting the result “AAA&&BBB”, the result is “AAA&YYY”. For more information, see & Command.

    To continue a comment on the following line, place a semicolon (;) at the end of the comment line to be continued.
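
    For instance (a minimal sketch):

    ```foxpro
    && This inline comment continues;
    onto the following line, which is still part of the comment.
    ```
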

    Note:
    In earlier versions of Visual FoxPro, you cannot place && and a comment after the semicolon that is used to continue a command line to an additional line.

    Example

    The following example includes the inline comments “20 years of monthly payments” indicated by the && command:

     Copy Code
    STORE (20*12) TO gnPayments && 20 years of monthly payments
    NOTE Initialize the page number;
    variable.
    STORE 1 to gnPageNum
    * Set up the loop
    DO WHILE gnPageNum <= 25 && loop 25 times
       gnPageNum = gnPageNum + 1
    ENDDO && DO WHILE gnPageNum <= 25

    See Also

    Concepts

    * Command

    MODIFY COMMAND Command

    MODIFY FILE Command

    NOTE Command

    & Command

    Commands (Visual FoxPro)

    * Command

    Indicates the beginning of a nonexecuting comment line in a program file.

    * [Comments]

    Parameters

    Comments

    Specifies the comment in the comment line. For example:

     Copy Code
    * This is a comment

    Remarks

    Place a semicolon (;) at the end of each comment line that continues to a following line.

    Any text added to a method or event in a Visual Class Library (VCX) or form (SCX) Code window will cause that class to have Override behavior for the method or event. Therefore, code for the method or event in a parent class will not be executed by default (it must be explicitly executed). This includes non-executable comment lines that begin with “*”.

    Example

     Copy Code
    * Initialize the page number;
    variable.
    STORE 1 to gnPageNum
    * Set up the loop
    DO WHILE gnPageNum <= 25 && loop 25 times
       gnPageNum = gnPageNum + 1
    ENDDO && DO WHILE gnPageNum <= 25

    See Also

    Concepts

    && Command

    MODIFY COMMAND Command

    MODIFY FILE Command

    NOTE Command

    Commands (Visual FoxPro)

    Overriding and Calling Parent Class Code

    ? | ?? Command

    Evaluates expressions and sends the results to the main Visual FoxPro window, an active user-defined window, or the printer.

    ? | ?? Expression1 [PICTURE cFormatCodes] | [FUNCTION cFormatCodes] | [V nWidth] [AT nColumn] [FONT cFontName [, nFontSize [, nFontCharSet]] [STYLE cFontStyle | Expression2]] [, Expression3] …

    Parameters

    ? Expression1

    Evaluates the expression specified by Expression1 and sends a carriage return and line feed preceding the expression results.

    The results display on the next line of the main Visual FoxPro window or the active user-defined window and are printed at the left margin of a page unless a function code specified by cFormatCodes or the _ALIGNMENT system variable specifies otherwise.

    If you omit the expressions, a blank line is displayed or printed. A space is placed between expression results when multiple expressions are included.
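
    As a minimal sketch of these rules, the following commands display two expression results separated by a space on one line, and then a blank line:

     Copy Code
    ? 'Total:', 15 * 2   && displays: Total: 30
    ?                    && displays a blank line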

    If Expression1 is an object, the ? command returns the character string, “(Object)”.

    ?? Expression1

    Evaluates the expression specified by Expression1 and displays the expression results on the current line at the current position of the main Visual FoxPro window, an active user-defined window, or the printer. A carriage return and line feed are not sent before the results.

    PICTURE cFormatCodes

    Specifies a picture format in which the result of Expression1 is displayed. cFormatCodes can consist of function codes, picture codes, or a combination of both. You can use the same codes available in the Format Property and InputMask Property.

    Function codes affect the overall format of the result; picture codes act on individual characters in the result. If function codes are used in cFormatCodes, they must appear before the picture codes and they must be preceded by an at (@) sign. Multiple function codes with no embedded spaces can immediately follow the @ sign. The last function code must be followed by one or more spaces. The space or spaces signal the end of the function codes and the start of the picture codes.

    FUNCTION cFormatCodes

    Specifies a function code to include in the output from the ? and ?? commands. If the function clause is included, do not precede the function codes with an @ sign. Function codes must be preceded by an @ sign when included in PICTURE.

    V nWidth

    Specifies a special function code that enables the results of a character expression to stretch vertically within a limited number of columns. nWidth specifies the number of columns in the output.

     Copy Code
    ? 'This is an example of how the V function code works.' ;
       FUNCTION 'V10'

    AT nColumn

    Specifies the column number where the output is displayed. This option makes it possible for you to align output in columns to create a table. The numeric expression nColumn can be a user-defined function that returns a numeric value.

    FONT cFontName[, nFontSize [, nFontCharSet]]

    Specifies a font for output by the ? or ?? command. cFontName specifies the name of the font, and nFontSize specifies the point size. You can specify a language script with nFontCharSet. See the GETFONT( ) Function for a list of available language script values.

    For example, the following command displays the system date in 16-point Courier font:

     Copy Code
    ? DATE( ) FONT 'Courier',16

    If you include the FONT clause but omit the point size nFontSize, a 10-point font is used.

    If you omit the FONT clause, and output for the ? or ?? command is placed in the main Visual FoxPro window, the main Visual FoxPro window font is used for the output. If you omit the FONT clause, and output for the ? or ?? command is placed in a user-defined window, the user-defined window font is used for the output.

    Note:
    If the font you specify is not available, a font with similar font characteristics is substituted.

    STYLE cFontStyle

    Specifies a font style for output by the ? or ?? commands. If you omit the STYLE clause, the Normal font style is used. If the font style you specify is not available, a font style with similar characteristics is substituted.

    Note:
    You must include the FONT clause when you specify a font style with the STYLE clause.

    The following table lists font styles that you can specify for cFontStyle.

    cFontStyle   Font style
    B            Bold
    I            Italic
    N            Normal
    Q            Opaque
    -            Strikeout
    T            Transparent
    U            Underline

    You can include more than one character to specify a combination of font styles. For example, the following command displays the system date in Courier Bold Italic:

     Copy Code
    ? DATE( ) FONT 'COURIER' STYLE 'BI'

    Remarks

    To send the results to the printer only, use the following commands:

     Copy Code
    SET PRINTER ON
    SET CONSOLE OFF

    To send the results to the printer and the main Visual FoxPro window or an active user-defined window, use the following command:

     Copy Code
    SET PRINTER ON

    The setting of SET ALTERNATE affects the destination for the ? and ?? commands. For more information, see SET ALTERNATE Command.

    The ? command displays binary data for Varbinary data types in hexadecimal format with no limitation on size. For more information, see Varbinary Data Type.

    Example

    The following example evaluates and displays the specified expressions:

     Copy Code
    ? 15 * (10+10)
    ? 'Welcome to ' PICTURE '@!'
    ?? 'Visual FoxPro'

    See Also

    Concepts

    Commands (Visual FoxPro)

    ??? Command

    @ … SAY Command

    _ALIGNMENT System Variable

    SET MEMOWIDTH Command

    SET PRINTER Command

    SET SPACE Command

    Format Property

    InputMask Property

    ??? Command

    Sends output directly to the printer.

    ??? cExpression

    Parameters

    cExpression

    Specifies the characters that are sent to the printer.

    Remarks

    A group of three question marks bypasses the printer driver and sends the contents of cExpression directly to the printer. cExpression must contain valid printer codes.

    Printer control codes make it possible for you to reset the printer, change type styles and sizes, and enable or disable boldface printing. These codes can consist of any combination of printable or nonprintable characters that are specific to the printer you are using. You can direct control codes to the printer in several different ways:

    • Use combinations of CHR( ) and quoted strings concatenated with + to send ASCII characters directly to the printer.
    • Use quotation marks to send a string containing printer codes or ASCII characters.
    • Codes can be sent to the printer before printing begins and after printing ends with the _PSCODE and _PECODE system variables. For more information, see _PSCODE System Variable and _PECODE System Variable.

    Printer control codes vary from printer to printer. The best source for information about printer control codes is the manual that came with your printer.
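
    As a sketch only (the control sequence shown is specific to ESC/P-compatible printers and is an assumption; check your printer manual), the following sends an initialize sequence directly to the printer:

     Copy Code
    ??? CHR(27) + '@'  && CHR(27) is ESC; ESC @ resets ESC/P-compatible printers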

    See Also

    Concepts

    ? | ?? Command

    @ … SAY Command

    CHR( ) Function

    @ … CLASS Command

    Creates a control or object that can be activated with READ.

    @ nRow, nColumn CLASS ClassName NAME ObjectName

    Parameters

    @ nRow, nColumn

    Specifies the position of the control or object. The height and width of the control or object is determined by the class default height and width values.

    Rows are numbered from top to bottom. The first row is number 0 in the main Visual FoxPro window or in a user-defined window. Row 0 is the row immediately beneath the Visual FoxPro system menu bar.

    Columns are numbered from left to right. The first column is number 0 in the main Microsoft Visual FoxPro window or in a user-defined window. When a control or object is placed in a user-defined window, the row and column coordinates are relative to the user-defined window, not to the main Visual FoxPro window.

    A position in the main Visual FoxPro window or in a user-defined window is determined by the font of the window. Most fonts can be displayed in a wide variety of sizes; some are proportionally spaced. A row corresponds to the height of the current font; a column corresponds to the average width of a letter in the current font.

    You can position the control or object using decimal fractions for row and column coordinates.

    CLASS ClassName

    Specifies the class of the control or object. ClassName can be a Visual FoxPro base class or a user-defined class. The following table lists the Visual FoxPro base classes you can specify for ClassName.

    NAME ObjectName

    Specifies the name of the object reference variable to create. The object-oriented properties, events, and methods of the control or object can be manipulated by referencing this variable. For a complete list of the Visual FoxPro base classes, see Base Classes in Visual FoxPro.

    Remarks

    @ … CLASS provides an intermediate step for converting programs and applications created in earlier versions of FoxPro to the preferred object-oriented programming methods of Visual FoxPro. For additional information about backward compatibility with FoxPro 2.x controls, see Controls and Objects Created in Earlier Versions.

    For information about object-oriented programming in Visual FoxPro, see Object-Oriented Programming.

    Example

    The following example demonstrates how @ … CLASS can be used with programming techniques used in earlier FoxPro versions (in this example, use of READ to activate controls). @ … CLASS is used to create a text box whose properties can be changed with the Visual FoxPro object-oriented programming techniques.

    ON KEY LABEL is used to display the Windows Color dialog box when you press CTRL+I. The TextBox is placed on the main Visual FoxPro window using @ … CLASS, and READ activates the text box.

     Copy Code
    CLEAR
    ON KEY LABEL CTRL+I _SCREEN.PageFrame1.Page1.goFirstName.BackColor ;
       = GETCOLOR( )
    @ 2,2 SAY 'Press Ctrl+I to change the background color'
    @ 4,2 CLASS TextBox NAME goFirstName
    READ
    CLEAR

    See Also

    Concepts

    Commands (Visual FoxPro)

    CREATEOBJECT( ) Function

    DEFINE CLASS Command

    READ Command

    _SCREEN System Variable

    @ … CLEAR Command

    Clears a portion of the main Visual FoxPro window or a user-defined window.

    @ nRow1, nColumn1 [CLEAR | CLEAR TO nRow2, nColumn2]

    Parameters

    @ nRow1, nColumn1 CLEAR

    Clears a rectangular area whose upper-left corner begins at nRow1 and nColumn1 and continues to the lower-right corner of the main Visual FoxPro window or a user-defined window.

    CLEAR TO nRow2, nColumn2

    Clears a rectangular area whose upper-left corner is at nRow1 and nColumn1 and whose lower-right corner is at nRow2 and nColumn2.

    Remarks

    If you omit CLEAR or CLEAR TO, Visual FoxPro clears nRow1 from nColumn1 to the end of the row.

    Example

    The following example clears the screen, main Visual FoxPro window, or user-defined window from the second row to the bottom of the window.

     Copy Code
    @ 2,0 CLEAR

    The following example clears a rectangular region. The area from row 10 and column 0 to row 20 and column 20 is cleared.

     Copy Code
    @ 10,0 CLEAR TO 20,20

    See Also

    Concepts

    Commands (Visual FoxPro)

    CLEAR Commands

    SET CLEAR Command

    _WClear( ) API Library Routine

    @ … FILL Command

    Changes the colors of existing text within an area of the screen.

    @ nRow1, nColumn1 FILL TO nRow2, nColumn2 [COLOR SCHEME nSchemeNumber | COLOR ColorPairList]

    Parameters

    @ nRow1, nColumn1

    Specifies the upper-left corner of the area to change.

    FILL TO nRow2, nColumn2

    Specifies the lower-right corner of the area to change.

    COLOR SCHEME nSchemeNumber

    Specifies the color of the area. Only the first color pair in the specified color scheme determines the color of the area.

    COLOR ColorPairList

    Specifies the color of the area. Only the first color pair in the specified color pair list determines the color of the area.

    If you omit the COLOR SCHEME or COLOR clauses, the rectangular portion is cleared. An area can also be cleared with @ … CLEAR.

    Remarks

    This command changes the colors of text within a rectangular area of the main Visual FoxPro window or the active user-defined window. You can set the foreground and background color attributes for existing text only. Any text output to the same area after you issue @ … FILL appears in the default screen or window colors.

    Example

    The following example clears the main Visual FoxPro window and fills an area with color.

     Copy Code
    ACTIVATE SCREEN
    CLEAR
    @ 4,1 FILL TO 10,8 COLOR GR+/B

    See Also

    Concepts

    Commands (Visual FoxPro)

    @ … SAY Command

    ColorScheme Property

    FillColor Property (Visual FoxPro)

    @ … SCROLL Command

    Moves an area of the main Microsoft Visual FoxPro window or a user-defined window up, down, left, or right.

    @ nRow1, nColumn1 TO nRow2, nColumn2 SCROLL [UP | DOWN | LEFT | RIGHT] [BY nMoveAmount]

    Parameters

    @ nRow1, nColumn1 TO nRow2, nColumn2 SCROLL

    Moves a rectangular area whose upper-left corner is at nRow1, nColumn1 and lower-right corner is at nRow2, nColumn2.

    UP | DOWN | LEFT | RIGHT

    Specifies the direction in which the rectangular area is moved. If you omit a direction clause, the area is moved upward.

    BY nMoveAmount

    Specifies the number of rows or columns the rectangular area is moved. If you omit BY nMoveAmount, the region is moved by one row or column.
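
    Example

    The following minimal sketch (coordinates are illustrative) moves a rectangular region down three rows, assuming text has already been displayed in that region:

     Copy Code
    @ 5,5 TO 15,40 SCROLL DOWN BY 3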

    See Also

    Concepts

    Commands (Visual FoxPro)

    SCROLL Command

    Scrolled Event

    \ | \\ Command

    Prints or displays lines of text.

    \TextLine
    \\TextLine

    Parameters

    \ TextLine

    When you use \, the text line is preceded by a carriage return and a line feed.

    \\ TextLine

    When you use \\, the text line is not preceded by a carriage return and a line feed.

    Any spaces preceding \ and \\ are not included in the output line, but spaces following \ and \\ are included.

    You can embed an expression in the text line. If the expression is enclosed in the text merge delimiters (<< >> by default) and SET TEXTMERGE is ON, the expression is evaluated and its value is output as text.

    Remarks

    The \ and \\ commands facilitate text merge in Visual FoxPro. Text merge makes it possible for you to output text to a file to create form letters or programs.

    Use \ and \\ to output a text line to the current text-merge output file and the screen. SET TEXTMERGE is used to specify the text merge output file. If text merge is not directed to a file, the text line is output only to the main Visual FoxPro window or the active user-defined output window. SET TEXTMERGE NOSHOW suppresses output to the main Visual FoxPro window or the active user-defined window.

    Example

     Copy Code
    CLOSE DATABASES
    OPEN DATABASE (HOME(2) + 'Data\testdata')
    USE Customer  && Open customer table
    SET TEXTMERGE ON
    SET TEXTMERGE TO letter.txt
    \<<CDOW(DATE( ))>>, <<CMONTH(DATE( ))>>
    \\ <<DAY(DATE( ))>>, <<YEAR(DATE( ))>>
    \
    \
    \Dear <<contact>>
    \Additional text
    \
    \Thank you,
    \
    \XYZ Corporation
    CLOSE ALL
    MODIFY FILE letter.txt NOEDIT

    See Also

    Concepts

    _PRETEXT System Variable

    SET TEXTMERGE Command

    SET TEXTMERGE DELIMITERS Command

    _TEXT System Variable

    TEXT … ENDTEXT Command

    Commands (Visual FoxPro)

    = Command

    Evaluates one or more expressions.

    = Expression1 [, Expression2 …]

    Parameters

    Expression1[, Expression2…]

    Specifies the expression or expressions that the = command evaluates.

    Remarks

    The = command evaluates one or more expressions, Expression1, Expression2 …, and discards the return values. This option is particularly useful when a Visual FoxPro function or a user-defined function has a desired effect, but there is no need to assign the function’s return value to a variable, array element, or field.

    For example, to turn insert mode on, you can issue the following command:

     Copy Code
    = INSMODE(.T.)

    INSMODE normally returns a True (.T.) or False (.F.) value. In the example above, the function is executed but the return value is discarded.

    If only one expression (Expression1) is included, the equal sign is optional.

    Note:
    There are two unrelated uses for the equal sign (=). It can be used as an operator in logical expressions to make a comparison, or to assign values to variables and array elements. In these two cases, the equal sign (=) is an operator and not a command. See Relational Operators for more information about using the equal sign (=) as an operator in logical expressions. See STORE Command for more information about using the equal sign (=) to assign values to variables and array elements.
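
    As a minimal sketch of these distinct roles (the variable gnValue is hypothetical):

     Copy Code
    STORE 10 TO gnValue   && assignment with STORE
    gnValue = 20          && = as an assignment operator
    ? gnValue = 20        && = as a comparison operator; displays .T.
    = SQRT(gnValue)       && = command; the return value is discarded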

    See Also

    Concepts

    EVALUATE( ) Function

    INSMODE( ) Function

    Relational Operators

    STORE Command

    ON KEY = Command

    Commands (Visual FoxPro)

    ACTIVATE MENU Command

    Displays and activates a menu bar.

    ACTIVATE MENU MenuBarName [NOWAIT] [PAD MenuTitleName]

    Parameters

    MenuBarName

    Specifies the name of the menu bar to activate.

    NOWAIT

    Specifies that at run time the program should not wait for the user to choose a menu from the active menu bar or to press ESC. Instead, the program continues to execute. A menu activated with the NOWAIT option does not return program execution to the line following the ACTIVATE MENU command when DEACTIVATE MENU is issued.

    PAD MenuTitleName

    Specifies the menu title name that is automatically selected when the menu bar is activated. If you don’t specify a menu title name, the first menu title name in the activated menu bar is activated by default.

    Remarks

    Displays and activates the menu bar specified with MenuBarName. This command works in conjunction with DEFINE MENU and DEFINE PAD.

    Note:
    When you include the Visual FoxPro system menu bar (_MSYSMENU) in an application, there is no need to activate the menu. Instead, issue SET SYSMENU AUTOMATIC.

    Example

    The following example uses ACTIVATE MENU to display and activate a user-defined menu system. The current system menu bar is first saved to memory with SET SYSMENU SAVE, and then all system menu titles are removed with SET SYSMENU TO.

    Two menu titles are created with DEFINE PAD; DEFINE POPUP is used to create a drop-down menu for each menu title. DEFINE BAR is used to create menu items on each of the menus. When a menu title is chosen, ON PAD uses ACTIVATE POPUP to activate the corresponding menu. ACTIVATE MENU displays and activates the menu bar.

    When a menu item is chosen from a menu, the CHOICE procedure is executed. CHOICE displays the name of the chosen item and the name of the menu containing the item.

     Copy Code
    *** Name this program ACTIMENU.PRG ***
    CLEAR
    SET SYSMENU SAVE
    SET SYSMENU TO
    ON KEY LABEL ESC KEYBOARD CHR(13)
    DEFINE MENU example BAR AT LINE 1
    DEFINE PAD convpad OF example PROMPT '\<Conversions' COLOR SCHEME 3 ;
       KEY ALT+C, ''
    DEFINE PAD cardpad OF example PROMPT 'Card \<Info' COLOR SCHEME 3 ;
       KEY ALT+I, ''
    ON PAD convpad OF example ACTIVATE POPUP conversion
    ON PAD cardpad OF example ACTIVATE POPUP cardinfo
    DEFINE POPUP conversion MARGIN RELATIVE COLOR SCHEME 4
    DEFINE BAR 1 OF conversion PROMPT 'Ar\<ea' ;
       KEY CTRL+E, '^E'
    DEFINE BAR 2 OF conversion PROMPT '\<Length' ;
       KEY CTRL+L, '^L'
    DEFINE BAR 3 OF conversion PROMPT 'Ma\<ss' ;
       KEY CTRL+S, '^S'
    DEFINE BAR 4 OF conversion PROMPT 'Spee\<d' ;
       KEY CTRL+D, '^D'
    DEFINE BAR 5 OF conversion PROMPT '\<Temperature' ;
       KEY CTRL+T, '^T'
    DEFINE BAR 6 OF conversion PROMPT 'T\<ime' ;
       KEY CTRL+I, '^I'
    DEFINE BAR 7 OF conversion PROMPT 'Volu\<me' ;
       KEY CTRL+M, '^M'
    ON SELECTION POPUP conversion DO choice IN actimenu ;
       WITH PROMPT( ), POPUP( )
    DEFINE POPUP cardinfo MARGIN RELATIVE COLOR SCHEME 4
    DEFINE BAR 1 OF cardinfo PROMPT '\<View Charges' ;
       KEY ALT+V, ''
    DEFINE BAR 2 OF cardinfo PROMPT 'View \<Payments' ;
       KEY ALT+P, ''
    DEFINE BAR 3 OF cardinfo PROMPT 'Vie\<w Users' ;
       KEY ALT+W, ''
    DEFINE BAR 4 OF cardinfo PROMPT '\-'
    DEFINE BAR 5 OF cardinfo PROMPT '\<Charges ' ;
       KEY ALT+C, ''
    ON SELECTION POPUP cardinfo ;
       DO choice IN actimenu WITH PROMPT( ), POPUP( )
    ACTIVATE MENU example
    DEACTIVATE MENU example
    RELEASE MENU example EXTENDED
    SET SYSMENU TO DEFAULT
    ON KEY LABEL ESC

    PROCEDURE choice
    PARAMETERS mprompt, mpopup
    WAIT WINDOW 'You chose ' + mprompt + ' from popup ' + mpopup NOWAIT

    See Also

    Concepts

    CLEAR Commands

    CREATE MENU Command

    DEACTIVATE MENU Command

    DEFINE MENU Command

    DEFINE PAD Command

    HIDE MENU Command

    SET SYSMENU Command

    SHOW MENU Command

    Commands (Visual FoxPro)

    Language Reference (Visual FoxPro)

    ACTIVATE POPUP Command

    Displays and activates a menu.

    ACTIVATE POPUP MenuName [AT nRow, nColumn] [BAR nMenuItemNumber] [NOWAIT] [REST]

    Parameters

    MenuName

    Specifies the name of the menu to activate.

    AT nRow, nColumn

    Specifies the position of the menu on the screen or in a user-defined window. The row and column coordinate applies to the upper-left corner of the menu. The position you specify with this argument takes precedence over a position you specify with the FROM argument in DEFINE POPUP.

    BAR nMenuItemNumber

    Specifies the item in the menu that is selected when the menu is activated. For example, if nMenuItemNumber is 2, the second item is selected. The first item is selected if you omit BAR nMenuItemNumber or if nMenuItemNumber is greater than the number of items in the menu.

    NOWAIT

    Specifies that, at run time, a program does not wait for the user to choose an item from the menu before continuing program execution. Instead, the program continues to execute.

    REST

    A menu created with the PROMPT FIELD clause of DEFINE POPUP places records from a field into the menu. When the menu is activated, the first item in the menu is initially selected, even if the record pointer in the table containing the field is positioned on a record other than the first record.

    Include REST to specify that the item selected when the menu is activated corresponds to the current record pointer position in the table.

    Remarks

    ACTIVATE POPUP works in conjunction with DEFINE POPUP, used to create the menu, and DEFINE BAR, used to create the items on the menu.

    Example

    This example uses ACTIVATE POPUP with ON PAD to activate a menu when a menu title is chosen. The current system menu bar is first saved to memory with SET SYSMENU SAVE, and then all system menu titles are removed with SET SYSMENU TO.

    Two new system menu titles are created with DEFINE PAD; DEFINE POPUP is used to create a menu for each menu title. DEFINE BAR is used to create menu items on each of the menus. When a menu title is chosen, ON PAD uses ACTIVATE POPUP to activate the corresponding menu.

    When an item is chosen from a menu, the CHOICE procedure is executed. CHOICE displays the name of the chosen item and the name of the menu containing the item. If the Exit item is chosen from the Card Info menu, the original Visual FoxPro system menu is restored.

     Copy Code
    *** Name this program ACTIPOP.PRG ***
    CLEAR
    SET SYSMENU SAVE
    SET SYSMENU TO
    DEFINE PAD convpad OF _MSYSMENU PROMPT '\<Conversions' COLOR SCHEME 3 ;
       KEY ALT+C, ''
    DEFINE PAD cardpad OF _MSYSMENU PROMPT 'Card \<Info' COLOR SCHEME 3 ;
       KEY ALT+I, ''
    ON PAD convpad OF _MSYSMENU ACTIVATE POPUP conversion
    ON PAD cardpad OF _MSYSMENU ACTIVATE POPUP cardinfo
    DEFINE POPUP conversion MARGIN RELATIVE COLOR SCHEME 4
    DEFINE BAR 1 OF conversion PROMPT 'Ar\<ea' KEY CTRL+E, '^E'
    DEFINE BAR 2 OF conversion PROMPT '\<Length' ;
       KEY CTRL+L, '^L'
    DEFINE BAR 3 OF conversion PROMPT 'Ma\<ss' ;
       KEY CTRL+S, '^S'
    DEFINE BAR 4 OF conversion PROMPT 'Spee\<d' ;
       KEY CTRL+D, '^D'
    DEFINE BAR 5 OF conversion PROMPT '\<Temperature' ;
       KEY CTRL+T, '^T'
    DEFINE BAR 6 OF conversion PROMPT 'T\<ime' ;
       KEY CTRL+I, '^I'
    DEFINE BAR 7 OF conversion PROMPT 'Volu\<me' ;
       KEY CTRL+M, '^M'
    ON SELECTION POPUP conversion ;
       DO choice IN actipop WITH PROMPT(), POPUP()
    DEFINE POPUP cardinfo MARGIN RELATIVE COLOR SCHEME 4
    DEFINE BAR 1 OF cardinfo PROMPT '\<View Charges' ;
       KEY ALT+V, ''
    DEFINE BAR 2 OF cardinfo PROMPT 'View \<Payments' ;
       KEY ALT+P, ''
    DEFINE BAR 3 OF cardinfo PROMPT 'Vie\<w Users' ;
       KEY ALT+W, ''
    DEFINE BAR 4 OF cardinfo PROMPT '\-'
    DEFINE BAR 5 OF cardinfo PROMPT '\<Charges' ;
       KEY ALT+C, ''
    DEFINE BAR 6 OF cardinfo PROMPT '\-'
    DEFINE BAR 7 OF cardinfo PROMPT 'E\<xit' ;
       KEY ALT+X, ''
    ON SELECTION POPUP cardinfo ;
       DO choice IN actipop WITH PROMPT(), POPUP()

    PROCEDURE choice
    PARAMETERS mprompt, mpopup
    WAIT WINDOW 'You chose ' + mprompt + ;
       ' from popup ' + mpopup NOWAIT
    IF mprompt = 'Exit'
       SET SYSMENU TO DEFAULT
    ENDIF

    See Also

    Concepts

    CLEAR Commands

    CREATE MENU Command

    DEACTIVATE POPUP Command

    DEFINE BAR Command

    DEFINE POPUP Command

    HIDE POPUP Command

    MOVE POPUP Command

    ON SELECTION POPUP Command

    POP POPUP Command

    POPUP( ) Function

    PROMPT( ) Function

    PUSH POPUP Command

    SHOW POPUP Command

    Commands (Visual FoxPro)

    Language Reference (Visual FoxPro)

    ACTIVATE SCREEN Command

    Sends all subsequent output to the main Visual FoxPro window instead of to the active user-defined window.

    ACTIVATE SCREEN

    Remarks

    Use ACTIVATE WINDOW to direct output to a user-defined window.

    See Also

    Concepts

    ACTIVATE WINDOW Command

    DEACTIVATE WINDOW Command

    DEFINE WINDOW Command

    HIDE WINDOW Command

    SHOW WINDOW Command

    Commands (Visual FoxPro)

    Language Reference (Visual FoxPro)

    ACTIVATE WINDOW Command

    Displays and activates one or more user-defined windows or Visual FoxPro system windows.

    ACTIVATE WINDOW WindowName1 [, WindowName2 …] | ALL [IN [WINDOW] WindowName3 | IN SCREEN] [BOTTOM | TOP | SAME] [NOSHOW]

    Parameters

    WindowName1[, WindowName2…]

    Specifies the name of each window to activate. Separate the window names with commas. In Visual FoxPro, you can specify the name of a toolbar to activate. See SHOW WINDOW Command for a list of Visual FoxPro toolbar names.

    ALL

    Specifies that all windows are activated. The last window activated is the active output window.

    IN [WINDOW] WindowName3

    Specifies the name of the parent window within which the window is placed and activated. The activated window becomes a child window. A parent window can have multiple child windows. A child window activated inside a parent window cannot be moved outside the parent window. If the parent window is moved, the child window moves with it.

    Note:
    The parent window must be visible for any of its child windows to be visible.

    IN SCREEN

    Places and activates a window in the main Visual FoxPro window. A window can be placed in a parent window by including IN WINDOW in DEFINE WINDOW when the window is created. Including the IN SCREEN clause in ACTIVATE WINDOW overrides the IN WINDOW clause in DEFINE WINDOW.

    BOTTOM | TOP | SAME

    Specifies where windows are activated with respect to other previously activated windows. By default, a window becomes the window on top when it is activated. Including BOTTOM places a window behind all other windows. TOP places it in front of all other windows. SAME activates a window without affecting its front-to-back placement.

    NOSHOW

    Activates and directs output to a window without displaying the window.

    Remarks

    To successfully use this command on user-defined windows, any target user-defined window must have been created using the DEFINE WINDOW command.

    Activating a window makes it the window on top and directs all output to that window. Output can be directed to only one window at a time. A window remains the active output window until it is deactivated or released, or until another window or the main Visual FoxPro window is activated.

    The names of user-defined windows appear in the bottom section of the Window menu. The name of the active user-defined window is marked with a check mark.

    More than one window can be placed in the main Visual FoxPro window at one time, but output is directed only to the last window activated. When more than one window is open, deactivating the active output window removes it from the main Visual FoxPro window and sends subsequent output to another window. If there is no active output window, output is directed to the main Visual FoxPro window.
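
    For example, the following minimal sketch (window names are illustrative) activates two windows; output goes to the window activated last:

     Copy Code
    DEFINE WINDOW win1 FROM 1,1 TO 10,40 TITLE 'First'
    DEFINE WINDOW win2 FROM 12,1 TO 21,40 TITLE 'Second'
    ACTIVATE WINDOW win1, win2
    ? 'This text appears in win2'  && win2 was activated last
    DEACTIVATE WINDOW win2         && subsequent output goes to win1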

    Note:
    To ensure output is directed to a specific window when you deactivate the active output window, you must explicitly activate the window you want to send output to with ACTIVATE WINDOW.

    All activated windows are displayed until DEACTIVATE WINDOW or HIDE WINDOW is issued to remove them from view. Issuing either command removes windows from view but not from memory. Windows can be redisplayed by issuing ACTIVATE WINDOW or SHOW WINDOW.

    To remove windows from view and from memory, use CLEAR WINDOWS, RELEASE WINDOWS, or CLEAR ALL. Windows that are removed from memory must be redefined to place them back in the main Visual FoxPro window.

    You can use ACTIVATE WINDOW to place Visual FoxPro system windows in the main Visual FoxPro window or in a parent window.

    The following system windows can be opened with ACTIVATE WINDOW:

    • Command
    • Call Stack
    • Debug
    • Debug Output
    • Document View
    • Locals
    • Trace
    • Watch
    • View

    To activate a system window or a toolbar, enclose the entire system window or toolbar name in quotation marks. For example, to activate the Call Stack debugging window in Visual FoxPro, issue the following command:

     Copy Code
    ACTIVATE WINDOW "Call Stack"

    In prior versions of Visual FoxPro, the Data Session window was always referred to as the View window. Additionally, language used to control this window, such as HIDE WINDOW, ACTIVATE WINDOW, and WONTOP( ), also refers to this window as the View window. Visual FoxPro continues to refer to the View window for the ACTIVATE WINDOW command.

    Use HIDE WINDOW or RELEASE WINDOW to remove a system window from the main Visual FoxPro window or a parent window.

    Example

    The following example defines a window named output and activates it, placing it in the main Visual FoxPro window. The WAIT command pauses execution, the window is hidden, and then redisplayed.

     Copy Code
    CLEAR
    DEFINE WINDOW output FROM 2,1 TO 13,75 TITLE 'Output' ;
       CLOSE FLOAT GROW ZOOM
    ACTIVATE WINDOW output
    WAIT WINDOW 'Press any key to hide window output'
    HIDE WINDOW output
    WAIT WINDOW 'Press any key to show window output'
    SHOW WINDOW output
    WAIT WINDOW 'Press any key to release window output'
    RELEASE WINDOW output

    See Also

    Concepts

    CLEAR Commands

    DEACTIVATE WINDOW Command

    DEFINE WINDOW Command

    HIDE WINDOW Command

    RELEASE WINDOWS Command

    SHOW WINDOW Command

    Commands (Visual FoxPro)

    Language Reference (Visual FoxPro)

    Microsoft Visual FoxPro 9.0

    08/29/2016

    Microsoft® Visual FoxPro® database development system is a powerful tool for quickly creating high-performance desktop, rich client, distributed client, client/server, and Web database applications. Employ its powerful data engine to manage large volumes of data, its object-oriented programming to reuse components across applications, its XML Web services features for distributed applications, and its built-in XML support to quickly manipulate data.

    Note that Visual FoxPro 9.0 is the last version; its final service pack was published in 2007.


    Download Visual FoxPro 9.0 SP2

    Download Service Pack 2 for Microsoft Visual FoxPro 9.0. SP2 provides the latest updates to Visual FoxPro 9.0 combining various enhancements and stability improvements into one integrated package.

    Three Hotfixes for Visual FoxPro 9.0 SP2

    Visual FoxPro Samples and Updates

    Find code samples and product updates for Visual FoxPro.

    Visual FoxPro on MSDN Forums

    Join the conversation and get your questions answered on the Visual FoxPro Forum on MSDN.

    Visual FoxPro 9.0 Overview

    With its local cursor engine, tight coupling between language and data, and powerful features, Visual FoxPro 9.0 is a great tool for building database solutions of all sizes. Its data-centric, object-oriented language offers developers a robust set of tools for building database applications for the desktop, client-server environments, or the Web. Developers will have the necessary tools to manage data—from organizing tables of information, running queries, and creating an integrated relational database management system (DBMS) to programming a fully-developed data management application for end users.

    • Data-Handling and Interoperability. Create .NET compatible solutions with hierarchical XML and XML Web services. Exchange data with SQL Server through enhanced SQL language capabilities and newly supported data types.
    • Extensible Developer Productivity Tools. Enhance your user interfaces with dockable user forms, auto-anchoring of controls, and improved image support. Personalize the Properties Window with your favorite properties, custom editors, fonts, and color settings.
    • Flexibility to Build All Types of Database Solutions. Build and deploy stand-alone and remote applications for Windows based Tablet PCs. Create and access COM components and XML Web Services compatible with Microsoft .NET technology.
    • Reporting System Features. An extensible new output architecture provides precise control of report data output and formatting. Design with multiple detail banding, text rotation, and report chaining. Supported report output includes XML, HTML, and image formats, plus a customizable multi-page print preview window. Backward compatible with existing Visual FoxPro reports.
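
    As a hedged sketch of the data-handling features listed above (the cursor and field names are hypothetical, not from this article), the following queries local data with SQL SELECT and emits the result as XML using the CURSORTOXML( ) function:

    ```foxpro
    * Build a small cursor of sample data (hypothetical schema).
    CREATE CURSOR orders (order_id I, customer C(20), amount Y)
    INSERT INTO orders VALUES (1, "Contoso", 125.50)
    INSERT INTO orders VALUES (2, "Fabrikam", 89.00)

    * Query it with SQL SELECT, then emit the result as XML.
    SELECT * FROM orders WHERE amount > 100 INTO CURSOR bigorders
    CURSORTOXML("bigorders", "lcXML")  && by default, output goes to the named memory variable
    ? lcXML
    ```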

    Resources

    Visual FoxPro Downloads

    • Article
    • 08/29/2016

    Download samples, along with the final product updates including service packs for Visual FoxPro to ensure maximum productivity and performance from your Visual FoxPro development.

    Visual FoxPro 9.0 Updates

    • Visual FoxPro 9.0 Service Pack 2 (SP2)
      Download Service Pack 2 for Microsoft Visual FoxPro 9.0. SP2 provides the latest updates to Visual FoxPro 9.0 combining various enhancements and stability improvements into one integrated package.
    • Help Download for Visual FoxPro 9.0 SP2
      Download product documentation for Visual FoxPro 9.0 SP2.
    • GDI+ Update for Visual FoxPro 9.0 SP2
      Security update patch for Visual FoxPro 9.0 SP2 for fixing Buffer Overrun in JPEG Processing (GDI+).
    • GDI+ Update for Visual FoxPro 9.0 SP1
      Security update patch for Visual FoxPro 9.0 SP1 for fixing Buffer Overrun in JPEG Processing (GDI+). Note: We highly recommend that you install Service Pack 2, then apply the GDI+ SP2 update.
    • Visual FoxPro 9.0 ‘Sedna’ AddOns
      AddOn pack for Visual FoxPro 9.0. This download contains six components: VistaDialogs4COM, Upsizing Wizard, Data Explorer, NET4COM, MY for VFP and VS 2005 Extension for VFP.
    • XSource for Visual FoxPro 9.0 SP2
      Download XSource for Visual FoxPro 9.0 SP2. XSource.zip has its own license agreement for usage, modification, and distribution of the Xbase source files included.
    • Microsoft OLE DB Provider for Visual FoxPro 9.0 SP2
      The Visual FoxPro OLE DB Provider (VfpOleDB.dll) exposes OLE DB interfaces that you can use to access Visual FoxPro databases and tables from other programming languages and applications. The Visual FoxPro OLE DB Provider is supported by OLE DB System Components as provided by MDAC 2.6 or later. The requirements to run the Visual FoxPro OLE DB Provider are the same as for Visual FoxPro 9.0. Note: This version of the VFP OLE DB provider is the same version as the one included with Visual FoxPro 9.0 SP2.
    • VFPCOM Utility
      Extend Visual FoxPro interoperability with other COM and ADO components with the VFPCOM Utility. This utility is a COM server that provides additional functionality when you use ADO and access COM events with your Visual FoxPro 9.0 applications. For installation instructions and more details on the issues that have been addressed, consult the VFPCOM Utility readme.
    • Visual FoxPro ODBC Driver
      The VFPODBC driver is no longer supported. We strongly recommend using the Visual FoxPro OLE DB provider as a replacement. Please refer to the following article for more information and related links to issues when using the VFPODBC driver: https://support.microsoft.com/kb/277772.
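
    As an illustrative, hedged example of using the OLE DB provider listed above (the database path and table are hypothetical), any COM-capable language can open VFP data through ADO with a VFPOLEDB connection string; from Visual FoxPro itself:

    ```foxpro
    * Open a VFP database via the VFP OLE DB provider through ADO.
    loConn = CREATEOBJECT("ADODB.Connection")
    loConn.Open("Provider=VFPOLEDB.1;Data Source=C:\Data\mydb.dbc")
    loRS = loConn.Execute("SELECT company FROM customers")
    DO WHILE !loRS.EOF
       ? loRS.Fields("company").Value
       loRS.MoveNext
    ENDDO
    loRS.Close
    loConn.Close
    ```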

    Visual FoxPro 8.0 Updates

    • Visual FoxPro 8.0 Service Pack 1
      Download Microsoft Visual FoxPro 8.0 Service Pack 1 (SP1), which provides the latest updates to Visual FoxPro 8.0. SP1 combines various enhancements and stability improvements into one integrated package. The download contains all the documentation for these updates. For installation instructions and more details on SP1, consult the Service Pack 1 readme.
    • GDI+ Update for Visual FoxPro 8.0 SP1
      Security update patch for Visual FoxPro 8.0 SP1 for fixing Buffer Overrun in JPEG Processing (GDI+).
    • Visual FoxPro 8.0 SP1 Task Pane Source Code
      Source code for the Task Pane Manager component included in SP1 for Visual FoxPro 8.0. SP1 for VFP 8.0 included an updated Task Pane Manager component as an .APP application file but did not contain the updated source code files associated with that version.
    • Visual FoxPro 8.0 Localization Toolkit Overview
      Overview document of the Localization Toolkit project results for making available various language versions of the design-time IDE DLL and help documentation as add-ons to the English version of Visual FoxPro 8.0.

    Visual FoxPro 7.0 Updates

    • Visual FoxPro 7.0 Service Pack 1
      Download Microsoft Visual FoxPro 7.0 Service Pack 1 (SP1), which provides the latest updates to Visual FoxPro 7.0. SP1 combines various enhancements and stability improvements into one integrated package. The download contains all the documentation for these updates. For installation instructions and more details on SP1, consult the Service Pack 1 readme.

    Code Samples

    • .NET Samples for Visual FoxPro Developers
      This download contains different projects and source files designed to show how some common Visual FoxPro functionality is created in Visual Basic .NET.
    • Visual FoxPro 8.0 Samples
      This download contains different projects designed to show how new features in Visual FoxPro 8.0 can be used. Each project is self-contained and can be run independently of any other. A readme text file in each project describes each sample program.
    • Sample: Visual FoxPro DDEX Provider for Visual Studio 2005
      A Data Designer EXtension (DDEX) Provider allows a data source to integrate better with the data tools in Visual Studio. Visual FoxPro “Sedna” included a sample for such a provider for VFP data. This is now available as a stand-alone download.

    System Requirements

    • Article
    • 08/29/2016

    To install Microsoft Visual FoxPro 9.0, you need:

    Minimum Requirements

    • Processor: PC with a Pentium-class processor
    • Operating System: Microsoft Windows 2000 with Service Pack 3 or later, Microsoft Windows XP or later, or Microsoft Windows Server 2003 or later
    • Memory: 64 MB of RAM minimum; 128 MB or higher recommended
    • Hard Disk: 165 MB of available hard-disk space for a typical installation; 20 MB of additional hard-disk space for Microsoft Visual FoxPro 9.0 Prerequisites
    • Drive: CD-ROM or DVD-ROM drive
    • Display: Super VGA 800 x 600 or higher-resolution monitor with 256 colors
    • Mouse: Microsoft Mouse or compatible pointing device

    Frequently Asked Questions

    • Article
    • 08/29/2016

    Find answers to your frequently asked questions about Visual FoxPro.

    Q: What operating system is required for Visual FoxPro 9.0?

    Developing applications with Visual FoxPro 9.0 is supported only on Microsoft Windows 2000 Service Pack 3 or later, Windows XP, Windows Server 2003 and Windows Vista. You can create and distribute run-time applications for Windows 98, Windows Me, Windows 2000 Service Pack 3 or later, Windows XP, Windows Server 2003 and Windows Vista. Installation on Windows NT 4.0 Terminal Server Edition is not supported.

    Q: Will there be a Visual FoxPro 10.0?

    No. There will not be another major release of Visual FoxPro (see announcement: A message to the community, March 2007).

    Q: Will there be updates to Visual FoxPro?

    Yes. Visual FoxPro will continue to be supported as per the lifecycle policy (https://support.microsoft.com/lifecycle/?p1=7992). Visual FoxPro 9 will be supported until 2014. In support of these products we may release patch updates from time to time. These typically fix problems discovered either internally or by a customer and reported to our product support engineers.

    Q: Will there be a service pack 3 for Visual FoxPro 9?

    At this time there are no plans to release a service pack for Visual FoxPro. However if there arises a need to publish a collection of fixes we may release a service pack. We will make announcements on the Visual FoxPro home page.

    Q: What types of applications can I build with Visual FoxPro 9.0?

    With its local cursor engine, tight coupling between language and data, and powerful features, such as object-oriented programming, Visual FoxPro 9.0 is a great tool for building database solutions of all sizes, from desktop and client/server database applications to data-intensive COM components and XML Web services.

    Visual FoxPro 9.0 is an application development tool for building extremely powerful database applications and components. Its data-centric, object-oriented language offers developers a robust set of tools for building database applications on the desktop, client/server, or on the Web, through components and XML Web services. Developers will have the necessary tools to manage data from organizing tables of information, running queries, and creating an integrated relational database management system (DBMS) to programming a fully developed data management application for end users.

    Q: Can I use Visual FoxPro to build Web applications?

    Visual FoxPro COM components can be used with Internet Information Services (IIS) to build high-powered Internet database applications. This is because Visual FoxPro components can be called from Active Server Pages (ASP). Visual FoxPro is compatible with ASP but works even better in conjunction with the more modern ASP.NET. The components will retrieve and manipulate data, and will build some of the HTML returned to the user.
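
    As a hedged sketch of such a component (the class, table, and method names are hypothetical), a VFP class marked OLEPUBLIC can be built into a COM DLL and called from an ASP page:

    ```foxpro
    * Compile into a multithreaded COM server, for example:
    *   BUILD MTDLL webdata FROM webdata RECOMPILE
    DEFINE CLASS CustomerList AS Session OLEPUBLIC
       PROCEDURE GetCustomersHTML(tcRegion AS String) AS String
          LOCAL lcHTML
          lcHTML = "<ul>"
          * Query local VFP data and return an HTML fragment to the page.
          SELECT company FROM customers WHERE region = tcRegion INTO CURSOR crs
          SCAN
             lcHTML = lcHTML + "<li>" + ALLTRIM(crs.company) + "</li>"
          ENDSCAN
          USE IN crs
          RETURN lcHTML + "</ul>"
       ENDPROC
    ENDDEFINE
    ```

    An ASP page would then instantiate it with something like Server.CreateObject("webdata.CustomerList").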

    Q: Can you consume XML Web services with Visual FoxPro?

    Yes, Visual FoxPro 9.0 makes it easy to consume XML Web services by integrating the SOAP Toolkit into the product.
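
    As a hedged example (the WSDL URL and method name are hypothetical), the SOAP Toolkit’s SoapClient binds to a service’s WSDL and exposes its operations as ordinary COM methods:

    ```foxpro
    * Bind a SOAP Toolkit client to a Web service description.
    loSoap = CREATEOBJECT("MSSOAP.SoapClient30")
    loSoap.MSSoapInit("http://example.com/stocks.asmx?WSDL")
    * Proxy methods are generated from the WSDL; call them directly.
    ? loSoap.GetQuote("MSFT")
    ```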

    Q: Is Visual FoxPro a part of MSDN Subscriptions?

    Yes, Visual FoxPro 9.0 is included in the Professional, Enterprise, and Universal levels of MSDN Subscriptions. Visual FoxPro 9.0 is available for download to MSDN Subscribers via MSDN Subscriber downloads.

    Q: How long will Visual FoxPro be supported by Microsoft?

    Visual FoxPro 9.0 has standard support by Microsoft through January 2010 and extended support through January 2015 as per the developer tools lifecycle support policy.

    Q: How long will the SOAP Toolkit included in Visual FoxPro 9.0 be supported by Microsoft?

    Licensed users of Visual FoxPro 9.0 have a special lifecycle support plan for the SOAP Toolkit: it is supported by Microsoft on the same support plan as Visual FoxPro 8.0, which runs through April 2008, with extended support through September 2013.

    Q: Is Visual FoxPro 9.0 compatible with Visual Studio 2005 and SQL Server 2005?

    Yes. We improved XML support and added new data types in Visual FoxPro 9.0 which improves .NET interop and SQL Server compatibility. Moreover the ‘Sedna’ add-on pack includes improvements to the Data Explorer and the Upsizing Wizard. These have significant improvements to support SQL Server 2005.

    Q: How does Visual FoxPro 9.0 compare to SQL Server?

    We do not position Visual FoxPro against SQL Server. We position SQL Server as a database engine and Visual FoxPro as a developer tool. While Visual FoxPro has a built-in database engine, it is not positioned as a stand-alone database engine. The trend is for an increasing number of Visual FoxPro-based applications to use SQL Server as the data store in the solution. This is not required, of course; it depends on the requirements of the solution. SQL Server offers security, reliability, replication, and many other features of a full relational database engine, while the Visual FoxPro database system is an open, file-based DBF system that lacks many of those features. We leave it to developers and companies to compare the various Microsoft products and technologies and decide which are best for them to use, when, and how.

    Q: Are there plans to enhance the 2 GB database size limit in Visual FoxPro?

    The 2 GB limit is per table, not per database. We do not have any plans to extend the 2 GB table size limit in Visual FoxPro due to many reasons including the 32-bit architecture that already exists within the product. For large, scalable databases we recommend SQL Server 2008.

    Q: Is Visual FoxPro supported on Windows Vista?

    Yes. Visual FoxPro 9 Service Pack 2 is fully supported on Windows Vista.

    Q: Are there plans for Visual FoxPro to support 64-bit versions of the Windows operating system?

    No. Visual FoxPro will remain 32-bit and will not natively use 64-bit addressing, but it runs in 32-bit compatibility mode on 64-bit versions of Windows. Visual Studio 2008 supports creating native 64-bit applications.

    Q: How do you position Visual FoxPro in relation to Microsoft Access?

    Microsoft Access, the database in Office, is the most broadly used and easiest-to-learn database tool that Microsoft offers. If you are new to databases, if you are building applications that take advantage of Microsoft Office, or if you want an interactive product with plenty of convenience, then choose Microsoft Access. Visual FoxPro is a powerful rapid application development (RAD) tool for creating relational database applications. If you are a database developer who builds applications for a living and you want ultimate speed and power, then choose Visual FoxPro.

    Q: Is Visual FoxPro part of Visual Studio .NET?

    No. Visual FoxPro 9.0 is a stand-alone database development tool that is compatible with and evolved from previous versions of Visual FoxPro. Visual FoxPro 9.0 does not use or install the .NET Framework. Visual FoxPro 9.0 is compatible with Visual Studio .NET in the areas of XML Web services, XML support, the VFP OLE DB provider, and more. Visual FoxPro and Visual Studio are complementary tools that work well together; for example, Visual FoxPro 9.0 plus ASP.NET can add WebForm front ends and mobile device front ends to Visual FoxPro applications.

    Q: What is Microsoft’s position on Visual FoxPro related to Visual Studio and .NET?

    We do not have plans to merge Visual FoxPro into Visual Studio and .NET, and there are no plans to create any sort of new Visual FoxPro .NET language. Instead, we are working on adding many of the great features found in Visual FoxPro into upcoming versions of Visual Studio, just like we’ve added great Visual Studio features into Visual FoxPro. If you want to do .NET programming, you should choose a .NET language with Visual Studio.

    A Message to the Community

    • Article
    • 08/29/2016

    March 2007

    We have been asked about our plans for a new version of VFP. We are announcing today that there will be no VFP 10. VFP9 will continue to be supported according to our existing policy with support through 2015 (https://support.microsoft.com/lifecycle/?p1=7992). We will be releasing SP2 for Visual FoxPro 9 this summer as planned, providing fixes and additional support for Windows Vista.

    Additionally, as you know, we’ve been working on a project codenamed Sedna for the past year or so. Sedna is built using the extensibility model of VFP9 and provides a number of new features including enhanced connectivity to SQL Server, integration with parts of the .NET framework, support for search using Windows Desktop Search and Windows Vista as well as enhanced access to VFP data from Visual Studio.

    Concurrently, the community has been using CodePlex (https://www.codeplex.com) to enhance VFP using these same capabilities in the VFPx project. Some of these community driven enhancements include:

    • Support for GDI+
    • An enhanced class browser
    • Support for Windows Desktop Alerts
    • An object oriented menu system
    • Integration with MSBuild
    • A rule-based code analysis tool similar to fxCop in Visual Studio
    • An Outlook Control Bar control

    To reiterate, today we are announcing that we are not planning on releasing a VFP 10 and will be releasing the completed Sedna work on CodePlex at no charge. The components written as part of Sedna will be placed in the community for further enhancement as part of our shared source initiative. You can expect to see the Sedna code on CodePlex sometime before the end of summer 2007.

    Microsoft Visual FoxPro 9.0 SP2

    • Article
    • 07/09/2007

    In this article

    1. In the Visual FoxPro Documentation
    2. Additional Information

    Welcome to Microsoft Visual FoxPro. Visual FoxPro is the object-oriented relational database management system that makes it possible for you to create database solutions from the desktop to the Web. Visual FoxPro provides powerful data-handling capabilities, rapid application development tools for maximum productivity, and the flexibility needed to build all types of database solutions.

    In the Visual FoxPro Documentation

    • What’s New in Visual FoxPro
      Describes the new features and enhancements included in this version of Visual FoxPro.
    • Getting Started with Visual FoxPro
      Provides information about where to find the Readme file, installing and upgrading from previous versions, configuring Visual FoxPro, and customizing the development environment.
    • Using Visual FoxPro
      Provides an overview of Visual FoxPro features, describes concepts and productivity tools for developing, programming, and managing high-performance database applications and components.
    • Samples and Walkthroughs
      Contains Visual FoxPro code samples and step-by-step walkthroughs that you can use for experimenting with and learning Visual FoxPro features.
    • Reference
      Includes Visual FoxPro general, programming language, user interface, and error message reference topics.
    • Product Support
      Provides information about Microsoft product support services for Visual FoxPro.

    Additional Information

    What’s New in Visual FoxPro

    • Article
    • 07/09/2007

    This release of Visual FoxPro contains many new features and enhancements. The following sections describe these new features and enhancements.

    In This Section

    Related Sections

    • Getting Started with Visual FoxPro
      Provides information about where to find the ReadMe file and how to install and upgrade from previous versions, configure Visual FoxPro, and customize the development environment.
    • Using Visual FoxPro
      Provides an overview of Visual FoxPro features, describes concepts and productivity tools for developing, programming, and managing high-performance database applications and components, and provides walkthroughs that help get you started. With the robust tools and data-centric object-oriented language that Visual FoxPro offers, you can build modern, scalable, multi-tier applications that integrate client/server computing and the Internet.
    • Samples and Walkthroughs
      Contains Visual FoxPro code samples and step-by-step walkthroughs that you can use for experimenting with and learning Visual FoxPro features.
    • Reference (Visual FoxPro)
      Describes Visual FoxPro general, programming language, user interface, and error message reference topics.
    • Product Support (Visual FoxPro)
      Provides information about Microsoft product support services for Visual FoxPro.


    Guide to Reporting Improvements

    • Article
    • 07/09/2007

    In this article

    1. Design-time Enhancements
    2. Multiple Detail Bands
    3. Object-assisted Run-time Report Processing
    4. Printing, Rendering, and Character-set-handling Improvements

    Visual FoxPro 9’s Report System has undergone a thorough revision. This topic sketches the broad outlines of the changes, and provides you with information about where to look for details.

    The following main areas of enhancements to the Report System are covered in sections of this topic.

    • Design-time enhancements.
      Multiple features and changes make designing reports in Visual FoxPro better for you and your end-users. The Report Builder Application re-organizes your design experience out-of-the-box. If you want to customize the design process, Report Builder dialog boxes and Report Designer events are fully exposed for you to do so.
    • Multiple detail bands.
      You can handle multiple child tables and data relationships more flexibly in the revised Report Designer. When you run multiple-detail-band reports, you can leverage the new bands, with associated detail headers and footers, both for appropriate presentation of these relationships and for more capable calculations.
    • Object-assisted run-time report processing.
      An entirely re-built output system, including a new base class, changes the way Visual FoxPro provides output report and label files at run time. Object-assisted reporting provides better-quality output, new types of output, and an open-architecture based on a new Visual FoxPro base class, the ReportListener. A programmable Report Preview interface interacts with ReportListeners to give you full control over report preview experience. The Report Preview Application provides improved out-of-the-box previewing facilities.
    • Printing, rendering, and character-set-handling improvements.
      Visual FoxPro 9 makes better use of the operating system’s printing features and GDI+ rendering subsystem. It also handles multiple locales and character sets better than previous versions. These changes are showcased in the Report System, and are accessible for use in custom code during report design and run-time processing.
    • Extensible use of report and label definition files (.frx and .lbx tables).
      Visual FoxPro 9 handles your existing reports and labels without modification, while allowing you to add new features and behavior to these reports easily. This backward-compatible, yet forward-thinking, migration strategy is made possible by the Report System’s newly-flexible handling of the .frx and .lbx table structure.

    Design-time Enhancements

    Numerous changes in the Report System help you enhance the design-time experience for developers and end-users. This section directs you to information about design-time improvements.

    Report Designer Event Hooks and the Report Builder Application

    The Report Designer now offers Report Builder Hooks, which enable you to intercept events occurring during a report or label design session to override and extend designer activity. The default Report Builder Application replaces many of the standard reporting dialog boxes with new ones written in Visual FoxPro code. Components of the Report Builder Application are exposed as Visual FoxPro Foundation Classes for your use.

    • Report Builder Hooks: see Understanding Report Builder Events
    • How the Report Builder Application uses Report Builder Hooks: see How to: Configure the Report Builder’s Event Handling
    • How to specify and distribute a Report Builder with your applications: see _REPORTBUILDER System Variable; How to: Specify and Distribute ReportBuilder.App; Including Report Files for Distribution
    • Using Report Builder algorithms in your code: see FRX Cursor Foundation Class; FRX Device Helper Foundation Class

    Protection for End-User Design Sessions, and other Design-time Customization Opportunities

    You can allow end-users to MODIFY and CREATE reports and labels, while setting limitations on what they can do in the Report Designer interface, using the new PROTECTED keyword. Protection is available individually by object and globally for the report. While working in PROTECTED design mode, you can change what end-users see on the designer layout surface, from complex expressions to simple labels or sample data, using Design-Time Captions. You can also provide helpful instructions, for both PROTECTED and standard design mode, by specifying Tooltips for report controls.

    • Using the PROTECTED keyword: see MODIFY REPORT Command; MODIFY LABEL Command
    • Setting Protection in the Report or Label Designer, and what Protection settings do: see Setting Protection for Reports
    • Protection settings exposed in the Report or Label dialog boxes when you use the default Report Builder Application: see Protection Tab, Report Control Properties Dialog Box (Report Builder); Protection Tab, Report Properties Dialog Box (Report Builder); Protection Tab, Report Band Properties Dialog Box (Report Builder)
    • Design-Time Captions: see How to: Add Design-time Captions to Field Controls
    • ToolTips for Report Controls: see How to: Add Tooltips to Report Controls

    Enhanced Data Environment Use in Reports

    You can save the Data Environment you designed for a Report or Label as a visual class. You can load a Data Environment into a Report or Label design from either a visual class or a previously-saved report or label.

    • Saving a Report Data Environment: see How to: Save Report Data Environments as Classes
    • Loading a Report Data Environment: see Data Environment Tab, Report Properties Dialog Box (Report Builder); How to: Load Data Environments for Reports

    Miscellaneous Design Improvements

    There have been numerous enhancements to the Report and Label Designers. Some features are subtle changes to make design sessions more efficient and more enjoyable, and others improve your choices for resulting output.

    • Improvements to the Report and Label Interactive Development Environment (IDE), such as an enhanced Report Designer toolbar with easier access from the View menu, a new global Report Properties context menu, improvements and additions to existing context menus, and a revised and extended Report menu: see Report Layout and Design
    • Changes to global report and label design options: see Reports Tab, Options Dialog Box
    • Using the new PictureVal property of the Image control to specify images in reports: see How to: Add Pictures to Reports; PictureVal Property
    • New picture template characters (U and W) and updated format instructions (Z, now supported for date and datetime data), useful in reports and labels: see Format Expressions for Field Controls; InputMask Property; Format Property
    • Receiving improved HTML output, which leverages run-time reporting enhancements, when you choose Save As HTML… while designing a report or label: see How to: Generate Output for Reports. Tip: other Visual FoxPro components that invoke Genhtml.prg, the default _GENHTML implementation, automatically share the improved HTML output, although these components have not changed; these include the FRX to HTML Foundation Class and the Output Object Foundation Class.
    • Report document properties, which enable you to include information about the report in the report itself; document properties are included as elements and attributes in XML and HTML output: see How to: Add Document Properties to a Report; Document Properties Tab, Report Properties Dialog Box (Report Builder)
    • Dynamically changing the properties of report controls at run time based on the evaluation of an expression: see How to: Dynamically Format Report Controls; Dynamics Tab, Report Control Properties Dialog Box (Report Builder)

    Multiple Detail Bands

    The Report Engine can now move through a scope of records multiple times. The records can represent related sets of detail lines in child tables, or they can be multiple passes through a single table. These multiple passes through a scope of records are represented as multiple detail bands.

    Detail bands can have their own headers and footers, their own associated onEntry and onExit code, and their own associated report variables. Each detail band can be explicitly associated with a separate target alias, allowing you to control the number of entries in each detail band separately for related tables.

    Multiple detail band reports provide many new ways you can represent data in reports and labels, and new ways you can calculate or summarize data, as you move through a record scope.

    • Designing reports and labels with multiple detail bands and their associated headers and footers: see Optional Bands Dialog Box; Report Band Properties Dialog Box; Band Tab, Report Band Properties Dialog Box (Report Builder)
    • Handling multiple, related tables in report and label data: see Controlling Data in Reports; Working with Related Tables using Multiple Detail Bands in Reports
    • Associating report variables with detail bands: see How to: Reset Report Variables
    • Comparing multiple groups and multiple detail bands: see Report Bands

    Object-assisted Run-time Report Processing

    Visual FoxPro 9 has a new, object-assisted method of generating output from reports and labels. You can use your existing report and label layouts in object-assisted mode, to:

    • Generate multiple types of output during one report run.
    • Connect multiple reports together as part of one output result.
    • Improve the quality of traditional report output.
    • Dynamically adjust the contents of a report while you process it.
    • Provide new types of output not available from earlier versions of Visual FoxPro.

    This section covers the array of run-time enhancements that work together to support object-assisted reporting mode.
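
    As a minimal, hedged sketch (the report name is hypothetical), object-assisted mode can be invoked explicitly by passing a ReportListener to REPORT FORM, or enabled globally with SET REPORTBEHAVIOR:

    ```foxpro
    * Run an existing report through the object-assisted engine.
    loListener = CREATEOBJECT("ReportListener")
    loListener.ListenerType = 1      && 1 = preview-style output
    REPORT FORM myreport OBJECT loListener

    * Or make plain REPORT FORM commands use object-assisted mode.
    SET REPORTBEHAVIOR 90
    ```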

    Object-Assisted Architecture and ReportListener Base Class

    The new ReportListener base class and supporting language enhancements are the heart of the run-time reporting enhancements.

    • Fundamentals of the architecture, how its components work together, and what happens during an object-assisted report run: see Understanding Visual FoxPro Object-Assisted Reporting
    • The ReportListener base class and its members: see ReportListener Object; ReportListener Object Properties, Methods, and Events
    • Invoking object-assisted reporting mode automatically: see SET REPORTBEHAVIOR Command; _REPORTOUTPUT System Variable; Reports Tab, Options Dialog Box
    • Invoking object-assisted reporting mode explicitly with Visual FoxPro commands: see REPORT FORM Command; LABEL Command
    • Debugging and error-handling for object-assisted report runs: see Handling Errors During Report Runs

    Report Preview API and the Report Preview Application

    Visual FoxPro 9’s object-assisted reporting mode gives you complete control over report and label previews.

    • How object-assisted preview works: see The Preview Container API; Creating a Custom Preview Container
    • The default Report Preview Application: see Leveraging the Default Preview Container
    • How to specify and distribute Report Preview components with your applications: see _REPORTPREVIEW System Variable; How to: Specify and Distribute ReportPreview.App; Including Report Files for Distribution

    New Types of Output and the Report Output Component Set

    Because you can subclass ReportListener, you can create new types of output. Visual FoxPro 9 supplies a Report Output Application to connect ReportListener subclasses with output types, as well as ReportListener-derived classes with enhanced output capabilities.

    • Requirements for Report Output Applications, and how Visual FoxPro uses them – read: _REPORTOUTPUT System Variable
    • Features of the default Report Output Application – read: Understanding the Report Output Application
    • Specifying custom output handlers using the default Report Output Application – read: How to: Specify an Alternate Report Output Registry Table; How to: Register Custom ReportListeners and Custom Output Types in the Report Output Registry Table; Considerations for Creating New Report Output Types
    • Understanding and configuring the Visual FoxPro Foundation Classes providing default ReportListener behavior for object-assisted preview and printing – read: ReportListener User Feedback Foundation Class
    • Understanding and configuring the Visual FoxPro Foundation Classes responsible for default XML and HTML output – read: ReportListener XML Foundation Class; ReportListener HTML Foundation Class
    • Leveraging the full set of supported Report Output Foundation Classes and the VFP Report Output XML format – read: ReportListener Foundation Classes; Using VFP Report Output XML
    • How to specify and distribute Report Output components with your applications – read: How to: Specify and Distribute Report Output Application Components; Including Report Files for Distribution

    Migration Strategies and Changes in Output Rendering

    You can use the design-time changes to improve all reports and labels, whether you choose backward-compatible or object-assisted reporting mode at run time.

    When evaluating whether to switch to object-assisted reporting mode at run time, first consider the items on the Reporting list of Important Changes in the Changes in Functionality for the Current Release topic, some of which are specific to this new method of creating output. The topic includes a table of minor differences between backward-compatible and object-assisted reporting output. You can examine what effects these changes might have on individual existing reports, and use the recommendations in the table to address them. You will find additional details in the topic Using GDI+ in Reports.

    Once you have experimented with your current reports, you can decide on a migration strategy for output:

    • You can switch applications over to use object-assisted reporting mode completely, by using the command SET REPORTBEHAVIOR 90.
    • You can use SET REPORTBEHAVIOR 90 but preface specific REPORT FORM commands for reports with formatting issues with SET REPORTBEHAVIOR 80, returning your application to object-assisted mode afterwards.
    • You can use object-assisted mode all the time, but adjust your ReportListener-derived classes’ behavior to suit specific needs. For example, you could change the default setting of the ReportListener’s DynamicLineHeight Property to False (.F.).
    • You can leave SET REPORTBEHAVIOR at its default setting of 80, and add an explicit OBJECT clause to specific reports at your leisure, as you have the opportunity to evaluate and adjust individual report and label layouts.
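    The strategies above can be sketched in a few lines. This is a minimal sketch: the report file names are hypothetical placeholders, and the OBJECT clause details are covered in the REPORT FORM Command topic.

    ```foxpro
    * Application-wide object-assisted mode:
    SET REPORTBEHAVIOR 90

    * Temporarily fall back for a report with known formatting issues:
    SET REPORTBEHAVIOR 80
    REPORT FORM problem_report.frx TO PRINTER   && hypothetical report file
    SET REPORTBEHAVIOR 90

    * Or leave the default of 80 and opt in per report with the OBJECT clause:
    REPORT FORM another_report.frx OBJECT TYPE 1   && object-assisted preview
    ```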

    Printing, Rendering, and Character-set-handling Improvements

    General changes to Visual FoxPro’s use of Windows’ printing, rendering, and font handling support the improvements in the Report System’s output. These changes enhance your ability to support multiple printers and multiple languages in reports.

    • GDI+ features and their impact on native Visual FoxPro output – read: Using GDI+ in Reports
    • Visual FoxPro reporting enhancements that allow your code to use GDI+ in object-assisted reporting mode, and Visual FoxPro Foundation Classes to get you started – read: GDIPlusGraphics Property; Render Method; GDI Plus API Wrapper Foundation Classes
    • Making full use of multiple character sets, or language scripts, in reports, for single report layout elements, for report defaults, or globally in Visual FoxPro – read: GETFONT( ) Function; Style Tab, Report Control Properties Dialog Box (Report Builder); How to: Change Page Settings for Reports; Reports Tab, Options Dialog Box; Reporting Features for International Applications
    • Changes to page setup dialog boxes in Visual FoxPro, improvements in your programmatic access to them, and providing overrides to Printer Environment settings in report and label files – read: SYS(1037) – Page Setup Dialog Box
    • Receiving improved information about the user’s installed printers – read: APRINTERS( ) Function
    • Limiting a list of fonts to those appropriate for printer use – read: GETFONT( ) Function

    Extensible Use of Report and Label Definition Files

    Underneath all the changes to the Visual FoxPro Report System, the Report Designer and Report Engine handle your report and label definitions using the same .frx and .lbx file structures as they did in previous versions. They change the way they use certain fields, without making these reports and labels invalid in previous versions, and they also allow you to extend your use of existing fields or add custom fields.

     Tip

    This change is critical to your ability to create extensions of the new reporting features. For example, you might store two sets of ToolTips in two report extension fields, one set for use by developers and one for use by end-users. In a Report Builder extension, you could evaluate whether the Designer was working in protected or standard mode, and replace the actual set of ToolTips from the appropriate extension field. In previous versions, you could not add fields to report or label structure; the Designer and Engine would consider the table invalid. You also could not add custom content to unused, standard fields in various report and label records safely, because the Report Designer removed such content.

    Visual FoxPro 9 provides a revised FILESPEC table for report and label files, with extensive information on the use of each column in earlier versions as well as current enhancements.

    Visual FoxPro 9 also establishes a new, structured metadata format for use with reports. This format is an XML document schema shared with the Class Designer’s XML MemberData.

    The XML document format allows you to pack custom reporting information into a single report or label field. The default Report Builder Application makes it easy to add Report XML MemberData to report and label records.

    • How Visual FoxPro uses .frx and .lbx tables, and how to extend these structures – read: Understanding and Extending Report Structure
    • How to find and display the contents of the revised FILESPEC table, 60FRX.dbf – read: Table Structures of Table Files (.dbc, .frx, .lbx, .mnx, .pjx, .scx, .vcx)
    • How you can edit the XML data using the Report Builder Application – read: How to: Assign Structured Metadata to Report Controls
    • How you can use Report XML MemberData – read: Report XML MemberData Extensions
    • The shared MemberData document schema – read: MemberData Extensibility

    See Also

    Reference

    Data and XML Feature Enhancements
    SQL Language Improvements
    Class Enhancements
    Language Enhancements
    Interactive Development Environment (IDE) Enhancements
    Enhancements to Visual FoxPro Designers
    Miscellaneous Enhancements
    Changes in Functionality for the Current Release

    Other Resources

    What’s New in Visual FoxPro

    Data and XML Feature Enhancements

    Article, 07/09/2007

    Visual FoxPro contains the following additions and improvements to its data features:

    Extended SQL Capabilities

    Visual FoxPro contains many enhancements for SQL capabilities. For more information, see SQL Language Improvements.

    New Data Types

    Visual FoxPro includes the following new field and data types:

    • Varchar – To store alphanumeric text without padding the field with additional trailing spaces or truncating trailing spaces, use the new Varchar field type. If you do not want Varchar fields translated across code pages, use the Varchar (Binary) field type. For more information, see Varchar Field Type. You can specify Varchar type mapping between ODBC, ADO, and XML data source types and CursorAdapter and XMLAdapter objects using the MapVarchar Property. You can also specify Varchar mapping for SQL pass-through technology and remote views using the MapVarchar setting in the CURSORSETPROP( ) function. For more information, see CURSORSETPROP( ) Function and CURSORGETPROP( ) Function.
    • Varbinary – To store binary values and literals of fixed length in fields and variables without padding the field with additional zero (0) bytes or truncating any trailing zero bytes entered by the user, use the Varbinary data type. Internally, Visual FoxPro binary literals contain a prefix, 0h, followed by a string of hexadecimal numbers, and are not enclosed in quotation marks (""), unlike character strings. For more information, see Varbinary Data Type. You can specify binary type mapping between ODBC, ADO, and XML data source types and CursorAdapter and XMLAdapter objects using the MapBinary Property. You can also specify binary mapping for SQL pass-through technology and remote views using the MapBinary setting in the CURSORSETPROP( ) function. For more information, see CURSORSETPROP( ) Function and CURSORGETPROP( ) Function.
    • Blob – To store binary data of indeterminate length, use the Blob data type. For more information, see Blob Data Type.

    Many of the Visual FoxPro language elements affected by these new data types are listed in the topics for the new data types.

    Binary Index Tag Based on Logical Expressions

    Visual FoxPro includes a new binary, or bitmap, index for creating indexes based on logical expressions, for example, indexes based on deleted records. A binary index can be significantly smaller than a non-binary index and can improve the speed of maintaining indexes. You can create binary indexes using the Table Designer or INDEX command. Visual FoxPro also includes Rushmore optimization enhancements in the SQL engine for deleted records.

    For more information, see Visual FoxPro Index Types, INDEX Command, ALTER TABLE – SQL Command, and Indexes Based on Deleted Records.

    Converting Data Types with the CAST( ) Function

    You can convert expressions from one data type to another by using the new CAST( ) function. Using CAST( ) makes it possible for you to create SQL statements more compatible with SQL Server.

    For more information, see CAST( ) Function.
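    A minimal sketch of the function in use; the Sales table and its OrderAmt column follow the CREATE TABLE example shown later in this article:

    ```foxpro
    ? CAST("123" AS Integer)            && convert a character expression to Integer
    ? CAST(3.14159 AS Numeric(6, 2))    && rounds to the declared precision

    * CAST( ) also works inside SQL statements, bringing them closer
    * to SQL Server syntax:
    SELECT CAST(OrderAmt AS Integer) AS WholeAmt FROM Sales
    ```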

    Get Cursor and Count Records Affected by SQL Pass-Thru Execution

    By using the aCountInfo parameter of the SQLEXEC( ) and SQLMORERESULTS( ) functions, you can get the name of the cursor created and a count of the records affected by the execution of a SQL pass-through statement.

    For more information, see SQLEXEC( ) Function and SQLMORERESULTS( ) Function.
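    A sketch of the aCountInfo parameter, assuming a hypothetical ODBC data source named MyDSN; the array layout indicated in the comments follows the SQLEXEC( ) Function topic:

    ```foxpro
    lnHandle = SQLCONNECT("MyDSN")   && hypothetical ODBC data source
    lnResult = SQLEXEC(lnHandle, "UPDATE Customers SET Active = 1", "", laCount)
    IF lnResult > 0
       ? laCount[1, 1]   && alias of the cursor created, if any
       ? laCount[1, 2]   && count of records affected by the statement
    ENDIF
    SQLDISCONNECT(lnHandle)
    ```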

    Roll-Back Functionality Supported when a SQL Pass-Through Connection Disconnects

    Visual FoxPro now supports the DisconnectRollback property for use with the SQLSETPROP( ), SQLGETPROP( ), DBSETPROP( ), and DBGETPROP( ) functions. DisconnectRollback is a connection-level property that causes a transaction to be either rolled back or committed when the SQLDISCONNECT( ) function is called for the last connection handle associated with the connection.

    The DisconnectRollback property accepts a logical value.

    • False (.F.) – (Default) The transaction will be committed when the SQLDISCONNECT( ) function is called for the last statement handle associated with the connection.
    • True (.T.) – The transaction is rolled back when the SQLDISCONNECT( ) function is called for the last statement handle associated with the connection.

    The following example shows the DisconnectRollback property set in the DBSETPROP( ) and SQLSETPROP( ) functions:

    DBSETPROP("testConnection","CONNECTION","DisconnectRollback",.T.)
    SQLSETPROP(con,"DisconnectRollback",.T.)

    For more information, see DisconnectRollback property in SQLSETPROP( ) Function.

    SQLIDLEDISCONNECT( ) Temporarily Disconnects SQL Pass-Through Connections

    You can use the new SQLIDLEDISCONNECT( ) function to allow a SQL pass-through connection to be temporarily disconnected. Use the following syntax:

    SQLIDLEDISCONNECT( nStatementHandle )

    The nStatementHandle parameter is set to the statement handle to be disconnected or 0 if all statement handles should be disconnected.

    The SQLIDLEDISCONNECT( ) function returns the value 1 if it is successful; otherwise, it returns -1.

    The function fails if the specified statement handle is busy or the connection is in manual commit mode. The AERROR( ) function can be used to obtain error information.

    The disconnected connection handle is automatically restored if it is needed for an operation. The original connection data source name is used.

    If a statement handle is temporarily released, the ODBChstmt property returns 0; the ODBChdbc property returns 0 if the connection is temporarily disconnected. A shared connection is temporarily disconnected as soon as all of its statement handles are temporarily released.

    For more information, see SQLIDLEDISCONNECT( ) Function.
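    A short sketch of the call and its failure path, again using a hypothetical MyDSN data source:

    ```foxpro
    lnHandle = SQLCONNECT("MyDSN")     && hypothetical ODBC data source
    IF SQLIDLEDISCONNECT(lnHandle) = 1
       ? "Connection released; it reconnects automatically on next use"
    ELSE
       AERROR(laError)                 && fails if the handle is busy or the
       ? laError[1, 2]                 && connection is in manual commit mode
    ENDIF
    ```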

    Retrieving Active SQL Connection Statement Handles

    You can retrieve information for all active SQL connection statement handles using the new ASQLHANDLES( ) function. ASQLHANDLES( ) creates and uses the specified array to store numeric statement handle references that you can use in other Visual FoxPro SQL functions, such as SQLEXEC( ) and SQLDISCONNECT( ). ASQLHANDLES( ) returns the number of active statement handles in use, or zero (0) if none are available. For more information, see ASQLHANDLES( ) Function.

    Obtain the ADO Bookmark for the Current Record in an ADO-Based Cursor

    The ADOBookmark property is now supported by the CURSORGETPROP( ) function. Use this property to obtain the ActiveX® Data Objects (ADO) bookmark for the current record in an ADO-based cursor.

    For more information, see ADOBookmark Property in CURSORGETPROP( ) Function.

    If a table is not selected and an alias is not specified, Error 52, “No table is open in the current work area,” is generated. If the cursor selected is not valid, Error 1467, “Property is invalid for local cursors,” is generated.

    Obtain the Number of Fetched Records

    You can obtain the number of fetched records during SQL Pass-Through execution by using the new RecordsFetched cursor property with the CURSORGETPROP( ) function.

    Specifying the RecordsFetched cursor property returns the number of fetched records from an ODBC/ADO-based cursor.

    If records have been deleted or appended locally, the RecordsFetched cursor property may not return the current number of records in the ODBC/ADO-based cursor. In addition, filter conditions are ignored.

    For more information, see RecordsFetched Property in CURSORGETPROP( ) Function.

    Determine if a Fetch is Complete

    You can determine if a fetch process is complete for an ODBC/ADO-based cursor by using the new FetchIsComplete cursor property with the CURSORGETPROP( ) function. This property is read-only at design time and run time.

    This property is not supported on environment level (work area 0) cursors, tables, and local views.

    The FetchIsComplete cursor property returns a logical expression True (.T.) if the fetch process is complete; otherwise False (.F.) is returned.

    For more information, see FetchIsComplete Property in CURSORGETPROP( ) Function.

    ISMEMOFETCHED( ) Determines Whether a Memo is Fetched

    You can use the ISMEMOFETCHED( ) function to determine whether a Memo field or General field is fetched when you are using delayed memo fetching. For more information about delayed memo fetching, see Speeding Up Data Retrieval.

    The syntax for this function is:

    ISMEMOFETCHED(cFieldName | nFieldNumber [, nWorkArea | cTableAlias ])

    The ISMEMOFETCHED( ) function returns True (.T.) when the Memo field is fetched or if local data is used. ISMEMOFETCHED() returns NULL if the record pointer is positioned at the beginning of the cursor or past the last record.

    For more information, see ISMEMOFETCHED( ) Function.
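    A short sketch, assuming a cursor opened in delayed memo fetching mode with a hypothetical Memo field named Notes:

    ```foxpro
    IF NOT ISMEMOFETCHED("Notes")        && by field name, current work area
       ? "Notes has not been fetched yet"
    ENDIF

    * Accessing the field value triggers the fetch:
    ? LEFT(Notes, 40)
    ? ISMEMOFETCHED("Notes")             && .T. once the content is fetched
    ```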

    Cancel ADO Fetch

    In Visual FoxPro, you can now cancel a lengthy ADO fetch by pressing the ESC key.

    Long Type Name Support

    Visual FoxPro supports using long type names with the following functions, commands, and properties.

    The following table lists the data types along with their long type names and short type names.

    Data Type    Long Type Name     Short Type Name
    Character    Char, Character    C
    Date         Date               D
    DateTime     Datetime           T
    Numeric      Num, Numeric       N
    Floating     Float              F
    Integer      Int, Integer       I
    Double       Double             B
    Currency     Currency           Y
    Logical      Logical            L
    Memo         Memo               M
    General      General            G
    Picture      Picture            P
    Varchar      Varchar            V
    Varbinary    Varbinary          Q
    Blob         Blob               W

    Visual FoxPro allows ambiguous long type names to be used with the ALTER TABLE, CREATE CURSOR, CREATE TABLE, and CREATE FROM commands. If the specified long type name is not a recognized long type name, Visual FoxPro will truncate the specified name to the first character.
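    For example, these two CREATE CURSOR statements are equivalent, one using long type names and one using the single-character short names from the table above:

    ```foxpro
    * Long type names:
    CREATE CURSOR orders (OrderID Integer, Customer Varchar(30), Placed Datetime)

    * Equivalent short type names:
    CREATE CURSOR orders2 (OrderID I, Customer V(30), Placed T)
    ```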

    Transaction Support for Free Tables and Cursors

    In prior versions of Visual FoxPro, transactions using the BEGIN TRANSACTION Command were only supported for local and remote data from databases. Transactions involving free tables and cursors are now supported through use of the MAKETRANSACTABLE( ) and ISTRANSACTABLE( ) functions. For more information, see MAKETRANSACTABLE( ) Function and ISTRANSACTABLE( ) Function.
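    A minimal sketch of wrapping a cursor in a transaction; the cursor name and fields are illustrative:

    ```foxpro
    CREATE CURSOR orders (OrderID I, OrderAmt Y)
    MAKETRANSACTABLE("orders")          && opt the cursor into transaction support
    ? ISTRANSACTABLE("orders")          && .T.

    BEGIN TRANSACTION
    INSERT INTO orders VALUES (1, 99.95)
    ROLLBACK                            && the inserted record is discarded
    ```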

    Specify a Code Page When Using the CREATE TABLE or CREATE CURSOR Commands

    You can specify a code page by including the CODEPAGE clause with the CREATE CURSOR or CREATE TABLE commands.

    When the CODEPAGE clause is specified, the new table or cursor has a code page specified by nCodePage. An error, 1914, “Code page number is invalid”, is generated if an invalid code page is specified.

    The following example creates a table and displays its code page:

    CREATE TABLE Sales CODEPAGE=1251 (OrderID I, CustID I, OrderAmt Y(4))
    ? CPDBF( )

    For more information, see CREATE CURSOR – SQL Command, CREATE TABLE – SQL Command, and Code Pages Supported by Visual FoxPro.

    Convert Character and Memo Data Types Using the ALTER TABLE Command

    Visual FoxPro now supports automatic conversion from character data type to memo data type without loss of data when using the ALTER TABLE command along with the ALTER COLUMN clause. This conversion is also supported when making structural changes using the Table Designer. For more information, see ALTER TABLE – SQL Command.

    BLANK Command Can Initialize Records to Default Value

    You can initialize fields in the current record to their default values as stored in the table’s database container (DBC) by using the DEFAULT [AUTOINC] option when clearing the record with the BLANK command. For more information, see BLANK Command.
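    A sketch of the option, assuming a hypothetical Customers table that belongs to a database container with default values defined:

    ```foxpro
    USE Customers            && hypothetical table in a DBC with default values
    GO TOP
    BLANK DEFAULT            && reset the current record's fields to their
                             && DBC defaults; add AUTOINC to also generate
                             && the next autoincrement value
    ```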

    FLUSH Command Writes Data Explicitly to Disk

    Visual FoxPro now includes options and parameters for the FLUSH command and FFLUSH function so you can explicitly save all changes you make to all open tables and indexes. You can also save changes to a specific table by specifying a work area, table alias, or a path and file name. For more information, see FLUSH Command and FFLUSH( ) Function.

    Populate an Array with Aliases Used by a Specified Table

    The new cTableName parameter for the AUSED( ) function makes it possible to filter the created array to contain only the aliases being used for a specified table.

    AUSED(ArrayName [, nDataSessionNumber [, cTableName ]])

    The cTableName parameter accepts the following formats to specify a table, from highest to lowest in priority.

    • DatabaseName!TableName or DatabaseName!ViewName
    • Path\DatabaseName!TableName or Path\DatabaseName!ViewName
    • DBC-defined table name or view in the current DBC in the current data session
    • Simple or full file name

    For more information, see AUSED( ) Function.
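    A sketch using a hypothetical database and table; SET("DATASESSION") supplies the current data session number:

    ```foxpro
    * Open the same table twice under different aliases:
    OPEN DATABASE mydata                 && hypothetical database
    USE mydata!customers IN 0 ALIAS c1
    USE mydata!customers IN 0 ALIAS c2 AGAIN

    lnCount = AUSED(laAliases, SET("DATASESSION"), "mydata!customers")
    ? lnCount                            && 2: both aliases refer to the table
    ```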

    Obtain Last Auto-Increment Value with GETAUTOINCVALUE( )

    You can use the new GETAUTOINCVALUE( ) function to return the last value generated for an autoincremented field within a data session. For more information, see GETAUTOINCVALUE( ) Function.
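    A minimal sketch with an illustrative free table:

    ```foxpro
    CREATE TABLE eventlog (EventID I AUTOINC, Msg C(40))
    INSERT INTO eventlog (Msg) VALUES ("first entry")
    ? GETAUTOINCVALUE( )    && last autoincrement value generated this session
    ```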

    SET TABLEPROMPT Controls Prompt to Select Table

    The new SET TABLEPROMPT command controls whether Visual FoxPro prompts the user with the Open Dialog Box (Visual FoxPro) to select a table when a specified table cannot be found, such as in a SELECT – SQL Command. For more information, see SET TABLEPROMPT Command.

    Use SET VARCHARMAPPING to Control Query Result Set Mappings

    For queries such as SELECT – SQL Command, character data is often manipulated using Visual FoxPro functions and expressions. Since the length of the resulting field value may be important for certain application uses, it is valuable to have this Character data mapped to Varchar data in the result set. The SET VARCHARMAPPING command controls whether Character data is mapped to a Character or Varchar data type. For more information, see SET VARCHARMAPPING Command.

    SET TABLEVALIDATE Expanded

    When a table header is locked during validation, attempts to open the table, for example, with the USE command, generate the message “File is in use (Error 3).” If the table header cannot be locked for a table open operation, you can suppress this message by setting the third bit for the SET TABLEVALIDATE command. You must also set the first bit to validate the record count when the table opens. Therefore, you need to set the SET TABLEVALIDATE command to a value of 5. Also, a fourth bit option (value of 8) is available for Insert operations which checks the table header before the appended record is saved to disk and the table header is modified.

    For more information, see SET TABLEVALIDATE Command.
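    The bit arithmetic described above can be sketched as:

    ```foxpro
    * Bit 1 (value 1): validate the record count when the table opens.
    * Bit 3 (value 4): suppress "File is in use" when the header cannot be locked.
    SET TABLEVALIDATE TO 5    && 1 + 4

    * Adding bit 4 (value 8) also checks the table header before each
    * appended record is saved to disk:
    SET TABLEVALIDATE TO 13   && 1 + 4 + 8
    ```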

    SET REFRESH Can Specify Faster Refresh Rates

    You can specify fractions of a second for the nSeconds2 parameter to a minimum of 0.001 seconds. You can also specify the following values for the optional second parameter:

    • -1 – Always read data from a disk.
    • 0 – Always use data in memory buffer but do not refresh buffer.

    The Table refresh interval check box on the Data tab of the Options dialog box now also accepts fractional values.

    For more information, see SET REFRESH Command and Data Tab, Options Dialog Box.
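    For example:

    ```foxpro
    * Refresh the display every 5 seconds and the memory buffer every 250 ms:
    SET REFRESH TO 5, 0.25

    * Always read data from disk rather than the memory buffer:
    SET REFRESH TO 5, -1
    ```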

    SET REFRESH Can Differentiate Values for Each Cursor

    You can use the new Refresh property with the CURSORGETPROP( ) function to differentiate the SET REFRESH values for individual cursors. The default setting is -2, which is a global value. This value is not available with the SET REFRESH command.

    The Refresh property is available at the Data Session and Cursor level. The default setting for a Data Session level is -2 and the default value for a Cursor level is the current session’s level setting. If the global level setting is set to 0, the Cursor level setting is ignored.

    If a table is not currently selected and an alias is not specified, Error 52, “No table is open in the current work area,” is generated.

    For more information, see Refresh Property in CURSORGETPROP( ) Function.

    SET( ) Determines SET REPROCESS Command Settings

    You can now use the following syntax with the SET( ) function to determine how the SET REPROCESS command was declared.

    SET Command      Value Returned
    REPROCESS, 2     Current session setting type (0 – attempts, 1 – seconds)
    REPROCESS, 3     System session setting type (0 – attempts, 1 – seconds)

    For more information, see SET( ) Function and SET REPROCESS Command.

    Log Output from SYS(3054) Using SYS(3092)

    You can use the new SYS(3092) function in conjunction with SYS(3054) to record the resulting output to a file.

    SYS( 3092 [, cFileName [, lAdditive ]])

    The cFileName parameter specifies the file to echo the SYS(3054) output to. Sending an empty string to cFileName will deactivate output recording to the file.

    The default value for lAdditive is False (.F.). This specifies that new output will overwrite the previous contents of the specified file. To append new output to the specified file, set lAdditive to True (.T.).

    SYS(3092) returns the name of the current echo file if it is active; otherwise, it returns an empty string.

    SYS(3054) and SYS(3092) are global settings; in a multithreaded runtime they are scoped to a thread. Each function can be changed independently of the other.

    These functions are not available in the Visual FoxPro OLE DB Provider.

    For more information, see SYS(3054) – Rushmore Query Optimization Level and SYS(3092) – Output Rushmore Query Optimization Level.
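    A sketch of the two functions used together; the log file name is a hypothetical placeholder, and the SYS(3054) level shown follows its reference topic:

    ```foxpro
    SYS(3054, 11)                     && report filter and join optimization
    SYS(3092, "rushmore.log", .T.)    && echo SYS(3054) output, appending to file
    SELECT * FROM Sales WHERE OrderID = 1 INTO CURSOR tmp
    SYS(3092, "")                     && stop echoing to the file
    ```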

    Purge Cached Memory for Specific Work Area Using SYS(1104)

    You can optionally specify the alias or work area of a specified table or cursor for which cached memory is purged. For more information, see SYS(1104) – Purge Memory Cache.

    New Table Types for SYS(2029)

    The SYS(2029) function returns new values for tables that contain Autoinc, Varchar, Varbinary, or Blob fields. For more information, see SYS(2029) – Table Type.

    Map Remote Unicode Data to ANSI Using SYS(987)

    Use SYS(987) to map remote Unicode data retrieved through SQL pass-through or remote views to ANSI. This function can be used to retrieve remote Varchar data as ANSI for use with Memo fields. This setting is a global setting across all data sessions so should be used with care. For more information, see SYS(987) – Map Remote Data to ANSI.

    Memo and Field tips in a BROWSE or Grid

    When the mouse pointer is positioned over a Memo field cell in a Browse window or Grid control, a Memo Tip window displays the contents of the Memo field.

    For other field types, positioning the mouse pointer over the field displays the field contents in a Field Tip window when the field is sized smaller than its contents.

    Memo Tip windows display no more than 4 kilobytes of text, and are not displayed for binary data. A Memo Tip window is displayed until the mouse pointer is moved from the Memo field. The _TOOLTIPTIMEOUT System Variable determines how long a Field Tip window is displayed.

    You can disable Memo Tips by setting the _SCREEN ShowTips Property to False (.F.).

    Memo and Field Tips will also be displayed for Grid controls if both _SCREEN and the form’s ShowTips property are set to True (.T.). Additionally, the ToolTipText Property for the field’s grid column Textbox control must contain an empty string.

    Specify Code Pages

    You can specify the code page used to decode data when XML is being parsed and to encode data when UTF-8 encoded XML is generated. The following language changes are available:

    • nCodePage parameter – To specify code pages, you can use the nCodePage parameter with the following XMLTable methods:
      XMLTable.ToCursor( [ lAppend [, cAlias [, nCodePage ]]] )
      XMLTable.ChangesToCursor( [ cAlias [, lIncludeUnchangedData [, nCodePage ]]] )
      XMLTable.ApplyDiffgram( [ cAlias [, oCursorAdapter [, lPreserveChanges [, nCodePage ]]]] )
    • CodePage and UseCodePage properties – Use the CodePage Property and UseCodePage Property to specify code pages when you use the following classes:
      XMLAdapter.CodePage = nValue
      XMLTable.CodePage = nValue
      XMLField.CodePage = nValue
    • Flag 32768 – The flag 32768 is available for the following functions and class:
      CursorAdapter.Flags = nCodePage
      XMLTOCURSOR( eExpression | cXMLFile [, cCursorName [, nFlags ]])
      CURSORTOXML( nWorkArea | cTableAlias, cOutput [, nOutputFormat [, nFlags [, nRecords [, cSchemaName [, cSchemaLocation [, cNameSpace ]]]]]])
      XMLUPDATEGRAM( [ cAliasList [, nFlags [, cSchemaLocation ]]] )
      The nCodePage parameter must match a recognized Visual FoxPro code page.

    For more information, see Code Pages Supported by Visual FoxPro.

    MapVarchar Property Maps to Varchar, Varbinary, and Blob Data Types

    For CursorAdapter and XMLAdapter classes, you can use the MapVarchar property to map to Varchar data types. To map to Varbinary and Blob data types, you can use the MapBinary property.

    The XMLTOCURSOR( ) Function contains several new flags to support mapping of Char and base64Binary XML field types to the new Visual FoxPro data types.

    For more information, see the MapVarchar Property and MapBinary Property.

    Handling Conflict Checks with Properties for CursorAdapter Class

    You can better handle conflicts when performing update and delete operations using the commands specified by the UpdateCmd and DeleteCmd properties for CursorAdapter objects by using the new ConflictCheckType and ConflictCheckCmd properties for CursorAdapter objects.

    You can use ConflictCheckType to specify how to handle a conflict check during an update or delete operation. When ConflictCheckType is set to 4, you can use ConflictCheckCmd to specify a custom command to append to the end of the commands in the UpdateCmd and DeleteCmd properties.

     Note

    Visual FoxPro 8.0 Service Pack 1 includes the ConflictCheckType and ConflictCheckCmd properties.

    For more information, see ConflictCheckType Property and ConflictCheckCmd Property.

    Improved DataEnvironment Handling with UseCursorSchema and NoData Properties

    You can specify default settings for CursorFill Method calls made without the first two parameters by setting these properties. For more information, see UseCursorSchema Property and NoData Property.

    Timestamp Field Support

    The new TimestampFieldList property lets you specify a list of timestamp fields for the cursor created by the CursorAdapter. For more information see TimestampFieldList Property.

    Auto-Refresh Support

    There are a number of scenarios where you might want to have cursor data refreshed from a remote data source after an Insert/Update operation has occurred. These include the following scenarios:

    • A table has an auto-increment field that also acts as a primary key.
    • A table has a timestamp field, and that field must be refreshed from the database after each Insert/Update in order to allow successful subsequent updates to the record when WhereType=4 (key and timestamp).
    • A table contains some fields which have DEFAULT values or triggers defined that will cause changes to occur.

    The following new properties have been added to the CursorAdapter class for Auto-Refresh support:

    • InsertCmdRefreshFieldList – List of fields to refresh after the Insert command executes.
    • InsertCmdRefreshCmd – Specifies the command to refresh the record after the Insert command executes.
    • InsertCmdRefreshKeyFieldList – List of key fields to refresh in the record after the Insert command executes.
    • UpdateCmdRefreshFieldList – List of fields to refresh after the Update command executes.
    • UpdateCmdRefreshCmd – Specifies the command to refresh the record after the Update command executes.
    • UpdateCmdRefreshKeyFieldList – List of key fields to refresh in the record after the Update command executes.
    • RefreshTimestamp – Enables automatic refresh for fields in TimestampFieldList during Insert/Update.

For more information about how Visual FoxPro updates remote data using a CursorAdapter, see Data Access Management Using CursorAdapters. Also, see InsertCmdRefreshCmd Property, InsertCmdRefreshFieldList Property, InsertCmdRefreshKeyFieldList Property, UpdateCmdRefreshCmd Property, UpdateCmdRefreshFieldList Property, UpdateCmdRefreshKeyFieldList Property, and RefreshTimeStamp Property.
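As a minimal sketch, the auto-increment scenario described above might be configured as follows. The connection handle, table, and field names (customers, cust_id, name) are hypothetical, and the refresh command assumes a SQL Server-style identity query:

```foxpro
* Sketch only: lnHandle is an existing ODBC connection handle;
* "customers" has an auto-increment primary key cust_id.
LOCAL loCA AS CursorAdapter
loCA = CREATEOBJECT("CursorAdapter")
loCA.DataSourceType = "ODBC"
loCA.DataSource = lnHandle
loCA.Alias = "c_customers"
loCA.Tables = "customers"
loCA.KeyFieldList = "cust_id"
loCA.UpdatableFieldList = "name"
loCA.UpdateNameList = "cust_id customers.cust_id, name customers.name"
loCA.SelectCmd = "SELECT cust_id, name FROM customers"

* After each Insert, fetch the server-assigned key back into the cursor.
loCA.InsertCmdRefreshFieldList = "cust_id"
loCA.InsertCmdRefreshCmd = "SELECT @@IDENTITY"  && hypothetical; server specific

loCA.CursorFill()
```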

    On Demand Record Refresh

In Visual FoxPro 8.0, the REFRESH( ) Function provides on-demand record refresh functionality for local and remote views; however, it does not support the CursorAdapter. Visual FoxPro 9.0 extends REFRESH( ) support to the CursorAdapter and provides some additional capabilities:

• RecordRefresh method: Refreshes the current field values for the target records. Use the CURVAL( ) Function to determine current field values.
• BeforeRecordRefresh event: Occurs immediately before the RecordRefresh method is executed.
• AfterRecordRefresh event: Occurs after the RecordRefresh method is executed.
• RefreshCmdDataSourceType property: Specifies the data source type to be used for the RecordRefresh method.
• RefreshCmdDataSource property: Specifies the data source to be used for the RecordRefresh method.
• RefreshIgnoreFieldList property: List of fields to ignore during the RecordRefresh operation.
• RefreshCmd property: Specifies the command to refresh rows when RecordRefresh is executed.
• RefreshAlias property: Specifies the alias of the read-only cursor used as a target for the refresh operation.

For more information, see RecordRefresh Method, BeforeRecordRefresh Event, AfterRecordRefresh Event, RefreshCmdDataSourceType Property, RefreshCmdDataSource Property, RefreshIgnoreFieldList Property, RefreshCmd Property, and RefreshAlias Property.
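For example, refreshing the current record on demand might look like the following sketch. It assumes loCA is an already-filled CursorAdapter, lnHandle is an ODBC connection handle, and the "notes" field is hypothetical; the RecordRefresh parameter count is assumed optional:

```foxpro
* Sketch only: refresh the current record from the remote source.
loCA.RefreshCmdDataSourceType = "ODBC"
loCA.RefreshCmdDataSource = lnHandle
loCA.RefreshIgnoreFieldList = "notes"  && hypothetical field to skip

SELECT (loCA.Alias)
IF loCA.RecordRefresh(1) > 0  && returns the number of records refreshed
   ? "Current record refreshed"
ENDIF
```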

    Delayed Memo Fetch

The CursorAdapter class has a FetchMemo Property which, when set to False (.F.) in Visual FoxPro 9.0, places the cursor in Delayed Memo Fetch mode, similar to remote views. Delayed Memo Fetch mode prevents the contents of Memo fields from being fetched by the CursorFill Method or CursorRefresh Method; instead, the content of a Memo field is fetched when the application attempts to access its value. The following CursorAdapter enhancements provide support for Delayed Memo Fetch:

• DelayedMemoFetch method: Performs a delayed Memo field fetch for a target record in a cursor in a CursorAdapter object.
• FetchMemoDataSourceType property: Specifies the data source type used for the DelayedMemoFetch method.
• FetchMemoDataSource property: Specifies the data source used for the DelayedMemoFetch method.
• FetchMemoCmdList property: Specifies a list of Memo field names and their associated fetch commands.

For more information, see DelayedMemoFetch Method, FetchMemoDataSourceType Property, FetchMemoDataSource Property, and FetchMemoCmdList Property.
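A hedged sketch of Delayed Memo Fetch follows. The table and field names are hypothetical, and the exact format of FetchMemoCmdList (a Memo field name paired with its fetch command) is assumed from the description above:

```foxpro
* Sketch only: delay fetching of a Memo field until it is accessed.
loCA.FetchMemo = .F.  && .F. enables Delayed Memo Fetch mode in VFP 9.0
loCA.FetchMemoDataSourceType = "ODBC"
loCA.FetchMemoDataSource = lnHandle
loCA.FetchMemoCmdList = ;
   "notes SELECT notes FROM customers WHERE cust_id=?cust_id"
loCA.CursorFill()  && Memo contents are not fetched here

* The fetch happens only when the value is actually used:
? notes
```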

    UseTransactions Property

    The new UseTransactions property specifies whether the CursorAdapter should use transactions when sending Insert, Update or Delete commands through ADO or ODBC. For more information, see UseTransactions Property.

    DEFAULT and CHECK Constraints Respected

In Visual FoxPro 9.0, DEFAULT values and table- and field-level CHECK constraints are supported for XML, Native, ADO, and ODBC data sources. In Visual FoxPro 8.0, DEFAULT values and table- and field-level CHECK constraints are supported only for an XML data source. For the DEFAULT values and CHECK constraints to be applied to a cursor, call the CursorFill Method with the lUseSchema parameter set to True (.T.). For more information, see CursorSchema Property.
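For instance, applying the constraints might look like this minimal sketch (the schema contents are hypothetical):

```foxpro
* Sketch only: constraints are applied when the cursor schema is used.
loCA.CursorSchema = "cust_id I, name C(30), credit Y"
loCA.CursorFill(.T.)  && lUseSchema = .T., so DEFAULT values and CHECK
                      && constraints are applied to the cursor
```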

    Remote Data Type Conversion for Logical Data

    When you move data between a remote server and Visual FoxPro, Visual FoxPro uses ODBC or ADO data types to map remote data types to local Visual FoxPro data types. In Visual FoxPro 9.0, certain ODBC and ADO data types can now be mapped to a logical data type in remote views and the CursorAdapter object. For more information, see Data Type Conversion Control.

    ADOCodePage Property

    When working with an ADO data source for your CursorAdapter, you may want to specify a code page to use for character data translation. The new ADOCodePage property allows you to specify this code page. For more information, see ADOCodePage Property.

    Read and Write Nested XML Documents

You can read from and write to your relational database using XML documents, with nesting used to handle the relationships between tables. You accomplish this using the RespectNesting Property of the XMLAdapter class. The XMLTable class has the Nest Method, the Unnest Method, and associated properties to handle nesting.

    For more information, see the XMLAdapter Class and the XMLTable Class.

    LoadXML Method Can Accept Any XML Document

The LoadXML method accepts any XML document with a valid schema. Previously, the method required that the schema follow the format of a Visual Studio-generated dataset. When you use the LoadXML method to read an XML document with a schema different from a Visual Studio-generated dataset, the XMLAdapter XMLName and XMLPrefix properties are set to empty (""). The XMLAdapter XMLNamespace property becomes equal to the targetNamespace attribute value for the schema node, each XML element that is a complexType is mapped to an XMLTable object, and the XMLNamespace property is set to the namespaceURI for the element.

If you set the XMLAdapter RespectNesting property to True (.T.), the top-level element declaration is ignored if it is referenced from some other complex element. In that case, the XMLTable object for the referenced element is nested into the XMLTable for the element that references it.

    For more information, see LoadXML Method.

    XPath Expressions Can Access Complex XML Documents

    You can use XPath expressions to access complex XML documents and the new properties for reading the nodes within the document. For example, you might want to filter record nodes, restore relationships based on foreign key fields, use an element’s text as data for a field, or access XML that uses multiple XML namespaces. The following properties provide you with the ability to read the XML at the XMLAdapter level, XMLTable level, or the XMLField level.

    You can use the following table to determine the node within the XML document that you want to start reading.

For example, if you use an XPath expression in the XMLName property for an XMLAdapter, reading begins at the first matching node.

• To read from the first found XML node, use the XMLAdapter class; the context node is the IXMLDOMElement property.
• To read all found XML nodes and use each node as a single record, use the XMLTable class; the context node is the XMLAdapter object.
• To read the first found XML node and use its text as a field value, use the XMLField class; the context node is the XMLTable object.

    The following methods do not support the use of XPath expressions in the XMLName property:

    • The ApplyDiffgram and ChangesToCursor methods do not support XPath expressions for XMLAdapter and XMLTable objects.
    • The ToCursor method does not support an XPath expression for XMLAdapter when the IsDiffgram property is set to True (.T.).
    • The ToXML method does not support XPath expressions for XMLAdapter and XMLTable objects and ignores XMLField objects that use XPath expressions.

    For more information about XPath expressions, see the XPath Reference in the Microsoft Core XML Services (MSXML) 4.0 SDK in the MSDN library at https://msdn.microsoft.com/library.

    Cursor to XML Functions

    Support for the following functions has been added to the OLE DB Provider for Visual FoxPro:

When used in the OLE DB Provider for Visual FoxPro, the _VFP VFPXMLProg property is not supported for the CURSORTOXML( ), XMLTOCURSOR( ), and XMLUPDATEGRAM( ) functions because the _VFP system variable is not supported in the OLE DB Provider.

    EXECSCRIPT Supported in the Visual FoxPro OLE DB Provider

    You can use the EXECSCRIPT( ) function with the Visual FoxPro OLE DB Provider. For more information, see EXECSCRIPT( ) Function.

    Returning a Rowset from a Cursor in the Visual FoxPro OLE DB Provider

You can use the new SETRESULTSET( ), GETRESULTSET( ), and CLEARRESULTSET( ) functions to mark a cursor or table that has been opened by the Visual FoxPro OLE DB Provider, retrieve the work area of the marked cursor, and clear the marker flag from a marked cursor. By marking a cursor or table in a database container (DBC) stored procedure, you can retrieve a rowset created from the marked cursor or table when the OLE DB Provider completes command execution.

For more information, see SETRESULTSET( ) Function, GETRESULTSET( ) Function, and CLEARRESULTSET( ) Function.
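As a sketch, a DBC stored procedure that returns a rowset through the provider might look like the following; the procedure, table, and field names are hypothetical:

```foxpro
* Sketch only: DBC stored procedure invoked through the OLE DB Provider.
PROCEDURE GetTopCustomers
   SELECT TOP 10 * FROM customers ;
      ORDER BY sales DESC INTO CURSOR crsTop
   SETRESULTSET("crsTop")  && mark the cursor; the provider returns it
                           && as a rowset when command execution completes
ENDPROC
```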

    See Also

    Reference

    Guide to Reporting Improvements
    SQL Language Improvements
    Class Enhancements
    Language Enhancements
    Interactive Development Environment (IDE) Enhancements
    Enhancements to Visual FoxPro Designers
    Miscellaneous Enhancements
    Changes in Functionality for the Current Release

    Other Resources

    What’s New in Visual FoxPro

    SQL Language Improvements

    • Article
    • 07/09/2007

    In this article

    1. Expanded Capacities
    2. Subquery Enhancements
    3. Sub-SELECT in the FROM Clause
    4. ORDER BY with Field Names in the UNION clause


    The SELECT – SQL Command and other SQL commands have been substantially enhanced in Visual FoxPro 9.0. This topic describes the enhancements made to these commands, and new commands that affect SQL performance.

    Expanded Capacities

Several SELECT – SQL command limitations have been removed or increased in Visual FoxPro 9.0. The following list describes the areas where limitations have been removed or increased.

• Number of joins and subqueries in a SELECT – SQL command: Visual FoxPro 9.0 removes the limit on the total number of join clauses and subqueries in a SELECT – SQL command. The previous limit was nine.
• Number of UNION clauses in a SELECT – SQL command: Visual FoxPro 9.0 removes the limit on the number of UNION clauses in a SQL SELECT statement. The previous limit was nine.
• Number of tables referenced in a SELECT – SQL command: Visual FoxPro 9.0 removes the limit on the number of tables and aliases referenced in a SQL SELECT statement. The previous limit was 30.
• Number of arguments in an IN( ) clause: Visual FoxPro 9.0 removes the limit of 24 values in the IN (Value_Set) clause for the WHERE clause. However, the number of values remains subject to the setting of SYS(3055) – FOR and WHERE Clause Complexity. For functionality changes concerning the IN clause, see Changes in Functionality for the Current Release.

    Subquery Enhancements

    Visual FoxPro 9.0 provides more flexibility in subqueries. For example, multiple subqueries are now supported. The following describes the enhancements to subqueries in Visual FoxPro 9.0.

    Multiple Subqueries

    Visual FoxPro 9.0 supports multiple subquery nesting, with correlation allowed to the immediate parent. There is no limit to the nesting depth. In Visual FoxPro 8.0, error 1842 (SQL: Subquery nesting is too deep) was generated when more than one level of subquery nesting occurred.

    The following is the general syntax for multiple subqueries.

    SELECT … WHERE … (SELECT … WHERE … (SELECT …) …) …

    Examples

The following example queries, which will generate an error in Visual FoxPro 8.0, are now supported in Visual FoxPro 9.0.

    CREATE CURSOR MyCursor (field1 I)
    INSERT INTO MyCursor VALUES (0)
    
    CREATE CURSOR MyCursor1 (field1 I)
    INSERT INTO MyCursor1 VALUES (1)
    
    CREATE CURSOR MyCursor2 (field1 I)
    INSERT INTO MyCursor2 VALUES (2)
    
    SELECT * FROM MyCursor T1 WHERE EXISTS ;
        (SELECT * from MyCursor1 T2 WHERE NOT EXISTS ;
        (SELECT * FROM MyCursor2 T3))
    
    *** Another multiple subquery nesting example ***
    SELECT * FROM table1 WHERE table1.iid IN ;
        (SELECT table2.itable1id FROM table2 WHERE table2.iID IN ;
        (SELECT table3.itable2id FROM table3 WHERE table3.cValue = "value"))
    
    

    GROUP BY in a Correlated Subquery

    Many queries can be evaluated by executing a subquery once and substituting the resulting value or values into the WHERE clause of the outer query. In queries that include a correlated subquery (also known as a repeating subquery), the subquery depends on the outer query for its values. This means that the subquery is executed repeatedly, once for each row that might be selected by the outer query.

Visual FoxPro 8.0 does not allow using GROUP BY in a correlated subquery and generates error 1828 (SQL: Illegal GROUP BY in subquery). Visual FoxPro 9.0 removes this limitation and supports GROUP BY in correlated subqueries, which are allowed to return more than one record.

    The following is the general syntax for the GROUP BY clause in a correlated subquery.

    SELECT … WHERE … (SELECT … WHERE … GROUP BY …) …

    Examples

The following example, which will generate an error in Visual FoxPro 8.0, is now supported in Visual FoxPro 9.0.

    CLOSE DATABASES ALL
    CREATE CURSOR MyCursor1 (field1 I, field2 I, field3 I)
    INSERT INTO MyCursor1 VALUES(1,2,3)
    CREATE CURSOR MyCursor2 (field1 I, field2 I, field3 I)
    INSERT INTO MyCursor2 VALUES(1,2,3)
    
    SELECT * from MyCursor1 T1 WHERE field1;
       IN (SELECT MAX(field1) FROM MyCursor2 T2 ;
       WHERE T2.field2=T1.FIELD2 GROUP BY field3)
    
    

    TOP N in a Non-Correlated Subquery

Visual FoxPro 9.0 supports the TOP N clause in a non-correlated subquery. The ORDER BY clause must be present when the TOP N clause is used; this is the only case where ORDER BY is allowed in a subquery.

    The following is the general syntax for the TOP N clause in a non-correlated subquery.

    SELECT … WHERE … (SELECT TOP nExpr [PERCENT] … FROM … ORDER BY …) …

    Examples

The following example, which will generate an error in Visual FoxPro 8.0, is now supported in Visual FoxPro 9.0.

    CLOSE DATABASES ALL
    CREATE CURSOR MyCursor1 (field1 I, field2 I, field3 I)
    INSERT INTO MyCursor1 VALUES(1,2,3)
    CREATE CURSOR MyCursor2 (field1 I, field2 I, field3 I)
    INSERT INTO MyCursor2 VALUES(1,2,3)
    
    SELECT * FROM MyCursor1 WHERE field1 ;
       IN (SELECT TOP 5 field2 FROM MyCursor2 order by field2)
    
    

    Subqueries in a SELECT List

Visual FoxPro 9.0 allows a subquery as a column or as part of an expression in a projection. A subquery in a projection has exactly the same requirements as a subquery used in a comparison operation. If a subquery does not return any records, a NULL value is returned.

    In Visual FoxPro 8.0, an attempt to use a subquery as a column or a part of expression in a projection would generate error 1810 (SQL: Invalid use of subquery).

    The following is the general syntax for a subquery in a SELECT list.

    SELECT … (SELECT …) … FROM …

    Example

The following example, which will generate an error in Visual FoxPro 8.0, is now supported in Visual FoxPro 9.0.

    SELECT T1.field1, (SELECT field2 FROM MyCursor2 T2;
       WHERE T2.field1=T1.field1) FROM MyCursor1 T1
    
    

    Aggregate functions in a SELECT List of a Subquery

    In Visual FoxPro 9.0, aggregate functions are now supported in a SELECT list of a subquery compared using the comparison operators <, <=, >, >= followed by ALL, ANY, or SOME. See Considerations for SQL SELECT Statements for more information about aggregate functions.

    Example

The following example demonstrates the use of an aggregate function (the COUNT( ) function) in a SELECT list of a subquery.

    CLOSE DATABASES ALL 
    
    CREATE CURSOR MyCursor (FIELD1 i)
    INSERT INTO MyCursor VALUES (6)
    INSERT INTO MyCursor VALUES (0)
    INSERT INTO MyCursor VALUES (1)
    INSERT INTO MyCursor VALUES (2)
    INSERT INTO MyCursor VALUES (3)
    INSERT INTO MyCursor VALUES (4)
    INSERT INTO MyCursor VALUES (5)
    INSERT INTO MyCursor VALUES (-1)
    
    CREATE CURSOR MyCursor2 (FIELD2 i)
    INSERT INTO MyCursor2  VALUES (1)
    INSERT INTO MyCursor2  VALUES (2)
    INSERT INTO MyCursor2  VALUES (2)
    INSERT INTO MyCursor2  VALUES (3)
    INSERT INTO MyCursor2  VALUES (3)
    INSERT INTO MyCursor2  VALUES (3)
    INSERT INTO MyCursor2  VALUES (4)
    INSERT INTO MyCursor2  VALUES (4)
    INSERT INTO MyCursor2  VALUES (4)
    INSERT INTO MyCursor2  VALUES (4)
    
    SELECT * FROM MYCURSOR WHERE field1;
       < ALL (SELECT count(*) FROM MyCursor2 GROUP BY field2) ;
       INTO CURSOR MyCursor3
    BROWSE
    
    

    Correlated Subqueries Allow Complex Expressions to be Compared with Correlated Field

    In Visual FoxPro 8.0, correlated fields can only be referenced in the following forms:

    correlated field <comparison> local field

    -or-

    local field <comparison> correlated field

In Visual FoxPro 9.0, correlated fields support comparison with local expressions, as shown in the following forms:

    correlated field <comparison> local expression

    -or-

    local expression <comparison> correlated field

    A local expression must use at least one local field and cannot reference any outer (correlated) field.

    Example

In the following example, a local expression (MyCursor2.field2 / 2) is compared to a correlated field (MyCursor.field1).

    SELECT * FROM MyCursor ;
       WHERE EXISTS(SELECT * FROM MyCursor2  ;
       WHERE MyCursor2.field2 / 2 > MyCursor.field1)
    
    

Changes for Expressions Compared with Subqueries

In Visual FoxPro 8.0, the left part of a comparison using the comparison operators [NOT] IN, <, <=, =, ==, <>, !=, >=, >, ALL, ANY, or SOME with a subquery must reference one and only one table from the FROM clause. In the case of a comparison with a correlated subquery, the table must also be the correlated table.

    In Visual FoxPro 9.0, comparisons work in the following ways:

    • The expression on the left side of an IN comparison must reference at least one table from the FROM clause.
    • The left part for the conditions =, ==, <>, != followed by ALL, SOME, or ANY must reference at least one table from the FROM clause.
    • The left part for the condition >, >=, <, <= followed by ALL, SOME, or ANY (SELECT TOP…) must reference at least one table from the FROM clause.
    • The left part for the condition >, >=, <, <= followed by ALL, SOME, or ANY (SELECT <aggregate function>…) must reference at least one table from the FROM clause.
    • The left part for the condition >, >=, <, <= followed by ALL, SOME, or ANY (subquery with GROUP BY and/or HAVING) must reference at least one table from the FROM clause.

In Visual FoxPro 9.0, the left part of a comparison that does not match one of these cases (for example, when ALL, SOME, or ANY is not included) does not have to reference any table from the FROM clause.

    In all cases, the left part of the comparison is allowed to reference more than one table from the FROM clause. For a correlated subquery, the left part of the comparison does not have to reference the correlated table.

    Subquery in an UPDATE – SQL Command SET List

    In Visual FoxPro 9.0, the UPDATE – SQL Command now supports a subquery in the SET clause.

    A subquery in a SET clause has exactly the same requirements as a subquery used in a comparison operation. If the subquery does not return any records, the NULL value is returned.

    Only one subquery is allowed in a SET clause. If there is a subquery in the SET clause, subqueries in the WHERE clause are not allowed.

    The following is the general syntax for a subquery in the SET clause.

    UPDATE … SET … (SELECT …) …

    Example

The following example demonstrates the use of a subquery in the SET clause.

    CLOSE DATA
    CREATE CURSOR MyCursor1 (field1 I , field2 I NULL)
    
    INSERT INTO MyCursor1 VALUES (1,1)
    INSERT INTO MyCursor1 VALUES (2,2)
    INSERT INTO MyCursor1 VALUES (5,5)
    INSERT INTO MyCursor1 VALUES (6,6)
    INSERT INTO MyCursor1 VALUES (7,7)
    INSERT INTO MyCursor1 VALUES (8,8)
    INSERT INTO MyCursor1 VALUES (9,9)
    
    CREATE CURSOR MyCursor2 (field1 I , field2 I)
    
    INSERT INTO MyCursor2 VALUES (1,10)
    INSERT INTO MyCursor2 VALUES (2,20)
    INSERT INTO MyCursor2 VALUES (3,30)
    INSERT INTO MyCursor2 VALUES (4,40)
    INSERT INTO MyCursor2 VALUES (5,50)
    INSERT INTO MyCursor2 VALUES (6,60)
    INSERT INTO MyCursor2 VALUES (7,70)
    INSERT INTO MyCursor2 VALUES (8,80)
    
    UPDATE MyCursor1 SET field2=100+(SELECT field2 FROM MyCursor2 ;
      WHERE MyCursor2.field1=MyCursor1.field1) WHERE field1>5
    
    SELECT MyCursor1
    LIST
    
    

    Sub-SELECT in the FROM Clause

    A sub-SELECT is often referred to as a derived table. Derived tables are SELECT statements in the FROM clause referred to by an alias or a user-specified name. The result set of the SELECT in the FROM clause creates a table used by the outer SELECT statement. Visual FoxPro 9.0 permits the use of a subquery in the FROM clause.

A sub-SELECT must be enclosed in parentheses, and an alias is required. Correlation is not supported. A sub-SELECT has the same syntax limitations as the SELECT command, but not the subquery syntax limitations. All sub-SELECTs are executed before the topmost SELECT is evaluated.

    The following is the general syntax for a subquery in the FROM clause.

    SELECT … FROM (SELECT …) [AS] Alias…

    Example

The following example demonstrates the use of a subquery in the FROM clause.

    SELECT * FROM (SELECT * FROM MyCursor T1;
       WHERE field1 = (SELECT T2.field2 FROM MyCursor1 T2;
       WHERE T2.field1=T1.field2);
       UNION SELECT * FROM MyCursor2;
       ORDER BY 2 desc) AS subquery
    
    

Note that the following code will generate an error:

SELECT * FROM (SELECT TOP 5 field1 FROM MyCursor) ORDER BY field1

    ORDER BY with Field Names in the UNION clause

When using a UNION clause in Visual FoxPro 8.0, you must use numeric references in the ORDER BY clause. In Visual FoxPro 9.0, this restriction has been removed, and you can use field names.

The referenced fields must be present in the SELECT list (projection) of the last SELECT in the UNION; that projection is used for the ORDER BY operation.

    Example

The following example demonstrates the use of field names in the ORDER BY clause.

    CLOSE DATABASES all
    CREATE CURSOR MyCursor(field1 I,field2 I)
    INSERT INTO MyCursor values(1,6)
    INSERT INTO MyCursor values(2,5)
    INSERT INTO MyCursor values(3,4)
    
    SELECT field1, field2, .T. AS FLAG,1 FROM MyCursor;
       WHERE field1=1;
       UNION ;
       SELECT field1, field2, .T. AS FLAG,1 FROM MyCursor;
       WHERE field1=3;
       ORDER BY field2 ;
       INTO CURSOR TEMP READWRITE
    
    BROWSE NOWAIT
    
    

    Optimized TOP N Performance

In Visual FoxPro 8.0 and earlier versions, when using the TOP N [PERCENT] clause, all records are sorted and then the top N are extracted. In Visual FoxPro 9.0, performance has been improved by eliminating records that do not qualify for the TOP N from the sort process as early as possible.

    The TOP N optimization is done only if the SET ENGINEBEHAVIOR Command is set to 90.

Optimization requires that TOP N return no more than N records (this was not the case in Visual FoxPro 8.0 and earlier versions), which is enforced when SET ENGINEBEHAVIOR is set to 90.

    TOP N PERCENT cannot be optimized unless the whole result set can be read into memory at once.
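A minimal sketch of enabling the optimized behavior; the table and field names are hypothetical:

```foxpro
* Sketch only: TOP N optimization applies with engine behavior 90.
SET ENGINEBEHAVIOR 90
SELECT TOP 10 * FROM orders ;
   ORDER BY order_total DESC INTO CURSOR crsTop
```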

    Improved Optimization for Multiple Table OR Conditions

Visual FoxPro 9.0 provides improved Rushmore optimization for multiple-table OR conditions. Visual FoxPro uses multi-table OR conditions to Rushmore-optimize filter conditions for a table as long as both sides of the condition can be optimized. The following example shows this:

    CLEAR
    CREATE CURSOR Test1 (f1 I)
    FOR i=1 TO 20
      INSERT INTO Test1 VALUES (I)
    NEXT 
    INDEX ON f1 TAG f1
    CREATE CURSOR Test2 (f2 I)
    FOR i=1 TO 20
      INSERT INTO Test2 VALUES (I)
    NEXT
    INDEX ON f2 TAG f2
    SYS(3054,12)
    SELECT * from Test1, Test2 WHERE (f1 IN (1,2,3) AND f2 IN (17,18,19)) OR ;
      (f2 IN (1,2,3) AND f1 IN (17,18,19)) INTO CURSOR Result
    SYS(3054,0)
    
    

In this scenario, table Test1 can be Rushmore optimized using the following condition:

(f1 IN (1,2,3) OR f1 IN (17,18,19))

and table Test2 using the following:

(f2 IN (17,18,19) OR f2 IN (1,2,3))

    Support for Local Buffered Data

At times it can be beneficial to use SELECT – SQL to select records from a locally buffered cursor whose table has not yet been updated. When creating controls such as grids, list boxes, and combo boxes, it is often necessary to consider newly added records that have not yet been committed to disk. By default, SQL statements are based on content that is already committed to disk.

    Visual FoxPro 9.0 provides language enhancements that allow you to specify if the data returned by a SELECT – SQL command is based on buffered data or data written directly to disk.

    The SELECT – SQL command now supports a WITH … BUFFERING clause that lets you specify if retrieved data is based on buffered data or data written directly to disk. For more information, see SELECT – SQL Command – WITH Clause.

If you do not include the BUFFERING clause, the retrieved data is determined by the setting of the SET SQLBUFFERING command. For more information, see the SET SQLBUFFERING Command.
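A minimal sketch, assuming a table-buffered alias named orders; see the SELECT – SQL Command – WITH Clause topic for the exact syntax:

```foxpro
* Sketch only: include buffered, not-yet-committed rows in the result.
CURSORSETPROP("Buffering", 5, "orders")  && enable table buffering
INSERT INTO orders (order_id) VALUES (999)  && buffered, not on disk yet
SELECT * FROM orders WITH (BUFFERING = .T.) INTO CURSOR crsAll
```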

    Enhancements to other SQL Commands

The following sections describe enhancements made to the INSERT – SQL Command, UPDATE – SQL Command, and DELETE – SQL Command in Visual FoxPro 9.0.

    UNION Clause in the INSERT – SQL Command

    In Visual FoxPro 9.0, a UNION clause is now supported in the INSERT – SQL Command.

    The following is the general syntax for the UNION clause.

    INSERT INTO … SELECT … FROM … [UNION SELECT … [UNION …]]

    Example

The following example demonstrates the use of a UNION clause in INSERT – SQL.

    CLOSE DATABASES ALL
    CREATE CURSOR MyCursor (field1 I,field2 I)
    CREATE CURSOR MyCursor1 (field1 I,field2 I)
    CREATE CURSOR MyCursor2 (field1 I,field2 I)
    
    INSERT INTO MyCursor1 VALUES (1,1)
    INSERT INTO MyCursor2 VALUES (2,2)
    
    INSERT INTO MyCursor SELECT * FROM MyCursor1 UNION SELECT * FROM MyCursor2
    
    SELECT MyCursor
    LIST
    
    

    Correlated UPDATE – SQL Commands

    Visual FoxPro 9.0 now supports correlated updates with the UPDATE – SQL Command.

If a FROM clause is included in the UPDATE – SQL command, the name after the UPDATE keyword defines the target for the update operation. This name can be a table name, an alias, or a file name. The following logic is used to select the target table:

    • If the name matches an implicit or explicit alias for a table in the FROM clause, then the table is used as a target for the update operation.
    • If the name matches the alias for the cursor in the current data session, then the cursor is used as a target.
    • A table or file with the same name is used as a target.

The UPDATE – SQL command FROM clause has the same syntax as the FROM clause in the SELECT – SQL command, with the following limitations:

    • The target table or cursor cannot be involved in an OUTER JOIN as a secondary table.
    • The target cursor cannot be a subquery result.
• It must be possible to evaluate all other JOINs before joining the target table.

    The following is the general syntax for a correlated UPDATE command.

    UPDATE … SET … FROM … WHERE …

    Example

The following example demonstrates a correlated update using the UPDATE – SQL command.

    CLOSE DATABASES ALL
    
    CREATE CURSOR MyCursor1 (field1 I , field2 I NULL,field3 I NULL)
    INSERT INTO MyCursor1 VALUES (1,1,0)
    INSERT INTO MyCursor1 VALUES (2,2,0)
    INSERT INTO MyCursor1 VALUES (5,5,0)
    INSERT INTO MyCursor1 VALUES (6,6,0)
    INSERT INTO MyCursor1 VALUES (7,7,0)
    INSERT INTO MyCursor1 VALUES (8,8,0)
    INSERT INTO MyCursor1 VALUES (9,9,0)
    
    CREATE CURSOR MyCursor2 (field1 I , field2 I)
    INSERT INTO MyCursor2 VALUES (1,10)
    INSERT INTO MyCursor2 VALUES (2,20)
    INSERT INTO MyCursor2 VALUES (3,30)
    INSERT INTO MyCursor2 VALUES (4,40)
    INSERT INTO MyCursor2 VALUES (5,50)
    INSERT INTO MyCursor2 VALUES (6,60)
    INSERT INTO MyCursor2 VALUES (7,70)
    INSERT INTO MyCursor2 VALUES (8,80)
    
    CREATE CURSOR MyCursor3 (field1 I , field2 I)
    INSERT INTO MyCursor3 VALUES (6,600)
    INSERT INTO MyCursor3 VALUES (7,700)
    
    UPDATE MyCursor1 SET MyCursor1.field2=MyCursor2.field2, field3=MyCursor2.field2*10 FROM MyCursor2 ;
      WHERE MyCursor1.field1>5 AND MyCursor2.field1=MyCursor1.field1
    
    SELECT MyCursor1
    LIST
    
    UPDATE MyCursor1 SET MyCursor1.field2=MyCursor3.field2 FROM MyCursor2, MyCursor3  ;
      WHERE MyCursor1.field1>5 AND MyCursor2.field1=MyCursor1.field1 AND MyCursor2.field1=MyCursor3.field1
    
    SELECT MyCursor1
    LIST
    
    

    Correlated DELETE – SQL Commands

    Visual FoxPro 9.0 now supports correlated deletions with the DELETE – SQL Command.

If the FROM clause has more than one table, the name after the DELETE keyword is required, and it defines the target for the delete operation. This name can be a table name, an alias, or a file name. The following logic is used to select the target table:

• If the name matches an implicit or explicit alias for a table in the FROM clause, then the table is used as a target for the delete operation.
    • If the name matches the alias for the cursor in the current data session, then the cursor is used as a target.
    • A table or file with the same name is used as a target.

The DELETE – SQL command FROM clause has the same syntax as the FROM clause in the SELECT – SQL command, with the following limitations:

    • The target table or cursor cannot be involved in an OUTER JOIN as a secondary table.
    • The target cursor cannot be a subquery result.
    • It should be possible to evaluate all other JOINs before joining the target table.

    The following is the general syntax for a correlated DELETE command.

    DELETE [alias] FROM alias1 [, alias2 … ] … WHERE …

    Example

The following example demonstrates a correlated deletion using the DELETE – SQL command.

    CLOSE DATABASES ALL 
    
    CREATE CURSOR MyCursor1 (field1 I , field2 I NULL,field3 I NULL)
    INSERT INTO MyCursor1 VALUES (1,1,0)
    INSERT INTO MyCursor1 VALUES (2,2,0)
    INSERT INTO MyCursor1 VALUES (5,5,0)
    INSERT INTO MyCursor1 VALUES (6,6,0)
    INSERT INTO MyCursor1 VALUES (7,7,0)
    INSERT INTO MyCursor1 VALUES (8,8,0)
    INSERT INTO MyCursor1 VALUES (9,9,0)
    
    CREATE CURSOR MyCursor2 (field1 I , field2 I)
    INSERT INTO MyCursor2 VALUES (1,10)
    INSERT INTO MyCursor2 VALUES (2,20)
    INSERT INTO MyCursor2 VALUES (3,30)
    INSERT INTO MyCursor2 VALUES (4,40)
    INSERT INTO MyCursor2 VALUES (5,50)
    INSERT INTO MyCursor2 VALUES (6,60)
    INSERT INTO MyCursor2 VALUES (7,70)
    INSERT INTO MyCursor2 VALUES (8,80)
    
    CREATE CURSOR MyCursor3 (field1 I , field2 I)
    INSERT INTO MyCursor3 VALUES (6,600)
    INSERT INTO MyCursor3 VALUES (7,700)
    
    DELETE MyCursor1 FROM MyCursor2  ;
      WHERE MyCursor1.field1>5 AND MyCursor2.field1=MyCursor1.field1
    
    SELECT MyCursor1
    LIST
    RECALL ALL
    
    DELETE MyCursor1 FROM MyCursor2, MyCursor3  ;
      WHERE MyCursor1.field1>5 AND MyCursor2.field1=MyCursor1.field1 AND MyCursor2.field1=MyCursor3.field1
    
    SELECT MyCursor1
    LIST
    RECALL ALL
    
    DELETE FROM MyCursor1 WHERE MyCursor1.field1>5
    
    SELECT MyCursor1
    list
    RECALL ALL
    
    DELETE MyCursor1 from MyCursor1 WHERE MyCursor1.field1>5
    
    RECALL ALL IN MyCursor1
    
    DELETE T1 ;
      FROM MyCursor1 T1 JOIN MyCursor2 ON T1.field1>5 AND MyCursor2.field1=T1.field1, MyCursor3  ;
      WHERE MyCursor2.field1=MyCursor3.field1
    
    RECALL ALL IN MyCursor1
    
    

    Updatable Fields in UPDATE – SQL Command

    The number of fields that can be updated with the UPDATE – SQL Command is no longer limited to 128 as in prior versions of Visual FoxPro. You are now limited to 255, which is the number of fields available in a table.

    SET ENGINEBEHAVIOR

    The SET ENGINEBEHAVIOR Command has a new Visual FoxPro 9.0 option, 90, that affects SELECT – SQL command behavior for the TOP N clause and aggregate functions. For additional information, see the SET ENGINEBEHAVIOR Command.

    Data Type Conversion

    Conversion between data types (for example, conversion between memo and character fields) has been improved in Visual FoxPro 9.0. This conversion improvement applies to the ALTER TABLE – SQL Command with the COLUMN option as well as structural changes made with the Table Designer.

    See Also

    Reference

    Guide to Reporting Improvements
    Data and XML Feature Enhancements
    Class Enhancements
    Language Enhancements
    Interactive Development Environment (IDE) Enhancements
    Enhancements to Visual FoxPro Designers
    Miscellaneous Enhancements
    Changes in Functionality for the Current Release
    ALTER TABLE – SQL Command
    SET SQLBUFFERING Command
    SET ENGINEBEHAVIOR Command

    Other Resources

    What’s New in Visual FoxPro

    Class Enhancements

    • Article
    • 07/09/2007

    In this article

    1. Anchoring Visual Controls
    2. Docking Forms
    3. CheckBox and OptionButton Controls Support Wordwrapping
    4. CommandButton Controls Can Align Text with Pictures


    Visual FoxPro contains the following enhancements to classes, forms, controls and object-oriented related syntax.

    Anchoring Visual Controls

    You can anchor a visual control to one or more edges of its parent container using the control’s Anchor property. When you anchor a visual control to the parent container, the edges of the control remain in the same position relative to the edges of the container when you resize the container. For more information, see Anchor Property.
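    The following sketch, assuming the standard additive Anchor bit values (1 = top, 2 = left, 4 = right, 8 = bottom) and a hypothetical control name, anchors a text box so it stretches horizontally with its form:

    oForm = CREATEOBJECT("Form")
    oForm.AddObject("txtNotes", "TextBox")
    && 1 + 2 + 4 = 7: anchor to the top, left, and right edges so the
    && text box widens and narrows as the form is resized.
    oForm.txtNotes.Anchor = 7
    oForm.txtNotes.Visible = .T.
    oForm.Show()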

    Docking Forms

    Visual FoxPro extends docking support to user-defined forms. Docking forms works similarly to docking toolbars except that you can dock forms to Visual FoxPro Interactive Development Environment (IDE) windows and other forms, and controls on the form can still obtain focus when the form is docked.

    Visual FoxPro includes several new and updated properties, methods, and events to support docking forms.

    For more information, see How to: Dock Forms.

    CheckBox and OptionButton Controls Support Wordwrapping

    The WordWrap property is now supported for CheckBox and OptionButton controls. The text portions of these controls now use wordwrapping. For more information, see WordWrap Property.

    CommandButton Controls Can Align Text with Pictures

    The Alignment property now applies to CommandButton controls when specifying an image for the Picture property and setting the PicturePosition property to a value other than the default. The Alignment property also contains new and revised settings for CommandButton, CheckBox, and OptionButton controls. For more information, see Alignment Property.

    CommandButton, OptionButton, and CheckBox Controls Can Hide Captions

    The PicturePosition property contains a new setting of 14 (No text) for CommandButton, OptionButton, and CheckBox controls. You can use this setting to hide the text portions of these controls without needing to set the Caption property to an empty string. This setting is particularly useful when you want to include a hotkey for a button with a graphic without displaying the Caption text. You must set the Style property to 1 (Graphical) for this new setting to apply.

    In addition, the PicturePosition property now applies to CheckBox and OptionButton controls when Style is set to 1 (Graphical).

    For more information, see PicturePosition Property.

    PictureMargin and PictureSpacing Properties Control Spacing and Margins on CommandButton, OptionButton, and CheckBox Controls

    You can better control positioning of images on CommandButton, OptionButton, and CheckBox controls with the new PictureMargin and PictureSpacing properties. The PictureMargin property specifies margin spacing in pixels between an image and the control’s border as determined by the PicturePosition property. The PictureSpacing property specifies margin spacing in pixels between an image and text on the control.

    For more information, see PictureMargin Property and PictureSpacing Property.

    Collection Objects Support in ComboBox and ListBox Controls

    You can now specify Collection objects as the row source and row source type for the RowSource and RowSourceType properties of ComboBox and ListBox controls. For more information, see RowSource Property and RowSourceType Property.

    Setting Ascending or Descending Indexes on Cursors in the DataEnvironment

    You can specify ascending or descending order for a cursor index by using the new OrderDirection property for Cursor objects.

     Note

    OrderDirection is disregarded when the cursor’s Order property is empty.

    For more information, see OrderDirection Property.

    Grid Supports Rushmore Optimization

    The Grid Control can be set to support Rushmore optimization if the underlying data source contains indexes that support this.

    For more information, see Optimize Property.

    Mouse Pointer Control for Grid Columns and Column Headers

    The MousePointer and MouseIcon properties now apply to Column objects in a grid and Header objects in a column. For the MousePointer property, you can specify the new setting of 16 (Down Arrow) to reset the mouse pointer for a column header to the default down arrow.

    For more information, see MousePointer Property and MouseIcon Property.

    Rotating Label, Line, and Shape Controls

    You can use the new Rotation property to rotate Label controls. The Rotation property applies to Line and Shape controls when used with the new PolyPoints property. For more information, see Rotation Property (Visual FoxPro), PolyPoints Property, and Creating More Complex Shapes using the PolyPoints Property.

    Label Controls Can Display Themed Background

    For Label controls, you can set the Style property to Themed Background Only to show only themed background colors when Windows themes are turned on. The label background color is the same as the parent container for the label. For more information, see Style Property.

    ListBox Controls Can Hide Scroll Bars

    You can use the new AutoHideScrollBar property for ListBox controls to hide scroll bars when the list contains less than the number of items that can be visible in the list box. For more information, see AutoHideScrollBar Property.

    Toolbar Controls Can Display Horizontal Separator Objects

    For Separator objects, set the Style property to 1 to display a horizontal line or a vertical line, depending on how the toolbar appears. If the toolbar appears horizontally, the line displays vertically. If the toolbar appears vertically, the line displays horizontally. In versions prior to this release, setting Style to 1 displayed a vertical line only.

     Note

    In versions prior to this release, undocked vertical system and user-defined toolbars did not display horizontal separators. In the current release, horizontal separators now display for vertical undocked toolbars.

    For more information, see Style Property.

    Toolbar Controls Can Hide Separator Objects

    The Visible property now applies to Separator objects so you can control whether a Separator object displays in Toolbar controls. When used in combination with the Style property, the separator’s Visible property determines whether a space or line displays as the separator when its Style property is set to 0 (Normal – do not display a line) or 1 (display a horizontal or vertical line), respectively.

    For more information, see Visible Property (Visual FoxPro).

    Creating More Complex Shapes

    You can use the new PolyPoints property for Line and Shape controls to create polygon lines and shapes. PolyPoints specifies an array of any dimension containing coordinates with the format of X1, Y1, X2, Y2, …, organized in the order in which the polygon line or shape is drawn.

    For Line controls, when you create a polygon line using the PolyPoints property, you can specify the new setting of “S” or “s” for the LineSlant property to create a Bezier curve.

    For more information, see PolyPoints Property and LineSlant Property.

    ComboBox Controls Can Hide Drop-Down Lists

    You can now use the NODEFAULT command in the DropDown event for a ComboBox control. This will prevent the drop-down list portion of a ComboBox control from appearing. For more information, see NODEFAULT Command.
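    A minimal sketch of this technique, using a hypothetical condition (suppress the list when it is empty):

    DEFINE CLASS QuietCombo AS ComboBox
       PROCEDURE DropDown
          && NODEFAULT prevents the drop-down list portion from appearing,
          && here only when there is nothing to show.
          IF THIS.ListCount = 0
             NODEFAULT
          ENDIF
       ENDPROC
    ENDDEFINE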

    NEWOBJECT( ) Creates Objects without Raising Initialization Code

    To mimic the behavior of a class opened in Class Designer or Form Designer, pass 0 to the cInApplication parameter. This feature allows you to create design-time tools that view the structure of a class.

    By passing 0 to the cInApplication parameter for the NEWOBJECT( ) function, Visual FoxPro allows you to create an instance of a class without raising initialization code (such as code in the Init, Load, Activate, and BeforeOpenTables events). Furthermore, when the object is released, it does not raise its destructor code (such as code in the Destroy and Unload events). Only initialization and destructor code is suppressed; code in other events and methods is still called.

    If you use the cInApplication parameter to suppress initialization and destructor code in an object, you also suppress it in the object’s child objects.

    This behavior is not supported for the NewObject Method.

    For more information, see NEWOBJECT( ) Function.

    Specify Where Focus is Assigned in the Valid Event

    To direct where focus is assigned, you can use the optional ObjectName parameter in the RETURN command of the Valid event. The object specified must be a valid Visual FoxPro object. If the specified object is disabled or cannot receive focus, then focus is assigned to the next object in the tab order. If an invalid object is specified, Visual FoxPro keeps the focus at the current object.

    You can now set focus to objects in the following scenarios:

    • Set focus to an object on another visible form.
    • Set focus to an object on a non-visible Page or Pageframe control.

    For more information, see Valid Event.
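    As a sketch, assuming hypothetical control names txtAmount and txtRetry on the same form, a Valid event could redirect focus like this:

    && In the Valid event of txtAmount:
    IF VAL(THIS.Value) < 0
       MESSAGEBOX("Amount cannot be negative.")
       RETURN "THISFORM.txtRetry"  && send focus to another control
    ENDIF
    RETURN .T.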

    TextBox Controls Have Auto-Completion Functionality

    You can add auto-completion functionality to your text box controls to make data entry more efficient. Auto-completion is the automatic display of a drop-down list of entries that match the string as it is typed into the text box. The entries come from a special table that tracks unique values entered into the text box, the control name that is the source of the value, and usage information.

    The AutoComplete, AutoCompTable, and AutoCompSource properties support auto-completion.

    By setting the AutoComplete property, you determine the sort order for the entries. If you want more control over the list and where it is stored, you can use the AutoCompTable property to specify the table used to populate the automatic list. By default, the table is AUTOCOMP.DBF. You can use one table for each text box control, or a single table can populate automatic lists for several text boxes.

    If you use a single table, which is the default, the table uses values in the Source field for each entry to identify the text box control associated with the entry. By default, the Source field value is the name of the text box control. You can specify the Source field value using the AutoCompSource property of the text box. For example, you might want to make the same set of entries available to multiple Text box controls within the application such as address information. You can explicitly set the AutoCompTable and AutoCompSource properties for each of the controls to the same table and source field value. The same automatic list appears for each of them.

    The text box control handles updating the auto-completion table for you based on the values actually entered in the text box. If you want to remove a value from the list, enter a string in a text box that matches the string you want to delete to filter the list for it. Select the entry in the list and press the DELETE key. The string remains in the table but no longer appears in the automatic list.

     Note

    You can control the number of items that appear in the drop-down list using SYS(2910) – List Display Count.

    For more information, see AutoComplete Property, AutoCompSource Property, and AutoCompTable Property.

    New InputMask and Format Property Settings

    The following new InputMask and Format settings are available:

    InputMask Property

    • U – Permits alphabetic characters only and converts them to uppercase (A – Z).
    • W – Permits alphabetic characters only and converts them to lowercase (a – z).

    Format Property

    • Z – Displays the value as blank if it is 0, except when the control has focus. Dates and DateTimes are also supported in these controls; the date and datetime delimiters are not displayed unless the control has focus.

    For more information, see InputMask Property and Format Property.
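    A brief sketch of the new settings, using hypothetical text boxes on a form:

    && Force product codes to uppercase letters only.
    THISFORM.txtCode.InputMask = "UUUUU"

    && Show a blank instead of 0 when the control does not have focus.
    THISFORM.txtQty.Format = "Z"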

    Use PictureVal Property to Pass Images as Strings

    The Image control’s new PictureVal property can be used instead of the Picture Property (Visual FoxPro) to specify an image as a character string expression or as an object. For an object, the format must be an IPicture interface format compatible with the LOADPICTURE( ) Function.

    For more information, see PictureVal Property.

    CLEAR CLASSLIB Updated

    The CLEAR CLASSLIB command now automatically executes a CLEAR CLASS command on each class in the specified class library. Any errors that might occur during release of individual classes (e.g., class in use) are ignored.

     Note

    Classes in other class libraries that are used or referenced by a class in the specified class library are not cleared.

    For more information, see CLEAR Commands.

    Screen Resolution Limit Increased

    In prior versions of Visual FoxPro, the definable maximum area for a form is limited to twice the screen resolution in both the X and Y dimensions. For example, if your monitor resolution is 1280×1024, the maximum dimensions would be:

    Form.Width = 2552
    Form.Height = 2014
    
    

    Additionally, if you attempted to set the Width and Height properties to these limits at design time and then ran the form, you would see that the values reverted to the screen resolution limits (because they were saved this way):

    Form.Width = 1280
    Form.Height = 998
    
    

    In Visual FoxPro 9.0, this limit has been increased to approximately 32,000 pixels in each dimension, which allows more flexibility for certain forms, such as scrollable ones:

    Form.Width = 32759
    Form.Height = 32733
    
    

    For more information, see Width Property and Height Property.

    ProjectHook Source Code Control Events

    New events have been added to the ProjectHook class, which allow you to perform source code control operations such as check-in and check-out of multiple files at once.

    For more information, see SCCInit Event and SCCDestroy Event.

    AddProperty Method Supports Design Time Settings

    You can specify the visibility (Protected, Hidden, or Public) and description of a property using the AddProperty method with new available parameters. These settings can also be controlled from the New Property Dialog Box and Edit Property/Method Dialog Box. For more information, see AddProperty Method.

    WriteMethod Method Supports Design Time Settings

    You can specify the visibility (Protected, Hidden, or Public) and description of a method using the WriteMethod method with new available parameters. These settings can also be controlled from the New Property Dialog Box and Edit Property/Method Dialog Box. For more information, see WriteMethod Method.

    See Also

    Reference

    Guide to Reporting Improvements
    Data and XML Feature Enhancements
    SQL Language Improvements
    Language Enhancements
    Interactive Development Environment (IDE) Enhancements
    Enhancements to Visual FoxPro Designers
    Miscellaneous Enhancements
    Changes in Functionality for the Current Release

    Language Enhancements

    • Article
    • 07/09/2007

    In this article

    1. Class Enhancements
    2. Data and XML Enhancements
    3. IDE Enhancements
    4. Printing and Reporting Enhancements


    In the current release of Visual FoxPro, you will find enhanced functionality through new and enhanced commands and functions. This topic describes miscellaneous language additions and improvements.

    Class Enhancements

    Visual FoxPro contains significant language enhancements for classes, forms, controls, and object-oriented related features. For more information, see Class Enhancements.

    Data and XML Enhancements

    Visual FoxPro contains significant language enhancements for Data, SQL and XML features. For more information, see SQL Language Improvements and Data and XML Feature Enhancements.

    IDE Enhancements

    Visual FoxPro contains a number of language enhancements for features related to the IDE (Interactive Development Environment). For more information, see Interactive Development Environment (IDE) Enhancements and Enhancements to Visual FoxPro Designers.

    Printing and Reporting Enhancements

    Visual FoxPro contains a number of language enhancements to support new Reporting functionality.

    Additionally, there are improvements to the following related Printing language elements:

    • SYS(1037) – Page Setup Dialog Box – Displays the Visual FoxPro default or report Page Setup dialog box, or sets printer settings for the default printer in Visual FoxPro or for the report printer environment. In this version, a new nValue parameter is available.
    • APRINTERS( ) Function – Returns a five-column array with the name of the printer, connected port, driver, comment, and location. The last three columns are available only if the new optional parameter is passed.
    • GETFONT( ) Function – Contains an additional setting to display only those fonts available on the current default printer, and clarifies values for the language script.

    New Reporting functionality is described in more detail in separate Reporting topics. For more information, see Guide to Reporting Improvements.

    Specifying Arrays with More Than 65K Elements

    You can now specify arrays containing more than 65,000 elements, for example, when using the DIMENSION command. Normal arrays and member arrays have a new limit of 2GB. Arrays containing member objects retain a limit of 65,000 elements.

     Note

    Array sizes can also be limited by available memory, which affects performance, especially for very large arrays. Make sure your computer has enough memory to accommodate the upper limits of your arrays.

    The Library Construction Kit, which contains the files Pro_Ext.h, WinAPIMS.lib, and OcxAPI.lib, still has a limit of 65,000 elements. For more information about these files, see Accessing the Visual FoxPro API, How to: Add Visual FoxPro API Calls, and How to: Build and Debug Libraries and ActiveX Controls. The SAVE TO command does not support saving arrays larger than 65,000 elements.

    For more information, see Visual FoxPro System Capacities and DIMENSION Command.
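    For example, an array well beyond the old 65,000-element ceiling can now be declared directly (memory permitting):

    DIMENSION aReadings(1000000)  && one million elements
    aReadings(1000000) = 42
    ? aReadings(1000000)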

    STACKSIZE Setting Increases Nesting Levels to 64k

    For operations such as the DO command, you can change the default number of nesting levels (128) to anywhere from 32 up to 64,000 levels by including the new STACKSIZE setting in a Visual FoxPro configuration file.

     Note

    You can change the nesting level only during Visual FoxPro startup.

    For more information, see Special Terms for Configuration Files and Visual FoxPro System Capacities.
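    As a sketch, a Config.fpw entry raising the nesting limit might look like this (16,000 is an arbitrary value within the 32–64,000 range):

    && In Config.fpw:
    STACKSIZE = 16000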

    Program and Procedure File Size Is Unrestricted

    In previous versions of Visual FoxPro, the size of a procedure or program could not exceed 65K. Visual FoxPro now removes this restriction for programs and procedures. For more information, see Visual FoxPro System Capacities.

    PROGCACHE Configuration File Setting

    In previous versions of Visual FoxPro, you could not specify the program cache size or the amount of memory reserved to run programs. The new PROGCACHE configuration file setting allows you to control this. It is especially useful for MTDLL scenarios. For more information, see Special Terms for Configuration Files.

    ICASE( ) Function

    The new ICASE( ) function makes it possible for you to evaluate a list of conditions and return results depending on the result of evaluating those conditions. For more information, see ICASE( ) Function.
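    A short sketch of ICASE( ) with hypothetical values; condition/result pairs are tested in order, and the final argument is the fallback:

    nTemp = 75
    cRange = ICASE(nTemp < 32, "freezing", nTemp < 70, "cool", "warm")
    ? cRange  && displays "warm"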

    TTOC( ) Converts DateTime Expressions to XML DateTime Format

    You can convert a DateTime expression into a character string in XML DateTime format by passing a new optional value of 3 to the TTOC( ) function. For more information, see TTOC( ) Function.
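    For example, converting a DateTime value to the XML dateTime layout:

    tStamp = DATETIME(2005, 1, 15, 13, 30, 0)
    && Passing 3 should yield the XML dateTime format,
    && e.g. 2005-01-15T13:30:00
    ? TTOC(tStamp, 3)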

    SET COVERAGE Command Available at Run Time

    The SET COVERAGE command is now available at run time so that you can debug errors that occur at run time but not at design time. For more information, see SET COVERAGE Command.

    CLEAR ERROR Command

    The new ERROR clause for the CLEAR command makes it possible to reset the error structures as if no error occurred. This affects the following functions:

    • The AERROR( ) function will return 0.
    • The ERROR( ) function will return 0.
    • MESSAGE( ), MESSAGE(1), and SYS(2018) will return a blank string.

    The CLEAR command should not be used with the ERROR clause within a TRY…CATCH…FINALLY structure. For more information, see CLEAR Commands.

    Write Options Dialog Settings to Registry Using SYS(3056)

    The SYS(3056) function can now be used to write out settings from the Options dialog box to the registry.

    SYS(3056 [, nValue ])

    The following values are available for nValue.

    • 1 – Update only from registry settings, with the exception of SET commands and file locations.
    • 2 – Write out settings to the registry.

    For more information, see SYS(3056) – Read Registry Settings.

    FOR EACH … ENDFOR Command Preserves Object Types

    Visual FoxPro now includes the FOXOBJECT keyword for the FOR EACH … ENDFOR command to support preservation of native Visual FoxPro object types.

    FOR EACH objectVar [AS Type [OF ClassLibrary]] IN Group FOXOBJECT
       Commands
       [EXIT]
       [LOOP]
    ENDFOR | NEXT [Var]

    The FOXOBJECT keyword specifies that the objectVar parameter created will be a Visual FoxPro object. The FOXOBJECT keyword only applies to collections where the collection is based on a native Visual FoxPro Collection class. Collections that are COM-based will not support the FOXOBJECT keyword.

    For more information, see FOR EACH … ENDFOR Command.
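    A minimal sketch iterating a native Collection with the FOXOBJECT keyword:

    oCol = CREATEOBJECT("Collection")
    oCol.Add("alpha")
    oCol.Add("beta")

    FOR EACH cItem IN oCol FOXOBJECT
       ? cItem  && each item keeps its native Visual FoxPro type
    ENDFOR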

    SET PATH Command Enhancements

    The SET PATH command now supports the ADDITIVE keyword. The ADDITIVE keyword appends the specified path to the end of the current SET PATH list. If the path already exists in the SET PATH list, Visual FoxPro does not add it or change the order of the list. Paths specified with the ADDITIVE keyword must be strings in quotes or valid expressions.

    In addition, the length of the SET PATH list has been increased to 4095 characters.

    For more information, see SET PATH Command.
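    For example (hypothetical directories):

    SET PATH TO "C:\MyApp\Data"
    SET PATH TO "C:\MyApp\Libs" ADDITIVE  && appended to the existing list
    ? SET("PATH")  && both directories appear in the search path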

    Trim Functions Control Which Characters Are Trimmed

    It is now possible to specify which characters are trimmed from an expression when using the TRIM( ), LTRIM( ), RTRIM( ), and ALLTRIM( ) functions.

    TRIM(cExpression[, nFlags] [, cParseChar [, cParseChar2 [, ...]]])

    LTRIM(cExpression[, nFlags] [, cParseChar [, cParseChar2 [, ...]]])

    RTRIM(cExpression[, nFlags] [, cParseChar [, cParseChar2 [, ...]]])

    ALLTRIM(cExpression[, nFlags] [, cParseChar [, cParseChar2 [, ...]]])

    You can specify whether the trim is case-sensitive or case-insensitive by passing an nFlags value of 0 or 1, respectively.

    The cParseChar parameter specifies one or more character strings to be trimmed from cExpression. A maximum of 23 strings can be specified in cParseChar.

    By default, if cParseChar is not specified, then leading and trailing spaces are trimmed from character strings or 0 bytes are removed for Varbinary data types.

    The cParseChar parameters are applied in the order they are entered. When a match is found, cExpression is truncated and the process repeats from the first cParseChar parameter.

    For more information, see the TRIM( ) Function, LTRIM( ) Function, RTRIM( ) Function, and ALLTRIM( ) Function topics.
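    A brief sketch of trimming specific characters; given the repeat-from-first behavior described above, this should strip all leading and trailing hyphens and asterisks:

    ? ALLTRIM("--*Report*--", 0, "-", "*")  && expected result: Report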

    ALINES( ) Offers More Flexible Parsing Options

    The ALINES( ) function has been enhanced to provide several additional options, such as case-insensitive parsing and improved handling of empty array elements. These options are available using the new nFlags parameter, which replaces the older third parameter, lTrim. For more information, see ALINES( ) Function.

    Improvements in TEXT…ENDTEXT Statement

    You can use the TEXT…ENDTEXT command to eliminate line feeds using the new PRETEXT setting. A new FLAGS parameter controls additional output settings. For more information, see TEXT … ENDTEXT Command.

    Include Delimiters in STREXTRACT( ) Results

    The STREXTRACT( ) function has a new nFlags setting that allows you to include the specified delimiters with the returned expression. For more information, see STREXTRACT( ) Function.

    STRCONV( ) Enhanced to Allow for Code Page and FontCharSet

    For certain conversion settings, you can specify an optional Code Page or Fontcharset setting for use in the conversion. For more information, see STRCONV( ) Function.

    TYPE( ) Determines if an Expression is an Array

    The TYPE( ) function accepts an optional second parameter of 1 to evaluate whether an expression is an array.

    TYPE(cExpression, 1)

    The following character values are returned if the 1 parameter is specified.

    • A – cExpression is an array.
    • U – cExpression is not an array.
    • C – cExpression is a collection.

    cExpression must be passed as a character string.

    For more information, see TYPE( ) Function.
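    For example, using the return values listed above:

    DIMENSION aData(3)
    oCol = CREATEOBJECT("Collection")
    ? TYPE("aData", 1)  && A – aData is an array
    ? TYPE("oCol", 1)   && C – oCol is a collection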

    BINTOC( ) and CTOBIN( ) Have Additional Conversion Capabilities

    The BINTOC( ) and CTOBIN( ) functions have updated or new parameters that provide you with more control over the output of these functions. Additionally, these enhancements offer some improvements for working with Win32 API routines. For more information, see BINTOC( ) Function and CTOBIN( ) Function.

    MROW( ) and MCOL( ) Can Detect the Position of the Mouse Pointer

    The MROW( ) and MCOL( ) functions now accept a zero (0) parameter for detecting the position of the mouse pointer relative to the currently active form instead of the form returned by the WOUTPUT( ) function. Although they typically reference the same form, if the AllowOutput property of the form is set to False (.F.), WOUTPUT( ) does not return the currently active form, and this mismatch of references can lead to unexpected results. By using the zero (0) parameter, you can avoid misplacing items, such as shortcut menus, because the currently active form is always used.

    For more information, see MROW( ) Function and MCOL( ) Function.

    INPUTBOX( ) Returns A Cancel Operation

    The INPUTBOX( ) function contains an additional parameter that allows you to determine if the user canceled out of the dialog. For more information, see INPUTBOX( ) Function.

    AGETCLASS( ) Supported for Runtime Applications

    The AGETCLASS( ) function is now supported for runtime applications. For more information, see AGETCLASS( ) Function.

    SYS(2019) Extends Handling of Configuration Files

    You can use SYS(2019) to obtain the name and location of both internal and external configuration files. For more information, see SYS(2019) – Configuration File Name and Location.

    SYS(2910) Controls List Display Count

    You can control the number of items that appear in a drop-down list, such as the one used by the AutoComplete Property. This setting is also available on the View Tab of the Options Dialog Box (Visual FoxPro).

    For more information, see SYS(2910) – List Display Count.

    SYS(3008) Controls Hyperlink ToolTips

    Visual FoxPro displays a tip such as “CTRL+Click to follow the link” when you hover over a hyperlink in the editor. If you do not want this tip to show, you can use SYS(3008) to turn it off. This function is also useful for international applications in which you do not want to display the English text for this tip. For more information, see SYS(3008) – Hyperlink Tooltips.

    SYS(3065) Internal Program Cache

    You can obtain the internal program cache size (the PROGCACHE configuration file setting). For more information, see SYS(3065) – Internal Program Cache.

    SYS(3101) COM Code Page Translation

    You can now specify a code page to use for character data translation involving COM interoperability. For more information, see SYS(3101) – COM Code Page Translation.

    Bidirectional Support for Tooltips and Popups

    For international applications that display text from right to left, you can use the following new enhancements to control text justification:

    • SYS(3009) – right justifies text in ToolTips.
    • DEFINE POPUP…RTLJUSTIFY – right justifies items in a popup, such as a shortcut menu.
    • SET SYSMENU TO RTLJUSTIFY – right justifies an entire menu system.

    The SYS(3009) function is a global setting. For more information, see SYS(3009) – Bidirectional Text Justification for ToolTips, DEFINE POPUP Command, and SET SYSMENU Command.

    Enhanced Font Script Support

    Visual FoxPro 9.0 contains a number of enhancements that extend the ability to specify a Font Language Script (or FontCharSet) along with existing font settings:

    • SYS(3007) – specifies a FontCharSet for ToolTips. This is a global setting.
    • FONT Clause – the following commands support an optional FONT clause that allows for specification of a FontCharSet in the format FONT cFontName [, nFontSize [, nFontCharSet]]: DEFINE MENU, DEFINE POPUP, DEFINE BAR, DEFINE PAD, DEFINE WINDOW, MODIFY WINDOW, BROWSE/EDIT/CHANGE, and ?/??.
    • Browse – the Font Dialog Box, which you can invoke with a Browse Window active by selecting the Font menu item from the Table menu, now allows for selection of a font language script. You can specify a global default font script from the IDE Tab, Options Dialog Box in the Options Dialog Box (Visual FoxPro). To do this, you must first check the Use font script checkbox.
    • Editors – the Font Dialog Box, which you can invoke with an editor window active by selecting the Font menu item from the Format menu or from the Edit Properties Dialog Box on the right-click shortcut menu, now allows for selection of a font language script. You can specify a global default font script from the IDE Tab, Options Dialog Box in the Options Dialog Box (Visual FoxPro). To do this, you must first check the Use font script checkbox.

    For more information, see SYS(3007) – ToolTipText Property Font Language Script, IDE Tab, Options Dialog Box, and FontCharSet Property.

    ToolTip Timeout Control

    You can specify how long a ToolTip is displayed if the mouse pointer is left stationary. For more information, see _TOOLTIPTIMEOUT System Variable.
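    A one-line sketch, assuming the variable takes the display duration in seconds:

    _TOOLTIPTIMEOUT = 5   && assumed units: seconds to keep a ToolTip visible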

    Tablet PC Features

    The following features are available to assist with applications designed to run on a Tablet PC computer.

    • ISPEN( ) – determines if the last Visual FoxPro application mouse event on a Tablet PC was a pen tap.
    • _SCREEN.DisplayOrientation – this read-write property specifies the screen display orientation for a Tablet PC. The value returned is the current orientation.
    • _TOOLTIPTIMEOUT – specifies how long a ToolTip is displayed if the mouse pointer is left stationary.

    For more information, see ISPEN( ) FunctionDisplayOrientation Property, and _TOOLTIPTIMEOUT System Variable.
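    The Tablet PC helpers above might be combined as in the following sketch; the message text is illustrative only:

    IF ISPEN( )                      && was the last mouse event a pen tap?
        WAIT WINDOW "Pen tap detected" NOWAIT
    ENDIF
    ? _SCREEN.DisplayOrientation     && read the current screen orientation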

    Windows Message Event Handling

    Visual FoxPro allows you to trap and handle window messages from the Microsoft® Windows® operating system using existing BINDEVENT functions. Some examples of common events you might wish to trap for include:

    • A power broadcast message used to intercept standby or power-down activities.
    • Media insertion and removal events, such as the insertion of a CD into a drive.
    • The insertion and/or removal of a Plug and Play hard disk (e.g., USB Drive).
    • Interception of screen saver queries to stop the screen saver from activating.
    • Operating system level font changes and Windows XP Theme changes.
    • New network connections/shares added or removed from system.
    • Switching between applications.

    You can use the Visual FoxPro BINDEVENT functions to register (and unregister) event handlers used to intercept messages (i.e., Win32 API window messages that get processed by the Win32 WindowProc function). See MSDN for more details.

    The new BINDEVENT( ) syntax requires the hWnd (integer) of the window receiving the message you desire to intercept, and the specific message itself (integer). For example, power-management events such as standby and power-down use the Win32 WM_POWERBROADCAST message (value of 536).

    BINDEVENT(hWnd, nMessage, oEventHandler, cDelegate)

    The following example illustrates detection of a Windows XP Theme change:

    #DEFINE WM_THEMECHANGED    0x031A
    #DEFINE GWL_WNDPROC    (-4)
    PUBLIC oHandler
    oHandler=CREATEOBJECT("AppState")
    BINDEVENT(_SCREEN.hWnd, WM_THEMECHANGED, oHandler, "HandleEvent")
    MESSAGEBOX("Test by changing Themes.")
    DEFINE CLASS AppState AS Custom
    nOldProc=0
    PROCEDURE Destroy
        UNBINDEVENT(_SCREEN.hWnd, WM_THEMECHANGED)
    ENDPROC
    PROCEDURE Init
        DECLARE integer GetWindowLong IN WIN32API ;
            integer hWnd, ;
            integer nIndex
        DECLARE integer CallWindowProc IN WIN32API ;
            integer lpPrevWndFunc, ;
            integer hWnd,integer Msg,;
            integer wParam,;
            integer lParam
        THIS.nOldProc=GetWindowLong(_VFP.HWnd,GWL_WNDPROC)
    ENDPROC
    PROCEDURE HandleEvent(hWnd as Integer, Msg as Integer, ;
        wParam as Integer, lParam as Integer)
        lResult=0
        IF msg=WM_THEMECHANGED
            MESSAGEBOX("Theme changed...")
        ENDIF
        lResult=CallWindowProc(this.nOldProc,hWnd,msg,wParam,lParam)
        RETURN lResult
    ENDPROC
    ENDDEFINE
    
    

    The following SYS( ) functions are also available to assist with handling these events:

    • SYS(2325) – returns the hWnd of a client window from the parent window’s WHANDLE.
    • SYS(2326) – returns a Visual FoxPro WHANDLE from a window’s hWnd.
    • SYS(2327) – returns a window’s hWnd from a Visual FoxPro window’s WHANDLE.

    For more information, see BINDEVENT( ) Function, UNBINDEVENTS( ) Function, and AEVENTS( ) Function. Also, see SYS(2325) – WCLIENTWINDOW from Visual FoxPro WHANDLE, SYS(2326) – WHANDLE from a Window’s hWnd, and SYS(2327) – Window’s hWnd from Visual FoxPro WHANDLE for related topics. Refer to MSDN as a reference source for details on specific window messages.
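    The WHANDLE/hWnd conversions might be sketched as a round trip; the argument order is assumed from the descriptions above:

    * Sketch: convert between a Win32 hWnd and a Visual FoxPro WHANDLE
    nWhandle = SYS(2326, _VFP.hWnd)   && WHANDLE from the main window's hWnd
    nHwnd    = SYS(2327, nWhandle)    && hWnd back from the WHANDLE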

    See Also

    Reference

    Guide to Reporting Improvements
    Data and XML Feature Enhancements
    SQL Language Improvements
    Class Enhancements
    Interactive Development Environment (IDE) Enhancements
    Enhancements to Visual FoxPro Designers
    Miscellaneous Enhancements
    Changes in Functionality for the Current Release

    Interactive Development Environment (IDE) Enhancements

    • Article
    • 07/09/2007

    In this article

    1. Additional Project Manager Shortcut Menu Commands
    2. Modifying a Class Library from the Project Manager
    3. Set Font of Project Manager
    4. Generating Message Logs During Project Build and Compile


    To provide a more integrated development environment for your projects and applications, Visual FoxPro contains the following improved functionality for the IDE.

    Additional Project Manager Shortcut Menu Commands

    When docked, the Project Manager window contains the following additional shortcut menu commands that are available on the Project menu:

    • Close – Closes the Project Manager.
    • Add Project to Source Control – Creates a new source control project based on the current project. Available only when a source code control provider is installed and specified on the Projects tab in the Options dialog box.
    • Errors – Displays the error (.err) file after running a build.
    • Refresh – Refreshes the contents of the Project Manager.
    • Clean Up Project – Removes deleted records from the project (.pjx) file.

    Modifying a Class Library from the Project Manager

    When you select a class library (.vcx) file in the Project Manager, you can now open and browse class libraries by clicking the Modify button. The class library opens in the Class Browser. For more information, see How to: Open Class Libraries.

    Set Font of Project Manager

    You can change the text font settings for the Project Manager window. Right-click the Project Manager window (outside of the tree hierarchy window) and choose Font.

    Generating Message Logs During Project Build and Compile

    When you build a project, application, or dynamic-link library, Visual FoxPro automatically generates an error (.err) file that includes any error messages, if they exist, when the build process completes. When you select the Display Errors check box in the Build Options dialog box, Visual FoxPro displays the .err file when the build completes. Selecting the Recompile All Files check box includes compile errors in the .err file. Build status messages usually appear in the status bar. However, in previous versions, if the build process is interrupted, Visual FoxPro did not write the .err file to disk.

    In the current release, Visual FoxPro writes build status and error messages to the .err file as they occur during the build process. If the build process is interrupted, you can open the .err file to review the errors.

     Note

    If no errors occur during the build, the .err file is deleted.

    If the Debug Output window is open, build status and error messages appear in the window. You can save messages from the Debug Output window to a file.

    For more information, see How to: View and Save Build Messages.

    Properties Window Enhancements

    • Design-time support for entering property values greater than 255 characters and extended characters, such as CHR(13) (carriage return) and CHR(10) (linefeed), has been added to visual class library (.vcx) and form (.scx) files. You can now enter values up to 8K characters in length.

     Note

    Extended property value support is only available through the Properties window (Zoom dialog box) for custom user-specified properties as well as certain native ones such as CursorSchema and Value. For properties that are not supported, you can still specify values that are longer than 255 characters, or that contain carriage returns and linefeeds, by assigning them in code, such as during the object's Init event.

    The Zoom dialog box and Expression Builder dialog box have been updated to support this. The Properties window includes a Zoom (Z) button that appears next to the property settings box for appropriate properties.

     Warning

    Property values that exceed 255 characters or include carriage return and/or linefeed characters are stored in a new format inside the .vcx or .scx file. If you attempt to modify these classes in a prior version, an error occurs.

    This feature is particularly useful for setting the CursorAdapter CursorSchema property to any schema expression when schemas might exceed 255 characters.
    • The Properties window font can now be specified by the new Font shortcut menu option. This new menu replaces the Small, Medium and Large font menu items used in prior versions. This font is also used in the description pane, and in the object and property value dropdowns.

     Note

    Bold and italic font styles are reserved for non-default property values and read-only properties, respectively. If a bold or italic font style is chosen, the Properties window inverts the displayed behavior. For example, if you choose an italic font style, read-only properties appear in normal font style and all others in italic.
    • Colors can be specified for certain types of properties by right-clicking the Properties window and selecting the following menu items:
      • Non-Default Properties Color – Sets the color for properties whose values have changed from the default setting (the same properties that are displayed when the Non-Default Properties Only menu item is selected).
      • Custom Properties Color – Sets the color for custom properties.
      • Instance Properties Color – Sets the color for custom properties that have been added to the current class instance (the same properties that appear in bold in the Edit Property/Method Dialog Box).

     Note

    If a conflict exists between color settings, the Instance setting takes priority, followed by the Non-Default setting.
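    The in-code assignment mentioned above, for long or multi-line values that the Properties window cannot accept for a given property, might look like the following sketch; the property name is hypothetical:

    PROCEDURE Init
        * Assign a value containing carriage return/linefeed characters at run time
        this.cLongNote = "first line" + CHR(13) + CHR(10) + "second line"
    ENDPROC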

    For more information, see Zoom <property> Dialog Box, Expression Builder Dialog Box, CursorSchema Property, and Properties Window (Visual FoxPro).

    MemberData Extensibility

    The MemberData extensibility architecture lets you provide metadata for class members (properties, methods and events). With MemberData, you can specify a custom property editor, display a property on the Favorites tab, or change the capitalization in the Properties Window (Visual FoxPro).

    For more information, see MemberData Extensibility.
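    MemberData is stored as XML in a _memberdata property. A minimal sketch, assuming the attribute names used by the MemberData schema (name, display, favorites):

    * Sketch: show a hypothetical property on the Favorites tab with mixed-case display
    this.AddProperty("_memberdata", ;
        '<VFPData><memberdata name="myprop" display="MyProp" favorites="True"/></VFPData>')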

    Setting Default Values for New Properties

    When adding a new property to a class, you can specify an initial value other than the default in the New Property dialog box. Subclasses inherit these default values unless you reset the default values to the parent class. In previous versions, you had to set the default value for the new property by selecting the property in the Properties window and setting the default value.

    For more information, see How to: Add Properties to Classes.

    Document View Sort Options

    You can now sort items in the Document View window by name for forms and visual class libraries.

    See Document View Window for more information on sorting items in the Document View Window.

    Compiling Code in the Background

    Visual FoxPro performs background compilation when syntax coloring is turned on in the Command window and Visual FoxPro editors for program (.prg) files, methods, stored procedures, and memos. The Expression box in the Expression Builder dialog box also includes support for background compilation and syntax coloring when turned on.

    When the current line of code that you are typing contains invalid syntax, Visual FoxPro displays the line with the formatting style selected on the Editor tab of the Options dialog box.

     Note

    Syntax coloring must be turned on for background compilation to function. Background compilation does not detect invalid syntax in multiple lines of code, including those containing continuation characters.

    For more information, see How to: Display and Print Source Code in Color.

    Rich Text Format (RTF) Clipboard Support

    Visual FoxPro now supports copying in RTF (Rich Text Format) to the clipboard. Visual FoxPro preserves the style (bold, italic, and underline) and color attributes.

    RTF is supported only in the FoxPro editors that allow syntax coloring, such as the Command window and editing windows opened with the MODIFY COMMAND Command. The RTF clipboard format is supported only when syntax coloring is enabled, such as from the Edit Properties Dialog Box. You can disable the RTF clipboard format with the _VFP EditorOptions Property.

    The _CLIPTEXT System Variable does not support RTF.

    Find Dialog Box Improvements

    The following improvements were made to Find support:

    • If a word is selected in a Visual FoxPro editor, the Find Dialog Box (Visual FoxPro) now displays that word in the Look For drop-down box when opened. If Find has not yet been used for a running instance of Visual FoxPro, the word under the insertion point appears in the Look For drop-down box. If multiple words are selected, only the first word appears in the drop-down (use copy and paste to enter multiple words).
    • When a Browse window is open and you search for a word with the Find dialog box, you can search for the word again (Find Again) after the Find dialog box is closed by pressing the F3 key.
    • You can now use Find to search for content in the Name column of the Watch and Locals debug windows (see Debugger Window). When searching object members, Find searches in these debug windows are limited to nodes that have been expanded and one level below them.

    View Constants in Trace Window

    Constants (#DEFINE values) can be viewed in the Trace Window when you hover over them with the mouse.

     Note

    Visual FoxPro evaluates constants as expressions in the Trace Window and may have difficulty interpreting a specific #DEFINE when you hover over it with the mouse. Consequently, if there are multiple expressions on a line, they are all displayed in the value tip.

    Printing Selected Text in Editor Windows

    You can print selected text from Visual FoxPro editor windows. When you have text selected in the editor window, the Selection option in the Print dialog box is available and selected.

     Note

    If a partial line is selected, the entire line is printed.

    For more information, see Print Dialog Box (Visual FoxPro).

    System Font Improvements

    To improve legibility on high-resolution monitors, Error dialog boxes and the Zoom <property> Dialog Box in the Properties window now use the Windows Message Box text font.

    In Windows XP, the Windows Message Box text font is set by opening Display in the Control Panel, and then clicking Advanced on the Appearance tab.

    IntelliSense Saves Settings Between User Sessions

    Visual FoxPro now saves IntelliSense settings, such as turning IntelliSense on, between user sessions. These settings are controlled by the _VFP EditorOptions property. In addition, the settings in the _VFP EditorOptions property are saved in the FoxUser.dbf resource file. For more information, see EditorOptions Property.

    IntelliSense in Memo Field Editor Window

    Visual FoxPro includes IntelliSense support in Memo field editor windows when syntax coloring is turned on.

    IntelliSense Available for Runtime Applications

    Selected IntelliSense features are available at run time in distributed Visual FoxPro 9.0 applications. To use IntelliSense at run time, you need to set the _FOXCODE and _CODESENSE system variables and the EditorOptions Property.

     Note

    With runtime applications, syntax coloring does not need to be turned on for an editor to support IntelliSense.

    For more information, see IntelliSense Support in Visual FoxPro, _FOXCODE System Variable, _CODESENSE System Variable, and EditorOptions Property.
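    A sketch of a runtime setup, assuming the application ships the IntelliSense files alongside the executable; the file names and flag letters are illustrative, not authoritative:

    * Sketch: enable IntelliSense features in a distributed application
    _FOXCODE   = "foxcode.dbf"     && IntelliSense script table (assumed location)
    _CODESENSE = "codesense.app"   && IntelliSense engine application (assumed location)
    _VFP.EditorOptions = "LQ"      && assumed flags; see EditorOptions Property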

    IntelliSense Support in WITH … ENDWITH and FOR EACH … ENDFOR Commands

    Visual FoxPro now supports IntelliSense within the WITH … ENDWITH Command and FOR EACH … ENDFOR Command.

    WITH ObjectName [AS Type [OF ClassLibrary]]

      Commands

    ENDWITH

    FOR EACH ObjectName [AS Type [OF ClassLibrary]] IN Group

      Commands

      [EXIT]

      [LOOP]

    ENDFOR

    The Type parameter can be any valid type, including data types, class types, or ProgID. If the class name cannot be found, Visual FoxPro disregards Type and does not display IntelliSense for it.

     Note

    The type reference does not affect the functionality of the application at run time. The type reference is only used for IntelliSense.

    The ObjectName expression can refer to a memory variable or an array.

    The ClassLibrary parameter must be in a path list that is visible to Visual FoxPro. You must specify a valid class library; references to existing objects are not valid. If Visual FoxPro cannot find the specified class library, IntelliSense does not display.

    Types expressed as ProgIDs and class libraries do not require quotation marks (“”) to enclose them unless their names contain spaces.

    When a user types the AS keyword, IntelliSense displays a list of types registered in the FoxCode.dbf table with Type “T”. If you have specified a valid type, typing a period within a WITH … ENDWITH or a FOR EACH … ENDFOR command displays IntelliSense for that object reference.
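    A minimal sketch of the AS keyword in a FOR EACH loop, assuming the code runs inside a form method so that thisform is in scope:

    * Typing "loCtrl." in the editor now lists textbox members
    FOR EACH loCtrl AS textbox IN thisform.Controls
        ? loCtrl.Name
    ENDFOR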

    Visual FoxPro supports IntelliSense for nested WITH … ENDWITH and FOR EACH … ENDFOR commands. The following is an example of nested WITH … ENDWITH commands in a class defined in a program (.prg) file named Program1.prg. To use, paste this code into a new program named Program1.prg, save it and then type a period (.) inside the WITH … ENDWITH block.

    DEFINE CLASS f1 AS form
    MyVar1 = 123
    ADD OBJECT t1 AS mytext
    PROCEDURE Init
      WITH THIS AS f1 OF program1.prg
        WITH .t1 AS mytext OF program1.prg
        ENDWITH
      ENDWITH
    ENDPROC
    ENDDEFINE
    
    DEFINE CLASS mytext as textbox
    MyVar2 = 123
    ENDDEFINE
    
    

    IntelliSense provides limited List Values functionality for selected properties that begin with a “T” or “F” within a WITH … ENDWITH or FOR EACH … ENDFOR command. This is done to avoid possible conflicts with the common property values True (.T.) and False (.F.). If you just type “.T” or “.F” and press Enter, the word selected in the List Value drop-down does not expand. You need to type at least two letters for IntelliSense to insert the selected word.

    See Also

    Reference

    Guide to Reporting Improvements
    Data and XML Feature Enhancements
    SQL Language Improvements
    Class Enhancements
    Language Enhancements
    Enhancements to Visual FoxPro Designers
    Miscellaneous Enhancements
    Changes in Functionality for the Current Release

    Enhancements to Visual FoxPro Designers


    In this article

    1. Report and Label Designers
    2. Menu Designer
    3. Table Designer
    4. Query and View Designers


    Visual FoxPro contains the following enhancements to its designers.

    Report and Label Designers

    You can use the Report Builder available in the Report Designer and Label Designer to perform reporting tasks, configure settings, and set properties for reporting features such as report layout, report bands, data groups, report controls, and report variables. For example, you can perform the following tasks:

    • Prevent users from modifying reports, report controls, and report bands when editing the report in protected mode.
    • Display captions instead of expressions for Field controls at design time.
    • Display user-defined ToolTips for report controls.
    • Set the language script for reports.
    • Save the report data environment as a class.

    By default, the Report Builder activates when you interact with the Report and Label designers. However, you can use the _REPORTBUILDER system variable to specify an application other than the default ReportBuilder.app. The Report Builder consolidates, replaces, and adds to the functionality found in previous Report Designer user interface elements, which remain in the product and are available by setting _REPORTBUILDER. You can write custom report builders to extend reporting functionality and output or run reports with report objects. For more information, see Working with Reports and _REPORTBUILDER System Variable.
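    Swapping in a custom builder application is a one-line assignment; the path here is hypothetical:

    _REPORTBUILDER = "c:\tools\myreportbuilder.app"   && use a custom report builder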

    Menu Designer

    You can set the _MENUDESIGNER system variable to call your own custom designer for creating menus.

    _MENUDESIGNER = cProgramName
    
    

    For more information, see _MENUDESIGNER System Variable.

    Table Designer

    The Table Designer accommodates the following data enhancements:

    Query and View Designers

    You can use spaces in table names specified in SQL statements in the Query and View designers if you provide an alias. For example, editing the following statement is valid in the View and Query designers:

    SELECT * from dbo."Order Details" Order_Details
    
    

    For more information, see SELECT – SQL Command.

    Data Environment Designer

    The full path to the database (DBC) appears in the status bar when you select a database in the Add Table or View Dialog Box.

    Class and Form Designers

    The name of the class you are modifying appears in the title bar for the following dialog boxes:

    The View menu for the Form Designer offers both options for specifying the tab order on forms: Assign Interactively or Assign by List.

    In the Class, Form, and Report designers, you can use the following keyboard shortcut commands to adjust spacing between selected items.

    Shortcut – Description
    ALT+Arrow Key – Adjusts the spacing between the selected objects by one pixel in the direction of the arrow key.
    ALT+CTRL+Arrow Key – Adjusts the spacing between the selected objects by one grid scale in the direction of the arrow key.

    For more information, see Interactive Development Environment (IDE) Enhancements.

    See Also

    Reference

    Guide to Reporting Improvements
    Data and XML Feature Enhancements
    SQL Language Improvements
    Class Enhancements
    Language Enhancements
    Interactive Development Environment (IDE) Enhancements
    Miscellaneous Enhancements
    Changes in Functionality for the Current Release

    Miscellaneous Enhancements


    In this article

    1. Printing Dialog Boxes and Printing Language Enhancements
    2. Improved Support for Applications Detecting Terminal Servers
    3. Updated Dr. Watson Error Reporting to 2.0
    4. Anchor Editor Application


    Visual FoxPro contains the following miscellaneous enhancements.

    Printing Dialog Boxes and Printing Language Enhancements

    Visual FoxPro includes various enhancements for its printing dialog boxes and printing language.

    Visual FoxPro uses the latest operating system dialog boxes for Printer Setup and other related printing operations. If the user is running Windows XP, the dialog boxes appear themed.

    The following language functions contain new enhancements that impact general printing operations:

    For more information, see Language Enhancements.

    Improved Support for Applications Detecting Terminal Servers

    Applications generated by the build process now automatically detect whether they are running on a Terminal Server and prevent the loading of unnecessary dynamic-link library (.dll) files that can impact performance. For more information, see BUILD EXE Command.

    Updated Dr. Watson Error Reporting to 2.0

    Visual FoxPro includes and updates its product error reporting to support Dr. Watson 2.0. This version includes new and improved error reporting, logging, and auditing features. For example, errors are logged while offline and are posted when you reconnect.

    Anchor Editor Application

    Visual FoxPro 9.0 allows you to create a custom property editor through extended metadata attributes for class members. Through this new extensibility model, you now have the ability to extend the functionality of class properties and methods, allowing you to create design-time enhancements such as a custom property editor. For more information about creating custom property editors, see MemberData Extensibility.

    A sample custom property editor, Anchoreditor.app, is included in Visual FoxPro 9.0 and is located in the Wizards directory. This application runs when the Anchor property is double-clicked in the Properties window, or when you choose the Anchor property in the Properties window and click the ellipsis button (…).

    Term – Definition
    Anchor but do not resize vertically – Specifies that the center of the control is anchored to the top and bottom edges of its container but the control does not resize.
    Anchor but do not resize horizontally – Specifies that the center of the control is anchored to the left and right edges of its container but the control does not resize.
    Border values – Displays the current settings for the border values.
    Common settings – Selects commonly used settings for the Anchor property.
    Sample – Click the Sample button to test the current anchor value on a sample form.
    Anchor value – The Anchor property value that is the combination of the current settings for the border values.

    Class Browser

    You can open and view class definitions that are specified within a program (.prg) file, similar to the way you open class libraries (.vcx). You can select a program (.prg) from the File Open/Add dialog box. See Class Browser Window for more information.

    CursorAdapter Builder

    The CursorAdapter Builder contains a number of enhancements that correspond to improvements added to the CursorAdapter class. See CursorAdapter Builder for more information.

    Toolbox

    The Toolbox (Visual FoxPro) can now be docked to the desktop or to other IDE windows.

    Code References

    The Code References Window has been updated with the following minor enhancements:

    • For the results grid, the Options dialog provides a new setting to show separate columns for class, method, and line, rather than concatenating them all in a single column. 
    • You can now sort by method name by right-clicking on the method header or selecting the Sort By menu item from the right-click menu.
    • With the results tree list, the following new right-click menu options are available:
      • Expand All – expands all nodes
      • Collapse All – collapses all nodes
      • Sort by Most Recent First – puts the most recent result sets at the top of the list rather than at the bottom

     Note

    The results beneath a tree node are not filled until the node is expanded. This is done to increase performance if you have large result sets.

    GENDBC.PRG

    The Gendbc.prg program, which generates a program used to re-create a database, has been updated with the following minor enhancements:

    • Support for new Varchar, Varbinary and Blob field types
    • Support for AllowSimultaneousFetch, RuleExpression, and RuleText properties for views

    Environment Manager Task Pane

    The Environment Manager Task Pane has been enhanced with the following features:

    • Form and Formset Template Classes – you can now specify template classes for new forms and formsets with each environment set. This setting is specified in the Forms Tab, Options Dialog Box.
    • Field Mapping – you can set the classes to use when you drag and drop a field onto a form with each environment set. This setting is specified in the Field Mapping Tab, Options Dialog Box.
    • Resource File – the Environment Manager now supports setting of a Resource File. If one does not exist, the Environment Manager will optionally create it when the environment is set.
    • The Environment Manager now contains a new <default field mapping> environment set. This set is created the first time the Environment Manager is run so that the original default Options dialog settings for Field Mapping and Form Template Classes can be saved and restored later if desired.
    For more information, see Environment Manager Task Pane.

    Data Explorer Task Pane

    The Task Pane Manager includes the new Data Explorer Task Pane which allows you to view and work with remote data sources such as SQL Server databases.

    For more information, see Data Explorer Task Pane.

    MemberData Editor

    The new MemberData Editor lets you edit MemberData for your classes. The MemberData Editor is available from the Class menu when the Class Designer is active. The MemberData Editor is also invoked silently when you right-click on an item in the Properties Window and select the Add to Favorites menu item. The MemberData Editor application is specified as a builder and can be changed in the Builder.dbf table located in your Wizards directory.

    For more information, see MemberData Editor and MemberData Extensibility.

    New Foundation Classes (FFC)

    The following are new FoxPro Foundation classes added to this version of Visual FoxPro:

    • _REPORTLISTENER.VCX – a set of core classes you can use when creating custom report listeners.
    • _FRXCURSOR.VCX – a class library used for working with report (FRX) files.
    • _GDIPLUS.VCX – a set of classes you can use for GDI+ handling. This is intended primarily for use when creating custom report listener classes.

    New Solution Samples

    Visual FoxPro 9.0 contains many new samples that show off new features in the product. To see a list of these samples, select the Solution Samples task pane in the Task Pane Manager and expand the New in Visual FoxPro 9.0 node.

    See Also

    Reference

    Guide to Reporting Improvements
    Data and XML Feature Enhancements
    SQL Language Improvements
    Class Enhancements
    Language Enhancements
    Interactive Development Environment (IDE) Enhancements
    Enhancements to Visual FoxPro Designers
    Changes in Functionality for the Current Release

    Changes in Functionality for the Current Release


    In this article

    1. Critical Changes
    2. Important Changes
    3. Miscellaneous Changes
    4. Removed Items
    5. See Also

    Visual FoxPro includes functionality that differs from previous versions and might affect existing code. These behavior changes are organized according to the following categories:

    • Critical Changes   Functionality changes most likely to affect existing code when running under this version of Visual FoxPro. It is extremely important that you read this section.
    • Important Changes   Functionality changes that might affect existing code when running under this version of Visual FoxPro.
    • Miscellaneous Changes   Functionality changes you should know about but are not likely to impact existing code.
    • Removed Items   Features or files that existed in prior versions of Visual FoxPro but are no longer included.

    Critical Changes

    Critical behavior changes are the most likely to affect existing code when running under this version of Visual FoxPro.

    SQL SELECT IN (Value_Set) Clause

    In previous versions of Visual FoxPro, the IN (Value_Set) clause for the WHERE clause in the SQL SELECT command was mapped to the INLIST( ) function. In the current release, Visual FoxPro might stop evaluating values and expressions in the Value_Set list when the first match is found. Therefore, if the IN clause is not Rushmore-optimized, you can improve performance by placing values most likely to match at the beginning of the Value_Set list. For more information, see the description for the IN clause in the SELECT – SQL Command topic and the INLIST( ) Function.
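    For example, in a query that is not Rushmore-optimized, listing the most common value first lets the short-circuit evaluation stop early; the table and values here are hypothetical:

    SELECT * FROM orders ;
        WHERE status IN ("OPEN", "HOLD", "CLOSED")   && put the most common status first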

    Conversion of INLIST( ) Function in the Query Designer and View Designer

    In previous versions of Visual FoxPro, the Query Designer and View Designer converted INLIST( ) function calls in the WHERE clause of the SQL SELECT command into IN (Value_Set) clauses. In the current release, this conversion no longer occurs because of the differences between INLIST( ) and the SQL IN clause. INLIST( ) remains restricted to 24 arguments. For more information, see the description for the IN clause in the SELECT – SQL Command topic and the INLIST( ) Function.

    Grids and RecordSource and ControlSource Properties

    In Visual FoxPro 9.0 there is a change in Grid control behavior. When the RecordSource property for a Grid control is set, Visual FoxPro 9.0 resets all ControlSource properties to the empty string (“”) for all columns. In earlier versions of Visual FoxPro, the ControlSource properties were not properly reset, so problems could occur when a RecordSource with a different structure was later bound. This change may impact scenarios involving Access and Assign methods or BINDEVENT( ) function calls made against a Grid column’s ControlSource property.
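
    As a result, code that rebinds a grid at run time must restore the column bindings after assigning the new RecordSource. A sketch, using hypothetical grid, alias, and field names:

    THISFORM.grdItems.RecordSource = "csrItems"  && resets all ControlSource properties to ""
    THISFORM.grdItems.Column1.ControlSource = "csrItems.item_name"
    THISFORM.grdItems.Column2.ControlSource = "csrItems.qty"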

    Important Changes

    Important changes might affect existing code when running under Visual FoxPro 9.0.

    Reporting

    Visual FoxPro contains many improvements for reporting. The following are behavior changes that could impact existing reports:

    • The Report Designer and Engine now make use of extensible components. You can control or eliminate use of design-time extensions by altering the value of _REPORTBUILDER System Variable. You control run-time extension use with the SET REPORTBEHAVIOR Command.
    • In Visual FoxPro 9’s new object-assisted reporting mode, report fields may need to be adjusted (widened) slightly. This is especially important for numeric data where a field that is not wide enough to display the entire number will show it instead as asterisks (*****). For more information about the changes to the Report System that required this change, and features of the GDI+ rendering engine and other changes related to it, see Using GDI+ in Reports. For migration strategy and recommendations, see Guide to Reporting Improvements.
    • The following list describes additional, minor rendering differences between backward-compatible reporting mode and object-assisted reporting mode.

      Tab stops (CHR(9) values included in report data)
      • Backward-compatible mode: The width of a tab stop is determined by the number of characters in the font used.
      • Object-assisted mode: Tab stops are set at fixed-width positions, regardless of font.
      • Recommendations: If you concatenated tabs with data in a stretching report layout element to create a table format within the element, you can often fulfill the same requirements using a second detail band in Visual FoxPro 9. Alternatively, change the number of tabs you concatenate with your data.

      Special characters and word-wrapping
      • Backward-compatible mode: Non-breaking spaces are not respected; they are treated as normal space characters.
      • Object-assisted mode: Special characters such as non-breaking spaces (CHR(160)) and soft hyphens (CHR(173)) are correctly interpreted. As a result, words may wrap differently in output.
      • Recommendations: Evaluate the results. In most cases, users will appreciate the change, because it more faithfully represents their original intentions in the text. If necessary, use the CHRTRAN( ) Function or STRTRAN( ) Function to replace these special characters with standard spaces and hyphens.

      Line spacing of multi-line objects
      • Backward-compatible mode: Line spacing is determined by a formula that does not take font properties into consideration. Lines in a multi-line object are individually rendered, so background colors for each line may appear to have a different width.
      • Object-assisted mode: GDI+ line spacing is dynamically determined using font characteristics. A multi-line object is rendered as a single block of text.
      • Recommendations: Evaluate the results. In most cases, the change in line spacing will provide a more polished appearance, and in all cases this method of handling multi-line text provides better performance. If a report depends on the old style of spacing lines, you can adjust the ReportListener’s DynamicLineHeight Property to revert to the old behavior.

      Cursor images (.CUR files)
      • Backward-compatible mode: .CUR files can be used as image sources in reports.
      • Object-assisted mode: .CUR files are not supported as image sources.
      • Recommendations: Convert the cursor file to another, supported image format.

      Shape (Rounded Rectangle) curvature
      • Backward-compatible mode: Limited choices for curvature.
      • Object-assisted mode: More choices are available through the Report Builder Application dialog box interface, but some will not look the same way in backward-compatible mode and object-assisted mode.
      • Recommendations: If reports have to run in both backward-compatible mode and object-assisted mode, or if they are designed in version 9.0 but must run in earlier versions, limit your choices of shape curvature values to those allowed in the native Round Rectangle Dialog Box. If you are using the Style Tab, Report Control Properties Dialog Box (Report Builder) to design such reports, use the values 12, 16, 24, 32, and 99 to represent the native buttons, selecting the buttons from left to right. The default value in the Round Rectangle dialog box (second button) is 16.
    • When you create a Quick Report, by using the CREATE REPORT – Quick Report Command or by invoking the Quick Report… option on the Report menu, and SET REPORTBEHAVIOR 90 is in effect, the layout elements created by the Report Designer are sized differently from ones created for the same fields in previous versions. This change accommodates the additional width required by the new rendering mechanism of the report engine.
    • If you use the KEYBOARD Command or PLAY MACRO Command statements to address options on the Report menu, you may need to revise the keystrokes in these statements, as the menu has been reorganized.
    • Reports may take longer to open in the Report Designer if the report was previously saved with the Printer Environment setting enabled. You can improve performance by unchecking the Printer Environment item on the Report menu and re-saving the report. A saved Printer Environment is not critical to the functioning of a report, and saving it is typically not recommended. Object-assisted report mode also respects different printers’ resolution settings, so saving resolution information for one printer in your report may have adverse effects in an environment with printers that have different resolutions. A saved Printer Environment may also have more adverse effects on REPORT FORM or LABEL commands invoked with the TO FILE option than it did in previous versions, if the associated printer setup is not available in the environment at runtime. In Visual FoxPro 9, the global default for this setting in the Reports Tab, Options Dialog Box, and for reports created in executable applications (.exe files), has been changed to unchecked.
    • Because of changes to the way Visual FoxPro 9 uses current printer settings to determine items such as print resolution and page height dynamically, a REPORT FORM or LABEL command will not run in object-assisted mode if there are no available printer setups in the environment or if the print spooler has been stopped. You will receive Error loading printer driver (Error 1958). If you need to run reports in an environment with no printer information, perhaps creating custom types of output that do not require printers, you can supply Visual FoxPro with the minimal set of information it needs to run your report by supplying a page height and page width from the appropriate Report Listener methods. For more information, see GetPageHeight Method and GetPageWidth Method.
    • By default, and by design, the Report Builder Application does not automatically show tables in the report’s Data Environment when you build report expressions. To better protect end-user design sessions, only tables you have explicitly opened, not all tables from the DataEnvironment, are available in the Expression Builder. With this change, you have the opportunity to set up the design session’s data exactly the way you want the end-user to see it, before you issue a MODIFY REPORT Command in your application. If you prefer the Report Designer’s old behavior, you can change the Report Builder Application to emulate it. For more information, see How to: Replace the Report Builder’s Expression Builder Dialog Box.
    • The ASCII keyword on the REPORT FORM Command is documented as following the <filename> parameter of the TO FILE <filename> clause. In earlier versions of Visual FoxPro, you could safely use the incorrect and unsupported syntax TO FILE ASCII <filename> instead. This incorrect syntax triggers an error in Visual FoxPro 9. Note that the ASCII keyword has no effect on object-assisted mode output provided by the Report Engine, although a ReportListener Object can be written to implement it.
    • The keyword NOCONSOLE has no default meaning in object-assisted reporting mode, because ReportListeners do not echo their rendering output to the current output window by default. However, a ReportListener can mimic backward-compatible mode in this respect, if desired. Refer to OutputPage Method for a complete example.
    • To facilitate development of run-time reporting extensions, the Report Engine now allows normal debugging procedures during a report run. If your error handling routine assumes it is impossible for a report to be suspended, this assumption will now be challenged. Refer to Handling Errors During Report Runs for a detailed look at the associated changes, and some suggestions for strategy.
    • REPORT FORM and LABEL commands are no longer automatically prohibited as user-interface-related commands in COM objects compiled into DLLs, when you run the commands in object-assisted mode. The restriction still applies to these commands when they are run in backward-compatible mode. (The topic Selecting Process Types explains why user-interface-related commands are prohibited in DLLs.) This change is not applicable to multi-threaded DLLs. A number of user-interface-related facilities also are not available in DLLs (whether single- or multi-threaded). For example, the TXTWIDTH( ) Function and TextWidth Method depend on a window handle to function, so they are not available in a DLL. The CREATE REPORT – Quick Report Command relies on the same facilities as TXTWIDTH(), and therefore is not available in a DLL. However, in many instances, creating custom output using a ReportListener does not require any user-interface activity, so a REPORT FORM or LABEL command can now be used productively in a DLL. Using the SYS(2335) – Unattended Server Mode function to trap for potential modal states, as well as the new SET TABLEPROMPT Command, is recommended. Refer to Server Design Considerations and Limitations for more information.
    • Changes have occurred to the handling of group headers and footers in multi-column reports, when the columns flow from left to right (label-style layout). The revised behavior is more useful and behaves consistently with the new detail header and footer bands as well. For a description of the change, see How to: Define Columns in Reports.
    • In previous versions, the NOWAIT keyword on the REPORT FORM and LABEL commands was not significant when the commands were issued in the Command window rather than in a program. In Visual FoxPro 9’s object-assisted mode, when previewing instructions are interpreted by the Report Preview Application, this keyword is significant no matter where you issue the command. The Report Preview Application uses the NOWAIT keyword, consistently, as an instruction to provide a modeless preview form. For more information about the Report Preview Application, see Extending Report Preview Functionality.
    • Visual FoxPro 8 introduced the NOPAGEEJECT keyword on the REPORT FORM and LABEL commands, but applied the keyword only to printed output. In Visual FoxPro 9, NOPAGEEJECT has significance for all output targets, including PREVIEW. This keyword provides chained or continued report runs for multiple REPORT FORM and LABEL commands. To facilitate this behavior in preview mode, and to allow you to apply customization instructions to multiple previews, the Report Output Application caches a single ReportListener object instance for preview output, causing a change in behavior for multiple modeless report commands (REPORT FORM … PREVIEW NOWAIT). In the past, when you used multiple REPORT FORM … PREVIEW NOWAIT statements in a sequence, your commands resulted in multiple report preview windows. In Visual FoxPro 9, when SET REPORTBEHAVIOR 90 is in effect, these commands result in successive report previews being directed to a single report preview window. Tip   You can easily invoke the old behavior by creating multiple ReportListener object references and associating one with each separate REPORT FORM or LABEL command, using the OBJECT keyword. For more information about using the OBJECT syntax, see REPORT FORM Command. For information about receiving multiple object references of the appropriate type from the Report Output Application, see Understanding the Report Output Application.
    • In the process of reviewing and overhauling the native Report Engine, a number of outstanding issues regarding band and layout element positioning were addressed. For example, a field element marked to stretch and sized to take up more than one text line’s height in the report layout might have inappropriately pushed its band’s exit events to the next page in Visual FoxPro 8. In Visual FoxPro 9, the band’s exit events occur on the same page. Additional revisions improve record-pointer-handling in footer bands, when bands stretch across pages. These changes are not specific to object-assisted output rendering. If you have relied on undocumented behavior providing exact band or layout control placement in a particular report, you should review that report’s behavior in Visual FoxPro 9.
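
    For the printerless scenario described above, a ReportListener subclass can supply the required page dimensions itself. This is a sketch only; the class name and the returned values are illustrative placeholders, not recommended settings:

    DEFINE CLASS PrinterlessListener AS ReportListener
       PROCEDURE GetPageHeight
          RETURN 10560  && placeholder page height
       ENDPROC
       PROCEDURE GetPageWidth
          RETURN 8160  && placeholder page width
       ENDPROC
    ENDDEFINE

    loListener = CREATEOBJECT("PrinterlessListener")
    REPORT FORM myreport.frx OBJECT loListener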

    Rushmore Optimization

    When character values are indexed, all values are considered to be encoded using the table’s code page. In previous versions of Visual FoxPro, when the current Visual FoxPro code page differed from a table’s code page, any attempt to look for a character value within that table’s index resulted in an implicit translation of the value from the current Visual FoxPro code page into the table’s code page. This could cause SQL or other Rushmore optimizable commands to return or act upon incorrect records.

    In Visual FoxPro 9 and later, by default, the optimization engine no longer uses existing character indexes for tables created with a non-current code page. Instead, Visual FoxPro builds temporary indexes to ensure correct results. This can result in a loss of optimization of SQL or other commands which were optimized in earlier VFP versions. To prevent this, ensure that the current Visual FoxPro code page returned by CPCURRENT( ) Function matches the table’s code page returned by CPDBF( ) Function. This requires either changing the current Visual FoxPro code page, or changing the table’s code page. For information on specifying the current Visual FoxPro code page, see Understanding Code Pages in Visual FoxPro. For information on specifying the code page for a table, see How to: Specify the Code Page of a .dbf File. If you cannot change either the Visual FoxPro codepage or the table codepage to match, you can force optimization to work as it did in Visual FoxPro 8 and prior versions using the SET ENGINEBEHAVIOR Command with either 80 or 70 as a parameter.
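
    A quick way to check whether a table is affected is to compare the two code pages after opening it. A sketch, assuming the table is open in the current work area:

    IF CPCURRENT() <> CPDBF()
       * Character indexes on this table will not be used for optimization.
       * Either align the code pages, or restore the old behavior:
       SET ENGINEBEHAVIOR 80
    ENDIF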

    SQL SELECT Statements

    • SELECT – SQL Command statements containing DISTINCT and ORDER BY clauses in which the ORDER BY field is not in the SELECT field list will generate an error in Visual FoxPro 9.0 with SET ENGINEBEHAVIOR 90 (Error 1808: SQL: ORDER BY clause is invalid.). The following example shows this:

      SET ENGINEBEHAVIOR 90
      CREATE CURSOR foo (f1 int, f2 int)
      SELECT DISTINCT f1 FROM foo ORDER BY f2 INTO CURSOR res

    • SELECT – SQL Command statements containing DISTINCT and HAVING clauses in which the HAVING field is not in the SELECT field list now generate an error in Visual FoxPro 9.0 with SET ENGINEBEHAVIOR 90 (Error 1803: SQL: HAVING clause is invalid.). An error is reported because the HAVING field is not in the projection and DISTINCT is used. The following example shows this:

      SET ENGINEBEHAVIOR 90
      CREATE CURSOR foo (f1 int, f2 int)
      SELECT DISTINCT f1 FROM foo HAVING f2>1 INTO CURSOR res

    • The number of UNION statements that can be used in a SELECT – SQL Command is no longer limited to 9. Parentheses are not completely supported with UNION statements and, unlike in previous versions, may generate an error. If two or more SELECT statements are enclosed in parentheses, an error is generated during compilation (Error 2196: Only a single SQL SELECT statement can be enclosed in parentheses.). This behavior is not tied to any SET ENGINEBEHAVIOR Command level. The following example shows this error:

      SELECT * FROM Table1 ;
         UNION ;
         (SELECT * FROM Table2 ;
         UNION ;
         SELECT * FROM Table3)

    The following example compiles without an error:

      SELECT * FROM Table1 ;
         UNION ;
         (SELECT * FROM Table2) ;
         UNION ;
         (SELECT * FROM Table3)

    For more information, see SET ENGINEBEHAVIOR Command.

    Disabling TABLEREVERT( ) Operations During TABLEUPDATE( ) Operations

    For CursorAdapters, Visual FoxPro does not permit TABLEREVERT( ) operations during TABLEUPDATE( ) operations.

    For more information, see TABLEREVERT( ) Function and TABLEUPDATE( ) Function.

    Index Key Truncation during Index Updates

    An error (Error 2199) is now generated when index key truncation is about to occur, typically during index creation or modification. This can happen when a key contains an expression involving a Memo field, whose length is not fixed, such as in the following example:

    INDEX ON charfld1 + memofld1 TAG mytag

    Similar issues can also occur with the SQL engine (such as during a SQL SELECT command or View creation) where it might fail to build a temporary index to optimize a join evaluation if it is unable to accurately determine the maximum size of the key.
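
    One way to avoid the error is to index on a fixed-length slice of the Memo field so that the key size is known in advance. PADR( ) here is an illustrative workaround, and 50 is an arbitrary length:

    INDEX ON charfld1 + PADR(memofld1, 50) TAG mytag  && key segment fixed at 50 characters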

    For more information, see Error building key for index “name”. (Error 2199).

    Memo Field Corruption

    Visual FoxPro will now detect whether a Memo field in a class library (.vcx) or form (.scx) is corrupt when you try to open that file in the designer. If the file contains a corrupt Memo field, an Error 41 such as the following will occur:

    Memo file <path>\myclass.VCT is missing or is invalid.

    Additionally, similar Memo errors may occur if you have a Visual FoxPro table open and try to access the contents of a corrupt Memo. The following sample code shows how you can detect the Error 41 memo file corruption:

    TRY
      USE myTable EXCLUSIVE NOUPDATE 
      SCAN
        SCATTER MEMO MEMVAR
      ENDSCAN
    CATCH TO loError
      IF loError.ErrorNo=41
        * handle error here
      ENDIF
    ENDTRY
    USE IN myTable
    
    

    While it is possible that loss of data may occur, the following sample code may assist in repairing some or all of the file:

    ON ERROR *
    USE myclass.vcx
    COPY TO myclass_bkup.vcx  && backup
    COPY TO myclass2.vcx
    USE
    DELETE FILE myclass.vc*
    RENAME myclass2.vcx TO myclass.vcx
    RENAME myclass2.vct TO myclass.vct
    COMPILE CLASSLIB myclass.vcx
    ON ERROR
    
    

    Visual Form and Class Extended Property Support

    Visual FoxPro 9 allows you to create custom properties in your visual class (SCX or VCX file) whose values can contain carriage returns and/or be of length greater than 255 characters. If you specify a property with a value like this through the Properties Window (i.e., the Zoom dialog box), Visual FoxPro will store it in a format such that you will no longer be able to edit that class in older versions of Visual FoxPro.

    Class Definitions

    The ability to have a property assignment set to an instantiated object is no longer supported in a class definition and will generate an error. The following example shows this:

    LOCAL oCustom
    oCustom = CREATEOBJECT('cusTest')
    DEFINE CLASS cusTest AS CUSTOM
        oRef = CREATEOBJECT('myclass')
    ENDDEFINE
    DEFINE CLASS myclass AS CUSTOM
    ENDDEFINE
    
    

    You can instead assign a property to an instantiated object reference in the Init event of your class.
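
    Applied to the example above, the workaround looks like this:

    DEFINE CLASS cusTest AS CUSTOM
       oRef = NULL
       PROCEDURE Init
          THIS.oRef = CREATEOBJECT('myclass')  && assign the reference at instantiation time
       ENDPROC
    ENDDEFINE
    DEFINE CLASS myclass AS CUSTOM
    ENDDEFINE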

    Merge Modules for Redistributable Components

    Visual FoxPro includes merge modules (MSM files) for use in redistributing shared components with your runtime applications. Merge modules are used by applications that can create Windows Installer based setups. For example, Visual FoxPro ships with merge modules that contain the Visual FoxPro runtime libraries as well as some common components including a number of ActiveX controls.

    For Visual FoxPro 9, the VFP9RUNTIME.MSM merge module contains the runtime libraries that you will need for your custom redistributable application. The VFP9RUNTIME.MSM merge module also has dependencies on the merge modules containing the Microsoft VC 7.1 runtime library (MSVCR71.DLL) and the GDI+ graphics library (GDIPLUS.DLL). Because of these dependencies, if you select the VFP9RUNTIME.MSM merge module in a Windows Installer tool such as InstallShield, the other dependent merge modules will automatically be selected as well.

    Note   For Windows XP and higher operating systems, Visual FoxPro uses the GDI+ graphics library that is installed in your Windows System folder.

    For Visual FoxPro 9, the merge module containing the VC runtime library no longer installs to the Windows System directory. Instead, this file is installed to your application’s directory. This is done in compliance with recommended component versioning strategies for Windows operating systems. The GDI+ library is installed into the same directory as your Visual FoxPro runtime libraries and is only installed on operating systems later than Windows XP (XP already includes the GDI+ library in its Windows System directory).

    Tip   There may be circumstances where you will want to install the VC or GDI+ library to another location such as the Windows System directory. You can do this with your Windows Installer application (e.g., InstallShield) by first selecting the merge module before selecting the VFP9RUNTIME.MSM one. Once you have selected a merge module, you can change its installation path.

    There are new merge modules for MSXML3 and MSXML4 XML parser components. The MSXML 3.0 component consists of the following merge modules:

    • MSXML 3.0 (msxml3_wim32.msm)
    • Msxml3 Exception INF Merge Module (msxml3inf_wim32.msm)
    • WebData std library (wdstddll_wim32.msm)

    There are two MSXML 4.0 modules that should be included with any custom setup:

    • MSXML 4.0 (msxml4sxs32.msm)
    • MSXML 4.0 (msxml4sys32.msm)

    MTDLL Memory Allocation

    Visual FoxPro contains a new PROGCACHE configuration file setting which specifies the amount of memory Visual FoxPro allocates at startup for running programs (program cache). This setting also determines memory allocated per thread for Visual FoxPro MTDLL COM Servers. In prior versions, this setting was not configurable and memory was allocated as a fixed program cache of a little over 9MB (144 * 64K). The new PROGCACHE setting allows you to set the exact size of the program cache or specify that dynamic memory allocation be used.

    Since MTDLL COM Servers can use up a great amount of memory if many threads are created, it is important that memory be allocated more efficiently for these scenarios. In Visual FoxPro 9, the new default setting for MTDLL COM Servers is -2 (dynamic memory allocation). For more information, see Special Terms for Configuration Files.
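
    In a configuration file (Config.fpw), the setting might look like the following; -2 requests dynamic allocation, while a positive value sets an exact cache size:

    PROGCACHE = -2  && dynamic memory allocation (the new default for MTDLL COM Servers)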

    Miscellaneous Changes

    The following are miscellaneous changes that you should know about but are not likely to impact existing code.

    CursorAdapter Changes

    In the current version of Visual FoxPro, the following behavior changes apply to the CursorAdapter object:

    Grid SetFocus Supported for AllowCellSelection

    You can now call a Grid control’s SetFocus Method and have the Grid receive focus when the AllowCellSelection Property is set to False (.F.) and the grid contains no records.

    EXECSCRIPT Function

    The EXECSCRIPT( ) Function now allows you to pass parameters by reference.
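
    Assuming the standard @ prefix for by-reference arguments, a by-reference call might look like this (the variable and parameter names are illustrative):

    LOCAL lnValue
    lnValue = 1
    EXECSCRIPT("LPARAMETERS rnVal" + CHR(13) + "rnVal = rnVal * 10", @lnValue)
    ? lnValue  && displays 10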

    Additionally, Visual FoxPro 9.0 tightens syntax validation of calls built by concatenating parameters. The following code, which worked in prior versions of Visual FoxPro, now properly causes an error because the CHR(13) character breaks the call into two lines, whereas it is supposed to be part of the parameter to the EXECSCRIPT( ) call.

      cRecPauseScript = "EXECSCRIPT('" + ;
        "?123" + CHR(13) + ;
        "?456" + ;
        "')"
      _VFP.DoCmd(cRecPauseScript)
    
    

    To make a valid call that does not cause a syntax error, you can use the following code:

      cRecPauseScript = "EXECSCRIPT('?123'+CHR(13)+ '?456')"
      _VFP.DoCmd(cRecPauseScript)
    
    

    Listbox Control Click Event

    In the current version of Visual FoxPro, the PageUp, PageDown, Home, and End keyboard keys now cause a Listbox control’s Click event to fire. In previous versions, these keys, unlike the arrow keys, did not trigger the Click event.

    PEMSTATUS( ) Function Returns False for Hidden Native Properties

    In previous versions of Visual FoxPro, the PEMSTATUS( ) function returned True (.T.) for hidden native properties of Visual FoxPro base classes when specifying a value of 5 for nAttribute. In the current release, PEMSTATUS( ) returns False (.F.) for these hidden native properties. For more information, see PEMSTATUS( ) Function.

    Changes to Options Dialog Box

    • In the Options dialog box, the List display count option has been moved from the Editor tab to the View tab. For more information, see View Tab, Options Dialog Box.
    • In previous versions of Visual FoxPro, you could output all the settings in the Options Dialog Box (Visual FoxPro) to the Command Window by pressing the SHIFT key when choosing the OK button to close the dialog. In the current release, these settings are now sent to the Debug Output Window. The Debug Output window must be opened in order for the settings to be directed there.

    FOXRUN.PIF

    The FOXRUN.PIF file that is used by the RUN | ! Command is no longer installed in the Visual FoxPro root directory. If Visual FoxPro detects the presence of a FOXRUN.PIF file during a RUN command, it will use COMMAND.COM to execute the specified RUN command. This may not be the desired SHELL program to use for a particular operating system, especially newer ones like Windows XP in which CMD.EXE is preferable.

    The current behavior for a RUN command without the existence of a FOXRUN.PIF file is that the RUN command will use the SHELL program specified by the operating system COMSPEC environment variable. With Windows XP, you can view and edit this variable by right-clicking your computer desktop’s My Computer icon and selecting the Properties dialog box (Advanced tab).
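
    You can also inspect the shell that a RUN command will use directly from the Command window:

    ? GETENV("COMSPEC")  && for example, C:\WINDOWS\system32\cmd.exe on Windows XP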

    The FOXRUN.PIF file is still available in the Tools directory if you need it for a particular reason.

    For more information, see RUN | ! Command.

    SCATTER Command

    The SCATTER command no longer allows the ambiguous use of both the MEMVAR and NAME clauses in the same command. You can include only one of these clauses. In prior versions, the following code would not generate an error:

    USE HOME()+"SAMPLES\Data\customer.dbf"
    SCATTER MEMVAR NAME oCust
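
    Under the new rule, issue the clauses separately, depending on which form of output you want:

    USE HOME() + "SAMPLES\Data\customer.dbf"
    SCATTER MEMVAR  && creates m.* memory variables
    SCATTER NAME oCust  && creates an object with one property per field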
    
    

    For more information, see SCATTER Command.

    SET DOHISTORY

    The SET DOHISTORY command, which is included for backward compatibility, was updated to send output to the Debug Output Window instead of the Command Window as in prior versions.

    SCREEN ShowTips Property

    The default value for _SCREEN ShowTips Property has been changed from False (.F.) to True (.T.). This change was made because new Memo and Field Tips support is now dependent on this setting.

    AllowCellSelection Does Not Permit Deleting Grid Rows When Set to False

    When the AllowCellSelection Property is set to False (.F.) for a Grid control, you cannot select a row for deletion by clicking the deletion column. For more information, see AllowCellSelection Property.

    Northwind Database

    The sample Northwind database has been updated. Five of the stored procedures now include calls to the SETRESULTSET( ) Function so that the Visual FoxPro OLE DB Provider will return a cursor when they are executed.
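
    A stored procedure that returns a cursor through the OLE DB Provider follows this general pattern; the procedure and cursor names below are illustrative, not taken from the actual Northwind stored procedures:

    PROCEDURE GetCustomerList
       SELECT * FROM Customers INTO CURSOR csrResult
       SETRESULTSET("csrResult")  && mark the cursor as the result set for the caller
    ENDPROC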

    Foundation Classes

    The _ShellExecute class contained in the _Environ.vcx FFC class library has been updated to include an additional parameter in the ShellExecute method.

    Wizards and Builders

    The Wizard/Builder selection dialog box now properly hides deleted entries in the Wizard and Builder registration tables.

    Specifying Western Language Script Values for GETFONT( ) Function

    In versions prior to this release, passing 0 as the nFontCharSet value for GETFONT( ) opened the Font Picker dialog box with the Script list unavailable. You could not specify 0 (Western) as the language script value, and passing 1 (Default) used only the default font setting, which is determined by the operating system.

    In this release, passing 0 to GETFONT( ) opens the Font Picker dialog box with the Script list available and Western selected. The return value for GETFONT( ) also includes the return value for nFontCharSet.
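
    For example, a call such as the following now opens the dialog with the Script list enabled and Western preselected; the font name, size, and style shown are arbitrary:

    cFontSpec = GETFONT("Arial", 10, "N", 0)  && 0 = Western language script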

    Removed Items

    HTML Help SDK

    The HTML Help 1.3 SDK no longer ships with Visual FoxPro.

    See Also

    Reference

    Guide to Reporting Improvements
    Data and XML Feature Enhancements
    SQL Language Improvements
    Class Enhancements
    Language Enhancements
    Interactive Development Environment (IDE) Enhancements
    Enhancements to Visual FoxPro Designers
    Miscellaneous Enhancements

    Visual FoxPro New Reserved Words

    • Article
    • 07/09/2007


    The following tables list new words added to the Visual FoxPro language which are now reserved:

    _

    _MEMBERDATA, _MENUDESIGNER, _REPORTBUILDER, _REPORTOUTPUT, _REPORTPREVIEW, _TOOLTIPTIMEOUT

    A

    ADJUSTOBJECTSIZE, ADOCODEPAGE, AFTERBAND, AFTERRECORDREFRESH, AFTERREPORT, ALLOWMODALMESSAGES, ANCHOR, ASQLHANDLES, AUTOCOMPLETE, AUTOCOMPSOURCE, AUTOCOMPTABLE, AUTOHIDESCROLLBAR

    B

    BEFOREBAND, BEFORERECORDREFRESH, BEFOREREPORT, BLOB

    C

    CANCELREPORT, CAST, CLEARRESULTSET, CLEARSTATUS, COMMANDCLAUSES, CONFLICTCHECKCMD, CONFLICTCHECKTYPE, CURRENTDATASESSION, CURRENTPASS

    D

    DECLAREXMLPREFIX, DELAYEDMEMOFETCH, DISPLAYORIENTATION, DOCKABLE, DOMESSAGE, DOSTATUS, DYNAMICLINEHEIGHT

    E

    EVALUATECONTENTS

    F

    FETCHMEMOCMDLIST, FETCHMEMODATASOURCE, FETCHMEMODATASOURCETYPE, FIRSTNESTEDTABLE, FRXDATASESSION, FOXOBJECT

    G

    GDIPLUSGRAPHICS, GETAUTOINCVALUE, GETDOCKSTATE, GETPAGEHEIGHT, GETPAGEWIDTH, GETRESULTSET

    H

    I

    ICASE, INCLUDEPAGEINOUTPUT, INSERTCMDREFRESHCMD, INSERTCMDREFRESHFIELDLIST, INSERTCMDREFRESHKEYFIELDLIST, ISMEMOFETCHED, ISPEN, ISTRANSACTABLE

    J

    K

    L

    LISTENERTYPE, LOADREPORT

    M

    MAKETRANSACTABLE, MAPBINARY, MAPVARCHAR

    N

    NEST, NESTEDINTO, NEXTSIBLINGTABLE

    O

    ONPREVIEWCLOSE, OPTIMIZE, ORDERDIRECTION, OUTPUTPAGE, OUTPUTPAGECOUNT, OUTPUTTYPE

    P

    PAGENO, PAGETOTAL, PICTUREMARGIN, PICTURESPACING, PICTUREVAL, POLYPOINTS, PREVIEWCONTAINER, PRINTJOBNAME, PROGCACHE

    Q

    QUIETMODE

    R

    RECORDREFRESH, REFRESHALIAS, REFRESHCMD, REFRESHCMDDATASOURCE, REFRESHCMDDATASOURCETYPE, REFRESHIGNOREFIELDLIST, REFRESHTIMESTAMP, RENDER, REPORTBEHAVIOR, REPORTLISTENER, RESPECTNESTING, ROTATION

    S

    SCCDESTROY, SCCINIT, SELECTIONNAMESPACES, SENDGDIPLUSIMAGE, SETRESULTSET, SQLIDLEDISCONNECT, SUPPORTSLISTENERTYPE

    T

    TABLEPROMPT, TIMESTAMPFIELDLIST, TWOPASSPROCESS

    U

    UNLOADREPORT, UNNEST, UPDATECMDREFRESHCMD, UPDATECMDREFRESHFIELDLIST, UPDATECMDREFRESHKEYFIELDLIST, UPDATESTATUS, USECODEPAGE, USECURSORSCHEMA, USETRANSACTIONS

    V

    VARBINARY, VARCHAR, VARCHARMAPPING

    W

    X

    XMLNAMEISXPATH

    Y

    Z
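    Many of these words correspond to new language features in this release. As a minimal sketch (the variable names are illustrative only), the new ICASE() function evaluates condition/result pairs and returns the result paired with the first true condition:

        * ICASE and CAST are among the new reserved words in this release.
        nQty = 5
        cSize = ICASE(nQty < 1, "none", nQty < 10, "small", "large")
        ? cSize  && Displays "small"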

    See Also

    Reference

    Reserved Words (Visual FoxPro)

    Getting Started with Visual FoxPro


    Microsoft Visual FoxPro is the object-oriented relational database management system that makes it possible for you to create state-of-the-art enterprise database solutions. Visual FoxPro includes professional productivity tools, documentation, and sample code for quickly building, managing, and deploying solutions.

    In This Section

    Related Sections

    • What’s New in Visual FoxPro
      Describes the new features and enhancements included in this version of Visual FoxPro.
    • Using Visual FoxPro
      Gives an overview of Visual FoxPro features, describes concepts and productivity tools for developing, programming, and managing high-performance database applications and components, and provides walkthroughs that help get you started. With the robust tools and data-centric object-oriented language that Visual FoxPro offers, you can build modern, scalable, multi-tier applications that integrate client/server computing and the Internet.
    • Developing Visual FoxPro Applications
      Includes conceptual information about how to develop Visual FoxPro applications, instructions for creating databases and the user interface, and other tasks needed to create Visual FoxPro applications.
    • Programming in Visual FoxPro
      Describes how understanding object-oriented programming techniques and the event-driven model can maximize your programming productivity and enable you to access the full power of Visual FoxPro.
    • Reference (Visual FoxPro)
      Includes Visual FoxPro general, programming language, user interface, and error message reference topics.
    • Samples and Walkthroughs
      Contains Visual FoxPro samples and walkthroughs that you can use for experimenting with and learning Visual FoxPro features.

    Locating Readme Files (Visual FoxPro)


    The Readme.htm file is stored at the root of the Microsoft Visual FoxPro CD-ROM. Use your Internet browser to open and view the file.

    To locate Readme files for additional products included in the Visual FoxPro package, see the root of each product CD-ROM.

    See Also

    Other Resources

    Getting Started with Visual FoxPro

    Installing Visual FoxPro


    The following sections describe information about installing Visual FoxPro.

    In This Section

    Related Sections

    • Customizing the Visual FoxPro Environment
      Explains how you can optimize your computer system, configure Visual FoxPro and development environment settings, restore your desktop, and how people with disabilities can improve accessibility to Visual FoxPro and Microsoft Windows.
    • Upgrading from Earlier Versions
      Describes how Visual FoxPro protects your investment in applications built in previous versions of FoxPro.
    • Getting Started with Visual FoxPro
      Provides information about where to find the Readme file, installing and upgrading from previous versions, configuring Visual FoxPro, and customizing the development environment.

    Requirements for Installing Visual FoxPro


    Visual FoxPro has the following minimum system requirements:

    • Computer: PC with a Pentium class processor.
    • Peripherals: Mouse or pointing device
    • Memory: 64 MB RAM (128 MB or higher recommended)
    • Hard disk space:
      • Visual FoxPro Prerequisites: 20 MB
      • Visual FoxPro Typical Install: 165 MB
      • Visual FoxPro Maximum Install: 165 MB
    • Video: 800 x 600 resolution, 256 colors (High color 16-bit recommended)
    • Operating system: Developing applications with Visual FoxPro is supported only on Microsoft Windows 2000 Service Pack 3 or later, Windows XP, and Windows Server 2003. You can create and distribute run-time applications for Windows 98, Windows Me, Windows 2000 Service Pack 3 or later, Windows XP, and Windows Server 2003. Note: Installation on Windows NT 4.0 Terminal Server Edition is not supported.

    See Also

    Tasks

    How to: Install Visual FoxPro

    How to: Install Visual FoxPro


    You can install this version of Visual FoxPro from a CD-ROM or a network to a local hard drive. You must install Visual FoxPro on a local drive, not a mapped drive. There is no other preparation required before installing Visual FoxPro. You must have administrator privileges to install Visual FoxPro. It is recommended that you run with power-user privileges to use all the provided tools effectively.

    You can safely install or uninstall using Visual FoxPro Setup. If you are upgrading Visual FoxPro, you must first uninstall the previous version of the program. Though both versions of Visual FoxPro can exist on the same computer, you cannot install the current version of Visual FoxPro in the same directory as the previous version.

    If you plan to publish XML Web services using Visual FoxPro, you might want to set up Internet Information Services (IIS) on a Windows 2000, Windows XP or Windows Server 2003 computer. Refer to your operating system documentation for instructions on how to set up and configure IIS.

     Note

    Visual FoxPro setup no longer installs any Windows operating system Service Packs or versions of Internet Explorer. It is highly recommended that you install the latest versions of these components before installing Visual FoxPro. Additionally, Visual FoxPro 9.0 is supported only on Windows 2000 Service Pack 3 or later. For details about installing the latest Service Pack, visit the following Microsoft Web page at https://www.microsoft.com/windows2000/.

    Full installation includes all Visual FoxPro program files, online help, and samples files.

    To install Visual FoxPro

    1. Quit all open applications. Note: If you use a virus protection program on your computer, turn it off before running the Installation wizard. The Installation wizard might not run properly with virus protection turned on. After installation, be sure to restart your virus protection program.
    2. Insert the Visual FoxPro CD. The Visual FoxPro Setup start page appears automatically.
    3. Click Install Visual FoxPro to launch Visual FoxPro Setup.
    4. To determine whether you need additional components, click Prerequisites to display any necessary components.
    5. Click Install Now! to install any new components. If Visual FoxPro Prerequisites only needs to update components, click Update Now!
    6. You might need to restart your computer. When finished, click Done. Visual FoxPro Setup reappears.
    7. To continue installation, click Visual FoxPro.
    8. After accepting the End User License Agreement and entering the Product Key and your name, click Continue. Note: Visual FoxPro cannot be installed on a mapped drive. You must install it on a local drive. Do not attempt to use the Map Network Drive functionality in Setup.
    9. On the Options page, select the features you want to install and click Install Now! to continue.
    10. When finished, click Done to return to Visual FoxPro Setup. Click Exit to return to the Visual FoxPro Setup start page.

    If you uninstall Visual FoxPro while the previous version of Visual FoxPro exists on your computer, certain shared registry keys used by the previous version of Visual FoxPro are removed. You must reinstall these critical shared registry keys.

    If you run Visual FoxPro from the Start menu, Visual FoxPro Setup automatically reinstalls these keys. If you start Visual FoxPro using other means, such as running the application executable directly, the setup program does not start automatically. You should use Add/Remove Programs in the Control Panel and the following steps to reinstall the registry keys manually:

    To manually reinstall Visual FoxPro 9.0 registry keys

    1. From the Start menu, click Control Panel.
    2. Click Add/Remove Programs.
    3. Click Change/Remove for Microsoft Visual FoxPro 9.0.
    4. Click Visual FoxPro and Repair/Reinstall.

    See Also

    Tasks

    How to: Install Additional Applications
    How to: Reinstall Visual FoxPro
    Troubleshooting Installation

    Reference

    Requirements for Installing Visual FoxPro

    How to: Install Additional Applications


    This release includes copies of additional software that you can install and use with Visual FoxPro. These include:

    • InstallShield Express Limited Edition: Provides the capability to package and deploy the applications that you create using Visual FoxPro. Visual FoxPro includes the InstallShield Express 5.0 Visual FoxPro Limited Edition. Note: The limited and full editions of InstallShield Express 5.0 are considered two versions of the same product and cannot coexist. If you install one version on a computer where another already exists, the original is uninstalled automatically. Because the limited edition contains fewer features than the full edition, you should keep the full edition on your computer.
    • Microsoft SOAP Toolkit 3.0 Samples: Provides samples for demonstrating how to consume and publish XML Web services. Visual FoxPro Prerequisites installs the core SOAP Toolkit 3.0 components needed to access and publish XML Web services in Visual FoxPro.
    • Microsoft SQL Server 2000 Desktop Engine (MSDE): Provides a personal version of SQL Server.

    To install InstallShield Express Limited Edition

    1. Insert the Visual FoxPro CD. The Visual FoxPro Setup start page opens automatically.
    2. Click Install InstallShield Express.
    3. Follow the instructions in the InstallShield Express installation wizard.

    You can also locate the Setup.exe file for InstallShield Express in the InstallShield folder on the Visual FoxPro CD.

     Note

    Visual FoxPro 9.0 installs its redistributable merge modules in the same location as Visual FoxPro 8.0.

    The version of InstallShield Express included with Visual FoxPro 9.0 automatically uses the Visual FoxPro 9.0 merge module location.

     Note

    Visual FoxPro 9.0 requires certain merge modules when creating a Visual FoxPro 9.0 redistributable custom application setup program using InstallShield Express.

    You need to include the following merge modules when creating your custom setup program:

    • Microsoft Visual FoxPro 9 Runtime Libraries
    • Microsoft Visual C Runtime Library 7.1
    • GDI Plus Redist
    • MSXML 4.0
    • MSXML 3.0 (needed only for CURSORTOXML functions)
    • Microsoft Visual FoxPro 9 Runtime Language Libraries (specific language library files that may be needed for international applications)
    • Reporting Applications (needed for Visual FoxPro 9.0 reporting engine)

     Note

    MSXML 4.0 consists of two merge modules (msxml4sxs32.msm and msxml4sys32.msm). MSXML 3.0 consists of three merge modules (msxml3_wim32.msm, msxml3inf_wim32.msm and wdstddll_wim32.msm).

    To install SOAP Toolkit 3.0 Samples

    1. Insert the Visual FoxPro CD. The Visual FoxPro Setup start page opens automatically.
    2. Click Install SOAP Toolkit 3.0 Samples.
    3. Follow the instructions in the SOAP Toolkit 3.0 Samples Setup Wizard.

    You can also locate the Soapsdk.msi and Soapsamp.msi files for the SOAP Toolkit in the SOAPToolkit folder on the Visual FoxPro CD.

    To install MSDE

    1. Insert the Visual FoxPro CD. The Visual FoxPro Setup start page opens automatically.
    2. Click Install Microsoft SQL Server Desktop Engine (MSDE) and follow the installation instructions that appear in the Readme file.

    You can locate the Setup.exe file for MSDE in the SQLMSDE folder on the Visual FoxPro CD.

     Note

    Visual FoxPro includes Microsoft SQL Server 2000 Desktop Engine Service Pack 3.0a. To make sure you have the most recent version and Service Pack installed, visit the Microsoft SQL Server Web page at https://www.microsoft.com/sql. In addition, if you are distributing custom Visual FoxPro applications that require MSDE, you can obtain the redistributable merge modules from the Microsoft SQL Server Web page for use with Windows Installer-based setup programs.

    See Also

    Tasks

    How to: Install Visual FoxPro

    How to: Reinstall Visual FoxPro


    You can reinstall Visual FoxPro by uninstalling it and then installing it again. You can uninstall Visual FoxPro from the Start menu or from the original installation disk.

    To uninstall Visual FoxPro

    1. On the Start menu, click Control Panel.
    2. In the Control Panel window, double-click Add or Remove Programs. The Add or Remove Programs window opens.
    3. In the Currently installed programs list, click the version of Microsoft Visual FoxPro you want to uninstall, and then click Change/Remove.

    If you reinstall Visual FoxPro or reinstall to another location, you might want to clean your user settings and other files installed by Visual FoxPro before reinstalling.

    You can remove these files by deleting the contents of the …\Application Data\Microsoft\Visual FoxPro folder inside your user settings folder. To determine the location of the Application Data folder, type ? HOME(7) in the Command window. These files include your FoxUser.* resource files, which contain user settings, and folders for the Toolbox and Task Pane.

    However, it is possible that your resource files are in another location. You can determine their location by typing the following in the Command window:

    ? SYS(2005)

    You should delete old Code Reference files that might be associated with projects in the project directories. These are labeled as projectname_ref.* files. You might also need to restore the default Visual FoxPro registry settings.

    Visual FoxPro includes the VFPClean.app tool so you can make sure all core Xbase and other files are set appropriately.

    To run VFPClean.app

    • Type the following line of code in the Command window:

      DO HOME()+"VFPCLEAN.APP"

    See Also

    Tasks

    How to: Install Visual FoxPro

    Troubleshooting Installation


    You might encounter the following issues when installing Visual FoxPro:

    • If you cannot run Visual FoxPro and do not see error messages telling you what is wrong, the problem might be in your computer’s ROM BIOS or the video driver you are using.
    • If you are using an extended keyboard, be sure it is compatible with the ROM BIOS. In addition, make sure that you are using a standard VGA or Super VGA Windows video driver.
    • If you get a “stack overflow” error message, your video driver is out of date or not designed for your video card. To correct this problem, update the video driver.
    • For additional information, see the Visual FoxPro Readme at the root of the Visual FoxPro installation CD.

    Upgrading from Earlier Versions


    In this article

    1. Conversion to Visual FoxPro Format
    2. See Also

    Microsoft Visual FoxPro protects your investment in applications built with previous versions of FoxPro. In Visual FoxPro, you can run many applications that were written in earlier versions with little or no conversion. You can modify and enhance applications using the Visual FoxPro language, knowing that most extensions to the language do not affect backward compatibility. In addition, you can convert FoxPro screens, projects, and reports to Visual FoxPro format.

    However, it is possible that some behavior or feature changes in the current version of Visual FoxPro might affect existing Visual FoxPro source code. Therefore, you should review the new features, enhancements, and most recent behavior changes for this version. For more information, see What’s New in Visual FoxPro and Changes in Functionality for the Current Release.

    Conversion to Visual FoxPro Format

    If you choose to convert your dBASE or FoxPro files to the Visual FoxPro format, you can take advantage of the unique features of Visual FoxPro. You can run many files from some previous versions of FoxPro directly; others require varying levels of conversion.

    You can convert most projects or components created using previous versions of Visual FoxPro simply by opening or recompiling them in this version of Visual FoxPro. When you recompile components, such as forms, screens, or reports, some modifications may be necessary. You can make modifications to these components in the same way you modify the components of this version of Visual FoxPro.

    You can find additional information about upgrading from previous versions of Visual FoxPro on the Microsoft Developer Network (MSDN) Web site at https://msdn.microsoft.com. You can search the MSDN Archive for documentation of previous versions of Visual FoxPro.

    See Also

    Other Resources

    Getting Started with Visual FoxPro
    Overview of Visual FoxPro Features
    Installing Visual FoxPro
    Customizing the Visual FoxPro Environment
    Optimizing Your System

    How to: Convert Earlier Visual FoxPro Files


    You can explicitly convert FoxPro 2.6 and Visual FoxPro 3.0 files to the current Visual FoxPro format, which is necessary when you want to use these files with later versions of Visual FoxPro. Files that are created from later versions are converted automatically.

    To convert FoxPro 2.6 and Visual FoxPro 3.0 files

    1. On the File menu, click Open.
    2. In the Open dialog box, browse for and select the file. The Visual FoxPro Converter dialog box opens. For more information, see Visual FoxPro Converter Dialog Box.
    3. In the Visual FoxPro Converter dialog box, select the options you want.
    4. To complete the file conversion, click Continue. Note: If you are converting Macintosh or MS-DOS files that have never contained Windows records, the Visual FoxPro Transporter dialog box appears. For more information, see Visual FoxPro Transporter Dialog Box.

    You can also convert FoxPro 2.6 and Visual FoxPro 3.0 files by typing one of the following commands with the file name in the Command window:

    See Also

    Concepts

    Upgrading from Earlier Versions

    Customizing the Visual FoxPro Environment


    After you install Visual FoxPro, you might want to customize your development environment. You can also specify configuration settings that load when you start Visual FoxPro.

    For information on optimizing your Visual FoxPro applications, see Optimizing Applications.

    In This Section

    • Optimizing Your System
      Explains how to get maximum performance by optimizing your operating system, Visual FoxPro, and your application.
    • Visual FoxPro Configuration
      Explains how changing the configuration of your copy of Visual FoxPro affects the way it looks and behaves, such as establishing default locations for files used with Visual FoxPro, altering how your source code looks in an edit window, and displaying the format of dates and times.
    • Visual FoxPro Environment Settings
      Describes different ways to change Visual FoxPro environment settings such as using the Options dialog box, setting configuration options at program startup, and using command-line options. You can configure Visual FoxPro toolbars, dock windows, set editor options, and customize the appearance of your applications without altering code.
    • Restoring the Visual FoxPro Interactive Environment
      Describes how to close down all program operations and clear the Visual FoxPro desktop to return to its interactive state.
    • Accessibility for People with Disabilities (Visual FoxPro)
      Provides information about features, products, and services that make Microsoft Visual FoxPro and the Windows operating system more accessible for people with disabilities.

    Related Sections

    • Getting Started with Visual FoxPro
      Provides information about installing, upgrading, and customizing Visual FoxPro.
    • Using Visual FoxPro
      Explains how Visual FoxPro provides the tools you need to create and to manage high-performance database applications and components.
    • Samples and Walkthroughs
      Describes how to create different types of applications and components with step-by-step guides.
    • Overview of Visual FoxPro Features
      Describes how Visual FoxPro gives you more of everything you have come to expect in a database management system (DBMS) — speed, power, and flexibility.
    • Developing Visual FoxPro Applications
      Includes conceptual information about how to develop Visual FoxPro applications, instructions for creating databases and the user interface, and other tasks needed to create Visual FoxPro applications.
    • Programming in Visual FoxPro
      Describes how understanding object-oriented programming techniques and the event-driven model can maximize your programming productivity and enable you to access the full power of Visual FoxPro.
    • Development Productivity Tools
      Explains that Visual FoxPro provides developer tools for application development within the FoxPro application and the FoxPro language.

    Optimizing Your System


    Visual FoxPro is designed to be a fast relational database development system. However, applications you create with Visual FoxPro can have varying requirements and purposes. Therefore, you might want to optimize the operating system, Visual FoxPro, or your application for maximum performance.

    In This Section

    Related Sections

    • Customizing the Visual FoxPro Environment
      Provides information about setting environment options, accessibility features, and configuration.
    • Getting Started with Visual FoxPro
      Discusses how to get started, including information about installing, upgrading, and customizing Visual FoxPro to create state-of-the-art enterprise database solutions.
    • What’s New in Visual FoxPro
      Lists the new features and enhancements made to this version of Microsoft Visual FoxPro.
    • Using Visual FoxPro
      Provides links to information on Visual FoxPro programming features that are designed to improve developer productivity, including Access and Assign methods, support for more graphic file formats, and language to simplify programming tasks.
    • Developing Visual FoxPro Applications
      Includes conceptual information about how to develop Visual FoxPro applications, instructions for creating databases and the user interface, and other tasks needed to create Visual FoxPro applications.
    • Programming in Visual FoxPro
      Discusses how to access the full power of Visual FoxPro by creating applications. Understanding object-oriented programming techniques and the event-driven model can maximize your programming productivity.

    Optimizing the Operating Environment


    In this article

    1. Maximizing Memory and Virtual Memory
    2. Managing Your Hard Disk
    3. See Also

    You can optimize Visual FoxPro performance by maximizing your computer’s hardware and operating environment. The following sections describe how you can optimize these areas:

    • Maximizing Memory and Virtual Memory
    • Managing Your Hard Disk

    Maximizing Memory and Virtual Memory

    Providing your computer with as much memory as possible is the most effective way to optimize your system for Visual FoxPro. You can also use memory more effectively by closing all other running applications on your computer. To maximize the use of your computer’s memory while running Visual FoxPro, follow these guidelines:

    • Do not run other Windows applications while running Visual FoxPro.
    • Use only those memory-resident programs needed for operation.
    • Simplify the screen display.

    You can free memory by simplifying the way windows and screen backgrounds display on your computer monitor.

    • Use a color or a pattern for the desktop background instead of wallpaper.
    • Use the lowest-resolution display that is practical for you. The higher the resolution of the display, the more memory your computer requires and the slower your graphics and user-interface elements appear. For VGA-compatible displays that use an extended mode driver, such as Video 7 or 8514, using the standard VGA driver ensures faster display performance but provides lower resolution and less color support.

    To increase the number of applications that you can run simultaneously, Microsoft Windows supports virtual memory by swapping the least recently used segments of code from memory to the hard disk in the form of a paging file. As a rule, the default settings in the Windows operating system for managing virtual memory meet the requirements of most users and are the recommended settings.

     Note

    The paging file does not improve Visual FoxPro performance and is not a substitute for more memory.

    Managing Your Hard Disk

    Managing your hard disk can improve overall product speed. To get the best performance from your hard disk, provide a generous amount of disk space. If your hard disk has little free space, you can increase Visual FoxPro performance by removing unnecessary data or by purchasing a hard disk with greater capacity.

    Disk input/output performance degrades significantly when a hard disk is nearly full. The more free hard disk space that is available, the more likely it is that contiguous blocks of disk space are available. Visual FoxPro uses this space for changes and additions to database, table, index, memo, and temporary files. Increasing free hard disk space improves performance of any commands that change or add to your files. More disk space also decreases the time required to read those files in response to your queries.

    The way that Windows and Visual FoxPro manage files on disk can greatly affect the performance of your application. The following sections discuss managing files in directories and temporary files:

    • Managing Files in Directories
    • Managing Temporary Files

    Managing Files in Directories

    As a directory becomes increasingly congested with files, the operating system takes longer to find files. The speed of your system when searching directories is a factor that Visual FoxPro does not control. To improve the speed of directory searches, reduce the number of files in your directories by performing the following actions:

    • Use the Visual FoxPro Project Manager to create and manage your files, segregate program files into separate directories, and avoid creating numerous generated files.
    • When you want to distribute your application, create an application or an executable (.exe) file instead of numerous individually generated files. This process decreases the number of files in your application's subdirectories and increases performance.
    • If you delete a large number of files in one directory, copy the remaining files into a new directory or optimize the directory using a defragmenting utility program. Note: Deleting files from a directory does not automatically speed directory searching. When a file is deleted, the file is only marked for deletion and is still included in directory searches.
    • When saving files, use short file paths to increase performance. For example, a path such as "C:\Program Files\Microsoft Visual FoxPro\…" is very long; prefer shorter paths where practical.

    Managing Temporary Files

    Visual FoxPro creates temporary files for a variety of operations. For example, Visual FoxPro creates temporary files during editing, indexing, and sorting. Text editing sessions can also create a temporary or backup (.bak) copy of the edited file. By default, Visual FoxPro creates its temporary files in the same directory that Windows stores its temporary files unless you specifically designate an alternate location.

     Tip

    In most cases, you should specify one location for all Visual FoxPro temporary files. Make sure that the location you specify contains enough space for all possible temporary files.

    For more information, see How to: Specify the Location of Temporary Files.
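    As a sketch, a Config.fpw fragment that directs all Visual FoxPro temporary files to a single location (D:\VFPTemp is a hypothetical path on a drive with ample free space) might look like this:

    TMPFILES = D:\VFPTemp
    EDITWORK = D:\VFPTemp
    PROGWORK = D:\VFPTemp
    SORTWORK = D:\VFPTemp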

    Searching for Temporary Files

    When Visual FoxPro searches for the temporary files path, for example, when you use the SYS(2023) – Temporary Path function to retrieve it, or when the TMPFILES, EDITWORK, PROGWORK, and SORTWORK settings in a Visual FoxPro configuration file do not specify a different location, it calls the Windows API function GetTempPath. GetTempPath searches a sequence of variables that differs depending on the operating system. Microsoft Windows 2000 and later include user variables that store the location of temporary files, while Microsoft Windows 95, 98, and Me include only global system environment variables for this purpose.

    On Windows 2000 and later, GetTempPath, and therefore, SYS(2023), TMPFILES, EDITWORK, PROGWORK, and SORTWORK, searches the TMP user variable for the location of temporary files by default. If the TMP user variable does not specify a location, Visual FoxPro searches the following variables in a specific order:

    • TMP system variable.
    • TEMP user variable.
    • TEMP system variable.

    If these variables do not specify a location, the location for storing temporary files defaults to the home drive and path, or the Temp folder in the user’s Documents and Settings directory.

     Note

    If more than one value is specified for TMP or TEMP, then the first value is used.

    On Windows 95, 98, and Me, GetTempPath searches the TMP and TEMP global system variables in that order and then searches the current directory.

    For more information, see SYS(2023) – Temporary Path and Special Terms for Configuration Files.

    See Also

    Concepts

    Optimizing Visual FoxPro in a Multiuser Environment
    Optimizing Visual FoxPro Startup Speed

    Optimizing Visual FoxPro Startup Speed


    In this article

    1. Managing Startup Speed
    2. Optimizing the Load Size of Visual FoxPro
    3. Optimizing Key SET Commands
    4. See Also

    Though Visual FoxPro is fast by default, you can further optimize its startup and operating speed. This section describes enhancing Visual FoxPro performance by managing startup speed and optimizing SET commands.

    Managing Startup Speed

    The time required to load and start Visual FoxPro relates to the physical size of Visual FoxPro, the length of the PATH statement in effect, the number of items to be found at startup, and other factors. You can control the load size, the search path, component file locations, and the startup SET command values of Visual FoxPro.

    Managing File Locations

    Visual FoxPro stores the FoxUser.dbf file, which contains user settings, in the user’s Application Data directory by default. You can display this location by typing ? HOME(7) in the Command window. Visual FoxPro searches for the FoxUser.dbf and Config.fpw files in the following places:

    • In the startup application or executable file, if any. For example, you can start a Visual FoxPro application by typing the following code on the command line:

        VFPversionNumber.exe MyApp.app

      -or-

        VFPversionNumber.exe MyApp.exe

      If the startup application or executable file contains a Config.fpw file, that configuration file is always used. You can override settings in a Config.fpw file that are bound inside an application by specifying an external Config.fpw file, using the -C command-line switch when starting an application or Visual FoxPro.
    • In the working directory.
    • Along the path established with the PATH environment variable.
    • In the directory containing Visual FoxPro.

    Controlling File Loading

    You can also speed startup by preventing Visual FoxPro from loading files you don't plan to use. If your application does not use the FoxUser or FoxHelp file, disable them in the Config.fpw file by using the following commands:

    RESOURCE = OFF
    HELP = OFF
    
    

    Visual FoxPro seeks all other Visual FoxPro components (GENXTAB, CONVERT, and so on) only in the Visual FoxPro directory. If you place components elsewhere, you must explicitly identify the path to those components in your Config.fpw file. For example, you might specify these locations:

    _TRANSPORT = c:\migrate\transport.prg
    _GENXTAB = c:\crosstab\genxtab.prg
    _FOXREF = c:\coderefs\foxref.app
    
    

    You can use the environment variable FOXPROWCFG to explicitly specify the location of Config.fpw. For details about the FOXPROWCFG variable, see Customizing the Visual FoxPro Environment.

    Optimizing the Load Size of Visual FoxPro

    If you don’t plan on using any of the Visual FoxPro components listed previously, set them to an empty string to speed startup.

    To optimize the load size of Visual FoxPro, use the following syntax:

            cFileVariable = ""
    
    

    Replace cFileVariable with _TRANSPORT, _CONVERT, or other variables as appropriate.
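    For example, an illustrative Config.fpw fragment that empties these variables to speed startup might look like the following (include only the components your application does not use):

    ```
    _TRANSPORT = ""
    _CONVERT = ""
    _GENXTAB = ""
    ```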

    Optimizing Key SET Commands

    You can optimize the operation of Visual FoxPro by tuning the values of certain SET commands.

    The following table shows SET commands that have the greatest effect on performance, and their settings for maximum performance. You can specify SET command values by including them in the Config.fpw file, by typing them in the Command window, or by setting them in the Options dialog box.

    Command Settings for Maximum Performance

    SET Command              Performance Setting
    SET ESCAPE Command       ON
    SET OPTIMIZE Command     ON
    SET REFRESH Command      0,0
    SET SYSMENU Command      DEFAULT
    SET TALK Command         OFF
    SET VIEW Command         OFF
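    As one hedged possibility, the maximum-performance values from the table could be placed in a Config.fpw file (configuration files take the SET keyword off, as described under Setting Configuration Options at Startup); you could equally type the SET commands in the Command window:

    ```
    ESCAPE = ON
    OPTIMIZE = ON
    REFRESH = 0,0
    SYSMENU = DEFAULT
    TALK = OFF
    VIEW = OFF
    ```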

    See Also

    Reference

    SET ESCAPE Command
    SET REFRESH Command
    SET SYSMENU Command
    Command Window (Visual FoxPro)

    Concepts

    Optimizing Visual FoxPro in a Multiuser Environment
    Optimizing the Operating Environment

    Other Resources

    Optimizing Your System
    Customizing the Visual FoxPro Environment

    Optimizing Visual FoxPro in a Multiuser Environment

    • Article
    • 07/09/2007

    In this article

    1. Managing Temporary Files
    2. Sharing Tables
    3. See Also

    When you run Visual FoxPro or Visual FoxPro applications in a multiuser environment, you can improve performance by managing storage of temporary files and controlling the way tables are shared.

    Managing Temporary Files

    In most multiuser environments, it is recommended that you save temporary files to local disks or memory when networked computers contain large amounts of free disk space. Redirecting storage of temporary files can improve performance by reducing frequent access to the network drive.

    On small networks with older networked computers and slow hard disks, you might experience better performance by leaving Visual FoxPro temporary files on the file server; however, when in doubt, direct temporary files to the local disk. When working on large, heavily used networks, always redirect temporary files to the local disk.

    By saving all temporary files to a single directory on a local hard drive, you can safely erase the contents of the temporary file directory on the file server prior to each Visual FoxPro session. This action purges the system of any temporary files that were created but not erased by Visual FoxPro due to a system reboot or power loss.

    For more information about temporary files, see Optimizing the Operating Environment and How to: Specify the Location of Temporary Files.

    Sharing Tables

    If users share tables on a network, the way you manage access to them can affect performance.

    • Avoid opening and closing tables repeatedly.
    • Buffer write operations to tables that are not shared.
    • Provide exclusive access to tables.
    • Limit the time that tables and records are locked.

    Providing Exclusive Access

    You can enhance performance for the APPEND, REPLACE, and DELETE commands, and for operations that run at times when no other users require access to the data (for example, overnight updates), by opening data files for exclusive use. When tables are open for exclusive use, performance improves because Visual FoxPro does not need to test the status of record or file locks.

    To open data files for exclusive use, use the EXCLUSIVE clause in the USE and OPEN DATABASE commands. For more information, see USE Command and OPEN DATABASE Command.
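    As a minimal sketch (the table name and update are hypothetical), an overnight batch job might open its table exclusively before updating it:

    ```foxpro
    * Open the table for exclusive use; no record or file locks are tested
    USE orders EXCLUSIVE
    REPLACE ALL processed WITH .T. FOR orderdate < DATE()
    USE  && close the table, releasing exclusive access
    ```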

    Limiting the Time on Locking Tables

    You can reduce contention between users for write access to a table or record by shortening the amount of time for locking a record or table. Instead of locking a record while the user edits it, lock the record only after it has been edited. Using optimistic row buffering provides the shortest amount of time that records are locked. For more information, see Buffering Data.
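    A hedged sketch of optimistic row buffering follows (the table and field names are illustrative); the record is locked only while TABLEUPDATE( ) writes the buffered change:

    ```foxpro
    USE customers SHARED
    = CURSORSETPROP("Buffering", 3)   && 3 = optimistic row buffering
    REPLACE city WITH "Seattle"       && edit goes to the buffer; no lock yet
    IF !TABLEUPDATE()                 && lock is held only during the update
        = TABLEREVERT()               && discard the buffered change on failure
    ENDIF
    ```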

    See Also

    Concepts

    Optimizing Applications in Multiuser Environments

    Visual FoxPro Configuration

    • Article
    • 07/09/2007

    The configuration of Visual FoxPro determines how your copy of Visual FoxPro looks and behaves. For example, you can establish the default locations for files used with Visual FoxPro, how your source code looks in an edit window, and the format of dates and times.

    You can make changes to the Visual FoxPro configuration that exist for the current session only (temporary), or specify them as the default settings for the next time you start Visual FoxPro (permanent). If the settings are temporary, they are stored in memory and are discarded when you quit Visual FoxPro.

    If you make permanent settings, they are stored in the Microsoft Windows registry or Visual FoxPro resource file. The Windows registry is a database that stores configuration information about the operating system, all Windows applications, OLE, and optional components such as ODBC. For example, the registry is where Windows stores the associations between file name extensions and applications so that when you click a file name, Windows can launch or activate the appropriate application.

    For an example of how to change the registry, you can examine Registry.prg in the \Samples\Classes directory, which contains numerous methods based on Windows API calls and makes it possible for you to manipulate the Windows registry.

    Similarly, Visual FoxPro stores its application-specific configuration information in the registry. When you start Visual FoxPro, the program reads the configuration information in the registry and sets the configuration according to those settings. After reading the registry, Visual FoxPro also checks for a configuration file, which is a text file in which you can store configuration settings to override the defaults stored in the registry. After Visual FoxPro has started, you can make additional configuration settings using the Options Dialog Box or SET commands. For more information, see How to: View and Change Environment Settings.

     Note

    The run-time version of Visual FoxPro does not read the Windows registry when starting up, as registry settings are designed primarily to configure the development environment. If you intend to distribute your Visual FoxPro applications using a run-time library, you can establish configuration settings in two ways: with a configuration file, or with a program that manipulates the Windows registry on the user’s computer.

    Visual FoxPro also maintains a resource file, Foxuser.dbf, that stores information about the current state of the program when you quit. For example, the resource file contains information about the location and size of the Command window, current keyboard macros, the toolbars that are displayed, and so on. The Foxuser.dbf file is an ordinary Visual FoxPro table, which you can read and change as required by your application.

     Tip

    If the data in the Foxuser.dbf file becomes corrupted or invalid, it can cause Visual FoxPro to behave in an erratic manner. If you do not manually store anything in the table, for example keyboard macros, deleting the table might help.

    See Also

    Tasks

    How to: View and Change Environment Settings
    How to: Change Configuration Settings in the Windows Registry

    Reference

    Options Dialog Box (Visual FoxPro)
    ODBC Registry Foundation Class
    Command Window (Visual FoxPro)
    SET RESOURCE Command

    Other Resources

    Customizing the Visual FoxPro Environment

    Visual FoxPro Environment Settings

    • Article
    • 07/09/2007

    You can make changes to the Visual FoxPro environment by using the Options dialog box, editing the Windows registry, overriding default configuration settings, customizing the Project Manager, and configuring Visual FoxPro toolbars.

    In This Section

    Related Sections

    How to: View and Change Environment Settings

    • Article
    • 07/09/2007

    In this article

    1. Displaying Environment Settings
    2. Saving Environment Settings
    3. Setting the Environment Using the SET Command
    4. See Also

    To view and change environment settings, use the Options dialog box. The Options dialog box contains a series of tabs representing different categories of environment options.

    To display the Options dialog box

    • On the Tools menu, click Options.

    The Options dialog box appears and displays tabs from which you can choose desired settings. For details about options you can set using each tab, see the Options Dialog Box (Visual FoxPro).

    Displaying Environment Settings

    When you run Visual FoxPro, you can verify environment settings by using the Options dialog box or the DISPLAY STATUS Command. Also, you can display the values of individual SET Commands to verify settings.

    To display multiple environment settings

    • On the Tools menu, click Options to display the Options dialog box and view the current settings. – OR – 
    • Type DISPLAY STATUS in the Command window.

    To display individual environment settings

    • Use the SET( ) Function in the Command window to display the current value of any SET commands.

    For example, to view the current status of SET TALK, type:

    ? SET("TALK")
    
    

     Note

    Because settings are valid only for the current data session, you must capture your settings and place them in a program or a form’s Init event code for every private data session.

    For more information, see SET Command Overview.
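    For example (a sketch; the specific settings are illustrative), a form that uses a private data session might reissue its session-scoped settings in its Init event code:

    ```foxpro
    * Form.Init event code for a form whose DataSession is private
    SET TALK OFF
    SET DELETED ON
    SET EXACT ON
    ```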

    To echo Options dialog box settings to the Debug Output window

    1. On the Tools menu, click Debugger.
    2. Click the main Visual FoxPro window to select it and on the Tools menu, click Options.
    3. In the Options dialog box, make setting choices.
    4. Hold down the SHIFT key and click OK. The settings are echoed to the Debug Output window.
    5. Click the Visual FoxPro Debugger window to select it and copy the setting commands from the Debug Output window.

    Saving Environment Settings

    You can save the settings you make in the Options dialog box for the current data session or as default (permanent) settings for your copy of Visual FoxPro.

    To save settings for the current session only

    1. In the Options dialog box, select your settings.
    2. Click OK.

    When you save settings for the current session only, they remain in effect until you quit Visual FoxPro (or until you change them again). To save changes permanently, save them as default settings. This action stores your settings in the Windows registry.

    To save current settings as default settings

    1. In the Options dialog box, select your settings.
    2. Click Set As Default. Note: The Set As Default button is disabled until you make a change to the current settings.

    You can override default settings either by issuing SET commands or by specifying a configuration file when you start Visual FoxPro. For details, see Setting Configuration Options at Startup.

    Setting the Environment Using the SET Command

    You can programmatically modify most options displayed on the tabs in the Options dialog box using SET commands or by assigning a value to a system variable.

     Note

    When you configure the environment using SET commands, the settings take effect only for the current session of Visual FoxPro. When you quit the program, the system discards your settings. This means you must reissue the SET commands. However, you can automate this process by issuing SET commands at startup or using a configuration file. For details, see Setting Configuration Options at Startup.

     Tip

    To save a configuration made with SET commands, open the Options dialog box and save your settings there.

    To set the environment programmatically

    • Use the SET commands that you want.

    For example, the following lines of code set a default path, add a clock to the status bar, and use a year-month-date (yy.mm.dd) format for dates:

    SET DEFAULT TO HOME()+"\VFP"
    SET CLOCK ON
    SET DATE TO ANSI      
    
    

    For more information, see SET Command Overview.

    See Also

    Tasks

    Get Application Information from the Windows Registry Sample
    How to: Change Configuration Settings in the Windows Registry

    Reference

    DISPLAY STATUS Command

    Other Resources

    Visual FoxPro Environment Settings

    How to: Specify the Location of Temporary Files

    • Article
    • 07/09/2007

    You can specify a different location for temporary files using the Visual FoxPro interface or by using the TMPFILES, EDITWORK, PROGWORK, and SORTWORK settings in a Visual FoxPro configuration file.

    To specify the location of temporary files

    1. On the Tools menu, click Options.
    2. In the Options dialog box, click the File Locations tab.
    3. In the File Type list, click Temporary Files, then Modify.
    4. In the Change File Location dialog box, type a new location or click the ellipsis (…) button to browse and select a location for temporary files.

    For more information, see Options Dialog Box (Visual FoxPro) and File Locations Tab, Options Dialog Box.

    See Also

    Reference

    Special Terms for Configuration Files

    Concepts

    Optimizing the Operating Environment
    Setting Configuration Options at Startup

    Other Resources

    Visual FoxPro Environment Settings

    How to: Change Configuration Settings in the Windows Registry

    • Article
    • 07/09/2007

    You can set the Visual FoxPro configuration by making changes directly in the Windows registry. To change the Windows registry, use the Registry Editor, a utility provided with Windows.

     Note

    Use caution when changing the Windows registry. Changing the wrong registry entry or making an incorrect entry for a setting can introduce an error that prevents Visual FoxPro, or even Windows itself, from starting or working properly.

    To change configuration settings in the registry

    1. In Windows, start the Registry Editor.
    2. In HKEY_CURRENT_USER node, browse to the Software\Microsoft\Visual FoxPro directory and open the folder for the current version of Visual FoxPro.
    3. In the Options folder, double-click the name of the setting to change, and then enter a new value.
    4. Close the Registry Editor.Your change will take effect the next time you start Visual FoxPro.

    You can also make changes to the registry by calling Windows APIs from a Visual FoxPro program.
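    For example, after a program or external tool changes registry options, you can call SYS(3056), listed in the reference links below, to re-read the registry settings without restarting:

    ```foxpro
    = SYS(3056)   && re-read Visual FoxPro settings from the Windows registry
    ```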

    See Also

    Tasks

    Get Application Information from the Windows Registry Sample

    Reference

    SYS(3056) – Read Registry Settings

    Other Resources

    Visual FoxPro Environment Settings
    Accessing APIs

    Setting Configuration Options at Startup

    • Article
    • 07/09/2007

    In this article

    1. Using SET Commands in Applications
    2. Using a Configuration File
    3. See Also

    You can establish configuration settings when you first start the program, which allows you to override default settings.

    Using SET Commands in Applications

    One way to establish configuration settings is to issue one or more SET commands when your application starts. For example, to configure your system to display a clock in the status bar when the application starts, you can issue this SET command:

    SET CLOCK ON
    
    

    The exact point at which you issue the SET command depends on your application. In general, you issue SET commands from your application’s main program file, which is the program or form that controls access to the rest of your application. You can also issue SET commands from the Load or Init events of the form. If you are using private data sessions, it may be necessary to make these settings in the BeforeOpenTables Event of your DataEnvironment object. For details about specifying a main file for an application, see Compiling an Application.

    If your application has a form set as the main file in the Project Manager, and the form then launches a menu, you can add SET commands by entering them in the menu's Setup code. For details, see How to: Add Setup Code to a Menu System in Designing Menus and Toolbars.

     Tip

    An efficient way to manage SET commands for startup is to create a procedure that contains all the commands that you want to issue. You can then call the procedure from the appropriate point in your application. Keeping all the SET commands in a single procedure makes it easier to debug and maintain your configuration settings. You can also put the code in the class on which your application object is based, or the class on which your forms are based.
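    A hedged sketch of such a procedure follows (the settings chosen are illustrative):

    ```foxpro
    * Gather startup SET commands in one place
    PROCEDURE SetupEnvironment
        SET TALK OFF
        SET CLOCK ON
        SET DATE TO ANSI
        SET DEFAULT TO HOME() + "\VFP"
    ENDPROC
    ```

    The application's main program can then simply issue DO SetupEnvironment before opening any forms.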

    Using a Configuration File

    In addition to setting the Visual FoxPro environment using the Options dialog box or SET commands, you can establish preferred settings and save them in one or more configuration files. A Visual FoxPro configuration file is a text file in which you can specify values for SET commands, set system variables, and execute commands or call functions. Visual FoxPro reads the configuration file when starting up, establishing the settings and executing the commands in the file. Settings made in the configuration file override default settings made in the Options dialog box (and stored in the Windows registry).

    Using a configuration file provides several advantages. You can:

    • Override the default settings established in the Options dialog box.
    • Maintain several different configuration files, each with different settings, so that Visual FoxPro can load a configuration suitable to a particular user or project.
    • Make changes more easily than if you establish settings with the SET commands in the program initialization sequence.
    • Start a program or call a function automatically when Visual FoxPro starts.

    For instructions about working with configuration files, see How to: Create a Configuration File and How to: Specify the Configuration File.

    See Also

    Tasks

    How to: Use Command-Line Options When Starting Visual FoxPro
    How to: Add Setup Code to a Menu System

    Reference

    _STARTUP System Variable

    Other Resources

    Compiling an Application
    Designing Menus and Toolbars
    Visual FoxPro Environment Settings

    How to: Specify the Configuration File

    • Article
    • 07/09/2007

    When Visual FoxPro starts, you can specify a configuration file or bypass all configuration files, allowing Visual FoxPro to use its default settings.

    When Visual FoxPro loads a configuration file, the settings in that file take precedence over corresponding default settings made in the Options dialog box.

    To specify a configuration file

    • In the command line that starts Visual FoxPro, specify the -C switch and the name of the configuration file that you want to use (including a path if necessary). Do not put a space between the switch and the file name.

      -or-
    • In Windows, double-click the name of the configuration file to use. Visual FoxPro will start using the configuration file you have selected.

    If you want to avoid using any configuration file, including the default file Config.fpw, you can suppress all configuration files. This causes Visual FoxPro to use only the default settings established in the Options dialog box.

    To suppress a configuration file

    • In the command line that starts Visual FoxPro, add the -C switch with nothing after it. For example, to avoid any configuration file found in the startup directory or the system path, use this command line:

        VFPVersionNumber.exe -C

    Specifying an External Configuration File

    You can use an external configuration file in addition to an internal configuration file in circumstances where you need to configure settings separately. For example, setting SCREEN=OFF should be performed in an internal configuration file.

    You can set Visual FoxPro to read an external configuration file following an internal configuration file by using the new ALLOWEXTERNAL directive in the internal configuration file. When you include the setting ALLOWEXTERNAL=ON in the internal configuration file, Visual FoxPro searches for an external configuration file, usually Config.fpw, and reads its settings. You can also specify a different configuration file using the -C command-line switch when starting Visual FoxPro.

     Note

    For .exe and .dll file servers, Visual FoxPro supports only those configuration files bound inside the server. Therefore, Visual FoxPro disregards the ALLOWEXTERNAL setting.

    To read an external configuration file after an internal one

    1. In the internal configuration file, set the special term ALLOWEXTERNAL to ON:

        ALLOWEXTERNAL = ON
    2. When you start your program, either specify a second configuration file using the -C command-line switch or have a second configuration file in the default program path.

    For more information about command-line switches, see How to: Use Command-Line Options When Starting Visual FoxPro.

    If duplicate settings exist, the settings in the external configuration file take precedence over those in the internal configuration file, because the external file is read after the internal one. Visual FoxPro does not begin initialization until it has read both files.
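    As an illustration of these rules (both fragments are hypothetical), the internal configuration file bound into the application might contain:

    ```
    SCREEN = OFF
    ALLOWEXTERNAL = ON
    ```

    while an external Config.fpw supplied with the -C switch adds or overrides settings read afterward:

    ```
    CLOCK = ON
    TALK = OFF
    ```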

    If you want to specify the configuration file as read-only, place the file in your project and mark it as Included. If you want to specify that the file can be modified, place the file in your project and mark it as Excluded. You can then distribute the file separately with your application or executable file. By convention, configuration files use the .fpw extension.

    See Also

    Tasks

    How to: Use Command-Line Options When Starting Visual FoxPro
    How to: Add Setup Code to a Menu System

    Reference

    _STARTUP System Variable

    Concepts

    Setting Configuration Options at Startup

    Other Resources

    Compiling an Application
    Designing Menus and Toolbars
    Visual FoxPro Environment Settings

    How to: Create a Configuration File

    • Article
    • 07/09/2007

    In this article

    1. Starting Applications or Programs Automatically
    2. See Also

    To create a configuration file, use the Visual FoxPro editor, or any editor that can create text files, to create a text file in the directory where Visual FoxPro is installed. Earlier versions of Visual FoxPro created the file Config.fpw in the startup directory, and Config.fpw became the default configuration file. You can create any configuration file and use it to establish default settings and behaviors by starting Visual FoxPro with that file, either by double-clicking the file or by referencing it on the command line.

    If you are creating a new configuration file, you can save it using any name you want. By convention, configuration files have the extension .fpw.

    When Visual FoxPro starts, it looks for a default configuration file in the following locations, in order:

    • Current working directory
    • Directory where Visual FoxPro is installed
    • Directories listed in the DOS path

    If Visual FoxPro does not find a configuration file in these locations, Visual FoxPro uses only the default settings established in the Options dialog box.

     Note

    For details about specifying an alternative to the default file name or location for the configuration file, see How to: Specify the Configuration File.

    Enter configuration settings using one of these methods:

    • Make settings with the SET command.
    • Set system variables.
    • Call programs or functions.
    • Include special terms used only in configuration files.

    To enter SET commands in a configuration file

    • Enter SET commands without the SET keyword and with an equal sign. For example, to set a default path, use this format:

        DEFAULT = HOME()+"\VFP"

      To add a clock to the status bar, use this command:

        CLOCK = ON

    To enter a setting for a system variable, use the same syntax you would use in the Command window or in a program.

    To set system variables in a configuration file

    • Enter the name of the system variable, an equal sign (=), and the value to assign. For example, the following command specifies an alternative spell-checking program:

        _SPELLCHK = "SPLLCHK.EXE"

    You can also call functions or execute programs from within a configuration file by using the COMMAND command. For example, you can start an initialization program as part of the startup process.

    To call functions or execute commands in a configuration file

    • Enter COMMAND, an equal sign (=), and the command to execute or function to call. For example, to include the Visual FoxPro version number in the caption of the main Visual FoxPro window, use this command:

        COMMAND = _SCREEN.Caption = "Visual FoxPro " + VERS(4)

      The following command launches a specific application when Visual FoxPro starts:

        COMMAND = DO MYAPP.APP

    You can also use special terms in a configuration file that do not correspond to SET values, system variables, or commands.

    To use special terms in a configuration file

    • Enter the special term, an equal sign (=), and the setting. For example, to set the maximum number of variables available in Visual FoxPro, use this command:

        MVCOUNT = 2048

    For a complete list of special terms for configuration files, see Special Terms for Configuration Files.
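    Putting the four kinds of entries together, an illustrative configuration file might read as follows (all values are examples, not recommendations; per the next section, a COMMAND entry that starts an application must be the last line):

    ```
    TALK = OFF
    CLOCK = ON
    _SPELLCHK = "SPLLCHK.EXE"
    MVCOUNT = 2048
    COMMAND = DO MYAPP.APP
    ```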

    Starting Applications or Programs Automatically

    You can insert commands into a configuration file that automatically launch programs when Visual FoxPro starts. You can use these commands either to start an entire application or to start just a program, such as one that initializes system variables.

    To start applications from a configuration file

    1. Assign the name of your application to the _STARTUP System Variable anywhere in the configuration file:

        _STARTUP = MYAPP.APP

      -or-

    2. Use the COMMAND command, which must be the last line in your configuration file:

        COMMAND = DO MYAPP.APP

    See Also

    Tasks

    How to: Use Command-Line Options When Starting Visual FoxPro
    How to: Add Setup Code to a Menu System

    Reference

    _STARTUP System Variable

    Concepts

    Setting Configuration Options at Startup

    Other Resources

    Compiling an Application
    Designing Menus and Toolbars
    Visual FoxPro Environment Settings

    Special Terms for Configuration Files

    • Article
    • 07/09/2007

    In this article

    1. Remarks
    2. See Also

    The following table lists special terms you can use in configuration files. You can also use SET commands, system variables, and the _STARTUP setting in configuration files to customize the Visual FoxPro environment.

    Remarks


    Term and syntax, followed by description:

    ALLOWEXTERNAL ON | OFF
        Specifies whether settings from an external configuration file, as specified by the -C command-line switch (or located in the path), are read in after those from an internal one. The ALLOWEXTERNAL term is ignored unless it is bound inside an application. Default: OFF

    BITMAP ON | OFF
        Specifies whether Visual FoxPro first writes screen or form updates to an off-screen bitmap, and then performs a bit block transfer (bitblt) to the screen. BITMAP = OFF can improve performance when applications are accessed using Windows Terminal Server clients. Default: ON

    CODEPAGE = nValue | AUTO
        Specifies a number that identifies the character set used for files. Setting CODEPAGE to AUTO selects the current operating system code page. For the possible values you can use, see Code Pages Supported by Visual FoxPro.

    COMMAND = cVisualFoxProCommand
        Specifies a Visual FoxPro command to execute when Visual FoxPro is started. The cVisualFoxProCommand argument specifies the command to execute.

    EDITWORK path
        Specifies where the text editor should place its work files. Because work files can become large, specify a location with plenty of free space. Default: Operating system dependent. For more information, see Optimizing the Operating Environment.

    INDEX extension
        Specifies the extension for Visual FoxPro index files. Default: .idx

    LABEL extension
        Specifies the extension for Visual FoxPro label definition files. Default: .lbx

    _MENUDESIGNER = cProgramName
        Specifies an external menu design application. Default: the empty string ("").

    MVCOUNT
        Sets the maximum number of variables that Visual FoxPro can maintain. This value can range from 128 to 65,000. Default: 16,384

    OUTSHOW = ON | OFF
        Disables the ability to hide all windows in front of the current output by pressing SHIFT+CTRL+ALT. Default: ON

    PROGCACHE = nMemoryPages
        Specifies the amount of memory (address space) in pages that Visual FoxPro allocates at startup, or that a Visual FoxPro MTDLL COM server allocates per thread, for the internal program cache (memory used to run programs). Each page of memory is equal to 64K, so the default setting equates to an allocation of a little over 9 MB. As the cache fills, Visual FoxPro tries to flush it to remove unused items. If Visual FoxPro cannot free enough memory, Error 1202 (Program is too large) is generated; adjusting the PROGCACHE setting can prevent this error. Note: Although this setting can be used for the Visual FoxPro development product or normal run-time applications, it is primarily intended for MTDLL COM servers, where many threads are often created for a single server. In Visual FoxPro 9.0, the default value for MTDLL COM servers is -2. When the value of nMemoryPages is greater than 0, Visual FoxPro allocates a fixed program cache; you can specify between 1 and 65,000. If you specify 0 for nMemoryPages, no program cache is used; instead, Visual FoxPro uses dynamic memory allocation based on determinations made by the operating system. If you pass a value for nMemoryPages that is less than 0, Visual FoxPro uses dynamic memory allocation but is limited to the specified memory (nMemoryPages * 64K); when the limit is reached, Visual FoxPro flushes allocated programs to free memory. You can call SYS(3065) to determine the current PROGCACHE setting. CLEAR PROGRAM attempts to clear unreferenced code regardless of this setting. Note: The Visual FoxPro OLE DB Provider ignores this setting because it uses dynamic memory allocation (PROGCACHE = 0). Default: 144 (-2 for MTDLL)

    PROGWORK path
        Specifies where Visual FoxPro stores the program cache file. For faster performance, especially in a multiuser environment, specify a fast disk, such as a local disk or memory, if available. Provide at least 256K for the cache, though the file can grow larger. Default: Operating system dependent. For more information, see Optimizing the Operating Environment.

    REPORT extension
        Specifies the extension for Visual FoxPro report definition files. Default: .frx

    RESOURCE path[\file] | OFF
        Specifies the location of the FoxUser.dbf resource file or prevents Visual FoxPro from using a resource file. The file argument is optional; if file is omitted, Visual FoxPro searches for the FoxUser.dbf file. If the specified file does not exist, it is created. Default: the startup directory as path and FoxUser.dbf as file.

    SCREEN = ON | OFF
        Specifies whether the main Visual FoxPro window appears when opening Visual FoxPro. When an application consists of one or more top-level forms displayed on the Windows desktop, setting SCREEN to OFF can be useful because the main Visual FoxPro window is not required. For more information on top-level forms, see Controlling Form Behavior. Default: ON

    SORTWORK path
        Specifies where commands such as SORT and INDEX should place work files. Because work files can be up to twice as large as the tables being sorted, specify a location with plenty of free space. For faster performance, especially in a multiuser environment, specify a fast disk such as a local disk. Default: Operating system dependent. For more information, see Optimizing the Operating Environment.

    STACKSIZE = nValue
        Specifies the number of nesting levels, from 32 to 64,000, for operations such as the DO command. Note: You can change the nesting level only during Visual FoxPro startup. Default: 128

    TEDIT [/N] editor
        Specifies the name of the text editor used when you edit program files with MODIFY COMMAND or MODIFY FILE. Include the optional clause /N with TEDIT to specify a Windows text editor. Default: Visual FoxPro editor
    TITLE titleSpecifies the title that appears in the caption bar of the main Visual FoxPro window.Default: “Microsoft Visual FoxPro”
    TMPFILES pathSpecifies where temporary work files specified by EDITWORKSORTWORK, and PROGWORK configuration file settings are stored if they are not specified.Because work files can become very large, specify a location with plenty of free space. For faster performance, especially in a multiuser environment, specify a fast disk such as a local disk.Default: Operating system dependent. For more information, see Optimizing the Operating Environment.
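
    The settings above are plain-text entries in a Config.fpw file (or in whatever file you name with the -C command-line switch). As an illustration only, with made-up values and paths, such a file might read:

    TITLE = My Inventory Application
    SCREEN = OFF
    MVCOUNT = 1024
    SORTWORK = D:\FASTDISK\TEMP
    RESOURCE = C:\MYAPP\FOXUSER.DBF

    Each setting stands on its own line; settings you omit keep their defaults.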

    See Also

    Reference

    SET Command Overview

    Concepts

    Setting Configuration Options at Startup

    Other Resources

    Optimizing Applications
    Visual FoxPro Environment Settings

    How to: Use Command-Line Options When Starting Visual FoxPro

    • Article
    • 07/09/2007

    In addition to using the SET command and a configuration file, you can specify startup options by including a command-line switch. For example, using command-line options, you can suppress the display of the Visual FoxPro splash screen, which displays at the startup of Visual FoxPro, or specify a nondefault configuration file.

    To use a command-line switch

    • On the command line or in a shortcut, add the switch after the name of the Visual FoxPro executable file, VFPVersionNumber.exe (where VersionNumber represents the version number of this release) or any Visual FoxPro-created .exe file. Note: If the command-line switch requires arguments, such as a file name, do not put a space between the switch and the argument. For example, to specify a configuration file, use a command such as:

      C:\Program Files\Microsoft Visual FoxPro VersionNumber\VFPVersionNumber.exe -CC:\MYAPP.FPW

      Separate multiple options with single spaces.

    The following table lists the command-line switches available in Visual FoxPro.

    Switch: Description
    -A: Ignore the default configuration file and Windows registry settings.
    -BFileName,Duration: Display a custom bitmap (.bmp), .gif, or .jpg graphic file and specify its display duration in milliseconds when Visual FoxPro starts. You can also include the -B command-line switch in a Visual FoxPro shortcut. Note: If the bitmap you specify cannot be located, the bitmap does not display when Visual FoxPro starts.
    -CFileName: Specify a configuration file, including a path if necessary, other than the default file, Config.fpw.
    -LFileName: Specify a resource file, including a path if necessary, other than the default, vfp*ENU.dll, so you can use Visual FoxPro in a language other than the current language specified by Windows.
    -R: In earlier versions, refresh the Windows registry with information about Visual FoxPro, such as associations for Visual FoxPro files. In later versions, use /regserver.
    -T: Suppress the display of the Visual FoxPro splash screen. By default, when Visual FoxPro starts, it displays a splash screen that shows the Visual FoxPro logo, version number, and other information. If you prefer that users of your application not see this splash screen, you can prevent Visual FoxPro from displaying it with the -T command-line switch.
    /?: List the available command-line arguments. Available in Visual FoxPro 7.0 and later.
    /regserver: Register Visual FoxPro default registry keys.
    REGSVR32 server.dll: Register a .dll component.
    /unregserver: Remove Visual FoxPro default registry keys.
    /u server.dll: Remove a .dll component.
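
    As an illustration of combining switches (the installation path and configuration file name here are hypothetical), a desktop shortcut's Target line that suppresses the splash screen and names a specific configuration file might look like:

    "C:\Program Files\Microsoft Visual FoxPro 9\VFP9.EXE" -T -CC:\MYAPP.FPW

    Note that -C is followed immediately by its argument, while the two switches themselves are separated by a single space.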

    See Also

    Tasks

    How to: Configure Visual FoxPro Toolbars

    Concepts

    Setting Configuration Options at Startup

    Other Resources

    Visual FoxPro Environment Settings
    Customizing the Visual FoxPro Environment

    How to: Configure Visual FoxPro Toolbars

    • Article
    • 07/09/2007

    In this article

    1. Activating and Deactivating Toolbars
    2. Customizing Existing Toolbars
    3. See Also

    Visual FoxPro includes the following customizable toolbars.

    Tool: Associated Toolbars; Command
    Database Designer: Database; CREATE DATABASE
    Form Designer: Form Controls, Form Designer, Color Palette, Layout; CREATE FORM
    Print Preview: Print Preview
    Query Designer: Query Designer; CREATE QUERY
    Report Designer: Report Controls, Report Designer, Color Palette, Layout; CREATE REPORT

    You can place as many toolbars on your screen as you need while working. You can dock toolbars to the top, bottom, or sides of your screen to customize your working environment. Visual FoxPro saves the positions of the toolbars so they remain where you last placed them.

    To dock a toolbar

    • Drag the toolbar to the top, bottom, or side of your screen.
      -OR-
    • Use the DOCK Command to dock the toolbar.

    Activating and Deactivating Toolbars

    By default, only the Standard toolbar is visible. When you use a Visual FoxPro designer tool (for example, the Form Designer), the designer displays the toolbars that you commonly need when working with that designer tool. However, you can activate a toolbar any time you require it.

    To activate a toolbar

    • Run the associated tool.

    To deactivate a toolbar

    • Close the associated tool.

    You can also programmatically activate and deactivate toolbars that have been previously activated by using the DEACTIVATE WINDOW or ACTIVATE WINDOW commands, as in the following example.

    IF WVISIBLE("Color Palette")
        DEACTIVATE WINDOW ("Color Palette")
    ENDIF
    
    
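    To bring the toolbar back later, you can test for its visibility and activate it by name. This sketch assumes the Color Palette toolbar was activated at least once earlier in the session:

    IF NOT WVISIBLE("Color Palette")
        ACTIVATE WINDOW ("Color Palette")
    ENDIF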

    Customizing Existing Toolbars

    The easiest way to create custom toolbars is by modifying the toolbars already provided with Visual FoxPro. You can:

    • Modify an existing toolbar by adding or removing buttons.
    • Create a new toolbar that contains buttons from existing toolbars.

    You can also define custom toolbars by creating a custom toolbar class using code. For details, see Designing Menus and Toolbars.

    You can modify any of the toolbars provided with Visual FoxPro. For example, you might want to remove a button from an existing toolbar, or copy buttons from one toolbar to another.

    To modify an existing Visual FoxPro toolbar

    1. From the View menu, choose Toolbars.
    2. Select the toolbar you want to customize and choose Customize.
    3. Remove buttons from the toolbar by dragging them off the toolbar.
    4. Add buttons to the toolbar by selecting an appropriate category in the Customize Toolbar dialog box and then dragging the appropriate buttons onto the toolbar.
    5. Complete the toolbar by choosing Close in the Customize Toolbar dialog box and then closing the toolbar window. Tip: If you change a Visual FoxPro toolbar, you can restore its original configuration of buttons by selecting the toolbar in the Toolbars dialog box and then choosing Reset.

    You can create your own toolbars composed of buttons from other toolbars.

    To create your own toolbar

    1. From the View menu, choose Toolbars.
    2. Choose New.
    3. In the New Toolbar dialog box, name the toolbar.
    4. Add buttons to the toolbar by selecting a category in the Customize Toolbar dialog box and then dragging the appropriate buttons onto the toolbar.
    5. You can rearrange buttons on the toolbar by dragging them to the desired position.
    6. Complete the toolbar by choosing Close in the Customize Toolbar dialog box and then closing the toolbar window. Note: You cannot reset buttons on a toolbar you create.

    To delete a toolbar you created

    1. From the View menu, choose Toolbars.
    2. Select the toolbar you want to delete.
    3. Choose Delete.
    4. Choose OK to confirm the deletion. Note: You cannot delete toolbars provided by Visual FoxPro.

    See Also

    Tasks

    How to: Use Command-Line Options When Starting Visual FoxPro
    How to: Set Editor Options

    Other Resources

    Visual FoxPro Environment Settings
    Customizing the Visual FoxPro Environment
    Designing Menus and Toolbars

    How to: Dock Windows

    • Article
    • 07/09/2007

    In this article

    1. Docking Modes
    2. See Also

    You can dock certain Visual FoxPro Integrated Development Environment (IDE) windows to the Visual FoxPro desktop window, each other, or user-defined forms. The following windows are dockable:

    • Call Stack
    • Command
    • Data Session (View)
    • Debugger
    • Document View
    • Locals
    • Output
    • Properties
    • Trace
    • Watch

     Note

    In prior versions of Visual FoxPro, the Data Session window was always referred to as the View window. Additionally, language used to control this window, such as HIDE WINDOW, ACTIVATE WINDOW, and WONTOP( ), also refers to this window as the View window.

    When you drag a dockable window to a Visual FoxPro desktop window boundary, the dockable window repositions itself against the chosen boundary. The state and location of docked windows persist from the last user session.

    To dock windows programmatically and retrieve their dock states, see DOCK Command and ADOCKSTATE( ) Function.
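
    As an illustrative sketch, the following docks the Command window and then queries the dock states of all dockable IDE windows. The POSITION value shown assumes the numbering given in the DOCK Command reference (for example, 3 for the bottom boundary), so check that topic for the exact position values and the array layout that ADOCKSTATE( ) fills:

    DOCK WINDOW Command POSITION 3
    nWindows = ADOCKSTATE(aDockInfo)    && number of dockable windows reported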

    You can change the dockable status of a dockable window.

    To enable or disable dockable status

    1. Right-click the title bar of an open and dockable window.
    2. Click Dockable to enable or disable the dockable status of the window. A check mark appears when the window is dockable.
       -OR-
    3. Click the desired window to make the window active.
    4. On the Window menu, click Dockable to enable or disable the dockable status of the window. A check mark appears when the window is dockable.

    For more information about retrieving the dockable status for a window programmatically, see WDOCKABLE( ) Function.

    Docking Modes

    You can dock windows in three different modes:

    • Normal docking: Windows dock to a boundary of the Visual FoxPro desktop window.
    • Linked docking: Windows dock to each other and share a dockable window container.
    • Tabbed docking: Windows dock to each other and share the full window through the use of tabs.

    You can use tabbed docking and linked docking together.

    To create normal docking

    1. Make sure the docking status of the window or windows is set to Dockable.
    2. Drag the window title bar to a boundary of the Visual FoxPro desktop window.

    To create linked docking

    1. Make sure the docking status of the window or windows is set to Dockable.
    2. Drag the title bar of the desired window to a boundary or docking zone of the target window.

    The docking zone is indicated when the window you are dragging changes shape to fit the target window. Visual FoxPro creates an additional title bar for link-docked windows.

    To create tabbed docking

    1. Make sure the docking status of the window or windows is set to Dockable.
    2. Drag the title bar of the desired window to the title bar of the target window.

    Visual FoxPro adds tabs to the bottom boundary of the docked windows.

    To undock windows

    • To undock normal-docked windows, drag the title bar of the desired window away from the shared window boundary.
      -OR-
    • To undock link-docked windows, drag the title bar of the desired window away from the shared window.
      -OR-
    • To undock tab-docked windows, drag the tab of the desired window away from the shared window.

    You can disable docking behavior by holding the CTRL key while dragging a window.

    Deleting or editing the FoxUser.dbf resource file, which contains your settings, restores or changes your default window settings. For more information about altering your settings, see FoxUser Resource File Structure.

    See Also

    Tasks

    How to: Dock Forms
    How to: Dock Toolbars

    Reference

    Visual FoxPro System Windows
    Document View Window

    Other Resources

    Visual FoxPro Environment Settings

    How to: Set Editor Options

    • Article
    • 07/09/2007

    You can configure the window of the Visual FoxPro editor to display text the way you want, such as by setting the font, text alignment, or syntax coloring. You can also make the editor easier to use by setting preferences for indentation, word wrap, automatic backup copies, and other features.

    To configure the editing window, set your preferences in the Edit Properties dialog box, which appears after you open a program or text file and select Properties from the Edit menu. For details about the settings you can configure, see Edit Properties Dialog Box.

    To display the Edit Properties dialog box

    1. Open the editing window for a program, text file, or control.
    2. From the Edit menu that becomes active, select Properties. Tip: You can display the Font dialog box directly by right-clicking the editing window and choosing Font from the shortcut menu.

    For more information about opening a program or text file, see How to: Create Programs.

    By default, the settings that you make in the Edit Properties dialog box persist for that file. For example, if you change the font, the font for all text in the current window changes. If you open another editing window, the default settings apply.

    You can choose to save your settings so that they apply to all files of the same type, or not to save the new settings at all. If you apply your settings to similar file types, Visual FoxPro uses the settings you make when you edit files with the same extension, for example, all .prg files, or all method code in the Form Designer.

    To avoid persisting changes to editor settings

    • In the Edit Properties window, clear the Save Preferences option and then click OK.

    To apply editor options to similar files

    • In the Edit Properties dialog box, select Use These Preferences As Default and then click OK.

    You can also set the color and font that the editor uses to identify keywords, comments, and other elements of programs. For details, see Editor Tab, Options Dialog Box.

    See Also

    Tasks

    How to: Change Configuration Settings in the Windows Registry
    How to: Display and Print Source Code in Color

    Reference

    Editing Window
    Editor Tab, Options Dialog Box

    Concepts

    Restoring the Visual FoxPro Interactive Environment

    Other Resources

    Visual FoxPro Environment Settings

    How to: Display and Print Source Code in Color

    • Article
    • 07/09/2007

    In this article

    1. Displaying Source Code in Color
    2. Printing Source Code in Color
    3. See Also

    You can display and print code with color syntax in the Command window and Visual FoxPro editors for program (.prg) files, methods, stored procedures, and memos.

    The following sections contain more information about displaying and printing code in color:

    • Displaying Source Code in Color
    • Printing Source Code in Color

    Displaying Source Code in Color

    You can turn on syntax coloring separately for each editing window; however, the color syntax settings that you choose on the Editor tab of the Options dialog box apply to the Command window and most other editing windows. For more information, see Editor Tab, Options Dialog Box and Editing Window.

     Note

    To display color syntax in run-time applications, Visual FoxPro must be configured to display color syntax. Run-time applications display only the default color settings because they do not check the Windows registry for the settings that you change in the Options dialog box, which specifies the default color syntax settings.

    When color syntax is turned on, Visual FoxPro performs background compilation on the single line of code that you are currently typing. When a line of code contains invalid syntax, Visual FoxPro displays that line with the selected formatting style.

    To display color syntax in an editing window

    1. Open the editing window in which you want to display color syntax.
    2. On the Edit menu, choose Properties.
    3. In the Edit Properties dialog box, select the Syntax coloring check box.
    4. Click OK.

    Syntax coloring is activated for the editing window you selected. For more information, see Edit Properties Dialog Box.

    You can also display source code files to the screen in color using the TYPE command. For more information, see TYPE Command.
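
    For example, assuming a program file named Myprog.prg exists in the current directory (the file name is hypothetical), the following displays its contents in the main Visual FoxPro window, with syntax coloring if it is configured:

    TYPE Myprog.prg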

    To customize color syntax settings

    1. On the Tools menu, choose Options.
    2. In the Options dialog box, choose the Editor tab.
    3. Under Syntax color settings, choose the color settings that you want.
    4. When you are finished choosing settings, click OK.

    The color settings you chose take effect.

    To set formatting for invalid color syntax

    1. Turn on syntax coloring for the editing window.
    2. On the Tools menu, choose Options.
    3. In the Options dialog box, choose the Editor tab.
    4. In the Background Compile box, choose the formatting style you want.

    Invalid syntax displays in the editing window with the formatting style you chose.

    Printing Source Code in Color

    You can print source code in color wherever color syntax appears, such as program files, methods, stored procedures, and memos.

     Note

    To print your code files in color, you must be connected to a color printer and select the color printing options available for your printer. Background colors set on the Editor tab of the Options dialog box are not printed. If you select color printing for source code files, hyperlinks appear underlined.

    To print source code in color

    1. Open the program file or code you want to print.
    2. On the File menu, choose Print.
    3. In the Print dialog box, choose your color printer, and then click Preferences.
    4. Select the color printing options available for your printer.
    5. When you are finished, click Print.

    When connected to a color printer, you can also print source code in color using the TYPE command with the TO PRINTER clause.
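
    For example, assuming a color printer is connected and a program file named Myprog.prg exists in the current directory (the file name is hypothetical), a command along these lines sends the file to the printer; the optional PROMPT clause displays the printer dialog box first:

    TYPE Myprog.prg TO PRINTER PROMPT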

    For more information, see Print Dialog Box (Visual FoxPro) and TYPE Command.

    See Also

    Tasks

    How to: Set Editor Options
    How to: View and Change Environment Settings

    Other Resources

    Visual FoxPro Environment Settings

    Automating Keystroke Tasks with Macros

    • Article
    • 07/09/2007

    If you find yourself repeating the same keystrokes, or if you want to automate some keystroke tasks for your users, you can record and save these keystrokes in macros by using the Macros dialog box.

     Note

    To use macros, any keyboard you use must have function keys F1 to F9, and a CTRL key or an ALT key.

    In This Section

    Related Sections

    • Visual FoxPro Environment Settings
    Describes different ways to change Visual FoxPro environment settings, such as using the Options dialog box, setting configuration options at program startup, and using command-line options. You can configure Visual FoxPro toolbars, dock windows, set editor options, and customize the appearance of your applications without altering code.
    • Optimizing Your System
      Provides information about optimizing your operating environment.
    • Development Productivity Tools
      Discusses the different tools available to help make creating Visual FoxPro applications easier and faster.

    How to: Clear Macro Definitions

    • Article
    • 07/09/2007

    You can clear macro key definitions through the Macros Dialog Box.

    To clear a macro definition

    1. From the Macros dialog box, select a macro from the Individual Macro list.
    2. Choose Clear.

    See Also

    Reference

    Macros Dialog Box

    Other Resources

    Automating Keystroke Tasks with Macros

    How to: Create, Save, and Restore Macro Sets

    • Article
    • 07/09/2007

    A macro set is a defined set of keys and their associated macros stored in a file with the extension .FKY. You can create macro sets through the Macros Dialog Box.

    To create a macro set

    1. From the Macros dialog box, create individual key macros.
    2. Choose Save.
    3. In the Save Current Macros To box, enter a name for the macro set and choose Save.

    To restore a macro set

    1. From the Macros dialog box, choose Restore.
    2. Select a macro set file and then choose OK.
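
    The SAVE MACROS and RESTORE MACROS commands perform the same tasks programmatically. A brief sketch, using a hypothetical file name:

    SAVE MACROS TO mymacros        && writes the current macro set to a .fky file
    RESTORE MACROS FROM mymacros   && loads the macro set back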

    See Also

    Reference

    CLEAR Commands
    Macros Dialog Box
    RESTORE MACROS Command
    SAVE MACROS Command

    Other Resources

    Automating Keystroke Tasks with Macros

    How to: Edit Macros (Visual FoxPro)

    • Article
    • 07/09/2007

    You can edit existing macros through the Macros Dialog Box.

    To edit a macro

    1. From the Macros dialog box, select a macro from the Individual Macro list and then choose Edit.
    2. Modify the macro contents.
    3. Choose OK.

    See Also

    Reference

    CLEAR Commands
    Macros Dialog Box
    SET FUNCTION Command

    Other Resources

    Automating Keystroke Tasks with Macros

    How to: Record Macros (Visual FoxPro)

    • Article
    • 07/09/2007

    You can record keystroke macros in Visual FoxPro through the Macros Dialog Box.

    To record a macro

    1. From the Tools menu, choose Macros.
    2. In the Macros dialog box, choose Record.
    3. Press the key or type the key combination you want to define.
    4. Enter a name for the macro or accept the default and then choose OK.

    Visual FoxPro begins recording every keystroke.

     Note

    Macro names cannot contain spaces.

    To stop recording a macro

    1. From the Tools menu, choose Macros.
    2. In the Stop Recording Macro Dialog Box, choose from the following:
      • To save the macro as is, choose OK.
      • To continue recording, choose Continue.
      • To discard the macro, choose Discard.
      • To insert a literal keystroke (the literal meaning of a key instead of any meaning currently assigned to it), choose Insert Literal.
      • To insert a pause, select Seconds, add the amount of time, and choose Insert Pause.

    See Also

    Reference

    Macros Dialog Box
    ON KEY LABEL Command
    SET FUNCTION Command
    Stop Recording Macro Dialog Box

    Other Resources

    Automating Keystroke Tasks with Macros

    How to: Set a Default Macro Set

    • Article
    • 07/09/2007

    A default macro set exists when Visual FoxPro starts. You can create and set your own default macro set through the Macros Dialog Box.

    To set a default macro set

    1. From the Macros dialog box, create or restore a macro set.
    2. Choose Set Default.

    See Also

    Reference

    Macros Dialog Box
    RESTORE MACROS Command

    Other Resources

    Automating Keystroke Tasks with Macros

    In the Visual FoxPro Documentation
    What’s New in Visual FoxPro
    Describes the new features and enhancements included in this version of Visual FoxPro.

    Getting Started with Visual FoxPro
    Provides information about where to find the Readme file, installing and upgrading from previous versions, configuring Visual FoxPro, and customizing the development environment.

    Using Visual FoxPro
    Provides an overview of Visual FoxPro features, describes concepts and productivity tools for developing, programming, and managing high-performance database applications and components.

    Samples and Walkthroughs
    Contains Visual FoxPro code samples and step-by-step walkthroughs that you can use for experimenting with and learning Visual FoxPro features.

    Reference
    Includes Visual FoxPro general, programming language, user interface, and error message reference topics.

    Product Support
    Provides information about Microsoft product support services for Visual FoxPro.

    Additional Information

    Microsoft Visual FoxPro Web Site
    Provides a link to the Microsoft Visual FoxPro Web site for additional information and resources for Visual FoxPro.

    Microsoft Visual FoxPro Community
    Provides a link to the Microsoft Visual FoxPro Online Community Web site for third-party community resources and newsgroups.

    Microsoft Visual FoxPro Training and Resources
    Provides a link to the Visual FoxPro training Web site to find information about training, books, and events for Visual FoxPro.

    Accessibility for People with Disabilities
    Provides information about features that make Visual FoxPro more accessible for people with disabilities.

    This release of Visual FoxPro contains many new features and enhancements. The following sections describe these new features and enhancements.

    In This Section

    Guide to Reporting Improvements
    A roadmap to all new Reporting enhancements.

    Data and XML Feature Enhancements
    Describes additions and improvements to Visual FoxPro data features.

    SQL Language Improvements
    Describes enhancements to the SQL language, such as the SELECT – SQL Command.

    Class Enhancements
    Describes additions and improvements to Visual FoxPro classes, forms, controls, and object-oriented related features.

    Language Enhancements
    Describes additions and improvements to the Visual FoxPro programming language.

    Interactive Development Environment (IDE) Enhancements
    Describes additions and improvements made to the Visual FoxPro IDE.

    Enhancements to Visual FoxPro Designers
    Describes improvements made to designers available in Visual FoxPro.

    Miscellaneous Enhancements
    Describes other improvements made in this version of Visual FoxPro.

    Changes in Functionality for the Current Release
    Describes changes in the behavior of existing language and functionality.

    Visual FoxPro New Reserved Words
    Lists new reserved words added to Visual FoxPro.

    Related Sections

    Getting Started with Visual FoxPro
    Provides information about where to find the ReadMe file and how to install and upgrade from previous versions, configure Visual FoxPro, and customize the development environment.

    Using Visual FoxPro
    Provides an overview of Visual FoxPro features, describes concepts and productivity tools for developing, programming, and managing high-performance database applications and components, and provides walkthroughs that help get you started. With the robust tools and data-centric object-oriented language that Visual FoxPro offers, you can build modern, scalable, multi-tier applications that integrate client/server computing and the Internet.

    Samples and Walkthroughs
    Contains Visual FoxPro code samples and step-by-step walkthroughs that you can use for experimenting with and learning Visual FoxPro features.

    Reference (Visual FoxPro)
    Describes Visual FoxPro general, programming language, user interface, and error message reference topics.

    Product Support (Visual FoxPro)
    Provides information about Microsoft product support services for Visual FoxPro.

    Guide to Reporting Improvements

    Visual FoxPro 9’s Report System has undergone a thorough revision. This topic sketches the broad outlines of the changes, and provides you with information about where to look for details.

    The following main areas of enhancement to the Report System are covered in sections of this topic.

    Design-time enhancements. Multiple features and changes make designing reports in Visual FoxPro better for you and your end-users. The Report Builder Application re-organizes your design experience out-of-the-box. If you want to customize the design process, Report Builder dialog boxes and Report Designer events are fully exposed for you to do so.

    Multiple detail bands. You can handle multiple child tables and data relationships more flexibly in the revised Report Designer. When you run multiple-detail-band reports, you can leverage the new bands, with associated detail headers and footers, both for appropriate presentation of these relationships and for more capable calculations.

    Object-assisted run-time report processing. An entirely re-built output system, including a new base class, changes the way Visual FoxPro provides output report and label files at run time. Object-assisted reporting provides better-quality output, new types of output, and an open architecture based on a new Visual FoxPro base class, the ReportListener. A programmable Report Preview interface interacts with ReportListeners to give you full control over the report preview experience. The Report Preview Application provides improved out-of-the-box previewing facilities.

    Printing, rendering, and character-set-handling improvements. Visual FoxPro 9 makes better use of the operating system's printing features and GDI+ rendering subsystem. It also handles multiple locales and character sets better than previous versions. These changes are showcased in the Report System, and are accessible for use in custom code during report design and run-time processing.

    Extensible use of report and label definition files (.frx and .lbx tables). Visual FoxPro 9 handles your existing reports and labels without modification, while allowing you to add new features and behavior to these reports easily. This backward-compatible, yet forward-thinking, migration strategy is made possible by the Report System's newly flexible handling of the .frx and .lbx table structure.

    Design-time Enhancements

    Numerous changes in the Report System help you enhance the design-time experience for developers and end-users. This section directs you to information about design-time improvements.

    Report Designer Event Hooks and the Report Builder Application

    The Report Designer now offers Report Builder Hooks, which enable you to intercept events occurring during a report or label design session to override and extend designer activity. The default Report Builder Application replaces many of the standard reporting dialog boxes with new ones written in Visual FoxPro code. Components of the Report Builder Application are exposed as Visual FoxPro Foundation Classes for your use.

    To learn about Report Builder Hooks, read: Understanding Report Builder Events
    To learn how the Report Builder Application uses Report Builder Hooks, read: How to: Configure the Report Builder's Event Handling
    To learn how to specify and distribute a Report Builder with your applications, read: _REPORTBUILDER System Variable; How to: Specify and Distribute ReportBuilder.App; Including Report Files for Distribution
    To learn about using Report Builder algorithms in your code, read: FRX Cursor Foundation Class; FRX Device Helper Foundation Class

    Protection for End-User Design Sessions, and other Design-time Customization Opportunities

Using the new PROTECTED keyword, you can allow end-users to MODIFY and CREATE reports and labels while limiting what they can do in the Report Designer interface. Protection is available individually by object and globally for the report. With Design-Time Captions, you can change what end-users see on the designer layout surface in PROTECTED design mode, from complex expressions to simple labels or sample data. You can also provide helpful instructions, in both PROTECTED and standard design mode, by specifying ToolTips for report controls.
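As a brief illustration, a protected design session is started by adding the keyword to the MODIFY command (the report name here is hypothetical):

```foxpro
* Open the Report Designer in protected mode, so end-users can
* edit the layout only within the limits set by its Protection settings.
MODIFY REPORT customer_summary PROTECTED
```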

To learn more, see the following topics:

• Using the PROTECTED keyword: MODIFY REPORT Command; MODIFY LABEL Command
• Setting Protection in the Report or Label Designer, and what Protection settings do: Setting Protection for Reports
• Protection settings exposed in dialog boxes when you use the default Report Builder Application: Protection Tab, Report Control Properties Dialog Box (Report Builder); Protection Tab, Report Properties Dialog Box (Report Builder); Protection Tab, Report Band Properties Dialog Box (Report Builder)
• Design-Time Captions: How to: Add Design-time Captions to Field Controls
• ToolTips for Report Controls: How to: Add Tooltips to Report Controls

    Enhanced Data Environment Use in Reports

    You can save the Data Environment you designed for a Report or Label as a visual class. You can load a Data Environment into a Report or Label design from either a visual class or a previously-saved report or label.

To learn more, see the following topics:

• Saving a Report Data Environment: How to: Save Report Data Environments as Classes
• Loading a Report Data Environment: Data Environment Tab, Report Properties Dialog Box (Report Builder); How to: Load Data Environments for Reports

    Miscellaneous Design Improvements

    There have been numerous enhancements to the Report and Label Designers. Some features are subtle changes to make design sessions more efficient and more enjoyable, and others improve your choices for resulting output.

To learn more, see the following topics:

• Improvements to the Report and Label Interactive Development Environment (IDE), such as an enhanced Report Designer toolbar with easier access from the View menu, a new global Report Properties context menu, improvements and additions to existing context menus, and a revised and extended Report menu: Report Layout and Design
• Changes to global report and label design options: Reports Tab, Options Dialog Box
• Using the new PictureVal property of the Image control to specify images in reports: How to: Add Pictures to Reports; PictureVal Property
• New picture template characters (U and W) and updated format instructions (Z, now supported for date and datetime data), useful in reports and labels: Format Expressions for Field Controls; InputMask Property; Format Property
• Receiving improved HTML output, which leverages run-time reporting enhancements, when you choose Save As HTML… while designing a report or label: How to: Generate Output for Reports. (Tip: Other Visual FoxPro components that invoke Genhtml.prg, the default _GENHTML implementation, automatically share the improved HTML output, although these components have not changed. These include the FRX to HTML Foundation Class and the Output Object Foundation Class.)
• Report document properties, which let you include information about the report in the report itself; document properties are included as elements and attributes in XML and HTML output: How to: Add Document Properties to a Report; Document Properties Tab, Report Properties Dialog Box (Report Builder)
• Dynamically changing the properties of report controls at run time based on the evaluation of an expression: How to: Dynamically Format Report Controls; Dynamics Tab, Report Control Properties Dialog Box (Report Builder)

    Multiple Detail Bands

    The Report Engine can now move through a scope of records multiple times. The records can represent related sets of detail lines in child tables, or they can be multiple passes through a single table. These multiple passes through a scope of records are represented as multiple detail bands.

    Detail bands can have their own headers and footers, their own associated onEntry and onExit code, and their own associated report variables. Each detail band can be explicitly associated with a separate target alias, allowing you to control the number of entries in each detail band separately for related tables.

    Multiple detail band reports provide many new ways you can represent data in reports and labels, and new ways you can calculate or summarize data, as you move through a record scope.

To learn more, see the following topics:

• Designing reports and labels with multiple detail bands and their associated headers and footers: Optional Bands Dialog Box; Report Band Properties Dialog Box; Band Tab, Report Band Properties Dialog Box (Report Builder)
• Handling multiple, related tables in report and label data: Controlling Data in Reports; Working with Related Tables using Multiple Detail Bands in Reports
• Associating report variables with detail bands: How to: Reset Report Variables
• Comparing multiple groups and multiple detail bands: Report Bands

    Object-assisted Run-time Report Processing

    Visual FoxPro 9 has a new, object-assisted method of generating output from reports and labels. You can use your existing report and label layouts in object-assisted mode, to:

    • Generate multiple types of output during one report run.
    • Connect multiple reports together as part of one output result.
    • Improve the quality of traditional report output.
    • Dynamically adjust the contents of a report while you process it.
    • Provide new types of output not available from earlier versions of Visual FoxPro.

    This section covers the array of run-time enhancements that work together to support object-assisted reporting mode.

    Object-Assisted Architecture and ReportListener Base Class

    The new ReportListener base class and supporting language enhancements are the heart of run-time reporting enhancements.

To learn more, see the following topics:

• Fundamentals of the architecture, how its components work together, and what happens during an object-assisted report run: Understanding Visual FoxPro Object-Assisted Reporting
• The ReportListener base class and its members: ReportListener Object; ReportListener Object Properties, Methods, and Events
• Invoking object-assisted reporting mode automatically: SET REPORTBEHAVIOR Command; _REPORTOUTPUT System Variable; Reports Tab, Options Dialog Box
• Invoking object-assisted reporting mode explicitly with Visual FoxPro commands: REPORT FORM Command; LABEL Command
• Debugging and error-handling object-assisted report runs: Handling Errors During Report Runs

    Report Preview API and the Report Preview Application

    Visual FoxPro 9’s object-assisted reporting mode gives you complete control over report and label previews.

To learn more, see the following topics:

• How object-assisted preview works: The Preview Container API; Creating a Custom Preview Container
• The default Report Preview Application: Leveraging the Default Preview Container
• How to specify and distribute Report Preview components with your applications: _REPORTPREVIEW System Variable; How to: Specify and Distribute ReportPreview.App; Including Report Files for Distribution

    New Types of Output and the Report Output Component Set

    Because you can subclass ReportListener, you can create new types of output. Visual FoxPro 9 supplies a Report Output Application to connect ReportListener subclasses with output types, as well as ReportListener-derived classes with enhanced output capabilities.

To learn more, see the following topics:

• Requirements for a Report Output Application, and how Visual FoxPro uses Report Output Applications: _REPORTOUTPUT System Variable
• Features of the default Report Output Application: Understanding the Report Output Application
• Specifying custom output handlers using the default Report Output Application: How to: Specify an Alternate Report Output Registry Table; How to: Register Custom ReportListeners and Custom OutputTypes in the Report Output Registry Table; Considerations for Creating New Report Output Types
• Understanding and configuring the Visual FoxPro Foundation Classes providing default ReportListener behavior for object-assisted preview and printing: ReportListener User Feedback Foundation Class
• Understanding and configuring the Visual FoxPro Foundation Classes responsible for default XML and HTML output: ReportListener XML Foundation Class; ReportListener HTML Foundation Class
• Leveraging the full set of supported Report Output Foundation Classes and the VFP Report Output XML format: ReportListener Foundation Classes; Using VFP Report Output XML
• How to specify and distribute Report Output components with your applications: How to: Specify and Distribute Report Output Application Components; Including Report Files for Distribution

    Migration Strategies and Changes in Output Rendering

    You can use the design-time changes to improve all reports and labels, whether you choose backward-compatible or object-assisted reporting mode at run time.

When evaluating whether to switch to object-assisted reporting mode at run time, first consider the items on the Reporting list of Important Changes in the Changes in Functionality for the Current Release topic, some of which are specific to this new method of creating output. The topic includes a table of minor differences between backward-compatible and object-assisted reporting output. You can examine what effects these changes might have on individual existing reports, and use the recommendations in the table to address them. You will find additional details in the topic Using GDI+ in Reports.

    Once you have experimented with your current reports, you can decide on a migration strategy for output:

    • You can switch applications over to use object-assisted reporting mode completely, by using the command SET REPORTBEHAVIOR 90.
    • You can use SET REPORTBEHAVIOR 90 but preface specific REPORT FORM commands for reports with formatting issues with SET REPORTBEHAVIOR 80, returning your application to object-assisted mode afterwards.
    • You can use object-assisted mode all the time, but adjust your ReportListener-derived classes’ behavior to suit specific needs. For example, you could change the default setting of the ReportListener’s DynamicLineHeight Property to False (.F.).
    • You can leave SET REPORTBEHAVIOR at its default setting of 80, and add an explicit OBJECT clause to specific reports at your leisure, as you have the opportunity to evaluate and adjust individual report and label layouts.
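As a sketch of mixing these strategies in application code (the report names are hypothetical), a legacy report with formatting issues can be run in backward-compatible mode while everything else uses object-assisted mode:

```foxpro
* Run most reports in object-assisted mode.
SET REPORTBEHAVIOR 90
REPORT FORM customer_summary TO PRINTER NOCONSOLE

* invoice_legacy has formatting issues under GDI+ rendering,
* so fall back to backward-compatible mode for it only.
SET REPORTBEHAVIOR 80
REPORT FORM invoice_legacy TO PRINTER NOCONSOLE
SET REPORTBEHAVIOR 90

* Alternatively, leave REPORTBEHAVIOR at its default of 80 and
* opt in per report with an explicit OBJECT clause:
REPORT FORM customer_summary OBJECT TYPE 1 TO PRINTER NOCONSOLE
```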

    Printing, Rendering, and Character-set-handling Improvements

    General changes to Visual FoxPro’s use of Windows’ printing, rendering and font-handling support the improvements in the Report System’s output. These changes enhance your ability to support multiple printers and multiple languages in reports.

To learn more, see the following topics:

• GDI+ features and their impact on native Visual FoxPro output: Using GDI+ in Reports
• Visual FoxPro reporting enhancements that allow your code to use GDI+ in object-assisted reporting mode, and Visual FoxPro Foundation Classes to get you started: GDIPlusGraphics Property; Render Method; GDI Plus API Wrapper Foundation Classes
• Making full use of multiple character sets, or language scripts, in reports (for single report layout elements, for report defaults, or globally in Visual FoxPro): GETFONT( ) Function; Style Tab, Report Control Properties Dialog Box (Report Builder); How to: Change Page Settings for Reports; Reports Tab, Options Dialog Box; Reporting Features for International Applications
• Changes to page setup dialog boxes in Visual FoxPro, improvements in your programmatic access to them, and providing overrides to Printer Environment settings in report and label files: SYS(1037) – Page Setup Dialog Box
• Receiving improved information about the user’s installed printers: APRINTERS( ) Function
• Limiting a list of fonts to those appropriate for printer use: GETFONT( ) Function

    Extensible Use of Report and Label Definition Files

    Underneath all the changes to the Visual FoxPro Report System, the Report Designer and Report Engine handle your report and label definitions using the same .frx and .lbx file structures as they did in previous versions. They change the way they use certain fields, without making these reports and labels invalid in previous versions, and they also allow you to extend your use of existing fields or add custom fields.

    Tip:
This change is critical to your ability to create extensions of the new reporting features. For example, you might store two sets of ToolTips in two report extension fields, one set for use by developers and one for use by end-users. In a Report Builder extension, you could evaluate whether the Designer was working in protected or standard mode, and load the appropriate set of ToolTips from the matching extension field. In previous versions, you could not add fields to the report or label structure; the Designer and Engine would consider the table invalid. You also could not safely add custom content to unused, standard fields in various report and label records, because the Report Designer removed such content.

    Visual FoxPro 9 provides a revised FILESPEC table for report and label files, with extensive information on the use of each column in earlier versions as well as current enhancements.

    Visual FoxPro 9 also establishes a new, structured metadata format for use with reports. This format is an XML document schema shared with the Class Designer’s XML MemberData.

    The XML document format allows you to pack custom reporting information into a single report or label field. The default Report Builder Application makes it easy to add Report XML MemberData to report and label records.

To learn more, see the following topics:

• How Visual FoxPro uses .frx and .lbx tables, and how to extend these structures: Understanding and Extending Report Structure
• How to find and display the contents of the revised FILESPEC table, 60FRX.dbf: Table Structures of Table Files (.dbc, .frx, .lbx, .mnx, .pjx, .scx, .vcx)
• How you can edit the XML data using the Report Builder Application: How to: Assign Structured Metadata to Report Controls
• How you can use Report XML MemberData: Report XML MemberData Extensions
• The shared MemberData document schema: MemberData Extensibility

    See Also

    Concepts

    What’s New in Visual FoxPro

    Data and XML Feature Enhancements

    SQL Language Improvements

    Class Enhancements

    Language Enhancements

    Interactive Development Environment (IDE) Enhancements

    Enhancements to Visual FoxPro Designers

    Miscellaneous Enhancements

    Changes in Functionality for the Current Release

    Data and XML Feature Enhancements

    Visual FoxPro contains the following additions and improvements to its data features:

    Extended SQL Capabilities

    Visual FoxPro contains many enhancements for SQL capabilities. For more information, see SQL Language Improvements.

    New Data Types

Visual FoxPro includes new field and data types: Varchar, Varbinary, and Blob.

Many of the Visual FoxPro language elements affected by these new data types are listed in the topics for the new data types.

    Binary Index Tag Based on Logical Expressions

    Visual FoxPro includes a new binary, or bitmap, index for creating indexes based on logical expressions, for example, indexes based on deleted records. A binary index can be significantly smaller than a non-binary index and can improve the speed of maintaining indexes. You can create binary indexes using the Table Designer or INDEX command. Visual FoxPro also includes Rushmore optimization enhancements in the SQL engine for deleted records.

    For more information, see Visual FoxPro Index Types, INDEX Command, ALTER TABLE – SQL Command, and Indexes Based on Deleted Records.
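A minimal sketch of creating and using such an index (the table name is hypothetical):

```foxpro
* Create a binary (bitmap) index on the logical expression DELETED().
USE customers EXCLUSIVE
INDEX ON DELETED() TAG IsDel BINARY

* With SET DELETED ON, Rushmore can use the binary tag to
* optimize the implicit filtering of deleted records.
SET DELETED ON
SELECT * FROM customers WHERE city = "Seattle" INTO CURSOR result
```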

    Converting Data Types with the CAST( ) Function

You can convert expressions from one data type to another by using the new CAST( ) function. Using CAST( ) makes it possible to write SQL statements that are more compatible with SQL Server.

    For more information, see CAST( ) Function.
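For illustration, a query can convert a Numeric column to other types inline (cursor and field names are hypothetical):

```foxpro
* Convert a Numeric expression to Currency and Integer in a query.
CREATE CURSOR orders (OrderID I, Amount N(10,2))
INSERT INTO orders VALUES (1, 19.95)

SELECT CAST(Amount AS Y) AS AmtCurrency, ;
       CAST(Amount AS I) AS AmtInteger ;
   FROM orders INTO CURSOR result
```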

    Get Cursor and Count Records Affected by SQL Pass-Thru Execution

    By using the aCountInfo parameter of the SQLEXEC( ) and SQLMORERESULTS( ) functions, you can get the name of the cursor created and a count of the records affected by the execution of a SQL pass-through statement.

For more information, see SQLEXEC( ) Function and SQLMORERESULTS( ) Function.
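A sketch of using the parameter (the data source name and table are hypothetical; the array columns follow the cursor-name/record-count pairing described above):

```foxpro
* Connect and run a pass-through statement, capturing count info.
lnHandle = SQLCONNECT("northwind_dsn")
IF lnHandle > 0
   LOCAL laInfo[1]
   SQLEXEC(lnHandle, "UPDATE customers SET region = 'WA'", "", @laInfo)
   * laInfo holds the name of any cursor created and the
   * count of records affected by the statement.
   SQLDISCONNECT(lnHandle)
ENDIF
```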

    Roll-Back Functionality Supported when a SQL Pass-Through Connection Disconnects

Visual FoxPro now supports the DisconnectRollback property for use with the SQLSETPROP( ), SQLGETPROP( ), DBSETPROP( ), and DBGETPROP( ) functions. DisconnectRollback is a connection-level property that determines whether a transaction is rolled back or committed when the SQLDISCONNECT( ) function is called for the last statement handle associated with the connection. The DisconnectRollback property accepts a logical value:

False (.F.) – (Default) The transaction is committed when SQLDISCONNECT( ) is called for the last statement handle associated with the connection.
True (.T.) – The transaction is rolled back when SQLDISCONNECT( ) is called for the last statement handle associated with the connection.

The following example sets the DisconnectRollback property with the DBSETPROP( ) and SQLSETPROP( ) functions:

DBSETPROP("testConnection", "CONNECTION", "DisconnectRollback", .T.)
SQLSETPROP(con, "DisconnectRollback", .T.)

    For more information, see DisconnectRollback property in SQLSETPROP( ) Function.

    SQLIDLEDISCONNECT( ) Temporarily Disconnects SQL Pass-Through Connections

You can use the new SQLIDLEDISCONNECT( ) function to temporarily disconnect a SQL pass-through connection. Use the following syntax:

SQLIDLEDISCONNECT( nStatementHandle )

The nStatementHandle parameter is set to the statement handle to be disconnected, or 0 if all statement handles should be disconnected. The SQLIDLEDISCONNECT( ) function returns the value 1 if it is successful; otherwise, it returns -1. The function fails if the specified statement handle is busy or the connection is in manual commit mode. The AERROR( ) function can be used to obtain error information. The disconnected connection handle is automatically restored if it is needed for an operation; the original connection data source name is used. If a statement handle is temporarily released, the ODBChstmt property returns 0; the ODBChdbc property returns 0 if the connection is temporarily disconnected. A shared connection is temporarily disconnected as soon as all of its statement handles are temporarily released.

    For more information, see SQLIDLEDISCONNECT( ) Function.

    Retrieving Active SQL Connection Statement Handles

    You can retrieve information for all active SQL connection statement handles using the new ASQLHANDLES( ) function. ASQLHANDLES( ) creates and uses the specified array to store numeric statement handle references that you can use in other Visual FoxPro SQL functions, such as SQLEXEC( ) and SQLDISCONNECT( ). ASQLHANDLES( ) returns the number of active statement handles in use or zero (0) if none are available. For more information, see ASQLHANDLES( ) Function.
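A small sketch that enumerates the active statement handles and disconnects each one:

```foxpro
* Collect all active statement handles into an array,
* then disconnect each one.
LOCAL laHandles[1], lnCount, i
lnCount = ASQLHANDLES(laHandles)
FOR i = 1 TO lnCount
   SQLDISCONNECT(laHandles[i])
ENDFOR
```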

    Obtain the ADO Bookmark for the Current Record in an ADO-Based Cursor

    The ADOBookmark property is now supported by the CURSORGETPROP( ) function. Use this property to obtain the ActiveX® Data Objects (ADO) bookmark for the current record in an ADO-based cursor.

    For more information, see ADOBookmark Property in CURSORGETPROP( ) Function.

    If a table is not selected and an alias is not specified, Error 52, “No table is open in the current work area,” is generated. If the cursor selected is not valid, Error 1467, “Property is invalid for local cursors,” is generated.

    Obtain the Number of Fetched Records

You can obtain the number of fetched records during SQL pass-through execution by using the new RecordsFetched cursor property with the CURSORGETPROP( ) function. Specifying the RecordsFetched cursor property returns the number of fetched records from an ODBC/ADO-based cursor. If records have been deleted or appended locally, the RecordsFetched cursor property may not return the current number of records in the ODBC/ADO-based cursor. In addition, filter conditions are ignored.

    For more information, see RecordsFetched Property in CURSORGETPROP( ) Function.

    Determine if a Fetch is Complete

You can determine whether a fetch process is complete for an ODBC/ADO-based cursor by using the new FetchIsComplete cursor property with the CURSORGETPROP( ) function. The property is read-only at design time and run time, and is not supported on environment-level (work area 0) cursors, tables, and local views. The FetchIsComplete cursor property returns True (.T.) if the fetch process is complete; otherwise, it returns False (.F.).

    For more information, see FetchIsComplete Property in CURSORGETPROP( ) Function.

    ISMEMOFETCHED( ) Determines Whether a Memo is Fetched

    You can use the ISMEMOFETCHED( ) function to determine whether a Memo field or General field is fetched when you are using delayed memo fetching. For more information about delayed memo fetching, see Speeding Up Data Retrieval.

    The syntax for this function is:

    ISMEMOFETCHED(cFieldName | nFieldNumber [, nWorkArea | cTableAlias ])

    The ISMEMOFETCHED( ) function returns True (.T.) when the Memo field is fetched or if local data is used. ISMEMOFETCHED() returns NULL if the record pointer is positioned at the beginning of the cursor or past the last record.

    For more information, see ISMEMOFETCHED( ) Function.
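As a sketch, assuming a remote view named v_articles with a Memo field named notes and delayed memo fetching in effect:

```foxpro
* Check whether the Memo has been retrieved for the current record.
USE v_articles
GO TOP
IF NOT ISMEMOFETCHED("notes")
   * The Memo content has not been fetched yet; referencing the
   * field retrieves it from the remote source on demand.
   ? v_articles.notes
ENDIF
```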

    Cancel ADO Fetch

    In Visual FoxPro, you can now cancel a lengthy ADO fetch by pressing the ESC key.

    Long Type Name Support

    Visual FoxPro supports using long type names with the following functions, commands, and properties.

    The following table lists the data types along with their long type names and short type names.

Data Type    Long Type Name     Short Type Name
Character    Char, Character    C
Date         Date               D
DateTime     Datetime           T
Numeric      Num, Numeric       N
Floating     Float              F
Integer      Int, Integer       I
Double       Double             B
Currency     Currency           Y
Logical      Logical            L
Memo         Memo               M
General      General            G
Picture      Picture            P
Varchar      Varchar            V
Varbinary    Varbinary          Q
Blob         Blob               W

    Visual FoxPro allows ambiguous long type names to be used with the ALTER TABLE, CREATE CURSOR, CREATE TABLE, and CREATE FROM commands. If the specified long type name is not a recognized long type name, Visual FoxPro will truncate the specified name to the first character.
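For illustration, long and short type names can be mixed freely in a table definition (the cursor and field names are hypothetical):

```foxpro
* Long type names are interchangeable with their short equivalents.
CREATE CURSOR demo ( ;
   CustName Varchar(40), ;
   Joined   Date, ;
   Balance  Currency, ;
   Notes    Memo, ;
   Quantity Integer )
```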

    Transaction Support for Free Tables and Cursors

    In prior versions of Visual FoxPro, transactions using the BEGIN TRANSACTION Command were only supported for local and remote data from databases. Transactions involving free tables and cursors are now supported through use of the MAKETRANSACTABLE( ) and ISTRANSACTABLE( ) functions. For more information, see MAKETRANSACTABLE( ) Function and ISTRANSACTABLE( ) Function.
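A minimal sketch, assuming a free table named freeorders opened exclusively:

```foxpro
* Enlist a free table in a transaction.
USE freeorders EXCLUSIVE
? MAKETRANSACTABLE("freeorders")   && .T. if the table can participate
? ISTRANSACTABLE("freeorders")

BEGIN TRANSACTION
   INSERT INTO freeorders (OrderID) VALUES (1001)
ROLLBACK   && the insert into the free table is undone
```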

    Specify a Code Page When Using the CREATE TABLE or CREATE CURSOR Commands

You can specify a code page by including the CODEPAGE clause with the CREATE CURSOR or CREATE TABLE commands. When the CODEPAGE clause is specified, the new table or cursor has the code page specified by nCodePage. Error 1914, “Code page number is invalid”, is generated if an invalid code page is specified. The following example creates a table and displays its code page:

CREATE TABLE Sales CODEPAGE=1251 (OrderID I, CustID I, OrderAmt Y(4))
? CPDBF( )

    For more information, see CREATE CURSOR – SQL Command, CREATE TABLE – SQL Command and Code Pages Supported by Visual FoxPro.

    Convert Character and Memo Data Types Using the ALTER TABLE Command

    Visual FoxPro now supports automatic conversion from character data type to memo data type without loss of data when using the ALTER TABLE command along with the ALTER COLUMN clause. This conversion is also supported when making structural changes using the Table Designer. For more information, see ALTER TABLE – SQL Command.
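A small sketch of the conversion (table and field names are hypothetical):

```foxpro
* Widen a Character column into a Memo without losing data.
CREATE TABLE notes_demo (ID I, Remark C(50))
INSERT INTO notes_demo VALUES (1, "Short remark")

ALTER TABLE notes_demo ALTER COLUMN Remark M
? Remark   && the existing contents are preserved by the conversion
```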

    BLANK Command Can Initialize Records to Default Value

    You can initialize fields in the current record to their default values as stored in the table’s database container (DBC) by using the DEFAULT [AUTOINC] option when clearing the record with the BLANK command. For more information, see BLANK Command.
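As a sketch, assuming a table named orders that belongs to a database container defining field defaults and an autoincrementing key:

```foxpro
* Reset the current record's fields to their DBC default values;
* AUTOINC also refreshes the autoincrementing field.
USE orders
GO TOP
BLANK DEFAULT AUTOINC
```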

    FLUSH Command Writes Data Explicitly to Disk

    Visual FoxPro now includes options and parameters for the FLUSH command and FFLUSH function so you can explicitly save all changes you make to all open tables and indexes. You can also save changes to a specific table by specifying a work area, table alias, or a path and file name. For more information, see FLUSH Command and FFLUSH( ) Function.
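For illustration:

```foxpro
* Explicitly write all buffered changes for all open tables and
* indexes to disk; FORCE also asks the operating system to write
* through its own file cache. Per the text above, a work area,
* table alias, or file name can also be supplied to flush only
* a specific table.
FLUSH FORCE
```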

    Populate an Array with Aliases Used by a Specified Table

The new cTableName parameter for the AUSED( ) function makes it possible to filter the created array to contain only the aliases being used for a specified table.

AUSED(ArrayName [, nDataSessionNumber [, cTableName ]])

The cTableName parameter accepts the following formats to specify a table, from highest to lowest priority:

• DatabaseName!TableName or DatabaseName!ViewName
• Path\DatabaseName!TableName or Path\DatabaseName!ViewName
• A DBC-defined table name or view in the current DBC in the current data session
• A simple or full file name

    For more information, see AUSED( ) Function.

    Obtain Last Auto-Increment Value with GETAUTOINCVALUE( )

    You can use the new GETAUTOINCVALUE( ) function to return the last value generated for an autoincremented field within a data session. For more information, see GETAUTOINCVALUE( ) Function.

    SET TABLEPROMPT Controls Prompt to Select Table

    The new SET TABLEPROMPT command controls whether Visual FoxPro prompts the user with the Open Dialog Box (Visual FoxPro) to select a table when one specified cannot be found, such as in SELECT – SQL Command. For more information, see SET TABLEPROMPT Command.

    Use SET VARCHARMAPPING to Control Query Result Set Mappings

    For queries such as SELECT – SQL Command, character data is often manipulated using Visual FoxPro functions and expressions. Since the length of the resulting field value may be important for certain application uses, it is valuable to have this Character data mapped to Varchar data in the result set. The SET VARCHARMAPPING command controls whether Character data is mapped to a Character or Varchar data type. For more information, see SET VARCHARMAPPING Command.

    SET TABLEVALIDATE Expanded

    When a table header is locked during validation, attempts to open the table, for example, with the USE command, generate the message “File is in use (Error 3).” If the table header cannot be locked for a table open operation, you can suppress this message by setting the third bit for the SET TABLEVALIDATE command. You must also set the first bit to validate the record count when the table opens. Therefore, you need to set the SET TABLEVALIDATE command to a value of 5. Also, a fourth bit option (value of 8) is available for Insert operations which checks the table header before the appended record is saved to disk and the table header is modified.

    For more information, see SET TABLEVALIDATE Command.
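The bit arithmetic described above can be sketched as:

```foxpro
* Bit 1 (value 1): validate the record count when a table opens.
* Bit 3 (value 4): suppress "File is in use" when the table header
* cannot be locked during an open operation.
* Combined setting: 1 + 4 = 5.
SET TABLEVALIDATE TO 5
```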

    SET REFRESH Can Specify Faster Refresh Rates

You can specify fractions of a second for the nSeconds2 parameter, to a minimum of 0.001 seconds. You can also specify the following values for the optional second parameter:

-1 – Always read data from disk.
0 – Always use data in the memory buffer, but do not refresh the buffer.

The Table refresh interval check box on the Data tab of the Options dialog box now also accepts fractional values.

    For more information, see SET REFRESH Command and Data Tab, Options Dialog Box.

    SET REFRESH Can Differentiate Values for Each Cursor

    You can use the new Refresh property with the CURSORGETPROP( ) function to differentiate the SET REFRESH values for individual cursors. The default setting is -2, which is a global value. This value is not available with the SET REFRESH command. The Refresh property is available at the Data Session and Cursor level. The default setting for a Data Session level is -2 and the default value for a Cursor level is the current session’s level setting. If the global level setting is set to 0, the Cursor level setting is ignored. If a table is not currently selected and an alias is not specified, Error 52, “No table is open in the current work area,” is generated.

    For more information, see Refresh Property in CURSORGETPROP( ) Function.
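A sketch of reading and, assuming the property is also settable through CURSORSETPROP( ), overriding the value for one cursor (the view name is hypothetical):

```foxpro
* Read the per-cursor refresh value; -2 means the global
* SET REFRESH setting applies to this cursor.
USE v_customers
? CURSORGETPROP("Refresh")

* Give this cursor its own half-second refresh interval.
CURSORSETPROP("Refresh", 0.5)
```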

    SET( ) Determines SET REPROCESS Command Settings

You can now use the following syntax with the SET( ) function to determine how the SET REPROCESS command was declared.

SET Command     Value Returned
REPROCESS, 2    Current session setting type (0 – attempts, 1 – seconds)
REPROCESS, 3    System session setting type (0 – attempts, 1 – seconds)

    For more information, see SET( ) Function and SET REPROCESS Command.

    Log Output from SYS(3054) Using SYS(3092)

You can use the new SYS(3092) function in conjunction with SYS(3054) to record the resulting output to a file. Use the following syntax:

SYS( 3092 [, cFileName [, lAdditive ]])

The cFileName parameter specifies the file to echo the SYS(3054) output to. Passing an empty string for cFileName deactivates output recording to the file. The default value for lAdditive is False (.F.), which specifies that new output overwrites the previous contents of the specified file. To append new output to the specified file, set lAdditive to True (.T.). SYS(3092) returns the name of the current echo file if it is active; otherwise, it returns an empty string. SYS(3054) and SYS(3092) are global settings; in a multithreaded runtime they are scoped to a thread. Each function can be changed independently of the other. These functions are not available in the Visual FoxPro OLE DB Provider.

    For more information, see SYS(3054) – Rushmore Query Optimization Level and SYS(3092) – Output Rushmore Query Optimization Level.
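A sketch of logging optimization output while testing a query (the table and log file names are hypothetical):

```foxpro
* Echo Rushmore optimization details to a file while testing a query.
SYS(3054, 12)                    && enable full optimization output
SYS(3092, "rushmore.log", .T.)   && append the output to rushmore.log
SELECT * FROM customers WHERE DELETED() INTO CURSOR junk
SYS(3092, "")                    && stop echoing to the file
SYS(3054, 0)                     && turn optimization output off
```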

    Purge Cached Memory for Specific Work Area Using SYS(1104)

    You can optionally specify the alias or work area of a specified table or cursor for which cached memory is purged. For more information, see SYS(1104) – Purge Memory Cache.

    New Table Types for SYS(2029)

    The SYS(2029) function returns new values for tables that contain Autoinc, Varchar, Varbinary or Blob fields. For more information, see SYS(2029) – Table Type.

    Map Remote Unicode Data to ANSI Using SYS(987)

    Use SYS(987) to map remote Unicode data retrieved through SQL pass-through or remote views to ANSI. This function can be used to retrieve remote Varchar data as ANSI for use with Memo fields. This setting is a global setting across all data sessions so should be used with care. For more information, see SYS(987) – Map Remote Data to ANSI.

    Memo and Field tips in a BROWSE or Grid

    When the mouse pointer is positioned over a Memo field cell in a Browse window or Grid control, a Memo Tip window displays the contents of the Memo field. For other field types, positioning the mouse pointer over the field displays the field contents in a Field Tip window when the field is sized smaller than its contents.

    Memo Tip windows display no more than 4 kilobytes of text, and are not displayed for binary data. A Memo Tip window is displayed until the mouse pointer is moved from the Memo field. The _TOOLTIPTIMEOUT System Variable determines how long a Field Tip window is displayed.

    You can disable Memo Tips by setting the _SCREEN ShowTips Property to False (.F.).

    Memo and Field Tips will also be displayed for Grid controls if both _SCREEN and the form’s ShowTips property are set to True (.T.). Additionally, the ToolTipText Property for the field’s grid column Textbox control must contain an empty string.
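    For example, the following sketch shows the settings needed for tips to appear in a Grid (the form and control names grdOrders and Text1 are placeholders):

    _SCREEN.ShowTips = .T.             && Enable tips for the application
    THISFORM.ShowTips = .T.            && Enable tips on the form hosting the Grid
    * The ToolTipText of the relevant column's Textbox must be an empty string:
    THISFORM.grdOrders.Columns(1).Text1.ToolTipText = ""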

    Specify Code Pages

    You can specify the code page used to decode data when XML is being parsed and to encode data when UTF-8 encoded XML is generated. The following language changes are available:

    For more information, see Code Pages Supported by Visual FoxPro.

    MapVarchar Property Maps to Varchar, Varbinary, and Blob Data Types

    For CursorAdapter and XMLAdapter classes, you can use the MapVarchar property to map to Varchar data types. To map to Varbinary and Blob data types, you can use the MapBinary property.

    The XMLTOCURSOR( ) Function contains several new flags to support mapping of Char and base64Binary XML field types to new Fox data types.

    For more information, see the MapVarchar Property and MapBinary Property.

    Handling Conflict Checks with Properties for CursorAdapter Class

    The new ConflictCheckType and ConflictCheckCmd properties for CursorAdapter objects give you better control over conflict handling when update and delete operations are performed through the commands specified by the UpdateCmd and DeleteCmd properties. Use ConflictCheckType to specify how a conflict check is handled during an update or delete operation. When ConflictCheckType is set to 4, you can use ConflictCheckCmd to specify a custom command to append to the end of the commands in the UpdateCmd and DeleteCmd properties. Note: Visual FoxPro 8.0 Service Pack 1 also includes the ConflictCheckType and ConflictCheckCmd properties.

    For more information, see ConflictCheckType Property and ConflictCheckCmd Property.
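    As an illustrative sketch (oCA is assumed to be a CursorAdapter already configured with UpdateCmd and DeleteCmd; the appended WHERE fragment below is a guess at a typical custom check, not documented syntax — see ConflictCheckCmd Property for the exact requirements):

    oCA.ConflictCheckType = 4          && Handle conflict checks with a custom command
    * Append a custom check to the UpdateCmd/DeleteCmd commands, for example a
    * timestamp comparison (illustrative only):
    oCA.ConflictCheckCmd = "AND orders.last_upd = ?crsOrders.last_upd"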

    Improved DataEnvironment Handling with UseCursorSchema and NoData Properties

    You can specify default settings for CursorFill Method calls made without the first two parameters by setting these properties. For more information, see UseCursorSchema Property and NoData Property.

    Timestamp Field Support

    The new TimestampFieldList property lets you specify a list of timestamp fields for the cursor created by the CursorAdapter. For more information see TimestampFieldList Property.

    Auto-Refresh Support

    There are a number of scenarios where you might want to have cursor data refreshed from a remote data source after an Insert or Update operation has occurred, including the following:

    • A table has an auto-increment field that also acts as a primary key.
    • A table has a timestamp field, and that field must be refreshed from the database after each Insert or Update in order to allow successful subsequent updates to the record when WhereType = 4 (key and timestamp).
    • A table contains fields that have DEFAULT values or triggers defined that will cause changes to occur.

    The following new properties have been added to the CursorAdapter class for Auto-Refresh support:

    • InsertCmdRefreshFieldList: List of fields to refresh after the Insert command executes.
    • InsertCmdRefreshCmd: Specifies the command to refresh the record after the Insert command executes.
    • InsertCmdRefreshKeyFieldList: List of key fields to refresh in the record after the Insert command executes.
    • UpdateCmdRefreshFieldList: List of fields to refresh after the Update command executes.
    • UpdateCmdRefreshCmd: Specifies the command to refresh the record after the Update command executes.
    • UpdateCmdRefreshKeyFieldList: List of key fields to refresh in the record after the Update command executes.
    • RefreshTimestamp: Enables automatic refresh for fields in TimestampFieldList during Insert/Update operations.

    For more information about how Visual FoxPro updates remote data using a CursorAdapter, see Data Access Management Using CursorAdapters. Also, see InsertCmdRefreshCmd Property, InsertCmdRefreshFieldList Property, InsertCmdRefreshKeyFieldList Property, UpdateCmdRefreshCmd Property, UpdateCmdRefreshFieldList Property, UpdateCmdRefreshKeyFieldList Property and RefreshTimeStamp Property.
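    As an illustrative sketch for the auto-increment scenario above (all names are placeholders; the back end is assumed here to be SQL Server, where SCOPE_IDENTITY() returns the last generated key):

    oCA = CREATEOBJECT("CursorAdapter")
    oCA.DataSourceType = "ODBC"
    oCA.DataSource = hConn                     && Existing ODBC connection handle
    oCA.Alias = "crsOrders"
    oCA.Tables = "orders"
    oCA.SelectCmd = "SELECT order_id, amount FROM orders"
    oCA.KeyFieldList = "order_id"
    oCA.UpdatableFieldList = "amount"
    oCA.UpdateNameList = "order_id orders.order_id, amount orders.amount"
    * Refresh the auto-increment key after each insert:
    oCA.InsertCmdRefreshFieldList = "order_id"
    oCA.InsertCmdRefreshCmd = "SELECT SCOPE_IDENTITY()"
    oCA.CursorFill()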

    On Demand Record Refresh

    In Visual FoxPro 8.0, the REFRESH( ) Function provides on demand record refresh functionality for local and remote views; however, it does not support the CursorAdapter. Visual FoxPro 9.0 extends REFRESH( ) support to the CursorAdapter and provides some additional capabilities:

    • RecordRefresh method: Refreshes the current field values for the target records. Use the CURVAL( ) Function to determine current field values.
    • BeforeRecordRefresh event: Occurs immediately before the RecordRefresh method is executed.
    • AfterRecordRefresh event: Occurs after the RecordRefresh method is executed.
    • RefreshCmdDataSourceType property: Specifies the data source type to be used for the RecordRefresh method.
    • RefreshCmdDataSource property: Specifies the data source to be used for the RecordRefresh method.
    • RefreshIgnoreFieldList property: List of fields to ignore during the RecordRefresh operation.
    • RefreshCmd property: Specifies the command to refresh rows when RecordRefresh is executed.
    • RefreshAlias property: Specifies the alias of the read-only cursor used as a target for the refresh operation.

    For more information, see RecordRefresh Method, BeforeRecordRefresh Event, AfterRecordRefresh Event, RefreshCmdDataSourceType Property, RefreshCmdDataSource Property, RefreshIgnoreFieldList Property, RefreshCmd Property and RefreshAlias Property.

    Delayed Memo Fetch

    The CursorAdapter class has a FetchMemo Property which, when set to False (.F.) in Visual FoxPro 9.0, places the cursor in Delayed Memo Fetch mode, similar to remote views. Delayed Memo Fetch mode prevents the contents of Memo fields from being fetched by the CursorFill Method or CursorRefresh Method; the content of a Memo field is fetched only when the application attempts to access its value. The following CursorAdapter enhancements provide support for Delayed Memo Fetch:

    • DelayedMemoFetch method: Performs a delayed Memo field fetch for a target record in a cursor in a CursorAdapter object.
    • FetchMemoDataSourceType property: Specifies the data source type used for the DelayedMemoFetch method.
    • FetchMemoDataSource property: Specifies the data source used for the DelayedMemoFetch method.
    • FetchMemoCmdList property: Specifies a list of Memo field names and their associated fetch commands.

    For more information, see DelayedMemoFetch Method, FetchMemoDataSourceType Property, FetchMemoDataSource Property and FetchMemoCmdList Property.
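    A hedged sketch of enabling Delayed Memo Fetch (the connection handle, table, and field names are placeholders, and the FetchMemoCmdList entry below is illustrative only — see FetchMemoCmdList Property for the exact list syntax):

    oCA.FetchMemo = .F.                && Skip Memo contents during CursorFill
    oCA.FetchMemoDataSourceType = "ODBC"
    oCA.FetchMemoDataSource = hConn
    * Illustrative entry pairing the Memo field with its fetch command:
    oCA.FetchMemoCmdList = "notes SELECT notes FROM orders WHERE order_id = ?crsOrders.order_id"
    oCA.CursorFill()
    * The notes content is fetched only when a record's value is accessed.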

    UseTransactions Property

    The new UseTransactions property specifies whether the CursorAdapter should use transactions when sending Insert, Update or Delete commands through ADO or ODBC. For more information, see UseTransactions Property.
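    For example (oCA is assumed to be a CursorAdapter already configured for an ODBC or ADO data source):

    oCA.UseTransactions = .T.          && Send Insert/Update/Delete inside a transaction
    IF !TABLEUPDATE(.T., .T., oCA.Alias)
       TABLEREVERT(.T., oCA.Alias)     && On failure, all changes are rolled back together
    ENDIF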

    DEFAULT and CHECK Constraints Respected

    In Visual FoxPro 9.0, DEFAULT values and table and field level CHECK constraints are supported for XML, Native, ADO and ODBC data sources. In Visual FoxPro 8.0, DEFAULT values and table and field level CHECK constraints are only supported for an XML data source. For the DEFAULT values and CHECK constraints to be applied to a cursor, call the CursorFill Method with the lUseSchema parameter set to True (.T.). For more information, see CursorSchema Property.

    Remote Data Type Conversion for Logical Data

    When you move data between a remote server and Visual FoxPro, Visual FoxPro uses ODBC or ADO data types to map remote data types to local Visual FoxPro data types. In Visual FoxPro 9.0, certain ODBC and ADO data types can now be mapped to a logical data type in remote views and the CursorAdapter object. For more information, see Data Type Conversion Control.

    ADOCodePage Property

    When working with an ADO data source for your CursorAdapter, you may want to specify a code page to use for character data translation. The new ADOCodePage property allows you to specify this code page. For more information, see ADOCodePage Property.
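    For example (the code page value is a placeholder for whatever translation your ADO data source requires):

    oCA.DataSourceType = "ADO"
    oCA.ADOCodePage = 1251             && Translate character data using code page 1251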

    Read and Write Nested XML Documents

    You can write data from your relational database into XML documents, and read it back, using nesting to handle the relationships between tables. You accomplish this using the RespectNesting Property of the XMLAdapter class. The XMLTable class has the Nest Method, Unnest Method and the following properties to handle nesting.

    For more information, see the XMLAdapter Class and the XMLTable Class.

    LoadXML Method Can Accept Any XML Document

    The LoadXML method accepts any XML document with a valid schema. Previously, the method required that the schema follow the format of a Visual Studio generated dataset. When you use the LoadXML method to read an XML document with a schema different from a Visual Studio generated dataset, the XMLAdapter XMLName and XMLPrefix properties are set to empty strings (""). The XMLAdapter XMLNamespace property is set to the targetNamespace attribute value of the schema node, and each complexType XML element is mapped to an XMLTable object whose XMLNamespace property is set to the namespaceURI for the element. If you set the XMLAdapter RespectNesting property to True (.T.), a top-level element declaration is ignored if it is referenced from some other complex element; in that case, the XMLTable object for the referenced element is nested into the XMLTable for the element that references it.

    For more information, see LoadXML Method.

    XPath Expressions Can Access Complex XML Documents

    You can use XPath expressions, together with new properties for reading the nodes within a document, to access complex XML documents. For example, you might want to filter record nodes, restore relationships based on foreign key fields, use an element's text as data for a field, or access XML that uses multiple XML namespaces. These properties let you read the XML at the XMLAdapter level, the XMLTable level, or the XMLField level.

    You can use the following table to determine the node within the XML document that you want to start reading.

    For example, if you use an XPath expression in the XMLName property for an XMLAdapter, reading begins at the first node found by the expression.

    • To read from the first found XML node: use the XMLAdapter class; the context node is the IXMLDOMElement property.
    • To read all found XML nodes, using each node as a single record: use the XMLTable class; the context node is the XMLAdapter object.
    • To read the first found XML node, using its text as a field value: use the XMLField class; the context node is the XMLTable object.

    The following methods do not support the use of XPath expressions in the XMLName property:

    • The ApplyDiffgram and ChangesToCursor methods do not support XPath expressions for XMLAdapter and XMLTable objects.
    • The ToCursor method does not support an XPath expression for XMLAdapter when the IsDiffgram property is set to True (.T.).
    • The ToXML method does not support XPath expressions for XMLAdapter and XMLTable objects and ignores XMLField objects that use XPath expressions.

    For more information about XPath expressions, see the XPath Reference in the Microsoft Core XML Services (MSXML) 4.0 SDK in the MSDN library at http://msdn.microsoft.com/library.

    Cursor to XML Functions

    Support for the CURSORTOXML( ), XMLTOCURSOR( ), and XMLUPDATEGRAM( ) functions has been added to the OLE DB Provider for Visual FoxPro.

    When used in the OLE DB Provider for Visual FoxPro, the _VFP VFPXMLProg property is not supported for the CURSORTOXML( ), XMLTOCURSOR( ) and XMLUPDATEGRAM( ) functions because the _VFP system variable is not supported in the OLE DB Provider.

    EXECSCRIPT Supported in the Visual FoxPro OLE DB Provider

    You can use the EXECSCRIPT( ) function with the Visual FoxPro OLE DB Provider. For more information, see EXECSCRIPT( ) Function.

    Returning a Rowset from a Cursor in the Visual FoxPro OLE DB Provider

    You can use the new SETRESULTSET( ), GETRESULTSET( ), and CLEARRESULTSET( ) functions to mark a cursor or table that has been opened by the Visual FoxPro OLE DB Provider, retrieve the work area of the marked cursor, and clear the marker flag from a marked cursor. By marking a cursor or table, you can retrieve a rowset that is created from the marked cursor or table from a database container (DBC) stored procedure when the OLE DB Provider completes command execution.

    For more information, see SETRESULTSET( ) Function, GETRESULTSET( ) Function, and CLEARRESULTSET( ) Function.
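    For example, a DBC stored procedure executed through the OLE DB Provider might mark a cursor so that it is returned to the caller as a rowset (the procedure, table, and cursor names are placeholders):

    PROCEDURE GetTopCustomers
       SELECT * FROM customer WHERE rating = "A" INTO CURSOR crsTop
       SETRESULTSET("crsTop")          && Mark crsTop to be returned as a rowset
    ENDPROC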

    See Also

    Concepts

    What’s New in Visual FoxPro

    Guide to Reporting Improvements

    SQL Language Improvements

    Class Enhancements

    Language Enhancements

    Interactive Development Environment (IDE) Enhancements

    Enhancements to Visual FoxPro Designers

    Miscellaneous Enhancements

    Changes in Functionality for the Current Release

    SQL Language Improvements

    The SELECT – SQL Command and other SQL commands have been substantially enhanced in Visual FoxPro 9.0. This topic describes the enhancements made to these commands, and new commands that affect SQL performance.

    Expanded Capacities

    Several SELECT – SQL command limitations have been removed or increased in Visual FoxPro 9.0. The following table lists the areas where limitations have been removed or increased.

    • Number of joins and subqueries in a SELECT – SQL command: Visual FoxPro 9.0 removes the limit on the total number of join clauses and subqueries in a SELECT – SQL command. The previous limit was nine.
    • Number of UNION clauses in a SELECT – SQL command: Visual FoxPro 9.0 removes the limit on the number of UNION clauses in a SQL SELECT statement. The previous limit was nine.
    • Number of tables referenced in a SELECT – SQL command: Visual FoxPro 9.0 removes the limit on the number of tables and aliases referenced in a SQL SELECT statement. The previous limit was 30.
    • Number of arguments in an IN( ) clause: Visual FoxPro 9.0 removes the limit of 24 values in the IN (Value_Set) clause for the WHERE clause. However, the number of values remains subject to the setting of SYS(3055) – FOR and WHERE Clause Complexity. For functionality changes concerning the IN clause, see Changes in Functionality for the Current Release.

    Subquery Enhancements

    Visual FoxPro 9.0 provides more flexibility in subqueries. For example, multiple subqueries are now supported. The following describes the enhancements to subqueries in Visual FoxPro 9.0.

    Multiple Subqueries

    Visual FoxPro 9.0 supports multiple subquery nesting, with correlation allowed to the immediate parent. There is no limit to the nesting depth. In Visual FoxPro 8.0, error 1842 (SQL: Subquery nesting is too deep) was generated when more than one level of subquery nesting occurred.

    The following is the general syntax for multiple subqueries.

    SELECT … WHERE … (SELECT … WHERE … (SELECT …) …) …

    Examples

    The following example queries, which will generate an error in Visual FoxPro 8.0, are now supported in Visual FoxPro 9.0.

    CREATE CURSOR MyCursor (field1 I)
    INSERT INTO MyCursor VALUES (0)
    CREATE CURSOR MyCursor1 (field1 I)
    INSERT INTO MyCursor1 VALUES (1)
    CREATE CURSOR MyCursor2 (field1 I)
    INSERT INTO MyCursor2 VALUES (2)
    SELECT * FROM MyCursor T1 WHERE EXISTS ;
       (SELECT * FROM MyCursor1 T2 WHERE NOT EXISTS ;
       (SELECT * FROM MyCursor2 T3))

    *** Another multiple subquery nesting example ***
    SELECT * FROM table1 WHERE table1.iid IN ;
       (SELECT table2.itable1id FROM table2 WHERE table2.iID IN ;
       (SELECT table3.itable2id FROM table3 WHERE table3.cValue = "value"))

    GROUP BY in a Correlated Subquery

    Many queries can be evaluated by executing a subquery once and substituting the resulting value or values into the WHERE clause of the outer query. In queries that include a correlated subquery (also known as a repeating subquery), the subquery depends on the outer query for its values. This means that the subquery is executed repeatedly, once for each row that might be selected by the outer query.

    Visual FoxPro 8.0 does not allow using GROUP BY in a correlated subquery, and generates error 1828 (SQL: Illegal GROUP BY in subquery). Visual FoxPro 9.0 removes this limitation and supports GROUP BY in correlated subqueries, which are then allowed to return more than one record.

    The following is the general syntax for the GROUP BY clause in a correlated subquery.

    SELECT … WHERE … (SELECT … WHERE … GROUP BY …) …

    Examples

    The following example, which will generate an error in Visual FoxPro 8.0, is now supported in Visual FoxPro 9.0.

    CLOSE DATABASES ALL
    CREATE CURSOR MyCursor1 (field1 I, field2 I, field3 I)
    INSERT INTO MyCursor1 VALUES (1,2,3)
    CREATE CURSOR MyCursor2 (field1 I, field2 I, field3 I)
    INSERT INTO MyCursor2 VALUES (1,2,3)
    SELECT * FROM MyCursor1 T1 WHERE field1 ;
       IN (SELECT MAX(field1) FROM MyCursor2 T2 ;
       WHERE T2.field2 = T1.field2 GROUP BY field3)

    TOP N in a Non-Correlated Subquery

    Visual FoxPro 9.0 supports the TOP N clause in a non-correlated subquery. The ORDER BY clause must be present when the TOP N clause is used, and this is the only case where ORDER BY is allowed in a subquery.

    The following is the general syntax for the TOP N clause in a non-correlated subquery.

    SELECT … WHERE … (SELECT TOP nExpr [PERCENT] … FROM … ORDER BY …) …

    Examples

    The following example, which will generate an error in Visual FoxPro 8.0, is now supported in Visual FoxPro 9.0.

    CLOSE DATABASES ALL
    CREATE CURSOR MyCursor1 (field1 I, field2 I, field3 I)
    INSERT INTO MyCursor1 VALUES (1,2,3)
    CREATE CURSOR MyCursor2 (field1 I, field2 I, field3 I)
    INSERT INTO MyCursor2 VALUES (1,2,3)
    SELECT * FROM MyCursor1 WHERE field1 ;
       IN (SELECT TOP 5 field2 FROM MyCursor2 ORDER BY field2)

    Subqueries in a SELECT List

    Visual FoxPro 9.0 allows a subquery as a column or part of an expression in a projection. A subquery in a projection has exactly the same requirements as a subquery used in a comparison operation. If a subquery does not return any records, the NULL value is returned.

    In Visual FoxPro 8.0, an attempt to use a subquery as a column or part of an expression in a projection generates error 1810 (SQL: Invalid use of subquery).

    The following is the general syntax for a subquery in a SELECT list.

    SELECT … (SELECT …) … FROM …

    Example

    The following example, which will generate an error in Visual FoxPro 8.0, is now supported in Visual FoxPro 9.0.

    SELECT T1.field1, (SELECT field2 FROM MyCursor2 T2 ;
       WHERE T2.field1 = T1.field1) FROM MyCursor1 T1

    Aggregate functions in a SELECT List of a Subquery

    In Visual FoxPro 9.0, aggregate functions are now supported in a SELECT list of a subquery compared using the comparison operators <, <=, >, >= followed by ALL, ANY, or SOME. See Considerations for SQL SELECT Statements for more information about aggregate functions.

    Example

    The following example demonstrates the use of an aggregate function (the COUNT( ) function) in a SELECT list of a subquery.

    CLOSE DATABASES ALL
    CREATE CURSOR MyCursor (field1 I)
    INSERT INTO MyCursor VALUES (6)
    INSERT INTO MyCursor VALUES (0)
    INSERT INTO MyCursor VALUES (1)
    INSERT INTO MyCursor VALUES (2)
    INSERT INTO MyCursor VALUES (3)
    INSERT INTO MyCursor VALUES (4)
    INSERT INTO MyCursor VALUES (5)
    INSERT INTO MyCursor VALUES (-1)
    CREATE CURSOR MyCursor2 (field2 I)
    INSERT INTO MyCursor2 VALUES (1)
    INSERT INTO MyCursor2 VALUES (2)
    INSERT INTO MyCursor2 VALUES (2)
    INSERT INTO MyCursor2 VALUES (3)
    INSERT INTO MyCursor2 VALUES (3)
    INSERT INTO MyCursor2 VALUES (3)
    INSERT INTO MyCursor2 VALUES (4)
    INSERT INTO MyCursor2 VALUES (4)
    INSERT INTO MyCursor2 VALUES (4)
    INSERT INTO MyCursor2 VALUES (4)
    SELECT * FROM MyCursor WHERE field1 ;
       < ALL (SELECT COUNT(*) FROM MyCursor2 GROUP BY field2) ;
       INTO CURSOR MyCursor3
    BROWSE

    Correlated Subqueries Allow Complex Expressions to be Compared with Correlated Field

    In Visual FoxPro 8.0, correlated fields can only be referenced in the following forms:

    correlated field <comparison> local field

    -or-

    local field <comparison> correlated field

    In Visual FoxPro 9.0, correlated fields support comparison to local expressions, as shown in the following forms:

    correlated field <comparison> local expression

    -or-

    local expression <comparison> correlated field

    A local expression must use at least one local field and cannot reference any outer (correlated) field.

    Example

    In the following example, a local expression (MyCursor2.field2 / 2) is compared to a correlated field (MyCursor.field1).

    SELECT * FROM MyCursor ;
       WHERE EXISTS(SELECT * FROM MyCursor2 ;
       WHERE MyCursor2.field2 / 2 > MyCursor.field1)

    Changes for Expressions Compared with Subqueries

    In Visual FoxPro 8.0, the left part of a comparison using the comparison operators [NOT] IN, <, <=, =, ==, <>, !=, >=, >, ALL, ANY, or SOME with a subquery must reference one and only one table from the FROM clause. In the case of a comparison with a correlated subquery, that table must also be the correlated table.

    In Visual FoxPro 9.0, comparisons work in the following ways:

    • The expression on the left side of an IN comparison must reference at least one table from the FROM clause.
    • The left part for the conditions =, ==, <>, != followed by ALL, SOME, or ANY must reference at least one table from the FROM clause.
    • The left part for the condition >, >=, <, <= followed by ALL, SOME, or ANY (SELECT TOP…) must reference at least one table from the FROM clause.
    • The left part for the condition >, >=, <, <= followed by ALL, SOME, or ANY (SELECT <aggregate function>…) must reference at least one table from the FROM clause.
    • The left part for the condition >, >=, <, <= followed by ALL, SOME, or ANY (subquery with GROUP BY and/or HAVING) must reference at least one table from the FROM clause.

    In Visual FoxPro 9.0, the left part of a comparison that does not fall into the list above (for example, when ALL, SOME, or ANY is not included) does not have to reference any table from the FROM clause.

    In all cases, the left part of the comparison is allowed to reference more than one table from the FROM clause. For a correlated subquery, the left part of the comparison does not have to reference the correlated table.

    Subquery in an UPDATE – SQL Command SET List

    In Visual FoxPro 9.0, the UPDATE – SQL Command now supports a subquery in the SET clause.

    A subquery in a SET clause has exactly the same requirements as a subquery used in a comparison operation. If the subquery does not return any records, the NULL value is returned.

    Only one subquery is allowed in a SET clause. If there is a subquery in the SET clause, subqueries in the WHERE clause are not allowed.

    The following is the general syntax for a subquery in the SET clause.

    UPDATE … SET … (SELECT …) …

    Example

    The following example demonstrates the use of a subquery in the SET clause.

    CLOSE DATA
    CREATE CURSOR MyCursor1 (field1 I, field2 I NULL)
    INSERT INTO MyCursor1 VALUES (1,1)
    INSERT INTO MyCursor1 VALUES (2,2)
    INSERT INTO MyCursor1 VALUES (5,5)
    INSERT INTO MyCursor1 VALUES (6,6)
    INSERT INTO MyCursor1 VALUES (7,7)
    INSERT INTO MyCursor1 VALUES (8,8)
    INSERT INTO MyCursor1 VALUES (9,9)
    CREATE CURSOR MyCursor2 (field1 I, field2 I)
    INSERT INTO MyCursor2 VALUES (1,10)
    INSERT INTO MyCursor2 VALUES (2,20)
    INSERT INTO MyCursor2 VALUES (3,30)
    INSERT INTO MyCursor2 VALUES (4,40)
    INSERT INTO MyCursor2 VALUES (5,50)
    INSERT INTO MyCursor2 VALUES (6,60)
    INSERT INTO MyCursor2 VALUES (7,70)
    INSERT INTO MyCursor2 VALUES (8,80)
    UPDATE MyCursor1 SET field2 = 100 + (SELECT field2 FROM MyCursor2 ;
       WHERE MyCursor2.field1 = MyCursor1.field1) WHERE field1 > 5
    SELECT MyCursor1
    LIST

    Sub-SELECT in the FROM Clause

    A sub-SELECT is often referred to as a derived table. Derived tables are SELECT statements in the FROM clause referred to by an alias or a user-specified name. The result set of the SELECT in the FROM clause creates a table used by the outer SELECT statement. Visual FoxPro 9.0 permits the use of a subquery in the FROM clause.

    A sub-SELECT must be enclosed in parentheses, and an alias is required. Correlation is not supported. A sub-SELECT has the same syntax limitations as the SELECT command, but not the subquery syntax limitations. All sub-SELECTs are executed before the topmost SELECT is evaluated.

    The following is the general syntax for a subquery in the FROM clause.

    SELECT … FROM (SELECT …) [AS] Alias…

    Example

    The following example demonstrates the use of a subquery in the FROM clause.

    SELECT * FROM (SELECT * FROM MyCursor T1 ;
       WHERE field1 = (SELECT T2.field2 FROM MyCursor1 T2 ;
       WHERE T2.field1 = T1.field2) ;
       UNION SELECT * FROM MyCursor2 ;
       ORDER BY 2 DESC) AS subquery

    *** Note that the following code will generate an error ***
    SELECT * FROM (SELECT TOP 5 field1 FROM MyCursor) ORDER BY field1

    ORDER BY with Field Names in the UNION clause

    When using a UNION clause in Visual FoxPro 8.0, you are forced to use numeric references in the ORDER BY clause. In Visual FoxPro 9.0, this restriction has been removed and you can use field names.

    The referenced fields must be present in the SELECT list (projection) for the last SELECT in the UNION; that projection is used for the ORDER BY operation.

    Example

    The following example demonstrates the use of field names in the ORDER BY clause.

    CLOSE DATABASES ALL
    CREATE CURSOR MyCursor (field1 I, field2 I)
    INSERT INTO MyCursor VALUES (1,6)
    INSERT INTO MyCursor VALUES (2,5)
    INSERT INTO MyCursor VALUES (3,4)
    SELECT field1, field2, .T. AS flag, 1 FROM MyCursor ;
       WHERE field1 = 1 ;
       UNION ;
       SELECT field1, field2, .T. AS flag, 1 FROM MyCursor ;
       WHERE field1 = 3 ;
       ORDER BY field2 ;
       INTO CURSOR TEMP READWRITE
    BROWSE NOWAIT

    Optimized TOP N Performance

    In Visual FoxPro 8.0 and earlier versions, when the TOP N [PERCENT] clause is used, all records are sorted and then the top N are extracted. In Visual FoxPro 9.0, performance has been improved by eliminating records that do not qualify for the TOP N from the sort process as early as possible.

    The TOP N optimization is done only if the SET ENGINEBEHAVIOR Command is set to 90.

    Optimization requires that TOP N return no more than N records (this is not the case for Visual FoxPro 8.0 and earlier versions), which is enforced when SET ENGINEBEHAVIOR is set to 90.

    TOP N PERCENT cannot be optimized unless the whole result set can be read into memory at once.

    Improved Optimization for Multiple Table OR Conditions

    Visual FoxPro 9.0 provides improved Rushmore optimization for multi-table OR conditions. Visual FoxPro uses multi-table OR conditions to Rushmore optimize filter conditions for a table as long as both sides of the condition can be optimized. The following example shows this:

    CLEAR
    CREATE CURSOR Test1 (f1 I)
    FOR i = 1 TO 20
       INSERT INTO Test1 VALUES (i)
    NEXT
    INDEX ON f1 TAG f1
    CREATE CURSOR Test2 (f2 I)
    FOR i = 1 TO 20
       INSERT INTO Test2 VALUES (i)
    NEXT
    INDEX ON f2 TAG f2
    SYS(3054, 12)
    SELECT * FROM Test1, Test2 WHERE (f1 IN (1,2,3) AND f2 IN (17,18,19)) OR ;
       (f2 IN (1,2,3) AND f1 IN (17,18,19)) INTO CURSOR Result
    SYS(3054, 0)

    In this scenario, table Test1 can be Rushmore optimized using the following condition:

    (f1 IN (1,2,3) OR f1 IN (17,18,19))

    and table Test2 with the following:

    (f2 IN (17,18,19) OR f2 IN (1,2,3))

    Support for Local Buffered Data

    At times it can be beneficial to use SELECT – SQL to select records from a local buffered cursor whose changes have not yet been committed to the table. When creating controls such as grids, list boxes, and combo boxes, it is often necessary to consider newly added records that have not yet been committed to disk. Previously, SQL statements were based only on content already committed to disk.

    Visual FoxPro 9.0 provides language enhancements that allow you to specify if the data returned by a SELECT – SQL command is based on buffered data or data written directly to disk.

    The SELECT – SQL command now supports a WITH … BUFFERING clause that lets you specify if retrieved data is based on buffered data or data written directly to disk. For more information, see SELECT – SQL Command – WITH Clause.

    If you do not include the BUFFERING clause, the retrieved data is then determined by the setting for SET SQLBUFFERING command. For more information, see the SET SQLBUFFERING Command.
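    A minimal sketch of the WITH (BUFFERING = ...) clause (the cursor and field names are placeholders):

    SET MULTILOCKS ON
    CREATE CURSOR crsOrders (order_id I, amount Y)
    CURSORSETPROP("Buffering", 5, "crsOrders")     && Enable table buffering
    INSERT INTO crsOrders VALUES (1, 10.00)        && Buffered, not yet committed
    * Include the uncommitted row in the query result:
    SELECT * FROM crsOrders WITH (BUFFERING = .T.) INTO CURSOR crsAll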

    Enhancements to other SQL Commands

    The following sections describe enhancements made to the INSERT – SQL Command, UPDATE – SQL Command, and DELETE – SQL Command commands in Visual FoxPro 9.0.

    UNION Clause in the INSERT – SQL Command

    In Visual FoxPro 9.0, a UNION clause is now supported in the INSERT – SQL Command.

    The following is the general syntax for the UNION clause.

    INSERT INTO … SELECT … FROM … [UNION SELECT … [UNION …]]

    Example

    The following example demonstrates the use of a UNION clause in INSERT-SQL.

    CLOSE DATABASES ALL
    CREATE CURSOR MyCursor (field1 I, field2 I)
    CREATE CURSOR MyCursor1 (field1 I, field2 I)
    CREATE CURSOR MyCursor2 (field1 I, field2 I)
    INSERT INTO MyCursor1 VALUES (1,1)
    INSERT INTO MyCursor2 VALUES (2,2)
    INSERT INTO MyCursor SELECT * FROM MyCursor1 UNION SELECT * FROM MyCursor2
    SELECT MyCursor
    LIST

    Correlated UPDATE – SQL Commands

    Visual FoxPro 9.0 now supports correlated updates with the UPDATE – SQL Command.

    If a FROM clause is included in the UPDATE – SQL command, then the name after the UPDATE keyword defines the target for the update operation. This name can be a table name, an alias, or a file name. The following logic is used to select the target table:

    • If the name matches an implicit or explicit alias for a table in the FROM clause, then the table is used as a target for the update operation.
    • If the name matches the alias for the cursor in the current data session, then the cursor is used as a target.
    • A table or file with the same name is used as a target.

    The UPDATE – SQL command FROM clause has the same syntax as the FROM clause in the SELECT – SQL command, with the following limitations:

    • The target table or cursor cannot be involved in an OUTER JOIN as a secondary table.
    • The target cursor cannot be a subquery result.
    • All other JOINs can be evaluated before joining the target table.

    The following is the general syntax for a correlated UPDATE command.

    UPDATE … SET … FROM … WHERE …

    Example

    The following example demonstrates a correlated update using the UPDATE -SQL command.

    CLOSE DATABASES ALL
    CREATE CURSOR MyCursor1 (field1 I, field2 I NULL, field3 I NULL)
    INSERT INTO MyCursor1 VALUES (1,1,0)
    INSERT INTO MyCursor1 VALUES (2,2,0)
    INSERT INTO MyCursor1 VALUES (5,5,0)
    INSERT INTO MyCursor1 VALUES (6,6,0)
    INSERT INTO MyCursor1 VALUES (7,7,0)
    INSERT INTO MyCursor1 VALUES (8,8,0)
    INSERT INTO MyCursor1 VALUES (9,9,0)
    CREATE CURSOR MyCursor2 (field1 I, field2 I)
    INSERT INTO MyCursor2 VALUES (1,10)
    INSERT INTO MyCursor2 VALUES (2,20)
    INSERT INTO MyCursor2 VALUES (3,30)
    INSERT INTO MyCursor2 VALUES (4,40)
    INSERT INTO MyCursor2 VALUES (5,50)
    INSERT INTO MyCursor2 VALUES (6,60)
    INSERT INTO MyCursor2 VALUES (7,70)
    INSERT INTO MyCursor2 VALUES (8,80)
    CREATE CURSOR MyCursor3 (field1 I, field2 I)
    INSERT INTO MyCursor3 VALUES (6,600)
    INSERT INTO MyCursor3 VALUES (7,700)
    UPDATE MyCursor1 ;
       SET MyCursor1.field2 = MyCursor2.field2, field3 = MyCursor2.field2 * 10 ;
       FROM MyCursor2 ;
       WHERE MyCursor1.field1 > 5 AND MyCursor2.field1 = MyCursor1.field1
    SELECT MyCursor1
    LIST
    UPDATE MyCursor1 SET MyCursor1.field2 = MyCursor3.field2 ;
       FROM MyCursor2, MyCursor3 ;
       WHERE MyCursor1.field1 > 5 AND MyCursor2.field1 = MyCursor1.field1 ;
       AND MyCursor2.field1 = MyCursor3.field1
    SELECT MyCursor1
    LIST

    Correlated DELETE – SQL Commands

    Visual FoxPro 9.0 now supports correlated deletions with the DELETE – SQL Command.

    If a FROM clause has more than one table, the name after the DELETE keyword is required and it defines the target for the delete operation. This name can be a table name, an alias or a file name. The following logic is used to select the target table:

    • If the name matches an implicit or explicit alias for a table in the FROM clause, then the table is used as a target for the delete operation.
    • If the name matches the alias for the cursor in the current data session, then the cursor is used as a target.
    • Otherwise, a table or file with the same name is used as the target.

    The DELETE – SQL command FROM clause has the same syntax as the FROM clause in the SELECT – SQL command with the following limitations:

    • The target table or cursor cannot be involved in an OUTER JOIN as a secondary table.
    • The target cursor cannot be a subquery result.
    • It should be possible to evaluate all other JOINs before joining the target table.

    The following is the general syntax for a correlated DELETE command.

    DELETE [alias] FROM alias1 [, alias2 … ] … WHERE …

    Example

    The following example demonstrates a correlated deletion using the DELETE – SQL command.

    CLOSE DATABASES ALL
    CREATE CURSOR MyCursor1 (field1 I, field2 I NULL, field3 I NULL)
    INSERT INTO MyCursor1 VALUES (1,1,0)
    INSERT INTO MyCursor1 VALUES (2,2,0)
    INSERT INTO MyCursor1 VALUES (5,5,0)
    INSERT INTO MyCursor1 VALUES (6,6,0)
    INSERT INTO MyCursor1 VALUES (7,7,0)
    INSERT INTO MyCursor1 VALUES (8,8,0)
    INSERT INTO MyCursor1 VALUES (9,9,0)

    CREATE CURSOR MyCursor2 (field1 I, field2 I)
    INSERT INTO MyCursor2 VALUES (1,10)
    INSERT INTO MyCursor2 VALUES (2,20)
    INSERT INTO MyCursor2 VALUES (3,30)
    INSERT INTO MyCursor2 VALUES (4,40)
    INSERT INTO MyCursor2 VALUES (5,50)
    INSERT INTO MyCursor2 VALUES (6,60)
    INSERT INTO MyCursor2 VALUES (7,70)
    INSERT INTO MyCursor2 VALUES (8,80)

    CREATE CURSOR MyCursor3 (field1 I, field2 I)
    INSERT INTO MyCursor3 VALUES (6,600)
    INSERT INTO MyCursor3 VALUES (7,700)

    DELETE MyCursor1 FROM MyCursor2 ;
       WHERE MyCursor1.field1 > 5 AND MyCursor2.field1 = MyCursor1.field1
    SELECT MyCursor1
    LIST
    RECALL ALL

    DELETE MyCursor1 FROM MyCursor2, MyCursor3 ;
       WHERE MyCursor1.field1 > 5 AND MyCursor2.field1 = MyCursor1.field1 ;
          AND MyCursor2.field1 = MyCursor3.field1
    SELECT MyCursor1
    LIST
    RECALL ALL

    DELETE FROM MyCursor1 WHERE MyCursor1.field1 > 5
    SELECT MyCursor1
    LIST
    RECALL ALL

    DELETE MyCursor1 FROM MyCursor1 WHERE MyCursor1.field1 > 5
    RECALL ALL IN MyCursor1

    DELETE T1 ;
       FROM MyCursor1 T1 ;
       JOIN MyCursor2 ON T1.field1 > 5 AND MyCursor2.field1 = T1.field1, MyCursor3 ;
       WHERE MyCursor2.field1 = MyCursor3.field1
    RECALL ALL IN MyCursor1

    Updatable Fields in UPDATE – SQL Command

    The number of fields that can be updated with the UPDATE – SQL command is no longer limited to 128 as in prior versions of Visual FoxPro. The limit is now 255, the maximum number of fields in a table.

    SET ENGINEBEHAVIOR

    The SET ENGINEBEHAVIOR Command has a new Visual FoxPro 9.0 option, 90, that affects SELECT – SQL command behavior for the TOP N clause and aggregate functions. For additional information, see the SET ENGINEBEHAVIOR Command.

    Data Type Conversion

    Conversion between data types (for example, conversion between memo and character fields) has been improved in Visual FoxPro 9.0. This conversion improvement applies to the ALTER TABLE – SQL Command with the COLUMN option as well as structural changes made with the Table Designer.

    See Also

    Concepts

    What’s New in Visual FoxPro

    Guide to Reporting Improvements

    Data and XML Feature Enhancements

    Class Enhancements

    Language Enhancements

    Interactive Development Environment (IDE) Enhancements

    Enhancements to Visual FoxPro Designers

    Miscellaneous Enhancements

    Changes in Functionality for the Current Release

    ALTER TABLE – SQL Command

    SET SQLBUFFERING Command

    SET ENGINEBEHAVIOR Command

    Class Enhancements

    Visual FoxPro contains the following enhancements to classes, forms, controls and object-oriented related syntax.

    Anchoring Visual Controls

    You can anchor a visual control to one or more edges of its parent container using the control’s Anchor property. When you anchor a visual control to the parent container, the edges of the control remain in the same position relative to the edges of the container when you resize the container. For more information, see Anchor Property.
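    A minimal sketch of anchoring, assuming the additive Anchor values documented for the property (1 = top, 2 = left, 4 = bottom, 8 = right); the control name is illustrative:

    ```foxpro
    * Keep a button pinned to the bottom-right corner of its form:
    * 4 (bottom edge) + 8 (right edge) = 12.
    Thisform.cmdOK.Anchor = 12
    ```

    When the form is resized, the button keeps its distance from the bottom and right edges of the container.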

    Docking Forms

    Visual FoxPro extends docking support to user-defined forms. Docking forms works similarly to docking toolbars except that you can dock forms to Visual FoxPro Interactive Development Environment (IDE) windows and other forms, and controls on the form can still obtain focus when the form is docked. Visual FoxPro includes the following new and updated properties, methods, and events to support docking forms.

    For more information, see How to: Dock Forms.

    CheckBox and OptionButton Controls Support Wordwrapping

    The WordWrap property is now supported for CheckBox and OptionButton controls. The text portions of these controls now use wordwrapping. For more information, see WordWrap Property.

    CommandButton Controls Can Align Text with Pictures

    The Alignment property now applies to CommandButton controls when specifying an image for the Picture property and setting the PicturePosition property to a value other than the default. The Alignment property also contains new and revised settings for CommandButton, CheckBox, and OptionButton controls. For more information, see Alignment Property.

    CommandButton, OptionButton, and CheckBox Controls Can Hide Captions

    The PicturePosition property contains a new setting of 14 (No text) for CommandButton, OptionButton, and CheckBox controls. You can use this setting to hide the text portions of these controls without needing to set the Caption property to an empty string. This setting is particularly useful when you want to include a hotkey for a button with a graphic without displaying the Caption text. You must set the Style property to 1 (Graphical) for this new setting to apply. In addition, the PicturePosition property now applies to CheckBox and OptionButton controls when Style is set to 1 (Graphical).

    For more information, see PicturePosition Property.

    PictureMargin and PictureSpacing Properties Control Spacing and Margins on CommandButton, OptionButton, and CheckBox Controls

    You can better control positioning of images on CommandButton, OptionButton, and CheckBox controls with the new PictureMargin and PictureSpacing properties. The PictureMargin property specifies margin spacing in pixels between an image and the control’s border as determined by the PicturePosition property. The PictureSpacing property specifies margin spacing in pixels between an image and text on the control.

    For more information, see PictureMargin Property and PictureSpacing Property.

    Collection Objects Support in ComboBox and ListBox Controls

    You can now specify Collection objects as the row source and row source type for the RowSource and RowSourceType properties of ComboBox and ListBox controls. For more information, see RowSource Property and RowSourceType Property.

    Setting Ascending or Descending Indexes on Cursors in the DataEnvironment

    You can specify ascending or descending order for a cursor index by using the new OrderDirection property for Cursor objects. Note: OrderDirection is disregarded when the cursor’s Order property is empty.

    For more information, see OrderDirection Property.

    Grid Supports Rushmore Optimization

    The Grid Control can be set to support Rushmore optimization if the underlying data source contains indexes that support this.

    For more information, see Optimize Property.

    Mouse Pointer Control for Grid Columns and Column Headers

    The MousePointer and MouseIcon properties now apply to Column objects in a grid and Header objects in a column. For the MousePointer property, you can specify the new setting of 16 (Down Arrow) to reset the mouse pointer for a column header to the default down arrow.

    For more information, see MousePointer Property and MouseIcon Property.

    Rotating Label, Line, and Shape Controls

    You can use the new Rotation property to rotate Label controls. The Rotation property applies to Line and Shape controls when used with the new PolyPoints property. For more information, see Rotation Property (Visual FoxPro), PolyPoints Property, and Creating More Complex Shapes using the PolyPoints Property.

    Label Controls Can Display Themed Background

    For Label controls, you can set the Style property to Themed Background Only to show only themed background colors when Windows themes are turned on. The label background color is the same as the parent container for the label. For more information, see Style Property.

    ListBox Controls Can Hide Scroll Bars

    You can use the new AutoHideScrollBar property for ListBox controls to hide scroll bars when the list contains less than the number of items that can be visible in the list box. For more information, see AutoHideScrollBar Property.

    Toolbar Controls Can Display Horizontal Separator Objects

    For Separator objects, set the Style property to 1 to display a horizontal line or a vertical line, depending on how the toolbar appears. If the toolbar appears horizontally, the line displays vertically. If the toolbar appears vertically, the line displays horizontally. In versions prior to this release, setting Style to 1 displayed a vertical line only. Note: In versions prior to this release, undocked vertical system and user-defined toolbars did not display horizontal separators. In the current release, horizontal separators now display for vertical undocked toolbars.

    For more information, see Style Property.

    Toolbar Controls Can Hide Separator Objects

    The Visible property now applies to Separator objects so you can control whether a Separator object displays in Toolbar controls. When used in combination with the Style property, the separator’s Visible property determines whether a space or line displays as the separator when its Style property is set to 0 (Normal – do not display a line) or 1 (display a horizontal or vertical line), respectively.

    For more information, see Visible Property (Visual FoxPro).

    Creating More Complex Shapes

    You can use the new PolyPoints property for Line and Shape controls to create polygon lines and shapes. PolyPoints specifies an array of any dimension containing coordinates with the format of X1, Y1, X2, Y2, …, organized in the order in which the polygon line or shape is drawn. For Line controls, when you create a polygon line using the PolyPoints property, you can specify the new setting of “S” or “s” for the LineSlant property to create a Bezier curve.

    For more information, see PolyPoints Property and LineSlant Property.

    ComboBox Controls Can Hide Drop-Down Lists

    You can now use the NODEFAULT command in the DropDown event for a ComboBox control. This will prevent the drop-down list portion of a ComboBox control from appearing. For more information, see NODEFAULT Command.

    NEWOBJECT( ) Creates Objects without Raising Initialization Code

    By passing 0 to the cInApplication parameter of the NEWOBJECT( ) function, you can create an instance of a class without running its initialization code (such as code in the Init, Load, Activate, and BeforeOpenTables events). This mimics the behavior of a class opened in the Class Designer or Form Designer and allows you to create design-time tools that view the structure of a class. When the object is released, it does not run its destructor code (such as code in the Destroy and Unload events). Only initialization and destructor code is suppressed; code in other events and methods is still called. If you use the cInApplication parameter to suppress initialization and destructor code in an object, you also suppress it in the object's child objects.

    This behavior is not supported for the NewObject Method.

    For more information, see NEWOBJECT( ) Function.
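    The behavior above can be sketched as follows; the class and library names are illustrative:

    ```foxpro
    * Create an instance without running Init/Load (and, on release,
    * without running Destroy/Unload) by passing 0 for cInApplication.
    loForm = NEWOBJECT("frmCustomer", "myclasses.vcx", 0)
    ? loForm.Caption      && Properties can still be inspected as designed
    RELEASE loForm        && Destructor code is suppressed as well
    ```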

    Specify Where Focus is Assigned in the Valid Event

    To direct where focus is assigned, you can use the optional ObjectName parameter in the RETURN command of the Valid event. The object specified must be a valid Visual FoxPro object. If the specified object is disabled or cannot receive focus, focus is assigned to the next object in the tab order. If an invalid object is specified, Visual FoxPro keeps focus on the current object. You can now set focus to objects in the following scenarios:

    • Set focus to an object on another visible form.
    • Set focus to an object on a non-visible Page or Pageframe control.

    For more information, see Valid Event.
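    A minimal sketch of redirecting focus from a Valid event; the control name txtCity is illustrative:

    ```foxpro
    * In a TextBox's Valid event procedure:
    IF EMPTY(This.Value)
       RETURN "txtCity"   && Assign focus directly to the txtCity control
    ENDIF
    RETURN .T.            && Validation passes; default focus behavior
    ```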

    TextBox Controls Have Auto-Completion Functionality

    You can add auto-completion functionality to your text box controls to make data entry more efficient. Auto-completion is the automatic display of a drop-down list of entries that match the string as it is typed into the text box. The entries come from a special table that tracks unique values entered into the text box, the control name that is the source of the value, and usage information. The following properties support auto-completion:

    By setting the AutoComplete property, you determine the sort order for the entries. If you want more control over the list and where it is stored, you can use the AutoCompTable property to specify the table used to populate the automatic list. By default, the table is AUTOCOMP.DBF. You can use one table for each text box control, or a single table can populate automatic lists for several text boxes.

    If you use a single table, which is the default, the table uses values in the Source field for each entry to identify the text box control associated with the entry. By default, the Source field value is the name of the text box control. You can specify the Source field value using the AutoCompSource property of the text box. For example, you might want to make the same set of entries available to multiple Text box controls within the application such as address information. You can explicitly set the AutoCompTable and AutoCompSource properties for each of the controls to the same table and source field value. The same automatic list appears for each of them.

    The text box control handles updating the auto-completion table for you based on the values actually entered in the text box. If you want to remove a value from the list, enter a string in a text box that matches the string you want to delete to filter the list for it. Select the entry in the list and press the DELETE key. The string remains in the table but no longer appears in the automatic list.

    Note:
    You can control the number of items that appear in the drop-down list using SYS(2910) – List Display Count.

    For more information, see AutoComplete Property, AutoCompSource Property, and AutoCompTable Property.

    New InputMask and Format Property Settings

    The following new InputMask and Format settings are available:

    InputMask Property

    cMask  Description
    U      Permits alphabetic characters only and converts them to uppercase (A – Z).
    W      Permits alphabetic characters only and converts them to lowercase (a – z).

    Format Property

    cFunction  Description
    Z          Displays the value as blank if it is 0, except when the control has focus. Dates and DateTimes are also supported in these controls. The date and datetime delimiters are not displayed unless the control has focus.

    For more information, see InputMask Property and Format Property.
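    A short sketch of the new settings; the control names are illustrative:

    ```foxpro
    * "U" converts typed letters to uppercase; "9" still accepts digits.
    Thisform.txtCode.InputMask = "UUU-999"
    * "Z" shows a zero value as blank unless the control has focus.
    Thisform.txtAmount.Format = "Z"
    ```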

    Use PictureVal Property to Pass Images as Strings

    The Image control’s new PictureVal property can be used instead of the Picture Property (Visual FoxPro) to specify a character string expression or object of an image. For an object, the format must be of an IPicture interface format compatible with LOADPICTURE( ) Function.

    For more information, see PictureVal Property.

    CLEAR CLASSLIB Updated

    The CLEAR CLASSLIB command now automatically executes a CLEAR CLASS command on each class in the specified class library. Any errors that might occur during release of individual classes (e.g., class in use) are ignored. Note: Classes in other class libraries that are used or referenced by a class in the specified class library are not cleared.

    For more information, see CLEAR Commands.

    Screen Resolution Limit Increased

    In prior versions of Visual FoxPro, the maximum definable area for a form was limited to twice the screen resolution in both the X and Y dimensions. For example, with a monitor resolution of 1280×1024, the maximum dimensions would be:

    Form.Width = 2552
    Form.Height = 2014

    Additionally, if you set the Width and Height properties to these limits at design time and then ran the form, the values reverted to the screen resolution limits (because they were saved that way):

    Form.Width = 1280
    Form.Height = 998

    In Visual FoxPro 9.0, this limit has been increased to approximately 32,000 pixels in each dimension, which allows more flexibility for certain forms, such as scrollable ones:

    Form.Width = 32759
    Form.Height = 32733

    For more information, see Width Property and Height Property.

    ProjectHook Source Code Control Events

    New events have been added to the ProjectHook class, which allow you to perform source code control operations such as check-in and check-out of multiple files at once.

    For more information, see SCCInit Event and SCCDestroy Event.

    AddProperty Method Supports Design Time Settings

    You can specify the visibility (Protected, Hidden or Public) and description of a property using the AddProperty method with new available parameters. These settings can also be controlled from the New Property Dialog Box and Edit Property/Method Dialog Box. For more information, see AddProperty Method.

    WriteMethod Method Supports Design Time Settings

    You can specify the visibility (Protected, Hidden or Public) and description of a method using the WriteMethod method with new available parameters. These settings can also be controlled from the New Property Dialog Box and Edit Property/Method Dialog Box. For more information, see WriteMethod Method.

    See Also

    Concepts

    What’s New in Visual FoxPro

    Guide to Reporting Improvements

    Data and XML Feature Enhancements

    SQL Language Improvements

    Language Enhancements

    Interactive Development Environment (IDE) Enhancements

    Enhancements to Visual FoxPro Designers

    Miscellaneous Enhancements

    Changes in Functionality for the Current Release

    Language Enhancements

    In the current release of Visual FoxPro, you will find enhanced functionality through new and enhanced commands and functions. The following sections describe miscellaneous language additions and improvements.

    Class Enhancements

    Visual FoxPro contains significant language enhancements for classes, forms, controls, and object-oriented related features. For more information, see Class Enhancements.

    Data and XML Enhancements

    Visual FoxPro contains significant language enhancements for Data, SQL and XML features. For more information, see SQL Language Improvements and Data and XML Feature Enhancements.

    IDE Enhancements

    Visual FoxPro contains a number of language enhancements for features related to the IDE (Interactive Development Environment). For more information, see Interactive Development Environment (IDE) Enhancements and Enhancements to Visual FoxPro Designers.

    Printing and Reporting Enhancements

    Visual FoxPro contains a number of language enhancements to support new Reporting functionality:

    Additionally, there are improvements to the following related Printing language elements:

    • SYS(1037) – Page Setup Dialog Box Displays Visual FoxPro default or report Page Setup dialog box or sets printer settings for the default printer in Visual FoxPro or for the report printer environment. In this version, a new nValue parameter is available.
    • APRINTERS( ) Function Returns a five-column array with the name of the printer, connected port, driver, comment, and location. The last three columns are available if the new optional parameter is passed.
    • GETFONT( ) Function Contains an additional setting to display only those fonts available on the current default printer and clarified values for the language script.

    New Reporting functionality is described in more detail in separate Reporting topics. For more information, see Guide to Reporting Improvements.

    Specifying Arrays with More Than 65K Elements

    You can now specify arrays containing more than 65,000 elements, for example, when using the DIMENSION command. Normal arrays and member arrays have a new limit of 2GB. Arrays containing member objects retain a limit of 65,000 elements. Note: Array sizes can also be limited by available memory, which affects performance, especially for very large arrays. Make sure your computer has enough memory to accommodate the upper limits of your arrays.

    The Library Construction Kit, which contains the files Pro_Ext.h, WinAPIMS.lib, and OcxAPI.lib, still has a limit of 65,000 elements. For more information about these files, see Accessing the Visual FoxPro API, How to: Add Visual FoxPro API Calls, and How to: Build and Debug Libraries and ActiveX Controls. The SAVE TO command does not support saving arrays larger than 65,000 elements.

    For more information, see Visual FoxPro System Capacities and DIMENSION Command.

    STACKSIZE Setting Increases Nesting Levels to 64k

    For operations such as the DO command, you can change the default number of nesting levels (128) to any value from 32 up to 64,000 by including the new STACKSIZE setting in a Visual FoxPro configuration file. Note: You can change the nesting level only during Visual FoxPro startup.

    For more information, see Special Terms for Configuration Files and Visual FoxPro System Capacities.

    Program and Procedure File Size Is Unrestricted

    In previous versions of Visual FoxPro, the size of a procedure or program could not exceed 65K. Visual FoxPro now removes this restriction for programs and procedures. For more information, see Visual FoxPro System Capacities.

    PROGCACHE Configuration File Setting

    In previous versions of Visual FoxPro, you could not specify the program cache size or amount of memory reserved to run programs. This configuration file setting allows you to control this. It is especially useful for MTDLL scenarios. For more information, see Special Terms for Configuration Files.

    ICASE( ) Function

    The new ICASE( ) function makes it possible for you to evaluate a list of conditions and return results depending on the result of evaluating those conditions. For more information, see ICASE( ) Function.
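    As a sketch of the condition/result-pair syntax:

    ```foxpro
    * ICASE( ) evaluates condition/result pairs in order and returns the
    * result for the first true condition, or the final default value.
    nTemp = 72
    cRange = ICASE(nTemp < 32, "Freezing", ;
                   nTemp < 60, "Cold", ;
                   nTemp < 80, "Mild", ;
                   "Hot")
    ? cRange      && Displays "Mild"
    ```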

    TTOC( ) Converts DateTime Expressions to XML DateTime Format

    You can convert a DateTime expression into a character string in XML DateTime format by passing a new optional value of 3 to the TTOC( ) function. For more information, see TTOC( ) Function.
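    For example:

    ```foxpro
    * Passing 3 as the second parameter produces XML dateTime format.
    tStamp = DATETIME(2004, 12, 31, 23, 59, 59)
    ? TTOC(tStamp, 3)     && Displays "2004-12-31T23:59:59"
    ```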

    SET COVERAGE Command Available at Run Time

    The SET COVERAGE command is now available at run time so that you can debug errors that occur at run time but not at design time. For more information, see SET COVERAGE Command.

    CLEAR ERROR Command

    The new ERROR clause for the CLEAR command makes it possible to reset the error structures as if no error occurred. This affects the following functions:

    • The AERROR( ) function will return 0.
    • The ERROR( ) function will return 0.
    • The output from MESSAGE( ), MESSAGE(1), and SYS(2018) will return a blank string.

    The CLEAR command should not be used with the ERROR clause within a TRY…CATCH…FINALLY structure. For more information, see CLEAR Commands.
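    A brief sketch of resetting the error state:

    ```foxpro
    ON ERROR *                && Suppress the error dialog for this demo
    USE NoSuchTable           && Raises an error: the file does not exist
    ON ERROR                  && Restore default error handling
    ? ERROR(), MESSAGE()      && Nonzero error number and its message text
    CLEAR ERROR               && Reset the error structures
    ? ERROR(), MESSAGE()      && Now 0 and an empty string
    ```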

    Write Options Dialog Settings to Registry Using SYS(3056)

    The SYS(3056) function can now be used to write settings from the Options dialog box to the registry.

    SYS(3056 [, nValue ])

    The following table lists values for nValue:

    nValue  Description
    1       Update only from registry settings, with the exception of SET commands and file locations.
    2       Write out settings to the registry.

    For more information, see SYS(3056) – Read Registry Settings.

    FOR EACH … ENDFOR Command Preserves Object Types

    Visual FoxPro now includes the FOXOBJECT keyword for the FOR EACH … ENDFOR command to support preservation of native Visual FoxPro object types.

    FOR EACH objectVar [AS Type [OF ClassLibrary]] IN Group FOXOBJECT
       Commands
       [EXIT]
       [LOOP]
    ENDFOR | NEXT [Var]

    The FOXOBJECT keyword specifies that the objectVar parameter created will be a Visual FoxPro object. The FOXOBJECT keyword applies only to collections based on the native Visual FoxPro Collection class; COM-based collections do not support the FOXOBJECT keyword.

    For more information, see FOR EACH … ENDFOR Command.
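    A minimal sketch with a native Collection:

    ```foxpro
    loItems = CREATEOBJECT("Collection")
    loItems.Add(CREATEOBJECT("Custom"))
    loItems.Add(CREATEOBJECT("Custom"))
    FOR EACH loItem IN loItems FOXOBJECT
       ? loItem.Name      && Each item keeps its native Visual FoxPro type
    ENDFOR
    ```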

    SET PATH Command Enhancements

    The SET PATH command now supports the ADDITIVE keyword. The ADDITIVE keyword appends the specified path to the end of the current SET PATH list. If the path already exists in the SET PATH list, Visual FoxPro does not add it or change the order of the list. Paths specified with the ADDITIVE keyword must be strings in quotes or valid expressions. In addition, the length of the SET PATH list has been increased to 4095 characters.

    For more information, see SET PATH Command.
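    For example (the directory names are illustrative):

    ```foxpro
    SET PATH TO "c:\app\data"
    SET PATH TO "c:\app\shared" ADDITIVE   && Appended to the existing list
    ? SET("PATH")                          && Both directories are listed
    ```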

    Trim Functions Control Which Characters Are Trimmed

    It is now possible to specify which characters are trimmed from an expression when using the TRIM( ), LTRIM( ), RTRIM( ), and ALLTRIM( ) functions.

    TRIM(cExpression [, nFlags] [, cParseChar [, cParseChar2 [, …]]])
    LTRIM(cExpression [, nFlags] [, cParseChar [, cParseChar2 [, …]]])
    RTRIM(cExpression [, nFlags] [, cParseChar [, cParseChar2 [, …]]])
    ALLTRIM(cExpression [, nFlags] [, cParseChar [, cParseChar2 [, …]]])

    You can make the trim case-insensitive by setting bit 0 of nFlags (a value of 1). The cParseChar parameters specify one or more character strings to be trimmed from cExpression; a maximum of 23 strings can be specified. By default, if cParseChar is not specified, leading and trailing spaces are trimmed from character strings, or 0 bytes are removed for Varbinary data types. The cParseChar parameters are applied in the order they are entered. When a match is found, cExpression is truncated and the process repeats from the first cParseChar parameter.

    For more information, see the TRIM( ) Function, LTRIM( ) Function, RTRIM( ) Function, and ALLTRIM( ) Function topics.
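    A few illustrative calls:

    ```foxpro
    ? ALLTRIM("--Hello--", 0, "-")   && Trims "-" from both ends: "Hello"
    ? RTRIM("Hello...", 0, ".")      && Trims trailing "." only: "Hello"
    ? ALLTRIM("xXHelloXx", 1, "x")   && nFlags 1 = case-insensitive: "Hello"
    ```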

    ALINES( ) Offers More Flexible Parsing Options

    The ALINES( ) function has been enhanced to provide several additional options, such as case-insensitive parsing and improved handling of empty array elements. These options are available through the new nFlags parameter, which replaces the older third parameter, lTrim. For more information, see ALINES( ) Function.

    Improvements in TEXT…ENDTEXT Statement

    You can use the TEXT…ENDTEXT command to eliminate line feeds using the new PRETEXT setting. A new FLAGS parameter controls additional output settings. For more information, see TEXT … ENDTEXT Command.

    Include Delimiters in STREXTRACT( ) Results

    The STREXTRACT( ) function has a new nFlags setting that allows you to include the specified delimiters with the returned expression. For more information, see STREXTRACT( ) Function.

    STRCONV( ) Enhanced to Allow for Code Page and FontCharSet

    For certain conversion settings, you can specify an optional Code Page or Fontcharset setting for use in the conversion. For more information, see STRCONV( ) Function.

    TYPE( ) Determines if an Expression is an Array

    The TYPE( ) function accepts an additional parameter, 1, to determine whether an expression is an array.

    TYPE(cExpression, 1)

    The following character values are returned when the 1 parameter is specified:

    Return Value  Description
    A             cExpression is an array.
    U             cExpression is not an array.
    C             cExpression is a collection.

    cExpression must be passed as a character string.

    For more information, see TYPE( ) Function.
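    For example:

    ```foxpro
    DIMENSION aMonths(12)
    ? TYPE("aMonths", 1)    && "A" – aMonths is an array
    cName = "Fox"
    ? TYPE("cName", 1)      && "U" – not an array
    loCol = CREATEOBJECT("Collection")
    ? TYPE("loCol", 1)      && "C" – a collection
    ```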

    BINTOC( ) and CTOBIN( ) Have Additional Conversion Capabilities

    The BINTOC( ) and CTOBIN( ) functions have updated or new parameters that provide you with more control over the output of these functions. Additionally, these enhancements offer some improvements for working with Win32 API routines. For more information, see BINTOC( ) Function and CTOBIN( ) Function.

    MROW( ) and MCOL( ) Can Detect the Position of the Mouse Pointer

    The MROW( ) and MCOL( ) functions now accept a zero (0) parameter for detecting the position of the mouse pointer based on the currently active form instead of the form returned by the WOUTPUT( ) function. Although the two typically reference the same form, if the form's AllowOutput property is set to False (.F.), WOUTPUT( ) does not return the currently active form. This mismatch of references can lead to unexpected results. By using the zero (0) parameter, you can avoid misplacing items, such as shortcut menus, because the currently active form is always used.

    For more information, see MROW( ) Function and MCOL( ) Function.

    INPUTBOX( ) Returns A Cancel Operation

    The INPUTBOX( ) function contains an additional parameter that allows you to determine if the user canceled out of the dialog. For more information, see INPUTBOX( ) Function.

    AGETCLASS( ) Supported for Runtime Applications

    The AGETCLASS( ) function is now supported for runtime applications. For more information, see AGETCLASS( ) Function.

    SYS(2019) Extends Handling of Configuration Files

    You can use SYS(2019) to obtain the name and location of both internal and external configuration files. For more information, see SYS(2019) – Configuration File Name and Location.

    SYS(2910) Controls List Display Count

    You can control the number of items that appear in a drop-down list, such as the one used by the AutoComplete property. This setting is also available on the View tab of the Options Dialog Box (Visual FoxPro).

    For more information, see SYS(2910) – List Display Count.

    SYS(3008) Turns Off Hyperlink Tip

    Visual FoxPro displays a tip such as “CTRL+Click to follow the link” when you hover over a hyperlink in the editor. If you do not want this tip to appear, you can use SYS(3008) to turn it off. This function is also useful for international applications where you do not want to display the English text for this tip. For more information, see SYS(3008) – Hyperlink Tooltips.

    SYS(3065) Internal Program Cache

    You can obtain the internal program cache (PROGCACHE configuration file setting). For more information, see SYS(3065) – Internal Program Cache.

    SYS(3101) COM Code Page Translation

    You can now specify a code page to use for character data translation involving COM interoperability. For more information, see SYS(3101) – COM Code Page Translation.

    Bidirectional Support for Tooltips and Popups

    For international applications that display text from right to left, you can use the following new enhancements to control text justification:

    • SYS(3009) – right justifies text in ToolTips.
    • DEFINE POPUP…RTLJUSTIFY – right justifies items in a popup, such as a shortcut menu.
    • SET SYSMENU TO RTLJUSTIFY – right justifies an entire menu system.

    The SYS(3009) function is a global setting. For more information, see SYS(3009) – Bidirectional Text Justification for ToolTips, DEFINE POPUP Command and SET SYSMENU Command.

    Enhanced Font Script Support

    Visual FoxPro 9.0 contains a number of enhancements that extend the ability to specify a Font Language Script (or FontCharSet) along with existing font settings:

    • SYS(3007) – specifies a FontCharSet for ToolTips. This is a global setting.
    • FONT Clause – the following commands support an optional FONT clause that allows specification of a FontCharSet in the format FONT cFontName [, nFontSize [, nFontCharSet]]: DEFINE MENU, DEFINE POPUP, DEFINE BAR, DEFINE PAD, DEFINE WINDOW, MODIFY WINDOW, BROWSE/EDIT/CHANGE, and ?/??.
    • Browse – the Font Dialog Box that you can invoke by selecting the Font menu item from the Table menu with a Browse Window active now allows for selection of a font language script. You can specify a global default font script from the IDE Tab, Options Dialog Box in the Options Dialog Box (Visual FoxPro). To do this, you must first check the Use font script checkbox.
    • Editors – the Font Dialog Box that you can invoke with an editor window active by selecting the Font menu item from the Format menu or right-click shortcut menu Edit Properties Dialog Box now allows for selection of a font language script. You can specify a global default font script from the IDE Tab, Options Dialog Box in the Options Dialog Box (Visual FoxPro). To do this, you must first check the Use font script checkbox.

    For more information, see SYS(3007) – ToolTipText Property Font Language Script, IDE Tab, Options Dialog Box, and FontCharSet Property.

    ToolTip Timeout Control

    You can specify how long a ToolTip is displayed if the mouse pointer is left stationary. For more information, see _TOOLTIPTIMEOUT System Variable.
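    A minimal sketch (the value is assumed here to be in milliseconds; confirm the unit and valid range in the _TOOLTIPTIMEOUT System Variable topic):

        _TOOLTIPTIMEOUT = 10000   && assumed: keep ToolTips visible for about 10 seconds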

    Tablet PC Features

    The following features are available to assist with applications designed to run on a Tablet PC computer.

    • ISPEN( ) – determines if the last Visual FoxPro application mouse event on a Tablet PC was a pen tap.
    • _SCREEN.DisplayOrientation – this read-write property specifies the screen display orientation for a Tablet PC. The value returned is the current orientation.
    • _TOOLTIPTIMEOUT – specifies how long a ToolTip is displayed if the mouse pointer is left stationary.

    For more information, see ISPEN( ) Function, DisplayOrientation Property, and _TOOLTIPTIMEOUT System Variable.
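    A hedged sketch combining these features (the meanings of individual DisplayOrientation values are not shown here because they are documented separately; see the DisplayOrientation Property topic):

        IF ISPEN()
           * The last mouse event was a pen tap on a Tablet PC
           ? "Pen input detected"
        ENDIF
        ? _SCREEN.DisplayOrientation   && read the current screen orientation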

    Windows Message Event Handling

    Visual FoxPro allows you to trap and handle window messages from the Microsoft® Windows® operating system using existing BINDEVENT functions. Some examples of common events you might wish to trap include:

    • A power broadcast message used to intercept standby or power-down activities.
    • Media insertion and removal events, such as the insertion of a CD into a drive.
    • The insertion and/or removal of a Plug and Play hard disk (for example, a USB drive).
    • Interception of screen saver queries to stop the screen saver from activating.
    • Operating system level font changes and Windows XP Theme changes.
    • New network connections/shares added or removed from the system.
    • Switching between applications.

    You can use the Visual FoxPro BINDEVENT functions to register (and unregister) event handlers used to intercept messages (that is, Win32 API window messages that get processed by the Win32 WindowProc function). See MSDN for more details. The new BINDEVENT( ) syntax requires the hWnd (integer) of the window receiving the message you want to intercept, and the specific message itself (integer). For example, power-management events such as standby and power-down use the Win32 WM_POWERBROADCAST message (value of 536).
    BINDEVENT(hWnd, nMessage, oEventHandler, cDelegate)

    The following example illustrates detection of a Windows XP Theme change:

        #DEFINE WM_THEMECHANGED 0x031A
        #DEFINE GWL_WNDPROC (-4)
        PUBLIC oHandler
        oHandler=CREATEOBJECT("AppState")
        BINDEVENT(_SCREEN.hWnd, WM_THEMECHANGED, oHandler, "HandleEvent")
        MESSAGEBOX("Test by changing Themes.")

        DEFINE CLASS AppState AS Custom
           nOldProc=0
           PROCEDURE Destroy
              UNBINDEVENT(_SCREEN.hWnd, WM_THEMECHANGED)
           ENDPROC
           PROCEDURE Init
              DECLARE integer GetWindowLong IN WIN32API ;
                 integer hWnd, ;
                 integer nIndex
              DECLARE integer CallWindowProc IN WIN32API ;
                 integer lpPrevWndFunc, ;
                 integer hWnd, integer Msg, ;
                 integer wParam, ;
                 integer lParam
              THIS.nOldProc=GetWindowLong(_VFP.hWnd, GWL_WNDPROC)
           ENDPROC
           PROCEDURE HandleEvent(hWnd as Integer, Msg as Integer, ;
                 wParam as Integer, lParam as Integer)
              lResult=0
              IF Msg=WM_THEMECHANGED
                 MESSAGEBOX("Theme changed...")
              ENDIF
              lResult=CallWindowProc(THIS.nOldProc, hWnd, Msg, wParam, lParam)
              RETURN lResult
           ENDPROC
        ENDDEFINE

    The following SYS( ) functions are also available to assist with handling these events:

    • SYS(2325) – returns the hWnd of a client window from the parent window’s WHANDLE.
    • SYS(2326) – returns a Visual FoxPro WHANDLE from a window’s hWnd.
    • SYS(2327) – returns a window’s hWnd from a Visual FoxPro window’s WHANDLE.

    For more information, see BINDEVENT( ) Function, UNBINDEVENTS( ) Function, and AEVENTS( ) Function. Also, see SYS(2325) – WCLIENTWINDOW from Visual FoxPro WHANDLE, SYS(2326) – WHANDLE from a Window’s hWnd, and SYS(2327) – Window’s hWnd from Visual FoxPro WHANDLE for related topics. Refer to MSDN for details on specific window messages.

    See Also

    Concepts

    What’s New in Visual FoxPro

    Guide to Reporting Improvements

    Data and XML Feature Enhancements

    SQL Language Improvements

    Class Enhancements

    Interactive Development Environment (IDE) Enhancements

    Enhancements to Visual FoxPro Designers

    Miscellaneous Enhancements

    Changes in Functionality for the Current Release

    Interactive Development Environment (IDE) Enhancements

    To provide a more integrated development environment for your projects and applications, Visual FoxPro contains the following improved functionality for the IDE.

    Additional Project Manager Shortcut Menu Commands

    When docked, the Project Manager window contains the following additional shortcut menu commands that are available on the Project menu:

    • Close Closes the Project Manager.
    • Add Project to Source Control Creates a new source control project based on the current project. Available only when a source code control provider is installed and specified on the Projects tab in the Options dialog box.
    • Errors Displays the error (.err) file after running a build.
    • Refresh Refreshes the contents of the Project Manager.
    • Clean Up Project Removes deleted records from the Project Manager (.PJX) file.

    Modifying a Class Library from the Project Manager

    When you select a class library (.vcx) file in the Project Manager, you can now open and browse class libraries by clicking the Modify button. The class library opens in the Class Browser. For more information, see How to: Open Class Libraries.

    Set Font of Project Manager

    You can change the text font settings for the Project Manager window. Right-click the Project Manager window (outside of the tree hierarchy window) and choose Font.

    Generating Message Logs During Project Build and Compile

    When you build a project, application, or dynamic-link library, Visual FoxPro automatically generates an error (.err) file that includes any error messages, if they exist, when the build process completes. When you select the Display Errors check box in the Build Options dialog box, Visual FoxPro displays the .err file when the build completes. Selecting the Recompile All Files check box includes compile errors in the .err file. Build status messages usually appear in the status bar. However, in previous versions, if the build process was interrupted, Visual FoxPro did not write the .err file to disk. In the current release, Visual FoxPro writes build status and error messages to the .err file as they occur during the build process. If the build process is interrupted, you can open the .err file to review the errors. Note: If no errors occur during the build, the .err file is deleted. If the Debug Output window is open, build status and error messages appear in the window. You can save messages from the Debug Output window to a file.

    For more information, see How to: View and Save Build Messages.

    Properties Window Enhancements

    • Design-time support for entering property values greater than 255 characters and extended characters, such as CHR(13) (carriage return) and CHR(10) (linefeed), has been added to visual class library (.vcx) and form (.scx) files. You can now enter values up to 8K characters in length. Note: Extended property value support is only available through the Properties Window (Zoom dialog box) for custom user-specified properties as well as certain native ones such as CursorSchema and Value. For properties not supported, you can still specify values that are longer than 255 characters, or contain carriage returns and linefeeds, by assigning them in code, such as during the object’s Init Event. The Zoom dialog box and Expression Builder dialog box have been updated to support this. The Properties window includes a Zoom (Z) button that appears next to the property settings box for appropriate properties. Caution: Property values that exceed 255 characters or include carriage return and/or linefeed characters are stored in a new format inside the .vcx or .scx file. If you attempt to modify these classes in a prior version, an error occurs. This feature is particularly useful for setting the CursorAdapter CursorSchema property to any schema expression when schemas might exceed 255 characters.
    • The Properties window font can now be specified by the new Font shortcut menu option. This new menu replaces the Small, Medium and Large font menu items used in prior versions. This font is also used in the description pane, and object and property value dropdowns. Note: Bold and italic font styles are reserved for non-default property values and read-only properties, respectively. If a bold or italic font style is chosen, then the Properties window inverts the displayed behavior. For example, if one chooses an italic font style, read-only properties appear in normal font style and all others in italic.
    • Colors can be specified for certain types of properties by right-clicking the Properties window and selecting the following menu items. Note: If a conflict exists between color settings, the Instance setting takes priority, followed by the Non-Default setting.

    For more information, see Zoom <property> Dialog Box, Expression Builder Dialog Box, CursorSchema Property, and Properties Window (Visual FoxPro).

    MemberData Extensibility

    The MemberData extensibility architecture lets you provide metadata for class members (properties, methods and events). With MemberData, you can specify a custom property editor, display a property on the Favorites tab, or change the capitalization in the Properties Window (Visual FoxPro).

    For more information, see MemberData Extensibility.

    Setting Default Values for New Properties

    When adding a new property to a class, you can specify an initial value other than the default in the New Property dialog box. Subclasses inherit this value unless you reset the property to its parent class default. In previous versions, you had to set the value by selecting the new property in the Properties window after creating it.

    For more information, see How to: Add Properties to Classes.

    Document View Sort Options

    You can now sort items in the Document View window by name for forms and visual class libraries.

    See Document View Window for more information on sorting items in the Document View Window.

    Compiling Code in the Background

    Visual FoxPro performs background compilation when syntax coloring is turned on in the Command window and in Visual FoxPro editors for program (.prg) files, methods, stored procedures, and memos. The Expression box in the Expression Builder dialog box also supports background compilation and syntax coloring when turned on. When the current line of code that you are typing contains invalid syntax, Visual FoxPro displays the line of code with the formatting style selected in the Editor tab of the Options dialog box. Note: Syntax coloring must be turned on for background compilation to function. Background compilation does not detect invalid syntax in multiple lines of code, including those containing continuation characters.

    For more information, see How to: Display and Print Source Code in Color.

    Rich Text Format (RTF) Clipboard Support

    Visual FoxPro now supports copying in RTF (Rich Text Format) to the clipboard. Visual FoxPro preserves the style (bold, italic, and underline) and color attributes.

    RTF is supported only in the FoxPro editors that allow for syntax coloring, such as the Command window and editing windows opened with the MODIFY COMMAND Command. The RTF clipboard format is only supported when syntax coloring is enabled, such as from the Edit Properties Dialog Box. You can disable the RTF clipboard format with the _VFP EditorOptions Property.

    The _CLIPTEXT System Variable does not support RTF.

    Find Dialog Box Improvements

    The following improvements were made to Find support:

    • If a word is selected in a Visual FoxPro editor, the Find Dialog Box (Visual FoxPro) when opened now displays the word in the Look For drop-down box. If Find has not yet been used for a running instance of Visual FoxPro, a word positioned under the insertion pointer will appear in the Look For drop-down. If multiple words are selected, only the first word appears in the drop-down (use copy and paste to enter multiple words).
    • When a Browse window is open and you search for a word with the Find dialog box, you can search for the word again (Find Again) after the Find dialog box is closed by pressing the F3 key.
    • You can now use Find to search for content in Name column of the Watch and Locals debug windows (see Debugger Window). When searching object members, Find searches in these debug windows are limited to nodes that have been expanded and one level below.

    View Constants in Trace Window

    Constants (#DEFINE values) can be viewed in the Trace Window when you hover over them with the mouse.

    Note:
    Visual FoxPro evaluates constants as expressions in the Trace Window and may have difficulty interpreting a specific #DEFINE when you hover over it with the mouse. Consequently, if there are multiple expressions on a line, they are all displayed in the value tip.

    Printing Selected Text in Editor Windows

    You can print selected text from Visual FoxPro editor windows. When you have text selected in the editor window, the Selection option in the Print dialog box is available and selected. Note: If a partial line is selected, the entire line is printed.

    For more information, see Print Dialog Box (Visual FoxPro).

    System Font Improvements

    To improve legibility on high-resolution monitors, Error dialog boxes and the Zoom <property> Dialog Box in the Properties window now use the Windows Message Box text font.

    In Windows XP, the Windows Message Box text font is set by opening Display in the Control Panel, and then clicking Advanced on the Appearance tab.

    IntelliSense Saves Settings Between User Sessions

    Visual FoxPro now saves IntelliSense settings, such as turning IntelliSense on, between user sessions. These settings are controlled by the _VFP EditorOptions property. In addition, the settings in the _VFP EditorOptions property are saved in the FoxUser.dbf resource file. For more information, see EditorOptions Property.

    IntelliSense in Memo Field Editor Window

    Visual FoxPro includes IntelliSense support in Memo field editor windows when syntax coloring is turned on.

    IntelliSense Available for Runtime Applications

    Selected IntelliSense features are available at run time in distributed Visual FoxPro 9.0 applications. In order to use IntelliSense at run time, you need to set the _FOXCODE and _CODESENSE variables, and EditorOptions Property. Note: With runtime applications, syntax coloring does not need to be turned on for an editor to support IntelliSense.
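    A minimal sketch of enabling runtime IntelliSense (the assumption that FoxCode.dbf and FoxCode.app are shipped alongside the application, and the FULLPATH( ) calls, are illustrative; see the referenced topics for the supported configuration):

        _FOXCODE   = FULLPATH("foxcode.dbf")   && assumed: IntelliSense script table shipped with the app
        _CODESENSE = FULLPATH("foxcode.app")   && assumed: IntelliSense engine application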

    For more information, see IntelliSense Support in Visual FoxPro, _FOXCODE System Variable, _CODESENSE System Variable and EditorOptions Property.

    IntelliSense Support in WITH … ENDWITH and FOR EACH … ENDFOR Commands

    Visual FoxPro now supports IntelliSense within the WITH … ENDWITH Command and FOR EACH … ENDFOR Command.

    WITH ObjectName [AS Type [OF ClassLibrary]]

      Commands

    ENDWITH

    FOR EACH ObjectName [AS Type [OF ClassLibrary]] IN Group

      Commands

      [EXIT]

      [LOOP]

    ENDFOR

    The Type parameter can be any valid type, including data types, class types, or ProgID. If the class name cannot be found, Visual FoxPro disregards Type and does not display IntelliSense for it.

    Note:
    The type reference does not affect the functionality of the application at run time. The type reference is only used for IntelliSense.

    The ObjectName expression can refer to a memory variable or an array.

    The ClassLibrary parameter must be in a path list that is visible to Visual FoxPro. You must specify a valid class library; references to existing objects are not valid. If Visual FoxPro cannot find the specified class library, IntelliSense does not display.

    Types expressed as ProgIDs and class libraries do not require quotation marks (“”) to enclose them unless their names contain spaces.

    When a user types the AS keyword, IntelliSense displays a list of types registered in the FoxCode.dbf table with Type “T”. If you have specified a valid type, typing a period within a WITH … ENDWITH or a FOR EACH … ENDFOR command displays IntelliSense for that object reference.
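    For example, a FOR EACH loop can declare the loop variable's type so that typing a period displays that type's members (this sketch assumes a form method iterating the form's Controls collection):

        LOCAL oControl
        FOR EACH oControl AS textbox IN THISFORM.Controls
           * Typing "oControl." here displays textbox members
        ENDFOR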

    Visual FoxPro supports IntelliSense for nested WITH … ENDWITH and FOR EACH … ENDFOR commands. The following is an example of nested WITH … ENDWITH commands in a class defined in a program (.prg) file named Program1.prg. To use, paste this code into a new program named Program1.prg, save it and then type a period (.) inside the WITH … ENDWITH block.

        DEFINE CLASS f1 AS form
           MyVar1 = 123
           ADD OBJECT t1 AS mytext
           PROCEDURE Init
              WITH THIS AS f1 OF program1.prg
                 WITH .t1 AS mytext OF program1.prg
                 ENDWITH
              ENDWITH
           ENDPROC
        ENDDEFINE

        DEFINE CLASS mytext AS textbox
           MyVar2 = 123
        ENDDEFINE

    IntelliSense provides limited List Values functionality for selected properties that begin with a “T” or “F” within a WITH … ENDWITH or FOR EACH … ENDFOR command. This is done to avoid possible conflicts with the common property values True (.T.) and False (.F.). If you just type “.T” or “.F” and press Enter, the word selected in the List Value drop-down does not expand. You need to type at least two letters for IntelliSense to insert the selected word.

    See Also

    Concepts

    What’s New in Visual FoxPro

    Guide to Reporting Improvements

    Data and XML Feature Enhancements

    SQL Language Improvements

    Class Enhancements

    Language Enhancements

    Enhancements to Visual FoxPro Designers

    Miscellaneous Enhancements

    Changes in Functionality for the Current Release

    Enhancements to Visual FoxPro Designers

    The following designers contain enhancements.

    Report and Label Designers

    You can use the Report Builder available in the Report Designer and Label Designer to perform reporting tasks, configure settings, and set properties for reporting features such as report layout, report bands, data groups, report controls, and report variables. For example, you can perform the following tasks:

    • Prevent users from modifying reports, report controls, and report bands when editing the report in protected mode.
    • Display captions instead of expressions for Field controls at design time.
    • Display user-defined ToolTips for report controls.
    • Set the language script for reports.
    • Save the report data environment as a class.

    By default, the Report Builder (ReportBuilder.app) activates when you interact with the Report and Label designers. However, you can use the _REPORTBUILDER system variable to specify a different application. The Report Builder consolidates, replaces, and adds to the functionality found in previous Report Designer user interface elements, which remain in the product and are available by setting _REPORTBUILDER. You can write custom report builders to extend reporting functionality and output or run reports with report objects. For more information, see Working with Reports and _REPORTBUILDER System Variable.

    Menu Designer

    You can set the _MENUDESIGNER system variable to call your own custom designer for creating menus:

        _MENUDESIGNER = cProgramName

    For more information, see _MENUDESIGNER System Variable.

    Table Designer

    The Table Designer accommodates the following data enhancements:

    Query and View Designers

    You can use spaces in table names specified in SQL statements in the Query and View designers if you provide an alias. For example, editing the following statement is valid in the View and Query designers:

        SELECT * FROM dbo."Order Details" Order_Details

    For more information, see SELECT – SQL Command.

    Data Environment Designer

    The full path to the database (DBC) appears in the status bar when you select a database in the Add Table or View Dialog Box.

    Class and Form Designers

    The name of the class you are modifying appears in the title bar for the following dialog boxes:

    The View menu for the Form Designer offers both options for specifying the tab order on forms: Assign Interactively or Assign by List.

    In the Class, Form, and Report designers, you can use the following keyboard shortcut commands to adjust spacing between selected items.

    • ALT+Arrow Key – Adjusts the spacing between the selected objects by one pixel in the direction of the arrow key.
    • ALT+CTRL+Arrow Key – Adjusts the spacing between the selected objects by one grid scale in the direction of the arrow key.

    For more information, see Interactive Development Environment (IDE) Enhancements.

    See Also

    Concepts

    What’s New in Visual FoxPro

    Guide to Reporting Improvements

    Data and XML Feature Enhancements

    SQL Language Improvements

    Class Enhancements

    Language Enhancements

    Interactive Development Environment (IDE) Enhancements

    Miscellaneous Enhancements

    Changes in Functionality for the Current Release

    Miscellaneous Enhancements

    Visual FoxPro contains the following miscellaneous enhancements.

    Printing Dialog Boxes and Printing Language Enhancements

    Visual FoxPro includes various enhancements for its printing dialog boxes and printing language. Visual FoxPro uses the latest operating system dialogs for Printer Setup and other related printing operations. If the user is running Windows XP, the dialogs appear themed. The following language functions contain new enhancements that impact general printing operations:

    For more information, see Language Enhancements.

    Improved Support for Applications Detecting Terminal Servers

    Visual FoxPro now automatically includes support for applications that are generated by the build process to detect whether they are running on a Terminal Server and prevent loading of unnecessary dynamic-link library (.dll) files that can impact performance. For more information, see BUILD EXE Command.

    Updated Dr. Watson Error Reporting to 2.0

    Visual FoxPro updates its product error reporting to support Dr. Watson 2.0. This version includes new and improved error reporting, logging, and auditing features. For example, errors are logged while offline and are posted when you reconnect.

    Anchor Editor Application

    Visual FoxPro 9.0 allows you to create a custom property editor through extended metadata attributes for class members. Through this new extensibility model, you now have the ability to extend the functionality of class properties and methods, allowing you to create design-time enhancements such as a custom property editor. For more information about creating custom property editors, see MemberData Extensibility.

    A sample custom property editor, Anchoreditor.app, is included in Visual FoxPro 9.0 and is located in the Wizards directory. This application runs when you double-click the Anchor property in the Properties window, or when you choose the Anchor property in the Properties window and click the ellipsis button (…).

    • Anchor but do not resize vertically – Specifies that the center of the control is anchored to the top and bottom edges of its container but the control does not resize.
    • Anchor but do not resize horizontally – Specifies that the center of the control is anchored to the left and right edges of its container but the control does not resize.
    • Border values – Displays the current settings for the border values.
    • Common settings – Selects commonly used settings for the Anchor property.
    • Sample – Click the Sample button to test the current anchor value on a sample form.
    • Anchor value – The Anchor property value that is the combination of the current settings for the border values.

    Class Browser

    You can open and view class definitions that are specified within a program (.prg) file just as you can class libraries (.vcx). You can select a program (.prg) from the File Open/Add dialog box. See Class Browser Window for more information.

    CursorAdapter Builder

    The CursorAdapter Builder contains a number of enhancements that correspond to improvements added to the CursorAdapter class. See CursorAdapter Builder for more information.

    Toolbox

    The Toolbox (Visual FoxPro) is now dockable and can be docked to the desktop or other IDE windows.

    Code References

    The Code References Window has been updated with the following minor enhancements:

    • For the results grid, the Options dialog provides a new setting to show separate columns for class, method, and line, rather than concatenating them all in a single column. 
    • You can now sort by method name by right-clicking on the method header or selecting the Sort By menu item from the right-click menu.
    • With the results tree list, the following new right-click menu options are available:
      • Expand All – expands all nodes
      • Collapse All – collapses all nodes
      • Sort by Most Recent First – puts the most recent result sets at the top of the list rather than at the bottom
    Note:
    The results beneath a tree node are not filled until the node is expanded. This is done to increase performance if you have large result sets.

    GENDBC.PRG

    The Gendbc.prg program, which generates a program used to re-create a database, has been updated with the following minor enhancements:

    • Support for new Varchar, Varbinary and Blob field types
    • Support for AllowSimultaneousFetch, RuleExpression, and RuleText properties for views

    Environment Manager Task Pane

    The Environment Manager Task Pane has been enhanced with the following features:

    Data Explorer Task Pane

    The Task Pane Manager includes the new Data Explorer Task Pane which allows you to view and work with remote data sources such as SQL Server databases.

    For more information, see Data Explorer Task Pane.

    MemberData Editor

    The new MemberData Editor lets you edit MemberData for your classes. The MemberData Editor is available from the Class menu when the Class Designer is active. The MemberData Editor is also invoked silently when you right-click on an item in the Properties Window and select the Add to Favorites menu item. The MemberData Editor application is specified as a builder and can be changed in the Builder.dbf table located in your Wizards directory.

    For more information, see MemberData Editor and MemberData Extensibility.

    New Foundation Classes (FFC)

    The following are new FoxPro Foundation classes added to this version of Visual FoxPro:

    • _REPORTLISTENER.VCX – a set of core classes you can use when creating custom report listeners.
    • _FRXCURSOR.VCX – a class library used for working with report (FRX) files.
    • _GDIPLUS.VCX – a set of classes you can use for GDI+ handling. This is intended primarily for use when creating custom report listener classes.

    New Solution Samples

    Visual FoxPro 9.0 contains many new samples that show off new features in the product. To see a list of these samples, select the Solution Samples task pane in the Task Pane Manager and expand the New in Visual FoxPro 9.0 node.

    See Also

    Concepts

    What’s New in Visual FoxPro

    Guide to Reporting Improvements

    Data and XML Feature Enhancements

    SQL Language Improvements

    Class Enhancements

    Language Enhancements

    Interactive Development Environment (IDE) Enhancements

    Enhancements to Visual FoxPro Designers

    Changes in Functionality for the Current Release

    Changes in Functionality for the Current Release

    Visual FoxPro includes functionality that differs from previous versions and might affect existing code. These behavior changes are organized according to the following categories:

    • Critical Changes   Functionality changes most likely to affect existing code when running under this version of Visual FoxPro. It is extremely important that you read this section.
    • Important Changes   Functionality changes that might affect existing code when running under this version of Visual FoxPro.
    • Miscellaneous Changes   Functionality changes you should know about but are not likely to impact existing code.
    • Removed Items   Features or files that existed in prior versions of Visual FoxPro but are no longer included.

    Critical Changes

    Critical behavior changes are the most likely to affect existing code when running under this version of Visual FoxPro.

    SQL SELECT IN (Value_Set) Clause

    In previous versions of Visual FoxPro, the IN (Value_Set) clause for the WHERE clause in the SQL SELECT command was mapped to the INLIST( ) function. In the current release, Visual FoxPro might stop evaluating values and expressions in the Value_Set list when the first match is found. Therefore, if the IN clause is not Rushmore-optimized, you can improve performance by placing the values most likely to match at the beginning of the Value_Set list. For more information, see the description for the IN clause in the SELECT – SQL Command topic and the INLIST( ) Function.
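    The ordering advice can be sketched as follows (the customers table, its country column, and the values are hypothetical):

        * If most rows match "USA", list it first so evaluation can stop early
        SELECT * FROM customers ;
           WHERE country IN ("USA", "Canada", "Mexico")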

    Conversion of INLIST( ) Function in the Query Designer and View Designer

    In previous versions of Visual FoxPro, the Query Designer and View Designer converted INLIST( ) function calls in the WHERE clause of the SQL SELECT command into IN (Value_Set) clauses. In the current release, this conversion no longer occurs because of the differences between INLIST( ) and the SQL IN clause. INLIST( ) remains restricted to 24 arguments. For more information, see the description for the IN clause in the SELECT – SQL Command topic and the INLIST( ) Function.

    Grids and RecordSource and ControlSource Properties

    In Visual FoxPro 9.0 there is a change in Grid control behavior. When the RecordSource property for a Grid control is set, Visual FoxPro 9.0 resets all ControlSource properties to the empty string (“”) for all columns. In earlier versions of Visual FoxPro, the ControlSource properties were not properly reset, so problems could occur when a RecordSource with a different structure was later bound. This change may impact scenarios involving Access and Assign methods or BINDEVENT( ) function calls made against a Grid column’s ControlSource property.
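    The impact can be sketched as follows (the grid, form, table, and column names are hypothetical):

        WITH THISFORM.grdOrders
           .RecordSource = "orders"   && in Visual FoxPro 9.0, this clears every column's ControlSource
           .Columns(1).ControlSource = "orders.order_id"   && so rebind each column explicitly
        ENDWITH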

    Important Changes

    Important changes might affect existing code when running under Visual FoxPro 9.0.

    Reporting

    Visual FoxPro contains many improvements for reporting. The following are behavior changes that could impact existing reports:

    • The Report Designer and Engine now make use of extensible components. You can control or eliminate use of design-time extensions by altering the value of _REPORTBUILDER System Variable. You control run-time extension use with the SET REPORTBEHAVIOR Command.
    • In Visual FoxPro 9’s new object-assisted reporting mode, report fields may need to be adjusted (widened) slightly. This is especially important for numeric data where a field that is not wide enough to display the entire number will show it instead as asterisks (*****). For more information about the changes to the Report System that required this change, and features of the GDI+ rendering engine and other changes related to it, see Using GDI+ in Reports. For migration strategy and recommendations, see Guide to Reporting Improvements.
    • For a table of additional, minor rendering differences between backward-compatible reporting mode and object-assisted reporting mode, see the list below.

      Tab stops (CHR(9) values included in report data)
        Backward-compatible mode: The width of a tab stop is determined by the number of characters in the font used.
        Object-assisted mode: Tab stops are set at fixed-width positions, regardless of font.
        Recommendations: If you concatenated tabs with data in a stretching report layout element to create a table format within the element, you can often fulfill the same requirements using a second detail band in Visual FoxPro 9. Alternatively, change the number of tabs you concatenate with your data.

      Special characters and word-wrapping
        Backward-compatible mode: Non-breaking spaces are not respected; they are treated as normal space characters.
        Object-assisted mode: Special characters such as non-breaking spaces (CHR(160)) and soft hyphens (CHR(173)) are correctly interpreted. As a result, words may wrap differently in output.
        Recommendations: Evaluate the results. In most cases, users will appreciate the change, because it more faithfully represents their original intentions in the text. If necessary, use the CHRTRAN( ) Function or STRTRAN( ) Function to replace these special characters with standard spaces and hyphens.

      Line spacing of multi-line objects
        Backward-compatible mode: Line spacing is determined by a formula that does not take font properties into consideration. Lines in a multi-line object are individually rendered, so background colors for each line may appear to have a different width.
        Object-assisted mode: GDI+ line spacing is dynamically determined using font characteristics. A multi-line object is rendered as a single block of text.
        Recommendations: Evaluate the results. In most cases, the change in line spacing will provide a more polished appearance, and in all cases this method of handling multi-line text provides better performance. If a report depends on the old style of spacing lines, you can adjust the ReportListener's DynamicLineHeight Property to revert to the old behavior.

      Cursor images (.CUR files)
        Backward-compatible mode: .CUR files can be used as image sources in reports.
        Object-assisted mode: .CUR files are not supported as image sources.
        Recommendations: Convert the cursor file to another, supported image format.

      Shape (Rounded Rectangle) curvature
        Backward-compatible mode: Limited choices for curvature.
        Object-assisted mode: More choices are available through the Report Builder Application dialog box interface, but some will not look the same way in backward-compatible mode and object-assisted mode.
        Recommendations: If reports have to run in both backward-compatible mode and object-assisted mode, or if they are designed in version 9.0 but must run in earlier versions, limit your choices of shape curvature values to those allowed in the native Round Rectangle Dialog Box. If you are using the Style Tab, Report Control Properties Dialog Box (Report Builder) to design such reports, use the values 12, 16, 24, 32, and 99 to represent the native buttons, selecting the buttons from left to right. The default value in the Round Rectangle dialog box (second button) is 16.
    • When you create a Quick Report, either by using the CREATE REPORT – Quick Report Command or by invoking the Quick Report… option on the Report menu, and SET REPORTBEHAVIOR 90 is in effect, the layout elements created by the Report Designer are sized differently from those created for the same fields in previous versions. This change accommodates the additional width required by the new rendering mechanism of the report engine.
    • If you use the KEYBOARD Command or PLAY MACRO Command statements to address options on the Report menu, you may need to revise the keystrokes in these statements, as the menu has been reorganized.
    • Reports may take longer to open in the Report Designer if the report was previously saved with the Printer Environment setting enabled. You can improve performance by unchecking the Printer Environment menu item from the Report menu and re-saving the report. The saved Printer Environment is not critical for functioning of a report and is typically not recommended. Object-assisted report mode also respects different printers' resolution settings, so saving resolution information for one printer in your report may have adverse effects in an environment with printers that have different resolutions. A saved Printer Environment may also have more adverse effects on REPORT FORM or LABEL commands invoked with the TO FILE option than they did in previous versions, if the associated printer setup is not available in the environment at runtime. In Visual FoxPro 9, the global default for this setting in the Reports Tab, Options Dialog Box, and for reports created in executable applications (.exe files), has been changed to unchecked.
    • Because of changes to the way Visual FoxPro 9 uses current printer settings to determine items such as print resolution and page height dynamically, a REPORT FORM or LABEL command will not run in object-assisted mode if there are no available printer setups in the environment or if the print spooler has been stopped. You will receive Error loading printer driver (Error 1958). If you need to run reports in an environment with no printer information, perhaps creating custom types of output that do not require printers, you can supply Visual FoxPro with the minimal set of information it needs to run your report by supplying a page height and page width from the appropriate Report Listener methods. For more information, see GetPageHeight Method and GetPageWidth Method.
    • By default, and by design, the Report Builder Application does not automatically show tables in the report’s Data Environment when you build report expressions. To better protect end-user design sessions, only tables you have explicitly opened, not all tables from the DataEnvironment, are available in the Expression Builder. With this change, you have the opportunity to set up the design session’s data exactly the way you want the end-user to see it, before you issue a MODIFY REPORT Command in your application. If you prefer the Report Designer’s old behavior, you can change the Report Builder Application to emulate it. For more information, see How to: Replace the Report Builder’s Expression Builder Dialog Box.
    • The ASCII keyword on the REPORT FORM Command is documented as following the <filename> parameter of the TO FILE <filename> clause. In earlier versions of Visual FoxPro, you could safely use the incorrect and unsupported syntax TO FILE ASCII <filename> instead. This incorrect syntax triggers an error in Visual FoxPro 9. Note that the ASCII keyword has no effect on object-assisted mode output provided by the Report Engine, although a ReportListener Object can be written to implement it.
    • The keyword NOCONSOLE has no default meaning in object-assisted reporting mode, because ReportListeners do not echo their rendering output to the current output window by default. However, a ReportListener can mimic backward-compatible mode in this respect, if desired. Refer to OutputPage Method for a complete example.
    • To facilitate development of run-time reporting extensions, the Report Engine now allows normal debugging procedures during a report run. If your error handling routine assumes it is impossible for a report to be suspended, this assumption will now be challenged. Refer to Handling Errors During Report Runs for a detailed look at the associated changes, and some suggestions for strategy.
    • REPORT FORM and LABEL commands are no longer automatically prohibited as user-interface-related commands in COM objects compiled into DLLs, when you run the commands in object-assisted mode. The restriction still applies to these commands when they are run in backward-compatible mode. (The topic Selecting Process Types explains why user-interface-related commands are prohibited in DLLs.) This change is not applicable to multi-threaded DLLs. A number of user-interface-related facilities also are not available in DLLs (whether single- or multi-threaded). For example, the TXTWIDTH( ) Function and TextWidth Method depend on a window handle to function, so they are not available in a DLL. The CREATE REPORT – Quick Report Command relies on the same facilities as TXTWIDTH(), and therefore is not available in a DLL. However, in many instances, creating custom output using a ReportListener does not require any user-interface activity, so a REPORT FORM or LABEL command can now be used productively in a DLL. Using the SYS(2335) – Unattended Server Mode function to trap for potential modal states, as well as the new SET TABLEPROMPT Command, is recommended. Refer to Server Design Considerations and Limitations for more information.
    • Changes have occurred to the handling of group headers and footers in multi-column reports, when the columns flow from left to right (label-style layout). The revised behavior is more useful and behaves consistently with the new detail header and footer bands as well. For a description of the change, see How to: Define Columns in Reports.
    • In previous versions, the NOWAIT keyword on the REPORT FORM and LABEL commands was not significant when the commands were issued in the Command window rather than in a program. In Visual FoxPro 9’s object-assisted mode, when previewing instructions are interpreted by the Report Preview Application, this keyword is significant no matter where you issue the command. The Report Preview Application uses the NOWAIT keyword, consistently, as an instruction to provide a modeless preview form. For more information about the Report Preview Application, see Extending Report Preview Functionality.
    • Visual FoxPro 8 introduced the NOPAGEEJECT keyword on the REPORT FORM and LABEL commands, but applied the keyword only to printed output. In Visual FoxPro 9, NOPAGEEJECT has significance for all output targets, including PREVIEW. This keyword provides chained or continued report runs for multiple REPORT FORM and LABEL commands. To facilitate this behavior in preview mode, and to allow you to apply customization instructions to multiple previews, the Report Output Application caches a single ReportListener object instance for preview output, causing a change in behavior for multiple modeless report commands (REPORT FORM … PREVIEW NOWAIT). In the past, when you used multiple REPORT FORM … PREVIEW NOWAIT statements in sequence, your commands resulted in multiple report preview windows. In Visual FoxPro 9, when SET REPORTBEHAVIOR 90, these commands will result in successive report previews being directed to a single report preview window. Tip: You can easily invoke the old behavior by creating multiple ReportListener object references and associating one with each separate REPORT FORM or LABEL command, using the OBJECT keyword. For more information about using the OBJECT syntax, see REPORT FORM Command. For information about receiving multiple object references of the appropriate type from the Report Output Application, see Understanding the Report Output Application.
    • In the process of reviewing and overhauling the native Report Engine, a number of outstanding issues regarding band and layout element positioning were addressed. For example, a field element marked to stretch and sized to take up more than one text line’s height in the report layout might have inappropriately pushed its band’s exit events to the next page in Visual FoxPro 8. In Visual FoxPro 9, the band’s exit events occur on the same page. Additional revisions improve record-pointer-handling in footer bands, when bands stretch across pages. These changes are not specific to object-assisted output rendering. If you have relied on undocumented behavior providing exact band or layout control placement in a particular report, you should review that report’s behavior in Visual FoxPro 9.

    Rushmore Optimization

    When character values are indexed, all values are considered to be encoded using the table’s code page. In previous versions of Visual FoxPro, when the current Visual FoxPro code page differed from a table’s code page, any attempt to look for a character value within that table’s index resulted in an implicit translation of the value from the current Visual FoxPro code page into the table’s code page. This could cause SQL or other Rushmore optimizable commands to return or act upon incorrect records.

    In Visual FoxPro 9 and later, by default, the optimization engine no longer uses existing character indexes for tables created with a non-current code page. Instead, Visual FoxPro builds temporary indexes to ensure correct results. This can result in a loss of optimization of SQL or other commands which were optimized in earlier VFP versions. To prevent this, ensure that the current Visual FoxPro code page returned by CPCURRENT( ) Function matches the table’s code page returned by CPDBF( ) Function. This requires either changing the current Visual FoxPro code page, or changing the table’s code page. For information on specifying the current Visual FoxPro code page, see Understanding Code Pages in Visual FoxPro. For information on specifying the code page for a table, see How to: Specify the Code Page of a .dbf File. If you cannot change either the Visual FoxPro codepage or the table codepage to match, you can force optimization to work as it did in Visual FoxPro 8 and prior versions using the SET ENGINEBEHAVIOR Command with either 80 or 70 as a parameter.
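    A quick check along these lines can tell you whether a given table is affected (the table name here is hypothetical):

```foxpro
USE customer
IF CPCURRENT() <> CPDBF()
   * Code pages differ: existing character indexes will not be
   * used for optimization, and temporary indexes are built instead.
   * One option is to revert to the Visual FoxPro 8 behavior:
   SET ENGINEBEHAVIOR 80
ENDIF
```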

    SQL SELECT Statements

    • A SELECT – SQL Command containing DISTINCT and ORDER BY clauses in which the ORDER BY field is not in the SELECT field list will generate an error in Visual FoxPro 9.0 with SET ENGINEBEHAVIOR 90 (Error 1808: SQL: ORDER BY clause is invalid.). The following example shows this:

      SET ENGINEBEHAVIOR 90
      CREATE CURSOR foo (f1 int, f2 int)
      SELECT DISTINCT f1 FROM foo ORDER BY f2 INTO CURSOR res
    • A SELECT – SQL Command containing DISTINCT and HAVING clauses in which the HAVING field is not in the SELECT field list will now generate an error in Visual FoxPro 9.0 with SET ENGINEBEHAVIOR 90 (Error 1803: SQL: HAVING clause is invalid.). An error is reported because the HAVING field is not in the projection and DISTINCT is used. The following example shows this:

      SET ENGINEBEHAVIOR 90
      CREATE CURSOR foo (f1 int, f2 int)
      SELECT DISTINCT f1 FROM foo HAVING f2>1 INTO CURSOR res
    • The number of UNION statements that can be used in a SELECT – SQL Command is no longer limited to 9. Parentheses are not completely supported with UNION statements and, unlike previous versions, may generate an error. If two or more SELECT statements are enclosed in parentheses, an error is generated during compile (Error 2196: Only a single SQL SELECT statement can be enclosed in parentheses.). This behavior is not tied to any SET ENGINEBEHAVIOR Command level. The following example shows this error:

      SELECT * FROM Table1 ;
         UNION ;
         (SELECT * FROM Table2 ;
         UNION ;
         SELECT * FROM Table3)

      The following example compiles without an error:

      SELECT * FROM Table1 ;
         UNION ;
         (SELECT * FROM Table2) ;
         UNION ;
         (SELECT * FROM Table3)

    For more information, see SET ENGINEBEHAVIOR Command.

    Disabling TABLEREVERT( ) Operations During TABLEUPDATE( ) Operations

    For CursorAdapters, Visual FoxPro does not permit TABLEREVERT( ) operations during TABLEUPDATE( ) operations.

    For more information, see TABLEREVERT( ) Function and TABLEUPDATE( ) Function.

    Index Key Truncation during Index Updates

    An error (Error 2199) is now generated when index key truncation is about to occur, typically during index creation or modification. This can happen with the use of a key that contains an expression involving a Memo field, whose length is not fixed, such as in the following example:

    INDEX ON charfld1 + memofld1 TAG mytag

    Similar issues can also occur with the SQL engine (such as during a SQL SELECT command or View creation) where it might fail to build a temporary index to optimize a join evaluation if it is unable to accurately determine the maximum size of the key.

    For more information, see Error building key for index “name”. (Error 2199).
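    One common workaround, sketched here with the field names from the example above, is to give the Memo portion of the key a fixed width with PADR( ) so the key length is predictable (the width of 100 is an arbitrary choice for illustration):

```foxpro
* PADR( ) pads or truncates the Memo text to a fixed width,
* so the index key length is known in advance and no
* truncation error is raised.
INDEX ON charfld1 + PADR(memofld1, 100) TAG mytag
```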

    Memo Field Corruption

    Visual FoxPro will now detect whether a Memo field in a class library (.vcx) or form (.scx) file is corrupt when you try to open that file in the designer. If the file contains a corrupt Memo field, an Error 41 such as the following will occur:

    Memo file <path>\myclass.VCT is missing or is invalid.

    Additionally, similar Memo errors may occur if you have a Visual FoxPro table open and try to access contents of a corrupt Memo. The following sample code shows how you can detect the Error 41 memo file corruption:

    TRY
       USE myTable EXCLUSIVE NOUPDATE
       SCAN
          SCATTER MEMO MEMVAR
       ENDSCAN
    CATCH TO loError
       IF loError.ErrorNo = 41
          * handle error here
       ENDIF
    ENDTRY
    USE IN myTable

    While it is possible that some loss of data may occur, the following sample code may assist in repairing some or all of the file:

    ON ERROR *
    USE myclass.vcx
    COPY TO myclass_bkup.vcx  && backup
    COPY TO myclass2.vcx
    USE
    DELETE FILE myclass.vc*
    RENAME myclass2.vcx TO myclass.vcx
    RENAME myclass2.vct TO myclass.vct
    COMPILE CLASSLIB myclass.vcx
    ON ERROR

    Visual Form and Class Extended Property Support

    Visual FoxPro 9 allows you to create custom properties in your visual class (SCX or VCX file) whose values can contain carriage returns and/or be of length greater than 255 characters. If you specify a property with a value like this through the Properties Window (i.e., the Zoom dialog box), Visual FoxPro will store it in a format such that you will no longer be able to edit that class in older versions of Visual FoxPro.

    Class Definitions

    Assigning a property to an instantiated object in a class definition is no longer supported and will generate an error. The following example shows this.

    LOCAL oCustom
    oCustom = CREATEOBJECT('cusTest')

    DEFINE CLASS cusTest AS CUSTOM
       oRef = CREATEOBJECT('myclass')
    ENDDEFINE

    DEFINE CLASS myclass AS CUSTOM
    ENDDEFINE

    You can instead assign a property to an instantiated object reference in the Init event of your class.
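    For instance, the class definition above can be rewritten along these lines:

```foxpro
DEFINE CLASS cusTest AS CUSTOM
   oRef = NULL
   PROCEDURE Init
      * Assign the object reference at instantiation time
      * instead of in the property declaration.
      THIS.oRef = CREATEOBJECT('myclass')
   ENDPROC
ENDDEFINE

DEFINE CLASS myclass AS CUSTOM
ENDDEFINE
```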

    Merge Modules for Redistributable Components

    Visual FoxPro includes merge modules (MSM files) for use in redistributing shared components with your runtime applications. Merge modules are used by applications that can create Windows Installer based setups. For example, Visual FoxPro ships with merge modules that contain the Visual FoxPro runtime libraries as well as some common components including a number of ActiveX controls.

    For Visual FoxPro 9, the VFP9RUNTIME.MSM merge module contains the runtime libraries that you will need for your custom redistributable application. The VFP9RUNTIME.MSM merge module also has dependencies on the merge modules containing the Microsoft VC 7.1 runtime library (MSVCR71.DLL) and the GDI+ graphics library (GDIPLUS.DLL). Because of these dependencies, if you select the VFP9RUNTIME.MSM merge module in a Windows Installer tool such as InstallShield, the other dependent merge modules will automatically be selected as well.

    Note   For Windows XP and higher operating systems, Visual FoxPro uses the GDI+ graphics library that is installed in your Windows System folder.

    For Visual FoxPro 9, the merge module containing the VC runtime library no longer installs to the Windows System directory. Instead, this file is installed to your application's directory. This is done in compliance with recommended component versioning strategies for Windows operating systems. The GDI+ library is installed into the same directory as your Visual FoxPro runtime libraries and is only installed on operating systems earlier than Windows XP (Windows XP already includes the GDI+ library in its Windows System directory).

    Tip   There may be circumstances where you will want to install the VC or GDI+ library to another location such as the Windows System directory. You can do this with your Windows Installer application (e.g., InstallShield) by first selecting the merge module before selecting the VFP9RUNTIME.MSM one. Once you have selected a merge module, you can change its installation path.

    There are new merge modules for MSXML3 and MSXML4 XML parser components. The MSXML 3.0 component consists of the following merge modules:

    • MSXML 3.0 (msxml3_wim32.msm)
    • Msxml3 Exception INF Merge Module (msxml3inf_wim32.msm)
    • WebData std library (wdstddll_wim32.msm)

    There are two MSXML 4.0 modules that should be included with any custom setup:

    • MSXML 4.0 (msxml4sxs32.msm)
    • MSXML 4.0 (msxml4sys32.msm)

    MTDLL Memory Allocation

    Visual FoxPro contains a new PROGCACHE configuration file setting which specifies the amount of memory Visual FoxPro allocates at startup for running programs (program cache). This setting also determines memory allocated per thread for Visual FoxPro MTDLL COM Servers. In prior versions, this setting was not configurable and memory was allocated as a fixed program cache of a little over 9MB (144 * 64K). The new PROGCACHE setting allows you to set the exact size of the program cache or specify that dynamic memory allocation be used.

    Since MTDLL COM Servers can use up a great amount of memory if many threads are created, it is important that memory be allocated more efficiently for these scenarios. In Visual FoxPro 9, the new default setting for MTDLL COM Servers is -2 (dynamic memory allocation). For more information, see Special Terms for Configuration Files.
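    In a configuration file (Config.fpw), the setting might look like the following sketch; -2 is the dynamic-allocation value described above, and a positive number would instead fix the program cache at that size:

```
PROGCACHE=-2
```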

    Miscellaneous Changes

    The following are miscellaneous changes that you should know about but are not likely to impact existing code.

    CursorAdapter Changes

    In the current version of Visual FoxPro, the following behavior changes apply to the CursorAdapter object:

    Grid SetFocus Supported for AllowCellSelection

    You can now call a Grid control’s SetFocus Method and have the Grid receive focus when the AllowCellSelection Property is set to False (.F.) and the grid contains no records.

    EXECSCRIPT Function

    The EXECSCRIPT( ) Function now allows you to pass parameters by reference.
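    A minimal sketch of passing a parameter by reference, assuming the script declares its parameter with LPARAMETERS and the caller prefixes the variable with @ (the usual Visual FoxPro by-reference syntax):

```foxpro
LOCAL nValue
nValue = 1
* The @ prefix requests pass by reference; the script's change
* to its parameter is reflected in nValue after the call.
EXECSCRIPT("LPARAMETERS n" + CHR(13) + "n = n + 1", @nValue)
```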

    Additionally, Visual FoxPro 9.0 tightens syntax validation of commands built by concatenating parameters. The following code, which worked in prior versions of Visual FoxPro, now properly causes an error because the CHR(13) character breaks the command into two lines, whereas it is supposed to be part of the parameter to the EXECSCRIPT( ) call.

    cRecPauseScript = "EXECSCRIPT('" + ;
       "?123" + CHR(13) + ;
       "?456" + ;
       "')"
    _VFP.DoCmd(cRecPauseScript)

    To make a valid call that does not cause a syntax error, you can use the following code:

    cRecPauseScript = "EXECSCRIPT('?123' + CHR(13) + '?456')"
    _VFP.DoCmd(cRecPauseScript)

    Listbox Control Click Event

    In the current version of Visual FoxPro, the PageUp, PageDown, Home, and End keyboard keys now cause a Listbox control's Click event to fire. In previous versions, these keys did not fire the Click event, although the arrow keys did.

    PEMSTATUS( ) Function Returns False for Hidden Native Properties

    In previous versions of Visual FoxPro, the PEMSTATUS( ) function returned True (.T.) for hidden native properties of Visual FoxPro base classes when specifying a value of 5 for nAttribute. In the current release, PEMSTATUS( ) returns False (.F.) for these hidden native properties. For more information, see PEMSTATUS( ) Function.

    Changes to Options Dialog Box

    • In the Options dialog box, the List display count option has been moved from the Editor tab to the View tab. For more information, see View Tab, Options Dialog Box.
    • In previous versions of Visual FoxPro, you could output all the settings in the Options Dialog Box (Visual FoxPro) to the Command Window by pressing the SHIFT key when choosing the OK button to close the dialog. In the current release, these settings are now sent to the Debug Output Window. The Debug Output window must be opened in order for the settings to be directed there.

    FOXRUN.PIF

    The FOXRUN.PIF file that is used by the RUN | ! Command is no longer installed in the Visual FoxPro root directory. If Visual FoxPro detects the presence of a FOXRUN.PIF file during a RUN command, it will use COMMAND.COM to execute the specified RUN command. This may not be the desired SHELL program to use for a particular operating system, especially newer ones like Windows XP in which CMD.EXE is preferable.

    The current behavior for a RUN command without the existence of a FOXRUN.PIF file is that the RUN command will use the SHELL program specified by the operating system COMSPEC environment variable. With Windows XP, you can view and edit this variable by right-clicking your computer desktop’s My Computer icon and selecting the Properties dialog box (Advanced tab).

    The FOXRUN.PIF file is still available in the Tools directory if you need it for a particular reason.

    For more information, see RUN | ! Command.

    SCATTER Command

    The SCATTER command no longer allows for ambiguous use of both MEMVAR and NAME clauses in the same command. You can only include one of these clauses. In prior versions, the following code would not generate an error:

    USE HOME() + "SAMPLES\Data\customer.dbf"
    SCATTER MEMVAR NAME oCust
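    Either clause on its own remains valid; for example:

```foxpro
USE HOME() + "SAMPLES\Data\customer.dbf"
SCATTER MEMVAR       && copy fields into memory variables
SCATTER NAME oCust   && copy fields into an object's properties
```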

    For more information, see SCATTER Command.

    SET DOHISTORY

    The SET DOHISTORY command, which is included for backward compatibility, was updated to send output to the Debug Output Window instead of the Command Window as in prior versions.

    SCREEN ShowTips Property

    The default value for _SCREEN ShowTips Property has been changed from False (.F.) to True (.T.). This change was made because new Memo and Field Tips support is now dependent on this setting.
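    If an application depends on the old default, it can restore it explicitly at startup:

```foxpro
* Restore the pre-Visual FoxPro 9 default; note that this also
* disables the new Memo and Field Tips support.
_SCREEN.ShowTips = .F.
```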

    AllowCellSelection Does Not Permit Deleting Grid Rows When Set to False

    When the AllowCellSelection Property is set to False (.F.) for a Grid control, you cannot select a row for deletion by clicking the deletion column. For more information, see AllowCellSelection Property.

    Northwind Database

    The sample Northwind database has been updated. Five of the stored procedures now include calls to the SETRESULTSET( ) Function so that the Visual FoxPro OLE DB Provider will return a cursor when they are executed.

    Foundation Classes

    The _ShellExecute class contained in the _Environ.vcx FFC class library has been updated to include an additional parameter in the ShellExecute method.

    Wizards and Builders

    The Wizard/Builder selection dialog box now properly hides deleted entries in the Wizard and Builder registration tables.

    Specifying Western Language Script Values for GETFONT( ) Function

    In versions prior to this release, passing 0 as the nFontCharSet value for GETFONT( ) opened the Font Picker dialog box with the Script list unavailable. You could not specify 0 (Western) as the language script value, and setting it to 1 (Default) set nFontCharSet to the default font setting only, which is determined by the operating system.

    In this release, passing 0 to GETFONT( ) opens the Font Picker dialog box with the Script list available and Western selected. The return value for GETFONT( ) also includes the return value for nFontCharSet.

    Removed Items

    HTML Help SDK

    The HTML Help 1.3 SDK no longer ships with Visual FoxPro.

    See Also

    Concepts

    What’s New in Visual FoxPro

    Guide to Reporting Improvements

    Data and XML Feature Enhancements

    SQL Language Improvements

    Class Enhancements

    Language Enhancements

    Interactive Development Environment (IDE) Enhancements

    Enhancements to Visual FoxPro Designers

    Miscellaneous Enhancements

    Visual FoxPro New Reserved Words

    The following tables list new words added to the Visual FoxPro language which are now reserved:

    _

    _MEMBERDATA   _MENUDESIGNER   _REPORTBUILDER
    _REPORTOUTPUT   _REPORTPREVIEW   _TOOLTIPTIMEOUT

    A

    ADJUSTOBJECTSIZE   ADOCODEPAGE   AFTERBAND
    AFTERRECORDREFRESH   AFTERREPORT   ALLOWMODALMESSAGES
    ANCHOR   ASQLHANDLES   AUTOCOMPLETE
    AUTOCOMPSOURCE   AUTOCOMPTABLE   AUTOHIDESCROLLBAR

    B

    BEFOREBAND   BEFORERECORDREFRESH   BEFOREREPORT
    BLOB

    C

    CANCELREPORT   CAST   CLEARRESULTSET
    CLEARSTATUS   COMMANDCLAUSES   CONFLICTCHECKCMD
    CONFLICTCHECKTYPE   CURRENTDATASESSION   CURRENTPASS

    D

    DECLAREXMLPREFIX   DELAYEDMEMOFETCH   DISPLAYORIENTATION
    DOCKABLE   DOMESSAGE   DOSTATUS
    DYNAMICLINEHEIGHT

    E

    EVALUATECONTENTS

    F

    FETCHMEMOCMDLIST   FETCHMEMODATASOURCE   FETCHMEMODATASOURCETYPE
    FIRSTNESTEDTABLE   FRXDATASESSION   FOXOBJECT

    G

    GDIPLUSGRAPHICS   GETAUTOINCVALUE   GETDOCKSTATE
    GETPAGEHEIGHT   GETPAGEWIDTH   GETRESULTSET

    I

    ICASE   INCLUDEPAGEINOUTPUT   INSERTCMDREFRESHCMD
    INSERTCMDREFRESHFIELDLIST   INSERTCMDREFRESHKEYFIELDLIST   ISMEMOFETCHED
    ISPEN   ISTRANSACTABLE

    L

    LISTENERTYPE   LOADREPORT

    M

    MAKETRANSACTABLE   MAPBINARY   MAPVARCHAR

    N

    NEST   NESTEDINTO   NEXTSIBLINGTABLE

    O

    ONPREVIEWCLOSE   OPTIMIZE   ORDERDIRECTION
    OUTPUTPAGE   OUTPUTPAGECOUNT   OUTPUTTYPE

    P

    PAGENO   PAGETOTAL   PICTUREMARGIN
    PICTURESPACING   PICTUREVAL   POLYPOINTS
    PREVIEWCONTAINER   PRINTJOBNAME   PROGCACHE

    Q

    QUIETMODE

    R

    RECORDREFRESH   REFRESHALIAS   REFRESHCMD
    REFRESHCMDDATASOURCE   REFRESHCMDDATASOURCETYPE   REFRESHIGNOREFIELDLIST
    REFRESHTIMESTAMP   RENDER   REPORTBEHAVIOR
    REPORTLISTENER   RESPECTNESTING   ROTATION

    S

    SCCDESTROY   SCCINIT   SELECTIONNAMESPACES
    SENDGDIPLUSIMAGE   SETRESULTSET   SQLIDLEDISCONNECT
    SUPPORTSLISTENERTYPE

    T

    TABLEPROMPT   TIMESTAMPFIELDLIST   TWOPASSPROCESS

    U

    UNLOADREPORT   UNNEST   UPDATECMDREFRESHCMD
    UPDATECMDREFRESHFIELDLIST   UPDATECMDREFRESHKEYFIELDLIST   UPDATESTATUS
    USECODEPAGE   USECURSORSCHEMA   USETRANSACTIONS

    V

    VARBINARY   VARCHAR   VARCHARMAPPING

    X

    XMLNAMEISXPATH

    See Also

    Concepts

    What’s New in Visual FoxPro

    Reserved Words (Visual FoxPro)

    Requirements for Installing Visual FoxPro

    Visual FoxPro has the following minimum system requirements:

    • Computer: PC with a Pentium class processor.
    • Peripherals: Mouse or pointing device
    • Memory: 64 MB RAM (128 MB or higher recommended)
    • Hard disk space:
      • Visual FoxPro Prerequisites: 20 MB
      • Visual FoxPro Typical Install: 165 MB
      • Visual FoxPro Maximum Install: 165 MB
    • Video: 800 x 600 resolution, 256 colors (High color 16-bit recommended)
    • Operating system: Developing applications with Visual FoxPro is supported only on Microsoft Windows 2000 Service Pack 3 or later, Windows XP and Windows Server 2003. You can create and distribute run-time applications for Windows 98, Windows Me, Windows 2000 Service Pack 3 or later, Windows XP and Windows Server 2003. Note: Installation on Windows NT 4.0 Terminal Server Edition is not supported.

    See Also

    Concepts

    How to: Install Visual FoxPro

    Installing Visual FoxPro

    How to: Install Visual FoxPro

    You can install this version of Visual FoxPro from a CD-ROM or a network to a local hard drive. You must install Visual FoxPro on a local drive, not a mapped drive, and you must have administrator privileges to install it. No other preparation is required before installing Visual FoxPro. It is recommended that you run with power-user privileges to use all the provided tools effectively.

    You can safely install or uninstall using Visual FoxPro Setup. If you are upgrading Visual FoxPro, you must first uninstall the previous version of the program. Though both versions of Visual FoxPro can exist on the same computer, you cannot install the current version of Visual FoxPro in the same directory as the previous version.

    If you plan to publish XML Web services using Visual FoxPro, you might want to set up Internet Information Services (IIS) on a Windows 2000, Windows XP or Windows Server 2003 computer. Refer to your operating system documentation for instructions on how to set up and configure IIS.

    Note:
    Visual FoxPro setup no longer installs any Windows operating system Service Packs or versions of Internet Explorer. It is highly recommended that you install the latest versions of these components before installing Visual FoxPro. Additionally, Visual FoxPro 9.0 is supported only on Windows 2000 Service Pack 3 or later. For details about installing the latest Service Pack, visit the following Microsoft Web page at http://www.microsoft.com/windows2000/.

    Full installation includes all Visual FoxPro program files, online help, and samples files.

    To install Visual FoxPro

    1. Quit all open applications. Note: If you use a virus protection program on your computer, override it or turn it off before running the Installation wizard. The Installation wizard might not run properly with virus protection turned on. After installation, be sure to restart your virus protection program.
    2. Insert the Visual FoxPro CD. The Visual FoxPro Setup start page appears automatically.
    3. Click Install Visual FoxPro to launch Visual FoxPro Setup.
    4. To determine if you need additional components, click Prerequisites to display any necessary components.
    5. Click Install Now! to install any new components. If Visual FoxPro Prerequisites needs to only update components, click Update Now!
    6. You might need to restart your computer. When finished, click Done. Visual FoxPro Setup reappears.
    7. To continue installation, click Visual FoxPro.
    8. After accepting the End User License Agreement and entering the Product Key and your name, click Continue. Note: Visual FoxPro cannot be installed on a mapped drive. You must install Visual FoxPro on a local drive. Do not attempt to use the Map Network Drive functionality in Setup.
    9. On the Options page, select the features you want to install and click Install Now! to continue.
    10. When finished, click Done to return to Visual FoxPro Setup. Click Exit to return to the Visual FoxPro Setup start page.

    If you uninstall Visual FoxPro while the previous version of Visual FoxPro exists on your computer, certain shared registry keys used by the previous version of Visual FoxPro are removed. You must reinstall these critical shared registry keys.

    If you run Visual FoxPro from the Start menu, Visual FoxPro Setup automatically reinstalls these keys. If you start Visual FoxPro using other means, such as running the application executable directly, the setup program does not start automatically. You should use Add/Remove Programs in the Control Panel and the following steps to reinstall the registry keys manually:

    To manually reinstall Visual FoxPro 9.0 registry keys

    1. From the Start menu, click Control Panel.
    2. Click Add/Remove Programs.
    3. Click Change/Remove for Microsoft Visual FoxPro 9.0.
    4. Click Visual FoxPro and Repair/Reinstall.

    See Also

    Concepts

    Requirements for Installing Visual FoxPro

    How to: Install Additional Applications

    How to: Reinstall Visual FoxPro

    Troubleshooting Installation

    Installing Visual FoxPro

    How to: Install Additional Applications

    This release includes copies of additional software that you can install and use with Visual FoxPro. These include:

    • InstallShield Express Limited Edition Provides the capability to package and deploy the applications that you create using Visual FoxPro. Visual FoxPro includes the InstallShield Express 5.0 Visual FoxPro Limited Edition. Note: The limited and full editions of InstallShield Express 5.0 are considered two versions of the same product and cannot coexist. If you install one version on a computer where another already exists, the original is uninstalled automatically. Because the limited edition contains fewer features than the full edition, you should keep the full edition on your computer.
    • Microsoft SOAP Toolkit 3.0 Samples Provides samples for demonstrating how to consume and publish XML Web services. Visual FoxPro Prerequisites installs the core SOAP Toolkit 3.0 components needed to access and publish XML Web services in Visual FoxPro.
    • Microsoft SQL Server 2000 Desktop Engine (MSDE) Provides a personal version of SQL Server.

    To install InstallShield Express Limited Edition

    1. Insert the Visual FoxPro CD. The Visual FoxPro Setup start page opens automatically.
    2. Click Install InstallShield Express.
    3. Follow the instructions in the InstallShield Express installation wizard.

    You can also locate the Setup.exe file for InstallShield Express in the InstallShield folder on the Visual FoxPro CD.

    Note:
    Visual FoxPro 9.0 installs its redistributable merge modules in the same location as Visual FoxPro 8.0.

    The version of InstallShield Express included with Visual FoxPro 9.0 automatically uses the Visual FoxPro 9.0 merge module location.

    Note:
    Visual FoxPro 9.0 requires certain merge modules when creating a Visual FoxPro 9.0 redistributable custom application setup program using InstallShield Express.

    You need to include the following merge modules when creating your custom setup program:

    • Microsoft Visual FoxPro 9 Runtime Libraries
    • Microsoft Visual C Runtime Library 7.1
    • GDI Plus Redist
    • MSXML 4.0
    • MSXML 3.0 (needed only for CURSORTOXML functions)
    • Microsoft Visual FoxPro 9 Runtime Language Libraries (specific language library files that may be needed for international applications)
    • Reporting Applications (needed for Visual FoxPro 9.0 reporting engine)
    Note:
    MSXML 4.0 consists of two merge modules (msxml4sxs32.msm and msxml4sys32.msm). MSXML 3.0 consists of three merge modules (msxml3_wim32.msm, msxml3inf_wim32.msm and wdstddll_wim32.msm).

    To install SOAP Toolkit 3.0 Samples

    1. Insert the Visual FoxPro CD. The Visual FoxPro Setup start page opens automatically.
    2. Click Install SOAP Toolkit 3.0 Samples.
    3. Follow the instructions in the SOAP Toolkit 3.0 Samples Setup Wizard.

    You can also locate the Soapsdk.msi and Soapsamp.msi files for the SOAP Toolkit in the SOAPToolkit folder on the Visual FoxPro CD.

    To install MSDE

    1. Insert the Visual FoxPro CD. The Visual FoxPro Setup start page opens automatically.
    2. Click Install Microsoft SQL Server Desktop Engine (MSDE) and follow the installation instructions that appear in the Readme file.

    You can locate the Setup.exe file for MSDE in the SQLMSDE folder on the Visual FoxPro CD.

    Note:
    Visual FoxPro includes Microsoft SQL Server 2000 Desktop Engine Service Pack 3.0a. To make sure you have the most recent version and Service Pack installed, visit the Microsoft SQL Server Web page at http://www.microsoft.com/sql. In addition, if you are distributing custom Visual FoxPro applications that require MSDE, you can obtain the redistributable merge modules from the Microsoft SQL Server Web page for use with Windows Installer-based setup programs.

    See Also

    Concepts

    How to: Install Visual FoxPro

    How to: Reinstall Visual FoxPro

    You can reinstall Visual FoxPro by uninstalling it and then installing it again. You can uninstall Visual FoxPro from the Start menu or from the original installation disk.

    To uninstall Visual FoxPro

    1. On the Start menu, click Control Panel.
    2. In the Control Panel window, double-click Add or Remove Programs. The Add or Remove Programs window opens.
    3. In the Currently installed programs list, click the version of Microsoft Visual FoxPro you want to uninstall, and then click Change/Remove.

    If you reinstall Visual FoxPro or reinstall to another location, you might want to clean your user settings and other files installed by Visual FoxPro before reinstalling.

    You can remove these files by deleting the contents of the …\Application Data\Microsoft\Visual FoxPro folder inside your user settings folder. To determine the location of the Application Data folder, type ? HOME(7) in the Command window. These files include your FoxUser.* resource files, which contain user settings, and folders for the Toolbox and Task Pane.

    However, it is possible that your resource files are in another location. You can determine their location by typing the following in the Command window:

      ? SYS(2005)

    You should delete old Code Reference files that might be associated with projects in the project directories. These are labeled as projectname_ref.* files. You might also need to restore the default Visual FoxPro registry settings.

    Visual FoxPro includes the VFPClean.app tool so you can make sure all core Xbase and other files are set appropriately.

    To run VFPClean.app

    • Type the following line of code in the Command window:

        DO HOME()+"VFPCLEAN.APP"

    See Also

    Concepts

    How to: Install Visual FoxPro

    Installing Visual FoxPro

    Troubleshooting Installation

    You might encounter the following issues when installing Visual FoxPro:

    • If you cannot run Visual FoxPro and do not see error messages telling you what is wrong, the problem might be in your computer’s ROM BIOS or the video driver you are using.
    • If you are using an extended keyboard, be sure it is compatible with the ROM BIOS. In addition, make sure that you are using a standard VGA or Super VGA Windows video driver.
    • If you get a “stack overflow” error message, your video driver is out of date or not designed for your video card. To correct this problem, update the video driver.
    • For additional information, see the Visual FoxPro Readme at the root of the Visual FoxPro installation CD.

    See Also

    Concepts

    Installing Visual FoxPro

    Upgrading from Earlier Versions

    Microsoft Visual FoxPro protects your investment in applications built with previous versions of FoxPro. In Visual FoxPro, you can run many applications that were written in earlier versions with little or no conversion. You can modify and enhance applications using the Visual FoxPro language, knowing that most extensions to the language do not affect backward compatibility. In addition, you can convert FoxPro screens, projects, and reports to Visual FoxPro format.

    However, it is possible that some behavior or feature changes in the current version of Visual FoxPro might affect existing Visual FoxPro source code. Therefore, you should review the new features, enhancements, and most recent behavior changes for this version. For more information, see What’s New in Visual FoxPro and Changes in Functionality for the Current Release.

    Conversion to Visual FoxPro Format

    If you choose to convert your dBASE or FoxPro files to the Visual FoxPro format, you can take advantage of the unique features of Visual FoxPro. You can run many files from some previous versions of FoxPro directly; others require varying levels of conversion.

    You can convert most projects or components created using previous versions of Visual FoxPro simply by opening or recompiling them in this version of Visual FoxPro. When you recompile components, such as forms, screens, or reports, some modifications may be necessary. You can make modifications to these components in the same way you modify the components of this version of Visual FoxPro.

    You can find additional information about upgrading from previous versions of Visual FoxPro on the Microsoft Developer Network (MSDN) Web site at http://msdn.microsoft.com. You can search the MSDN Archive for documentation of previous versions of Visual FoxPro.

    See Also

    Concepts

    Getting Started with Visual FoxPro

    Overview of Visual FoxPro Features

    Installing Visual FoxPro

    Customizing the Visual FoxPro Environment

    Optimizing Your System

    How to: Convert Earlier Visual FoxPro Files

    You can explicitly convert FoxPro 2.6 and Visual FoxPro 3.0 files to the current Visual FoxPro format, which is necessary when you want to use these files with later versions of Visual FoxPro. Files that are created from later versions are converted automatically.

    To convert FoxPro 2.6 and Visual FoxPro 3.0 files

    1. On the File menu, click Open.
    2. In the Open dialog box, browse for and select the file. The Visual FoxPro Converter dialog box opens. For more information, see Visual FoxPro Converter Dialog Box.
    3. In the Visual FoxPro Converter dialog box, select the options you want.
    4. To complete the file conversion, click Continue. Note: If you are converting Macintosh or MS-DOS files that have never contained Windows records, the Visual FoxPro Transporter dialog box appears. For more information, see Visual FoxPro Transporter Dialog Box.

    You can also convert FoxPro 2.6 and Visual FoxPro 3.0 files by typing one of the following commands with the file name in the Command window:

    See Also

    Concepts

    Getting Started with Visual FoxPro

    Upgrading from Earlier Versions

    Optimizing Your System

    Visual FoxPro is designed to be a fast relational database development system. However, applications you create with Visual FoxPro can have varying requirements and purposes. Therefore, you might want to optimize the operating system, Visual FoxPro, or your application for maximum performance.

    In This Section

    Optimizing the Operating Environment

    Describes how to optimize computer hardware and the operating environment for running Visual FoxPro.

    Optimizing Visual FoxPro Startup Speed

    Describes how to optimize startup and operating speed in Visual FoxPro.

    Optimizing Visual FoxPro in a Multiuser Environment

    Describes how to improve performance when running Visual FoxPro in a multiuser environment.

    Related Sections

    Customizing the Visual FoxPro Environment

    Provides information about setting environment options, accessibility features, and configuration.

    Getting Started with Visual FoxPro

    Discusses how to get started, including information about installing, upgrading, and customizing Visual FoxPro to create state-of-the-art enterprise database solutions.

    What's New in Visual FoxPro

    Lists the new features and enhancements made to this version of Microsoft Visual FoxPro.

    Using Visual FoxPro

    Provides links to information on Visual FoxPro programming features that are designed to improve developer productivity, including Access and Assign methods, support for more graphic file formats, and language to simplify programming tasks.

    Developing Visual FoxPro Applications

    Includes conceptual information about how to develop Visual FoxPro applications, instructions for creating databases and the user interface, and other tasks needed to create Visual FoxPro applications.

    Programming in Visual FoxPro

    Discusses how to access the full power of Visual FoxPro by creating applications. Understanding object-oriented programming techniques and the event-driven model can maximize your programming productivity.

    Optimizing the Operating Environment

    You can optimize Visual FoxPro performance by maximizing your computer’s hardware and operating environment. The following sections describe how you can optimize these areas:

    Maximizing Memory and Virtual Memory

    Providing your computer with as much memory as possible is the most effective way to optimize your system for Visual FoxPro. You can also use memory more effectively by closing all other running applications on your computer. To maximize the use of your computer’s memory while running Visual FoxPro, follow these guidelines:

    • Do not run other Windows applications while running Visual FoxPro.
    • Use only those memory-resident programs needed for operation.
    • Simplify the screen display.

    You can free memory by simplifying the way windows and screen backgrounds display on your computer monitor.

    • Use a color or a pattern for the desktop background instead of wallpaper.
    • Use the lowest-resolution display that is practical for you. The higher the resolution of the display, the more memory your computer requires and the slower your graphics and user-interface elements appear. For VGA-compatible displays that use an extended mode driver, such as Video 7 or 8514, using the standard VGA driver ensures faster display performance but provides lower resolution and less color support.

    To increase the number of applications that you can run simultaneously, Microsoft Windows supports virtual memory by swapping the least recently used segments of code from memory to the hard disk in the form of a paging file. As a rule, the default settings in the Windows operating system for managing virtual memory meet the requirements of most users and are the recommended settings.

    Note:
    The paging file does not improve Visual FoxPro performance and is not a substitute for more memory.

    Managing Your Hard Disk

    Managing your hard disk can improve overall product speed. To get the best performance from your hard disk, provide a generous amount of free disk space. If your hard disk has little free space, you can increase Visual FoxPro performance by removing unnecessary data or by purchasing a hard disk with greater capacity.

    Disk input/output performance degrades significantly when a hard disk is nearly full. The more free hard disk space that is available, the more likely it is that contiguous blocks of disk space are available. Visual FoxPro uses this space for changes and additions to database, table, index, memo, and temporary files. Increasing free hard disk space therefore improves the performance of any commands that change or add to your files, and decreases the time required to read those files in response to your queries.

    The way that Windows and Visual FoxPro manage files on disk can greatly affect the performance of your application. The following sections discuss managing files in directories and managing temporary files.

    Managing Files in Directories

    As a directory becomes increasingly congested with files, the operating system takes longer to find files. The speed of directory searches is a factor that Visual FoxPro does not control. To improve the speed of directory searches, reduce the number of files in your directories by performing the following actions:

    • Use the Visual FoxPro Project Manager to create and manage your files, segregate program files into separate directories, and avoid creating numerous generated files.
    • When you want to distribute your application, create an application or an executable (.exe) file instead of numerous individually generated files. This decreases the number of files in your application's subdirectories and increases performance.
    • If you delete a large number of files in one directory, copy the remaining files into a new directory or optimize the directory using a defragmenting utility program. Note: Deleting files from a directory does not automatically speed directory searching. When a file is deleted, it is only marked for deletion and is still included in directory searches.
    • When saving files, use short file paths. For example, a path such as "C:\Program Files\Microsoft Visual FoxPro\…" is very long; shorter paths perform better.

    Managing Temporary Files

    Visual FoxPro creates temporary files for a variety of operations. For example, Visual FoxPro creates temporary files during editing, indexing, and sorting. Text editing sessions can also create a temporary or backup (.bak) copy of the edited file. By default, Visual FoxPro creates its temporary files in the same directory where Windows stores its temporary files, unless you specifically designate an alternate location.

    Tip:
    In most cases, you should specify one location for all Visual FoxPro temporary files. Make sure that the location you specify contains enough space for all possible temporary files.
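    As an illustration, a Config.fpw entry like the following (the path here is hypothetical) directs all Visual FoxPro temporary files to one local folder with ample free space:

        TMPFILES = C:\VfpTemp

    You can confirm which location takes effect by typing ? SYS(2023) in the Command window.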

    For more information, see How to: Specify the Location of Temporary Files.

    Searching for Temporary Files

    Visual FoxPro uses the Windows API function GetTempPath to find the path for temporary files, for example, when you use the SYS(2023) – Temporary Path function to retrieve the temporary files path, or when the TMPFILES, EDITWORK, PROGWORK, and SORTWORK settings in a Visual FoxPro configuration file do not specify a different location. GetTempPath searches a sequence of environment variables that differs depending on the operating system: Microsoft Windows 2000 and later include per-user variables that store the location of temporary files, while Microsoft Windows 95, 98, and Me include only global system environment variables for this purpose.

    On Windows 2000 and later, GetTempPath, and therefore, SYS(2023), TMPFILES, EDITWORK, PROGWORK, and SORTWORK, searches the TMP user variable for the location of temporary files by default. If the TMP user variable does not specify a location, Visual FoxPro searches the following variables in a specific order:

    • TMP system variable.
    • TEMP user variable.
    • TEMP system variable.

    If these variables do not specify a location, the location for storing temporary files defaults to the home drive and path, or the Temp folder in the user’s Documents and Settings directory.

    Note:
    If more than one value is specified for TMP or TEMP, then the first value is used.

    On Windows 95, 98, and Me, GetTempPath searches the TMP and TEMP global system variables in that order and then searches the current directory.
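    The Windows 2000 search order described above can be sketched as follows. This Python fragment is only an illustration of the documented lookup sequence, not actual Visual FoxPro or Windows code:

    ```python
    def temp_path(user_env, system_env, default_dir):
        # Documented search order on Windows 2000 and later:
        # TMP (user), then TMP (system), TEMP (user), TEMP (system),
        # and finally the default location.
        for env, name in ((user_env, "TMP"), (system_env, "TMP"),
                          (user_env, "TEMP"), (system_env, "TEMP")):
            value = env.get(name)
            if value:
                # If more than one value is specified, the first value is used.
                return value.split(";")[0]
        return default_dir

    # The TMP user variable takes precedence over all other variables.
    print(temp_path({"TMP": r"C:\UserTmp"}, {"TEMP": r"C:\SysTemp"}, r"C:\Home"))
    ```
    
    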

    For more information, see SYS(2023) – Temporary Path and Special Terms for Configuration Files.

    See Also

    Concepts

    Optimizing Visual FoxPro in a Multiuser Environment

    Optimizing Visual FoxPro Startup Speed

    Optimizing Your System

    Optimizing Visual FoxPro Startup Speed

    Although Visual FoxPro is fast by default, you can further optimize its startup and operating speed. This section describes how to enhance Visual FoxPro performance by managing startup speed and tuning key SET commands.

    Managing Startup Speed

    The time required to load and start Visual FoxPro relates to the physical size of Visual FoxPro, the length of the PATH statement in effect, the number of items to be found at startup, and other factors. You can control the load size, the search path, component file locations, and the startup SET command values of Visual FoxPro.

    Managing File Locations

    Visual FoxPro stores the FoxUser.dbf file, which contains user settings, in the user’s Application Data directory by default. You can display this location by typing ? HOME(7) in the Command window. Visual FoxPro searches for the FoxUser.dbf and Config.fpw files in the following places:

    • In the startup application or executable file, if any. For example, you can start a Visual FoxPro application by typing the following on the command line:

          VFPversionNumber.exe MyApp.app

      – or –

          VFPversionNumber.exe MyApp.exe

      If the startup application or executable file contains a Config.fpw file, that configuration file is always used. You can override settings in a Config.fpw file that is bound inside an application by specifying an external Config.fpw file with the -C command-line switch when starting an application or Visual FoxPro.
    • In the working directory.
    • Along the path established with the PATH environment variable.
    • In the directory containing Visual FoxPro.

    Controlling File Loading

    You can also speed startup by preventing Visual FoxPro from loading files you don’t plan to use. If your application does not use the FoxUser or FoxHelp file, disable them in the Config.fpw file by using the following commands:

      RESOURCE = OFF
      HELP = OFF

    Visual FoxPro seeks all other Visual FoxPro components (GENXTAB, CONVERT, and so on) only in the Visual FoxPro directory. If you place components elsewhere, you must explicitly identify the path to those components in your Config.fpw file. For example, you might specify these locations:

      _TRANSPORT = c:\migrate\transport.prg
      _GENXTAB = c:\crosstab\genxtab.prg
      _FOXREF = c:\coderefs\foxref.app

    You can use the environment variable FOXPROWCFG to explicitly specify the location of Config.fpw. For details about the FOXPROWCFG variable, see Customizing the Visual FoxPro Environment.

    Optimizing the Load Size of Visual FoxPro

    If you don’t plan on using any of the Visual FoxPro components listed previously, set them to an empty string to speed startup.

    To optimize the load size of Visual FoxPro, use the following syntax:

      cFileVariable = ""

    Replace cFileVariable with _TRANSPORT, _CONVERT, or other variables as appropriate.

    Optimizing Key SET Commands

    You can optimize the operation of Visual FoxPro by tuning the values of certain SET commands.

    The following table shows SET commands that have the greatest effect on performance, and their settings for maximum performance. You can specify SET command values by including them in the Config.fpw file, by typing them in the Command window, or by setting them in the Options dialog box.

    Command Settings for Maximum Performance

    SET Command           Performance Setting
    SET ESCAPE Command    ON
    SET OPTIMIZE Command  ON
    SET REFRESH Command   0,0
    SET SYSMENU Command   DEFAULT
    SET TALK Command      OFF
    SET VIEW Command      OFF
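    For instance, you could apply the recommendations in the table above by typing the following in the Command window. This is a sketch of the documented settings, not a required configuration:

        SET ESCAPE ON
        SET OPTIMIZE ON
        SET REFRESH TO 0,0
        SET SYSMENU TO DEFAULT
        SET TALK OFF
        SET VIEW OFF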

    See Also

    Concepts

    Optimizing Your System

    Optimizing Visual FoxPro in a Multiuser Environment

    Optimizing the Operating Environment

    Customizing the Visual FoxPro Environment

    SET ESCAPE Command

    SET REFRESH Command

    SET SYSMENU Command

    Command Window (Visual FoxPro)

    Optimizing Visual FoxPro in a Multiuser Environment

    When you run Visual FoxPro or Visual FoxPro applications in a multiuser environment, you can improve performance by managing storage of temporary files and controlling the way tables are shared.

    Managing Temporary Files

    In most multiuser environments, it is recommended that you save temporary files to a local disk or to memory when the networked computers have large amounts of free local disk space. Redirecting storage of temporary files can improve performance by reducing frequent access to the network drive.

    On small networks with older networked computers and slow hard disks, you might experience better performance by leaving Visual FoxPro temporary files on the file server; however, when in doubt, direct temporary files to the local disk. When working on large, heavily used networks, always redirect temporary files to the local disk.

    By saving all temporary files to a single directory on a local hard drive, you can safely erase the contents of that temporary file directory prior to each Visual FoxPro session. This action purges the system of any temporary files that were created but not erased by Visual FoxPro due to a system reboot or power loss.

    For more information about temporary files, see Optimizing the Operating Environment and How to: Specify the Location of Temporary Files.

    Sharing Tables

    If users share tables on a network, the way you manage access to them can affect performance.

    • Avoid opening and closing tables repeatedly.
    • Buffer write operations to tables that are not shared.
    • Provide exclusive access to tables.
    • Limit the time on locking tables.

    Providing Exclusive Access

    You can enhance performance for the APPEND, REPLACE, and DELETE commands, and for operations that run when no other users require access to the data (for example, overnight updates), by opening data files for exclusive use. When tables are open for exclusive use, performance improves because Visual FoxPro does not need to test the status of record or file locks.

    To open data files for exclusive use, use the EXCLUSIVE clause in the USE and OPEN DATABASE commands. For more information, see USE Command and OPEN DATABASE Command.
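    As a brief sketch, assuming a database named sales containing a table named customers with a hypothetical logical field named inactive:

        OPEN DATABASE sales EXCLUSIVE
        USE customers EXCLUSIVE
        && No record or file locks need to be tested during these operations.
        DELETE FOR inactive
        PACK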

    Limiting the Time on Locking Tables

    You can reduce contention between users for write access to a table or record by shortening the amount of time for locking a record or table. Instead of locking a record while the user edits it, lock the record only after it has been edited. Using optimistic row buffering provides the shortest amount of time that records are locked. For more information, see Buffering Data.
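    A minimal Command-window sketch, assuming a shared table named customers with a character field named contact; CURSORSETPROP("Buffering", 3) selects optimistic row buffering:

        USE customers SHARED
        = CURSORSETPROP("Buffering", 3)     && optimistic row buffering
        REPLACE contact WITH "New contact"  && edits go to the buffer; no lock is held
        IF !TABLEUPDATE()                   && the record is locked only while written
           = TABLEREVERT()                  && discard the buffered change on conflict
        ENDIF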

    See Also

    Concepts

    Optimizing Applications in Multiuser Environments

    Optimizing Your System

    Visual FoxPro Configuration

    The configuration of Visual FoxPro determines how your copy of Visual FoxPro looks and behaves. For example, you can establish the default locations for files used with Visual FoxPro, how your source code looks in an edit window, and the format of dates and times.

    You can make changes to the Visual FoxPro configuration that exist for the current session only (temporary), or specify them as the default settings for the next time you start Visual FoxPro (permanent). If the settings are temporary, they are stored in memory and are discarded when you quit Visual FoxPro.

    If you make permanent settings, they are stored in the Microsoft Windows registry or Visual FoxPro resource file. The Windows registry is a database that stores configuration information about the operating system, all Windows applications, OLE, and optional components such as ODBC. For example, the registry is where Windows stores the associations between file name extensions and applications so that when you click a file name, Windows can launch or activate the appropriate application.

    For an example of how to change the registry, you can examine Registry.prg in the \Samples\Classes directory, which contains numerous methods based on Windows API calls and makes it possible for you to manipulate the Windows registry.

    Similarly, Visual FoxPro stores its application-specific configuration information in the registry. When you start Visual FoxPro, the program reads the configuration information in the registry and sets the configuration according to those settings. After reading the registry, Visual FoxPro also checks for a configuration file, which is a text file in which you can store configuration settings to override the defaults stored in the registry. After Visual FoxPro has started, you can make additional configuration settings using the Options Dialog Box or SET commands. For more information, see How to: View and Change Environment Settings.

    Note:
    The run-time version of Visual FoxPro does not read the Windows registry when starting up, as registry settings are designed primarily to configure the development environment. If you intend to distribute your Visual FoxPro applications using a run-time library, you can establish configuration settings in two ways: with a configuration file, or with a program that manipulates the Windows registry on the user’s computer.

    Visual FoxPro also maintains a resource file, Foxuser.dbf, that stores information about the current state of the program when you quit. For example, the resource file contains information about the location and size of the Command window, current keyboard macros, the toolbars that are displayed, and so on. The Foxuser.dbf file is an ordinary Visual FoxPro table, which you can read and change as required by your application.
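    Because the resource file is an ordinary table, you can inspect or edit it directly. A small sketch, assuming you want to browse the file interactively (SYS(2005) returns the name of the resource file currently in use):

    ```foxpro
    * Sketch: open the resource file as an ordinary table for inspection.
    LOCAL cResource
    cResource = SYS(2005)      && full path of the current resource file
    SET RESOURCE OFF           && release the file so it can be opened directly
    USE (cResource) ALIAS foxuser
    BROWSE                     && examine the saved-state records
    USE
    SET RESOURCE ON
    ```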

    Tip:
    If the data in the Foxuser.dbf file becomes corrupted or invalid, it can cause Visual FoxPro to behave erratically. If you do not manually store anything in the table (for example, keyboard macros), deleting the table might help.

    See Also

    Concepts

    Customizing the Visual FoxPro Environment

    How to: View and Change Environment Settings

    Options Dialog Box (Visual FoxPro)

    ODBC Registry Foundation Class

    Command Window (Visual FoxPro)

    How to: Change Configuration Settings in the Windows Registry

    SET RESOURCE Command


    Help from Microsoft Website

    Microsoft Transaction Server for Visual FoxPro Developers 

    • Article
    • 06/30/2006


    Randy Brown
    Microsoft Corporation

    October 1998

    Summary: Discusses using Microsoft® Visual FoxPro® version 6.0 with MTS to develop three-tier applications. (36 printed pages).

    Contents

    Introduction
    What Is Microsoft Transaction Server?
    Why Is MTS Important for Visual FoxPro Developers?
    Creating Your First MTS Server
    Setting Up Security
    The Basic Features of MTS
    Just-In-Time Activation
    Transactions
    Programming Models
    Deployment
    Remote Deployment and Administration
    Security
    Shared Property Manager
    MTS Support for Internet Information Server
    Automating MTS Administration
    Tips and Tricks

    Sample files are associated with this technical article.

    Introduction

    No doubt you’ve heard all about Microsoft Transaction Server (MTS) and how it will make it easier for you to develop three-tier applications. This article offers a good primer on using Visual FoxPro 6.0 with MTS. We cover the basics of using MTS and then extend the discussion to using it with Visual FoxPro Component Object Model (COM) components. This document is intended to be used with the Microsoft PowerPoint® slide show included with the Visual FoxPro sample files.

    MTS is a great environment for three-tier development. However, it is not simply a matter of dropping your Visual FoxPro servers into an MTS package and expecting miracles. While much of the work is already done for you, nothing comes for free. Performance and scalability are critical factors that require well-thought-out designs. Good MTS applications are designed with MTS in mind from the start!

    This article assumes that you have MTS already installed. It is available in the Microsoft Windows NT® version 4.0 Option Pack, available from the Microsoft Web site at https://www.microsoft.com/windows/downloads/default.asp.

    In addition, you should familiarize yourself with the basics of MTS. Information is available in the Help files provided with MTS when you install the Windows NT 4.0 Option Pack.

    What Is Microsoft Transaction Server?

    MTS is a component-based transaction processing system for building, deploying, and administering robust Internet and intranet server applications. In addition, MTS allows you to deploy and administer your MTS server applications with a rich graphical tool (MTS Explorer). MTS provides the following features:

    • The MTS run-time environment.
    • The MTS Explorer, a graphical user interface for deploying and managing application components.
    • Application programming interfaces (APIs) and resource dispensers for making applications scalable and robust. Resource dispensers are services that manage nondurable shared state on behalf of the application components within a process.

    The MTS programming model provides a framework for developing components that encapsulate business logic. The MTS run-time environment is a middle-tier platform for running these components. You can use the MTS Explorer to register and manage components executing in the MTS run-time environment.

    The three-tier programming model provides an opportunity for developers and administrators to move beyond the constraints of two-tier client/server applications. You have more flexibility for deploying and managing three-tier applications because:

    • The three-tier model emphasizes a logical architecture for applications, rather than a physical one. Any service may invoke any other service and may reside anywhere.
    • These applications are distributed, which means you can run the right components in the right places, benefiting users and optimizing use of network and computer resources.

    Why Is MTS Important for Visual FoxPro Developers?

    Microsoft is investing a great amount of resources in three-tier development because of the multitude of benefits derived from this architecture. As shown in Figure 1, Tier 2, the so-called “middle tier,” represents the layer where much of the Application Services/Business Logic is stored. Visual FoxPro COM components are ideally suited for this architecture and will play a key role in this tier for many years to come. This middle tier is also where MTS lives.

    Figure 1. Web-enabled three-tier architecture

    Future applications will consist of Web-based front ends using a combination of HTML/XML. While Visual FoxPro data can be used as your database of choice for Tier 3, your applications should be written to communicate with a generic back end. This should be a test of your application’s extensibility. “How easy is it to swap back ends—let’s say a Visual FoxPro database for Microsoft SQL Server™?” There are several options, including Open Database Connectivity (ODBC) and ActiveX® Data Objects (ADO), which provide generic interfaces to data. Remember, your application should be written knowing that any or all of the three tiers can be swapped out independently of each other.

    So why is MTS great for Visual FoxPro developers? It should be clear now that the ability to swap out tier components at will makes for a great reusability story. Microsoft has a concept called total cost of ownership (TCO), which means the collective cost of providing and maintaining corporate Information Services. The three-tier model goes a long way toward reducing TCO.

    Updating the presentation layer is very easy because it merely requires users to refresh their browsers. Windows front ends consisting of Visual FoxPro/Visual Basic® forms offer more flexibility in the user interface, but updating 150 sites can be time-consuming. In addition, you can expect the UI options available in HTML to keep improving.

    The back-end data is usually the tier that changes the least. Having data managed centrally also reduces costs. Remember that data can be distributed and still managed from one location. It doesn’t have to be stored centrally to be managed centrally.

    Finally, we get to Visual FoxPro’s role in the middle tier. Middle-tier components tend to change most often because they represent business rules, which change as the needs of the business change. Traditional client/server and monolithic applications would often combine the first two layers into one. This was very inefficient because of the distribution costs in updating sites. Today, with browsers, much of this distribution problem goes away. However, business rules are often complex and can contain sensitive/secure information, so it’s not always wise to send these rules back with the HTML to a Web browser. In addition, it can impede performance.

    So, we end up with a dilemma. We want to limit the amount of information sent back to the client, but we also want to minimize the number of back and forth trips between client and server, because bandwidth is also a big consideration (more so with the Internet versus an intranet). The best solution is one involving a so-called “Smart Client.” Traditionally, the Web browser is thought of as an unintelligent client whose job is to merely display an entire static Web page. Each time something on the page changes, we need to refresh the entire Web page. With dynamic HTML (DHTML), you no longer need to do this. Only parts of the Web page affected need updating. In addition, some of the business rules can (and should) reside on the client, thus reducing round trips to the server. For example, you may want to have your client have simple data validation rules, such as one to ensure a value is not negative. It would be more efficient to perform these sorts of checks on the client. Most of the rules, especially sensitive ones, will exist on the server away from client eyes. It is also important to realize, however, that client-side business rules are subject to change almost as frequently as those on the server. The ATSWeb application (available at https://msdn.microsoft.com/vfoxpro/ats_alpha/default.htm) offers a great example of business rules being applied to both client and server.

    MTS provides an environment for hosting your Visual FoxPro middle-tier objects because it handles many of the common tasks, including resource and thread management, security, deployment, application robustness, and transactions. This leaves you, the developer, with only the responsibility of providing business logic specific to your application.

    Creating Your First MTS Server

    Let’s jump right in and create an MTS server, because it’s very simple if you already know how to create a Visual FoxPro COM component.

    Creating a Visual FoxPro COM Component

    1. Create a new project file called test1.pjx
    2. Create a new program file (PRG) called test1.prg
    3. Add the following code to this program:

       DEFINE CLASS server1 AS Custom OLEPUBLIC
          PROCEDURE hello
             RETURN "Hello World"
          ENDPROC
       ENDDEFINE

    4. Build the server as a DLL (for example, test1.dll). All MTS components must be created as in-process DLL servers. You now have a server that can be tested directly in Visual FoxPro:

       x = CREATEOBJECT("test1.server1")
       ? x.hello()

    Adding the Visual FoxPro COM Component to an MTS Package

    A package is a collection of components that run in the same process. Packages define the boundaries for a server process running on a server computer. For example, if you group a Sales component and a Purchasing component in two different packages, these two components will run in separate processes with process isolation. Therefore, if one of the server processes terminates unexpectedly (for instance, because of an application fatal error), the other package can continue to execute in its separate process.

    This section describes the task of installing the Visual FoxPro server into the MTS environment.

    1. Launch MTS Explorer.
    2. In the left pane, navigate to the Computers item and select My Computer. You are now looking at the MTS environment.
    3. Click the Packages Installed node to view all default packages installed by MTS. You can think of a Package as a set of components that perform related application functions. For example, an Inventory package might consist of two DLLs, each performing a task related to checking product inventory for a customer order.
    4. Let’s create a new package now. Select the Action -> New -> Package menu item.
    5. Click the Create an empty package button. Type in a name for your new package (for example, Foxtest1).
    6. Click the Next button, and then click the Finish button. You should now see your new package added under the Packages Installed node.
    7. Click your new package node (for example, Foxtest1). You should now see two items. The Components folder is where you add new components such as the Visual FoxPro component you just created. The Roles folder is where you set up groups of users (roles) who all share similar access privileges (security). You do not need to add anything to the Roles folder in order to use your Visual FoxPro component with MTS.
    8. Click the Components folder and select the Action -> New -> Component menu item.
    9. Click the Install new component(s) button. This will bring up the Install Components dialog box. Click the Add files button and go to the location where you created your Visual FoxPro server (for example, test1.dll). Select both the .dll and .tlb files. The .tlb file is the type library file containing properties and methods of your server. After selecting these two files, you should see your OLEPUBLIC component listed in the lower panel. Click Finish and you should see your server added to this folder.
    10. At this point, your package is complete and ready to go. Later, we will talk about setting Transaction support. This can be done from the Properties dialog box of your server.

    Accessing Your Component

    You can now test your new MTS packaged component using a command similar to the one used to test Visual FoxPro after the DLL server was first created.

    x = CREATEOBJECT("test1.server1")
    ? x.hello()
    

    That’s all you need to do! If you go back into the MTS Explorer, you should see the component represented with a spinning icon. Click the Status View to see details about the state of the object.

    Figure 2. New component viewed in MTS Explorer

    If you release the object (RELEASE x), MTS releases its reference.

    Going Forward

    We’ve just discussed the basics of installing your Visual FoxPro server in MTS. Essentially, all we did was wrap the Visual FoxPro component inside an MTS process that manages security, transaction state, fault tolerance, and other common server responsibilities. All Visual FoxPro servers used with MTS are registered this way. The remainder of the article discusses how to take advantage of MTS-specific features such as security and transactions. You can write code in your components that talk directly to the MTS run-time environment. In addition, the above process can be entirely automated, because MTS exposes an administrative Automation interface.
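    As noted, the registration steps can be scripted through the MTS Admin objects. The following is a hedged sketch, assuming the MTS 2.0 Admin object model as documented in MTS Help; the package name and file paths are examples, so verify the collection and method names against your MTS installation before relying on them.

    ```foxpro
    * Sketch: create a package and install a VFP DLL via the MTS Admin objects.
    * ProgID and collection names per MTS 2.0 Help; names/paths are examples.
    LOCAL oCatalog, oPackages, oPack, oComponents, oUtil
    oCatalog  = CREATEOBJECT("MTSAdmin.Catalog.1")
    oPackages = oCatalog.GetCollection("Packages")
    oPackages.Populate()
    oPack = oPackages.Add()
    oPack.Value("Name") = "Foxtest1"         && example package name
    oPackages.SaveChanges()
    * Drill into the new package and install the Visual FoxPro DLL server.
    oComponents = oPackages.GetCollection("ComponentsInPackage", oPack.Value("ID"))
    oUtil = oComponents.GetUtilInterface()
    oUtil.InstallComponent("c:\test\test1.dll", "c:\test\test1.tlb", "")
    ```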

    Setting Up Security

    So why are we starting out so early with security? Well, sooner or later, you’re going to fiddle with some sort of security switch and suddenly that MTS application of yours will no longer work. It’s important that you follow these instructions and refer to them later when you decide to add security to your applications.

    Note   MTS 2.0 security setup is described in the Readme document. If you have MTS installed on Microsoft Windows® 95, you can skip this section.

    Setting System Package Identity

    Before you do anything in MTS, it is a good idea to configure the system package for administrating security. When installing MTS, set the system package identity before creating any new packages as follows:

    1. Create a new local Windows NT group named “MTS Administrators” and a new local user named “MTS Administrator.”
    2. Add the “MTS Administrator” user to the “MTS Administrators” and “Administrators” groups.
    3. Set the identity of the system package to “MTS Administrator.” If this does not work, try setting this to the Administrator user.

    Note   You cannot set a package’s identity to a group.

    4. Shut down the system package so that it is restarted with the new identity. You can do this by right-clicking the My Computer icon in MTS Explorer and selecting Shut Down Server Processes.

    Adding Security for MTS Packages

    You first need to determine whether you want all or just a few components in your Package to have security. Right-click the Package and select Properties. Next, click the Security tab. Then check the Enable authorization checking check box. To enable or disable security at a component level, right-click a component and display the Properties dialog box.

    If this is all you do, an “Access is denied” error message is generated when you try to access your component. You MUST associate a valid role with any component marked for security!

    Right-click the package’s Roles folder and select New Role. Type in a functional role such as Managers, Accountants, and so on.

    The new role is added as a subfolder. Right-click this folder to Add New User (you will get a dialog box to Add Users and Groups to Role). Select the user(s) that you want to add to your role. To finish, select the Role Membership folder under each component that is marked for security and add the new role you just created by right-clicking the folder and selecting New Role.

    Note   You may still experience the “Access is denied” error message when running your components. There are a couple of possible solutions:

    • Sometimes adding a Windows NT group to a role does not work. You might try adding individual users instead.
    • The user rights for that user are not properly set. Make sure the user accounts for the identities of the system package and other MTS packages have the Windows NT “Log on as a service” user right. You can verify this by using the Windows NT User Manager:
    1. From the Policies menu, select User Rights.
    2. Click Show Advanced User Rights.

    Tips for Visual FoxPro Users

    Much of the security administration can easily be handled by Automation using the MTS Admin objects. You can set up Security administration in the AfterBuild event of a ProjectHook class you have tied to the project that generates your MTS COM DLL server. See the section “Using Visual FoxPro 6.0 Project Hooks” for examples.

    The Basic Features of MTS

    Before we jump right into using Visual FoxPro with MTS, let’s review some basic concepts that you need to know in order to make effective use of the MTS environment. For more detailed information, see MTS Help.

    Activity

    An activity is a collection of MTS objects that has a single distributed thread of logical execution. Each MTS object belongs to a single activity. This is a basic concept that describes how the middle-tier functions when confined to the MTS environment. In an MTS package, multiple clients can access objects, but only one object per client is running at a time on a single thread.

    Context

    Context is state that is implicitly associated with a given MTS object. Context contains information about the object’s execution environment, such as the identity of the object’s creator and, optionally, the transaction encompassing the work of the object. The MTS run-time environment manages a context for each object.

    As a developer, think of every Visual FoxPro object that is registered in an MTS package as having an associated Context object that is created every time you instantiate the Visual FoxPro object. So, each time you issue a CreateObject command, two objects are created—your server and its associated Context. In fact, you can return an object reference to this Context object directly in your code, as in the following example:

    #DEFINE MTX_CLASS   "MTXAS.APPSERVER.1"
    LOCAL oMTX,oContext
    oMtx = CREATEOBJECT(MTX_CLASS)
    oContext = oMtx.GetObjectContext()
    

    The Context object has the following properties and methods:

    • Count
    • CreateInstance
    • DisableCommit
    • EnableCommit
    • IsCallerInRole
    • IsInTransaction
    • IsSecurityEnabled
    • Item
    • Security
    • SetAbort
    • SetComplete

    As you can see, the properties, events, and methods (PEMs) are used to access information related to the object transaction and security context (see MTS Help for more details on specific syntax for these PEMs). It is important to understand that the Context state is inherited. An object in a package called from another object in the same package will inherit the state of its caller. Because Context is confined within the same process, state, such as security, is trusted. No object in a package needs to explicitly provide its own security. When your object is released, so is its Context.

    Package

    Packages, as we just described, are the building blocks of MTS. Think of them as mini applications—a set of components that perform related application functions. All components in a package run in the same MTS process.

    Remember, “Good MTS applications are designed with MTS in mind from the start.” You should design your package contents with your entire application in mind. Each package runs in its own process, so try to design packages that don’t attempt to do more than they absolutely need to. There are performance advantages to maintaining many components within a single package, but there may also be security constraints (roles) that dictate a different architecture.

    Packages are also the primary means of deployment. The MTS environment lets you export the contents of a package as a distributable setup (both client and server). We’ll discuss this in the “Deployment” section.

    Role

    A role is a symbolic name that defines a class of users for a set of components. Each role defines which users are allowed to invoke interfaces on a component. A role is the primary mechanism to enforce security. Role-based security is handled at the component level. It’s possible that this may be at the method level in a future version of MTS. Security cannot be enforced on the Windows 95 version of MTS.

    Roles are stored at the package level. Each component in a package can belong to one or more of the defined roles. For example, an Inventory package might contain a Visual FoxPro server whose responsibility is to handle inventory. There are two roles defined in this package: Managers and Clerks. These two roles are simply collections of Windows NT users/groups with a collective name that you provide. Your server is coded so that Clerks can access inventory data for normal order entries and reporting. Managers have additional power in that they can override inventory levels to make adjustments (for example, quarterly product shrinkage estimates).

    You can set up security so that it is automatically handled (for instance, users not in roles are given “Access is denied” error message), or you can manage it programmatically through code. The Context object’s IsCallerInRole method is ideal for this.
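    Building on the Inventory example, here is a hedged sketch of programmatic role checking inside a server method. The role name and method are illustrative examples, not part of any shipped sample.

    ```foxpro
    * Sketch: enforce a role inside a server method with IsCallerInRole.
    * "Managers" and AdjustInventory are example names from the scenario above.
    DEFINE CLASS inventory AS Custom OLEPUBLIC
       PROCEDURE AdjustInventory(tnAmount)
          LOCAL oMtx, oContext
          oMtx = CREATEOBJECT("MTXAS.APPSERVER.1")
          oContext = oMtx.GetObjectContext()
          IF oContext.IsSecurityEnabled() AND ;
                !oContext.IsCallerInRole("Managers")
             oContext.SetAbort()
             RETURN .F.        && caller is not authorized to adjust levels
          ENDIF
          * ... perform the inventory adjustment here ...
          oContext.SetComplete()
          RETURN .T.
       ENDPROC
    ENDDEFINE
    ```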

    Resource Dispensers

    A resource dispenser manages nondurable shared state on behalf of the application components within a process. Resource dispensers are similar to resource managers, but without the guarantee of durability. MTS provides two resource dispensers:

    • The ODBC resource dispenser
    • The Shared Property Manager

    Resources are shared within the same process—same process = same package. In the section “Shared Property Manager,” we discuss programmatically accessing shared properties. This is a really cool thing for Visual FoxPro developers because it allows multiple instances of objects to share state information. For example, you could have a counter that tracks the last ID number used by a database.

    ODBC resource dispenser

    The ODBC resource dispenser manages pools of database connections for MTS components that use the standard ODBC interfaces, allocating connections to objects quickly and efficiently. Connections are automatically enlisted on an object’s transactions and the resource dispenser can automatically reclaim and reuse connections. The ODBC 3.0 Driver Manager is the ODBC resource dispenser; the Driver Manager DLL is installed with MTS.

    Shared Property Manager

    The Shared Property Manager provides synchronized access to application-defined, process-wide properties (variables). For example, you can use it to maintain a Web page hit counter or to maintain the shared state for a multiuser game.
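    A hedged sketch of the counter idea, using the Shared Property Manager from a Visual FoxPro component (the ProgID is the one documented in MTS Help; the group and property names, and the enum values noted in comments, are assumptions to verify against MTS Help):

    ```foxpro
    * Sketch: share a counter across object instances in the same package.
    LOCAL oSPM, oGroup, oProp, nIso, nRel, llExists
    oSPM = CREATEOBJECT("MTxSpm.SharedPropertyGroupManager.1")
    nIso = 1          && assumed LockMethod: lock group for the method call
    nRel = 1          && assumed Process: group lives as long as the process
    llExists = .F.
    oGroup = oSPM.CreatePropertyGroup("Inventory", @nIso, @nRel, @llExists)
    oProp  = oGroup.CreateProperty("LastID", @llExists)
    oProp.Value = oProp.Value + 1    && next available ID, shared process-wide
    ? oProp.Value
    ```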

    Resource Managers

    A resource manager is a system service that manages durable data. Server applications use resource managers to maintain the durable state of the application, such as the record of inventory on hand, pending orders, and accounts receivable. Resource managers work in cooperation with the Microsoft Distributed Transaction Coordinator (MS DTC) to guarantee atomicity and isolation to an application. MTS supports resource managers, such as Microsoft SQL Server version 6.5, that implement the OLE Transactions protocol.

    The MS DTC is a system service that coordinates transactions. Work can be committed as an atomic transaction even if it spans multiple resource managers, potentially on separate computers. MS DTC was first released as part of SQL Server 6.5 and is included in MTS, providing a low-level infrastructure for transactions. MS DTC implements a two-phase commit protocol to ensure that the transaction outcome (either commit or abort) is consistent across all resource managers involved in a transaction. MS DTC ensures atomicity, regardless of failures.

    You might be asking if Visual FoxPro is a resource manager, because it has its own native database. Unfortunately, the answer is no. Visual FoxPro transactions are native to Visual FoxPro and do not go through the MS DTC. Therefore, automatic transaction support within MTS is not supported for Visual FoxPro data. You cannot use the Context object’s SetAbort method to abort a transaction if the data is stored in Visual FoxPro databases/tables. The database must either support OLE Transactions (SQL Server) or be XA-compliant (Oracle).
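    To get automatic transaction protection, the durable updates must therefore go through a resource manager such as SQL Server. A hedged sketch of a server method using Visual FoxPro’s SQL pass-through over ODBC, where the data source name and table are assumptions; the ODBC resource dispenser enlists the connection in the object’s MTS transaction:

    ```foxpro
    * Sketch: transactional update against SQL Server via SQL pass-through.
    * "mydsn" and the customer table are example names.
    DEFINE CLASS custserver AS Custom OLEPUBLIC
       PROCEDURE AddCustomer(tcID, tcName)
          LOCAL oMtx, oContext, nHandle, nResult
          oMtx = CREATEOBJECT("MTXAS.APPSERVER.1")
          oContext = oMtx.GetObjectContext()
          nHandle = SQLCONNECT("mydsn")
          nResult = -1
          IF nHandle > 0
             nResult = SQLEXEC(nHandle, ;
                "INSERT INTO customer (cust_id, name) VALUES (?tcID, ?tcName)")
             SQLDISCONNECT(nHandle)
          ENDIF
          IF nResult > 0
             oContext.SetComplete()   && vote to commit
             RETURN .T.
          ENDIF
          oContext.SetAbort()         && vote to abort; MS DTC rolls back
          RETURN .F.
       ENDPROC
    ENDDEFINE
    ```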

    Base Clients

    A base client is simply a client that runs outside of the MTS run-time environment, but instantiates MTS objects. In a three-tier architecture, a base client is typically the presentation layer, such as an application form or Web page. The base client neither knows nor needs to know that MTS is used in the middle tier. It merely creates an instance of an object that exists in an MTS package and awaits a response. The following table describes some of the differences between a base client and an MTS component, such as a Visual FoxPro DLL server.

    Base client                                     MTS component
    Can be an EXE or a DLL.                         Must be an in-process DLL.
    MTS does not manage its process.                MTS manages the server processes that host MTS components.
    MTS does not create or manage its threads.      MTS creates and manages the threads.
    Does not have an implicit Context object.       Each MTS object has its own Context object.
    Cannot use resource dispensers.                 Can use resource dispensers.

    Just-In-Time Activation

    Just-in-Time (JIT) activation is the ability to activate an MTS object only as needed for executing requests from a client. Most Visual FoxPro developers are familiar with object instantiation, as in the following code:

    myObject = CreateObject("myclass")
    myObject.myMethod()
    myObject.myProperty = 123
    RELEASE myObject
    

    A “stateful” object created by this code retains state during the lifetime of the object (until it is released). This means that property values (such as myProperty) are retained between statement execution. When the object is finally released, all object references and state are released.

    There is overhead with creating objects from your Visual FoxPro components. Each time you instantiate an object, Visual FoxPro needs to allocate a certain amount of memory. In addition, the first time you create an object, Visual FoxPro takes a little extra time to load its run-time libraries. When the last instance is released, the entire Visual FoxPro run time is also released.

    JIT activation addresses many of these memory issues that affect performance. The first thing JIT does is cache the server’s run-time libraries in memory, even when no outstanding object references exist. The first time you instantiate a Visual FoxPro server that’s in an MTS package, the Visual FoxPro run time loads into the address space of the MTS process. When you release the object, MTS still keeps the libraries in memory for a specified amount of time. You can change this setting in the package’s property sheet (default = 3 minutes). This saves having to reload the run time when the object count hits 0.

    The main thing that JIT activation offers is the ability to transform your object from “stateful” to “stateless” mode. In the preceding example, you can interpret a “stateless” object as one having the initial default settings. So, in the example, the value of myProperty would be reset to its original setting. A stateless object is managed by MTS and is very lightweight, so it consumes much less memory. The only thing keeping the stateless object alive is the object reference held onto by the client. Internally, MTS recycles threads consumed by stateful objects when they go stateless. When a method is invoked on that object, it then becomes stateful again, possibly on a thread different from the one it was originally created on.

    Putting your objects into a stateless mode is handled easily by the Context object. The following code illustrates putting an object in a stateless mode:

    #DEFINE MTX_CLASS   "MTXAS.APPSERVER.1"
    LOCAL oMTX,oContext
    oMtx = CREATEOBJECT(MTX_CLASS)
    oContext = oMtx.GetObjectContext()
    oContext.SetComplete()
    

    This code is actually called from within a method of your Visual FoxPro server. You can see if your object is stateless by viewing the status of your component in the MTS Explorer. A stateless object appears in the Objects column, but not in the Activated or In Call columns.

    Use the SetComplete method to put the object in a stateless mode. Use SetComplete for committing transactions (as we discuss in the next section, “Transactions”). You can also use SetAbort to make an object stateless.

    Again, when you change an object to stateless, all property settings revert to their original defaults. When you invoke a method (or property set/get) on this stateless object, the object is activated (goes stateful) and the object’s INIT event is fired. When you call SetComplete, the object DESTROY event is fired.

    Note   Any state that exists on the object is lost when the object is deactivated (SetComplete). If you need to save state, you should either persist information to a database or use the MTS Shared Property Manager.

    Because your object’s INIT is called whenever your object goes from Stateless to Stateful, you should try to minimize the amount of code in this event.

    Here is a simple scenario showing interaction between client and MTS server.

    Visual FoxPro server code:

    DEFINE CLASS mts2 AS Custom OLEPUBLIC
       MyColor = "Green"
       PROCEDURE InUsa (tcCustID)
          LOCAL llInUSA,oMTX,oContext
          oMtx = CreateObject("MTXAS.APPSERVER.1")
          oContext = oMtx.GetObjectContext()
          llInUSA = .F.
          USE CUSTOMER AGAIN SHARED
          LOCATE FOR UPPER(cust_id) == UPPER(tcCustID)
          IF FOUND()
             llInUSA = (ATC("USA",country)#0)
          ENDIF
          oContext.SetComplete()
          RETURN llInUSA
       ENDPROC
    ENDDEFINE
    

    The base client executes the following code:

    LOCAL oCust, cCust, lUsa
    oCust = CREATEOBJECT("vfp_mts.mts2")
    ? oCust.MyColor             && displays "Green"
    oCust.MyColor = "Red"
    ? oCust.MyColor             && displays "Red"
    cCust = "JONES"
    lUsa = oCust.InUsa(cCust)   && object goes stateless (deactivated)
    ? oCust.MyColor             && object is activated (stateful); displays "Green"
    RELEASE oCust               && object is fully released
    

    Notice in the preceding example how the state of oCust is lost after the InUsa method is called. The MyColor property no longer returns Red, but is instead reset to its original value of Green.

    Transactions

    If you have used Visual FoxPro at all, you are probably aware that Visual FoxPro supports transactions. Changes to your data can be committed or rolled back. Though transactions are critical to MTS, don’t be misled by the name; there is a lot more to it than just transactions. However, the ability to have MTS automatically handle transactions between distributed objects is quite powerful. Transactions are often discussed in terms of the ACID acronym:

    • Atomicity—ensures that either the entire transaction commits or nothing commits.
    • Consistency—a transaction is a correct transformation of the system state.
    • Isolation—protects concurrent transactions from seeing each other’s partial and uncommitted results.
    • Durability—committed updates to managed resources can survive failures.

    Note that MTS transaction support is not compatible with Visual FoxPro data; it works only with databases that support the OLE Transactions or XA protocols. Both SQL Server and Oracle data can be used with MTS in a transactional fashion.

    You should understand what we mean by a transaction and to what extent things are either committed or rolled back. Consider the following scenario (all done within the confines of two components in a single MTS package):

    1. Component A adds a new customer record to the Customer table in SQL Server.
    2. Component A writes out a new record to a Visual FoxPro database (audit log).
    3. Component A sends e-mail notification of the new customer to a manager.
    4. Component A calls Component B.
    5. Component B edits the Orders table with a new order in SQL Server.
    6. Component B writes out text log file of activity.
    7. Component B completes activity by committing the transaction (SetComplete).
    8. Component A discovers bad credit history with customer and aborts transaction (SetAbort).

    When Component B commits in step 7, not a whole lot happens because MTS manages the entire Context within the package in a distributed fashion. Component B actually inherits transaction state from Component A, so it cannot really fully commit the transaction. The real transaction terminates in step 8 when the last object with transaction state aborts. At this point, changes made to both Customer and Orders tables are rolled back because these tables are in SQL Server. Unfortunately, the Visual FoxPro table update, e-mail notification, and text log file activities are not rolled back. When a transaction is aborted/committed, only data managed through the MS DTC is affected. There is no event that is magically triggered. (Check out the MTS SDK for ideas on using Spy).

    Remember, good MTS apps are written with MTS in mind from the start. Managing transactions is very important, and while much of it is handled automatically, you will need to provide a fair amount of code to effectively manage all the resources being utilized in a transaction setting.

    Transaction support is set at the component level, but transactions can span multiple packages. You can set this option in the MTS Explorer from the component’s Property Sheet (see MTS Help for details on the various options). Again, the object’s Context manages and passes on transaction state for a given component. If the transaction setting of a component is marked as “Requires a transaction,” a transaction is always associated with the component. If another object that calls this component already has a transaction in effect, no new transaction is created. The component merely inherits the current one. A new transaction is only created if one does not already exist in the context.

    Figure 3. Setting Transaction support

    Let’s return for a minute to the SetComplete and SetAbort methods. These methods actually serve two purposes. From their names, they imply functionality related to transactions. However, as already discussed, they also serve to deactivate objects (make them stateless). In fact, these methods can be used simply for JIT activation without any concern for transactional support. Again, SetComplete releases valuable resources/memory used by MTS to allow for improved scalability. The Context object also includes several other methods useful for transactions: EnableCommit, DisableCommit, and IsInTransaction. The following example shows how to handle transactions in Visual FoxPro:

    LPARAMETER tcCustID
    LOCAL lFound,oMTX,oContext
    oMtx = CreateObject("MTXAS.APPSERVER.1")
    oContext = oMtx.GetObjectContext()
    USE CUSTOMER AGAIN SHARED
    LOCATE FOR UPPER(cust_id) == UPPER(tcCustID)
    lFound = FOUND()
    IF FOUND()
       oContext.SetComplete()
    ELSE
       oContext.SetAbort()
    ENDIF
    RETURN lFound
    

    In this scenario, we assume that another component already performed an update on another table (for example, Orders). If the customer ID in the preceding code was not found, the entire transaction would be rolled back.

    You’re probably wondering how transactions work in the code, which clearly appears to be against Visual FoxPro data. Actually, this example is using Remote Views against SQL Server data. Again, Visual FoxPro tables do not support OLE transactions, so you will not get MTS transaction support if you use DBF tables. However, data updates either to Remote Views or by SQL pass-through work just fine.
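
    Beyond SetComplete and SetAbort, the EnableCommit and DisableCommit methods mentioned earlier let a stateful object spread its work across several method calls before the transaction is allowed to commit. The following sketch is illustrative only; the class and method names are hypothetical:

    DEFINE CLASS OrderEntry AS Custom OLEPUBLIC
       PROCEDURE AddLineItem (tnAmount)
          LOCAL oMtx,oContext
          oMtx = CreateObject("MTXAS.APPSERVER.1")
          oContext = oMtx.GetObjectContext()
          * More work is coming; keep state and block any commit for now
          oContext.DisableCommit()
          * ... update the order detail here ...
       ENDPROC
       PROCEDURE FinishOrder
          LOCAL oMtx,oContext
          oMtx = CreateObject("MTXAS.APPSERVER.1")
          oContext = oMtx.GetObjectContext()
          IF oContext.IsInTransaction()
             * The updates are now consistent; allow commit and deactivate
             oContext.SetComplete()
          ENDIF
       ENDPROC
    ENDDEFINE
    

    Until FinishOrder runs, any attempt to commit the surrounding transaction aborts it, because DisableCommit flags the object's updates as inconsistent.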

    Tip   Make sure that your connection to a remote data source is made without any login dialog box. If you are using a connection stored in a DBC, ensure that the Display ODBC logins prompt is set to Never. For access to remote data through SQL pass-through commands, you can use the SQLSetProp function:

     SQLSETPROP(0, 'DispLogin', 3)
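
    For SQL pass-through access, the connection itself can be made inside the MTS method. In this sketch, the DSN name and login credentials are placeholders:

     LOCAL lnHandle
     * Suppress any ODBC login dialog for subsequent connections
     SQLSETPROP(0, 'DispLogin', 3)
     lnHandle = SQLCONNECT("MyDSN", "sa", "")   && placeholder DSN and login
     IF lnHandle > 0
        SQLEXEC(lnHandle, "UPDATE customer SET country = 'USA'")
        SQLDISCONNECT(lnHandle)
     ENDIF
    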
    

    Programming Models

    MTS supports two programming models. The TransactionContext model is intended primarily for backward compatibility. It essentially lets the base client control the transaction. The assumption is that the COM component has no MTS awareness (that is, the component was written before MTS was available). The second model is called the ObjectContext model and assumes the COM component inside the MTS package has MTS smarts and is aware of its Context object.

    TransactionContext

    We do not recommend using this model for new three-tier applications, because it has limited access to the full capabilities of MTS. It merely offers a way to provide some transaction support to applications whose middle-tier components were developed without MTS in mind. The burden of transaction handling rests on the base client. With this model, the base client is likely to be a smart client that has scripting capabilities (for example, an application form). The base client is less likely to be a Web page, and it always runs outside of the MTS run-time environment.

    The following code snippet in a Visual FoxPro form (base client) shows this model in use. The middle-tier component is a Visual FoxPro server whose ProgID is “vfp_mts.mts1”. The assumption here is that this server knows nothing about MTS, thus requiring the base client to perform all transaction handling:

    #DEFINE TRANS_CLASS   "TxCtx.TransactionContext"
    THISFORM.oContext = CreateObject(TRANS_CLASS)
    LOCAL loCust
    loCust = THISFORM.oContext.CreateInstance("vfp_mts.mts1")
    RETURN loCust.lnUSA
    

    The code in the middle tier simply does a lookup in a SQL Server table for a customer’s home country. If the record were actually changed, the base client would be able to commit or roll back the transaction. The TransactionContext object supports only three methods: CreateInstance, Commit, and Abort.
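
    Here is a minimal sketch of a base client completing the transaction itself; the UpdateCustomer method name is hypothetical:

    LOCAL oTxCtx,loCust,llOk
    oTxCtx = CreateObject("TxCtx.TransactionContext")
    loCust = oTxCtx.CreateInstance("vfp_mts.mts1")
    llOk = loCust.UpdateCustomer("JONES")   && hypothetical server method
    IF llOk
       oTxCtx.Commit()   && commit work done by all created instances
    ELSE
       oTxCtx.Abort()    && roll back the transaction
    ENDIF
    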

    ObjectContext

    The ObjectContext model is the only model you should consider for new MTS application development. It relies on component awareness of MTS, but this should be your goal so that you can optimize performance and take advantage of MTS-specific features.

    Unlike the TransactionContext object, which uses the following PROGID:

    #DEFINE TRANS_CLASS   "TxCtx.TransactionContext"
    

    the ObjectContext object can be accessed using the following code:

    #DEFINE MTX_CLASS   "Mtxas.AppServer.1"
    

    The ObjectContext object can be referenced in your Visual FoxPro code as shown here:

    LOCAL oMTX,oContext
    oMtx = CreateObject("MTXAS.APPSERVER.1")
    oContext = oMtx.GetObjectContext()
    

    It contains the following properties, events, and methods (PEMs):

    • Count—Returns the number of Context object properties.
    • CreateInstance—Instantiates another MTS object.
    • DisableCommit—Declares that the object hasn’t finished its work and that its transactional updates are in an inconsistent state. The object retains its state across method calls, and any attempts to commit the transaction before the object calls EnableCommit or SetComplete will result in the transaction being aborted.
    • EnableCommit—Declares that the object’s work isn’t necessarily finished, but its transactional updates are in a consistent state. This method allows the transaction to be committed, but the object retains its state across method calls until it calls SetComplete or SetAbort, or until the transaction is completed.
    • IsCallerInRole—Indicates whether the object’s direct caller is in a specified role (either directly or as part of a group).
    • IsInTransaction—Indicates whether the object is executing within a transaction.
    • IsSecurityEnabled—Indicates whether security is enabled. MTS security is enabled unless the object is running in the client’s process.
    • Item—Returns a Context object property.
    • Security—Returns a reference to an object’s SecurityProperty object.
    • SetAbort—Declares that the object has completed its work and can be deactivated on returning from the currently executing method, but that its transactional updates are in an inconsistent state or that an unrecoverable error occurred. This means that the transaction in which the object was executing must be aborted. If any object executing within a transaction returns to its client after calling SetAbort, the entire transaction is doomed to abort.
    • SetComplete—Declares that the object has completed its work and can be deactivated on returning from the currently executing method. For objects that are executing within the scope of a transaction, it also indicates that the object’s transactional updates can be committed. When an object that is the root of a transaction calls SetComplete, MTS attempts to commit the transaction on return from the current method.

    Deployment

    Microsoft Transaction Server offers excellent tools for deploying both client- and server-side setups. Setups are made at the package level, so you should include all components for your application in a particular package. The deployment package contains all the distributed COM (DCOM) configuration settings you need, so you don’t have to fuss with the messy DCOM Configuration dialog box.

    To create a setup

    1. Click the package for which you want to create a setup.
    2. Select Export… from the Action menu. The Export dialog box is displayed.

    Figure 4. Exporting a package

    Important   The directions in the Export dialog box are not very clear. You should not simply type in a path as specified. If you do, the Export routine creates a file with a .pak extension in the folder location you specify. Instead, you should always type a full path and file name for the .pak file, as shown in Figure 4.

    You can also use the scriptable administration objects to automate deployment and distribution of your MTS packages. See the section “Remote Deployment and Administration” later in this document for more details.

    The output of the Export operation consists of two setups:

    Server Setup

    This setup, which is placed in the folder specified in the Export dialog box, contains the .pak file and all COM DLL servers used by the package.

    Note   With Visual FoxPro servers, you will also have .tlb (type library) files included. You can install this package by selecting Install from the Package Wizard in MTS Explorer.

    Figure 5. Installing package from the Package Wizard

    Client Setup

    The Export process creates a separate subfolder named “clients” in the folder specified in the Export Package dialog box. The Clients folder contains a single .exe file that a user can double-click to run.

    The Client setup merely installs necessary files and registry keys needed by a client to access (remotely through DCOM) your MTS package and its COM servers.

    Remote Deployment and Administration

    The MTS Explorer allows you to manage remote components (those installed on a remote machine). The Remote Components folder contains the components that are registered on your local computer to run remotely on another computer. Using the Remote Components folder requires that you have MTS installed on the client machines that you want to configure. If you want to configure remote computers manually using the Explorer, add the components that will be accessed by remote computers to the Remote Components folder.

    Pushing and Pulling

    If both the server and client computer are running MTS, you can distribute a package by “pulling” and “pushing” components between one or more computers. You can “push” components by creating remote component entries on remote computers and “pull” components by adding component entries to your local computer. Once you create the remote component entries, you must add those component entries to your Remote Components folder on your local machine (pull the components).

    Before you deploy and administer packages, set your MTS server up by doing the following:

    • Configure roles and package identity on the system package.
    • Set up computers to administer.

    You must map the System Package Administrator role to the appropriate user in order to safely deploy and manage MTS packages. When MTS is installed, the system package does not have any users mapped to the administrator role. Therefore, security on the system package is disabled, and any user can use the MTS Explorer to modify package configuration on that computer. If you map users to system package roles, MTS will check roles when a user attempts to modify packages in the MTS Explorer.

    Roles

    By default, the system package has an Administrator role and a Reader role. Users mapped to the Administrator role of the system package can use any MTS Explorer function. Users that are mapped to the Reader role can view all objects in the MTS Explorer hierarchy but cannot install, create, change, or delete any objects, shut down server processes, or export packages. If you map your Windows NT domain user name to the System Package Administrator role, you will be able to add, modify, or delete any package in the MTS Explorer. If MTS is installed on a server whose role is a primary or backup domain controller, a user must be a domain administrator in order to manage packages in the MTS Explorer.

    You can also set up new roles for the system package. For example, you can configure a Developer role that allows users to install and run packages, but not delete or export them. The Windows NT user accounts or groups that you map to that role will be able to test installation of packages on that computer without having full administrative privileges over the computer.

    In order to work with a remote computer, you first need to add it to the Computers folder in the MTS Explorer:

    1. Click the Computers folder.
    2. Select New -> Computer from the Action menu.
    3. Enter the name of the remote computer.

    Important   You must be mapped to the Administrator role on the remote computer in order to access it from your machine. In addition, you cannot remotely administer MTS on a Windows 95 computer from MTS on a Windows NT server.

    You should now see both My Computer and the new remote computer under the Computers folder. At this point, you can push and pull components between the two machines. Think of the Remote Components folder as its own special package. You are merely adding to it components that exist in one or more packages of remote machines.

    The following example pulls a component from a remote machine to My Computer.

    1. Click the Remote Components folder of My Computer.
    2. Select New-> Remote Component from the Action menu to display the dialog box shown here.

    Figure 6. Adding a component to Remote Components

    In this example, we select (and add) a component called test6.foobar2 from a package called aa on the remote machine calvinh5. This package also has another component (a Visual FoxPro OLEPUBLIC class) named test6.foobar, which we do not select. When we click OK, a copy of the DLL and the type library are copied to the local machine (My Computer) and stored in a subfolder of your MTS root location (in this case, C:\Program Files\Mts\Remote\aa\). In addition, the server is now registered on your machine. Note that while the DLL is copied to your machine, the .dll registered in your registry points to the remote machine.

    If you encounter problems after you click OK, you may not have proper access rights to copy the server components. Ensure that the remote machine is configured with proper access privileges for you. At this point, you can go into Visual FoxPro running on the local machine and access the server:

    oServer = CreateObject("test6.foobar2")
    ? oServer.myeval("SYS(0)")
    

    You use MTS Explorer to view the activated object in the remote machine folder under the package it is registered in. You will not see the object activity in the Remote Components folder. See the “Working with Remote MTS Computers” topic in the MTS Help file for more details.

    Security

    Security in MTS is handled by roles. Roles are established at the package level. Components within that package can set up role memberships. The following MTS Explorer image shows a package called Devcon1, which contains three roles. Only the last two components contain Role Memberships.

    Figure 7. Package with roles

    If you navigate the Roles folder, you can see all Windows NT users or groups assigned to that particular role.

    To create a new role

    1. Click the Roles folder.
    2. Select New-> Role from the Action menu.
    3. Enter a new role name in the dialog box.

    You can add new users/groups to a particular role as follows:

    To add new users or groups

    1. Click the Users folder of the newly added role.
    2. Select New-> User from the Action menu.
    3. Select users/groups from the dialog box.

    MTS handles its security in several different ways. The MTS security model consists of declarative security and programmatic security. Developers can build both declarative and programmatic security into their components prior to deploying them on a Windows NT security domain.

    You can administer package security using MTS Explorer. This form of declarative security, which does not require any component programming, is based on standard Windows NT security. It can be applied at the package or the component level.

    Declarative Security

    You can manage Declarative security at the package and at the component level through settings available in the Security tab of the Package Properties dialog box.

    Package-level security

    Each package has its own security access authorization, which can be set in the Package Properties dialog box.

    Figure 8. Package properties

    By default, the Security check box is not marked, so you need to check this box to enable security. If you do not enable security for the package, MTS will not check roles for the component. If security is enabled, you must also enable security at the component level in order to have roles checked.

    Component-level security

    Each installed component can also have its own security setting. You set security for a component through the same Enable authorization checking check box on the Property dialog box in MTS Explorer. If you are enabling security at both levels and you do have defined roles, you must include one of the roles in the component’s Role Membership folder. If you do not include a role in the folder, you will get an “Access is denied” error message when you try to access a property or method of the component. Of course, if you do not have any roles, you will get the same error.

    Note   You can still do a CreateObject on the component, but that is all.

    oServer = CreateObject("vfp_mts.mts1")
    oServer.Hello()   && will generate an Access is denied error
    

    To restrict access to a specific component within a package, you must understand how components in the package call one another. If a component is directly called by a base client, MTS checks roles for the component. If one component calls another component in the same package, MTS does not check roles because components within the same package are assumed to “trust” one another.

    When you change the security settings for a particular package or component, you need to shut down server processes before the changes take effect. This option is available from the Action menu when the Package is selected.

    Programmatic Security

    You can put code in your program to check for specific security access rights. The following three properties and methods from the Context object return information regarding security for that package or component:

    • IsCallerInRole—Indicates whether the object’s direct caller is in a specified role (either directly or as part of a group).
    • IsSecurityEnabled—Indicates whether security is enabled. MTS security is enabled unless the object is running in the client’s process.
    • Security—Returns a reference to an object’s SecurityProperty object.

    The following method checks whether the caller is in a particular role. The IsCallerInRole method is useful when the roles are defined, but if your code is generic and doesn’t know the particular roles associated with a component, you must handle this through your error routine.

    #DEFINE MTX_CLASS   "Mtxas.AppServer.1"
    PROCEDURE GetRole (tcRole)
       LOCAL oMTX,oContext,lSecurity,cRole,lHasRole
       IF EMPTY(tcRole)
          RETURN "No Role"
       ENDIF
       oMtx = CREATEOBJECT(MTX_CLASS)
       oContext = oMtx.GetObjectContext()
       IF oContext.IsSecurityEnabled
          THIS.SkipError=.T.
          lHasRole = oContext.IsCallerInRole(tcRole)
          THIS.SkipError=.F.
          DO CASE
          CASE THIS.HadError
             THIS.HadError = .F.
             cRole="Bad Role"
          CASE lHasRole 
             cRole="Yep"
          OTHERWISE
             cRole="Nope"
          ENDCASE
       ELSE
          cRole="No Security"
       ENDIF
       oContext.SetComplete()
       RETURN cRole
    ENDPROC
    

    Advanced users can access the SecurityProperty object to obtain more details on the user for handling security. The Security object offers the following additional methods:

    • GetDirectCallerName—Retrieves the user name associated with the external process that called the currently executing method.
    • GetDirectCreatorName—Retrieves the user name associated with the external process that directly created the current object.
    • GetOriginalCallerName—Retrieves the user name associated with the base process that initiated the call sequence from which the current method was called.
    • GetOriginalCreatorName—Retrieves the user name associated with the base process that initiated the activity in which the current object is executing.
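
    As a sketch of how these methods might be used, a component could record the Windows NT account of its direct caller (this assumes the method runs inside MTS):

    LOCAL oMtx,oContext,oSecurity,cCaller
    oMtx = CreateObject("MTXAS.APPSERVER.1")
    oContext = oMtx.GetObjectContext()
    oSecurity = oContext.Security
    cCaller = oSecurity.GetDirectCallerName()
    RETURN cCaller
    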

    What type of security should you use? Programmatic security offers more power in terms of structuring specific functionality for particular roles. You can use Case statements, as in the previous example, which perform different tasks, depending on the role. Declarative security, on the other hand, can only control access at the component level (not method or lower).

    Changes to programmatic security, however, require a new build of the component, which may not always be convenient or realistic. Controlling component-level security for users and roles by using MTS Explorer to turn security on or off gives an administrator greater control. The optimal solution is one that utilizes both declarative and programmatic security in the most efficient manner.

    Shared Property Manager

    The Shared Property Manager (SPM) is an MTS resource dispenser that allows you to create and share properties across components. Because it is a resource dispenser, all other components in the same package can share information, but information cannot be shared across different packages. For example, if you want to keep a counter to use for generating unique IDs for objects in a package, you could create a Counter property to hold the latest unique ID value. This property would be preserved while the package was active (regardless of object state).

    The SPM also represents an excellent way for an object to preserve its state before being deactivated in a stateless mode (SetComplete). Just-In-Time activation does not affect or reset the state of SPM.

    The following example shows how to use the SPM with Visual FoxPro servers:

    #DEFINE MTX_CLASS        "MTXAS.APPSERVER.1"
    #DEFINE MTX_SHAREDPROPGRPMGR "MTxSpm.SharedPropertyGroupManager.1"
    PROCEDURE GetCount (lReset)
       LOCAL oCount,oSGM,oSG
       LOCAL oMTX,oContext
       LOCAL nIsolationMode,nReleaseMode,lExists
       oMtx = CREATEOBJECT(MTX_CLASS)
       oContext = oMtx.GetObjectContext()
       oSGM = oContext.CreateInstance(MTX_SHAREDPROPGRPMGR)
       nIsolationMode = 0
       nReleaseMode = 1
       
    * Get group reference in which property is contained
       oSG = oSGM.CreatePropertyGroup("CounterGroup", nIsolationMode, ;
          nReleaseMode, @lExists)
       
    * Get object reference to shared property
       oCount = oSG.CreateProperty("nCount", @lExists)
    * check if property already exists otherwise reset
       IF lReset OR !lExists
          oCount.Value = 1
       ELSE
          oCount.Value = oCount.Value + 1
       ENDIF
       RETURN oCount.Value
    ENDPROC
    

    The following settings are available for Isolation and Release modes.

    Isolation mode

    LockSetGet 0 (default)—Locks a property during a Value call, assuring that every get or set operation on a shared property is atomic. This ensures that two clients can’t read or write to the same property at the same time, but doesn’t prevent other clients from concurrently accessing other properties in the same group.

    LockMethod 1—Locks all of the properties in the shared property group for exclusive use by the caller as long as the caller’s current method is executing. This is the appropriate mode to use when there are interdependencies among properties or in cases where a client may have to update a property immediately after reading it before it can be accessed again.

    Release mode

    Standard 0 (default)—When all clients have released their references on the property group, the property group is automatically destroyed.

    Process 1—The property group isn’t destroyed until the process in which it was created has terminated. You must still release all SharedPropertyGroup objects by setting them to Nothing.

    MTS Support for Internet Information Server

    MTS includes several special system packages for use with Microsoft Internet Information Server (IIS). The Windows NT 4.0 Option Pack integrates MTS and IIS more closely. In the future, you can expect even better integration to play a more central role in your Web applications.

    IIS Support

    • Transactional Active Server Pages—You can now run scripts in Active Server Pages (ASP) within an MTS-managed transaction. This extends the benefits of MTS transaction protection to the entire Web application.
    • Crash Protection for IIS Applications—IIS Web applications can now run within their own MTS package, providing process isolation and crash protection for Web applications.
    • Transactional Events—You can embed commands in scripts on ASP pages, enabling you to customize Web application response based on transaction results.
    • Object Context for IIS Built-In Objects—The MTS object context mechanism, which masks the complexity of tracking user state information from the application developer, now tracks state information managed by IIS built-in objects. This extends the simplicity of the MTS programming model to Web developers.
    • Common Installation and Management—MTS and IIS now share common installation and a common management console, lowering the complexity of deploying and managing business applications on the Web.

    IIS System Packages

    If you use MTS with Internet Information Server version 4.0, the Packages Installed folder contains the following IIS-specific system packages.

    IIS in-process applications

    The IIS In-Process Applications folder contains the components for each Internet Information Server application running in the IIS process. An IIS application can run in the IIS process or in a separate application process. If an IIS application is running in the IIS process, the IIS application will appear as a component in the IIS In-Process Applications folder. If the IIS application is running in an individual application process, the IIS application will appear as a separate package in the MTS Explorer hierarchy.

    IIS utilities

    The IIS Utilities Folder contains the ObjectContext component required to enable transactions in ASP pages. For more information about transactional ASP pages, refer to the Internet Information Server documentation.

    Automating MTS Administration

    Microsoft Transaction Server contains Automation objects that you can use to program administrative and deployment procedures, including:

    • Installing a prebuilt package.
    • Creating a new package and installing components.
    • Enumerating through installed packages to update properties.
    • Enumerating through installed packages to delete a package.
    • Enumerating through installed components to delete a component.
    • Accessing related collection names.
    • Accessing property information.
    • Configuring a role.
    • Exporting a package.
    • Configuring a client to use Remote Components.

    You can use the following Admin objects in your Visual FoxPro code:

    • Catalog—The Catalog object enables you to connect to the MTS Catalog and access collections.
    • CatalogObject—The CatalogObject object allows you to get and set object properties.
    • CatalogCollection—Use the CatalogCollection object to enumerate, add, delete, and modify Catalog objects and to access related collections.
    • PackageUtil—The PackageUtil object enables installing and exporting a package. Instantiate this object by calling GetUtilInterface on a Packages collection.
    • ComponentUtil—Call the ComponentUtil object to install a component in a specific collection and import components registered as in-process servers. Create this object by calling GetUtilInterface on a ComponentsInPackage collection.
    • RemoteComponentUtil—Using the RemoteComponentUtil object, you can program your application to pull remote components from a package on a remote server. Instantiate this object by calling GetUtilInterface on a RemoteComponents collection.
    • RoleAssociationUtil—Call methods on the RoleAssociationUtil object to associate roles with a component or interface. Create this object by calling the GetUtilInterface method on a RolesForPackageComponent or RolesForPackageComponentInterface collection.
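
    As an illustration, the PackageUtil object can script the installation of a prebuilt package. This is only a sketch; the .pak and install paths are hypothetical, and you should consult MTS Help for the exact InstallPackage arguments:

    #DEFINE MTS_CATALOG      "MTSAdmin.Catalog.1"
    LOCAL oCatalog,oPackages,oUtil
    oCatalog = CreateObject(MTS_CATALOG)
    oPackages = oCatalog.GetCollection("Packages")
    oPackages.Populate()
    * GetUtilInterface returns the PackageUtil object for this collection
    oUtil = oPackages.GetUtilInterface()
    * Hypothetical paths; see MTS Help for the InstallPackage arguments
    oUtil.InstallPackage("C:\setup\mypack.pak", "C:\mts\mypack", 0)
    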

    In addition, the following collections are also supported:

    Collection
    LocalComputer
    ComputerList
    Packages
    ComponentsInPackage
    RemoteComponents
    InterfacesForComponent
    InterfacesForRemoteComponent
    RolesForPackageComponent
    RolesForPackageComponentInterface
    MethodsForInterface
    RolesInPackage
    UsersInRole
    ErrorInfo
    PropertyInfo
    RelatedCollectionInfo

    If you want to get a reference to a particular collection, use the GetCollection method. The following example shows, first, getting the collection of packages and, second, getting a collection of all components in the first package:

    #DEFINE MTS_CATALOG      "MTSAdmin.Catalog.1"
    oCatalog = CreateObject(MTS_CATALOG)
    oPackages = oCatalog.GetCollection("Packages")
    oPackages.populate()
    ? oPackages.Count
    oComps = oPackages.GetCollection("ComponentsInPackage", ;
       oPackages.Item(0).Key)
    oComps.Populate()
    

    Note   The GetCollection method merely returns an object reference to an empty collection. You need to explicitly call the Populate method to fill the collection.

    Collection names passed to GetCollection are case-sensitive, as in the following example code:

    oPackages = oCatalog.GetCollection("Localcomputer")   &&fails
    oPackages = oCatalog.GetCollection("LocalComputer")   &&works
    

    Note   Also keep in mind that all MTS collections are zero-based.

    oPackages = oCatalog.GetCollection("LocalComputer")
    oPackages.populate()
    ? oPackages.item[0].name
    

    See MTS Help for more specific language details.

    Visual FoxPro 6.0 is ideally suited for using MTS Automation because of its new support for Project Hooks and Application Builders.

    Using Visual FoxPro 6.0 Project Hooks

    The MTS samples posted along with this document contain a special Project Hook class designed specifically for MTS. This class automatically shuts down and refreshes the MTS-registered servers contained in that project. One issue developers face when coding and testing servers under MTS is having to repeatedly open the MTS Explorer and manually shut down processes so that servers can be rebuilt and overwritten. A Project Hook nicely automates this process. Here is sample code from the BeforeBuild event, which iterates through the Packages collection, shutting down processes.

    * BeforeBuild event
    LPARAMETERS cOutputName, nBuildAction, lRebuildAll, lShowErrors, lBuildNewGuids
    #DEFINE MTS_CATALOG      "MTSAdmin.Catalog.1"
    #DEFINE   MSG_MTSCHECK_LOC   "Shutting down MTS servers...."
    LOCAL oCatalog,oPackages,oUtil,i,j,oComps
    LOCAL oProject,lnServers,laProgIds,lcSaveExact
    THIS.lBuildNewGuids = lBuildNewGuids
    oProject = _VFP.ActiveProject
    lnServers = oProject.servers.count
    DIMENSION THIS.aServerInfo[1]
    STORE "" TO THIS.aServerInfo
    IF lnServers = 0 OR nBuildAction # 4
       RETURN
    ENDIF
    WAIT WINDOW MSG_MTSCHECK_LOC NOWAIT
    DIMENSION laProgIds[lnServers,3]
    FOR i = 1 TO lnServers
       laProgIds[m.i,1] = oProject.servers[m.i].progID
       laProgIds[m.i,2] = oProject.servers[m.i].CLSID
       laProgIds[m.i,3] = THIS.GetLocalServer(laProgIds[m.i,2])
    ENDFOR
    ACOPY(laProgIds,THIS.aServerInfo)
    * Shutdown servers
    oCatalog = CreateObject(MTS_CATALOG)
    oPackages = oCatalog.GetCollection("Packages")
    oUtil = oPackages.GetUtilInterface
    oPackages.Populate()
    lcSaveExact = SET("EXACT")
    SET EXACT ON
    FOR i = 0 TO oPackages.Count - 1
       oComps = oPackages.GetCollection("ComponentsInPackage", ;
          oPackages.Item(m.i).Key)
       oComps.Populate()
       FOR j = 0 TO oComps.Count - 1
          IF ASCAN(laProgIds, oComps.Item(m.j).Value("ProgID")) # 0
             oUtil.ShutdownPackage(oPackages.Item(m.i).Value("ID"))
             EXIT
          ENDIF
       ENDFOR
    ENDFOR
    WAIT CLEAR
    SET EXACT &lcSaveExact
    * User is building new GUIDs, so packages 
    * need to be reinstalled manually
    IF lBuildNewGuids
       RETURN
    ENDIF
    

    This is only one of the many possibilities provided by a Visual FoxPro Project Hook. The MTS Admin objects can save a great deal of time you normally would spend manually setting options in the MTS Explorer.

    Using Visual FoxPro 6.0 Application Builders

    As with the Project Hooks, you might also want to create an Application (Project) Builder that handles registration of Visual FoxPro Servers in MTS packages. The Visual FoxPro MTS samples include such a builder. (See the Readme file in the mtsvfpsample sample application for more details on setup and usage of these files.)

    This Builder simply enumerates through all the servers in your Visual FoxPro project and all the available MTS packages. You can then select (or create) a particular package and registered server to install in that package. Additionally, you can set the Transaction property for each component. The Visual FoxPro code called when the user clicks OK is as follows:

    #DEFINE   MTS_CATALOG      "MTSAdmin.Catalog.1"
    #DEFINE   ERR_NOACTION_LOC   "No action taken."
    LOCAL oCatalog,oPackages,oUtil,i,j,oComps,nPos,lcPackage
    LOCAL lPackageExists,oCompRef
    LOCAL oProject,lnServers,laProgIds,lcSaveExact,oPackageRef,lctrans
    lcPackage = ALLTRIM(THIS.cboPackages.DisplayValue)
    lPackageExists = .f.
    SELECT mtssvrs
    LOCATE FOR include
    IF !FOUND() OR EMPTY(lcPackage)
       MESSAGEBOX(ERR_NOACTION_LOC)
       RETURN
    ENDIF
    THIS.Hide
    oCatalog = CreateObject(MTS_CATALOG)
    oPackages = oCatalog.GetCollection("Packages")
    oPackages.Populate()
    FOR i = 0 TO oPackages.Count-1
       IF UPPER(oPackages.Item(m.i).Name) == UPPER(lcPackage)
          oPackageRef = oPackages.Item(m.i)
          lPackageExists=.T.
          EXIT
       ENDIF
    ENDFOR
    IF !lPackageExists   &&creating new package
       oPackageRef = oPackages.Add
       oPackageRef.Value("Name") = lcPackage
       oPackages.SaveChanges
    ENDIF
    oComps = oPackages.GetCollection("ComponentsInPackage", ;
       oPackageRef.Key)
    oUtil = oComps.GetUtilInterface
    SCAN FOR include
       oUtil.ImportComponentByName(ALLTRIM(progid))
    ENDSCAN
    oPackages.SaveChanges()
    oComps.Populate()
    SCAN FOR include
       DO CASE
       CASE trans = 1
          lctrans = "Supported"
       CASE trans = 2
          lctrans = "Required"
       CASE trans = 3
          lctrans = "Requires New"
       OTHERWISE
          lctrans = "Not Supported"         
       ENDCASE
       FOR j = 0 TO oComps.Count-1
          IF oComps.Item(m.j).Value("ProgID")=ALLTRIM(progid)
             oCompRef = oComps.Item(m.j)
             oCompRef.Value("Transaction") = lctrans
          oCompRef.Value("SecurityEnabled") = ;
             IIF(THIS.chkSecurity.Value, "Y", "N")
          ENDIF
       ENDFOR
    ENDSCAN
    oComps.SaveChanges()
    oPackages.SaveChanges()
    

    Tips and Tricks

    Hopefully, this article offers enough insight into creating Visual FoxPro components that work well with your three-tier MTS applications. Here are a few final tips to consider:

    • Design your components with MTS in mind from the start.
    • Components must be in-process DLLs. Do not use Visual FoxPro EXE servers.
    • When adding Visual FoxPro components, make sure to select both .dll and .tlb files.
    • In the Project Info dialog box of Visual FoxPro DLL servers, set Instancing to MultiUse.
    • Don’t be afraid to mix with other components (for example, Visual Basic servers).
    • You must have DTC running for transaction support.
    • Call SetComplete regardless of whether you’re using transactions, because it places objects in stateless mode.
    • Your MTS object has an associated Context object. Do not place Context object code in the base client.
    • Connections must have DispLogin set to Never; for SQL pass-through, use SQLSetProp(0).
    • Minimize the number of PEMs on an object (protect your PEMs).
    • Because of page locking issues, limit the length of time you leave SQL Server 6.5 transactions uncommitted.
    • To use security, you must have a valid role associated with the component.
    • Avoid using CreateInstance on non-MTS components.
    • Do not pass object references of the Context object outside of the object itself.
    • Consider using disconnected ADO recordsets to move data between tiers.
    • You can pass Visual FoxPro data in strings, arrays, or ADO recordsets.
    • Passing parameters:
      • Be careful when passing parameters.
      • Always use a SafeArray when passing object references.
      • Passing by value:
        • Fastest and most efficient
        • Copies the parameters into a buffer
        • Sends all values at once
      • Passing by reference:
        • Sends a reference, but leaves the object back on the client.
        • Each access to the parameter marshals back to the client machine.
    • Always read the Late Breaking News! It contains important information such as Security configuration details.
    • Visit the Microsoft MTS Web site at www.microsoft.com/com/ for more information.
    • By default, MTS will create a maximum of 100 apartment threads for client work (per package). In Windows NT 4.0 Service Pack 4 (and later), you can tune the MTS activity thread pool. This will not affect the number of objects that can be created. It will simply configure the number that can be simultaneously in call. To tune the MTS activity thread pool:
      1. Open your Windows Registry using RegEdit and go to the package key: HKLM\Software\Microsoft\Transaction Server\Package\{your package GUID}
      2. Add a REG_DWORD named value: ThreadPoolMax
      3. Enter a value for ThreadPoolMax. Valid values are 0 to 0x7FFFFFFF.
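    The registry steps above can also be scripted from Visual FoxPro through the Windows Script Host Shell object. This is only a sketch: it assumes WSH is installed, the package GUID shown is a placeholder you would replace with your own, and the value 200 is an arbitrary example.

    LOCAL oShell, lcKey
    oShell = CreateObject("WScript.Shell")
    lcKey = "HKLM\Software\Microsoft\Transaction Server\Package\" + ;
       "{00000000-0000-0000-0000-000000000000}\ThreadPoolMax"
    oShell.RegWrite(lcKey, 200, "REG_DWORD")   && e.g., up to 200 threads in call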



    Lisa Slater Nicholls

    October 1998

    Summary: Describes how the Microsoft® Visual FoxPro® version 6.0 Application Framework, including the Application Wizard and Application Builder, can be used by the beginning developer to turn out polished applications and customized by the more experienced developer to create more detailed applications. (32 printed pages)

    Contents

    Overview
    Examining Framework Components
    Designating the Classes You Want
    Specifying Your Own Framework Components
    A Closer Look at the Standard Application Wizard
    A New Application Wizard
    A Few Parting Thoughts about Team Practices
    Appendix 1
    Appendix 2
    Appendix 3
    Appendix 4

    The appfrmwk sample application discussed in this article is available for download.

    Overview

    The Visual FoxPro 6.0 Application Framework offers a rapid development path for people with little experience in Visual FoxPro. With a few simple choices in the Application Wizard and the Application Builder, beginning developers can turn out polished and practical applications.

    Under the hood, however, the framework offers experienced developers and teams much more. This article shows you how to adapt the framework components so they fit your established Visual FoxPro requirements and practices.

    In the first section of this article you’ll learn about the files and components that support the framework and how they work together while you develop an application. This information is critical to moving beyond simply generating framework applications to experimenting with framework enhancements.

    The second section teaches you how to apply your experiences with the framework to multiple applications. After you’ve experimented with framework enhancements for a while, you will want to integrate your changes with the framework, for standard use by your development team. By customizing the files the Application Wizard uses to generate your application, you’ll make your revisions accessible to team members—without sacrificing the framework’s characteristic ease of use.

    Examining Framework Components

    This section shows where the framework gets its features and components, and how these application elements are automatically adjusted during your development process.

    Once you see how and where framework information is stored, you can begin to try different variations by editing the versions generated for a framework application. When you’re satisfied with your changes, you can use the techniques in the next section to migrate them to your team’s versions of the framework components.

    Note   Like most Visual FoxPro application development systems, the framework is composed of both object-oriented programming (OOP) class components and non-OOP files. This distinction is important because you adapt these two types of components in different ways; classes can be subclassed, while non-OOP files must be included as is or copied and pasted to get new versions for each application. The framework is minimally dependent on non-OOP files, as you’ll see here, but these files still exist.

    Throughout this article we’ll refer to the non-OOP framework files as templates, to distinguish these components from true classes.

    Framework Classes

    The Visual FoxPro 6.0 framework classes are of two types:

    1. Framework-specific classes. These classes have been written especially for the application framework and provide functionality specific to the framework. The standard versions of these classes are in the HOME( )+ Wizards folder, in the _FRAMEWK.VCX class library.
    2. Generic components. These features come from class libraries in the HOME( )+ FFC (Visual FoxPro Foundation Classes) folder.

    _FRAMEWK.VCX

    The _FRAMEWK.VCX class library (see Figure 1) contains all the classes written specifically to support the framework. Each framework application you create has an application-specific VCX containing subclasses of the _FRAMEWK.VCX components. The Application Wizard puts these subclasses in a class library named <Your projectname> plus a suffix to designate this library as one of the wizard-generated files. To distinguish these generated, empty subclasses, it adds a special prefix to the class names as well.

    Figure 1. _FRAMEWK.VCX framework-specific class library, as viewed in Class Browser, is found in the HOME( )+ Wizards folder.

    Framework superclass: _Application

    The _Application class is a required ancestor class, which means that this class or a subclass of this class is always required by the framework. This class provides application-wide manager services. For example, it manages a collection of modeless forms the user has opened.

    You designate a subclass of _Application simply by using CREATEOBJECT( ) or NEWOBJECT( ) to instantiate the subclass of your choice. (By default, the framework provides a main program to do this, but this PRG contains no required code.) When your designated _Application subclass has instantiated successfully, you call this object’s Show( ) method to start running the application.
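    A minimal main program, then, might look like the following sketch (the class and library names here are hypothetical):

    LOCAL oApp
    oApp = NEWOBJECT("MyApplication", "MyApp.vcx")   && your _Application subclass
    IF VARTYPE(oApp) = "O"
       oApp.Show()   && start running the application
    ENDIF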

    Note   In this article, we’ll refer to the object you instantiate from a subclass of _Application as the application object. We’ll continue to refer to “your subclass of _Application” to mean the class definition from which this object is instantiated, which will be in a VCX belonging to your application (not _FRAMEWK.VCX). You’ll also see references to “_Application” that refer specifically to code and properties you’ll find in the superclass located in _FRAMEWK.VCX.

    At run time, the application object instantiates other objects as necessary to fill all the roles represented by the other classes in _FRAMEWK.VCX except _Splash. The framework identifies these roles as important to various application functions, but, as you’ll see in this section, you have full control over how the roles are carried out.

    Note   The _Splash class is an anomaly in _FRAMEWK.VCX; it isn’t instantiated or used by the framework application directly. (If it were instantiated by the application object, your splash screen would appear too late to be useful.) Instead, _Splash merely provides a default splash screen with some of the same attributes as _Application (for example, your application name and copyright). The Application Builder transfers these attributes to your application’s subclass of _Splash at the same time it gives them to your application’s subclass of _Application, so they stay synchronized. The default main program delivered with a framework gives you one way to instantiate this splash screen before you instantiate your application object.

    You certainly don’t need to use the method shown in the default main program for your splash screen. In fact, many applications do not need a splash screen at all. For those that do, you may prefer to use the Visual FoxPro -b<file name> command-line switch, which displays a bitmap of your choice during startup, rather than a Visual FoxPro form of any description.

    Framework superclass: _FormMediator

    You’ll grasp most of the “roles” played by the subsidiary classes in _FRAMEWK.VCX easily, by reading their class names and descriptions. (If you can’t read the full class description when you examine _FRAMEWK.VCX classes in a project, try using the Class Browser.) However, you’ll notice a _FormMediator class whose purpose takes a little more explaining.

    You add an object descended from the _FormMediator custom class to any form or form class, to enable the form to communicate efficiently with the application object. This section will show you several reasons the form might want to use services of the application object. With a mediator, your form classes have access to these services, but the forms themselves remain free of complex framework-referencing code.

    The _FormMediator class is low-impact. It doesn’t use a lot of resources, and its presence will not prevent your forms from being used outside a framework application. Using this strategy, the framework can manage any forms or form classes your team prefers to use, without expecting them to have any special inheritance or features.

    Like _Application, _FormMediator class is a required ancestor class. You can create other mediator classes, as you can subclass _Application to suit your needs, but your mediators must descend from this ancestor.

    We’ll refer to _FormMediator and its descendants as the mediator object, because (strictly speaking) your forms will see it as the “application mediator” while the application object treats it as a “form mediator.”

    The Visual FoxPro 6.0 Form Wizards create forms designed to take advantage of mediators when the framework is available. You can see some simple examples of mediator use in the baseform class of HOME( )+ Wizards\WIZBASE.VCX.

    Examine _FormMediator’s properties and methods, and you’ll see that you can do much more with the mediator in your own form classes. For example, the application object calls mediator methods and examines mediator properties during its DoTableOutput( ) method. (This method allows quick output based on tables in the current data session.) Your mediator for a specific form could:

    • SELECT a particular alias to be the focus of the output.
    • Prepare a query specifically for output purposes (and dispose of it after the output).
    • Inform the application object of specific classes and styles to be used by _GENHTML for this form.
    • Change the output dialog box caption to suit this form.

    The mediator also has methods and properties designed to specify context menus for the use of a particular form. If the application object receives this information from the mediator, it handles the management of this menu (sharing it between forms as necessary).

    You’ll find one example of mediator use in the ErrorLogViewer class. (This use is described in Appendix 1, which covers the options system.) A full discussion of the _FormMediator class is beyond the scope of this document. The more information you give a mediator or mediator subclass, however, the more fully your forms can use framework’s features, without making any significant changes to the forms themselves.

    Note   The _Application class includes a property, lEnableFormsAtRuntime (defaulting to .T.), which causes the application object to add mediators at run time to any form not having a mediator of its own. You can specify the mediator subclass that the application adds to a form at run time. Keep in mind, however, that mediators added at design time will have a more complete relationship with their form containers, because these forms can include code referencing their mediator members. During a form’s QueryUnload event, for example, the form can use the mediator to determine whether the form contains any unconfirmed changes. Without code in the form’s QueryUnload method, the mediator can’t intercede at this critical point.

    Additional _FRAMEWK.VCX classes

    The other classes in _FRAMEWK.VCX are all dialog box and toolbar classes that perform common functions within an application. None of these classes are required ancestors; you can substitute your own user interfaces and class hierarchies for these defaults at will. Two of them (_Dialog and _DocumentPicker) are abstract; that is, they are never instantiated directly, existing only to provide properties and methods to their descendant classes. Others will not instantiate unless you pick specific application characteristics. For example, if you don’t write “top form” applications (MDI applications in their own frames) you will never use _TopForm, the _FRAMEWK.VCX class that provides the MDI frame window object.

    Once you have examined these classes, and identified their roles, you will know which ones supply the types of services you need in applications you write—and, of these, you will identify the ones you wish to change.

    Designating the Classes You Want

    For each class role identified by the framework, the application object uses corresponding xxxClass and xxxClassLib properties to determine the classes you want. To change which class is instantiated for each role, you change the contents of these properties in your subclass of _Application.

    For example, _Application has cAboutBoxClass and cAboutBoxClassLib properties, and it uses these properties to decide what dialog box to show in its DoAboutBox( ) method (see Figure 2).

    Figure 2. Class and ClassLib property pairs in the _Application object

    If you fill out a class property but omit the matching Classlib property, _Application assumes that your designated class is in the same library as the _Application subclass you instantiated. If your _Application subclass is in the MyApplication.vcx and cAboutBoxClass has the value “MyAboutBox” but cAboutBoxClassLib is empty, a call to the Application object’s DoAboutBox( ) method instantiates a class called MyAboutBox in MyApplication.vcx.
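    As a sketch (all names hypothetical), the lookup described above works like this:

    * MyApplication.vcx contains the _Application subclass and MyAboutBox.
    * At design time, the subclass sets:
    *    cAboutBoxClass    = "MyAboutBox"
    *    cAboutBoxClassLib = ""        && empty: same VCX as the subclass
    oApp = NEWOBJECT("MyApplication", "MyApplication.vcx")
    oApp.DoAboutBox()   && instantiates MyAboutBox from MyApplication.vcx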

    If you call the method instantiating one of the subsidiary classes when the matching class property is empty, _Application attempts to provide appropriate behavior to the specific situation. For example, if the cAboutBoxClass property is empty, DoAboutBox( ) will simply do nothing, because it has no alternative. By contrast, if the cErrorViewerClass property is empty, the _Application DisplayErrorLog( ) method will ask its cusError member object to use its default error log display instead.

    Except for the cMediatorClass and cMediatorClassLib properties, which must specify a class descending from _FormMediator in _FRAMEWK.VCX, remember that there are no restrictions on these dialog boxes and toolbars. You don’t have to subclass them from the classes in _FRAMEWK.VCX, or even follow their examples, in your own classes fulfilling these framework roles.

    Even when you design completely different classes, you will still benefit from investigating the defaults in _FRAMEWK.VCX, to see how they take advantage of their relationship with the framework. For example, all the classes descended from _Dialog have an ApplyAppAttributes( ) method. When the framework instantiates these classes, it checks for the existence of this method. If the ApplyAppAttributes( ) method exists, the application object passes a reference to itself to the form, using this method, before it calls the Show( ) method. In this way, the dialog box can derive any framework-specific information it needs before it becomes visible. For instance, the About Box dialog box might adjust its caption using the _Application.cCaption property.

    If the ApplyAppAttributes( ) method does not exist in your cAboutBoxClass class, no harm is done. The _Application code still tries to harmonize your dialog box with its interface, in a limited way, by checking to see whether you’ve assigned any custom value to its Icon property. If you haven’t, _Application assigns the value in its cIcon property to your dialog box’s icon before calling its Show( ) method.

    Note   This strategy typifies the framework’s general behavior and goals:

    • It tries to make the best use of whatever material you include in the application.
    • When possible, it does not make restrictive assumptions about the nature of this material.
    • It avoids overriding any non-default behavior you may have specified.

    Investigating the default _Options dialog box class and _UserLogin default dialog boxes will also give you insight into the _Application options and user systems. While the dialog boxes themselves are not required, you will want to see how they interact with appropriate _Application properties and methods, so your own dialog boxes can take advantage of these framework features. In particular, the _Application options system has certain required elements, detailed in Appendix 1.

    FoxPro Foundation Generic Classes

    You may be surprised that _FRAMEWK.VCX contains only two required classes (the application and mediator objects), and in fact even when you add the other subsidiary classes, _FRAMEWK.VCX doesn’t contain much of the functionality you may expect in a Visual FoxPro application. You will not find code to perform table handling. You won’t find dialog boxes filling standard Visual FoxPro roles, such as a dialog box to select report destinations. You won’t find extensive error-handling code.

    _FRAMEWK.VCX doesn’t include this functionality because there is nothing framework-specific about these requirements. Instead, it makes use of several Visual FoxPro Foundation Classes libraries, useful to any framework or application, to perform these generic functions. The _Application superclass contains several members descending from FFC classes, and it instantiates objects from other FFC classes at run time as necessary. Then it wraps these objects, setting some of their properties and adding some specific code and behavior to make these instances of the FFC classes especially useful to the framework.

    For example, _Application relies on its cusError member, descended from the _Error object in FFC\_APP.VCX, to do most of its error handling, and to create an error log. However, as mentioned earlier, _Application code displays the error log using a framework-specific dialog box. The application object also sets the name and location of the error log table to match its own needs, rather than accepting _Error‘s default.

    The framework uses four FFC class libraries: _APP.VCX, _TABLE.VCX, _UI.VCX, and _REPORTS.VCX. Figure 3 shows these libraries in Class Browser views, as well as in a Classes tab for a framework application project.

    Figure 3. A framework application uses generic Visual FoxPro Foundation Classes, from HOME( )+ FFC folder, to supplement the framework-specific classes in _FRAMEWK.VCX.

    Unlike the subsidiary classes in _FRAMEWK.VCX, the FFC classes and their complex attributes are used directly by _Application, so you don’t specify alternative classes or class libraries for these objects. You can still specify your own copies of these class libraries, as you’ll see in the next section.

    If you examine the Project tab in Figure 3, or the project for any framework application, you’ll find this list of libraries built in. You’ll see _FRAMEWK.VCX, and there will be at least one class library containing the subclasses of _FRAMEWK.VCX for this application.

    You’ll see one more FFC library: _BASE.VCX, which contains the classes on which _FRAMEWK.VCX and all the FFC libraries are based. Your framework project must have access to a library called _BASE.VCX, containing all the classes found in the shipped _BASE.VCX. However, neither the framework nor the four FFC class libraries it uses require any specific behavior or attributes from these classes. You are free to create an entirely different _BASE.VCX with classes of the same name, perhaps descending from your team’s standard base library.

    Framework Templates

    The framework templates are of three types:

    1. Menu templates, a collection of Visual FoxPro menu definition files (.mnx and .mnt extensions)
    2. Metatable, an empty copy of the table the framework uses to store information about the documents (forms, reports, and labels) you use in your application
    3. Text, a collection of ASCII supporting files

    Unlike the .vcx files used by the framework, Visual FoxPro doesn’t deliver separate versions of these templates on disk. Because the templates are copied, rather than subclassed, for framework applications, the templates don’t need to be available to your project as separate files. Instead, these items are packed into a table, _FRAMEWK.DBF, found in the HOME( )+ Wizards folder. The Application Wizard unpacks the files when it generates your new application (see Figure 4).

    Figure 4. The Application Wizard copies template files from this _FRAMEWK.DBF table in HOME( )+ Wizards folder.

    Because the files don’t exist on disk, their template file names are largely irrelevant, except to the Application Wizard. Although we’ll use the template names here, keep in mind that their copies receive new names when the Wizard generates your application.

    Just as the framework identifies “dialog box roles” and supplies sample dialog boxes to fill those roles, it identifies some “menu roles,” and comes equipped with standard menus to meet these requirements. The roles are startup (the main menu for your application) and navigation (a context menu for those forms you identify as needing navigation on the menu).

    There are three template startup menus, each corresponding to one of the three application types described by the Application Builder: normal, top form, and module. T_MAIN.MNX is a standard “replace-style” Visual FoxPro menu. It’s used for normal-style applications, which take over the Visual FoxPro environment and replace _MSYSMENU with their own menu. T_TOP.MNX, for top form applications, looks identical to T_MAIN.MNX, but has some code changes important to a menu in an MDI frame. T_APPEND.MNX is an “append-style” menu, characteristic of modules, which are applications that add to the current environment rather than controlling it.

    There is one navigation menu template, T_GO.MNX. Its options correspond to the options available on the standard navigation toolbar (_NavToolbar in _FRAMEWK.VCX).

    Note   Because both T_GO.MNX and T_APPEND.MNX are “append-style” menus, they can exist as part of either _MSYSMENU or your top form menu. The Application Builder synchronizes your copy of T_GO.MNX to work with your normal- or top form-type application. However, if you change your application type manually rather than through the Application Builder, or if you want a module-type application that adds to an application in a top form, you may need to tell these menus which environment will hold them.

    You make this change in the General Options dialog box of the Menu Designer (select or clear the Top-Level Form check box). If you prefer, you can adjust the ObjType of the first record in the MNX programmatically, as the Application Builder does. See the UpdateMenu( ) method in HOME( )+ Wizards\APPBLDR.SCX for details.

    Like the document and toolbar classes in _FRAMEWK.VCX, the menu templates are not required. They simply provide good examples, and should give you a good start on learning how to use menus in a framework application.

    In particular, you’ll notice that the menus do not call procedural code directly, only application object methods. This practice ensures that the code is properly scoped, regardless of whether the MPR is built into an app, or whether the .app or .exe holding the MPR is still in scope when the menu option runs.

    Because Visual FoxPro menus are not object-oriented, they can’t easily hold a reference to the application object. To invoke application object methods, the menus use the object’s global public reference. This reference is #DEFINEd as APP_GLOBAL, in an application-specific header file, like this:

    #DEFINE APP_GLOBAL              goApp
    

    Here is an example menu command using the #DEFINEd constant (the Close option on the File menu):

    IIF(APP_GLOBAL.QueryDataSessionUnload( ), APP_GLOBAL.ReleaseForm( ), .T.)
    

    Each template menu header #INCLUDEs this header file. You can change the #DEFINE and recompile, and your menus will recognize the new application reference.

    Note   The application object can manage this public reference on its own (you don’t need to declare or release it). It knows which variable name to use by consulting its cReference property, which holds this name as a string. You can either assign the value in the program that instantiates your application object (as shown in the default main program) or you can assign this string to the cReference property of your _Application subclass at design time.
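    For example, the program that instantiates the object might read as follows (the class, class library, and variable names here are hypothetical; the default main program performs the equivalent steps):

    LOCAL loApp
    loApp = NEWOBJECT("MyApplication", "MYAPP.VCX")
    * Assigning cReference lets the application object create and
    * manage the public reference goMyApp on its own.
    loApp.cReference = "goMyApp"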

    The template menus are the only part of the framework using this global reference. If you wish, your forms and other objects can use the reference, too, but there are rarely good reasons to do this. Before you opt to use the global reference, think about ways you might pass and store a reference to the application object in your forms instead. If your forms have mediator objects, they have a built-in method to receive this reference any time they need it.

    Metatable Template

    _FRAMEWK.DBF contains records for T_META.DBF/FPT/CDX, the table holding information about documents for your application. Records in this table indicate whether a document should be treated as a “form” or “report”—and you can create other document types on your own.

    The document type designation is used by the framework dialog boxes descending from _DocumentPicker, to determine which documents are displayed to the user at run time. For example, the _ReportPicker dialog box will not display documents of “form” type, but the _FavoritePicker dialog box displays both forms and reports.

    However, the document type specified in the metatable does not dictate file type. A “report” type document might be a PRG that calls a query dialog box and then runs a report based on the results.

    The Application Builder creates and edits metatable records when you use the Builder to add forms and documents to the application. If you manually add a form or document to a framework project, the Project Hook object invokes the Builder to ask you for details about this document and fill out the metatable accordingly. Of course, you can also add records to the metatable manually.

    The Application Builder and the _FRAMEWK.VCX dialog boxes descending from _DocumentPicker rely on the default structure of this metatable. (You’ll find its structure detailed in Appendix 2.) The dialog boxes derive from this table the information they need to invoke each type of document, including the options you’ve set in the Application Builder for each document. (Appendix 3 gives you a full list of _DocumentPicker subclasses and their assigned roles.)

    Just as you don’t have to use the _DocumentPicker dialog boxes, you don’t have to use the default metatable structure in a framework application. If you like the idea of the table, you could design a different structure and use it with dialog boxes with different logic to call the _Application methods that start forms and reports.

    Note   If you design a metatable with a different structure from the default, the application object can still take care of it for you. On startup, the metatable is validated for availability and appropriate structure. Once the metatable is validated, the application object holds the metatable name and location so this information is available to your application elements later, even though the application object makes no use of the metatable directly.

    Edit your _Application subclass’s ValidateMetatable( ) method to reflect your metatable structure if it differs from the default. No other changes to the standard _Application behavior should be necessary to accommodate your metatable strategy.
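    Suppose your custom metatable uses its own document fields. A sketch of the override might look like this (the alias and field names cDocFile and cDocType are hypothetical stand-ins for your structure; only the method name comes from the framework):

    PROCEDURE ValidateMetatable
       * Verify that the designated metatable contains the fields
       * our custom picker dialog boxes expect.
       IF TYPE("MetaAlias.cDocFile") # "C" ;
          OR TYPE("MetaAlias.cDocType") # "C"
          RETURN .F.   && structure does not match; fail validation
       ENDIF
       RETURN .T.
    ENDPROC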

    You can also dispense entirely with a metatable in a framework application. No part of the framework, except the _DocumentPicker dialog boxes, expects the metatable to be present.

    For instance, you might have no need for the dialog boxes or data-driven document access in a simple application. In this case, you can eliminate the metatable and invoke all your reports and forms directly from menu options. Simply provide method calls such as APP_GLOBAL.DoForm( ) and APP_GLOBAL.DoReport( ) as menu bar options. Fill out the arguments in these methods directly in the command code for each menu option, according to the requirements of each form and report.
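    For example, a pair of menu bar commands, with hypothetical file names and assuming each method takes the document name as its first argument, might read:

    APP_GLOBAL.DoForm("CUSTOMER.SCX")
    APP_GLOBAL.DoReport("SALES.FRX")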

    Additional Text Templates

    _FRAMEWK.DBF holds copies of some additional text files copied for your application’s use.

    T_START.PRG is the template for the program that initializes your application object and shows the splash screen. Its behavior is well documented in comments you’ll find in the application-specific header file, described later. In addition, as mentioned earlier, this startup program is not required. The program that creates your application object does not have to be the main program for your application, nor does it have to do any of the things that T_START.PRG does.

    For example, suppose your application is a “module type,” handling a particular type of chore for a larger application. Because it is a module, it does not issue a READ EVENTS line or disturb your larger application’s environment. It may or may not need to use the framework’s user log on capabilities; you may have set up a user logging system in the outer program. The outer application may be a framework application, or it may not. All these things will help you decide what kind of startup code you need for this application object.

    Let’s look at some sample code you might want to use for an accounting application. This .exe file is not a framework application, but it has a framework module added to it, which performs supervisor-level actions. Only some users are allowed to have access to this module. When your accounting application starts up, it may have an application manager object of its own, which performs its own login procedures. The method that decides whether to instantiate the framework module might look like this:

    IF THIS.UserIsSupervisor( )
       THIS.oSupervisorModule = ;
          NEWOBJECT(THIS.cMyFrameworkModuleSupervisorClass,;
                    THIS.cMySupervisorAppClassLib)
       IF VARTYPE(THIS.oSupervisorModule) = "O"
          * success
       ELSE
          * failure
       ENDIF
    ELSE
       IF VARTYPE(THIS.oSupervisorModule) = "O"
          * previous user was a supervisor
          THIS.oSupervisorModule.Release()
       ENDIF
    ENDIF
    

    This code does not handle the public reference variable, a splash screen, or any of the other items in T_START.PRG.

    You may not need the public reference variable at all because, in this example, your framework application is securely scoped to your larger application manager object. However, if your module application has menus that use the global reference to invoke your application object, you might assign the correct variable name to THIS.oSupervisorModule.cReference just above the first ELSE statement in the preceding sample code (where you see the “* success” comment). This is the strategy you see in T_START.PRG.
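    In the preceding sample, that assignment (with a hypothetical variable name) would replace the “* success” comment:

    * success: let the module's menus reach it through a global name
    THIS.oSupervisorModule.cReference = "goSupervisor"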

    Note   If many different outer applications will use this module, you will prefer to assign the appropriate cReference string in the class, rather than in this method (so you only need to do it once). You can assign this value to cReference either in the Properties window or in code during startup procedures for the application object. Either way, an assign method on the cReference property in _Application does the rest.

    T_META.H is the template name for the application-specific header file, just mentioned in the section on menu templates. Only the menus and T_START.PRG use this header file, so it is up to you whether you use it, and how you use it. In the preceding example, you might not use it at all, or you might use only its APP_GLOBAL define to set the application object’s global reference.

    The framework uses a few more text templates:

    • T_CONFIG.FPW: Not surprisingly, provides a template for the config.fpw generated for your application. The template version gives new Visual FoxPro developers some ideas about what the config.fpw is for (it’s mostly comments); you will almost certainly wish to edit this file to meet your own standards.
    • T_LOG.TXT: Provides a startup file for the “action log” the Project Hook will write during the life of your application to let you know what changes it has made to your application while you worked with the project.
    • T_HEAD.TXT: Provides a standard header that the Application Wizard uses when generating your application-specific copies of framework templates. You might want to revise T_HEAD.TXT to include your own copyright notices, especially after you’ve edited the rest of the templates.

    Specifying Your Own Framework Components

    If you’ve done any development at all, you’ve undoubtedly experienced moments in which you identify something you wish to abstract from the process of developing a single application. You’ve done it too many times, you know how to do it, and now it’s time you figure out the best way to do it—so you never have to do it again.

    In OOP terms, this is the time to develop a superclass to handle this function, so you can reuse its features. In template terms, this is the time to edit the template you copy for each application’s use. In the Visual FoxPro 6.0 application framework’s mixed environment, as you know, we have both types of components.

    We’ll quickly review how these components are managed automatically by the Application Wizard and Builder during your development cycle. Then we’ll turn our attention to how you integrate your own superclasses and edited templates into this system.

    Framework Components During Your Application Lifecycle

    When you choose to create a new framework application, the Application Wizard takes your choices for a location and project name and generates a project file. If you select the Create project directory structure check box, the Application Wizard also creates a directory tree under the project directory. It adds _FRAMEWK.VCX and the required foundation class libraries to this project. It also adds a class library with appropriate application-specific subclasses of _FRAMEWK.VCX.

    The Application Wizard then adds template-generated, application-specific versions of all the non-OOP components the application needs. As you probably realize, the Application Wizard copies these files out of the memo fields in _FRAMEWK.DBF.

    _FRAMEWK.DBF contains two more records we haven’t mentioned yet: T_META.VCX and T_META.VCT. These records hold straight subclasses of the classes in _FRAMEWK.VCX, and they are copied out to disk to provide your application-specific class library.

    Note   T_META.VCX is not a template. It is just a convenient way for the Application Wizard to hold these subclasses, and is not part of your classes’ inheritance tree. Your subclasses descend directly from _FRAMEWK.VCX when the Application Wizard creates them, and thereafter will inherit directly from _FRAMEWK.VCX.

    Once your new framework project exists, the Application Wizard builds it for the first time. It also associates this project with a special Project Hook object, designed to invoke the Application Builder. The Application Wizard shows you the new project and invokes the Application Builder.

    At this point, the Application Builder takes over. The Application Builder provides an interface you can use to customize the framework aspects of any framework-enabled project, throughout the life of the project.

    You can use the Application Builder to customize various cosmetic features of the application object, such as its icon. When you make these choices, the Application Builder stores them in the appropriate properties of your _Application subclass. (In some cases, it also stores them in the matching _Splash subclass properties.)

    In addition, the Application Builder gives you a chance to identify data sources, forms, and reports you’d like to associate with this project. It gives you convenient access to the data, form, and report wizards as you work, in case you want to generate new data structures and documents. For inexperienced developers, the Application Builder provides a visual way to associate data structures directly with forms and reports, by providing options to invoke report and form wizards each time you add a new data source.

    Whether you choose to generate reports and forms using the wizards or to create your own, the Application Builder and its associated Project Hook object help you make decisions about framework-specific use of these documents. (Should a report show up in the Report Picker dialog box, or is it only for internal use? Should a form have a navigation toolbar?) It stores these decisions in your framework metatable.

    As you think about these automated elements of a framework development cycle, you’ll see a clear difference between the changes you can effect if you change the Application Wizard, or generation process, and the changes you can effect by editing the Application Builder and Project Hook. The files provided by the Wizard, in advance of development, represent your standard method of development. The changes made thereafter, through the Builder and Project Hook, represent customization you can do for this single application.

    The balance of this article concentrates on enhancing the Wizard to provide the appropriate framework components when you begin a new application. Once you have established how you want to enhance the startup components, you will think of many ways you can change the Builder and the Project Hook, to take advantage of your components’ special features, during the rest of the development cycle.

    Note   An important change in versions after Visual FoxPro 6.0 makes it easy for you to customize the Application Builder to match your style of framework use. Rather than directly invoking the default appbldr.scx, the default Application Builder in later versions is a PRG.

    The PRG makes some critical evaluations before it displays a Builder interface. For example, it checks to see whether the project has an associated Project Hook object, and whether this Project Hook object specifies a builder in its cBuilder property. See HOME( )+ Wizards\APPBLDR.PRG for details. You will find it easy to adopt this strategy, or to edit appbldr.prg to meet your own needs for displaying the Builder interface of your choice.

    A preview version of appbldr.prg is included with the source for this article. See appbldr.txt for instructions on making this new Application Builder available automatically from the VFP interface, similar to the new wizard components delivered as part of the document.

    A Closer Look at the Standard Application Wizard

    You’ll find the Visual FoxPro 6.0 Application Wizard files in your HOME( )+ Wizards folder. When you invoke the Application Wizard from the Tools menu, it calls appwiz.prg, which in turn invokes the dialog box in Figure 5, provided by appwiz.scx.

    Figure 5. The standard Visual FoxPro 6.0 Application Wizard dialog box provided by appwiz.scx

    When you choose a project name and location, appwiz.prg invokes HOME( )+ Wizards\WZAPP.APP, the Visual FoxPro 5.0 Application Wizard, with some special parameters.

    The older wizard contained in wzapp.app does most of the work of creating your new project files. The Visual FoxPro 5.0 Application Wizard determines that you are in a special automated mode from the object reference it receives as one parameter and does not show its original interface. It evaluates a set of preferences received from this object reference, and proceeds with the generation process.

    The standard implementation has a number of constraints:

    • Your application subclasses descend directly from _FRAMEWK.VCX. This prevents your adding superclass levels with your own enhancements to the framework, and you certainly can’t specify different superclasses when you generate different “styles” of applications.
    • Your copies of the ancestor classes, in _FRAMEWK.VCX and FFC libraries, are presumed to be in the HOME( )+ Wizards and HOME( )+ FFC directories. Because these ancestor classes are built into your framework applications, and therefore require recompilation during a build, you have to give all team members write privileges to these locations or they can’t use the Application Wizard to start new framework applications. In addition, the fixed locations hamper version control; you may wish to retain versions of ancestor classes specific to older framework applications, even when Microsoft delivers new FFC and Wizards folders.
    • Your non-OOP components are always generated out of HOME( )+ Wizards\_FRAMEWK.DBF. The templates are not easily accessible for editing. The assumed location of _FRAMEWK.DBF prevents you from using different customized template versions for different types of apps, and also presents the same location problems (write privileges and versioning) that affect your use of the framework class libraries. As with your application subclasses, you can’t designate different templates when you generate different types of applications.
    • You have no opportunity to assign a custom Project Hook to the project.

    To allow you to design and deploy customized framework components, a revised Application Wizard should, at minimum, address these points.

    You can make the required changes without major adjustment of the current Application Wizard code, but some additional architectural work provides more room for other enhancements later.

    A New Application Wizard

    If you DO NEWAPPWIZ.PRG, provided in the source code for this article, you will get a dialog box almost identical to Figure 5, and functionally equivalent to the original dialog box. The only difference you’ll notice is a prompt on startup, asking whether you wish to register this wizard in your HOME( )+ Wizards\WIZARD.DBF table for future use (see Figure 6).

    Figure 6. The Newappwiz.prg wizard classes can be registered to HOME( )+ Wizards\WIZARD.DBF so you can choose them from the Tools Wizards menu later.

    Though your newly instantiated wizard class calls the old Visual FoxPro 5.0 Wizard code just as the original one did, its internal construction allows completely new generation code to replace this approach in a future version.

    You can call newappwiz.prg with a great deal of information packed into its second parameter, to indicate what wizard class should instantiate and what this wizard class should do once instantiated.

    Why the second parameter, rather than the first? Newappwiz.prg, like appwiz.prg, is designed with the standard wizard.app in mind. Wizard.app, the application invoked by the Tools Wizards menu option for all wizard types, uses its registration table, HOME( )+ Wizards\WIZARD.DBF, to find the appropriate wizard program to run. Wizard.app passes other information to the wizard program (in this case, newappwiz.prg) in its first parameter, and passes the contents of the Parms field of wizard.dbf as the second parameter.

    If you choose Yes in the dialog box in Figure 6, the NewAppWizBaseBehavior class becomes a new choice in the registration table, and fills out its options in the Parms field. Additional NewAppWizBaseBehavior subclasses will do the same thing, registering their own subclasses as separate entries. Once a class is registered in wizard.dbf, you don’t have to call newappwiz.prg directly again.

    If you’ve chosen Yes in the dialog box in Figure 6 and also choose to register the wizard subclass we investigate in the next section, when you next choose the Application Wizard from the Tools menu, you’ll get a choice, as you can see in Figure 7.

    Figure 7. Select your Application Wizard du jour from the Tools Wizards option—once you have more than a single Application Wizard listed in your HOME( )+ Wizards\WIZARD.DBF table.

    An Extended Subclass of the New Wizard: AppWizReinherit

    With an enhanced architecture in place, we can address the issues of component-generation we’ve raised.

    Run newappwiz.prg again, this time with a second parameter indicating a different wizard subclass to instantiate:

      
    

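    The command itself is missing from this copy of the article. Judging from the Parms listing quoted later, a plausible reconstruction, passing only the wizard class name and letting the remaining options default, would be:

    * Plausible reconstruction; APPWIZREINHERIT matches the wizard.dbf
    * Parms entry quoted later in this article.
    DO newappwiz.prg WITH "", "APPWIZREINHERIT"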
    You should get another message box, similar to Figure 6, asking you if you want to register this subclass in the wizard.dbf table. When you’ve dismissed the message box, you see the dialog box in Figure 8.

    Figure 8. Re-inheritance Application Wizard, page 1

    The first page of this dialog box contains exactly the same options as the standard Application Wizard.

    Note   You’ll find all the visual classes used in the new wizards in newappwiz.vcx, as part of the source code for this article. The container you see on this page of the AppWizFormReinherit class is the same container class used in AppWizFormStandard. You can read more about these dialog box classes in Appendix 4.

    Each subsequent page of the dialog box addresses one of our concerns with the way the original Application Wizard delivers components, and includes some information about how it works. (Figure 9 shows you pages 2 and 3.) Each option defaults to the same behavior you’d get from the original Application Wizard—you don’t need to fill out information on all pages.

    Figure 9. Pages 2 and 3 of the Re-inherit App Wizard provide a layer of superclasses and the locations of your FFC and _FRAMEWK.VCX libraries for this framework application.

    If you change the parent VCX as suggested on the second page of the dialog box, you can have one or more layers of superclasses between your application’s subclasses and _FRAMEWK.VCX. You’ll create team-specific enhancements in these layers.

    Note   This version of the Application Wizard will create the initial classes for you, as subclasses of the components in _FRAMEWK.VCX, if you specify a VCX name that does not exist. Later, you can create more layers of subclasses from the one the Application Wizard derived from _FRAMEWK.VCX, and designate your subclass layer in this dialog box as appropriate. The VCX you designate on the second page of this dialog box should always conform to the following rules:

    • Be the immediate superclasses (parent classes) of the application-specific VCX for this application, and
    • Include all the required subclasses of _FRAMEWK.VCX, with the same names as the _FRAMEWK ancestor classes.

    You may want several different branches of your team-specific class levels, to match different types of framework applications you commonly create. For example, you could have one superclass set with your team’s options for a framework module and another one with your team’s topform custom attributes (including the class and classlibrary for your subclass of _topform to provide the correct frame).

    Note   These branches, or types, are not restricted to the “styles” or options you see represented in the Application Builder. They are just part of the normal process of subclassing and enhancing a class tree.

    For example, you may decide to create Active Documents as framework applications. To do so, you’ll need an _Application subclass that is aware of its hosted environment, and makes certain interface decisions accordingly. You’ll also need an ActiveDoc subclass that is aware of the framework’s capabilities and calls application object methods in response to browser-triggered events, just as the menu templates invoke framework behavior.

    Now that you can insert class levels between _FRAMEWK.VCX and your application-specific level, you can make the implementation of these features standard across applications.

    If you change the locations of the FFC and _FRAMEWK.VCX libraries on the “Ancestors” page, the Application Wizard will place appropriate copies of the required class libraries in your specified locations if they don’t exist. The Application Wizard also ensures that your copy of _FRAMEWK.VCX inherits from the proper version of FFC, and that your parent classes point to the proper version of _FRAMEWK.VCX.

    Note   As mentioned in the section “FoxPro Foundation Generic Classes,” your FFC location can include your own version of _BASE.VCX. Your _BASE.VCX does not have to have the same code or custom properties as the original _BASE.VCX, but like your parent classes, your _BASE must include classes descended from the same Visual FoxPro internal classes, with the same names, as the classes in the original _BASE.

    Other FFC libraries, not used in the framework and not described in this article, will not necessarily work with your own _BASE.VCX. For example, if your application uses _GENHTML, the _HTML.VCX library relies on code in the HOME( ) + FFC\_BASE.VCX library. If you use other FFC libraries in your framework application, you may have two _BASE.VCXs included in your project—this is perfectly normal.

    The Application Wizard then focuses on your template files on the next page of the dialog box. If you set a location for your template files, the Application Wizard will create fresh copies of these files (by copying them from the original _FRAMEWK.DBF), ready for you to edit.

    In each case, if the files are already in the locations you supply, the Application Wizard will use the ones you have.

    The last page of the dialog box allows you to pick a Project Hook. The original AppHook class in HOME( ) + Wizards\APPHOOK.VCX is the required ancestor class for a Project Hook designed to work with this application framework, but you can add a lot of team-specific features to your Project Hook subclass. The Application Wizard attempts to verify that the class you specify on this page descends from the appropriate AppHook class.

    When you generate your application, the Application Wizard will create a new set of straight subclasses from your parent VCX (or _FRAMEWK.VCX, if you haven’t changed the default on the “Parents” page). These subclasses become the new T_META.VCX/VCT records in _FRAMEWK.DBF. The Wizard appends new contents for all the other template records of _FRAMEWK.DBF from the template folder, if you’ve named one.

    Note   The first time you and the Application Wizard perform these tasks, it won’t make much difference to the final results. Once the Wizard gives you editable superclass layers and your own copies of the templates, however, you have all the architecture necessary to customize the framework for subsequent uses of the Application Wizard.

    Having replaced _FRAMEWK.DBF records, the Application Wizard proceeds to create your new application much as before, inserting information about your designated Project Hook class at the appropriate time.

    All the “enhanced” Wizard actions are tuned to respect the current setting of the lDelegateToOriginalAppWizard switch, which indicates whether the Visual FoxPro 5.0 Application Wizard code is running or if new code is creating the project. For example, because the original code only looks in the HOME( )+ Wizards folder for _FRAMEWK.DBF, if you have indicated a different place for your _FRAMEWK.DBF (on the “Templates” page) this table will be copied to HOME( )+Wizards before wzapp.app runs. (The first time this occurs, the new Wizard copies your original _FRAMEWK.DBF to a backup file in the HOME( ) + Wizards folder.) Presumably, newer code simply uses your templates table wherever you’ve placed it.

    When you use this Wizard to generate a framework application it saves information about your preferred parent classes, as well as the locations of your FFC and _FRAMEWK libraries and template files, to special _FRAMEWK.DBF records. You won’t need to enter this information, unless you wish to change it. This release of the Application Wizard doesn’t save information about the custom Project Hook subclass you may have specified. However, the next section will show you how to put this information into the Parms of wizard.dbf for default use.

    Note   Because the Application Wizard reads its stored information out of _FRAMEWK.DBF, it can’t get the location of _FRAMEWK.DBF from a stored record! However, you can put this information into the Parms field of wizard.dbf, as described in the next section, so all your developers use the proper version of _FRAMEWK.DBF without having to look for it.

    You may even decide to use a version of this Wizard class, or of its associated dialog box, that only allows some developers to change the “advanced” pages. Other team members can fill out standard information on Page 1, but they’ll still get your improved versions of all the framework components.

    Registering Additional Wizard Subclasses and Customized Records

    The new Application Wizard provides the opportunity to register each subclass of its superclass separately in the wizard.dbf table. The wizard stores its class name and location in the Parms field of its own wizard.dbf record.

    However, you can add more information in the Parms field. You can even store multiple entries in the wizard.dbf for a single subclass, with differently tuned Parms values. The Application Wizard, once instantiated, uses this additional information.

    Here’s the full list of nine options you can pass in the second parameter, or place in the Parms field, for use by NewAppWizBaseBehavior and its subclasses. All #DEFINEs mentioned in this list are in the newappwiz.h header file associated with newappwiz.prg:

    These three options instantiate the Wizard:

    • Wizard class: Must descend from the #DEFINEd APPWIZSUPERCLASS; defaults to NEWAPPWIZSUPERCLASS.
    • Wizard classlib: The library containing the wizard class; defaults to NEWAPPWIZ.PRG.
    • .App or .exe file name: Optional file containing the wizard class library.

    These six options are used by the Application Wizard after it instantiates:

    • Wizard form class: Must descend from the #DEFINEd APPWIZFORMSUPERCLASS; defaults to the #DEFINEd NEWAPPWIZFORMSTANDARD.
    • Wizard form classlib: The library containing the form class; defaults to NEWAPPWIZ.VCX.
    • .App or .exe file name: Optional file containing the wizard form class library.
    • Project Hook class: The Project Hook class you want to associate with this project, if you don’t want to use the default Project Hook class associated with framework-enabled projects. This class should descend from the AppHook class in HOME( )+ Wizards\APPHOOK.VCX, so it includes the default functionality, but it can include enhancements required by your team.
    • Project Hook classlib: The class library containing the Project Hook class you choose to associate with this project.
    • Template DBF: The table holding application components; defaults to HOME( )+ Wizards\_FRAMEWK.DBF (#DEFINEd as APPWIZTEMPLATETABLE).

    Store these values delimited by commas or carriage returns in the Parms field of wizard.dbf. Similarly, if you call newappwiz.prg directly, you can pass all this information as the program’s second parameter, as a single string delimited with commas or carriage returns.

    After you’ve registered the AppWizReinherit class, the Parms field for this class’ record in wizard.dbf contains the following information:

    APPWIZREINHERIT,<fullpath>\newappwiz.fxp,,AppWizFormReinherit,<fullpath>\NEWAPPWIZ.VCX,,APPHOOK,<fullpath of HOME()+"Wizards">\APPHOOK.VCX,<fullpath of HOME()+"Wizards">\_FRAMEWK.DBF
    

    You could run the NEWAPPWIZ program, passing the same string as its second parameter, to get AppWizReinherit's default behavior.

    Using our ActiveDoc example just shown, you could create a wizard.dbf entry that invokes the same Wizard class but defaults to a different parent VCX and different menu templates than the rest of your framework applications.

    To accomplish this, you'd edit the Parms field for this row of the wizard.dbf table, changing the ninth value, which indicates the Template DBF.

    Your new row in the table contains the same string in the Parms field, except for the section following the last comma, which points to a new template table. Your special ActiveDoc copy of _FRAMEWK.DBF holds your special Active Document menu templates and superclass information.

    Next, suppose you decide that your ActiveDocument framework applications need a special Project Hook subclass, not just special superclasses and menu templates. You could specify this hook automatically, in the seventh and eighth sections of the Parms field. You might even subclass the AppWizFormReinherit dialog box, to disable the last page of this dialog box for ActiveDocument-type applications, by changing the fourth and fifth sections of the Parms field. (This way, your team members would always use the right Project Hook class when generating this type of framework application.)

    If you made all these changes, this new entry in the wizard.dbf table might have a Parms field that looked like this:

    APPWIZREINHERIT,<fullpath>\newappwiz.fxp,,MyAppWizActiveDocumentDialog, <fullpath>\MyAppWizDialogs.VCX,,MyActiveDocumentAppHookClass, <fullpath> \MyHooks.VCX, <fullpath>\MyTemplates.DBF

    You would also edit the Name field in wizard.dbf for this entry, perhaps to something like “Active Document Framework Application,” to distinguish this entry from your standard values for the AppWizReinherit class.

    When one of your team members accessed the Wizards option on the Tools menu, "Active Document Framework Application" would now appear on the list of available Wizards, as part of the list you saw in Figure 7. The developer could automatically create the right type of framework application, without making any special choices.

    A Few Parting Thoughts about Team Practices

    You'll notice a check box in the Reinheritance Wizard's dialog box, indicating that you can omit message boxes and generate your new application with no warning dialog boxes or user interaction. Although this is a helpful option once you've used this Wizard a few times, please be sure to read all the message boxes, and the information in the edit boxes on the various pages of this dialog box, at least once.

    Any developer’s tool, especially one that edits visual class libraries and other metafiles as extensively as this one does, can potentially cause problems if the system is low on resources. The Help text available within this Wizard attempts to point out its potential trouble spots, so you can close other applications as needed, and have a good idea of what to expect at each step. Other caveats, such as incompletely validated options in this preliminary version, are indicated in the Help text as well.

    You also see a More Info button, which provides an overview of the issues this class is meant to address, and how you can expect it to behave (see Figure 10).

    Figure 10. Wizard documentation under the More Info button

    Beyond its stated purpose to enhance the Application Wizard, AppWizReinherit and its dialog box class try to give you a good model for tool documentation, both at design and run time. The dialog box's NewAppWiz_Documentation( ), GetUserInfo( ), and DisplayDocumentation( ) methods should give you several ideas for implementing run-time documentation. Newappwiz.prg has a demonstration procedure, BuilderGetDocumentation( ), which shows you how you can apply these ideas to design-time documentation for Builders as well. A final demonstration procedure in newappwiz.prg, ReadDocs( ), shows you another aspect of this process.

    Each documentation idea demonstrated here is a variation on a theme: Text is held (using various methods) within the VCX, so it travels with the VCX and will not get lost no matter how widely you distribute the library.

    Whether you use these particular implementations is not important; in many cases you'll be just as well off if you create a text file with documentation and use Visual FoxPro's FILETOSTR( ) function to read this information for display by the tool whenever necessary.

    No matter how you decide to implement it, documentation that helps your team better understand the intended use, extension possibilities, and limitations of the tools you build is critical to their adoption and successful use.

    A framework is, in itself, a kind of abstraction, a level above daily activities. Enhancements to a framework represent yet another level of abstraction. Your team will benefit from all the extra attention you can give to communicating your goals for this process.

    With any framework, you can efficiently prototype applications and build complete lightweight applications. With a framework set up the way your team operates, you can accomplish these goals without sacrificing quality, depth, or your normal habits of development. With a framework set to deliver your standard components and practices automatically, even new developers can make meaningful, rewarding contributions to your team effort.

    Appendix 1: The User Option System

    The framework employs a user-registration system based on a user table that is created by the application object if not found at run time. The application object uses the cUserTableName property to set the name and location of this table. If no path is supplied in this property, the location will be set by the cAppFolder property.

    Note    By default, the application object sets cAppFolder to the location of the APP or EXE that instantiated it. If, for some reason, the application object was instantiated outside a compiled APP or EXE container, cAppFolder contains the location of the application object’s VCX.

    If necessary, the application object creates this table in the appropriate location, using the following code (excerpted from the CreateUserTable( ) method):

    lcIDField = THIS.cUserTableIDField
    lcLevelField = THIS.cUserTableLevelField
    * names of two generic-requirement fields,
    * User ID and level, are specified by
    * _Application properties in case you
    * wish to match them to some existing system
    CREATE TABLE   (tcTable) ;
       ((lcIDField) C(60), ;
       (lcLevelField) I, ;
       UserPass  M NOCPTRANS, ;
       UserOpts  M NOCPTRANS, ;
       UserFave  M NOCPTRANS, ;
       UserMacro M NOCPTRANS, ;
       UserNotes M )
    INDEX ON PADR(ALLTR(&lcIDField.),60) TAG ID
    * create a case-sensitive, exact word match
    INDEX ON PADR(UPPER(ALLTR(&lcIDField.)),60) TAG ID_Upper
    * create a case-insensitive, exact word match
    INDEX ON DELETED( ) TAG IfDeleted
    

    If you don’t opt to have users log in and identify themselves in this application, this table is still created. In this case it supplies a default record, representing “all users,” so user macros, favorites, and options can still be stored in this table on an application-wide basis.

    Note   Because of their “global” nature in Visual FoxPro, user macro saving and setting features are only available to framework applications that issue READ EVENTS. Module applications are not allowed to edit the macro set.

    When a user logs in, the password is evaluated against the user table's UserPass field. A SetUserPermissions( ) method, abstract in the base class, is called at this time so the user's level can be checked and appropriate changes made to the application and menu options as well.
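    As an illustration of how the index tags built by CreateUserTable( ) support this lookup, the following hypothetical sketch locates a user record case-insensitively. The table name and sample user ID are invented for illustration.

    ```foxpro
    * Hypothetical lookup using the ID_Upper tag created above.
    * "users.dbf" and "kweber" are illustrative values only.
    USE users.dbf ALIAS UserTable
    SET ORDER TO TAG ID_Upper
    SEEK PADR(UPPER(ALLTRIM("kweber")), 60)
    IF FOUND()
       * Record located: the framework would now evaluate UserPass
       * and call SetUserPermissions( ).
    ENDIF
    USE
    ```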

    If the login is successful (or when the application starts up assuming no user login for this application), user name and level are stored in the cCurrentUser and iCurrentUserLevel properties.

    User macros, favorites, and options are set from the user's record in the user table. The _Application code handling macros relies on standard Visual FoxPro abilities to SAVE and RESTORE macros to and from the UserMacro memo field. The favorites system uses an easy-to-read ASCII format in the UserFave memo field. However, the options system and the UserOpts field deserve more explanation.

    The user table stores option information in its UserOpts memo field, by SAVEing the contents of a local array. This local array is RESTOREd and copied into a member array, aCurrentUserOpts, to establish user options when the current user is set.

    The array format is fixed, yet extremely flexible in the types of user options that can be stored. The allowable options include SETs and member properties, and each option is specified as being either "global" to the application or private to a data session. The array is laid out in four columns to specify these attributes of each option, as follows.

    • Column 1 (Item name): For a SET command, the item you're setting, the same as what you'd pass to the SET( ) function. For an object, the property you wish to set; this can be the Member.Property you wish to set.
    • Column 2: The value for this item.
    • Column 3: Property (.F.) or SET (.T.)?
    • Column 4: Session (.F.) or Global (.T.)?
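    The SAVE/RESTORE mechanics described above might be sketched as follows. The two sample option rows are invented for illustration; only the four-column layout and the UserOpts field come from the framework as documented.

    ```foxpro
    * Sketch: one SET option and one property option in the documented
    * four-column layout, persisted to the UserOpts memo field.
    DIMENSION aCurrentUserOpts[2, 4]
    aCurrentUserOpts[1, 1] = "DELETED"         && item: a SET command
    aCurrentUserOpts[1, 2] = "ON"              && value for this item
    aCurrentUserOpts[1, 3] = .T.               && .T. = SET, .F. = property
    aCurrentUserOpts[1, 4] = .F.               && .F. = session, .T. = global
    aCurrentUserOpts[2, 1] = "cTextDisplayFont"
    aCurrentUserOpts[2, 2] = "Tahoma"
    aCurrentUserOpts[2, 3] = .F.               && a property, not a SET
    aCurrentUserOpts[2, 4] = .T.               && global to the application
    * Persist the array into the current user's record...
    SELECT UserTable
    SAVE TO MEMO UserOpts ALL LIKE aCurrentUserOpts
    * ...and read it back when the current user is set.
    RESTORE FROM MEMO UserOpts ADDITIVE
    ```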

    Each time a user logs in, the application method ApplyGlobalUserOptions( ) applies SET options and application object property values for all array rows with .T. in the fourth column. The mediator object has the responsibility to call the application method ApplyUserOptionsForSession( ), on your instructions, passing a reference to its parent form. This method applies SET options and form property values for all array rows with .F. in the fourth column.

    The _Options dialog box supplied in _FRAMEWK.VCX gives you examples of all the combinations that can be created for a user option using this array, although its contents are merely examples. It shows you how the user options stored in an array can be expressed as a user interface, giving the user a chance to make changes. It also shows how results of a user-option-setting can be “translated” back into the user options array for use during this login, or saved as defaults to the user preference table.

    You will note that, when the user opts to apply changes to the current settings, the Options dialog box reinvokes ApplyGlobalUserOptions( ) and then iterates through the available forms, giving their mediators a chance to reapply session settings if they're set to do so.

    In many cases, a "global" setting can be transferred to forms as well. For example, the _ErrorLogViewer dialog box has a mediator that checks the application's cTextDisplayFont setting. This is a global user option, because it provides a chance for the user to specify a text font across all the UI of an application. The mediator transfers the value of cTextDisplayFont to a property of the same name belonging to its parent dialog box. An assign method on this property then applies the font name value to all members of the dialog box that should reflect the setting.
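    The assign-method pattern just described might look like the following minimal sketch. The method body and the EditBox class filter are assumptions for illustration, not the actual _ErrorLogViewer code.

    ```foxpro
    * Sketch of a cTextDisplayFont_Assign method on the dialog box.
    * The mediator sets THISFORM.cTextDisplayFont, which triggers this
    * method; SetAll( ) then pushes the font to the relevant members.
    LPARAMETERS vNewVal
    THIS.cTextDisplayFont = m.vNewVal
    THIS.SetAll("FontName", m.vNewVal, "EditBox")
    ```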

    Appendix 2: The Default Metatable Structure

    This table shows you the default structure of the framework's metatable. Appendix 3 shows you how the default _FRAMEWK.VCX dialog boxes use this information.

    • Doc_type (C): Contains a character to distinguish between document types. Currently, "F" is used for forms and "R" is used for reports, but this designation just determines how the document type is presented in the interface, not necessarily what type of Visual FoxPro source code file underlies the document. See the Alt_exec and Doc_wrap fields, below. More document types may be added. The framework already contains one extra type, "A," specifically reserved for you to add application information. The framework will not use "A"-type metatable records in any way, so the reservation of this type simply allows you to use metatable records, or perhaps one metatable header record, as a convenient place for system storage. In most cases, you would want to transfer the contents of such a record to application properties on startup.
    • Doc_descr (C): The "caption" or long description you want to show up in document picker lists.
    • Doc_exec (M): The name of the file to be run, usually an .scx or .frx file. In the case of a class to be instantiated, this is the .vcx file name. For Form-type documents, the file extension is assumed to be .scx unless this entry is marked Doc_wrap (see below) or the Doc_class field is filled out, in which case the extension is assumed to be .vcx. For Report-type documents, the file extension defaults to .frx unless this entry is marked Doc_wrap; if no .frx file exists by that name, the application object looks for an .lbx file. In all cases, you may also fill out the file extension explicitly. If you Include the file to be run in the project, you need not use paths in this field. If you wish to Exclude the file from the project, you may use path information. Assuming your applications install their subsidiary Excluded files to the appropriately located folder, relative pathing should work in the metatable, and is probably the best policy in this case!
    • Doc_class (M): The class to be instanced, where Doc_exec is a .vcx file.
    • Doc_new (L): Mark this .T. for a Form-type document you wish to show up in the FileNew list. When the application object instantiates a form from the FileNew list, it sets its own lAddingNewDocument property to .T. This practice gives the form a chance to choose between loading an existing document or a blank document during the form's initialization procedures. In many cases, the form delegates this process to its mediator object, which saves this information for later use. If you do not use a mediator, you may wish to save this information to a form property; you can't expect the application object's lAddingNewDocument to reflect the status of any particular form except during the initialization process of that form. For a Report-type document, this field denotes an editable report (new report contents, or even a new report from a template). This capability isn't currently implemented.
    • Doc_open (L): Mark this .T. for a Form-type document you wish to show up in the FileOpen list. For a Report-type document, this field denotes a runnable report or label and will place the item in the report picker list.
    • Doc_single (L): Mark this .T. for a Form-type document that is modeless but should have only one instance. The application object will bring it forward, rather than create a second instance, if the user chooses it a second time.
    • Doc_noshow (L): Mark this .T. for a Form-type document that you wish to .Show( ) yourself after additional manipulation, rather than allowing the DoForm( ) method to perform the .Show( ). Note: You will have to manipulate the application's forms collection or the current _SCREEN.Forms( ) contents to get a reference to this form, so you can manipulate the form and then .Show( ) it when you are ready. If you need this reference immediately, the best place to get it is probably the application object's aForms[ ] member array. At this moment, the application object's last-instantiated form is the one for which you want the reference, and the application object's nFormCount property has just been refreshed. Therefore, .aForms[THIS.nFormCount] gives you the reference you need when you're in an application object method (in other code, replace THIS with a reference to the application object). You can see an example of this usage in the _Application's DoFormNoShow( ) method. You can also create Doc_wrap programs as described in the entry for the next field; your wrapper program can take advantage of the DoFormNoShow( ) method, receive its return value (a reference to the form or formset object), and proceed to do whatever you want with it.
    • Doc_wrap (L): If this field is marked .T., indicating a "wrapped" document, the application's DoProgram( ) method will run instead of its DoReport( )/DoLabel( ) or DoForm( ) method. If you omit the file extension, the DoProgram( ) method uses the standard Visual FoxPro extension hierarchy to figure out what file you wish to run (.exe, .app, .fxp, .prg).
    • Doc_go (L): If this field is marked .T. and the document is Form-type, the form uses the framework's standard Go context menu for navigation. The menu name is configurable using the application object's cGoMenuFile property. This field is not used for Report-type documents.
    • Doc_nav (L): If this field is marked .T. and the document is Form-type, the form uses the framework's standard navigation toolbar for navigation. The class is configurable using the application object's cNavToolbarClass and cNavToolbarClassLib properties. This field is not used for Report-type documents.
    • Alt_exec (M): If this field is filled out, it takes precedence over the Doc_exec field just described. When the user makes a document choice, the _DocumentPicker's ExecDocument( ) method converts the contents of this field into a string and runs that string as a macro. Your Alt_exec statement can be anything you choose, and it can use attributes of the metatable, including the Properties field (below), however you want. For example, you can choose to have the metatable editable (on disk) rather than included in the APP/EXE, and you can place information in the Properties field dynamically at run time. Your document would then be able to be "aware" of this information by examining the current contents of the Properties field.
    • Properties (M): This memo field is not used by the framework in any way. It's for developer use, primarily in conjunction with the Alt_exec field.
    • User_notes (M): This memo field is not used by the framework in any way. It can be used for notes that would be displayed as Help text for a particular form or report, and so on.
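    For illustration, registering a runnable form in a metatable with this structure might look like the following. The table alias, file name, and caption are hypothetical; only the field names come from the structure above.

    ```foxpro
    * Hypothetical sketch: adding a Form-type document to the metatable.
    * "metatable" is an illustrative alias; field names are from Appendix 2.
    INSERT INTO metatable ;
       (Doc_type, Doc_descr, Doc_exec, Doc_new, Doc_open, Doc_nav) ;
       VALUES ;
       ("F", "Customer Maintenance", "custmain.scx", .T., .T., .T.)
    ```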

    Appendix 3: Default Document-Management Elements of the Framework

    The framework accesses metatable information through the _DocumentPicker classes. _DocumentPicker is an abstract standard dialog box class, which contains a picklist and a couple of buttons. The working _DocumentPicker subclasses each have their own way of using the information in the metatable to perform two tasks:

    • Show the documents in the picklist.
    • Run the appropriate action when the user picks a document.

    Each subclass stores the relevant metatable fields into an array, which serves as the data source for the list box in the dialog box. The same array holds the metatable information that will eventually act on the user’s choice.

    The _DocumentPicker superclass has an abstract FillDocumentArray( ) method, designed to perform the first service during the dialog box Init( ), and another abstract method called ExecDocument( ), which is triggered whenever/however the user makes a selection from the document list.

    The _DocumentPicker class receives a parameter from the application object. Each subclass of _DocumentPicker uses the parameter to determine which of two states it is supposed to be in when it displays its document list and acts on the user’s choice of a document from the list. The _DocumentPicker superclass simply makes note of this logical value, leaving it to the subclasses to interpret it.

    The various _DocumentPickers' FillDocumentArray( ) methods concentrate on different document types, and fill the array with the appropriate information for that type. Their ExecDocument( ) methods call different application object methods depending on their document type and the dialog box's current state, sending information from the metatable, via the array, to method arguments as needed.

    The first two columns in the table below show you the names of these working classes and the document types that will appear in their lists, courtesy of their FillDocumentArray( ) method. The other columns show the application methods that call them, and the meaning assigned to their two states when ExecDocument( ) is triggered. Each application method listed here takes a logical parameter (defaulting to .F., State 1) to indicate for what purpose the class presents its document list.

    • _NewOpen: document type is forms; associated _Application method is DoNewOpen( ). State 1 action: Edit. State 2 action: Add.
    • _ReportPicker: document types are reports and labels; associated _Application method is DoReportPicker( ). State 1 action: Run report/label. State 2 action: Modify/Add (not implemented in the _Application superclass).
    • _FavoritePicker: document types are documents and files of any type; associated _Application method is DoStartupForm( ). State 1 action: Run document/file. State 2 action: Put document/file on the Favorites menu for quick access.

    Appendix 4: Using the NEWAPPWIZ Visual Classes

    AppWizFormReinherit, the dialog box called by AppWizReinherit, and AppWizFormStandard, the default dialog box with the same interface as the original wizard, both descend from the same superclass, AppWizFormBaseBehavior (see Figure 11).

    Figure 11. Newappwiz.vcx in the Class Browser

    AppWizFormBaseBehavior is the required superclass for any dialog box provided as the UI of a NewAppWizBaseBehavior or its descendants. When the Application Wizard superclass instantiates the dialog box, it validates that your dialog box class descends from this superclass.

    AppWizFormBaseBehavior contains only the very simple required behavior, with no visible controls. It has three custom properties to represent required wizard information (project name, location, and whether or not the Wizard should generate the project directory structure). It receives this information from an object reference the Wizard passes. It also has a Finish( ) method, which passes this information back to the Application Wizard.

    In your subclass of AppWizFormBaseBehavior, you simply databind the interface controls of your choice to these three custom properties. You create other controls and custom properties to represent your enhanced options. Your dialog box calls the Finish( ) method when you’re ready to generate. (Both AppWizFormReinherit and AppWizFormStandard use the OKButton class you see in Figure 11, which contains the call to its parent form’s Finish( ) method.)
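    A minimal sketch of such a subclass follows. The control and property names (txtProjectName, cProjectName, and so on) are invented for illustration; the actual names live in NEWAPPWIZ.VCX.

    ```foxpro
    * Hypothetical Init( ) code in a subclass of AppWizFormBaseBehavior,
    * binding interface controls to the three required custom properties.
    * All names shown here are assumptions, not the shipped names.
    THIS.txtProjectName.ControlSource = "THISFORM.cProjectName"
    THIS.txtLocation.ControlSource    = "THISFORM.cProjectLocation"
    THIS.chkMakeDirs.ControlSource    = "THISFORM.lGenerateDirectories"
    ```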

    You can augment Finish( ) to pass more options from the dialog box back to your Wizard subclass as necessary.

    You’ll find more information in the NewAppWiz_Documentation method of the superclass. The default AppWizFormStandard subclass shows you a simple example of how to make it work.

    ADO Jumpstart for Microsoft Visual FoxPro Developers 

    • Article
    • 06/30/2006


    John V. Petersen

    April 1999

    Summary: Provides Microsoft Visual FoxPro developers with an overview of ActiveX Data Objects (ADO) and shows how to incorporate ADO into Visual FoxPro applications. Discusses the ADO object model and implementing Remote Data Services (RDS). (52 printed pages)

    Contents

    Introduction
    What Are OLE DB and ADO?
    Why Incorporate ADO into a Visual FoxPro Application?
    ADO Object Model
    Remote Data Services
    Summary

    Introduction

    Microsoft® ActiveX® Data Objects (ADO) is perhaps the most exciting new Microsoft technology in quite some time. Because ADO is concerned with data, this new technology is of particular interest to Microsoft® Visual FoxPro® developers. Of course, you may ask, "Why do I need ADO? Visual FoxPro already has a high-performance local data engine." It's a good question.

    This paper provides the Visual FoxPro developer with a background of what ADO is and how to incorporate ADO into Visual FoxPro applications. After reading this paper, you should have enough information to readily answer the question: “Why do I need ADO?”

    A Brief Word About ADO Events

    One limitation of Visual FoxPro has been an inability to surface COM events. While Visual FoxPro can respond to events raised by ActiveX controls, it cannot respond to events raised by objects created with the CreateObject( ) function. In Microsoft® Visual Basic®, COM events are handled by using the WithEvents keyword. In Visual FoxPro, the new VFPCOM.DLL achieves the same results. VFPCOM, ADO events, and how to integrate ADO and Visual FoxPro will be discussed in another white paper. This paper is dedicated to providing the Visual FoxPro developer with a comprehensive overview of ActiveX Data Objects, Remote Data Services (RDS), their respective objects, and how those objects work.

    This paper covers the following topics:

    • What are ADO and OLE DB?
    • Why incorporate ADO into a Visual FoxPro application?
    • The ADO object model
    • Remote Data Services

    What Are OLE DB and ADO?

    When discussing ADO, we are really talking about two distinct elements: the ActiveX data objects themselves and Microsoft Universal Data Access technology, more commonly known as OLE DB.

    OLE DB and Universal Data Access

    In simple terms, OLE DB is the succeeding technology to the Open Database Connectivity (ODBC) standard. OLE DB is a set of low-level interfaces that facilitate the Microsoft Universal Data Access strategy. ADO is a set of high-level interfaces for working with data.

    While both ODBC and OLE DB have the ability to make data available to a client, the capabilities of the two technologies are very different. ODBC is primarily designed for use on relational data. However, data exists in nonrelational as well as relational formats. In addition to new data formats, data resides in new places such as the Internet. Finally, the Microsoft Component Object Model (COM) framework requires better data access technology. Clearly, ODBC does not address these needs; a new technology is needed. That technology is OLE DB, and it is here to stay.

    The following graphic best illustrates how OLE DB and ADO work together. Clients can work with OLE DB directly or through the ADO interface (the latter is typically the case). Note that OLE DB can access SQL data either directly or through ODBC; direct access is supplied by an OLE DB provider. OLE DB can also be used to access a variety of non-SQL data, as well as data that resides on mainframes. The ability to access data through a common interface, without regard to data location or structure, is the real power behind ADO and OLE DB.

    Whereas ODBC uses drivers, OLE DB uses providers. A provider is a software engine that provides a specific type of data that matches the OLE DB specification. Several OLE DB providers exist today, including those for Microsoft SQL Server™ and Oracle. Because there is such widespread use of ODBC, an OLE DB provider for ODBC has also been created in order to ease the migration from ODBC to OLE DB. Several nonrelational providers are currently under development. Perhaps the most anticipated of these is the OLE DB Provider for Microsoft Outlook®. A special provider, MS Remote, allows direct data access over the Internet. This brief list of providers shows the third-party community commitment to OLE DB, and many new providers are currently under development. For the latest news on available providers, refer to https://www.microsoft.com/data/.

    ADO Overview

    OLE DB, then, is a set of low-level interfaces that provide access to data in a variety of formats and locations. While powerful, OLE DB interfaces can be cumbersome to work with directly. Fortunately, ADO provides a set of high-level, developer-friendly interfaces that make working with OLE DB and universal data access a relatively simple task. Regardless of the programming environment you use, whether a Visual Studio® or Microsoft Office product such as Visual FoxPro, Visual Basic, Visual C++®, or Word, the interface you use to access data remains constant. That interface is ADO, which in turn uses OLE DB.

    ADO itself is just a set of objects. By itself, ADO is not capable of anything. In order to provide any functionality, ADO needs the services of an OLE DB provider. The provider in turn uses the low-level OLE DB interface to access and work with data. One ADO connection may use a SQL Server OLE DB provider and another ADO connection may use an Oracle OLE DB provider. While the interface is constant, the capabilities may be very different because OLE DB providers are very different, which highlights the polymorphic nature of OLE DB.
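    To illustrate the constant interface over different providers, the following hedged sketch opens the same ADODB.Connection class against two providers. The server name is a placeholder; the provider names (SQLOLEDB and MSDASQL) are those shipped with the Microsoft Data Access Components.

    ```foxpro
    * Same ADO interface, two different OLE DB providers.
    oConn1 = CreateObject("ADODB.Connection")
    oConn1.Provider = "SQLOLEDB"    && native SQL Server OLE DB provider
    oConn1.Open("Data Source=MyServer;Initial Catalog=Northwind;" + ;
                "Integrated Security=SSPI")

    oConn2 = CreateObject("ADODB.Connection")
    oConn2.Provider = "MSDASQL"     && OLE DB Provider for ODBC Drivers
    oConn2.Open("DSN=TasTrade")     && the DSN described under
                                    && "What You Need to Get Started"
    ```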

    As developers, we crave consistency. ADO provides us with a consistent interface for our program code.

    ADO Version Summary

    The current version of ADO (2.1) is the fourth version of ADO to be released in less than two years. ADO 1.0 was primarily limited to working with Active Server pages. Only one OLE DB provider existed, the OLE DB Provider for ODBC Drivers.

    ADO (2.1)—Ships with the newest version of the Microsoft Web browser, Internet Explorer 5.0. When discussing data or anything related to the Internet, it is almost impossible to do so without mentioning XML. XML, the Extensible Markup Language, is a markup language that allows users to create custom tags to describe data. XML is quickly becoming the universal format for storing and streaming data. The primary storage format in Office 2000 for document data will be XML. ADO (2.1) client-side recordsets can be saved as XML documents.

    ADO (2.0)—Represented a huge gain in functionality. One of the most notable new features was the ability to create client-side recordsets. To go along with this, also added were the abilities to create filters and indexes, and the ability to sort recordsets. These abilities are very much the same as those that exist with Visual FoxPro cursors. Finally, the ability to persist client-side recordsets was also added. In effect, data could be acquired from a server into a client-side recordset. The client-side recordset could then be saved as a file on the local hard-drive that could be opened at a later time without being connected to the network.

    ADO (1.5)—Introduced new capabilities and providers to ADO. Among the new providers was the OLE DB Provider for Jet (the JOLT Provider). The MS Remote Provider, which powers the Remote Data Services (RDS), was introduced as well. This version also introduced the ability to create disconnected recordsets.

    What You Need to Get Started

    In order to work through the examples presented in this paper, you will need the following:

    • Microsoft Visual FoxPro 6.0
    • Microsoft Data Access Components, which can be downloaded from https://www.microsoft.com/data/
    • SQL Server 6.5 or 7.0 with the sample Northwind database installed
    • A system DSN called TasTrade that points to the TasTrade Visual FoxPro Sample Database
    • A system DSN called Northwind that points to the SQL Server Northwind database

    Why Incorporate ADO into a Visual FoxPro Application?

    Have you ever wanted to pass a cursor as an argument to a function or class method? Or have you wanted to pass data to automation server applications such as Microsoft Word or Excel? Perhaps you have created a Visual FoxPro DLL and have needed a way to pass data from the user interface to a class method in the DLL. Maybe you have been looking for a way to stream data across the Web. If your answer is “yes” to at least one of these, ADO can help you today!

    Until now, the world of component-based development has lacked one thing: a method of effectively moving data between processes. Now, whether ADO is hosted by Visual FoxPro, Visual Basic, Excel, or Word, the interface is consistent. The new COM capabilities of Visual FoxPro 6.0 enable the creation of ADO recordsets, populating them with data, and passing them to a variety of processes. This all supports the strategic positioning of Visual FoxPro as a creator of middle-tier components.
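    As a hedged sketch of the idea, the following code fabricates a small client-side recordset in Visual FoxPro and fills it with data; such an object can then be passed as a single parameter between processes. The field names and sample values are invented for illustration.

    ```foxpro
    * Sketch: building a connectionless, client-side ADO recordset.
    * ADO constants: 3 = adUseClient / adLockOptimistic, 129 = adChar.
    oRS = CreateObject("ADODB.Recordset")
    oRS.CursorLocation = 3
    oRS.LockType = 3
    oRS.Fields.Append("cust_id", 129, 6)
    oRS.Fields.Append("company", 129, 40)
    oRS.Open()
    oRS.AddNew()
    oRS.Fields("cust_id").Value = "ALFKI"
    oRS.Fields("company").Value = "Alfreds Futterkiste"
    oRS.Update()
    * oRS can now travel to a Word or Excel automation session, or to a
    * method of a Visual FoxPro COM DLL, as an ordinary parameter.
    ```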

    Just about everything in Visual FoxPro is an object, except for reports, menus, and data. One of the biggest feature requests from Visual FoxPro developers has been the ability to work with data as a set of objects. Data objects provide several benefits, including an enhanced event model and the ability to overcome limitations of Visual FoxPro cursors. While many limitations are gone, many benefits of Visual FoxPro cursors have been retained. As you work with ADO, there is good reason for the many similarities to Visual FoxPro: ADO is based on the Visual FoxPro cursor engine. So, for those who have wanted data objects in Visual FoxPro, the wait is over with ADO.

    ADO is not a replacement for Visual FoxPro cursors. Rather, Visual FoxPro cursors and ADO are complementary. When used together, very powerful applications can result. The following pages detail the ADO object model and the common properties and methods you will work with, including:

    • Remote Data Services (RDS), a technology that allows data to be streamed over the Internet via HTTP.
    • VFPCOM.DLL, which enables the handling of COM events in Visual FoxPro.
    • ADO Integration into Visual FoxPro.

    This section also provides several comprehensive examples of strategies you may employ when integrating ADO into your Visual FoxPro applications.

    ADO Object Model

    Connection Object

    ProgID: ADODB.Connection

    The purpose of the Connection object is to provide access to a data store. To illustrate, the following code creates an ADO Connection object:

    oConnection = CreateObject("adodb.connection")
    

    Once an ADO Connection object has been created, you can access its data store. An active connection can be established by providing a few pieces of key information and invoking the Open( ) method of the Connection object. The following code opens a connection to the Visual FoxPro TasTrade database:

    oConnection.Open("TasTrade")
    

    Alternatively, the following code accesses the SQL Server Northwind database:

    oConnection.Open("Northwind","sa","")
    

    These two examples work with the OLE DB Provider for ODBC drivers. Different OLE DB providers can be used as well. The following example sets some common properties of the Connection object and uses the OLE DB Provider for SQL Server:

    With oConnection
       .Provider = "SQLOLEDB.1"
       .ConnectionString = "Persist Security Info=False;" + ;
          "User ID=sa;Initial Catalog=Northwind;Data Source=JVP"
       .Open
    EndWith
    

    The syntax of the ConnectionString property appears complicated. Fortunately, you don’t have to code this by hand. When you install the Microsoft Data Access Components (MDAC), you can create a data link file.

    To create a data link file:

    1. Right-click your desktop and choose New\Microsoft Data Link from the pop-up menu.
    2. Specify a name for the file.
    3. Right-click and select Properties to modify the file properties.
    4. In the Properties dialog box, click the Provider tab, and choose a provider.
      The OLE DB Provider for ODBC is the default choice. For this example, select the OLE DB Provider for SQL Server.
    5. Click the Connection tab.
    6. Specify the name of the server, your user name and password, and the name of the database you wish to connect to.
    7. Open the UDL file in Notepad.

    Now, it is just a matter of copying and pasting the information. Alternatively, you can use the file itself:

    oConnection.Open("File Name=c:\temp\test.udl")

    ADO recognizes four arguments in the ConnectionString:

    • File Name: Specifies the name of a UDL file to use.
    • Provider: Specifies the name of an OLE DB provider to use.
    • Remote Provider: Specifies the name of a provider to use with Remote Data Services (RDS).
    • Remote Server: Specifies the server on which data resides when using Remote Data Services (RDS).

    Any additional arguments passed in the ConnectionString are passed through to the OLE DB provider being used.
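
    To illustrate, the following sketch (the server name JVP and the UDL file path are assumptions carried over from the earlier examples) shows two ways of supplying these arguments:

    ```foxpro
    oConnection = CreateObject("adodb.connection")

    * Specify the provider directly in the ConnectionString;
    * everything after Provider= is passed through to OLE DB
    oConnection.ConnectionString = "Provider=SQLOLEDB.1;" + ;
       "User ID=sa;Initial Catalog=Northwind;Data Source=JVP"
    oConnection.Open

    * ...or point the connection at a UDL file that contains
    * the same information:
    * oConnection.Open("File Name=c:\temp\test.udl")
    ```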

    In addition to the Open method, the following are the common methods you are likely to use with the Connection object:

    • BeginTrans, CommitTrans, and RollbackTrans—These methods work like the Begin Transaction, End Transaction, and Rollback statements in Visual FoxPro. The Connection object controls all transaction processing. For more detail, see the section Transactions/Updating Data. Note that not all OLE DB providers support transaction processing.
    • Close—This method closes an open Connection object.
    • Execute—This method runs a SQL statement, stored procedure, or OLE DB provider-specific command. In reality, a Command object, which actually does the work of executing the command, is created on the fly. More on the Command object and the flat object hierarchy of ADO later in this paper.
    • OpenSchema—This method returns information regarding defined tables, fields, catalogs, and views into an ADO Recordset object. This method works like the DBGetProp( ) function in Visual FoxPro.
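
    As a quick sketch of the Execute method (assuming the Northwind connection opened earlier), a SQL statement can be run directly from the Connection object; a Recordset object is created on the fly to hold any results:

    ```foxpro
    * Run a query directly from the connection; by default the
    * returned recordset is forward-only and read-only
    oRecordSet = oConnection.Execute("Select * From Customers")
    Do While !oRecordSet.Eof
       ?oRecordSet.Fields("CompanyName").Value
       oRecordSet.MoveNext
    EndDo
    oRecordSet.Close
    ```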

    Errors collection

    ADO does not trap errors, nor does it have an error handler. Instead, ADO can record the occasions when errors occur. It is up to the host application, Visual FoxPro in this case, to both trap and handle the error. ADO only reports what errors have occurred. Note that the error is actually reported by the specific OLE DB provider. ADO is merely a vehicle to report the error.

    The Errors collection is part of the Connection object and consists of zero or more Error objects. When an error occurs, an Error object is appended to the Errors collection. The following code illustrates how the Errors collection works. In this example, the name of the database has been misspelled purposely in order to generate an error:

    oConnection = CreateObject("adodb.connection")
    With oConnection
       .Provider = "SQLOLEDB.1"
       .ConnectionString = "Persist Security Info=False;" + ;
          "User ID=sa;Initial Catalog=Nothwind;Data Source=JVP"
       .Open
    EndWith
    * At this point an error will occur, causing VFP's default error
    * handler (or the active error handler) to be invoked.
    * We can now query the Errors collection of the Connection object:
    For Each oError In oConnection.Errors
       ?oError.Description,oError.Number
    Next oError
    

    Recordset Object

    ProgID: ADODB.Recordset

    Once you establish an ADO connection, you can open a recordset of data. The Recordset object is very much like a Visual FoxPro cursor. Like the Visual FoxPro cursor, an ADO recordset consists of rows of data. The recordset is the primary object that you will use while working with ADO. Like the Connection object, the Recordset object also provides an Open method. To illustrate, the following code opens the Customer table of the Visual FoxPro Tastrade database:

    oRecordSet = CreateObject("adodb.recordset")
    oRecordSet.Open("Select * From Customer",oConnection)
    

    The first argument of the Open method specifies the source of data. As you will see, the source can take on several forms. The second argument of the Open method specifies a connection to use for retrieving the data specified by the source. At a minimum, this is all you need to open a recordset. Additional examples will expand on the additional arguments the Open method accepts.

    With a Recordset object created, one of the most common actions you will perform is navigating through records. Depending on the type of ADO recordset that has been created, certain navigational capabilities may or may not be available. The different types of possible ADO recordsets will be discussed shortly. The following code illustrates how to navigate through an ADO recordset:

    Do While !oRecordSet.Eof
       oRecordset.MoveNext
    EndDo
    

    The following paragraphs briefly describe the most common recordset properties and methods you are likely to use. They are by no means a replacement for the ADO documentation, which gives complete descriptions of the properties and methods as well as of the acceptable enumerated types and arguments. ADO is well documented in the Microsoft Data Access Components (MDAC) SDK. You can download the MDAC SDK from https://www.microsoft.com/data.

    In addition, I highly recommend ADO 2.0 Programmer's Reference, by David Sussman and Alex Homer, from Wrox Press.

    RecordSet types

    You can create four types of recordsets in ADO:

    • Forward Only—This type of recordset can be navigated only in a forward direction. It is ideal when only one pass through a recordset is required. Examples include populating a List box or a Combo box. The RecordCount property is irrelevant with this type of recordset.
    • Keyset—This type of recordset keeps acquired data up to date. For example, if you retrieve 100 records, data modified by other users to those 100 records will be visible in your recordset. However, modifications regarding new or deleted records made by other users will not be visible in your recordset. Both forward and backward navigation are supported. The RecordCount property returns a valid value with this type of recordset.
    • Dynamic—With this type of recordset, all underlying data is visible to the Recordset object. Because the number of records in the underlying table can change, the RecordCount property is irrelevant with this type of cursor. However, forward and backward navigation are supported.
    • Static—Both the number of records and data are fixed at the time the Recordset object is created. The only way to get the latest version of data and all records is to explicitly invoke the Requery method. You can use the RecordCount property. In addition, both forward and backward navigation is permitted.
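
    The recordset type is requested through the third argument of the Recordset object's Open method. A minimal sketch, using the TasTrade connection from earlier (the constant values are taken from the ADO CursorTypeEnum enumeration, discussed later in this paper):

    ```foxpro
    #Define adOpenForwardOnly 0
    #Define adOpenKeyset      1
    #Define adOpenDynamic     2
    #Define adOpenStatic      3

    oRecordSet = CreateObject("adodb.recordset")
    * Request a keyset cursor; the provider may coerce the type
    * if it does not support the one requested
    oRecordSet.Open("Select * From Customer", oConnection, adOpenKeyset)
    ?oRecordSet.RecordCount   && valid for a keyset recordset
    ```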

    RecordSet locations

    Recordset objects can exist in either of two locations, the server or the client:

    • Server—The most common examples of server-side ADO recordsets are those created through Active Server Pages (ASP).
    • Client—A recordset that resides on a workstation is useful when creating disconnected recordsets or recordsets on which you wish to apply filters, sorts, or indexes.
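
    The location is controlled by the recordset's CursorLocation property, which must be set before the recordset is opened. A minimal sketch (adUseClient = 3, from the ADO CursorLocationEnum enumeration):

    ```foxpro
    #Define adUseClient 3

    oRecordSet = CreateObject("adodb.recordset")
    * CursorLocation must be set before Open is invoked
    oRecordSet.CursorLocation = adUseClient
    oRecordSet.Open("Select * From Customer", oConnection)
    * The recordset now supports the Sort and Filter properties
    ```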

    The most common properties you are likely to use with ADO recordsets include the following:

    • ActiveCommand property—An object reference to the Command object that created the recordset.
    • ActiveConnection property—An object reference to the Connection object that provides the link to an underlying data source.
    • AbsolutePosition property—Specifies the relative position of a record in an ADO recordset. Unlike the Bookmark property, which does not change, the AbsolutePosition property can change depending on the active sort and filter.
    • Bookmark property—A unique record identifier that, like the record number of a Visual FoxPro cursor, does not change during the life of a recordset.
    • BOF/EOF properties—Beginning of File and End of File, respectively, that work just like the BOF( ) and EOF( ) functions in Visual FoxPro.
    • EditMode property—Specifies the editing state of the current record in an ADO recordset.
    • Filter property—The string that represents the current filter expression. This property is like the SET FILTER statement in Visual FoxPro. Unlike the Find method, multiple expressions linked with AND or OR operators are allowed. This property is only applicable to client-side recordsets.
    • Sort property—A comma-delimited set of fields that specifies how the rows in an ADO recordset are sorted. This property is only applicable to client-side recordsets.
    • State property—Specifies the state of an ADO recordset. Valid values are closed, open, connecting, executing, and fetching.
    • Status property—Specifies the editing status of the current record. Valid values include unmodified, modified, new, and deleted. This property can be any one of the values contained in RecordStatusEnum.
    • MarshalOptions property—Specifies how records are returned (marshaled) to the server. Either all or only modified records can be returned. This property is only applicable to client-side disconnected recordsets.
    • MaxRecords property—Specifies the total number of records to fetch from a data source.
    • RecordCount property—Specifies the number of records in a recordset. This property is like the RecCount( ) function in Visual FoxPro.
    • Source property—Specifies the command or SQL statement that provides data for the recordset.

    Note   The type and location of a cursor as well as the OLE DB provider you select will affect the recordset properties that are available.
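
    To see two of these properties in action, the following sketch sorts and then filters a client-side recordset of the Northwind Customers table (field names as they appear in the sample database):

    ```foxpro
    #Define adUseClient 3

    oRecordSet = CreateObject("adodb.recordset")
    oRecordSet.CursorLocation = adUseClient
    oRecordSet.Open("Select * From Customers", oConnection)

    * Sort by country, then company name, without re-querying the server
    oRecordSet.Sort = "Country, CompanyName"

    * Show only the German customers; an empty string clears the filter
    oRecordSet.Filter = "Country = 'Germany'"
    ?oRecordSet.RecordCount
    oRecordSet.Filter = ""
    ```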

    Use the following table as a guide to help you make the right recordset type and location decision:

    Table 1. Properties

    Type             Bookmark   RecordCount   Sort   Filter   MarshalOptions
    Forward Only
    Keyset           ✓          ✓
    Dynamic
    Static: Client   ✓          ✓             ✓      ✓        ✓
    Static: Server   ✓          ✓

    Only client-side recordsets can be sorted and filtered. If the CursorLocation property of the ForwardOnly, Keyset, and Dynamic recordset types is set to adUseClient, making them client-side cursors, the CursorType property is automatically coerced to the Static cursor type.

    Note   This is the behavior of the OLE DB Provider for SQL Server. The OLE DB Provider for ODBC supports only ForwardOnly and Static recordsets, regardless of where the recordset resides.

    As with properties, method availability can also vary:

    Table 2. Available Methods

    Type             MoveFirst   MovePrevious   MoveNext   MoveLast   Resync   Requery
    Forward Only                                ✓                              ✓
    Keyset           ✓           ✓              ✓          ✓                   ✓
    Dynamic          ✓           ✓              ✓          ✓                   ✓
    Static: Client   ✓           ✓              ✓          ✓          ✓        ✓
    Static: Server   ✓           ✓              ✓          ✓                   ✓

    The following list describes some of the common methods you will use in the ADO Recordset object:

    • MoveFirst, MovePrevious, MoveNext, MoveLast, and Move methods—Navigational methods that work as their respective names imply. The Move method accepts two arguments, the number of records to move and the position from which to begin the move. The Move method is similar to the Go statement in Visual FoxPro. MoveFirst and MoveLast work like Go Top and Go Bottom, respectively. Finally, MovePrevious and MoveNext work like Skip –1 and Skip 1, respectively.
    • Find method—Accepts a criterion string as an argument and searches the recordset for a match. If a match is not found, depending on the search direction, either the BOF or EOF property will evaluate to true (.T.). This method works much the same way as the Seek and Locate statements in Visual FoxPro. Unlike the Filter property and the Seek and Locate statements in Visual FoxPro, the ADO Recordset object does not allow multiple search values joined by the And or the Or operator. Using anything other than a single search value will result in an error.
    • Open method—Opens an existing ADO Recordset object. This method accepts several arguments and is discussed in detail later in this section.
    • Close method—Closes an ADO Recordset object. Many properties, such as CursorType and LockType, although read/write, cannot be modified while the recordset is open. The Close method must be invoked before those and other properties are modified.
    • Update and UpdateBatch methods—Update writes changes for the current record to the underlying data source; UpdateBatch writes pending changes for all modified records to the underlying data source. The UpdateBatch method is only relevant when Optimistic Batch Locking is used.
    • Cancel and CancelBatch methods—The Cancel method cancels modifications made to the current record; the CancelBatch method cancels pending changes to all modified records.
    • Resync method—Refreshes the Recordset object with data from the underlying data source. Invoking this method does not rerun the underlying command. Options exist for which records are actually refreshed.
    • Requery method—Unlike the Resync method, reruns the underlying command, which causes any pending changes to be lost. In effect, issuing a Requery is like invoking the Close method then immediately invoking the Open method.
    • Supports method—Specifies whether or not the recordset supports a given capability, based on a passed argument. For example, you can use this method to determine whether a recordset supports bookmarks, the addition or deletion of records, or the Find, Update, and UpdateBatch methods, to name a few. Because what is supported depends on the OLE DB provider used, it is a good idea to use this method to make sure a needed capability is supported.
    • GetRows method—Returns a set of records into an array.
    • GetString method—Returns a set of records into a string.
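
    The GetString method is a convenient way to dump a recordset into a delimited string for quick inspection or export. A sketch, assuming an open recordset (adClipString = 2 is the format value defined in StringFormatEnum, and –1 retrieves all remaining rows):

    ```foxpro
    #Define adClipString 2

    * Return all remaining rows as one string: columns separated
    * by commas, rows separated by carriage returns
    cData = oRecordSet.GetString(adClipString, -1, ",", Chr(13))
    ?cData
    ```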

    The moral of the story is that before relying on the existence of anything in ADO, know and understand the OLE DB provider you are using, because the capabilities available to you can vary dramatically.

    Lock types

    There are four different locking schemes in ADO recordsets. These locking schemes are similar to those in Visual FoxPro.

    • Read-Only—As the name indicates, the recordset is opened for read-only purposes only. When you don’t need to modify data, this is the best locking scheme to use from a performance standpoint. This scheme applies to both server and client-side recordsets.
    • Lock Pessimistic—In this scheme, a lock is attempted as soon as an edit is performed. This locking scheme is not relevant for client-side recordsets. Pessimistic Locking in an ADO recordset is like Pessimistic Locking with Row Buffering in a Visual FoxPro cursor.
    • Lock Optimistic—In this scheme, a lock attempt is made when the Update method is invoked. This locking scheme applies to both server and client-side recordsets. Optimistic Locking in an ADO recordset is like Optimistic Locking with Row Buffering in a Visual FoxPro cursor.
    • Lock Batch Optimistic—This scheme is like the Lock Optimistic scheme, except that more than one row of data is involved. In this scheme, a lock is attempted on modified records when the UpdateBatch method is invoked. This scheme is like Optimistic Locking with Table Buffering in a Visual FoxPro cursor.
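
    Putting the pieces together, the following sketch opens a client-side recordset with batch optimistic locking, modifies two rows, and submits both changes with a single UpdateBatch call (the replacement contact names are placeholders; constant values are from the ADO enumerations):

    ```foxpro
    #Define adUseClient           3
    #Define adOpenStatic          3
    #Define adLockBatchOptimistic 4

    oRecordSet = CreateObject("adodb.recordset")
    oRecordSet.CursorLocation = adUseClient
    oRecordSet.Open("Select * From Customers", oConnection, ;
       adOpenStatic, adLockBatchOptimistic)

    * Edits accumulate locally until UpdateBatch is invoked
    oRecordSet.Fields("ContactName").Value = "New Contact"
    oRecordSet.MoveNext
    oRecordSet.Fields("ContactName").Value = "Another Contact"

    * Attempt to write all pending changes back to the server
    oRecordSet.UpdateBatch
    ```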

    The following table illustrates the availability of some common methods depending on the locking scheme used:

    Table 3. Method Availability (Depending on Lock Type)

    Lock Type          Cancel   CancelBatch   Update   UpdateBatch
    Read Only          ✓
    Pessimistic        ✓        ✓             ✓        ✓
    Optimistic         ✓        ✓             ✓        ✓
    Optimistic Batch   ✓        ✓             ✓        ✓

    With the concepts of cursor types, locations, and locking schemes out of the way, we can discuss the real abilities of ADO recordsets. The most notable of these abilities are updating, sorting, and filtering of data. Before undertaking that discussion, however, take a few moments to review the Fields Collection object.

    Fields collection object

    Associated with the Recordset object is the Fields Collection object. The Fields Collection object contains zero or more Field objects. The following code enumerates the Fields collection of a Recordset object:

    For Each oField In oRecordset.Fields
       With oField
          ?.Name,.Value,.Type,.DefinedSize
          ?.ActualSize,.NumericScale,.Precision
       EndWith
    Next oField
    

    The following are the common Field properties you will work with:

    • Name—Specifies the name of the Field object. This corresponds to the name of the data element in the underlying data source. It is easy to think of the Name property as simply the name of a field in an underlying table. Note, however, that ADO and OLE DB work with both relational and nonrelational data, so while you may be working with ADO, the underlying data may come from Outlook, Excel, Word, or Microsoft® Windows NT® Directory Services.
    • Value—Indicates the current value of the Field object.
    • OriginalValue—Indicates the Value property of the Field object before any modifications were made. The OriginalValue property returns the same value that would be returned by the OldVal( ) function in Visual FoxPro. When you invoke the Cancel or CancelUpdate methods of the Recordset object, the Value property of the Field object is replaced by the contents of the OriginalValue property. This behavior is similar to that exhibited when TableRevert( ) is issued against a Visual FoxPro cursor.
    • UnderlyingValue—Indicates the current value in the data source. This property corresponds most closely to the CurVal( ) function in Visual FoxPro. To populate the Value property of each Field object in the Fields collection, you need to invoke the Resync method of the Recordset object. With a client-side cursor, this property will return the same value as the OriginalValue property, since the recordset may or may not have an active connection.
    • Type—Indicates the data type of the Field object. The value of this property corresponds to a value contained in DataTypeEnum. Examples of values in DataTypeEnum are adBoolean, adInteger, and adVarChar.
    • DefinedSize—Specifies the size of the field containing a data element in the data source. For example, in SQL Server, the Country field in the Customers table of the Northwind database is 15 characters long. Therefore, the DefinedSize property of the Country Field object is 15.
    • ActualSize—Represents the length of the actual data element in a datasource. To illustrate, consider the Country Field object again. In the case where the value is Germany, the ActualSize property is 7, while the DefinedSize property is still 15.
    • NumericScale—Specifies how many digits to the right of the decimal place are stored.
    • Precision—Specifies the maximum number of digits to be used for numeric values.
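
    The interplay between the Value and OriginalValue properties can be sketched as follows (assuming a recordset opened with an updatable lock type; the replacement name is a placeholder):

    ```foxpro
    oField = oRecordSet.Fields("ContactName")
    cBefore = oField.Value

    * Modify the field; OriginalValue still holds the pre-edit value
    oField.Value = "Changed Name"
    ?oField.OriginalValue   && same as cBefore

    * Cancelling the edit restores Value from OriginalValue,
    * much like TableRevert( ) against a buffered VFP cursor
    oRecordSet.CancelUpdate
    ?oField.Value   && back to the original contents
    ```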

    In addition to these properties, GetChunk is one interesting method you are likely to use. This method allows you to progressively fetch portions of the contents of a Field object, which is very useful when dealing with large text fields. It can be used only on fields where the adFldLong bit of the Attributes property is set to true (.T.). See the next section for details on the Attributes property. Fields of type adLongVarChar have the adFldLong bit set; the Notes field of the Employees table is of type adLongVarChar.

    The following code fetches data from the notes field in 10-byte chunks:

    Local cChunkRead
    cChunkRead = Space(0)
    Do While .T.
       cChunkRead = oRecordset.Fields("notes").GetChunk(10)
       If IsNull(cChunkRead)
          * GetChunk returns a null once all of the data has been read
          Exit
       Else
          ?cChunkRead
       Endif
    EndDo
    

    Successive calls to GetChunk continue where the previous call ended. The GetChunk method is very useful when you need to stream data or only need to see the first few characters of a large text field.

    Along with GetChunk, examine the AppendChunk method. The first time this method is called for a field, it overwrites any data in the field. Successive calls then append the data, until pending edits are cancelled or updated. The following code illustrates how this method works:

    For x = 1 To 100
       oRecordset.Fields("notes").AppendChunk(Str(x)+Chr(10)+Chr(13))
    Next x
    

    Both the GetChunk and AppendChunk methods are ideal for dealing with low memory scenarios.

    The Attributes property

    An attribute specifies the characteristics of something. As a person, you have many attributes: eye color, height, weight, and so forth. In the OOP world, objects have many attributes. Most of the time, attributes are exposed in the form of properties. A Visual FoxPro form has several properties, such as Width, Height, and BackColor, to name a few. The same is true for objects in ADO. Sometimes, however, it is not convenient to have a one-to-one correspondence between attributes and properties. Often, you can pack large amounts of information into a smaller space through the power of setting bits. A bit is much like a switch: it is either on or off, 1 or 0. If you string these bits together, you gain the ability to store multiple values in a small space. This is how the Attributes property works.

    The Connection, Parameter, Field, and Property objects all have an Attributes property. If you have never worked with bit operations before, working with this property can be quite challenging. In some situations, as is the case with the GetChunk and AppendChunk methods, you will need to refer to the Attributes property of the Field object to determine whether those methods are available.

    Using the Field object to illustrate how the Attributes property works, you can associate the following attributes with a Field object and its associated binary values:

    • adFldMayDefer—Indicates that the field contents are retrieved only when referenced—0x00000002
    • adFldUpdatable—Indicates that the field can be updated—0x00000004
    • adFldUnknownUpdatable—Indicates that the provider does not know whether the field is updatable—0x00000008
    • adFldFixed—Indicates that the field contains fixed-length data—0x00000010
    • adFldIsNullable—Indicates that the field can accept a null value during a write operation—0x00000020
    • adFldMayBeNull—Indicates that the field may contain a null value—0x00000040
    • adFldLong—Indicates that the field contains long binary data and that the GetChunk and AppendChunk methods can be used—0x00000080
    • adFldRowID—Indicates that the field contains a row ID and cannot be updated. This does not relate to a field that may contain an identity or other auto-incrementing value. Rather, it relates to a row ID that is unique across the database; Oracle has this feature—0x00000100
    • adFldRowVersion—Indicates that the field holds the version of the row. For example, a SQL Server timestamp field may have this attribute set—0x00000200
    • adFldCacheDeferred—Indicates that once this field has been read, future references will be read from the cache—0x00001000

    Usually, more than one of these attributes is present at any given time, yet the Attributes property is a single value. Using the Notes field of the Employees table as an example, you will see that the Attributes property yields a value of 234, which is the sum of the attributes set for that field. For example, the adFldIsNullable and adFldLong attributes have decimal values of 32 and 128, respectively, so together they contribute 160 to that sum. This works like the Windows MessageBox dialog box with regard to specifying the icon and the types of buttons that are present.

    Knowing that the Attributes property is a sum of the attributes of a Field object does not help in determining whether a specific attribute is present. This is where understanding bit operations comes in handy. The first step is to convert the sum (such as 234, above) into a binary equivalent:

    11101010
    

    Working from right to left (or from the least significant bit to the most significant), and numbering from zero, you can see that bits 1, 3, 5, 6, and 7 are set (indicated by 1s in those positions), while bits 0, 2, and 4 are not. The next step is to determine whether a field is “long.”

    To determine whether a field is a long field, we must first convert the adFldLong constant, the bit which, if set, indicates that the field is long. The adFldLong constant has a hex value of 0x00000080, which translates into a decimal value of 128. The following is the binary equivalent:

    10000000
    

    Converting a hex value to decimal in Visual FoxPro is simple. The following code illustrates how to convert hexadecimal values to decimal:

    x = 0x00000080
    ?x && 128
    

    And, if you ever need to convert back to hexadecimal:

    ?Transform(128,"@0") && 0x00000080 
    

    Comparing this with our original binary value, 11101010, and working from right to left beginning with zero, you can see that the seventh bit is set. Therefore, if the seventh bit of the Attributes property is set, the field is long. Going further, whatever attributes occupy bits 1, 3, 5, and 6 also apply to this field. The following table of field attributes should help sort things out:

    Table 4. Field Attributes

    Hex Value    Decimal Value   Field Attribute Constant   Bit
    0x00000002   2               adFldMayDefer              1
    0x00000004   4               adFldUpdatable             2
    0x00000008   8               adFldUnknownUpdatable      3
    0x00000010   16              adFldFixed                 4
    0x00000020   32              adFldIsNullable            5
    0x00000040   64              adFldMayBeNull             6
    0x00000080   128             adFldLong                  7
    0x00000100   256             adFldRowID                 8
    0x00000200   512             adFldRowVersion            9
    0x00001000   4096            adFldCacheDeferred         12

    So, along with being a long field, the field may be deferred, its updatability is unknown to the provider, it can have a null written to it, and it may already contain a null value. Visually, this makes sense. How can you do this programmatically?

    If you refer to online examples (almost always programmed in Visual Basic), you will see code like this:

    If (oField.Attributes And adFldLong) = adFldLong 
       ' The field is long
    End If
    

    This is pretty slick, in that you can test whether a specific attribute bit is set by using the And operator with the Attributes property and the constant. If you try this in Visual FoxPro, you will get data type mismatch errors. Fortunately, there is a way: Visual FoxPro contains a host of bit functions. One of them, BITTEST( ), does as its name implies: it tests whether a specified bit of a passed argument is set. To review, we need to see whether the seventh bit of the value 234 is set. The following Visual FoxPro code demonstrates how to use the BITTEST( ) function:

    If BitTest(234,7)
       * The field is long
    Endif
    

    To find out whether the field is nullable:

    If BitTest(234,5)
       * The field is nullable
    Endif
    

    The Attributes property of the Connection, Parameter, and Property objects works in the same manner as illustrated above. The differences are the names and number of attributes that are present.

    ADO and COM defined constants

    ADO and OLE DB, like any COM components, make extensive use of defined constants in the examples that document the usage of properties, events, and methods. Other development environments in Visual Studio, such as Visual Basic and Visual InterDev, provide IntelliSense technology because of their ability to interact directly with the type libraries of COM components. In those environments, you can reference defined constants as if they were part of the native language, so working with published examples is a fairly trivial task. The Visual FoxPro development environment, on the other hand, presents a bit of a challenge. The question always seems to be, “How can I use the Visual Basic samples in Visual FoxPro?” The biggest stumbling block is usually finding the values of the defined constants, because in Visual FoxPro you need to use the #Define statement for each constant.
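
    Once the values have been looked up, the constants can be collected into a header file and brought into each program with #Include. A sketch (the file name ADOCONST.H is an arbitrary choice; the values are the standard ADO enumeration values):

    ```foxpro
    * ADOCONST.H - a few common ADO defined constants
    #Define adOpenForwardOnly      0
    #Define adOpenKeyset           1
    #Define adOpenDynamic          2
    #Define adOpenStatic           3
    #Define adUseServer            2
    #Define adUseClient            3
    #Define adLockReadOnly         1
    #Define adLockPessimistic      2
    #Define adLockOptimistic       3
    #Define adLockBatchOptimistic  4
    ```

    With the header in place, a program simply issues #Include ADOCONST.H and can then use the same constant names that appear in the Visual Basic samples.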

    One solution for obtaining the value of the ADO defined constants is to obtain the MDAC SDK from Microsoft. The MDAC SDK can be downloaded from https://www.microsoft.com/data/download.htm.

    Once you install the SDK, locate the Include\ADO directory. In that directory, you will find the ADOINT.H file, which contains all of the enumerated types and the values for the defined constants.

    A second, and perhaps easier, solution is to use the resources already installed on your machine. If you are working through the sample code in this paper, you already have the Microsoft Data Access Components installed on your workstation. The Visual Basic Development Environment (both the full Visual Basic IDE and the Visual Basic Editor in desktop applications like Word and Excel) has a great resource called the Object Browser. This could, in fact, be the most underutilized tool on the planet.

    To illustrate its functionality, open any desktop application that uses Visual Basic, such as Word or Excel. Or, if you have the Visual Basic Programming System installed, you can open that as well.

    If you opened a VBA application

    1. From the View menu, choose Toolbars.
    2. From the Toolbars menu, choose Visual Basic.
    3. On the Visual Basic toolbar, click Visual Basic Editor.
    4. From the Tools menu, choose References.
5. Check the Microsoft ActiveX Data Objects 2.x Library.

    If you opened the Visual Basic IDE

    1. Create an empty project.
    2. From the Project menu, select References.
3. Check the Microsoft ActiveX Data Objects 2.x Library.

    Now, whether you are in the VBA Editor or the VB IDE

    1. Press F2 to display the Object Browser.
    2. In the first combo box, select ADODB.
    3. In the second box, type ADVARCHAR.
4. Click Search or press Enter.

    Clearly, the Object Browser is a powerful tool for the developer who works with COM components. Not only are the defined properties, events, and methods accessible in the Object Browser, so also are the defined constants and their respective values. Notice the value of adVarChar in the lower pane of the Object Browser.

    Opening, sorting, and filtering data

    One of the big advantages of using a development platform such as Visual FoxPro is its local data engine. Not only does the engine provide superior query performance, but it also provides some very flexible capabilities when it comes to both working with and presenting data. There isn’t a Visual FoxPro application that fails to sort or filter data to some degree. In Visual FoxPro, sorting is accomplished by creating a set of index tags for a table. Filtering is accomplished by using the Set Filter command. Fortunately, ADO has these capabilities as well.
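For comparison, here is a minimal Visual FoxPro sketch of those same two operations (assuming the Customer table used in the examples that follow):

```foxpro
* Sorting: create an index tag and make it the master order
Use Customer
Index On country Tag country
Set Order To Tag country

* Filtering: restrict the visible records
Set Filter To country = "Germany"
```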

As the Field Attributes table earlier in this paper shows, the availability of these features depends on where the recordset is created. Sorting and filtering require a client-side recordset, so we must ensure that one is created.

    For example, create a Connection object to the TasTrade or SQL Server Northwind database. The following code assumes that the Connection object, oConnection, has been created before you open the Recordset object.

    First, we need to implement a few required #Defines:

#Define adOpenStatic   3
#Define adUseClient   3
#Define adLockReadOnly   1
#Define adLockBatchOptimistic   4
#Define adCmdTable   2
    

    For SQL Server:

    With oRecordset
       .Source = "Customers"
       .ActiveConnection = oConnection
       .CursorLocation = adUseClient
       .LockType = adLockBatchOptimistic
       .Open
    EndWith
    

    Or

* CursorLocation must be set as a property before Open;
* the third argument to Open is the CursorType, not the CursorLocation
oRecordset.CursorLocation = adUseClient
oRecordset.Open("Customers",;
                 oConnection,;
                 adOpenStatic,;
                 adLockBatchOptimistic)
    

    For Visual FoxPro:

    With oRecordset
       .ActiveConnection = oConnection
       .Source = "Customer"
       .CursorType = adOpenStatic
       .LockType = adLockReadOnly
       .CursorLocation = adUseClient
   .Open(,,,,adCmdTable)
    EndWith
    

    Or

oRecordset.CursorLocation = adUseClient
oRecordset.Open("Customer",;
                 oConnection,;
                 adOpenStatic,;
                 adLockBatchOptimistic,;
                 adCmdTable)
    

    Or

    With oRecordset
       .ActiveConnection = oConnection
       .Source = "Select * From Customer"
       .CursorType = adOpenStatic
       .LockType = adLockReadOnly
       .CursorLocation = adUseClient
       .Open
    EndWith
    

    Or

oRecordset.CursorLocation = adUseClient
oRecordset.Open("Select * From Customer",;
                 oConnection,;
                 adOpenStatic,;
                 adLockBatchOptimistic)
    

    SQL Server and Visual FoxPro open data differently. Remember that when using SQL Server, you are using the OLE DB Provider for SQL Server. When you access data in Visual FoxPro, use the OLE DB Provider for ODBC, since there is no native OLE DB provider for Visual FoxPro.

    The difference rests with the optional fifth argument of the Open method. The SQL Server OLE DB Provider is designed to recognize when you pass just a table name. With the ODBC OLE DB Provider, you must specify how it should interpret the Source property. By default, the ODBC OLE DB Provider expects a SQL statement. When you pass a SQL statement, there is no need to explicitly state how the provider should interpret things. The Visual FoxPro ODBC driver generates an “Unrecognized Command Verb” error message if you only specify a table name as the source and you fail to use the optional fifth argument. Note that if you use the ODBC OLE DB Provider to access SQL Server, you must employ the same technique that is needed for Visual FoxPro.
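To make the difference concrete, here is a sketch of the two calls through the ODBC OLE DB Provider, using the constants defined earlier:

```foxpro
* Fails with "Unrecognized Command Verb" - by default the ODBC
* provider interprets the source as a SQL statement
* oRecordset.Open("Customer", oConnection, adOpenStatic, adLockReadOnly)

* Works - the fifth argument tells the provider the source is a table name
oRecordset.Open("Customer", oConnection, adOpenStatic, adLockReadOnly, adCmdTable)
```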

Which method should you employ: populating the properties individually before invoking the Open method, or passing the arguments directly to Open? Once again, it is a matter of preference. Of the two, manually populating the properties makes for more readable code.

    Sorting and filtering data are just matters of manipulating the Sort and Filter properties respectively. The following code sorts the recordset created from TasTrade in the example above, by country, ascending, then by region, descending:

    oRecordset.Sort = "Country,Region Desc"
    

    The following code displays the sort and the functionality of the AbsolutePosition and Bookmark properties.

    oRecordset.MoveFirst
    Do While Not oRecordset.Eof
       With oRecordset
          ?.Fields("country").Value,;
           .Fields("region").Value,;
           .AbsolutePosition,;
           .Bookmark 
           .MoveNext
       EndWith
    EndDo
    

    Setting a filter is as easy as setting the sort. The following code filters for records where the country is Germany:

    oRecordset.Filter = "Country = 'Germany'"
    

    The Filter property also supports multiple values:

    oRecordset.Filter = "Country = 'Germany' Or Country = 'Mexico'"
    

    Finally, wild card characters are also supported:

    oRecordset.Filter = "Country Like 'U*'"
    

    To reset either the Filter or Sort properties, set them equal to an empty string:

    oRecordset.Sort = ""
    oRecordset.Filter = ""
    

    Finding data

Another important capability of an ADO recordset is the ability to find records based on a search string. This works much like searching for records in a Visual FoxPro cursor with Seek or Locate; in addition, the Find method provides control over the scope of records that are searched. The following code searches for a country that begins with the letter "B."

    oRecordset.Find("country Like 'B%'")
    

    Although multiple criteria are not allowed, wild card searches are permitted:

    oRecordset.Find("country Like 'U*'")
    

    Searches for multiple criteria, such as the following, would result in an error:

    oRecordset.Find("country Like 'G*' Or country Like 'B*'")
    
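The Find method's optional arguments control the scope and direction of the search. A sketch (the constant values come from ADOINT.H):

```foxpro
#Define adSearchForward    1
#Define adSearchBackward  -1

* Skip one record past the current one, searching forward
oRecordset.Find("country Like 'B*'", 1, adSearchForward)

* Search backward from the current record
oRecordset.Find("country Like 'B*'", 0, adSearchBackward)
```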

    Transactions/updating data/conflict resolution

    Updating data in an ADO recordset is a fairly simple process. As in any environment, conflict resolution in multi-user environments is always an issue to be dealt with. This is where the Errors collection comes into play. Error trapping and handling needs to become an integral part of your ADO-related code. The following code samples employ a simple error handling scenario and use the Errors collection to determine whether conflicts have occurred. For a complete list and description of ADO error codes, consult the online documentation.

    When you update data, you can update either a single row, or several rows at a time in batch mode. These methods most closely correspond to row and table buffering, respectively, in Visual FoxPro. Building on the recordset already created, the lock type is Batch Optimistic. While updates are normally conducted in batches, you can also update one row at a time, just as in Visual FoxPro.

    The following code modifies the CompanyName field and attempts to update the SQL Server data source:

    oRecordset.Fields("companyname").Value = "Ace Tomato Company"
    oRecordset.Update
    

Depending on a variety of scenarios, this code may or may not work. Perhaps a contention issue exists, or perhaps the user does not have rights to modify the data. Hundreds of issues can cause an attempted update to fail. Therefore, any time you attempt an update, you should employ error trapping. The following code expands the previous example and makes it a bit more robust:

    Local Err,cOldErr,oError
cOldErr = On("Error")
    On Error Err = .T.
    oRecordset.Fields("companyname").Value = "Ace Tomato Company"
    oRecordset.Update
    If Err
       For Each oError In oRecordset.ActiveConnection.Errors
          With oError
             ?.Number,.Description
          EndWith
       Next oError
    Endif
    On Error &cOldErr
    

    If you are thinking, “Hey, maybe I should write a wrapper class to better encapsulate and centralize code,” you’re on the right track. The following code creates a custom class that can serve as a starting point:

    Local oRecordsetHandler
    oRecordsetHandler = CreateObject("RecordsetHandler")
    oRecordset.Fields("companyname").Value = "Alfreds Futterkiste"
    If !oRecordsetHandler.Update(oRecordset)
       oRecordsetHandler.Cancel(oRecordset)
    Endif
Define Class RecordsetHandler As Custom
   Protected oRecordset
   Protected ErrFlag
   
   Procedure Update(oRecordset)
      This.oRecordset = oRecordset
      This.ErrFlag = .F.
      oRecordset.UpdateBatch
      Return !This.ErrFlag
   EndProc
   Procedure Cancel(oRecordset)
      This.oRecordset = oRecordset
      This.ErrFlag = .F.
      oRecordset.CancelBatch
      Return !This.ErrFlag
   EndProc
   
   Procedure Error(nError, cMethod, nLine)
      Local oError
      For Each oError In This.oRecordset.ActiveConnection.Errors
         With oError
            ?.Number,.Description
         EndWith
      Next oError
      This.ErrFlag = .T.
   EndProc
EndDefine
    

    There’s a better way to determine whether an update proceeded successfully. The preferred approach is to trap events that ADO fires. Visual FoxPro by itself does not surface COM Events. Fortunately, the new VFPCOM.DLL component provides this capability to Visual FoxPro. The previous example can be modified to show how using COM Events makes for more robust code and class design.

Now we can improve the code of our example. For efficiency, you will usually want to batch updates that span multiple records. Often, when you update multiple records, transaction processing is required: either the updates to all records must succeed, or none should occur. To illustrate, suppose you must apply a 10 percent price increase to the products you sell. The prime requirement is that every record in the Products table be modified. Without transactional capabilities, the possibility exists that, for example, after the first 10 records are updated, an error on the eleventh record prevents a complete update. Transaction processing provides the ability to roll back changes.

    The following example incorporates error trapping and the three transaction methods of the Connection object:

    Local Err,cOldErr
    cOldErr = On("error")
    On Error Err = .T.
    oRecordset.ActiveConnection.BeginTrans
    Do While !oRecordset.Eof
       If Err
          Exit
       Else
          With oRecordset
             .Fields("unitprice").Value = ;
                .Fields("unitprice").Value * 1.1
             .Movenext
          EndWith   
       Endif
    EndDo
oRecordset.UpdateBatch
    If Err
       oRecordset.ActiveConnection.RollBackTrans
       oRecordset.CancelBatch
    Else
       oRecordset.ActiveConnection.CommitTrans
    Endif   
    On Error &cOldErr 
    

    Additional operations you are likely to employ with recordsets deal with adding new records and deleting existing records. Both of these processes are very simple. The following code adds a new record:

    oRecordset.AddNew
    

    As in Visual FoxPro, in ADO the new record becomes current. Once the AddNew method is invoked, the field can be populated and, depending on the LockType, you then invoke either the Update or UpdateBatch methods to modify the data source.
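A minimal sketch of the full add sequence, using the batch-optimistic Customers recordset from the earlier examples (the field values are illustrative only):

```foxpro
oRecordset.AddNew
oRecordset.Fields("customerid").Value = "ACETC"
oRecordset.Fields("companyname").Value = "Ace Tomato Company"
oRecordset.UpdateBatch   && use .Update instead for non-batch lock types
```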

    Deleting records is just as easy. The following code deletes the current record:

    oRecordset.Delete
    

    Once again, after deleting the record, a call to Update or UpdateBatch will update the data source.

    SQL Server identity fields and parent/child relationships

SQL Server, like most server RDBMSs and Microsoft® Access®, provides an auto-incrementing field that can serve as the primary key for a table. Typically, the data type of this field is Integer. In SQL Server, this type of field is called an identity field. Fields of this type are read-only, which raises the question, "When adding records, how can one determine what these values are?" Knowing the next generated value is a requirement for maintaining referential integrity when child tables are involved. The following example code shows a recordset in which the first field, ID, is the auto-incrementing field. After a new record is added, checking the value of the ID field yields an empty string. Attempting to update the field results in an error. However, once the recordset is updated, checking the value again yields a valid identity value.

    oRecordset.AddNew
    ?oRecordset.Fields("id").Value && empty string
    oRecordset.UpdateBatch
    ?oRecordset.Fields("id").Value && returns new identity value
    

    With the new identity value available, you can add records in child tables, using the identity value in the parent table as the foreign key in the child tables.

    But, what do you do in cases where you have disconnected recordsets?

This section details an important capability in ADO: the ability to have recordsets without an active connection to the backend data source. You can freely add new records to a disconnected recordset; when the recordset is eventually reconnected, those newly added records are sent to the backend data source. How do you know what the identity value will be in those cases? Simply put, you don't. At the same time, however, you still need to be able to add both parent and child records locally. You need some method that maintains the relationship locally while supporting the use of the identity value when the data is sent to the backend.

    The simplest solution to this problem is to include a field in each table that serves as the local ID. You need this extra field because the identity field will be read-only. On the client side, you can use several methods for producing an ID that is unique. One approach is to use the Windows API to fetch the next Global Unique Identifier (GUID). The following procedure outlines how the local process unfolds:

    1. Add a new parent record.
    2. Fetch the next GUID.
    3. Update the local primary key column with the GUID.
    4. Add a new child record.
    5. Update the local primary key column with the GUID.
    6. Update the foreign key column of the child with the GUID from its parent.
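The steps above can be sketched as follows, assuming the guid class shown later in this section and hypothetical local-key fields named localid and parentid:

```foxpro
Local oGuid, cParentKey
oGuid = CreateObject("guid")
cParentKey = oGuid.GetNextGuid()

* Steps 1-3: add the parent and stamp its local primary key
oParentRs.AddNew
oParentRs.Fields("localid").Value = cParentKey

* Steps 4-6: add the child, stamp its local key and its parent reference
oChildRs.AddNew
oChildRs.Fields("localid").Value = oGuid.GetNextGuid()
oChildRs.Fields("parentid").Value = cParentKey
```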

    At some point, you will reconnect to the server. The update process could be performed within the context of a transaction, done one row at a time by navigating through each record. Checking the recordset Status property, which indicates whether the current record has been newly created, modified, deleted, and so on, determines whether the current row should be sent back to the server. If the record should be sent back, the parent record can be updated via the UpdateBatch method. The UpdateBatch method accepts an optional argument that specifies that only the current record be updated. By default, UpdateBatch works on all records. If the value of one is passed—corresponding to the adAffectCurrent constant—only the current record is updated. Once the update occurs, the identity value generated by the server is available. This value would then be used to update the foreign key columns of any related children. Once that process is complete, the records for that parent would be sent back to the server as well. This same process would be used if grandchild and great-grandchild relationships also existed.
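A sketch of the current-record update described above (adAffectCurrent has a value of 1 in ADOINT.H; the id field comes from the identity example earlier):

```foxpro
#Define adAffectCurrent   1

* Send only the current (parent) record back to the server
oRecordset.UpdateBatch(adAffectCurrent)

* The server-generated identity value is now available
* for use as the foreign key in any related child records
?oRecordset.Fields("id").Value
```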

    The following Visual FoxPro code, from Visual FoxPro 6 Enterprise Development, by Rod Paddock, John V. Petersen, and Ron Talmage (Prima Publishing), illustrates how to generate a GUID:

    Local oGuid
    oGuid = CreateObject("guid")
    ?oGuid.GetNextGuid( )
    */ Class Definition
    Define Class guid AS Custom
      */ Create protected members to hold parts of GUID
      Protected data1
      Protected data2
      Protected data3
      Protected data4
      Procedure GetNextGuid
        */ The only public member. This method will return the next GUID
        Local cGuid
        cGuid = This.Export( )
        UuidCreate(@cGuid)         
        This.Import(cGuid)            
        cGuid = This.Convert(cGuid)
        Return cGuid
      EndProc
      Protected Procedure bintoHex(cBin)
        */ This method converts a binary value to Char by calling the Hextochar
    
        */ Method
    Local cChars, nBin, nDigit
        cChars = ""
        For nDigit = 1 To Len(cBin)
          nBin = Asc(Substr(cBin, nDigit, 1))
          cChars = cChars + This.Hex2Char(Int(nBin/16)) + ;
            This.Hex2Char(Mod(nBin,16))
        EndFor
        Return(cChars)
      EndProc
      Protected Procedure hex2char(nHex)
        */ This method converts a hex value to  ASCII 
        Local nAsc
        Do Case
          Case Between(nHex,0,9)
            nAsc = 48 + nHex
          Case Between(nHex,10,15)
            nAsc = 65 + nHex - 10
        EndCase
        Return(Chr(nAsc))
      EndProc
      Procedure import(cString)
        */ This method takes the binary string and populates the 4 data
        */ properties
        With This
          .Data1 = Left(cString, Len(.Data1))
          cString = SubStr(cString, Len(.Data1)+1)
          .Data2 = Left(cString, Len(.Data2))
          cString = SubStr(cString, Len(.Data2)+1)
          .Data3 = Left(cString, Len(.Data3))
          cString = SubStr(cString, Len(.Data3)+1)
          .Data4 = Left(cString, Len(.Data4))
        EndWith
        Return cString
        EndProc
    
      Protected Procedure export
        */ This method creates the buffer to pass to the GUID API.
        With This
          .Data1 = Space(4)
          .Data2 = Space(2)
          .Data3 = Space(2)
          .Data4 = Space(8)
        EndWith
        Return(This.Data1 + This.Data2 + This.Data3 + This.Data4)
      EndProc
      Protected Procedure Convert(cGuid)
    */ This method calls BinToHex to convert the data in the 4 data
    */ properties from binary into a hyphenated hexadecimal string
        With This
          cGuid =  .BinToHex(.Data1) + "-" + .BinToHex(.Data2) + "-" + ;
            .BinToHex(.Data3) + "-" + .BinToHex(.Data4)
          Return cGuid
        Endwith 
        EndProc
      Procedure Init
        */ Declare the function in the DLL
        Declare Integer UuidCreate ;
      In RPCRT4.DLL String @ UUID
        Return
      EndProc
    EndDefine
    

Running this code prints the newly generated GUID as a hyphenated hexadecimal string.

    Disconnected/Persisted Recordsets

One of the most powerful features of ADO is the ability to create both disconnected and persisted recordsets. A disconnected recordset is a client-side recordset that does not have a current ActiveConnection. SQL data sources, such as SQL Server, Oracle, and so on, are licensed according to the number of concurrent connections. For example, suppose 300 people use an application connected to SQL Server, but at any given time only 50 of them actually use the services of a connection. A connection is needed only when data is being requested, updates are made, or a stored procedure on the database server is invoked. From a financial standpoint, it is far less expensive for a company to purchase 50 licenses than 300. From a resource standpoint, performance should improve because the server carries the overhead of only 50 connections instead of 300, of which 250 would be idle at any time.

    Using the ADO recordset of customer data already created, the following code disconnects the client-side recordset:

    oRecordSet.ActiveConnection = Null
    

    If you attempt to do this with a server-side recordset, an error occurs stating that the operation is not allowed on an open recordset. Once the recordset is disconnected, you can continue to work with and modify records. The following code will work:

    oRecordset.MoveFirst
    Do While !oRecordset.Eof
       ?oRecordset.Fields("companyname").Value
       oRecordset.Fields("companyname").Value = ;
          Upper(oRecordset.Fields("companyname").Value)
       oRecordset.MoveNext 
    EndDo
    

    With modified records in a client-side recordset, three basic options exist.

• Cancel local changes
• Marshal local changes to the server
• Save (persist) the recordset locally

    You can save (persist) the recordset locally for both later use and, ultimately, for marshalling those persisted changes back to the server.

    The first choice is pretty simple to implement, since it takes one line of code:

    oRecordset.CancelBatch 
    

    The second choice is also simple to implement. Much of the work in updating multiple records and transactions has already been detailed. This procedure really involves two separate steps:

    1. Re-establish an active connection.
    2. Marshal modified records back to the data source.

    The following code re-establishes the connection:

With oConnection
   .Provider = "SQLOLEDB.1"
   .ConnectionString = "Persist Security Info=False;" + ;
      "User ID=sa;Initial Catalog=Northwind;Data Source=JVP"
   .Open
EndWith
oRecordset.ActiveConnection = oConnection
    

Then the code marshals the records by attempting the updates:

    Local Err,cOldErr
    cOldErr = On("error")
    On Error Err = .T.
    With oRecordset
       .ActiveConnection.BeginTrans
       .UpdateBatch
       If Err
          .ActiveConnection.RollBackTrans
          .CancelBatch
       Else
          .ActiveConnection.CommitTrans
       Endif
EndWith
On Error &cOldErr
    

    Often, however, there’s a need to shut things down and then reopen the recordset at another time. To be effective, the recordset must reflect incremental changes. This cycle may repeat any number of times.

    To illustrate how to persist a recordset, consider again the following code that modifies records in a Recordset object:

    oRecordset.MoveFirst
    Do While !oRecordset.Eof
       ?oRecordset.Fields("companyname").Value
       oRecordset.Fields("companyname").Value = ;
          Upper(oRecordset.Fields("companyname").Value)
       oRecordset.MoveNext 
    EndDo
    

    Now you can invoke the Save method to persist the recordset:

    oRecordset.Save("c:\temp\customers.rs")
    

    At a later time, you can open the persisted recordset:

    oRecordset = CreateObject("adodb.recordset")
    oRecordset.Open("c:\temp\customers.rs")
    

    After the persisted recordset is reopened, you can use the same code, which establishes a connection to a disconnected recordset, to make additional modifications. You can marshal changes made in the persisted recordset to the underlying data source.

    Hierarchical/Shaped Recordsets

    Visual FoxPro not only provides the ability to work with local data, but also the ability to set up relations using the Set Relation command. When you move the record pointer in the parent table, the record pointer automatically moves in any child tables that exist. This makes working with and building interfaces for one to many relationships very simple in Visual FoxPro. Fortunately, the same capability exists in ADO, in the form of hierarchical recordsets, also referred to as shaped recordsets.

    There are two necessary components when creating and working with hierarchical recordsets:

    • The Microsoft DataShape OLE DB Provider, MSDataShape
    • The Shape language, a superset of the SQL syntax

    The first requirement is fairly easy to fulfill because it only entails setting the Provider property of the ADO Connection object to the proper value:

    oConnection.Provider = "MSDataShape"
    

    The second requirement, using the Data Shape language, is a bit more challenging. When you first see Data Shape language, it can be fairly intimidating, just as FoxPro may have been when you first worked with it. But like anything else, with a bit of practice and patience, Microsoft Data Shape language will become second nature.

    To examine Shape language, consider a parent-child common scenario of customers and orders. For each customer, zero or more orders can exist. In turn, each order can contain one or more line items. The following code employs Shape syntax to relate customers and orders in the SQL Server Northwind database:

    SHAPE {SELECT * FROM "dbo"."Customers"} AS Customers APPEND ({SELECT * 
    FROM "dbo"."Orders"} AS Orders RELATE "CustomerID" TO "CustomerID") AS 
    Orders
    

If your first thought is, "Gee, this is like setting relations in Visual FoxPro," you are indeed correct. It is exactly the same principle. If the Shape syntax is broken down, the task becomes manageable. The first clause in the code begins with the keyword SHAPE, to signify that what follows is not pure SQL, but rather, Data Shape language. The Data Shape language is a superset of SQL, which is why you need to use MSDataShape as the OLE DB provider. MSDataShape can interpret and execute Shape commands. Finally, the last portion of the first command specifies that the results of the SQL statement are to be aliased as Customers.

    In the next set of commands, things get a bit complicated, especially when the hierarchy is nested an additional one or two levels (this is the case when order details are added, as we’ll do in the next example).

    You can interpret the keyword APPEND as “Append the results of the next SQL statement to the results of the previous SQL statement.” Of course, just appending records won’t suffice. Rather, you must provide a rule that specifies how the records are to be related. This is where the RELATE keyword comes into play.

    You can interpret the RELATE keyword as, “When appending records, do so based on these join fields.” In this case, the join is between the CustomerID column in the Customers table and the CustomerID column in the Orders table.

    Finally, we need to alias the data that was just appended as Orders. The following code sets up the objects and creates the hierarchical recordset:

    #Include adovfp.h
    Local oRecordset,oConnection,oCommand, cShpStr
    oRecordset = CreateObject("adodb.recordset")
    oConnection = CreateObject("adodb.connection")
    cShpStr = 'SHAPE {SELECT * FROM "dbo"."Customers"} AS Customers '
cShpStr = cShpStr + 'APPEND ({SELECT * FROM "dbo"."Orders"} AS Orders '
    cShpStr = cShpStr + 'RELATE "CustomerID" TO "CustomerID") AS Orders'
    With oConnection
       .Provider = "MSDataShape"
   .ConnectionString = "Data Provider=SQLOLEDB.1;" + ;
      "Persist Security Info=False;User ID=sa;" + ;
      "Initial Catalog=Northwind;Data Source=JVP"
       .Open
    EndWith
    With oRecordset
       .ActiveConnection = oConnection
       .Source = cShpStr
       .CursorType = adOpenStatic
       .LockType = adLockBatchOptimistic
       .CursorLocation = adUseClient
       .Open
    EndWith
    

    The question at this point is, “How is the data appended?” The technique is rather clever. When you append a recordset to another recordset, you do so through a Field object. If you query the Count property of the Fields collection, you discover that the value of 12 is returned. However, in SQL Server, you see that the Customers table only has 11 fields. The twelfth field, in this case, is actually a pointer to the Orders recordset. The rows in the Orders recordset for a given row in the Customers recordset are only those for that customer. The following code illustrates just how powerful hierarchical recordsets are:

    oRecordset.MoveFirst
    Do While !oRecordset.Eof
       With oRecordset
          ?.Fields("Customerid").Value,.Fields("CompanyName").Value
       EndWith
       oOrders = oRecordset.Fields("orders").Value
       Do While !oOrders.Eof
          With oOrders
          ?Chr(9),.Fields("Customerid").Value,.Fields("orderdate").Value
          .MoveNext
          EndWith   
       EndDo   
       oRecordset.MoveNext
    EndDo
    

    With the basics of hierarchical recordsets out of the way, we can turn our attention to a more complicated, real-life example. The following example adds several dimensions to the recordset.

First, the Order Details table is appended to the Orders child recordset. In this case, a new field, which in turn points to the OrderDetails recordset, is added to the Orders recordset. The Products table is then appended to the OrderDetails recordset, providing three levels of nesting. Appended to the Products recordset are two tables, Categories and Suppliers. Traversing back up the hierarchy, the Employees table is appended to the Orders recordset.

    This list illustrates the hierarchy and shows all the tables involved as well as the nesting scheme. When creating reports, it is quite possible that you will need all of these tables. The ability to relate tables in this fashion and the ability to display the data in a user interface or a report have always been true powers of Visual FoxPro. Before ADO, attempting all this work outside Visual FoxPro was extremely difficult, sometimes bordering on the impossible.

Customers
   Orders
      OrderDetails
         Products
            Categories
            Suppliers
      Employees
         EmployeeTerritories
            Territories
               Region
      Shippers
    

    The following is the Shape syntax to create the hierarchical recordset:

    SHAPE {SELECT * FROM "dbo"."Customers"} AS Customers APPEND (( SHAPE 
    {SELECT * FROM "dbo"."Orders"} AS Orders APPEND (( SHAPE {SELECT * FROM 
    "dbo"."Order Details"} AS OrderDetails APPEND (( SHAPE {SELECT * FROM 
    "dbo"."Products"} AS Products APPEND ({SELECT * FROM "dbo"."Categories"} 
    AS Categories RELATE 'CategoryID' TO 'CategoryID') AS Categories,({SELECT 
    * FROM "dbo"."Suppliers"} AS Suppliers RELATE 'SupplierID' TO 
    'SupplierID') AS Suppliers) AS Products RELATE 'ProductID' TO 
    'ProductID') AS Products) AS OrderDetails RELATE 'OrderID' TO 'OrderID') 
    AS OrderDetails,(( SHAPE {SELECT * FROM "dbo"."Employees"} AS Employees 
    APPEND (( SHAPE {SELECT * FROM "dbo"."EmployeeTerritories"} AS 
    EmployeeTerritories APPEND (( SHAPE {SELECT * FROM "dbo"."Territories"} 
    AS Territories APPEND ({SELECT * FROM "dbo"."Region"} AS Region RELATE 
    'RegionID' TO 'RegionID') AS Region) AS Territories RELATE 'TerritoryID' 
    TO 'TerritoryID') AS Territories) AS EmployeeTerritories RELATE 
    'EmployeeID' TO 'EmployeeID') AS EmployeeTerritories) AS Employees RELATE 
    'EmployeeID' TO 'EmployeeID') AS Employees,({SELECT * FROM 
    "dbo"."Shippers"} AS Shippers RELATE 'ShipVia' TO 'ShipperID') AS 
    Shippers) AS Orders RELATE 'CustomerID' TO 'CustomerID') AS Orders
    

This is just about as complicated as it gets. Nobody in their right mind would want to hammer this code out manually. Fortunately, there is a visual way to build it: the DataEnvironment designer that ships with Visual Basic allows you to visually design ADO connections, recordsets, and hierarchical recordsets.

    The extensive Shape syntax can be copied and pasted into Visual FoxPro, or any other environment that can host ADO. For complete details on how to use the DataEnvironment designer, consult the Visual Basic documentation on the MSDN CDs that ship with Visual Studio.

    The following Visual FoxPro code traverses the hierarchical recordset and displays the data:

    #Include adovfp.h
    oRecordset = CreateObject("adodb.recordset")
    oConnection = CreateObject("adodb.connection")
    cShpStr = 'SHAPE {SELECT * FROM "dbo"."Customers"} AS Customers APPEND '
    cShpStr = cShpStr + '(( SHAPE {SELECT * FROM "dbo"."Orders"} AS Orders '
    cShpStr = cShpStr + 'APPEND (( SHAPE {SELECT * FROM "dbo"."Order Details"} AS OrderDetails '
    cShpStr = cShpStr + 'APPEND (( SHAPE {SELECT * FROM "dbo"."Products"} AS Products '
    cShpStr = cShpStr + 'APPEND ({SELECT * FROM "dbo"."Categories"} AS Categories '
    cShpStr = cShpStr + 'RELATE "CategoryID" TO "CategoryID") AS Categories,'
    cShpStr = cShpStr + '({SELECT * FROM "dbo"."Suppliers"} AS Suppliers '
    cShpStr = cShpStr + 'RELATE "SupplierID" TO "SupplierID") AS Suppliers) AS Products '
    cShpStr = cShpStr + 'RELATE "ProductID" TO "ProductID") AS Products) AS OrderDetails '
    cShpStr = cShpStr + 'RELATE "OrderID" TO "OrderID") AS OrderDetails,'
    cShpStr = cShpStr + '(( SHAPE {SELECT * FROM "dbo"."Employees"} AS Employees '
    cShpStr = cShpStr + 'APPEND (( SHAPE {SELECT * FROM "dbo"."EmployeeTerritories"} AS EmployeeTerritories '
    cShpStr = cShpStr + 'APPEND (( SHAPE {SELECT * FROM "dbo"."Territories"} AS Territories '
    cShpStr = cShpStr + 'APPEND ({SELECT * FROM "dbo"."Region"} AS Region '
    cShpStr = cShpStr + 'RELATE "RegionID" TO "RegionID") AS Region) AS Territories '
    cShpStr = cShpStr + 'RELATE "TerritoryID" TO "TerritoryID") AS Territories) AS EmployeeTerritories '
    cShpStr = cShpStr + 'RELATE "EmployeeID" TO "EmployeeID") AS EmployeeTerritories) AS Employees '
    cShpStr = cShpStr + 'RELATE "EmployeeID" TO "EmployeeID") AS Employees,'
    cShpStr = cShpStr + '({SELECT * FROM "dbo"."Shippers"} AS Shippers '
    cShpStr = cShpStr + 'RELATE "ShipVia" TO "ShipperID") AS Shippers) AS Orders '
    cShpStr = cShpStr + 'RELATE "CustomerID" TO "CustomerID") AS Orders'
    With oConnection
       .Provider = "MSDataShape"
       .ConnectionString = "Data Provider=SQLOLEDB.1;Persist Security Info=False;" + ;
          "User ID=sa;Initial Catalog=Northwind;Data Source=JVP"
       .Open
    EndWith
    With oRecordset
       .ActiveConnection = oConnection
       .Source = cShpStr
       .CursorType = adOpenStatic
       .LockType = adLockBatchOptimistic
       .CursorLocation = adUseClient
       .Open
    EndWith
    Do While !oRecordset.Eof
       With oRecordset
          ?.Fields("CustomerID").Value,.Fields("CompanyName").Value
       EndWith
       oOrders = oRecordset.Fields("orders").Value
       Do While !oOrders.Eof
          oShippers = oOrders.Fields("shippers").Value
          oEmployee = oOrders.Fields("employees").Value
          oEmployeeTerritories = oEmployee.Fields("employeeterritories").Value
          oTerritories = oEmployeeTerritories.Fields("territories").Value
          oRegion = oTerritories.Fields("region").Value
          ?"Order ID:  ",oOrders.Fields("orderid").Value,;
           "Order Date:  ",oOrders.Fields("orderdate").Value
          oOrderDetails = oOrders.Fields("orderdetails").Value
          ?"Territory:  ",oTerritories.Fields("territorydescription").Value,;
           "Region:  ",oRegion.Fields("RegionDescription").Value
          ?"Shipper: ",oShippers.Fields("companyname").Value
          With oEmployee
             ?"Employee: ",.Fields("employeeid").Value,;
             .Fields("firstname").Value + " " + .Fields("lastname").Value
          EndWith
          ?"Order Details:  "
          Do While !oOrderDetails.Eof
             oProducts = oOrderDetails.Fields("Products").Value
             oCategories = oProducts.Fields("categories").Value
             oSuppliers = oProducts.Fields("suppliers").Value
             ?Chr(9),;
              oProducts.Fields("productname").Value,;
              oSuppliers.Fields("companyname").Value,;
              oCategories.Fields("categoryname").Value,;
              oOrderDetails.Fields("Quantity").Value,;
              oOrderDetails.Fields("UnitPrice").Value
             oOrderDetails.MoveNext
          EndDo
          oOrders.MoveNext
       EndDo
       oRecordset.MoveNext
    EndDo
    

    The output appears as follows:

    Because a hierarchy exists, the ability to create drill-down interfaces becomes a fairly simple task. The preceding Visual FoxPro code illustrates how to traverse the hierarchy.

    Perhaps you want to use Microsoft Word or Excel as a reporting tool. With a combination of Visual FoxPro COM servers, ADO, and Automation, the process becomes manageable. The first and third pieces of that solution have been available for some time. Only now, however, with a set of COM objects that handles data much as Visual FoxPro does natively, can the complete solution become a reality.

    Hierarchical recordsets and recursive relationships

    One of the nice features of SQL Server, and of most other server back ends, is the provision for recursive relationships. The following is the SQL Server 7.0 database diagram for the Northwind database:

    In the Northwind database, the Employees table uses recursion to support a manager/staff relationship. Both managers and staff are employees, and some employees report to other employees. In Visual FoxPro, you can create the same sort of relationship by opening a table twice under two different aliases. In ADO, the task is fully supported and quite easy to implement. The following is the Shape syntax:

    SHAPE {SELECT * FROM "dbo"."Employees"}  AS Managers APPEND ({SELECT * 
    FROM "dbo"."Employees"}  AS Staff RELATE 'EmployeeID' TO 'ReportsTo') AS 
    Staff
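
    For comparison, the native Visual FoxPro equivalent can be sketched by opening the table twice, as described above. This is a hypothetical fragment, assuming a local Employees table with EmployeeID and ReportsTo fields and an index tag on ReportsTo:

    ```
    * Open the same table under two aliases - a sketch, assuming a local
    * Employees table with an index tag on ReportsTo.
    Use Employees In 0 Alias Managers
    Use Employees In 0 Again Alias Staff Order ReportsTo
    Select Managers
    Set Relation To EmployeeID Into Staff
    * Moving the record pointer in Managers now positions Staff
    * on that manager's first direct report.
    ```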
    

    The following Visual FoxPro code displays a list of managers and the staff that reports to each manager:

    #Include adovfp.h
    oRecordset = CreateObject("adodb.recordset")
    oConnection = CreateObject("adodb.connection")
    cShpStr = 'SHAPE {SELECT * FROM "dbo"."Employees"}  AS Managers '
    cShpStr = cShpStr + 'APPEND ({SELECT * FROM "dbo"."Employees"} AS Staff '
    cShpStr = cShpStr + 'RELATE "EmployeeID" TO "ReportsTo") AS Staff '
    With oConnection
       .Provider = "MSDataShape"
       .ConnectionString = "Data Provider=SQLOLEDB.1;Persist Security Info=False;" + ;
          "User ID=sa;Initial Catalog=Northwind;Data Source=JVP"
       .Open
    EndWith
    With oRecordset
       .ActiveConnection = oConnection
       .Source = cShpStr
       .CursorType = adOpenStatic
       .LockType = adLockBatchOptimistic
       .CursorLocation = adUseClient
       .Open
    EndWith
    Do While !oRecordset.Eof
       oStaff = oRecordset.Fields("staff").Value
       If oStaff.Recordcount > 0
          With oRecordset
             ?.Fields("firstname").Value + " " + ;
             .Fields("lastname").Value ,;
             .Fields("Title").Value
             Do While !oStaff.Eof
                With oStaff
                   ?Chr(9),;
                   .Fields("firstname").Value + " " + ;
                   .Fields("lastname").Value ,;
                   .Fields("Title").Value
                EndWith
                oStaff.MoveNext
             EndDo
          EndWith
       Endif
       oRecordset.MoveNext
    EndDo
    

    The output appears as follows:

    Finally, note that hierarchical recordsets are updateable. The following code expands the previous example to illustrate how to make a simple update:

    Do While !oRecordset.Eof
       oStaff = oRecordset.Fields("staff").Value
       If oStaff.Recordcount > 0
          With oRecordset
             Do While !oStaff.Eof
                With oStaff
                   .Fields("firstname").Value = ;
                      Upper(.Fields("firstname").Value)
                   .Fields("lastname").Value = ;
                      Upper(.Fields("lastname").Value)
                   .Fields("Title").Value = ;
                      Upper(.Fields("Title").Value)
                EndWith
                oStaff.MoveNext
             EndDo
             */ Write changes to Staff recordset
             oStaff.UpdateBatch
          EndWith
       Endif
       oRecordset.MoveNext
    EndDo
    

    The ability to view related records, coupled with the ability to make updates, places the ADO hierarchical recordset capability on par with similar capabilities in Visual FoxPro.

    Multiple recordsets

    Use of hierarchical recordsets represents only one method for returning data from multiple recordsets in one object. For starters, building hierarchical recordsets is not the most straightforward of propositions. In many cases, a simpler alternative may be all that is required.

    Consider the case where you need a specific customer record and the orders for that customer. Yes, you could use a hierarchical recordset. But, there is a simpler way: run two SQL statements.

    Some OLE DB providers can process multiple SQL statements. The OLE DB Provider for SQL Server has this capability. Attempting to do the same with Visual FoxPro tables via the OLE DB Provider for ODBC will not work.

    When using this technique, you have two choices on where the logic exists to perform the task. One choice is to build the SQL on the client and pass it to the server through a Command object. The other choice is to invoke a stored procedure on the database server through a Command object. I’ll illustrate both techniques. The Command object will be discussed in detail later in this paper.

    To illustrate the stored procedure method, the following stored procedure must be created on the SQL Server Northwind database:

    CREATE  PROCEDURE CustomerAndOrders @CustomerID nchar(5)
    AS
    Select * From Customers Where Customers.CustomerID = @CustomerID
    Select * From Orders Where Orders.CustomerID = @CustomerID 
    

    With the stored procedure created, the following code will create the recordset:

    #Include adovfp.h
    oConnection = CreateObject("adodb.connection")
    oCommand = CreateObject("adodb.command")
    With oConnection
       .Provider = "SQLOLEDB.1"
       .ConnectionString = "Persist Security Info=False;User ID=sa;" + ;
          "Initial Catalog=Northwind;Data Source=JVP"
       .Open
    EndWith
    With oCommand
       .CommandText = "CustomerAndOrders"
       .ActiveConnection = oConnection 
       .CommandType = adCmdStoredProc 
    EndWith
    oCommand.Parameters("@CustomerID").Value = "ALFKI"
    oRecordset = oCommand.Execute
    Do While !oRecordset.Eof
       ?oRecordset.Fields(1).Value
       oRecordset.MoveNext
    EndDo
    oRecordset = oRecordset.NextRecordset
    Do While !oRecordset.Eof
       ?oRecordset.Fields(0).Value
       oRecordset.MoveNext
    EndDo
    

    Like any recordset, the recordset just produced can be navigated. Once the first set of records, from the Customers table, has been navigated, the NextRecordset method is invoked. This makes the recordset produced by the second SQL statement available, so the next loop works through the records from the Orders table. This technique is ideal in situations where you need to populate Combo or ListBox controls.

    The previous example references a collection that has not been discussed yet, the Parameters collection. The Parameters collection and the individual Parameter objects that it contains serve several purposes. One purpose is to provide the capacity to create parameterized queries. Another purpose is to provide the ability to send arguments to, and return data from, a stored procedure. For more information on the Parameters collection, see the Command Object section of this paper.

    Alternatively, you can produce the SQL on the client if you wish. The following code illustrates the difference:

    With oCommand
       .CommandText = "Select * From Customers Where CustomerID = 'ALFKI'" + ;
          Chr(13) + "Select * From Orders Where CustomerID = 'ALFKI'"
       .ActiveConnection = oConnection 
       .CommandType = adCmdText 
    EndWith
    oRecordset = oCommand.Execute
    

    The same result is achieved. The difference lies in how the result is achieved.

    Which approach is better?

    It depends on what your requirements are. The first option, which uses stored procedures, is more secure; the code is set and you can assign permissions with regard to who can execute the stored procedure. The second option provides more flexibility, but less security.

    Fabricated recordsets

    Up to this point, recordset objects have been presented in the context of origination from an ADO connection. In many cases, you may want to create an ADO recordset with data that does not come from a data source, just as you may in some cases use the Create Cursor command in Visual FoxPro. For example, you may have an application that works with a small amount of data, such as an array or Visual FoxPro cursor. Perhaps you need to dynamically build a table structure. Whatever the reason, the ability to create ADO recordsets from scratch is powerful.

    To illustrate this capability, consider the need to fetch a list of files from a specified directory. In Visual FoxPro, a handy function, ADIR( ), performs this sort of task. However, what if you need to pass the data to another application? Or, perhaps you need to persist the list to a file on disk. While Visual FoxPro arrays are powerful, ADO recordsets provide a compelling alternative. The following code fetches a list of files from a specified directory, fabricates a recordset, and copies the values from the array into the newly created recordset:

    */GetFiles.prg
    #INCLUDE "adovfp.h"
    Local Array aFiles[1]
    Local nFiles,nFile,oRS
    nFiles = Adir(aFiles,Getdir( )+"*.*")
    oRS = Createobject("adodb.recordset")
    With oRS
       .CursorLocation = ADUSECLIENT
       .LockType = ADLOCKOPTIMISTIC
       */ Adding new fields is a matter of appending
       */ new field objects to the Fields Collection.
       .Fields.Append("File",ADCHAR,20)
       .Fields.Append("Size",ADDOUBLE,10)
       .Fields.Append("DateTime",ADDBTIME,8)
       .Fields.Append("Attributes",ADCHAR,10)
       .Open
    EndWith
    For nFile = 1 To nFiles
       */ Add a new record. This automatically makes
       */ the new record the current record - just
       */ like VFP.
       oRS.AddNew
       With oRS
          .Fields("File").Value = aFiles[nFile,1]
          .Fields("Size").Value = aFiles[nFile,2]
          .Fields("DateTime").Value = ;
            Ctot(Dtoc(aFiles[nFile,3]) + " " + aFiles[nFile,4])
          .Fields("Attributes").Value = aFiles[nFile,5]
       EndWith
    Next nFile
    Return oRS
    

    With the new recordset created and populated, it can be navigated like any other recordset:

    oFiles = GetFiles ( )
    Do While !oFiles.Eof
       ?oFiles.Fields("File").Value
       oFiles.movenext
    EndDo
    

    ADO recordsets instead of arrays

    Referring to the previous example, let's say the list needs to be sorted by file size, descending. Visual FoxPro arrays can be sorted only when all columns in the array are of the same data type. In this case, there are three data types: Character, Numeric, and DateTime. With a client-side ADO recordset, the process is simple. The following code does the trick:

    oRS.Sort = "Size Desc"
    

    Sorts are not limited to just one column. Perhaps you need to sort by size, descending, and then by file, ascending:

    oRS.Sort = "Size Desc,File"
    

    And, when it comes to sorting, such properties as Bookmark and AbsolutePosition that have already been demonstrated are available here as well.
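
    To illustrate, the following sketch remembers the current record before a sort and returns to it afterward (a hypothetical fragment, assuming the oRS recordset from the previous example):

    ```
    nSavedBookmark = oRS.Bookmark    && remember the current record
    oRS.Sort = "Size Desc"
    oRS.Bookmark = nSavedBookmark    && jump back to the same record
    ?oRS.AbsolutePosition            && its position in the new sort order
    ```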

    Perhaps you need to find a specific value. The ASCAN( ) function in Visual FoxPro enables you to do this. However, it does not allow you to specify a particular column to search. Rather, once the first occurrence of a specified value is found, regardless of the column, the search is stopped. With ADO recordsets, more granular control is provided. The following code checks to see if a file called VFP6.EXE is in the recordset:

    oRS.Find("File Like 'VFP6.EXE'")
    If !oRS.Eof
       */ Found it
    Else
       */ Not found
    Endif
    

    Finally, you may wish to filter the list based on the file size being greater than a specified value:

    oRS.Filter = "size > 50000"
    

    When evaluating the tools at your disposal for local data handling, be sure to consider fabricated ADO recordsets. Also, if you find yourself running into obstacles with Visual FoxPro arrays, fabricated ADO recordsets may provide a sound alternative.
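
    One more point worth noting: a fabricated recordset can also be persisted to disk, which addresses the earlier motivation of handing the list to another application. The following is a sketch, assuming ADO 2.x, where the Save method and the MSPersist provider are available; the file name is arbitrary:

    ```
    */ Save the recordset to disk in Advanced Data TableGram format
    oFiles.Save("files.adtg", adPersistADTG)
    */ Later - perhaps in another application - reload it
    oSaved = CreateObject("adodb.recordset")
    oSaved.Open("files.adtg", "Provider=MSPersist")
    ```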

    Command Object

    ProgID: ADODB.Command

    The purpose of the Command object is just as its name implies: to run commands. For example, you may need to run a SQL update against a SQL Server table. To illustrate, the following code applies a 10 percent increase to the UnitPrice field in the Products table of the SQL Server Northwind database:

    oCommand = CreateObject("adodb.command")
    With oCommand
       .ActiveConnection = oConnection
       .CommandText = "Update Products Set unitprice = unitprice * 1.1"
       .Execute
    EndWith
    

    The ActiveConnection property

    To review, both the Command object and Recordset object have the ActiveConnection property. A Command object needs to know what data source it is to execute commands against. A Recordset object needs to know what data source contains the data it is to retrieve. The way you accomplish this is by setting the ActiveConnection property.

    The ActiveConnection property presents a great opportunity to talk about the flexible nature of the ADO object model. The ADO object model is very flat, in that you do not have to create a series of objects in order to gain access to other objects. For example, the following is one way to create and open both a Connection and a Recordset object:

    oConnection = CreateObject("adodb.connection")
    oRecordset = CreateObject("adodb.recordset")
    With oConnection
       .Provider = "SQLOLEDB.1"
       .ConnectionString = "Persist Security Info=False;User ID=sa;" + ;
          "Initial Catalog=Northwind;Data Source=JVP"
       .Open
    EndWith
    With oRecordset
       .ActiveConnection = oConnection
       .Source = "Products"
       .Open
    EndWith
    

    Here is another way to create the two objects:

    oRecordset = CreateObject("adodb.recordset")
    With oRecordset
       .ActiveConnection = "Provider=SQLOLEDB.1;Persist Security Info=False;" + ;
          "User ID=sa;Initial Catalog=Northwind;Data Source=JVP"
       .Source = "Products"
       .Open
    EndWith
    

    Now, you can reference the Connection object because it has been implicitly created from the passed connection string:

    ?oRecordset.ActiveConnection.ConnectionString
    

    The same is true for the Command object. While a Command object was not explicitly created, a Command object was in fact created and actually did the work of creating the recordset. Using the recordset just created, the following command will yield “Products” as the CommandText:

    ?oRecordset.ActiveCommand.CommandText
    

    Which method should you use?

    It is really a matter of preference. The latter method, which uses only the Recordset object, is somewhat overloaded, and it carries the same overhead as the former because a Connection object is still created implicitly. The former method is probably the better way to go, as it makes for more readable code.

    Parameters collection

    The Parameters collection works with the Command object. Its primary use is to pass arguments to, and accept return values from, stored procedures. To illustrate, consider the CustOrderHist stored procedure in the SQL Server Northwind database:

    CREATE PROCEDURE CustOrderHist @CustomerID nchar(5)
    AS
    SELECT ProductName, Total=SUM(Quantity)
    FROM Products P, [Order Details] OD, Orders O, Customers C
    WHERE C.CustomerID = @CustomerID
    AND C.CustomerID = O.CustomerID AND O.OrderID = OD.OrderID AND 
    OD.ProductID = P.ProductID
    GROUP BY ProductName
    

    To illustrate how the Parameters collection is used in conjunction with the Command object, consider the following comprehensive example:

    First, you need to create a Connection object and a Command object:

    oConnection = CreateObject("adodb.connection")
    oCommand = CreateObject("adodb.command")
    

    Next, the connection needs to be opened.

    With oConnection
       .Provider = "SQLOLEDB.1"
       .ConnectionString = "Persist Security Info=False;User ID=sa;" + ;
          "Initial Catalog=Northwind;Data Source=JVP"
       .Open
    EndWith
    

    With a valid, open connection, a Command object can be prepared:

    With oCommand
       .ActiveConnection = oConnection 
       .CommandText = "CustOrderHist"
       .CommandType = adCmdStoredProc && adCmdStoredProc = 4
    EndWith
    

    At this point, information can be obtained from the Parameters collection:

    For Each Parameter in oCommand.Parameters
       ?Parameter.Name,Parameter.Size,Parameter.Type
    Next Parameter
    

    The first Parameter object is reserved for the value that the stored procedure may return. Regardless of whether the stored procedure explicitly returns a value, this Parameter object will be created. Examining the CustOrderHist stored procedure, note that a single argument, a customer ID, is accepted.

    With a Command object and Parameter object in place, the real work can begin. To get things rolling, a value needs to be assigned to the Parameter object that will in turn be passed to the stored procedure. In this case, a SQL statement is executed that totals the quantity, by product, that a specified customer has purchased. The following code provides a customer ID and executes the stored procedure:

    oCommand.Parameters("@CustomerID").Value = "ALFKI"
    oRecordset = oCommand.Execute
    

    Yet another way to produce a Recordset object is through the execution of a stored procedure. The resulting Recordset object contains two fields that correspond to the select statement in the CustOrderHist stored procedure. Need a different history? Just update the Value property of the Parameter object and invoke the Execute method of the Command object.
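
    Incidentally, the reserved first Parameter object mentioned earlier is how a stored procedure's RETURN value comes back. The following sketch assumes the procedure was written to return a status code; note that with some providers the value is populated only after the returned recordset has been closed:

    ```
    oRecordset.Close
    ?oCommand.Parameters(0).Value   && the stored procedure's RETURN value
    ```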

    The Parameters collection also comes into play in the area of parameterized queries. Consider the following SQL Statement:

    Select * ;
       From Customer ;
       Where country = ? And max_order_amt > ?
    

    As with views, either local or remote, in Visual FoxPro, so too can queries be parameterized in ADO. In ADO, the question mark acts as a placeholder for parameters. The following example illustrates how to put this all together.

    First, a connection and a Command object need to be created:

    oConnection = CreateObject("adodb.connection")
    oCommand = CreateObject("adodb.command")
    

    Next, the connection needs to be established:

    oConnection.Open("northwind","sa","")
    

    For illustration purposes, the OLE DB Provider for ODBC is used. The native OLE DB Provider for SQL Server could have been used as well.

    Next, the Command object needs to be prepared:

    With oCommand
       .ActiveConnection = oConnection
       .CommandText = "Select * From Customer Where country = ?"
    EndWith
    

    With the Command object ready to go, a parameter object needs to be created:

    oCountryParameter = ;
     oCommand.CreateParameter("country",adChar,adParamInput,1," ")
    

    The arguments for the CreateParameter method are as follows:

    • Name—The name of the parameter.
    • Type—The data type of the parameter. A list of valid values is contained in DataTypeEnum.
    • Direction—The direction of the parameter. Parameters sent to a command are input parameters. Arguments passed back from a command are output parameters. A list of valid values is contained in ParameterDirectionEnum.
    • Size—The length of the parameter.
    • Value—The initial value of the parameter.

    Alternatively, the parameter could have been created like this:

    oCountryParameter = CreateObject("adodb.parameter")
    With oCountryParameter
       .Name = "Country"
       .Type = adChar
       .Direction = adParamInput
       .Size = 1
       .Value = " "
    EndWith
    

    Once the parameter has been created, it needs to be appended into the Parameters collection of the Command object:

    oCommand.Parameters.Append(oCountryParameter)
    

    With the parameter in place, the value of the parameter can be set. In this case, the parameter will be set so that any country that begins with the letter U will be returned into a Recordset object:

    With oCountryParameter
       .Size = 2
       .Value = "U%"
    EndWith
    

    Now, a Recordset object can be created:

    oRecordset = oCommand.Execute
    

    A useful feature of specifying parameters is that doing so enforces characteristics such as size and data type. For example, the preceding parameter was defined as a character. If a value of a different data type were assigned to the Value property of the Parameter object, an error would result. The same is true if the assigned value is longer than what the Size property specifies.

    Finally, if a list of customers in Mexico were required, the following code would complete the task:

    With oCommand
       .Parameters("country").Size = Len("Mexico")
       .Parameters("country").Value = "Mexico"
       oRecordSet = .Execute 
    EndWith
    

    Properties Collection

    Recall the earlier assertion that, by itself, ADO is incapable of doing anything. ADO just provides an interface; OLE DB providers supply its capabilities. So then, what distinguishes one OLE DB provider from another? More specifically, how can you determine what an OLE DB provider can and cannot do, or what attributes it does or does not possess? Depending on the OLE DB provider you use, or the type of recordset you use (client or server), what is supported will likely differ.

    The Properties collection applies to the Connection, Recordset, and Field objects. The Command object also has a Properties collection, which is identical to the Recordset object's Properties collection.

    Multiple result sets provide a good example of varying OLE DB provider support. To determine whether multiple result sets can be obtained, you can refer to the "Multiple Results" property:

    If oConnection.Properties("Multiple Results").Value = 1
       */ Supports multiple result sets
    EndIf
    

    While the OLE DB providers for SQL Server and ODBC both support multiple results, the OLE DB provider for Jet does not. To illustrate, the following is valid syntax for SQL Server:

    oRecordset.Source="SELECT * FROM customers;"+"SELECT * FROM orders"
    oRecordset.Open
    ?oRecordSet.Fields.Count && number of fields in customers table
    oRecordset = oRecordset.NextRecordSet
    ?oRecordSet.Fields.Count && number of fields in orders table
    

    In this case, the OLE DB Provider for SQL Server can return multiple recordsets. If you attempt the same thing with the OLE DB Provider for ODBC, which you need to use when accessing Visual FoxPro data, you will receive an error message stating that the requested action is not supported by the OLE DB provider.

    Another example involves the way in which the Properties collection deals with the location of a Recordset object. Recordsets can either exist locally as client-side recordsets or they can exist remotely as server-side recordsets. Client-side recordsets, as will be discussed shortly, have several capabilities that server-side recordsets do not have. One of these abilities is to create indexes. The following code creates a client-side recordset:

    oRecordset = CreateObject("adodb.recordset")
    oConnection = CreateObject("adodb.connection")
    With oConnection
       .Provider = "SQLOLEDB.1"
       .ConnectionString = "Persist Security Info=False;User 
        ID=sa;Initial Catalog=Northwind;Data Source=JVP"
       .Open
    EndWith
    With oRecordset
       .Cursorlocation = adUseClient && adUseClient = 3
       .ActiveConnection = oConnection
       .Source = "Products"
       .Open 
    EndWith
    

    Now, let's create an index on the ProductName field using the following code:

    oRecordSet.Fields("productname").Properties("optimize").Value = .T.
    

    In the absence of a declaration of where a Recordset object should reside, the Recordset object, by default, resides on the server. Attempting to reference the Optimize property results in an error stating that the specified property could not be found in the collection.

    While the ADO interface is constant, depending on the provider you use, the capabilities may be very different. Be sure to consult your provider’s documentation.
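
    A quick way to survey what a given provider supports is simply to enumerate the Properties collection. The following sketch assumes an open Connection object:

    ```
    For Each oProperty In oConnection.Properties
       ?oProperty.Name, oProperty.Value
    Next oProperty
    ```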

    Remote Data Services

    One of the most powerful data access capabilities introduced by Microsoft is Remote Data Services (RDS). Although a separate set of objects exists for RDS, RDS is really just another component for use with ADO. There are two ways you can implement RDS.

    • Use the same ADO objects described in this paper
    • Use the RDS data control

    Let’s discuss the RDS data control option first, since it represents some uncharted territory.

    The RDS Data Control

    The following code creates an instance of the RDS data control:

    oRDSDataControl = Createobject("rds.datacontrol")
    

    Once the data control is created, only three properties need to be populated: Server, Connect, and SQL.

    With oRDSDataControl
       .Server = "http://jvp"
       .Connect = ;
        "Remote Provider=SQLOLEDB.1;database=northwind;User ID=sa;"
       .Sql = "Customers"
    EndWith
    

    Because we’re using the SQL Server OLE DB Provider, the SQL property can consist of just the table name. The following code retrieves the same recordset, but does so with the OLE DB provider for ODBC:

    With oRDSDataControl
       .Server = "http://jvp"
       .Connect = "dsn=northwind;uid=sa;pwd=;"
       .Sql = "Customers"
    EndWith
    

    Whenever possible, you should use a native OLE DB provider rather than the OLE DB provider for ODBC.

    With the RDS data control properties set, you can create a recordset. Invoke the Refresh method to accomplish this, as in the following code:

    oRDSDataControl.Refresh
    oRecordset = oRDSDataControl.Recordset
    

    From this point on, you can work with the recordset the same way you work with any other ADO client-side recordset:

    Do While !oRecordset.Eof
       oRecordset.Fields(1).Value = ;
          Proper(oRecordset.Fields(1).Value)
       oRecordset.MoveNext
    EndDo
    oRecordset.UpdateBatch
    

    Alternatively, you can replace the last line of code with a call to the SubmitChanges method of the RDS data control:

    oRDSDataControl.SubmitChanges
    

    Implementing RDS Through the ADO Interface

    You can invoke RDS by using the same ADO Connection object discussed above. As with hierarchical recordsets, the first step involves the selection of an OLE DB provider. In this case, the MSRemote provider is required. The following code sets up the Connection object:

    oConnection = CreateObject("adodb.connection")
    With oConnection
       .Provider = "MS Remote.1"
       .ConnectionString = "Remote Server=http://jvp;Remote Provider=SQLOLEDB.1;" + ;
          "database=northwind;User ID=sa;Pwd=;"
       .Open
    EndWith
    

    The ADO ConnectionString property supports only four arguments. The first two, Provider and File Name, have already been discussed. The third and fourth, Remote Provider and Remote Server, are used by RDS in the example above. Remote Provider names the same OLE DB provider you would use when creating a local connection. The additional parameters specifying the database, user ID, and password are consumed by the OLE DB Provider for SQL Server, which in turn resides on the remote server. The following code associates the Recordset object with the Connection object; with one difference, it is basically the same as the previous examples in this paper:

    With oRecordset
       .ActiveConnection = oConnection
       .Source = "Customers"
       .LockType = adLockBatchOptimistic
       .Open
    EndWith
    

    The only difference is that properties such as CursorLocation and CursorType are omitted, since all recordsets created through RDS must exist on the client. Additionally, all client-side recordsets are static. If you like, you can still specify these properties explicitly; any incompatible settings will be coerced to valid values. For example, if you specify a ForwardOnly cursor type but specify that the recordset exists on the client, ADO forces the cursor type to static when the Open method fires. The same is true if you specify a server-side CursorLocation while using the MSDataShape provider: because all hierarchical recordsets must exist on the client, the CursorLocation is coerced to the proper value.
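    A brief sketch illustrates this coercion. The numeric values below are the standard ADO enumerations (0 for adOpenForwardOnly, 3 for both adUseClient and adOpenStatic); the named constants would normally come from an #INCLUDE file, as in the earlier examples:

    With oRecordset
       .ActiveConnection = oConnection
       .Source = "Customers"
       .CursorLocation = 3      && adUseClient - required for RDS
       .CursorType = 0          && adOpenForwardOnly - incompatible
       .LockType = adLockBatchOptimistic
       .Open
    EndWith
    ? oRecordset.CursorType     && ADO has coerced this to 3 (adOpenStatic)

    In other words, requesting an invalid combination does not raise an error; ADO silently substitutes the nearest valid setting.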

    Summary

    The goal of this paper has been to provide you with a fairly comprehensive overview of both ADO and RDS from the perspective of Visual FoxPro applications. Note that ADO is not a replacement for the Visual FoxPro Cursor Engine. Rather, regard it as another tool at your disposal. Both Visual FoxPro cursors and ADO recordsets have their relative strengths and weaknesses.

    ADO is ideal in situations where your application is component based, or where you need to pass data to other applications, such as Excel, in Automation operations. Fabricated ADO recordsets can be an interesting alternative to arrays when you need more robust data handling.
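    As a sketch of the fabricated-recordset technique mentioned above: a recordset can be built entirely in memory, with no connection, by appending field definitions before opening it. The field name and sizes here are illustrative only; 200 is the standard ADO adVarChar type constant:

    oRecordset = CreateObject("adodb.recordset")
    oRecordset.Fields.Append("cCompany", 200, 50)   && adVarChar, width 50
    oRecordset.Open
    oRecordset.AddNew
    oRecordset.Fields("cCompany").Value = "Sample Company"
    oRecordset.Update

    Unlike an array, such a recordset supports sorting, filtering, and persistence through the ordinary Recordset interface.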

    For most local data handling operations, however, Visual FoxPro cursors will usually provide better results.

    John V. Petersen, MBA, is president of Main Line Software, Inc., based in Philadelphia, Pennsylvania. John’s firm specializes in custom software development and database design. He is a Microsoft Most Valuable Professional and has spoken at many developer events, including Visual FoxPro Developers Conference, FoxTeach, the Visual FoxExpress Developer’s Conference, DevDays, and TechEd. In addition, John has written numerous articles for FoxTalk and FoxPro Advisor. John is co-author of Visual FoxPro 6 Enterprise Development and Hands-on Visual Basic 6—Web Development, both from Prima Publishing. John’s latest project is the ADO Developer’s Handbook, from Sybex Publishing, due September 1999.

    E-mail: jpetersen@mainlinesoftware.com

    Books

    1. “Programming FoxPro 2.5” by Lisa Slater and Steven D. Arnott
      • A foundational book for beginners and intermediate users of FoxPro 2.5. Covers database management and programming techniques.
    2. “The Revolutionary Guide to Visual FoxPro 3.0” by Kevin McNeish and Marilyn McLain
      • This book explores Visual FoxPro 3.0, detailing object-oriented programming features and database development.
    3. “Visual FoxPro 6.0 Programmer’s Guide” by Tamar E. Granor, Ted Roche, Doug Hennig, and Dave Fulton
      • A comprehensive guide to using Visual FoxPro 6.0 for professional database application development.
    4. “Advanced Object-Oriented Programming with Visual FoxPro 6.0” by Markus Egger
      • Focuses on object-oriented techniques, design patterns, and advanced development strategies.
    5. “Hacker’s Guide to Visual FoxPro 7.0” by Tamar E. Granor and Ted Roche
      • A deep dive into advanced features and lesser-documented aspects of Visual FoxPro 7.0.
    6. “Microsoft Visual FoxPro: Language Reference”
      • Official Microsoft documentation providing a detailed reference to the Visual FoxPro language, commands, and functions.
    7. “Fundamentals of FoxPro 2.x Programming” by George Goley
      • Covers foundational programming techniques and database management in early FoxPro versions.

    Online Resources

    1. Microsoft Visual FoxPro Developer Center
    2. Foxite – Visual FoxPro Community Resource
      • https://www.foxite.com
        A community forum where FoxPro and Visual FoxPro developers share tips, tricks, and solutions.
    3. UtterAccess Visual FoxPro Forum
    4. The Universal Thread

    Journal Articles

    1. “Visual FoxPro 9.0 and Beyond”
      • Published in FoxTalk, this article discusses updates and features in the final version of Visual FoxPro, as well as transitioning to other platforms.
    2. “Bringing Old FoxPro Applications into the Modern Era”
      • Journal of Database Development, 2010, discusses strategies for updating legacy FoxPro applications.

    Additional Resources

    1. CodePlex Archive for Visual FoxPro Projects
    2. GitHub: VFPX (Visual FoxPro Extensions)
      • https://github.com/VFPX
        A collaborative project to extend the life of Visual FoxPro by providing tools, add-ons, and updates.

    Books

    1. “Visual FoxPro 6.0 Programmer’s Guide”
      • Authors: Tamar E. Granor, Ted Roche, Doug Hennig, and Dave Fulton
      • Publisher: Que
      • Description: Comprehensive coverage of Visual FoxPro 6.0 features, including database development, debugging, and deployment techniques.
    2. “Advanced Object-Oriented Programming with Visual FoxPro 6.0”
      • Author: Markus Egger
      • Publisher: Hentzenwerke
      • Description: An in-depth guide to using object-oriented programming techniques in Visual FoxPro 6.0, with practical examples.
    3. “Hacker’s Guide to Visual FoxPro 7.0”
      • Authors: Tamar E. Granor, Ted Roche
      • Publisher: Hentzenwerke
      • Description: Although for version 7.0, much of its content applies to version 6.0, covering advanced features and debugging tips.
    4. “What’s New in Visual FoxPro 8.0”
      • Authors: Tamar E. Granor, Doug Hennig
      • Publisher: Hentzenwerke
      • Description: Provides insights into features that extend and build on those found in Visual FoxPro 6.0 and 7.0.
    5. “Special Edition Using Visual FoxPro 6”
      • Authors: Neil L. Clausen and Que Development Group
      • Publisher: Que
      • Description: Offers step-by-step guidance for Visual FoxPro 6.0 developers, including user interface design and SQL integration.
    6. “Visual FoxPro 9.0: Best Practices for Business Applications”
      • Author: Les Pinter
      • Publisher: Hentzenwerke
      • Description: A guide to building modern business applications using Visual FoxPro 9.0, emphasizing reporting, XML, and COM interoperability.
    7. “MegaFox: 1002 Things You Wanted to Know About Extending Visual FoxPro”
      • Authors: Marcia Akins, Andy Kramek, and Rick Schummer
      • Publisher: Hentzenwerke
      • Description: A deep dive into the extensibility features of Visual FoxPro 9.0, focusing on advanced development practices.

    Official Documentation

    1. “Microsoft Visual FoxPro 6.0 Documentation”
      • Publisher: Microsoft Corporation
      • Includes a comprehensive reference for commands, functions, and user interface components.
    2. “Microsoft Visual FoxPro 9.0 Service Pack 2 Documentation”
      • Publisher: Microsoft Corporation
      • Details enhancements, bug fixes, and updates introduced in the final version of Visual FoxPro.
    3. “Visual FoxPro 9.0: Language Reference”
      • Publisher: Microsoft Press
      • An essential guide for understanding Visual FoxPro’s programming language, covering every command, function, and system variable.

    Online Resources

    1. Microsoft Visual FoxPro Developer Center
    2. VFPX (Visual FoxPro eXtensions)
      • https://github.com/VFPX
      • An open-source repository hosting tools, extensions, and community-driven updates for Visual FoxPro.
    3. Foxite – Visual FoxPro Community Resource
    4. The Universal Thread – FoxPro Resource
    5. Visual FoxPro Wiki
      • https://fox.wikis.com
      • A collaborative resource with tutorials, best practices, and links to additional FoxPro content.

    Journal Articles

    1. “Visual FoxPro 9.0: Enhancements and Updates”
      • Published in FoxTalk, this article covers key features introduced in VFP 9.0, such as reporting enhancements and data handling.
    2. “Refactoring Legacy Applications in Visual FoxPro 6.0”
      • Published in Journal of Database Development, 2004, discussing strategies to modernize FoxPro 6.0 applications.

    By Amjad Izhar
    Contact: amjad.izhar@gmail.com
    https://amjadizhar.blog