“01.pdf” outlines a Jupyter Notebook project focused on analyzing a Zomato restaurant dataset. The session aims to teach data visualization techniques like bar charts, line graphs, histograms, box plots, and heatmaps to extract insights. Specific questions to be answered include identifying popular restaurant types, understanding customer ratings, analyzing order frequency based on dining options, and determining spending patterns. The initial steps cover uploading the data, importing necessary Python libraries (Pandas, NumPy, Matplotlib, Seaborn), and reading the CSV file into a Pandas DataFrame for analysis and visualization.
The second source details a Python-based data analysis project using a Netflix movie dataset. The goal is to answer questions about movie genres, popularity, and release years by employing exploratory data analysis (EDA). The process involves importing libraries (NumPy, Pandas, Matplotlib, Seaborn), loading the data, and performing data cleaning tasks such as format conversion and handling missing/duplicate values. The project then focuses on data visualization and statistical analysis to identify popular genres, highly-rated movies, and trends in movie releases over time.
The third source presents a Python project centered on analyzing an e-commerce dataset to understand customer behavior and sales trends. The objectives include performing data analysis and visualization, specifically monthly sales and profit. The initial steps involve setting up the environment by opening Jupyter Notebook, uploading the dataset, and importing essential Python libraries, including Pandas and the Plotly visualization library. The project intends to clean the data and then generate insightful reports through various visualizations.
Python Project Analysis Study Guide
Quiz
- According to the “Python Complete Crash Course,” what is the primary reason recruiters focus on project experience during interviews?
- Name at least three Python libraries mentioned in the “Python Complete Crash Course” that are commonly used for data analysis. What is the general purpose of each?
- In the Zomato data analysis project, what was the initial step after uploading the dataset into the Jupyter Notebook? What Python function was used for this?
- Explain the purpose of the user-defined function handle_rate in the Zomato project. What data transformation did it perform?
- Based on the Zomato project analysis, which type of restaurant (listed in the ‘listed_in(type)’ column) receives the majority of food orders? What evidence supports this conclusion?
- According to the Zomato project, what is the general rating range (out of 5) that the majority of restaurants receive? What visualization was used to determine this?
- In the Uber case study, what was the initial problem in Paris in 2008 that led to the idea for Uber?
- Describe the evolution of Uber’s service from its initial concept to the different types of ride-sharing options available today, as mentioned in the case study.
- Identify at least three ways Uber utilizes data science and analytics in its operations, according to the case study.
- What were the months identified in the Uber project analysis with the least number of Uber bookings? What possible reason was suggested for this trend?
Quiz Answer Key
- Recruiters focus on project experience because projects demonstrate practical application of skills and the amount of work a candidate has actually done, which is more telling than just theoretical knowledge in a short interview.
- Pandas: Used for data manipulation and cleaning, providing data structures like DataFrames. NumPy: Used for numerical computations and mathematical operations. Matplotlib and Seaborn: Used for data visualization, creating graphs and charts.
- The initial step was to create a Pandas DataFrame by reading the Zomato CSV file into the Jupyter Notebook. The Python function used was pd.read_csv().
- The purpose of the handle_rate function was to clean the ‘rate’ column by extracting the numerical rating value as a float and removing the ‘/5’ suffix. This converted the rating into a usable numerical format.
- Based on the count plot visualization, the ‘Dining’ type restaurant receives the majority of food orders, as indicated by the highest bar representing the count of this category.
- The majority of restaurants receive ratings between 3.5 and 4 (out of 5). This was determined using a histogram visualization of the ‘rate’ column, which showed the highest frequency of ratings within this range.
- The initial problem in Paris in 2008 was a snowy evening with limited public transport, leading to frustration and the idea for a technology to easily book rides.
- Uber initially started as a ride-sharing platform where costs were divided among passengers going in the same direction. It gradually evolved to allow on-demand booking of individual rides and expanded to offer various options like UberX (affordable), Uber Pool (shared rides), Uber Black (premium), UberXL (larger groups), Uber Freight, and Uber for Businesses.
- Uber utilizes data science for ETA estimation (arrival time prediction), price prediction, route optimization, driver-rider matching, and fraud prevention in payments.
- The months identified with the least number of Uber bookings were November, December, and January. The suggested reason was the cold weather and snowfall during these winter months, particularly since the data is US-based and Paris was an early international expansion location.
Essay Format Questions
- Compare and contrast the objectives and methodologies of the Zomato data analysis project and the Uber case study analysis. What were the key insights gained from each, and how could these insights be valuable to the respective businesses?
- Discuss the importance of data cleaning and preprocessing in both the “Python Complete Crash Course” examples (Zomato and Uber/Netflix). Provide specific examples of cleaning techniques used and explain why these steps were crucial for accurate analysis.
- Evaluate the role of data visualization in understanding and communicating the findings of the Zomato and Uber/Netflix project analyses. Describe at least three different types of visualizations used and explain what information each visualization effectively conveyed.
- Analyze the business implications of the findings from either the Zomato or the Uber project. How could the identified trends and patterns (e.g., popular restaurant types, peak booking times, popular movie genres) inform strategic decision-making for the company?
- Reflect on the process of conducting a data analysis project as demonstrated in the provided sources. What are the key stages involved, and what skills are essential for a data professional to effectively execute such projects from data acquisition to insight generation?
Glossary of Key Terms
- Library (in programming): A collection of pre-written code that provides functions and tools to perform specific tasks, saving programmers from writing code from scratch. (e.g., Pandas, NumPy, Matplotlib, Seaborn, Plotly).
- DataFrame (Pandas): A two-dimensional, tabular data structure with labeled rows and columns, similar to a spreadsheet or SQL table. It is a primary data structure in Pandas for data manipulation and analysis.
- CSV (Comma Separated Values): A simple text file format in which values are separated by commas and each line represents a row of data.
- Jupyter Notebook: An interactive web-based environment that allows users to create and share documents containing live code, equations, visualizations, and narrative text. It is commonly used for data analysis and exploration in Python.
- Data Cleaning: The process of identifying and correcting errors, inconsistencies, and inaccuracies in a dataset to improve its quality for analysis. This can involve handling missing values, removing duplicates, and standardizing formats.
- Data Preprocessing: The steps taken to transform raw data into a format suitable for analysis. This can include cleaning, transforming, integrating, and reducing data.
- Data Visualization: The representation of data in a graphical format (e.g., charts, graphs, maps) to make it easier to understand patterns, trends, and insights.
- User-Defined Function: A block of code defined by the programmer to perform a specific task. It can be called multiple times within a program to reuse the code.
- API (Application Programming Interface): A set of rules and protocols that allows different software applications to communicate and exchange data with each other.
- Data Analyst: A professional who examines data to identify trends, answer questions, and provide insights to help organizations make better decisions.
- Data Scientist: A professional who uses scientific methods, algorithms, and systems to extract knowledge and insights from data in various forms.
- Machine Learning: A subset of artificial intelligence that enables computers to learn from data without being explicitly programmed.
- Algorithm: A step-by-step procedure or set of rules to solve a problem or accomplish a task.
- Feature (in data): An individual measurable property or characteristic of a data point. In a table, features are represented by columns.
- Insight (in data analysis): A meaningful and actionable finding or understanding derived from the analysis of data.
- Count Plot (Seaborn): A type of bar plot that shows the counts of observations in each categorical bin.
- Histogram: A graphical representation of the distribution of numerical data, where the data is grouped into bins and the height of each bar represents the frequency of values within that bin.
- Box Plot (Seaborn): A standardized way of displaying the distribution of quantitative data based on five summary statistics: minimum, first quartile (Q1), median (Q2), third quartile (Q3), and maximum. It can also show outliers.
- Density Plot (Seaborn/Distplot): A visualization that shows the probability density function of a continuous variable, providing a smooth estimate of the distribution.
- Value Counts (Pandas Series): A method that returns a Series containing counts of unique values in a Pandas Series (a single column of a DataFrame).
- Map Function (Pandas Series): A method used to substitute each value in a Series with another value, which can be derived from a function, dictionary, or Series.
- Group By (Pandas DataFrame): A powerful method to group rows in a DataFrame that have the same values in one or more columns, allowing for aggregate calculations on these groups.
- Reset Index (Pandas DataFrame): A method used to reset the index of a DataFrame to a default integer index. The old index can be kept as a new column.
- Drop Function (Pandas DataFrame): A method used to remove rows or columns from a DataFrame based on specified labels or index.
- Concatenation (in data manipulation): The process of joining two or more datasets (e.g., DataFrames or Series) along a particular axis.
- Categorical Data: Data that represents categories or groups (e.g., restaurant types, movie genres).
- Numerical Data: Quantitative data that can be measured or counted (e.g., ratings, votes, revenue).
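The Pandas terms in this glossary (value counts, map, group by, reset index, drop, concatenation) can be illustrated together with a minimal sketch. The DataFrame below is hypothetical toy data, not taken from any of the course datasets, and the column names are assumptions chosen only to resemble the Zomato example.

```python
import pandas as pd

# Hypothetical toy data; names and values are illustrative only.
orders = pd.DataFrame({
    "type": ["Dining", "Cafes", "Dining", "Buffet"],
    "online_order": ["Yes", "No", "Yes", "No"],
    "votes": [120, 45, 80, 10],
})

# Value Counts: how often each category appears in one column.
counts = orders["type"].value_counts()

# Map: substitute each value using a dictionary.
orders["online_flag"] = orders["online_order"].map({"Yes": 1, "No": 0})

# Group By + Reset Index: aggregate per group, then turn the group
# labels back into an ordinary column.
votes_by_type = orders.groupby("type")["votes"].sum().reset_index()

# Drop: remove a column that is no longer needed.
trimmed = orders.drop(columns=["online_order"])

# Concatenation: stack two frames with the same columns row-wise.
extra = pd.DataFrame({"type": ["Cafes"], "votes": [30], "online_flag": [1]})
combined = pd.concat([trimmed, extra], ignore_index=True)

print(counts)
print(votes_by_type)
print(combined)
```

Each call returns a new object rather than modifying `orders` in place, which keeps the intermediate results available for inspection.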
Briefing Document: Analysis of Provided Sources
This document provides a summary of the main themes, important ideas, and key facts presented in the provided excerpts. Quotes from the original sources are included where appropriate to illustrate the points.
Source 1: Excerpts from “01.pdf” (Python Complete Crash Course in Hindi | 5 Python Projects)
Main Theme: This source is an introduction to a Python crash course focused on building five real-time projects, specifically aimed at individuals preparing for interviews in data analysis or data science roles. The emphasis is on practical implementation and building a strong portfolio of projects to showcase skills to recruiters.
Important Ideas and Facts:
- Course Objective: The primary goal of the course is to equip learners with practical Python skills through project-based learning, making them more confident and prepared for job interviews.
- Quote: “The objective of creating the course is that whenever you go for an interview Are recruiters interviewing you for half an hour? I can’t ask the whole Python question. That is why they ask the most questions on your projects and your It is the projects that tell this story that how much work you do that’s why this”
- Project-Based Learning: The course includes five “real timer” projects, implying they are designed to simulate real-world scenarios. One specific project mentioned is related to Zomato data analysis.
- Quote: “There are five real timers inside it Projects include such as JumT”
- Beginner-Friendly Approach: The course is designed for beginners with no prior “HiFi coding” experience, starting from the very basics and gradually advancing.
- Quote: “Starting from the very basic level, here we will talk about things We will start with this and gradually advance The whole project will reach this level Every single line of code that is written in I have told you one line at a very basic level. explained in very simple language so that Complete all your five projects easily”
- Focus on Implementation: The course emphasizes the practical application of Python concepts in the projects.
- Quote: “There will be a lot of focus on implementation, whatever We are talking about all these projects”
- Interview Preparation: The projects are designed to help learners answer interview questions effectively, as recruiters often focus on practical experience demonstrated through projects.
- Quote: “if you complete it then you will see it in real time The idea would be that a data analyst is a How can a data scientist work inside a company? if we work with it then this whole Projects are its Notes Data Set Code file all things description box below It is already mentioned and this course Your confidence level after completing it It will definitely get a boost, so let’s get started”
- Zomato Data Analysis Project: The excerpt details the initial steps of a Zomato data analysis project, including:
- Understanding the context of Zomato’s business (restaurant partners, customers, delivery partners).
- Identifying potential insights a data professional can extract (e.g., customer behavior, revenue growth, offering coupons).
- Demonstrating the process of uploading a CSV dataset into a Jupyter Notebook.
- Creating a new Jupyter Notebook file and naming it “Zomato Project”.
- Adding a heading “Zomato Data Analysis Project”.
- Importing essential Python libraries for data analysis: Pandas (as pd), NumPy (as np), Matplotlib.pyplot (as plt), and Seaborn (as sns).
- Reading the Zomato CSV data into a Pandas DataFrame using pd.read_csv().
- Addressing potential errors in file names (e.g., spaces).
- Displaying the first few rows of the DataFrame using df.head() to understand the data structure (restaurant name, online order status, table booking, ratings, votes, cost for two, listed dining type).
- Data Cleaning and Manipulation:
- Creating a user-defined function handle_rate to extract the numerical rating from the ‘rate’ column, which initially includes “/5”.
- Using the .apply() method with the handle_rate function to clean the ‘rate’ column and convert it to a float.
- Data Visualization and Insight Extraction:
- Analyzing the ‘listed_in(type)’ column to determine the most popular type of restaurant using sns.countplot(). The conclusion is that “Majority of the Restaurant Falls in Dining Category”.
- Analyzing the ‘rate’ column using plt.hist() to understand the distribution of ratings. The conclusion is that the majority of restaurants receive ratings between 3.5 and 4 out of 5.
- Analyzing ‘online_order’ and ‘book_table’ using sns.countplot() to understand customer preferences for online ordering and table booking.
- Analyzing the ‘location’ column and visualizing the number of restaurants in each location using plt.figure() and sns.countplot().
- Analyzing the relationship between ‘approx_cost(for two people)’ and ‘listed_in(type)’ using plt.figure() and sns.barplot().
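The rating cleanup and the two headline findings above can be sketched as follows. This is a minimal illustration on hypothetical rows, assuming every ‘rate’ value has the form "x.y/5"; the real dataset may contain other values that would need extra handling. The body of handle_rate is one plausible way to implement the behavior described, not necessarily the code used in the course, and the counts below are the numbers a count plot or histogram would draw.

```python
import pandas as pd

# Hypothetical rows shaped like the Zomato columns described above.
df = pd.DataFrame({
    "name": ["R1", "R2", "R3", "R4", "R5"],
    "rate": ["4.1/5", "3.8/5", "3.7/5", "4.0/5", "3.6/5"],
    "listed_in(type)": ["Buffet", "Dining", "Dining", "Cafes", "Dining"],
})

def handle_rate(value):
    """Keep the part before '/5' and convert it to a float."""
    return float(str(value).split("/")[0])

# Apply the function to every entry of the column.
df["rate"] = df["rate"].apply(handle_rate)

# Counts per restaurant type: the bar heights of a count plot.
type_counts = df["listed_in(type)"].value_counts()
top_type = type_counts.idxmax()

# Share of restaurants rated between 3.5 and 4.0 (both ends included).
share_35_to_40 = df["rate"].between(3.5, 4.0).mean()

print(type_counts)
print(top_type, share_35_to_40)
```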
Source 2: Excerpts from “01.pdf” (Continued – Uber Data Analysis Project)
Main Theme: This section transitions to an Uber data analysis project, focusing on analyzing ride data to extract insights relevant to the company’s operations and customer behavior. It also connects this project to preparing for data science/analyst job interviews by simulating a take-home assignment.
Important Ideas and Facts:
- Interview Simulation: The project is presented as a way to prepare for the first round of data science/analyst interviews, where companies often provide case studies or assignments.
- Quote: “If you get a call for that then this project Through which our round number one is Where the company will give you a case study How to solve the assignments she sends how do we submit it should make projects like this that we prepare ourselves for this round Was able to create potential and for this round You should be so capable that we can take out the round So these things are also included in today’s project You will get to learn how to The company provides case studies,”
- Uber Company Overview: A brief introduction to Uber Technologies is provided, including its nature (American multinational transportation company), services (courier, food delivery, freight), headquarters (San Francisco), global presence (over 70 countries, 10,500+ cities), user base (150 million+ monthly active users, 6 million drivers), trip volume (2.8 million+ daily trips), and revenue ($37.2 billion for 2020-23).
- Uber’s Origin Story: The excerpt recounts the founding of Uber by Garrett Camp in 2008, inspired by the difficulty of finding transportation in Paris during a snowy evening. The initial idea involved ride-sharing to divide costs, which later evolved. Travis Kalanick joined as a co-founder, bringing technological expertise. Uber launched its beta version in San Francisco in 2009, with the first booked ride in June 2010. International expansion began in 2012.
- Uber’s Service Types: The different types of Uber ride-sharing services are mentioned: UberX (affordable), Uber Pool (shared rides), Uber Black (premium), Uber Group/UberXL (larger groups), and expansion into Uber Eats, Uber Freight, and Uber for Businesses.
- Uber’s Revenue Model: Uber earns money through commissions on rides, subscriptions, and advertising.
- Role of Data Science in Uber: The importance of data science in Uber’s operations is highlighted, specifically in:
- ETA (Estimated Time of Arrival) Estimation: Using machine learning algorithms to predict arrival times.
- Price Prediction: Using data analysis to determine ride fares.
- Route Optimization: Algorithms are used to suggest efficient routes.
- Matching Drivers and Riders: Data science helps in creating optimal matches.
- Fraud Detection in Payments: Ensuring secure transactions.
- Uber’s Data Science History: Uber started working on data science and analytics around 2010-2011, with Kevin Novak as the head of data science.
- Uber Data Analysis Project – Data Loading and Initial Cleaning:
- Loading an Uber dataset into a Pandas DataFrame named data_set using pd.read_csv().
- Handling missing values in the ‘PURPOSE*’ column by filling them with “NOT OK” using fillna().
- Converting the ‘START_DATE*’ and ‘END_DATE*’ columns to datetime objects using pd.to_datetime() with errors='coerce', so that values in an invalid date format become NaT instead of raising an error.
- Extracting the date and time components from the ‘START_DATE*’ column into new columns named ‘date’ and ‘time’ using pd.DatetimeIndex().
- Categorizing the ‘time’ column into ‘Day Night’ categories (Morning, Afternoon, Evening, Night) based on specific time intervals using pd.cut().
- Removing rows with any remaining null values using data_set.dropna(inplace=True).
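The cleaning steps listed above can be sketched end to end on a few hypothetical rows. The sample values, the explicit `format` string, the bin edges, and the use of `include_lowest=True` are assumptions made for this illustration; the source only states that the time was cut into Morning, Afternoon, Evening, and Night.

```python
import pandas as pd

# Hypothetical rows; the last one carries an unparseable date on purpose.
data_set = pd.DataFrame({
    "START_DATE*": ["01-01-2016 21:11", "01-02-2016 01:25", "01-05-2016 14:01", "not a date"],
    "END_DATE*": ["01-01-2016 21:17", "01-02-2016 01:37", "01-05-2016 14:20", "not a date"],
    "PURPOSE*": ["Meal/Entertain", None, "Meeting", None],
})

# Fill missing purposes with a placeholder label.
data_set["PURPOSE*"] = data_set["PURPOSE*"].fillna("NOT OK")

# Convert to datetime; invalid strings become NaT because of errors="coerce".
fmt = "%m-%d-%Y %H:%M"  # assumed layout of the sample strings
for col in ["START_DATE*", "END_DATE*"]:
    data_set[col] = pd.to_datetime(data_set[col], format=fmt, errors="coerce")

# Split the start timestamp into a calendar date and an hour of day.
data_set["date"] = pd.DatetimeIndex(data_set["START_DATE*"]).date
data_set["time"] = pd.DatetimeIndex(data_set["START_DATE*"]).hour

# Bucket the hour into four periods (edges are illustrative assumptions).
data_set["Day Night"] = pd.cut(
    x=data_set["time"],
    bins=[0, 10, 15, 19, 24],
    labels=["Morning", "Afternoon", "Evening", "Night"],
    include_lowest=True,  # keeps hour 0 inside the first bin
)

# Drop any row that still contains a null (here: the unparseable date).
data_set.dropna(inplace=True)

print(data_set[["PURPOSE*", "time", "Day Night"]])
```

Filling ‘PURPOSE*’ before calling dropna() matters: rows with a missing purpose survive, while rows whose dates could not be parsed are removed.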
Source 3: Excerpts from “01.pdf” (Continued – Uber Data Analysis Project – Data Visualization and Analysis)
Main Theme: This section focuses on performing data visualization and extracting insights from the cleaned Uber dataset to answer specific business-related questions.
Important Ideas and Facts:
- Data Visualization Setup: Setting up the environment for data visualization using Matplotlib (as plt) and Seaborn (as sns).
- Analyzing Ride Categories and Purposes:
- Creating subplots using plt.figure() and plt.subplot().
- Using sns.countplot() to visualize the distribution of rides by ‘CATEGORY’. The insight is that most Uber rides are for “Business” purposes compared to “Personal”.
- Using sns.countplot() to visualize the distribution of rides by ‘PURPOSE’. The insights include that the most frequent purpose is “Meeting”, followed by “Meal/Entertain”, “Customer Visit”, “Errand/Supplies”, “Temporary Site”, and “Office Supplies”. Null values (labeled “NOT OK”) are also present.
- Analyzing Ride Timing:
- Using sns.countplot() to visualize the distribution of rides by the created ‘Day Night’ categories. The insight is that most Uber rides are booked during the “Afternoon”, followed by “Evening”, then “Night”, and the least in the “Morning”.
- Analyzing Monthly and Weekly Trends:
- Creating a ‘month’ column by extracting the month from the ‘START_DATE*’ column using pd.DatetimeIndex().month.
- Creating a dictionary month_label to map numerical month values to month names.
- Mapping the numerical ‘month’ column to the ‘month_label’ to get a categorical month column.
- Using pd.DataFrame() and groupby() with count() to get the count of rides per month.
- Creating a line plot using plt.plot() to visualize the trend of Uber rides over the months. The observation is that rides are least booked during the winter months (November, December, January).
- Creating a ‘day’ column by extracting the day of the week from the ‘START_DATE*’ column using data_set[‘START_DATE*’].dt.weekday.
- Creating a dictionary day_label to map numerical weekday values (0-6) to day names (Monday-Sunday).
- Mapping the numerical ‘day’ column to the ‘day_label’ to get a categorical day column.
- Using data_set[‘DAY*’].value_counts() to count the occurrences of each day.
- Using sns.barplot() to visualize the number of rides booked on each day of the week. The insight is that Friday has the highest number of bookings, while Saturday and Sunday have the least.
- Analyzing Ride Distance (Miles):
- Using sns.boxplot() to visualize the distribution of ride distances (‘MILES*’). The initial box plot shows a wide range of distances, with most rides concentrated at lower mileages but outliers going up to 175 miles.
- Creating a filtered DataFrame (data_set[data_set['MILES*'] <= 100]) and using sns.boxplot() to focus on rides within 100 miles. This provides a clearer view of the distribution within this range.
- Further filtering for rides less than 40 miles (data_set[data_set['MILES*'] < 40]['MILES*']) and using sns.distplot() to visualize the density distribution of these shorter rides. Note that distplot() has been deprecated since Seaborn 0.11 in favor of histplot() and displot(). The insight is that most rides are within the 0-10 mile range, with a significant number up to 20 miles, indicating that people primarily use Uber for shorter distances.
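The month, weekday, and distance steps above can be sketched on a handful of hypothetical rides. The dates and mileages are invented for illustration, the uppercase column names ‘MONTH’ and ‘DAY’ are assumptions, and plain Matplotlib is used for the line plot so that the sketch depends on as few libraries as possible.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical rides; values are illustrative only.
data_set = pd.DataFrame({
    "START_DATE*": pd.to_datetime([
        "2016-01-04 09:00", "2016-02-05 13:30", "2016-02-12 18:15",
        "2016-02-19 12:00", "2016-03-06 08:45",
    ]),
    "MILES*": [5.1, 8.0, 120.0, 2.3, 35.5],
})

# Month number -> month name.
month_label = {1: "Jan", 2: "Feb", 3: "Mar", 4: "Apr", 5: "May", 6: "Jun",
               7: "Jul", 8: "Aug", 9: "Sep", 10: "Oct", 11: "Nov", 12: "Dec"}
data_set["MONTH"] = pd.DatetimeIndex(data_set["START_DATE*"]).month
data_set["MONTH"] = data_set["MONTH"].map(month_label)

# Rides per month, kept in order of first appearance.
rides_per_month = data_set.groupby("MONTH", sort=False)["MILES*"].count()

# Line plot of the monthly trend.
fig, ax = plt.subplots()
ax.plot(rides_per_month.index.tolist(), rides_per_month.values, marker="o")
ax.set_xlabel("Month")
ax.set_ylabel("Number of rides")

# Weekday number (0 = Monday) -> day name, then count rides per day.
day_label = {0: "Mon", 1: "Tue", 2: "Wed", 3: "Thu", 4: "Fri", 5: "Sat", 6: "Sun"}
data_set["DAY"] = data_set["START_DATE*"].dt.weekday.map(day_label)
day_counts = data_set["DAY"].value_counts()

# Distance filters used to zoom in on the bulk of the rides.
within_100 = data_set[data_set["MILES*"] <= 100]
short_rides = data_set[data_set["MILES*"] < 40]["MILES*"]

print(rides_per_month)
print(day_counts)
print(len(within_100), len(short_rides))
```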
Source 4: Excerpts from “01.pdf” (Continued – Netflix Movie Data Analysis Project)
Main Theme: This section introduces a new project focused on analyzing a Netflix movie dataset to extract insights about movie genres, popularity, and release trends. This is also framed as a potential data analysis task within a large media streaming company.
Important Ideas and Facts:
- Context: The project simulates working as a data professional within Netflix.
- Netflix Company Overview: A brief overview of Netflix is provided, including its origin as a DVD rental service in 1997, its expansion into streaming in 2007, its global reach (available in over 190 countries, including India), its profitability ($2.4 billion profit), its large subscriber base (283 million+ paid memberships), and its use of data to understand customer behavior and patterns.
- Dataset: The project involves analyzing a dataset of around 9,000-10,000 movies with information such as release date, title, overview, popularity, vote count, average vote, genre, and poster URL.
- Analysis Questions: Five specific questions need to be answered using the dataset:
- Which are the genres that most people have liked (based on the number of votes)?
- Which is the movie that is most popular, and what is its genre?
- Which movie has the lowest popularity, and what is its genre?
- In which year were the most movies released?
- (Implied but not explicitly numbered) Analyze the vote average column to categorize movies based on their average rating (e.g., popular, average, below average, not popular).
- Tool: Python and Jupyter Notebook are to be used for solving the questions.
- Data Loading: The excerpt shows importing necessary libraries (NumPy as np, Pandas as pd, Matplotlib.pyplot as plt, Seaborn as sns) and loading the movie data from a CSV file named “tmdb_5000_movies.csv” into a Pandas DataFrame named df. The lineterminator='\n' argument is used in pd.read_csv().
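The loading step can be tried without the actual file by reading CSV text from memory. This is a small sketch under stated assumptions: the two rows and the column subset are hypothetical, and io.StringIO stands in for the file path used in the course.

```python
import io

import pandas as pd

# Hypothetical CSV content; a quoted field may itself contain commas.
csv_text = (
    "title,genres,popularity,vote_average\n"
    'Movie A,"Action, Adventure",12.5,7.1\n'
    "Movie B,Drama,3.2,6.4\n"
)

# lineterminator tells the parser which single character ends a row.
df = pd.read_csv(io.StringIO(csv_text), lineterminator="\n")

print(df.head())
print(df.dtypes)
```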
Source 5: Excerpts from “01.pdf” (Continued – Netflix Movie Data Analysis Project – Data Exploration and Cleaning)
Main Theme: This section details the initial steps of exploring and cleaning the Netflix movie dataset.
Important Ideas and Facts:
- Initial Data Exploration: Displaying the first five rows of the DataFrame using df.head() to understand the data structure (budget, genres, homepage, id, keywords, original_language, original_title, overview, popularity, production_companies, production_countries, release_date, revenue, runtime, spoken_languages, status, tagline, title, vote_average, vote_count).
- Handling Date Format: Acknowledging the need to bring the ‘release_date’ column into datetime format, though the implementation is shown later.
- Exploring Genres: Examining the ‘genres’ column, noting that multiple genres are listed within a single string, separated by commas and spaces (e.g., “Action, Adventure, Science Fiction”). This highlights the need for splitting and processing this column for effective analysis.
- Checking for Duplicates: Using df.duplicated().sum() to check for duplicate rows in the dataset. The result of 0 indicates that there are no duplicate movies.
- Basic Statistics: Using df.describe() to get descriptive statistics for numerical columns (popularity, vote_count, vote_average, budget, revenue, runtime). This provides insights into the distribution and range of these variables (e.g., average popularity, maximum votes).
- Irrelevant Column Removal: Identifying columns that are deemed unnecessary for answering the posed questions: ‘budget’, ‘homepage’, ‘id’, ‘keywords’, ‘original_language’, ‘poster_path’, ‘production_companies’, ‘production_countries’, ‘release_date’, ‘revenue’, ‘runtime’, ‘spoken_languages’, ‘status’, ‘tagline’. These are stored in a list columns_to_remove.
- Dropping Columns: Using df.drop() with axis=1 and inplace=True to permanently remove the identified irrelevant columns from the DataFrame.
- Categorizing Vote Average: Creating a user-defined function categorize_vote_average to label movies based on their ‘vote_average’ into categories: ‘popular’, ‘average’, ‘below average’, and ‘not popular’. This involves calculating percentile-based thresholds (minimum, 25th percentile, 50th percentile, 75th percentile, maximum) and using pd.cut() to assign labels. The function is applied to the ‘vote_average’ column to create a new column ‘vote_category’.
- Splitting Genres: Processing the ‘genres’ column to split the comma-separated genre strings into individual genres. This involves:
- Using df['genres'].str.split(', ') to split the strings into lists of genres.
- Using df.explode(‘genres’) to transform each list of genres into separate rows, effectively creating a long format DataFrame where each movie-genre combination has its own row.
- Resetting the index of the resulting DataFrame df_exploded.
- Casting the ‘genres’ column to the ‘category’ data type.
- Unique Genre Count: Using df_exploded[‘genres’].nunique() to find the number of unique genres (resulting in 19).
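The exploration and cleaning steps above can be sketched on four hypothetical movies. The function body below is one plausible reading of the described categorize_vote_average (quartile edges fed to pd.cut()); the label order and the `include_lowest=True` setting are assumptions of this sketch, not confirmed details of the course code.

```python
import pandas as pd

# Hypothetical rows; 'homepage' stands in for the columns deemed irrelevant.
df = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "genres": ["Action, Adventure", "Drama", "Drama, Romance", "Comedy"],
    "vote_average": [8.0, 6.0, 7.0, 5.0],
    "homepage": [None, None, None, None],
})

# Count fully duplicated rows.
n_duplicates = int(df.duplicated().sum())

# Drop columns that are not needed for the questions.
columns_to_remove = ["homepage"]
df = df.drop(columns_to_remove, axis=1)

def categorize_vote_average(frame, col="vote_average"):
    """Label each value by the quartile interval it falls into."""
    stats = frame[col].describe()
    edges = [stats["min"], stats["25%"], stats["50%"], stats["75%"], stats["max"]]
    labels = ["not popular", "below average", "average", "popular"]
    # include_lowest keeps the minimum value inside the first bin.
    return pd.cut(frame[col], bins=edges, labels=labels, include_lowest=True)

df["vote_category"] = categorize_vote_average(df)

# Split the genre strings into lists, then give every genre its own row.
df["genres"] = df["genres"].str.split(", ")
df_exploded = df.explode("genres").reset_index(drop=True)
df_exploded["genres"] = df_exploded["genres"].astype("category")

n_genres = df_exploded["genres"].nunique()

print(df[["title", "vote_category"]])
print(df_exploded[["title", "genres"]])
print(n_duplicates, n_genres)
```

After explode(), a movie with two genres appears twice, so counts taken on df_exploded are counts of movie-genre pairs rather than of movies.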
Source 6: Excerpts from “01.pdf” (Continued – Netflix Movie Data Analysis Project – Data Visualization and Analysis with Plotly)
Main Theme: This section introduces the Plotly library for creating interactive visualizations to answer the remaining questions about the Netflix movie dataset.
Important Ideas and Facts:
- Introduction to Plotly: Explaining the benefits of using Plotly for dynamic and interactive visualizations, noting that it’s a valuable skill in the industry. It mentions the use of plotly.express for high-level plotting and plotly.graph_objects for more customized graphs.
- Importing Plotly Libraries: Importing the necessary Plotly modules:
- plotly.express as px
- plotly.graph_objects as go
- plotly.figure_factory as ff (commented out but mentioned)
- plotly.subplots (implicitly used later)
- plotly.colors
- plotly.io as pio
- Setting Default Theme: Setting the default Plotly template to ‘plotly_white’ using pio.templates.default = “plotly_white”.
- Re-loading Data (Optional): Showing the re-loading of the “tmdb_5000_movies.csv” dataset into a DataFrame named data (though the previous df_exploded DataFrame with processed genres is the one primarily used for analysis).
- Analyzing Most Frequent Genre:
- Using df_exploded['genres'].value_counts() to get the count of movies for each genre.
- Using px.bar() to create a bar chart showing the frequency of each genre. The ‘genres’ column is on the y-axis, and the count is on the x-axis. The chart is displayed using fig.show(). The insight is that ‘Drama’ is the most frequent genre, followed by ‘Action’.
- Analyzing Vote Average Category:
- Using px.bar() to create a bar chart showing the distribution of movies across the created ‘vote_category’ (popular, average, etc.). The ‘vote_category’ is on the x-axis, and the count is on the y-axis. The chart is displayed using fig.show(). The insight is that most movies fall into the ‘average’ vote category.
- Identifying Most Popular Movie:
- Finding the movie with the maximum ‘popularity’ using df[df['popularity'] == df['popularity'].max()]. The most popular movie is identified as ‘Minions’ with a popularity score of approximately 875.
- Identifying Least Popular Movies:
- Finding the movie(s) with the minimum ‘popularity’ using df[df['popularity'] == df['popularity'].min()]. Several movies are identified with a popularity score of 0.
- Analyzing Movie Releases by Year:
- Extracting the year from the ‘release_date’ column using pd.to_datetime(df['release_date']).dt.year.
- Counting the number of movies released each year using .value_counts().
- Using px.bar() to create a bar chart showing the number of movie releases per year. The year is on the x-axis, and the count is on the y-axis. The chart is displayed using fig.show(). The insight is that the number of movie releases generally increased over time, with a peak around 2014-2016.
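The aggregations that feed the charts in this section can be sketched with Pandas alone. The rows below are hypothetical, and the Plotly calls are deliberately left out; the tidy tables produced here are the kind of input a bar chart such as px.bar() would typically be given.

```python
import pandas as pd

# Hypothetical rows; titles, scores, and dates are made up for illustration.
df = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "genres": ["Drama", "Action, Drama", "Comedy", "Drama, Romance"],
    "popularity": [42.0, 875.5, 0.0, 13.7],
    "release_date": ["2014-05-01", "2015-07-10", "2015-11-20", "2009-02-14"],
})

# One row per movie-genre pair.
df_exploded = df.assign(genres=df["genres"].str.split(", ")).explode("genres")

# Genre frequency as a two-column table, most frequent first.
genre_counts = (
    df_exploded["genres"].value_counts()
    .rename_axis("genre")
    .reset_index(name="count")
)

# Most and least popular movies via boolean masks.
most_popular = df[df["popularity"] == df["popularity"].max()]
least_popular = df[df["popularity"] == df["popularity"].min()]

# Number of releases per year, ordered chronologically.
release_year = pd.to_datetime(df["release_date"]).dt.year
releases_per_year = release_year.value_counts().sort_index()

print(genre_counts)
print(most_popular[["title", "popularity"]])
print(least_popular[["title", "popularity"]])
print(releases_per_year)
```

Comparing against max() or min() with a mask, rather than using idxmax(), returns every tied row, which matches the observation that several movies share the lowest score.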
This briefing document summarizes the key aspects of the provided sources, highlighting the learning objectives, methodologies, and insights gained from the Python-based data analysis projects on Zomato, Uber, and Netflix movie data. The use of quotes and explicit mention of important code snippets and conclusions from the analysis are included to provide a comprehensive overview.
Python, Data Analysis, and Interview Preparation
Frequently Asked Questions based on the Provided Sources
1. What is the primary goal of the Python Complete Crash Course in Hindi, and who is it designed for? The primary goal of the Python Complete Crash Course is to equip individuals with practical Python skills, particularly for data analysis and data science roles. It emphasizes learning through hands-on projects to build confidence and demonstrate practical abilities in interviews. The course is designed for beginners with no prior HiFi coding experience, starting from basic concepts and gradually advancing to real-time projects. It aims to help learners understand how data analysts and data scientists work within a company.
2. Why does the Python course emphasize project-based learning, especially for interview preparation? The course focuses on project-based learning because recruiters in interviews often concentrate on candidates’ projects rather than asking exhaustive theoretical Python questions due to time constraints. Projects effectively showcase a candidate’s practical skills, the amount of work they’ve done, and their ability to apply Python in real-world scenarios. Completing projects demonstrates a tangible understanding of Python and boosts confidence for interviews.
3. What are the key Python libraries highlighted in the Zomato Data Analysis Project, and what are their primary uses in this context? The key Python libraries highlighted are:
- Pandas (as pd): Used for data manipulation and cleaning. It provides data structures like DataFrames, which are essential for reading, processing, and analyzing structured data like the Zomato CSV file.
- NumPy (as np): A library for numerical computations and mathematical operations, often used in conjunction with Pandas for data analysis tasks.
- Matplotlib.pyplot (as plt): A fundamental library for creating static, interactive, and animated visualizations in Python, such as plots and graphs to understand data patterns.
- Seaborn (as sns): A data visualization library built on top of Matplotlib, providing a higher-level interface for creating informative and attractive statistical graphics.
These libraries are used to import and read the Zomato dataset, clean and manipulate the data (using Pandas and NumPy), and then visualize various aspects of the data to extract insights (using Matplotlib and Seaborn).
4. What were some of the key data cleaning and preprocessing steps performed in the Zomato Data Analysis Project? Several data cleaning and preprocessing steps were performed in the Zomato project and in the related Uber and Netflix projects, including:
- Importing necessary libraries: Pandas, NumPy, Matplotlib, and Seaborn were imported to handle data manipulation and visualization.
- Reading the CSV file into a Pandas DataFrame: The Zomato dataset was loaded for analysis.
- Handling rating data: A user-defined function was created to extract the numerical rating from the “Rate” column, which initially contained additional text. The .apply() method was used to apply this function to the entire column.
- Handling missing values (Uber data): The project identified and handled missing values in the “Purpose” column by replacing them with “Not OK”. Later, it demonstrated dropping rows with any null values using dropna() to ensure cleaner data for analysis.
- Converting date and time formats (Uber data): The “Start Date” and “End Date” columns, initially in object format, were converted to datetime objects using pd.to_datetime(). Errors in the date format were handled by setting invalid dates to “Not a Time” (NaT).
- Creating new date and time-related columns (Uber data): New columns for “Date” and “Time” were extracted from the “Start Date” column.
- Categorizing time into day periods (Uber data): A new “Day Night” column was created to categorize rides into “Morning,” “Afternoon,” “Evening,” and “Night” based on the time.
- Handling duplicates (Netflix data): Although no duplicates were found in the Netflix movie dataset, the process of checking for duplicates using .duplicated() and .sum() was demonstrated.
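The missing-value, date-conversion, and day-period steps described in this answer can be sketched as follows. This is a minimal illustration under stated assumptions: the four rows are invented, the date format string is assumed, and the hour boundaries used for “Morning” through “Night” are hypothetical, since the sources do not give the exact cut-offs.

```python
import pandas as pd

# Made-up rides standing in for the Uber dataset (assumption).
rides = pd.DataFrame({
    "Start Date": ["2016-01-01 09:15", "2016-01-02 14:30",
                   "not a date", "2016-01-03 22:05"],
    "Purpose": ["Meeting", None, "Customer Visit", None],
})

# Replace missing purposes with the placeholder named in the text.
rides["Purpose"] = rides["Purpose"].fillna("Not OK")

# Convert to datetime; values that do not parse become NaT.
rides["Start Date"] = pd.to_datetime(
    rides["Start Date"], format="%Y-%m-%d %H:%M", errors="coerce"
)

# Bucket the hour of day into periods (the boundaries are an assumption).
rides["Day Night"] = pd.cut(
    rides["Start Date"].dt.hour,
    bins=[0, 10, 15, 19, 24],
    labels=["Morning", "Afternoon", "Evening", "Night"],
    include_lowest=True,
)

# Count fully duplicated rows, then drop rows that still contain nulls.
duplicate_rows = int(rides.duplicated().sum())
clean_rides = rides.dropna()
print(clean_rides)
```

Passing `errors="coerce"` is what turns the unparseable value into NaT instead of raising an error, so the bad row can be dropped later with `dropna()`.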
5. What were some of the key insights derived from the Zomato Data Analysis Project through data visualization? Several insights were gained through data visualization. The first two come from the Zomato data; the booking and trip insights come from the Uber data:
- Restaurant Type Preferences: Dining type restaurants receive the most orders.
- Rating Distribution: The majority of restaurants receive ratings between 3.5 and 4.
- Booking Category and Purpose: The “Business” category has the most Uber bookings, and the primary purpose for booking is for “Meetings.”
- Booking Trends by Time of Day: Most Uber rides are booked during the “Afternoon.”
- Booking Trends by Month: Uber bookings are lowest during the winter months of January, February, November, and December.
- Booking Trends by Day of the Week: Friday is the day with the highest number of Uber bookings, while Sunday has the fewest.
- Trip Distance Analysis: Most Uber trips are within the 0 to 20-mile range, with a noticeable peak in the 5 to 10-mile range.
6. What are the different rounds typically involved in data science/analytics job interviews, as mentioned in the sources? The sources mention that data science/analytics job interviews generally involve three to four rounds:
- Round Number One (Assignment/Case Study): The company provides an assignment with problems to be solved, often a case study, which needs to be submitted within 24 to 48 hours.
- Round Number Two (Technical): If shortlisted in the first round, candidates appear for a technical interview.
- Round Number Three (HR/Managerial): If successful in the technical round, the final round is usually with HR or a hiring manager.
The Zomato project aims to help candidates prepare for the first round, where they might receive a case study or assignment requiring data analysis and insight extraction.
7. How did the idea for Uber originate, and what were the initial concepts behind it? The idea for Uber originated in Paris in 2008 with Garrett Camp on a snowy evening when public transport was scarce. Frustrated by the lack of available transportation, he conceived the idea of a technology or app that would allow people to book rides and have a driver come to their location. Initially, the concept involved ride-sharing to divide costs among passengers traveling in the same direction. However, the idea evolved to its current form where users can book a ride directly to their specific location.
8. What role does data science play in Uber’s operations, and what are some specific applications mentioned? Data science plays a crucial role in various aspects of Uber’s operations, including:
- ETA (Estimated Time of Arrival) Estimation: Machine learning algorithms analyze various factors to predict how long it will take for a driver to arrive at a user’s location.
- Price Prediction: Data analysis and modeling are used to predict the cost of a ride based on factors like distance, time, demand, and traffic.
- Route Optimization: Algorithms determine the most efficient routes for drivers to take passengers to their destinations.
- Driver-Rider Matching: Data science helps in making optimal matches between available drivers and ride requests.
- Fraud Detection: Analyzing data helps in identifying and preventing fraudulent activities related to payments and bookings.
The sources emphasize that data science is essential for Uber to function effectively, as it enables accurate estimations, predictions, and optimizations that enhance the user experience.
Data Visualization for Insight: A Project Analysis
Data visualization is a key aspect of the projects discussed in the sources, playing a crucial role in understanding data and extracting meaningful insights. The sources illustrate several ways data visualization is employed:
Tools and Libraries for Visualization:
- The primary libraries mentioned for creating visualizations are Matplotlib and Seaborn. Seaborn (sns) is frequently used for generating various types of plots.
- Plotly is also mentioned as a library that can be used for visualization in the context of the e-commerce project.
Types of Visualizations and Their Purposes:
- Count Plots: These are used to display the frequency of different categories within a dataset. In the Zomato project, a count plot is used to determine which type of restaurant (e.g., dining, cafe, buffet) has the majority of customer orders. Similarly, in the Uber data analysis project, count plots are used to see which category (business or personal) and for what purpose (e.g., meeting) people book rides the most.
- Line Graphs: Line graphs are used to show trends and relationships between two variables. In the Zomato project, a line graph visualizes the number of votes received by each type of restaurant, helping to identify which restaurant types are most liked by customers. A line plot is also used in the Uber data analysis project to analyze the monthly booking trends.
- Histograms: Histograms are used to display the distribution of a single numerical variable. In the Zomato project, a histogram is used to understand the distribution of ratings given by customers to different restaurants.
- Box Plots: Box plots are useful for comparing the distribution of a numerical variable across different categories. In the Zomato project, a box plot is used to compare the ratings given for online and offline food orders.
- Heat Maps: Heat maps are used to visualize the relationship between two categorical variables using color intensity. In the Zomato project, a heat map shows the relationship between the type of restaurant and whether orders are placed online or offline, indicating preferences.
- Bar Graphs: Bar graphs are used to compare the values of different categories. In the Uber data analysis project, bar graphs (count plots, which are a type of bar graph) are used to visualize the most frequent categories and purposes of Uber bookings. In the movie data project, Seaborn’s catplot (categorical plot) is used to show the distribution of movie genres, where the height of the bars represents the frequency of each genre.
- Dist Plots (Distribution Plots): These plots are used to visualize the distribution of a single variable, often combining a histogram with a kernel density estimate. In the Uber data analysis project, a dist plot is used to analyze the distribution of ride distances (in miles) booked by users.
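Two of the plot types listed above, a count plot and a histogram, can be sketched as follows. The sources use Seaborn for these; this sketch swaps in plain Matplotlib to keep the dependencies small, and the data and the column names `restaurant_type` and `rating` are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows standing in for the Zomato dataset (assumption).
restaurants = pd.DataFrame({
    "restaurant_type": ["Dining", "Dining", "Cafe", "Dining", "Buffet", "Cafe"],
    "rating": [3.8, 4.1, 3.6, 3.9, 3.7, 4.3],
})

# A count plot is a bar chart of category frequencies.
type_counts = restaurants["restaurant_type"].value_counts()

fig, (ax_count, ax_hist) = plt.subplots(1, 2, figsize=(10, 4))
ax_count.bar(type_counts.index, type_counts.values)
ax_count.set_xlabel("Type of restaurant")
ax_count.set_ylabel("Number of orders")

# A histogram bins one numeric column to show its distribution.
ax_hist.hist(restaurants["rating"], bins=5)
ax_hist.set_xlabel("Rating")
ax_hist.set_ylabel("Number of restaurants")

fig.tight_layout()
```

In a notebook the `matplotlib.use("Agg")` line would normally be left out so that the figure renders inline.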
Purpose of Data Visualization:
- Extracting Insights: The primary goal of data visualization in these projects is to extract meaningful insights from the data. By presenting data visually, patterns, trends, and relationships become easier to identify and understand.
- Answering Questions: Visualizations are created to help answer specific questions related to the data. For example, in the Zomato project, visualizations are used to answer questions like which restaurants are ordered from the most, which restaurant types receive the most votes, and which restaurants have the highest ratings. Similarly, in the Uber data analysis project, visualizations help answer questions about booking categories, purposes, times, and distances.
- Data Exploration and Analysis (EDA): Data visualization is a key component of Exploratory Data Analysis (EDA), as mentioned in the movie data project. It helps in understanding the characteristics of the data, identifying potential issues, and forming hypotheses.
- Communication of Findings: Visualizations are a powerful way to communicate findings to others, such as recruiters or stakeholders in a company. Graphs and charts can convey complex information more effectively than tables of raw data.
- Supporting Decision Making: The insights gained from data visualization can support better decision-making within a business context. For example, understanding which restaurant types are most popular can inform business strategies for Zomato. Similarly, understanding peak booking times for Uber can help with resource allocation.
In summary, the sources highlight the significant role of data visualization in analyzing datasets, extracting actionable insights, answering specific business questions, and effectively communicating findings using libraries like Matplotlib and Seaborn, with Plotly also being a potential tool. Different types of visualizations are chosen based on the nature of the data and the specific questions being addressed.
Zomato Data Analysis: Visualizing Restaurant Insights
The Zomato data is used in a Python Complete Crash Course to illustrate how a data analyst or data scientist can work within a company. The objective of using this data is to answer specific business-related questions through data analysis, including performing visualization and extracting insights.
According to the source, the Zomato data includes information such as:
- Restaurant names.
- Whether an order was placed online (yes/no).
- Whether a table was booked (yes/no).
- Ratings given by customers.
- The number of votes received by a restaurant.
- The approximate cost for two people.
- The listed type of restaurant, such as dining, buffet, or cafe.
The project involves several steps with the Zomato data, including data cleaning. One crucial cleaning step is converting the data type of the rating column to extract only the numerical rating and remove the extraneous “/5” suffix (as in “4.1/5”). The source also mentions checking for missing values, and in this specific dataset, no missing values were found.
The core of working with the Zomato data, as demonstrated in the source, is to extract insights by answering specific questions using data visualization. The source provides examples of the following visualizations created using the Zomato data and the insights derived from them:
- A count plot is used to determine which type of restaurant receives the majority of orders, revealing that dining-type restaurants have the most orders.
- A line graph visualizes the number of votes received by each type of restaurant, showing that dining restaurants received the most votes.
- A histogram displays the distribution of customer ratings, indicating that the majority of restaurants receive ratings between 3.5 and 4.
- A count plot examines the approximate cost for two people, suggesting that most couples spend around ₹00 on an average order.
- A box plot compares ratings for online and offline orders, concluding that offline orders receive lower ratings compared to online orders.
- A heat map visualizes the relationship between the type of restaurant and online/offline orders, indicating that dining restaurants mostly receive offline orders, while cafes see more online orders.
These examples illustrate how different plot types are used to explore the data and answer specific questions. The Zomato project emphasizes the importance of visualization in understanding customer behavior and restaurant preferences. The insights gained from these visualizations can then be used by a company like Zomato to inform business strategies, such as offering more coupons for offline dining or focusing on improving offline dining experiences based on lower ratings.
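The tables behind the line graph of votes and the heat map of restaurant type against order mode can be sketched with pandas. This is an illustration only: the rows are invented and the column names `restaurant_type`, `online_order`, and `votes` are hypothetical stand-ins for the dataset's real ones.

```python
import pandas as pd

# Made-up orders standing in for the Zomato dataset (assumption).
orders = pd.DataFrame({
    "restaurant_type": ["Dining", "Dining", "Dining", "Cafe", "Cafe", "Buffet", "Other"],
    "online_order": ["No", "No", "Yes", "Yes", "Yes", "No", "Yes"],
    "votes": [775, 412, 918, 88, 166, 120, 30],
})

# Total votes per restaurant type: the series a line graph would plot.
votes_by_type = (
    orders.groupby("restaurant_type")["votes"].sum().sort_values(ascending=False)
)

# Counts of each type/order-mode pair: the table a heat map would color.
type_vs_mode = pd.crosstab(orders["restaurant_type"], orders["online_order"])

print(votes_by_type)
print(type_vs_mode)
```

A cross-tabulation like `type_vs_mode` is the kind of two-way table that a heat map function such as Seaborn's `heatmap` takes as input.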
Zomato Data: Analysis of Restaurant Types and Preferences
The sources discuss restaurant types primarily within the context of the Zomato data analysis project. The data includes a column specifying the listed type of restaurant, which includes categories such as dining, buffet, and cafe. The source also mentions an “other” category.
Here’s a breakdown of the discussion around these restaurant types:
- Categories Identified: The main restaurant types identified in the Zomato data are:
- Dining
- Buffet
- Cafe
- Other
- Analysis through Visualization: The project uses various data visualization techniques to analyze customer behavior and preferences related to these different restaurant types:
- A count plot reveals that dining-type restaurants receive the majority of customer orders. This indicates that, in terms of order frequency, dining establishments are the most popular among customers in the dataset.
- A line graph showing the number of votes received by each restaurant type indicates that dining restaurants have received the most votes. This suggests that dining restaurants not only have more orders but also receive more feedback from customers, potentially signifying higher engagement or a larger customer base.
- A heat map visualizes the relationship between the type of restaurant and whether orders are placed online or offline. This visualization shows that dining restaurants mostly receive offline orders, suggesting customers prefer to dine in at these establishments. Conversely, cafes see more online orders, implying a preference for takeaway or delivery from cafes. The behavior for buffet and “other” categories is also visualized in this manner.
- Business Implications: The analysis of restaurant types allows for the extraction of actionable business insights:
- Knowing that dining restaurants are the most popular in terms of orders and votes, Zomato can understand its strongest segment.
- The finding that offline orders are more common for dining restaurants could inform strategies related to in-house dining experiences and potential partnerships with these establishments.
- The observation that cafes have more online orders can guide strategies for optimizing online ordering and delivery services for this type of restaurant.
- The lower ratings for offline orders compared to online orders suggest that Zomato might need to investigate and potentially work with restaurants to improve the offline dining experience.
In summary, the analysis of restaurant types within the Zomato data, facilitated by data visualization, allows for a deeper understanding of customer preferences regarding different dining experiences and ordering methods. This information can be crucial for Zomato to make informed business decisions and tailor its services to meet customer demands effectively. This aligns with the earlier discussion about the role of data visualization in extracting meaningful insights from the Zomato data.
Zomato Data: Customer Rating Analysis
Customer ratings are a significant aspect discussed in the sources, particularly within the context of the Zomato data analysis project. The Zomato data includes a column for ratings given by customers. The analysis of these ratings, often through data visualization, provides valuable insights into customer satisfaction and preferences.
Here’s a breakdown of how customer ratings are discussed and analyzed:
- Data Cleaning of Ratings: The Zomato project involves a crucial data cleaning step for the rating column. Initially, the ratings in the dataset contain the extraneous text “/5” appended to the numerical rating (e.g., “4.1/5”). To use the ratings for analysis, a user-defined function is created to extract only the numerical part of the rating by splitting the string and taking the first value. This cleaned rating is then converted to a floating-point number.
- Visualization of Rating Distribution: A histogram is used to visualize the distribution of customer ratings for restaurants in the Zomato data. This visualization helps to understand the range of ratings and the frequency of each rating. The analysis of the histogram reveals that the majority of restaurants receive ratings between 3.5 and 4. This indicates a general level of customer satisfaction, with most ratings falling within a positive range.
- Comparison of Online and Offline Ratings: A box plot is employed to compare the ratings given for online and offline food orders. The analysis of this box plot indicates that offline orders tend to receive lower ratings compared to online orders. This suggests potential areas for improvement in the offline dining experience or differences in customer expectations between online and offline orders.
- Importance of Ratings for Businesses: The source emphasizes that rating is very important for businesses as it reflects how much customers like their product. Understanding customer ratings helps companies gauge satisfaction levels and identify areas where they might need to improve their offerings.
- Categorization of Vote Averages (Movie Data – Related Concept): While not directly about Zomato ratings, the movie data project discusses a related concept of categorizing vote averages into labels like “popular,” “average,” “below average,” and “not popular” based on defined criteria. This demonstrates another way in which numerical ratings or scores can be analyzed and transformed into more easily understandable categories of customer sentiment.
In summary, the analysis of customer ratings in the Zomato project is a key component of understanding customer feedback and preferences. Through data cleaning and visualization techniques like histograms and box plots, insights are derived about the distribution of ratings and the differences in ratings between online and offline orders. These insights are crucial for businesses to assess customer satisfaction and identify areas for potential improvement. The importance of ratings is highlighted, and a related concept of categorizing ratings is seen in the movie data project.
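The rating clean-up and the online/offline comparison described above can be sketched as follows. The function name `handle_rate`, the sample rows, and the column names are hypothetical; the split-and-take-the-first-value approach is the one the text describes.

```python
import pandas as pd

def handle_rate(value):
    """Return the numeric part of a rating such as '4.1/5' as a float."""
    return float(str(value).split("/")[0])

# Made-up ratings standing in for the Zomato dataset (assumption).
ratings = pd.DataFrame({
    "rate": ["4.1/5", "3.8/5", "3.7/5", "4.3/5"],
    "online_order": ["Yes", "No", "No", "Yes"],
})

# Apply the cleaning function to every value in the column.
ratings["rate"] = ratings["rate"].apply(handle_rate)

# The median per order mode is the center line a box plot would draw.
median_by_mode = ratings.groupby("online_order")["rate"].median()
print(median_by_mode)
```

Comparing medians is only a numeric summary of what the box plot shows; the plot additionally reveals the spread and any outliers in each group.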
Data Analysis Through Zomato and Movie Projects
Data analysis is a crucial process discussed extensively in the provided sources, particularly through the practical examples of the Zomato data analysis project and the movie data project. It involves a series of steps aimed at extracting meaningful insights and answering specific questions from data.
Here’s a breakdown of the key aspects of data analysis as illustrated in the sources:
- Defining the Objective: Both projects begin with a clear objective. The Zomato project aims to understand customer behavior and restaurant preferences by answering specific questions related to order frequency, ratings, and costs. Similarly, the movie data project seeks to answer questions about movie genres, popularity, and release years. Having a clear objective guides the entire analysis process.
- Data Collection and Understanding: The first step in data analysis is having access to the relevant data. The sources describe using a Zomato dataset containing information about restaurants, orders, ratings, etc. and a movie dataset with details like movie titles, genres, popularity, and release dates. Understanding the structure and content of the data is fundamental before proceeding with any analysis.
- Data Cleaning and Pre-processing: This is a critical step highlighted in both projects.
- In the Zomato project, data cleaning involves extracting the numerical rating from a string format and converting it to a usable data type. The source also mentions checking for and finding no missing values in the initial Zomato dataset.
- The movie data project emphasizes several pre-processing tasks, including handling missing values, changing the format of date columns, and removing potential white spaces in categorical columns. It also involves removing irrelevant columns that do not contribute to answering the research questions. Additionally, the movie project demonstrates creating labels from numerical data (vote average) to categorize movies.
- Data cleaning and pre-processing ensure the data is accurate, consistent, and in a suitable format for analysis.
- Exploratory Data Analysis (EDA): The movie data project explicitly mentions performing Exploratory Data Analysis (EDA). EDA involves initial investigations to summarize the main characteristics of the dataset, often using visual methods. The movie project demonstrates this by checking for duplicate values and calculating basic statistics (like mean, min, max) for numerical columns to understand their distribution.
- Data Visualization: Both sources heavily emphasize the role of data visualization in data analysis.
- The Zomato project uses various plot types like count plots, line graphs, histograms, box plots, and heat maps to explore relationships and distributions in the data and answer specific business questions. Visualizations help in understanding patterns and trends that might not be apparent from raw data alone.
- While not explicitly detailed, the movie project also leads to answering questions, implying the use of visualization to derive those answers (e.g., identifying the most frequent genre or the year with the most movie releases would likely involve some form of aggregation and visual representation).
- Insight Extraction and Interpretation: The ultimate goal of data analysis is to extract meaningful insights from the analyzed data. In the Zomato project, visualizations lead to conclusions such as dining restaurants being the most popular, offline orders receiving lower ratings, and most couples spending around ₹00 on average orders. These insights can then be used to inform business decisions.
- Answering Specific Questions: Both projects are structured around answering a set of predefined questions. This highlights that data analysis is often driven by specific inquiries that need to be addressed using the available data.
- Business Relevance: The Zomato project is explicitly framed within a business context, demonstrating how a data analyst or data scientist can work with real-world data to solve business problems. The insights gained are directly relevant to Zomato’s operations and strategies. The e-commerce sales analysis project excerpt further reinforces the business importance of data analysis, highlighting its role in understanding sales, profit, and customer behavior for e-commerce companies.
In summary, data analysis, as demonstrated in the sources, is a systematic process involving defining goals, collecting and cleaning data, exploring its characteristics, visualizing patterns, extracting insights, and ultimately answering questions to support informed decision-making, particularly within a business context. The emphasis on data cleaning, visualization, and the derivation of actionable insights are recurring themes throughout the examples provided.
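The EDA checks and the labeling of vote averages mentioned for the movie project can be sketched as follows. The sources say only that labels are assigned “based on defined criteria”; using the quartiles from `describe()` as bin edges is an assumption made for this illustration, and the eight rows are invented.

```python
import pandas as pd

# Made-up rows standing in for the movie dataset (assumption).
movies = pd.DataFrame({
    "title": ["A", "B", "C", "D", "E", "F", "G", "H"],
    "vote_average": [8.3, 6.1, 7.0, 5.2, 4.0, 6.8, 7.9, 3.1],
})

# Basic EDA: count duplicated rows and summarize the numeric column.
duplicate_rows = int(movies.duplicated().sum())
stats = movies["vote_average"].describe()

# Use the quartiles as bin edges (assumed criteria, not taken from the source).
edges = [stats["min"], stats["25%"], stats["50%"], stats["75%"], stats["max"]]
labels = ["not popular", "below average", "average", "popular"]
movies["vote_label"] = pd.cut(
    movies["vote_average"], bins=edges, labels=labels, include_lowest=True
)

print(duplicate_rows)
print(movies)
```

Quartile edges put roughly a quarter of the movies in each label; fixed thresholds would be an equally valid choice if the business question calls for them.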
The Original Text
hello everyone i am swati and welcome to the All the people can benefit from this through interviews and This has happened for a job as well but with this python’s The objective of creating the course is that whenever you go for an interview Are recruiters interviewing you for half an hour? I can’t ask the whole Python question. That is why they ask the most questions on your projects and your It is the projects that tell this story that how much work you do that’s why this You can get Complete Python Crash Course There are five real timers inside it Projects include such as JumT There will be a lot of focus on implementation, whatever We are talking about all these projects When we start a project, Beginners have these questions in their mind It happens that we are very good at coding If I don’t know, can we do these projects? If you find it then definitely you have nothing to worry about You don’t need this crash course No HiFi coding required Starting from the very basic level, here we will talk about things We will start with this and gradually advance The whole project will reach this level Every single line of code that is written in I have told you one line at a very basic level. explained in very simple language so that Complete all your five projects easily If you get it then these projects which are going to happen It will be very beneficial for you because when You can do these five projects yourself. If you complete it then you will see it in real time The idea would be that a data analyst is a How can a data scientist work inside a company? 
if we work with it then this whole Projects are its Notes Data Set Code file all things description box below It is already mentioned and this course Your confidence level after completing it It will definitely get a boost, so let’s get started This is Python Complete Crash Course That’s alright so let’s start today’s class In today’s class when you complete your project If you complete it then your confidence level will increase This will definitely increase your confidence in yourself You will believe that yes now I have the science data The things that are analyzed should be understood today’s class is starting so we before you start zomato.in means Who supplies their food through Zomato If yes then that is from 2 lakhs onwards 26000 which is the restaurant near jomat If we have partners then we can understand that The data which is in crores is also of the customers Available for more than Rs 2 lakh and Rs 2.5 lakh Around Nearby About We can say Delivery Partners who are restaurant data If it is available then now it is available for you here The situation is that you want to enter any data in Zomato. Are you working on a driven role? 
The record of the data of the customers is yours Available now and as a data professional You need to extract some insights from this data Some EDF needs to perform visualization and there are some questions which are If you have to answer the questions then first First, I would like to show you that our The project is today’s class, in what way So in today’s class it will be visible in this way We will be coding each and every thing To rectify it in some way The graph you see on my screen With the help of these graphs we are will perform visualization whether it is a bar Whether it is a graph, be it a line graph or then in this way we have How to plot a histogram Insights have to be extracted in this way all the visualizations box plots all of these Visualizations you see Heat Map How to make this with data from Zomato We will learn all this in today’s class you must be coming to the point that here But we have some questions about this to help us solve the project First of all we have given Is there a restaurant or a buffet or dining? All this data is a type of restaurant we have here it’s been many times now This happens even in direct interviews Some such questions for you to solve are given specifically for your round number Like in forest mainly you ask such questions Now you will get to see it in the interview Solve these questions from this data The first question is what type of question you have to answer Restaurants do the majority of customers Which are the restaurants to order from? Which is the type of restaurant that Majority of customers order food how many votes has each type of restaurant Received from the customers now we are Talking about restaurants, there are different types Are there any restaurants, some have buffets Is there a cafe like full-fledged dining? If it is a restaurant, how many votes did you give? 
the customer has because look whenever someone The company comes to you, we come from Zomato we take the food and then it comes to us How many votes were given for a feedback option? How many people have told us well here We have to find out from this data what are the Ratings Now whenever we order food if yes then we give rating to Zomato that this How we liked the restaurant food If you don’t like the food out of five then give me three We gave a rating of 4.5 if we liked the food So, which are the restaurants that High ratings from the majority of customers It has been provided, you have to extract the data I have asked a question from Jomat observed that there are many couples who They order their food online How much amount do you spend on one meal So what size is it when ordering? Suppose the customers are from 00 to If you order food worth up to ₹5000000 It is possible that some people order online from Sometimes people take it offline also Which like mode has the highest ratings People give food or order it online is there in it or people go to eat If you eat in a restaurant, then this is there in it You also have to do an analysis of the six The question is which type The restaurant that has the most people Order food offline so that Jomat can make money like this Give some more good offers/coupons to the customers So that these people can get food wherever they want offline order online from where you order also start and the revenue of zomat is If he is able to grow then I have answered all these questions and the data set is given in front of you I will also tell you the data set here Let me show you this data set that we have is available But our Jupyter notebook is here We opened our Jupyter Notebook and Here I have created a new folder which is this I have created the folder, now The first thing we will do in the folder is how to upload data set then how to get the data Let’s see how to upload the set If yes, then I went here and clicked the upload 
After clicking Upload, I selected the Zomato data file, opened it and uploaded it. The data is in CSV format and it is now uploaded. Next, I create a new file in this folder, an .ipynb notebook file, using Python 3. It opens as Untitled, so I rename it to Zomato Project.

Now that the data is uploaded and the notebook is open, the first thing we do is give the notebook a heading for what we are building today: Zomato Data Analysis Project. I am writing it in proper heading format. If you need to install Jupyter Notebook, the link to the installation video is in the description box below.

Step number one: to build this project we need some libraries, so first we import them. The first library is Pandas. Whenever we need to clean or manipulate the data in this dataset, we use Pandas. NumPy is a Python library for numerical and mathematical operations. Matplotlib and Seaborn are the libraries behind the visualizations; all the graphs I showed you earlier are made with their help. So now you know what each library does.
Now that we have explained what each library does, we import them so that they are all available in our Jupyter Notebook. First, import pandas as pd. I write pd because pandas is a long word and we cannot type it everywhere, so we use an abbreviation. We import every library with its abbreviation in the same way. Next, import numpy as np, so that we do not have to write numpy again and again. After that, import matplotlib.pyplot as plt, and then import seaborn as sns. All of these libraries are now imported successfully.

That brings us to step number two. We have the Zomato data as a CSV file, and we need to bring it into this notebook, which means creating a data frame. So I write step number two: create the data frame. To bring the Zomato data into this Jupyter Notebook file, I first create a variable named dataframe. Then I use pd, which means the Pandas library, and write pd.read, because we want to read the data we have.
Since I want to read this data, I write read_csv. Why CSV? Because the file is in CSV format. The file is named Zomato data.csv. When I first ran the cell it showed an error, because the file name has a space between Zomato and data before the .csv, so the name has to be written exactly as the file is named. After correcting it and running again, the error is gone. Now if I print the data frame, or simply call dataframe, the data appears properly.

So far we have done two things: imported the libraries we need, and read the data into our Jupyter Notebook. Now that we have the data, we can start working with it. First, look at the data carefully. The name column is the name of each restaurant, and there are 147 restaurants included in this data. The online order column is yes or no: yes if the restaurant takes online orders, no if ordering is offline. The book table column records whether a table can be booked. Then we have the rating given after eating, the number of votes cast, the approximate cost of a meal for two people, and the listed type of the restaurant: buffet, cafe, dining and so on.
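The two steps described so far, importing the libraries and reading the CSV into a data frame, can be sketched roughly as follows. This is a minimal sketch, not the notebook from the video: the sample rows, the temporary file, and the exact column names are assumptions made so that the example runs on its own.

```python
import os
import tempfile

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical sample rows standing in for the uploaded Zomato file.
# The column names are assumptions based on how the video describes them.
sample_csv = """name,online_order,book_table,rate,votes,approx_cost(for two people),listed_in(type)
Restaurant A,Yes,Yes,4.1/5,775,800,Buffet
Restaurant B,Yes,No,4.1/5,787,800,Buffet
Restaurant C,No,No,3.8/5,918,200,Cafes
"""

# Write the sample to a temporary folder so the example runs anywhere.
csv_path = os.path.join(tempfile.mkdtemp(), "Zomato data.csv")
with open(csv_path, "w", encoding="utf-8") as handle:
    handle.write(sample_csv)

# The name passed to read_csv must match the file exactly, including the
# space; otherwise pandas raises FileNotFoundError.
dataframe = pd.read_csv(csv_path)
print(dataframe.head())
```

In the notebook itself the file sits next to the .ipynb file, so passing just the file name to pd.read_csv is enough.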
So this data is available to us. Up to here we have learned two things: how to import the libraries and how to create the data frame. If everything is clear so far, let me know in the comments, and if you have any doubt, ask in the comments.

What is our next task? We are going to use this data to answer the questions, and the first one is which type of restaurant most people order from. But before we solve it, as a data professional it is your job to make sure the data is sound. Is there any missing value? Is the data clean? Is it accurate? Are there any outliers? We check all of this so that we can be sure we are working with good data, and only then do we solve the questions. This step is called data cleaning.

The first thing to look at is the ratings. Each value is written like 4.1/5, and that /5 appears on every rating. This will create a hurdle for us later. We want to remove the slash and the five: if someone gave 4.1, the value should be just 4.1, with the denominator removed. That is the step we perform now.
The step we are going to perform is to convert the data type of a column, namely the rate column. We want only the rating the customer gave: if 4.5 was given, we want 4.5, and where it says 4.1/5 we want 4.1, with the /5 removed. This is the first and most important piece of cleaning.

I am writing the code now, so please pay attention. I start with def, which means we are creating a user-defined function. What is a user-defined function? Consider print, which I use again and again: print is already built into Python, so we just pick it up and use it without writing any function ourselves. With a user-defined function, the advantage is that I write the code once inside the function and can then reuse it whenever I need it.

We give this user-defined function a name, let's say handleRate, where rate means rating, and we pass it an argument called value. Please be patient: I will write the whole code first and then explain every line from the basics, so do not be worried. Inside the function, the first line converts value to a string and splits it on the slash.
Next we write value = value[0], and then return float(value). Let's run it and see: no errors so far. Now I write one more line. Our data is in dataframe, and inside it there is a rate column, so we set dataframe['rate'] equal to dataframe['rate'].apply(handleRate). Be patient for two minutes; I want to finish writing the code and then I will explain what every line means.

Running the code, you can see directly what we wanted: a plain floating-point number. Where it was 4.1/5 before, it is now just 4.1. The /5 that appeared on every value has been removed by this code.

Now let me explain each line. We created a user-defined function, named it handleRate and passed it a value. We convert the value to a string because we know this column holds string data. Then value.split: the split function cuts a string at a given character. The value was written as 4.1/5, and since we do not need the part after the slash, we split on the slash, which separates 4.1 from 5. We only want 4.1, so we write value = value[0], because 4.1 sits at position zero, and we return that value as a float.
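The cleaning function walked through above can be sketched as follows. It is a minimal sketch under assumptions: the three ratings are made-up sample values, and the spelling handleRate for the function name is inferred from the spoken "handle rate".

```python
import pandas as pd

# Hypothetical ratings in the "4.1/5" style described in the lecture.
dataframe = pd.DataFrame({"rate": ["4.1/5", "3.8/5", "4.5/5"]})


def handleRate(value):
    """Keep the part before the slash and return it as a float."""
    value = str(value).split("/")  # "4.1/5" becomes ["4.1", "5"]
    value = value[0]               # position zero holds "4.1"
    return float(value)


# apply calls handleRate once for every value in the rate column.
dataframe["rate"] = dataframe["rate"].apply(handleRate)
print(dataframe["rate"].tolist())
```

Writing the conversion as a function and passing it to apply avoids repeating the same string handling for every row.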
We needed to apply this to the whole rate column. Inside our data frame the column is named rate, so I wrote rate there and applied the handleRate function to it. That way we do not have to repeat the same work for every value; we wrote the function once, and after applying it we get the plain values we wanted.

What else do we need? We need to check whether the dataset has any missing values, whether any value is null, and whether every value is proper. This is also part of the data cleaning step. To see a full summary, we write dataframe.info(). As the function name suggests, it gives us information about the data frame.

Here we can see that there are seven columns in total. The name column is of type object, which means it holds strings. The online order column is yes or no, so it is also string data. The book table column is yes or no, so it is string data too. The rating, with values like 4.1 and 3.5, is now floating-point data. The votes column, the number of people who voted, is an integer. Then there is the approximate cost of food for two people.
That is also an integer, and the listed type, such as dining or cafe, is of object type. Notice that every column says non-null. This means all the values in the dataset are available and none is missing. So we can now be sure that the data we have is correct.

Now the first question: which type of restaurant do the majority of customers order from? I write a heading for it, Type of restaurant, and we solve it with a graph. First I call dataframe.head(), which shows all the columns. The data has 148 values in total, but head shows only the first five rows.

To create the graph I am using the Seaborn library, which we imported as sns, so I write sns followed by a dot. We want a count plot, which counts the restaurants of each type, so I write sns.countplot with x equal to the listed type column of the data frame, whose values are buffet, cafe, dining and so on. Below it I write plt.xlabel with the text Type of restaurant. With only two lines of code, the graph appears. Let me explain what this graph means.
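The missing-value check and the count plot described above can be sketched as follows. This is a rough sketch under assumptions: the seven sample rows are invented, the column name listed_in(type) is assumed, and the Agg backend line is only there so the code runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for script use; not needed in Jupyter
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical sample; the real dataset has many more rows.
dataframe = pd.DataFrame({
    "listed_in(type)": ["Dining", "Dining", "Dining", "Cafes", "Cafes", "Buffet", "other"],
})

# dataframe.info() prints the non-null summary; isnull().sum() gives the
# same answer as one number per column.
missing_per_column = dataframe.isnull().sum()
print(missing_per_column)

# The count plot draws one bar per restaurant type, as tall as its row count.
ax = sns.countplot(x=dataframe["listed_in(type)"])
plt.xlabel("Type of restaurant")

# The same tally as numbers, useful for checking what the bars show.
type_counts = dataframe["listed_in(type)"].value_counts()
print(type_counts)
```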
First let's understand the graph, then its code. On the x axis you see the restaurant types: buffet, cafe, other and dining. We wanted to find out which type of restaurant gets the most orders, and here we can easily see that dining restaurants lead by a wide margin, with a count above 100. So we can conclude that the majority of customers order from dining restaurants. Cafes come second, then other, and then buffet. That is the analysis we had to extract.

Now the code. A count plot is a plot that shows values by counting them: here it counts and tells me there are more than 100 in the dining category. Whenever we need a plot that shows the exact count of each value, we use a count plot. On the x axis I needed the type of restaurant, so I passed the listed type column, the column that holds values such as buffet.

Below the x axis we needed a label, and we name it Type of restaurant, so I assign that text with xlabel. When we used to draw graphs by hand we would write the name of the x axis underneath, and this is exactly the same thing. That is how we get this graph for the first question.
The conclusion is that the majority of restaurants fall in the dining category. With that, question number one is solved.

The second question is how many votes each type of restaurant has received. When people eat at a restaurant they vote, so which type has collected the most votes? Let's write the code and analyze it. The next insight to extract is how many votes each type of restaurant has received, so I write dataframe.head() again.

What kind of visualization do we need? We want the type of restaurant on the x axis and the votes on the y axis, and for this we will make a line graph that shows how many votes each type received. Two columns matter here: the listed type column, which holds the restaurant type, and the votes column, which holds the votes each restaurant received, with values such as 775 and 787.

I am writing the code for this graph and will explain it line by line, so do not hesitate, and let me know in the comments if it is clear. To start, I created a variable called grouped_data.
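The grouping and line graph for the second question can be sketched as follows. It is a minimal sketch under assumptions: the sample votes are invented apart from 775 and 787, which the lecture reads off the screen, and the column names listed_in(type) and votes are assumed.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for script use; not needed in Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample rows.
dataframe = pd.DataFrame({
    "listed_in(type)": ["Buffet", "Buffet", "Cafes", "Dining", "Dining", "other"],
    "votes": [775, 787, 918, 1500, 2100, 150],
})

# Add up the votes of all restaurants that share the same type.
grouped_data = dataframe.groupby("listed_in(type)")["votes"].sum()
result = pd.DataFrame({"votes": grouped_data})

# Line graph: restaurant type on the x axis, total votes on the y axis.
plt.plot(result.index, result["votes"], c="green", marker="o")
plt.xlabel("Type of restaurant", color="red", size=20)
plt.ylabel("Votes", color="red", size=20)
print(result)
```

The colors and label sizes are cosmetic; changing "green" or size=20 only changes how the graph looks, not the totals.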
First I will explain the graph, then the code. How many types of restaurants do we have? Four: buffet, cafe, dining and other. On the y axis you see the number of votes each type has received. The buffet type has received more than 2,500 votes. Dining has the most, with more than 20,000 votes, and the other type has received around 10,000 votes. We wanted the votes on one axis and the type of restaurant on the other so that we can tell which type got the most votes. So dining restaurants are the ones most liked by customers and the ones that have been given the most votes.

Now let's understand each line of code. First I created a variable named grouped_data. In this data frame, two columns matter to us: the listed type column and the votes column. I grouped the data by the listed type and took the votes for each group, writing the names of both columns. After that, I set result equal to pd.DataFrame.
Inside that data frame, the votes column is filled with the grouped_data we created. Then comes the plot. This is a line graph, and c="green" sets the color of the line. We wanted green, so I wrote green. If you want red, you can make it red; the color is entirely up to you. For now we keep it green.

Next is the marker. What does marker mean? At every data point on the line you can see a green dot. Those dots are called markers. I wanted round dots, so I wrote "o".

On the x axis I wanted the label Type of restaurant, and I set its color to red. If you want another color, say blue, you can write blue, and the label below the axis turns blue. You can also set the size of the label. If I make the size 30, the label becomes bigger; here we keep it at 20. In the same way, the y axis needed a heading, so I labelled it Votes, with the color red and size 20. That is how we get this graph, which answers the second question.
What is the conclusion? The conclusion is that dining restaurants have received the most votes, so let's write that down: dining restaurants have received the maximum votes. Now the company knows that dining restaurants are its best source of income, because people like them the most, and that cafes, buffets and other restaurants need more work. Only when insights are pulled out of the data in this way can a company form a strategy and see clearly what is going well and what is going badly.

The third insight we need to extract comes from this question: what ratings have the majority of restaurants received? A restaurant might get 2.5, or it might get 5; which rating do most restaurants get? Ratings are very important, because every company wants to know how much customers like its product.

We already have a column named rate, with values like 2.5 and 2.75. We will plot a graph of all the ratings and see where most of them fall; this is the ratings distribution. For this I want to make a histogram, so I write plt.hist, where hist stands for histogram.
A histogram is a chart in which all the bars are stuck together. We have been reading about these since school, and now we are learning to code them for data analysis. Inside plt.hist I pass the rate column of the data frame, then a comma, then bins=5. I will explain the code I am writing, so do not worry. We give the graph the title Ratings distribution with plt.title, followed by plt.show(). When I ran it, there was an error saying the name was not defined: I had left out an r when spelling dataframe, so it was a spelling mistake. After fixing it, the histogram appears, with the count on the y axis.

Looking at this graph, we can observe that the most common ratings lie between 3.5 and 4. That is where the maximum number of ratings falls.

Now the code. We wanted a histogram, so I wrote hist and passed the rate column of the data frame, with bins equal to 5. What are bins? If I change it to three, the bars become very wide and everything is lumped together. If I make it 10, the bars become narrower and you can see the difference more clearly: most ratings are between 3.5 and 4. If I set it to two, it looks as though most ratings are between 3.5 and 4.5. You can set the bins according to your requirement.
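The histogram step and the effect of the bins argument can be sketched as follows. This is a rough sketch with ten invented ratings, not the real data; the Agg backend line is an assumption that lets it run as a plain script.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for script use; not needed in Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical cleaned ratings (already converted to floats).
dataframe = pd.DataFrame({
    "rate": [2.8, 3.1, 3.6, 3.7, 3.8, 3.8, 3.9, 4.1, 4.2, 4.6],
})

# bins=5 splits the range from the lowest to the highest rating into five
# equal-width intervals and counts the ratings that fall into each one.
counts, bin_edges, _ = plt.hist(dataframe["rate"], bins=5)
plt.title("Ratings distribution")
# In a notebook, finish the cell with plt.show().

print(counts)     # number of ratings per bin
print(bin_edges)  # six edges for five bins
```

Fewer bins give wider, coarser bars; more bins give narrower bars and a more detailed picture of where the ratings cluster.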
You can try different values and check the result. Here is the conclusion of our third problem, which I write down: the majority of restaurants received ratings from 3.5 to 4. That is a good rating. There would be cause for concern if most were getting one or two, and then you would have to work harder, but right now the ratings are between 3.5 and 4.

What do we want to find next? Zomato has observed that there are many couples, especially in metro cities such as Bangalore, Hyderabad and Pune, where both people are, let's say, working professionals, and most of them order their food online through Zomato. Zomato wants to know how much they spend. Suppose two people are ordering dinner: what is the average size of that order, for example around Rs. 500 or some other amount?
Is it Rs. 600? What is the average order size? That is what Zomato wants to know, so let's write the code and extract it from the data. The next thing we solve is how much couples spend on an average order, so I write the heading Average order spending by couples.

First I call dataframe.head() to show the data once more. Here you can see the approximate cost for two people column, with values such as 800, 800, 200, 300 and 600. We will visualize this with a graph, because inside a company you submit your insights either through a report or through a dashboard, and you explain them through graphs. So along with the data analysis we also perform the visualization.

How much do people spend? The amount goes on one axis and the count on the other, so we can see which amount people spend most often. I created a variable called couple_data and set it to the approximate cost for two people column of the data frame. We want a count plot, so I write sns.countplot and pass couple_data on the x axis. Now I run it, and the graph appears.
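The count plot of order cost for two can be sketched as follows. It is a minimal sketch under assumptions: the five costs are the values the lecture reads from the first rows, the column name approx_cost(for two people) is assumed, and the most common value in this tiny sample says nothing about the real dataset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for script use; not needed in Jupyter
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# The five values read out from dataframe.head() in the lecture.
dataframe = pd.DataFrame({
    "approx_cost(for two people)": [800, 800, 200, 300, 600],
})

# Pick out the cost column and draw one bar per distinct amount.
couple_data = dataframe["approx_cost(for two people)"]
ax = sns.countplot(x=couple_data)

# The tallest bar corresponds to the most frequent amount.
cost_counts = couple_data.value_counts()
print(cost_counts)
```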
Let's see which bar is the tallest. We wanted to find the amount that couples most often spend on one order, and here the highest count is at 00. After that, the next highest count is at ₹ 50, and then a good number of orders fall between 400 and 500. The lowest count is at Rs. 950: very few people spend that much on food for two. So the most common order by couples is 00. That is the conclusion we had to draw.

When the company knows this, the suggestions it shows on its pages and menus can focus on orders between 00 and 500, matching what people are actually able to spend. Think of someone buying a phone: if their budget is Rs. 10,000 and you show them a phone they cannot afford, they will not buy it. In the same way, a company should consider the customer's budget and show items accordingly. Once the company understands that two people ordering together mostly spend around 00, it can start showing them products in that range.

So the conclusion is about what the majority of couples order. Keep writing conclusions like this while you are learning to do a project, otherwise you will forget them. In future, always write such conclusions.
The conclusion: the majority of couples prefer restaurants with an approximate cost of 00. You have extracted this insight yourself, and insights like this are very useful to a company; this is how data professionals help with big decisions.

We have now solved four questions. Next we have to find out which mode, online or offline, receives the higher ratings. Do people give better ratings when they order online or offline? First I write the heading: Which mode receives the maximum rating. Then I call dataframe.head() to show the data frame once again.

We have a rate column, and this time we will make a box plot. I am deliberately making a different plot for each question so that you learn several of them for your own projects. By now you have seen how to make a count plot, a line graph and a histogram; the box plot is another type, and I will explain it as well. A box plot consists of boxes drawn for each group.

First I set the figure size with plt.figure. Then I write sns.boxplot. On the x axis I want the online order column, where yes means online and no means offline. On the y axis I write the rating, and the DataFrame is passed in as the data.
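The box plot comparing ratings for online and offline orders can be sketched as follows. This is a rough sketch under assumptions: the eight rows are invented, the column names online_order and rate are assumed, and the figure size of (6, 6) is a placeholder because the lecture does not state the value used.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for script use; not needed in Jupyter
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical sample: "Yes" means ordered online, "No" means offline.
dataframe = pd.DataFrame({
    "online_order": ["Yes", "Yes", "Yes", "Yes", "No", "No", "No", "No"],
    "rate": [4.1, 3.9, 4.2, 3.6, 3.7, 3.3, 3.5, 3.4],
})

plt.figure(figsize=(6, 6))  # assumed size
# One box per ordering mode; each box spans the middle half of the ratings
# and the line inside it marks the median.
ax = sns.boxplot(x="online_order", y="rate", data=dataframe)

# The medians behind the two boxes, as numbers.
medians = dataframe.groupby("online_order")["rate"].median()
print(medians)
```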
The data frame is passed in, so if I run this we get the graph. On the x-axis we have online orders. For Yes, the people who order online, the ratings are higher, between about 3.5 and 4.25. For No, the people who order offline, the ratings are between about 3.25 and 3.75. This means that people give better ratings when they order online than when they order offline, and the graph makes this easy to see.
Let me write the conclusion: offline orders receive lower ratings in comparison to online orders, which get good ratings. So when people order online the ratings are good, but the offline experience needs a little improvement. That is the conclusion.
The last question is also a very good one, and for it we will make a heat map. You may have seen heat maps on news channels and elsewhere; now we will learn how to make one. So far we have found that there are four types of restaurants in our data: others, dining, cafes and buffet. Every restaurant receives orders either online or not, so the online order value is Yes or No. When we want food from a hotel, do we order it on Zomato so that it comes to our house, or do we go to the hotel ourselves?
We can go and eat there as well, so Yes and No here stand for online and offline. I want a graph that shows, for each type of restaurant, how much food is ordered each way. We will use a colour scale: the darker the colour, the more people are ordering, and the lighter the colour, the fewer. That is the heat map. I will explain the code line by line, so do not worry.
Before creating anything, let me show you the data frame once so that you get an idea, so I write dataframe.head(). Now I create a pivot table, because we want the counts laid out as a table. I name the variable pivot_table and call pivot_table on the data frame. Inside it go the columns we need: the listed-in type column, which holds values such as Buffet, and the online order column. I set the aggregation function to size and the fill value to zero.
Then, to draw the heat map, I pass this pivot table to the heatmap function with annot set to True. cmap means the colour map, and I have set it to yellow-green-blue. The format, fmt, is set to d.
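The pivot table and heat map steps above can be sketched like this. The column names `listed_in(type)` and `online_order`, the extra `name` column and all rows are assumptions for illustration, and the axis labels are placeholders. Plotting is kept in a function so the table can be built and inspected on its own.

```python
import pandas as pd

# Made-up rows; all three column names are assumptions, not confirmed by the transcript.
dataframe = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E", "F", "G"],
    "listed_in(type)": ["Dining", "Dining", "Dining", "Cafes", "Cafes", "Buffet", "other"],
    "online_order": ["No", "No", "Yes", "Yes", "Yes", "No", "Yes"],
})

# One row per restaurant type, one column per online_order value;
# each cell is the number of matching rows, with 0 where there are none.
pivot_table = dataframe.pivot_table(
    index="listed_in(type)",
    columns="online_order",
    aggfunc="size",
    fill_value=0,
)
print(pivot_table)


def draw_heatmap(table):
    """Draw the counts as a heat map; darker cells mean more orders."""
    # Imported here so the pivot table above works without these libraries.
    import matplotlib.pyplot as plt
    import seaborn as sns

    # annot=True prints each count in its cell; fmt="d" formats it as an integer.
    sns.heatmap(table, annot=True, cmap="YlGnBu", fmt="d")
    plt.title("Heatmap")
    plt.xlabel("Online Order")
    plt.ylabel("Listed In (Type)")
    plt.show()
```

Printing the table before plotting it is a useful habit: every number annotated on the heat map should match a cell of `pivot_table`.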
The title is Heatmap, the x-axis label is the online order, and the y-axis label is the listed type. If I run this now, our heat map is ready. Let us go through it one line of code at a time.
The first thing we needed was the listed-in type. If you check it here, all four types of restaurant have come through: Dining, Cafes, Buffet and others. The second thing we needed was whether the food was ordered online or not: Yes means online and No means offline. Then there is the colour map I set, yellow-green-blue, where a darker colour means a higher count.
Look at Dining. There are 77 under No, which means those orders were not online; the people ate offline. That is to be expected, because we often go to a restaurant to dine in, to enjoy the ambience and get the feel of the place, for a dinner out and so on.
So what we mostly see is that at dining-type restaurants people eat offline, while 33 order from there online. Cafes are different: fewer people go out to eat there and more order online, and what people order from a cafe is mostly snacks. Again, the darker the colour, the higher the count.
Now look at others: only 2 there are offline, so for the other type of restaurant, not the cafes, most orders are online. The same thing is happening with buffet, where about three are offline and four are online. So I get the complete analysis from this one graph and can pull out all of these insights.
On the code side, I had to give the graph a title, so I wrote plt.title with Heatmap. I had to give labels for the x-axis and the y-axis: the listed types go on the y-axis and the online order on the x-axis. And look at the colours: as the count goes up the colour moves towards dark blue, and where the count is low you see yellow and green shades.
Let us write down the conclusion from this graph, in heading format. The conclusion is that dining restaurants mostly receive offline orders, whereas cafes receive more online orders. This suggests that clients prefer to place orders in person at restaurants. This means that more people like to go in person.
People go in person to enjoy the ambiance of the restaurant and to pick up the vibe, while cafes mostly get online orders, mainly for items like snacks. So that was the complete Zomato project. Questions like these are sometimes even asked when Zomato is hiring, so solving the project we did today will boost your confidence, your portfolio and your understanding of your own skills.
The code and the data set for today's project are available in the description box below. If you want to do more projects like this, whether in Python, machine learning, SQL or Power BI, and if you want complete video lectures on these topics, they are offered for free in the courses on The iScale. You can check The iScale website and app, where you will find free data science and data analyst courses with more projects, more lectures and more concepts, along with notes and code files. The notes and code files for today's project are provided there too. Complete the project, and do not forget to tell us in the comments how you felt and how confident you are after completing it.
I will be waiting to hear whether your confidence has gone up. We will meet again in another video; until then keep learning, stay curious about data and stay motivated. Premium courses in data science and analytics are also available on The iScale's platform, on the website and the mobile application. There you get a detailed curriculum, complete lectures, end-to-end projects, live doubt classes, an industry-recognised certificate, and training as well as help with interview preparation. The link is in the description.
Right, let's start today's class, in which we will do a complete data analysis project, covering how to clean data, how to do EDA and how to do visualization. But there is one more special thing about this project that I want to tell you. Whenever we build a project, the ultimate objective behind it is that one day we go to a company for an interview and crack it, so that is the mindset to bring to a project.
If you have been to company interviews you will know what the rounds are. For most data analytics, data science and business analyst positions there are generally three to four rounds of interviews, although it depends on the company. Mostly, in round number one the company gives you an assignment: it sends you some questions or problems to solve, and you have to submit your solutions within 24 to 48 hours. If you are shortlisted in this round, you get a call for round number two, which is technical. If you are shortlisted in round two as well, the third round is the HR round or the managerial-level round. This project is aimed at round number one.
Round number one is where the company gives you a case study. How do we solve the assignment it sends, and how do we submit it? We should make projects like this one so that we prepare ourselves for this round and become capable enough to clear it. So in today's project you will also learn how a company provides a case study.
When an assignment comes to you as part of an interview, the first thing the company sends is a document or PDF about itself. You first read the details of the company, and after that you are given the questions that you have to solve and submit. I am going to run this project in the same way, just as if you had received an email from the company.
So first comes the introduction to the company. Uber Technologies, commonly known as Uber, is an American multinational transportation company. It also works in sectors such as courier services, food delivery and freight transport. Its headquarters are in San Francisco, California. It operates in more than 70 countries and 10,500 cities, so its rides and services are available worldwide. It is a big platform, with 15 crore monthly active users and 60 lakh active drivers, and more than 2.8 million trips take place on it every day.
Now, the company's revenue: about $37.2 billion for 2023, which is a very large amount, because Uber is already a very big multinational company. That is the information about the company; next you are told how it came about.
The man you can see on the screen is Garrett Camp, who is the founder of Uber, and the idea of building Uber first came to him. It was 2008, on a snowy evening in Paris. A lot of snow was falling and not much public transport was available. In that frustration he started thinking that there should be a technology, an app, through which he could book a ride and have the driver come to the exact place where he was standing and pick him up. He had a problem, he started looking for a solution, and that is where the idea of Uber came to him.
In the beginning, though, Uber was not what it is today. Let me explain with an example. Suppose your office is near India Gate and you live in Chandni Chowk, and you need to book a car to go to India Gate. What used to happen earlier was ride sharing: all the people who live in Chandni Chowk and want to go to India Gate would share the car, and whatever the fare came to would be divided among you.
Uber first started in this way, but gradually it became what we see today: wherever I am standing, I take out my phone, book a ride in the app, and the driver comes directly to my location. Uber was not like this in the beginning. Initially it was a ride-sharing platform built around sharing the cost, and that idea changed slowly.
Camp then met Travis Kalanick, who is Uber's cofounder. Kalanick was very good with technology and had already founded two companies of his own by that time. He liked the Uber idea and joined, and in this way Uber properly started in March 2009.
When Uber launched, its beta version was first launched in San Francisco. Uber had started in 2009, but the first ride booked on it took place in June 2010. Very soon Uber started getting a very positive response from people, so between 2012 and 2015 it started expanding internationally. The first expansion was in Paris, and from there Uber started reaching the global market. Uber's ride options come in four types, starting with UberX.
If you want an affordable everyday ride, you can book UberX. If you are happy to travel with other passengers who are all going in the same direction, you can book UberPool. If you need a more premium service, with high-end cars, you can book Uber Black. And if you have a very large group and want a big, spacious vehicle, you can book UberXL. Today there are also Uber Eats, Uber Freight and Uber for Business.
Moving forward, how does Uber earn money? First, it earns from the commission on every ride that is booked. Second, there are subscriptions: suppose I am an employee who books an Uber to the office every day, then I can buy a subscription. Third, it earns from advertising.
The main thing that matters for us is that Uber relies heavily on data science and analytics. Why? The first point is ETA estimation, that is, the estimated time of arrival. Suppose I am standing at one location, say Chandni Chowk, and I have to go to Rajiv Nagar. What is the distance between the two, which driver is available, and how long will that driver take to come? Machine learning algorithms are responsible for estimating all of these things.
Next, as soon as I ask for a ride from one place to another, the app immediately tells me the price, for example that the trip will cost ₹ 00. That is price prediction, and data science and data analysis play a big role behind it. Then there is route optimization: estimating how much time it will take to move from one place to another and telling the driver which route to take. Algorithms are also used to make a good match between the driver and the rider. And whenever a payment is made, there should be no fraud in it, which is another domain where data science is mainly used.
Suppose you had only built the Uber app, with no data scientists or data analysts behind it. The app could not tell me how much it will cost to get from one place to another, or how long the driver will take to arrive, so it would be of little use. All of these things become possible with the help of data scientists and analysts: how long the driver will take to reach your location, how much the ride will cost, and which route is the best one to take. That is why Uber also hires for these positions. Data science and data analysis work at Uber began around 2010 to 2011 and has been going on since then.
The person you see on the screen is Kevin Novak. He was the head of data science at Uber and used to lead it. As for his background, he did not have a master's in data science or anything of that kind; he came from an economics background. But he had very wide interests, through which he had developed a data science skill set, and he was very much into data analysis. He joined Uber as its seventh engineer, at a time when the company was very small and had only just started.
He heard about it from a friend, who told him that a startup had opened and that he could interview there if he wanted. When he went for the interview he arrived in a coat, trousers and tie, but the company was running out of a very small place, so he wondered whether he should join or not. Still, he knew very well that this company could grow and bring about a real change. He then used computational algorithms to build up data science at the company, led it as chief data scientist, and helped bring in new technologies.
What matters for us is that all of this is given in the company's documentation: it has told us everything about the company. Now assume that you are interviewing for a data analysis role and have been sent an assignment with six questions to solve. The first question asks in which category people book the most rides. The second asks for what purpose people book most rides, for example to go to the office, to a party or to a restaurant. The third question asks at what time people book the most Uber cabs.
It is very important for the company to know the booking time. Suppose most people book a cab between 7:00 and around 10:00 in the morning to go to the office. Unless the company knows this, how will it make sure that drivers are available at that time and show them in the app? The fourth question asks in which month the fewest rides are booked. The fifth asks on which day of the week people book the most rides. The sixth asks how many miles or kilometres people usually book rides for. All of this is given to us in the assignment.
To answer these questions we are provided with data by the company, so let me show you the data once. It has several columns: the start date of the ride, the end date, the category, the number of miles travelled, and the purpose. We have been given all this information, and our job is to extract insights from it. There are six questions, and we will solve them one by one in Jupyter Notebook.
First, let me show you how to get the notes, the data set and the code files for this project. On The iScale website, go to Explore Courses and select the Free category. There is a free data science course there, which contains not only this project's data set but also lectures on machine learning. If you already have an account you can log in directly; otherwise click Register Now and fill in your details, starting with your email address.
Enter whatever email address you use, then your contact number, and click Register. A code will have been sent to the mobile number you entered; once you enter that OTP, verification succeeds and you are registered. Now you can log in directly: enter your registered email ID and the password you created, and click Login. When you are logged in you will see an interface like this.
If you go to your enrolled courses now, the free courses section is blank, because we have not enrolled in any free course yet. So go to Explore Courses again, select the Free category and open the free data science course. There you will see the Enroll Now button. This is a free course, so there are no charges; simply click Enroll Now and you are enrolled. If you go back to your free courses, the course now appears there. Open it, and the first item you see is the Data Analytics Project, where you will find all the notes, code files and everything else.
Click the plus symbol next to it and you will find the project notes and the Uber project's data set. Clicking on each one opens it, and to download, click the symbol with the two arrows. Once both files have downloaded properly, you have everything you need. The free course also contains lectures on machine learning and Python, which are all available to you. That is how to download the notes.
So let's get started with today's class and our project. First we have to open Jupyter Notebook. Anaconda Navigator needs to be installed on your system; open it, go to Jupyter Notebook and click the Launch button. I already have Anaconda Navigator open here. Jupyter Notebook then opens in the browser, as shown here.
Now I create a new folder. We have been given the data set, so the first thing I do is upload it: click the Upload button, select the Uber data set and upload it. Next, from the New menu I choose Python 3 (ipykernel), which creates and opens our notebook file. The first thing to do is rename it, so let me give it a name.
I name it Uber Data Analysis Project. If you go back to the folder now, you will see the notebook under the new name. The next thing to do is open the data set you uploaded and take a look at it once, so that you know what is in it.
Back in the notebook, there is a lot of work to do on this data, including cleaning it, and for that we need libraries. So we start by importing them. The first library is Pandas, which is very useful for data cleaning: import pandas as pd. Next, in case we need to perform mathematical operations: import numpy as np. After that, for visualization, we use Matplotlib: import matplotlib.pyplot as plt. Alongside it there is one more library, Seaborn, which also helps with visualizations. With that, all the libraries are imported.
What do we have to do after this? The data set has to be read in. The file we uploaded is a CSV file, so I create a variable named dataset and set it equal to pd.read_csv, with the file name inside the brackets. But what is the name of our file?
I write the file name with .csv at the end, put it in double inverted commas, close the bracket and run the cell. But when I call the dataset variable, it shows an error. Let us check: I had written the name ending in data.csv, whereas our file's name ends in dataset.csv. We wrote only data, hence the error. So let us correct the name and run it again. Now it runs properly, and the data set file has been read.
Now let us look at the columns we have been given. First there is the start date of the ride, then the date on which it ended, and the time is given as well. The first ride here started on the 1st, in 2016, a little after 9:00, and it was a 6-minute ride, perhaps because the destination was nearby. Then there is the category, which here is Business. Then the start and the stop are given: this ride started at Fort Pierce, and another location that appears here is West Palm Beach. So we know from which location each ride started and at which location it ended. Next is how many miles the ride covered, 5.1 miles here, and finally the purpose of the ride.
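The read step and the file-name slip described above can be sketched as follows. The sketch writes its own two-row stand-in file so that it runs anywhere; the file names, the column names (`START_DATE`, `END_DATE`, `CATEGORY`, `START`, `STOP`, `MILES`, `PURPOSE`) and the second row are assumptions, not the course's actual file.

```python
import os
import tempfile

import pandas as pd

# Stand-in CSV with assumed column names and placeholder rows.
csv_text = (
    "START_DATE,END_DATE,CATEGORY,START,STOP,MILES,PURPOSE\n"
    "01-01-2016 21:11,01-01-2016 21:17,Business,Fort Pierce,Fort Pierce,5.1,Meal/Entertain\n"
    "01-02-2016 01:25,01-02-2016 01:37,Business,Fort Pierce,Fort Pierce,5.0,\n"
)
folder = tempfile.mkdtemp()
csv_path = os.path.join(folder, "rides_dataset.csv")  # hypothetical file name
with open(csv_path, "w", encoding="utf-8") as handle:
    handle.write(csv_text)

# Reading with the correct name returns a DataFrame.
dataset = pd.read_csv(csv_path)
print(dataset.head())

# Reading with a slightly wrong name fails, which is the error seen in the session.
wrong_path = os.path.join(folder, "rides_data.csv")
try:
    pd.read_csv(wrong_path)
    read_failed = False
except FileNotFoundError as error:
    read_failed = True
    print("Could not read the file:", error)
```

When a read fails like this, comparing the name in the code against the name shown in the Jupyter file list, character by character, is usually the quickest fix.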
The purpose of this ride was Meal/Entertain. So every column of the data set gives us some information about each ride. In total there are 1,156 rows and seven columns. If you want to confirm the shape yourself, write dataset.shape, and it will show you the number of rows and the seven columns.
After this I want information about the data set: what kind of values it holds. For that I write dataset.info() and run it. Now we have all the information. The start date and end date are of object type, which means the data is stored as strings. The category is also of object type, and so are start and stop. Miles, the number of miles travelled in the ride, is floating-point data, and the purpose column is of object type.
If you look at the counts here, you will notice that there is one column in which a lot of values are missing: the purpose column. So we will have to deal with these missing values, and we will see how. The data we have is not yet good data, because of those missing values. That means we first have to pre-process the data and bring it into the correct format.
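A small sketch of the inspection steps just described, on a made-up three-row frame. The column names and values are assumptions for illustration; the `isna().sum()` line is an addition that counts the missing values directly instead of reading them off the `info()` output.

```python
import pandas as pd

# Made-up rows with assumed column names; one PURPOSE value is missing.
dataset = pd.DataFrame({
    "START_DATE": ["01-01-2016 21:11", "01-02-2016 01:25", "01-02-2016 20:25"],
    "MILES": [5.1, 5.0, 4.8],
    "PURPOSE": ["Meal/Entertain", None, "Errand/Supplies"],
})

# (number of rows, number of columns)
print(dataset.shape)

# Column names, non-null counts and data types.
dataset.info()

# Number of missing values in each column.
missing_per_column = dataset.isna().sum()
print(missing_per_column)
```

A column whose non-null count in `info()` is lower than the row count is the one with missing values, which is how the purpose column stands out in the session.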
Let us first understand what has to be done. The first step is the missing data: we have to deal with it, and we will now learn how. The second thing is that the start date and end date are both in string, that is object, format, so we will have to change them into a proper date and time format. Pre-processing will solve both of these problems.
Before anything else, I add a heading saying that we are doing data pre-processing. To write a heading, type a hash, then a space, then Data Preprocessing. In the cell-type menu you will find three options, Code, Markdown and Raw; select Markdown and run the cell, and it appears as a heading. Whenever you submit a project to a company, the headings should be proper and the comments should be proper, explaining which step does what. The file I have put in the notes for you is commented properly throughout. Right now I am only teaching you the project, so I am not spending time writing many comments, but when you download the notes you will see the right way to present a submission.
So, on to data pre-processing. In our data set, the purpose column has a lot of missing data, which you can see written as NaN. That means the data is not available. What will we do with it?
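One way the two pre-processing problems named here could be handled in pandas, as a sketch on made-up rows. The column names, the `NOT` placeholder spelling and the date format string are assumptions. The sketch assigns the result of `fillna` back to the column rather than using `inplace=True`, because newer pandas versions warn about in-place changes on a selected column.

```python
import pandas as pd

# Made-up rows with assumed column names.
dataset = pd.DataFrame({
    "START_DATE": ["01-01-2016 21:11", "01-02-2016 01:25"],
    "END_DATE": ["01-01-2016 21:17", "01-02-2016 01:37"],
    "PURPOSE": ["Meal/Entertain", None],
})

# Problem 1: replace missing purposes with a placeholder label.
dataset["PURPOSE"] = dataset["PURPOSE"].fillna("NOT")

# Problem 2: convert the text dates to real datetime values.
# errors="coerce" turns any unparseable text into NaT instead of raising.
for column in ["START_DATE", "END_DATE"]:
    dataset[column] = pd.to_datetime(
        dataset[column], format="%m-%d-%Y %H:%M", errors="coerce"
    )

# With real datetimes, ride duration becomes simple arithmetic.
ride_minutes = (dataset["END_DATE"] - dataset["START_DATE"]).dt.total_seconds() / 60

print(dataset)
print(ride_minutes)
```

Once the columns hold datetime values, parts such as the hour, the weekday or the month can be read through the `.dt` accessor, which is what time-based questions about bookings need.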
Wherever NaN is written, the value is missing, and I want to write NOT there instead, meaning the purpose is not available. To make this replacement I take my dataset, select the purpose column, and call the fillna function on it, passing NOT as the value to write in place of NaN. To make the change permanent, I write inplace equal to True. When we run this, you will see something like a warning. After that, call dataset.head(), which shows the first five rows. Earlier the purpose column showed NaN for the missing data; now it shows NOT. So wherever the purpose was empty, we have put NOT.
Now the next step. The start date and end date need their format changed, because right now they are of object type, as we saw. We want them in date-time format. So I take the dataset, select the start date column (the two columns are named for the start date and the end date), and assign to it the converted values. The Pandas library gives us a function for converting to date-time format.
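A minimal sketch of the fill step, using made-up values. The transcript uses inplace=True on the selected column and mentions a warning; this sketch assigns the result back to the column instead, which is an assumption about how to avoid that warning rather than what the video types.

```python
import pandas as pd

# Made-up purposes; None marks a missing value.
dataset = pd.DataFrame({"PURPOSE": ["Meeting", None, None]})

# Replace every missing purpose with the placeholder "NOT".
# Assigning back avoids calling an inplace method on a column selection.
dataset["PURPOSE"] = dataset["PURPOSE"].fillna("NOT")

print(dataset.head())
```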
So I write pd.to_datetime. This function converts values to date and time format. Inside it I pass the column we need to change, which is the start date column of the dataset, written START_DATE. Then there is one more thing here: errors equal to coerce. I am going to explain every line of code, so there is no need to panic.
Writing errors equal to coerce means that if there is an invalid date value, one that is not really a date or time, Pandas puts NaT in its place. NaT means "not a time". Think about how clocks work: 6 o'clock in the morning, 12 o'clock in the day, 12 o'clock at night. Or, if we go by railway time as we do in India, after 12 comes 13, 14, 15, 16, and so on. It never happens that the time is 26. There is no such time. If a wrong time like that is inside the data, coerce replaces it with NaT, which makes it easy to remove later. That is why I have written it this way.
What we have done for the start date, we also want to do for the end date.
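The effect of errors="coerce" can be seen on a small made-up example that includes an impossible hour of 26. The explicit format string is an assumption added so the sketch parses the same way on every pandas version; the transcript does not pass one.

```python
import pandas as pd

# Made-up values; the last start time has an impossible hour (26).
dataset = pd.DataFrame({
    "START_DATE": ["1/1/2016 21:11", "1/2/2016 1:25", "1/3/2016 26:00"],
    "END_DATE": ["1/1/2016 21:17", "1/2/2016 1:37", "1/3/2016 2:10"],
})

# Convert both columns; values that cannot be parsed become NaT.
for column in ["START_DATE", "END_DATE"]:
    dataset[column] = pd.to_datetime(
        dataset[column],
        format="%m/%d/%Y %H:%M",  # assumed layout of the made-up strings
        errors="coerce",
    )

print(dataset)
print(dataset["START_DATE"].isna().tolist())
```

Without errors="coerce", the default behaviour is to raise an error on the first value that cannot be parsed.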
So I copy this line with Ctrl+A and Ctrl+C and paste it with Ctrl+V. Because I need to change both columns, wherever START_DATE appears in the pasted line I change it to END_DATE, in both places.
When we run this code, it shows an error saying the name is not defined. We made a spelling mistake in "dataset", and that is what caused the error. Once the spelling is corrected, the error goes away.
So now we have changed the format. Let us check whether it has really changed: I write dataset.info() and run it. Earlier both of these columns showed object; now they show the date-time type.
Now we will do one more thing and create two new columns. Look here: the date and the time are written together in one column. I want one column that tells us the date on which the ride started, and a separate column for the time. How do we do that?
To create these two new columns, we first write from datetime import datetime. The datetime module is already built in. Then, inside our dataset, we create one column named date and one column named time, so that the date is written separately and the time is written separately.
Let us make the date column first. I write pd.DatetimeIndex, and inside it I pass the column we already have in the dataset, START_DATE. It is in capitals in the data, so I write it in capitals here too. After closing the brackets properly, I add .date.
Let me run just this much and show you. There is an error: I had misspelled "datetime" as "datatime". With the spelling corrected, the error is cleared.
Now if I call dataset.head(), with the bracket closed, look: a new column named date has been created, and it holds the date on its own.
I want another column in the same manner, named time. For that I need to write almost the same code as above. To save time I will copy it, but when you do this yourselves, type as much of the code as you can. The more you practice, the better you remember things. The name of the next column is time, it also comes from START_DATE, and since I need the hour, I have written .hour.
If I run dataset.head() again, look: there is a separate column named date and a separate column named time. Earlier they were together; now they have been separated. Is this much clear?
Let us move forward and divide the time into categories. A ride at 6 o'clock could be in the morning or in the evening, so we want a category that says whether the ride was in the morning, the afternoon, the evening, or at night.
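The two new columns can be sketched like this, on made-up timestamps. The column names date and time follow the transcript; START_DATE is the assumed name of the source column.

```python
import pandas as pd

# Made-up start times, already converted to datetime.
dataset = pd.DataFrame({
    "START_DATE": pd.to_datetime(["2016-01-01 21:11", "2016-01-02 01:25"]),
})

# Calendar date of each ride start.
dataset["date"] = pd.DatetimeIndex(dataset["START_DATE"]).date
# Hour of day (0 to 23) of each ride start.
dataset["time"] = pd.DatetimeIndex(dataset["START_DATE"]).hour

print(dataset.head())
```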
We will divide the rides into four categories: morning, afternoon, evening, and night. For that I will create another new column, and whatever category the time falls into will go in that column. I will name it day-night.
Which category covers which hours? From midnight until 10 in the morning is what we call morning. From 10 a.m. to 3 p.m. is the afternoon. From 3 p.m. to 7 p.m. is the evening. And from 7 p.m. to 12 midnight is the night. We will make the categories according to this.
So inside the dataset I create the new day-night column, and to fill it I use the pd.cut function. Inside it we write x equal to the dataset's time column, the one we just created, because without the time we cannot say whether a ride was in the morning or at night. After a comma we write bins. Bins are the time frames: I write bins equal to a list in square brackets, starting from 0, which is 12 o'clock at night, then 10, so the morning runs until 10 a.m., and then the afternoon runs from 10 a.m. until 3 p.m.
3 p.m. is 15, so the afternoon runs until 15. After that comes 19, which is 7 in the evening, so the evening runs until 19. Then comes 24, which is 12 o'clock at night, so the night runs from 7 p.m. until 24.
Now, what are the labels for these bins? From 12 at night, when the next day starts, until 10 a.m. we give the label Morning. So I write labels equal to a list, and the first label is Morning. The next label is Afternoon, then Evening, and the last label is obviously Night.
When we run the code, it shows a syntax error, because we forgot to put the labels in inverted commas. Let us fix that, run the code again, and see whether the column we wanted has appeared. I write dataset.head() and run it. Yes, the column is there. Look at this row: the time is 21, which means 9 o'clock at night, and the category is Night, as it should be. So everything has been arranged properly.
There is one more thing to do in data pre-processing: the null values, meaning the rows in which values are missing.
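The binning described above can be sketched on a few made-up hours. One detail worth knowing, which the transcript does not mention: by default pd.cut treats each bin as open on the left and closed on the right, so with these edges hour 10 lands in Morning and hour 0 falls outside every bin and becomes missing.

```python
import pandas as pd

# Made-up hours of day, including the boundary values 0, 10 and 15.
hours = pd.Series([0, 9, 10, 13, 15, 18, 21, 23], name="time")

# Edges 0-10-15-19-24 with one label per interval, as in the transcript.
day_night = pd.cut(
    x=hours,
    bins=[0, 10, 15, 19, 24],
    labels=["Morning", "Afternoon", "Evening", "Night"],
)

print(pd.DataFrame({"time": hours, "day-night": day_night}))
```

Passing include_lowest=True to pd.cut would place hour 0 in the first bin; whether that is wanted here is a design choice the source does not discuss.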
We want to delete those rows. To drop them, the code is dataset.dropna(), which removes every row that contains a null value, and I write inplace equal to True so that the rows are removed permanently. When I run it, it shows invalid syntax. Let us check what the syntax error is: something had been typed there by mistake, and once it is removed the error is gone.
Now let us look at dataset.shape once more. The null values have been removed, and the shape is now 403 rows and 10 columns.
The next step is data visualization. The first question asked which category people book the most Uber rides in. We have a column called category, with values such as Business, so we will make a graph of it to see which category has the most bookings.
Since our visualization starts here, I should give a proper heading here as well: Data Visualization. Type a hash, select Markdown, and that is our heading. So, the first question: in which category do people book the most Uber rides? We will draw a graph in which the biggest category is easy to see.
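A minimal sketch of the drop step on made-up rows, checking the shape before and after.

```python
import pandas as pd

# Made-up rows; the second one has a missing hour.
dataset = pd.DataFrame({
    "time": [21.0, None, 13.0],
    "PURPOSE": ["Meeting", "NOT", "NOT"],
})

rows_before = dataset.shape[0]
dataset.dropna(inplace=True)   # remove every row that contains a null
rows_after = dataset.shape[0]

print(rows_before, rows_after)
print(dataset.shape)
```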
To visualize it properly, I write plt.figure. In this figure we will make two plots, because I am taking two questions together: in which category people book the most, and for what purpose they book the most rides. I want to show both in one diagram, so on one side we will draw a graph for the category and on the other side a graph for the purpose. Both will be bar graphs, and whichever bar is highest tells us the category and the purpose for which most people book Uber.
So I write plt.figure and mention the size of the figure: figsize equal to 20 by 5.
Because two graphs have to be made, I then write plt.subplot with 1, 2, 1: one row, two plots, and this is the first of them.
I will draw the first plot with the Seaborn library, so I write sns. The plot we want looks like a bar chart and also shows numbers, so I have chosen the count plot. Why the count plot? You will see once the graph is drawn: it counts for us, and what we need to find out is which category people book the most.
"Most" means a number. Say there are 1,500 people: perhaps 1,000 book rides for one thing and 500 book to go to the office. I need those numbers, and that is why I have written countplot, so that it counts them for us.
So I write sns.countplot, and inside it I pass the category column of our dataset, then close the bracket. If we run it, look: two categories appear, Business and Personal. From this count plot you can comfortably say that Uber is used most for the Business category and much less for the Personal category. Is it clear up to this point?
Let us move on. If you want to rotate the labels on the x-axis, you can write plt.xticks, in lowercase, and inside it rotation equal to 90. Now the category labels are drawn rotated by 90 degrees, and the count is already written on the other axis. So our first graph is ready.
That answers our first question, which category people book Uber for the most: the Business category. Now, what about the purpose?
To find that out, I will make another subplot. I write plt.subplot, and inside it 1, 2, 2. Why 1, 2, 2? For the first plot I had written 1, 2, 1, because the layout has two plots and that was the first one. Now I want the second plot, so I write 2 at the end.
Which graph do we draw here? Again a count plot, so that it counts automatically. I write sns.countplot, and inside it the dataset, but this time we take the purpose column, because the category is already done.
If I run this now, look: two graphs have appeared where before there was only one. In the first, we see that people use Uber most for the Business category and not for the Personal category. In the second graph, NOT is the label we wrote in place of the null values. Leaving that aside, most people use Uber for meetings. That makes sense: if the Business category is the biggest, then meetings are the obvious purpose. In second place comes meals or entertainment, then customer visits, then supplies, then going to a temporary site, and then office travel.
So the category people book Uber for the most is Business, and the purpose they use it for the most is going to meetings.
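The two side-by-side count plots can be sketched as below, on made-up rides. The Agg backend line is an assumption added so the sketch also runs outside a notebook; in Jupyter it can be dropped. Column names CATEGORY and PURPOSE are assumed.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up rides for illustration only.
dataset = pd.DataFrame({
    "CATEGORY": ["Business", "Business", "Business", "Business", "Personal"],
    "PURPOSE": ["Meeting", "Meeting", "Meal/Entertain", "Customer Visit", "NOT"],
})

plt.figure(figsize=(20, 5))

plt.subplot(1, 2, 1)                  # 1 row, 2 columns, first plot
sns.countplot(x=dataset["CATEGORY"])  # one bar per category, height = count
plt.xticks(rotation=90)

plt.subplot(1, 2, 2)                  # same grid, second plot
sns.countplot(x=dataset["PURPOSE"])
plt.xticks(rotation=90)

fig = plt.gcf()
```

Passing the column as x= keeps the call valid on recent Seaborn releases, which no longer accept the data column as a bare positional argument.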
Those were the two questions we had to answer, and we have answered both: the first answer was the Business category and the second was meetings.
The next question asks at what time of day people book Uber the most: morning, afternoon, evening, or night. How do we find this out? Here too we will draw a count plot with sns.countplot. Which column do we have for this? The day-night column that we created, so I write that column's name and plot its counts.
If I simply run it, we get a simple graph, and you can see in it which time has the most bookings. Uber is booked the most in the afternoon, then in the evening, then at night, and the fewest people book in the morning. Reading the counts off the chart, roughly 140 bookings fall in the afternoon, about 120 in the evening, around 70 at night, and around 50 in the morning. So the answer to our third question, the time at which Uber is booked the most, is the afternoon.
We have now answered three questions, so let us move to the next one. It asks us to analyze the months: in which month do the fewest people book Uber? To analyze by month, we will have to create another new column in our dataset.
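The numbers behind such a count plot can also be read directly with value_counts. This is a sketch on made-up labels, not the video's data, and the column name day-night is an assumption about how the transcript spells it.

```python
import pandas as pd

# Made-up time-of-day labels for seven rides.
dataset = pd.DataFrame({
    "day-night": ["Afternoon", "Evening", "Afternoon", "Night",
                  "Morning", "Afternoon", "Evening"],
})

counts = dataset["day-night"].value_counts()  # rides per time of day
busiest = counts.idxmax()                     # label with the highest count

print(counts)
print(busiest)
```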
The questions in front of us are: in which month is Uber booked the least, and on which day of the week is it booked the most? Let me check the dataset first: I write dataset.head() and run it.
Now, what do I have to do? I have to create more new columns. We have a date, so to know the month I need a new column that holds the month: January, February, and so on. And look at a date such as 1 January: to find out which day of the week it was, I need another new column that tells me whether that particular day was a Monday, Tuesday, or Wednesday. So we need two new columns.
Let us start with the month. Inside the dataset I create a new column named month. For this we write pd.DatetimeIndex, the same function we called before. Inside it goes the START_DATE column of the dataset, written in square brackets, and once the brackets are properly closed we add .month, because the month is what we need.
Now that we have the month, we have to label it, to say which month each number stands for.
For that I create another variable named month_label. Inside it we simply create a dictionary. If 1.0 is written anywhere, it means the first month, the month of January. For February we write 2.0. After each key comes a colon and then the value, so these are key-value pairs: 1.0 gives January, 2.0 gives February, and we keep writing in exactly this way until December.
I have already written this code, and typing it out would take a lot of time, so I am simply copying it. You should not copy; get into the habit of writing the code yourself, because practice is necessary. So let me copy it with Ctrl+A and Ctrl+C and paste it here with Ctrl+V.
Now let me explain once what is written. First there is pd.DatetimeIndex on the START_DATE column of the dataset, which gives us the month. After that we have created the dictionary of month labels.
What have we put in the dictionary? 1.0 means January, 2.0 means February, 3.0 means March, and so on up to December. Then, back on the dataset, we map this month label onto the month column. I used the map function so that each month number is matched with its name. Finally, we created a variable and stored in it the value counts of the dataset's month column, which counts the rides in every month.
Now I run it, and then write dataset.head() and run that. We can see that the month column is ready, and where the month is one it now shows January. So the column we wanted has been made.
Now we have to figure out, month by month, in which month the rides were the lowest. That was the question, and for it we need a graph.
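The month column and its labels can be sketched as follows, on made-up dates. The names MONTH, month_label and mon are assumptions; the abbreviated month names are illustrative. The transcript writes the keys as 1.0, 2.0 and so on; integer keys are used here because the made-up months contain no missing values and are therefore integers.

```python
import pandas as pd

# Made-up start dates: two in January, one in February, one in December.
dataset = pd.DataFrame({
    "START_DATE": pd.to_datetime(
        ["2016-01-01 21:11", "2016-01-15 09:00",
         "2016-02-03 14:30", "2016-12-24 18:45"]
    ),
})

# Month number (1 to 12) of each ride.
dataset["MONTH"] = pd.DatetimeIndex(dataset["START_DATE"]).month

# Number-to-name lookup, one entry per month.
month_label = {1: "Jan", 2: "Feb", 3: "Mar", 4: "Apr", 5: "May", 6: "Jun",
               7: "Jul", 8: "Aug", 9: "Sep", 10: "Oct", 11: "Nov", 12: "Dec"}
dataset["MONTH"] = dataset["MONTH"].map(month_label)

# Rides per month.
mon = dataset["MONTH"].value_counts(sort=False)
print(dataset.head())
print(mon)
```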
For the graph I will build a new table with pd.DataFrame, and inside it we put the month values. Do not be worried about what is suddenly happening; it is very easy code, but I will write it first and then explain, so that it is easier to follow.
We pass the data in dictionary format, so I use curly braces and paste the code inside them. Under months we put the values of the month counts, and under value count we apply group by on the month and take the miles, so that we can see which month had the fewest bookings. For each month we take the maximum miles. Whatever distance you and I book for, at least some miles are covered in every month: suppose 600 miles in January and perhaps 700 miles in February. That is the figure being taken out here.
After that I make a line plot of this table, with the months on one axis and the value count on the other. Let me run the code once and show you; then you will understand it more easily.
Now look at the plot. You will see two lines. The solid blue line is the count for each month. The other line, the dashed one, is the value count, which here means the miles travelled in each month.
What we have to focus on is the months: in which month are the fewest Uber rides booked? If you look here, you can see the line fall in January and February, and you can see it fall again in November and December. The count line shows the same thing: January and February, and November and December, are the periods where the graph drops.
That is the cold season, and in the cold season Uber is booked the least. This is US-based data, and it is snowfall time there. If we treat this as a rough evaluation, we can say that the curve is very irregular, but in November, December, and January, the cold months, fewer Uber rides are booked. So that was the next question, and we have answered it as well.
So we have answered the first, second, third, and fourth questions. For the month, the answer is that November, December, and January are the months in which the fewest rides are booked.
Now for the week. For that we have to create one more new column. If I write dataset.head() and run it, look: this is the month column we created. In the same way we will create another column for the next question.
We create it just as we created the month. Inside the dataset we make a new column and name it day, because a week is made of days: Monday, Tuesday, Wednesday. So I write the dataset with the new column name, day, and set it equal to the dataset's START_DATE column followed by .dt.weekday. This converts every date into its day of the week, as a number.
Now we have to give labels to the days, just as we gave labels to the months, where January was the first month. So what do we create first?
A dictionary, which I am writing as the day label. In this dictionary, if zero is written anywhere, it means Monday, so zero, a colon, and Monday. In the same way, one means Tuesday, so I write Tues. Next, two, a colon, and Wednesday. Three means Thursday, so I write Thur. We keep writing the rest in the same manner: four, colon, Friday; then a comma, five, colon, Saturday; then a comma, six, colon, Sunday. So we have made our dictionary of labels.
What is the next thing to do? We go back to the column named day. I write the dataset's day column, then equals, then the dataset's day column again.
the name of the pass column is na so let me just right this yer d let me just correct this and right this to d okay ya by name so We have to create a column, after that here p will map this to the label so dot map and here I will write day Why are you mapping underscore labels because Look here toy day we have to call people day We have to create a column from this now inside this is that means if it is written two then it must be wednesday If it is written five then it must be Saturday We need it to map both of them I have to do this so I wrote here dot loop And inside this we pass the day label to let’s run this and see after this it is showing certain error let’s just check what is the error here so I am saying it here if day label is not defined then We have defined here zero than monday then one comma tuesday th 4 saturday and Sunday right after that there’s the bracket here Let us take a look at it properly ok here is the data label there is a mistake in the name Here we wrote data label and here we wrote is labeled D so calling him a wrong person let’s do the work, let’s do the day label Now the error has been removed from here, the spelling is fine Due to mistake, let us write it here now. 
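The weekday step described above can be sketched as follows. This is a minimal illustration, not the instructor's exact notebook: the column names START_DATE and DAY, the short label spellings, and the four sample timestamps are assumptions made so the example runs on its own.

```python
import pandas as pd

# Hypothetical sample rows standing in for the Uber rides data.
dataset = pd.DataFrame(
    {
        "START_DATE": pd.to_datetime(
            ["2016-01-01 21:11", "2016-01-02 01:25",
             "2016-01-05 17:31", "2016-01-10 08:05"]
        )
    }
)

# .dt.weekday gives 0 for Monday through 6 for Sunday.
dataset["DAY"] = dataset["START_DATE"].dt.weekday

# Dictionary that turns each weekday number into a short label.
day_label = {0: "Mon", 1: "Tues", 2: "Wed", 3: "Thur", 4: "Fri", 5: "Sat", 6: "Sun"}

# map() looks up every number in the dictionary and returns its label.
dataset["DAY"] = dataset["DAY"].map(day_label)

print(dataset)
```

If the dictionary name used in map() does not match the name used when the dictionary was defined, Python raises a NameError, which is the "not defined" error seen in the lecture.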
Let's type dataset.head() and run it. Now you can see another new column. The first row is 1 January 2016, and the column shows which day that was: a Friday. With this in place we can plot it easily. I want to know on which day of the week the most Uber rides are booked: is it Monday, Tuesday, Thursday, or some other day? For this I have to write some code. Don't worry about what I am writing; let me focus on writing it first, and then I will explain each line. I write day_label equals dataset, dot, the day column, dot value_counts(). Next I write sns.barplot; for x we pass day_label.index, and for y we pass day_label. Then comes plt.xlabel, where the label for the x-axis is Day, and plt.ylabel, where the label for the y-axis is Count. Let's run this.
Now let me explain each line. We had created the day column with Monday, Tuesday, and so on, and we wanted to count how many rides are booked on each day. Looking at the graph, approximately 60 rides are booked on Mondays. The graph can show that figure because we counted the values.
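The counting and plotting step narrated above might look like the sketch below. The sample DAY values are invented for illustration, and the counts are stored in a variable named day_counts (an assumption, chosen so the label dictionary is not overwritten; the lecture reuses the name day_label).

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical DAY column; in the lecture it comes from the mapped weekday labels.
dataset = pd.DataFrame({"DAY": ["Fri", "Mon", "Fri", "Tues", "Fri", "Mon", "Sun"]})

# value_counts() returns one row per label with the number of rides on that day.
day_counts = dataset["DAY"].value_counts()

# Labels go on the x-axis, their counts on the y-axis.
ax = sns.barplot(x=day_counts.index, y=day_counts.values)
plt.xlabel("Day")
plt.ylabel("Count")

print(day_counts)
```

In a Jupyter notebook the chart is displayed under the cell; the tallest bar marks the day with the most bookings.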
That is why I wrote the value_counts line: so that we get the count on the axis. After that we make the bar plot, with the day labels on the x-axis and their counts on the y-axis. You can see in the bar graph that the x-axis is labelled Day and the y-axis is labelled Count. Now if you look, Friday is the day on which people book the most cabs in most weeks. That makes sense: on Friday office hours are over and the weekend of Saturday and Sunday begins, so Friday is the day people book the most cabs. That answers our fifth question, which weekday has the most bookings: it is Friday. In the same way, which days have the fewest bookings? Saturday and Sunday, because those are holidays and most users stay at home and do not book.
So most of the questions have been solved. First, which category: the business category. For which purpose: meetings. At what time of day: most Ubers are booked during the afternoon. In which months do people book less: people book fewer Ubers during winter, that is November, December, and January. And the fifth question, on which day of the week bookings are highest: Friday.
Now I want to find out how many miles, or kilometres, people usually travel when they book an Uber. For instance, if I have to go 5 km, will people book an Uber? If I only have to go 1 km, will they book one, or walk, or take some other transport? That is what we need to figure out from the miles column, so let's start on it. This is the column for our last question. Let me run dataset.head() once. Look at the miles: they all differ. Someone booked a ride of 5 miles, for others it is 4.8, and for some it is 63.7. Since the values vary like this, we will make a box plot of them.
Why a box plot? Because we need to know how far most people travel, and the boxes show that. Pay attention while I make it. We write sns.boxplot, then dataset, and inside it the name of the column, which is miles. I run this, and the parenthesis is not closed; once the bracket is closed properly, it runs. Now see what the box plot of our rides shows. Most people have booked rides of roughly 0 to 20 or 25 miles. After that the points thin out, and the graph extends up to about 175 miles, so the longest trips people take in an Uber are close to 175 miles.
All of this is squeezed together, so I want to look only at rides under 100 miles. So I write sns.boxplot, then dataset, and from it the miles column, keeping only miles under 100. I close the brackets and run it, and it shows an error: 'int' object is not subscriptable. The problem is in how we wrote the call to sns.boxplot: the brackets were in the wrong place. We have to take dataset, and inside it dataset again with the miles column less than 100, close that bracket, and then select miles. Let's run the code now. It works: the first part filters dataset to the rows where miles is under 100, and from those rows we take the miles column.
Under 100 miles, the plot reaches up to about 80. Most people travel up to around 20 to 30 miles; after that it is nearly empty from 50 to 70 miles, and a few people book a cab to go about 80 miles. I can also see that many bookings fall within about 40 miles, so let's look at rides under 40 miles separately. Here I can make a box plot or a dist plot. I will also show you what a dist plot is: it shows the density.
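The filtered box plot described above can be sketched like this. The MILES column name and the sample values are assumptions for illustration. The key point is the order of the brackets: the comparison builds a boolean mask, the mask selects rows, and only then is the column picked out.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical trip distances standing in for the miles column.
dataset = pd.DataFrame({"MILES": [5.1, 4.8, 63.7, 2.0, 10.4, 174.2, 8.3, 39.2]})

# dataset["MILES"] < 100 is a True/False mask with one entry per row;
# indexing dataset with it keeps only rides under 100 miles.
short_rides = dataset[dataset["MILES"] < 100]["MILES"]

# Box plot of the remaining distances.
ax = sns.boxplot(x=short_rides)
plt.xlabel("Miles")

print(short_rides.describe())
```

Writing the column selection against the number instead of the frame, for example something like 100["MILES"], is one way to trigger the "'int' object is not subscriptable" error mentioned in the lecture.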
I copy the same code with Ctrl+A and Ctrl+C and paste it here. This was for 100; now I change it to 40. If you look at the plot for rides under 40 miles, it tells us that most people travel 5 to 10 miles. I can see this more easily in a density plot, so let me plot one to show you how a dist plot works. We write sns.distplot, and inside it dataset, and inside that dataset again with miles less than 40, because that is as far as most people go; then we close the bracket and select miles. When I run it, there is a problem with the parentheses. Checking again: sns.distplot, then dataset, dataset with its bracket, miles, then less than 40, and the bracket after 40 was not closed. Let me add the closing bracket after 40. Now it runs.
So what does this dist plot tell us? You can see the density, and you can read it easily. The peak is at around four to five miles and it stays high up to about 10 miles, which means people book Uber most often for trips of 0 to 10 miles. After that, people also book Uber for trips of up to 20 miles, but if they have to travel more than 40 miles they show much less interest. So what we have worked out is that most trips are 0 to 10 miles, and 0 to 20 miles is the range in which people prefer to book Uber cabs.
Why did we choose a dist plot? Because it shows the density as a curve, so you can tell where the rides are concentrated and by how much. In our graph the peak lies between 0 and 20, which means people use Uber most for trips of 0 to 20 miles, for example 10 or 20 miles, or 10 to 20 km if we think in kilometres. For longer distances, people clearly use it much less.
Why are people using Uber the most? For business. If you have a business meeting, those meetings are typically 10 to 20 miles away, and for that distance people use cabs.
So those are the answers we have derived. For the first question, which category: the business category. For which purpose: office and meeting purposes. At what time are most cabs booked: the afternoon. In which months are the fewest cabs booked: in winter, that is November, December, and January, because of snowfall. On which day of the week are bookings highest: Friday, with the fewest on Sundays. And as for miles, people use Uber cabs most for trips of 0 to 20 miles.
So we answered every question we were given: we made a graph for each one, extracted the answers, and those are our insights from this data. With that, this project is complete. It covered data cleaning, dealing with missing values, converting columns to datetime format, creating new columns as the need arises, and doing EDA and visualization, so there was a lot to learn about how to work through a project's questions and extract insights. I hope you have understood it well.
If it seems that you did not understand something, the comments are open: please tell us in the comment box. How did you like the project, and how much did you understand? If you were able to complete the project, tell me about it; I will wait for your feedback. With this in mind, our work does not end with just making a project. You should also be able to give someone an overview of your project. However deep your knowledge is, you should know how to show your work, because in the end our objective is to crack interviews and get a good job, and for that you need to make your profile strong. We will meet in another video; till then, stay curious and keep learning. Make sure to download the notes first: they include some more practice problems that you can solve yourself, and you will find the answers there too.
Along with this, premium courses in data science and data analytics are also available on the platform, which you can check out on the website and mobile application. There you get a detailed curriculum, lectures, end-to-end projects, live doubt classes, an industry-recognised certificate, training, and help with interview preparation. The link is in the description.
All right, so let's start today's class, in which we will complete a project. We will take data on about 10,000 movies, and the special thing about this project is that we perform exploratory data analysis, that is EDA, which you will learn very well here. Whichever profile you work in, data scientist or data analyst, even after you join a company the most challenging thing remains data pre-processing, that is, cleaning the data, arranging it in the correct format, and then extracting insights. Data cleaning and pre-processing is the focus of today's project, and it is a valuable thing to learn. Besides that, there is visualization: many times we have to prepare reports and submit them to the company, and those need bar graphs and other charts. We will complete this project in Python, and I will explain it in very easy language. So whether you are at a beginner, intermediate, or advanced level, you can include this project in your CV; it will add some weight to your resume.
Before starting, note that you will find more projects on this channel, such as an e-commerce analysis, the complete Uber data analysis project, a project that uses machine learning and NLP, and a sales analysis. I am also thinking of going live on YouTube, where you sit for three to four hours and get up only after completing a project end to end. If enough people comment "yes" for a live class, say 500 students, I will definitely arrange it; I really like taking such live classes, where we finish a great project in three to four hours. So please tell me in the comments whether you would be interested.
If more than 500 people comment "yes", we will definitely arrange such a class, where I will take you through a complete project. For now, let's start today's project.
You have come here to learn a skill, and the most important thing to know is what happens when you go for an actual interview at a company: how many rounds there are and what happens in each. In total, three rounds are usually conducted. Whether you apply as a data analyst, data scientist, or business analyst, in round one most companies mail you an assignment or a project and give you 24 to 48 hours to complete it. If your assignment is shortlisted, you get a call for the technical round, and after that round three is the HR round. The project I am taking you through today is like the assignment you get in round one, so that you get some practice first.
What does the company do in the assignment? It first gives you some documentation, including the company's history. As you can see on the screen, Netflix has two co-founders, Reed Hastings and Marc Randolph. How did it start? The founders had experienced that if people were late in returning a DVD, they were charged extra. So they thought: why don't we send DVDs through the mail and deliver them to the homes of whoever wants them? The photo you are seeing is from 2007, from when the website was created. Today Netflix is available in more than 190 countries, including India. Its profit is now $2.4 billion, so it has become a very big company, and it has 283 million paid subscribers. As you can also see on the map, it is present in every corner of the world. It looks at what kind of content we like and shows us results accordingly; in other words, it tries to understand customer behaviour and patterns.
Now, the situation you are given in the assignment is this: assume you are working in a data-driven role at this company. You have a data set of around 9,000 to 10,000 movies and five questions, and you have to answer those five questions from the data set. First I will show you the data set, and then we will discuss the questions. For each movie we are given the release date, the name of the movie, an overview, how popular it was, how many votes it got, its average vote, its genre, and the URL of the movie poster. So every piece of information related to a movie is given. For example, take Spider-Man: the data tells you which genre it belongs to and how popular it is.
All of this information is available in the given data. Now, what is the company expecting from you? These are the questions you have to solve. First, you have to tell which genre of movie is the most common in the data. Second, which genres people liked the most, that is, which got the most votes; it could be comedy, action, thriller, or some other genre. Third, which movie is the most popular, and what is its genre. Fourth, which movie is the least popular, and what is its genre. Fifth, in which year the most movies were released. We will answer these questions with the help of Python, so let's open Jupyter Notebook directly. If you don't know how to install Jupyter Notebook and Anaconda, I have placed a link in the description box below so that you can install it easily.
Before starting the project, there is one more important thing. There are five questions in this project, and we will solve them, but apart from that I have prepared a set of 25 practice questions on the Python libraries that come up in interviews, such as NumPy, Pandas, Matplotlib, Seaborn, and Plotly, so that you can practice more questions. So where do you get these practice questions?
For this you have to go to the platform's website. As soon as you open it, you will see the option to explore courses. Click on it, and under the free category select the course. On this platform, free data science and data analyst courses are available from which you can start your studies: you will find video lectures on Python, machine learning, SQL, Excel, and more, along with projects, notes, and code files. When you click on the free data science course, the option to log in or register appears at the top. If you have visited the website before, you can log in directly; if you are coming for the first time, you can register with your mobile number and a password you create. Once you are logged in, you will see a dashboard.
There you get the option to explore courses and the option to see your enrolled courses. We are already enrolled in this free course, so we go to the enrolled courses and open it. Select the first item, the Python practice questions. Click on it, and you will see an arrow; click on this arrow to download the file. After downloading, you get the complete set of 25 questions with answers. I hope you will download it; you will also find the link in the description box below.
Now, first of all, go to your system and open Anaconda Navigator. When it opens, you will see an interface with an option for Jupyter Notebook and a Launch button below it. Click on the Launch button, and the notebook opens in whichever browser you use, for example Mozilla Firefox. Let me tell you once again that the notes and the data set are available in the description box below; I have already placed them there, and you will get the option to download them. So now we have this CSV file with our data.
I have uploaded the CSV file here. Now we need a new Jupyter notebook, so we go to New and click on Python 3. The notebook opens as Untitled; you can give it whatever name you want, and I will name it Movie Data Analysis.
In this project the data has to be cleaned first, and there will be a lot of steps, so we begin by importing some libraries: NumPy for mathematical operations, Pandas for cleaning the data, and, since we will make a lot of graphs, the Matplotlib library and the Seaborn library. So first we import the libraries: import numpy as np, where np is its alias; in the same manner import pandas as pd; then import matplotlib.pyplot as plt; and then import seaborn as sns. That takes care of the essential libraries.
Our next step is to load the data. We have a CSV file whose name begins with "my movie"; if you type the first letters and press the Tab key, which is above Caps Lock on the keyboard, the notebook completes the file name automatically. I also want to use the line terminator here, meaning that one row should end and the next row should start on the next line, then the third row on the line after that, so that we see the data laid out systematically and cleanly. So I write lineterminator equals the newline character, which is how we move to a new line for each new row.
Now the data set has been loaded. To see the first five rows, we use head. Here are the columns. The first column is the release date, the date on which the movie was released. After that come the title of the movie; the overview, a brief summary of the movie; its popularity; its vote count; its vote average; the original language of the movie; its genre; and the URL of its poster. If you look at the first movie, it was released on 15 December 2021 and is Spider-Man: No Way Home. Its overview is written here, followed by its popularity and vote count; its original language is English, its genres are action, adventure, and science fiction, and the poster URL is given.
That is the data we have been given. Next we want to understand what data type each column holds and whether there are any missing, that is null, values. We already have a function to check this: df.info(). As soon as you apply it, you get the basic information.
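The import, load, and inspect steps described above can be sketched as follows. To keep the example self-contained it reads from an in-memory string rather than the lecture's CSV file; the column names, the language code, and the numeric values are placeholders, not figures from the real data set.

```python
import io

import numpy as np                 # mathematical operations
import pandas as pd                # loading and cleaning data
import matplotlib.pyplot as plt    # plotting
import seaborn as sns              # statistical plots

# Two hypothetical rows shaped like the movie data; numbers are placeholders.
csv_text = (
    "Release_Date,Title,Popularity,Vote_Count,Vote_Average,Original_Language,Genre\n"
    '2021-12-15,Spider-Man: No Way Home,100.0,1000,8.0,en,"Action, Adventure, Science Fiction"\n'
    '2021-12-22,The King\'s Man,50.0,500,7.0,en,"Action, Adventure, Thriller"\n'
)

# lineterminator="\n" tells the parser that each row ends at a newline character.
# With a real file, the first argument would be the file name instead.
df = pd.read_csv(io.StringIO(csv_text), lineterminator="\n")

print(df.head())   # first five rows (here only two exist)
df.info()          # row count, column dtypes, and non-null counts
```

The output of info() is where the lecture reads off the number of movies, the data type of each column, and whether any values are missing.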
The first thing it tells you is how many movies there are in total: 9,827. Now look at each column. What is the data type of the release date? It is of object type. A value looks like 2021-12-22, where 12 is the month, December, and 22 is the day, but its data type is object, which means it is stored as a string. So the first thing we need to understand is that we have to change its format: since it is a date, we will convert it from a string to a proper date-time format. That is the first thing I noticed.
Now the others. The title, for example The King's Man, must be string data, and it is. The overview is also a string, so it is object type too. After that, popularity is numeric, floating-point data; then vote count is an integer; and vote average is floating-point data. The language, such as English, is obviously string data, so object is correct. The genres are also object type, and so is the poster URL. So everything else is fine, and the one thing we have to do is change the data type of the release date from object to date-time format.
The next task is to find out whether there are any missing values. Here you can see that every column has all of its records: nothing is missing and there are no null values, so this is clean data in that respect. Whatever further cleaning we do will be to simplify the later operations, and we will focus on that as we go. Up to here, things should be clear.
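The two checks discussed above, converting the release date and looking for nulls, might look like this sketch. The lecture only states that the conversion is needed; pd.to_datetime is one common way to do it and is shown here as an assumption, on a tiny hypothetical frame with assumed column names.

```python
import pandas as pd

# Hypothetical frame with the release date stored as text.
df = pd.DataFrame(
    {
        "Release_Date": ["2021-12-15", "2021-12-22"],
        "Title": ["Spider-Man: No Way Home", "The King's Man"],
    }
)

print(df["Release_Date"].dtype)   # text dtype before conversion

# Parse the strings into proper datetime values.
df["Release_Date"] = pd.to_datetime(df["Release_Date"])
print(df["Release_Date"].dtype)   # datetime dtype after conversion

# isnull() marks missing cells; sum() counts them per column.
print(df.isnull().sum())
```

Once the column is a datetime, parts such as the year can be pulled out with the .dt accessor, for example df["Release_Date"].dt.year.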
Now look at what our questions are. Many of the questions we have to solve are about genre. If you look at the genres of the movies here, there are many different ones. The genre of the Spider-Man movie is Action, Adventure, Science Fiction, but if I look at another movie, The King's Man, its genre is Action, Adventure, Thriller. So different genres are given here.

If I want to see the genres of the first five movies, I write df, the name of our DataFrame, then the column name, Genre, and then head(), because I only want to see the first five. You can see different genres listed: Animation, Action, Comedy, Family, Fantasy, Thriller. If I remove head(), I get the genres for every row, but to start with I only wanted five, so I wrote head().

Now, there is one thing I need you to understand about genre, so pay attention here; this is important. The data preprocessing you are learning now is a very important step, and it will help you when you go to work inside a company, so watch this step very carefully. Look at the first entry. "Action" is written here.
Right after the first genre there is a comma, and then a space. After that the second genre is written; I have marked it in a different colour. Look again: Action, then a comma, then the white space that I am showing in blue, then the second genre, Adventure, then a comma, then a space, and then Science Fiction. It is the same everywhere. Here it is Animation, then a comma, and before the next one, Comedy, there is a space. So you will always find this pattern: in the Genre column a genre is written, then a comma, then a white space. It is possible that later on we will need to remove this white space as well. How to remove it is part of data preprocessing, and I will teach you that as we move ahead.

So there are two tasks so far. The first is to bring the release date into datetime format. The second, if the requirement arises later, is to remove this white space.

There is one more thing to check. Take the Spider-Man movie: is Spider-Man repeated anywhere in the data? Is any movie or record duplicated?
This is important to check: whether any record is repeated somewhere, that is, whether there are duplicates. To check for duplicate values there is a function named duplicated, so we apply df.duplicated() and then add .sum(). Are there any duplicate values? The result is zero. Let's see what happens if I don't apply sum: look, it shows False, False, False. False means the row is not a duplicate; if a row were duplicated you would see True here. But this data has no duplicates. There are 9,827 movies here and every one of them is unique, so as soon as we apply .sum() the value is zero, meaning no movie here is a duplicate.

What is the next thing to do? Next we take the data we have been given and look at its basic statistics. Why? Because, for the vote count, for example, we want to know the maximum votes received, the minimum, and the average. Likewise for popularity: what was the maximum, the minimum, and the average, and what was the popularity at the 25% and 75% points?
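The duplicate check described above can be tried on a tiny made-up table. This is only an illustrative sketch: unlike the session's data, the last row here repeats the first on purpose, so that the difference between False and True is visible.

```python
import pandas as pd

# Made-up rows; the last row repeats the first on purpose.
df = pd.DataFrame({
    "Title": ["Movie A", "Movie B", "Movie A"],
    "Vote_Average": [8.3, 8.1, 8.3],
})

flags = df.duplicated()      # one True/False per row; True marks a repeated row
n_duplicates = flags.sum()   # True counts as 1, so this is the number of duplicates

print(flags.tolist())   # [False, False, True]
print(n_duplicates)     # 1
```

On data with no repeated rows, as in the session, every flag is False and the sum is 0.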
If we want all of these basic figures at once, there is a function named describe. It computes statistics only on numerical data. For example, the title of a movie, Spider-Man, is text, so no statistics can be calculated on it; statistics apply to numbers. So I write df.describe() here and we can see the basics.

The count is 9,827, because the total number of records is 9,827. For popularity, the average popularity of a movie is about 40.32, the minimum is about 13, and the maximum is about 5,083. For vote count, the average is 1,392, the minimum is 0 votes, and the maximum is 31,077. For vote average, the minimum is 0 and the maximum is 10. So that is the basic statistics.

Now, all the points we have found so far I want to write down in one place, as notes. What have I learned so far? First, we have to change the format of the release date. Second, there is white space between the genres, which may have to be removed later. And one more thing I can see here: a lot of data is given, but we will keep only the parts that are useful to us.
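As a small illustration of the describe step above, the sketch below uses four made-up rows (not the session's data) to show that the text column is left out and only the numeric columns are summarized.

```python
import pandas as pd

# Made-up rows: one text column and two numeric columns.
df = pd.DataFrame({
    "Title": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "Popularity": [10.0, 20.0, 30.0, 40.0],
    "Vote_Count": [0, 50, 100, 150],
})

# By default describe() summarizes numeric columns only, so Title is left out.
stats = df.describe()
print(stats)

# Individual figures can be read from the summary by row and column label.
print(stats.loc["mean", "Popularity"])  # 25.0
print(stats.loc["min", "Vote_Count"])   # 0.0
```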
Whatever is of no use, I will remove. What is not useful here? Besides the title there is the overview. We have no question that depends on knowing what the movie is about, so that column is of no use to us. Then the language: whether the movie was made in English, in Hindi, or in any other language does not matter to us. Why? Because none of the questions we were asked is about language, so we remove what the questions do not need.

You will always get more data than you need. When you work inside a company you will also get data in bulk, and it is up to you to keep only the data that answers what the company is asking. If you move forward with all of the data, you will make mistakes and will not get accurate results. So what is useful to us we keep, and the rest we eliminate.

So those two columns have nothing to do with our questions and can be eliminated. One more is the poster URL. What the movie's poster looks like has nothing to do with us, because the questions we were asked make no mention of the poster URL or the language. So what do we do with these three columns, the overview, the original language, and the poster URL? We can simply delete them.
So we will remove these three columns, bring the release date into the correct format, and possibly remove the white space in the genre later. Where does all of this work belong? In data preprocessing, where we arrange the data properly.

Let's note all of these tasks down in one place as a summary. Why is that needed? Because in the future this file may be looked at by someone else, a recruiter or any other person, and they should be able to understand your work. The cleanliness of your work should be visible. Everybody writes code, but you also have to know where to put comments and how to write a summary properly, so that if a recruiter opens your code file they find that it has been done in a proper and systematic way. It is not enough to go to an interview and say that your strong points are that you are very organized or that you like clean work; nobody will believe that just because you say it. You have to show it.

The summary matters for another reason too. You are working on this project today, but if you open the file again many days from now, you should not get confused yourself. With the notes in place, you will understand it straight away. So let's note down in one place everything we have to do. These are the basic summary points. We have written that we have a DataFrame with 9,827 rows and nine columns.
Our dataset has no duplicate or null values. The work that remains is this: the Release_Date column has to be brought into datetime format; three columns, Overview, Original_Language, and Poster_Url, have to be dropped; the vote average should be turned into categories, which will make it easier to read; and the white space in the genres has to be removed. So we simply write all of that down here.

Now for the next step, which is our first task. Let me call df.head() once to look at the data before writing anything. This is the release date. What is the question related to the date? The question is: in which year were the most movies released? The column will be in datetime format, but even then I only need the year, that is, 2021, 2022, 2021, and so on. I have nothing to do with the month or the day, only with the year. So in this release date we will keep only the year and remove the rest. How do we clean and preprocess the data that way? Let's work through it.

First, df is the name of our DataFrame and the column name is Release_Date, so we write that here. Then, to change its type to datetime format, we use pd, that is,
the pandas library, which converts the column to datetime format. So I write pd.to_datetime here, and inside the brackets df with the name of our column, Release_Date. Then we close the bracket. After that we write print, then df with Release_Date in brackets, and then dtypes, to check the data type and confirm whether it has changed.

When we run it, remember that the data type earlier was object. Now we can see that the release date, which was of object type, has been changed to datetime format. That is what we printed: the Release_Date column and its data type, checked by writing dtypes. So now it is in datetime format.

But what do we actually want here? We only want to keep the year, 2021, 2022, and so on; we don't need the month. To convert it, we write df with our column name, Release_Date, and set it equal to an expression. I will type the whole thing in a single cell first, and after that I will explain it to you piece by piece, so you don't need to worry about what is happening. I will explain this entire project line by line, and my hope is that you will understand all of it.

So, first of all, Release_Date is our column. What do we want from it? Only the year, which is why I wrote Release_Date, then .dt.year. Now whatever we get back will be only the year. And I have also written .dtypes, because I want to check the data type now that there is only a year, i.e.
2021, 2022, 2024, years of that kind. What type of data will that be? It will be integer data. I run this code and it shows an error. What is the error? Here we opened a quotation mark and did not close it: we put the opening single inverted comma but not the closing one. After fixing that, look at the data type: it is integer.

Now let me call df.head() and see whether what we wanted has happened. Earlier the full date was written, 15-12-2021. But now, after what we have done, only the year is shown: 2021, 2022, 2021, and so on. And the data type of that year is integer. So the work we had to do on the release date is done. Well done.

What is the next step? The release date is done and dusted. Now we will remove the Overview column, the Original_Language column, and the Poster_Url column. So let's go ahead and perform this operation.

First we write a heading, "Dropping the columns". We have already decided which columns are not useful, so now we have to drop them. I will write that in a Markdown cell. This is our second step. Which columns do we want to delete?
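The release-date conversion completed above can be sketched like this. The three dates are made-up sample values and the column name Release_Date is an assumed spelling; the exact integer width that .dt.year returns can differ between pandas versions, so the comments only say "integer".

```python
import pandas as pd

# Made-up sample dates stored as strings, as they are when first read from a CSV.
df = pd.DataFrame({"Release_Date": ["2021-12-15", "2021-12-22", "2020-01-10"]})

# Step 1: convert the string column to a proper datetime column.
df["Release_Date"] = pd.to_datetime(df["Release_Date"])
print(df["Release_Date"].dtypes)    # a datetime64 dtype

# Step 2: keep only the year; the month and day are discarded.
df["Release_Date"] = df["Release_Date"].dt.year
print(df["Release_Date"].dtypes)    # an integer dtype
print(df["Release_Date"].tolist())  # [2021, 2021, 2020]
```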
I will make a list here that holds the names of all the columns I don't need. First, the Overview column: we don't need it, so I write Overview in the list. Which other column is not required? The original language column, so I write its name: Original, with a capital O, then an underscore, then Language, with a capital L. The next column we don't need is the poster URL, so I write Poster, underscore, Url. These are all the columns we don't need, and we have made a list of them.

Now we want to delete all of these columns. Our DataFrame is named df, and on it we will use the drop function.
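The column removal being set up here can be sketched as follows. It is a minimal sketch under stated assumptions: the one-row table is made up, the column spellings follow the names spelled out in the session, and the list name cols is hypothetical.

```python
import pandas as pd

# Made-up one-row table with the assumed column names.
df = pd.DataFrame({
    "Title": ["Movie A"],
    "Overview": ["placeholder text"],
    "Original_Language": ["en"],
    "Poster_Url": ["placeholder"],
    "Genre": ["Action, Adventure"],
})

# Names of the columns that are not needed (the list name is hypothetical).
cols = ["Overview", "Original_Language", "Poster_Url"]

# axis=1 means "drop columns"; inplace=True changes df itself.
df.drop(cols, axis=1, inplace=True)

print(df.columns.tolist())  # ['Title', 'Genre']
```

An equivalent form without inplace is df = df.drop(columns=cols), which some people find easier to read.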
To drop, we pass the columns I have already put in this list, the ones I want to remove. Then, after a comma, the axis is set equal to 1, because these are columns. I want all three columns out of this dataset permanently, since I have nothing to do with them, so I also write inplace equal to True. After this I write df.columns and run it. Look at the columns it shows now: the three columns we did not want, Overview, Poster_Url, and Original_Language, have been removed. But let's look at the data once more to confirm they are really gone, so I write df.head() again. You can see that those three columns have been removed; now only Release_Date, Title, Popularity, Vote_Count, Vote_Average, and Genre remain. We have successfully eliminated the columns that were of no use to us.

Now here is another thing that is very important for us: categorizing the vote average. Many times we are given a criterion. For example, when you were in school, whoever scored 90 marks or above got an A grade, whoever scored from 80 up to 90 got a B grade, whoever scored 70 to 80 got a C, and so on. That was the grading system in school. In the same manner, take the vote average here. If a movie has an average vote of more than seven, it is probably a popular movie. A movie with a lower rating we can call an average movie, another could be below average, and there could be another movie which
has not become popular at all. So whatever the numbers are, we will give labels in their place. If a movie scores 8.3, it is a popular movie; 8.1 is also a popular movie. But if a movie is less popular, say 6.3, we can tag it as a below-average movie. So we will set up a criterion, and according to it the numbers will be replaced with labels. Seeing the number 8.3, someone might not understand what it means. But if it says that Spider-Man is a popular movie, the word "popular" is understood immediately.

So we will create the criteria and label the vote average: the numbers will be removed and labels used in their place. There will be four labels in total: popular, average, below average, and not popular. We will put these four labels in the Vote_Average column. Let's see how to do that.

First, let's write down what we need to do. I select Markdown and write that we have to categorize the Vote_Average column, and that for this we have four labels, as we decided: popular, average, below average, and not popular.

To do this we will first create a function. I am going to write every line of the code first, and after that I will explain each and every word. We create a user-defined function here whose name is categorize_col. Which categories do we want to put the column into?
We want to put a column into categories, so we keep the name categorize_col. This is a user-defined function because we are implementing it ourselves.

Inside this function I create a variable named edges. What goes into edges? Ultimately, in the complete dataset this column has some minimum value and some maximum value, and in between come the 25%, 50%, and 75% points. We will give the labels according to these. The values near the maximum, from 75% up to the maximum, get the popular label. Values from 50% to 75% get the average label. Values from 25% to 50% get the below-average label. And values that lie between the minimum and 25% get the not-popular label. That is why I am creating the variable named edges.

Inside it, df is our DataFrame, to which we pass the column, and then we apply the describe function and take from it the value we need. First I am doing it for the minimum, so I write min here. For the next one, in the same manner, we write df with the column,
then .describe(), and this time we want it for 25%, so I write 25% here. This is something quite new; we have not done this in the earlier projects, converting numeric data into text labels, so you are learning a new thing inside this project. In the same manner we write df, the column, then .describe() again for 50%, then close the bracket, and the same thing for 75% and for the maximum. I am simply copying and pasting this so that I don't have to write it again and again, which saves me time. But if you are practising this for the first time, you must type it out yourself, because the more code you write, the better it will settle in your mind. Then we close the bracket here. So that is our variable named edges, created inside the user-defined function.

After making edges, what do we have to do? df is the DataFrame, and we pick the column we want to categorize, so I write df with the column, set it equal to something, and here comes the cut function. What does cut do? It helps with categorization: where we previously had numbers, it lets us give them labels. So we write pd.cut, then pass the DataFrame with our column, and then edges, the cut points we fixed at the minimum, 25%, 50%, 75%, and the maximum. And then we will give it the labels.
So I have passed edges, and according to those edges we provide the labels, so I write labels equal to labels; the labels themselves we will create separately. After that, if any of the edges has duplicate values, it should be dropped; we don't want duplicates in there, so I write duplicates equal to drop. And finally we return df.

Notice that so far this is only a function definition. We created a separate user-defined function so that we can call it whenever we need it. We used the def keyword to create it and named it categorize_col. In its arguments we pass three things. The first is df, our DataFrame. The second is col, the column we want to label. It is possible that later I will want to label some other column, which is why I wrote a function: I can reuse it whenever I need it. So for now I have only written col as a placeholder; the actual column name, Vote_Average, we will supply when we call it. The third is the labels, which we will also create separately.

And on what basis will the labels be given? On the edges we made. For the range starting at the minimum the label will be, let's suppose, not popular; between 25% and 50% it will be below average; from 50% to 75%, average; and up to the maximum, popular. That is the scheme we have thought of.
After this we wrote the line of code in which the cut function categorizes the column of the DataFrame; we passed the edges we made, and the labels. Notice that we have not written the labels anywhere in our code yet; nowhere have we written "average" or "popular". We will write them now. And if any duplicate is present it is dropped, after which the function returns the DataFrame.

So now let's make the labels. The first label, let's suppose, is not_popular. The second label is below average. The third label is average, and the fourth is popular. If you look at the order of these labels, it is completely systematic: first the minimum, so not popular; then 25% to 50%, so below average; then 50% to 75%, so average; and then up to the maximum, so popular. The labels were created in that order.

Next, look at the function we created, categorize_col, and the steps inside it. You can see that block, right?
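To see what the cut function does with a list of edges and a list of labels, here is a small standalone sketch. The scores and edges are made-up round numbers standing in for the minimum, 25%, 50%, 75%, and maximum, and the label spellings are assumptions.

```python
import pandas as pd

# Made-up scores and round-number edges standing in for min, 25%, 50%, 75%, max.
scores = pd.Series([0.0, 2.0, 4.0, 6.0, 8.0, 10.0])
edges = [0, 2.5, 5, 7.5, 10]
labels = ["not_popular", "below_avg", "average", "popular"]  # assumed spellings

# Four bins need four labels. Each bin excludes its left edge and includes
# its right edge, so a value exactly equal to the lowest edge gets no label.
binned = pd.cut(scores, edges, labels=labels)

print(binned.tolist())
# [nan, 'not_popular', 'below_avg', 'average', 'popular', 'popular']
```

The missing label for the value 0.0 is worth noticing: with the default settings, rows sitting exactly on the minimum come out as NaN. Passing include_lowest=True to pd.cut is one way to keep them, although the session does not use it.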
So far we have only created the function. Now I want to call it, because a function is used by calling it whenever you need it. It is just like calling a person: if you want to call someone, you call them by name. If you want to call me, you call me Swati. In the same manner, we named this function categorize_col, and we call it by that name.

So here we write the function name, categorize, underscore, col. When you call a function that was defined with three arguments, you have to pass those three arguments. The first argument is our df, so we pass that. Earlier, when we created the function, we only wrote col as a placeholder; now we have to say which column we want it applied to. We want it applied to the vote average, so here I give the name of the column: Vote, underscore, Average. Next come the labels. These are the labels we created just above, so I simply write labels here. So that is the function call.

Now I want to check whether it has worked. So I write df with the Vote_Average column and look at the unique values in it, which is why I have written unique(). Let's run this code. It shows an error: it says categorize_col is not defined. So what did we do wrong here?
Look, the spelling of "categorize" is different in the two places: where we defined the function and where we are calling it, the spelling does not match. In one place it has an e and in the other an s, and that is why it gives us the error. Let me go and correct it. Now I run it again, but it shows another error. Let's check the line with the minimum; the error seems to be there. Yes, look, it is such a small mistake: min is written, the bracket is closed, and after that we are closing the single inverted comma. That is our mistake. The quote has to close before the bracket. Let's correct it and run again. Now there is no error and the proper result has come.

You can see the unique values here: popular, below average, average, and not popular. Now I want to see whether the change has actually been made in the DataFrame, so I write df.head() and run it. And here we get the result we wanted. Look at the vote average: earlier we had numbers here, 8.3, 8.1, 6.3, and now we are getting labels instead.

This was a very important aspect. Whenever you do data cleaning or preprocessing, converting numerical data into labels in this way is a useful technique, and it was the new thing to learn in the preprocessing step of this project. I hope it is clear to you. Now I want to see how many movies got the popular tag and how many got the average tag.
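Putting the function, the labels, and the call together, the categorization described above might look like the sketch below. It is an approximation rather than the notebook's exact code: the nine ratings are made up, the label spellings are assumptions, and describe() is called once and reused instead of being called separately for each edge.

```python
import pandas as pd

def categorize_col(df, col, labels):
    """Replace the numeric column `col` with quartile-based labels."""
    stats = df[col].describe()
    # Cut points: minimum, 25%, 50%, 75%, maximum.
    edges = [stats["min"], stats["25%"], stats["50%"], stats["75%"], stats["max"]]
    # duplicates="drop" removes repeated edges instead of raising an error.
    df[col] = pd.cut(df[col], edges, labels=labels, duplicates="drop")
    return df

# Made-up ratings standing in for the real Vote_Average column.
df = pd.DataFrame({"Vote_Average": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]})
labels = ["not_popular", "below_avg", "average", "popular"]  # assumed spellings

df = categorize_col(df, "Vote_Average", labels)

print(df["Vote_Average"].unique())   # the labels now present in the column
print(df["Vote_Average"].tolist())
```

In this toy data the row holding the minimum rating (1.0) ends up without a label, because the lowest edge is not included in the first bin.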
How many got the below-average tag, and how many got the not-popular tag? To check this we write df, then the name of our column, Vote, underscore, Average, and apply value_counts() to it. Look at the result: of the total movies, 2,467 are not popular, 2,450 are popular, 2,412 are average, and 2,398 are below average. So now we have a rough estimate.

What does this tell us? Movies that are not popular are of little use in recommendations. There is no point in recommending to people a movie that is not doing well, that people are watching and not liking. And what I have found is that the not-popular group is the largest one in our data.

What should we do now? If after all this there are any duplicate values or any NaN values in any row, they need to be removed. For that we write df.dropna(), and because I want them removed permanently, I put inplace equal to True. After this you have to confirm whether they were removed, so we write df.isna() and append .sum(). From this you can see that all the NaN values
that were in the data have all become 0 here, which means they have been removed properly. That was an important step in the process.

I hope you have been able to follow the project up to here. If you have completed it this far, tell me in the comments that your project is 50% complete. Those of you who have reached this point, just write "yes ma'am, we are at 50%", so that I know how many people have made it halfway through the project. Now let's move ahead to the next step.

I call df.head() once again. At the beginning I said that in our Genre column each genre is followed by a comma and then a white space. Now we will remove this white space. I also want the genres not to be written together but separately: Action on one line, Adventure on the next line, Science Fiction on the line after that. So I want to break them apart. Where one row lists three genres, I want three separate rows: first Action, then Adventure, then Science Fiction. In the same way, for another movie, first Crime, then Mystery, then Thriller. Wherever multiple genres are written on one line, we want them on different lines, with the white space removed.

What do we do for that? We have to apply some operations we have not used yet, so we are going to learn them now. First I write a heading in Markdown: we would split the genres into a list. Here we want to treat
After that we will explode the data frame so that if a movie has three genres, they come out on separate rows. Take Spider-Man: on the first row Spider-Man has the genre Action, on the next row Spider-Man has Adventure, and on the next row Spider-Man has Science Fiction. The title Spider-Man stays common, and Action, then Adventure, then Science Fiction are written next to it. That is the work we have to do now, so let's see how. The first thing to note is that the column in our data frame is called Genre. So I write df['Genre'] on the left, and on the right df['Genre'] again, because the operation has to be performed on this same column. Then we add .str.split. We saw that after every genre there is a comma and then a white space, so we split on a comma followed by a space: I open the quote, type a comma, press the space bar once, and close the quote. Next we write df = df.explode('Genre'). Earlier, index zero held three genres in one row, but once we separate them they go into rows: Action comes first, Adventure second, Science Fiction third. So the data we have will grow. Right now we have about 9,000 rows; as we saw above, the data had 9,827 records. As we break the genres onto separate rows, the number of rows will increase. So I also add reset_index with drop=True, which discards the old index and gives us a fresh one, and then call df.head(). All right, let's run it and see.
And finally we get the result we wanted to see. Spider-Man: No Way Home was the first movie, and it had three genres written in one row, as we saw above: Action, Adventure, and Science Fiction. Now it has been split, and each genre appears on its own row. We have one more task. We have to cast the Genre column to a category, so I will write it once in a comment: "Casting column into category." To do this we take the Genre column and write df['Genre'] = df['Genre'].astype('category'). Next we check its data type with df['Genre'].dtypes. You will see that all the genres we have are now grouped together as the categories of a categorical type, and the category values themselves are stored as objects. So this work is done. If you look back, at first there were only 9,827 records. Now that we have split the genres, the number of rows has increased to 25,552. Why? Because a Genre value that was first written on one line has been broken apart, and when you break it apart the data will definitely grow. Now let's see how many unique values the other columns have as well. I will write df.nunique() and run it. It gives the number of unique values for each column: Release_Date, Title, Popularity, Vote_Count, Vote_Average, and Genre. So how many unique genres are there in total?
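The cleaning steps described above can be sketched as follows. This is a minimal illustration, not the notebook from the video: the three-row table and its titles are invented stand-ins, and only the column name Genre and the pandas calls come from the walkthrough.

```python
import pandas as pd

# Toy stand-in for the movie table; these rows are invented for illustration.
df = pd.DataFrame({
    "Title": ["Movie A", "Movie B", "Movie C"],
    "Genre": ["Action, Adventure, Science Fiction", "Crime, Mystery, Thriller", None],
})

# Drop rows with missing values in place, then confirm that none remain.
df.dropna(inplace=True)
print(df.isna().sum())

# Split each "comma + space" separated string into a list of genres.
df["Genre"] = df["Genre"].str.split(", ")

# One row per (movie, genre) pair; drop=True discards the old index.
df = df.explode("Genre").reset_index(drop=True)

# Store the repeated genre labels as a categorical column.
df["Genre"] = df["Genre"].astype("category")

print(df.head())
print(df.nunique())  # unique values per column
```

Two movies with three genres each become six rows, which is the same effect that grows the real table after the split.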
There are 19. The release dates repeat a lot, with only 100 unique values, and Title has 9,415. So that is how we look at the unique values. Now, finally, we move ahead to data visualization, where we will answer the questions asked in this project one by one. Whatever pre-processing the data needed, we have done. I called df.head() and showed you that our data is now organized correctly and we are all set to take out the insights. We have prepared our data well: there are no missing values, no errors, no problems, and it is completely ready. There are five questions to solve in this interview assignment. Typically an assignment has only four or five. People often solve the questions and send them in without making any visualization, and that is the biggest mistake you can make. If someone asks you to extract insights, show them a visualization so that the person in front of you understands that yes, you know how to visualize. So first of all I want to set up a style for the visualizations. I write sns.set_style. What does sns mean? It means your Seaborn library. Here we set 'whitegrid', which has a clean look, so that our visualizations come out properly. Now, what was the first question? What is the most frequent genre? I will put it in heading format by selecting Markdown and writing it down. Which genre is that?
We have now separated the genres, and I am asking which genre has the most movies released on Netflix. We call df['Genre'].describe(): the total count is 25,552, there are 19 unique genres, the genre in which the most movies have come out is Drama, and its frequency is 3,715. So we get the basic idea from here. But if I hand only this much to someone, nobody will understand it. I want to show which genre is highest, which is Drama, and what comes after it, by making a graph. How will we make the graph? We will use the Seaborn library. I write sns.catplot. We will make a catplot, that is, a categorical plot, which can show the counts for each category. What kind of graph do I want? Let me sketch it roughly first so you can see. There will be axes, and all the genres such as Drama and Action will be written along one of them, with bars that get bigger or smaller according to their counts, so you can tell which genre is first and which is the smallest. So what I want on the y axis is the genre: I write y='Genre'. Then a comma and data, and for data we pass df. Then a comma and kind equal to the kind of plot you want, and we want count, which means we show the number. Suppose Drama is our top genre: its count of 3,715 will be shown. So kind='count' is what we write here.
Next comes a comma and order, which says in which order the genres should be shown. We use the value counts, where whichever is largest comes first and then the smaller ones follow, and after value_counts() we have to take the index. So it is df['Genre'].value_counts(), close the bracket, then .index. If you want to give the bars a color, you can mention that as well with color equal to a color code. Suppose I want blue: I put in its hex color code, something like '#4287f5', and we get that color. The graph should also have a title, so we write plt.title and name it 'Genre column distribution'. To display the graph I have to write plt.show(), so I write that and run the cell. It shows an error, because one side has a double quote and the other side a single quote. Let's fix this. Again it shows an error, so let's check what it is. It is pointing at y='Genre': I opened the quote before Genre but did not close it with a single quote. Let's fix that, and this is the graph we get. Now we can see all the genres separately, and the one with the most movies is Drama, while the lowest is Western, with Documentary also near the bottom. Just by looking at a graph you know this immediately. Think of it this way: suppose someone inside the company, a senior or someone from another department, say the creative department, wants to see which genre of movie has done best.
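The count plot described above can be sketched like this. It is a minimal illustration under stated assumptions: the six-row table is invented, the hex color is an assumed blue, and the Agg backend line is only there so the sketch also runs outside Jupyter. The sns.set_style, sns.catplot, plt.title, and plt.show calls are the ones named in the walkthrough.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; omit this line inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Toy stand-in for the exploded Genre column (invented rows).
df = pd.DataFrame({"Genre": ["Drama", "Drama", "Drama", "Action", "Action", "Western"]})

sns.set_style("whitegrid")

# value_counts() sorts from most to least frequent; its index gives the bar order.
genre_order = df["Genre"].value_counts().index

sns.catplot(y="Genre", data=df, kind="count", order=genre_order, color="#4287f5")
plt.title("Genre column distribution")
plt.show()
```

Passing the value-count index as order is what puts the largest bar first; without it the bars follow the order in which the categories appear in the data.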
The genre with the most movies, as we can see here, is Drama, and the least is Western. So we got to know the genres through this graph, and our first question is solved. The first question asked which genre has the most movies released on Netflix. Let's proceed to the next one. The second question is: which has the highest votes in the Vote_Average column? To write this second question down properly, we put a hash in front of it and convert the cell to Markdown. Let me show you df.head() once again. Here we see our Vote_Average column, which we have categorized. It has been labeled as popular, not popular, average, and below average. We have to show which of these is the highest, so for this too we will make a graph. We have four labels: average, popular, below average, and not popular. So what are we going to do? First, to make the graph we use sns.catplot, which creates a categorical plot. On the y axis we put the Vote_Average column, so y='Vote_Average', then a comma and data=df, the same way as above. We will show its count here too, and next comes our order. Whatever I wrote above is the same thing we need here, so I am simply copying and pasting it. But please do not copy and paste yourself. If you are learning, write as much of the code as you can on your own; that will be great for you. For the title, we write plt.title and name it 'Votes distribution'.
When you build a project for the first time while watching, half of it you will hear and half of it you will type carefully. When you build it a second time, you will be able to write the code yourself. Right now I am showing it by writing it out, so watch the video the first time you make the project, but the next time try once to make it yourself without looking at the video. It will become much easier for you. Now it is showing an error: we are not getting the vote average counts, so let's check. This is the mistake: I left Genre in here. We just copied and pasted from above and did not change it, so you have to change it to Vote_Average. Now we get the vote average distribution. We can see that popular movies are the most common, then average, below average, and then not popular. That answers the second question. Let's proceed. What does question number three say? Which movie got the highest popularity, and what is its genre? In other words, which movie is liked the most and which genre does it belong to. This is our question number three, so I will select Markdown and write it down in heading format as well. To solve it we have our data frame df, and inside it there is a separate column named Popularity. Let me call df.head(2) and show you just the first two rows. Look at the Popularity column: which movie is the most popular?
The question asks which movie is the most popular and which genre it belongs to, so now we will write the code for it. First comes df, our data frame, and inside it the column named Popularity, so I write df['Popularity']. After closing the bracket comes a double equals sign, and on the other side df['Popularity'] again, followed by either the minimum or the maximum. The question asks for the maximum first, so I write .max(), close the outer bracket, and run it. It shows an error. I think the spelling of Popularity is wrong here: the P should be capital. Once that is fixed we get the result. The most popular movie in this data set is Spider-Man: No Way Home, and its popularity is 5083.954. Next: which movie has the lowest popularity? We write it with a hash and a space and run the cell as Markdown. Then we do the same thing as above, except that where we wrote max I now change it to min. Here the result is two movies that are the least popular. The first is The United States vs. Billie Holiday and the other is Threads, and the popularity of both is 13.354. What are their genres? The United States vs. Billie Holiday is Music, Drama, and History, and Threads includes Drama and Science Fiction. So when asked which movie has the lowest popularity, the answer is these two movies, The United States vs. Billie Holiday and Threads. The next question asks which year had the maximum number of movies. We will solve this final question now, so I am writing question number, let's say, five here.
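The max and min filters described above can be sketched as follows. This is a minimal illustration on invented rows. Only the column name Popularity and the boolean-mask pattern come from the walkthrough, and the names most_popular and least_popular are hypothetical.

```python
import pandas as pd

# Toy stand-in for the exploded table: one row per (movie, genre) pair (invented values).
df = pd.DataFrame({
    "Title": ["Movie A", "Movie A", "Movie B", "Movie C"],
    "Popularity": [500.0, 500.0, 42.0, 13.0],
    "Genre": ["Action", "Adventure", "Drama", "Drama"],
})

# Keep every row whose Popularity equals the column maximum (or minimum).
most_popular = df[df["Popularity"] == df["Popularity"].max()]
least_popular = df[df["Popularity"] == df["Popularity"].min()]

print(most_popular)
print(least_popular)
```

Because the genres were exploded onto separate rows, a single movie can come back as several rows, one per genre, which is how the genres of the top movie are read off directly.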
It goes in heading format, so select Markdown and run. Now we have to find out which year has the most movies. It is better to show this by making a graph. Along the axis at the bottom we will have the years, like 1990 and 2000, and the bars will show the counts for those years. To make this graph I first write df and the column we need. Which column holds the release year? Release_Date, so we write df['Release_Date']. I will create a histogram, so I write .hist(). You also have to give it a title, so plt.title, and the name of the graph will be 'Release date column distribution'. Some of it was in caps and did not look good, so let's write it again: column distribution. All right, done. We also have to display it, so we write plt.show() so that the graph is drawn, and run the code. It shows an error. Why? It is a very small one: there is a double quote on one side and a single quote on the other, so there is a mismatch. Let's simply put a single quote here. Now we get the release date column distribution. Looking at it, the fewest movies were released around 1940, and 2020 is the year in which the most movies were released. Write down the answer, so that when you complete the project and submit the file, the recruiter can see the method you used to solve it and the answers that came out at the end, all put in one place.
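The histogram step described above can be sketched like this. It is a minimal illustration on invented years. The value_counts line is an addition for reading the exact busiest year, since a histogram groups years into bins, and the Agg backend line is only there so the sketch runs outside Jupyter.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; omit this line inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Toy stand-in: Release_Date already holds the release year (invented values).
df = pd.DataFrame({"Release_Date": [1990, 2000, 2020, 2020, 2020, 2021]})

df["Release_Date"].hist()
plt.title("Release date column distribution")
plt.show()

# Not part of the walkthrough: exact per-year counts to confirm what the bars suggest.
year_counts = df["Release_Date"].value_counts()
print(year_counts.idxmax())
```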
It is written up as a summary, and that shows your efficiency. It shows that the way you work is neat and clean, that you explain each and every thing properly, and that you have a systematic approach. It shows the skill set that lets you do every job systematically, rather than making a mess of everything by throwing things here and there. Always remember this whenever an assignment or a project comes and you have to submit it to a company. Right now you are just an aspirant, but inside a company too, whenever you write code it is very important to arrange it properly, otherwise you get scolded a lot by the manager. So get into this habit now. Let's finally summarize the points in a conclusion. Let me write this in Markdown format. The first question asked which genre is the most frequent, and that was Drama. The second asked which has the highest votes, and that again points to Drama, with the popular label being the most common. The third question asked which movie has the highest popularity: it is Spider-Man: No Way Home, and we wrote down its genres. After that we were asked for the lowest popularity, which was shared by two movies, The United States vs. Billie Holiday and Threads. Finally we were asked which year had the most movies.
That was 2020. So all the answers are written here in a simplified manner. This is very important: whenever we work on a project, we must write these things up properly. So guys, I hope you have understood it well. You will have to do this project properly two or three times. What was most special about this project? Once the data had been pre-processed correctly, we solved the five questions we had been given very quickly. The real hard work went into pre-processing the data: what to keep and what to remove in each column, and how to convert a column into labels. That was the thing worth learning in this project. Inside a company, too, data pre-processing is the most time consuming part, and it is where the company gets the most value. Why do companies give this type of project? Because they want to see whether you know data pre-processing or not. If the data is clean, anyone can extract some insight from it. Even if you put it into a platform like ChatGPT, you will get some insight somewhere. But the most important thing, and the hard part, is understanding each step, because ChatGPT cannot do this work for you.
Take what we did here: we handled the white space, and we changed the numbers into labels such as popular and not popular. You have to think about such things yourself. This is called data pre-processing. Any company you go to judges it, and it is checked during interviews. That is why it was so important and worth learning in this project. Now, our work is not over just because the project is made. You should go and do this project yourself and share it, and you can tag The iScale page. If you also want to connect with me and my brother, both of us, you can connect with us. You will get a detailed curriculum, complete lectures, projects, live doubt classes, an industry-recognized certificate, and help with interview preparation along with the training. The link is in the description.

Hello everyone, I am Swati, and welcome to the class. In today's class we are starting a full e-commerce sales data analysis project for data analyst and data science roles. This e-commerce project is special. The first reason is that we all know about e-commerce companies. Earlier it was very difficult to imagine that we could order something and have it arrive at our home within 10 minutes, but now it is possible with e-commerce. It is possible because so much data is collected and analyzed in the correct way, recommendation models are built, and the goods reach our house. Data science and data analytics have a huge hand in all of this. In today's project we will also learn how e-commerce companies manage such a large amount of data efficiently. The other important thing is that the project we are doing here is sales analysis.
Whether you interview for a data analyst, data scientist, or business analyst profile, and whether the company is in health care, finance, AI, e-commerce, or retail, sales is an important part of every company. So in today's project you will learn how sales analysis is done. You will also learn how to use Python's libraries for visualization and how to perform EDA. This e-commerce project will be very beneficial for your resume as well. When you complete it, your confidence level will increase, and it could be a very important project for your career. Before making the project you should have the complete material to study from: the source file, the code, and the notes. I have already put all of this on GitHub, and the link is in the description box below. From there you can easily download all the files, including the data set. So let's start today's class and complete this project, which is e-commerce data analysis. E-commerce brought a revolution. There are so many websites we use for shopping, such as Flipkart and many more, and many e-commerce platforms have been built. When we talk about e-commerce, millions of customers shop somewhere or other every day. All of these customers shop, and their data is stored in a proper manner so that it can be used for the company's business decisions.
The company can improve its strategy so that its revenue grows. So much data is produced every day, and it is very important to deal with it. Now let me talk about this from the job point of view. If we want to make ourselves job ready and appear in interviews, whether for a data analyst, business analyst, or data scientist job, we must do an e-commerce related project. There are many e-commerce companies where hiring happens, and the e-commerce tech segment is a very large one with many openings across multiple companies. If you include such a project in your portfolio and resume, it is an added advantage for you. Today we will take the data of one such e-commerce platform and try to extract as many insights from it as possible, so that you understand both the technical side and the company's business point of view. Now, there are brands we reach online, where we do our online shopping. Suppose we go shopping on Flipkart, which is one platform. But apart from this, there are many offline stores. A lot of data is produced from these offline stores too.
For example, take Big Bazaar or Lifestyle. These brands have a digital presence, but along with it they also have offline stores where we go and shop in person. There are big franchise chains like D-Mart whose shops customers visit, and those customers generate data too, which is also important to analyze. So today we will take a data set that covers customers coming in both offline and online, and we have to take out the insights from their data in today's project. You can look at brands of any size, such as Lenskart, Sugar Cosmetics, Mamaearth, or Louis Vuitton. If I am shopping from Lenskart, I can shop on Lenskart's website and app, or I can go to any Lenskart store. So there is online data and offline data for both kinds of customers, and we will keep in mind how to deal with both in today's project. The situation here is that you are given the data of an e-commerce company. First of all, let me show you the data set. This is the type of data set we have available. In it you have the order ID, the order date, the date the shipping was done, the shipping mode, the customer ID, the customer's country, city, and state, the product ID, the quantity the customer bought, the sales amount, and the discount.
The profit is given too. Now, the most important thing for every company, no matter which one, is its sales. Sales can come through an online mode or an offline mode. What matters is analyzing those sales: how much profit the company made, whether there was a loss, and how much the company spent. Whichever company you go to, analyzing sales is very important for it. So in today's project you will learn how to work out, for any company that sells something, how much profit it is making and how much revenue is coming in each month. So here we have the e-commerce data set. We also have some questions. Looking at the records in this data set, we have to solve six to seven questions. The first question says: you need to calculate the monthly sales of the store and identify which month had the highest sales and which had the lowest. That means you have a record of every sale, and you have to analyze sales month by month and say in which month sales were highest and in which they were lowest. After that, you need to analyze sales based on product category and determine which category has the lowest sales and which has the highest. Every e-commerce company sells things in different categories. For example, furniture is one category, electronics is one, and appliances is another. There can be different categories here.
We have to analyze which category has more sales and which has less, according to the company's products. Next, the sales analysis needs to be done on the sub-category. For example, within a category like electronic appliances, mobile is one sub-category, laptop is a separate one, TV is another, and AC is another. We have to analyze sales for each sub-category as well. The fourth question says you need to analyze the monthly profit from sales and determine which month had the highest profit. We have to find that too. Along with this, we look at profit by category and sub-category. Suppose a company sells electronic goods: it is possible that it is making a loss on mobiles and a profit on laptops, in which case it should focus more on laptops. So you also have to work out in which category it is gaining and in which it is losing. You also have to understand each customer segment. Customers differ, for example some shop on a lower monthly income and some on a higher one, so there are different customer segments, and we have to analyze the sales data according to them. And finally, you have to work out the sales-to-profit ratio. Suppose your sales were ₹1 lakh but the profit margin was very low: how much profit was there, or was the company at a loss? These are the mathematical operations on sales and profit. We will cover all of this in today's project.
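The kind of calculation these questions call for can be sketched in pandas as follows. This is a hedged illustration and not the project's own code: the four rows are invented, and the column names Order Date, Category, Sales, and Profit are assumptions modeled on the fields listed in the walkthrough.

```python
import pandas as pd

# Invented rows; the column names are assumptions based on the fields described above.
df = pd.DataFrame({
    "Order Date": ["2024-01-05", "2024-01-20", "2024-02-11", "2024-03-02"],
    "Category": ["Furniture", "Technology", "Technology", "Office Supplies"],
    "Sales": [200.0, 500.0, 300.0, 100.0],
    "Profit": [20.0, 100.0, -30.0, 10.0],
})

# Parse the dates and pull out the month number for grouping.
df["Order Date"] = pd.to_datetime(df["Order Date"])
df["Order Month"] = df["Order Date"].dt.month

# Monthly sales, and the months with the highest and lowest totals.
monthly_sales = df.groupby("Order Month")["Sales"].sum()
best_month, worst_month = monthly_sales.idxmax(), monthly_sales.idxmin()

# Sales by category (largest first) and profit by month.
category_sales = df.groupby("Category")["Sales"].sum().sort_values(ascending=False)
monthly_profit = df.groupby("Order Month")["Profit"].sum()

# Overall sales-to-profit ratio: total sales divided by total profit.
sales_to_profit = df["Sales"].sum() / df["Profit"].sum()

print(monthly_sales, category_sales, monthly_profit, sales_to_profit, sep="\n")
```

The same groupby pattern extends to sub-category or customer segment by swapping the grouping column.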
We will solve these seven questions, which were given to us as an assignment, and find their answers one by one. When you go for interviews, what actually happens in the first round is just what I am describing: you are given a business case study with a data set and questions, and you get 24 to 48 hours to find the answers and mail back everything, including your code file. If the company shortlists you, you get a call for the second round. So we should practice assignments like this, especially for e-commerce. To answer all seven questions we will use different concepts: we will clean the data, do data manipulation, perform EDA, and build visualizations, and you will learn all of these in today's project. We have understood the company's case study, so now let's move forward and start the actual work of extracting this information step by step. To do all this we first need to analyze the data, and for that we will use Jupyter Notebook, which we open through Anaconda Navigator. You must have Anaconda installed. Navigator is already installed on my system, so I go to Start and search for Anaconda Navigator. As soon as it appears, you can simply open it. When it opens you see an interface something like this. Here we get an IDE named Jupyter Notebook. You can find the Launch button for Jupyter Notebook here.
As soon as you launch it, it opens automatically in whichever browser you use. For me it is already open. Next, the data set I have been given has to be uploaded first. I have created a new folder here and will upload the file inside it. Simply click Upload and select the file; the data set is named sample. Now it is uploaded.
The next step is to bring in the Python libraries we need to analyze this data. For that I go to New and pick the first option, Python 3 (ipykernel). This opens a Python .ipynb file, where we perform all the operations. The file is currently named Untitled, so click on the name and enter whatever name you want. I want to call it e-commerce project, so I have renamed the file to that.
Now, first of all, there are several things we have to do with this data set.
We have to do analysis and visualization. We were asked for monthly sales and for how much profit was made, and there are plenty of such things for which we need to make graphs so that we can build a proper report. For this we need some libraries, and we will import them one by one. First, we have to check whether the data is clean and whether there are any errors in it. For data cleaning the most important library is pandas, which handles data cleaning very well.
We also have to make many graphs for visualization. You have probably used Matplotlib and Seaborn many times, but in this project I especially want to show you a newer library, Plotly. Matplotlib and Seaborn are older libraries; Matplotlib has been in use since around 2003. Work on the Plotly library started around 2015. It covers the same ground as Matplotlib and Seaborn but is more advanced: it includes many newer features, its graphs are dynamic, and they are quick to create. A lot of people skip Plotly, but if you complete this project, your Plotly concepts will be covered too. We will use Plotly here because it is dynamic and new, and in the industry you will have to try new things as they come out, so you should know how to use it. You should not rely on just one tool. I have done other projects with Matplotlib and Seaborn, and I am doing this one with Plotly so that you can learn Plotly as well.
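The “is the data clean?” check mentioned above could look something like the following with pandas. This is a minimal sketch on an invented table: the column names and values are hypothetical, and dropping the affected rows is only one possible way to handle them, not necessarily what this project does.

```python
import pandas as pd

# Hypothetical raw data with one duplicated row and one missing value.
raw = pd.DataFrame({
    "Order ID": ["A1", "A2", "A2", "A3"],
    "Sales": [100.0, 250.0, 250.0, None],
})

# Count missing values in each column.
missing_per_column = raw.isnull().sum()

# Count rows that exactly repeat an earlier row.
duplicate_rows = int(raw.duplicated().sum())

# One simple (assumed) cleanup: drop exact duplicates, then rows with gaps.
clean = raw.drop_duplicates().dropna()

print(missing_per_column)
print("duplicate rows:", duplicate_rows)
print(clean)
```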
Let’s go ahead. First of all we import Plotly by writing import plotly.express as px. This is the library we will use for data visualization. Since I have to make a lot of graphs, I also import another module, import plotly.graph_objects as go, which helps create advanced and customized graphs. After that there is another Plotly module, import plotly.io as pio, which provides the graph templates and is very helpful for them. I am going through all of these modules so that you also build an understanding of Plotly. I also want colors in my graphs, so I write import plotly.colors as colors.
Now I want our default template theme to be white, so I use pio, which was imported above, and write pio.templates.default = "plotly_white". An invalid syntax message appeared partway through because the cell was run by mistake before the line was complete. Now let’s run all of this so that the libraries are imported. It is showing an error, so let’s check where the mistake is: here the P is capitalized.
It needs to be lowercase. That was our mistake: we had capitalized it by accident. Now that it is lowercase, the code runs properly without errors.
To recap why each one is used: pandas is for data cleaning; plotly.express is for visualizations; plotly.graph_objects, imported as go, helps with advanced and customized graphs; plotly.io, imported as pio, provides the graph templates; and plotly.colors is imported for customizing colors. The theme we want is white, so I have written plotly_white. That is everything we have done so far.
We have imported the libraries and uploaded the data; now we also have to read the data. I will create a variable named data and read the file into it with read_csv, since our data is in CSV format and its name is sample. Now let’s take a look at it.
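Reading a CSV file into a variable named data can be sketched as below. Because the full file name is not given here, this example writes its own small CSV to a temporary folder first; the file name example_orders.csv and its columns are hypothetical stand-ins for the uploaded data set.

```python
import os
import tempfile

import pandas as pd

# Create a small stand-in CSV file (hypothetical name and contents).
csv_path = os.path.join(tempfile.mkdtemp(), "example_orders.csv")
with open(csv_path, "w", encoding="utf-8") as f:
    f.write("Order ID,Sales,Profit\nA1,100.0,20.0\nA2,250.0,-15.0\n")

# Read the CSV file into a DataFrame stored in a variable named data.
data = pd.read_csv(csv_path)

# Look at the first few rows to confirm the data loaded as expected.
print(data.head())
```

In the notebook, the path passed to read_csv would instead be the name of the uploaded file in the same folder.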

By Amjad Izhar
Contact: amjad.izhar@gmail.com
https://amjadizhar.blog
