The provided text introduces fundamental concepts and practical applications of machine learning and deep learning. It explains various learning paradigms like supervised, unsupervised, and reinforcement learning, alongside common algorithms such as linear regression, decision trees, support vector machines, and clustering techniques. The material further explores neural networks, convolutional neural networks, recurrent neural networks (specifically LSTMs), and large language models, detailing their architecture, training processes, and diverse applications in areas like image recognition, natural language processing, autonomous vehicles, and healthcare. Practical code examples using Python libraries like TensorFlow and Keras illustrate the implementation of these concepts, including image classification, stock price prediction, and real-time mask detection.
Machine Learning Study Guide
Quiz
- Explain the difference between a positive and a negative relationship between variables in the context of linear relationships. Provide a brief real-world example for each.
- In linear regression, what is the significance of the mean values of X and Y (X̄ and Ȳ) in relation to the best-fit line?
- Describe the purpose of calculating entropy in the context of decision trees. What does a high or low entropy value indicate about the data?
- Explain the concept of Information Gain and its role in the construction of a decision tree. How is it used to determine the splitting of data?
- What is the fundamental goal of a Support Vector Machine (SVM) algorithm in classification? How does it aim to achieve this goal?
- Define the term “hyperplane” in the context of SVMs. Why is this concept important when dealing with data that has more than two features?
- In K-Means clustering, what are cluster centroids and how are they iteratively updated during the algorithm’s process?
- Explain the “elbow method” and how it can be used to determine the optimal number of clusters (K) in a K-Means clustering analysis.
- Describe the purpose of the sigmoid function in logistic regression. How does it transform the output of a linear equation for classification tasks?
- Explain the concept of “nearest neighbors” in the K-Nearest Neighbors (KNN) algorithm. How does the value of K influence the classification outcome?
Quiz Answer Key
- A positive relationship means that as one variable increases, the other variable also tends to increase (positive slope), such as speed and distance traveled in a fixed time. A negative relationship means that as one variable increases, the other tends to decrease (negative slope), such as speed and the time it takes to cover a constant distance.
- The linear regression model’s best-fit line should always pass through the point representing the mean value of X and the mean value of Y (X̄, Ȳ). This point serves as a central tendency around which the regression line is fitted to minimize error.
- Entropy in decision trees is a measure of randomness or impurity within a dataset. High entropy indicates a mixed or chaotic dataset with no clear class separation, while low entropy indicates a more homogeneous dataset where the classes are well-defined.
- Information Gain measures the reduction in entropy after a dataset is split based on an attribute. It guides the decision tree construction by selecting the attribute that yields the highest information gain for each split, effectively increasing the purity of the resulting subsets.
- The fundamental goal of an SVM is to find the optimal hyperplane that best separates data points belonging to different classes. It achieves this by maximizing the margin, which is the distance between the hyperplane and the nearest data points (support vectors) from each class.
- A hyperplane is a decision boundary in an N-dimensional space that separates data points into different classes. In SVMs with more than two features, the decision boundary generalizes from a line (2D) or a plane (3D) to a hyperplane, which is necessary to separate the data effectively in higher-dimensional space.
- Cluster centroids are the mean vectors of the data points within each cluster in K-Means. Initially, they can be chosen randomly or strategically. During the iterative process, each data point is assigned to the nearest centroid, and then the centroids are recalculated as the mean of all data points assigned to that cluster.
- The elbow method is a technique to find the optimal K by plotting the within-cluster sum of squares (WSS) against the number of clusters (K). The “elbow” point, where the rate of decrease in WSS slows sharply, suggests a good balance between minimizing WSS and avoiding overfitting by having too many clusters.
- The sigmoid function in logistic regression is an S-shaped curve that takes any real-valued number and maps it to a probability value between 0 and 1. This transformation allows the linear output of the regression equation to be interpreted as the probability of belonging to a particular class in a classification problem.
- In KNN, the “nearest neighbors” are the K data points in the training set that are closest to a new, unlabeled data point based on a distance metric (e.g., Euclidean distance). The value of K determines how many neighbors are considered when classifying the new point; a majority vote among these K neighbors determines the class assigned to the new data point.
Essay Format Questions
- Compare and contrast linear regression and logistic regression. Discuss the types of problems each algorithm is best suited for and explain the key differences in their approaches and outputs.
- Explain the process of building a decision tree, including the concepts of entropy and information gain. Discuss the advantages and potential limitations of using decision trees for classification.
- Describe the core principles behind the Support Vector Machine algorithm. Elaborate on the role of the hyperplane and margin, and discuss scenarios where SVMs might be a particularly effective classification technique.
- Outline the steps involved in the K-Means clustering algorithm. Discuss the importance of choosing an appropriate value for K and explain methods like the elbow method used for this purpose.
- Consider a real-world problem where multiple machine learning algorithms could be applied (e.g., predicting customer churn, classifying emails as spam). For two different algorithms discussed in the sources (e.g., decision trees and logistic regression), explain how each algorithm could be used to address the problem and discuss potential strengths and weaknesses of each approach in this context.
Glossary of Key Terms
- Positive Relationship: A relationship between two variables where an increase in one variable is associated with an increase in the other.
- Negative Relationship: A relationship between two variables where an increase in one variable is associated with a decrease in the other.
- Linear Regression: A statistical method used to model the relationship between a dependent variable and one or more independent variables by fitting a linear equation.
- Mean: The average of a set of numbers, calculated by summing all the values and dividing by the count of the values.
- Linear Regression Model: A mathematical equation (typically in the form y = mx + c for simple linear regression) that represents the best linear relationship between the independent and dependent variables.
- Slope (m): The rate of change of the dependent variable with respect to the independent variable in a linear equation. It indicates the steepness and direction of the line.
- Coefficient (c or b): The y-intercept of a linear equation, representing the value of the dependent variable when the independent variable is zero.
- Scatter Plot: A type of plot that displays pairs of values as points on a Cartesian coordinate system, used to visualize the relationship between two variables.
- Entropy: A measure of randomness or impurity in a dataset, often used in the context of decision trees.
- Information Gain: The reduction in entropy achieved by splitting a dataset on a particular attribute, used to determine the best splits in a decision tree.
- Decision Tree: A tree-like structure used for classification or regression, where each internal node represents a test on an attribute, each branch represents the outcome of the test, and each leaf node represents a class label or a predicted value.
- Support Vector Machine (SVM): A supervised learning algorithm used for classification and regression. It works by finding the hyperplane that best separates the different classes in the data.
- Hyperplane: A decision boundary in an N-dimensional space that separates data points belonging to different classes in an SVM.
- Margin: The distance between the separating hyperplane and the nearest data points (support vectors) in an SVM. The goal is to maximize this margin.
- Support Vectors: The data points that lie closest to the hyperplane and are crucial for defining the margin in an SVM.
- K-Means Clustering: An unsupervised learning algorithm that aims to partition n observations into k clusters, in which each observation belongs to the cluster with the nearest mean (cluster centroid).
- Cluster Centroid: The mean of the data points assigned to a particular cluster in K-Means.
- Elbow Method: A heuristic method used to determine the optimal number of clusters (K) in K-Means by plotting the within-cluster sum of squares (WSS) against different values of K and looking for an “elbow” in the plot.
- Logistic Regression: A statistical model that uses a sigmoid function to model the probability of a binary outcome. It is used for binary classification problems.
- Sigmoid Function: A mathematical function that produces an “S” shaped curve, often used in logistic regression to map any real value into a probability between 0 and 1.
- K-Nearest Neighbors (KNN): A supervised learning algorithm used for classification and regression. It classifies a new data point based on the majority class among its k nearest neighbors in the training data.
- Nearest Neighbors: The data points in the training set that are closest to a new, unlabeled data point based on a distance metric.
- K (in KNN): The number of nearest neighbors considered when classifying a new data point in the KNN algorithm.
Briefing Document: Review of Machine Learning Concepts and Algorithms
This briefing document summarizes the main themes and important ideas presented in the provided excerpts, covering fundamental concepts in machine learning, linear regression, decision trees, support vector machines (SVMs), K-Means clustering, logistic regression, K-Nearest Neighbors (KNN), recurrent neural networks (RNNs) and Long Short-Term Memory (LSTM) networks, convolutional neural networks (CNNs), and transfer learning.
1. Foundational Machine Learning Concepts
The sources introduce fundamental concepts like positive and negative relationships between variables, illustrated with the example of a bicyclist. A positive relationship means “as distance increases, so does speed,” while a negative relationship means “as the speed increases, time decreases.”
The importance of data in machine learning is emphasized throughout. Different algorithms require different formats and preprocessing of data to function effectively.
2. Linear Regression
Linear regression is presented as a method for finding the best-fit line through a set of data points using the formula y = mx + c. The process involves:
- Calculating the mean of the x and y values. “remember mean is basically the average.”
- Finding the slope (m) using the formula m = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², where x̄ and ȳ are the means of the x and y values.
- Calculating the y-intercept (c) by using the mean values and the calculated slope. “since we know that value we can simply plug that into our formula y = 2x + C.”
- Predicting new values using the derived regression equation.
- Evaluating the error between the predicted and actual values. “our goal is to reduce this error we want to minimize that error value on our linear regression model minimizing the distance.”
- The concept extends to multiple dimensions, where the formula gains a term for each additional feature. “this is only two dimensions, y = mx + c, but you can take that out to x, z, and all the different features in there.”
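The mean/slope/intercept procedure above can be sketched in plain Python; the toy data here is illustrative (chosen so the fit lands exactly on y = 2x + 1), not from the source.

```python
# Least-squares fit of y = m*x + c using the mean-based formulas described above.

def fit_line(xs, ys):
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    # m = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean)^2)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    m = num / den
    c = y_mean - m * x_mean  # the fitted line passes through (x_mean, y_mean)
    return m, c

m, c = fit_line([1, 2, 3, 4], [3, 5, 7, 9])  # data lies exactly on y = 2x + 1
```

Note that `c = y_mean - m * x_mean` is exactly the property quoted in the answer key: the best-fit line always passes through (x̄, ȳ).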
3. Decision Trees
Decision trees are described as “a tree-shaped algorithm used to determine a course of action.” Key concepts include:
- Splitting data based on different attributes to make decisions. Each branch represents a possible outcome.
- The challenge of determining the optimal split, especially with complex data. “how do you know what to split where do you split your data what if this is much more complicated data?”
- Entropy as “a measure of Randomness or impurity in the data set.” Lower entropy is desired.
- Information Gain as “the measure of decrease in entropy after the data set is split.” Higher information gain indicates a better split.
- The mathematical calculation of entropy from the probabilities of each outcome (e.g., playing golf or not): entropy is denoted I(p, n), where p is the probability of playing a game of golf and n is the probability of not playing.
- Building the decision tree by selecting the attribute with the highest information gain for each split. “we choose the attribute with the largest Information Gain as the root node and then continue to split each sub node with the largest Information Gain that we can compute.”
4. Support Vector Machines (SVMs)
SVMs are introduced as a “widely used classification algorithm” that “creates a separation line which divides the classes in the best possible manner.” Key ideas include:
- Finding the optimal hyperplane that maximizes the margin between different classes. “The goal is to choose a hyperplane…with the greatest possible margin between the decision line and the nearest point within the training set.”
- Support vectors as the data points closest to the hyperplane, which influence its position and orientation.
- The concept of a hyperplane extending to multiple dimensions when dealing with more than two features. “One of the reasons we call it a hyperplane versus a line is that a lot of times we’re not looking at just weight and height we might be looking at 36 different features or dimensions.”
- A practical example of classifying muffin and cupcake recipes based on ingredients using Python’s sklearn library. This demonstrates data loading, visualization using seaborn and matplotlib, data preprocessing (creating labels and features), model training using svm.SVC with a linear kernel, and visualizing the decision boundary and support vectors.
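A minimal sketch of that workflow with scikit-learn’s `svm.SVC`, in the spirit of the muffin-vs-cupcake example; the toy ingredient proportions below are made up for illustration, not the source’s dataset.

```python
# Linear-kernel SVM on a tiny two-feature dataset (assumed values).
from sklearn import svm

# Features: [flour %, sugar %]; labels: 0 = muffin, 1 = cupcake.
X = [[55, 10], [50, 12], [52, 11], [35, 30], [38, 28], [40, 25]]
y = [0, 0, 0, 1, 1, 1]

model = svm.SVC(kernel="linear")
model.fit(X, y)

predictions = model.predict([[54, 11], [36, 29]])  # one recipe of each kind
print(predictions)
print(model.support_vectors_)  # the points closest to the hyperplane
```

`model.support_vectors_` exposes exactly the support vectors discussed above; with a linear kernel, `model.coef_` and `model.intercept_` describe the separating hyperplane itself.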
5. K-Means Clustering
K-Means clustering is presented as an unsupervised learning algorithm for grouping data points into clusters based on their similarity. Key steps include:
- Selecting initial cluster centroids, either randomly or by choosing the farthest apart points.
- Assigning each data point to the closest cluster based on the distance to the centroids (often Euclidean distance).
- Recalculating the centroids of each cluster as the mean of the points assigned to it.
- Repeating the assignment and centroid recalculation until the cluster assignments no longer change (convergence).
- The elbow method is introduced as a way to determine the optimal number of clusters (K) by plotting the within-cluster sum of squares (WSS) against the number of clusters and looking for an “elbow” in the graph.
- A use case of clustering cars into brands based on features like horsepower and cubic inches is mentioned, using Python with libraries like numpy, pandas, and matplotlib.
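The assign-then-recompute loop above can be sketched in plain Python; the 2-D points and k=2 below are toy values chosen to form two obvious groups.

```python
import math
import random

def kmeans(points, k, iters=100, seed=0):
    """Plain-Python K-Means: assign each point to its nearest centroid,
    move each centroid to the mean of its cluster, repeat until the
    assignments stop changing (convergence)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initial centroids
    assignment = None
    for _ in range(iters):
        new_assignment = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:  # no point changed cluster
            break
        assignment = new_assignment
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # recompute centroid as the cluster mean
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignment

pts = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
centroids, labels = kmeans(pts, k=2)
```

Running this for several values of k and plotting the resulting WSS is exactly the input the elbow method needs.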
6. Logistic Regression
Logistic regression is described as “the simplest classification algorithm used for binary or multi classification problems.” It differs from linear regression by predicting categorical outcomes using the sigmoid function. Key concepts include:
- The sigmoid function (P = 1 / (1 + e^-y)) which transforms the linear regression output into a probability between 0 and 1, generating an “S-shaped” curve.
- The logarithmic (log-odds) transformation of the sigmoid: ln(p / (1 − p)) = mx + c.
- A threshold value (typically 0.5) to classify the outcome. Probabilities above the threshold are rounded to 1 (e.g., pass, malignant), and those below are rounded to 0 (e.g., fail, benign).
- A use case of classifying tumors as malignant or benign using a dataset with multiple features and Python’s pandas, seaborn, and matplotlib libraries. The process includes data loading, exploration, preprocessing, model building using sklearn.linear_model.LogisticRegression, training, and evaluation.
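The sigmoid-plus-threshold mechanics above can be sketched directly; the slope m and intercept c below are made-up illustrative values, not fitted coefficients from the tumor dataset.

```python
import math

def sigmoid(y):
    """Map the linear output y = m*x + c to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-y))

def classify(x, m, c, threshold=0.5):
    """Round the probability against the threshold: 1 (e.g. malignant)
    at or above it, 0 (e.g. benign) below it."""
    p = sigmoid(m * x + c)
    return (1 if p >= threshold else 0), p

# Illustrative coefficients: linear output is 1.5 * 4.0 - 3.0 = 3.0.
label, p = classify(x=4.0, m=1.5, c=-3.0)
```

In practice `sklearn.linear_model.LogisticRegression` learns m and c from the training data and applies the same transformation internally.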
7. K-Nearest Neighbors (KNN)
KNN is presented as a simple classification algorithm that classifies a new data point based on the majority class of its K nearest neighbors in the feature space. Key aspects include:
- Choosing a value for K, the number of neighbors to consider.
- Calculating the distance (e.g., Euclidean distance) between the new data point and all existing data points: d = √((x − a)² + (y − b)²).
- Selecting the K nearest neighbors based on the calculated distances.
- Assigning the new data point to the majority class among its K nearest neighbors. “majority of neighbors are pointing towards normal.”
- A use case of predicting diabetes using a dataset and Python’s pandas and sklearn libraries. The process involves data loading, preprocessing (handling missing values by replacing with the mean), splitting data into training and testing sets, scaling features using StandardScaler, training a KNeighborsClassifier, making predictions, and evaluating the model using metrics like the confusion matrix, F1 score, and accuracy.
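The distance-then-majority-vote procedure above can be sketched with the standard library; the training points and “normal”/“abnormal” labels are toy values for illustration.

```python
import math
from collections import Counter

def knn_predict(train, new_point, k=3):
    """Classify new_point by majority vote among its k nearest neighbors.
    train is a list of ((x, y), label) pairs; distance is Euclidean:
    d = sqrt((x - a)^2 + (y - b)^2)."""
    neighbors = sorted(train, key=lambda item: math.dist(item[0], new_point))
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]  # most frequent label wins

train = [((1, 1), "normal"), ((1, 2), "normal"), ((2, 1), "normal"),
         ((8, 8), "abnormal"), ((8, 9), "abnormal")]
result = knn_predict(train, (2, 2), k=3)
```

An odd k (as here) avoids ties in the binary case; `sklearn.neighbors.KNeighborsClassifier` implements the same idea with distance scaling and efficient neighbor search.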
8. Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) Networks
RNNs are introduced as neural networks designed to handle sequential data. Key characteristics include:
- Recurrent connections that allow information to persist across time steps. “RNNs are distinguished by their feedback loops.”
- The challenge of vanishing and exploding gradients in standard RNNs, making it difficult to learn long-range dependencies.
- LSTMs are presented as a type of RNN that addresses the vanishing gradient problem. “LSTMs are a special kind of RNN, capable of learning long-term dependencies.”
- LSTM architecture involves forget gates, input gates, and output gates to control the flow of information through the cell state.
- Forget gate (f_t): decides which information from the previous time step is unimportant and should be discarded from the cell state.
- Input gate (i_t): determines which new information to let through, based on its significance in the current time step.
- Output gate (o_t): controls how the accumulated information in the cell state impacts the output at the current time step.
- A use case of predicting stock prices using an LSTM network and Python’s Keras library (running on TensorFlow). The process includes data loading, feature scaling (MinMaxScaler), creating time series data with specified time steps, reshaping data for the LSTM layer, building a sequential LSTM model with dropout regularization, compiling the model, training it on the historical stock prices, and making predictions for future prices.
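The time-series windowing step described above (turning a price series into samples with a fixed number of time steps) can be sketched without Keras; the toy prices and `time_steps=3` are illustrative values.

```python
def make_windows(series, time_steps):
    """Slice a 1-D price series into (window, next_value) training pairs:
    each sample is `time_steps` consecutive values, and its target is the
    value that immediately follows."""
    X, y = [], []
    for i in range(len(series) - time_steps):
        X.append(series[i:i + time_steps])
        y.append(series[i + time_steps])
    return X, y

prices = [10, 11, 12, 13, 14, 15]
X, y = make_windows(prices, time_steps=3)
# e.g. the first sample is [10, 11, 12] with target 13.
```

In the Keras workflow, X would then be reshaped to (samples, time_steps, features) before being fed to the LSTM layer, after MinMaxScaler has scaled the raw prices.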
9. Convolutional Neural Networks (CNNs)
CNNs are introduced as a powerful type of neural network particularly effective for image recognition. Key components and concepts include:
- Convolutional layers that use filters (kernels) to extract features from the input image. “The basic building block of a CNN is the convolutional layer.”
- Pooling layers that reduce the spatial dimensions of the feature maps, making the network more robust to variations in the input. “The pooling layer’s function is to progressively reduce the spatial size of the representation.”
- Activation functions (e.g., ReLU) applied to the output of convolutional layers.
- Flattening the feature maps before feeding them into fully connected layers for classification.
- The success of CNNs in tasks like image classification, object detection, and image segmentation.
- A use case of building a CNN to classify images from the CIFAR-10 dataset (10 classes of objects) using Python’s TensorFlow and Keras libraries. The process involves loading the dataset, preprocessing (normalizing pixel values and one-hot encoding labels), building a CNN model with convolutional layers, pooling layers, dropout, flattening, and dense layers, compiling the model with an optimizer and loss function, and training it on the CIFAR-10 training data. Helper functions for one-hot encoding and setting up images are also described.
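The one-hot encoding helper mentioned above can be sketched in plain Python; Keras provides `to_categorical` for the same purpose, but the idea is simply an indicator vector per label (CIFAR-10 has 10 classes).

```python
def one_hot(labels, num_classes=10):
    """One-hot encode integer class labels: each label becomes a vector of
    length num_classes with a 1 at the label's index and 0s elsewhere."""
    return [[1 if i == label else 0 for i in range(num_classes)]
            for label in labels]

encoded = one_hot([0, 3, 9], num_classes=10)
```

This pairing of one-hot labels with a softmax output layer is what lets the categorical cross-entropy loss compare the network’s class probabilities against the true class.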
10. Transfer Learning
Transfer learning is presented as a technique to improve the performance of a model on a new, smaller dataset by leveraging knowledge learned from a pre-trained model on a large, related dataset. Key ideas include:
- Using a pre-trained base model (e.g., a CNN trained on ImageNet) as a feature extractor.
- Freezing the weights of the pre-trained layers to prevent them from being updated during the initial training on the new dataset. “Loop over all the layers in the base model and freeze them so they will not be updated during the first training process.”
- Adding a new classification head (e.g., dense layers) specific to the new task.
- Training only the weights of the new head on the smaller dataset.
- Optionally, unfreezing some of the later layers of the base model for fine-tuning after the head has been trained.
- A use case of using a pre-trained ResNet50 model (available in TensorFlow.Keras.applications) for a mask detection task. The process involves loading the pre-trained base model, freezing its layers, adding a custom classification head, compiling the model, training it on a dataset of images with and without masks (using data augmentation to increase the training data), evaluating the model’s performance (precision, recall, F1-score, accuracy), and saving the trained model.
11. Ethical Considerations
The example of classifying tumors (malignant or benign) with logistic regression briefly touches upon ethical considerations in the medical domain. Even with high probability predictions, the user would likely seek professional medical confirmation (“I’m guessing that you’re going to go get it tested anyways”). This highlights the importance of understanding the context and limitations of machine learning models, especially in high-stakes applications.
Overall, the provided excerpts offer a foundational overview of several key machine learning algorithms and concepts, illustrated with practical examples and code snippets using popular Python libraries. They emphasize the importance of data preprocessing, model selection, training, and evaluation in building effective machine learning solutions for various types of problems.
Machine Learning Algorithms: Core Concepts Explained
Frequently Asked Questions about Machine Learning Algorithms
1. What is the fundamental idea behind linear regression? Linear regression aims to model the relationship between a dependent variable (the one we want to predict) and one or more independent variables (the features we use for prediction) by fitting a linear equation (a straight line in two dimensions, or a hyperplane in higher dimensions) to the observed data. The goal is to find the line that best represents the trend in the data, allowing us to predict the dependent variable for new values of the independent variables.
2. How do decision trees work for classification? Decision trees are tree-like structures where each internal node represents a test on an attribute, each branch represents the outcome of the test, and each leaf node represents a class label (the prediction). To classify a new instance, we start at the root node and follow the branches corresponding to the outcomes of the tests at each node until we reach a leaf node, which provides the classification. The tree is built by recursively splitting the data based on the attribute that provides the most information gain (or the largest reduction in entropy), aiming to create subsets that are increasingly pure with respect to the target class.
3. What is the core principle of the Support Vector Machine (SVM) algorithm for classification? The primary goal of an SVM is to find the optimal hyperplane that best separates data points belonging to different classes in a dataset. This “best” hyperplane is the one that maximizes the margin, which is the distance between the hyperplane and the nearest data points from each class (called support vectors). By maximizing this margin, the SVM aims to create a decision boundary that generalizes well to unseen data, reducing the risk of misclassification.
4. Can you explain the concepts of entropy and information gain in the context of decision trees? Entropy is a measure of the impurity or randomness within a dataset. A dataset with a mix of different classes has high entropy, while a dataset with only one class has low (ideally zero) entropy. Information gain is the reduction in entropy achieved after splitting the dataset on a particular attribute. When building a decision tree, the attribute with the highest information gain is chosen as the splitting criterion at each node, because it leads to the most significant decrease in impurity in the resulting subsets.
5. How does the K-Means clustering algorithm group data points? K-Means clustering is an iterative algorithm that aims to partition a dataset into K distinct, non-overlapping clusters. It starts by randomly initializing K centroids (representing the center of each cluster). Then, it repeatedly performs two steps: (1) assigning each data point to the cluster whose centroid is nearest (using a distance metric like Euclidean distance), and (2) recalculating the centroids of each cluster as the mean of all the data points assigned to that cluster. This process continues until the centroids no longer move significantly, indicating that the clusters have stabilized. The “elbow method” can be used to help determine an appropriate value for K.
6. What is the role of the sigmoid function in logistic regression? In logistic regression, the sigmoid function (also known as the logistic function) is used to transform the linear combination of input features into a probability between 0 and 1. While linear regression can produce continuous output values, logistic regression is used for classification tasks where we need to predict the probability of an instance belonging to a particular class. The sigmoid function maps any real-valued number to a value between 0 and 1, which can be interpreted as the probability of the event occurring. A threshold (often 0.5) is then used to classify the instance into one of the two classes.
7. How do Recurrent Neural Networks (RNNs) handle sequential data differently from standard feedforward networks? Standard feedforward neural networks process each input independently, without memory of past inputs in a sequence. RNNs, on the other hand, are designed to process sequences of data by maintaining an internal state (or memory) that is updated as each element of the sequence is processed. This allows RNNs to capture dependencies and patterns across time steps in the input sequence. They achieve this through recurrent connections, where the output of a neuron at one time step can be fed back as input to the neuron (or other neurons in the network) at the next time step.
8. What are Long Short-Term Memory (LSTM) networks, and what problem do they address in RNNs? Long Short-Term Memory (LSTM) networks are a specific type of RNN architecture that is designed to address the vanishing gradient problem, which can make it difficult for standard RNNs to learn long-range dependencies in sequential data. LSTMs introduce a more complex memory cell with mechanisms called “gates” (input gate, forget gate, and output gate) that control the flow of information into, out of, and within the cell state. These gates allow LSTMs to selectively remember relevant information over long sequences and forget irrelevant information, enabling them to learn complex patterns in tasks like natural language processing and time series analysis where long-term context is important.
Supervised Learning: Concepts and Applications
Supervised learning is a method used to enable machines to classify or predict objects, problems, or situations based on labeled data that is fed to the machine. In supervised learning, you already know the answer for a lot of the information coming in.
Here’s a breakdown of key aspects of supervised learning based on the sources:
- Labeled Data: Supervised learning relies on labeled data for training the machine learning model. This means that for each input data point, there is a corresponding correct output or target variable provided.
- Direct Feedback: During the training process, the model receives direct feedback based on the labeled data. This feedback helps the model learn the relationship between the inputs and the correct outputs.
- Prediction of Outcomes: The goal of supervised learning is to train a model that can predict the outcome for new, unseen data based on the patterns it learned from the labeled training data.
- Examples: The sources provide several examples of tasks that can be addressed using supervised learning:
- Predicting whether someone will default on a loan.
- Predicting whether you will make money on the stock market.
- Classification, where you want to predict a category, such as whether a stock price will increase or decrease (a yes/no answer or a 0/1 outcome).
- Regression, where you want to predict a quantity, such as predicting the age of a person based on height, weight, health, and other factors.
- Building a classifier using Support Vector Machines (SVM) to classify if a recipe is for a cupcake or a muffin.
- Classifying a tumor as malignant or benign based on features, which can be done using logistic regression.
Comparison with Unsupervised Learning:
The sources explicitly contrast supervised learning with unsupervised learning:
- In supervised learning, the data is labeled, and there is direct feedback to the model. The aim is to predict a specific outcome.
- In unsupervised learning, the data is unlabeled, and there is no feedback provided during training. The goal is to find hidden structures in the data and group the data together to discover relationships.
The sources also suggest that supervised and unsupervised learning can be used together. For instance, you might use unsupervised learning to find connected patterns in unlabeled image data, and then label those groups. This labeled data can then be used to train a supervised learning model to predict what’s in future images.
In summary, supervised learning is a powerful approach in machine learning that leverages labeled data to train models for prediction and classification tasks, relying on direct feedback to learn the underlying relationships within the data.
Understanding Unsupervised Learning: Concepts and Techniques
Unsupervised learning is a type of machine learning where a model is trained on unlabeled data to find hidden patterns and structure within the data. Unlike supervised learning, there are no target variables or correct answers provided during the training process, and the model does not receive direct feedback on its predictions. The goal is to discover inherent relationships, similarities, and groupings in the data without prior knowledge of what these might be.
Here’s a breakdown of key aspects of unsupervised learning based on the sources:
- Unlabeled Data: Unsupervised learning algorithms work with datasets that do not have predefined labels or categories. The algorithm must learn the underlying structure of the data on its own.
- Finding Hidden Patterns: The primary objective of unsupervised learning is to identify hidden patterns, structures, or relationships that might not be immediately obvious in the unlabeled data.
- No Direct Feedback: Since the data is unlabeled, there is no feedback mechanism that tells the model whether its findings are correct or incorrect. The evaluation of unsupervised learning models often relies on subjective interpretation of the discovered patterns or on downstream tasks that utilize the discovered structures.
- Clustering: One of the main applications of unsupervised learning is clustering, which involves grouping data points into clusters based on their feature similarity. The aim is to create groups where data points within a cluster are more similar to each other than to those in other clusters.
- K-means clustering is highlighted as a commonly used clustering tool and an example of unsupervised learning. It works by defining a specified number (K) of clusters and assigning random centroids. It then iteratively computes the distance of data points to these centroids, forms new clusters based on minimum distances, and recalculates the centroids until the cluster centroids stop changing.
- Hierarchical clustering is another clustering algorithm that creates a tree-like structure (dendrogram) by either agglomerating similar data points from the bottom up or dividing them from the top down.
- Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is a density-based algorithm that identifies clusters based on the density of data points and can also handle outliers by labeling them as noise.
- Dimensionality Reduction: Unsupervised learning can also be used for dimensionality reduction, which aims to reduce the number of variables in a dataset while retaining the most important information.
- Principal Component Analysis (PCA) is mentioned as a dimensionality reduction technique that transforms data into a smaller set of uncorrelated variables (principal components) to capture the most variance in the data.
- Autoencoders, a type of neural network, can also be used for dimensionality reduction by learning efficient representations of data.
- Anomaly Detection: Unsupervised learning techniques can be employed to detect anomalies or unusual data points that deviate significantly from the normal patterns in the data.
- Association Rule Mining: While not detailed extensively, the sources mention association algorithms as another type of unsupervised learning problem, focusing on discovering relationships or associations between variables in large datasets.
- Deep Learning: Unsupervised learning principles are also applied in deep learning using algorithms like autoencoders and generative models for tasks such as clustering, dimensionality reduction, and anomaly detection.
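The iterative K-means procedure described above (random initial centroids, distance-based assignment, centroid recomputation until the centroids stop moving) can be sketched in a few lines of NumPy. The two synthetic blobs and the empty-cluster guard are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two obvious 2-D blobs of unlabeled points (synthetic data).
data = np.vstack([
    rng.normal(loc=[0, 0], scale=0.5, size=(50, 2)),
    rng.normal(loc=[5, 5], scale=0.5, size=(50, 2)),
])

def kmeans(points, k, iters=100):
    # 1. Pick k random data points as the initial centroids.
    centroids = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        # 2. Assign every point to its nearest centroid.
        dists = np.linalg.norm(points[:, None] - centroids[None, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3. Recompute each centroid as the mean of its cluster
        #    (keeping the old centroid if a cluster goes empty).
        new_centroids = centroids.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members):
                new_centroids[j] = members.mean(axis=0)
        # 4. Stop once the centroids no longer change.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

labels, centroids = kmeans(data, k=2)
```

Note there is no feedback signal anywhere: the grouping emerges purely from feature similarity, which is the defining property of unsupervised learning.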
Relationship with Supervised Learning:
As mentioned in our previous discussion, supervised learning uses labeled data for prediction. The sources highlight that unsupervised learning is used when the data is unlabeled and the goal is to discover inherent structure. However, the sources also note that these two approaches can be complementary. For example, unsupervised learning can be used to preprocess data or discover initial groupings, which can then inform the labeling process for subsequent supervised learning tasks.
In summary, unsupervised learning is a valuable set of techniques for exploring and understanding unlabeled data by identifying hidden patterns, groupings, and reductions in dimensionality, providing insights without relying on prior knowledge of the data’s categories or outcomes.
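As a sketch of the dimensionality-reduction idea above, PCA can be computed from the singular value decomposition of the centred data; the synthetic 3-D dataset below (which mostly varies along one direction) is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# 200 points in 3-D whose variance is dominated by one latent factor t.
t = rng.normal(size=200)
X = np.column_stack([
    t,
    2 * t + rng.normal(scale=0.1, size=200),
    -t + rng.normal(scale=0.1, size=200),
])

# Centre the data, then take the top right singular vector(s).
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 1                              # keep only the first principal component
X_reduced = Xc @ Vt[:k].T          # 200 x 1 coordinates on PC1

# Fraction of total variance captured by the first component.
explained = S[0] ** 2 / np.sum(S ** 2)
```

Projecting onto the leading principal components keeps most of the variance while discarding redundant dimensions, which is the "retain the most important information" goal described above.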
Reinforcement Learning: Agent-Environment Interaction and Reward Maximization
Reinforcement learning is an important type of machine learning where an agent learns how to behave in an environment by performing actions and seeing the result. This learning process aims to enable the agent to maximize a reward signal over time.
Here’s a breakdown of key aspects of reinforcement learning based on the sources:
- Agent and Environment: In reinforcement learning, there is an agent that interacts with an environment. The agent is the learner that takes actions. The environment is the setting in which the agent operates and to which it responds.
- Actions and Results: The agent learns by taking actions within the environment. After each action, the agent receives feedback in the form of a new state of the environment and a reward (or punishment).
- Learning by Trial and Error: Similar to how humans learn from experience, reinforcement learning involves a process of trial and error. The agent explores different actions and learns which actions lead to positive rewards and which lead to negative rewards.
- Maximizing Rewards: The ultimate goal of the agent is to learn a policy – a mapping from states to actions – that maximizes the cumulative reward it receives over time.
- Examples: The sources provide an intuitive example of a baby learning not to touch fire after experiencing the pain of being burned. This illustrates the concept of learning through actions and their consequences. Other examples of tasks where reinforcement learning is used include:
- Robotics
- Game playing, using algorithms like Deep Q Networks
- Optimizing shipping routes for a logistics company by considering fuel prices, traffic, and weather (mentioned in the context of “agentic AI”, which builds upon reinforcement learning principles).
- Relation to Other Machine Learning Types: The sources classify reinforcement learning as one of the basic divisions of machine learning, alongside supervised and unsupervised learning. Deep learning AI can also be applied using reinforcement learning methods.
- Current State and Future Potential: The sources describe reinforcement learning as being in its “infant stages” but also highlight it as having potentially the “biggest machine learning demand out there right now or in the future”. This suggests that while it’s a developing field, it holds significant promise for creating intelligent systems.
In essence, reinforcement learning focuses on training agents to make optimal decisions in dynamic environments by learning from the consequences of their actions, aiming to achieve long-term goals through the accumulation of rewards.
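A minimal sketch of the agent-environment loop: tabular Q-learning in a tiny corridor world, where the agent discovers by trial and error that moving right eventually reaches a rewarding goal state. The environment, hyperparameters, and per-episode step cap are illustrative assumptions, not taken from the sources.

```python
import numpy as np

rng = np.random.default_rng(0)

N_STATES, ACTIONS = 6, 2           # corridor states 0..5; actions: 0 = left, 1 = right
GOAL = 5                           # reaching state 5 yields a reward of 1
alpha, gamma, eps = 0.1, 0.9, 0.5  # learning rate, discount, exploration rate

Q = np.zeros((N_STATES, ACTIONS))  # expected future reward per (state, action)

for _ in range(2000):              # episodes of trial and error
    s, steps = 0, 0
    while s != GOAL and steps < 200:
        # Explore sometimes; otherwise exploit the best known action.
        a = int(rng.integers(ACTIONS)) if rng.random() < eps else int(Q[s].argmax())
        s_next = max(0, s - 1) if a == 0 else s + 1
        r = 1.0 if s_next == GOAL else 0.0
        # Q-learning update: nudge Q toward reward + discounted future value.
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next
        steps += 1

policy = Q.argmax(axis=1)          # greedy action per state (1 = right)
print(policy[:GOAL])
```

After training, the greedy policy maps every non-terminal state to "right": the agent has learned a policy that maximizes cumulative reward, exactly as described above.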
Understanding Neural Networks: Foundations and Applications
Neural networks are a fundamental component of deep learning and are inspired by the structure and function of the human brain. They consist of interconnected layers of artificial neurons (or units) that work together to process information.
Here’s a detailed discussion of neural networks based on the sources:
- Biological Inspiration: Artificial neural networks (ANNs) are biologically inspired by the animal brain and its interconnected neurons. They aim to simulate the human brain using artificial neurons. A biological neuron receives inputs through dendrites, processes them in the cell nucleus, and sends output through a synapse. An artificial neuron has analogous components: inputs, a processing unit involving weights and biases, and an output.
- Perceptron: The Basic Unit: A perceptron is one of the fundamental units of neural networks: it consists of a single neuron and can function as a basic binary classifier. A basic perceptron receives inputs, multiplies each input by a weight, adds a bias, and then passes the result through an activation function to produce an output (e.g., 0 or 1, indicating whether the neuron is “activated” or not).
- Structure of Neural Networks:
- A fully connected artificial neural network typically includes an input layer, one or more hidden layers, and an output layer.
- The input layer receives data from external sources.
- Each neuron in the hidden layers computes a weighted sum of its inputs (from the previous layer) and applies an activation function to the result before passing it to the next layer.
- The output layer produces the network’s response.
- Weights are associated with the connections between neurons, and these weights are adjusted during training to optimize the network’s performance.
- A bias is added to the weighted sum in each neuron. Unlike weights (which are per input), there is one bias per neuron, and its value is also adjusted during training.
- Activation functions in each neuron decide whether a neuron should be “fired” or not, determining the output (e.g., zero or one) based on the weighted sum of inputs plus the bias. Common activation functions mentioned include ReLU and Sigmoid.
- Training Process:
- The training process involves feeding labeled data (input and expected output) into the network.
- The network makes a prediction, which is compared to the actual (labeled) output.
- The difference between the predicted and actual output is the error, which is measured by a cost function.
- This error is then fed back through the network in a process called backpropagation, which helps in adjusting the weights and biases of the neurons.
- The goal of training is to minimize the cost function, and an optimization technique called gradient descent is commonly used for this purpose by iteratively adjusting weights and biases. The learning rate in gradient descent determines the step size for these adjustments.
- This is an iterative process that continues until the error is minimized to a satisfactory level or a specified number of iterations (epochs) is reached.
- Logical Functions: Early research showed that single-layer perceptrons could implement basic logical functions like AND and OR by adjusting the weights and biases. However, implementing the XOR gate required a multilayer perceptron (MLP) with at least one hidden layer, which overcame an early roadblock in neural network development.
- Types of Neural Networks: The sources describe several common architectures in deep learning:
- Feedforward Neural Networks (FNN): The simplest type, where information flows linearly from input to output. They are used for tasks like image classification, speech recognition, and Natural Language Processing (NLP). Sequential models in Keras are an example of this, where layers are stacked linearly.
- Convolutional Neural Networks (CNN): Designed specifically for image and video recognition. They automatically learn features from images through convolutional operations, making them ideal for image classification, object detection, and image segmentation. CNNs involve layers like convolutional layers, ReLU layers, and pooling (reduction) layers.
- Recurrent Neural Networks (RNN): Specialized for processing sequential data, time series, and natural language. They maintain an internal state to capture information from previous inputs, making them suitable for tasks like speech recognition, NLP, and language translation. Long Short-Term Memory (LSTM) networks are a type of RNN.
- Deep Neural Networks (DNN): Neural networks with multiple layers of interconnected nodes (including multiple hidden layers) that enable the automatic discovery of complex representations from raw data. CNNs and RNNs with multiple layers are considered DNNs.
- Deep Belief Networks (DBN): Mentioned as one of the types of neural networks.
- Autoencoders: A type of neural network used for learning efficient data representations, typically for dimensionality reduction or anomaly detection.
- Applications of Deep Learning and Neural Networks: Deep learning, powered by neural networks, has numerous applications across various domains:
- Autonomous Vehicles: CNNs process data from sensors and cameras for object detection, traffic sign recognition, and driving decisions.
- Healthcare Diagnostics: Analyzing medical images (X-rays, MRIs, CT scans) for early disease detection.
- Natural Language Processing (NLP): Enabling sophisticated text generation, translation, and sentiment analysis (e.g., Transformer models like ChatGPT).
- Deepfake Technology: Creating realistic synthetic media, raising ethical concerns.
- Predictive Maintenance: Analyzing sensor data to predict equipment failures in industries.
- Gaming: AI systems like AlphaGo that can defeat human world champions.
- Synthesizing Images, Music, and Text: Generative Adversarial Networks (GANs) can be used for this.
- Robotics: Enabling human-like capabilities in robots.
- Speech Recognition: Converting audio into text.
- Image Captioning: Analyzing images and generating descriptive captions using RNNs.
- Time Series Prediction: Using RNNs to predict future values based on sequential data, such as stock prices.
- Sentiment Analysis: Determining the emotional tone of text using RNNs.
- Machine Translation: Translating text between different languages using RNNs.
- Fraud Detection: Identifying unusual financial transactions using autoencoders.
- Recommendation Systems: Providing personalized content recommendations.
- Image Enhancement: Features like in-painting and out-painting in tools like Stable Diffusion.
- Face Mask Detection: Building models to check if a person is wearing a mask.
- Relationship with Deep Learning, Machine Learning, and AI:
- Deep learning is a subset of machine learning, which in turn is a branch of artificial intelligence.
- Neural networks, particularly deep neural networks with multiple layers, are the main component of deep learning.
- Unlike traditional machine learning, deep learning models can automatically discover representations (features) from raw data, eliminating the need for manual feature extraction.
- Tools and Platforms:
- TensorFlow is highlighted as a popular open-source platform developed and maintained by Google for developing deep learning applications using neural networks. It supports both CPUs and GPUs for computation and uses tensors (multi-dimensional arrays) and graphs to represent and execute computations.
- Keras is presented as a high-level API that can run on top of TensorFlow (and other backends), making it straightforward to build neural network models, including sequential and functional models. Keras simplifies the process of defining layers (like dense, activation, dropout), compiling the model with optimizers and loss functions, and training it on data.
In summary, neural networks are powerful computational models inspired by the human brain, forming the core of deep learning. They learn complex patterns from data through interconnected layers of neurons with adjustable weights and biases, trained using techniques like backpropagation and gradient descent. With various architectures tailored for different types of data, neural networks have enabled significant advancements across a wide range of applications in artificial intelligence.
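The logical-function result mentioned above can be sketched directly: a single step-activation neuron with hand-picked weights implements AND and OR, while XOR requires composing neurons in two layers. The specific weight and bias values below are illustrative choices (training would find similar ones), not the only possibility.

```python
# A single artificial neuron: weighted sum of inputs, plus a bias,
# passed through a step activation ("fire" or not).
def perceptron(inputs, weights, bias):
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0

# Hand-picked parameters implementing AND, OR, and NAND.
AND  = lambda a, b: perceptron([a, b], weights=[1, 1],   bias=-1.5)
OR   = lambda a, b: perceptron([a, b], weights=[1, 1],   bias=-0.5)
NAND = lambda a, b: perceptron([a, b], weights=[-1, -1], bias=1.5)

# XOR is not linearly separable, so no single neuron computes it;
# a two-layer composition does: XOR(a, b) = AND(OR(a, b), NAND(a, b)).
XOR = lambda a, b: AND(OR(a, b), NAND(a, b))
```

This is exactly the early roadblock in neural network history: the XOR function forced the move from single perceptrons to multilayer networks with hidden layers.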
Deep Learning: Foundations, Methods, and Applications
Deep learning is presented in the sources as a subset of machine learning, which itself is a branch of artificial intelligence. It is defined as a type of machine learning that imitates how humans gain certain types of knowledge. Unlike traditional machine learning models that require manual feature extraction, deep learning models automatically discover representations from raw data. This capability is primarily achieved through the use of neural networks, particularly deep neural networks that consist of multiple layers of interconnected nodes.
Here’s a more detailed discussion of deep learning based on the sources:
- Core Component: Neural Networks: Neural networks are the main component of deep learning. These networks are inspired by the structure and function of the human brain, consisting of interconnected layers of artificial neurons. Deep learning utilizes deep neural networks, meaning networks with multiple hidden layers. These layers enable the network to transform input data into increasingly abstract and composite representations. For instance, in image recognition, initial layers might detect simple features like edges, while deeper layers recognize more complex structures like shapes and objects.
- Types of Deep Learning: Deep learning AI can be applied using supervised, unsupervised, and reinforcement machine learning methods.
- Supervised learning in deep learning involves training neural networks to make predictions or classify data using labeled datasets. The network learns by minimizing the error between its predictions and the actual targets through a process called backpropagation. Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) are common deep learning algorithms used for tasks like image classification, sentiment analysis, and language translation.
- Unsupervised learning in deep learning involves neural networks discovering patterns or clusters in unlabeled datasets without target variables. Algorithms like Autoencoders and generative models are used for tasks such as clustering, dimensionality reduction, and anomaly detection.
- Reinforcement learning in deep learning (Deep Reinforcement Learning) involves an agent learning to make decisions in an environment to maximize a reward signal over time. Algorithms like Deep Q-Networks are used for tasks such as robotics and gameplay.
- Training Deep Learning Models: Training deep learning models often requires significant data and computational resources. The process typically involves:
- Data Pre-processing: Transforming textual data into a numerical representation (tokenization, encoding). Applying techniques like scaling, normalization, and encoding to make data more usable.
- Random Parameter Initialization: Initializing the model’s parameters randomly before training.
- Feeding Numerical Data: Inputting the numerical representation of the text data into the model.
- Loss Function Calculation: Measuring the discrepancy between the model’s predictions and the actual targets using a loss function.
- Parameter Optimization: Adjusting the model’s parameters (weights and biases) through optimization techniques like gradient descent to minimize the loss.
- Iterative Training: Repeating the training process over multiple iterations (epochs) until the model achieves satisfactory accuracy.
- Advantages of Deep Learning:
- High Accuracy: Achieves state-of-the-art performance in tasks like image recognition and natural language processing.
- Automated Feature Engineering: Automatically discovers and learns relevant features from data without manual intervention.
- Scalability: Can handle large and complex datasets and learn from massive amounts of data.
- Efficiency: Makes it quicker and simpler for data scientists to gather, analyze, and interpret massive amounts of data.
- Disadvantages of Deep Learning:
- High Computational Requirements: Requires significant data and computational resources (like GPUs) for training.
- Need for Large Labeled Datasets: Often requires extensive labeled data for supervised learning, which can be costly and time-consuming to obtain.
- Overfitting: Can overfit to the training data, leading to poor performance on new, unseen data.
- Applications of Deep Learning: Deep learning is revolutionizing various industries and has a wide range of applications:
- Autonomous Vehicles: Object detection, traffic sign recognition.
- Healthcare Diagnostics: Medical image analysis, early disease detection.
- Natural Language Processing (NLP): Text generation, translation, sentiment analysis, chatbots.
- Deepfake Technology: Creation of realistic synthetic media.
- Predictive Maintenance: Predicting equipment failures.
- Gaming: Creating advanced AI for games.
- Content Creation: Synthesizing images, music, and text.
- Robotics: Enabling more human-like robot capabilities.
- Speech Recognition: Converting spoken language to text.
- Image Recognition: Identifying objects and features in images.
- Fraud Detection: Identifying unusual patterns in financial transactions.
- Recommendation Systems: Providing personalized suggestions.
- Relationship with Other AI Concepts:
- Machine Learning: Deep learning is a subfield of machine learning, distinguished by the use of deep neural networks and automatic feature learning.
- Artificial Intelligence (AI): Deep learning is a powerful technique within the broader field of AI, enabling systems to perform complex tasks that previously required human intelligence.
- Tools and Platforms for Deep Learning:
- TensorFlow: An open-source platform developed by Google, widely used for developing deep learning applications. It supports both CPUs and GPUs and uses tensors for data manipulation.
- PyTorch: Another popular open-source machine learning framework often used for deep learning research and development.
- Keras: A high-level API that can run on top of TensorFlow (and other backends), simplifying the process of building and training neural networks.
In conclusion, deep learning, powered by multi-layered neural networks, represents a significant advancement in AI. Its ability to automatically learn intricate patterns from vast amounts of data has led to remarkable progress in numerous fields, making it a crucial technology in the ongoing AI revolution.
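The training loop that frameworks like TensorFlow and Keras automate (random parameter initialization, forward pass, cost calculation, backpropagation, gradient descent over epochs) can be sketched by hand in NumPy on the XOR task. The layer sizes, learning rate, and epoch count below are illustrative assumptions, not values from the sources.

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR: the task that defeated single-layer perceptrons; a network with
# one hidden layer learns it via backpropagation and gradient descent.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Random parameter initialization: 2 inputs -> 8 hidden units -> 1 output.
W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)
lr = 1.0                                  # learning rate (step size)

for epoch in range(10000):                # iterative training over epochs
    # Forward pass: compute predictions layer by layer.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backpropagation: push the squared-error gradient back through layers.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # Gradient descent: adjust weights and biases downhill on the cost.
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0)

pred = (sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2) > 0.5).astype(int)
```

Each epoch performs exactly the steps listed in the training-process section above; a Keras `Sequential` model with `compile` and `fit` wraps this same loop behind a higher-level API.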
The Original Text
Hello everyone, and welcome to the Artificial Intelligence Full Course by Simplilearn. AI, or artificial intelligence, is changing how machines work, teaching them to think, learn, and make decisions like humans. You already see AI in action with Siri, Alexa, Netflix recommendations, and even self-driving cars. By 2025, AI will be even bigger, with industries like healthcare, finance, and tech relying on it to boost innovation. This means huge job opportunities and high salaries, with AI professionals earning up to 6 to 10 LPA in India and around $100,000 in the US. In this course you will learn the basics of AI, including neural networks, deep learning, and recurrent neural networks as well, the technology powering modern AI. You'll also explore career opportunities in AI and get expert tips to prepare for job interviews and build the skills needed to succeed in this fast-growing field. But before we begin: if you are interested in mastering the future of technology, the Professional Certificate Course in Generative AI and Machine Learning is your perfect opportunity. Offered in collaboration with the E&ICT Academy, IIT Kanpur, this 11-month online, live, and interactive program provides hands-on experience in cutting-edge tools like generative AI, machine learning, and ChatGPT and DALL·E 2 as well. You'll also gain practical experience through 15-plus projects, integrated labs, and live masterclasses delivered by esteemed IIT faculty. So hurry up and find the course link in the description box below and in the pinned comments. So let's get started. Liam, a 19-year-old freshman, recently joined an Ivy League college to study history and political science. While reading about thinkers and scholars of the early 20th century, he stumbled upon a name: Alan Turing. Liam was fascinated by Alan, and realized that the computer he knows of today exists because of Alan, who is considered to be the father of modern computer science and whose work eventually led to the invention of his computer. But there was something that was even more fascinating about Alan: although Alan
Turing was famous for his work developing the first modern computers and decoding the encryption of German Enigma machines during the Second World War, he also devised a detailed procedure known as the Turing test, forming the basis for artificial intelligence. Liam had his mind blown by this fact. He realized that AI is not a modern phenomenon but rather more than a thought experiment, existing since the early 1900s. Liam used AI tools like ChatGPT, Perplexity, and Consensus on a daily basis for his research. He had a smartphone that he used for multiple tasks, like using Siri or Google Assistant to find local food places, using autocorrect in multiple apps like Instagram and WhatsApp, and even AI photo-editing features. He realized that AI has seeped into almost every aspect of his life, from making trivial decisions like where to have his morning coffee, to complex AI tools like ChatGPT for his research, to even his father's self-driving Tesla that he used whenever he got a chance. Artificial intelligence, or AI, in the 21st century has become a very subtle technology that exists in every human's life without them even realizing it. But what is this AI? Does this mean robots in a completely dystopian AI-warlord future? Not really. Let us dive a little deeper into understanding everything about AI. Artificial intelligence, or AI, is like giving computers the ability to think and learn, much like humans do. Imagine teaching a friend how to solve puzzles, and then that friend can solve different types of puzzles on their own. AI works similarly: it helps computers understand and carry out tasks that typically need human intelligence. These tasks include recognizing faces in photos, chatting with us through smart assistants like Siri or Google Assistant, and even driving cars. Think of AI as a smart helper that makes our daily lives easier. It can learn from data, make decisions, and improve itself over time. This means that AI isn't just about robots taking over the world; it's more about using smart
technology to assist us in various ways, making complex tasks simpler and everyday routines smoother. AI has found its way into many areas of our lives, often making things easier without us even realizing it. In healthcare, for example, AI helps doctors by quickly analyzing medical images like X-rays to detect issues faster than the human eye. In finance, AI works to keep our money safe by spotting unusual activities in our bank accounts that could indicate fraud. When you stream shows on Netflix, AI suggests movies and series based on what you've watched and liked before. In retail, AI manages stock and predicts what items will be popular, ensuring that store shelves are filled with what customers need. Even in our homes, AI is at work through smart devices like thermostats that learn your schedule and adjust the temperature automatically, or lights that turn on when you enter a room. AI touches so many parts of our daily lives, making things more convenient and efficient. One of the best AI applications today, which is very widely known and used, is ChatGPT, an advanced AI developed by OpenAI that can chat with you just like a human. Imagine having a friend who knows almost everything and can help you with any question or topic; that's what ChatGPT does. But how does it work? ChatGPT is powered by something called a Transformer model. This is a type of machine learning model that learns patterns in language by looking at a vast amount of text data from books, websites, and other sources. Think of it like reading millions of books and remembering important information from all of them. When you ask ChatGPT a question, it doesn't just pull out a random answer; instead, it looks at the words you used, understands the context, and predicts what a good response would be based on what it has learned. For example, if you ask about the weather, it understands you are looking for current weather conditions and gives you relevant information. If you ask it to help with homework, it draws on its knowledge to
explain concepts clearly. ChatGPT uses a process called deep learning, which is a bit like how our brains work. It breaks down sentences into smaller parts and looks at how these parts fit together. This helps it understand not just the meaning of individual words but also how they combine to convey a complete idea. This is why ChatGPT can handle complex questions and give answers that make sense. To make sure it provides useful and accurate information, ChatGPT was trained on a diverse range of topics. This training helps it recognize and generate text on anything from science and history to entertainment. In daily life, it's like having an encyclopedia and a friendly tutor rolled into one. Similar to ChatGPT, there is a plethora of other tools and applications being developed every day that are trained for various purposes using varied kinds of datasets: for example, DALL·E, which has been trained on a vast dataset of text and images from the internet; Stable Diffusion, which has been trained on a variety of images and corresponding text descriptions; Tesla Autopilot, which has been trained on sensor data from Tesla vehicles and driving data; and so on and so forth. AI is a remarkable technology that holds great promise for the future, offering solutions to some of the world's most pressing challenges. Imagine a future where AI takes care of routine tasks, giving us more time to be creative and focus on what we love. AI can help in many ways, from improving medical treatments to making our daily lives more efficient. However, it's essential to use AI responsibly. This means creating guidelines and rules to ensure AI is developed and used in ways that benefit everyone. By embracing AI and understanding its potential, we can look forward to a future where technology and human creativity go hand in hand. AI is not just about smart gadgets; it's about opening new possibilities and making our world a better place. The future of AI is bright, filled with opportunities for innovation and progress,
helping us achieve things we never thought possible. So let's talk about whether AI is a good career or not. You have probably heard a lot about artificial intelligence, or AI; it's everywhere, and it's shaking up industries all over the world. But here's the big question: is AI a good career choice? Yes, absolutely it is. Take Elon Musk, for example. We all know him as the guy behind Tesla and SpaceX, but did you know he also co-founded OpenAI? Even Elon is diving into AI, and that just shows how massive this field is becoming. And guess what, AI isn't just for tech geniuses; there's room for everyone. Let's talk about numbers: AI jobs are growing like crazy, up to 32% in recent years, and the pay is pretty sweet, with roles offering over $100,000 a year. So whether you're into engineering, research, or even the ethical side of things, AI has something for you. Plus, the skills you pick up in AI can be used in all sorts of industries, making it a super flexible career choice. Now AI is a big field, and there are tons of different jobs you can go for, so let's break down some of the key roles. First up, we have machine learning engineers. These folks are like the backbone of AI; they build models that can analyze huge amounts of data in real time. If you've got a background in data science or software engineering, this could be your thing. The average salary is around $131,000 in the US. Then there's the data scientist, the detective of the AI world. They dig into data to find patterns that help businesses make smart decisions. If you're good with programming and stats, this is a great option, and you can make about $105,000 a year. Next, we've got business intelligence developers. They process and analyze data to spot trends that guide business strategies. If you enjoy working with data and have a background in computer science, this role might be for you; the average salary here is around $87,000 per year. Then we've got research scientists. These are the ones pushing AI to new heights by asking innovative questions
and exploring new possibilities. It's a bit more academic, often needing advanced degrees, but it's super rewarding, with salaries around $100,000. Next up, we have big data engineers and architects. These are the folks who make sure all the different parts of a business's technology talk to each other smoothly. They work with tools like Hadoop and Spark, and they need strong programming and data visualization skills. And get this: the average salary is one of the highest in AI, around $151,000 a year. Then we have the AI software engineer. These engineers build the software that powers AI applications. They need to be really good at coding and have a solid understanding of both software engineering and AI. If you enjoy developing software and want to be a part of the AI revolution, this could be your role; the average salary is around $108,000. Now, if you're more into designing systems, you might want to look at becoming a software architect. These guys design and maintain entire systems, making sure everything is scalable and efficient. With expertise in AI and cloud platforms, software architects can earn a hefty salary, about $150,000 a year. Let's not forget about the data analyst. They have been around for a while, but their role has evolved big time with AI. Now they prepare data for machine learning models and create super insightful reports. If you're skilled in SQL, Python, and data visualization tools like Tableau, this could be a great fit for you; the average salary is around $65,000, but it can go much higher in tech companies. Another exciting role is robotics engineer. These engineers design and maintain AI-powered robots, from factory robots to robots that help in healthcare. They usually need advanced degrees in engineering and strong skills in AI, machine learning, and IoT (Internet of Things). The average salary of a robotics engineer is around $87,000; with experience it can go up even more. Last but not least, we have got NLP engineers. NLP stands for natural language processing, and these
engineers specialize in teaching machines to understand human language; think voice assistants like Siri or Alexa. To get into this role you'll need a background in computational linguistics and programming skills. The average salary of an NLP engineer is around $78,000, and it can go even higher as you gain more experience. So you can see the world of AI is full of exciting opportunities. Whether you're into coding, designing systems, working with data, or even building robots, there's a role for you in this fast-growing field. So what skills do you actually need to land an entry-level AI position? First off, you need a good understanding of AI and machine learning concepts. You'll need programming skills in languages like Python, Java, and R, and knowing your way around tools like TensorFlow and PyTorch will give you an edge too. And don't forget about SQL, pandas, and big data technologies like Hadoop and Spark, which are super valuable. Plus, experience with AWS and Google Cloud is often required. So which industries are hiring AI professionals? AI professionals are in high demand across a wide range of industries; here are some of the top sectors that hire AI talent. Technology companies like Microsoft, Apple, Google, and Facebook are leading the charge in AI innovation. Consulting firms like PwC, KPMG, and Accenture are looking for AI experts to help businesses transform. Then we have healthcare organizations, which are using AI to revolutionize patient care and treatment. Then we've got retail giants like Walmart and Amazon, which leverage AI to improve customer experiences. And we've got media companies like Warner and Bloomberg, which are using AI to analyze and predict trends in the media industry. AI is not just the future; it's the present. With the right skills and determination, you can carve out a rewarding career in this exciting field. Whether you're drawn to the technical challenges or the strategic possibilities, there's a role in AI that's perfect for you. So start building your skills, stay curious, and get
ready to be a part of the AI revolution. So now let's see the steps to get an AI engineer job. To thrive in this field, developing a comprehensive skill set is crucial, as the field encompasses many specialized areas. Here are some core skills that are essential across most roles, and how you can build them. The first is technical skills. AI roles heavily rely on technical expertise, particularly in programming, data handling, and working with AI-specific or cloud-specific tools. Here are some key areas to focus on. First, programming languages: proficiency in general-purpose programming languages like Python and R is fundamental. Python in particular is widely used in AI for its simplicity and robust libraries such as TensorFlow and PyTorch, which are crucial for machine learning and deep learning tasks. Second, database management: understanding how to manage and manipulate large datasets is essential in AI. Familiarity with database management systems like Apache Cassandra, Couchbase, and DynamoDB will allow you to store, retrieve, and process data efficiently. Third, data analysis and statistics: strong skills in data analysis are a must. Tools like MATLAB, Excel, and pandas are invaluable for statistical analysis, data manipulation, and visualizing trends in data, all of which are critical for developing AI models. Fourth, cloud AI platforms: knowledge of cloud-based platforms such as Microsoft Azure AI, Google Cloud AI, and IBM Watson is increasingly important. These platforms provide pre-built models, tools, and infrastructure that can accelerate AI development and deployment. The second skill area is industry knowledge. While technical skills form the backbone of your AI expertise, understanding the industry context is equally important. For example, knowing how AI integrates with digital marketing goals and strategies can be a significant advantage if you are working in or targeting industries like e-commerce or advertising. So industry-specific
knowledge allows you to apply AI solutions more effectively and communicate their value to stakeholders. The third area is workplace or soft skills. In addition to technical and industry-specific skills, developing workplace skills, or soft skills, is essential for success in AI roles, or any role. These soft skills, often honed through experience, include the following. First, communication: clearly articulating complex AI concepts to non-technical stakeholders is crucial; whether you are explaining how a machine learning model works or presenting data-driven insights, effective communication ensures that your work is understood and valued. Second, collaboration: AI projects often require teamwork across diverse fields, including data science, software development, and others. Third, analytical thinking: AI is fundamentally about problem solving, and you will need strong analytical thinking skills to approach challenges logically, break them down into manageable parts, and develop innovative solutions. Fourth, problem solving: AI projects frequently involve unexpected challenges, and whether it's a technical bug or an unforeseen data issue, strong problem-solving skills will help you navigate these hurdles and keep projects on track. Building these skills can be achieved through various methods, including self-study, online courses, boot camps, or formal education. Additionally, working on real projects, contributing to open-source AI initiatives, and seeking mentorship can provide practical experience and further enhance your expertise. The next thing is to learn advanced topics. As you advance in your machine learning journey, it is important to delve into more advanced topics; these areas will deepen your understanding and help you tackle complex problems. Some key topics to focus on are: first, deep learning and neural networks; second, ensemble learning techniques; third, generative models and adversarial learning; fourth, recommendation systems and
collaborative filtering; and fifth, time series analysis and forecasting. So now let's move forward and see some machine learning projects. Work on real-world projects to apply your knowledge: focus on data collection and preparation, capstone projects in image recognition and NLP, predictive modeling, and anomaly detection. Practical experience is key to solidifying your skills. So now let's move forward and see the next step, which is certification. If you already hold an undergraduate degree in a field related to AI, enrolling in specialized courses to enhance your technical skills can be highly beneficial. Even if you don't have a degree, earning certifications can show potential employers that you are committed to your career goals and actively investing in your professional development. You can unleash your career potential with our artificial intelligence and machine learning courses, tailored for diverse industries and roles at top global firms. The programs feature key tools, enhance your AI knowledge and business acumen, and help you enter the job market as a sought-after professional. The next thing is continuous learning and exploration. Stay updated with the latest developments by following industry leaders, engaging in online communities, and working on personal projects, and pursue advanced learning through courses and certifications to keep your skills sharp. So now let's move forward and see some AI career opportunities with salaries. The job market for machine learning professionals is booming; the average annual salary for AI engineers can vary based on location, experience, and company. Here are some roles: machine learning engineer, data scientist, NLP engineer, computer vision engineer, and AI/ML researcher. So now let's see how much they earn. The first is the ML engineer: machine learning engineers earn $153,000 in the US and ₹11 lakh per annum in India. The second is the data scientist: data scientists earn $150,000 in the US and ₹12 lakh per annum in India. The third is the NLP engineer:
they earn $117,000 in the US and ₹7 lakh per annum in India. The fourth is the computer vision engineer (CV engineer): they earn around $126,000 in the US and ₹650,000 in India. The last is the AI/ML researcher: they earn $130,000 in the US, and in India they earn around ₹9 lakh per annum. Note that these figures vary from website to website and change frequently. So now the last step is to start applying for entry-level jobs. When you feel confident in your training, begin researching and applying for jobs. Many entry-level AI positions, like software engineer or developer roles, are often labeled as entry-level or junior in the job description; jobs that require less than three years of experience are usually suitable for those just starting out. If you need additional support in your job search, consider applying for internships, taking on freelance projects, or participating in hackathons to further hone your skills. These opportunities not only provide valuable feedback on your work but also help you build connections that could benefit your career in the future. So with this we have come to the end of this video; if you have any questions or doubts, please feel free to ask in the comment section below, and our team of experts will help you as soon as possible. AI will pretty much touch everything we do. It's more likely to be correct and grounded in reality. Talk to the AI about how to do better; it's a very deep philosophical conversation, a bit above my pay grade. I'm going to say something, and it's going to sound completely opposite of what people feel. You probably recall, over the course of the last 10 to 15 years, almost everybody who sits on a stage like this would tell you it is vital that your children learn computer science, that everybody should learn how to program. In fact it's almost exactly the opposite: it is our job to create computing technology such that nobody has to program, and the programming language is human. Everybody in the world is now a programmer. This is the
miracle of artificial intelligence. From its humble beginnings in the 1950s, AI has evolved from simple problem solving and symbolic reasoning to the advanced machine learning and deep learning techniques that power some of the most innovative applications we see today. AI is not just a buzzword; it is a revolutionary force reshaping industries, enhancing daily life, and creating unmatched opportunities across various sectors. AI is changing numerous fields. In healthcare it aids in early disease diagnosis and personalized treatment plans. In finance it transforms money management with robo-advisors and fraud detection systems. The automotive industry is seeing the rise of autonomous vehicles that navigate traffic and recognize obstacles, while retail and e-commerce benefit from personalized shopping experiences and optimized supply chain management. One of the most exciting developments in AI is the rise of advanced conversational tools like ChatGPT-4o, Google Gemini, and generative models. These tools represent the pinnacle of conversational AI, capable of understanding and generating human-like text with remarkable accuracy. ChatGPT-4 can assist in writing, brainstorming ideas, and even tutoring, making it a valuable resource for students, professionals, and creatives. Similarly, Google Gemini takes AI integration to the next level, enhancing search capabilities, providing insightful responses, and integrating seamlessly into our digital lives. Generative AI, a subset of AI, is also making waves by creating new content from scratch. Tools like DALL-E, which generates images from textual descriptions, and GPT-3, which can write coherent and creative text, are just the beginning. These technologies are changing fields like art, design, and content creation, enabling the generation of unique and personalized outputs that were previously unimaginable. Beyond specific industries, AI applications extend to everyday life: voice-activated assistants like Siri and Alexa and smart home devices learn our preferences and adjust
our environments accordingly. AI is embedded in the technology we use daily, making our lives more convenient, connected, and efficient. So join us as we explore the future of AI, examining the breakthroughs, the challenges, and the endless possibilities that lie ahead. Whether you are a tech enthusiast, a professional in the field, or simply curious about what's next, this video will provide you with a comprehensive look at how AI is shaping our world and what we can expect in the years to come. Before we move forward: as we know, ChatGPT, Gemini, and similar generative tools are AI-based, and if you want to learn how these cool AI tools are developed and want to create your own, then without any further ado, let's get started. So how will AI impact the future? The first impact is enhanced business automation. AI is transforming business automation, with 55% of organizations adopting AI technology; chatbots and digital assistants handle customer interactions and basic employee inquiries, speeding up decision making. The second is job disruption. Automation may displace jobs, with as much as one third of tasks potentially automated, and while certain roles are at risk, demand for machine learning specialists is rising; AI is more likely to augment skilled and creative positions, emphasizing the need for upskilling. The third is data privacy issues. Training AI models requires large datasets, raising privacy concerns; the FTC is investigating OpenAI for potential violations, and the Biden-Harris administration introduced an AI Bill of Rights to promote data transparency. The fourth is increased regulation. AI's impact on intellectual property, along with ethical concerns, is leading to increased regulation; lawsuits and government guidelines on responsible AI use could reshape the industry. The fifth is climate change concerns. AI optimizes supply chains and reduces emissions, but the energy needed for AI models may increase carbon emissions, potentially negating the environmental benefits. Understanding these impacts helps us prepare for AI's future challenges and
opportunities. So now let's see which industries AI will impact the most. The first is manufacturing: AI enhances manufacturing with robotic arms and predictive sensors, improving tasks like assembly and equipment maintenance. The second is healthcare: AI changes healthcare by quickly identifying diseases, streamlining drug discovery, and monitoring patients through virtual nursing assistants. The third is finance: AI helps banks and financial institutions detect fraud, conduct audits, and assess loan applications, while traders use AI for risk assessment and smart investment decisions. The fourth is education: AI personalizes education by digitizing textbooks, detecting plagiarism, and analyzing student emotions to tailor the learning experience. The fifth is customer service: AI-powered chatbots and virtual assistants provide data-driven insights, enhancing customer service interactions. These industries are experiencing significant changes due to AI, driving innovation and efficiency across various sectors. So now let's move forward and see some risks and dangers of AI. AI offers many benefits but also poses significant risks. The first is job loss: from 2023 to 2028, 44% of workers' skills will be disrupted, and without upskilling, AI could lead to higher unemployment and fewer opportunities for marginalized groups. The second is human bias: AI often reflects the biases of its trainers, such as facial recognition favoring lighter skin tones, and unchecked biases can perpetuate social inequalities. The third is deepfakes and misinformation: deepfakes blur reality, spreading misinformation with dangerous consequences; they can be used for political propaganda, financial fraud, and compromising reputations. The fourth is data privacy: AI training on public data risks breaches that expose personal information; a 2024 Cisco survey found 48% of businesses use non-public information in AI tools, with 69% concerned about intellectual property and legal rights, and breaches could expose millions of consumers' data. The fifth is
automated weapons: AI in automated weapons can fail to distinguish between soldiers and civilians, posing severe threats, and misuse could endanger large populations. Understanding these risks is crucial for responsible AI development and use. As we explore the future of AI, it's clear that its impact will be profound and far-reaching: AI will change industries, enhance efficiency, and drive innovation. However, it also brings significant challenges, including job displacement, bias, privacy concerns, misinformation, and the ethical implications of automated weapons. To harness AI's potential responsibly, we must invest in upskilling our workforce, address biases in AI systems, protect data privacy, and develop regulations that ensure ethical AI use. We've looked at a lot of examples of machine learning, so let's see if we can give a little bit more of a concrete definition. What is machine learning? Machine learning is the science of making computers learn and act like humans by feeding them data and information, without being explicitly programmed. We see here we have a nice little diagram where we have our ordinary system, your computer; nowadays you can even run a lot of this stuff on a cell phone, because cell phones have advanced so much. Then with artificial intelligence and machine learning, the system takes the data, learns from what happened before, and then predicts what's going to come next. And really the biggest part right now in machine learning is that it improves on that: how do we find a new solution? So we go from descriptive, where it's learning about the data and understanding how it fits together, to predictive, predicting what's going to happen, to prescriptive, coming up with a new solution. And when we're working on machine learning, there are a number of different diagrams that people have posted for what steps to go through. A lot of it might be very domain-specific, so if you're working on photo identification versus language versus medical or physics, some of
these are switched around a little bit or new things are put in; they're very specific to the domain. This is a very general diagram. First you want to define your objective; it's very important to know what it is you're wanting to predict. Then you're going to be collecting the data: once you've defined an objective, you need to collect the data that matches, and you spend a lot of time in data science collecting data. The next step is preparing the data; you've got to make sure that data is clean going in. There's the old saying: bad data in, bad data out. Then, once you've cleaned all this stuff coming in, you're going to select the algorithm. Which algorithm are you going to use? You're going to train that algorithm; in this case I think we're going to be working with SVM, the support vector machine. Then you have to test the model: does this model work, is it a valid model for what we're doing? Once you've tested it, you want to run your prediction, or your choice, or whatever output it's going to come up with. Then, once everything is set and you've done lots of testing, you want to go ahead and deploy the model. And remember I said domain-specific: this is very general as far as scope goes. With a lot of models you get halfway through and realize that your data is missing something, and you have to go collect new data, because somewhere along the line you've run a test and you're saying, hey, I'm not really getting the answers I need. So there are a lot of domain-specific things that become part of this model; it's a very general model, but a very good one to start with. And we do have some basic divisions of what machine learning does that are important to know. For instance, do you want to predict a category? If you're categorizing things, that's classification: for instance, whether the stock price will increase or decrease. In other words, I'm looking for a
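As an aside, the generic workflow just described (define the objective, collect and prepare the data, select and train an algorithm, test it, then predict and deploy) can be sketched in a few lines of scikit-learn. This is a minimal illustration only: the Iris dataset and the SVC settings below are assumptions standing in for whatever data and model your own project uses, not the tutorial's actual code.

```python
# Minimal sketch of the generic ML workflow:
# define objective -> collect data -> prepare data -> select algorithm
# -> train -> test -> predict/deploy. Iris + SVC are illustrative choices.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)                 # "collect the data"
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)          # hold some data back for testing

scaler = StandardScaler().fit(X_train)            # "prepare the data" (scale/clean)
model = SVC(kernel="rbf")                         # "select the algorithm" (an SVM)
model.fit(scaler.transform(X_train), y_train)     # "train the algorithm"

accuracy = model.score(scaler.transform(X_test), y_test)  # "test the model"
print(f"held-out accuracy: {accuracy:.2f}")       # then run predictions / deploy
```

In a real project the collect-and-prepare steps dominate the effort; here they collapse to a dataset load and one scaling step, which is exactly why this is only a sketch.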
yes-or-no answer: is it going up or is it going down? In that case we'd say true if it's going up; if it's not going up, it's false, meaning it's going down. That way it's a yes/no, 0/1. Do you want to predict a quantity? That's regression. So remember, we just did classification, and now we're looking at regression; these are the two major divisions in what the data is doing. For instance, predicting the age of a person based on height, weight, health, and other factors: based on these different factors you might guess how old a person is. Then there are a lot of domain-specific tasks, like: do you want to detect an anomaly? That's anomaly detection, which is actually very popular right now. For instance, you want to detect money-withdrawal anomalies; you want to know when someone's making a withdrawal that might not be from their own account. We've brought this up because it's really big right now. If you're predicting whether to buy a stock or not, you want to know whether what's going on in the stock market is an anomaly, in which case you use a different prediction model because something else is going on and you've got to pull in new information, or whether this is just the norm and you'll get your normal return on your money invested. So being able to detect anomalies is very big in data science these days. Another question that comes up, on what we call unlabeled data, is: do you want to discover structure in unexplored data? That's called clustering. For instance, finding groups of customers with similar behavior, given a large database of customer data containing their demographics and past buying records. In this case we might notice that anybody wearing a certain set of shoes shops at certain stores, or whatever it is, they're going to make certain purchases. Having that information helps us group people together, so we can then explore that group and figure out what to market to them, if you're in the marketing world. And that
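As a quick aside, the withdrawal-anomaly idea can be sketched with scikit-learn's IsolationForest. The amounts below are made-up illustrative data, and IsolationForest is one reasonable detector choice among many, not an algorithm the tutorial itself names.

```python
# Illustrative anomaly detection on withdrawal amounts.
# The data and the IsolationForest choice are assumptions for demonstration.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal = rng.normal(loc=100, scale=15, size=(200, 1))   # typical withdrawals (~$100)
suspicious = np.array([[950.0]])                        # one outsized withdrawal

detector = IsolationForest(random_state=0).fit(normal)  # learn what "normal" looks like
print(detector.predict(suspicious))   # -1 flags an anomaly, 1 means normal
```

The point mirrors the narration: the model learns the shape of ordinary activity, and a withdrawal far outside that shape gets routed to a different path (fraud review, a different prediction model, and so on).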
might also work in just about any arena. You might want to group people together based on their different areas, investments, and financial background, for instance whether you're going to give them a loan or not; before you even start looking at whether they're a valid customer for the bank, you might want to look at all these different areas and group them together based on unknown data. So you don't know what the data is going to tell you, but you want to cluster together the people who belong together. Let's take a quick detour for quiz time, oh my favorite. We're going to have a couple of questions here under quiz time, and we'll be posting the answers in part two of this tutorial. So let's go ahead and take a look at these quiz questions; hopefully you'll get them all right, and it'll get you thinking about how to process data. Can you tell what's happening in the following cases? Of course, you're sitting there with your cup of coffee, your checkbox, and your pen, trying to figure out your next step in your data science analysis. A: grouping documents into different categories based on the topic and content of each document. Very big these days: you have legal documents, maybe sports documents, maybe you're analyzing newspaper postings, but certainly having that automated is a huge thing in today's world. B: identifying handwritten digits in images correctly, so we want to know which letter or digit they're writing in their handwriting. C: behavior of a website indicating that the site is not working as designed. D: predicting the salary of an individual based on his or her years of experience, an HR hiring setup there. So stay tuned for part two; we'll answer these questions when we get there, or you can simply write at the bottom and send a note to Simplilearn, and they'll follow up
with you on it. Back to our regular content. These last few bring us to the next topic, which is another way of dividing our types of machine learning: supervised, unsupervised, and reinforcement learning. Supervised learning is a method used to enable machines to classify and predict objects, problems, or situations based on labeled data fed to the machine. Here you see we have a jumble of data with circles, triangles, and squares, and we label them: we say what's a circle, what's a triangle, what's a square. We have our model training, and it trains on that, so we know the answer. Very important: when you're doing supervised learning, you already know the answer for a lot of your information coming in. So you have a huge group of data coming in, and then new data coming in. We've trained our model; the model now knows the difference between a circle, a square, and a triangle, and now that we've trained it we can send in, in this case, a square and a circle, and it predicts that the top one's a square and the next one's a circle. You can see how this applies to predicting whether someone's going to default on a loan, because I was talking about banks earlier, or supervised learning on the stock market, whether you're going to make money or not; that's always important. And if you are looking to make a fortune on the stock market, keep in mind it is very difficult to get all the data correct: the market fluctuates in ways that are really hard to predict, so it's quite a roller-coaster ride. If you're running machine learning on the stock market, you start realizing you really have to dig for new data. So that's supervised learning, and if we have supervised, we should expect unsupervised learning. In unsupervised learning, the machine learning model finds the hidden patterns in unlabeled data. So in this case, instead of telling it what a circle, a triangle, and a square are, it goes in, looks at them, and, for whatever reason,
it groups them together. Maybe it groups by the number of corners: it notices that some of them all have three corners, some all have four corners, and some have no corners, and it's able to filter those through and group them together. We talked about that earlier with the group of people out shopping: we want to group them together to find out what they have in common. And of course, once you understand what people have in common, maybe five of them are customers at your store and they have a lot in common with five others who are not customers at your store; how do you market to those five who aren't customers yet? They fit the demographic of who's going to shop there, and you'd like them to shop at your store, not the one next door. Of course this is a simplified version; you can very easily see the difference between a triangle and a circle, which might not be so easy in marketing. Next, reinforcement learning. Reinforcement learning is an important type of machine learning where an agent learns how to behave in an environment by performing actions and seeing the results. And we have here, in this case, a baby. It's actually fitting that they used an infant for this slide, because reinforcement learning is very much in its infant stages, but it's also probably the biggest machine learning demand out there right now, or coming up over the next few years, and the question is how to make it work for us. You can see here where we have our action: in this one, the baby goes into the fire (hopefully the baby didn't; it was just a little candle, not a giant fire pit like it looks like here). When the baby comes out, the new state is that the baby is sad and crying because it got burned. Then maybe they take another action; the baby's called the agent because it's the one taking the actions, and in this
case they didn't go into the fire, they went a different direction, and now the baby's happy and laughing and playing. Reinforcement learning is very easy to understand, because that's one of the ways we as humans learn: you burn yourself on the stove, so you don't touch the stove anymore. In the big picture, having a machine learning program or an AI be able to do this is huge, because now we're starting to learn how to learn; that's a big jump in the world of computing and machine learning. Now let's go back over supervised versus unsupervised learning, because understanding this is huge; it's going to come up in any project you work on. In supervised learning we have labeled data and direct feedback: someone's already gone in there and said yes, that's a triangle, no, that's not a triangle, and then you predict an outcome; a new set of data comes in and we know what it's going to be. With unsupervised training, the data is not labeled, so we really don't know what it is, and there's no feedback: we're not telling it whether it's right or wrong, whether it's a triangle or a square, or to go left or right. All we're doing is finding hidden structure in the data, grouping the data together to find out what connects to what. And then you can use these together: imagine you have an image and you're not sure what you're looking for, so you take the unstructured data, find all the things that are connected together, and then somebody looks at those groups and labels them; now you can take that labeled data and train something to predict what's in the picture. So you can see how they go back and forth, and you can start connecting all these different tools together to make a bigger picture. There are many interesting machine learning algorithms; let's have a look at a few of them. Hopefully
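As an aside, the supervised-versus-unsupervised contrast can be shown side by side on the same toy points. The data, the k-nearest-neighbors classifier, and the K-Means clusterer below are illustrative choices of mine, not the tutorial's own code: one model is handed the labels ("direct feedback"), the other has to discover the groups itself.

```python
# Supervised vs. unsupervised on the same toy 2-D points (data is made up).
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

points = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]]

# Supervised: we supply the labels, then predict new, unseen data.
labels = ["circle", "circle", "circle", "square", "square", "square"]
clf = KNeighborsClassifier(n_neighbors=3).fit(points, labels)
print(clf.predict([[1.5, 1.5], [8.5, 8.5]]))   # predicts named classes

# Unsupervised: no labels at all; the algorithm finds hidden groupings itself.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(km.labels_)   # two discovered clusters, named only 0 and 1
```

Notice the unsupervised output has no meaningful names, only cluster 0 and cluster 1; exactly as the narration says, a human can then label those discovered groups and feed them back into a supervised model.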
this gives you a little flavor of what's out there; these are some of the most important ones currently being used. We'll take a look at linear regression, the decision tree, and the support vector machine. Let's start with a closer look at linear regression. Linear regression is perhaps one of the most well-known and well-understood algorithms in statistics and machine learning. Linear regression is a linear model, that is, a model that assumes a linear relationship between the input variable X and the single output variable Y; you'll recognize it from your algebra classes as y = mx + c. Imagine we are predicting distance traveled (Y) from speed (X). Our linear regression model for this problem would be y = m*x + c, or distance = m * speed + c, where m is the coefficient (slope) and c is the y-intercept. We're going to look at two variations of this. First, take time as a constant. You can see we have a bicyclist (he's got his safety gear on, thank goodness): speed = 10 m/second, and over a certain amount of time his distance equals 36 km. We have a second bicyclist going twice the speed, 20 m/second, and you can guess that if he's going twice the speed with time constant, he's going to go twice the distance; that's easy to compute, 36 * 2 = 72 km. And for somebody going three times that speed, 30 m/second, you can easily compute the distance in your head; we can do that without a computer, but we want to handle more complicated data, so it's nice to compare the two. Let's take a look at what that looks like on a graph. In a linear regression model, we plot distance against speed, and m equals the +ve (positive) slope of the line. Notice the line has a positive slope: as speed increases, distance also increases, hence the variables have a positive relationship. And so the distance traveled by the person, which
equals y = mx + c for a fixed interval of time, can be computed very easily, either by following the line or just by knowing it's three times the 10 m/s case: this third bicyclist has traveled roughly 108 km. One of the key definitions here is positive relationship: the slope of the line is positive, and as speed increases, so does distance. Let's take our second example, where distance is the constant. We have speed = 10 m/second, a certain distance to go, and it takes 100 seconds to travel that distance. Our second bicyclist is still doing 20 m/second; since he's going twice the speed, we can guess he'll cover the distance in about half the time, 50 seconds. And you can probably guess the third one: since he's going three times the speed, it's 100 divided by 3, about 33.333 seconds. If we put that into a linear regression model, or a graph, with distance assumed constant, let's see the relationship between speed and time: as speed goes up, the time to cover that same distance goes down. So now m equals a -ve (negative) slope of the line: as speed increases, time decreases, hence the variables have a negative relationship. Again, there are our definitions, positive relationship and negative relationship, depending on the slope of the line. Now, with a simple formula like this, and even a significant amount of data, let's see the mathematical implementation of linear regression, and we'll take this data. Suppose we have a dataset of X and Y values, where X = 1, 2, 3, 4, 5 (a standard series) and the Y values are 3, 2, 2, 4, 3. When we plot these points on a graph, you can see there's a nice scattering, and you could probably eyeball a line through the middle of it, but we're going to calculate that exact line with linear regression. The first thing we do is compute the mean of X, written X̄, and remember,
The mean is simply the average, so we add 1 + 2 + 3 + 4 + 5 and divide by five, which comes out to 3. We do the same for Y: add up all the values and divide by five, and we get a mean of Ȳ = 2.8. So X̄ is the mean of the X values and Ȳ is the mean of the Y values. When we plot that, we mark Ȳ = 2.8 and X̄ = 3 on our graph; we've given it a slightly different color with dashed lines so you can pick it out. It's important to note that the linear regression line must pass through that point.

Now let's find our regression equation, the best-fit line. We take y = mx + c, so we're looking for m and c. The slope is

m = Σ(x − X̄)(y − Ȳ) / Σ(x − X̄)²

That's how we get the slope of the line, and we can compute it easily by creating some columns. Computers are very good at iterating through data, so we can fill in a table: for x = 1, remembering X̄ = 3, we get 1 − 3 = −2; for y = 3, 3 − 2.8 = 0.2; and so on. We fill in the columns x − X̄ and y − Ȳ, and from those we compute (x − X̄)² and (x − X̄)(y − Ȳ). The next step, as you can guess, is to sum the columns for the answers we need: we get a total of 10 for Σ(x − X̄)² and a total of 2 for Σ(x − X̄)(y − Ȳ). Plugging those in, m = 2/10 = 0.2, so the slope of our line is 0.2.

The next step is to calculate c; we need to know where the line crosses the y-axis. Remember, I mentioned earlier that the regression line has to pass through the mean point we plotted: X̄ = 3, Ȳ = 2.8. Since we know that point, we can simply plug it into y = 0.2x + c: 2.8 = 0.2 × 3 + c, and solving for c gives c = 2.2. Once we have all that we can plot our regression line, y = 0.2x + 2.2.

From this equation we can compute new values. Let's predict Y for x = 1, 2, 3, 4, 5 and plot the points; remember, those were our original X values, so now we're seeing what the model thinks Y is, not what it actually is. Denoting the predictions Yp, we get x = 1 → 2.4, x = 2 → 2.6, and so on. When we plot the predicted values alongside the actual values we can see the difference, and this is very important for linear regression and for any of these models: understanding the error. We can calculate the error for each of our values; in the plot we've drawn small lines between the actual and predicted points so you can see what the error looks like. Our goal is to reduce this error, minimizing the distance between the line and the data points. There are lots of ways to measure that distance: sum of squared errors, sum of absolute errors, root mean square error, and so on. We keep moving the line through the data points to make sure the best-fit line has the least squared distance between the data points and the regression line. So to recap: with a very simple linear regression model, we first figure out the formula of a line through the middle of the data, and then we slowly adjust the line to minimize the error.
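The worked example above can be reproduced in a few lines of Python. This is a sketch of the same arithmetic, with variable names of my own; it computes the slope and intercept exactly as in the column table, then predicts Y for the original X values.

```python
# Least-squares fit by hand, mirroring the worked example:
# X = 1..5, Y = 3, 2, 2, 4, 3
X = [1, 2, 3, 4, 5]
Y = [3, 2, 2, 4, 3]

x_mean = sum(X) / len(X)   # 3.0
y_mean = sum(Y) / len(Y)   # 2.8

# m = sum((x - x_mean)(y - y_mean)) / sum((x - x_mean)^2)
num = sum((x - x_mean) * (y - y_mean) for x, y in zip(X, Y))  # 2.0
den = sum((x - x_mean) ** 2 for x in X)                       # 10.0
m = num / den              # 0.2
c = y_mean - m * x_mean    # 2.8 - 0.2*3 = 2.2

predictions = [m * x + c for x in X]
print(round(m, 3), round(c, 3), [round(p, 2) for p in predictions])
# 0.2 2.2 [2.4, 2.6, 2.8, 3.0, 3.2]
```

The same totals appear as in the column table: 10 and 2, giving the slope 0.2 and intercept 2.2.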
Keep in mind this is a very simple example. Even though the math is essentially the same, it gets much more complex as we add dimensions; this is only two dimensions, y = mx + c, but you can extend it across all the different features and fit a linear regression model on all of them, using the same kinds of formulas to minimize the error.

Let's take a look at decision trees, a very different way to solve problems from the linear regression model. A decision tree is a tree-shaped algorithm used to determine a course of action; each branch of the tree represents a possible decision, occurrence, or reaction. We have data that tells us whether it is a good day to play golf, and if we open it in a spreadsheet you can see the Outlook (rainy, overcast, sunny), Temperature (hot, mild, cool), Humidity, Windy, and whether I played golf that day, yes or no. So we're taking a census. Certainly I wouldn't want a computer telling me when I should play golf, but imagine that the night before, while you're planning your day, it tells you tomorrow would be a good day for golf in the morning and not in the afternoon. That becomes very beneficial, and we see this in a lot of applications now that give suggestions for the next day, the next purchase, or the next mail-out. In this case: is tomorrow a good day for playing golf, based on the incoming weather?

So let's determine whether you should play golf when the day is sunny and windy; the forecast for tomorrow is sunny and windy. Suppose we draw our tree like this: we start with humidity. If humidity is normal, you're going to play golf. If humidity is high, we look at the outlook, and whether it's sunny, overcast, or rainy changes what you choose to do. If it's very humid and sunny, you're probably not going to play; you'd be out there miserable, fighting the mosquitoes that came out to join you. If it's rainy, you probably don't want to play in the rain. But if it's slightly overcast and you get just the right shadow, that's a good day to play golf and be out on the green.

In this example you could probably make your own tree pretty easily, because it's a very simple data set. But the question is: how do you know where to split your data? What if the data is much more complicated, something you wouldn't intuitively understand? In studying cancer, for example, they take about 36 measurements of the cancerous cells, each representing how bulbous a cell is, how extended it is, how sharp its edges are, things that as humans we have no feel for. How do we decide how to split that data, and is the resulting decision tree the right one? To answer that, we calculate entropy and information gain, two important vocabulary terms. Entropy is a measure of randomness or impurity in the data set; entropy should be low. We want the chaos to be as low as possible, so we're not confused by mixed data. Information gain is the measure of the decrease in entropy after the data set is split, also known as entropy reduction; information gain should be high. We want the information we get out of each split to be as high as possible. Let's take a look at entropy from the mathematical side.
we’re going to denote entropy as I of P of and N where p is the probability that you’re going to play a game of golf and N is the probability where you’re not going to play the game of golf now you don’t really have to memorize these formulas there’s a few of them out there depending on what you’re working with but it’s important to note that this is where this formula is coming from so when you see it you’re not lost when you’re running your programming unless you’re building your own decision tree code in the back and we simply have a log s of P Over p+ N minus n/ P plus n * the log s of n of p plus n but let’s break that down and see what actually looks like when we’re Computing that from the computer script side entropy of a target class of the data set is the whole entropy so we have entropy play golf and we look at this if we go back to the data you can simply count how many yeses and no in our complete data set for playing golf days in our complete set we find we have five days we did play golf and N9 days we did not play golf and so our I equals if you add those together 9 + 5 is 14 and so our I equals 5 over 14 and 9 over 14 that’s our P andn values that we plug into that formula and you can go 5 over 14 = 36 9 over 14 = 64 and when you do the whole equation you get the minus. 
36 logun SAR of .36 -64 log s < TK of 64 and we get a set value we get .94 so we now have a full entropy value for the whole set of data that we’re working with and we want to make that entropy go down and just like we calculated the entropy out for the whole set we can also calculate entropy for playing golf in the Outlook is it going to be overcast or rainy or sunny and so we look at the entropy we have P of Sunny time e of 3 of 2 and that just comes out how many sunny days yes and how many sunny days no over the total which is five don’t forget to put the we’ll divide that five out later on equals P overcast = 4 comma 0 plus rainy = 2 comma 3 and then when you do the whole setup we have 5 over 14 remember I said there was a total of five 5 over 14 * the I of3 of 2 + 4 over 14 * the 4 comma 0 and 514 over I 23 and so we can now compute the entropy of just the part it has to do with the forecast and we get 693 similarly we can calculate the entropy of other predictors like temperature humidity and wind and so we look at the gain Outlook how much are we going to gain from this entropy play golf minus entropy play golf Outlook and we can take the original 0.94 for the whole set minus the entropy of just
the um rainy day in temperature and we end up with a gain of. 247 so this is our Information Gain remember we Define entropy and we Define Information Gain the higher the information gain the lower the entropy the better the information gain of the other three attributes can be calculated in the same way so we have our gain for temperature equals 0.029 we have our gain for humidity equals 0.152 and our gain for a windy day equals 048 and if you do a quick comparison you’ll see the. 247 is the greatest gain of information so that’s the split we want now let’s build the decision tree so we have the Outlook is it going to be sunny overcast or rainy that’s our first split because that gives us the most Information Gain and we can continue to go down the tree using the different information gains with the largest information we can continue down the nodes of the tree where we choose the attribute with the largest Information Gain as the root node and then continue to split each sub node with the largest Information Gain that we can compute and although it’s a little bit of a tongue twister to say all that you can see that it’s a very easy to view visual model we have our Outlook we split it three different directions if the Outlook is overcast we’re going to play and then we can split those further down if we want so if the over Outlook is sunny but then it’s also windy if it’s uh windy we’re not going to play if it’s uh not windy we’ll play so we can easily build a nice decision tree to guess what we would like to do tomorrow and give us a nice recommendation for the day so so we want to know if it’s a good day to play golf when it’s sunny and windy remember the original question that came out tomorrow’s weather report is sunny and windy you can see by going down the tree we go Outlook Sunny Outlook windy we’re not going to play golf tomorrow so our little Smartwatch pops up and says I’m sorry tomorrow is not a good day for golf it’s going to be sunny and windy and if 
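Here's a short sketch of the entropy and information-gain arithmetic above, using the same counts (9 play / 5 don't play, and the per-outlook counts). The helper names are my own:

```python
from math import log2

def entropy(p, n):
    """I(p, n): entropy of a node with p positive and n negative examples."""
    total = p + n
    result = 0.0
    for count in (p, n):
        if count:                        # define 0 * log2(0) as 0
            frac = count / total
            result -= frac * log2(frac)
    return result

# Whole data set: 9 days played, 5 days not.
entropy_all = entropy(9, 5)              # ~0.940

# Outlook branches: sunny (2 yes, 3 no), overcast (4, 0), rainy (3, 2).
branches = [(2, 3), (4, 0), (3, 2)]
total = sum(p + n for p, n in branches)  # 14
e_outlook = sum((p + n) / total * entropy(p, n) for p, n in branches)  # ~0.6935

gain_outlook = entropy_all - e_outlook   # ~0.247
print(round(entropy_all, 3), round(e_outlook, 3), round(gain_outlook, 3))
```

Note that the pure overcast branch, I(4, 0), contributes zero entropy, which is exactly why Outlook scores such a high gain.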
you’re a huge golf fan you might go uhoh it’s not a good day to play golf we can go in and watch a golf game at home so we’ll sit in front of the TV instead of being out playing golf in the Wind now that we looked at our decision tree let’s look at the third one of our algorithms we’re investigating support Vector machine support Vector machine is a widely used classification algorithm the idea of support Vector machine is simple the algorithm creates a separation line which divides the classes in the best possible manner for example dog or cat disease or no disease suppose we have a labeled sample data which tells height and weight of males and females a new data point arrives and we want to know whether it’s going to be a male or a female so we start by drawing a line we draw decision lines but if we consider decision line one then we will classify the individual as a male and if we consider decision line two then it will be a female so you can see this person kind of lies in the middle of the two group so it’s a little confusing trying to figure out which line they should be under we need to know which line divides the classes correctly but how the goal is to choose a hyperplan and that is one of the key words they use when we talk about support Vector machines choose a hyperplane with the greatest possible margin between the decision line and the nearest Point within the training set so you can see here we have our support Vector we have the two nearest points to it and we draw a line between those two points and the distance margin is the distance between the hyperplane and the nearest data point from either set so we actually have a value and it should be equal lead distant between the two um points that we’re comparing it to when we draw the hyperplanes we observe that line one has a maximum distance so we observe that line one has a maximum distance margin so we’ll classify the new data point correctly and our result on this one is going to be that the new 
data point is Mel one of the reasons we call it a hyperplane versus a line is that a lot of times we’re not looking at just weight and height we might be looking at 36 different features or dimensions and so when we cut it with a hyper plane it’s more of a three-dimensional cut in the data multi dimensional it cuts the data a certain way and each plane continues to cut it down until we get the best fit or match let’s understand this with the help of an example problem statement I always start with a problem statement when you’re going to put some code together we’re going to do some coding now classifying muffin and cupcake recipes using support Vector machines so the cupcake versus the muffin let’s have a look at our data set and we have the different recipes here we have a muffin recipe that has so much flour I’m not sure what measurement 50 5 is in but it has 55 maybe it’s ounces but it has certain amount of flour certain amount of milk sugar butter egg baking powder vanilla and salt and So based on these measurements we want to guess whether we’re making a muffin or a cupcake and you can see in this one we don’t have just two features we don’t just have height and weight as we did before between the male and female in here we have a number of features in fact in this we’re looking at eight different features to guess whether it’s a muffin or a cupcake what’s the difference between a muffin and a cupcake turns out muffins have more flour while cupcakes have more butter and sugar so basically the cupcakes a little bit more of a dessert where the muffins a little bit more of a fancy bread but how do we do that in Python how do we code that to go through recipes and figure out what the recipe is and I really just want to say cupcakes versus muffins like some big professional wrestling thing before we start in our cupcakes versus muffins we are going to be working in Python there’s many versions of python many different editors that is one of the strengths and 
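Before the notebook walk-through, here's a minimal sketch of the maximum-margin idea with scikit-learn on the height/weight example. All the numbers below are invented for illustration; only the `svm.SVC(kernel='linear')` usage mirrors what the tutorial uses.

```python
from sklearn import svm

# Made-up (height cm, weight kg) samples: females then males.
X = [[152, 48], [156, 50], [160, 55], [158, 52],   # female
     [175, 80], [180, 85], [178, 82], [183, 90]]   # male
y = ["female"] * 4 + ["male"] * 4

# A linear-kernel SVC finds the separating line with the widest margin.
model = svm.SVC(kernel="linear")
model.fit(X, y)

print(model.predict([[179, 84]]))   # a point deep in the male cluster
print(model.support_vectors_)       # the points that pin down the margin
```

The support vectors printed at the end are the nearest points from each class, the ones the dashed margin lines pass through in the diagrams.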
There are many versions of Python and many different editors; that is both a strength and a weakness of Python, it just has so much attached to it, and it's one of the more popular data science languages. In this case we're going to use Anaconda and Jupyter Notebook. The Anaconda Navigator has all kinds of tools, and once you're in it you can change environments; I actually have a number of environments here, and we'll be using a Python 3.6 environment. It doesn't matter too much which version you use; I usually stay with 3.x because it's current, unless you have a project that's specifically in 2.x (2.7 is what most people use on the version 2 side). Once we're in the Jupyter Notebook editor, I go up and create a new file; in this case we're doing "SVM muffin versus cupcake".

Let's start with our packages for data analysis; there are a few very standard ones we almost always use. We import numpy, numerical Python, usually aliased as np, which is very common, and we import pandas as pd. NumPy deals with number arrays, and there are a lot of nice things you can do with a NumPy array, like multiplying all the values in the array at once. Pandas gives you a DataFrame, and the difference between a DataFrame and a NumPy array is that a DataFrame is more like an Excel spreadsheet: you have columns, you have indexes, different ways of referencing and viewing the data, plus additional features you can run on it. Pandas sits on top of NumPy, so you need them both. Finally, since we're working with the support vector machine, from sklearn we import svm.

As a data scientist you should always try to visualize your data. Some data is obviously too complicated or wouldn't mean anything to a human, but when possible it's good to take a look so you can actually see what you're doing. For that we use two packages: we import matplotlib.pyplot as plt, again very common, and we import seaborn as sns, setting the font scale with sns right there in the import cell on the following line. Seaborn is great because it sits on top of matplotlib just as pandas sits on NumPy, adding more features, uses, and control. We're obviously not going to get deep into matplotlib and seaborn, that would be its own tutorial; we're really focusing on the SVM from sklearn.

Since we're in Jupyter Notebook, we have to add a special line for matplotlib: the percent sign followed by "matplotlib inline". If you're running this as a plain code project, a lot of times I use Notepad++ and run it from there, you don't need that line, because the plot pops up in its own window depending on how your computer is set up. Because we're running this in a Jupyter notebook in the browser, this tells it to display all of our graphics right below on the page. The first time I ran this, years ago, I didn't know that and had to go look it up; it was quite a headache. So %matplotlib inline is just because we're running in the web setup. We can go ahead and run this cell to make sure all our modules import, which is great; if they don't, you'll need to install them, usually with pip (there are other install tools out there, but pip is the most common), and make sure they're all installed on your Python setup.
The next step, of course, is to look at the data; you can't run a model for predicting data if you don't have actual data. Let me go ahead and open the file: we have our cupcakes-versus-muffins data as a CSV file, comma-separated values, and it opens nicely in a spreadsheet. You can see the Type column, muffin, muffin, muffin, cupcake, cupcake, cupcake, and then it's broken up into Flour, Milk, Sugar, Butter, Egg, Baking Powder, Vanilla, and Salt.

We can also look at this data in Python. Let's create a variable: recipes = pd.read_csv('cupcakes_vs_muffins.csv'), using our pandas module's read_csv (remember, it's a comma-separated values file; oops, I typed double brackets there, let me fix that). Because the place I saved this particular Python notebook is the same folder as the file, we can get by with just the file name; if you're storing it in a different location, you have to put down the full path. Then, because we're in pandas, in the Jupyter notebook you can just type recipes.head() inline, but if you're running code in a different script you need the full print(recipes.head()). Pandas knows that head() gives the first five lines of data.

If we flip back over to the spreadsheet where we opened the CSV, the data starts on line two, while pandas calls that row zero, and then 1, 2, 3, 4 match up (we can close the spreadsheet now, we don't need it anymore). Pandas always starts at zero and automatically indexes the rows, since we didn't tell it to use a column as the index, so that's the index number on the left-hand side; it also automatically took the top row as the column labels. Using pandas to read a CSV is really slick and fast, one of the reasons we love our pandas, and not just because they're cute and cuddly teddy bears.

Now let's plot our data. I'm not going to plot all of it, just sugar and flour. You can see how complicated it would get with tons of features, so you break them up and look at maybe two at a time to see how they relate. To plot them we use Seaborn, our sns, and the command is sns.lmplot('Flour', 'Sugar', data=recipes, hue='Type', palette='Set1', fit_reg=False, scatter_kws={...}). This is a lot of fun because it knows a pandas DataFrame is coming in, one of the powerful things about pandas mixed with Seaborn for graphing. The palette 'Set1' is one of many palettes you can look up for Seaborn; fit_reg=False because we're not really trying to fit anything; and scatter_kws holds settings you can look up in Seaborn, half of which you could probably leave off, somebody played with them and found these were the best settings for this plot. Let's go ahead and run that, and because it renders inline, it appears right on the page. You can see that, based on sugar and flour alone, there's a definite split.
We use these visual checks because you can actually look at the plot and say: hey, if I drew a line right between the middle of the blue dots and the red dots, an SVM could put a hyperplane right there.

The next step is to format, or preprocess, our data, and we'll break that into two parts. First we need a type label. Remember, we're deciding whether something is a muffin or a cupcake, but a computer doesn't know "muffin" or "cupcake"; it knows zero and one. So we create a type label as a NumPy array using np.where, where we can do some logic: we take our recipes DataFrame, and wherever Type equals 'Muffin' the output is 0, and wherever it doesn't, which is the cupcakes, it's 1. This type label is the answer column; when we train our model, this is what we train against, 0 or 1, muffin or not.

Then we create our recipe features. If you remember from above, the first column is Type, the muffin-or-cupcake answer, so we really don't need that column among the features. In pandas we can easily sort that out: we take recipes.columns, a pandas attribute, and its .values, which gives just the column titles across the top; we don't want the first one, and since indexing always starts at zero we slice from 1 to the end, [1:], and then wrap it in list(), which converts it to a list of strings. Then we can take a look at the features to make sure they look right; let me run that. I forgot the s on recipes, so we'll add the s and run it again, and we see Flour, Milk, Sugar, Butter, Egg, Baking Powder, Vanilla, and Salt, which matches what we printed earlier, everything but Type. So we have our features and we have our label.

Now, recipe_features is just the titles of the columns, and we actually need the ingredient values. At this point we have a couple of options. We could run over all the ingredients, and usually you would, but for our example we want to limit it so you can easily see what's going on; if we used all the ingredients there would be seven or eight dimensions to the hyperplane, and we only want to look at something you can see. So we take recipes[['Flour', 'Sugar']], and again you can replace that with recipe_features to use all of them, and convert it to .values. We don't need list() here because these aren't strings; they're actual numeric values. We can print ingredients to see what it looks like: just the flour and sugar numbers, two columns of points to plot. Just for fun, if we swap in all the recipe features, you'll see it makes a nice block of all the data with the labels stripped out, just the values; but because we want to view this easily in a plot later, we go back to just flour and sugar, and running it shows just the two columns.

The next step is to fit our model. We'll just call it model, and from the svm package we use a class called SVC, setting kernel='linear' so it uses that specific setup. If you go to the svm reference on the scikit-learn website, you'll see there are about eight estimators: some for regression and some for classification.
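Putting the preprocessing and fitting steps together as code, with a small invented stand-in for the recipes DataFrame (the real notebook reads it from the CSV):

```python
import numpy as np
import pandas as pd
from sklearn import svm

# Invented stand-in for the recipes DataFrame (Type + two of the features).
recipes = pd.DataFrame({
    "Type":  ["Muffin", "Muffin", "Muffin", "Cupcake", "Cupcake", "Cupcake"],
    "Flour": [55, 47, 50, 39, 38, 34],
    "Sugar": [3, 12, 10, 40, 31, 31],
})

# Label: 0 for muffin, 1 for everything else (cupcake).
type_label = np.where(recipes["Type"] == "Muffin", 0, 1)

# Feature names: every column except the first (Type).
recipe_features = list(recipes.columns.values)[1:]   # ['Flour', 'Sugar']

# Feature values limited to flour and sugar, as in the walk-through.
ingredients = recipes[["Flour", "Sugar"]].values

model = svm.SVC(kernel="linear")
model.fit(ingredients, type_label)
print(recipe_features, model.predict([[50, 10]]))    # deep in muffin territory
```

Note how `np.where` does the 0/1 encoding in one vectorized step, and how the `[1:]` slice drops the Type column from the feature list.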
SVC, support vector classification, is probably the most commonly used; there's also one for detecting outliers and another for something a bit more specific. SVC and SVR are the two most common, standing for support vector classifier and support vector regression. Remember, regression outputs an actual value, a float or whatever you're working toward, while SVC is a classifier: yes/no, true/false, and for us 0/1, muffin/cupcake. We go ahead and create our model, and once we have it, we call model.fit(). This is very common, especially in sklearn, where all the models follow a fit command. What we put into fit, what we train with, is the ingredients, which in this case we limited to flour and sugar, and the type label, muffin or cupcake. In a more complete data science workflow you'd split the data into training and test sets, sometimes even into thirds, rotating which portion is used for training and which for testing; it gets involved at the higher end, not overly complicated, just an extra step, which we're skipping today because this is a very simple data set. Let's run it. I got an error here, so let me fix that real quick: it's capital SVC, and it turns out I typed it lowercase. Fix that and run again, and it prints out all this information automatically; these are the defaults of the model. Notice we changed the kernel to linear, and there's kernel='linear' in the printout; there are other settings you can experiment with, but for this we don't need to touch them.

Next we're going to dig a little into our newly trained model, so we can show it on a graph, and get the separating line. We'll use w for our variable: w = model.coef_[0]. So what is that? Again, we're digging into the model; it's already trained, and this is the math behind it. w holds two coefficients, and if you remember y = mx + c, these coefficients connect to that, except in two feature dimensions it's a plane. We don't want to spend too much time here, because you can get lost in the math, but if you're a math whiz, great: a = −w[0] / w[1]; remember there are two values in w, and a is basically the slope we're generating. Then we build xx: a NumPy array from np.linspace, which creates a set of evenly spaced x values between 30 and 60. Then, from y = slope · x + intercept, we compute yy = a * xx − model.intercept_[0] / w[1]. That's the neat thing about NumPy: when I do a * xx, where xx is a whole NumPy array of values, it multiplies a across all of them, and then we subtract the intercept term, the c from y = mx + c. That's where all these numbers come from; it's a little confusing because we're digging values out of these arrays. Then we want to plot the lines parallel to the separating hyperplane that pass through the support vectors.
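Here's that boundary-line extraction as a runnable sketch. The training data is the same invented stand-in as before; the `coef_`, `intercept_`, and `support_vectors_` attributes are standard scikit-learn, and the margin-line formula mirrors the one described in the walk-through.

```python
import numpy as np
from sklearn import svm

# Invented (flour, sugar) samples: muffins (0) then cupcakes (1).
ingredients = np.array([[55, 3], [47, 12], [50, 10],
                        [39, 40], [38, 31], [34, 31]])
type_label = np.array([0, 0, 0, 1, 1, 1])

model = svm.SVC(kernel="linear").fit(ingredients, type_label)

# Separating line: w0*x + w1*y + intercept = 0  =>  y = a*x - intercept/w1
w = model.coef_[0]
a = -w[0] / w[1]
xx = np.linspace(30, 60)                 # 50 evenly spaced x values (default)
yy = a * xx - model.intercept_[0] / w[1]

# Parallel margin lines through the nearest support vector on each side.
b_down = model.support_vectors_[0]
yy_down = a * xx + (b_down[1] - a * b_down[0])
b_up = model.support_vectors_[-1]
yy_up = a * xx + (b_up[1] - a * b_up[0])

print(a, len(xx))                        # slope of the boundary, 50 points
```

The decision line `yy` sits midway between `yy_down` and `yy_up`; all three share the same slope `a`, which is why they plot as parallel lines.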
which we now know is a set of data. We're going to create yy_down = a * xx + (b[1] - a * b[0]), where b is set to model.support_vectors_[0], and then set b to model.support_vectors_[-1], the minus-one setup for the last support vector, and compute yy_up = a * xx + (b[1] - a * b[0]). We can go ahead and run this to load these variables up. If you want to understand a little more of what's going on, print yy: you'll see it's an array describing a line, with values in this case between about 30 and 60, and the same goes for yy_up and yy_down. We'll plot those on a graph in a minute so you can see what they look like. We'll just delete that print out of here and run it again so it loads the variables from a nice clean slate. I'm just going to copy our Seaborn lmplot of flour versus sugar from before and run it real quick so you can remember what that looks like: it's just a straight scatter graph. One of the new things is that, because Seaborn sits on top of pyplot, we can draw the dividing line with plt.plot(xx, yy), our two corresponding arrays, and somebody played with this to figure out that linewidth=2 and color black look nice. So let's run the whole thing with the pyplot call on there: it's plotting flour and sugar, muffin versus cupcake, with the corresponding line between them. Then we generated the support vector lines, yy_down and yy_up, so let's see what those look like. We do plt.plot again, all against the same xx values, but this time with yy_down, and let's do something a little fun: we can pass in 'k--', which tells it to make a dashed black line. If we're going to do the down one we also want the up one, so here's our yy_up. When we run that, it adds both dashed lines, and this is what you expect: the dashed lines go through the nearest data points, the nearest muffin and the nearest cupcake, while the SVM boundary goes right down the middle. That gives a nice split in our data, and you can see how easy it is to tell, based just on sugar and flour, which one is a muffin and which is a cupcake. Let's go ahead and create a function to predict muffin or cupcake. I've got my recipes, which I pulled off the internet, and I want to tell the difference between a muffin and a cupcake, so we need a function to push a recipe through. We create a function with def and call it muffin_or_cupcake; remember, we're just doing flour and sugar today, not all the ingredients, and that's actually a pretty good split. You really don't need all the ingredients; flour and sugar are enough. Let's do an if statement: if model.predict([[flour, sugar]]) == 0. We take our model and run a predict; it's very common in sklearn to have a .predict where you put the data in and it returns a value. If it equals zero, print "You're looking at a muffin recipe!"; else, if it's not zero, meaning it's one, print "You're looking at a cupcake recipe!" That's pretty straightforward: def, for definition, is how you declare a function in Python. Of course, once we create a function we should run something through it, so let's send it the values 50 and 20. Muffin or cupcake? I don't know what it is. We run it and it says: you're looking at a muffin recipe. So it very easily predicts which recipe we're looking at. Let's plot this on the graph so we can see what that actually looks like. I'm just going to copy and paste the plotting code from below, where we plotted all the points.
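The whole pipeline described above can be sketched end-to-end. The (flour, sugar) values below are made-up stand-ins for the recipe data, not the video's dataset; the boundary and margin lines are derived from the fitted weights the same way the transcript does, and muffin_or_cupcake mirrors the function built in the demo:

```python
import numpy as np
from sklearn import svm

# Hypothetical (flour, sugar) measurements: 0 = muffin (flour-heavy),
# 1 = cupcake (sugar-heavy). Not the actual recipe data from the video.
X = np.array([[55, 10], [50, 15], [45, 12],
              [30, 30], [25, 35], [35, 32]])
y = np.array([0, 0, 0, 1, 1, 1])

model = svm.SVC(kernel='linear')
model.fit(X, y)

# Decision boundary yy = a*xx + b, from the weight vector and intercept.
w = model.coef_[0]
a = -w[0] / w[1]
xx = np.linspace(20, 60)
yy = a * xx - model.intercept_[0] / w[1]

# Parallel margin lines through the support vectors, as in the transcript.
b = model.support_vectors_[0]
yy_down = a * xx + (b[1] - a * b[0])
b = model.support_vectors_[-1]
yy_up = a * xx + (b[1] - a * b[0])

def muffin_or_cupcake(flour, sugar):
    # .predict returns 0 for muffin, 1 for cupcake on this toy data
    if model.predict([[flour, sugar]]) == 0:
        print("You're looking at a muffin recipe!")
    else:
        print("You're looking at a cupcake recipe!")

muffin_or_cupcake(50, 20)
```

Plotting xx against yy, yy_down, and yy_up (the latter two with the 'k--' style) reproduces the boundary-plus-margins picture from the demo.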
This is nothing different from what we did before: if I run it, you'll see it has all the points and the lines on there. What we want to do is add another point with plt.plot, and if I remember correctly our test was 50 and 20. Somebody went in and decided we'd use 'yo' for yellow (it comes out a kind of orange-yellow) with a marker size of nine; those are settings you can play with, and somebody else already played with them to come up with a setup that looks good. You can see there it is on the graph: clearly a muffin. In this cupcakes-versus-muffins contest, the muffin has won. If you'd like to run your own muffin/cupcake contender series, you certainly can: send a note down below and the team at Simplilearn will send you the data they used for the muffins and cupcakes. That's true of any of the data; we didn't actually run a plot on the men-versus-women data earlier, but you can also request that information to run on your own setup and test it out. So, to go back over our setup: for our support vector machine code we predicted whether 40 parts flour and 20 parts sugar (I think it was different from the one we did earlier) is a muffin or a cupcake. Hence we have built a classifier using SVM which is able to classify whether a recipe is for a cupcake or a muffin, which wraps up cupcake versus muffin. Today, in our second tutorial, we're going to cover K-means and logistic regression, along with going over the quiz questions from our first tutorial. What's in it for you: we're going to cover clustering, what clustering is, and K-means clustering, one of the most commonly used clustering tools out there, including a flowchart to understand how K-means functions, and then we'll do an actual live Python demo on clustering cars by brand. Then we'll cover logistic regression: what logistic regression is, the logistic regression curve and sigmoid function, and then we'll
do another Python code demo to classify a tumor as malignant or benign based on its features. Let's start with clustering. Suppose we have a pile of books of different genres, and we divide them into groups like fiction, horror, and education. As we can see from this young lady, she is definitely into heavy horror; you can tell by those eyes and the Canadian maple leaf on her shirt. We have fiction, horror, and education, and we want to divide our books up. Organizing objects into groups based on similarity is clustering. In this case, looking at the books, we're clustering things into known categories, but you can also use clustering to explore data: you might not know the categories, you just know you need to divide the data up in some way to conquer it and organize it better. In this case, though, we'll be clustering into specific categories, so let's take a deeper look. We're going to use K-means clustering, probably the most commonly used clustering tool in the machine learning library. K-means clustering is an example of unsupervised learning; if you remember from our previous session, that means it's used when you have unlabeled data. We don't know the answer yet; we have a bunch of data that we want to cluster into different groups, defining clusters in the data based on feature similarity. We've introduced a couple of terms here. We've already talked about unsupervised learning and unlabeled data: we don't know the answer yet, we're just going to group things together and see if we can find out how they connect. We've also introduced feature similarity, features being the different attributes of the data. With books we can easily see fiction, horror, and history, but a lot of times with data that information isn't so easy to see at first glance, and K-means is one of those tools that helps us find the things that connect and match with each
other. Suppose we have these data points and want to assign them to clusters. When I look at these data points, I would probably group them into two clusters just by looking at them; I'd say these two groups of data kind of come together. In K-means, we pick K clusters and assign random centroids to them; here K represents two different clusters. Then we compute the distance from each object to the centroids, form new clusters based on minimum distance, and calculate the new centroids. We move the centroids and recalculate the distances, repeating those two steps iteratively until the cluster centroids stop changing position and become static. Once the clusters become static, the K-means clustering algorithm is said to have converged. Converged is another term you see throughout machine learning: it means whatever math we're using to figure out the answer has come to a solution, it has converged on an answer. Let's look at the flowchart to make this a little clearer with an easy step-by-step. We start by choosing K (we'll look at the elbow method in just a moment). We assign random centroids to clusters, and sometimes you pick the centroids yourself, because you might look at the data on a graph and say these are probably the central points. Then we compute the distance from the objects to the centroids, form new clusters based on minimum distance, and calculate their centroids. Then we compute the distance from the objects to the new centroids, and we go back and repeat those last two steps: we recalculate the distances as the centroids move around and figure out which objects are closest to each centroid. Objects can switch from one centroid to another as the centroids are moved, and we continue until the algorithm has converged. Let's see an example of this. Suppose we have this data set of seven individuals and their scores on two topics, A and B. The subject column refers to the person taking the test, and then we have their score on topic A and their score on topic B. Now let's take the two farthest-apart points as our initial cluster centroids. Remember, we talked about selecting them randomly, but we can also pick the two points farthest apart so the clusters move together; either works, depending on the kind of data you're working with and what you know about it. So we take the two farthest points, (1, 1) and (5, 7), as the initial centroids. Each point is then assigned to the closest cluster with respect to its distance from the centroids. We measure each of those distances using the Pythagorean theorem, since you know the x and y differences and can figure out the diagonal. (Or you could just take a ruler and put it on your monitor. That would be kind of silly, but it would work if you're eyeballing it.) You can see how the points naturally come together in certain areas. Now we again calculate the centroid of each cluster. For cluster one, looking at its individual dots, one, two, three, the centroid moves to about (1.8, 2.3): remember it started at (1, 1), but the very center of the points in that cluster puts it at roughly (1.8, 2.3). For the second cluster we take the overall mean vector, the average of all the points assigned to it.
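The assign-and-recompute step just described can be sketched directly. The seven (A, B) scores below are reconstructed from the numbers the transcript quotes, with the initial centroids (1, 1) and (5, 7):

```python
import numpy as np

# Seven individuals' (A, B) scores, with the two farthest-apart points
# used as the initial cluster centroids, as in the example.
points = np.array([[1.0, 1.0], [1.5, 2.0], [3.0, 4.0], [5.0, 7.0],
                   [3.5, 5.0], [4.5, 5.0], [3.5, 4.5]])
centroids = np.array([[1.0, 1.0], [5.0, 7.0]])

# Euclidean (Pythagorean) distance of every point to every centroid.
d = np.sqrt(((points[:, None] - centroids[None, :]) ** 2).sum(axis=2))
assignment = d.argmin(axis=1)   # each point joins its nearest centroid

# New centroids: the mean of the points assigned to each cluster.
new_centroids = np.array([points[assignment == j].mean(axis=0)
                          for j in range(2)])
print(new_centroids)   # approx (1.8, 2.3) and (4.1, 5.4)
```

The recomputed centroids come out at roughly (1.8, 2.3) and (4.1, 5.4), matching the values in the walkthrough.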
That average works out to about (4.1, 5.4), so we've now moved the centroids. We compare each individual's distance to its own cluster mean and to the mean of the opposite cluster, and we can build a nice chart showing that, as we move the centroids around, we get a new clustering of the groups. Using Euclidean distance between the points and the means, we find that only individual 3 is nearer to the mean of the opposite cluster, cluster 2, than to its own cluster 1; you can see it circled in the middle of the diagram. When we moved the centroids of the clusters over, one of the points shifted to the other cluster because it's closer to that group of individuals. Thus individual 3 is relocated to cluster 2, resulting in a new partition, and we regenerate all the distances to the different clusters. For the new clusters we find the actual cluster centroids again, and you can see we've now formed two very distinct clusters. On comparing each individual's distance to its own cluster mean and to that of the opposite cluster, we find that the data points are stable, and hence we have our final clusters. Now, if you remember, I brought up a concept earlier: with the K-means algorithm, choosing the right value of K helps reduce the number of iterations, and to find the appropriate number of clusters in a data set we use the elbow method. Within-cluster sum of squares (WSS) is defined as the sum of the squared distances between each member of a cluster and its centroid. So what we do is run the same K-means algorithm over different numbers of clusters, calculate what the centroids look like, and record the WSS for each.
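The full iterative procedure, assign to the nearest centroid, recompute the centroids, and repeat until they go static, can be sketched in a few lines. The scores are the seven-individual example from the walkthrough; the helper function is a minimal illustration, not sklearn's implementation:

```python
import numpy as np

def kmeans(points, k, initial_centroids, iters=100):
    """Plain K-means: assign each point to its nearest centroid, move each
    centroid to the mean of its members, repeat until nothing changes."""
    centroids = np.asarray(initial_centroids, dtype=float)
    for _ in range(iters):
        d = np.linalg.norm(points[:, None] - centroids[None, :], axis=2)
        labels = d.argmin(axis=1)
        new = np.array([points[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new, centroids):   # converged: centroids went static
            return labels, new
        centroids = new
    return labels, centroids

# Seven individuals' scores, starting from the two farthest-apart points.
scores = np.array([[1.0, 1.0], [1.5, 2.0], [3.0, 4.0], [5.0, 7.0],
                   [3.5, 5.0], [4.5, 5.0], [3.5, 4.5]])
labels, centroids = kmeans(scores, k=2, initial_centroids=[[1, 1], [5, 7]])
```

On this data the run converges with individual 3 relocated to the second cluster, exactly the final partition described above.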
You can then find the optimal number of clusters by looking for the elbow in the resulting graph, which is why it's called the elbow method. We guessed at two just by looking at the data, but as you can see from the slope, you just look for where the elbow is, and you have a clear answer for where to start, say K = 2. A lot of times people end up computing K-means for K = 2, 3, 4, 5 until they find the value that sits at the elbow joint. Sometimes you can just look at the data, and if you're really good with that specific domain (remember, I mentioned domain knowledge last time), you'll know where to pick those numbers or where to start guessing at the K value. So let's take this into a use case: using K-means clustering to cluster cars into brands using parameters such as horsepower, cubic inches, model year, and so on. We're going to use the data set cars.csv, which has information about three brands of cars: Toyota, Honda, and Nissan. We'll go back to my favorite tool, the Anaconda Navigator with the Jupyter Notebook, and flip over to our notebook, where I'm going to paste in the basic code we usually start these off with. We won't go too much into it because we've already discussed NumPy (the number array), pandas (the pandas data frame), and matplotlib (for graphing). Don't forget, if you're using the Jupyter Notebook, you need the matplotlib inline magic so that it plots everything on the screen; if you're using a different Python editor you probably don't need it, because you'll get a popup window on your computer instead. We'll run this to load our libraries and setup. The next step, of course, is to look at our data, which I've already opened up in a spreadsheet.
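The setup cell amounts to the three imports plus loading the CSV. Since cars.csv isn't bundled here, the data frame below is a hypothetical stand-in with a few rows shaped like the file described (numeric features, brand last):

```python
import numpy as np               # numeric arrays
import pandas as pd              # data frames
import matplotlib
matplotlib.use('Agg')            # headless backend; in Jupyter use %matplotlib inline
import matplotlib.pyplot as plt  # graphing

# Stand-in for: dataset = pd.read_csv('cars.csv')
# Column names and values here are illustrative, not the real file's contents.
dataset = pd.DataFrame({
    'mpg': [27.5, 33.7, 28.0],
    'cylinders': [4, 4, 4],
    'cubicinches': [140, 107, 97],
    'hp': [90, 86, 92],
    'brand': ['Toyota', 'Honda', 'Nissan'],
})
print(dataset.head())
```

With the real file in the same folder as the notebook, pd.read_csv('cars.csv') needs no path prefix, just as the transcript says.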
You can see here we have miles per gallon, cylinders, cubic inches, horsepower, weight in pounds (how heavy it is), time to get to 60 (my car is probably at about 80 or 90 on this one), the model year, so you can see these are somewhat older cars, and then the brand: Toyota, Honda, Nissan. The cars range from 1971, and if we scroll down, through the 80s; between the 70s and 80s there are a number of cars they put out. Coming back to the notebook, we're going to import the data: dataset = pd.read_csv('cars.csv'), using pandas to read the CSV file in. Remember, you can always request the data files for these, either in the comments here on the YouTube video or at simplilearn.com. I put cars.csv in the same folder as my Python code, so I don't have to give the full path; if you store them in different folders, you do have to change this and double-check your file names. We'll go ahead and run this. We've chosen the name dataset arbitrarily because, you know, it's a data set we're importing, and we've now imported cars.csv into the data frame. As you know, you have to prep the data, so we're going to create the X data, the part we're going to try to figure out, and there are a number of ways to do this, but we'll use a simple loop so you can actually see what's going on. So we'll do for i in X.
columns so we’re going to go through each of the columns and a lot of times it’s important I I’ll make lists of the columns and do this because I might remove certain columns or there might be colums that I want to be processed differently but for this we can go ahead and take X of I and we want to go fill Na and that’s a panda command but the question is what are we going to fill the missing data with we definitely don’t want to just put in a number that doesn’t actually mean something and so one of the tricks you can do with this is we can take X of I and in addition to that we want to go ahead and turn this into an integer because a lot of these are integers so we’ll go ahead and keep it integers and me add the bracket here and a lot of editors will do this they’ll think that you’re closing one bracket make sure get that second bracket in there if it’s a double bracket that’s always something that happens regularly so once we have our integer of X of Y this is going to fill in any missing data with the average and I was so busy closing one set of brackets I forgot that the mean is also has brackets in there for the pandas so we can see here we’re going to fill in all the data with the average value for that column so if there’s missing data is in the average of the data it does have then once we’ve done that we’ll go ahead and loop through it again and just check and see to make sure everything is filled in correctly and we’ll print and then we take X is null and this returns a set of the null value or the how many lines are null and we’ll just sum that up to see what that looks like and so when I run this and so with the X what we want to do is we want to remove the last column because that had the models that’s what we’re trying to see if we can cluster these things and figure out the models there is so many different ways to sort the X out for one we could take the X and we could go data set our variable we’re using and use the iocation one of the features 
that’s in pandas and we could take that and then take all the rows and all but the last column of the data set and at this time we could do values we just convert it to values so that’s one way to do this and if I let me just put this down here and print X it’s a capital x we chose and I run this you can see it’s just the values we could also take out the values and it’s not going to return anything because there’s no values connected to it what I like to do with this is instead of doing the iocation which does integers more common is to come in here and we have our data set and we’re going to do data set dot or data set. columns and remember that lists all the columns so if I come in here let me just Mark that as red and I print data set . columns you can see that I have my index here I have my MPG cylinders everything including the brand which we don’t want so the way to get rid of the brand would be to do data Columns of Everything But the last one minus one so now if I print this you’ll see the brand disappears and so I can actually just take data set columns minus one and I’ll put it right in here for the columns we’re going to look at and uh let’s unmark this and unmark this and now if I do an x. 
With X.head() I now have a new data frame, and you can see we have all the columns except for the brand at the end. It turns out, when you start playing with this data set, you'll get an error later on saying it cannot convert string to float, and that's because, for some reason, these values must have been recorded as strings. There's a neat feature in pandas to convert them: convert_objects, and for this we pass convert_numeric=True. Yes, I did have to go look that up; I don't have it memorized. If I'm working with these a lot I remember them, but depending on where I'm at and what I'm doing, I usually have to look it up. We run that and, oops, I must have missed something; let me double-check my spelling. When I do, you'll see I missed the first underscore in convert_objects. When I run it again, everything is converted into numeric values, because numeric values are what we'll be working with from here on. The next part is that we need to go through the data and eliminate null values. Most people working with small data pools discover afterwards that they have a null value and have to go back and do this, so be aware: whenever we're formatting data, things pop up, and sometimes you go backwards to fix them. That's fine; it's just part of exploring the data and understanding what you have. I should have done this earlier, but let me increase the size of my window one notch. There we go, easier to see.
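Worth noting: convert_objects has since been removed from pandas. The modern equivalent of the string-to-number conversion shown in the video is pd.to_numeric applied column by column; the string values here are made up to reproduce the problem:

```python
import pandas as pd

# Columns read in as strings: the "cannot convert string to float" situation.
X = pd.DataFrame({'mpg': ['18', '24', '30'], 'hp': ['130', '95', '67']})

# X.convert_objects(convert_numeric=True) no longer exists; instead:
X = X.apply(pd.to_numeric, errors='coerce')   # unparseable entries become NaN
```

The errors='coerce' option turns anything that can't be parsed into NaN, which the fillna step below then cleans up.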
columns we’ll page through all the columns and we want to take X of I and we’re going to change that we’re going to alter it and so with this we want to go ahead and fill in X of I pandis Has the fill in a and that just fills in any non-existent missing data and we’ll put my brackets up and there’s a lot of different ways to fill this data if you have a really large data set some people just void out that data because if and then look at it later in a separate exploration of data one of the tricks we can do is we can take our column and we can find the means and the means is in our quotation marks so when we take the columns we’re going to fill in the the non-existing one with the means the problem is that returns a decimal float so some of these aren’t decimals certainly need to be a little careful of doing this but for this example we’re just going to fill it in with the integer version of this keeps it on par with the other data that isn’t a decimal point and then what we also want to do is we want to double check A lot of times you do this first part first to double check then you do the fill and then you do it again just to make sure you did it right so we’re going to go through and test for missing data and one of the re ways you can do that is simply go in here and take our X of I column so it’s going to go through the X ofi column it says is null so it’s going to return any any place there’s a null value it actually goes through all the rows of each column is null and then we want to go ahead and sum that so we take that we add the sum value and these are all pandas so is null is a panda command and so is some and if we go through that and we go ahead and run it and we go ahead and take and run that you’ll see that all the columns have zero null values so we’ve now tested and double checked and our data is nice and clean we have no null values everything is now a number value we turned it into numeric and we’ve removed the last column in our data and at 
this point we’re actually going to start using the elbow method to find the optimal number of clusters so we’re now actually getting into the SK learn part uh the K means clustering on here I guess we’ll go ahead and zoom it up one more notot so you can see what I’m typing in here and then from sklearn going to or sklearn cluster I’m going to import K means I always forget to capitalize the K and the M when I do this say capital K capital M K means and we’ll go and create a um aray wcss equals let me get an empty array if you remember from the albow method from our slide within the sums of squares WSS is defined as the sum of square distance between each member of the cluster and its centroid so we’re looking at that change in differences as far as a squar distance and we’re going to run this over a number of K mean values in fact let’s go for I in range we’ll do 11 of them range 0 11 and the first thing we’re going to do is we’re going to create the actual we’ll do it all lowercase and so we’re going to create this object from the K means that we just imported and the variable that we want to put into this is in clusters and we’re going to set that equals to I that’s the most important one because we’re looking at how increasing the number of clusters changes our answer there are a lot lot of settings to the K means our guys in the back did a great job just kind of playing with some of them the most common ones that you see in a lot of stuff is how you init your K means so we have K means plus plus plus this is just a tool to let the model itself be smart how it picks it centroids to start with its initial centroids we only want to iterate no more than 300 times we have a Max iteration we put in there we have the inth the knit the random State equals zero you really don’t need to worry too much about these when you’re first learning this as you start digging in deeper you start finding that these are shortcuts that will speed up the process as far as a setup but 
the big one that we’re working with is the in clusters equals I so we’re going to literally train our K means 11 times we’re going to do this process 11 times in if you’re working with uh Big Data you know the first thing you do is you run a small sample of the data so you can test all your stuff on it and you can already see the problem that if I’m going to iterate through a terabyte of data 11 times and then the K means itself is iterating through the data multiple times that’s a heck of a process so you got to be a little careful with this a lot of times though you can find your elbow using the elbow method find your optimal number on a sample of data especially if you’re working with larger data sources so we want to go ahead and take our K means and we’re just going to fit it if you’re looking at any of the sklearn very common common you fit your model and if you remember correctly our variable we’re using is the capital x and once we fit this value we go back to the um array we made and we want to go just to pin that value on the end and it’s not the actual fitware pinning in there it’s when it generates it it generates the value you’re looking for is inertia so K means. inertia will pull that specific value out that we need and let’s get a visual on this we’ll do our PL T plot and what we’re plotting here is first the xaxis which is range 0 11 so that will generate a nice little plot there and the wcss for our Y axis it’s always nice to give our plot a title and let’s see we’ll just give it the elbow method for the title and let’s get some labels so let’s go ahead and do PLT X label and what we’ll do we’ll do number of clusters for that and PLT y label and for that we can do oops there we go wcss since that’s what we’re doing on the plot on there and finally we want to go ahead and display our graph which is simply PLT do oops. 
There we go; because we have it set to inline, it appears inline (hopefully I didn't make a typo), and you can see we get a very nice graph: a very nice elbow joint there at two, and again right around three and four, and after that there's not very much change. As a data scientist looking at this, I would try either three or four, and I'd actually try both of them to see what the output looks like. They've already tried this in the back, so we're just going to use three, and let's see what that looks like when we actually use it to show the different kinds of cars. So let's go ahead and apply K-means to the cars data set. Basically we copy the code we looped through above, kmeans = KMeans(n_clusters=3), setting the number of clusters to three since that's what we're going to look for. (You could do three and four and graph them both just to see how they come out differently; it would be kind of curious to look at.) We create our own variable, y_kmeans, for our answers and set it equal to (whoops, I typed a double equals there) the KMeans result, but we're not going to do a plain fit: fit_predict is the method you want to use here. With pre-trained models you'll usually see fit and then a separate predict, but here we want to both fit and predict the K-means in one go, and that's fit_predict, with our capital X as the data. Before we plot this data, we'll do a little pandas trick: we take our X value and call as_matrix(), converting it into a nice rows-and-columns setup, with columns=None so it's just a matrix of data. Let's go ahead and run that. A little warning: you'll see these
warnings pop up because things are always being updated, so there are minor changes between versions. In future versions, instead of as_matrix it's more common to use .values, but as_matrix works just fine for right now; you'll want to update that later on. Let's dive in and plot this and see what it looks like. Before plotting, I always like to take a look at what I'm plotting, so let's look at y_kmeans. I'll print it out down here, and we see an array of answers, 2, 1, 0, 2, 1, 2, so it's assigning each row of data to one of the three clusters it thinks exist. Then let's print X and see what we have: X is an array, a matrix of our different values. It's very hard to plot all the different values in the array, so we're only going to look at the first two, positions zero and one. If you were doing a full presentation in front of a board meeting, you might do it a little differently and dig deeper into the different aspects, because these are all the columns we looked at, but we'll use just columns zero and one to keep it easy. Let's clear this output and bring up our plot. We're doing a scatter plot here, plt.scatter, and this looks a little complicated, so let me explain what's going on. We take the X values where y_kmeans == 0, the first cluster, and use column zero for the x-axis; then we do the same thing, still y_kmeans == 0, but take the second column for the y-axis. So we're only looking at the first two columns of the data. The guys in the back played with this a little to make it pretty, and
they discovered it looks good with s=100, the size of the dots, and red for this one. When they looked at the data that came out, this cluster was definitely the Toyotas, so we'll just label it Toyota; again, that's something you'd really have to explore, playing with those numbers to see what looks good. I'll hit enter and paste in the next two lines, which are the next two cars, Nissan and Honda. In those scatter plots you'll see we're now looking at where y_kmeans == 1 and where y_kmeans == 2, again with just the first two columns, zero and one, and each of these rows corresponds to Nissan and Honda. Finally, let's put the centroids on there: another scatter plot, and you can pull the centroids straight from the KMeans model we created with kmeans.cluster_centers_, taking all the rows of the first column and all the rows of the second column, indices 0 and 1, because you always start with zero. They played with the settings to make it look good: a size of 300, the color yellow, and a label, because it's good to have labels: 'Centroids'. Then we add a title with plt.title (you always want to make your graphs look pretty); we'll call it 'Clusters of car make'. One of the features of the plot library is that you can add a legend: plt.legend() brings it in automatically, since we've already labeled the different parts with Toyota, Nissan, Honda, and Centroids. Finally we call show so we can actually see it, and remember it's inline; if you're using a different editor that's not the Jupyter Notebook, you'll get a popup instead, and you should have a nice set of clusters.
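The whole fit_predict-and-plot sequence can be sketched on synthetic data. The three blobs stand in for the cars features, and mapping cluster 0/1/2 to Toyota/Nissan/Honda is the same eyeballed labeling the demo uses, not something the algorithm knows:

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')   # headless backend so the script runs outside Jupyter
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Three synthetic blobs standing in for the three car brands.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc, 0.4, size=(25, 2))
               for loc in ([0, 0], [4, 4], [0, 4])])

kmeans = KMeans(n_clusters=3, init='k-means++', n_init=10, random_state=0)
y_kmeans = kmeans.fit_predict(X)    # fit the model and get one label per row

# One scatter call per cluster (columns 0 and 1), then the centroids on top.
for j, (color, label) in enumerate(zip(['red', 'blue', 'green'],
                                       ['Toyota', 'Nissan', 'Honda'])):
    plt.scatter(X[y_kmeans == j, 0], X[y_kmeans == j, 1],
                s=100, c=color, label=label)
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
            s=300, c='yellow', label='Centroids')
plt.title('Clusters of car make')
plt.legend()
plt.savefig('clusters_of_car_make.png')
```

Boolean masks like X[y_kmeans == j, 0] are what let one scatter call draw exactly one cluster, which is why each brand can get its own color and legend entry.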
Here we can see the clusters: Honda in green, Toyota in red, Nissan in purple, and where the centroids were placed to separate them. We could plot a lot of other data on here too, because we only looked at the first two columns — columns 0 and 1, as you index them in code — but you can see how just these two columns form very distinct clusters. If you were exploring new data, you might look at this and ask what makes these groups different — almost working in reverse, starting from the clusters and pulling apart the columns to find out why the first group is set up the way it is. Maybe you're doing loans and you want to know why one group isn't defaulting, why the last group is defaulting, and why the middle group defaults 50% of the time — and from there you find ways to work the data and pull out the answers you want. Now that you've seen how to use K-Means for clustering, let's move on to the next topic: logistic regression. The logistic regression algorithm is the simplest classification algorithm, used for binary or multi-class classification problems. (And our little girl from Canada who's into horror books is back — actually a bit scary when you think about it, with those big eyes.) In the previous tutorial we learned about linear regression and dependent and independent variables. To brush up: y = mx + c, the basic algebraic function of y and x. The dependent variable y is the target class we're going to predict; the independent variables x1 through xn are the features or attributes we use to predict it. We know what a linear regression looks like, but from that graph we cannot divide the outcome into categories — it's really hard
to put values like 1.5, 3.6, or 9.8 into categories. For example, a linear regression graph can tell us that as the number of hours studied increases, a student's marks will increase, but it will not tell us whether the student will pass. In cases where we need the output as a categorical value, we use logistic regression, and for that we use the sigmoid function. Here we have marks from 0 to 100 against number of hours studied — that's what they're comparing in this example. A linear fit gives y = mx + c, and applying the sigmoid function p = 1 / (1 + e^(-y)) generates a sigmoid curve. Taking the natural logarithm — ln (I always thought it should be "nl," but it's ln), the inverse of e — we get ln(p / (1 − p)) = mx + c; that's the sigmoid curve function we're looking for. If we zoom in on the function, you'll see it approaches 1 or 0 depending on your x value. If the probability is greater than 0.5, the value is rounded up to 1, indicating the student will pass — if they do a certain amount of studying, they'll probably pass. The threshold value sits at 0.5, usually right in the middle, and if the probability is less than 0.
5, the value is rounded down to 0, indicating the student will fail — if they're not studying very hard, they're probably going to fail. That of course ignores the outlier student who's a natural genius and doesn't need any studying to memorize everything (not me, unfortunately; I have to study hard to learn new stuff). Problem statement: classify whether a tumor is malignant or benign. This is actually one of my favorite data sets to play with, because it has so many features and they're genuinely hard to interpret — you can't just look at them and know the answer, so it gives you a chance to dive into what data looks like when you don't understand its specific domain. But I also want to remind you: in the domain of medicine, if I told you my model classifies tumors as malignant or benign with 90% or 95% accuracy, I'm guessing you'd still go get it tested anyway. You have to remember the domain you're working in. So why do it at all, if you know you're getting a biopsy because it's that serious? Because it might help the doctor know where to look, or aid them in catching something they missed, just by understanding what kind of tumor it likely is. So let's dive into the code, and I'll come back to the domain part in a minute. For this use case we do our normal imports — numpy, pandas, Seaborn, and matplotlib, with matplotlib inline — and I'm going to switch over to Anaconda. I've opened up a new window in my Anaconda Jupyter Notebook; by the way, you don't have to use Anaconda for the Jupyter Notebook, I just love the interface and all
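Before the tumor example, the sigmoid thresholding described above can be sketched in a few lines. The slope m, intercept c, and hours values here are made-up illustrative numbers, not fitted coefficients:

```python
import numpy as np

def sigmoid(y):
    """Logistic function: p = 1 / (1 + e^(-y))."""
    return 1.0 / (1.0 + np.exp(-y))

# Linear score y = m*x + c mapped to a probability, then thresholded at 0.5.
m, c = 0.8, -4.0
hours = np.array([2.0, 5.0, 8.0])          # hours studied
p = sigmoid(m * hours + c)                 # probability of passing
passed = (p >= 0.5).astype(int)            # 1 = pass, 0 = fail

# The logit identity ln(p / (1 - p)) = m*x + c recovers the linear score
logit = np.log(p / (1.0 - p))
```

Running this, a student at 2 hours falls below the 0.5 threshold (fail) and a student at 8 hours falls above it (pass), which is exactly the rounding behavior described on the curve.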
the tools Anaconda brings. So we have import numpy as np for our numpy number arrays, pandas as pd, Seaborn as sns to help with our graphs — so many nice tools in both Seaborn and matplotlib — and matplotlib.pyplot as plt, and of course we tell it to plot inline. Let's run that so it's all set up. We'll just call our data "data" — not creative today — and since this happens to be a CSV file, we'll use pd.read_csv on the file, which I renamed for this tutorial. Before we go any further, let's open the data and see what it looks like in a spreadsheet. Popping it open locally — it's just a CSV, comma-separated values — we have an ID, which I guess identifies which test was done, and a diagnosis: M for malignant, B for benign. Those are the two options, and that's what we're going to try to predict and test. Then we have the radius mean (average), texture mean, perimeter mean, area mean, smoothness, and so on. Unless you're a doctor in the field, most of this is opaque — you can guess what "concave" means just from the term, but I really wouldn't know what it means for the measurements being taken. They record all kinds of things — smoothness, symmetry — all float values. Page through and you'll see there are around thirty of these measurements, taken when they examine the tumorous growth. Back in our code: I put the CSV in the same folder as the notebook, so obviously if yours is in a different location you'll want the full path. We'll show pandas' first five lines of data with
data.head(). Running that, we see pretty much what we just looked at: an ID, a diagnosis, and, if we go all the way across, all the
different columns displayed nicely. While we're exploring the data, Seaborn — which we imported as sns — makes it very easy to do a joint plot. You'll notice it's very similar to matplotlib, because it sits on top of that library, and the joint plot does a lot of the work for us. We'll just look at the first two columns we're interested in, the radius mean and the texture mean, and pass data=data, which tells it which DataFrame those two columns come from. Run that and it generates a really nice graph with all kinds of cool things to look at. We have the texture mean and the radius mean on the axes, obviously, and one of the nicest features is the histograms along the edges: for the radius mean, where the most common values fall, and likewise for the texture. (It gets a little confusing, because each measurement is itself an average over one growth, and the histogram then shows how common each of those averages is.) And that's only two columns — so let's dig a little deeper into Seaborn. It also has a heat map, and if you're not familiar with heat maps, it just means the values are shown in color; I guess the original ones plotted heat density on something, and ever since, it's just been called a heat map. We're going to take our data and get the corresponding correlation numbers to put into the heat map, and that's simply data.
corr() — a pandas expression; remember we're working in a pandas DataFrame, and that's one of its handy tools. We pull that information into a heat map and see what it looks like. Now we're looking at all the different features: ID, texture, area, compactness, concave points, and so on. If you look down the diagonal of this chart, from the upper left to the bottom right, it's all white: that's because when you compare texture to texture, they're identical — a perfect correlation of 1. And when you look at, say, area against texture, it's almost black: those have almost no correlation; they don't form anything like a linear relationship — the data is very scattered. This is really just a nice graph for a quick look at your data. It doesn't so much change what you do as help you verify it: if you later get a result, or start looking at individual features, you might say, hey, that doesn't match — according to the heat map these shouldn't correlate — and if they do, you have to start asking why, and what else is going on. It also shows some useful information directly: going across the top row from the ID, no single feature lights up. There's no one feature that says, if the area is a certain size, the tumor is benign or malignant; instead there are several that sort of add up. That's a big hint to us as data scientists: we can't solve this with any one feature — it's going to take many of the different features to come up with the solution
for it. While we're exploring the data, let's check one more thing: data.isnull(), to look for null values. Earlier in this tutorial we did this a little differently, adding things up manually, but with pandas you can do it really quickly: data.isnull().sum() goes across all the columns. When I run this, every column comes up with zero nulls — no missing data. Just to rehash these last few steps of exploration: we looked at the first two columns and saw how they plot with Seaborn's joint plot, which shows both the histograms and the data on x-y coordinates (and obviously you can do that in more detail with other columns); then we did the Seaborn heat map, sns.heatmap of data.corr(), which did a nice job of showing bright spots where features correlate and areas where they don't; and finally we checked whether the data has any null values. That's an important step, because things will crash later if you forget it — you'll be reminded by a nice error message about null values. So it's not fatal if you miss it, but it's no fun when you're ten steps into a big process and have to go back to where you pulled the data in. Now we need to pull out our X and our y. There are a lot of options here: we could certainly set X to all the columns except the first two, since those are the ID and the diagnosis. But what we're actually going to do is
focus on the "worst" columns: worst radius, worst texture, worst perimeter, worst area, worst smoothness, worst compactness, and so on. One reason to divide your data up like this is that some columns can effectively carry the same information; if two near-duplicate measurements go into the model, they can overweigh it, overpowering the other features, because the model is basically taking that information in twice. That's a little past the scope of this tutorial; what I want you to take away is that we're selecting a subset of the features, and our team in the back decided to just look at the "worst" measurements. So I'll create a list — radius worst, texture worst, perimeter worst, the worst of the worst — and put that into X. X is still a pandas DataFrame, just restricted to those columns: x = data[[...]], a list of the column names (oops, I typed x instead of data at first — there we go). And if we're taking that as the input, then we need our known answers in y, and if you remember, all we care about is the diagnosis: is it benign or malignant? Since it's a single column, we can just do data['diagnosis'] (I forgot the brackets at first — there we go). We can also quickly do x.head() and y.head() to see what they look like — if you don't use print, only the last one displays — and y.head() is just M, M, M, M, M, because the first rows are all malignant, and if I run it, x.
head() shows just the first five rows of radius worst, texture worst, perimeter worst, area worst, and so on. I'll take that back out. Moving on to the next step: we've built our two data sets, the answers and the features we want to look at. In data science it's very important to test your model, and we do that by splitting the data: from sklearn.model_selection we import train_test_split. We're going to split the data into two groups. There are many ways to do this — one of the more modern approaches splits into three groups, modeling each and testing it against the others; there are reasons for that, which are past the scope of this tutorial and unnecessary for this example. Here we just split into two groups, one to train our model and one to test it. You could write your own quick code to randomly divide the data into two groups, but sklearn does it for us nicely, and in a single statement that generates four variables: X_train and X_test (the data we use to fit the model and the data held back to test it) and y_train and y_test (the answers for training and the answers we'll compare against to see how well the model did). We call the train_test_split we just imported, passing our X and our y — the two pieces of data going in for the split — and the folks in the back wanted us to use test_size=0.
3 — that's the test_size — and a random_state (it's always nice to switch the random state around, but it's not that important). What this means is that 30% of the data goes into our test variables, y_test and X_test, and 70% into X_train and y_train: 70% of the data to train our model and 30% to test it. Let's run that and load those up. Now that everything is split and ready, we get to the actual logistic part: creating our model. From sklearn.linear_model we import LogisticRegression — that's the model we're using — and we'll call our instance log_model (oh, the real model) and set it equal to the LogisticRegression class we just imported. As with most models in sklearn, we then just call fit, passing the X_train and y_train we separated out, and run it. Once that has run, we have a model fitted on 70% of our data. It prints out all the different parameters you can set — there are a lot of choices you can make, but for what we're doing we'll leave the defaults alone; nothing in there really stands out as super important until you start fine-tuning, and the basics work just fine here. Then we need to test whether our model works. We create a variable y_predict and set it equal to log_model.predict — again, a very standard pattern in the sklearn library: take your model and call predict on it — and we're
going to test y_predict against y_test. y_predict is what the model thinks the answers will be, and we get it by passing in X_test — we trained on the training set, and now we predict on the test set. Run that, and if we print y_predict you'll see it comes up with a nice array of B's and M's — benign and malignant — for all the test data we put in. So the model works and is functional, and it was very easy to create. You'll always discover in data science that you spend a significant amount of time prepping your data and making sure what's coming in is good — there's a saying: good data in, good answers out; bad data in, bad answers out. But that's only half of it: selecting your models is the next part, as far as how good your models are, and then fine-tuning them, depending on which model you're using. Now we want to know how good this model actually came out. We have our y_predict = log_model.predict(X_test), and for deciding how good the model is, we go to sklearn.
metrics and import classification_report, which reports how well our model is doing. We feed it the model data and print the result: classification_report(y_test, y_predict) — what we actually know to be true on the test side versus what the model predicted for it. Run that and you'll see a precision for benign and malignant, B and M: a precision of 0.93 and 0.91, averaging about 0.92 between the two. There's all kinds of other information on here too — your F1 score, your recall, your support. Flipping back to the slides describing this (it's the same printout as above, though the numbers may differ slightly because the split randomly picks which rows go to test), this model is able to predict the type of tumor with about 91–92% precision. And remember what I said about domain: we're in a medical domain with a very catastrophic outcome. At 91 or 92% precision, you're still going to go in and have somebody do a biopsy — very different from investing money, where if there's a 92% chance you'll earn 10% and an 8% chance you'll lose 8%, you'd probably bet the money, because at those odds it's pretty good that you'll come out ahead, and in the long run you definitely will. Also, within this domain, I've actually seen models like this used to identify different forms of cancer — that's one of the things they're starting to use these models for,
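The whole pipeline just walked through — 70/30 split, fit, predict, classification report — can be sketched end to end. Again this substitutes scikit-learn's bundled copy of the Wisconsin data for the transcript's CSV (where the target is encoded 0/1 rather than M/B), so the exact numbers in the report will differ from the video's printout:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# Stand-in data: same Wisconsin measurements, restricted to the "worst" columns
raw = load_breast_cancer(as_frame=True)
worst_cols = [c for c in raw.data.columns if c.startswith("worst")]
X = raw.data[worst_cols]
y = raw.target            # here encoded 0 = malignant, 1 = benign (not M/B)

# 70/30 split, as in the walkthrough; random_state just makes it repeatable
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=101)

log_model = LogisticRegression(max_iter=5000)   # extra iterations so lbfgs converges
log_model.fit(X_train, y_train)

y_predict = log_model.predict(X_test)
print(classification_report(y_test, y_predict))
```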
because it helps the doctor know what to investigate. So that wraps up this section. Finally, let's discuss the answers to the quiz asked in machine learning tutorial part one: can you tell what's happening in the following cases? (a) Grouping documents into different categories based on the topic and content of each document: this is an example of clustering, where K-Means clustering can be used to group the documents by topic using a bag-of-words approach. If you got that it's clustering, and had at least one or two examples like K-Means that are used for clustering, give yourself two thumbs up. (b) Identifying handwritten digits in images correctly: this is an example of classification. The traditional approach would be to extract digit-dependent features, like the curvature of different digits, and then use a classifier like SVM to distinguish between images. Again, a thumbs up if you got classification, and two thumbs up if you said, hey, let's use SVM or another model for this. (c) Behavior of a website indicating that the site is not working as designed: this is an example of anomaly detection. In this case the algorithm learns what is normal and what is not, usually by observing the logs of the website. Give yourself a thumbs up if you got that one — and just for a bonus, can you think of another example of anomaly detection? One I use for my own business is detecting anomalies in stock markets. Markets are fickle and can behave erratically, so finding those erratic regions and then tracking down why they're erratic — was something released on social media? was something announced? — shows how knowing where the anomaly is can help you figure out the cause in another area. (d) Predicting the salary of an individual based on his or her years of
experience: this is an example of regression. This problem can be mathematically defined as a function between an independent variable (years of experience) and a dependent variable (salary of an individual). If you guessed regression, give yourself a thumbs up, and if you remembered the independent/dependent variable terms, give yourself two thumbs up. Summary: to wrap up, we went over what K-Means is; we walked through assigning random centroids to the clusters, computing distances, assigning points to the minimum-distance centroid, and looping until the centroids converge; we looked at the elbow method for choosing K by running our clustering across a range of values and finding the best one; we did a nice example of clustering cars with K-Means — even though we only used the first two columns to keep it simple and easy to graph, you can easily extrapolate to all the columns and see how they fit together; and we looked at what logistic regression is, discussed the sigmoid function, and went through an example of classifying tumors with logistic regression. I hope you enjoyed part two of machine learning. Thank you for joining us today. For more information, visit www.simplilearn.com. Again, my name is Richard Kirschner, a member of the Simplilearn team — get certified, get ahead. If you have any questions or comments, feel free to write them below the YouTube video or visit us at simplilearn.com; we'll be happy to supply the data sets or other information as requested. [Music] Hi there — if you like this video, subscribe to the Simplilearn YouTube channel, and click here to watch similar videos. To nerd up and get certified, click here. Today we're going to cover K-Nearest Neighbors, also referred to as KNN, and KNN is really
a fundamental place to start in machine learning. It's the basis of a lot of other things, and the logic behind it is easy to understand and is incorporated into other forms of machine learning. Today, what's in it for you: why do we need KNN, what is KNN, how do we choose the factor K, when do we use KNN, how does the KNN algorithm work — and then my favorite part, the use case: predicting whether a person will have diabetes or not. That's a very common and popular data set for testing out models and learning how to use the different models in machine learning. By now we all know machine learning models make predictions by learning from the past data available: we have our input values, the model builds on those inputs from what we already know, and we use that to create a predicted output. "Is that a dog?" asks the little kid, watching the black cat cross their path. No, dear — you can differentiate between a cat and a dog based on their characteristics. Cats have sharp claws used to climb, shorter ears, meow and purr, and don't love to play around. Dogs have duller claws, longer ears, bark, and love to run around — you usually don't see a cat running around with people the way dogs do (although I do have a cat that does that). We can evaluate the sharpness of the claws and the length of the ears, and we can usually sort cats from dogs based on even those two characteristics. Now tell me: is it a cat or a dog? Usually little kids know cats and dogs by now, unless they live somewhere without many of either. If we look at the sharpness of the claws and the length of the ears, we see the new animal has smaller ears and sharper claws than the other animals — its features are more like a cat's, so it must be a cat: sharp claws, short ears, into the cat group. Because KNN is based on feature similarity, we can do
classification using a KNN classifier. So our input value, the picture of the black cat, goes into our trained model, and it predicts that a cat comes out. What is the KNN algorithm? KNN stands for K-Nearest Neighbors. It's one of the simplest supervised machine learning algorithms, mostly used for classification: we want to know, is this a dog or not a dog, a cat or not a cat? It classifies a data point based on how its neighbors are classified. KNN stores all available cases and classifies new cases based on a similarity measure — and here we've gone from cats and dogs straight into wine, another favorite of mine. Here we have a measurement of sulfur dioxide versus chloride level for the different wines they've tested, and where each falls on that graph. K in KNN is a parameter that refers to the number of nearest neighbors to include in the majority voting process. So if we add a new glass of wine — red or white? — we want to know what its neighbors are. In this case we'll put k = 5 (we'll talk about K in just a minute): a data point is classified by the majority vote of its five nearest neighbors. Here the unknown point would be classified as red, since four out of its five neighbors are red. So how do we choose K? How do we know k = 5? That was just the value we put in, so let's talk about it. The KNN algorithm is based on feature similarity, and choosing the right value of K — a process called parameter tuning — is important for better accuracy. At k = 3, we classify the question mark in the middle: is it a square or, in this case, a triangle? With k = 3 we look at the three nearest neighbors and say it's a square,
and with k = 7 we'd classify it as a triangle, depending on what the other data around it is. You can see that as K changes, depending on where that point sits, your answer can change drastically. So how do we choose the factor K? You'll find this throughout machine learning — choosing these parameters (that's the face you get: oh my gosh, did I choose the right K, did I set my values right in whatever tool I'm using?) — so that you don't have a huge bias in one direction or the other. In terms of KNN: if you choose K too low, the result is too noisy — it's biased by whatever happens to sit right next to the point, and you might get a skewed answer. If K is too big, it takes forever to process, and you run into processing and resource issues. The most common choice — there are other options — is to use the square root of n, where n is the total number of samples you have. In most cases, if that comes out even, you make the K value odd, which helps the vote: you won't get a tie between two equally represented classes. So take the square root of n, and if it's even, add or subtract one — that's where the K value comes from. It's the most common approach, it's pretty solid, and it works very well. When do we use KNN? We can use KNN when the data is labeled — you need labels, like knowing which pictures are dogs and which are cats; when the data is noise-free — you can see here that rows like "underweight, 140, 23, Hello Kitty, normal" are pretty confusing, and a high variety of noisy data coming in would cause an issue; and when the data set is small — we're usually working with smaller data
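The square-root-of-n heuristic just described can be written as a tiny helper; the nudge to an odd number is the tie-breaking rule from above:

```python
import math

def choose_k(n_samples):
    """Common KNN heuristic: k is about sqrt(n), nudged to an odd
    number so a two-class majority vote can't tie."""
    k = round(math.sqrt(n_samples))
    if k % 2 == 0:
        k += 1          # even -> odd, per the rule above
    return max(k, 1)
```

For example, for the 768-person diabetes data set used in the upcoming use case, choose_k(768) gives 29.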
You might get up to a gig of data if it's really clean and doesn't have a lot of noise, but KNN is a lazy learner, i.e. it doesn't learn a discriminative function from the training set. So if you have very complicated data and a large amount of it, you're not going to use KNN, but it's a great place to start: even with large data you can sort out a small sample and get an idea of what it looks like, and for smaller data sets KNN works really well.

How does the KNN algorithm work? Consider a data set with two variables, height in centimeters and weight in kilograms, where each point is classified as normal or underweight. On the basis of the given data we have to classify a new sample, 57 kg and 170 cm, as normal or underweight. To find the nearest neighbors we calculate the Euclidean distance. By the Euclidean distance formula, the distance between two points in the plane with coordinates (x, y) and (a, b) is d = √((x − a)² + (y − b)²). You can remember that from right triangles: since we know the two legs, we're computing the hypotenuse. Let's calculate it to understand clearly. We have our unknown point, placed in red, and our other points scattered around. The distance d1 = √((170 − 167)² + (57 − 51)²), which is about 6.7; distance d2 is about 13, and distance d3 is about 13.4. Similarly, we calculate the Euclidean distance of the unknown data point from every point in the data set, and because we're dealing with a small amount of data, that's quick for a computer and the math isn't complicated: you can just see how close the data is based on the Euclidean distance. So we've calculated the Euclidean distance from the unknown point, where x1 = 57 and y1 = 170, to all the points whose classes we know. Now let's find the nearest neighbors at k = 3, and we can see the three closest neighbors are all normal; that's pretty self-evident when you look at the graph. Three votes for normal, so the majority of neighbors point to normal, and per the KNN algorithm the class of (57, 170) should be normal.

A quick recap of KNN: a positive integer k is specified along with a new sample; we select the k entries in our database closest to the new sample; we find the most common classification among those entries; and that's the classification we give the new sample. Pretty straightforward: we're just looking for the closest things that match what we've got. So let's see what that looks like in a use case in Python: predicting diabetes. The objective is to predict whether a person will be diagnosed with diabetes or not, and we have a data set of 768 people who were or were not diagnosed. Let's open that file and take a look at the data. It's in a simple spreadsheet format, comma-separated, a very common kind of data set and a very common way to get the data.
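The distance calculation and majority vote above can be written out in plain Python. The handful of (weight, height) points below are illustrative stand-ins, not the exact table from the video, but the first distance matches the d1 ≈ 6.7 worked example:

```python
import math
from collections import Counter

def euclidean(p, q):
    # d = sqrt((x - a)^2 + (y - b)^2)
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

# (weight kg, height cm) -> class; made-up points in the spirit of the example
data = [
    ((51, 167), "underweight"),
    ((62, 182), "normal"),
    ((69, 176), "normal"),
    ((64, 173), "normal"),
    ((65, 172), "normal"),
]
unknown = (57, 170)

# the worked example from the transcript: sqrt((170-167)^2 + (57-51)^2)
print(round(euclidean(unknown, (51, 167)), 1))  # → 6.7

# classify by majority vote of the k nearest neighbors
k = 3
nearest = sorted(data, key=lambda item: euclidean(unknown, item[0]))[:k]
print(Counter(label for _, label in nearest).most_common(1)[0][0])  # → normal
```

This is the whole algorithm: no training step, just distances and a vote, which is exactly why KNN is called a lazy learner.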
You can see we have columns A through I: eight columns, each with a particular attribute, and then the ninth column, the outcome, which is whether they have diabetes. As a data scientist, the first thing you should look at is insulin: if someone takes insulin, they have diabetes, because that's why they're taking it, and that kind of leakage could cause issues in some machine learning packages, but for a very basic setup this works fine for doing KNN. The next thing you notice is that it didn't take much to open the file. Scrolling to the bottom, there are 768 rows, pretty much a small data set; it easily fits into the RAM on my computer, I can look at it and manipulate it, and it won't really tax a regular desktop. You don't need an enterprise machine to run a lot of this.

Let's start with importing the tools we need, but first a word on the IDE. You can certainly use any editor for Python, but for basic visual demos I like Anaconda with the Jupyter Notebook. A quick view of Anaconda Navigator, the new release, which is really nice: under Home I can choose my application (we'll use Python 3.6; I have a couple of different versions on this machine), under Environments I can create a unique environment for each project, and there's even a little button for installing packages: click it, open the terminal, and a simple pip install adds whatever packages I'm working with. Go back under Home and launch the notebook. Like the old cooking shows, I've already prepared a lot of my stuff, so we don't have to wait the few minutes it takes to open a browser window; in this case it opens Chrome, because that's my default. Since we're working on using KNN to predict whether a person will have diabetes or not, let's put that title in: insert a cell below, then go back to the top cell and change its type to Markdown, meaning it won't run as Python. Running it shows the title in nice big letters, a nice reminder of what we're working on.

By now you should be familiar with doing the imports: we import pandas as pd and numpy as np, the pandas data frame and the numpy number array, two very powerful general-purpose Python tools. Then from scikit-learn we have train_test_split (by now you should be familiar with splitting the data: part of it trains our model, and we hold out the rest to test how good it is), a StandardScaler for pre-processing so we don't have a bias from really large numbers (remember, in the data the number of pregnancies never gets very large, while the amount of insulin can get up to 256, and 256 versus 6 would skew results, so we rescale the columns onto one comparable scale), then the actual tool we're going to use, the KNeighborsClassifier, and finally the last three imports, which are all about testing our model, how good it is: our confusion matrix, our F1 score, and our accuracy score.
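Collected in one place, the imports the video walks through look like this (all standard pandas, NumPy, and scikit-learn modules):

```python
# general-purpose data tools
import pandas as pd
import numpy as np

# scikit-learn: splitting, scaling, the model, and evaluation metrics
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix, f1_score, accuracy_score
```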
So we have our two general Python modules and our six scikit-learn-specific imports; we need to run that cell so they're actually imported, and then move on to the next step. In this step we load the database using pandas (remember, pandas is pd) and take a look at the data in Python; we looked at it in a simple spreadsheet, but I usually like to pull it up here too so we can see what we're doing. So: dataset = pd.read_csv('diabetes.csv'), a pandas command; I put the diabetes file in the same folder as my notebook, so if you put it in a different folder you'd need the full path. We can also take a quick length of the data set, the simple Python len(); let's print it. In a Jupyter notebook, a bare len(dataset) on its own line prints automatically, but in most setups you want the explicit print in front. Then we look at the actual data set; since we're in pandas we can simply do dataset.head(), and again let's add the print. If you put a bunch of these in a row, dataset1.head(), dataset2.head(), only the last one prints, so I usually keep the print statement in there, though since most projects only use one pandas data frame, either way works fine. When we hit Run, we have the 768 rows we knew about, pandas automatically adds row labels on the left, and head() shows only the first five rows, zero through four. A quick look confirms it matches what we saw before: pregnancies, glucose, blood pressure, all the way to age, and then the outcome at the end.

We're going to do a couple of things in the next step. We create a list of columns where zero isn't possible: there's no such thing as zero skin thickness, zero blood pressure, or zero glucose; with any of those you'd be dead. So a zero there just means the data is missing, and we'll start replacing that information. First the list itself, with the values we talked about: glucose, blood pressure, skin thickness, and so on; listing the columns you need to transform is a very common pattern. Then, for each such column, dataset[column] = dataset[column].replace(0, np.nan); this is still pandas (there are pandas tools that handle NA replacement for you, and a lot of options here), and np.nan stands for "not a number," i.e. the value doesn't exist. So the first thing we do is replace zero with "no data": if it's a zero, the person is hopefully not dead; hopefully the measurement just wasn't taken. Next we compute the mean as an integer from dataset[column].mean(skipna=True), a pandas call that skips the NaN values, and then we replace all the NaN entries in that column with that mean.
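The zero-to-NaN-to-mean replacement just described can be sketched on a tiny stand-in table (the real file has 768 rows; the four rows and the two-column list here are made up for illustration):

```python
import numpy as np
import pandas as pd

# tiny stand-in for the diabetes data; 0 here means "not recorded"
dataset = pd.DataFrame({
    "Glucose": [148, 85, 0, 89],
    "BloodPressure": [72, 66, 64, 0],
})

zero_not_accepted = ["Glucose", "BloodPressure"]

for column in zero_not_accepted:
    # a living person can't have zero glucose or blood pressure,
    # so treat 0 as missing ...
    dataset[column] = dataset[column].replace(0, np.nan)
    # ... then fill the gap with the column mean, skipping the NaNs
    mean = int(dataset[column].mean(skipna=True))
    dataset[column] = dataset[column].replace(np.nan, mean)

print(dataset["Glucose"].tolist())
```

The zero glucose row becomes 107, the integer mean of the three recorded values, so the row stays usable without dragging the column average toward zero.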
Why do it that way? There are ways to skip the zeros and just replace them directly, but the logic is worth seeing: we first mark the value as non-existent, then compute the mean, which represents the average person. If the data wasn't collected and a value is missing, one of the standard tricks is to replace it with the average, the most common value for that column; that way you can still use the rest of the row's values in your computation, and the missing values are effectively taken out of the equation. Let's run it; it doesn't actually output anything, we're still preparing our data. If you want to see the result, we can look at a column: print the data set's Glucose column and it prints all the glucose levels going down, and thankfully nothing in the rows shown looks like missing data; you can see Jupyter skipped a bunch in the middle, which is what it does when there are too many lines. Let me remove that and zero the cell out.

Of course, before proceeding any further we need to split the data set into training and testing data, so we have something to train with and something to test on. And notice we did a little something with the pandas code here. X = dataset.iloc[:, 0:8] says: within the data set, take all rows (that's what the colon means) but only columns 0 through 7, the eight feature columns; a Python slice doesn't include its endpoint. Remember, the ninth column, index 8, is the outcome we printed earlier; that's not part of the training data, that's the answer. So for y, our answer, we take just that last column, dataset.iloc[:, 8]. Then, remembering that we imported train_test_split from scikit-learn, we simply pass in our X and y with random_state = 0, which is just a seed number (you don't necessarily have to set it), and test_size = 0.2, which simply means we take 20% of the data and put it aside so we can test with it later. That's all it is, and again we run it; not very exciting so far, we haven't printed anything except to look at the data, but a lot of this is prepping the data, and once it's prepped the actual modeling code is quick and easy.

We're almost there with the actual writing of our KNN; we just need to scale the data. If you remember, we're fitting the data with a StandardScaler, which means that instead of one column running from, say, 5 to 303 and the next from 1 to 6, all the columns get standardized onto one comparable scale. We only want to fit the scaler on the training set, but we make sure the test set, the X_test going in, is also transformed, so it's processed the same way. So here we go with our StandardScaler: we create the scaler by assigning a StandardScaler instance to the variable sc_X.
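The split-then-scale steps can be sketched end to end. The feature matrix here is a random stand-in (the real X is the eight diabetes columns), which keeps the example self-contained:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# stand-in feature matrix and labels; the real X is dataset.iloc[:, 0:8]
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 8))
y = rng.integers(0, 2, size=20)

# hold out 20% of the rows for testing, seeded for repeatability
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0, test_size=0.2)

# fit the scaler on the training data only, then apply it to both sets
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test)

print(X_train.shape, X_test.shape)  # → (16, 8) (4, 8)
```

Fitting the scaler only on the training rows matters: the test set must be transformed with the training set's statistics, never refit, or information leaks from test to train.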
Then X_train = sc_X.fit_transform(X_train): we create the scaler on the training data and transform it in one step. For X_test we only transform: the test set isn't part of training the transformer, it just gets transformed. Run that, and look at where we are: we've replaced the zeros in key columns that shouldn't be zero with the means of those columns so they fit our data model, we've split the data so we have training data and test data, and we've scaled the data. Note that we never transform the y part; y_train and y_test are the answers and never need scaling; it's only the data going in that gets transformed.

Next we define the model using KNeighborsClassifier and fit the training data to it. After all that data prep, you can see the model itself is only a couple of lines of code; that's one of the cool things about Python and how far we've come, and it's an exciting time to be in machine learning with so many automated tools. Before we do this, let's pick K. A quick len(y) gives 768; what we actually want is the test set, so let's import math and take math.sqrt(len(y_test)). Running that gives about 12.49, and I wanted to show you where this number comes from. Twelve is an even number, and if you're voting on things, remember the neighbors all vote, so you don't want an even number of voters; we take one away and make it 11. Let me delete that scratch cell; that's one of the reasons I love Jupyter Notebook, you can flip around and try things on the fly. So we create our classifier: classifier = KNeighborsClassifier(n_neighbors=11, p=2, metric='euclidean'). The 11 is our odd neighbor count (12 minus 1), and p = 2 together with the Euclidean metric gives ordinary straight-line distance; there are other ways of measuring distance, but Euclidean is the most common and works quite well.

It's important to evaluate the model, so let's use the confusion matrix, a wonderful tool, then the F1 score, and finally the accuracy score, which is probably the most commonly quoted number when you go into a meeting. We paste that in, set cm = confusion_matrix(y_test, y_pred) with those two values, run it, and print it out. The way to interpret it: the predicted classes run across the top and the actual classes down the side, and the diagonal down the middle is the important part.
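The classifier-and-evaluation steps can be sketched in full on synthetic data standing in for the scaled diabetes features (the labels follow a simple made-up rule so there is something to learn; the real numbers in the video will differ):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix, f1_score, accuracy_score

# synthetic stand-in for the scaled feature matrix
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 8))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # a learnable made-up rule

X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0, test_size=0.2)

# 11 neighbors (odd, from the sqrt-of-n rule); p=2 selects Euclidean distance
classifier = KNeighborsClassifier(n_neighbors=11, p=2, metric="euclidean")
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)

print(confusion_matrix(y_test, y_pred))  # 2x2: rows actual, columns predicted
print(round(f1_score(y_test, y_pred), 2))
print(round(accuracy_score(y_test, y_pred), 2))
```

Swapping the synthetic X and y for the prepared diabetes arrays gives exactly the pipeline the video builds.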
Our prediction and the actual values agreed on 94 and on 32; the 13 and the 15 are what was wrong. (If you were looking at three different classes instead of two, you'd get a third row and column, with the correct counts still running down the middle.) In the first row, of the people who don't have diabetes, the model correctly identified 94, but the prediction said 13 of them did have diabetes and were at high risk; of the people who do have diabetes, it got 32 correct but classified another 15 incorrectly. You can see how that classification shows up in the confusion matrix. Then we print the F1 score and get about 0.69; the F1 takes into account both kinds of error, balancing the false positives and false negatives. Then the accuracy score, which is what most people think of: it looks at just how many we got right out of the total. When you're a data scientist talking to other data scientists, they'll ask you for the F1 score; if you're talking to the general public or the decision makers in the business, they'll ask for the accuracy. The accuracy here reads higher than the F1 score, but the F1 score is more telling: it lets us know there are more false positives than we would like. Still, 82% is not too bad for a quick first look at people's health statistics, running scikit-learn and the K-nearest neighbors on it. So we've created a model using KNN that can predict whether a person will have diabetes, or at the very least whether they should go get a checkup and have their glucose checked regularly. The printed accuracy score of 0.818 can pretty much be rounded off to say we have an accuracy of roughly 82%, which tells us the model is a pretty fair fit.

So what is deep learning? Deep learning is a subset of machine learning, which itself is a branch of artificial intelligence. Unlike traditional machine learning models, which require manual feature extraction, deep learning models automatically discover representations from raw data. This is made possible through neural networks, particularly deep neural networks, which consist of multiple layers of interconnected nodes. These neural networks are inspired by the structure and function of the human brain: each layer in the network transforms the input data into a more abstract and composite representation. For instance, in image recognition, the initial layers might detect simple features like edges and textures, while the deeper layers recognize more complex structures like shapes and objects. One of the key advantages of deep learning is its ability to handle large amounts of unstructured data, such as images, audio, and text, making it extremely powerful for various applications. So stay tuned as we delve deeper into how these neural networks are trained, the types of deep learning models, and some exciting applications that are shaping our future.

Types of deep learning: deep learning can be applied to supervised, unsupervised, and reinforcement machine learning, using various methods for each. The first is supervised learning: the neural network learns to make predictions or classify data using labeled data sets. Both input features and target variables are provided, and the network learns by minimizing the error between its predictions and the actual targets, a process called backpropagation. CNNs and RNNs are the common deep learning algorithms here, used for tasks like image classification, sentiment analysis, and language translation. The second is unsupervised learning: the neural network discovers patterns or clusters in unlabeled data sets without target variables.
It identifies hidden patterns and relationships within the data; algorithms like autoencoders and generative models are used for tasks such as clustering, dimensionality reduction, and anomaly detection. The third is reinforcement learning: an agent learns to make decisions in an environment so as to maximize a reward signal. The agent takes actions, observes the results, and learns policies that maximize cumulative reward over time. Deep reinforcement learning algorithms like deep Q-networks (DQN) and deep deterministic policy gradient (DDPG) are used for tasks such as robotics and game play.

Moving forward, let's see what artificial neural networks are. Artificial neural networks (ANNs), inspired by the structure and function of biological neurons, consist of interconnected layers of artificial neurons, or units. The input layer receives data from external sources and passes it to one or more hidden layers; each neuron in these layers computes a weighted sum of its inputs and transfers the result to the next layer. During training, the weights of these connections are adjusted to optimize the network's performance. A fully connected artificial neural network includes an input layer, one or more hidden layers, and an output layer; each neuron in a hidden layer receives input from the previous layer and sends its output to the next layer, and this process continues until the final output layer produces the network's response.

Moving forward, let's see the types of neural networks. Deep learning models can automatically learn features from data, making them ideal for tasks like image recognition, speech recognition, and natural language processing. The most common architectures in deep learning are, first, the feedforward neural network (FNN): the simplest type of neural network, where information flows linearly from input to output; they are widely used for tasks such as image classification, speech recognition, and natural language processing (NLP). The second is the convolutional neural network (CNN): designed specifically for image and video recognition, CNNs automatically learn features from images, making them ideal for image classification, object detection, and image segmentation. The third is recurrent neural networks (RNNs): specialized for processing sequential data such as time series and natural language, they maintain an internal state to capture information from previous inputs, making them suitable for tasks such as speech recognition, NLP, and language translation.

Now let's move forward and see some deep learning applications. The first is autonomous vehicles: deep learning is changing the development of self-driving cars; algorithms like CNNs process data from sensors and cameras to detect objects, recognize traffic signs, and make driving decisions in real time, enhancing safety and efficiency on the road. The second is healthcare diagnostics: deep learning models are being used to analyze medical images such as X-rays, MRIs, and CT scans with high accuracy; they help in the early detection and diagnosis of diseases like cancer, improving treatment outcomes and saving lives. The third is NLP: recent advancements in NLP powered by deep learning models like Transformers and ChatGPT have led to more sophisticated and human-like text generation, translation, and sentiment analysis; applications include virtual assistants, chatbots, and automated customer service. The fourth is deepfake technology: deep learning techniques are used to create highly realistic synthetic media known as deepfakes; while this technology has entertainment and creative applications, it also raises ethical concerns regarding misinformation and digital manipulation. The fifth is predictive maintenance: in industries like manufacturing and aviation, deep learning models predict equipment failures before they occur by analyzing sensor data; this proactive approach reduces downtime, lowers maintenance costs, and improves operational efficiency. Now let's move forward and see some advantages and disadvantages of deep learning.
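The "weighted sum plus activation" computation described for artificial neural networks above can be sketched in a few lines of NumPy; the layer sizes and the random weights here are purely illustrative, and a real network would learn the weights via backpropagation rather than drawing them at random:

```python
import numpy as np

rng = np.random.default_rng(42)

def sigmoid(z):
    # a classic activation squashing any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# a tiny fully connected network: 3 inputs -> 4 hidden units -> 1 output
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)

def forward(x):
    # each neuron computes a weighted sum of its inputs, then an activation
    hidden = sigmoid(W1 @ x + b1)
    return sigmoid(W2 @ hidden + b2)

output = forward(np.array([0.5, -1.2, 3.0]))
print(output.shape)  # → (1,)
```

Stacking more hidden layers between W1 and W2 is all that "deep" means structurally; training is where the real work lies.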
The first disadvantage is high computational requirements: deep learning requires significant data and computational resources for training. The first advantage is high accuracy: it achieves state-of-the-art performance in tasks like image recognition and natural language processing. The second disadvantage is that deep learning needs large labeled data sets, which can be costly and time-consuming to assemble. The second advantage is automated feature engineering: it automatically discovers and learns relevant features from data without manual intervention. The third disadvantage is overfitting: deep learning models can overfit the training data, leading to poor performance on new, unseen data. The third advantage is scalability: deep learning can handle large, complex data sets and learn from massive amounts of data. In conclusion, deep learning is a transformative leap in AI; by mimicking human neural networks, it has changed healthcare, finance, autonomous vehicles, and NLP.

Today we'll take you through the exciting roadmap of becoming an AI engineer. If our content piques your interest and fuels your curiosity, don't forget to subscribe to our channel and hit that bell icon so you never miss an update. Now let's embark on this AI journey together. As artificial intelligence continues to revolutionize various industries, AI engineers stand at the forefront of this technological wave. These professionals are essential in crafting intelligent systems that address complex business challenges. AI projects often stumble due to poor planning, subpar architecture, or scalability issues, and AI engineers play a crucial role in overcoming these hurdles by merging cutting-edge AI technologies with strategic business insights. So in this video we'll guide you through the essentials of becoming an AI engineer.

Let's start with the basics: what does an AI engineer do? An AI engineer builds AI models using machine learning algorithms and deep learning neural networks. These models are pivotal in generating the business insights that influence organizational decision making. From developing applications that leverage sentiment analysis for contextual advertising, to creating systems for visual recognition and language translation, the scope of an AI engineer's work is vast and impactful. To succeed as an AI engineer, you need a blend of technical prowess and soft skills. Now let's break down this eight-month plan.

Month one: computer science fundamentals and beginner Python. Before we delve into AI, it's crucial to establish a strong foundation in computer science. This month, focus on the following topics. Data representation: understanding bits and bytes, how text and numbers are stored, and the binary number system is foundational for everything in computing; this knowledge helps in comprehending how computers interpret and process data. Next come computer networks: learn the basics of computer networks, including IP addresses and internet routing protocols; it's essential to understand how data travels across networks using UDP, TCP, and HTTP, which form the backbone of the internet and the World Wide Web. Next comes programming basics: begin with variables, strings, numbers, conditionals, loops, and algorithm basics; these fundamentals will allow you to write and understand simple programs. Simultaneously you'll start with Python, the preferred language for AI: learn about variables, numbers, strings, lists, dictionaries, sets, tuples, and control structures like if conditionals and for loops. Then move on to functions and modules: understand how to create functions, including lambda functions, and work with modules, using pip install to add functionality to your projects. Next comes file handling and exceptions: you should practice reading from and writing to files, as well as handling exceptions, to make your programs more robust. Finally, grasp the basics of classes and objects, which are crucial for writing organized and efficient code.
This comprehensive overview sets the stage for the more complex programming tasks you'll encounter in the following months. In month two you'll move on to data structures, algorithms, and advanced Python. Building on the foundations from month one, delve into data structures and algorithms: familiarize yourself with big-O notation to understand the efficiency of different algorithms and data structures, and learn about arrays, linked lists, hash tables, stacks, queues, trees, and graphs; mastering these structures will allow you to store and manipulate data effectively. Next come algorithms: explore binary search, bubble sort, quick sort, merge sort, and recursion; these are essential for optimizing your code. In parallel, advance your Python skills: dive into inheritance, generators, iterators, list comprehensions, decorators, multithreading, and multiprocessing; these topics will enable you to write more efficient and scalable code. This month's learning prepares you to handle complex data operations and enhances your coding efficiency.

In month three you'll move on to version control, SQL, and data manipulation, shifting the focus to collaboration and data management. Number one, version control: understand the importance of version control systems, especially Git and GitHub. Learn basic commands such as add, commit, and push; learn how to handle branches and revert changes; and understand concepts like HEAD, diff, and merge. These skills are invaluable for tracking changes and collaborating with other developers. Next, pull requests: master the art of creating and managing pull requests to contribute to collaborative projects. Then we'll dive into SQL for managing databases. First, SQL basics: learn about relational databases and how to perform basic queries. Then move on to advanced queries: understand complex query techniques such as CTEs, subqueries, and window functions.
functions. Then come joins and database management: study different types of joins, like left, right, inner, and full joins. You should also learn how to create databases, manage indexes, and write stored procedures. Additionally, you will use NumPy and pandas for data manipulation and learn basic data visualization techniques. This comprehensive skill set will be crucial as you move into more advanced data science topics.

Now in month four you'll deal with math and statistics for AI. Mathematics and statistics are the backbone of AI, and this month is dedicated to these critical subjects. First, learn about descriptive versus inferential statistics, continuous versus discrete data, nominal versus ordinal data, measures of central tendency like mean, median, and mode, and measures of dispersion like variance and standard deviation. After that, understand the basics of probability and delve into normal distribution, correlation, and covariance, after which you should move on to advanced concepts: study the central limit theorem, hypothesis testing, p-values, confidence intervals, and so on. In parallel you should also study linear algebra and calculus. In linear algebra, learn about vectors, matrices, eigenvalues, and eigenvectors; in calculus, cover the basics of integral and differential calculus. This mathematical foundation is essential for developing and understanding AI models, setting you up for success as you transition into machine learning.

Now in month five comes exploratory data analysis, which is EDA, and machine learning. With a solid foundation in math and statistics, you are now ready to delve into machine learning. Number one, preprocessing: learn how to handle NA values, treat outliers, perform data normalization, and conduct feature engineering. You should also understand encoding techniques such as one-hot and label encoding. You'll also explore supervised and unsupervised learning with a focus on regression and classification.
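The one-hot and label encoding techniques just mentioned can be sketched in a few lines; a minimal example, where the DataFrame and its column names are invented purely for illustration:

```python
import pandas as pd

# Hypothetical toy dataset: one categorical column, one numeric column
df = pd.DataFrame({"city": ["Paris", "London", "Paris"], "price": [10, 12, 9]})

# One-hot encoding: one 0/1 column per category
encoded = pd.get_dummies(df, columns=["city"])

# Label encoding: map each category to an integer
label_map = {c: i for i, c in enumerate(sorted(df["city"].unique()))}
df["city_label"] = df["city"].map(label_map)
```

One-hot avoids implying an order between categories, which matters for linear models; label encoding is more compact but suggests an ordering, so it tends to suit tree-based models better.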
Learn about linear models like linear and logistic regression and nonlinear models like decision trees, random forests, etc., and then understand how to evaluate models using metrics such as mean squared error and mean absolute error for regression, and accuracy, precision, recall, etc. for classification. Then comes hyperparameter tuning: learn about techniques like GridSearchCV and RandomizedSearchCV for optimizing your models, after which you'll move on to unsupervised learning. Here you can study clustering techniques like K-means and hierarchical clustering and delve into dimensionality reduction with PCA. So this month's focus on EDA and model building will prepare you for more complex AI applications. Transitioning to the next phase, you'll begin to work on deploying these models in real-world scenarios.

In month six come MLOps and machine learning projects. This month we'll cover the operational aspects of machine learning and work on practical projects. In MLOps basics, learn about APIs, particularly using FastAPI for Python and server development; understand DevOps fundamentals, including CI/CD pipelines and containerization with Docker and Kubernetes. You should also gain familiarity with at least one cloud platform, like AWS or Azure.

Now in month seven comes deep learning. In this month we delve into the world of deep learning. Number one come neural networks: learn about neural networks, including forward and backward propagation, and build multi-layer perceptrons, after which we'll move on to advanced architectures. Here, explore convolutional neural networks (CNNs) for image data and sequence models like RNNs and LSTMs. This deep learning knowledge will be crucial as you move into specialized areas of AI in the final month.

Now in the final month, the eighth month, comes NLP or computer vision. In the final month you have the option to specialize in either natural language processing (NLP) or computer vision. First, the NLP track: here you should learn about regex, text representation methods like CountVectorizer, TF-IDF,
bag of words, and Word2Vec embeddings, and text classification with Naive Bayes; familiarize yourself with the fundamentals of libraries like spaCy and NLTK and work on an end-to-end NLP project. Talking about the computer vision track: focus on basic image processing techniques like filtering, edge detection, image scaling, and rotation; utilize libraries like OpenCV; and build upon the CNN knowledge from the previous month. Practice data preprocessing and augmentation. By the end of this month you should have a solid foundation in your chosen specialization, ready to embark on your AI engineering career. So in conclusion, adopting AI is more than just a trend; it's a strategic move that can transform your organization's approach to machine learning.

Hey everyone, welcome to Simplilearn. Today's video will compare and contrast artificial intelligence, deep learning, machine learning, and data science. But before we get started, consider subscribing to Simplilearn's YouTube channel and hit the bell icon; that way you'll be the first to get notified when we post similar content. Before moving on, let me ask you two interesting queries. Which among the following is not a branch of artificial intelligence: data analysis, machine learning, deep learning, or neural networks?
And the second query is: what is the main difference between machine learning and deep learning? Please leave your answer in the comments section below and stay tuned to get the answer.

First we will unwrap deep learning. Deep learning was first introduced in the 1940s. Deep learning did not develop suddenly; it developed slowly and steadily over seven decades, with many theses and discoveries made on deep learning from the 1940s to the 2000s. Thanks to companies like Facebook and Google, the term deep learning has gained popularity and may give the perception that it is a relatively new concept. Deep learning can be considered a type of machine learning and artificial intelligence (AI) that imitates how humans gain certain types of knowledge. Deep learning includes statistics and predictive modeling. Deep learning makes processes quicker and simpler, which is advantageous to data scientists who gather, analyze, and interpret massive amounts of data. Having the fundamentals discussed, let's move into the different types of deep learning. Neural networks are the main component of deep learning, and they comprise three main types: artificial neural networks (ANNs), convolutional neural networks (CNNs), and recurrent neural networks (RNNs). Artificial neural networks are inspired biologically by the animal brain. Convolutional neural networks surpass other neural networks when given inputs such as images, voice, or audio; they analyze images by processing data. Recurrent neural networks use sequential data, or series of data. Convolutional neural networks and recurrent neural networks are used in natural language processing, speech recognition, image recognition, and many more.

Machine learning: the evolution of ML started with the mathematical modeling of neural networks that served as the basis for the invention of machine learning in 1943, when neuroscientist Warren McCulloch and logician Walter Pitts attempted to quantitatively map out how humans make decisions and carry out thinking processes.
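The idea of computing with neuron-like units, which runs from McCulloch and Pitts all the way to today's artificial neural networks, can be made concrete with a minimal NumPy sketch of forward propagation through one hidden layer. The weights here are hand-picked rather than learned, purely so this tiny network demonstrably computes XOR:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hand-picked (not trained) weights: hidden unit 1 acts like OR,
# hidden unit 2 like NAND, and the output combines them into XOR.
W1 = np.array([[20.0, 20.0], [-20.0, -20.0]])
b1 = np.array([-10.0, 30.0])
W2 = np.array([20.0, 20.0])
b2 = -30.0

def forward(x):
    h = sigmoid(x @ W1.T + b1)   # hidden layer activations
    return sigmoid(h @ W2 + b2)  # output layer activation

# Forward propagation on all four XOR inputs
preds = [round(forward(np.array(x))) for x in [(0, 0), (0, 1), (1, 0), (1, 1)]]
```

Training (backward propagation) would find such weights automatically by following the gradient of a loss; this sketch only shows the forward pass.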
Therefore, the term machine learning is not new. Machine learning is a branch of artificial intelligence and computer science that uses data and algorithms to imitate how humans learn, gradually increasing the system's accuracy. There are three types of machine learning, the first of which is supervised learning. What is supervised learning? Well, here machines are trained using labeled data, and machines predict output based on this data. Now coming to unsupervised learning: models are not supervised using a training dataset; it is comparable to the learning process that occurs in the human brain while learning something new. And the third type of machine learning is reinforcement learning: here the agent learns from feedback. It learns to behave in a given environment based on actions and the results of those actions; this can be observed in robotics.

Now coming to the evolution of AI: the potential of artificial intelligence wasn't explored until the 1950s, although the idea has been known for centuries. The term artificial intelligence has been around for decades, but it wasn't until British polymath Alan Turing posed the question of why machines couldn't use knowledge like humans do to solve problems and make decisions. We can define artificial intelligence as a technique of training a computer-based robot to work and act like humans. Now let's have a glance at the types of artificial intelligence. Weak AI performs only specific tasks, like Apple's Siri, Google Assistant, and Amazon's Alexa; you might have used all of these technologies, but the types I am mentioning after this are under experiment. General AI can also be addressed as artificial general intelligence; it is equivalent to human intelligence, hence an AGI system is capable of carrying out any task that a human can. Strong AI aspires to build machines that are indistinguishable from the human mind. Both general and strong AI are hypothetical right now; rigorous research is going on on this matter. There are many branches
of artificial intelligence, which include machine learning, deep learning, natural language processing, robotics, expert systems, and fuzzy logic. Therefore, the correct answer for which is not a branch of artificial intelligence is option A, data analysis.

Now that we have covered deep learning, machine learning, and artificial intelligence, the final topic is data science. Concepts like deep learning, machine learning, and artificial intelligence can be considered a subset of data science. Let us cover the evolution of data science. The phrase data science was coined in the early 1960s to characterize a new profession that would enable the comprehension and analysis of the massive volumes of data being gathered at the time. Since its beginnings, data science has expanded to incorporate ideas and methods from other fields, including artificial intelligence, machine learning, deep learning, and so forth. Data science can be defined as the domain of study that handles vast volumes of data using modern tools and techniques to find unseen patterns, derive meaningful information, and make business decisions. Therefore, data science comprises machine learning, artificial intelligence, and deep learning.

Hello everyone, I am M, and welcome back to Simplilearn's YouTube channel. These days we usually ask Siri, "Hey Siri, how far is the nearest fuel station?" Whenever we ask Siri something, the powerful speech recognition system gets to work and converts the audio into its textual form. This is then sent to the Apple server for further processing, and then machine learning algorithms are run to understand the user's intent, and finally Siri tells you the answer. Well, this is happening because of these machine learning algorithms. Think about it: not too long ago, most tasks were done by people. Whether it was building things, performing surgeries, or even playing games like chess, humans were in control. But now things are changing fast; almost all manual tasks are becoming automated, meaning machines and computers are taking over
those jobs. This shift is redefining what we consider manual work. Machine learning, a type of artificial intelligence, is at the heart of this transformation. There are so many different machine learning algorithms out there, each designed to help computers learn and get better at tasks, from playing chess like grandmasters to performing delicate surgeries with amazing precision; these algorithms are making technology smarter and more personal every day. So now that we have covered a brief about ML, I want you guys to quickly check out the quiz attached below in the description section; take a moment to answer and let me know your thoughts in the comments section as well. In today's video we are going to cover the top 10 machine learning algorithms that every aspiring machine learning engineer should know. Whether you are building models to predict the future, analyzing data, or creating smart apps, mastering these algorithms will help you make the most of machine learning.

So now let's get started with what an algorithm is. What is an algorithm? In computer programming, an algorithm is a set of well-defined instructions to solve a particular problem; it takes a bunch of information sources and delivers the ideal result. Most of us use Snapchat to apply filters on our faces while making videos or capturing photographs, but do you know how Snapchat recognizes your face while capturing videos or photographs and puts filters on it? Even if there are multiple faces, it applies filters on every face accurately. This became possible with the help of the face recognition technique, which uses machine learning algorithms to detect faces and apply the required filters on them. So this is the basic idea of how algorithms work. Let's move ahead in this video and see how algorithms work in machine learning. So how do algorithms work? Everyone knows an algorithm is a step-by-step process to approach a particular problem, and there are numerous examples of algorithms, from sorting sets of numbers
to finding routes through maps to showing data on the screen. Let's understand this by using an example. Every algorithm is built on inputs and outputs, and the Google search algorithm is no different: the input is the search field, and the output is the page of results that appears when you enter a particular phrase or keyword, also known as the SERP, or search engine results page. Google has an algorithm so it can sort its results from various websites and provide the user with the best result. When you start typing, you will see the search box attempt to guess what you are looking for; in order to better understand what the user is looking for, the algorithm tries to gather as many suggestions from them as possible. The results that best match the query will then be ranked; Google chooses which websites will rank, and in what position, using more than 200 ranking variables. Now that we have covered a brief about how algorithms work, I want you guys to quickly check out the quiz attached below in the description section; take a moment to answer and let me know your thoughts in the comments section as well.

Moving forward, let's see the types of machine learning. Machine learning is classified into supervised learning, unsupervised learning, and reinforcement learning. There are two sorts of problems in supervised learning: classification and regression. The machine learning algorithms that fall under classification include decision tree algorithms, the KNN algorithm, the logistic regression algorithm, the Naive Bayes algorithm, and the support vector machine (SVM) algorithm; however, for the regression type, the machine learning algorithms are linear regression, regression trees, nonlinear regression, and Bayesian linear regression. Now talking about unsupervised learning, there are two sorts of problems in unsupervised learning, which are clustering and association. Algorithms that fall under clustering problems include K-means clustering and principal component analysis, while algorithms that fall under association problems are the Apriori
algorithm and FP-Growth. In reinforcement learning there are two types, positive reinforcement and negative reinforcement. Reinforcement learning algorithms are mainly used in AI applications and gaming applications; the main algorithms used are Q-learning, State-Action-Reward-State-Action (SARSA), deep Q neural networks (DQN), and Markov decision processes.

After discussing what algorithms are and their types, let's see some popular machine learning algorithms. The first is linear regression, the second is logistic regression, the third is decision trees, the fourth is SVM (support vector machine), the fifth is PCA (principal component analysis), the sixth is K-means clustering, the seventh is random forest, the eighth is autoencoders, the ninth is DBSCAN, which is known as density-based spatial clustering of applications with noise, and the last one we have is hierarchical clustering. So now let's see these algorithms one by one.

First we have linear regression, a statistical method used to model the relationship between a dependent variable, which is known as the target variable, and one or more independent variables, which are the predictors. It assumes a linear relationship between the inputs and the output. A real-life example is house price prediction: predicting house prices based on features like size, location, and number of rooms. For example, "on average, larger houses cost more" is the linear trend identified by this algorithm. Some applications are real estate price prediction, sales forecasting, and stock price prediction. The second one is logistic regression, a classification algorithm used to predict binary outcomes, that is, yes or no, true or false. It uses a logistic function to model the probability of a particular class. A real-life example is email spam filtering: identifying spam emails based on certain features (keywords, sender, number of links, for instance); an email with "claim your free gift now" is classified as spam.
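The house-price idea can be sketched with ordinary least squares in NumPy; a minimal example, where the sizes and prices are made-up toy numbers chosen so that price is exactly three times size:

```python
import numpy as np

# Toy data: house size (m²) vs. price ($1000s); here price = 3 * size exactly
size = np.array([50.0, 60.0, 80.0, 100.0])
price = np.array([150.0, 180.0, 240.0, 300.0])

# Least-squares best-fit line through the points
slope, intercept = np.polyfit(size, price, deg=1)

# Use the fitted line to predict the price of a 120 m² house
predicted = slope * 120 + intercept
```

On real data the points scatter around the line rather than sitting on it, and the best-fit line is the one minimizing the squared vertical distances to the points.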
Applications are email spam filtering, medical diagnosis, customer churn prediction, and many more. The third one we have is decision trees, a flowchart-like tree structure used to make decisions. Each node in the tree represents a decision based on a feature, and each branch represents a possible outcome. A real-life example is the loan approval process: a bank using decision trees might ask "is the applicant's credit score above 700?" and proceed with further questions to approve or deny the loan. Applications are loan approval, medical diagnosis, and marketing campaign analysis. The fourth one we have is random forest, an ensemble method that combines multiple decision trees to improve accuracy. Each tree gives a vote on the outcome, and the majority of votes determines the final decision. A real-life example is medical diagnosis: diagnosing diseases based on patient data like age, cholesterol level, and blood pressure; each decision tree in the forest makes a prediction, and the majority vote decides the diagnosis. Applications are healthcare disease prediction, fraud detection, and customer segmentation. At number five we have support vector machine (SVM), a classification algorithm that finds the optimal boundary to separate data into different classes, often used for binary classification. A real-life example is image recognition, specifically face detection: SVM can be used to detect faces in an image by classifying regions of the image as either face or non-face based on pixel values. Applications are facial recognition, speech recognition, and handwritten digit recognition.

So now let's move forward and see some unsupervised learning algorithms. Number one is K-means clustering, a clustering algorithm that groups data into a specified number (K) of clusters based on similarity; the goal is to minimize the distance between data points in each cluster. A real-life example is customer segmentation in marketing: grouping customers into segments like high spenders and frequent shoppers based on their purchasing behavior to personalize marketing efforts.
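The K-means loop just described (assign each point to its nearest centroid, then recompute each centroid as the mean of its points) can be sketched in NumPy; the spending figures and the initial centroid guesses below are invented purely for illustration:

```python
import numpy as np

# Toy customer data: annual spend (in $1000s) with two obvious groups
spend = np.array([2.0, 3.0, 2.5, 40.0, 42.0, 41.0])
centroids = np.array([0.0, 50.0])  # fixed initial guesses for K = 2 clusters

for _ in range(10):
    # Assignment step: label each point with its nearest centroid
    labels = np.argmin(np.abs(spend[:, None] - centroids[None, :]), axis=1)
    # Update step: move each centroid to the mean of its assigned points
    centroids = np.array([spend[labels == k].mean() for k in range(2)])

# centroids converge to roughly [2.5, 41.0]: low spenders vs. high spenders
```

Real uses run this on multi-dimensional feature vectors with Euclidean distance, and the elbow method (plotting within-cluster distance against K) helps choose K.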
Applications are customer segmentation, market basket analysis, and social media grouping. At number seven we have hierarchical clustering, a clustering algorithm that creates a tree-like structure, also known as a dendrogram, by grouping similar data points, which can be done either agglomeratively (bottom-up) or divisively (top-down). A real-life example is gene clustering in healthcare: clustering genes with similar expression patterns to study cancer cells; the dendrograms help researchers identify genes that behave similarly in response to treatment. Applications are gene expression analysis, customer behavior analysis, and document clustering. At number eight we have DBSCAN, the full form being density-based spatial clustering of applications with noise, a density-based clustering algorithm that identifies clusters based on the density of data points. It can also handle noise (outliers) by labeling them as noise points. A real-life example is identifying crime hotspots: detecting areas with frequent criminal activity by clustering locations based on crime density, with outliers being excluded. Applications are crime hotspot detection, anomaly detection, and geospatial analysis. At number nine we have principal component analysis (PCA), a dimensionality reduction technique that transforms data into a smaller set of uncorrelated variables, the principal components, which capture the most variance in the data. A real-life example is image compression: compressing images by reducing the number of variables while retaining the key features that preserve most of the image's information, thus reducing storage space. Applications are data compression, dimensionality reduction, and data visualization. The last one we have is autoencoders, a type of neural network used to learn efficient representations of data, typically for dimensionality reduction or anomaly detection. It encodes input data into a compressed representation and then reconstructs it back.
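Both PCA and autoencoders compress data into fewer variables; for PCA this boils down to an eigendecomposition of the covariance matrix. A minimal NumPy sketch, using made-up 2-D data in which the two features are almost perfectly correlated, so one principal component captures nearly all the variance:

```python
import numpy as np

# Toy 2-D data where the two features move together (strongly correlated)
X = np.array([[2.0, 2.1], [3.0, 3.1], [4.0, 3.9], [5.0, 5.2]])

Xc = X - X.mean(axis=0)                 # center the data
cov = Xc.T @ Xc / (len(X) - 1)          # covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)  # eigendecomposition (ascending eigenvalues)

pc1 = eigvecs[:, -1]                    # first principal component: top eigenvector
projected = Xc @ pc1                    # 2-D points reduced to 1-D scores
explained = eigvals[-1] / eigvals.sum() # fraction of total variance retained
```

Because the data lies nearly on a line, `explained` comes out above 99%: the 1-D projection keeps almost all the information, which is exactly the dimensionality-reduction effect the transcript describes.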
A real-life example is fraud detection in financial transactions: detecting unusual transactions by training an autoencoder on normal transaction data; when an outlier transaction occurs, it is flagged as potentially fraudulent. Applications are fraud detection, image denoising, and recommendation systems. So these algorithms are part of many real systems that we interact with daily, from predicting what product you might want to buy online to detecting fraud in your bank account; they are used in various industries such as healthcare, finance, retail, and security.

LLMs: if you have ever wondered how machine learning can now understand and generate human-like text, you are in the right place. From chatbots like ChatGPT to AI assistants that power search engines, LLMs are transforming how we interact with technology. One of the most exciting advancements in this space is Google Gemini, or OpenAI's ChatGPT, a cutting-edge large language model designed to push the boundaries of what AI can achieve. In this video we will explore what LLMs are, how they work, and why models like Gemini are critical for the future of AI. Google Gemini is part of a new wave of AI models that are smarter, faster, and more efficient. It is designed to understand context better, offer more accurate responses, and integrate deeply into services like Google Search and Google Assistant, providing more human-like interactions. So we will break down the science behind LLMs, including their massive training datasets, Transformer architecture, and how models like Gemini use deep learning innovations to change industries. Plus, we will compare Google Gemini to other popular LLMs such as OpenAI's models, showing how each of these technologies is used to power chatbots, virtual assistants, and other AI applications. By the end of this video you will have a clear understanding of how large language models like Gemini work, their key features, and what they mean for the future of AI. Don't forget to like, subscribe, and hit the bell icon to never miss any update from Simplilearn. So what are large language models? Large language models like GPT-4 (Generative Pre-trained
Transformer 4) and Google Gemini are sophisticated AI systems designed to comprehend and generate human-like text. These models are built using deep learning techniques and are trained on vast datasets collected from the internet. They leverage self-attention mechanisms to analyze relationships between words, or tokens, allowing them to capture context and produce coherent, relevant responses. LLMs have significant applications, including powering virtual assistants, chatbots, content creation, language translation, and supporting research and decision-making. Their ability to generate fluent and contextually appropriate text has advanced natural language processing and improved human-computer interaction.

So now let's see what large language models are used for. Large language models are utilized in scenarios with limited or no domain-specific data available for training; these scenarios include both few-shot and zero-shot training approaches, which rely on the model's strong inductive bias and its capability to derive meaningful representations from a small amount of data, or even no data at all. So now let's see how large language models are trained. Large language models typically undergo pre-training on a broad, all-encompassing dataset that shares statistical similarities with the dataset specific to the target task; the objective of pre-training is to enable the model to acquire high-level features that can later be applied during the fine-tuning phase for specific tasks. The training process of an LLM involves several steps. The first is text pre-processing: the textual data is transformed into a numerical representation that the LLM can effectively process; this conversion may involve techniques like tokenization, encoding, and creating input sequences. The second is random parameter initialization: the model's parameters are initialized randomly before the training process begins.
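The text pre-processing step above can be illustrated with a deliberately naive whitespace tokenizer; real LLMs use learned subword schemes such as BPE, but the core idea, text in, integer token IDs out, is the same:

```python
# Naive whitespace tokenizer: build a vocabulary, then map words to IDs.
corpus = "the cat sat on the mat"

# Vocabulary: each distinct word gets an integer ID (sorted for determinism)
vocab = {tok: i for i, tok in enumerate(sorted(set(corpus.split())))}

# Encoding: the input sequence as a list of integer IDs the model can process
ids = [vocab[tok] for tok in corpus.split()]
```

Note how "the" appears twice in the text but maps to the same ID both times; in a real model each ID then indexes into a learned embedding matrix.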
The third is feeding in the numerical data: the numerical representation of the text data is fed into the model for processing; the model's architecture, typically based on Transformers, allows it to capture the contextual relationships between the words or tokens in the text. The fourth is loss function calculation: a loss function measures the discrepancy between the model's predictions and the actual next word or token in a sentence; the LLM aims to minimize this loss during training. The fifth is parameter optimization: the model's parameters are adjusted through optimization techniques; this involves calculating gradients and updating the parameters accordingly, gradually improving the model's performance. The last is iterative training: the training process is repeated over multiple iterations, or epochs, until the model's outputs achieve a satisfactory level of accuracy on the given task or dataset. By following this training process, large language models learn to capture linguistic patterns, understand context, and generate coherent responses, enabling them to excel at various language-related tasks.

The next topic is how do large language models work. Large language models leverage deep neural networks to generate output based on patterns learned from the training data. Typically a large language model adopts a Transformer architecture, which enables the model to identify relationships between words in a sentence irrespective of their position in the sequence. In contrast to RNNs, which rely on recurrence to capture token relationships, Transformer neural networks employ self-attention as their primary mechanism: self-attention calculates attention scores that determine the importance of each token with respect to the other tokens in the text sequence, facilitating the modeling of intricate relationships within the data.
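The self-attention computation just described can be sketched in NumPy; the sequence length, embedding size, and random weight matrices below are arbitrary stand-ins for what a trained Transformer would have learned:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
seq_len, d = 4, 8                       # 4 tokens, 8-dimensional embeddings
X = rng.normal(size=(seq_len, d))       # stand-in token embeddings
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))  # stand-in learned weights

Q, K, V = X @ Wq, X @ Wk, X @ Wv        # queries, keys, values
scores = Q @ K.T / np.sqrt(d)           # attention scores between every pair of tokens
weights = softmax(scores)               # each row: how much one token attends to the others
out = weights @ V                       # context-aware representation of each token
```

Because every token scores against every other token in one matrix multiply, position in the sequence imposes no ordering bottleneck, which is exactly the contrast with recurrence that the transcript draws.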
Next, let's see applications of large language models. Large language models have a wide range of applications across various domains; here are some notable ones. The first is natural language processing (NLP): large language models are used to improve natural language understanding tasks such as sentiment analysis, named entity recognition, text classification, and language modeling. The second is chatbots and virtual assistants: large language models power conversational agents, chatbots, and virtual assistants, providing more interactive and human-like user interactions. The third is machine translation: large language models have been used for automatic language translation, enabling text translation between different languages with improved accuracy. The fourth is sentiment analysis: LLMs can analyze and classify the sentiment or emotion expressed in a piece of text, which is valuable for market research, brand monitoring, and social media analysis. The fifth is content recommendation: these models can be employed to provide personalized content recommendations, enhancing user experience and engagement on platforms such as news websites or streaming services. These applications highlight the potential impact of large language models in various domains for improving language understanding and automation.

Welcome to this video on Stable Diffusion, one of the most advanced AI tools for generating stunning photorealistic images from just text. Whether you are describing a vibrant sunset, a futuristic city, or a surreal dreamscape, Stable Diffusion can turn your imagination into reality within seconds. The latest version, Stable Diffusion XL, brings even higher-quality results thanks to a larger network and improved techniques. Not only can you generate images, but you can also enhance them with features like inpainting, where you can edit parts of an image, or outpainting, which expands an image beyond its original borders. So how does it work? The AI starts by breaking an image down into noise, then cleverly reverses that process to recreate a clear and detailed picture. We will also show you how to create effective prompts to get the best results from Stable Diffusion, whether you're using the web-based version or running it on your own computer.
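The "break an image down into noise" half of that description is the forward diffusion process, which can be sketched in NumPy; the image is a random stand-in and the noise schedule below is an invented toy, not the schedule SDXL actually uses:

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.uniform(size=(8, 8))        # stand-in for a real image (toy 8x8 grid)

# Hypothetical noise schedule: how much signal survives at each timestep
alphas = np.linspace(0.999, 0.95, 50)
alpha_bar = np.cumprod(alphas)          # cumulative signal fraction up to step t

t = 49                                  # the final, noisiest timestep
noise = rng.normal(size=image.shape)
noisy = np.sqrt(alpha_bar[t]) * image + np.sqrt(1 - alpha_bar[t]) * noise
```

Training teaches a network to predict `noise` given `noisy` and `t`; generation then starts from pure noise and runs these steps in reverse, which is the "cleverly reverses that process" half.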
And yes, you can even use it for commercial purposes. Stick around, because I will be giving you a live demo and showing you step by step how to create your own images with this powerful tool. So without any further ado, let's get started.

Hello guys, welcome back to the demo part of this Stable Diffusion video. First I will open Stability AI; this is the artificial intelligence company which launched the Stable Diffusion text-to-image generator. We have multiple models here: image models, video models, audio, 3D, and language models. Let's go to the image models. We have two series, the SD3 series and the SDXL series: SD3 Large is there, SD3 Large Turbo is there, SD3 Medium is there; in SDXL, Stable Diffusion XL is there, SDXL Turbo is there, and Japanese Stable Diffusion XL is there. There are two ways of using Stable Diffusion. The first is that you can install Stable Diffusion locally and use it, but there are some requirements your system should meet: there should be a GPU, ideally an NVIDIA graphics card, though another graphics card may be fine. Here you can use the API, get the license, download the code, or read about Stable Diffusion XL. I will show you how to download and install Stable Diffusion; for now I don't have any graphics card on my system, so I can't use it locally, but I will show you properly how to set it up and run it. First, I will give you this link; this is Hugging Face, where SDXL is their recently launched model. Here you can read about all the configuration you want and how to install it. First you have to install Python, the latest version, and then, after installing Python, you have to install Git (not GitHub Copilot, just Git Bash); this you have to install for Windows, macOS, or Linux. So
here's what you have to do: I will give you this link, and you have to go to Files and Versions and download the SDXL Base 1.0 file. See, I have already downloaded it; this file is around 6.5 GB. For now I'm canceling it, because as you can see here I have already downloaded it. After downloading this, you have to look up Stable Diffusion Web UI; I will give you this link as well. Here what you have to do is download the zip file; again, let me download it for you. This is done. After that, you have to unzip this Stable Diffusion folder, then go down, and here you can see the webui-user batch file for Windows; you have to run this batch file. If you are a Mac user, you can run the shell script instead. I have already installed it, but you just have to double-click this and everything will be installed; it will launch one page after installing. It will take like half an hour, because it will download multiple files totaling several gigabytes. For now, let me run it; see, this page will come up. As you can see here, I am running Stable Diffusion locally, version 1.7.0; this is local. So now, we downloaded the first file, right? If you remember, we downloaded this file. So what you have to do is go to Downloads, copy the downloaded file, then go to the models folder, then to Stable-diffusion, and copy-paste your file there. These are the models, and this is the latest model. It installs with the v1.5 pruned model by default, but we want to use the latest model, which is why we are copying it in. While installing, it will show you some of these types of things here; it will download multiple things. After installing this, just refresh and you
And now you can see two models are there; you can select either one. I'm selecting the SDXL model, and I will write "astronaut riding a horse", but it will give me an error when it generates. See the error: "found no NVIDIA driver on your system". I don't have any drivers installed because I don't have a graphics card, but if you have one it will run smoothly and give you all the outputs. We can also use Stable Diffusion online on the web: see, there is a Stable Diffusion 2.1 demo, and I will give you this link as well. Let me write the same prompt, "astronaut riding a horse", and generate the image. As it shows, it will take around 11 seconds, so we have to wait; it's almost done processing, and there it is, an astronaut riding a horse. Now let's go to ChatGPT and ask for some funny text-to-image generator prompts. One of them is cool; let's copy it and paste it here: "a penguin dressed as a pirate searching for treasure on an ice floe, with a parrot that only squawks". See the pirate hat. Let's try something else: "a grumpy bear sitting in a therapist's office". This one is cool, I guess. Let me run it again; it will take some 11 seconds, but if you run this locally it will definitely work, I'm sure, because on systems other than mine it runs smoothly. See: a grumpy bear sitting in a therapist's office discussing its feelings. So this is how you can use Stable Diffusion, locally and on the web. This is better than DALL·E, because DALL·E is expensive.

It was November 30, 2022. Sam Altman, Greg Brockman, and Ilya Sutskever would never have thought that with the push of a button they would completely alter the lives of all human beings living on the Earth, and of future generations to come. On November 30 the OpenAI team launched ChatGPT; ChatGPT was born that day. Albeit a very small event in the history of internet evolution, it can nonetheless be marked as one of the most significant events of the modern IT industry. ChatGPT, a text-based chatbot that gives replies to questions asked of it, is built on the GPT large language model. But what was so different? I mean, the Google search engine, YouTube, the Firefox browser, they have all been doing the same for ages, so how is ChatGPT any different, and why is it such a big deal? Well, for starters, ChatGPT was not returning indexed websites that had been SEO-tuned and optimized to rank at the top. ChatGPT was able to comprehend the nature, tone, and intent of the query and generated text-based responses to the questions asked. It was like talking to a chatbot on the internet, minus the out-of-context responses. With the knowledge of 1.7 trillion parameters, it was no shock that a computing system as efficient and prompt as ChatGPT would have its own set of limits, and so it did: it was bound by the parameters of the language model it was trained on, and it was limited to giving outdated results, since its last training data was from September. Still, ChatGPT made waves in the tech community and continues to do so; just have a look at the Google Trends search on ChatGPT. Every day new content is being published on ChatGPT and hundreds of AI tools. The sheer interest that individuals and enterprises across the globe have shown in ChatGPT and AI tools is immense: AI, AI, generative AI, everywhere.

Now here comes the fun part. ChatGPT, or for that matter any large language model, runs on neural networks trained on multi-million, billion, and even trillions of data parameters. These chatbots generate responses to user queries based on the input given. While a model may generate similar responses for identical or similar queries, it can also produce different responses based on the specific context, phrasing, and quality of input provided by each user. Additionally, ChatGPT is designed to adapt its language and tone to match the style and preferences of each user, so its responses may vary in wording and tone depending on the individual user's communication style and preferences. Every user has their own unique style of writing and communication, and ChatGPT's responses can vary based on the input given to it. This is where prompt engineers come into play. Prompt engineers are experts at prompt engineering; sounds like a circular definition, right? Well, let's break it down. First, let's understand what prompts are. Prompts are any text-based input given to the model as a query. This includes the questions asked, the tone mentioned in the query, the context given for the query, and the format of output expected. Here is a quick example for your understanding. Now that we have discussed what a prompt is, let us understand who a prompt engineer is and why it has become the job of the future. Broadly speaking, a prompt engineer is a professional who is capable of drafting queries, or prompts, in such a way that large language models like GPT, PaLM, LLaMA, BLOOM, etc., can generate the response that is expected. These professionals are skilled at crafting accurate and contextual prompts, which allows the model to generate the desired results. Here is a quick example for you: prompt engineers are experts not only on the linguistic front, they also have extensive domain knowledge and are very well versed in the functioning of neural networks and natural language processing, along with knowledge of scripting languages and data analysis. Leading job platforms like Indeed and LinkedIn already have many prompt engineer positions; in the United States alone, job postings for this role run in the thousands, reflecting the growing demand. The salary of prompt engineers is also compelling, with a range that spans from $50,000 to over $150,000 per year depending on experience and specialization. There are multiple technical concepts that a prompt engineer must be well versed in to be successful in their jobs, such as multimodality, tokens, weights, parameters, and Transformers, to name a few.
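The prompt components just described (the question, the tone, the context, and the expected output format) can be made concrete with a small sketch. This is my own illustration, not a template from the video; the function and field names are assumptions:

```python
# Illustrative only: bundle the four prompt components described above
# into one structured prompt string.

def build_prompt(question, context, tone, output_format):
    """Assemble a structured prompt from its components."""
    return (
        f"Context: {context}\n"
        f"Tone: {tone}\n"
        f"Task: {question}\n"
        f"Output format: {output_format}"
    )

prompt = build_prompt(
    question="Summarize the causes of the 2008 financial crisis.",
    context="The reader is a high-school economics student.",
    tone="simple and friendly",
    output_format="five bullet points",
)
print(prompt)
```

Spelling out each component like this is essentially what a prompt engineer does by hand: the same task with a different tone, context, or output format can produce a very different response from the model.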
Whether it's healthcare, defense, IT services, or the edtech industry, the need for skilled prompt engineers is on the rise. There are already several thousand job openings in this field, and the demand will continue to grow. So if you want to hop on this amazing opportunity and become an expert prompt engineering professional, now is the time. Let us know in the comments what you think about prompt engineering, and if you want to know more about the skills needed to become a prompt engineer, make sure to like and share this video with your friends and family and tell them about this amazing new job opportunity.

Hello everyone, I am M, and welcome to today's video, where we will be talking about LLM benchmarks: tools used to test and measure how well large language models like GPT and Google Gemini perform. If you have ever wondered how AI models are evaluated, this video will explain it in simple terms. LLM benchmarks are used to check how good these models are at tasks like coding, answering questions, translating languages, or summarizing text. These tests use sample data and a specific measurement to see how well the model performs. For example, the model might be tested with a few examples (few-shot learning) or none at all (zero-shot learning) to see how it handles new tasks. Now the question arises: why are these benchmarks important? They help developers understand where a model is strong and where it needs improvement. They also make it easier to compare different models, helping people choose the best one for their needs. However, LLM benchmarks do have some limits. They don't always predict how well a model will work in real-world situations, and sometimes models can overfit, meaning they perform well on test data but struggle in practical use. We will also cover how LLM leaderboards rank different models based on their benchmark scores, giving us a clear picture of which models are performing the best. So stay tuned as we dive into how LLM benchmarks work and why they are so important for advancing AI. Without any further ado, let's get started.

So, what are LLM benchmarks? LLM benchmarks are standardized tools used to evaluate the performance of large language models. They provide a structured way to test LLMs on specific tasks or questions, using sample data and predefined metrics to measure their capabilities. These benchmarks assess various skills such as coding, common-sense reasoning, and NLP tasks like machine translation, question answering, and text summarization. The importance of LLM benchmarks lies in their role in advancing model development: they track the progress of an LLM, offering quantitative insights into where the model performs well and where improvement is needed. This feedback is crucial for guiding the fine-tuning process, allowing researchers and developers to enhance model performance. Additionally, benchmarks offer an objective comparison between different LLMs, helping developers and organizations choose the best model for their needs.

So how do LLM benchmarks work? LLM benchmarks follow a clear and systematic process: they present a task for the LLM to complete, evaluate its performance using specific metrics, and assign a score based on how well the model performs. Here is a breakdown of how this process works. The first step is setup: LLM benchmarks come with pre-prepared sample data, including coding challenges, long documents, math problems, and real-world conversations. The tasks span various areas like common-sense reasoning, problem solving, question answering, summary generation, and translation, all presented to the model at the start of testing. The second step is testing: the model is tested in one of three ways. In few-shot testing, the LLM is provided with a few examples before being prompted to complete a task, demonstrating its ability to learn from limited data. In zero-shot testing, the model is asked to perform a task without any prior examples, testing its ability to understand new concepts and adapt to unfamiliar scenarios. In fine-tuned testing, the model is trained on a dataset similar to the one used in the benchmark, aiming to enhance its performance on the specific tasks involved. The third step is scoring: after the model completes the task, the benchmark compares the model's output with the expected answer and generates a score, typically ranging from 0 to 100, reflecting how accurately the LLM performed.

Now, moving forward, let's see the key metrics for benchmarking LLMs. LLM benchmarks use various metrics to assess the performance of large language models; here are some commonly used ones. The first is accuracy, or precision, which measures the percentage of correct predictions made by the model. The second is recall, also known as sensitivity, which measures the number of true positives, reflecting the correct predictions made by the model. The third is the F1 score, which combines both precision and recall into a single metric, weighing them equally to account for false positives and false negatives; F1 scores range from 0 to 1, where 1 indicates perfect precision and recall. The fourth is exact match, which tracks the percentage of predictions that exactly match the correct answer, used especially for tasks like translation and question answering. The fifth is perplexity, which gauges how well a model predicts the next word or token; a lower perplexity score indicates better task comprehension by the model. The sixth is BLEU (Bilingual Evaluation Understudy), used for evaluating machine translation by comparing n-gram sequences of adjacent text elements between the model's output and a human-produced translation. These quantitative metrics are often combined for a more thorough evaluation. In addition, human evaluation introduces qualitative factors like coherence, relevance, and semantic meaning, providing a more nuanced assessment; however, human evaluation can be time-consuming and subjective, making a balance between quantitative and qualitative measures important for comprehensive evaluation.
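The first four metrics above can be computed by hand on a toy example. The tiny yes/no "benchmark" below is made-up data for illustration only, but the formulas are the standard definitions of precision, recall, F1, and exact match:

```python
# Toy illustration of the benchmark metrics described above, on made-up data.

def precision_recall_f1(predicted, actual, positive="yes"):
    """Standard definitions: precision = TP/(TP+FP), recall = TP/(TP+FN)."""
    tp = sum(1 for p, a in zip(predicted, actual) if p == positive and a == positive)
    fp = sum(1 for p, a in zip(predicted, actual) if p == positive and a != positive)
    fn = sum(1 for p, a in zip(predicted, actual) if p != positive and a == positive)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

def exact_match(predicted, actual):
    """Share of predictions matching the reference answer exactly."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

preds = ["yes", "yes", "no", "yes", "no"]
truth = ["yes", "no",  "no", "yes", "yes"]
print(precision_recall_f1(preds, truth))  # all three come out to 2/3 here
print(exact_match(["Paris", "4", "blue"], ["Paris", "four", "blue"]))  # 2/3
```

Note how exact match penalizes "4" versus "four" even though the meaning is identical; that is exactly the kind of gap where the qualitative, human-evaluation side mentioned above becomes important.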
So now let's move forward and see some limitations of LLM benchmarking. While LLM benchmarks are valuable for assessing model performance, they have several limitations that prevent them from fully predicting real-world effectiveness; here are a few. The first is bounded scoring: once a model achieves the highest possible score on a benchmark, that benchmark loses its utility and must be updated with more challenging tasks to remain a meaningful assessment tool. The second is broad datasets: LLM benchmarks often rely on sample data from diverse subjects and tasks, so this wide scope may not effectively evaluate a model's performance in edge cases, specialized fields, or specific use cases where more tailored data would be needed. The third is finite assessment: benchmarks only test a model's current skills, and as LLMs evolve and new capabilities emerge, new benchmarks must be created to measure these advancements. The fourth is overfitting: if an LLM is trained on the same data used for benchmarking, it can lead to overfitting, where the model performs well on the test data but struggles with real tasks; this results in scores that don't truly represent the model's broader capabilities.

So now, what are LLM leaderboards? LLM leaderboards publish rankings of LLMs based on a variety of benchmarks. Leaderboards provide a way to keep track of the many LLMs out there and compare their performance, and they are especially helpful when deciding which model to use. Here are some rankings. In multitask reasoning, you can see OpenAI's o1 is leading, GPT-4o is second, and Llama (the 405B-parameter model) is third, with Claude 3.5 Sonnet also on the board. What about the best in coding? Here OpenAI o1 is leading, the second is Claude 3.5 Sonnet, and in third position there is GPT-4o. Next come the fastest and most affordable models: the fastest are Llama 8B, then Llama 70B, then Gemini 1.5 Flash, and on lowest latency Llama is leading again. Among the cheapest models, again Llama 8B is leading, in second place we have Gemini 1.5 Flash, and in third we have GPT-4o mini. Moving forward, let's see standard benchmarks between Claude 3 Opus and GPT-4o: in general knowledge they are equal, in reasoning Claude 3 Opus is leading, in coding GPT-4o is leading, in math again GPT-4o is leading, in tool use Claude 3 Opus is leading, and in multilingual tasks Claude 3 Opus is leading.

Today we will discuss the booming topic of this era: multimodal AI. Let's understand it with an example. Imagine you are showing a friend your vacation photos. You might describe the sights you saw, the sounds you heard, and even your emotions. This is how humans naturally understand the world: by combining information from different sources. Multimodal AI aims to do the same thing. Let's break down "multimodal AI" first. Multimodal refers to different ways of communicating information, like text, speech, images, and video, while AI stands for artificial intelligence: systems that can learn and make decisions. So multimodal AI is a type of AI that can process and understand information from multiple sources, just like you do when you look at your vacation photos. Now that we have understood what multimodal AI is, let's go a bit further. It is obvious that multimodal AI is not the only AI out there, but what is the big deal about multimodal AI that everyone is talking about? That is what we will discuss in this segment. So now let's understand the difference between multimodal AI and generative AI. While both multimodal AI and generative AI are exciting advancements in AI, they differ in their approach to data and functionality: generative AI's focus is to create new data similar to the data it's trained on, while multimodal AI's focus is to understand and process information from multiple
sources: text, speech, images, and video. On data types, generative AI primarily works with a single data type, like text (writing poems) or images (generating realistic portraits), whereas multimodal AI works with diverse data types, enabling a more comprehensive understanding of the world. On examples, generative AI covers chatbots, text generation models, and image editing tools, whereas multimodal AI examples cover virtual assistants, medical diagnosis systems, and autonomous vehicles. On strengths, generative AI can produce creative and innovative content, automate repetitive tasks, and personalize your experience, whereas multimodal AI's strengths are providing a more human-like understanding of the world and improved accuracy. In essence, generative AI excels at creating new data, while multimodal AI excels at understanding and utilizing existing data from diverse sources. They can be complementary, with generative models being used to create new data for multimodal AI systems to learn from, improving their understanding of the world.

Next, let's understand the benefits of multimodal AI. The benefit of multimodal AI is that it offers developers and users an AI with more advanced reasoning, problem-solving, and generation capabilities. These advancements offer endless possibilities for how next-generation applications can change the way we work and live. For developers looking to start building, the Vertex AI Gemini API offers features such as enterprise security, data residency, performance, and technical support; if you are an existing Google Cloud customer, you can start prompting with Gemini in Vertex AI right now.

Next, let's see multimodal AI's big challenges. Multimodal AI is powerful, but it faces hurdles. The first is data overload: managing and storing massive, diverse data is expensive and complex. The second is the meaning mystery: teaching AI to understand subtle differences in meaning, like sarcasm, is tricky. The third is data alignment: ensuring data points from different sources stay in tune with each other is challenging. The fourth is data scarcity: limited and potentially biased datasets hinder effective training. The fifth is the missing-data blues: what happens when data is missing or degraded, like distorted audio? The last one is the black-box blues: understanding how the AI makes decisions can be difficult. These challenges must be addressed to unlock the full potential of multimodal AI.

Next, let's see the future of multimodal AI and why it is important. Multimodal AI and multimodal models represent a leap forward in how developers build and expand the functionality of AI in the next generation of applications. For example, Gemini can understand, explain, and generate high-quality code in the world's most popular programming languages, like Python, Java, C++, and Go, freeing developers to work on building more feature-rich applications. Multimodal AI's potential also brings the world closer to AI that is less like smart software and more like an expert helper or assistant.

OpenAI is one of the main leaders in the field of generative AI, with its ChatGPT being one of the most popular and widely used examples. ChatGPT is powered by OpenAI's GPT family of large language models (LLMs). In August and September 2024 there were rumors about a new model from OpenAI, code-named Strawberry; at first it was unclear if it was the next version of GPT-4o or something different. On September 12, OpenAI officially introduced the o1 model. Hi, I am Mik, and in this video we will discuss the OpenAI o1 model and its types. After this we will perform some basic prompts using o1-preview and o1-mini, and at the end we will see a comparison between the o1 models and GPT-4o. So without any further ado, let's get started.

What is OpenAI o1? The OpenAI o1 family is a group of LLMs that have been improved to handle more complex reasoning. These models are designed to offer a different experience from GPT-4o, focusing on thinking through problems more thoroughly before responding. Unlike older models, o1 is built to solve challenging problems that require multiple steps and deep reasoning. The o1 models also use a technique called chain-of-thought prompting, which allows the model to think through a problem step by step. OpenAI o1 consists of two models, o1-preview and o1-mini: the o1-preview model is meant for more complex tasks, while o1-mini is a smaller, more affordable version. So what can OpenAI o1 do? It can handle many tasks just like other GPT models from OpenAI, such as answering questions, summarizing content, and creating new material; however, o1 is especially good at more complex tasks. First, enhanced reasoning: the o1 models are designed for advanced problem solving, particularly in subjects like science, technology, engineering, and math. Second, brainstorming and ideation: with its improved reasoning, o1 is great at coming up with creative ideas and solutions in various fields. Third, scientific research: o1 is well suited for tasks like annotating cell-sequencing data or solving the complex math needed in areas like quantum optics. Fourth, coding: the o1 models can write and fix code, performing well on coding tests like HumanEval and Codeforces and helping developers build multi-step workflows. Fifth, mathematics: o1 is much better at math than previous models, scoring 83% on the International Mathematics Olympiad (IMO) qualifying test, compared to GPT-4o's 13%; it also did well in other math competitions like AIME, making it useful for generating complex formulas for physics. And last, self-checking: o1 can check the accuracy of its own responses, helping to improve the reliability of its answers. You can use the OpenAI o1 models in several ways. ChatGPT Plus and Team users have access to the o1-preview and o1-mini models and can manually choose them in the model picker; although free users don't have access to the o1 models yet, OpenAI is planning to offer o1-mini to them in the future. Developers can also use these models via the OpenAI API, and they are available on third-party platforms like Microsoft Azure AI Studio and GitHub Models.
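For developers, calling an o1 model through the API uses the same messages format as other chat models. A hedged sketch follows: the request shape is the standard Chat Completions payload, `build_request` is my own helper, and the actual network call is commented out because it needs the `openai` package and an API key. Note that since o1 reasons internally with its built-in chain of thought, you simply ask the question directly:

```python
# Hedged sketch of an o1-preview request payload. `build_request` is an
# illustrative helper; only the messages/model shape is the standard API form.

def build_request(question, model="o1-preview"):
    """Assemble a Chat Completions request payload for an o1 model."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
    }

payload = build_request("Is 561 a prime number? What is gcd(48, 180)?")

# To actually send it (requires the `openai` package and an API key):
# from openai import OpenAI
# client = OpenAI()
# response = client.chat.completions.create(**payload)
# print(response.choices[0].message.content)
```

The same payload with `model="gpt-4o"` is how you would run the side-by-side comparison shown in the demo below.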
So yes guys, I have opened the ChatGPT-4o model here and o1-preview, as you can see. I have the Plus plan, the paid version of ChatGPT, so I can access the o1-preview and o1-mini models. We will go with the o1-preview model, put the same prompts into both GPT-4o and o1-preview, and see what differences come up. We will do some math questions, some coding, some advanced reasoning, and quantum physics as well. So let's start; I have some prompts already written with me. The first one is number theory, so I will copy it, paste it into both, and run it in 4o and o1-preview. Here you can see o1 is thinking; this is what I was saying about the chain of thought. These are the chain-of-thought steps: first breaking down the primes, then identifying the GCD. Now see the difference between the outputs. GPT-4o's output is simply that 561 is not a prime number and that the GCD (greatest common divisor) of 48 and 180 is 12. o1-preview, by contrast, gives the output step by step: first, determine whether 561 is a prime number (the number 561 is not a prime number, it is a composite number, because it has these factors); then, as a second step, the greatest common divisor, where it finds 12; and then the answer: no, 561 is not prime, it is composite, and the greatest common divisor of 48 and 180 is 12. Just see the difference between the two models; this is why the o1 models are so strong for math, coding, advanced reasoning, and things like quantum physics.

Let's go to the second test. Here you can see the attach-file option in ChatGPT-4o, where you can upload from your computer, but in o1 there is no attach-file option; this is one small drawback. So let me open the question I have, copy it, and run it in both. See, 4o starts giving the answer immediately, while o1 is still thinking: solving the equation, then analyzing the relationship. o1 will take time, but it will give you a more accurate, more step-by-step answer, just as you want. Here you can see "solve for x" with the steps laid out; this is a more structured presentation, and o1-mini structures its answers well too. GPT-4o wrote just a quick one-two solution, whereas o1 gives question one, solve for x, step one, step two, step three, then the answer x = 3. For the second question, 4o just expands the left-hand side, but o1 writes "square both sides of the given equation" as an explicit step one; 4o has the content but not laid out as clearly. This is why o1 is better for math.

So now let's check the coding part. I have one question; let me see what output each gives. I will copy it into both and run them. See, 4o starts giving the answer, while o1 is still adjusting the parameters and working through the code generation, because o1 will think first, then analyze, and only after that give you the answer. Here 4o's code is done, and o1 is still thinking through step one: set up the development environment, pip install the needed libraries, then the rest. I will ask both to give me the code in a single block, so I can just copy and paste. So I will open an online compiler and paste it directly; let me open the W3Schools compiler.
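The number-theory answers from the demo above are easy to verify directly in Python; this small check confirms both models' conclusions:

```python
import math

# Verify the two answers from the number-theory prompt in the demo.

def is_prime(n):
    """Trial division up to the integer square root of n."""
    if n < 2:
        return False
    for d in range(2, math.isqrt(n) + 1):
        if n % d == 0:
            return False
    return True

print(is_prime(561))      # False: 561 = 3 * 11 * 17, so it is composite
print(math.gcd(48, 180))  # 12, matching both models' answer
```

Both models got the right result; the demo's point is purely about presentation, since o1-preview also shows the factorization steps that this code only checks implicitly.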
Yeah, I will open W3Schools for this one as well. Let me copy the code and paste it here, and the same for the other one. Okay, it gives something; cool. So now you can see the difference between the outputs: this is the output of 4o and this is the output of o1-preview. This is why o1 takes time, but it gives you a more accurate result, presented well. Now let's check something else. Moving on, let's see an advanced reasoning question, a logical puzzle. I will copy it and paste it here, once for 4o and once for o1-preview. Why am I not comparing o1-preview with o1-mini? Because they are essentially the same, with only slight differences; we can see more difference between the old model and the new one. Now see: 4o's answer ends in just this much, but o1 explains in a better way. It thought for 7 seconds, then gives an explanation with case one, then case two, with a conclusion for both scenarios and a summary, while 4o gives one small explanation and that's it. They created o1-preview to describe things more thoroughly. Now let's try some scientific reasoning as well. Let me copy it in: 4o starts giving the answer straight away, while o1 thought for 16 seconds. So again I will say that o1 is much better than GPT-4o for reasoning, math, coding, and quantum physics; ChatGPT-4o is great for generative text, like marketing copy and emails.

So now let's see a comparison between the o1 models and GPT-4o. When new models are released, their capabilities are revealed through benchmark data in the technical reports. The new OpenAI model excels in complex reasoning tasks: it surpasses human PhD-level accuracy in physics, chemistry, and biology on the GPQA benchmark. Coding becomes easier with o1, as it ranks in the 89th percentile on competitive programming questions from Codeforces. The model is also outstanding in math: on a qualifying exam for the International Mathematics Olympiad (IMO), GPT-4o solved only 13% of problems, while o1 achieved 83%. This is truly next level. On the standard ML benchmarks it shows huge improvements across the board; MMLU measures multitask accuracy, and GPQA measures reasoning capability. For human evaluation, OpenAI asked people to compare o1-mini with GPT-4o on difficult open-ended tasks across different topics, using the same method as the o1-preview versus GPT-4o comparison. Like o1-preview, o1-mini was preferred over GPT-4o for tasks that require strong reasoning skills, but GPT-4o was still favored for language-based tasks. On model speed, as a concrete example, OpenAI compared responses from GPT-4o, o1-mini, and o1-preview on a word-reasoning question: while GPT-4o did not answer correctly, both o1-mini and o1-preview did, and o1-mini reached the answer around 3 to 5x faster. Limitations and what's next: due to its specialization in STEM (science, technology, engineering, and math) reasoning, o1-mini's factual knowledge on non-STEM topics such as dates, biographies, and trivia is comparable to smaller LLMs such as GPT-4o mini; OpenAI says it will improve these limitations in future versions, as well as experiment with extending the models to other modalities and specialties outside of STEM.

On July 25th, OpenAI introduced SearchGPT, a new search tool changing how we find information online. Unlike traditional search engines, which require you to type in specific keywords, SearchGPT lets you ask questions in natural, everyday language, just like having a conversation. This is a big shift from how we were used to searching the web: instead of thinking in keywords and hoping to find the right result, you can now ask SearchGPT exactly what you want to know, and it will understand the context and give you direct answers. It is designed to make searching easier and more intuitive, without going through links and pages. But with this new way of searching, there are some important questions to consider. Can SearchGPT compete with Google, the search giant we all know? What makes SearchGPT different from AI Overviews, another recent search tool? And how does it compare to ChatGPT, OpenAI's popular conversational AI? In this video we are going to explore these questions and more. We will look at what makes SearchGPT special, how it compares to other tools, and why it might change the way we search for information. Whether you are new to tech or just curious, this video will break it down in simple words; stick around to learn more about SearchGPT. So without any further ado, let's get started.

So what is SearchGPT? SearchGPT is a new search engine prototype developed by OpenAI, designed to enhance the way we search for information using AI. Unlike a typical chatbot like ChatGPT, SearchGPT isn't just about having a conversation; it's focused on improving the search experience, with some key features. The first is direct answers: instead of simply showing you a list of links, SearchGPT delivers direct answers to your questions. For example, if you ask "what are the best wireless noise-cancelling headphones in 2024", SearchGPT will summarize the top choices, highlighting their pros and cons based on expert reviews and user opinions; this approach is different from traditional search engines, which typically provide a list of links leading to various articles or videos. The second is relevant sources: SearchGPT's responses come with clear citations and links to the original sources, ensuring transparency and accuracy, so you can easily verify the information and delve deeper into the topic if you want. The third is conversational search: SearchGPT allows you to have a back-and-forth dialogue with the search engine. You can ask follow-up questions or refine your original query
based on the responses you receive, making your search experience more interactive and personalized.

Now let's jump into the next topic, which is SearchGPT versus Google. SearchGPT is being talked about as a major competitor to Google in the future, so let's break down how they differ in their approach to search. The first difference is conversational versus keyword-based search: SearchGPT uses a conversational interface, allowing users to ask questions in natural language and refine their queries through follow-up questions, which creates a more interactive search experience; Google, on the other hand, relies on keyword-based search, where users enter specific terms to find relevant web pages. The second is direct answers versus a list of links: one of SearchGPT's standout features is its ability to provide direct answers to questions. It summarizes information from various sources and clearly cites them, so you don't have to click through multiple links, while Google typically presents a list of links, leaving users to sift through the results to find the information they need. The third is AI-powered understanding versus keyword matching: SearchGPT uses AI to understand the intent behind your question, offering more relevant results even if your query isn't perfectly worded, whereas Google's primary method is keyword matching, which can sometimes lead to less accurate results, especially for complex queries. The fourth is dynamic context versus isolated searches: SearchGPT maintains context across multiple interactions, allowing for more personalized responses, whereas Google treats each search as a separate query, without remembering previous interactions. And the last one is real-time information versus indexed web pages: SearchGPT aims to provide the latest information using real-time data from the web, whereas Google's web index is comprehensive but may include outdated or less relevant information.

Now let's jump into the next topic, which is SearchGPT versus AI Overviews. SearchGPT and AI Overviews both use AI, but they approach search and information delivery differently. It's also worth noting that both tools are still being developed, so their features and capabilities may evolve and even overlap as they grow. Here are the differences. The first is source attribution: SearchGPT provides clear and direct citations linked to the original sources, making it easy for users to verify the information, whereas AI Overviews include links, but the citations may not always be clear or directly associated with specific claims. The second is transparency and control: SearchGPT promises greater transparency by offering publishers control over how their content is used, including the option to opt out of AI training, while AI Overviews offer less transparency regarding the selection of content and the summarization process used. The next one is scope and depth: SearchGPT strives to deliver detailed and comprehensive answers, pulling from a broad range of sources, including potentially multimedia content, while AI Overviews offer a concise summary of key points, often with links for further exploration, but with a more limited scope.

Now let's jump into the next part: SearchGPT versus ChatGPT. SearchGPT and ChatGPT, both developed by OpenAI, share some core features but serve different purposes. Here are some differences. The first is primary purpose: SearchGPT is designed for search, providing direct answers and sources from the web, whereas ChatGPT focuses on conversational AI, generating text responses. The second is information sources: SearchGPT relies on real-time information from the web, whereas ChatGPT's knowledge is based on its training data, which might not be current. The third is response format: SearchGPT prioritizes concise answers with citations and source links, whereas ChatGPT is more flexible, generating longer text, summaries, creative content, code, and so on. The next one is use cases: SearchGPT is ideal for fact-finding, research, and tasks requiring up-to-date information, whereas ChatGPT is suitable for creative writing, brainstorming, drafting emails, and other open-ended tasks. So now the question arises: when will SearchGPT be released? SearchGPT is currently in a limited prototype phase, meaning it's not yet widely available; OpenAI is testing it with a select group to gather feedback and improve the tool. So if you are interested in trying SearchGPT, you can join the waitlist on its
web page, but you will need a ChatGPT account. A full public release by the end of 2024 is unlikely, as OpenAI hasn't set a timeline; it's more probable that SearchGPT's features will gradually be added to ChatGPT in 2024 or 2025, with a potential standalone release later, based on testing and feedback.

Sora is here. OpenAI has introduced Sora, an advanced AI tool for creating videos, now available at sora.com. Earlier this year Sora was launched to turn text into realistic videos, showcasing exciting progress in AI technology. Now OpenAI has released Sora Turbo, a faster and more powerful version available to ChatGPT Plus and Pro users. Sora lets users create videos in up to 1080p quality, up to 20 seconds long, and in different formats like widescreen, vertical, or square. It includes tools like a storyboard for precise control and options to remix or create videos from scratch. There is also a community section with featured and recent videos to spark ideas. ChatGPT Plus users can make up to 50 videos per month at 480p resolution, while Pro users get access to more features like higher resolution and longer video durations. While Sora Turbo is much faster, OpenAI is still working to improve areas like handling complex scenes and making the technology more affordable. To ensure safe and ethical use, Sora includes features like visible watermarks, content moderation, and metadata to identify videos created with Sora. Sora makes it easier for people to create and share stories through video, and OpenAI is excited to see how users will explore new creative possibilities with this powerful tool.

So welcome to the demo part of Sora. This is the landing page you see when you log in to Sora. Let me tell you, I have the ChatGPT Plus version, not the Pro version, so I have some 721 credits left; later on I will tell you what the credits are. So let's explore a bit. These are some recent videos which I have created or tested, and this Featured section shows what all the users of Sora are creating, so it comes under Featured and we can learn from it or generate new ideas, like this parrot, which is very cool for learning. Then there are the saved videos, all videos, and uploads. Now let's come to the credits part. You can see I have 721 credits left, and if you go to the help.openai.com page you can see how the credits work: credits are used to generate videos with Sora. A 480p square 5-second video takes only 20 credits; for 10 seconds it takes 40; and so on down the table, with other 480p settings costing 25 credits, 50 credits, and so on, and 720p priced differently. It is also written there that requesting multiple variations at once will be charged at the same rate as running separate generation requests. Here, with this plus icon, you can upload an image or a video, so you can, for example, upload an image and create a video from that image. There is also "choose from library," which is your personal library, and this option for variations. Then there are presets, like Balloon World, Stop Motion, Archival, Film Noir, and Cardboard & Papercraft. Next is the resolution: 480p is the fastest for video generation, 720p takes about 4x longer, and 1080p about 8x longer, I guess, and 1080p is only available with ChatGPT's Pro version. Since I'm just showing you a demo, I will choose the fastest version. Then there is the time duration, how long you want the video: 5 seconds, 10 seconds, 15 seconds, with 20 seconds available in the Pro version of ChatGPT. And this is how many variations you want; I will select only two, because more will again charge you more credits, and these credits are on a monthly basis, I guess. Features like Recut, Remix, Blend, and Loop to create content will again take more credits. See here: ChatGPT Plus gives up to 50 priority videos, 1,000 credits per month I guess, up to 720p resolution, and 5-second duration, while ChatGPT Pro gives up to 500 priority videos, 10,000 credits, unlimited relaxed videos, up to 1080p resolution, 20-second duration, and downloads without a watermark; here I can download only with a watermark, I guess, and we'll see about everything. But ChatGPT Pro is $200 per month, so, yeah, it's expensive.

So let's do something creative. I will write here: "polar bear enjoying the Sahara Desert." You can use the storyboard, or you can create videos directly. Let me show you the storyboard first: frame by frame, you can give different prompts; here I give "polar bear with family," then "playing with sand," and later it will create the whole video. In the third frame you can describe the scene again, or you can add an image; this is a story created with ChatGPT. OK, let's create. "Added to the queue." It's very fast, actually; almost done. See: with family, and playing with the sand. So these are the two variations, and you can choose either one; I like this one better. Here you can again edit your story with Recut: you can trim or extend this video in a new storyboard. Basically, the Recut feature allows creators to pinpoint and isolate the most impactful frames in a video, extending them in either direction to build out a complete scene. Then Remix: the Remix feature allows users to reimagine existing videos by altering their components without losing the essence of the original. You can add or remove certain things; for instance, what if I want to
notebook LM it’s power study organize your thinking and sparking new ideas then you will also get some reviews what people are saying like notebook LM blew our mind and basically all the good reviews now you can see this notebook LM plus if you click on this you will basically get the premium features and the subscription plans so this is free for individuals to get started and here these are the points you will get if you subscribe for Notebook LM plus so I’ll go back for this overview section and I will click on try notebook LM so when you click on this it will get here you can go to settings and basically I have created a dark mode because it is soothing we can even create a device or a light mode now here you can even click on this to get this type of view and even you can uh click on this boxes to get a box type view so basically I will click on this create new and as soon as I click on this I get this to upload the sources so I will just close this to show how it looks so it looks good now it’s time to upload the files so basically when you click on ADD Source you will get this and when you scroll down you will get three types of ways to upload the files Google Drive Link and paste text and even you can find the source limit over here like when you upload sources it should not be more than 50 so fine I’ll upload I upload three medical reports I’ll upload another Jan and Michael reports are done and I upload John these are basically the random medical reports I have collected from the internet to just show you how it works you can even add YouTube video links and even drive links now when you click on a particular thing particular report you will get Source guide a summary basically and even you will get some key topics now if I brought this thing you can see you will get some prompts over here already there a pre-written promps what factors contributed to James Smith’s anemia diagonosis so I will choose this now you can see it has provided an Insight James 
Smith’s anemia diagnosis is based on her low hemoglobin level of 9.5 so this is basically a clickable one that it gives a reference to what it has taken from she is also experiencing fattic and pale skin which are common symtoms of anemas the same reference you can refer to basically these are the proofs that it has taken all the insights from these three reports so now if you want notebook LM to just ignore one particular report you can basically unselect it from here and if you want everything to get selected click over here so when I have chosen this pre-selected prompt it has basically use the resources provided and find the helpful insights for that particular resource for example you can see on the screen and in some case they will provide some content that will be a larger one and they will even provide references for that m one is very short and so it has basically provided only one reference you can even add sources by clicking on this but remember it should not exit the level I have only uploaded three you can add Mor 47 you can now even see that this is one it is written save to note basically means you can save this answer for future reference like if I do save note it will basically appear over here now the best thing is you can even have a feedback of this particular response you have got maybe a good response or a bad response you can even copy here and paste it in another place this is very helpful for the students who study from different materials and may get puzzled writing down notes so basically they can save notes for future dos and even they can copy from here and paste it somewhere else you can even delete this future note like delete note and you can delete now if you see on the right section you will see a note section over here now you can even add a note you can physically write down what you have understood or maybe anything that is important to you basically acting as your notepad now coming to the surprising part that is the studio 
over here previously it was a notebook guide but they have recently updated the features it’s basically like a guide for you containing study guide The Briefing documents FAQ and the timeline now if we click on briefing the document it create a brief document taking the help of the resources you have provided so basically you can get a brief document out of the three resources you have provided so I’ll click on this and you can see John Michael and Jane three of them are involved and it has given an overview of all the three reports over here so this is basically making a summary of these three reports you can even get a study guide prepared for you that will clear out your idea even more now I’m very excited as I’m going to show a magical thing this can actually convert your resources rather the summar is into a podcast a podcast if you don’t no is like a radio show you can listen to anytime online it covers different topics like stories discussions or even informations you want to know so it is written audio overview you deep dive conversation of two hes in English only you can even customize it but I will generate it may take a few minutes so just stick around it’s almost a 8 minutes audio of James Smith’s animia diagnosis so we will hear that okay so we’ve got a stack of medical reports here all right and uh we’re going to take a look at three different patients sounds good uh we have Jane Smith she’s 29 years old okay we’ve got uh John Doe he is 45 right and we’ve got Michael Johnson and he is 52 okay a good spread yeah and you know it’s really interesting how these cases even though they’re different stages of life right K offers a window into some pretty common health challenges yeah definitely so are you ready to dive in I’m ready let’s do it okay so first up we’ve got Jane Smith and now it’s incredible how it actually turns the normal resources that I have provided just a three medical reports of Jane John and Michael to a real podcast so all thanks to AI 
it has made it conversational and even you can get a whole overview of these three reports in a very conversational and like a podcast real podcast way now you can even click on this three dots and you can change the playback speed and you can even download from here and if you don’t like it you can even delete from here it depends on your requirement the best thing is you can even give your feedback and help it grow that makes a sense because there’s always a room to upgrade now you must be thinking why it is helpful it can normally be used as a podcast maker but it can also be helpful for the people who can even hear and remember Concepts more than just by studying like a mundan routine let’s suppose you have 50 sources now it might be difficult for you to read line by line and document by document so it’s better to generate a summary even better to listen to a podcast and get an overview of all the sources and that’s how it works it was definitely heart rendering experience converting reports to a podcast now you must be thinking who will get benefited by this Google notebook LM notebook LM is for everyone who works with information students can simplify studying by summarizing notes and organize resources content creators can turn ideas into engaging podcast or easily structure their research professionals can save time by managing reports presentations or complex data whether you are learning creating or working on big projects notebook LM helps you do it faster smarter and with less effort so I can foresee that not making is about to hit New Heights and the way we have been doing it might soon be a thing of past with AI stepping in Google notebook LM is just the start of this exciting Journey it’s still in its early stages but only will get better from here I’m thrilled to see the amazing things it can do and I hope you are too think about this you’re about to create something amazing an AI that can think learn and grow in ways we only dreamed of and here’s 
the best part: you don't need to be an AI expert to make it happen. What if you could use LangChain, a tool that connects the most advanced language models to real-time data, allowing you to build AI applications that are both smart and flexible? It sounds like something out of science fiction, but with LangChain it's real. As large language models quickly become the backbone of many applications, LangChain has emerged as a game-changing tool, transforming the way we use these powerful technologies. Today we are diving into LangChain, the framework that makes AI development easier for everyone. Whether you want to understand user questions with one LLM, create human-like responses with another, or pull data insights, LangChain makes it all happen. But LangChain is about more than just making AI easy to use; it's about getting these models to work together seamlessly. LangChain simplifies what could be a complex process into a simple, powerful system; from smart chatbots to enhancing data for machine learning, the possibilities with LangChain are endless. So why has LangChain become one of the fastest-growing open-source projects ever, and how can you use it to get ahead in the world of AI?

Let's first start by understanding what LangChain is. LangChain is an open-source framework designed to help developers build AI-powered applications using large language models, or LLMs, like GPT-4. What really sets LangChain apart is its ability to link these powerful models with external data sources and other components. This allows you to create sophisticated natural language processing (NLP) applications that can do much more than just understand and generate text: they can interact with live data, databases, and other software tools. Now you might be asking: is LangChain a Python library? Yes, it is. LangChain is available as a Python library, which means you can easily integrate it into your existing Python projects. But it doesn't stop there: LangChain is also available in JavaScript and TypeScript, making it accessible to a wide range of developers. Whether you're working on a web app, a backend system, or a standalone tool, LangChain fits right in.

So why should we use LangChain? Why is it such a big deal? Developing AI applications typically requires using multiple tools and writing a lot of complex code: you need to manage data retrieval, processing, integration with language models, and more. This can be time-consuming and complicated, especially if you're not deeply familiar with AI. LangChain simplifies the entire process, allowing you to develop, deploy, and even manage AI applications more easily and efficiently. Let's break this down with an example. Imagine you're building a chatbot that needs to provide real-time weather updates. Without LangChain, you would need to manually connect your bot to a weather API, fetch the data, process it, and then format the response. With LangChain, the process becomes much more straightforward: you can focus on what matters most, building the features and functionality of your application, while LangChain handles the complex integrations behind the scenes.

So let's discuss the key features of LangChain; it is packed with features that make it incredibly powerful and flexible, so let's take a closer look at some of the key components. First, we have model interaction. LangChain allows you to interact with any language model seamlessly; it manages the inputs and outputs to these models, ensuring you can integrate them into your application without a hitch. For example, if you want to use GPT-4 to generate responses to customer inquiries, LangChain makes it easy to plug that model into your workflow. Next, we have data connection and retrieval. One of LangChain's strengths is its ability to connect to external data sources: whether you need to pull data from a database, a web API, or even a file system, LangChain simplifies this process. You can retrieve, transform, and use data from almost any source, making your AI applications more robust and versatile. Next, we have chains. LangChain introduces the concept of chains, where you can link multiple models and components together to perform complex tasks. For example, you might have a chain where one component retrieves data, another processes it, and a third generates a human-like response; this chaining ability allows you to build workflows that would otherwise require extensive coding. Next, we have agents. Agents are like the decision-makers in LangChain: they can issue commands, deciding the best course of action based on the input they receive. For example, an agent could determine which language model to use based on the type of query it's handling, making your application smarter and more adaptive. Then we have memory. LangChain supports both short-term and long-term memory, so your AI can remember past interactions. This is particularly useful for applications like chatbots, where maintaining context over multiple interactions can significantly improve the user experience. Imagine you're building a virtual assistant: the assistant needs to remember previous interactions to provide relevant responses. With the help of LangChain, you can easily implement memory so that the assistant knows what you have talked about before, making the conversation more natural and engaging.

So what integrations are supported by LangChain? LangChain is designed to work seamlessly with a wide variety of integrations, making it extremely versatile for different use cases. First, LLM providers: LangChain supports integration with major LLM providers like OpenAI, Hugging Face, and others, which means you can easily incorporate the latest and most powerful language models into your applications. Then we have data sources: LangChain can connect to a variety of data sources such as Google Search, Wikipedia, and cloud platforms like AWS, Google Cloud, and Azure, making it easy to retrieve and use the most up-to-date information in your applications. The next kind of integration is vector databases.
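Before moving on, the chaining idea described above can be made concrete. The following is a minimal, dependency-free Python sketch of the concept only, not LangChain's actual API: three stand-in components (retrieve, process, respond) are linked so that each step's output feeds the next, the way a retrieve-process-respond chain would work. All names and the fake data are illustrative assumptions.

```python
# A minimal sketch of the "chain" idea: each component is a plain function,
# and the chain pipes one step's output into the next. This illustrates the
# concept only; it is not LangChain's real API.

def retrieve(query):
    # Stand-in for a data-retrieval component (e.g. a weather API call).
    fake_db = {"weather in Paris": "18 degrees C, light rain"}
    return fake_db.get(query, "no data found")

def process(raw):
    # Stand-in for a processing/transformation component.
    return raw.upper()

def respond(processed):
    # Stand-in for an LLM that turns data into a human-like reply.
    return f"Here is what I found: {processed}"

def run_chain(query, steps):
    # Feed the query through each component in order.
    value = query
    for step in steps:
        value = step(value)
    return value

print(run_chain("weather in Paris", [retrieve, process, respond]))
# → Here is what I found: 18 DEGREES C, LIGHT RAIN
```

In the real framework, the retrieval step would be a data-source integration and the respond step an LLM call, but the shape of the pipeline is the same.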
Vector databases are used for handling large volumes of complex data, such as images or long text, and LangChain integrates with vector databases like Pinecone. These databases store data as high-dimensional vectors, allowing for efficient and accurate retrieval, which is particularly useful for applications that require searching through large data sets quickly. For example, let's say you are building an application that needs to analyze thousands of documents to find relevant information: with LangChain, you can integrate a vector database like Pinecone, store your documents as vectors, and quickly search through them using powerful language models. This capability can save you a lot of time and make your application much more effective.

Now the question arises: how do you create prompts in LangChain? Creating prompts in LangChain is much easier with something called a prompt template. A prompt template acts as a set of instructions for a language model, and these templates can be customized to varying degrees: you might design a prompt template to ask simple questions, or you could create more detailed instructions that guide the language model to produce high-quality responses. Let's walk through how you can create a prompt using LangChain in Python. Step one is installing LangChain. First, you'll need to have Python installed on your system; once that's set up, you can install LangChain by opening your terminal and running the following command: pip install langchain. The next step is adding integrations: LangChain often requires at least one integration to function properly, and a common choice is OpenAI's language model API. To use the OpenAI API, you'll need to create an account on the OpenAI website and obtain your API key; after that, install OpenAI's Python package and provide your API key as shown on screen. The next step is importing and using a prompt template. Now that you have LangChain and the necessary integration set up, you can start creating your prompts. LangChain offers a pre-made prompt template that allows you to structure your text in a way that the language model can easily understand, as in the example shown on screen. In this example, the prompt template takes two variables, an objective and a content subject, and uses them to generate a prompt. The output might be something like "Tell me an interesting fact about zebras"; the language model would then take this prompt and return a relevant fact about zebras based on the given objective. This is a simple but powerful way to generate dynamic prompts that can be adapted to a wide range of tasks, from answering questions to generating creative content.

Let's now talk about how to develop applications with LangChain. Building applications with LangChain is straightforward and involves a few key steps. First, define your application: know exactly what problem it's solving, and identify the necessary components like language models, data sources, and user interactions. The next step is to build the functionality using LangChain's components such as prompts, chains, and agents; this is where you create the logic that drives your application, like processing user input or retrieving data. Then comes customizing your application to meet specific needs: LangChain's flexibility allows you to tweak prompts, integrate additional data sources, and fine-tune models for optimal performance. Before going live, it's crucial to test and deploy your application: testing helps catch any issues, and LangChain makes debugging easy, so you can deploy your application with confidence. For example, let's build a chatbot using LangChain. First we define it: it is a chatbot that answers questions about technology trends. We then create the functionality by setting up a prompt and a chain to process input. Next comes customization: we customize it by integrating a new API to pull in the latest information. And finally, we test and deploy the chatbot to ensure it responds accurately to users.
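Since the on-screen code is not reproduced in this transcript, here is a rough sketch of the prompt-template pattern described above. LangChain's real class for this is `PromptTemplate`; the stand-in below is plain Python with no dependencies, and the template wording and class name are my own illustrative assumptions.

```python
# Plain-Python stand-in for the prompt-template pattern described above.
# (LangChain ships this as its PromptTemplate class; this sketch just
# illustrates the idea without requiring any packages or an API key.)

class SimplePromptTemplate:
    def __init__(self, template, input_variables):
        self.template = template              # text with {named} slots
        self.input_variables = input_variables

    def format(self, **kwargs):
        # Fail loudly if a declared variable was not supplied.
        missing = set(self.input_variables) - set(kwargs)
        if missing:
            raise ValueError(f"missing prompt variables: {missing}")
        return self.template.format(**kwargs)

# Two variables, an objective and a content subject, as in the example:
prompt = SimplePromptTemplate(
    template="Tell me {objective} about {content}.",
    input_variables=["objective", "content"],
)
print(prompt.format(objective="an interesting fact", content="zebras"))
# → Tell me an interesting fact about zebras.
```

With the real library, the equivalent would use `from langchain.prompts import PromptTemplate` with the same template string, and the formatted text would then be sent on to an LLM.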
So LangChain offers endless possibilities across various industries; let's now look into some examples and use cases. You can create customer-service chatbots that manage queries and transactions, or coding assistants that suggest code snippets and debug issues. In healthcare, LangChain can assist doctors with diagnosis and patient-data management, and with the help of such an AI assistant, doctors can make quicker, more informed decisions. Then we have marketing and e-commerce: it can analyze consumer behavior, generate product recommendations, and craft compelling product descriptions. So LangChain is a powerful framework that makes AI development accessible and efficient.

Now, as I mentioned, one of the secret sauces of deep learning is neural networks, so let's see what a neural network is. Neural networks are based on our biological neurons: the whole concept of deep learning and artificial intelligence is inspired by the human brain, and the human brain consists of billions of tiny cells called neurons. This is how a biological neuron looks, and this is how an artificial neuron looks. A neural network is like a simulation of our human brain: the brain has billions of biological neurons, and we are trying to simulate it using artificial neurons. A biological neuron has dendrites, and the corresponding component in an artificial neuron is the inputs: the biological neuron receives its inputs through the dendrites. Then there is the cell nucleus, which is basically the processing unit; in an artificial neuron there is an equivalent piece, where, based on the weights and biases (we will see exactly what weights and biases are as we move on), the input gets processed, and that results in an output. In a biological neuron the output is sent through a synapse, and in an
artificial neuron, there is an equivalent of that in the form of an output. Biological neurons are also interconnected — there are billions of neurons wired together — and artificial neurons are interconnected in the same way: the output of one neuron is fed as an input to another neuron, and so on. Now, one of the very basic units of a neural network is the perceptron. So what is a perceptron? A perceptron can be considered one of the fundamental units of neural networks. It consists of at least one neuron — sometimes more — but you can create a perceptron with a single neuron, and it can be used as a basic binary classifier: it can be trained to do some basic binary classification. A basic perceptron is nothing but a neuron: you have inputs x1, x2, up to xn, a summation function, and then what is known as an activation function. Based on the weighted sum of the inputs, the activation function gives an output like a zero or a one, so we say the neuron is either activated or not. Each input is multiplied by a weight, a bias is added, and that whole thing is fed to the activation function, which produces an output. If the output is correct it is accepted; if it is wrong — if there is an error — that error is fed back, and the neuron adjusts its weights and bias to give a new output, and so on. That is the training process of a neuron, or of a neural network.

There is a concept called perceptron learning, which is one of the very basic learning processes. It works like this: you have inputs x1 to xn, each input is multiplied by a weight, and the products are summed — the formula is Σᵢ wᵢxᵢ — and then a bias is added, giving Σᵢ wᵢxᵢ + b. The bias does not depend on the input values; it is common for one neuron. However, the bias value keeps changing during the training process, and once training is complete, the values of the weights w1, w2, and so on, and the value of the bias, get fixed. That is the whole training process, and that is what is known as perceptron training: the weights and bias keep changing until you get an accurate output. The summation is, of course, passed through the activation function — the quantity Σᵢ wᵢxᵢ + b is passed through the activation function, and the neuron either fires or not. Based on that there is an output, and that output is compared with the actual or expected value, also known as the labeled information — this is the process of supervised learning, where the correct output is already known. By comparing, we know whether there is an error, and if there is, the error is fed back and the weights and biases are updated accordingly until the error is reduced to a minimum. This iterative process is known as perceptron learning, or the perceptron learning rule. The whole idea is to update the weights and bias of the perceptron until the error is minimized. The error need not be zero — it may never reach zero — but the idea is to keep changing the weights and bias so that the error is the minimum possible. This iteration continues until either the error is zero, which is an unlikely situation, or it is the minimum possible within the given conditions.
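As a concrete illustration, the perceptron learning rule just described can be sketched in a few lines of plain Python. This is a minimal sketch, not code from the lecture; the AND-gate dataset, the learning rate, and the epoch count are assumptions chosen purely for demonstration:

```python
def step(z):
    """Step activation: fire (output 1) if the weighted sum crosses the threshold 0."""
    return 1 if z >= 0 else 0

def train_perceptron(samples, labels, lr=0.1, epochs=20):
    """Perceptron learning rule: w_i += lr * (y - y_hat) * x_i, b += lr * (y - y_hat)."""
    n = len(samples[0])
    w = [0.0] * n   # weights start at zero (could also start random)
    b = 0.0         # one bias per neuron, shared across all inputs
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b  # weighted sum plus bias
            error = y - step(z)                           # compare with the label
            w = [wi + lr * error * xi for wi, xi in zip(w, x)]
            b += lr * error                               # feed the error back
    return w, b

# Train on the AND truth table (linearly separable, so a single neuron can learn it)
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 0, 0, 1]
w, b = train_perceptron(X, y)
```

After training, the learned weights and bias are fixed, and applying the step activation to Σᵢ wᵢxᵢ + b reproduces the labels — exactly the iterative weight-and-bias adjustment described above.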
In 1943, two scientists, Warren McCulloch and Walter Pitts, came up with an experiment in which they implemented logical functions such as AND, OR, and NOR using neurons — a significant breakthrough. They showed that some of the most common logic gates, which take two inputs A and B and produce a corresponding result (A·B for an AND gate, A + B for an OR gate, and so on), could be implemented with a single-layer perceptron. This worked for most of these gates — the exception is XOR, and we will see why in a little bit.

This is how an AND gate works: with inputs A and B, the neuron should fire only when both inputs are one. For 0,0 the output should be 0; for 0,1 it is 0; for 1,0 it is again 0; and for 1,1 the output should be 1. How do we implement this with a neuron? It was found that by changing the values of the weights it is possible to achieve this logic. For example, with equal weights of 0.7 and 0.7 and a threshold of one, the weighted sum 0.7 × 0 + 0.7 × 0 gives 0, and the mixed cases give 0.7; only in the last case, when both inputs are one, do we get a value, 1.4, that exceeds the threshold. So only in that case does the neuron activate and produce an output — in all the other cases there is none, because the weighted sum stays below the threshold. That is the implementation of an AND gate using a single perceptron, or a single neuron.

Similarly for the OR gate: the output should be one if either input is one — that is, in every case except 0,0. With a weight of 1.2 on each input: when both inputs are zero the weighted sum is 0; for inputs 0 and 1 it is 1.2 × 0 + 1.2 × 1 = 1.2; for 1 and 0 it is likewise 1.2; and when both inputs are one it is 2.4. During training these weights keep changing, and at the point where w1 = 1.2 and w2 = 1.2 the system has learned to give the correct output. That is the implementation of an OR gate using a single neuron, or a single-layer perceptron.

The XOR gate, however, was one of the challenging ones. Researchers tried to implement XOR with a single-layer perceptron, and it was not possible — this was a roadblock in the progress of neural networks. Subsequently it was realized that an XOR gate can be implemented using a multi-layer perceptron, or MLP. In this case there are two layers instead of one. The inputs are x1 and x2, there is a hidden layer — which is why the units are denoted h3 and h4 — and the hidden outputs are fed to the output unit o5, where a threshold is applied. In the numerical calculation, the weights are 20 and −20 from x1, and again 20 and −20 from x2, feeding into h3 and h4. Over the four input combinations, h3 produces 0, 1, 1, 1 and h4 produces 1, 1, 1, 0; applying a sigmoid at the output with a threshold, the final output is 0 for the inputs 0,0 and 1,1, and 1 for the inputs 0,1 and 1,0 — which is exactly XOR: only when exactly one of the inputs is one do you get an output of one, and when both inputs are one or both are zero, the output is zero.
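All three gate constructions can be checked numerically. This is a small sketch, not the lecture's own code: the AND/OR weights of 0.7 and 1.2 follow the worked examples, the ±20 hidden weights follow the XOR slide, and the biases of −10, +30, and −30 in the XOR network are an assumed, commonly used choice that makes h3 behave like OR and h4 like NAND:

```python
import math

def gate(x1, x2, w1, w2, threshold=1.0):
    """Single-neuron gate: fire iff the weighted sum reaches the threshold."""
    return 1 if (w1 * x1 + w2 * x2) >= threshold else 0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def xor_mlp(x1, x2):
    """Two-layer perceptron for XOR: h3 (~OR), h4 (~NAND), output o5 (~AND of h3, h4)."""
    h3 = sigmoid(20 * x1 + 20 * x2 - 10)    # ~0 only when both inputs are 0
    h4 = sigmoid(-20 * x1 - 20 * x2 + 30)   # ~0 only when both inputs are 1
    o5 = sigmoid(20 * h3 + 20 * h4 - 30)    # ~1 only when h3 and h4 are both ~1
    return 1 if o5 >= 0.5 else 0

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_out = [gate(a, b, 0.7, 0.7) for a, b in inputs]   # a single layer suffices
or_out  = [gate(a, b, 1.2, 1.2) for a, b in inputs]   # a single layer suffices
xor_out = [xor_mlp(a, b) for a, b in inputs]          # needs the hidden layer
```

Running this reproduces the three truth tables: AND fires only for (1, 1), OR fires for every combination except (0, 0), and the two-layer network fires exactly when one input is one — which no choice of single-layer weights can achieve.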
That is why it is called an exclusive OR: only one of the inputs should be one for the output to be one, and that condition is satisfied by this construction — the XOR gate is a special implementation of the perceptron. Now that we have a good idea of the perceptron, let's take a look at what a neural network is. A neural network is nothing but a network of these neurons, and there are different types — about five of them: artificial neural networks, convolutional neural networks, recurrent neural networks, deep neural networks, and deep belief networks. Each type is suited to a particular kind of problem: convolutional neural networks, for example, are very good at image processing and image recognition, whereas RNNs are very good for speech recognition and text analysis. Each type has special characteristics and is good at certain kinds of tasks.

What are some of the applications of deep learning? Deep learning is used extensively in gaming. You may have heard of AlphaGo, the Go-playing AI created by a startup called DeepMind, which was acquired by Google; AlphaGo defeated the human world champion Lee Sedol at the game of Go. Gaming is an area where deep learning is used extensively and where a lot of research happens. In addition, there are special neural networks called generative adversarial networks, which can be used for synthesizing images, music, or text — a network can be trained to compose a certain kind of music. Then there are autonomous cars — you must be familiar with Google's self-driving car — and today a lot of
automotive companies are investing in this space, and deep learning is a core component of autonomous cars: the cars are trained to recognize, for example, the road, the lane markings, signals, and any objects or obstructions in front of them. That is another major application. Then there are robots — we have seen several, including Sophia, who was granted citizenship by Saudi Arabia — and there are several such very humanlike robots whose underlying technology is, in many cases, deep learning. Medical diagnostics and healthcare is another major area where deep learning is being used. Within healthcare diagnostics there are multiple areas where deep learning, image recognition, and image processing can be applied — for example, cancer detection. As you may be aware, if cancer is detected early it can often be cured, and one of the challenges is the limited availability of specialists who can diagnose cancer from diagnostic images and scans. The idea is to train neural networks to perform some of these activities so that the load on the cancer specialists, or oncologists, comes down. There is a lot of research happening here, and there are already quite a few applications claimed to perform better than human beings in this space — for lung cancer, breast cancer, and so on. So healthcare is a major area where deep learning is being applied.

Let's take a look at the inner workings of a neural network. How does an artificial neural network identify, say, shapes? Can we train a neural network to identify shapes like squares, circles, and triangles when images are fed to it? This is how it works. Any image is nothing but digital information about pixels. In this particular case, let's say we have a 28 × 28-pixel image of a square. There is a certain way in which the pixels are lit up, and each pixel has a value, say from 0 to 255, where 0 indicates black (dark) and 255 indicates white (fully lit) — a measure of how each pixel is lit up. So this image consists of the information of 784 pixels: everything inside the image can be compressed into those 784 pixel values, and the way each pixel is lit up provides information about what the image is. We can train neural networks to use that information to identify images. Let's look at how this works. For each input, a value close to one means the pixel is white, whereas a value close to zero means it is black. This is an animation of how the whole thing works: one way of doing it is to flatten the image and feed all 784 pixel values as inputs to our neural network. The network can consist of several layers — an input layer, a few hidden layers, and an output layer. The input layer takes the 784 pixel values as input, and the output can be one of three classes: a square, a circle, or a triangle. During the training process, when you first feed this image it will probably say it's a circle, or a triangle; as part of training, we send that error back, and the weights and biases of the neurons are adjusted until it correctly identifies the square. That is the whole training mechanism. Now let's look at a circle. In the same way, you feed the 784 pixels — there is a certain pattern in which they are lit up, and the neural network is trained to identify that pattern. During training, once again, it would probably initially
identify it incorrectly — saying it is a square or a triangle — and that error is fed back, and the weights and biases are adjusted until it finally gets the image right. That is the training process. In the same way, we can now feed another image, this one of a triangle, and train on it. We have now trained our neural network to classify these images as a triangle, a circle, or a square, so it can identify these three types of objects: feed it a new image and it will tell you whether it is a square, a triangle, or a circle. What is important to observe is that when you feed a new image, the shape need not be in exactly the same position. The neural network actually identifies patterns, so even if the triangle is positioned, say, in a corner or off to the side rather than exactly in the middle, it would still be identified as a triangle — and that is the whole idea behind pattern recognition.

So how does this training process work? Here is a quick view. We have seen that a neuron receives inputs and computes a weighted sum — Σᵢ wᵢxᵢ plus the bias — which is fed to the activation function, which in turn gives an output. During training, initially, a square might be identified as a triangle, a triangle as a square, and so on; that error information is fed back. The weights can start out random — or maybe all zero — and then they slowly keep changing: as part of training, the values of the weights w1, w2, up to wn change in such a way that toward the end of training the network identifies the images correctly. Until then the weights keep being adjusted, and that is known as the training process. The weights are numeric values — say 0.5, 0.25, 0.35, and so on — and they can be positive or negative. The value coming in is the pixel value, which, as we have seen, can be scaled between 0 and 1, or kept in the raw range — 0 being black, the maximum being white, and the other shades in between. So these are all numerical values: each product wᵢxᵢ is a number, and the bias is also a number. We need to keep in mind that the bias is fixed per neuron — it does not vary with the inputs — whereas there is one weight per input; that is an important point to note. The bias also changes during training: it starts with a random value, and as training proceeds, the weights w1, w2, …, wn and the bias b all change; once training is complete, those values are fixed for that particular neuron. There can be multiple neurons, across multiple layers, and that is the way the training process works. Here is another multi-layer example, with two hidden layers in between: values come from the input layer, pass through the hidden layers, and reach the output layer. As you can see, there are weights and biases for every neuron in every layer, all of them changing during training; at the end of training they all hold fixed values, and that is a trained model.

Then there is something known as the activation function: every neuron has one, and there are different types in use — it could be ReLU, sigmoid, and so on. The activation function is what decides whether a neuron should fire — whether the output should be zero or one. It takes as input the weighted sum — remember, Σᵢ wᵢxᵢ + b — and produces the output. The different types of activation functions are covered in an earlier video you may want to watch. As part of the training process, we feed in the labeled training data, and the network gives a predicted output, which we denote ŷ. Because this is supervised learning, the labeled data already tells us what the output should be — the actual output. Before training is complete there will obviously be errors, and these are measured by what is known as the cost function. The difference between the predicted output and the actual output is the error, and the cost function can be defined in different ways — in this case it is the average of the squares of the errors, and when the squared errors are simply added up, it is sometimes called the sum of squared errors, or SSE. That cost is then fed back in what is known as backward propagation, or backpropagation, which lets the network adjust its weights and biases; the weights and biases keep getting updated until the value of the cost function is at its minimum. Now, there is an optimization technique used here called gradient descent, and this algorithm works to minimize the error — that is, the cost function.
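The pieces named so far — the activation functions and the squared-error cost — are easy to write out directly. This is a minimal sketch; the sample predictions and labels below are invented purely for illustration:

```python
import math

def relu(z):
    """ReLU: pass positive weighted sums through, clamp negatives to zero."""
    return max(0.0, z)

def sigmoid(z):
    """Sigmoid: squash the weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def mse(y_true, y_pred):
    """Cost: the average of the squared errors between labels and predictions."""
    return sum((y - yh) ** 2 for y, yh in zip(y_true, y_pred)) / len(y_true)

def sse(y_true, y_pred):
    """Sum of squared errors (SSE): the same errors, summed rather than averaged."""
    return sum((y - yh) ** 2 for y, yh in zip(y_true, y_pred))

y_true = [1.0, 0.0, 1.0, 0.0]   # labels: the expected outputs (supervised learning)
y_pred = [0.9, 0.2, 0.7, 0.1]   # network outputs ŷ before training is complete
```

Training then amounts to nudging the weights and biases so that `mse(y_true, y_pred)` keeps shrinking — which is exactly what gradient descent, described next, automates.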
There is a lot of mathematics behind this — finding the local minima and the global minimum using differentiation, and so on — but the idea is this: as part of training, the whole point is to bring the error down. Say this curve is the cost function; at certain points its output is very high, so the weights (and of course the bias) have to be adjusted in such a way that the cost function is minimized, and gradient descent is the optimization technique used to do that. There is also what is known as the learning rate: with gradient descent you need to specify a learning rate, and it should be optimal. If the learning rate is very high, the optimization will not converge, because at some point it will overshoot to the other side of the minimum; on the other hand, if it is very low, it might take forever to converge. So you need to find the optimal value of the learning rate, and once that is done, gradient descent reduces the error function — which is essentially the end of the training process.

Here is another view of gradient descent. This is your cost function — its output is what has to be minimized by the gradient descent algorithm — and along the other axis are the parameters, of which a weight could be one. Initially we start with certain random values, so the cost is high; then the weights keep changing in such a way that the cost function comes down. At some point it reaches the minimum value, beyond which it would increase again; that is where the gradient descent algorithm decides it has reached the minimum and tries to stay there. This is known as the global minimum. These curves have been drawn nicely here for explanation purposes.
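The effect of the learning rate can be seen on a toy one-dimensional cost function. This is an assumed example, not the lecture's curve: cost(w) = (w − 3)², whose gradient is 2(w − 3) and whose global minimum sits at w = 3:

```python
def gradient_descent(lr, steps=100, w=0.0):
    """Repeatedly step downhill on cost(w) = (w - 3)**2 from a starting weight."""
    for _ in range(steps):
        grad = 2 * (w - 3)   # derivative of the cost at the current weight
        w = w - lr * grad    # update rule: move against the gradient by lr * grad
    return w

w_good = gradient_descent(lr=0.1)    # converges very close to the minimum at 3
w_slow = gradient_descent(lr=0.001)  # too small: still far from 3 after 100 steps
w_bad  = gradient_descent(lr=1.1)    # too large: overshoots every step and diverges
```

Running the three cases mirrors the discussion above: the well-chosen rate lands at the minimum, the tiny rate "takes forever," and the oversized rate crosses over to the other side on every step and never converges.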
In practice, though, these curves can be pretty erratic: there can be a local minimum here, then a peak, and so on. The whole idea of gradient descent optimization is to identify the global minimum and to find the weights and bias at that particular point — that is what gradient descent is. Here is another example, with multiple local minima: as the curve comes down, a point may appear to be the minimum value, but it is not — the global minimum is actually further along, and the gradient descent algorithm makes an effort to reach that level and not get stuck at the local minimum. The algorithm knows how to identify the global minimum, and that is what it does during the training process.

Now, to implement deep learning there are multiple platforms and languages available, but the most common platform nowadays is TensorFlow, which is why we created this tutorial around TensorFlow. We will take you through a quick demo of how to write TensorFlow code using Python. TensorFlow is an open-source platform created by Google. It is a Python library — Python is the most common choice, though other languages such as Java and R are also supported — for developing deep learning applications, especially those built on neural networks. It consists of primarily two parts, if you will: the tensors, and the graphs, or the flow — that is the reason for the name TensorFlow. So what are tensors? Tensors are like multi-dimensional arrays — that is one way of looking at them. First of all you can have a scalar, which is just a single number; then a one-dimensional array, which is a set of numbers; then a two-dimensional array, which is like a matrix; and beyond that it sometimes gets difficult to visualize — a three-dimensional array, and so on. TensorFlow can handle many more dimensions: it can work with multi-dimensional arrays, and that is its strength — it makes deep learning computation much faster, which is why TensorFlow is used for developing deep learning applications.

So TensorFlow is a deep learning tool, and this is the way it works: the data flows in the form of tensors, and the way the programming works is that you first create a graph of what to execute and then actually execute that graph in the form of what is known as a session. We will see this in the TensorFlow code as we move forward. All the data is managed and manipulated as tensors, and the processing happens through these graphs. There are certain terms, such as the rank of a tensor, which is its dimensionality in a way: a scalar — just one number — has rank zero; a one-dimensional vector has rank one; a two-dimensional array, typically a matrix, has rank two; a three-dimensional array has rank three; and it can go beyond three as well, since multi-dimensional arrays can be stored as tensors. What are some of the properties of TensorFlow? Today it is the most popular deep learning platform, or library. It is open source — developed and maintained by Google, but open source.
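The ranks just described can be illustrated without TensorFlow at all, since rank is simply the number of nested dimensions. This is a plain-Python sketch; `rank` here is a hypothetical helper for illustration, not a TensorFlow API:

```python
def rank(t):
    """Count nesting depth: scalar -> 0, vector -> 1, matrix -> 2, and so on."""
    r = 0
    while isinstance(t, list):
        r += 1
        t = t[0]   # descend into the first element of each nested level
    return r

scalar = 5                              # rank 0: a single number
vector = [1, 2, 3]                      # rank 1: a one-dimensional array
matrix = [[1, 2], [3, 4]]               # rank 2: a two-dimensional array
cube   = [[[1], [2]], [[3], [4]]]       # rank 3: a three-dimensional array
```

In TensorFlow itself, tensors carry this rank (and a full shape) as metadata, which is what lets the library dispatch fast multi-dimensional computation to the CPU or GPU.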
One of the most important things about TensorFlow is that it can run on CPUs as well as GPUs. A GPU is a graphics processing unit, just as a CPU is a central processing unit. In earlier days the GPU was used primarily for graphics — that is where the name comes from. A GPU cannot perform general-purpose work as efficiently as a CPU, but it can perform iterative computations extremely fast — much faster than a CPU — so GPUs are very well suited to computational workloads. Deep learning involves a lot of iterative computation, in the form of matrix multiplication and so on, so GPUs are a very good fit, and TensorFlow supports both GPU and CPU. There is a certain way of writing code in TensorFlow, which we will see as we go into the code. Of course, TensorFlow can be used for traditional machine learning as well — that would be overkill, but just for understanding it may be a good idea to start by writing code for a normal machine learning use case, so you get a feel for how TensorFlow code works, and then move into neural networks. That is just a suggestion; if you are already familiar with how TensorFlow works, you can go straight to the neural networks part.

In this tutorial we take the use case of recognizing handwritten digits — this is like the "hello world" of deep learning. The MNIST database is a nice little database of images of handwritten digits, nicely formatted. Very often in deep learning and neural networks we end up spending a lot of time preparing the data for training; with the MNIST database we can avoid that, because the data is already in the right format and can be used directly for training. MNIST also offers a bunch of built-in utility functions that we can call straight away, without worrying about writing our own, and that is one of the reasons the MNIST database is so popular for training purposes when people first want to learn about deep learning and TensorFlow. It is a collection of 70,000 handwritten digits: a large part of them are for training, then there is a test set — just as in any machine learning process — and then a validation set, and all of them are labeled. So you have the images and their labels. The images look somewhat like this: handwritten samples collected from a lot of individuals — people have written these numbers, going from 0 to 9, and the images have been captured and formatted so that they are very easy to handle. That is the MNIST database.

The way we are going to implement this in TensorFlow is to feed in the data — especially the training data — along with the label information. The images are stored in the form of pixel information, as we saw in one of the previous slides: an image is nothing but an arrangement of pixels, and each pixel value says whether the pixel is lit up, not lit, or somewhere in between. That is how the images are stored and how they are fed into the neural network for training; once the network is trained, when you provide a new image it will be able to identify it — within a certain error, of course. For this we will use one of the simpler neural network configurations, with a softmax output layer, and for simplicity we will flatten the pixels: instead of taking them in a two-dimensional arrangement, we just flatten them out. The image is 28 by 28, so there are 784 pixels: pixel number 1 starts at the top left and runs to pixel 28, then pixel 29 starts the next row and runs to 56, and so on, with pixel number 784 at the end. We take all these pixels, flatten them out, and feed them as one single line into our neural network. At the output is what is known as a softmax layer: once it is trained, it will be able to identify what digit the input is. In this output layer there are 10 neurons, each signifying a digit, and at any given time, when you feed an image, only one of these 10 neurons gets activated. For example, if the network is trained properly and you feed in a nine, the neuron for nine gets activated and you get the output from that neuron; feed a one to the trained network and the neuron for one activates; feed a two and the neuron for two activates — I hope you get the idea. The softmax layer is one of the simpler options, good for quick and easy understanding, and that is what we will be using here.

This is how the code will look. We will go into our lab environment in the cloud and show you directly, but very quickly, let me run you through it briefly here before we go into the Jupyter notebook, where the actual code is, and run it. We are using Python, so the syntax is Python, and the first step is to import the TensorFlow library. We do this with the line `import tensorflow as tf` — the name `tf` is just for convenience, you can give any name — and once you do this, TensorFlow is available as an object named `tf`, and you can call its methods and access its attributes. The MNIST database is actually an integral part of TensorFlow, which is another reason we always use this example as a first step.
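The flatten-then-softmax layout described above can be sketched numerically. This is a toy illustration with random weights, not the trained MNIST model; NumPy is assumed to be available, and the random image stands in for a real digit:

```python
import numpy as np

rng = np.random.default_rng(0)

image = rng.random((28, 28))        # stand-in for one grayscale digit, values in [0, 1]
x = image.reshape(784)              # flatten the 28x28 grid into one line of 784 pixels

W = rng.standard_normal((784, 10)) * 0.01   # one weight per pixel per output neuron
b = np.zeros(10)                            # one bias per output neuron

logits = x @ W + b                  # weighted sum plus bias for each of the 10 neurons

def softmax(z):
    """Turn the 10 weighted sums into probabilities that sum to 1."""
    e = np.exp(z - z.max())         # subtract the max for numerical stability
    return e / e.sum()

probs = softmax(logits)
predicted_digit = int(probs.argmax())   # the neuron with the highest probability "fires"
```

With trained weights instead of random ones, `probs` concentrates almost all of its mass on one of the ten neurons — which is what "only one neuron gets activated" means for a softmax layer.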
You simply import the MNIST database as well, using a single line of code, slightly modified so that the labels come back in the format known as one-hot (`one_hot=True`), which means the label information is stored like an array. Let me use the pen to show what exactly this is: with one-hot encoding, each label is stored as an array of 10 entries. Let's say the number is 8: then all the values are zero except one. This is the array at position zero, this at position one, position two, and so on; position 8 will be one — because our input is eight — and position 9 will again be zero. So `one_hot=True` loads the labels in such a way that exactly one of the ten positions holds a one, and based on which position is one, we know the label: here the eighth position is one, therefore we know this sample's value is eight. Similarly, if the label is a two, the array holds a one at position two and zeros everywhere else.
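The one-hot format just described can be reproduced with a tiny helper. This is a minimal sketch for illustration — the TensorFlow loader builds these arrays for you when `one_hot=True` is passed:

```python
def one_hot(digit, num_classes=10):
    """Encode a digit label as a 10-element array with a single 1 at its position."""
    return [1 if i == digit else 0 for i in range(num_classes)]

label_eight = one_hot(8)  # position 8 is 1, everything else is 0
label_two = one_hot(2)    # position 2 is 1, everything else is 0
```

This shape matches the softmax output layer exactly — ten label slots against ten output neurons — which is what lets the cost function compare them element by element.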
So the zeroth position is zero, the first position is also zero, the second position is one — because this indicates the number two — the third is zero, and so on. That is the significance of `one_hot=True`. We can then check how the data looks by displaying it; as I mentioned earlier, it is pretty much all in digital form — numbers, the pixel values — so you will not really see an image in this format, but there is a way to visualize the image, which I will show in a bit. This also tells you how many images are in each set: 55,000 in training, 10,000 in the test set, and 5,000 in validation — altogether 70,000 images. Let's move on. We can view the actual images using the Matplotlib library — this is the code for viewing them — and you can view them in color or in grayscale; the `cmap` argument says in what way we want to view them, along with the maximum and minimum pixel values: the maximum is one, because the values are scaled, so one means white, zero means black, and anything in between is a shade of gray.

To train the model there is a certain way of writing your TensorFlow code. The first step is to create some placeholders, and then you create a model — in this case we will use the softmax model, one of the simplest. Placeholders are primarily there to get data from outside into the neural network; this is a very common mechanism. Then, of course, you have variables — remember, these are your weights and biases. In our case there are 10 neurons, and each neuron actually has 784 weights, because each neuron takes all the inputs.
Going back to the slide: every neuron receives all 784 inputs. This is the first neuron, it receives all 784; this is the second neuron, it also receives all 784. Each of these inputs needs to be multiplied by a weight, and that is what we are talking about here: a matrix of 784 weight values for each neuron, so a 784 × 10 matrix, since there are 10 neurons. Similarly, there are biases. Remember, I mentioned there is only one bias per neuron, not one per input as with the weights, so there are only 10 biases because there are 10 neurons; that is why we create a variable for the biases. This is something a little new in TensorFlow: unlike regular programming languages where everything is a variable, here there are three different types. You have placeholders, which are primarily used for feeding data; variables, which can change during the course of computation; and a third type, not shown here, constants, which are fixed numbers. In a regular programming language you may have only variables, or at most variables and constants, but in TensorFlow you have placeholders, variables, and constants. Then you create what is known as a graph. TensorFlow programming consists of graphs and tensors, as I mentioned earlier; this can ultimately be considered a tensor, and the graph describes how to execute the whole implementation, so the execution plan is stored in the form of a graph. In this case we are doing a multiplication: remember tf was created as the TensorFlow object earlier, and TensorFlow has a matrix multiplication function, matmul, which is what is being used here to multiply the inputs by the weights.
So you multiply your input values x by W and add b: x·W + b, very similar to the earlier slide where we saw Σ xᵢwᵢ. The matrix multiplication multiplies all the input values by the corresponding weights, and then the bias is added. That is the graph we created. Then we need to define our loss function and our optimizer. Here again we use TensorFlow's APIs: tf.nn.softmax_cross_entropy_with_logits is the API we use for the loss, and reduce_mean averages that error over the batch. For the optimizer, which actually reduces the error, we use the gradient descent optimizer, which we discussed a couple of slides earlier, and for that you need to specify the learning rate. Remember the slide showing how fast you come down the slope; that is the learning rate. It needs to be tested and tuned to find the optimum value: it shouldn't be too high, in which case training will not converge, and it shouldn't be too low, because then training takes very long. You define the optimizer and call its minimize method, and that sets up the training step. So far we have only been building the graph; to actually execute it we create what is known as a session and run that session. We also specify how many iterations training should run; for example, here we are saying one thousand steps, which is the exit condition in a way, so training will run for a thousand iterations. Once that is done we can evaluate the model using some of the techniques shown here. So let us get into the code quickly and see how it works.
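The graph described above computes softmax(x·W + b). As a rough NumPy sketch of the math — not the TensorFlow 1.x graph code itself, and with the weights simply zero-initialized — it looks like this:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((1, 784))   # one flattened 28x28 image
W = np.zeros((784, 10))    # one weight column per neuron (784 x 10, as in the tutorial)
b = np.zeros(10)           # one bias per neuron

logits = x @ W + b         # matrix multiply plus bias, like tf.matmul(x, W) + b
probs = np.exp(logits) / np.exp(logits).sum()   # softmax: probabilities over the 10 digits

print(probs.shape)  # (1, 10)
```

With zero weights every digit gets equal probability (0.1 each); training is what moves W and b so that the correct digit's probability dominates.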
This is our cloud environment; you can install TensorFlow on your local machine as well. I am showing this demo on our existing cloud, but there is a separate video on how to set up your TensorFlow environment if you want to install locally, or you can use any cloud service such as Google Cloud, Amazon, or CloudLabs to run and try the code. Okay, it has started, and we will log in. This is our deep learning tutorial code in our TensorFlow environment, so let's get started. We have already seen a bit of a code walkthrough in the slides; now you will see the actual code in action. The first thing we do is import TensorFlow, and then we import the data with one-hot encoding set to True, as I explained earlier, so the label values are represented appropriately. If we check the type of the data, you can see it is a Python Datasets object, and if we look at the images, this is how they appear: an array of type float32. If you want the counts, there are 55,000 training images, 10,000 test images, and 5,000 validation images. Now let's take a quick look at the data itself through visualization; we will use matplotlib for this. If we look at the shape (shape gives us the dimensions of the tensors, or arrays, if you will), the training dataset is 55,000 by 784. Remember, 784 is nothing but 28 × 28. Now we can take just one image and look at it.
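The reshape step the video performs before calling imshow can be sketched as follows. The 784-value vector must be folded back into a 28 × 28 grid before matplotlib can draw it; the plt.imshow call is commented out here so the sketch runs without a display:

```python
import numpy as np

flat = np.zeros(784)          # stand-in for one MNIST image vector
img = flat.reshape(28, 28)    # fold 784 pixels back into a 28x28 grid
print(img.shape)              # (28, 28)

# import matplotlib.pyplot as plt
# plt.imshow(img, cmap='gray')   # cmap selects grayscale rendering
# plt.show()
```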
Looking at the first image and its shape: again, its size is just 784. Similarly, you can look at the data of the first image itself. A large part of it will be zeros because, as you can imagine, only certain areas of the image contain writing and the rest is blank, so you will mostly see zeros; the nonzero values are scaled to lie between zero and one. In certain locations there are values and in others there are zeros; that is how the data is stored and loaded. If you actually want to view the handwritten image, this is how: you reshape the values, and matplotlib has a function called imshow to display images; if you pass the parameters appropriately you will see the different images. I can change the index here to choose which image we are looking at: index 5,000 shows a three, index 5 shows an eight, and index 50 again shows an eight. By the way, if you are wondering how I am executing this code: in case you are not familiar with Jupyter notebooks, Shift+Enter executes each cell individually, and if you want to execute the entire program you can go to the menu and say Run All. Here again we can check the maximum and minimum of the pixel values; as I mentioned, they are scaled, so the values lie between zero and one. Now this is where we create our model: the first thing is to create the required placeholders and variables, as we have seen in the slides.
We create one placeholder, and we create two variables for the weights and biases. The weights variable is a matrix of 784 × 10 values: the 10 is for the neurons, one column per neuron, and the 784 is for the pixel inputs, which is 28 × 28. The biases, as I mentioned, are one per neuron, so there are 10 of them, stored in a variable named b. Then this is the graph, which is basically the matrix multiplication of x and W with the bias added for each neuron, and the whole idea is to minimize the error. Let me just execute this code. Then we define y_true, which holds the label values; this is another placeholder, so we have x as one placeholder and y_true as a second placeholder, and it will hold values in the form of 10-digit arrays. Since we asked for one-hot encoding, the position that holds a one indicates the label for that particular number. Then we have cross-entropy, which is nothing but the loss function, and we have the optimizer; we have chosen gradient descent. Then the training process itself, which is nothing but minimizing the cross-entropy, again nothing but the loss function. We define all of this in the form of a graph. Up to here, remember, we have not actually executed any TensorFlow code; we are just preparing the graph, the execution plan. That is how TensorFlow code works, and the whole structure and format of this code is completely different from how we normally program, so even people with programming experience may find it a little difficult to understand at first, and it needs quite a bit of practice. You may want to view this video a couple of times to understand the flow, because the way TensorFlow programming is done is slightly different from normal programming.
Some of you who have done, say, Spark programming to some extent will understand this more easily, but even in Spark the code itself is pretty straightforward and only the execution behind the scenes happens differently; in TensorFlow even the code has to be written in a completely different way, and it does not get executed in the order you wrote it. That is something you need to understand, and a little practice is needed. So far, up to here, we have been creating and setting up the variables, defining what kind of network we want to use (softmax, for example), loading the data, viewing the data, and preparing everything, but we have not yet executed anything in TensorFlow. The next step is execution. The first step for doing any execution in TensorFlow is to initialize the variables: any time you have variables defined in your code, you have to run this piece of code, always. You basically create what is known as a node for initialization; you are still not executing anything here, you have just created the initialization node. So let us go ahead and create that. From here onwards is where you actually execute your code in TensorFlow, and to execute the code you need a TensorFlow session. tf.Session() gives you a session, and there are a couple of different ways of using it, but one of the most common is what is known as a with block: you write with tf.
Session() as sess, with a colon here starting the block; the indentation tells how far the block goes, and the session is valid until the block finishes executing. That is the purpose of the with block. Inside it you say sess.run(init). Now sess.run executes the node specified: sess is an instance of the session (we called tf.Session(), so an instance gets created and we name it sess), and then we run one of the nodes in the graph. One of those nodes is init, so we say run that particular node, and that is when the initialization of the variables happens. If you have any variables in your code (in our case W and b are variables), you have to run this initialization, otherwise you will get an error. That is what this is doing. Then, within this with block, we write a for loop saying we want the system to iterate for a thousand steps and perform the training; that is what this for loop does, it runs training for a thousand iterations. What it is basically doing is fetching the data, these images. Remember there are about 55,000 images, but it cannot fetch all of them in one shot because that would take up a lot of memory and cause performance issues, so this is a very common way of performing deep learning training: you always train in batches. We may have 55,000 images, but we process them in batches of 100 or maybe 500, depending on the size of your system. In this case we are saying get me 100 images at a time, and only the training images; remember we use only the training data for training and the test data for testing. You must be familiar
with machine learning, so you are probably aware of this, but in case you are not: in machine learning in general, not just deep learning, you have what is known as a training dataset and a test dataset. You typically split your available data into two parts, use the training set for training, and then, to see how well the model has been trained, use the test set to check the validity or accuracy of the model. That is what we are doing here. Observe that we are calling an MNIST helper function: mnist.train.next_batch. This is the advantage of using the MNIST dataset; it provides some very nice helper functions that are readily available. Otherwise we would have had to write our own code to fetch the data in batches, which is itself a lengthy exercise, so we can avoid all that by using the MNIST dataset, and that is why we use it for the initial learning phase. When we fetch, it loads the images into x and the labels into y, and then you use this batch of 100 images to run the training step with sess.run.
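A minimal sketch of what a next_batch helper does: the real mnist.train.next_batch also shuffles and tracks epochs, so this simplified, hypothetical BatchFeeder that just slices sequentially is only meant to show the batching idea:

```python
import numpy as np

class BatchFeeder:
    """Simplified, hypothetical stand-in for mnist.train.next_batch."""
    def __init__(self, images, labels):
        self.images, self.labels = images, labels
        self.pos = 0  # cursor into the dataset

    def next_batch(self, size):
        start = self.pos
        self.pos = (self.pos + size) % len(self.images)  # wrap around at the end
        return self.images[start:start + size], self.labels[start:start + size]

# toy data: 1000 fake "images" of 784 pixels each, with 1000 one-hot labels
feeder = BatchFeeder(np.zeros((1000, 784)), np.zeros((1000, 10)))
batch_x, batch_y = feeder.next_batch(100)
print(batch_x.shape, batch_y.shape)  # (100, 784) (100, 10)
```

Each call hands the training loop the next 100 images and labels, so memory use stays bounded no matter how large the dataset is.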
Running the training step basically passes the images through the neural network and finds the output; initially the output will of course be wrong, so all that feedback is given back to the network, and thereby the Ws and bs get updated, until it reaches a thousand iterations. In this case the exit criterion is one thousand steps, but you could also specify, say, an accuracy threshold as the exit criterion. The feedback given to each neuron says, in effect: this particular image was wrongly predicted, so update your weights and biases. That runs for a thousand iterations, and typically by the end of them the model will have learned to recognize these handwritten digits, though obviously not with 100% accuracy. Once that is done, you test the accuracy of the model using the test dataset, and that is what we are doing here. The code may appear a little complicated if you are seeing it for the first time, because you need to understand the various TensorFlow methods, but it is basically comparing the predicted output with the actual label, that is all. You have your test data, you find the actual value and the predicted value, check whether they are equal (tf.equal), count how many are correct, and from that the accuracy is calculated. That is what we want to see: how accurate the model is at predicting these digits. So let us run this; the whole thing is in one cell, so we run it in one shot, and it may take a little while. Not bad, it has finished the thousand iterations.
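The evaluation step compares the argmax of the predictions against the argmax of the one-hot labels; in NumPy the same comparison looks like this (tf.equal plus reduce_mean does the equivalent inside the graph), here on tiny made-up data:

```python
import numpy as np

# toy predicted scores for 4 test samples, and their one-hot labels
logits = np.array([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]])
labels = np.array([[0, 1],     [1, 0],     [1, 0],     [1, 0]])

# like tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
correct = np.argmax(logits, axis=1) == np.argmax(labels, axis=1)
accuracy = correct.mean()   # like tf.reduce_mean over the cast comparison
print(accuracy)  # 0.75: three of the four predictions match their labels
```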
The output we see is the accuracy: around 91%, which is pretty good for such a short exercise. Within such a short time we got about 90% accuracy; however, in real life this is probably not sufficient. There are other ways to increase the accuracy, and we will see in some later tutorials how to improve it by changing hyperparameters such as the number of neurons or the number of layers, so that the accuracy can be pushed beyond 90%. Hello and welcome to the TensorFlow Object Detection API tutorial. In this video I will walk you through the TensorFlow code to perform object detection in a video, so let's get started. This first part imports all the libraries we need: NumPy, imageio, datetime, PIL, and of course matplotlib, among others. Then there are a bunch of variables holding paths for the files and folders; this is routine, so let's keep moving. Then we import matplotlib and make it inline, plus a few more imports. These are some warnings we can just ignore; if I run this code once again they will go away. From here onwards we do the model preparation. We are going to use an existing neural network model rather than train a new one, because training would take a long time and needs a lot of computational resources, and it is really not required: there are already trained models available. In this case we use SSD with MobileNet, a model trained to detect objects and readily available as open source, so we can use it directly. If you want to use other models, there are a few more available; you can click on this link, and let
me just take you there. There are a few more models, but we have chosen this particular one because it is faster; it may not be the most accurate, but it is one of the faster models. On this link you will see a lot of other readily available trained models; some take a little longer but may be more accurate, so you can play around with those. So we will be using that model, and this line of code imports it; this is also known as a frozen model, that is the term we use. We download and import it, and then we actually use the model in our code. In these two cells we have downloaded and imported the model, and once it is available locally we load it into memory in our program. You also need to perform a couple of additional steps, basically mapping numbers to text. As you may be aware, when we build a model and run predictions, the model does not output text; the output is usually a number, so we need to map that number to a label. For example, if the network predicts 5 and we know 5 means airplane, things like that; this mapping is done in the next cell. Let's keep moving. Then we have some helper code which loads the images and transforms them into NumPy arrays; this was also used for object detection in still images. We are going to reuse it because a video is nothing but a sequence of frames, which in turn are images, so we pretty much reuse the same code we used for object detection in an image. This is where the actual detection starts: this is the path where the
images are stored, once again reusing the code we wrote for detecting objects in an image, and this is the file extension; this was done for two or three images, and we will continue to use it. I'll skip this section and go down: this is the cell where we actually load the video, convert it into frames, and then detect the objects frame by frame. In this code, a few lines do the following: once an object is found, a box is drawn around it. The input video file is named traffic, with the extension .mp4, and we have a video reader object which is part of the imageio library; we can read and write videos using it, and the video we are going to use is traffic.mp4. You can use any MP4 file, but in my case I picked a video that has cars, so let me just show you: in this object detection folder I have this MP4 file, and I'll quickly play it; it's a little slow. Okay, here we go, this is the video, a relatively short one for this particular demo. What our code will do is detect each of these cars and annotate them as cars; in this particular video we only have cars. Later on we can try another video, I think I have a cat one, but let's first check with this traffic video. Let me go back. We will be reading the video file and then analyzing it frame by frame, at 10 frames per second, which is the rate we specify here, analyzing, annotating, and writing it back. So you will see a video file named something like traffic_annotated, and we will see the annotated video. Let's run through this piece of code, and then we will come back and see the annotated
video. This might take a little while, so I will pause the video after running this particular cell and come back to show you the results. All right, let's go ahead and run it. It is running now. It is also important that at the end you close the video writer; it is similar to a file pointer: when you open a file you should make sure you close it so it doesn't hog resources. So the last line of code should be video_writer.close(). I'll pause and come back, see you in a little bit. All right, as you can see, the processing is done; the hourglass has disappeared, which means the video has been processed. Let's go back and check the annotated video in the file manager: this was the original traffic.mp4, and now we have the annotated MP4 here, so let's run it and see how it looks. You see each of these cars getting detected. Let me pause and show you: it says car, 70%. Let's allow it to go a little further; it detects something on top. What is that? A truck. Okay, I think because of the board on top it somehow thinks there is a truck. Let's play some more and see if it detects anything else. This again looks like a car; yes, it is a car with a confidence level of 69%. This is again a car. So basically it goes to the end and detects each and every car that passes by. Now we can quickly repeat this process for another video. Let me show you the other video, which is a cat; the cat is not really moving much, just standing there staring and moving a little slowly. Our network will detect that this is a cat, and even when the cat turns a little in the other direction it will continue to detect and show that it is a cat. Okay, so this is how the original video looks; let's go ahead and change
our code to analyze this one and see if our network detects the cat. Close this, and I'll go back to my code. All we need to do is change this traffic to cat; the extension will automatically be picked up because it is given here, and then it will run through. Very quickly, once again, what it is doing: video_reader has a neat little interface whereby you can say for frame in video_reader, so it provides the frames one by one in a loop, and you take each frame and analyze it as if it were an individual image. That is the way it works, so it is very easy to handle. Now let's run just this cell again; the rest of the code remains the same. It will take a little while, the hourglass is back, so I will pause and come back. All right, the processing is done; let's go and check the annotated video. We have the annotated cat MP4, so let's play it. You can see it is detecting the cat, and in the beginning you also saw it detected something else here; it looks like it found one more object, so let's go back and see what it detected. What is it trying to show here? It's too small to see, but it is trying to detect something; I think it is saying it is a car, I don't know. Okay, so in this video there is pretty much only one object, which is the cat. Let's wait a bit and see if it continues to detect it when the cat turns around and moves; in a little bit that's going to happen, and there we go: in spite of turning the other way, our network is still able to detect that it is a cat. Let me freeze and show: it actually still continues to detect it as a cat. All right, I think that's the only object it detects in this particular video.
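The frame-by-frame pattern described above can be sketched without the actual imageio reader (which would require the video file); here a plain list of NumPy arrays stands in for video_reader, and annotate_frame is a hypothetical placeholder for the detection-and-drawing step:

```python
import numpy as np

def annotate_frame(frame):
    """Hypothetical stand-in for running detection and drawing boxes on a frame."""
    return frame  # the real code would draw bounding boxes here

# stand-in for: video_reader = imageio.get_reader('traffic.mp4')
video_reader = [np.zeros((240, 320, 3), dtype=np.uint8) for _ in range(5)]

annotated = []
for frame in video_reader:              # imageio readers support exactly this loop
    annotated.append(annotate_frame(frame))
# the real code would write each result with the video writer, then close it

print(len(annotated))  # 5 frames processed
```

The key point is that the video is never treated as a special data type: the loop yields ordinary image arrays, so the image-detection code can be reused unchanged.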
Okay, so close this; that's pretty much it. Thank you very much for watching this video, and have a great day. In case you have any questions, please put them below the video here and we will be more than happy to get back to you; make sure you include your email ID so we can contact you. Thank you once again, bye-bye. Today we're going to be covering the convolutional neural network tutorial. Do you know how deep learning recognizes the objects in an image? This particular neural network is central to how image recognition works; it is one of the biggest building blocks for image recognition, and it does it using a convolutional neural network. Here we have the basic picture of a hummingbird: the pixels of the image are fed as input through your input layer, then you have all your hidden layers, and then your output layer, where one of the outputs is going to light up and say, oh, it's a bird. We're going to go into depth, and we'll go back and forth on this a number of times today, so if you're not catching all the imagery, don't worry, we'll get into the details. Our input layer accepts the pixels of the image as input in the form of arrays, and you can see up here where they've labeled each block of the bird in different arrays; we'll dive into what that looks like and how those matrices are set up. Your hidden layers carry out feature extraction by performing certain calculations and manipulations; this is the part that reorganizes that picture multiple ways until we get data that's easy for the neural network to read. This layer uses a matrix filter and performs the convolution operation to detect patterns in the image; remember that convolution means to coil or to twist, so we're going to twist the data around, alter it, and use that operation to
detect a new pattern. There are multiple hidden layers: the convolution layer; ReLU (which is how it is pronounced), the rectified linear unit, which has to do with the activation function used; and the pooling layer, which also uses multiple filters to detect edges, corners, eyes, feathers, beak, etc. Just like the term says, pooling is pulling information together, and we'll look at that a lot closer, so if it's a little confusing now, we'll dig in and get you squared away. Finally there is a fully connected layer that identifies the object in the image: the different hidden layers feed into the final stage, where we have, say, one node that lights up and says it's a bird. What's in it for you: we'll cover an introduction to the CNN, what a convolutional neural network is, how a CNN recognizes images (digging deeper into the individual layers), and finally a use-case implementation using a CNN. We'll begin by introducing a pioneer of convolutional neural networks, Yann LeCun. He was the director of Facebook's AI research group and built the first convolutional neural network, called LeNet, in 1988, so these have been around for a while and have had a chance to mature. It was used for character recognition tasks like reading zip code digits; imagine processing mail and automating that process. A CNN is a feed-forward neural network that is generally used to analyze visual images by processing data with a grid-like topology; a CNN is also known as a ConvNet. Very key here is that we are looking at images, which is what this was designed for, and you'll see the different layers as we dig in. Since we're using TensorFlow and Keras in our code later on, you'll see some of these same layers there.
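The pooling layer mentioned above can be illustrated with a tiny NumPy max-pool. This is a sketch of the idea, not the Keras layer: 2 × 2 max pooling keeps the largest value in each non-overlapping window, shrinking a 4 × 4 feature map to 2 × 2:

```python
import numpy as np

feature_map = np.array([[1, 3, 2, 4],
                        [5, 6, 1, 2],
                        [7, 2, 9, 1],
                        [3, 4, 1, 8]])

# 2x2 max pooling with stride 2: take the max of each non-overlapping 2x2 block
pooled = feature_map.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)
# [[6 4]
#  [7 9]]
```

Keeping only the strongest response in each window makes the representation smaller and somewhat tolerant of small shifts in the image.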
These layers appear in a lot of other neural network frameworks as well, but here they are central to processing images, in a way that captures multiple features and really drills down into them. In this example you see flowers of two varieties, an orchid and a rose. I think the orchid is the more dainty and beautiful, and the rose smells quite lovely; I have a couple of rose bushes in my yard. They go into the input layer; that data is sent to all the different nodes in the next layer, one of the hidden layers. Based on the different weights and the setup, new values come out; those values are then multiplied by their weights and go to the next hidden layer, and so on, and then you have the output layer, where one node says it's an orchid and the other says it's a rose, depending on how well it was trained. What separates the CNN from other neural networks is the convolution operation, which forms the basis of any convolutional neural network. In a CNN every image is represented in the form of arrays of pixel values. Here we have a real image of the digit 8, which gets converted to its pixel values in the form of an array, in this case a two-dimensional array, and in the final form we transform the digit 8 into its representation as pixels of zeros and ones, where the ones represent the black part of the eight and the zeros represent the white background. To understand how the convolution operation works, we're going to take a side step and look at matrices. We'll simplify: we'll take two matrices, A and B, of one dimension. Set the image aside for now and focus just on the matrix aspect; then we'll bring it back together and see what it looks like when we put
the pieces together for the convolution operation. Here we've set up two arrays, each a single-dimension matrix: A = [5, 3, 2, 5, 9, 7] and B = [1, 2, 3]. The convolution starts by multiplying the arrays element-wise, and for the first window we get 5, 6, 6, where 5 is 5 * 1, 6 is 3 * 2, and the other 6 is 2 * 3. Since the two arrays aren't the same size, we truncate: we multiply B by just the first three elements of A. That can be a little confusing, but remember, a computer gets to repeat this process hundreds of times, so we're not just forgetting those other numbers; we'll bring them back in later. Then we take the sum of the products, 5 + 6 + 6 = 17, so the very first digit in our A * B matrix is 17. And as I said, we don't forget the other digits: we move one position over to the next window, [3, 2, 5], multiply it by B (3 * 1 is 3, 2 * 2 is 4, and so on), and sum it up, and now we have the second digit of our A * B product. We continue the same way through the remaining windows until the short matrix B has covered every stretch of A that matches its three elements. In a little bit we'll cover where we actually use this matrix multiplication, but it's important to understand that we're going through the larger matrix and multiplying its parts against the smaller matrix. A lot of people get lost here: what's going on with these matrices, oh, scary math. It's really not that scary when you break it down: we're just looking at a section of A and comparing it to B.
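Here's that sliding multiply-and-sum written out in a few lines of NumPy. The exact digits of A are hard to make out in the recording, so this sketch assumes A = [5, 3, 2, 5, 9, 7], which matches the first window's sum of 17:

```python
import numpy as np

# The sliding multiply-and-sum described above, in NumPy.
# A's digits are a best guess from the lecture; the first window
# [5, 3, 2] against B = [1, 2, 3] gives 5*1 + 3*2 + 2*3 = 17.
A = np.array([5, 3, 2, 5, 9, 7])
B = np.array([1, 2, 3])

def slide_dot(a, b):
    """Slide b across a, multiplying element-wise and summing each window."""
    n = len(b)
    return np.array([np.dot(a[i:i + n], b) for i in range(len(a) - n + 1)])

print(slide_dot(A, B))                    # first element is 17
print(np.correlate(A, B, mode="valid"))   # NumPy's built-in gives the same result
```

NumPy calls this operation a "valid" cross-correlation, which is exactly the sliding window with no flipping of B.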
When you break it down in your mind like that, you realize, okay, I'm just taking these two matrices, comparing them, and bringing the value down into one matrix, A * B; we're reducing that information in a way that will help the computer see different aspects. Let's flip back over to our images. Going to the most basic two-dimensional image you can get, consider the following two images: the image for the backslash symbol, which is what gets processed when you press the backslash key, and the image for the forward slash, which is simply its mirror. Very basic, four pixels going in; it can't get any more basic than that. Here's a slightly more complicated picture: we take a real image of a smiley face and represent it in black and white pixels, and just as before we convert it into zeros and ones. Where the previous example was a matrix of just four dots, now we have a significantly larger image coming in; don't worry, we'll bring this all together in just a little bit. Now, the layers in a convolutional neural network. We have our convolution layer, which really is the central aspect of processing images in the convolutional neural network; that's why it's there. It feeds into the ReLU layer, the rectified linear unit, which we'll talk about a little later; the ReLU is the math behind how that layer is activated, what makes the neurons fire. You'll see ReLU in a lot of other neural networks; on its own it's suited to processing smaller amounts of data, where you'd reach for other setups for large data coming in. Because we're processing small amounts of
data in each image, the ReLU layer works great. Then you have your pooling layer, where you're pulling the data together; pooling is the common neural network term, though I like to use the term reduce, so if you're coming from the map-and-reduce side, you'll see that we're mapping all this data through these networks and then reducing it, pulling it together. And finally we have the fully connected layer, where our output comes out. So far we've started to look at matrices, at the convolution layer and where it fits in, and at images; now we're going to focus more on the convolution layer, since this is a convolutional neural network. A convolution layer has a number of filters and performs the convolution operation. Every image is considered as a matrix of pixel values; consider the following 5x5 image whose pixel values are only zero and one. Obviously when we're dealing with color there's a lot more involved in the processing, but we want to keep it simple and stay black and white. So we have our image pixels, and we slide the filter matrix over the image, computing the dot product to detect patterns. Right here you're going to ask: where does this filter come from? This is a bit confusing, because the filter is derived later on; we build the filters when we program and train our model, so you don't need to worry about what the filter actually is. What you do need to understand about how a convolution layer works is what the filter is doing, and you'll have many filters, not just one, each looking for different aspects: one filter might look for edges, another for other parts. We'll cover that in more detail in a minute; right now we're just focusing on how the filter works as a matrix. Remember earlier we talked about multiplying
matrices together, and here we have our two-dimensional matrix. You can see we take the filter and multiply it against the upper-left corner of the image, 1 * 1, 1 * 0, 1 * 1, and so on; we multiply those all element-wise, then sum them, and we end up with a convolved feature of four. We keep sliding the filter matrix over the image and computing the dot product to detect patterns: compute the first value, slide over one notch, compute the second, and so on, all the way through, until we have a new matrix. This matrix, sized according to the filter, has reduced the image, and whatever the filter is filtering for, the result looks at just those features, reduced down to a smaller matrix. Once the feature maps are extracted, the next step is to move them to the ReLU layer. The ReLU layer first performs an element-wise operation on each of the maps coming in: it sets any negative pixels to zero. You can see this in the graph, where the negatives are zeroed out and the output goes from zero up to whatever value comes out of the matrix. This introduces nonlinearity to the network. Up until now, when we say linearity, we mean that a feature has a continuous value; say the feature is the edge of a beak, or the backslash we saw, with a value from -10 to 10. A value of one might mean, yeah, this might be a beak, it might be an edge right there; a minus five means no, we're not even going to look at it, zero it out. So we end up with an output that takes all these filtered features; remember, we're not running just one filter on this image, we're running a number of filters, and so we end up with a rectified feature map.
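That sliding filter is easy to sketch in NumPy. The 5x5 image and 3x3 filter values below are assumed, since the slide itself isn't reproduced here; they are the commonly used teaching values, and they do produce the convolved feature of four in the first position:

```python
import numpy as np

# A minimal sketch of the convolution layer's sliding dot product.
image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]])

kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])

def convolve2d(img, k):
    """Slide the filter over the image and sum the element-wise products."""
    kh, kw = k.shape
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1), dtype=img.dtype)
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(img[r:r + kh, c:c + kw] * k)
    return out

feature_map = convolve2d(image, kernel)
print(feature_map)        # a 3x3 convolved feature map
print(feature_map[0, 0])  # 4, the first convolved feature
```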
That rectified feature map looks at just the features coming through and how they weigh in from our filters. Here we have an input, what looks like a toucan, a very exotic-looking bird; the real image is scanned through multiple convolution and ReLU layers to locate features. You can see it's been turned into a black-and-white image, and in this case we're looking in the upper right-hand corner for a feature. That box scans over the image, and often it doesn't scan one pixel at a time; a lot of times it will skip by two or three or four pixels to speed up the process, which is one way to compensate if you don't have enough computational resources for large images. And it's not just one filter slowly going across the image; multiple filters have been programmed in, so a lot of different filters slide over the different aspects of the image, each forming a new matrix. One more aspect to note about the ReLU layer: we don't just have one ReLU result coming out. Not only do we have multiple features going through, we're generating multiple ReLU layers for locating the features, which is very important to note; we have quite a bundle, multiple filters and multiple ReLUs, and that brings us to the next step of forward propagation, the pooling layer. The rectified feature map now goes through a pooling layer. Pooling is a down-sampling operation that reduces the dimensionality of the feature map, and that's all we're trying to do: take a huge amount of information and reduce it down to a single answer, this is a specific kind of bird, this is an iris, this is a rose. So you have a rectified feature map coming in, and we set max pooling with a 2x2 filter and a stride of two. If you remember, I talked about not going one pixel at a time; well, that's where the stride comes
in. We end up with a pooled feature map, but instead of moving over one position each time and looking at every possible combination, we go by two, skipping every other pixel, and this produces our pooled feature map, which as you can see takes us from 16x16 down to 4x4. We're continually trying to filter and reduce our data until we get to something we can manage. Over here you see we have the max values 3, 4, 1, and 2; in max pooling we're looking for the max value, a little different from what we were doing before. Coming from the rectified feature map, we find the max value in each window and pool those features together. So instead of thinking of this as a map of the image, think of it as how valuable a feature is in that area, how much feature value we have; we just want the best, the maximum, feature for each area. One piece of the beak filter might say, I see a one in this part of the image, then it skips over and says, I see a three here, and this one rates a four. We don't want to sum them together, because you might have five ones summing to five, while somewhere else four zeros and one ten, and that ten says this is definitely a beak, where the ones say probably not a beak. A slightly strange analogy, since we're looking at a bird, but you can see how the pooled feature map comes together: we're just looking for the max value in each of those windows. The pooling layer uses the different filters to identify different parts of the image, like edges, corners, body, feathers, eyes, beak, et cetera; I know I focused mainly on the beak, but each filter could target a different part of the bird. So let's take a look at where we're at, the structure of a convolutional neural network so far. We have our input image coming in, and then we use our filters.
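Here's a small sketch of that max-pooling step with a 2x2 window and a stride of two; the feature-map values below are made up for illustration:

```python
import numpy as np

def max_pool(fmap, size=2, stride=2):
    """Down-sample by taking the max of each size x size window,
    moving by `stride` (skipping pixels, as described above)."""
    out_h = (fmap.shape[0] - size) // stride + 1
    out_w = (fmap.shape[1] - size) // stride + 1
    out = np.zeros((out_h, out_w), dtype=fmap.dtype)
    for r in range(out_h):
        for c in range(out_w):
            window = fmap[r * stride:r * stride + size,
                          c * stride:c * stride + size]
            out[r, c] = window.max()   # keep only the strongest feature value
    return out

# Illustrative 4x4 rectified feature map (values assumed for the sketch).
rectified = np.array([[1, 1, 2, 4],
                      [5, 6, 7, 8],
                      [3, 2, 1, 0],
                      [1, 2, 3, 4]])
print(max_pool(rectified))   # each 2x2 window collapses to its max
```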
There are multiple filters in there, being developed to twist and change that data. We multiply the matrices: we take that little filter, maybe a 2x2, and multiply it by each piece of the image, and if we step by two, then it's every other piece of the image. That generates multiple convolution layers, so we have a number of convolution layers looking at that data. We then run those convolution layers through the ReLU setup, and once we've been through the ReLU setup, with multiple ReLU layers going, we take those layers and pool them, so now we have the pooling layers, multiple poolings going on. Up until this point we can be dealing with multiple dimensions; some unusual data setups that aren't images can have four, five, six, seven dimensions, but right now we have 2D image dimensions coming into the pooling layer. The next step is to reduce those dimensions, to flatten them. Flattening is the process of converting all of the resultant two-dimensional arrays from the pooled feature maps into a single long continuous linear vector. Over here you see a pooled feature map, maybe the bird's wing, with the values 6, 8, 4, 7, and we just flatten it out into 6, 8, 4, 7, a single linear vector; and it's not just each pooled feature map on its own, we put all of them into one long linear vector. So now we've gone through the convolutional part of the network and we have the input for the next stage: all we've done is taken all those pooling layers, flattened them out, and combined them into a single linear vector going in. After the flattening, let's do a quick recap, because we've covered so much that it's important to go back and take a look at each of the steps.
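The flattening step is a one-liner in NumPy; the 6, 8, 4, 7 values come from the example above, and the second map's values are made up for the sketch:

```python
import numpy as np

# Flattening: each 2x2 pooled feature map becomes a short run of numbers,
# and all the maps are joined into one long linear vector.
pooled_maps = [np.array([[6, 8],
                         [4, 7]]),
               np.array([[3, 1],      # a second map, values invented
                         [2, 5]])]

flat = np.concatenate([m.ravel() for m in pooled_maps])
print(flat)   # the single linear vector fed to the fully connected layer
```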
The structure of the network so far: we have our convolution, where we filter the image and multiply the matrices, and we end up with our convolution layers, which use the ReLU to figure out the values going into the pooling; you have numerous convolution layers that then create numerous pooling layers, pooling that data together by taking the max value, the best value, the one we want to send forward. Then we take all of that from each of the pooling layers, flatten it, and combine it into a single input going into the final layer. Once you get to that step, you might look at it and think, boy, that looks like the normal input to most neural networks, and you're correct, it is. Once we have the flattened matrix from the pooling layer, that becomes our input: the pooling layer is fed as an input to the fully connected layer to classify the image. You can see as our flattened matrix comes in, the pixels from the flattened matrix are fed as an input, back to our toucan, or whatever kind of bird that is (I need one of these networks just to identify what kind of bird it is). It comes into our forward propagation network, which has the different weights coming down across it, and finally it selects that it's a bird, and not a dog or a cat. Even though it's not labeled on the slide, the final layer in red is our output layer, the final output layer that says bird, cat, or dog. So, a quick recap of everything we've covered: we have our input image, the filters are multiplied against it, the two matrices multiplied for all the filters, to create our convolution layers; there are multiple layers in there because it's building multiple layers off the different filters. That then goes through the ReLU as the activation, and that creates our pooling. And once we get into the pooling
layer, the pooling looks for the best, the max value coming in from our convolution. Then we take that layer and flatten it, and it goes into a fully connected layer, our fully connected neural network, and then to the output. Here we can see the entire process of how the CNN recognizes a bird, which is kind of nice because it shows the little pixels and where they're going: you can see the filter generating the convolution network, and that filter shows up in the bottom part of the convolution network; based on that it uses the ReLU, then the pooling, the pooling finds which value is best, and so on, all the way to the fully connected layer at the end, the classification and the output layer. So it's a classification neural network at the end. We've covered a lot of theory up to now, and you can imagine each one of these steps has to be broken down in code. Putting that together can be a little complicated, not because each step of the process is overly complicated, but because we have so many steps, five different stages with substeps inside them; we're going to break that down and walk through it in code. In our use-case implementation of the CNN, we'll be using the CIFAR-10 dataset from the Canadian Institute for Advanced Research for classifying images across 10 categories. Unfortunately it won't tell me whether something is a toucan or some other kind of bird, but we do get to find out whether it can categorize between a ship, frog, deer, bird, airplane, automobile, cat, dog, horse, and truck. That's a lot of fun, and if you're following the news about automated cars at all, you can see where this kind of processing is so important in today's world, and cutting edge as far as what's coming out in commercial deployment; this is really cool stuff, we're starting to see it just about everywhere in industry, so it's a great time to be playing with this and figuring it all out.
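The whole pipeline just recapped can be sketched end to end in plain NumPy. All the weights and the input below are random placeholders, since a real network would learn the filters during training; this only shows the data flowing through the stages:

```python
import numpy as np

# Toy forward pass matching the recapped structure:
# convolution -> ReLU -> max pooling -> flatten -> fully connected layer.
rng = np.random.default_rng(0)

def convolve2d(img, k):
    kh, kw = k.shape
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(img[r:r + kh, c:c + kw] * k)
    return out

def max_pool(fmap, size=2, stride=2):
    out = np.zeros(((fmap.shape[0] - size) // stride + 1,
                    (fmap.shape[1] - size) // stride + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = fmap[r * stride:r * stride + size,
                             c * stride:c * stride + size].max()
    return out

image = rng.random((8, 8))                                 # stand-in input image
filters = [rng.standard_normal((3, 3)) for _ in range(4)]  # 4 "learned" filters

# Convolution + ReLU + pooling per filter, then flatten all the maps into one vector.
pooled = [max_pool(np.maximum(convolve2d(image, f), 0)) for f in filters]
flat = np.concatenate([p.ravel() for p in pooled])         # 4 maps of 3x3 -> 36 values

# Fully connected layer: random weights map the flat vector onto 3 class scores.
W = rng.standard_normal((flat.size, 3))
scores = flat @ W
print(flat.shape, scores.shape)   # (36,) (3,)
```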
Let's dive into the code and see what it looks like when we're actually writing our script. Before we go on, one more quick look at what we have here: let's look at the keys of data batch one, and remember, in a Jupyter notebook I can get by without a print statement; if I put a variable on the last line, it just displays the variable. Under data batch one, since this is a dictionary, you can see the keys: the batch label, labels, data, and filenames, so you can actually see how our dataset is broken up. For the next step, step four as we're calling it, we want to display the image using the matplotlib library. There are many ways to display the images, and other ways to drill into the data, but matplotlib is really good for this, and we'll also look at our first reshape, or shaping of the data, so you get a little glimpse into what that means. We'll start by importing matplotlib, and of course, since I'm in a Jupyter notebook, I need the matplotlib inline command so it shows up on my page. So here we go, we're going to import matplotlib.
pyplot as plt, and if you remember, pyplot is like a canvas that we paint things onto. There's my percent-sign matplotlib inline magic so it shows up in my notebook, and then of course we import numpy as np for our Python number arrays. Let's set x equal to the data from data batch one, which pulls all the image data into the x value. Because this is just one long stream of binary data, we need to do a little reshaping. We have 10,000 images, okay, that looks correct, and here's an interesting thing that took me a little research to figure out: it's a 32x32 picture, and it's in color, so there are three channels of color. Let me sketch this on the drawing pad: 32 pixels by 32 pixels, times three for color. I don't know why the data is laid out this way; it probably has to do with how it was originally encoded, because most picture formats put the three channels last. So what we're doing here is taking the data, one long stream of information, and breaking it into 10,000 pieces; each of those pieces is broken into three pieces, and each of those is 32 by 32. You could look at it like an old-fashioned projector setup, with a red projector, a blue projector, and a green projector added together, each one a 32x32 grid; that's probably how it was originally formatted. Things have changed, though, so we're going to transpose it: we take the three, which was in the second position, and move it to the end. The first part reshapes the data from a single stream into 10,000 x 3 x 32 x 32, and then we transpose the color factor to the last place
so it's the image, then the 32x32 in the middle, and finally the three values of color at the end, which is more like how we process images now. Then there's astype, and it's really important that we use an 8-bit unsigned integer here. You'll see a lot of examples try to do this with a float or float64, but remember that a float uses a lot of memory; once you switch away from the 8-bit integer, which goes up to 255, the amount of RAM it takes to load is going to go way up. You can try the other types if you have a lot of RAM on your computer, but for this exercise uint8 works just fine. Let's run this, and now our x variable is loaded with all the images from data batch one. Just to show what we were talking about with astype, if we take x[0] and look for its max value and run that, you'll see (oops, I said 128 before, it's 255) that it doesn't go over 255, because we're basically keeping each value down to one byte, 0 to 255, versus a float value, which would bring the size up enormously. And since we're using matplotlib, we can take our canvas and just do plt.imshow, for image show, to look at what x[0] looks like. It comes in, and I'm not sure what that is, but you can see it's a very low-grade image, broken down to minimal pixels. If we do the same thing for image one, hopefully a little easier to see, let's hit run on that, and we can see this one is probably a semi truck, that's a good guess.
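Here's a sketch of that reshape-and-transpose, with random bytes standing in for the actual batch file; each CIFAR-10 row is 3,072 values, three channels of a 32x32 image, channels first:

```python
import numpy as np

# Random bytes stand in for the real data batch (10,000 rows of 3,072 values).
raw = np.random.randint(0, 256, size=(10000, 3072), dtype=np.uint8)

# Break the stream into 10,000 images of 3 x 32 x 32, then move the
# color axis to the end and keep everything as one byte per value.
x = raw.reshape(10000, 3, 32, 32).transpose(0, 2, 3, 1).astype("uint8")
print(x.shape)   # (10000, 32, 32, 3) -- image, height, width, color last
print(x.max())   # never above 255, since uint8 is one byte per value
```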
Instead of typing the same line in over and over, I can just go back up and change the index: image three looks like a dump truck unloading, and so on; you can look at any of the 10,000 images. Let's jump to 55, some kind of animal looking at us, probably a dog, and just for fun one more run, a nice car for image number four. So we can pass through all the different images very easily; they've been reshaped to fit our view and the format matplotlib uses. The next step is to start creating some helper functions. We'll start with a one-hot encoder to help us process the data; remember, your labels can't just be words, we have to convert them, and we use the one-hot encoder to do that. Then we'll create a CifarHelper class, with an init and a setup for the images, and finally we'll run that code so you can see what it looks like. Then we get to the fun part, where we actually start creating our model, our actual neural network model. Let's start by creating our one-hot encoder. We're going to write our own here; it returns an out array, with our vector coming in and vals equal to 10, meaning there are 10 possible labels. Remember, we don't treat the labels as numbers, because a car isn't one more than a horse; it would be bizarre to have horse equals zero, car equals one, plane equals two, cat equals three, so a cat plus a car equals what? Instead we create a numpy array of zeros with 10 values. So you have a zero or a one: a one means it's a cat, a zero means it's not a cat; in the next row a one might mean it's a car and a zero not a car. So instead of having one output with a
value of 0 to 10, you have 10 outputs each with a value of 0 or 1. That's what the one-hot encoder is doing, and we'll use it in code in just a minute. Now let's look at the next helpers. We have a few of these helper functions to build, and when you're working with a complicated Python project, dividing it up into separate definitions and classes is very important; otherwise it becomes really ungainly to work with. Our next helper is a class, and there's a lot in it, so we'll break it down; let's add a couple of blank spaces to make it a bit more readable. We create our class, the CifarHelper, and start by initializing it. In the init, self.i equals zero; that will come into play in a little bit, and we'll come back to it. Below that we initialize our training batches: when we went through the files there was a meta batch, which we don't need, but we do need data batches one through five, and we do not want the testing batch in this list. So self.all_train_batches is an array of all those image batches, and since we left the test batch out, we also have our self.test_batch.
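A minimal version of that one-hot encoder might look like this (the exact helper in the lecture may differ slightly):

```python
import numpy as np

# Each label index becomes a row of ten 0/1 values
# instead of a single 0-9 number.
def one_hot_encode(vec, vals=10):
    n = len(vec)
    out = np.zeros((n, vals))
    out[range(n), vec] = 1   # put a 1 in each row at the label's position
    return out

labels = [0, 3, 9]           # e.g. airplane, cat, truck in CIFAR-10's ordering
print(one_hot_encode(labels))
```

Row 0 gets a 1 in position 0, row 1 in position 3, row 2 in position 9; every other entry stays zero.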
Next we initialize the training images and the training labels, and also the test images and the test labels; this just initializes those variables. Then we create another definition, which sets up the images; let's see what's going on in there. We could have put all of this in the init, since it's all just setup, but breaking it up makes it easier to read, and it also makes it easier, when we start executing the different pieces, to see what's going on, because we get a nice print statement saying, hey, we're now running this step. In here we set self.training_images to a numpy vstack, loading the data for each d in self.all_train_batches, which points right back to the list above. We go through each of those five datasets (they're not files anymore; data batch one points to the actual data), and self.training_images stacks them all into one numpy array. It's always nice to get the training length too, which is just the total number of training images. Then we take self.training_images (let me switch marker colors, since I'm getting a little too much on the markers up here), and at this point this should look familiar. Where did we see it? When we wanted to look at the images in matplotlib above, we had to reshape them, and we're doing the same thing here: based on the training length, the total number of images, because we stacked everything into one large block, we're going to look at it as our
three video cameras, each displaying 32 by 32, and we're going to switch that
around so that each of our images stays in the same place, then we have our 32x32, and then our three different values for the color. And of course we divide by 255, which, as we saw earlier, brings all the data into the range 0 to 1; we're turning this into a 0-to-1 array of all the pictures, 32x32x3. Then we take self.training_labels and pump them through the one-hot encoder we just made, stacking them together; again, we're converting to a format where, instead of horse equals one and dog equals two (so horse plus dog would equal three, which would be cat, nonsense), each label is an array of 10 values that are each 0 or 1. Then we set up our test images and labels, and it's exactly what we just did with the training set (let me change colors right here): we stack the different images, we get their length so we know how many images are there (you could certainly count them by hand, but it's nice to let the computer do it, especially if the data changes on the other end), and again we reshape them, transpose them, and run the one-hot encoder, so our test images end up in the same format. So now we have a definition that sets up all our images, and the next step is to batch them, the next batch. Let's do another breakout for batches, because this is really important to understand; it used to throw me for a little loop. When I'm working with TensorFlow or Keras or a lot of these frameworks, we have our data coming in; if you remember, we had 10,000 photos, let me just put 10,000 down here.
We don't want to load all 10,000 at once, so we break it up into batch sizes, and you'll also remember the rest of the shape: the number of photos (the length of the set, whatever it is) by 32 by 32 by 3. When we set the batch size, we change the 10,000 to a batch of, in this case, 100, so we look at just the first 100 photos, and if you remember, we set self.i equal to zero. So here we create x, the next batch: since self.i is initialized to zero, we take x from 0 to batch size, which we set to 100, just the first 100 images, and then we reshape it, and this is important, to tell the data we're looking at 100 x 32 x 32 x 3. We've already formatted it to 32x32x3; this just sets everything up so x holds the data in the correct order and the correct shape. Then y, just like x, holds our labels, the training labels, and they also go from self.i to self.i plus batch size, because self.i is going to keep changing; finally we increment self.i, so the next time we call it, we get the next batch. So basically we have x and y, x being the photograph data coming in and y being the label, encoded of course through the one-hot encoder: if, say, horse equals zero, it would be a one in the zero position, since this is a horse, and everything else in the array would be zero (let me just put lines through there; there's our array, hard to see). So let's finish loading this class, and now, armed with all this setup, let's load it up: we're going to create a variable ch with the CifarHelper in it.
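The batching idea can be sketched like this; the class name and the wrap-around at the end of the data are my assumptions for the sketch, and random zeros stand in for the real images and one-hot labels:

```python
import numpy as np

# Keep a counter i, slice out the next batch of images and labels,
# and advance the counter so the following call gets the next batch.
class BatchFeeder:
    def __init__(self, images, labels):
        self.i = 0
        self.images = images
        self.labels = labels

    def next_batch(self, batch_size):
        x = self.images[self.i:self.i + batch_size].reshape(batch_size, 32, 32, 3)
        y = self.labels[self.i:self.i + batch_size]
        self.i = (self.i + batch_size) % len(self.images)  # wrap at the end (assumption)
        return x, y

# Zeros stand in for 1,000 training images and their one-hot labels.
feeder = BatchFeeder(np.zeros((1000, 32, 32, 3)), np.zeros((1000, 10)))
x, y = feeder.next_batch(100)
print(x.shape, y.shape, feeder.i)   # (100, 32, 32, 3) (100, 10) 100
```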
with the CifarHelper in it, and then we do ch.set_up_images(). Now, we could have put all the setup under the __init__, but breaking it up into two parts makes it much more readable, and if you're doing other work there are reasons to do that as far as the setup goes. Let's run that, and you can see where it says "setting up training images and labels," "setting up test images" — that's one of the reasons we broke it up: if you're testing this out, you can have print statements in there telling you what's going on, which is really nice. They did a good job with this setup; I like the way it was broken up in the back. One quick note: to set up the next batch, we have to run batch = ch.next_batch(100), because we're going to use the batch size of 100. We'll come back to that — just remember that's part of the code we'll be using in a minute from the definition we just made. So now we're ready to create our model. The first thing we want to do is import TensorFlow as tf; I'll go ahead and run that so it's loaded up. You can see we got a warning here — that's because they're making changes, it's always growing, and they're going to be deprecating one of the values from float64 to a float type treated as an np.float64. Nothing to really worry about, because it doesn't even affect what we're working on: we've scaled all our data from the 255 values down to 0-to-1, and do keep in mind that the 0-to-1 value we converted from 255 is still a float. It will easily work with either the numpy float64 or the numpy dtype float — it doesn't matter which one it goes through — so the deprecation would not affect our code as we have it. And in our TensorFlow — let me just increase the size in there for a moment so you can get a better view of what we're typing — we're
going to set a couple of placeholders here. We set x = tf.placeholder(tf.float32) — we just talked about float64 versus the numpy float; we're going to keep this at float32, more than enough decimal precision for what we're working with. And since it's a placeholder, we set the shape equal to [None, 32, 32, 3]: None because at this point we're just holding the place — it will be filled in as we run the batches — and 32x32x3 is what we reshaped our data to fit. Then we have y_true = tf.placeholder(tf.float32, shape=[None, 10]); 10 is the ten different labels we have, so it's an array of 10. Then let's create one more placeholder, which we'll call hold_prob, or hold probability — we don't need a shape or anything for this one. This placeholder is for what we call dropout. If you remember from our theory before, with dropout we drop out a fraction of the nodes the network is looking at, which helps guard against overfitting, so we need a placeholder for that too. We'll run this so it's all loaded up. So we have our three different placeholders, and since we're in TensorFlow directly — when you use Keras it does some of this automatically, but Keras sits on top of TensorFlow — we're going to create some more helper functions. We'll create something to help us initialize the weights and initialize our bias (remember each layer has to have a bias going in), then we'll work on our conv2d and our max pool — so we have our pooling layer, our convolutional layer — and then our normal full layer. We're going to put those all into definitions, and let's see what that looks like in code. You can also grab some of these helper functions from the MNIST setup, if you're
under TensorFlow — a lot of these are already in there, but we're going to do our own. We'll create our init_weights, and one of the reasons we're doing this is so you can start thinking about what's going on in the back end: even though there are ways to do this with automation, sometimes these have to be tweaked and you have to put in your own setup. Now, we're not going to be doing that — we're just going to recreate them for our code. Let's take a look at this. For our weights, what comes in is the shape and what comes out is random numbers: we initialize random numbers based on the shape with a standard deviation of 0.1 — kind of a fun way to do that — and then tf.Variable(init_random_dist), so we're just creating a random distribution for the weights, that's all that is. Now, you might change that — you might use a higher standard deviation — and in some cases you actually load preset weights, though that's pretty rare; usually you're testing against another model or something like that and want to see how those weights compare with each other. Now remember we also have our bias, so we need to initialize the bias with a constant, in this case 0.1 — a lot of times the bias is just put in as 1, and then you have your weights to add on to that, but we're going to set this as 0.1. Then we want to return a conv2d — in this case a layer of the neural network. What's going on with the conv2d is that we take our data coming in and filter it with strides. If you remember correctly, strides came from: here's our image, and we only look at this patch here, and then maybe we have a stride of one, so we look at the next patch here, and we continue across with the different filters. The other thing this does is that we have our data coming in as 32 by 32 by 3, and
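Here's a NumPy stand-in for the two initializer helpers just described (the course code uses tf.truncated_normal and tf.constant wrapped in tf.Variable; a plain normal distribution is used here purely for illustration):

```python
import numpy as np

def init_weights(shape, stddev=0.1):
    """Random weights drawn from a normal distribution with stddev 0.1,
    as in the transcript (TF's version uses a truncated normal)."""
    return np.random.normal(0.0, stddev, size=shape)

def init_bias(shape, value=0.1):
    """Bias initialized to a small constant (0.1)."""
    return np.full(shape, value)

# 4x4 filters, 3 input channels, 32 output filters -- the conv-1 shape.
w = init_weights((4, 4, 3, 32))
b = init_bias((32,))
print(w.shape, b[0])   # (4, 4, 3, 32) 0.1
```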
we want to combine across those three color channels: the convolution takes that third dimension, the 3, and folds it into the 32x32 feature maps. So this is a very important layer, because it's condensing our data using the filters, and it connects down — I'm just going to jump down one here — it goes with the convolutional layer. So you have your kind of pre-formatting and setup, and then you have your actual convolutional layer that goes through on there. You can see here we have init_weights by the shape, and init_bias using the last element of the shape — the number of output filters — and then we return tf.nn.relu of the conv2d, so this conv2d feeds right into it, and of course the input is conv2d(x, W) + b, the bias. That's quite a mouthful, but these two are the keys to creating the convolutional layers: the conv2d coming in, and then the convolutional layer which steps through and creates all those filters we saw. Then of course we have our pooling: after each time we run data through the convolutional layer, we want to pool it. If you remember correctly, on the pooling side — let me just get rid of all my marks, it's getting a little crazy there — in fact, let's jump back to that slide and take a look at it over here. We have our image coming in, and we create our convolutional layer with all the filters. Remember how the filters go: the filter comes in here and looks at these four boxes, and then if the stride is, say, two, it goes to these next four boxes, and then the next step, and so on. So we have our convolutional layer that we generate, or convolutional layers, and they use the ReLU function. There are other activation functions out there, but ReLU is the one that works best, at least so far — I'm sure that will
change. Then we have our pooling. If you remember correctly, the pooling here is max pooling: if the filter comes in and, after the multiplication, we have a 1, maybe a 2 here, another 1 here, and a 3 here, 3 is the max, so out of those four values the 3 goes into the pooled array; if the max over here is 2, or whatever it is, that's what goes into the pooling. So again we're reducing the data down, as small as we can, and then finally we flatten it out into a single array, and that goes into our fully connected layer. You can see that in the code right here: we're going to create our normal full layer. At some point we take the output from the pooling layer, it goes through some kind of flattening process, and then that gets fed into the full layers going down here. We have our input size — you'll see input_layer.get_shape(), which just gets the shape of whatever is coming in — and the initial weights are also based on the input layer, and the input size down here is based on the input layer's shape. So we're just going to use the shape and already have our size coming in. And of course you have to make sure you init the bias — always put your bias on there — and we'll do that based on the size. So this will return tf.
matmul(input_layer, W) + b — this is just a normal fully connected layer; that's what it means right down here, that's what we're going to return. So that was a lot of steps we went through — let's run that so those are all loaded, and now let's create the layers. Let's see what that looks like: now that we've done all the heavy lifting, we get to do the easy part. We'll create convolutional layers one and two — two different convolutional layers — and then we'll take that and flatten it out, create a reshape for the pooling output, and then we'll have our full layer at the end. So let's start by creating our first convolutional layer. We come in here — let me just run that real quick — and I want you to notice the 3 and the 32 here. This is important: coming into the convolutional layer we have three different channels, and 32 is the number of filters coming out, so that has to be in there. The 4 and 4 you can play with — that's your filter size. If you remember, you have a filter and you have your image, and the filter slowly steps over the image depending on what your stride is. For this particular setup, 4x4 is just fine; that should work pretty well for what we're doing and for the size of the image. And then, once you have your convolutional layer set up, you also need to pool it, and you'll see that the pooling is automatically set up to see the shape based on what's coming in. So here we have max pooling, 2 by 2, and we put in the convolutional layer we just created — it goes right back into it — and that, right up here as you can see, is the x coming in from here, so it knows to look at the first model and set the data accordingly, set it up so it matches. And we went ahead and ran this already — I think I ran it; let me go and run it again. And now that we've done one layer, let's go ahead and do a
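The 2x2 max pooling attached to that first convolutional layer can be sketched in plain NumPy — a toy, single-channel stand-in for what tf.nn.max_pool does:

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on a single (H, W) channel:
    split the array into 2x2 blocks and keep the max of each block."""
    h, w = x.shape
    return x[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# The transcript's example: values 1, 2, 1, 3 in a 2x2 block -> keep the 3.
patch = np.array([[1, 2],
                  [1, 3]])
print(max_pool_2x2(patch))        # [[3]]

# A 4x4 input pools down to 2x2 -- each pooling stage halves each dimension.
img = np.arange(16).reshape(4, 4)
print(max_pool_2x2(img).shape)    # (2, 2)
```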
second layer down here, which we'll call convo_2 — also a convolutional layer — and you'll see that we're feeding in convolutional one's pooling: it goes from convolutional one into convolutional one pooling, from convolutional one pooling into convolutional two, and then from convolutional two into convolutional two pooling. We'll go ahead and run this so these variables are all loaded into memory. For our flatten layer: since we have 64 filters coming out of here, and the image has been pooled down to 8x8, let's do 8x8x64 — that's 4,096. This is the flat layer; that's how many values come through on it. We reshape our convo_2 pooling — that feeds into here — and set it up as a single layer that's 4,096 in size; that's what that means there. We'll go ahead and run this, so we've now created the variable convo_2_flat. Then we have our first full layer — this is the final neural network stage, with the flat layer going in, and we again use ReLU for the activation. You'll notice we create our first full layer with our normal_full_layer definition — that's the one we created — and the input for the data comes right here from convo_2_flat, so that tells it how big the data is, and we're going to have 1,024 coming out; that's how big the layer is. We'll go ahead and run this, so now we have our full layer one, and with full layer one we also want to define the full-one dropout to go with it. Our full layer one comes in, keep_prob equals hold_prob — remember we created that earlier — and full layer one is what feeds into it. During training we're not keeping every activation, only a
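The 4,096 figure comes from simple shape arithmetic — each 2x2 pooling stage halves the spatial dimensions, and the second convolutional layer produces 64 feature maps — which you can verify in a couple of lines:

```python
# Shape arithmetic behind the flat layer: 32x32 input, two 2x2 poolings,
# 64 feature maps out of the second convolutional layer.
size = 32
for _ in range(2):        # two conv + 2x2 max-pool stages
    size //= 2            # 32 -> 16 -> 8
filters = 64
flat = size * size * filters
print(size, flat)         # 8 4096
```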
percentage of them each time, which helps guard against overfitting. So let me go ahead and run that, and finally we'll create y_pred, which equals the normal full layer of full_one_dropout and 10, because we have 10 labels in there. Now, in this neural network we could have added additional layers — that would be another option to play with. You can also use other numbers instead of 1,024 for the size of what's coming out and going into the next stage. We're only going to do the one layer and the one dropout, but you can see that if we did another layer it would be really easy: just feed full_one_dropout into full layer two, then full layer two dropout would have full layer two feed into it, and you'd switch that in here for the y prediction. For right now this is great — this particular data set is tried and true, and we know this will work on it. And if we just type in y_pred and run that, we'll see it's a Tensor object, shape (?, 10), dtype float32 — a quick way to double-check what we're working on. So now we've done the setup all the way to y_pred, which we just did. Next we want to apply the loss function and make sure that's set up in there, create the optimizer, then the trainer from the optimizer, and create a variable to initialize all the global TF variables. Before we dive into the loss function, let me point out one quick thing — just a recap of a couple of points. When we're playing with these setups, we pointed out up here that we can change the 4x4 and use different numbers there, and that changes your outcome: the numbers you use here have a huge impact on how well your model fits. The same goes for the 1,024 — this is another number where, if you continue to raise it, you'll possibly get a better fit (you might overfit), and if you lower it, you'll use fewer resources. Generally you want to
stick to powers of two — 2, 4, 8, 16, and so on — so in this case the next one down from 1,024 would be 512. You can use any number there, but those are the conventional choices when you look at this data. The next step in all this is to create a way of tracking how good our model is, and we're going to call this a loss function: we're going to create a cross-entropy loss function. Before we discuss exactly what that is, let's look at what we're feeding it: our labels — we have our true labels and our prediction labels. So coming in here are the two different variables we're sending in, or the two different probability distributions: one that we know is true, and one that's what we think it's going to be. Now, when they talk about cross-entropy in information theory: the cross-entropy between two probability distributions over the same underlying set of events measures the average number of bits needed to identify an event drawn from the set. That's a mouthful. Really, we're just looking at the amount of error in here — how many of these are correct and how many are incorrect, how much of it matches — and we're going to look at the average; that's what the reduce_mean means here. We're looking at the average error. So the next step is: we take the error — we want to know our cross-entropy, our loss function, how much loss we have — and that's going to be part of how we train the model. When you know what the loss is and you're training, you feed that back into the backpropagation setup, so we want to go ahead and optimize it. Here's our optimizer: we're going to create it using the Adam optimizer. Remember, there are a lot of different ways of optimizing; Adam is the most popular in use. So our optimizer is going to equal tf.train.AdamOptimizer. If you don't remember
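The cross-entropy loss just described can be sketched in NumPy — a stand-in for tf.reduce_mean over tf.nn.softmax_cross_entropy_with_logits, with made-up example logits:

```python
import numpy as np

def softmax(z):
    """Turn raw scores into probabilities (numerically stable form)."""
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def mean_cross_entropy(logits, y_true):
    """Average cross-entropy between one-hot true labels and the
    predicted distribution -- the 'average error' from the transcript."""
    p = softmax(logits)
    return -np.mean(np.sum(y_true * np.log(p), axis=1))

y_true = np.array([[0., 1.], [1., 0.]])
good = np.array([[0., 5.], [5., 0.]])   # confident, correct logits
bad = np.array([[5., 0.], [0., 5.]])    # confident, wrong logits

# Matching predictions give a much smaller loss than mismatched ones.
print(mean_cross_entropy(good, y_true) < mean_cross_entropy(bad, y_true))  # True
```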
what the learning rate is, let me just pop this back up here. Here's our learning rate: you have all your weights on the different nodes — here's our node coming out with all its weights — and the error is being propagated back through the neural network in reverse. We take this error and adjust these weights based on the different formulas, in this case Adam's formulas. But we don't want to just adjust them completely — we don't want to change a weight so it exactly fits the data coming through, because if we made that kind of adjustment, it's going to be biased toward whatever the last data we sent through was. Instead, we multiply the change by 0.001 and make a very small shift in the weight, so our delta W is only 0.001 of the full change we compute from Adam. Then we want to go ahead and train it, so we set up a training operation: train equals optimizer.minimize(cross_entropy), and we make sure to run this so it's loaded in there. Then we're almost ready to train our model, but before we do that, we need to create one more variable: one that initializes all the global TF variables. When we look at this, tf.global_variables_initializer is a TensorFlow operation that goes through all the setup we have under TensorFlow and initializes those variables. It's kind of a magic one, because it's all hidden in the back end of TensorFlow — all you need to know is that you have to have that initialization operation, and you have to run it once your setup is complete. So we'll go ahead and run this piece of code, and then we're going to train our data. Let me run this so it's loaded up, and now we're going to go ahead and run
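The learning-rate intuition above can be shown with a plain gradient step — Adam adds momentum and per-weight scaling on top of this, but the small-fraction-of-the-full-change idea is the same, and the numbers here are made up:

```python
# Simplified illustration of the learning-rate idea: shift the weight by
# only a small fraction (0.001) of the full change the gradient suggests.
learning_rate = 0.001

w = 2.0
full_delta_w = 5.0                      # the "full change" computed from the error
w_new = w - learning_rate * full_delta_w
print(w_new)                            # 1.995 -- a very small shift, not the full jump
```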
the model by creating a graph session. "Graph session" is a TensorFlow term, so you'll see it coming up — it's one of the things that throws me, because I always think of GraphX in Spark, or of graphing in general, but they talk about a graph session. So we're going to run the model, and let's walk through what's going on here — let me paste this in, and here we go. We start with `with tf.Session() as sess:` — that's the actual TF session we've created — and then we run tf.global_variables_initializer, so right off the bat we're initializing our variables. Then we have `for i in range(500)`. What's going on here? We're going to loop 500 times, pulling in a batch of images on each pass: batch = ch.next_batch(100). If you remember correctly, this loads up 100 pictures at a time, and we loop through that 500 times, so that's 500 times 100 — 50,000 pictures we're going to process. In each pass we do a sess.run: we take our train operation — the one we created with the optimizer — and we feed it the feed dictionary we created, with x equals batch[0] coming in, y_true equals batch[1], and hold_prob 0.5. And then, just so we can keep track of what's going on, every 100 steps we run a print: "Currently on step {}", "Accuracy is:", and
we're going to look at matches = tf.equal(tf.argmax(y_pred, 1), tf.argmax(y_true, 1)) — that's how many predictions match the true labels. And for our acc, all we do is take those matches, convert them to floats (that's what the tf.cast does), and take the average — we just want the average accuracy. Then we print it out: print(sess.run(acc, feed_dict=...)), which takes all this and prints out our accuracy. So let's go ahead and take this — oops, screen's there — let's take this and run it. This is going to take a little bit to run, so let's see what happens on my old laptop. We'll see here that we're currently on step zero — it takes a little bit to get through the accuracy, and this will take just a moment to run. We can see that on step zero it has an accuracy of 0.1, or 0.128, and as it's running — you don't need to watch it run all the way — this accuracy is going to change a little bit up and down. We've actually lost some accuracy during step two, but we'll see how that comes out; let's come back after we run it all the way through and see how the different steps come out. I was actually reading that backwards — the way this works is that the closer we get to one, the more accurate we are, so you can see here we've gone from 0.1 to 0.39. We'll go ahead and pause this and come back to see what happens when we're done with the full run. All right, now that we've prepared the meal, got it in the oven, and pulled out my finished dish — if you've ever watched any of the old cooking shows — let's discuss a little bit about this accuracy and how to interpret it. We've done a couple of things: first, we've defined accuracy. The reason I got it backwards before is that you have loss or
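The matches/cast/mean accuracy computation has a direct NumPy equivalent — the predictions here are made up for illustration:

```python
import numpy as np

# NumPy sketch of the accuracy computation described above:
# argmax of predictions vs. argmax of one-hot truth, cast to float, mean.
y_pred = np.array([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]])   # model outputs
y_true = np.array([[0., 1.], [1., 0.], [1., 0.]])          # one-hot labels

matches = np.argmax(y_pred, axis=1) == np.argmax(y_true, axis=1)
acc = matches.astype(np.float32).mean()
print(acc)    # ~0.6667 -- two of the three predictions match
```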
accuracy. With loss you get a graph that curves down like this — oops, that's an S, by the way; there we go — and with accuracy you get a graph that curves up: this is how good it's doing. Now, in this case, one is perfect accuracy — the curve gets close to one but in practice never reaches it, so an accuracy of 1.0 would be phenomenal, pretty much unheard of. The same with loss: a loss of exactly zero is also unheard of — the zero is on this axis right here as we go in there. So how do we interpret a number like 0.51? Because accuracy here is the mean of the matches — the fraction of test images classified correctly — 0.51 really does mean about 51 percent correct. That may not sound impressive until you remember there are ten classes, so random guessing would give you about 0.1, which is exactly where we started: remember, at the top here, our first reading was 0.128. Going from 0.1 to 0.2 means doubling what chance alone gives you, and each gain past that gets harder to earn. If you reach 0.95 on this data set, that's essentially as good as you could ask for. Let me go and remove all my drawings there. So the magic number here is 0.5 — we really want to be over 0.5 in this whole thing — and we are. Remember, this is accuracy; if we were looking at loss, we'd be looking the other way — instead of how high it is, we'd want it low. But with accuracy, being over 0.5 is pretty solid, and if you get to 0.95, that's
what we're looking for here in these numbers. You can see we finished with this model at 0.5135, so still good. And if we look at when they ran this on the other end — remember, there's a lot of randomness that goes into seeding the weights — they got 0.5251, a little better than ours, but that's fine; you'll find your own comes up a little better or worse depending on just that randomness. So we've gone through the whole model: we created it, we trained it, and we've also gone through on every 100th run to test the model and see how accurate it is. Welcome to the RNN tutorial — the recurrent neural network. Let's start by talking about a feed-forward neural network: in a feed-forward neural network, information flows only in the forward direction, from the input nodes through the hidden layers (if any) to the output nodes. There are no cycles or loops in the network. So you can see here we have our input layer — as I was saying, it just goes straight forward into the hidden layers, each node connects to the next hidden layer, which connects to the output layer — and we have a nice simplified version with a predicted output. The input is usually referred to as X, and a lot of times the output as Y. Decisions are based on the current input only: no memory about the past, no future scope. So why a recurrent neural network? Consider the issues in a feed-forward neural network: one of the biggest is that, because it has no notion of memory or time, a feed-forward network doesn't know how to handle sequential data — it considers only the current input. If you have a series of things, where three points back affects what's happening now, and your output affects the next one — whatever I put as an output is going to affect the next one — that's very important, and a feed-forward network doesn't look at any of that. It just looks at what's coming in, and it cannot memorize previous inputs, so it doesn't have that list of inputs
coming in. The solution: you'll see here where it says recurrent neural network, we have our X on the bottom going to H going to Y — that's your feed-forward — but right in the middle it has a value C. So there's a whole other process memorizing what's going on in the hidden layers: the hidden layers produce data that feeds into the next step. Your hidden layer might have an output that goes off to Y, but that output also goes back into the next prediction coming in. What this does is allow it to handle sequential data: it considers the current input and also the previously received inputs. And if we're going to look at general drawings and solutions, we should also look at applications of the RNN. Image captioning: an RNN is used to caption an image by analyzing the activities present in it — "a dog catching a ball in midair." That's very tough; I mean, we have a lot of stuff that analyzes images of a dog and images of a ball, but this adds one more feature: actually catching the ball in midair. Time series prediction: any time-series problem, like predicting the price of a stock in a particular month, can be solved using an RNN, and we'll dive into that in our use case and actually look at some stock. One thing you should know about analyzing stock today is that it is very difficult. If you're analyzing the whole stock market, the New York Stock Exchange in the US produces — if you count all the individual trades and fluctuations by the second — somewhere in the neighborhood of three terabytes of data a day. We're only going to look at one stock, and even analyzing one stock is really tricky. We'll give you a little jump on that here, so that's exciting, but don't expect to get rich off of it immediately. Another application of the RNN is natural language processing: text mining and sentiment analysis can be carried out using an RNN, and you can see right here
the term "natural language processing" — when you string those three words together, it's very different than if I said "processing language natural." So the time order is very important. When we're analyzing sentiment, it can change the whole value of a sentence just by switching the words around: if you're just counting the words you might get one sentiment, where if you actually look at the order they're in, you get a completely different sentiment. "When it rains, look for rainbows; when it's dark, look for stars" — both of these are positive sentiments, and they're based on the order the sentence goes in. Machine translation: given an input in one language, an RNN can be used to translate the input into a different language as output. I'm myself very linguistically challenged, but if you study languages and you're good with them, you know right away that in English you would say "big cat," and in Spanish you would say "cat big." So getting the right order in translation is really important — there are all kinds of parts of speech whose meaning depends on the order of the words. Here this person is speaking in English and getting translated — you can see in this little diagram, I guess that's denoted by the flags ("I have a flag, I own it" — no) — but they're speaking English, and it's getting translated into Chinese, Italian, French, German, and Spanish. Some of the tools coming out are just so cool: somebody like myself, who's very linguistically challenged, can now travel into worlds I would never think of, because I can have something translate my English back and forth readily, and I'm not stuck with a communication gap. So let's dive into what a recurrent neural network is. A recurrent neural network works on the principle of saving the output of a layer and feeding it back to the input in order to predict the output of the layer. That sounds a little confusing; when we start breaking it down, it'll
make more sense. Usually we have a feed-forward neural network with the input layer, the hidden layers, and the output layer; with the recurrent neural network we turn that on its side. So here it is: now our X comes up from the bottom into the hidden layers, into Y, and they usually draw a very simplified X to H (with C as a loop) to Y, where A, B, and C are the parameters — a lot of times you'll see this kind of drawing. Digging closer and closer into the H and how it works, going from left to right, you'll see that the C goes in and then the X goes in: the X is going upward bound and C is going to the right, A is going out and C is also going out — that's where it gets a little confusing. So here we have x_t and c_t, and then y out and c out, and C is based on h_(t-1). So our value is based on the Y and the H value, connected to each other but not necessarily the same value, because H can be its own thing. Usually we draw this, or represent it, as h_t = f_C(h_(t-1), x_t): the last H output, h_(t-1), combined with the new input x_t, where h_t is the new state, f_C is a function with parameter C (that's a common way of denoting it), h_(t-1) is the old state coming out, and x_t is the input vector at time step t. Next we need to cover the types of recurrent neural networks. The first one is the most common, which is one-to-one, single output. A one-to-one neural network is usually known as a vanilla neural network, used for regular machine learning problems — why? Because vanilla is usually considered kind of a real basic flavor, and because this is very basic, a lot of times they'll call it the vanilla neural network. It's not an official term, but it's the kind of slang people will usually understand if you say it. Then we have one-to-many, where you have a single input and you might have multiple outputs — in
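Backing up to the state-update formula h_t = f_C(h_(t-1), x_t), here's a toy numeric sketch of the recurrence — the tanh activation and the scalar weights W_h and W_x are my illustrative choices, not values from the course:

```python
import numpy as np

# Toy sketch of the recurrent update: the new hidden state h_t mixes the
# previous state h_(t-1) with the new input x_t via fixed weights.
W_h, W_x = 0.5, 1.0
h = 0.0                                # initial hidden state

for x_t in [1.0, -0.5, 0.25]:          # a short input sequence
    h = np.tanh(W_h * h + W_x * x_t)   # h_t depends on h_(t-1) and x_t

print(round(float(h), 4))
```

Note that the final value of h depends on the whole sequence, not just the last input — that memory is exactly what the feed-forward network lacks.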
An example is image captioning, as we looked at earlier, where we're not just looking at a dog but a dog with a ball in the air. Then you have the many-to-one network, which takes in a sequence of inputs. An example is sentiment analysis, where a given sentence can be classified as expressing positive or negative sentiment; we saw that when we were discussing "if it rains look for a rainbow," a positive sentiment, where "rain" might read as negative if you were just adding up the words. And beyond one-to-one, one-to-many, and many-to-one, there are many-to-many networks, which take in a sequence of inputs and generate a sequence of outputs. An example is machine translation: a lengthy sentence comes in in English and then goes out in all the different languages. Just a wonderful tool, and a very complicated set of computations; if you're a translator, you realize just how difficult it is to translate into different languages. One of the biggest things you need to understand when working with this kind of neural network is what's called the vanishing gradient problem. While training an RNN, your slope can be either too small or very large, and this makes training difficult. When the slope is too small, the problem is known as a vanishing gradient, and you'll see here they have a nice image: loss of information through time. If you're not pushing enough information forward, that information is lost, and when you go to train the network it starts losing, say, the third word in the sentence, or it doesn't quite follow the full logic of what you're working on. The exploding gradient problem is one everybody runs into when working with this particular neural network: when the slope tends to grow exponentially instead of decaying, the problem is called an exploding gradient. The symptoms of gradient problems are long training time, poor performance, and bad accuracy, and I'll add one more: if you're on a lower-end
computer testing out a model, it will lock up and give you a memory error. To explain the gradient problem, consider the following two examples, where the task is to figure out the next word in the sequence: "The person who took my bike and ___ a thief." "The students who got into engineering with ___ from Asia." You can see here we have our x value going in, we have the previous value going forward, and then you back-propagate the error like you do with any neural network. As we're looking for that missing word, maybe we'll get "the person who took my bike and ___ was a thief" and "the students who got into engineering with a ___ were from Asia." Consider the first example: "the person who took the bike was ___ a thief." In order to work out the next word in the sequence, the RNN must memorize the previous context, whether the subject was a singular noun or a plural noun; "was a thief" is singular. The same goes for "the students who got into engineering": the RNN must memorize whether the subject was singular or plural, and here "the students who got into engineering with ___ were from Asia" is plural. It can sometimes be difficult for the error to back-propagate all the way to the beginning of the sequence to predict what the output should be. So when you run into the gradient problem, we need a solution. For the exploding gradient there are three different solutions, depending on what's going on. One is identity initialization: the first thing we want to do is see whether we can minimize what's coming in, so instead of having the network identify everything, it keeps just the important information. Next is truncating the back-propagation: instead of sending everything along to the next series, we can truncate what it's sending; we can lower that particular set of layers and make them smaller.
And finally there is gradient clipping: when we're training, we can clip what the gradient looks like and narrow the training model we're using. When you have a vanishing gradient, the opposite problem, we can look at weight initialization, very similar to the identity solution, except we add more weights so the network can better identify the different aspects of what's coming in. Choosing the right activation function is huge: we might be activating on one thing and need to limit that. We haven't talked much about activation functions, so we'll only touch on them minimally, but there are a lot of choices out there. And finally there are long short-term memory networks, the LSTMs, which we can adjust: just as we can clip the gradient, we can also expand the memory network, increasing its size so it handles more information. One of the most common problems in today's setups is what they call long-term dependencies. Suppose we try to predict the last word in the text "the clouds are in the ___." You probably said "sky," and here we do not need any further context; it's pretty clear the last word is going to be "sky." Now suppose we try to predict the last word in "I have been staying in Spain for the last 10 years, I can speak fluent ___." Maybe you said Portuguese or French? No, you probably said Spanish. The word we predict depends on the previous few words of context; here we need the context of Spain to predict the last word. It's possible for the gap between the relevant information and the point where it is needed to become very large, and LSTMs help us solve this problem. LSTMs are a special kind of recurrent neural network capable of learning long-term dependencies; remembering information for long periods of time is their default behavior. All recurrent neural networks have the form of a chain of repeating modules of neural network connections.
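Circling back to gradient clipping, one of the exploding-gradient fixes above, here is a library-free sketch: whenever the gradient's norm crosses a threshold, rescale it so the update step stays bounded. The threshold of 5.0 is an arbitrary example value:

```python
import numpy as np

# Clip a gradient vector to a maximum norm, preserving its direction.
def clip_by_norm(grad, max_norm):
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        return grad * (max_norm / norm)
    return grad

exploding = np.array([300.0, -400.0])   # norm is 500, far too large a step
clipped = clip_by_norm(exploding, 5.0)  # same direction, norm capped at 5
```

Deep learning frameworks expose the same idea as an optimizer option (for example, clipping arguments on Keras optimizers), so you rarely write it by hand.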
In standard RNNs, this repeating module has a very simple structure, such as a single tanh layer. LSTMs also have a chain-like structure, but the repeating module is different: instead of a single neural network layer, there are four interacting layers communicating in a very special way. As you can see, the deeper we dig into this, the more complicated the graphs get. I want you to note that you have x at t minus 1 coming in, x at t coming in, and x at t plus 1; you have h at t minus 1 and h at t coming in and h at t plus 1 going out; and of course on the other side is the output. In the middle we have our tanh, and it occurs in two different places: when we're computing the step at t plus 1, we're not only getting the tanh value from x at t, we're also getting the value carried forward from x at t minus 1. The short of it is that, as you look at these layers, the propagation goes through the first layer into the second layer and back into itself, but it's also going into the third layer. So now we're stacking those up, and this can get very complicated as you grow it in size; it also grows in memory and in the amount of resources it takes. But it's a very powerful tool for addressing complicated, long, sequential information coming in, like the sentences we were just looking at. When we're looking at our long short-term memory network, there are three steps of processing in the LSTM. The first is that we want to forget irrelevant parts of the previous state.
A lot of small words, like "is" or "a," don't play a huge part in the language unless we're trying to work out whether the subject is a plural noun, so we want to get rid of them. Then we selectively update cell state values: we only want to update the cell state values that reflect what we're working on. And finally we want to output only certain parts of the cell state: whatever is coming out, we want to limit that too. Let's dig a little deeper and see what this really looks like. Step one decides how much of the past the network should remember. The first step in the LSTM is to decide which information should be omitted from the cell in that particular time step, and this is decided by the sigmoid function: it looks at the previous state h at t minus 1 and the current input x at t and computes the function. So over here we have f of t equals the sigmoid of the weight W sub f applied to h at t minus 1 and x at t, plus, of course, a bias b sub f, since with any neural network you have a bias. So f of t is the forget gate: it decides which unimportant information from the previous time step to delete. Consider an LSTM fed with the following inputs from the previous and present time steps. Previous output: "Alice is good in physics. John, on the other hand, is good in chemistry." Current input: "John plays football well. He told me yesterday over the phone that he had served as a captain of his college football team." As we look at this, the first step is that the forget gate realizes there might be a change in context after encountering the first full stop, and it compares that with the current input sentence x at t. So we're looking at that full stop and then comparing it with the input of the new sentence. The next sentence talks about John, so the information on Alice is deleted. That's important to note: we have this input coming in, and if we're going to continue on with John, then that's the primary information we're looking at.
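The forget-gate formula above, f of t equals sigmoid of W sub f applied to [h at t minus 1, x at t] plus b sub f, can be sketched in NumPy. The weights are random placeholders; the point is just that every entry of f_t lands strictly between 0 (forget) and 1 (keep):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
hidden, inputs = 4, 3
# One weight matrix over the concatenated [h_{t-1}, x_t] vector, plus a bias
W_f = rng.normal(size=(hidden, hidden + inputs))
b_f = np.zeros(hidden)

h_prev = rng.normal(size=hidden)   # previous state h_{t-1}
x_t = rng.normal(size=inputs)      # current input x_t

f_t = sigmoid(W_f @ np.concatenate([h_prev, x_t]) + b_f)
# Each entry of f_t scales the matching entry of the previous cell state
# C_{t-1}: near 0 means "delete", near 1 means "carry forward".
```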
The position of the subject is vacated and assigned to John; we've weeded out a whole bunch of information and we're only passing on the information about John, since that's now the new topic. Step two is to decide how much this unit should add to the current state. In the second layer there are two parts: one is a sigmoid function and the other is a tanh. The sigmoid decides which values to let through, zero or one; the tanh gives weightage to the values that are passed, deciding their level of importance, minus one to one. You can see the two formulas that come up: i of t equals the sigmoid of the weight W sub i applied to h at t minus 1 and x at t, plus the bias b sub i; and the candidate C-tilde of t equals the tanh of the weight W sub C applied to h at t minus 1 and x at t, plus the bias b sub C. So i of t is the input gate: it determines which information to let through based on its significance in the current time step. If this seems a little complicated, don't worry, because a lot of the programming is already done when we get to the case study; understanding that this is part of the program matters when you're trying to figure out what to set your settings at. You should also note that this should bear some resemblance to your forward-propagation neural networks, where we have a value applied to a weight plus a bias; those are very important steps in any neural network layer, whether we're propagating information from one layer to the next or doing a straightforward forward pass. Let's take a quick look at what this looks like from the human standpoint, as I step out in my suit again. Consider the current input x at t: "John plays football well. He told me yesterday over the phone that he had served as a captain of his college football team." The input gate analyzes the important information: "John plays football" and "he was a captain of his college team" are important.
"He told me over the phone yesterday" is less important, hence it is forgotten. This process of adding some new information is done via the input gate. Now, this example is in human terms, and we'll look at actually training this stuff in just a minute, but as a human being, if I wanted to pull this information from a conversation, maybe Google Voice listening in or something like that, how do we weed out the fact that he was talking to me on the phone yesterday? I don't want to memorize that he talked to me on the phone yesterday; maybe in some situation that's important, but in this case it's not. I want to know that John plays football and that he was a captain of the college football team; those are the two things I want to take away as a human being. Again, we measure a lot of this from the human viewpoint, and that's also how we try to train these neural networks so we can understand them. Finally we get to step three, which decides what part of the current cell state makes it to the output. First we run a sigmoid layer, which decides what parts of the cell state make it to the output; then we put the cell state through the tanh, to push the values to between minus one and one, and multiply it by the output of the sigmoid gate. So we set o of t equal to the sigmoid of the weight W sub o applied to h at t minus 1 (one step back in time) and x at t, plus, of course, the bias b sub o; and h of t equals o of t times the tanh of the cell state C at t. So o is the output gate: it allows the passed-in information to impact the output in the current time step. Let's consider an example, predicting the next word in the sentence: "John played tremendously well against the opponent and won for his team. For his contributions, brave ___ was awarded Player of the Match."
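The three steps can be pulled together into one NumPy sketch of a full LSTM cell update, matching the forget, input, and output gate formulas above. All weights are random placeholders, not trained values:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2)
H, X = 4, 3

def gate_params():
    # a weight matrix over the concatenated [h_{t-1}, x_t] vector, plus a bias
    return rng.normal(size=(H, H + X)), np.zeros(H)

(W_f, b_f), (W_i, b_i), (W_c, b_c), (W_o, b_o) = (gate_params() for _ in range(4))

def lstm_step(h_prev, c_prev, x_t):
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W_f @ z + b_f)      # step 1: what to forget
    i_t = sigmoid(W_i @ z + b_i)      # step 2: which new values to admit
    c_hat = np.tanh(W_c @ z + b_c)    #         candidate values, in (-1, 1)
    c_t = f_t * c_prev + i_t * c_hat  # selectively updated cell state
    o_t = sigmoid(W_o @ z + b_o)      # step 3: what makes it to the output
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t

# Carry both the hidden state h and the cell state c through a short sequence
h, c = np.zeros(H), np.zeros(H)
for x_t in rng.normal(size=(5, X)):
    h, c = lstm_step(h, c, x_t)
```

The cell state c is the long-term memory channel: the forget gate prunes it, the input gate writes to it, and the output gate controls how much of it shows up in h.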
There could be a lot of choices for the empty space. In the current input, "brave" is an adjective, and adjectives describe a noun, so "John" could be the best output after "brave." Thumbs up for John, awarded Player of the Match. If you were to pull just the nouns out of the sentence: "team" doesn't look right, because that's not really the subject we're talking about; "brave contributions," "brave team," "brave match"? So you look at this and you can start to train this neural network so it goes, oh no, John is what we're talking about; "brave" is an adjective, John is going to be the best output, and we give John a big thumbs up. And then, of course, we jump into my favorite part, the case study: a use-case implementation of LSTM. Let's predict the prices of stocks using the LSTM network. Based on the stock price data between 2012 and 2016, we're going to try to predict the stock prices of 2017, and this will be a narrow set of data; we're not going to do the whole stock market. It turns out that the New York Stock Exchange generates roughly three terabytes of data per day, all the different trades up and down of all the different stocks, each individual one second to second, or nanosecond to nanosecond. But we're going to limit that to some very basic, fundamental information, so don't think you're going to get rich off this today; at least, though, you can take a step forward in learning how to process something like stock prices, a very valid use for machine learning in today's markets. Let's dive into the use-case implementation of LSTM: we're going to import our libraries, import the training set, and get the scaling going. If you've watched any of our other tutorials, a lot of these pieces will start to look very familiar, because it's a very similar setup. Just a reminder: we're going to be using Anaconda with the Jupyter Notebook. So here I have my Anaconda Navigator.
When we go under Environments, you'll see I've set up a Keras Python 3.6 environment, so I'm in Python 3.6. A nice thing about Anaconda, especially the newer version (I remember a year ago wrestling with Anaconda across different versions of Python and different environments), is that it now has a nice interface. I have this installed both on an Ubuntu Linux machine and on Windows, and it works fine on both. You can go in here and open a terminal window, and the terminal is where you'll use pip to install your different modules. We've already pre-installed them, so we don't need to do that here, but if you don't have them installed in your particular environment, you'll need to. And of course you don't need to use Anaconda or Jupyter; you can use whatever favorite Python IDE you like. I'm just a big fan of this setup because it keeps all my stuff separate; you can see on this machine I've specifically set up an environment for Keras, since we're going to be working with Keras on top of TensorFlow. When we go back to Home, I've selected that environment up here under Applications, and then we'll click Launch on the Jupyter Notebook. In my Jupyter Notebook I've already set up a lot of things so that we're ready to go, kind of like Martha Stewart on the old cooking shows: we want to make sure we have all our tools ready so you're not waiting for them to load. If you go up here to where it says New, you can see where you can create a new Python 3 notebook; that's what we did here, underneath the setup, so it already has all the modules installed. I've also renamed this notebook (under File you can rename it); I'm calling it RNN Stock. Let's get into the exciting part. Now that we've looked at the tool (and of course you might be using a different tool, which is fine), let's start putting that code in and see what the imports and loading everything look like.
The first half is kind of boring when we hit the Run button, because we're just importing numpy as np (numerical Python, which gives you the numpy array), the matplotlib library, because we're going to do some plotting at the end, and pandas as pd for our dataset. When I hit Run, it doesn't do anything except load those modules. Just a quick note; let me do a quick drawing here. If I were to divide this project up, this first part is our data prep; there's a lot of prepping involved. In fact, you'll find that maybe even half of the code we write is all about the data prep, and the reason I've drawn it overlapping with Keras is that Keras has some of its own preset tools already built in, which is really nice; a couple of the usual prep steps are handled inside the Keras setup, and we'll see which ones come up in our code as we go through the stock example. The last part is to evaluate, and whether you're working with shareholders or a classroom or whoever you're presenting to, the evaluation is the next biggest piece. The actual model code here in Keras is a little longer than in some packages; with some of the other packages you might have like three lines and that's it, with all your work in the pre-processing and the data. Since Keras is cutting edge and you load the individual layers, you'll see there are a few more lines here, and Keras is a little more robust for it. Then you spend a lot of time, like I said, on the evaluation, because you want something to present to everybody else and say, hey, this is what I did, and this is what it looks like.
So let's go through those steps; that was just a general overview, so let's look at the next set of code. Here we have dataset_train, which is read using pd, or pandas, read_csv, from Google_Stock_Price_Train.csv, and under that we have training_set equals dataset_train.iloc, where we've sorted out part of the data. So what's going on here? Let's look at the actual file. If we look at this (ignore all the extra files), I already have a train set and a test set sorted out. This is important to notice, because a lot of times we do this as part of the pre-processing of the data: we take 20% of the data out so we can test with it, and we train on the rest; that's what we use to create our neural network, and it's how we find out how good it is. Let's look at the file itself. I opened it in a basic WordPad text editor just so we can take a look; certainly you could open it in Excel or any other spreadsheet. Note that this is comma-separated values: we have a date, open, high, low, close, and volume. This is the standard stock data, one of the most basic sets of information you can look at, and it's all free to download. In this case we downloaded it from Google, which is why we call it the Google stock price, and it specifically is Google: these are the Google stock values, starting, as you can see here, at 1/3/2012. So when we look at this first setup up here, we have dataset_train equals pd.read_csv, and if you noticed, the original version had the path set to home/ubuntu/Downloads/Google_Stock_Price_Train. I changed that, because we're in the same folder where I'm running the code, so I've saved this particular Python notebook there and don't need any special path, or the full path, on the filename.
Then, of course, we want to take out certain values, and you'll notice we're working with our dataset in pandas; pandas basically looks like a spreadsheet. In this case we're going to use iloc, which gets specific locations: the first index says we're pulling all the rows in the data, and the second says which columns to look at. Remember, with columns we always start with zero, which is the date, and we're going to be looking at open and high, which would be one and two; we'll just label that right there so you can see it. When you go back and do this yourself, you certainly can extrapolate and use all the columns, but for the example let's limit it a little so we can focus on just some key aspects of the stock. Then we go up here and run the code, and as I said, the first half is very boring: when you hit the Run button, nothing happens, because we're still just loading the data and setting things up. Now that we've loaded our data, we want to scale it; we want to do what's called feature scaling, and here we're going to pull in MinMaxScaler from sklearn (scikit-learn) preprocessing. When you look at this, remember that we want to get rid of biases in our data. If you have something with a really high value, let's draw a quick graph: maybe one stock has a value of 100 and another stock has a value of 5; you start to get a bias between the different stocks. So we say okay, 100 is going to be the max and 5 is going to be the min, and then we transform everything else so we just squish it down (I like the word "squish") to between zero and one.
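The iloc selection just described can be sketched with a toy frame. The column values below are made up stand-in data; in the tutorial the frame would come from pd.read_csv('Google_Stock_Price_Train.csv'):

```python
import pandas as pd

# A made-up stand-in for the Google stock CSV, with the same columns.
dataset_train = pd.DataFrame({
    'Date':   ['d1', 'd2', 'd3'],
    'Open':   [10.0, 11.0, 12.0],
    'High':   [10.5, 11.5, 12.5],
    'Low':    [9.5, 10.5, 11.5],
    'Close':  [10.2, 11.2, 12.2],
    'Volume': [100, 200, 300],
})
# iloc[:, 1:2] keeps all rows and only column index 1 ("Open"); the stop
# index 2 is exclusive, so "High" is not actually included.
training_set = dataset_train.iloc[:, 1:2].values
```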
So 100 maps to 1 and 5 maps to 0, and the conversion is simple arithmetic: subtract 5, then divide by 95 (that's 100 minus 5), so whatever the value is, it becomes (value minus 5) divided by 95. Once we've created our scaler, telling it the range is 0 to 1, we take our training set and create training_set_scaled using our scaler sc with fit_transform on the training set. We can then reuse this same sc object later on our testing set, because remember, we also have to scale that when we go to test our model and see how it works. We'll click Run again; there's no output yet, because we're just setting up variables. Next we're going to create the data structure with 60 time steps and one output. Note that we're using 60 time steps, and that's where this value here comes in. The first thing we do is create our X_train and y_train variables and set them to empty Python lists; it's important to remember what kind of array we're in and what we're working with. Then we write: for i in range(60, 1258). There are our 60 time steps, and the reason we start at 60 is that each sample includes the 60 values underneath it; there's nothing below the first 60 rows to build a window from, so if we started lower we'd get an index error. Then we take X_train and append training_set_scaled[i-60:i, 0], a window of scaled values between zero and one, and when i equals 60, the first index of the slice, i minus 60, is 0.
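The min-max scaling described above can be sketched with scikit-learn's MinMaxScaler; the toy prices stand in for the real open column:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Min-max scaling squishes values into [0, 1]: (value - min) / (max - min).
training_set = np.array([[5.0], [27.0], [52.5], [100.0]])  # toy prices
sc = MinMaxScaler(feature_range=(0, 1))
training_set_scaled = sc.fit_transform(training_set)
# The min (5) maps to 0, the max (100) maps to 1, and 52.5 maps to
# (52.5 - 5) / 95 = 0.5. Keep sc around to transform the test set later.
```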
So the slice actually runs from i minus 60 up to, but not including, i: the first window is indices 0 through 59, the next is 1 through 60, then 2 through 61, and so on. Earlier I said 0 to 60; that's not quite right, because the slice doesn't include the stop index, and since we start counting at 0, a count of 60 values ends at index 59; it's important to remember that as we look at this. The second part of the slice is the comma zero: we're only looking at the open value. I know we selected columns 1 to 2 earlier, but since the stop index isn't counted, it's just the open column we're looking at, just open. Then finally we have y_train.append with training_set_scaled[i, 0], and remember, the window covers indices 0 to 59, so there are 60 values in it, and the i down here is index 60. So we're building one array holding the window 0 through 59, and over here value number 60 goes into y_train, appended on there, and this marches all the way up to 1258; that's where this value comes in, the length of the data we're loading. So we've loaded two arrays: one filled with windows of 60 values each, and one with just the single next value. You want to think about this as a time sequence: here's open, open, open, open; what's the next one in the series? We're looking at the Google stock, and each time it opens we want to know the next value: given 0 through 59, what's 60? Given 1 through 60, what's 61? Given 2 through 61, what's 62? And so on, going up. Once we've loaded those in our for loop, we take X_train and y_train and set them to np.array(X_train) and np.array(y_train).
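The windowing loop just walked through can be sketched end to end; a synthetic series stands in for the scaled CSV data:

```python
import numpy as np

# Each sample is the previous 60 scaled prices (indices i-60 .. i-1) and
# the label is the price at index i, exactly as in the loop above.
training_set_scaled = np.linspace(0.0, 1.0, 1258).reshape(-1, 1)  # stand-in data

X_train, y_train = [], []
for i in range(60, 1258):
    X_train.append(training_set_scaled[i - 60:i, 0])  # the 60-value window
    y_train.append(training_set_scaled[i, 0])         # the next value
X_train, y_train = np.array(X_train), np.array(y_train)
```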
We're just converting these back into numpy arrays so we can use all the cool tools we get with a numpy array, including reshaping. So let's see what's going on here: we take our X_train and we reshape it. Wow, what the heck does reshape mean? We have an array, so many rows by 60 wide; X_train.shape[0] gets one of those dimensions and X_train.shape[1] gets the other, and we're just making sure the data is formatted correctly. You use this to confirm it's 1198 by 60 (1258 minus 60 is 1198) and to group the data into 1198 arrays of 60, with a 1 on the end. That trailing 1 just means a single value at the innermost level, because when you're dealing with shapes in numpy, they're treated as layers: the inner layer needs to be one value, like the leaf of a tree, where this is the branch, and then it branches out some more, and then you get the leaf. That's where np.reshape comes in, using the existing shapes to form it.
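The reshape itself is one line; a sketch with a stand-in array:

```python
import numpy as np

# Reshape the window matrix into the 3-D layout an LSTM layer expects:
# (samples, timesteps, features). There is one feature here (the open
# price), so the innermost axis, the "leaf", is 1.
X_train = np.zeros((1198, 60))   # stand-in for the real windowed data
X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))
```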
We'll go ahead and run this piece of code (again, no real output), and then we'll import the different Keras modules we need. From keras.models we import the Sequential model, since we're dealing with sequential data, and from keras.layers we bring in three layer types: Dense, LSTM (which is what we're focusing on), and Dropout. We'll discuss these three layers more in just a moment, but with the LSTM you do need the Dropout, and the final layer will be the Dense. Let's run this to import our modules, and you'll see we get what looks like an error; if you read it closely, it's not actually an error, it's a warning. What does this warning mean? These come up all the time when you're working with cutting-edge modules that are updated constantly. We're not going to worry too much about it: all it's saying is that the h5py module, which Keras uses, is going to be updated at some point, and if you later update your Keras install, you'd better make sure your h5py is updated too, otherwise you'll get an error down the road. You could run an update on h5py right now if you wanted; not a big deal, and we won't worry about it today. I said we were going to jump in and look at what those layers mean, and I meant it. We'll start by initializing the RNN and then adding the layers in, and you'll see that we have LSTM then Dropout, LSTM then Dropout, LSTM then Dropout. What the heck is that doing? Let's explore it. We start by initializing the RNN: regressor equals Sequential(), because we're using the sequential model, and we run that to load it. Then we start adding our LSTM layers with some Dropout regularization, and right there should be the cue: Dropout regularization.
If we go back and remember our exploding gradient, that's what we're dealing with: the Dropout drops out unnecessary data so we're not just shoving huge amounts of data through the network. So let's add this in. I'll run it, and since there are several of these blocks, let me put the rest in too (I said one more, but it's actually two more, and then one more after that). As you can see, each time I run these there's no actual output. So let's take a closer look at what's going on. We add our first LSTM layer with units=50; units is a positive integer giving the dimensionality of the output space, which is what goes out into the next layer, so we might have 60 values coming in but we have 50 going out. We set return_sequences=True because this is sequence data and we want to keep passing the sequence along, and then you have to tell it what shape the input is in. We already know the shape from X_train, so we use input_shape=(X_train.shape[1], 1); that makes it really easy, since you don't have to remember all the numbers you put in, 60 or whatever else, you just let X_train tell the regressor what shape to use. Then we follow our LSTM with a Dropout layer. Understanding the Dropout layer is kind of exciting, because one of the things that can happen is that we overtrain our network: the neural network memorizes such specific data that it has trouble predicting anything outside that specific realm. To correct for that, on each pass through the training we take 0.2, or 20%, of our neurons and just turn them off, so we only train on the others, and the selection is random; that way, each time we pass through, we don't overtrain.
we don’t overtrain these nodes come back in in the next training cycle we randomly pick a different 20 and finally they see a big difference as we go from the first to the second and third and fourth the first thing is we don’t have to input the shape because the shape’s already the output units is 50 here this Auto The Next Step automatically knows this layer is putting out 50 and because it’s the next layer it automatically sets that and says oh 50 is coming out from our last layer it’s coming out you know goes into the regressor and of course we have our Dropout and that’s what’s coming into this one and so on and so on and so the next three layers we don’t have to let it know what the shape is it automatically understands that and we’re going to keep the units the same we’re still going to do 50 units it’s still a sequence coming through 50 units and a sequence now the next piece of code is what brings it all together let’s go ahead and take a look at that and we come in here we put the output layer the dense layer and if you remember up here we had the three layers we had uh lstm Dropout and d uh D just says we’re going to bring this all down into one output instead of putting out a sequence we just know want to know the answer at this point and let’s go ahead and run that and so in here you notice all we’re doing is setting things up one step at a time so far we’ve brought in our uh way up here we brought in our data we brought in our different modules we formatted the data for training it we set it up you know we have our y x train and our y train we have our source of data and the answers we’re we know so far that we’re going to put in there we reshaped that we’ve come in and built our carass we’ve imported our different layers and we have in here if you look we have what uh five total layers now carass is a little different than a lot of other systems because a lot of other systems put this all in one line and do it automatic but they don’t give you the 
Keras is cutting edge for exactly this reason: even though there are extra steps in building the model, having control over how the layers interface and how the data comes in has a huge impact on the output and on what we can do with these models. So we've brought in our Dense layer and we have the full model assembled in our regressor. Now we need to compile it, and then fit the data: compiling brings the pieces together, and fitting runs the training data through so the regressor is actually trained and ready to use.

Let's compile. If you've watched any of our other tutorials on neural networks, you'll recognize the optimizer 'adam' — Adam is an optimizer well suited to big data. There are a couple of other optimizers out there, beyond the scope of this tutorial, but Adam will work well here. We also set loss='mean_squared_error': this is what training bases the loss on — how bad is our error? — using the mean squared error, while the Adam optimizer handles the underlying differential equations. You don't have to know the math behind them, but it certainly helps to know what they're doing and where they fit into the bigger model.

Finally, we fit the RNN to the training set: regressor.fit(X_train, y_train, epochs=..., batch_size=...). X_train is our input data, y_train holds the answers we're looking for, and epochs is how many times we go over the whole data set — each row of X_train being a time sequence of 60 values. Batch size is another place where Keras really shines: if you were pulling this data from a large file instead of loading it all into RAM, Keras can pick up smaller batches and load them incrementally. We're not worried about pulling from a file today, because this data set isn't big enough to strain the computer's resources. But imagine doing this across many stocks instead of just one column of one stock — in this case Google's opening price. With open, close, high, low, volume and more, you can easily find yourself with about 13 variables times 60 time steps, and suddenly you're loading a gigabyte into RAM; unless you're on multiple machines or a cluster, you'll start running into resource problems. Not an issue for us here, so let's run it.

This will take a little while on my computer — it's an older laptop — so give it a second to kick in. There we go: it reports the first epoch, the first run through all the data, batching the rows 32 at a time. There are 1,198 rows — I think I said 1,199 earlier; I was off by one — and at roughly 13 seconds per epoch you can imagine this is about 20 to 30 minutes of runtime on this machine (an older laptop with a 0.9 GHz dual-core processor). That's fine: like any good cooking show, I'll go get a latte and come back. I also had some other things running in the background, so you'll see the epoch times jump up to 15 or 19 seconds, and scrolling through you can see we've run all 100 epochs. So what does all this mean? One of the first things you'll notice is the loss.
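As a back-of-the-envelope check on those progress-bar numbers, the bookkeeping is simple arithmetic. The sample count, batch size, and epoch count below are the transcript's figures; the per-epoch time is a round-number assumption, and the helper names are my own.

```python
# Rough training-cost bookkeeping for the run described above.
import math


def batches_per_epoch(n_samples, batch_size):
    # Keras shows one progress step per batch; the last batch may be partial.
    return math.ceil(n_samples / batch_size)


def estimated_minutes(epochs, seconds_per_epoch):
    return epochs * seconds_per_epoch / 60


steps = batches_per_epoch(1198, 32)    # 1,198 windows in batches of 32
runtime = estimated_minutes(100, 15)   # 100 epochs at ~15 s each
```

At 32 rows per batch, each epoch is 38 steps, and 100 epochs at about 15 seconds apiece lands squarely in the 20-to-30-minute range mentioned above.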
The loss leveled off at about 0.0014 — you can see it decline until it hits roughly 0.0014 three times in a row, so we guessed our epoch count pretty well, since the loss stopped improving. To find out what we're looking at, we'll load the test data — the data we haven't processed yet. This is the same thing we did when we prepped the data in the first place, so let's walk through the code; you can see it's labeled Part 3: making the predictions and visualizing the results. First we read the data in from our test CSV — you'll see I've changed the path for my computer — and call it real_stock_price. Again we take just the one column, using iloc to select all the rows and just the one location's values: the Open price. Let's run that, and it's loaded.

Then we create our inputs, and this should all look familiar — it's the same thing we did before. We take our dataset_total and do a little pandas concat with dataset_train, because — remember — the end of the training set is part of the data going in. Let's visualize that a bit: here's our training data (I'll just mark it TR for train), and it ran up to this value here. But each of those values generated a whole row, 60 across — this value here equals this one, and this value here equals that one — so we need the last 60 training values (or actually the last 59, plus the new point) to feed into our new data, because they're part of the next window. That's what this first setup does: real_stock_price is dataset_test.iloc with just that first column, the Open price, and then dataset_total is pandas.concat of the 'Open' column of dataset_train and the 'Open' column of dataset_test. This is one more way to reference these columns — we've referenced them a couple of different ways; earlier we used positional indexing, but since the pandas column is labeled 'Open' we can use the name. Pandas is great that way — lots of versatility. Let's go back up and run this. There we go: we've concatenated our two data sets on the Open column. Then inputs = dataset_total[len(dataset_total) - len(dataset_test) - 60:].values — the test range plus the 60 prior values. Normally you run your test set completely separately from your training set, but when we graph this you'll see that we only judge the model on the part it wasn't trained with.

Next, inputs = inputs.reshape(-1, 1), reshaping as we did before, and we transform the inputs — remember, the transform scales everything between zero and one. Finally we build X_test: for i in range(60, 80), we append inputs[i-60:i, 0] — which, remember, is indices 0 to 59 of the window, and just the first column, the Open column. Once again we convert X_test to a NumPy array and do the same reshape we did before. Then we get to the final two lines, and here is something new — let me highlight them: predicted_stock_price = regressor.predict(X_test), predicting the stock price over our test windows, and then we inverse-transform the prediction. Remember we scaled everything between zero and one; a float between 0 and 1 doesn't mean much to look at — I want the dollar amounts, the actual cash value. Run this, and you'll see it goes much quicker than the training. That's what's so wonderful about these neural networks: once they're put together, it takes just a second to run the same network that took us, what, half an hour to train.

Now let's plot the data — what we think the price will be against what the Google stock actually did. Let's pull up the code. We have plt, which — if you remember from the very beginning, let me go back up to the top — comes from importing matplotlib.pyplot as plt. Down here, we plot. Let me get my drawing tool out. plt is one of the things that always threw me when doing graphs in Python, because I kept expecting to create an object and load a class into it; instead, plt is like a canvas you put things on — if you've done HTML5, think of the canvas element. We plot the real stock price in bright red, labeled 'Real Google Stock Price', and the predicted price in blue, labeled 'Predicted'. We give it a title — always nice, especially if you're presenting the graph to someone, say shareholders at the office. The x-label is Time, because it's a time series; we didn't put the actual dates on, but we know the points are incremented by time. The y-label is the actual stock price. plt.legend() builds the legend so the red line and 'Real Google Stock Price' show up on it, and plt.show() gives us the actual graph. Let's run it and see what that looks like — and we get a nice chart. Before we wrap up, let's talk a little about this graph: here's the legend I was telling you about, we have our title and everything, and along the bottom there's a time sequence. We could have gone ahead and plotted actual dates on the x-axis, since we know what they are, but we also know this is only the last slice of the data that we're looking at.
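Before interpreting the graph, the two mechanical steps just described — building the 60-step test windows and mapping scaled predictions back to dollars — can be sketched with NumPy alone. The window length matches the transcript; `make_windows` and the tiny `MinMax01` class (a stand-in for sklearn's MinMaxScaler with feature_range=(0, 1)) are my own illustrative names.

```python
# Sliding-window construction and 0-1 scaling, as used in the notebook.
import numpy as np


def make_windows(series, window=60):
    # One row per prediction: the `window` values preceding each point.
    X = np.array([series[i - window:i] for i in range(window, len(series))])
    return X.reshape(len(X), window, 1)  # (samples, timesteps, features)


class MinMax01:
    """Minimal stand-in for MinMaxScaler(feature_range=(0, 1))."""

    def fit_transform(self, x):
        self.lo, self.hi = x.min(), x.max()
        return (x - self.lo) / (self.hi - self.lo)

    def inverse_transform(self, x):
        # Map 0-1 floats back to dollar amounts.
        return x * (self.hi - self.lo) + self.lo
```

In the notebook the same pattern appears as regressor.predict(X_test) followed by the scaler's inverse_transform, which is what turns unreadable 0-to-1 floats back into prices you can plot.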
That slice ends somewhere around here on the graph — probably about 20% of the data, likely less. The real Google price has this little jump up and then down, and you'll see the actual Google line, instead of turning down here, just didn't go up as high or down as low. So our prediction follows the same pattern, but the values are pretty far off in places — then again, we're only looking at one column, the opening price. We're not looking at how many shares were traded, and as I pointed out earlier, stock data gives you six columns right off the bat — open, high, low, close, volume — plus the adjusted open, adjusted high, adjusted low, and adjusted close, which use a special formula to estimate what the stock would really be worth, and beyond that there's all kinds of other data you could add. So we're only looking at one small aspect, the opening price, and as you can see, we still did a pretty good job: the predicted curve follows the real curve pretty well. It has little jumps and bends that don't quite line up — this bend here doesn't quite match that bend there — but it's pretty darn close: we have the basic shape, and the prediction isn't too far off. You can imagine that as we add more data and look at more aspects of the stock domain, we should get a better representation each time we drill in deeper. Of course, this took half an hour for my computer to train, so running it across all those extra variables might take quite a bit longer — not so good for a quick tutorial like this.

Next we're going to dig into what Keras is. We'll also go all the way through a couple of tutorials, because that's where you really learn a lot — when you roll up your sleeves.
So, what is Keras? Keras is a high-level deep learning API, written in Python, for easy implementation of neural networks. It uses deep learning frameworks such as TensorFlow, PyTorch, etc. as a backend to make computation faster. This is really nice because, as a programmer, there is so much out there evolving so fast that it can get confusing; having some kind of high-level order means we can easily view and program these different neural networks, which is really powerful — powerful to get something running quickly, and to be able to start testing your models and seeing where you're going. So Keras works by using complex deep learning frameworks — TensorFlow, PyTorch, and others — as a backend for fast computation, while providing a user-friendly, easy-to-learn front end. You can see here the Keras API specification, and under it implementations like tf.keras for TensorFlow, the Theano backend, and so on, all sitting on top of the TensorFlow workflow. As I said, Keras organizes everything — the heavy lifting is still done by TensorFlow, or whatever underlying package you put in there. That's really nice, because you don't have to dig as deeply into the heavy low-level machinery while still having a very robust package you can get up and running quickly, and it doesn't cost you processing time, since all the heavy lifting is done by packages like TensorFlow; Keras is the organization on top.

Now the working principle of Keras: Keras uses computational graphs to express and evaluate mathematical expressions — you can see it here in blue. The idea is to express a complex problem as a combination of simple mathematical operators: the % operator, which in Python is usually the remainder; a multiplication; or raising x to the third power.
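To make the computational-graph idea concrete, here is a miniature two-node graph evaluated forward and then differentiated backward with the chain rule — the same mechanism backpropagation applies at scale. This example is my own illustration, not from the slides, and the function names are assumptions.

```python
# Toy computational graph for y = (w*x + b) ** 3.
# Forward pass evaluates node by node; backward pass multiplies
# local gradients along the graph (the chain rule).


def forward(x, w, b):
    z = w * x + b   # node 1: affine
    y = z ** 3      # node 2: cube
    return y, z


def backward(x, w, b):
    _, z = forward(x, w, b)
    dy_dz = 3 * z ** 2                 # local gradient of node 2
    dz_dw, dz_db = x, 1.0              # local gradients of node 1
    return dy_dz * dz_dw, dy_dz * dz_db  # chain rule: dy/dw, dy/db
```

With x=2, w=3, b=1 the forward pass gives z=7 and y=343, and the backward pass returns dy/dw=294 and dy/db=147 — exactly what a framework like TensorFlow computes automatically over graphs with millions of nodes.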
That graph structure is useful for calculating derivatives by backpropagation: with neural networks we send the error back up through the graph to figure out how to change the weights, and this makes that really easy to do without banging your head hand-writing everything. It also makes it easier to implement distributed computation and to solve complex problems: you specify the inputs and outputs and make sure all the nodes are connected. This is really nice because, as your layers go in, you can build some very complicated setups nowadays — which we'll look at in just a second — and this just makes it really easy to start spinning this stuff up and trying out different models.

So when we look at Keras models: first there's the Sequential model, a linear stack of layers where each layer leads into the next. If you've done anything similar — even sklearn with its neural networks — this setup should look familiar: an input layer feeding into layer 1, then layer 2, then the output layer. It's useful for simple classifier and decoder models. Down here you can see model = keras.Sequential(...) — the actual code — and you can see how easy it is: a Dense layer named 'layer1' with an activation (they use ReLU in this example), then a Dense ReLU layer named 'layer2', and so forth. They feed right into each other, so it's really easy to stack them, and Keras automatically takes care of everything else for you. Then there's the functional model, and this is really where things are at. This is new — make sure you update your Keras, or when you try the functional model you'll run into an error, because this is a fairly new release.
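The sequential snippet just described can be reconstructed as a short runnable script. This is a sketch: the layer names follow the example recited above, but the layer widths and input size are my own assumptions, since the slide's exact numbers aren't legible in the transcript.

```python
# A linear stack of Dense layers, as in the Sequential example above.
# Widths and input size are illustrative assumptions.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(4,)),                            # assumed input width
    layers.Dense(8, activation="relu", name="layer1"),
    layers.Dense(8, activation="relu", name="layer2"),
    layers.Dense(3, name="layer3"),                     # output layer
])
```

Each layer's output feeds directly into the next, so no shapes need to be declared after the first — exactly the stacking behavior described above.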
It supports multi-input and multi-output models — complex models that fork into two or more branches. You can see here we have image_inputs = keras.Input(shape=(32, 32, 3)), and dense layers like layers.Dense(64, activation='relu') — this should look similar to what you already saw — but if you look at the graph on the right, it's a lot easier to see what's going on: there are two different inputs. One way to think of it: maybe one is a small image and one is a full-sized image, and each feeds into its own node, because each is looking for something different. You can start to get an idea that there's a lot of use for this kind of split and this kind of setup, where multiple streams of information come in that are very different even though they overlap, and you don't want to push them through the same sub-network. And they're finding this can train faster and give better results, depending on how you split the data up and how you fork the model. So here we have the two branches: image inputs of 32×32×3 — three channels, or four if you have an alpha channel — then x = layers.Dense(64, activation='relu')(inputs), another Dense(64, relu), outputs = layers.Dense(10)(x), and model = keras.Model(inputs=inputs, outputs=outputs, name=...) — we add a little name on there. Again, this kind of split sets us up to send the input through different areas.

If you're already looking at Keras, you probably already know the answer to "what are neural networks?" — but it's always good to get on the same page, so for those who don't fully understand neural networks, let's dive in a little and do a quick overview.
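A runnable sketch of that forked, functional-style model follows. The two-input idea mirrors the description above; the specific shapes, branch widths, and names here are illustrative assumptions, not the slide's exact code.

```python
# Minimal functional model: two inputs, one branch each, merged at the end.
from tensorflow.keras import Input, Model, layers

small = Input(shape=(8,), name="small_input")    # e.g. a thumbnail's features
large = Input(shape=(32,), name="large_input")   # e.g. a full image's features

# Each input gets its own Dense branch before the merge.
a = layers.Dense(16, activation="relu")(small)
b = layers.Dense(16, activation="relu")(large)

merged = layers.concatenate([a, b])              # fork joins back together
outputs = layers.Dense(10, activation="softmax")(merged)

model = Model(inputs=[small, large], outputs=outputs)
```

Unlike Sequential, each layer here is called on the tensor it consumes, which is what lets the graph fork and re-merge freely.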
Neural networks are deep learning algorithms modeled after the human brain. They use multiple neurons — which are mathematical operations — to break down and solve complex mathematical problems. Just like in the brain, one neuron fires in and fires out to other neurons, or nodes as we call them, and eventually everything comes down to the output layer. You can see the really standard diagram here: an input layer, a hidden layer, and an output layer.

One of the biggest parts of any data processing is pre-processing, so we always have to touch on that. A neural network, like many of these models, is something of a black box when you first use it: you put your data in, train it, test it, and see how good it was — and you have to pre-process that data, because bad data in means bad output. For data pre-processing we'll create our own example data set with Keras. The data represents a clinical trial conducted on 2,100 patients ranging from ages 13 to 100, with half the patients under 65 and the other half 65 or over. We want to find the likelihood of a patient experiencing side effects due to their age — you can think of this in today's world with COVID and what's going on there. We'll do a hands-on example of that, because as I said, most of this you really need hands-on to understand.

So let's bring up Anaconda and open a Jupyter Notebook for the Python code. If you're not familiar with those, you can use pretty much any of your setups; I just like them for demos — especially when showing shareholders, because it's a nice visual. Anaconda has a lot of cool tools — they just added DataLore and IBM Watson Studio Cloud to the framework — but we'll be in Jupyter. I'll use Jupyter Notebook rather than JupyterLab for this; I use the Lab for large projects with multiple pieces, since it has multiple tabs, and the Notebook works fine for what we're doing. It opens in a browser window, because that's how Jupyter Notebook is set to run. Under New we create a new Python 3 notebook; it comes up Untitled, so let's rename it 'Keras tutorial' — and change that to a capital. There we go.

The first thing we want to do is get some pre-processing tools in: import numpy, set up some random-number generation, and — I mentioned sklearn, or scikit: if you're installing it, scikit-learn is what to look up. It should be a tool in the kit of anyone doing data science; if you're not familiar with the scikit-learn toolkit, it's huge, and there are so many things in it that we always go back to. We want to create train_labels and train_samples for training our data. And here's a fun thing you can do — just a note of what we're actually doing here — let me change this cell from Code to Markdown, which is nice for documenting examples once you've built them: "An experimental drug was tested on 2,100 individuals between 13 and 100 years of age, half the participants under 65. 95% of participants under 65 experienced no side effects, while 95% of participants over 65 did experience side effects." That's where we're starting from — just a quick example, because we'll do another with more complicated information later. So we want to generate our setup: we loop with for i in range(...) and create the data as we go.
If you look here, we have random integers being appended to the train labels — we're just creating some random data. Let me go ahead and run that. (You can certainly ask Simplylearn for a copy of this code and they'll send it to you, or zoom in on the video to see how we built train_samples with append.) I do this kind of thing all the time: I was recently running something that involved errors following a bell curve — a standard distribution of error — so what did I do? I generated data on a standard distribution to see what it looks like and how my code processes it, since that was the baseline I was looking for. Here, we're just generating random data for our setup.

We can print some of it — say, the first five entries of train_samples — and you can see they are 49, 85, 41, 68, 19: just random numbers generated in there, that's all. We generated significantly more than that; if we want to check, a quick print of the length — or you could use .shape if you're working in numpy, though len is just fine here — shows it's 2,100, like we said in the setup. Then our labels: printing them the same way confirms 2,100 of them, labeled 1, 0, 1, 0, 1, 0 — that's whether they have symptoms or not: one means side effects, zero means none.

Next we take train_labels and convert it to a numpy array, and the same with the samples; let's run that. We also shuffle — a neat feature in numpy (let me put my drawing tool on, which I didn't have on earlier). Shuffling just randomizes the order. We've already randomized this data, so it's a bit of overkill and not really necessary, but in a larger package, where incoming data is often organized somehow, you want to randomize it to make sure the input doesn't follow a pattern that could create a bias in your model. Then we create a scaler — MinMaxScaler with feature_range=(0, 1) — and produce scaled_train_samples by fitting and transforming the data, so it's nicely scaled. That's the age: up here we had 49, 85, 41, and we're mapping those to between zero and one. This is true with any neural network: you really want to convert the data to zero-to-one, otherwise you create a bias — a value like 100 dominates, and since there's a lot of multiplication and addition going on, that high value multiplies through and ends up with a huge influence on how the model fits, so it won't fit as well. And one of the fun things in Jupyter Notebook: if a variable sits alone on the last line of a cell, it prints automatically. We look at the first five scaled samples, and everything is between zero and one — which shows we scaled it properly, and it looks good.

It really helps to do these kinds of printouts partway through — you never know what's going on in there. I don't know how many times I've gotten deep in and found that data sent to me, which I thought was scaled, wasn't, and then had to go back and track the problem down.

So let's create our artificial neural network, and this is where we start diving into TensorFlow and Keras. If you don't know the history of TensorFlow, it helps to look it up — we'll just use Wikipedia (careful, don't quote Wikipedia on these things or you'll get in trouble, but it's a good place to start): Google Brain built DistBelief as a proprietary machine learning system, and TensorFlow became its open-source successor. So TensorFlow was a Google product, then it was open-sourced, and it's now become probably one of the de facto standards for neural networks — it has a huge following. There are other setups — scikit-learn, under sklearn, has its own little neural network — but TensorFlow is the most robust one out there right now, and Keras sitting on top of it makes it a very powerful tool: we can leverage the ease with which Keras builds a sequential setup on top of TensorFlow.

So here we do our import of TensorFlow, and then, from line two down, it's all Keras: we import the Keras connection from tensorflow, and from tensorflow.keras.models we import Sequential — a specific kind of model we'll look at in just a second; remember from the slides, that means it goes from one layer to the next with no funky splits or anything like that.
have, from tensorflow.keras.layers, our Activation and Dense layers, and our optimizer, Adam. This is a big thing to be aware of — how you optimize. When you're starting out, Adam is as good as any; there are a number of optimizers out there, but Adam is probably the most widely used — it's often associated with bigger data, and it usually works just fine on smaller data too. Depending on what you're doing, your different layers might have different activations on them. And finally, down here, you'll see our metrics setup — we're going to use tensorflow.keras.metrics with categorical cross-entropy, so we can see how everything performs when we're done. A lot of times you'll see people go back and forth between TensorFlow and scikit-learn, which also has really good metrics for measuring these things — again, at the end of the story, the question is how good your model does. We'll go ahead and load all that, and then comes the fun part. I actually like to spend hours messing with these next four lines of code — and no, I don't mean literally staring at four lines; I'll explain what I mean in a second. We have a model, and it's a Sequential model — remember, sequential means it goes from one layer to the next. Our first layer is the input: a Dense layer with its units (16 here), its input shape, and its activation. And this is where it gets interesting, because we have in here
ReLU on two of these layers and a softmax activation on one. There are so many different options for what these activations do and how they function — we're not going to go into all of them here; that's what you really spend hours on, looking at the different activations. Some of it is almost like being an artist: you start getting a feel for them. For example, the tanh (inverse tangent) activation takes a huge amount of processing, so you don't see it a lot, yet it can come up with a better solution — especially when you're analyzing word documents and tokenizing words — so you'll see people shift from one activation to another, balancing model quality against processing time, because on a huge dataset the expensive one will just take too long. ReLU outputs zero for anything less than zero and then increases linearly; there's also a "leaky" variant that keeps a slight negative slope so that errors can propagate better, and softmax has a similar smoothness that helps errors translate through. All these little details make a huge difference to your model. One of the really cool things about data science that I like is what I call "build the fail": you want the model as a whole to work end to end first — even a rough version — so you can run the whole pipeline, test the quality of the result, and then start swapping models in and comparing them. Let's see, where did I do my TensorFlow import — oh,
here we go, it was right above me. Once you start doing your cross-entropy and so on, you need a fully functional set of code so that when you run it you can test your model and say "this model works better than that model, and this is why" — and then you can start swapping models in. So when I say a huge amount of time goes into preprocessing, it's probably 80% of your programming time — between preprocessing and the model it's roughly an 80/20 split. Once you get the whole code and the flow down, your models get more and more robust as you start experimenting with different inputs, different data streams, and all kinds of things. We can do a simple model.summary() here: here's our Sequential model, each layer, its output shape, and its parameter count — this is one of the nice things about Keras, everything is set out clear and easy to read. Once we have our model built, the next thing we want to do is train it, so the next step is of course model training. A lot of times this is just paired with the model because it's so straightforward, but it's nice to print out the model summary first so you have a record. The keyword in Keras is compile: we compile with the Adam optimizer and a learning rate — another term we're just skipping right over that really becomes the meat of the setup. A lot of times the learning rate is set to something like 0.01 or 0.001; depending on what you're doing, the learning rate can cause overfitting or underfitting, so you'd want to look that up — we have a number of tutorials on overfitting and underfitting that are really worth reading once you get to that point in understanding. And we
have our loss: sparse categorical cross-entropy. This tells Keras what to minimize, and then we're looking for accuracy as a metric. We'll go ahead and run that, and now that we've compiled our model we want to fit it. Here's our model.fit: our scaled train samples, our train labels, a validation split — in this case we're holding out 10% of the data for validation — and a batch size, another number you kind of play with; it mostly affects how long training takes, though it can also affect the bias a little. Most of the time a batch size is somewhere between 10 and 100, depending on how much data you're processing. We shuffle, go through 30 epochs, and set verbose to 2. Let me run this, and you can see each epoch as it trains. Here's our loss — the sparse categorical cross-entropy we set in compile — which tells us how much the error goes down as training progresses; the lower the number the better, and it just keeps going down. And vice versa for accuracy: you can see it climbing — 0.619, 0.69, 0.74 — it's going up. It would be ideal if accuracy made it all the way to one, but the loss is just as important, because it's a balance: you can have 100% accuracy and a model that doesn't work because it's overfitted — again, look up overfitting and underfitting models. We went through all 30 epochs — it's always fun to watch your code run the first time; by the second run you'd rather not see it, and you can of course suppress the warnings and the printing in your code. The next step is building a test set and predicting on it. We build our test set just like we did the training set — a lot of times you'd just split your initial data, but we'll generate a separate set here, with the same randomness we used above. The one difference is that we already fit our scaler on the training data up here, so down here this should be just transform instead of fit_transform — you never want to refit the scaler on your testing data. That's an easy mistake to make; in an example like this it doesn't matter much, since we're randomizing the data anyway and not expecting anything weird, but it's the right habit. Then we run our predictions — the whole reason we built the model: model.predict on our scaled test data, batch size 10, verbose set. Now we have our predictions in here, and I could just put down
the variable predictions with a slice of five, so we can look at the first five predictions. What we have here is, for each age, the predicted probabilities — what we think the outcome will be, symptoms or not — and the first thing we notice is that it's hard to read, because we really want a yes/no answer. So we'll collapse the predictions using NumPy's argmax, which turns each probability pair into a 0 or 1. And remember this is a Jupyter notebook, so I don't have to use print — I just put in rounded_predictions and take the first five: 0 1 0 0 0. So the predictions coming out are: no symptoms, symptoms, no symptoms, no symptoms, no symptoms. And just as we talked about at the beginning, we now want a confusion matrix for an accuracy check — the most important part: at the end of the story, how accurate is your model, before you go back and play with the model to squeeze out better accuracy? For this we'll use scikit-learn: from sklearn.metrics import confusion_matrix, plus some itertools, and of course matplotlib, which makes a big difference — it's always nice to have a graph to look at; a picture is worth a thousand words. Then we build it: cm = confusion_matrix(y_true=test_labels, y_pred=rounded_predictions), and load in our cm. I'm not going to spend too much time going over the plotting code — we have whole tutorials on plotting — but what we do have here is a plot_confusion_matrix function: there's our cm, our classes, normalize=False, title "Confusion Matrix", cmap set to Blues, and you can see here we have
interpolation to the nearest, the cmap, the titles, whether to draw tick marks, the classes, the color bar — a lot of options controlling how the confusion matrix gets printed. You can also just dump a confusion matrix into Seaborn and get a quick output. It's worth knowing how to do all this: when you're doing a presentation to the shareholders you don't want to do it on the fly — you want to take the time to make it look really nice, like our guys in the back did. I forgot to put together our cm_plot_labels, so we'll run that, then call the plotting function we just wrote and dump our data into it — our confusion matrix, our classes, the title "Confusion Matrix" — and run it. And you can see the basic result: 195 correctly predicted as having no side effects, and 10 predicted as no side effects who actually had side effects. That's pretty good — roughly a 5% error on that column: there are about 200 people in it, so 10 out of 200 is 5 in 100. You can do the same kind of math on the other row — 15 versus 195, not as easily rounded on the fly — where 15 people predicted to have no side effects actually had them. These confusion matrices are so important: at the end of the day, this is where you show whoever you're working with "this is how good we are" — or how far off. So that was the walkthrough — I spent a lot of time on some of the parts, but you can see it's really simple. We did the random generation of data, but when
we actually built the model up here — here's our model summary, with the layers we built — then we trained it and ran the prediction. Now we can get a lot more complicated; let me flip back over here, because we're going to do another demo. So that was our basic introduction. Implementing a neural network with Keras: after creating our samples and labels, we create our Keras neural network model. We worked with a Sequential model with three layers — and this is what we did: we had our input layer, our hidden layer, and our output layer. The input coming in was the age factor, then the hidden layer, and then the output: are you going to have symptoms or not. Now we're going to go with something a little bit more complicated. Training a model is a two-step process: we first compile our model, and then we train it on our training dataset. Compiling converts the code into a form understandable by the machine — we used Adam in the last example, a gradient-descent-style algorithm, to optimize the model — and then we trained it, which means letting it learn on the training data. This is what we just did in the code: here's the model we created and summarized, we come down here and compile it — telling Keras "we're ready to build this model and use it" — and then we train it, the part where we fit the model and feed that information in. And of course we scaled the data, which was really important to do. Then you saw we created a confusion matrix: since we're performing classification on our data, we need a confusion matrix to check the results. A confusion matrix breaks
down the various misclassifications as well as the correct classifications, so we can get at the accuracy. You can see here what we did with true positives, false positives, true negatives, and false negatives — and scrolling down to the end, we printed a nice confusion matrix with all four. The blue cells are the ones we want to be the biggest numbers, because those are the correct predictions, and then we have our false predictions: no side effects predicted as side effects, and vice versa. Now, saving and loading models with Keras — we're going to dive into a more complicated demo. You might say, "that was already a lot of complication" — but if you break it down, we randomized some data, created the Keras setup, compiled it, trained it, predicted, and ran our matrix. So now we're going to do something a little more fun: face mask detection with Keras. We're going to build a Keras model to check whether a person is wearing a mask or not, in real time. This might be important at the front of a store — something that today might be very useful for making sure people are safe. So we're going to look at mask versus no mask, and let's start with a little bit on the data. In my dataset I have a number of images of people wearing masks — and again, if you want this data, contact Simplilearn and they can send you the images of people with and without masks so you can try it on your own. And this is just such a wonderful
example of this setup. Before I dive into the mask detection — talking about the current situation with COVID and checking that people are wearing masks — I had to update to Python 3.8 for this particular example. It might run on 3.7; I'm not sure — I kind of skipped 3.7 and installed 3.8 — so I'll be running Python 3.8. You also want to make sure your TensorFlow is up to date, because of what they call functional layers. Remember from earlier: the functional model and functional layers allow us to feed different nodes into different layers and split them — a very powerful tool, very popular right now at the edge of where neural networks are, for creating better models. So I've upgraded to Python 3.8; let's open that up and go through our next example, which includes multiple layers, programming it to recognize whether someone is wearing a mask or not, and then saving that model so we can use it in real time. So we're doing almost a full end-to-end development of a product here — of course this is a very simplified version, and there'd be a lot more to a real one; you'd also have to recognize whether something is actually a face, among other things. Let's jump into the code and open up a new Python 3 notebook. We'll call this train_mask — we're going to train the mask detector and save it (not to be confused with masking data, which is a little different; we're talking about a physical mask on your face). And then, from Keras, we have a lot of imports to do. I'm not going to dig too deep into the imports — we're just going to notice a
few of them. We have our image preprocessing here — let me underline that — which deals with how we bring images in: most images are a square grid where each cell has three values for the three color channels, and Keras and TensorFlow do a really good job of working with that, so you don't have to do all the heavy lifting yourself. We have MobileNet and AveragePooling2D — again, these are how we deal with images and pool them. Dropout is a cool thing worth looking up as you get deeper into Keras and TensorFlow: during training it automatically drops out certain nodes — ones that would otherwise create more bias than help, and add processing time. Then we have Flatten, which takes that big array with the three color channels and flattens it into a one-dimensional array instead of a 2D grid times three channels. Dense and Input we used in the previous example, so those should look a little familiar, along with our Model and our Adam optimizer. We have some preprocessing for the input that goes along with bringing the data in, and more preprocessing with img_to_array and load_img. This stuff is so nice — it looks like a lot of work, importing all these modules, but the truth is it does everything for you: you're not writing the preprocessing, you're letting the library do it. We'll also be converting labels to categorical — that's just a conversion from a number to a category, since a bare 0/1 doesn't really mean anything by itself — and LabelBinarizer is the same idea: we're changing our labels around.
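Since the transcript glosses over what "converting labels to categorical" actually produces, here is a minimal NumPy sketch of the one-hot encoding that to_categorical / LabelBinarizer perform. The label values and class meanings below are illustrative stand-ins, not the video's actual arrays:

```python
import numpy as np

# Hypothetical integer labels: 0 = with_mask, 1 = without_mask
labels = np.array([0, 1, 1, 0, 1])

# One-hot encoding: one row per sample, a 1 in that sample's class column.
# This matches what keras.utils.to_categorical(labels) yields for 2 classes.
one_hot = np.eye(2)[labels]
print(one_hot)
# [[1. 0.]
#  [0. 1.]
#  [0. 1.]
#  [1. 0.]
#  [0. 1.]]

# Recovering the class index is just an argmax back down each row
recovered = np.argmax(one_hot, axis=1)
print(recovered)  # [0 1 1 0 1]
```

This is also why the predictions later in the demo come out as probability pairs that get collapsed with argmax: the model's output mirrors the one-hot shape of the labels.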
And then there's our train_test_split and classification_report, and our imutils — let me scroll down for these, because this is something a little different. These aren't part of TensorFlow or scikit-learn: the paths utility here goes with OpenCV-style image work — we'll have another tutorial on OpenCV, which you'll usually see referenced as cv2, and you'll get a glance at it here, especially in the second piece when we reload the model and hook it up to a video camera. The imutils package handles things like rotating and resizing pictures. Then matplotlib for plotting, because it's nice to have a graph telling us how well we're doing, and of course NumPy for arrays, plus plain os access. Wow — that was a lot of imports; I went through them briefly, but we didn't want to go too much into them. Next I'm going to create some variables we need to initialize: the learning rate, the number of epochs to train for, and the batch size. If you remember, we talked about the learning rate — here it's 1e-4; a lot of times it's 0.001 or 0.0001, usually in that range depending on what you're doing. The epochs value is how many times we go through all the data. I have it set to 2 — the actual setup calls for 20, and 20 works great; the reason I'm using 2 is that it takes a long time to process. One of the downsides of Jupyter is that it isolates execution to a single kernel, so even though I'm on an 8-core processor with 16 dedicated threads, only one thread runs this, no matter what. So it
takes a lot longer to run, even though TensorFlow scales up really nicely. And the batch size is how many pictures we load and process at once — again, numbers you learn to play with depending on your data and what's coming in. The last thing we want to do is set the directory with the dataset we're going to run on, which just has images of masks and no masks. If we go in here you'll see the dataset: a folder of pictures of people with masks on their faces, and then the opposite — let me go back up — a folder without masks. So it's pretty straightforward. They look a bit skewed because they've been formatted into very similar shapes — mostly squares, with some slightly different — and that's an important thing to do with a lot of these datasets: get the images as close to each other as you can. The Keras image-processing layers and importers we brought in up here do such a wonderful job of converting these that we don't have to do much ourselves. So now we're loading the images: we create data and labels — here are the features going in, which are our pictures, and the labels going out — and for each category in the directory listing (which, as I just flashed at you, is face mask or no face mask), we append the image itself and its label, building one big array. You can see this could be an issue if you had more data at some point; thankfully I have 32 gigs of RAM, though you could do with a lot less — probably 16 or even 8 gigs would easily load all this stuff.
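The loading loop described above — walk the class folders, read each image, and append matching data/label entries — can be sketched without the real JPEGs. Everything here is a stand-in: a temporary directory, `.npy` arrays instead of photos (the real code uses load_img / img_to_array), and made-up file names; only the loop structure mirrors the walkthrough:

```python
import os
import tempfile
import numpy as np

# Build a tiny fake dataset on disk: one folder per class, .npy files as
# stand-in "images" sized 224x224x3 (the shape MobileNetV2 expects)
root = tempfile.mkdtemp()
for category in ("with_mask", "without_mask"):
    os.makedirs(os.path.join(root, category))
    for i in range(3):
        fake_img = np.random.rand(224, 224, 3)
        np.save(os.path.join(root, category, f"img{i}.npy"), fake_img)

# The loading loop: append each image and its folder name as the label
data, labels = [], []
for category in os.listdir(root):
    folder = os.path.join(root, category)
    for fname in os.listdir(folder):
        data.append(np.load(os.path.join(folder, fname)))
        labels.append(category)

data = np.array(data)
print(data.shape)           # (6, 224, 224, 3)
print(sorted(set(labels)))  # ['with_mask', 'without_mask']
```

Note how the whole dataset ends up in one in-memory array — which is exactly why the speaker mentions RAM: six fake images are nothing, but thousands of real 224×224×3 images add up fast.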
And there's a conversion going on in here — as I mentioned, we resize all the images so our data is all identical the way it comes in. You can see our labels: without_mask, without_mask, without_mask — the other value being with_mask; those are the two classes going in. Then we need to change the labels to one-hot encoding: we take the labels, make them categorical, and our labels become a categorical list. We'll run that, and if we look at the last ten — labels[-10:], just so we can see what the other side looks like — we see rows like [1, 0], [1, 0], and so on; run it the other way and you see [0, 1]. So one column means with-mask and the other without-mask, matching what we saw above. And as with any data processing like this: if the output represents a thing rather than an actual linear or regression value, a raw 0/1 column causes issues in training, so we always want to use a one-hot encoding. Now we do our split — train_x, test_x, train_y, test_y = train_test_split(...) — making sure it's random, with 20% of the data for testing and the rest for training the model. And here's something that's become so cool when training on images: we can augment the data. What does
augment mean? Well, if I rotate the image, zoom in and out, shear it a little, flip it horizontally, set a fill mode — doing all these different things to the data is kind of like increasing the number of samples I have. If I only have perfect, centered samples, what happens when we only see part of a face, or the face is tilted sideways? All those little shifts cause problems if you train on just a standard, uniform dataset. So we create an augmenter — an ImageDataGenerator — which will rotate, zoom, and do all kinds of cool things; it's worth looking up ImageDataGenerator and all the different features it has. A lot of times I'll leave augmentation out the first time through a model — the "build the fail" idea again: build the whole process first, then start adding these refinements in so you can better train your model. So we run this, and then we load our base model — and you'd probably have gotten an error if you hadn't put this piece in right here; I haven't run it myself, the guys in the back did. We take a MobileNetV2 as our base model, and this is a big thing right here: include_top=False, which leaves off the network's existing classification top so we can construct our own head of the model to place on top of the base. You'll see a warning here which I'm kind of ignoring — it has to do with the input shape and the weights; it's just saying it will switch some things to defaults and auto-shape them for you, which you should be aware of. With this kind of imagery we're already augmenting it by moving it
around and flipping it and doing all kinds of things to it, so that's not a problem here — though with other data, in a different domain, it might be. So we have our base model, and we build head_model = base_model.output, then an AveragePooling2D with pool_size (7, 7), then a Flatten — so this is all processing and flattening the image; the pooling is part of how the image data gets condensed, and we'll look at that a little more when we get down to the lower-level processing. Then we have our Dense layer — we've already talked a little about what a Dense layer is — and then a Dropout of 0.5. Dropout says we're going to drop out a certain fraction of nodes while training — when you actually use the model it uses all the nodes, but during training this drops certain ones out, and it helps stop biases from forming. It's a really cool feature they discovered a while back. Then another Dense layer, this time with softmax activation — there are lots of activation options; softmax is very popular, and so is ReLU, and we could do a whole talk on activation functions and how they work, but when you first start out you'll mostly use ReLU and softmax, because they're some of the basic setups and a good place to start. And then model = Model(inputs=base_model.input, outputs=head_model) — we're still building our model here — and we run that. Next we loop over all the layers in the base model and freeze them so they won't be updated during the first training process: for layer in base_model.layers: layer.trainable = False. That keeps the pre-trained MobileNetV2 weights fixed while only our new head trains. Then we compile the model: the Adam optimizer with an initial learning rate and a decay of the initial learning rate over the epochs, the loss set to binary cross-entropy, and accuracy as the metric — not a huge jump from the previous code. And now we fit the model: train the head of the network, print "training head", and run. I skipped ahead a little, because at about 80 seconds per epoch it takes a couple of minutes to get through on a single kernel. One thing I want you to notice while it finishes: our augmenter is in the loop, so any time train_x and train_y go in, there's randomness jiggling what feeds the network. Of course we're batching too, processing however many we set as the batch size at a time, and we have the steps per epoch — train_x over the batch size — plus the validation data, our test_x and test_y going in. One of the important things to know about validation: when both your training data and your test data show about the same accuracy, that's when you want to stop — it means your model isn't biased. If your accuracy after training is much higher on your training data than on your
actual test data, then something in there probably has a bias and the model is overfitted. That's what the validation data and validation steps are really about here. Let me check whether it's done processing: it looks like we've gone through two epochs. With this amount of data you could run about 20 and get a nicely refined model at the end, but we'll stop at 2 because I really don't want to sit around all afternoon running on a single thread. Now that we've done this, we need to evaluate the model and see how good it is, and to do that we make our predictions on testX to see what it thinks they're going to be, so now it's evaluating the network. Then, remember, the output is one of two classes, wearing a mask or not wearing a mask, so we take the argmax of each prediction to turn it into a 0 or a 1. To finish that off, let me put this right in here and do it all in one shot: we print a nicely formatted classification report so we can see what it looks like. And there we have it: precision is about 97% with a mask, and there's the F1 score and support, about 97% without a mask as well. That's a pretty high score. In practice it means roughly three people without masks will sneak into the store because the model thinks they have one, and three people with masks will get flagged, with the system telling the person at the front, hey, look at this person, they might not have a mask, if it's set up in front of a store.
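The argmax-and-report step can be sketched with NumPy and scikit-learn alone. The probabilities below are toy values standing in for `model.predict(testX)`; the class names mirror the two labels in the text.

```python
import numpy as np
from sklearn.metrics import classification_report

# Each row stands in for one test image: [P(with_mask), P(without_mask)].
pred_probs = np.array([[0.9, 0.1],
                       [0.2, 0.8],
                       [0.7, 0.3],
                       [0.1, 0.9]])

# Collapse the probabilities to a 0/1 class index per image.
pred_idx = np.argmax(pred_probs, axis=1)
true_idx = np.array([0, 1, 0, 1])   # ground-truth labels

print(classification_report(true_idx, pred_idx,
                            target_names=["with_mask", "without_mask"]))
```

The report prints the same precision, recall, F1, and support columns described in the transcript, one row per class.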
Another cool thing: if someone is walking into the store and you take multiple pictures of them, you can average the predictions over those pictures on the back end and make sure they agree. And this next step is important, and honestly I just love doing this stuff: we take our model and save it. model.save, we give it a name, mask_detector.model, and we save it in the H5 format, so the model we just trained has now been saved. Now I can load it up elsewhere and use it for whatever, and if I get more information at some point, I might want to update this model and make a better one. That's true of so many things: maybe I'm running a prediction model that makes money for a company, and as my model gets better I keep updating it, and it's really easy to push that out to the actual end user. Here we also have a nice graph of training loss and accuracy as we go through the epochs. We only ran a couple, but you can see the validation loss, training accuracy, and validation accuracy start switching and coming together, and this is the convergence people talk about; when I work with scikit-learn neural networks, convergence is exactly this, the loss and accuracy curves coming together. As you can see up here, they still haven't fully converged, which would be a cue for me to keep going past two epochs. But what we want to do now is create a new Python 3 program: we just did our train-mask script, so next we're going to import the saved model and use it live.
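The save-and-reload step looks like this in Keras. A tiny stand-in model is used so the sketch is self-contained; the tutorial saves its trained mask detector instead, and the filename here is an assumption.

```python
import os
import tempfile
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Sequential, load_model

# Stand-in model; the tutorial would save its trained mask-detector model.
model = Sequential([Input(shape=(4,)), Dense(2, activation="softmax")])

path = os.path.join(tempfile.mkdtemp(), "mask_detector.h5")
model.save(path)                 # the .h5 extension selects the HDF5 format
reloaded = load_model(path)      # later, in another script: load and predict
```

This is the handoff the transcript describes: train in one script, `load_model` in the deployment script, and ship an updated `.h5` file whenever the model improves.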
You'll get a view of both myself in the afternoon and my office background, which is still mid-reconstruction for another month. We'll call this one the mask detector, and we grab a few imports: the MobileNetV2 preprocess_input, since we still need that; img_to_array from TensorFlow; load_model, which is where most of the action happens; cv2, that is OpenCV, which I'm not going to dig too deeply into, we'll flash a little OpenCV code at you and we actually have a tutorial on it coming out; our NumPy import; imutils, which works alongside OpenCV; and then time and the operating system module. Those are the pieces we set up. Next, and this takes just a moment, we create the function that does all the heavy lifting: detect_and_predict_mask, taking a frame, the faceNet, and the maskNet, which are backed by OpenCV. The frame comes in, and the function tries to detect the face and build a region around it so we know what we'll be processing through our model. There's the frame shape, which is just height versus width, that's all h and w stand for, and then what they call a blob, from cv2.dnn.blobFromImage(frame), which reformats the frame coming in, literally from my camera, and I'll show you in a minute the little piece of code that feeds it in. We pass the blob through the network and obtain the face detections: faceNet.setInput(blob), then detections = faceNet.forward(), and we print the detections' shape. This is the model we're sending the frame into; I'll show you in a second where it is, but it's going to be under faceNet.
Then we initialize our lists of faces, their corresponding locations, and the predictions from our face-mask network, and we loop over the detections. That's a little more work than you might think, because what happens if you have a whole crowd of faces? So we're looping through the detections and their shapes, along with the probability associated with each detection. Here's our confidence: we filter out weak detections by ensuring the confidence is greater than the minimum confidence, and since confidence runs from zero to one, a minimum of 0.5 is probably pretty good. Then we compute the bounding box for the object. If I'm zipping through this, it's because it's mostly OpenCV and I really want to stick to the Keras part, so I'm jumping through this code; you can get a copy of it from Simplilearn and take it apart, or look for the OpenCV tutorial coming out. We create a box around the image, ensuring the bounding box falls within the dimensions of the frame, so we get a box around what we hope is the face. We extract the face ROI and convert it from BGR to RGB channel order; again, that's an OpenCV quirk, not really an issue, just the channel order it uses. I don't know how many times I've forgotten to check the color order when working with OpenCV; all kinds of fun things happen when red becomes blue and blue becomes red. Then we resize and process it: cvtColor to convert the face frame, resize it, img_to_array, preprocess_input, and append the face along with its location.
That was a huge amount, and I skipped over a ton of it, but the bottom line is we're building a box around the face, because OpenCV does a decent job of finding the face, and that box goes in so we can ask: does this person have a mask on? Then finally we get down to predictions = maskNet.predict(faces, batch_size=32): the different images where we've guessed a face generate an array of faces, if you will, and for each one we ask whether it has a mask on. That prediction is the big thing we're working toward, and then we return the locations and the predictions; the locations tell us where on the picture each face is, and the predictions tell us whether it's a mask or not. With all that loaded up, we load our serialized face-detector model from disk. We have the path it was saved in; obviously you'll use a different path depending on where and how you saved yours in the training step. Then we have our prototxt path and weights path, and finally our faceNet: faceNet = cv2.dnn.readNet(prototxtPath, weightsPath).
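The confidence filter and bounding-box clamping described above can be shown without a camera or model. The function below works on an array shaped like the face detector's `faceNet.forward()` output, `(1, 1, N, 7)`, where index 2 holds the confidence and 3:7 the normalized box corners; the helper name and toy values are assumptions for illustration.

```python
import numpy as np

def filter_and_clip_boxes(detections, w, h, min_conf=0.5):
    """Keep detections above min_conf and clamp their boxes to the frame.

    `detections` mimics faceNet.forward() output, shape (1, 1, N, 7):
    index 2 is the confidence, 3:7 the normalized box corners.
    """
    boxes = []
    for i in range(detections.shape[2]):
        confidence = detections[0, 0, i, 2]
        if confidence > min_conf:                        # drop weak detections
            box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
            start_x, start_y, end_x, end_y = box.astype("int")
            # Ensure the bounding box falls within the frame dimensions.
            start_x, start_y = max(0, int(start_x)), max(0, int(start_y))
            end_x, end_y = min(w - 1, int(end_x)), min(h - 1, int(end_y))
            boxes.append((start_x, start_y, end_x, end_y))
    return boxes

# One strong detection and one weak one on a 400x300 frame:
dets = np.zeros((1, 1, 2, 7), dtype="float32")
dets[0, 0, 0, 2], dets[0, 0, 0, 3:7] = 0.95, [0.1, 0.1, 0.5, 0.5]
dets[0, 0, 1, 2] = 0.10                                  # below min_conf
print(filter_and_clip_boxes(dets, 400, 300))  # [(40, 30, 200, 150)]
```

Each surviving box is the crop that gets converted to RGB, resized, preprocessed, and stacked into the `faces` array fed to `maskNet.predict`.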
We load that up, so let me go ahead and run it, and then, right below, because I always hate separating these things, we load the actual mask-detector model from disk, the model we saved earlier, and run that as well. That's all the different pieces we need for our model. The next part is where we open up our video, which is just kind of fun because it's all part of the OpenCV video setup; let me put it all in as one block. We're going to open our video stream, start it, and run it until we're done, and this is where we get some real live-action stuff. This is what I like about working with images and video: it's all right there in front of you, it's visual, and you can see what's going on. So we start our video stream with VideoStream(src=0).start(), which grabs my main camera, print that the video is starting, and loop over the frames from the video stream. While True, we grab a frame from the threaded video stream and resize it to a maximum width of 400 pixels: read the frame from the stream, resize it, and then, remember, our function returns the location and the prediction, so we call detect_and_predict_mask, sending it the frame, the faceNet, and the maskNet, all the different pieces, and it returns our locations
and predictions. Then, for each box and prediction in the locations and predictions, the box, again, is an OpenCV construct giving the two corner points, we unpack the box and the prediction into (mask, withoutMask). We create our label, "Mask" or "No Mask", and a color, for instance (0, 255, 0) if the label is "Mask"; this will make a lot more sense when I hit the run button. We add the probability to the label, display the label and the bounding-box rectangle on the output frame, and show the output with cv2.imshow("Frame", frame). Then key = cv2.waitKey(1), so we just wait for the next frame from the feed, and we keep doing this until we hit the stop key, pretty much. So, are you ready? Let's see if it works. We've distributed our model, loaded it into our deployment code, hooked it into the camera, and now we run it. There it goes; we can see the data coming down here, we wait for the popup, and there I am in my office with my funky headset on. You can see my unfinished wall in the background, and it says up here "No Mask". Oh no, I don't have a mask on! I wonder what would happen if I cover my mouth; you can see my no-mask score drop a little. I wish I'd brought a mask into my office, it's up at the house, but you can see it says there's a 95 to 98% chance I don't have a mask on, and it's true, I don't. And this could be distributed; it's actually an excellent little piece of script that you could install on a video feed, a security camera or something, and you'd have a really neat setup.
It could ask, hey, do you have a mask on, when you enter a store or public transportation or wherever masks are required. Let me go ahead and stop it now. If you want a copy of this code, definitely give us a holler. We'll be going into OpenCV in another video, so I skipped a lot of the OpenCV detail here and really focused on the Keras side: saving the model, loading the model, and processing a streaming video through it, so you can see that the model works. We have an actual working model that hooks into the video camera, which is pretty cool and a lot of fun. So, as promised, we dove in, rolled up our sleeves, and did a lot of coding today: the basic demo above for pulling in Keras, and then a Keras model where we pulled in data to see whether someone was wearing a mask, very useful in today's world as a fully running application. Next, today we're talking about must-have Python AI projects and how to build them, which can really help you sharpen your skills and stand out in the growing field of artificial intelligence. So let's quickly see: what is Python? Python is one of the most popular programming languages for AI because of its simplicity and the powerful libraries it offers, like TensorFlow, Keras, and PyTorch, and building projects with Python is a great way to get started if you want to break into the AI industry. Artificial intelligence is transforming industries like healthcare, finance, and even entertainment, and companies are now looking for experts who know how to apply AI to real-world problems. In this video we'll explore beginner-level to advanced-level projects, designed to give you hands-on experience building intelligent systems, analyzing data, and even automating tasks. So without any further ado, let's start with the beginner-level projects. Number one: fake news detection using machine learning.
In today's world, fake news is a major concern, causing misinformation to spread rapidly across social media and news platforms, and detecting it is crucial to maintaining the integrity of the information we consume. This project aims to build a machine learning model that can identify fake news articles by analyzing their textual content. By learning from existing datasets of real and fake news, the model will be able to classify articles into those two categories, helping media outlets and social media platforms reduce the spread of misinformation. It's a perfect introduction to natural language processing (NLP), since it involves text data manipulation, feature extraction, and supervised learning, and it can also be adapted for real-time use on websites or social platforms to flag suspicious articles and give users more reliable information. So now let's see how to build it. The first step is data collection: use a dataset like LIAR or FakeNewsNet that contains labeled real and fake news articles; you can find datasets on Kaggle. The second step is preprocessing: clean the text by removing stop words, punctuation, and special characters, then tokenize and stem the words using NLTK or spaCy. The third step is feature extraction: use TF-IDF or bag-of-words to convert the text into numerical data for machine learning models. The fourth step is model training: train a classifier like logistic regression, naive Bayes, or random forest on the dataset. The fifth step is evaluation: measure accuracy, precision, recall, and F1 score to determine how well it classifies fake and real news. Tools you can use: NLTK, scikit-learn, and pandas. Skills you will gain: text preprocessing, NLP, and classification models. And if you want us to make a video on this project, please comment down below.
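The five steps above fit in a few lines with scikit-learn. The corpus below is a toy stand-in, not the LIAR or FakeNewsNet data, so the texts and labels are purely illustrative.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-in corpus; a real run would load LIAR or FakeNewsNet instead.
texts = ["scientists confirm vaccine safety in large clinical trial",
         "aliens secretly control the world banks",
         "government report details new infrastructure spending",
         "miracle fruit cures every disease overnight"] * 5
labels = [0, 1, 0, 1] * 5            # 0 = real, 1 = fake

# TF-IDF feature extraction feeding a logistic-regression classifier.
clf = make_pipeline(TfidfVectorizer(stop_words="english"),
                    LogisticRegression())
clf.fit(texts, labels)

print(clf.predict(["miracle fruit cures disease overnight"]))
```

On real data you would add a train/test split and `classification_report` for the evaluation step, exactly as the fifth step describes.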
Number two: image recognition using a CNN. Image recognition is one of the core applications of deep learning and computer vision, used across industries ranging from healthcare to autonomous vehicles. This project guides you through building an image classifier using CNNs (convolutional neural networks), a deep learning architecture designed specifically for image recognition tasks. The goal is a model that can accurately classify images, such as differentiating between cats and dogs. Working on it will give you a solid understanding of the fundamental concepts of CNNs, such as convolutional layers, pooling, and activation functions, and it also teaches essential skills like image preprocessing, dataset handling, and model evaluation, which carry over to more advanced computer vision projects. So now let's see how to build it. First, import a dataset: use something like CIFAR-10 or the Kaggle Cats vs. Dogs set with labeled images. The second step is preprocessing: resize, normalize, and augment the images using libraries like OpenCV or PIL. The third step is model architecture: design a basic CNN with convolutional, pooling, and fully connected layers using Keras or TensorFlow. The fourth step is training: split the dataset into training and validation sets and train the CNN to classify the images. The fifth step is model evaluation: use accuracy, precision, and a confusion matrix to evaluate how well the model predicts the correct class labels. Tools: Keras, TensorFlow, OpenCV, and pandas. Skills you will gain: image preprocessing, CNN architecture, and model evaluation. And if you want a video on image recognition using CNNs, please comment down below.
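The architecture step can be sketched in Keras like this: a basic stack of convolution and pooling layers followed by dense layers, sized here for CIFAR-10-shaped inputs (32x32 RGB, 10 classes); the exact filter counts are assumptions.

```python
from tensorflow.keras import layers, models

# Basic CNN: two conv -> pool blocks, then a small classifier head.
model = models.Sequential([
    layers.Input(shape=(32, 32, 3)),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),   # one output per class
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

Training would then follow the fourth step: `model.fit(train_x, train_y, validation_data=(val_x, val_y))` on the split dataset.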
Next come the intermediate-level projects. First is an AI-based recipe recommendation system. Recommendation systems have become an integral part of modern digital platforms, from e-commerce websites suggesting products to streaming services recommending shows and movies. In this project you will build a recipe recommendation system that suggests dishes based on the ingredients a user has on hand. It demonstrates how recommendation algorithms such as content-based filtering and collaborative filtering can provide personalized suggestions. You will also learn how to preprocess and clean textual data, such as ingredient lists, and implement a machine learning algorithm that matches user inputs against a recipe database. It's an excellent project for understanding how recommendation systems work and how they apply across industries, from food tech to personalized content. So now let's see how to build it. The first step is data collection: use web scraping tools like BeautifulSoup to scrape recipes from websites, or use a dataset like Recipe1M. The second step is preprocessing: normalize and clean the ingredient data by standardizing ingredient names and handling missing values. The third step is the recommendation algorithm: implement content-based filtering and collaborative filtering; content-based filtering matches ingredient lists, while collaborative filtering uses user preferences. The fourth step is matching: use cosine similarity to match the user's ingredients against the recipe ingredients in the dataset. The next step is the interface: create a simple web app using Flask where users can input ingredients and receive recipe recommendations. Tools: BeautifulSoup, pandas, scikit-learn, and Flask. Skills you will gain: web scraping, data cleaning, and recommendation systems. And if you want a video on this project, please comment down below.
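The cosine-similarity matching step can be sketched as content-based filtering over a tiny hypothetical recipe "database"; the recipe names and ingredient lists are invented for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical mini database: each recipe is just its ingredient list.
recipes = {"pancakes": "flour egg milk sugar",
           "omelette": "egg milk cheese butter",
           "salad":    "lettuce tomato cucumber olive oil"}

vec = CountVectorizer()
matrix = vec.fit_transform(recipes.values())

def recommend(user_ingredients):
    """Rank recipes by cosine similarity between the user's ingredients
    and each recipe's ingredient list (content-based filtering)."""
    user_vec = vec.transform([user_ingredients])
    scores = cosine_similarity(user_vec, matrix)[0]
    return sorted(zip(recipes, scores), key=lambda pair: -pair[1])

print(recommend("egg milk butter"))
```

A Flask route would simply call `recommend()` on the submitted ingredient string and render the ranked list, which covers the interface step.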
Number four: a chatbot with sentiment analysis. Chatbots have transformed how businesses and services interact with users, providing 24/7 support and personalized responses. In this project you will build a conversational chatbot that analyzes the sentiment behind user inputs and responds accordingly. By incorporating sentiment analysis, the chatbot understands not only the content of the user's messages but also the emotional tone, such as whether the user is happy, frustrated, or neutral, which allows it to adjust its tone and responses to improve user satisfaction; for example, it could offer a more empathetic response when it detects negative sentiment. This project gives you hands-on experience building a conversational AI system while integrating machine learning techniques like sentiment analysis, and the skills you develop apply to customer service, healthcare, education, and more. So now let's see how to build it. The first step is the chatbot framework: use tools like Dialogflow or Rasa to build a conversational chatbot capable of handling various user intents. The second step is sentiment analysis: integrate a pre-trained model like VADER or BERT. The third step is conversational flow: adjust the chatbot's responses based on the sentiment, positive, negative, or neutral, detected in the user's input. The fourth step is integration and deployment: build an interface, a website or messaging platform, where users can interact with the chatbot in real time, and deploy it so users can engage with it and receive sentiment-aware responses. Tools: Dialogflow, Rasa, VADER, Transformers, and Flask. Skills you will gain: sentiment analysis, chatbot deployment, and conversational AI. And if you want a video on this project, please comment down below.
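The sentiment-aware response logic of the third step can be sketched as below. Note the scorer here is a tiny keyword-based stand-in so the example is self-contained; in the real project, a pre-trained model such as VADER or a BERT classifier would replace `score_sentiment`.

```python
# Tiny rule-based stand-in for a real sentiment model (VADER / BERT
# would replace score_sentiment in practice).
NEGATIVE = {"angry", "frustrated", "terrible", "broken", "hate"}
POSITIVE = {"great", "love", "thanks", "happy", "awesome"}

def score_sentiment(text):
    words = set(text.lower().split())
    return len(words & POSITIVE) - len(words & NEGATIVE)

def respond(user_message):
    """Adjust the chatbot's tone based on the detected sentiment."""
    score = score_sentiment(user_message)
    if score < 0:
        return "I'm sorry you're having trouble. Let me help you fix this."
    if score > 0:
        return "Glad to hear it! Anything else I can do?"
    return "Got it. How can I help?"

print(respond("this app is terrible and I'm frustrated"))
```

The same three-way branch is what a Rasa custom action or Dialogflow webhook would implement, just with a real model's score.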
Now let's see some advanced-level projects. First: AI-powered image colorization. Image colorization is a fascinating application of deep learning that transforms black-and-white images into color by predicting and applying realistic colors to grayscale images. This project explores how CNNs and GANs can learn a mapping between grayscale and color images: you gather a dataset of color images, convert them to grayscale, and train a model to generate the color versions. It's especially useful in areas such as film restoration, photography, and artistic creation, where colorization can breathe new life into old black-and-white images. Moreover, it highlights the power of deep learning in understanding and generating complex visual data, giving you insight into how these models work for tasks like image generation and video prediction, and the skills transfer to other creative AI applications like style transfer and image synthesis. So now let's see how to build it. The first step is data collection: take a dataset of color images, convert them to grayscale, and use the grayscale images as inputs while training the model to output the colorized versions. The second step is preprocessing: normalize the pixel values and resize the images for training. The third step is model architecture: implement a U-Net or a generative adversarial network (GAN), both well suited to image generation tasks like colorization. The fourth step is training and evaluation: train on grayscale inputs with color targets, using mean squared error as the loss for guidance, and evaluate with visual inspection and peak signal-to-noise ratio (PSNR). The last step is deployment: create a web interface where users can upload black-and-white images and get them colorized. Tools: TensorFlow, Keras, OpenCV, and Flask. Skills you will gain: deep learning, CNNs, GANs, and image preprocessing. And if you want a video on AI-powered image colorization, please comment down below.
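The data-collection step, turning each color image into a (grayscale input, color target) training pair, can be sketched in NumPy. The helper name is an assumption; the luminance weights are the standard RGB-to-gray coefficients.

```python
import numpy as np

def make_colorization_pair(color_image):
    """Build one (input, target) training pair for a colorization model:
    the grayscale version is the input, the original color image the
    target. Uses the standard luminance weights for RGB -> gray."""
    gray = color_image @ np.array([0.299, 0.587, 0.114])   # (H, W)
    return gray[..., np.newaxis], color_image              # add channel axis

rgb = np.random.rand(64, 64, 3).astype("float32")          # stand-in image
x, y = make_colorization_pair(rgb)
print(x.shape, y.shape)   # (64, 64, 1) (64, 64, 3)
```

A U-Net or GAN generator is then trained to map the one-channel `x` back to the three-channel `y`, with MSE (or an adversarial loss) measuring the gap.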
And last, we have object detection using YOLO (You Only Look Once). Object detection is one of the most popular computer vision applications, allowing machines to recognize and locate multiple objects within an image or video stream in real time, and YOLO is a state-of-the-art object detection algorithm known for its speed and accuracy. This project involves building a real-time object detection system capable of identifying multiple objects in images or video feeds and drawing bounding boxes around them. Object detection has widespread use in fields such as security surveillance, autonomous driving, and augmented reality, where systems need to understand and interact with their surroundings in real time. Working on this project, you will learn how to preprocess image data, format bounding-box labels, and train a YOLO model on a dataset like COCO or Pascal VOC, and you'll gain valuable experience deploying detection systems that process video streams, giving you the skills to build applications in dynamic environments, from traffic monitoring to industrial robotics. So now let's see how to build it. First, import a dataset: use COCO or Pascal VOC, which contain labeled objects in images with bounding boxes. The second step is preprocessing: resize the images and normalize pixel values, ensuring the bounding-box labels are appropriately formatted. The third step is model architecture: use YOLO, which splits each image into a grid and predicts bounding boxes and class probabilities for every cell. The fourth step is training and evaluation: train the YOLO model on the labeled data using a framework like Darknet, and evaluate with metrics like intersection over union (IoU) and mean average precision (mAP). The last step is deployment: build a system that processes video streams in real time, detecting objects and drawing bounding boxes around them. Tools: OpenCV, TensorFlow, and Darknet. Skills you will gain: object detection, the YOLO architecture, and real-time video processing. And if you want a video on this project, please comment down below.
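The IoU metric named in the evaluation step is simple enough to write out directly; boxes are taken here as (x1, y1, x2, y2) corner tuples.

```python
def iou(box_a, box_b):
    """Intersection over Union between two (x1, y1, x2, y2) boxes, the
    overlap metric used to evaluate detection models like YOLO."""
    # Corners of the intersection rectangle (empty if boxes don't overlap).
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175, about 0.143
```

A predicted box typically counts as correct when its IoU with a ground-truth box exceeds a threshold such as 0.5, and mAP aggregates those decisions across classes and thresholds.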
In conclusion, these Python AI projects not only help you build hands-on skills but also provide a solid foundation for advancing your career in artificial intelligence. Whether you are working on fake news detection, image recognition, or developing advanced tools like chatbots and object detection systems, these projects offer the real-world applications companies are looking for. Start small, keep learning, and as you complete each project you'll be better prepared to take on the challenges of the growing AI field. Now, imagine this: you're using a calculator app on your phone, and it gives you the answer to a complex math problem faster than you can blink. Pretty standard, right? But what if, instead of just crunching numbers, that app could actually think through the problem, breaking it down like a human would, considering the best approach, and even explaining why it made certain decisions? Sounds futuristic, doesn't it? Well, we're not too far from that reality. Today we're diving into OpenAI's latest project, code-named Strawberry, a new AI model that's pushing the boundaries of reasoning and problem solving. In this video we'll break down what makes Strawberry special, how it works, and why it could change the game for AI systems moving forward. So first off, what exactly is Strawberry? According to recent reports, OpenAI is preparing to release this new AI model in the next couple of weeks, and it's set to improve on things like reasoning and problem solving. Previously known as Q* (Q-star), this model is designed to be much better at thinking through problems than previous versions. So what makes Strawberry different from what we've used before? One of the coolest things about it is that it uses something called system-two thinking. The idea comes from the famous psychologist Daniel Kahneman and refers to a more careful, slow way of thinking, like when you really focus on solving a tricky problem. Instead of answering questions instantly, Strawberry reportedly takes about 10 to 20 seconds to process its thoughts, and this extra time helps it avoid mistakes and give more
accurate answers. But the model doesn't just think slowly; it has some really interesting abilities that make it stand out, so let's talk about those. Strawberry is built to handle advanced reasoning and solve mathematical problems, areas where AI systems have traditionally struggled, by breaking complex problems down step by step. And here's something interesting: it might even be added to future versions of ChatGPT, possibly under a model name like Orion or GPT-5, and if that happens, ChatGPT could become smarter and more reliable at solving tough problems. Now, here's where it gets really fascinating: there's some research that might help us understand how Strawberry improves its thinking. You might have heard of STaR, which stands for Self-Taught Reasoner, a method where an AI can actually teach itself to think better. Here's how it works: STaR starts with a few examples where the AI is shown how to solve a problem step by step; then the AI tries solving problems on its own, getting better as it goes by looking at its mistakes and learning from them. This could be what's happening with Strawberry: it may be using a similar method to teach itself to reason better and solve complex problems. But the AI isn't just thinking better; it's also learning to break down problems in a very human-like way, so let's explore that next. Strawberry uses something called chain-of-thought reasoning: basically, when faced with a complex problem, it breaks it down into smaller, manageable steps, kind of like how we work through a puzzle. Instead of jumping straight to an answer, it takes time to go through each step, making the solution more understandable and accurate, which is especially useful in math, where Strawberry is expected to be really strong. With all this potential, what does the future hold for AI models like Strawberry? Let's wrap up with a look at what's next.
So what's next for OpenAI? Well, Strawberry is just the beginning. There's talk of a future model called Orion, which could be the next big version after GPT-4 and GPT-4o, and it may even use what Strawberry learns to get better at solving problems. But here's the thing: training these advanced models is expensive; training GPT-4, for example, reportedly cost over $100 million. And even though OpenAI's Sam Altman has said the era of just making bigger models is coming to an end, it's clear that models like Strawberry are focused on becoming smarter and more efficient. So what does all of this mean for the future of AI and how we use it? Strawberry could represent a huge leap in AI's ability to reason and solve complex problems. With its focus on slower, more deliberate thinking and its potential connection to the STaR method, it's paving the way for smarter, more reliable AI systems, and this is just the start; as we move forward to models like Orion, the possibilities are endless. And that's a wrap on OpenAI's exciting new model, Strawberry. It's clear this AI could bring major advancements in reasoning and problem solving, and we can't wait to see how it all unfolds. What are your thoughts on Strawberry? Do let us know in the comments section below. Today we're diving into the fascinating world of Google Quantum AI. We'll break it down step by step: what Google Quantum AI is, how it differs from classical computers, why it's a game changer, and the real problems it's solving. We'll also explore the latest developments, the innovative hardware, the challenges they face, and why, despite the hurdles, it's still an incredibly exciting field with a bright future. Stick with me, because by the end you'll be amazed at how this technology is shaping tomorrow. So let's get started. The universe operates on quantum mechanics, constantly adapting and evolving to overcome the hurdles it encounters, and quantum computing mirrors that dynamic nature: it doesn't just work within its environment, it responds to it. This unique trait opens the door to
groundbreaking solutions for tomorrow’s toughest challenges the question arises what is Google Quantum AI Quantum AI is Google’s leap into the future of computing it’s a cuttingedge project where they are building powerful quantum computers and exploring how these machines can solve problems that traditional computers struggle with or can’t solve at all if not aware classical computers use bits like zero or one and solve tasks step by step great for everyday use now quantum computers use cubits which can be zero one or both simultaneously allowing them to solve complex problems much faster so think of Google Quantum AI like you’re trying to design a new medicine to fight a disease a regular computer would analyze molecules step by step which could take years but go Quantum AI on the other hand can simulate how molecules interact at the quantum level almost instantly this speeds up drug Discovery potentially saving millions of lives by finding treatments faster now you must be wondering why is it so necessary Google Quantum AI is necessary because some problems are just too big and complex for regular computers to solve efficiently these are challenges like developing life-saving medicines creating unbreakable cyber security optimizing Traffic systems or even understanding How the Universe works regular computers can take years or even centuries to crack these problems while quantum computers could solve them in minutes or hours so the question is actually what problems they’re solving it is basically solving so many problems I will list some of them number one drug Discovery simulating molecules to find new treatments faster then comes cyber security developing Ultra secure encryption systems to keep your data safe AI advancements training AI models much quicker and with more accuracy climate modeling understanding climate changes to create better solutions for global warming so in simple terms Google Quantum AI is here to tackle The Impossible problems and bring 
futuristic solutions to today's challenges. It's like upgrading the world's brain to think smarter and faster. Google Quantum AI has been at the forefront of quantum computing advancements, pushing boundaries from the groundbreaking Sycamore processor to the latest innovation, Willow. In 2019, Google introduced Sycamore, a 53-qubit processor that achieved something called quantum supremacy. Qubits, or quantum bits, are the core of quantum computers. Unlike regular bits, which are either zero or one, qubits can be zero, one, or both at once. This is called superposition, and it allows quantum computers to process vast amounts of data simultaneously. They are powerful but fragile, needing precise control, and they hold the key to solving complex problems. Sycamore solved a problem in just 200 seconds that would take the world's fastest supercomputer over 10,000 years. This was a big moment: it showed quantum computers could do things that classical computers couldn't. After Sycamore, scientists realized a key issue: quantum computers are very sensitive to errors, and even small disturbances can mess up calculations. To fix this, Google started working on error correction, making their systems more accurate and reliable for real-world use. In 2024, Google launched Willow, a 105-qubit processor. This chip is smarter and more powerful, and it can correct errors as they happen, so Willow shows how much closer we are to building quantum computers that can solve practical problems. Google's logical qubits have reached a huge breakthrough: they now operate below the critical quantum error correction threshold. Sounds exciting, right? But what does this mean? Let's break it down. Quantum computers use qubits, which are very powerful but also very fragile: they can easily be disrupted by noise or interference, causing errors. To make quantum computers practical, they need to correct these errors while running complex calculations. This is where logical qubits come in: they group multiple physical qubits to create a more stable and reliable unit for computing. The error correction threshold is like a magic line: if errors can be corrected faster than they appear, the system becomes scalable and much more reliable. By getting their logical qubits to operate below this threshold, Google has shown that their quantum computers can handle errors effectively, paving the way for larger and more powerful quantum systems. So let's discuss the hardware approach at Google Quantum AI that made this possible. Google Quantum AI's hardware approach focuses on making quantum computers stable and reliable for practical use. They group qubits, the building blocks of quantum computers, to work together, allowing the system to fix errors as they happen, and by keeping the chips at extremely cold temperatures, they reduce interference, which keeps the calculations accurate. This setup helps the system handle bigger and more complex tasks, like simulating molecules for drug discovery, improving AI models, and creating stronger encryption for data security. It's a big step in making quantum computing a tool for solving real-world problems. While Google Quantum AI has achieved incredible milestones, it still faces some key limitations. Fragile qubits: qubits are extremely sensitive to noise and interference, which can cause errors, and keeping them stable requires ultra-cold temperatures and precise control. Error correction challenges: although Google has made progress in fixing errors, quantum error correction still isn't perfect and needs more work before quantum computers can scale to solve real-world problems reliably. Limited applications: right now, quantum computers are great for specialized problems like optimization and simulation, but for everyday computing tasks, classical computers are still better. Hardware complexity: building and maintaining a quantum computer is incredibly expensive and complicated, and the advanced cooling systems and infrastructure make it hard to expand these systems widely. Still in early stages: quantum computers,
including Google's, are still in the experimental phase; they're not yet ready for large-scale practical use in industries. But despite its challenges, Google Quantum AI is paving the way for a future where quantum computing tackles problems that regular computers can't handle, like finding new medicines, predicting climate change, and building smarter AI. It's an exciting start to a whole new era of technology, full of possibilities we are just beginning to explore. The future of Google Quantum AI is incredibly exciting, with the potential to solve real-world problems that traditional computers can't handle. It's set to revolutionize industries like healthcare, by speeding up drug discovery; finance, through advanced optimization; and energy, with better materials modeling. Quantum AI could also lead to breakthroughs in AI by training smarter models faster, and to unbreakable encryption for stronger data security. As Google improves its hardware and error correction, its quantum systems will become more powerful and reliable, paving the way for large-scale practical applications. The possibilities are endless, and Google Quantum AI is at the forefront of shaping a transformative future.

Artificial intelligence, or AI, is transforming our world, making things faster and more efficient. But what happens when AI makes mistakes? When AI is biased, it can have serious consequences for companies and for people's lives. Imagine missing out on a job, being wrongly identified in a photo, or being unfairly treated, all because a computer program made a bad decision. These mistakes don't just harm individuals; they can affect entire communities without anyone realizing it. AI bias, also called algorithmic bias, happens when AI systems unintentionally favor one group over another. Take healthcare, for example: if the data used to train an AI system doesn't include enough women or people from minority groups, the system might not work as well for them. This can lead to incorrect medical predictions, like giving Black patients less accurate results than white patients. In job hiring, AI can unintentionally promote certain stereotypes, like when job ads use terms like "ninja," which may attract more men than women even though the term isn't a requirement for the job. Even in creative areas like image generation, AI can reinforce biases: when asked to create pictures of CEOs, AI systems often show mostly white men, leaving out women and people of color. In law enforcement, AI tools sometimes rely on biased data, which can unfairly target minority communities. So in this video we will explore some well-known examples of AI bias and how these mistakes are impacting people and society, from healthcare to hiring and even criminal justice. AI bias is something we need to understand and fix, so let's dive in and learn how these biases happen and what can be done to stop them. Without any further ado, let's get started.

So what is AI bias? AI bias, also called machine learning bias, happens when human biases affect the data used to train AI systems, causing unfair or inaccurate results. When AI bias isn't fixed, it can hurt a business's success and prevent some people from fully participating in the economy or society. Bias makes AI less accurate, which reduces its effectiveness; businesses may struggle to benefit from systems that give unfair results, and scandals from AI bias can lead to a loss of trust, especially among groups like people of color, women, people with disabilities, and the LGBTQ community. AI models often learn from data that reflects society's biases, and this can lead to unfair treatment of marginalized groups in areas like hiring, policing, and credit scoring. As the Wall Street Journal notes, businesses still find it challenging to address these widespread biases as AI use grows. Moving forward, let's look at some sources of AI bias. Distorted outcomes can negatively affect both organizations and society as a whole, so here are some common forms of AI bias. The first one is algorithm bias: if the problem or question is not well defined, or the feedback provided to the machine learning algorithm is inadequate, the results may be inaccurate or misleading. The second one is cognitive bias: since AI systems rely on human input, they can be affected by unconscious human biases, which may influence either the dataset or the model's behavior. The third one is confirmation bias: this occurs when the AI overly depends on existing beliefs or trends in the data, reinforcing prior biases and failing to detect new patterns or trends. The fourth one is exclusion bias: this arises when important data is omitted from the dataset, often because the developer overlooked new or crucial factors. The fifth one is measurement bias: this stems from incomplete data, such as when a dataset fails to represent the entire population. For instance, if a college analyzed only graduates to determine success factors, it would overlook the reasons why other students dropped out. Moving forward, let's see how to avoid bias. Here is a checklist of six process steps that can keep AI programs free of bias. The first one: choose the right model; ensure diverse stakeholders select the training data in supervised models, and integrate bias detection tools in unsupervised models. The second one: use accurate data; train AI with complete, balanced data that reflects the true demographics. The third one: build a diverse team; a varied team, including innovators, creators, implementers, and end users, helps spot biases. The next one: watch data processing; bias can appear during any phase of processing, so stay vigilant throughout. The fifth one: monitor regularly; continuously test models and have independent assessments to detect and fix biases. And the last one: check infrastructure; ensure technological tools and systems are functioning properly to avoid hidden biases. So the conclusion is that AI bias poses serious challenges by amplifying existing societal biases, affecting individuals and businesses. From healthcare to hiring, AI systems can unintentionally reinforce
stereotypes and inequalities. Imagine you are managing a global supply chain company where you have to handle orders, shipments, and demand forecasting, but unexpected issues arise, like stock shortages, transport delays, and changes in demand. Instead of relying on manual adjustments, what if an AI agent could handle everything automatically? This AI wouldn't just suggest actions; it would decide, execute, and continuously improve its strategies. That's the power of agentic AI. With that said, I welcome you all to today's tutorial on what agentic AI is. Let us start by understanding the first wave of artificial intelligence, which was predictive analytics, or we could say data analytics and forecasting. What exactly happened? Predictive AI focused on analyzing historical data, identifying patterns, and making forecasts about future events. These models do not generate any new content; instead, they predict outcomes based on statistical models and machine learning. Now, technically, how did this work? Basically, we had an ML model taking structured data, which could be past user activity, transaction records, or sensor readings. For example, consider Netflix users' watch history: movie genres, watch time, and user ratings. After this, we did feature engineering, or pre-processing: in the feature engineering process we extracted key features like user watch-time trends, preferred genres, and watch frequency, and we could also apply scaling, normalization, and encoding techniques to make the data more usable for the ML model. Then we used ML models, which could be time series forecasting models like ARIMA or LSTM, to predict future movie preferences based on the historical data, and in the output, Netflix's AI recommends new shows or movies based on similar user patterns. So this is how the Netflix model worked, incorporating the machine learning model, and that was the first wave of AI.

Now let us discuss the second wave of AI, which was content creation and conversational AI. LLMs like ChatGPT became very popular during this second wave. What exactly was happening? Generative AI was taking input data and producing new content such as text, images, videos, or even code. These models learn from patterns in large datasets and generate human-like outputs. Let's understand a bit of how this technology works. First there is data input, basically a prompt from the user: suppose in ChatGPT we give a new prompt such as "write an article on AI". After this comes tokenization and pre-processing: the input text is split into smaller parts, so here you have "write" as one token, "an" as the next, and so on for the other words. Then these tokens are converted into word embeddings, numerical vectors that represent words in a high-dimensional space, and then we perform neural network processing. Here the LLM processes the input using attention mechanisms, with models like GPT-4, BERT, and LLaMA; with the help of self-attention layers, they understand the context and predict the next word. As a result, you get an output, and that was our generative AI phase, the second evolution of AI.

Now let's talk about the third wave, which is agentic AI, or autonomous AI agents. What is this? Agentic AI goes beyond text generation: it integrates decision making, action execution, and autonomous learning. These AI systems don't just respond to prompts; they independently plan, execute, and optimize processes. You could understand it like this. The first step is the user input, or receiving a goal: the user provides a high-level instruction, for example "optimize warehouse shipments for maximum efficiency". Unlike generative AI, which would just generate text, agentic AI executes real-world actions. After the prompt, the next step is querying the databases: the AI pulls real-time data from multiple sources. It could be a traditional database like SQL or NoSQL, fetching inventory levels or shipment history; a vector database, retrieving unstructured data like past customer complaints; or external APIs, connecting to forecasting services, fuel price APIs, or supplier ERP systems. The third step is LLM decision making. After querying the databases, the AI agent processes the data through an LLM-based reasoning engine, applying decision rules like: if inventory is low, then automate supplier restocking orders; if shipment cost has increased, then reroute shipments through cheaper vendors; and if weather conditions impact the route, then adjust the delivery schedules. Now you can see how agentic AI behaves in the decision-making process. The next step is action execution via APIs: the AI executes tasks without human intervention, triggering an API call to reorder stock from a supplier, updating the warehouse robot workflows to prioritize fast-moving products, or even sending emails and notifications to logistics partners about the upcoming changes. And after this, finally, it is continuously learning, which is the data flywheel. The AI monitors the effectiveness of its actions, like whether the restocking was efficient or whether rerouting shipments reduced costs, and the data flywheel continuously improves future decisions; basically, it uses reinforcement learning and fine-tuning to optimize its logic.

Now let's have a quick recap comparing these three waves of AI. Predictive AI's main focus was forecasting trends, generative AI was creating content, and agentic AI, the final stage right now, is making decisions and taking action, so you can see how the evolution of AI happened across these stages. If you look at the learning approach, predictive AI analyzed historical data, generative AI learned from patterns, for example for text and image generation, but agentic AI uses reinforcement learning, or self-learning, to improve. If we look at user involvement, in predictive AI the human asks for a forecast, and in generative AI the human is giving the prompts,
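The if/then decision rules described for the warehouse example can be sketched as a small rule layer that maps observed state to actions. This is a toy illustration only: the field names (`inventory_pct`, `budgeted_cost`, `weather_alert`) and the 20% threshold are assumptions made up for the example, not part of any real system.

```python
# Toy rule-based decision layer for the warehouse agent example.
# A real agentic system would combine rules like these with
# LLM-based reasoning over live data.

def decide_actions(state):
    """Map an observed supply-chain state to a list of actions."""
    actions = []
    if state["inventory_pct"] < 20:  # inventory is low -> restock
        actions.append("reorder_stock_from_supplier")
    if state["shipment_cost"] > state["budgeted_cost"]:  # cost up -> reroute
        actions.append("reroute_via_cheaper_vendor")
    if state["weather_alert"]:  # weather impacts route -> reschedule
        actions.append("adjust_delivery_schedule")
    return actions
```

For example, a state with 10% inventory and shipment costs over budget would trigger both a restock and a reroute, while a calm state with only a weather alert would trigger just the schedule adjustment.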
but in agentic AI, the prompts, or the intervention of human input, have become minimal. If you look at the technology, predictive AI used machine learning and time series analytics, generative AI used Transformers like GPT, LLaMA, and BERT, and agentic AI uses LLMs plus APIs plus autonomous execution. We have discussed in a short way how this workflow works, and moving ahead we are also going to walk through an example of how agentic AI works step by step. Based on examples, you can think of predictive AI as the Netflix recommendation model, generative AI as ChatGPT writing articles, and agentic AI as AI incorporated into supply chains, working things out on its own. I hope you now have a brief idea of the three waves of AI. Now let us move ahead and understand the exact difference between generative AI and agentic AI.

Let us first dive into what exactly generative AI is. As you can see, generative AI models take an input query, process it using an LLM, or large language model, and return a static response without taking any further action. In this case, for example, a chatbot like ChatGPT takes input from the user. As I showed you earlier, suppose I give an input like "write a blog post on AI in healthcare". When this query goes to the large language model, the model tokenizes the input query, retrieves relevant knowledge from its training data, and generates text based on patterns. We give the prompt, the LLM processes it, and then we get the output. That is basically how generative AI works. Here you have the GPT models, DALL·E, and Codex; these are some amazing generative AI models. Now let us discuss DALL·E, which is a realistic image generation model by OpenAI. It is part of the generative AI category, alongside GPT, which is for human-like language creation, and Codex, which can be used for advanced code generation. DALL·E is a deep learning model designed to generate realistic images from text prompts, and it can create highly detailed and creative visuals based on descriptions provided by users. Some of the aspects of DALL·E: text-to-image generation, where users can input text prompts and DALL·E generates unique images based on those descriptions; high realism, as it can produce photo-realistic images, artistic illustrations, and even surreal or imaginative visuals; and customization and variability, allowing variations of an image and edits based on text instructions and multiple styles. So this tool is also part of the generative AI family, and it plays an amazing role. I will show you one example of how generative AI works for image generation. As you can see, I have opened this generative AI tool, DALL·E. Let us give a prompt to DALL·E and see how the image is generated. Let's say we want a futuristic city at sunset filled with neon skyscrapers, with flying cars and holographic billboards; the streets are bustling with humanoid robots and people wearing high-tech gear. Now let us see how DALL·E creates the image; this is how generative AI works, so let's wait a few seconds for the output to come up. Now you can see that this image was generated by AI: based on the prompt we gave, we got this output. This is one of the amazing generative AI tools you can explore.

Now let us discuss agentic AI, or autonomous decision making and action execution. You can see in this diagram that agentic AI, unlike generative AI, is not just generating responses but also executing tasks autonomously based on the given query. For example, take AI managing a warehouse inventory. Suppose we want to optimize the warehouse shipments for the next quarter. First, the agent receives its goal, and then the AI agent queries the external data sources, for example your inventory databases or a logistics API, and it retrieves real-time inventory levels and the demand forecast. Then it makes autonomous decisions, and the resulting output is kept under observation by the agent: it analyzes the current warehouse stock and product demand for the next quarter, checks the suppliers' availability, and automates restocking if inventory is below the given threshold. For example, based on the
output we get here, we could see something like this: current inventory level at, say, 75% capacity; demand forecast, say a 30% increase expected in quarter two; and reordering initiated. That is the kind of output we get in the supply chain management example. As we saw, in generative AI the user gives the input prompt and the LLM generates the output, but agentic AI takes action beyond just generating text. In this scenario it is querying the inventory databases, automating the purchase orders, selecting the optimal shipping providers suitable for the given company, and continuously refining its strategies based on real-time feedback. Let's recap once more. In terms of function, generative AI is concerned with producing written or visual content, and it can even write code from pre-existing input, but agentic AI is all about decision making and taking actions toward a specific goal; it is focused on achieving objectives by interacting with the environment and making autonomous decisions. Generative AI relies on existing data to predict and generate content based on patterns it learned during its training phase, but it does not adapt or evolve from its experiences, whereas agentic AI is adaptive: it learns from its actions and experiences, improving over time by analyzing feedback and adjusting its behavior to meet objectives more effectively. With generative AI, human input is essential in the prompt, so that it can go into the LLM and generate the output based on your prompt. Once you set up agentic AI, it requires minimal human involvement; it operates autonomously, making decisions and adapting to changes without continuous human guidance, and it can even learn in real time. That's the beauty of agentic AI. We have given one example of generative AI, basically giving a prompt to ChatGPT or DALL·E, and one example of agentic AI could be your supply chain management system. Now let us deep dive into the technical aspects of how agentic AI works.

There is actually a four-step process behind how agentic AI works. The first step is perceiving, where we gather and process information from databases, sensors, and digital environments. The next step is reasoning: with a large language model as the decision-making engine, it generates solutions. The third step is acting: integrating with external tools and software to autonomously execute the given task. And finally there is learning: continuously improving through the feedback loop, also known as the data flywheel. Let us explore each step one by one. Perceiving is the first step, where agentic AI steps up. In perception, the AI collects data from multiple sources. This data could come from databases, your traditional, vector, and graph databases; from APIs, fetching real-time information from external systems; from IoT sensors, for real-world applications like robotics and logistics; and also from user inputs, like text commands, voice commands, or chatbot interactions. Now how does this work technically? Let's put it all together. The first part of perceiving is data extraction, where the AI agent queries structured databases like SQL or NoSQL for relevant records, and uses vector databases to retrieve semantic data for context-aware responses, for example past complaints it is trying to find. Next, after data extraction, comes feature extraction and pre-processing, where the AI filters the relevant features from the raw data; for example, a fraud detection AI scans transaction logs for anomalies. The third part is entity recognition and object detection: the AI uses computer vision to detect objects in images, and applies named entity recognition, a technique to extract critical terms from text. So we have a three-part process in perceiving: first data extraction, second feature extraction and pre-processing, and third entity recognition and object detection. Let us take a very simple example, an AI-based customer support system. Consider an agentic AI assistant for customer service. Say a customer asks, "Where is my order?" The AI queries multiple databases: the e-commerce order database to retrieve the order status, the logistics API to track the real-time shipment location, and the customer interaction history to provide a personalized response. The result is that the AI fetches the tracking details, identifies any delays that are happening, and suggests the best course of action.
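The perceiving step in the customer-support example can be sketched as a function that aggregates context from several sources before the reasoning step runs. This is an illustrative sketch only: the three query functions are hypothetical stubs standing in for a real order database, logistics API, and interaction-history store.

```python
# Sketch of the "perceive" step: gather context from multiple sources
# into one observation the reasoning engine can work with.
# All data-source functions below are hypothetical stubs.

def query_order_db(order_id):
    # Stub for the e-commerce order database.
    return {"order_id": order_id, "status": "shipped"}

def query_logistics_api(order_id):
    # Stub for a real-time shipment-tracking API.
    return {"location": "regional hub", "delayed": True}

def query_interaction_history(customer_id):
    # Stub for past customer interactions.
    return {"tone": "frustrated", "prior_tickets": 2}

def perceive(order_id, customer_id):
    """Aggregate context from all sources into one observation dict."""
    obs = {}
    obs.update(query_order_db(order_id))
    obs.update(query_logistics_api(order_id))
    obs["history"] = query_interaction_history(customer_id)
    return obs
```

The point of the sketch is the shape of the step: perception fans out to several independent sources and merges the results, so that later steps (reasoning, acting) operate on one consolidated view rather than raw, scattered records.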
now uh the next step is reasoning okay now ai’s understanding and decision making and problem solving is making agentic AI very greater so here what is exactly happening like once the AI has perceived the data now it should start reasoning it okay so the LM model acts as a reasoning engine you know orchestrating AI processes and integrating with specialized models for various function so if you talk about the key components uh like here used in the reasoning it could be llm based decision making so AI agents could use models like llms like gb4 Cloud llama to interpret a user intent and generate a response it is basically coordinating with smaller AI models for domain specific task like it could be like Financial prediction or medical Diagnostics so these could be uh you know the given an example then it is using uh retrieval augmented generation or r model okay to with the help of which AI is enhancing the accur you know by retrieving any propriety data from the company’s databases for example like instead of relying on gbt 4’s knowledge the AI can fetch company specific policies to generate the accurate answers so this could be the one and uh in in the reasoning the final step is AI workflow and planning so it is a multi-step reasoning where AI is breaking down complex task into logical step for example like if asks to automate a financial report AI is retrieving the transaction data and analyzing the trend and it is formatting the results Al so for example you could use this in uh Supply Chain management suppose consider there is a logistics company which is using the agentic AI to optimize what could be the you know uh shipping routes you know so a supply chain manager requesting the AI agent to find the best shipping route to reduce the delivery cost so the AI processes realtime fuel prices traffic conditions and weather report so using llm Plus data retrieval it finds out the optimized routs and selects the cheapest carrier result you get is that AI chooses 
Here the cost is reduced and efficiency improves. So that is one of the use cases: after perceiving, you get reasoning.

Now let us move ahead and discuss the third step, which is act. In this step the AI takes autonomous actions. Unlike generative AI, which stops at generating content, agentic AI takes real-world action. How does the AI execute tasks autonomously? First, through integration with APIs and software: the AI can send automated API calls to business systems, for example reordering stock through a supplier's API when inventory levels run low. Second, it can automate workflows: the AI executes multi-step workflows without human supervision, for instance handling insurance claims by verifying documents, checking policies, and approving payouts. Finally, the AI operates within predefined business rules to prevent unauthorized actions; work on ethical AI is moving in this direction. For example, the AI can automatically process claims up to, say, $10,000, but require human approval for higher amounts. With rules like these, agentic AI can be really helpful in insurance and policy-driven scenarios.

One example: say we have an agentic AI managing an IT support system, and a user reports, "My email server is down." The AI can diagnose the issue, restart the server, and confirm the resolution; if the issue remains unresolved, the AI escalates to a human technician. The result is that the AI fixes issues autonomously, reducing downtime. So this is where action, or act, comes into the picture. Now let us move on to the next and final step, which is learning.
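The insurance-claims guardrail described above, auto-approve small claims, escalate large ones, can be sketched as a simple rule check. The $10,000 limit comes from the example in the text; the claim fields are hypothetical.

```python
APPROVAL_LIMIT = 10_000  # business rule from the example: humans review above this

def handle_claim(claim):
    """Act within predefined business rules instead of acting without limits."""
    if not claim["documents_verified"]:
        return "rejected: missing documents"
    if claim["amount"] <= APPROVAL_LIMIT:
        return "approved automatically"
    return "escalated to human reviewer"

print(handle_claim({"amount": 4_500, "documents_verified": True}))
# approved automatically
print(handle_claim({"amount": 25_000, "documents_verified": True}))
# escalated to human reviewer
```

The point is that the "act" step is gated: the agent executes freely inside its authorized envelope and hands control to a human outside it.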
With the help of the data flywheel, learning happens continuously; this is the feedback loop. How does the AI learn over time? First, through data collection: the AI logs successful and failed actions. For example, if users correct AI-generated responses, the AI learns from those corrections. Second, you can fine-tune the model and apply reinforcement learning: the AI adjusts its decision-making models to improve future accuracy, using reinforcement learning to optimize workflows based on past performance. The third step is automated data labeling and self-correction: the AI labels and categorizes past interactions to refine its knowledge base. For example, the AI autonomously updates frequently asked answers based on recurring user queries. In this way the AI learns over time.

As an example, consider a bank with an AI-powered fraud detection system. The AI analyzes financial transactions and detects suspicious activity; if flagged transactions turn out to be false alarms, the AI learns to reduce those false alerts. Over time the AI improves its fraud detection accuracy while minimizing disruptions for customers. In this way the AI gets smarter over time, reducing both false alerts and financial fraud.

Let's have a quick recap of what we just studied. Agentic AI works in four steps. The first step is perceiving, where the AI gathers data from databases, sensors, and APIs. The next step is reasoning, where it uses an LLM to interpret the task, apply logic, and generate the solution.
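The fraud-detection feedback loop can be sketched as a detector that nudges its alert threshold when analysts correct it. This is a deliberately simplified stand-in for the reinforcement-learning and fine-tuning described above; the update rule and numbers are illustrative only.

```python
class FraudDetector:
    def __init__(self, threshold=0.5):
        self.threshold = threshold

    def flag(self, risk_score):
        return risk_score >= self.threshold

    def feedback(self, risk_score, was_fraud):
        # Feedback loop: false positives make the detector stricter about
        # alerting; missed fraud makes it more sensitive.
        if self.flag(risk_score) and not was_fraud:
            self.threshold += 0.05
        elif not self.flag(risk_score) and was_fraud:
            self.threshold -= 0.05

d = FraudDetector()
d.feedback(0.55, was_fraud=False)  # analyst marks the alert as a false alarm
print(round(d.threshold, 2))  # 0.55
print(d.flag(0.52))           # False: the same score no longer triggers
```

Each correction feeds back into the model's behavior, which is exactly the flywheel dynamic: more interactions, better calibration, fewer false alerts.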
The third step is acting, where the AI integrates with external systems and automates tasks. Finally there is learning: the AI improves over time via a feedback loop, which is called the data flywheel.

Now let us look at this diagram and try to understand what it is showing. First you can see an AI agent, which is an autonomous system capable of perceiving its environment, making decisions, and executing actions without human intervention. The AI agent acts as the central intelligence in this diagram: it interacts with the user and various data sources, processes inputs, queries databases, makes decisions using a large language model, executes actions, and learns from feedback.

Next you can see the LLM. Large language models are advanced AI models trained on massive amounts of text data to understand, generate, and reason over natural language. Here the LLM acts as the reasoning engine: it interprets user inputs, makes informed decisions, retrieves relevant data from the databases, and generates responses. It can also coordinate with multiple AI models for different tasks, such as content generation, prediction, or decision making. For example, when a user asks a chatbot, "What is my account balance?", the LLM processes the query, retrieves the relevant data, and responds with the bank balance accordingly.

Now, if you look at the kinds of databases the LLM interacts with, we have the traditional database and the vector database. The AI agent queries structured databases, which could hold customer records, inventory data, or transaction logs.
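The "LLM coordinates multiple specialized models" idea from the diagram can be sketched as a dispatcher that routes each task to a domain model, falling back to a general model otherwise. All of the model functions here are stubs with invented names, standing in for real financial, medical, and general-purpose models.

```python
# Stubbed specialist "models"; real ones would be API or model calls.
def financial_model(task):
    return f"forecast for {task}"

def medical_model(task):
    return f"diagnostic note for {task}"

def general_llm(task):
    return f"generated answer for {task}"

SPECIALISTS = {"finance": financial_model, "medical": medical_model}

def orchestrate(domain, task):
    """Route the task to a domain specialist, or the general LLM by default."""
    handler = SPECIALISTS.get(domain, general_llm)
    return handler(task)

print(orchestrate("finance", "Q3 revenue"))
# forecast for Q3 revenue
```

In practice the routing decision itself is often made by the LLM (for example via function or tool calling), but the structure, one central reasoner dispatching to specialists, matches the diagram.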
Traditional databases store well-defined, structured information. For example, when a banking assistant processes a query like "Show my last five transactions," it fetches the information from a traditional SQL-based database.

Next we have the vector database. A vector database is a specialized database for storing unstructured data, such as text embeddings, images, or audio representations. Unlike traditional databases, which store exact values, vector databases store data in a high-dimensional mathematical space, which allows AI models to search for semantically similar data instead of exact matches. The AI retrieves contextual information from the vector database, which enhances decision making and improves the AI's memory by letting the system search for conceptually similar past interactions.

Let us take an example. For the customer support chatbot we discussed, the agent can query a vector database to find similar past tickets when responding to a customer query. Likewise, a recommendation engine could use a vector database to find products similar to a user's past preferences. Some popular vector databases are FAISS (Facebook AI Similarity Search), Pinecone, and Weaviate.

The next step, after the agent has worked on this data, is performing the action. The action component refers to the AI agent's ability to execute tasks autonomously once the reasoning is done. The AI integrates with external tools, APIs, or automation software to complete the task; it does not just provide information, it actually performs the action. For example, in customer support, the AI can automatically reset a user's password after verifying their identity.
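Semantic search over a vector store can be illustrated with cosine similarity over a toy ticket collection. The 3-dimensional "embeddings" below are made up for the sketch; real systems like FAISS or Pinecone use embeddings with hundreds of dimensions and optimized index structures.

```python
import math

def cosine(a, b):
    """Cosine similarity: closer to 1.0 means more semantically similar."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Hypothetical past support tickets with toy embeddings.
tickets = {
    "refund for damaged item": [0.9, 0.1, 0.0],
    "reset account password":  [0.0, 0.2, 0.9],
    "return broken product":   [0.8, 0.2, 0.1],
}

query = [0.8, 0.2, 0.1]  # pretend embedding of "item arrived broken"
best = max(tickets, key=lambda t: cosine(query, tickets[t]))
print(best)  # return broken product
```

Note that the match is found by vector proximity, not by shared keywords, which is exactly what distinguishes a vector database from an exact-match SQL lookup.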
In finance, the AI can approve a loan based on predefined eligibility criteria.

Finally, we have the data flywheel. The data flywheel is a continuous feedback loop in which the AI learns from past interactions, refines its models, and keeps improving over time. Every time the AI interacts with data, takes an action, or receives feedback, that information is fed back into the model, creating a self-improving AI system that becomes smarter over time. The data flywheel allows the AI to learn from every interaction and become more efficient by continuously optimizing its responses and refining its strategies. This can be used in fraud detection, where the AI learns from past fraud cases to detect new fraudulent patterns more effectively; chatbots can likewise learn from user feedback and improve their responses.

Finally, you have model customization, which means fine-tuning AI models for specific business needs or industry requirements. AI models are not static: they can be adapted and optimized for specific tasks. Custom fine-tuning improves accuracy in domain-specific applications such as finance, healthcare, or cybersecurity. One use case would be a financial institution fine-tuning an LLM to generate investment advice based on historical market trends; in healthcare, a provider might fine-tune an AI model to interpret medical reports and recommend treatments.

Based on this diagram, you should now have a good idea of how agentic AI works. If we discuss the future of agentic AI, I would say it looks very promising, because it keeps improving itself and finding new ways to be useful.
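The flywheel's "update the FAQ from recurring queries" behavior can be sketched as a counter that promotes a query into the knowledge base once it recurs often enough. The promotion threshold of 3 is an arbitrary illustration.

```python
from collections import Counter

query_log = Counter()
knowledge_base = {}

def observe(query, answer, promote_after=3):
    """Log every interaction; promote recurring queries into the KB."""
    query_log[query] += 1
    if query_log[query] >= promote_after and query not in knowledge_base:
        knowledge_base[query] = answer  # the flywheel: usage feeds the KB

for _ in range(3):
    observe("how do I track my order", "Use the tracking link in your email.")

print("how do I track my order" in knowledge_base)  # True
```

Each interaction leaves a trace, and the traces accumulate into better future answers, which is the self-reinforcing loop the flywheel metaphor describes.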
With better machine learning algorithms and smarter decision making, these AI systems will become more independent, handling complex tasks on their own. Industries like healthcare, finance, and customer service have already started to see the impact AI agents can make, from personalization to managing resources and much more. As these systems continue to learn and adapt, they will open up even more possibilities, helping businesses grow and improving how we live and work.

In conclusion, agentic AI is paving the way for new opportunities. Unlike older versions of AI, which assisted by generating content, predicting from data, or responding to queries, agentic AI can perform tasks independently with minimal human effort. Agentic AI has become self-reliant in decision making, and it is making a big difference in industries like healthcare, logistics, and customer service, enabling companies to be more efficient and, as a result, provide better services to their clients. That wraps up the full course. If you have any doubts or questions, you can ask them in the comment section below; our team of experts will reply as soon as possible. Thank you, and keep learning with Simplilearn.

By Amjad Izhar
Contact: amjad.izhar@gmail.com
https://amjadizhar.blog
Affiliate Disclosure: This blog may contain affiliate links, which means I may earn a small commission if you click on the link and make a purchase. This comes at no additional cost to you. I only recommend products or services that I believe will add value to my readers. Your support helps keep this blog running and allows me to continue providing you with quality content. Thank you for your support!
