This compilation of texts focuses on the 21st Century Robot Project, spearheaded by Brian David Johnson, a futurist and science fiction author. It explores the transition from science fiction concepts of robots to their real-world realization, particularly focusing on the development of a social, open-source robot named Jimmy. The sources discuss the philosophical underpinnings and technical challenges of creating robots designed for interaction and companionship, rather than solely for industrial tasks. Included are excerpts from a manifesto, science fiction stories featuring Jimmy and his creator, and details about the collaborative process involving engineers, artists, and even first-grade students in designing and building these robots. The overarching goal presented is the democratization of robotics, making it accessible for anyone to imagine, design, and build their own robot.
Science Fiction Prototyping is an unconventional tool used by Brian David Johnson, a professional futurist. It involves using science fiction stories, often based on research, to explore what it will feel like to live 10, 15, or even 20 years in the future and how people will act and interact with technology. According to Johnson, science fiction provides a language to talk about the future.
In the context of the 21st Century Robot project, the walking, talking robot named Jimmy and other 21st Century Robots were first born in science fiction stories about a decade before the book was written. Johnson used his imagination to create these stories, which in turn fired up the imaginations of the scientists, engineers, academics, and designers who helped bring Jimmy to life. Weaving fiction with reality is part of how Johnson came to create the 21st Century Robot Collective.
The Creative Science Foundation, a group of researchers and professors including Dr. Simon Egerton, collaborated on this new kind of robot, with science fiction at the center of their research. They used the stories as prototypes that allowed them to understand what it might be like to interact and live with these robots. These stories helped move their research forward, envision the new kind of robot, and led to new approaches in software and artificial intelligence (AI). This process was iterative: each story led to a new breakthrough and more research, which then led to another story, building upon the previous one. This iterative process, building off open source sharing, is described as where things “get really interesting”. The book itself is presented as a mix of science fiction stories and nonfiction chapters, reflecting this approach.
The idea of imagining first is crucial to the 21st Century Robot Manifesto. Science fiction stories, comics, and movies are seen as powerful tools to help you imagine your robot. It is suggested that science fiction, based on science fact, can be used to design robots and even shared as a technical requirements document. This aligns with the broader idea that we must be able to imagine the future so we can then build it.
Building Your 21st Century Robot
Building robots, particularly the kind envisioned by professional futurist Brian David Johnson in the context of the 21st Century Robot Project, is presented as a process that is now accessible to almost anyone, in contrast to the 20th century when it was largely confined to universities and corporations. The project aims to make the production and use of robots as common as caring for a family pet. The goal is simple: to create 7 billion robots, making them as common as smartphones, tablets, and TVs.
The method for achieving this goal is elaborate and involves several key steps and philosophies:
Imagination First: The most important skill needed to build a robot is imagination. Everything humans have built was imagined first. This involves envisioning the robot’s personality, name, how it will interact, and what unique things it will do. Science fiction stories, comics, and movies are powerful tools to help imagine your robot. This aligns with Brian David Johnson’s use of science fiction prototyping, an unconventional tool that uses stories based on research to explore future interactions with technology and how people will feel and act. Science fiction, based on science fact, can even be used to design robots and shared as a technical requirements document. This is fundamental to the idea that we must be able to imagine the future so we can then build it. The fictional robots in the 21st Century Robot project were first born in science fiction stories about a decade before the book detailing their creation was written.
Design: Once imagined, the robot needs to be designed. The physical creation often starts with illustrations and digital design tools. The design process gives form to the robot and helps refine its functionality. The design includes both the exoskeleton (the outer shell) and the endoskeleton (the internal structure). The exoskeleton contributes significantly to the robot’s personality and how people perceive it. The design needs to consider practical aspects like balance and weight. Digital design files, sometimes generic to start with, can be modified using software like Autodesk’s 123D or more complex tools. Sharing these designs is encouraged to foster collaboration and building upon others’ ideas. Different design approaches can result in varying levels of complexity and functionality, as seen in the different motor configurations explored by the Olin College students (4, 7, 18, and 24 motors).
Building the Body (Construction): Building the physical body involves assembling the endoskeleton, which consists of frames and brackets, and incorporating the necessary components like servo motors, wires, and internal workings. Servo motors act like the robot’s muscles, enabling movement and are often intelligent, tracking their own state. Simple movements may use single servos, while more complex motions, like those in the ankle or hip, require multiple servos in a double-axis configuration, connected by frames and brackets. Key areas like the feet, legs, hips, torso, arms, head, and neck require specific arrangements of servos and brackets. The torso typically houses the main electronics. The exoskeleton, often 3D printed from the design files, forms the outer shell and protects the internal components while expressing the robot’s look. Kits, like those from Trossen Robotics or ArcBotics, provide all the necessary parts for assembly and often include instructions or tutorials.
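The double-axis joints described here pair two servos on one bracket. As a rough illustration of how such servos are commanded, here is a minimal Python sketch using pyserial; it is not the project’s code, and the serial port, channel numbers, and ASCII command format are all assumptions made for the sake of example.

```python
# Hedged sketch: driving a double-axis joint through a hobby servo controller.
# The "#<channel>P<pulse_us>T<ms>" command format is illustrative only.
import serial  # pip install pyserial

PORT = "/dev/ttyUSB0"  # hypothetical port for the controller board
BAUD = 115200

def move_servo(link, channel, pulse_us, duration_ms=500):
    """Send one servo channel to a pulse width (~500-2500 us) over duration_ms."""
    link.write(f"#{channel}P{pulse_us}T{duration_ms}\r".encode())

def set_double_axis(link, pitch_ch, roll_ch, pitch_us, roll_us):
    """An ankle or hip joint: two servos on one bracket, commanded together."""
    move_servo(link, pitch_ch, pitch_us)
    move_servo(link, roll_ch, roll_us)

if __name__ == "__main__":
    with serial.Serial(PORT, BAUD, timeout=1) as link:
        # Center both axes of a hypothetical ankle joint.
        set_double_axis(link, pitch_ch=4, roll_ch=5, pitch_us=1500, roll_us=1500)
```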
Programming the Brain (Software): The robot’s brain consists of hardware and software, known as artificial intelligence (AI). The 21st Century Robot Project models the robot brain on the human brain, splitting it into three parts: the autonomic system (handling low-level functions like walking and balance), the conscious mind (personality and higher-level thinking), and the reflex core (managing communication between the two). The AI software is structured in layers, such as the action primitives layer (controlling basic movements), the social primitives layer (handling social interactions like listening and gesturing), the character layer (defining personality and vocabulary), and the app layer (allowing customization through application programming interfaces or APIs). The development and sharing of this software, particularly through open source platforms like ROS (Robot Operating System) and DARwin-OP, are crucial to the project’s goal of accessibility. An environment called “Your Robot” is provided for exploring and developing the robot’s brain, including programming movements and downloading apps. Robots can also be given a voice with customizable volume, pitch, and speed.
Iteration and Sharing: Building is intentionally iterative: you repeat the process to make multiple versions, building upon previous learning and designs. Open source sharing is fundamental, allowing people to modify and build upon others’ ideas. The collective efforts of scientists, engineers, academics, designers, makers, and even first-grade students contribute to the project, refining designs and developing new approaches.
Once a robot is built, there is a process for booting it up and ensuring all components are working correctly, sometimes involving a diagnostic check like the HELLO! Protocol. Troubleshooting resources are available to help navigate potential problems.
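The sources name the HELLO! Protocol but do not show code, so the following is a hypothetical reading of what such a boot diagnostic could look like in Python: run one named check per subsystem, report each result, and declare the robot ready only if everything passes. All check names and return values are invented.

```python
# Hypothetical boot diagnostic in the spirit of the HELLO! Protocol.
def run_hello_protocol(checks):
    """Run each named diagnostic, print a pass/fail report, return overall status."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = ("OK", check())
        except Exception as exc:
            results[name] = ("FAIL", str(exc))
    for name, (status, detail) in results.items():
        print(f"[{status}] {name}: {detail}")
    return all(status == "OK" for status, _ in results.values())

# Illustrative stand-ins; a real robot would query hardware here.
checks = {
    "battery": lambda: "7.4 V nominal",
    "servos":  lambda: "all channels responding",
    "camera":  lambda: "frame captured",
    "speaker": lambda: "greeting played",
}

if run_hello_protocol(checks):
    print("Boot complete. Robot ready.")
```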
Ultimately, building robots in this context is not about creating one “best robot ever,” but empowering everyone to create their best robot ever, leading to a future with seven billion unique robots.
Anatomy of the 21st Century Robot Brain
Drawing on the sources, let’s discuss the concept of Robot Brains within the context of Brian David Johnson’s 21st Century Robot Project.
In this project, building a robot, particularly its brain, is presented as something now accessible to nearly anyone, a significant shift from the 20th century when it was primarily the domain of universities and corporations. The foundation of this accessibility lies in the idea that imagination is the most important skill needed to build your robot. This connects directly to the unconventional tool of Science Fiction Prototyping, where science fiction stories based on research are used to explore how people will interact with technology in the future. These stories, in fact, acted as prototypes that helped researchers envision the new kind of robot and led to new approaches in software and artificial intelligence (AI).
The robot’s brain, consisting of hardware and software known as artificial intelligence (AI), is modeled on the human brain. This approach was inspired by Dr. Simon Egerton’s research into designing social robots meant to operate in complex environments like human homes, by taking inspiration from human behavior and how our brains work.
The robot brain is conceptually split into three parts, sketched in code after this list:
The Autonomic System (or subconscious): This part handles the crucial, low-level functions automatically, freeing up the rest of the brain for more complex tasks. In the context of the 21st Century Robot, this includes controlling walking and balance, communicating with the servo motors through a microcontroller.
The Conscious Mind: This is where the robot’s personality and character reside, and where higher-level thinking occurs.
The Reflex Core: Acting as a translator and traffic cop, this thin strip allows signals to move between the conscious mind and the autonomic system, using primitives to speed up the transfer of information.
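A toy Python sketch of this three-part split might look like the following; the class names, the primitive set, and the event handling are invented for illustration, not taken from the project’s released software.

```python
# Toy model of the three-part robot brain: the conscious mind decides,
# the reflex core translates decisions into primitives, the autonomic
# system executes them on the hardware.
class AutonomicSystem:
    """Subconscious: low-level walking and balance, talks to the servos."""
    def execute(self, primitive):
        print(f"servos <- {primitive}")

class ConsciousMind:
    """Personality and higher-level thinking; decides what to do next."""
    def decide(self, event):
        return "wave" if event == "person_detected" else "idle"

class ReflexCore:
    """Translator and traffic cop between the two halves of the brain."""
    PRIMITIVES = {"wave", "walk", "idle"}  # primitives keep the hand-off fast

    def __init__(self, mind, autonomic):
        self.mind, self.autonomic = mind, autonomic

    def handle(self, event):
        action = self.mind.decide(event)
        if action in self.PRIMITIVES:
            self.autonomic.execute(action)

robot = ReflexCore(ConsciousMind(), AutonomicSystem())
robot.handle("person_detected")  # prints: servos <- wave
```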
A key insight from Simon Egerton’s research that influenced the robot brain’s architecture was the idea of allowing robots to make both good and bad decisions. Just as humans learn by making mistakes, the belief was that allowing robots to do so would accelerate their learning process. This complexity required a new system architecture.
This new architecture emerged from the collaboration of Simon Egerton, Vic Callaghan, and Graham Clarke, drawing on the concept of multiple personalities or personas from psychoanalytic theory. This persona-based approach illuminates how humans adapt to changing contexts by switching between different sets of behaviors (personas). Applying this to AI meant envisioning the robot’s intelligence as a collection of different actions or behaviors.
This concept led to a significant breakthrough: the realization that these personas could be grouped together and that new behaviors could be generated or downloaded, much like apps on a smartphone. This made programming and personalizing social robots easier for everyone. As stated, “Our robots became smartphones with legs”.
The software that powers the robot’s brain is structured in four layers, built upon open source principles (using platforms like ROS and DARwin-OP) to promote accessibility and sharing; a small sketch of the app layer follows this list:
Action Primitives Layer: This layer operates at the bridge between the conscious and autonomic parts of the brain, controlling low-level motor functions and enabling basic movements like walking.
Social Primitives Layer: Unlike traditional robots focused on physical manipulation, 21st Century Robots are designed to be social. This layer simplifies the complexities of social interaction, handling basic behaviors like listening, gesturing, and talking. It helps the robot figure out where to stand, when to make eye contact, and how to use gestures appropriately.
Character Layer: This layer defines the robot’s personality and behaviors, determining how it will respond in different situations and what vocabulary it will use. It uses the social and action primitives to interact with the subconscious part of the brain.
App Layer: This layer allows users to customize their robot by writing or downloading applications (apps) using application programming interfaces (APIs). These apps can transform the robot into various tools or companions, such as an alarm clock or a game machine, with its function influenced by its personality.
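As a rough illustration of that layering, here is a hypothetical sketch of an alarm clock app riding on a character layer: the app never drives servos directly, it asks the character to gesture and speak, so the robot’s personality colors what the app does. The API shape and every name here are assumptions, not the project’s actual interfaces.

```python
# Invented example of the app layer calling down into the character layer.
class CharacterLayer:
    """Defines personality: vocabulary, greeting style, and social behavior."""
    def __init__(self, name, greeting):
        self.name, self.greeting = name, greeting

    def say(self, text):
        print(f"{self.name}: {self.greeting} {text}")

    def gesture(self, social_primitive):
        # Would forward to the social primitives layer on a real robot.
        print(f"{self.name} performs: {social_primitive}")

def alarm_app(character, wake_time):
    """An app uses the character's API, so personality shapes its behavior."""
    character.gesture("make_eye_contact")
    character.say(f"It's {wake_time}. Time to get up!")

jimmy = CharacterLayer("Jimmy", "Hey friend,")
alarm_app(jimmy, "7:00 AM")
```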
A development environment called “Your Robot” is provided to allow individuals to explore and develop their robot’s brain, including programming movements and downloading apps. Additionally, robots can be given a voice with customizable attributes like volume, pitch, and speed.
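For the voice side, one real-world way to get adjustable speech in Python is the pyttsx3 text-to-speech library, offered here as an illustrative stand-in rather than the project’s actual stack. pyttsx3 exposes rate, volume, and voice selection portably; pitch control depends on the underlying engine.

```python
# Adjustable robot voice via pyttsx3 (pip install pyttsx3).
import pyttsx3

engine = pyttsx3.init()
engine.setProperty("rate", 150)    # speaking speed, in words per minute
engine.setProperty("volume", 0.8)  # loudness, 0.0 to 1.0
# Pitch is engine-dependent (e.g., espeak exposes it); the portable
# pyttsx3 API only guarantees rate, volume, and voice.
engine.say("Hello! I am your 21st century robot.")
engine.runAndWait()
```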
The open source nature of the software means that anyone can access and modify the code, from the low-level primitives to the personality layer and apps, fostering collaboration and building upon others’ ideas. This accessibility to designing and programming the robot brain is central to the project’s goal of empowering everyone to create their best robot ever.
Creating Our Social Robot Companions
Drawing on the provided sources, Social Robots are a central concept within Brian David Johnson’s 21st Century Robot Project. The project champions a radical shift in how robots are perceived and created, moving them from the confines of universities and corporations in the 20th century to becoming accessible companions for nearly anyone in the 21st century. The core goal is to make the production and use of robots as common as caring for a family pet.
The 21st Century Robots are intentionally designed to be fiercely social. Unlike traditional robots often relegated to industrial tasks (sometimes referred to as “Dirty, Dangerous, Dull” or 3D tasks) or locked away in labs, these new robots are primarily designed to act and interact with people. They are envisioned as companions and friends, not just servants. The project aims for a future where robots are as common and normal as smartphones, tablets, and TVs, becoming a part of our daily lives.
The journey to creating social robots begins with imagination. You must first envision the robot’s personality, name, and how it will interact with people. Science fiction, grounded in science fact, serves as a powerful tool and even a technical requirements document to help imagine and design these social robots. The fictional robots in the project, like Jimmy, were first conceived in science fiction stories.
Design plays a crucial role in a social robot’s reception. The exoskeleton, or outer shell, is significant in conveying personality and influencing how people perceive the robot. Designers deliberately aimed for a look that was cute, approachable, and friendly, like Jimmy, drawing inspiration from characters like E.T. to avoid scaring people. The design needs to ensure the robot looks like it wants to be your friend. The question of whether a robot is a boy, girl, or neither is also relevant to social design, often depending on context, story, and how humans generally perceive machines (often defaulting to male unless cues like color are added). Children, notably Ms. Moore’s first-grade class, instinctively imagine these robots as friends and companions, desiring interactions like playing, dancing, and helping with chores, rather than seeing them as servants.
The robot’s brain, the artificial intelligence (AI), is key to its social capabilities and is modeled on the human brain. This architecture includes an autonomic system (for low-level functions like movement), a conscious mind (for personality and higher-level thinking), and a reflex core connecting the two. Inspired by research into enabling robots to make both good and bad decisions (like humans) to accelerate learning, a new persona-based architecture was developed. This approach views the robot’s intelligence as a collection of actions or behaviors that can be grouped and added, much like apps on a smartphone, making it easier for anyone to program and personalize a social robot. As one collaborator noted, “Our robots became smartphones with legs”.
The software enabling social interaction is structured in layers:
The Action Primitives Layer handles basic movements necessary for a robot to operate in a physical, social environment, freeing up the rest of the brain.
The Social Primitives Layer simplifies the complexities of social interaction, managing behaviors like listening, gesturing, talking, deciding where to stand, and making eye contact in a socially appropriate manner. This layer allows the robot to react naturally without extensive processing.
The Character Layer defines the robot’s personality, behaviors, and vocabulary, using the primitives to guide interactions.
The App Layer allows users to customize their robot through applications (apps) and APIs, enabling it to function as different tools or companions (like an alarm clock or game machine), with the robot’s personality influencing how the app is performed.
The open source nature of the software and hardware is fundamental to making social robots accessible. It allows individuals to access, modify, share designs and code, fostering collaboration and innovation within a community of builders worldwide. This collective effort, from scientists and engineers to makers and first-grade students, drives the project forward.
Ultimately, social robots are seen as more than just machines; they are viewed as companions that can form relationships with humans. There can be bonds developed between humans and robots, even in professional settings. The project aims to empower everyone to create their best robot ever, resulting in seven billion unique, social robots filled with humanity and dreams. These robots are intended to be extensions of ourselves, reflecting our hopes and dreams, and helping us explore our own humanity and relationships.
The Open Source 21st Century Robot Project
Based on the provided sources, let’s discuss Open Source within the context of Brian David Johnson’s 21st Century Robot Project.
The concept of open source is a fundamental principle of the 21st Century Robot Project. The underlying idea is that people should have control over the technology we use. This means we should have the ability to build it, modify it, and share it. This practice and community became popular around the end of the 20th century with the growth of the internet and software like the open source operating system Linux.
The 21st Century Robot Project embraces this philosophy fully:
A 21st Century Robot is completely open source.
This starts with the 3D design files for the robot’s body, allowing everyone to design and customize their own robot.
The software that runs the robot and makes up its brain is free and open.
Users are encouraged to play with the operating system and design different apps for their robot.
A core aspect is the encouragement to share designs with others. If you create a cool new leg design or app, you should share it so others can use and build upon it.
The production of these robots is also open, enabling people all over the world to collaborate to build better, smarter, funnier, and more exciting robots.
This open source approach is seen as a key factor in removing the barriers that had previously limited robot creation primarily to large universities and corporations in the 20th century. Technological advances combined with open source software and hardware have made it possible for anyone to imagine, design, build, and program their own robot in the 21st century. Open source hardware taps into the creativity of millions of smart developers and non-traditional builders.
The software powering the robot’s brain is structured in layers (Action Primitives, Social Primitives, Character, and App layers). This software runs on open source operating systems like ROS (Robot Operating System) and DARwin-OP, which were developed by universities to advance robotics and artificial intelligence. The open source nature of these platforms means you can see and change the code if you want to. Both ROS and DARwin-OP have large communities of students, inventors, and makers who actively share ideas and solve problems online, which is a significant benefit of this approach. The project provides a development environment called “Your Robot” and encourages users to play with the code, whether it’s low-level functions, personality layers, or developing/downloading apps. The website http://www.21stCenturyRobot.com is a hub for accessing the software and connecting with the community.
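To show what playing with this code can look like in practice, here is a minimal ROS 1 node in Python using the standard rospy API; the node and topic names are invented for the example.

```python
# Minimal ROS 1 publisher node: announces a greeting once per second.
import rospy
from std_msgs.msg import String

def main():
    rospy.init_node("jimmy_greeter")                         # hypothetical node name
    pub = rospy.Publisher("speech", String, queue_size=10)   # hypothetical topic
    rate = rospy.Rate(1)  # 1 Hz
    while not rospy.is_shutdown():
        pub.publish(String(data="Hello, I'm Jimmy!"))
        rate.sleep()

if __name__ == "__main__":
    main()
```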
The open source initiative in robotics is specifically aimed at lowering the barrier to entry and making it easier for people to get started. This accessibility is crucial for allowing the tremendous potential of robots to be realized, by getting them everywhere and letting people build them. It is believed that the amazing ideas will come from these new points of view.
Ultimately, the goal of the 21st Century Robot Project is not to build one “best robot ever”. Instead, through open source design and the creativity it enables, everyone can take what others have done and modify it to make their best robot ever. The project aims to provide the tools, materials, design files, and code necessary for everyone to imagine, design, build, program, and share their own robots. This collective effort, driven by open source principles, is intended to lead to a future with seven billion best robots ever, each reflecting the unique humanity and dreams of its creator. As illustrator Sandy Winkelman noted, he’s most excited to see what people do when they start creating their own robots.
In an age where every click promises convenience and every notification demands our attention, humanity finds itself not empowered, but overwhelmed. The accelerating pace of technological advancement has crossed a threshold where utility often gives way to futility. What was once a tool for progress is now, in many ways, a burden on our well-being, values, and identity.
We stand at a crossroads where innovation, though dazzling in its potential, increasingly encroaches on the natural rhythms of life. Instead of enriching the human experience, an excess of technology frequently diminishes our capacity for critical thought, emotional depth, and authentic human connection. As Marshall McLuhan aptly said, “We become what we behold. We shape our tools and thereafter our tools shape us.” It is imperative to question the blind worship of gadgets and algorithms that demand more than they deliver.
This blog post aims to dissect the myth of technological utopia and expose the subtle but corrosive ways in which too much technology is too much for mankind. Through twenty compelling reflections, supported by expert views and scholarly insight, this discussion urges a return to balance. Humanity must reassert its primacy over the tools it has created, lest it become subservient to them.
1- The Illusion of Connection
Though digital technology promises to connect us more than ever, it has ironically made meaningful human relationships more elusive. The proliferation of social media has led to superficial interactions, weakening genuine empathy and communal bonds. Psychologists like Sherry Turkle, in her book Alone Together, explore how constant connectivity breeds emotional isolation.
Moreover, technology often replaces face-to-face communication with emojis and curated personas. We now prefer to text rather than talk, even in intimate relationships. The emotional texture of human interaction is flattened by algorithms designed to maximize screen time rather than facilitate sincere dialogue.
2- Erosion of Critical Thinking
The digital age has nurtured a culture of immediacy, where instant answers are preferred over thoughtful inquiry. This undermines our ability to engage in critical thinking and sustained reflection. Author Nicholas Carr, in The Shallows, warns that the internet rewires our brains for distraction rather than deep comprehension.
Instead of nurturing intellectual discipline, we are spoon-fed pre-packaged data, diminishing our cognitive resilience. The rise of AI and search engines has created a dependency where thinking is outsourced. As a result, our intellectual muscles are atrophying in favor of convenience.
3- Surveillance Capitalism and Loss of Privacy
With every app download and online transaction, we barter our privacy for convenience, often unwittingly. Shoshana Zuboff’s The Age of Surveillance Capitalism outlines how corporations manipulate personal data for profit, turning users into products.
This constant monitoring alters our behavior. Knowing we are watched, we become more guarded, less authentic. It’s not just data being mined—it’s human freedom. In essence, over-reliance on technology reshapes the very nature of individuality and autonomy.
4- Dependency and Cognitive Laziness
The more we lean on technology for simple tasks, the less capable we become of solving problems independently. From GPS navigation to spellcheck, our mental faculties are being underused. Technology becomes not a supplement, but a crutch.
This dependency nurtures a form of learned helplessness. Psychologists warn that such behavior limits our ability to respond creatively to real-world challenges. As the mind grows idle, so too does our ability to adapt and evolve intellectually.
5- Mental Health Crisis
Excessive screen time correlates strongly with anxiety, depression, and sleep disorders. The blue light emitted from devices interferes with circadian rhythms, while the dopamine-driven feedback loops of apps like TikTok and Instagram keep users in cycles of addiction.
Experts such as Dr. Jean Twenge link the rise of mental health issues among teens to smartphone use. In a hyperconnected world, loneliness has paradoxically become a public health epidemic. The price of endless digital engagement is emotional exhaustion.
6- Diminishing Attention Span
The swipe-and-scroll culture has fundamentally altered how we consume information. Long-form content and deep reading are replaced by short clips and memes, training our minds for distraction. A widely cited Microsoft study claimed that the average human attention span has fallen below that of a goldfish.
This shift has serious implications for education, work, and civic life. Democracies depend on informed citizens who can engage in sustained reasoning. Technology, used excessively, undermines this requirement.
7- Dehumanization of Work
Automation and AI threaten not only jobs but the dignity associated with labor. Increasingly, people are being treated as cogs in a machine, their worth determined by productivity metrics. Yuval Noah Harari warns in Homo Deus that mass unemployment may result in a “useless class” of people rendered obsolete by machines.
In striving for efficiency, we risk stripping work of its human element. Creativity, empathy, and ethics—qualities that define our species—cannot be encoded into an algorithm.
8- Environmental Costs
The carbon footprint of technology is staggering. Data centers consume vast amounts of energy, and electronic waste is a growing ecological disaster. The quest for the newest gadget fuels mining, pollution, and unsustainable consumption patterns.
According to the UN, the world produces over 50 million tons of e-waste annually. The environmental degradation tied to tech addiction exposes the hypocrisy of digital “progress.” Sustainability is often sacrificed at the altar of speed and convenience.
9- Disruption of Education
While ed-tech tools have potential, an overreliance on screens in classrooms can impede deep learning. Students are distracted, and the tactile, human elements of education are lost. Educational theorists like Neil Postman argue that teaching is not simply data transfer but character shaping—something technology struggles to replicate.
True education requires conversation, reflection, and moral guidance—elements that cannot be automated. The screen cannot replace the mentor.
10- Commodification of Time
Technology, especially mobile apps, turns time into a commodity. Our attention is bought, sold, and traded in attention markets. This results in a sense of time poverty, where people feel chronically rushed despite not being more productive.
Sociologist Judy Wajcman in Pressed for Time explains how digital technology paradoxically increases stress. Instead of freeing us, it enslaves us to schedules, notifications, and unrealistic expectations.
11- Dulling of the Senses
Excessive digital interaction blunts our sensory experience. Nature, art, and human expression are increasingly filtered through screens. Philosopher Albert Borgmann laments this in Technology and the Character of Contemporary Life, suggesting that devices displace the “focal practices” that give life depth.
Our world becomes pixelated, less textured. We trade immersion for immediacy, and in doing so, lose our connection to the richness of lived experience.
12- Ethical Blindness
Technological progress often outpaces ethical reflection. From AI decisions in healthcare to facial recognition used in policing, we face moral dilemmas that are unresolved. Wendell Berry rightly said, “The great enemy of freedom is the alignment of political power with wealth and technological power.”
As creators, we must pause to ask not just can we do it, but should we? Unchecked innovation without ethical anchors invites dystopia.
13- Polarization and Echo Chambers
Algorithms optimize for engagement, not truth. Social media platforms thus foster echo chambers that amplify bias and deepen division. According to Eli Pariser in The Filter Bubble, users are fed content that confirms rather than challenges their views.
The resulting polarization threatens social cohesion and civil discourse. When reality is fragmented into personalized feeds, consensus becomes nearly impossible.
14- Addiction and Behavioral Manipulation
Digital platforms are engineered to be addictive. With features like infinite scroll and variable rewards, they hijack our psychology. This is not incidental; it is by design. Behavioral scientists such as B.J. Fogg laid the groundwork for these persuasive technologies.
Users become products, their behavior shaped by unseen algorithms. This manipulation erodes autonomy and makes true freedom of choice an illusion.
15- Technological Elitism
Access to cutting-edge technology is uneven, creating new social divides. The digital divide widens inequality, privileging those who can afford constant upgrades. As Evgeny Morozov argues, technology often serves the elite more than the underprivileged.
This leads to a two-tiered society: one hyper-connected, the other left behind. Technology, instead of being a great equalizer, becomes a marker of exclusion.
16- Suppression of Creativity
While some tech tools aid creativity, overexposure to digital media can hinder original thought. The constant influx of pre-made content discourages experimentation and deep introspection. Neil Postman warned that technology can turn creators into passive consumers.
True creativity demands solitude, discomfort, and patience—all of which are undermined by tech’s emphasis on instant gratification and replication.
17- Artificial Reality over Actual Reality
The rise of virtual reality and augmented experiences risks replacing life with simulations. As we immerse ourselves in digital realms, real-world connections and responsibilities fade. This escapism is dangerous.
Reality, with all its imperfections, teaches resilience and wisdom. Virtual substitutes, though seductive, often reinforce narcissism and detachment.
18- Overengineering of Daily Life
Smart homes, wearable tech, and IoT promise convenience but introduce unnecessary complexity. What was once simple, like turning off a light, is now app-controlled. Philosopher Ivan Illich criticized such overengineering for eroding what he called convivial tools.
Technology should simplify life, not micromanage it. The fetish for automation often ignores the joy and meaning found in simple, manual acts.
19- Moral Laziness
When technology handles difficult decisions, humans become morally passive. Whether it’s AI moderation or automated warfare, we risk abdicating responsibility. As Hannah Arendt warned, banality arises not from evil intentions but from disengagement.
Technology must not absolve us from moral reckoning. Convenience should never come at the cost of conscience.
20- The Myth of Infinite Progress
Technological utopianism falsely promises that all problems can be solved with more innovation. But not all human challenges are technical. Many are moral, spiritual, or philosophical in nature.
C.S. Lewis warned against the “idol of progress,” cautioning that advancements without wisdom lead to ruin. True progress must include inner growth and ethical maturity—not just better gadgets.
Conclusion
In the final analysis, technology is neither inherently good nor evil—it is a mirror that reflects human intention. But when it becomes an idol, revered without restraint, it begins to corrode the very fabric of what makes us human. The march of progress must be matched by an equally robust growth in wisdom, ethics, and restraint.
This blog post serves as both a critique and a caution. If mankind is to flourish in the digital age, it must reclaim the authority to say “enough.” As Socrates urged, “Know thyself.” Only by doing so can we ensure that technology remains our servant—and never our master.
This presentation outlines Microsoft’s AI strategy, focusing on three core platforms: Copilot, a user interface for AI; the Copilot stack, an AI infrastructure built on Azure; and Copilot devices, extending AI capabilities to the edge. The presentation highlights the development of AI agents for various applications, emphasizing low-code/no-code tools like Copilot Studio for broader accessibility. It also stresses the importance of data, model orchestration, and trust in building robust and reliable AI systems. Finally, it announces a commitment to train 10 million people in India in AI skills by 2030.
AI and Platform Shifts: A Study Guide
Glossary of Key Terms
Moore’s Law: The observation that the number of transistors on a microchip doubles approximately every two years, leading to exponential increases in computing power.
DNN: Deep Neural Network – a type of artificial neural network with multiple layers between the input and output layers, allowing for complex data processing.
GPUs: Graphics Processing Units – specialized electronic circuits designed to rapidly manipulate and alter memory to accelerate the creation of images. They are increasingly used in AI for their parallel processing capabilities.
Transformers: A deep learning model architecture that uses self-attention mechanisms to process sequential data, particularly effective for natural language processing tasks.
Inference time/Test time compute scaling law: The observation that an AI model’s output quality improves as more compute is spent at inference (test) time, for example through longer reasoning or sampling multiple candidate answers, complementing the gains from scaling up pre-training.
Multimodal capability: The ability of AI systems to interact with and understand information from multiple modalities, such as text, images, and speech.
Planning and reasoning capabilities: The ability of AI systems to think strategically, plan multi-step actions, and make decisions based on logical reasoning.
Agentic behavior: The capacity of AI systems to pursue goals autonomously by planning, acting, and adapting, rather than merely responding to single prompts.
Agents: AI systems designed to perform specific tasks autonomously, often within a larger ecosystem of interacting agents.
Copilot: A suite of AI-powered tools developed by Microsoft designed to assist users in various tasks and workflows.
Microsoft 365 Graph: A platform that connects data and intelligence from Microsoft 365 applications, providing a comprehensive view of user activity and relationships.
Pages: An interactive, AI-first canvas within the Microsoft 365 ecosystem, used for collaboration and knowledge sharing.
Copilot Actions: AI-powered rules and automations that operate across multiple Microsoft 365 applications, simplifying complex workflows.
Copilot Studio: A low-code/no-code tool for building and customizing AI agents, making agent development accessible to a wider range of users.
Copilot Analytics: Tools for measuring and evaluating the impact of Copilot features on individual and organizational productivity.
Knowledge turns: A concept analogous to supply chain turns, referring to the speed at which an organization can generate, disseminate, and utilize knowledge.
Azure: Microsoft’s cloud computing platform, providing a wide range of services, including infrastructure, data management, and AI tools.
Tokens per dollar per watt: A metric for evaluating the cost-effectiveness and energy efficiency of AI infrastructure.
Liquid-cooled AI accelerators: Advanced cooling systems designed for high-performance AI hardware, utilizing liquid immersion or direct liquid contact for optimal heat dissipation.
Silicon Innovation: The development of specialized computer chips optimized for AI workloads, focusing on improving processing power and energy efficiency.
Data estate: The comprehensive collection of data assets within an organization, including structured, unstructured, and semi-structured data.
Retrieval augmented generation (RAG): A technique that combines information retrieval with text generation to produce more informative and contextually relevant outputs (a toy sketch appears after this glossary).
AI App Server: A software platform that provides the necessary infrastructure and services for building, deploying, and managing AI applications.
Foundry: Microsoft’s AI app server, designed to streamline the development and deployment of AI models and applications.
Model catalog: A centralized repository of pre-trained AI models, providing developers with easy access to a diverse range of models for various tasks.
Models as a service: Pre-trained AI models made available through an API, allowing developers to integrate AI capabilities into their applications without managing the underlying infrastructure.
Fine-tuning: The process of adapting a pre-trained AI model to a specific task or dataset by further training it on relevant data.
Model distillation: A technique for creating smaller, more efficient AI models by training them to mimic the behavior of larger, more complex models.
Groundedness tests: Evaluations that assess an AI model’s ability to generate outputs that are factually accurate and consistent with real-world knowledge.
GitHub Copilot: An AI-powered coding assistant that provides code suggestions and completions within popular code editors, such as Visual Studio Code.
Multifile edits: The ability of GitHub Copilot to make code changes across multiple files simultaneously, streamlining complex code refactoring.
Repo-level edits: Code modifications that affect an entire code repository, such as adding a new feature or refactoring existing code across multiple files.
GitHub Copilot Workspace: An AI-powered development environment that allows developers to create and manage code projects using natural language instructions.
Codespaces: Cloud-based development environments that provide developers with a pre-configured workspace accessible from any device.
Windows 365: A cloud-based desktop service that delivers a full Windows experience, including applications and data, to any device with an internet connection.
Copilot Devices: Computers and other devices optimized for AI workloads, featuring specialized hardware and software designed for enhanced AI performance.
NPUs: Neural Processing Units – specialized hardware accelerators designed specifically for AI tasks, such as deep learning inference.
Hybrid AI: AI systems that combine local processing on edge devices with cloud-based processing, leveraging the strengths of both environments.
Adversarial attacks: Attempts to manipulate or exploit AI systems by providing malicious input or manipulating training data.
Prompt injection: A type of adversarial attack where malicious code is injected into an AI system’s input prompt, potentially leading to unintended or harmful behavior.
Confidential computing: A security approach that protects data in use by encrypting it while it is being processed, even from the cloud provider.
Hallucinations: Instances where AI models generate outputs that are factually incorrect or nonsensical, often due to limitations in their training data or understanding of the world.
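To ground the retrieval augmented generation entry above, here is a toy RAG loop in Python. The word-overlap scoring and the sample documents are deliberately naive stand-ins; production systems use embedding models and a vector store, but the retrieve-then-ground shape is the same.

```python
# Toy RAG: retrieve the most relevant text, then ground the prompt in it.
documents = [
    "Foundry hosts a catalog of models for deployment and fine-tuning.",
    "NPUs accelerate on-device inference for Copilot devices.",
    "Confidential computing encrypts data while it is in use.",
]

def retrieve(query, docs, k=1):
    """Rank documents by word overlap with the query; return the top k."""
    q = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return ranked[:k]

def build_prompt(query, docs):
    context = "\n".join(retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("What do NPUs do on Copilot devices?", documents))
```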
Short-Answer Quiz
Explain the significance of Moore’s Law in the context of AI advancements.
Differentiate between pre-training and inference time in AI.
What are the key components of an effective AI agent?
How does Microsoft envision Copilot as the UI for AI?
Describe the role of Pages within the Copilot ecosystem.
What are the three design considerations for successful AI business transformation, according to the source?
Explain the importance of “tokens per dollar per watt” as a metric for AI infrastructure.
How does Foundry contribute to the development of AI applications?
Describe the concept of hybrid AI in Copilot devices.
Why is trust a critical factor in the adoption and development of AI?
Answer Key
Moore’s Law, which predicts exponential growth in computing power, has been a driving force behind AI advancements. It enables the development of increasingly complex AI models by providing the necessary computational resources for training and inference.
Pre-training involves training an AI model on a massive dataset to develop a general understanding of a task or domain. Inference time refers to using the trained model to make predictions or generate outputs on new, unseen data.
Effective AI agents possess multimodal capability, allowing them to interact with diverse data types. They have planning and reasoning skills to strategize and execute multi-step tasks. Importantly, they leverage memory, context, and tools to enhance their decision-making.
Microsoft envisions Copilot as a user-friendly interface that simplifies interaction with AI capabilities. It integrates AI into existing workflows, making it accessible within familiar applications like Microsoft Office and Teams.
Pages serve as interactive canvases for collaborative work within the Copilot ecosystem. Users can promote data and insights from various sources into Pages, facilitating knowledge sharing and collaborative decision-making with AI assistance.
The three key design considerations for AI business transformation are: using Copilot as the UI layer for seamless AI interaction, adopting Foundry as the platform for building and managing AI applications, and leveraging Fabric for effective data management and integration.
“Tokens per dollar per watt” is a crucial metric because it measures the efficiency of AI infrastructure. It considers the cost, energy consumption, and processing power (represented by tokens), emphasizing the need for both economic and environmental sustainability in AI development.
Foundry acts as an AI app server, providing tools and services for deploying, managing, and optimizing AI models. It streamlines the process of building AI applications, enabling developers to focus on innovation rather than infrastructure management.
Hybrid AI in Copilot devices combines local processing on NPUs with cloud-based AI capabilities. This approach allows for efficient and powerful AI experiences, leveraging the edge for tasks that benefit from local processing while tapping into the cloud for resource-intensive operations.
Trust is paramount in AI development due to concerns about security, privacy, and safety. Building trustworthy AI systems requires addressing potential vulnerabilities like adversarial attacks, protecting user data, and ensuring responsible AI development practices.
Essay Questions
Analyze the impact of scaling laws on the evolution of AI, considering both the benefits and potential limitations of continued scaling.
Discuss the transformative potential of AI agents in various industries, focusing on how they can enhance productivity, creativity, and collaboration.
Evaluate the significance of low-code/no-code tools like Copilot Studio in democratizing access to AI development and empowering non-technical users.
Compare and contrast the advantages and disadvantages of cloud-based and edge-based AI processing, considering factors such as latency, security, and data privacy.
Explore the ethical considerations surrounding the development and deployment of AI, focusing on issues such as bias, transparency, and accountability.
Microsoft’s AI Vision and Platforms
Briefing Document: The Future of AI – Microsoft’s Vision and Platforms
This briefing document reviews the key themes and insights from a speech by Satya Nadella, CEO of Microsoft, focusing on the company’s vision and platforms for the future of AI.
Main Themes:
The Age of AI Action: The transition from admiring AI capabilities to utilizing them for bold and transformative initiatives is upon us.
The Power of Platforms: Microsoft emphasizes its commitment to being a platform and partner company, enabling the development and deployment of AI solutions.
Scaling Laws and Inference Time Compute: The continued relevance of Moore’s Law, particularly in driving the scaling of AI models and the emerging importance of optimizing inference time compute.
Multimodal Interfaces, Planning & Reasoning: The rise of multimodal interfaces like voice and image recognition, coupled with the increasing capabilities of AI in planning and reasoning, points to a more intuitive and powerful interaction with technology.
The Rise of Agents: The convergence of multimodal interfaces, planning & reasoning, memory, tools, and entitlements paves the way for a world of personal, team, and enterprise-wide AI agents.
The Importance of Infrastructure, Data, and Tools: A strong emphasis on robust infrastructure, organized data estates, and developer-friendly tools like GitHub Copilot are crucial for realizing the full potential of AI.
Trust as a Foundational Element: Addressing security, privacy, and AI safety concerns through dedicated engineering efforts is paramount to building trust and fostering responsible AI development.
Key Platforms:
1. Copilot: The UI for AI, seamlessly integrated into existing workflows (e.g., Microsoft 365), enabling new workflows (e.g., chat & pages), and offering extensibility through actions and custom-built agents.
“The best way to conceptualize Copilot is it’s the UI for AI.”
2. Copilot Stack and AI Platform: Azure serves as the world’s computer, providing the infrastructure for AI, with a focus on data readiness (rendezvous with the cloud), an AI app server (Foundry), and innovation in silicon and data center technology.
“We’ve always conceptualized and built Azure as the world’s computer.”
3. Copilot Devices: AI-powered devices leveraging NPUs and GPUs to deliver hybrid AI experiences, combining local processing with cloud capabilities for optimized performance.
“It’s a real beginning of a new platform on the edge that’s going to be as exciting as what’s happening in the cloud.”
Key Insights & Facts:
Double-digit productivity gains are being observed within Microsoft through the implementation of AI solutions.
The diffusion of AI technology is happening at a rapid pace, evident in the deployment of Copilot systems by Indian companies like Cognizant and Persistent.
Data is the lifeblood of AI: Effective data management and pipelines are crucial for success.
Microsoft is investing $3 billion to expand Azure AI capacity in India.
Training 10 million people in India on AI skills by 2030 underlines the commitment to democratizing AI knowledge.
“Tokens per dollar per watt” will become a key metric for measuring efficiency and progress in AI.
Business transformation through AI should prioritize Copilot as the UI, Foundry as the app server, and data fabric for optimized outcomes.
Illustrative Quotes:
On the agentic future: “Think about that like that’s the new workflow where I think with AI, I promote things into pages, I invite others, I collaborate with others, and by the way, AI is present even on that canvas.”
On developer tools: “As of today, there is no more waitlist for Copilot Workspace… and to me even for me personally perhaps the biggest game changes were Windows 365 where I have my Dev desktop plus GitHub Copilot and Copilot Workspace plus Code Spaces, you put those things together, put me anywhere in the world, I’m a happy person.”
On the importance of data: “Data is the only way to create AI. It’s not just for the pre-training… You need data for doing sampling, for doing inference time compute to improve pre-training. So data pipelines and data is everything.”
Conclusion:
Microsoft’s vision for the future of AI is centered on empowering individuals and organizations through accessible platforms, robust infrastructure, and a commitment to trust and responsible development. The convergence of AI advancements and the increasing accessibility of powerful tools point to a future where AI becomes an integral part of our daily lives, transforming how we work, learn, and interact with the world around us.
Copilot and the Future of AI: An FAQ
1. What are the three main platforms Microsoft is building for the future of AI?
Microsoft is focusing on three key platforms to drive AI adoption and empower individuals and organizations:
Copilot: The user interface (UI) for AI, designed to seamlessly integrate into existing workflows and enable new, AI-driven ways of working.
Copilot Stack and AI Platform: The comprehensive infrastructure, data management, and AI app server layer, providing the foundation for building and deploying AI solutions.
Copilot Devices: Leveraging the power of edge computing with AI-capable devices that work in tandem with cloud resources for a hybrid AI experience.
2. How does Copilot change the way we work with AI?
Copilot acts as the bridge between humans and AI, making AI accessible and intuitive within existing applications and workflows. It aims to:
Infuse AI into current workflows: Copilot enhances productivity by automating tasks, providing insights, and streamlining processes within familiar tools like Microsoft 365.
Enable new AI-first workflows: Features like Chat with Web and Workscope allow users to access and interact with information in dynamic ways, fostering collaboration and knowledge sharing.
Empower users to extend AI capabilities: Copilot provides tools for building custom agents and actions, tailoring AI to specific needs and workflows.
3. What is the significance of the “tokens per dollar per watt” formula in the context of AI infrastructure?
This formula captures the essential elements driving AI progress and economic growth:
Tokens: Represent the volume of data processed, signifying the scale and capability of AI models.
Dollar: Reflects the cost efficiency of AI infrastructure, making AI accessible and scalable.
Watt: Highlights the energy efficiency of AI, ensuring sustainability and responsible resource utilization.
Maximizing “tokens per dollar per watt” is crucial for unlocking the full potential of AI and driving its widespread adoption.
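A back-of-the-envelope calculation makes the metric concrete; every number below is invented for illustration, and the point is only how the three factors combine.

```python
# Hypothetical figures, not Microsoft data.
tokens_served   = 1_000_000_000  # tokens processed over some period
cost_dollars    = 2_000          # infrastructure cost for that period
avg_power_watts = 50_000         # average power draw

tokens_per_dollar = tokens_served / cost_dollars   # 500,000 tokens per dollar
efficiency = tokens_per_dollar / avg_power_watts   # 10.0 tokens per dollar per watt
print(f"{tokens_per_dollar:,.0f} tokens/$ -> {efficiency:,.1f} tokens/$/W")
# Improving throughput, cost efficiency, or energy efficiency all raise the metric.
```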
4. How does the Copilot stack address the challenges of data management in AI?
The Copilot stack emphasizes data as a critical component of AI success:
Data Rendezvous with the Cloud: Supports a wide range of data sources, bringing them together in a unified cloud environment for easy access and processing.
AI-Ready Data Estate: Provides specialized data storage and management solutions optimized for AI workloads, including operational stores, analytical databases, and data pipelines.
Data Gravity and Locality: Recognizes the importance of keeping data close to compute resources for efficient model training, inference, and retrieval augmented generation.
5. What is the role of Foundry in building and deploying AI applications?
Foundry serves as the AI app server, facilitating the management and deployment of AI models:
Rich Model Catalog: Provides access to a diverse range of AI models, including OpenAI offerings, open-source models, and industry-specific models.
Model Management and Optimization: Enables developers to fine-tune, distill, evaluate, and ensure the safety and groundedness of AI models.
Model Orchestration and Deployment: Supports the deployment of model-forward applications, allowing developers to easily integrate and manage multiple models in their solutions.
6. How does Microsoft address the issue of trust in AI, particularly in areas like security, privacy, and safety?
Microsoft emphasizes trust as a core principle in AI development:
Security: Implements measures to protect against adversarial attacks and vulnerabilities, such as prompt injection.
Privacy: Leverages confidential computing technologies to safeguard sensitive data during processing, extending these protections to both CPUs and GPUs.
AI Safety: Focuses on ensuring groundedness and reducing hallucinations in AI models through dedicated evaluation services and tools.
7. What are the three key design considerations for successful AI business transformation?
Organizations should prioritize these decisions when implementing AI solutions:
Copilot as the UI for AI: Ensure seamless integration of AI into existing workflows and user experiences.
Foundry as the AI App Server Platform: Choose a robust and flexible platform for building and deploying AI applications with agility.
Data in Fabric: Prioritize data management and accessibility, leveraging data gravity and locality for efficient AI processing.
8. What is Microsoft’s commitment to AI skills development in India?
Microsoft has pledged to train 10 million people in India on AI skills by 2030, aiming to empower individuals and communities to harness the transformative potential of AI. This initiative focuses on translating skills into tangible impact, fostering economic growth and societal progress through real-world applications of AI.
Microsoft Copilot: AI Platform and Ecosystem
The sources describe three AI platforms built by Microsoft: Copilot, an AI stack, and Copilot devices. The goal of these platforms is to empower every person and every organization to achieve more. [1]
Copilot is described as the UI for AI and works by integrating into existing workflows. [1] For example, Copilot can be used to generate an agenda for a meeting, take notes during the meeting, and then create a presentation based on the meeting notes. [1] Copilot also includes Pages and Chat with Web and Workscope, which allow users to access information from various sources, promote that data into an interactive AI-first canvas, and collaborate with others. [2] Copilot actions provide extensibility, allowing users to automate workflows across the M365 system. [2] Copilot Studio is a low-code, no-code tool that enables users to build their own agents. [3] The platforms also include measurement capabilities, such as Copilot analytics, which allow users to track the impact of AI on their productivity and business outcomes. [3]
The AI stack, also referred to as the Copilot stack, is built on Azure as the world’s computer. [4] Microsoft is investing heavily in infrastructure to support the growing demands of AI, including expanding their data center capacity and investing in silicon innovation. [4] The platform also focuses on data, recognizing that data is the only way to create AI. [5] Microsoft is building out its data estate to allow users to bring all of their data to the cloud and use it in conjunction with AI models. [5] The AI app server, called Foundry, provides a platform for deploying, fine-tuning, and evaluating AI models. [6]
Copilot devices, which include Copilot PCs and traditional PCs with GPUs, bring AI capabilities to the edge. [7] These devices are not just about running local models but about hybrid AI, where applications can offload tasks to the local NPU and call LLMs in the cloud. [7]
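The hybrid pattern reduces to a per-task routing decision, sketched below. Both model functions are hypothetical stand-ins, not real device or cloud APIs; the point is only that the application chooses between the local NPU and a cloud LLM.

```python
def run_on_local_npu(prompt: str) -> str:
    """Stand-in for a small model accelerated by the device NPU."""
    return "(local model output)"

def call_cloud_llm(prompt: str) -> str:
    """Stand-in for a call to a hosted large language model."""
    return "(cloud model output)"

def answer(prompt: str, needs_deep_reasoning: bool = False) -> str:
    # Keep quick, latency-sensitive work on-device; offload heavier
    # reasoning to the cloud, per the hybrid-AI pattern described above.
    if needs_deep_reasoning or len(prompt) > 2000:
        return call_cloud_llm(prompt)
    return run_on_local_npu(prompt)
```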
The sources emphasize the importance of trust in the development and deployment of AI. Microsoft has a set of principles and initiatives focused on security, privacy, and AI safety, and is translating these principles into engineering progress. [8] For example, they are working on protecting against prompt injection, enabling confidential computing in GPUs, and ensuring the groundedness of AI models to prevent hallucinations. [8]
Ultimately, the goal of these AI platforms is to drive business transformation. [8] The sources highlight three key design considerations for organizations looking to adopt AI:
Copilot as the UI for AI
The app server (Foundry) as the platform for AI applications
Data in Fabric
These foundational choices are crucial because they provide agility and flexibility as AI models evolve. [8]
The sources also discuss the importance of AI skills development. [9] Microsoft is committed to training 10 million people in India in AI skills by 2030, recognizing the importance of translating these skills into real-world impact. [9]
Microsoft’s AI Ecosystem: Copilot, Stack, and Devices
The sources primarily focus on Microsoft’s AI platforms, particularly their vision for a future where AI is integrated into every aspect of work and life. They highlight three main platforms:
Copilot: This platform serves as the user interface for interacting with AI. It aims to streamline workflows by integrating AI into existing applications like Microsoft 365. Examples include generating meeting agendas, taking notes, and creating presentations. Copilot also features tools like Pages for an interactive AI canvas and Chat with Web and Workscope for accessing information from various sources. Extensibility is a key aspect, allowing users to create Copilot actions to automate tasks across multiple applications. Copilot Studio empowers users to build custom AI agents without extensive coding. The platform also incorporates Copilot analytics to measure the impact of AI on productivity and business results.
AI Stack (Copilot Stack): This platform encompasses the foundational infrastructure and tools for developing and deploying AI solutions. Built on Azure, it leverages Microsoft’s global data centers and investments in silicon innovation to provide the computational power needed for AI workloads. Data plays a crucial role, and Microsoft is focused on enabling users to bring their data to the cloud and prepare it for use with AI. Foundry acts as the AI application server, facilitating the deployment, fine-tuning, and evaluation of AI models.
Copilot Devices: Recognizing the importance of edge computing, Microsoft is bringing AI capabilities to devices like Copilot PCs and traditional PCs with GPUs. This goes beyond simply running local models; it’s about hybrid AI where devices can leverage both local processing power and cloud-based AI, enabling more powerful and responsive applications.
Trust is a paramount concern, and Microsoft is actively working to ensure the security, privacy, and safety of its AI platforms. This includes efforts to protect against attacks like prompt injection, implementing confidential computing in GPUs, and developing methods to ensure the groundedness of AI models to prevent hallucinations.
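As a rough intuition for what a groundedness check does, the toy heuristic below flags answer sentences that share little vocabulary with the source passage they are supposed to be grounded in. It is a sketch only, not Microsoft’s evaluation tooling; real systems use far more sophisticated methods.

```python
def ungrounded_sentences(answer: str, source: str, threshold: float = 0.3) -> list[str]:
    """Flag answer sentences with little lexical support in the source text."""
    source_words = set(source.lower().split())
    flagged = []
    for sentence in answer.split("."):
        words = set(sentence.lower().split())
        if not words:
            continue
        overlap = len(words & source_words) / len(words)
        if overlap < threshold:  # weak overlap suggests a possible hallucination
            flagged.append(sentence.strip())
    return flagged
```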
The ultimate aim of these platforms is to enable business transformation. They encourage a shift in thinking, focusing on Copilot as the UI for AI, Foundry as the AI application platform, and data in Fabric as key design considerations for organizations adopting AI. This approach provides flexibility and agility to adapt to the evolving landscape of AI models.
Beyond the technology itself, Microsoft emphasizes the importance of AI skills development, with a commitment to train 10 million people in India by 2030. This highlights the understanding that successful AI adoption requires a workforce equipped with the necessary skills.
In essence, Microsoft’s vision for AI platforms is about creating an ecosystem where AI is accessible, trustworthy, and empowering, enabling individuals and organizations to achieve more.
AI Capabilities: Augmenting Human Productivity
The sources discuss a variety of AI capabilities, focusing on how they can be leveraged to enhance productivity, improve decision-making, and empower individuals and organizations. Here’s a breakdown of key capabilities highlighted:
1. Natural Language Processing (NLP): This is a foundational capability allowing AI systems to understand and interact with humans using natural language. Examples from the sources include:
Copilot responding to voice commands in multiple languages, including Hyderabadi Urdu and Hindi [1, 2].
Farmers interacting with Agripilot.ai in their local languages via WhatsApp [3].
Users interacting with Copilot Workspace using natural language to describe tasks and provide instructions [2].
2. Multimodal Understanding: This refers to AI systems that can process and understand information from multiple sources, including text, images, and audio. The sources mention:
Copilot’s ability to handle multimodal input, exemplified by the user setting up an action button on their iPhone to access Copilot [1].
The use of images in conjunction with text in Copilot Workspace, such as uploading product images as part of an admin page development task [2].
3. Planning and Reasoning: This capability enables AI systems to plan complex tasks, break them down into steps, and execute those steps in a logical sequence; a minimal code sketch follows the examples below. Examples include:
Copilot’s ability to create a meeting agenda that intelligently allocates time based on the complexity of the cases to be discussed [4].
GitHub Copilot Workspace generating a plan for implementing a new feature, outlining the necessary code changes across multiple files [2].
Project management agents that can create project plans, assign tasks, and even complete tasks on behalf of the team [5].
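The planning loop behind features like these can be sketched in a few lines. The complete function below is a hypothetical LLM call, not the actual implementation of Copilot or GitHub Copilot Workspace.

```python
def complete(prompt: str) -> str:
    """Hypothetical LLM completion call (placeholder output)."""
    return "1. First step\n2. Second step"

def plan_and_execute(task: str) -> list[str]:
    # Ask the model for a plan, then execute it one step at a time.
    plan = complete(f"Break this task into short numbered steps:\n{task}")
    steps = [line for line in plan.splitlines() if line.strip()]
    return [complete(f"Carry out this step and report the result:\n{step}") for step in steps]
```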
4. Memory and Context Awareness: This allows AI systems to retain information over time and use that information to inform their actions and responses; a toy sketch follows the examples below. The sources point to:
The importance of equipping AI agents with memory, including long-term memory [1].
Copilot Workspace maintaining context throughout a development task, remembering previously added requirements and incorporating them into the plan [2].
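Context retention is commonly implemented by replaying the running message history on every model call, as in this toy sketch (chat is a hypothetical LLM call that accepts the full history):

```python
def chat(messages: list[dict]) -> str:
    """Hypothetical LLM call that accepts the full message history."""
    return "(assistant reply)"

history: list[dict] = []

def ask(user_message: str) -> str:
    history.append({"role": "user", "content": user_message})
    reply = chat(history)  # the model sees everything said so far
    history.append({"role": "assistant", "content": reply})
    return reply
```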
5. Tool Use and Integration: AI systems can interact with external tools and applications, extending their capabilities and enabling them to perform a wider range of tasks; a short sketch follows the examples below. This is evident in:
The emphasis on making models aware of the tools they can use, going beyond simple function calling [1].
Copilot’s ability to work across the entire M365 system [6].
Copilot Workspace integrating with development tools to execute tests, build projects, and preview applications [7].
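The underlying pattern is straightforward: the application declares which tools exist and executes whichever one the model selects, returning the output to the model. A minimal sketch with a hypothetical tool registry:

```python
# Hypothetical tool registry; real systems attach schemas describing each tool.
TOOLS = {
    "run_tests": lambda: "42 passed, 0 failed",
    "build_project": lambda: "build succeeded",
}

def handle_tool_request(tool_name: str) -> str:
    # Execute the tool the model asked for and hand its output back.
    if tool_name not in TOOLS:
        raise ValueError(f"model requested an unknown tool: {tool_name}")
    return TOOLS[tool_name]()
```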
6. Agentic Behavior: The sources envision a future where AI agents act autonomously to achieve specific goals, collaborating with humans and potentially taking on more complex tasks. Examples include:
The development of personal, team, enterprise-wide, and cross-enterprise agents [4].
Agents in SharePoint that unlock insights from documents and can be customized with additional data sources [5].
The facilitator agent that manages meeting tasks like agendas, notes, and action items, allowing human participants to focus on the discussion [5].
These capabilities are not isolated but work in concert to create powerful AI systems that can transform the way we work, learn, and interact with the world around us. The sources emphasize that AI is not merely about replacing human tasks but about augmenting human capabilities, allowing us to focus on higher-level thinking, creativity, and problem-solving.
AI-Driven Business Transformation
The sources portray AI as a transformative force poised to revolutionize business operations across various industries. The overarching theme is business transformation through AI, emphasizing how these technologies can drive efficiency, unlock new possibilities, and ultimately lead to better outcomes. Here’s a breakdown of key aspects of this transformation:
Shifting from Talking to Doing: The sources note a palpable shift from the initial phase of “talking about AI” to a new era of “doing things with AI that are bold and big” [1]. This signifies a move beyond theoretical discussions to practical applications where AI is actively integrated into real-world business processes.
Empowering Every Person and Organization: The stated mission of Microsoft’s AI platforms is to empower individuals and organizations to achieve more [2]. This empowerment comes from:
Increased Productivity: AI can automate repetitive tasks, freeing up human employees for more strategic and creative work [3].
Enhanced Decision-Making: AI can analyze vast amounts of data to extract insights and provide recommendations, leading to more informed decisions [4].
Improved Customer Service: AI-powered chatbots and virtual assistants can provide 24/7 support, personalize interactions, and resolve issues quickly [3].
Transforming Specific Business Functions: The sources provide examples of how AI is being used to transform various functions:
Customer Service: AI-powered chatbots and virtual assistants can handle routine inquiries, escalate complex issues, and personalize customer interactions [3].
HR Self-Service: AI agents can answer employee questions, process requests, and streamline HR processes [4].
IT Operations: AI can automate IT tasks, monitor systems for anomalies, and proactively address potential issues [3].
Finance: AI can analyze financial data, identify trends, and detect fraud [3].
Supply Chain: AI can optimize logistics, predict demand, and improve inventory management [3].
Marketing: AI can personalize marketing campaigns, create targeted content, and analyze customer behavior [3].
Sales: AI can identify leads, qualify prospects, and automate sales processes [3].
AI-Driven Workflows: The sources showcase a future where AI is seamlessly integrated into workflows:
Doctors using Copilot to prepare for tumor board meetings, take notes, and create presentations [2].
Teams using agents in Microsoft 365 to manage tasks, facilitate meetings, and provide real-time translations [4].
Developers using GitHub Copilot Workspace to brainstorm, plan, implement, and test code [5].
Farmers using Agripilot.ai to collect data, make informed decisions about irrigation and fertilization, and improve crop yields [6].
Key Design Considerations: The sources emphasize the importance of Copilot as the UI for AI, Foundry as the AI application platform, and data in Fabric [7]. These foundational choices provide a framework for organizations to build AI-powered solutions that are scalable, flexible, and adaptable.
Focus on Business Results: Ultimately, the success of AI adoption hinges on its ability to deliver tangible business results. The sources stress the importance of measuring the impact of AI on key metrics, such as increased sales, improved efficiency, and reduced costs [3].
The Need for AI Skills: The sources highlight the importance of developing a workforce with the necessary AI skills to drive this transformation. Microsoft’s commitment to train 10 million people in India by 2030 underscores this need [8].
In conclusion, the sources paint a picture of a future where AI is not just a technological advancement but a catalyst for profound business transformation. By embracing AI and integrating it strategically, organizations can unlock new levels of productivity, innovation, and growth.
Microsoft’s Copilot Studio: Building and Deploying AI Agents
The sources emphasize that building and deploying AI agents is a crucial aspect of Microsoft’s AI platform vision. Agents represent a significant leap forward, moving beyond simple AI assistance to more autonomous entities capable of collaborating with humans and executing complex tasks. Here’s a breakdown of key points related to agent development:
Agents as Building Blocks of an AI-Powered Future: The sources portray agents as fundamental components of a future where AI is deeply integrated into our work and lives. This vision includes:
Personal agents that assist individuals with daily tasks.
Team agents that streamline collaboration and workflow within teams.
Enterprise-wide agents that operate across an organization’s systems and processes.
Cross-enterprise agents that facilitate interactions and collaboration between different organizations.
Copilot Studio: Democratizing Agent Development: Microsoft aims to empower everyone to build agents through Copilot Studio, a low-code/no-code platform. The goal is to make agent creation as simple as building a spreadsheet, enabling users without extensive coding expertise to create and customize agents for their specific needs.
Steps Involved in Agent Development with Copilot Studio (a declarative sketch follows these steps):
Define the Agent’s Purpose: Begin by providing a clear prompt that outlines the agent’s role, objectives, and the tasks it should perform.
Ground the Agent in Knowledge: Connect the agent to relevant data sources that provide the information it needs to function effectively. This could include SharePoint sites, databases, or other repositories.
Customize and Extend Functionality: Copilot Studio allows users to further customize their agents by adding specific actions and capabilities.
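Copilot Studio itself is a low-code UI rather than a code API, but the three steps above can be summarized declaratively. Every field name in this sketch is illustrative, not the product’s actual schema.

```python
# Hypothetical declarative view of a Copilot Studio agent definition.
agent_definition = {
    "name": "HR Self-Service Agent",
    "instructions": "Answer employee HR questions and file routine requests.",  # purpose
    "knowledge_sources": ["https://contoso.sharepoint.com/sites/HR"],           # grounding
    "actions": ["create_leave_request", "look_up_policy"],                      # extensions
}
```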
Examples of Agent Use Cases:
Field service agents that assist technicians with repairs and maintenance.
SharePoint agents that provide an intelligence layer on top of SharePoint, enhancing knowledge sharing and document management.
Meeting facilitator agents that manage agendas, take notes, and track action items, improving meeting efficiency.
Interpreter agents that provide real-time language translation, breaking down communication barriers.
Project management agents that create project plans, assign tasks, and track progress.
Employee self-service agents that assist employees with HR and IT requests.
Contract management agents that automate aspects of contract creation, review, and management.
Real-World Examples of Agent Deployment:
Cognizant deployed AI agents across their workforce.
Persistent built a contract management agent accessible through Copilot.
Bank of Baroda created a customer self-service agent, a relationship manager agent, and an employee agent.
ClearTax built a tax filing agent accessible through WhatsApp.
ICICI Lombard developed an agent to process non-standardized healthcare claims.
The Importance of Model Orchestration and Evaluation: As agent development progresses, the focus will shift towards the two areas below (a minimal evaluation sketch follows the list):
Model orchestration, which involves coordinating and managing multiple AI models within an agent to achieve complex goals.
Model evaluation, which is crucial for ensuring agent performance, reliability, and safety.
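A minimal evaluation harness illustrates the second point: run the agent over a small labeled set and measure how often it matches the reference answers. The agent function and dataset here are hypothetical placeholders.

```python
def agent(question: str) -> str:
    """Hypothetical agent under evaluation (placeholder output)."""
    return "4"

EVAL_SET = [
    ("What is 2 + 2?", "4"),
    ("What is the capital of France?", "Paris"),
]

def accuracy() -> float:
    # Fraction of questions where the agent's answer matches the reference.
    hits = sum(agent(q).strip() == ref for q, ref in EVAL_SET)
    return hits / len(EVAL_SET)
```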
The sources highlight a future where agents become ubiquitous, empowering individuals and organizations to automate tasks, gain insights, and collaborate more effectively. This shift towards agentic AI requires a new set of tools and platforms, like Copilot Studio, that democratize agent development and enable a broader range of users to participate in this transformative technology.
Microsoft AI Tour keynote session by Satya Nadella | Bengaluru | January 7, 2025
“AI OFM: Create a Profitable AI Model from Scratch” provides a guide to building and monetizing AI influencers. The source details creating realistic AI models using tools like face-swapping and AI image generators, focusing on attracting high-paying customers rather than general views. It emphasizes ethical and legal practices, cautioning against common pitfalls like platform bans. The guide covers the technical creation process, strategies for gaining visibility on platforms like Instagram and Threads by targeting a specific demographic, and methods for converting engagement into income through platforms like Uncov. It stresses building a persona for the AI and engaging with fans authentically to foster relationships that lead to sales, offering advice on sales techniques and avoiding common mistakes. Ultimately, the resource aims to equip users with a comprehensive system for establishing a profitable AI influencer business.
Study Guide: Profitable AI Influencer Model Creation
Quiz
Answer the following questions in 2-3 sentences each.
What was the key mistake Jake made that led to his AI model income being wiped out?
According to the guide, what is a crucial characteristic of the reference model’s body that should be avoided?
Why does the guide recommend generating multiple images (e.g., three) from a single prompt in Picasso?
What is the purpose of the “upscale” feature in Picasso when working with AI-generated faces?
Explain the “warm-up” process for a new Instagram account intended for an AI influencer.
Why is it important to focus on attracting a specific demographic (e.g., 40-year-old high-income men) for an AI influencer?
Describe one of the two methods suggested for posting engaging reels on Instagram for an AI influencer.
What is the primary goal of using Threads (Friends) in the context of promoting an AI influencer?
According to the guide, why should you avoid including direct links in Instagram stories to promote your content platform?
What is presented as the best AI-friendly platform for selling AI-generated content, and why are alternatives like Onlyfans and Fanmail discouraged?
Quiz Answer Key
Jake’s key mistake was not knowing the rules and guidelines of the platform he was using for his AI model. This resulted in his account being banned and his income being completely lost overnight without any prior warning.
A crucial characteristic to avoid in a reference model’s body is tattoos. The guide explains that tattoos would be very difficult for the AI to reproduce consistently across multiple images, hindering the creation of a cohesive AI influencer.
Generating multiple images from a single prompt is recommended because AI is not always perfect and can sometimes produce flawed or awkward results, such as mistakes in anatomy like fingers. Having multiple options increases the chances of getting at least one great result.
The “upscale” feature in Picasso adds much more detail to the AI-generated face, making the picture feel significantly more realistic. It helps to reduce the “flat” or artificial look that can sometimes be present in AI-generated images.
The “warm-up” process involves creating a new Gmail account and then a new Instagram account from the phone. For the first few days, the user should passively consume content related to sexy models to help Instagram understand the desired niche, gradually progressing to following, liking, and eventually interacting with similar accounts.
Focusing on attracting a specific demographic like 40-year-old high-income men is important because these individuals are more likely to have disposable income and be willing to spend money on the content offered by the AI influencer, leading to actual sales rather than just views.
One method for posting engaging reels is to use face-swapped content of a real person and add a text overlay with a hook and a call to action specifically targeting the desired demographic (e.g., “Old man where are you?”). This encourages comments and interactions, signaling to Instagram that the content is engaging and should be shown to more similar users.
The primary goal of using Threads (Friends) is to drive traffic to the main Instagram account of the AI influencer. It’s recommended to use open-ended questions or create debates to encourage interaction and then direct interested users to the Instagram profile for further engagement.
Including direct links in Instagram stories is discouraged because Instagram’s algorithm reportedly dislikes when users leave the platform. Stories with links may receive fewer views as Instagram prioritizes keeping users within the app.
The best AI-friendly platform presented is Uncov (specifically a VIP account). Onlyfans is discouraged because AI models cannot pass the required video verification, and Fanmail has issues with fan accounts being frozen. Uncov is described as being designed for this type of content and offers a way to avoid bans.
Essay Format Questions
Discuss the ethical considerations and potential pitfalls of creating and monetizing AI influencers, drawing on the examples and advice provided in the source material.
Analyze the importance of creating a believable “personality” and “lifestyle” for an AI influencer on social media, and explain how the techniques described in the guide contribute to this goal.
Evaluate the effectiveness of the “warm-up” process for Instagram in establishing the desired audience for an AI influencer, considering both the potential benefits and limitations.
Compare and contrast the strategies for content creation (images and videos) and audience engagement (reels, stories, Threads) suggested in the guide, explaining how they work together to achieve profitability.
Critically assess the sales and monetization strategies outlined in the guide, including the recommended platform and the techniques for engaging with potential customers in direct messages.
Glossary of Key Terms
AI Influencer: A digital persona created using artificial intelligence that is designed to behave like a human influencer on social media, often with the goal of generating income.
Face Swap: A technology that allows a person’s face in an image or video to be replaced with another face, used in the guide to combine an AI-generated face with a real person’s body.
Prompt: A text-based instruction given to an AI image generator to create a specific image based on the description.
Photorealistic: Appearing real or very close to a real photograph.
Upscale (Image): The process of increasing the resolution and detail of an image, often used to make AI-generated faces look less artificial.
Reference Model: A real person (typically found on social media) whose body is used as a template for the AI influencer’s body through face-swapping techniques.
Onlyfans (OF): A content subscription service popular for adult content, but not suitable for AI influencers due to its verification process.
Fanmail: Another content subscription platform, mentioned as having issues with fan accounts being frozen, making it a less reliable option.
Uncov (VIP): The AI-friendly platform recommended in the guide for legally selling AI-generated content without the risk of being banned.
Warm-up (Instagram): A strategic process of creating a new Instagram account and gradually engaging with content related to the desired niche to signal to the platform the type of audience to target.
Shadow Ban: A state where a user’s content is hidden from most of their followers and non-followers on a social media platform without the user being explicitly notified.
Call to Action (CTA): An instruction or prompt designed to encourage a specific response from the audience, such as leaving a comment, sending a DM, or visiting a link.
Threads (Friends): Instagram’s text-based conversation app, used in the guide as a tool to engage with a broader audience and drive traffic to the main Instagram profile.
Sniping (Threads/Friends): A strategy of commenting on popular or viral threads to gain visibility and attract new followers to the AI influencer’s profile.
Hook: The opening part of a piece of content (e.g., a reel or text) designed to capture the audience’s attention immediately.
DM (Direct Message): A private message sent directly from one user to another on a social media platform.
Conversion Rate: The percentage of viewers or leads who become paying customers.
Feed Page: The main profile grid on Instagram where a user’s posts are displayed.
Stories (Instagram): Short-form ephemeral content on Instagram that disappears after 24 hours, used for more casual updates and engagement.
Highlights (Instagram): A feature that allows users to save and display selected Instagram stories on their profile for longer than 24 hours.
Close Friends (Instagram): A feature that allows users to share stories with a smaller, selected group of followers, used in the guide as a technique to create a sense of exclusivity.
GFE (Girlfriend Experience): A sales strategy that focuses on building an emotional connection and rapport with the customer, making them feel like they have a personal relationship with the content creator.
Anti-Selling: A sales technique where the seller positions themselves as hesitant or indifferent to the sale to create desire in the buyer.
Scarcity (Sales): A tactic to make an offer seem unique and limited to create a sense of urgency and encourage a purchase.
Copywriting: The art of writing persuasive text for marketing or promotional purposes.
Labeling (Sales): Reinforcing a customer’s positive feelings and identity after a purchase to build confidence and encourage future spending.
Objection Handling (Sales): Techniques used to address and overcome a potential customer’s reasons for not making a purchase.
Audio Message: A short recorded voice message, used in chats to convey emotion.
Briefing Document: Creating a Profitable AI Influencer
Overview:
This document provides a detailed briefing on the strategies and techniques outlined in the “AI OFM: Create a Profitable AI Model from Scratch” guide for building and monetizing hyperrealistic AI influencers. The guide emphasizes ethical and legal methods, focusing on attracting high-paying customers and avoiding platform bans. It covers the entire process, from creating the AI model’s face and body to attracting the right audience, engaging them, and selling content on appropriate platforms.
Main Themes and Important Ideas/Facts:
The Potential and Pitfalls of AI Influencers:
The guide opens with the enticing prospect of creating an AI influencer that can generate income 24/7 and attract wealthy customers without the risks of a real person’s account being banned.
It immediately cautions against common mistakes, citing the story of “Jake” whose six-figure potential was wiped out overnight due to a platform ban for violating rules. This highlights the importance of understanding and adhering to platform guidelines.
The guide also addresses the misconception that creating AI models requires coding skills or expensive setups, asserting that it’s “completely wrong.”
Creating a Hyperrealistic AI Model:
Face Generation: The first step involves creating the face using the website Picasso. The guide recommends using an affiliate link for free credits.
Users should select the “AI image generator” and choose “ultra realistic” style.
The aspect ratio should be square for social media compatibility.
Generating 3 images per prompt is advised to ensure at least one good result due to AI imperfections.
A sample prompt is provided, and the importance of detailed prompts is mentioned.
The guide advises against creating an AI that is “too pretty,” suggesting that a “next door girl profile” tends to perform better as people can relate more easily.
The “upscale” option in Picasso is recommended to add more detail and realism to faces that appear too flat.
Body Generation (Face Swapping Technique): To create a consistent body for the AI, the guide recommends finding a reference model on Instagram (specifically, an OnlyFans model with specific criteria).
Criteria for Reference Model: No tattoos, casual body type, no body piercings, and a large amount of content (photos, reels, stories).
The guide provides steps for identifying potential reference models by looking for links in their bios (indicating an OnlyFans account) and checking their following (OnlyFans models often follow each other).
Popular models like “Sophie Rain” are discouraged as reference models due to being “way too popular.” The focus should be on someone who looks like a “very casual girl” with “next door girl type of photos.”
Selecting Photos for Face Swapping: Choose 20 photos of the reference model where she looks consistent. Vary the angles (front, side, back). Avoid photos with significant variations in hair or makeup.
The guide details how to copy the Instagram link of the reference model’s photo and use the “face swap” feature in Picasso, activating the previously created AI face as the default. This process needs to be done for all 20 chosen photos.
Downloading stories and highlights from the reference model’s Instagram profile is suggested to obtain the necessary 20 photos.
Refining the AI Model: Checking the face-swapped images for unnatural features (e.g., eyes looking sideways) is crucial.
The website PhotoEditor.Pho.to is recommended for making small corrections, such as changing eye color.
Training the AI Influencer in Picasso: After preparing 20 face-swapped images, users can go to the “AI image generator” in Picasso, click “create influencer,” enter a name, and upload the 20 pictures. This training process takes 20-30 minutes.
Generating Content with the Trained AI: Once trained, users can generate new images by selecting their AI influencer (e.g., “Chloe” or “Emma”) and providing a prompt (e.g., “photo of Chloe in Paris”). The AI will generate images of the trained model in the specified setting.
It’s important to be cautious and check generated content for inconsistencies like extra fingers.
The “upscale” option can be used on generated images as well to enhance realism.
The guide mentions presets for generating “unlimited X content” (adult content), such as “ultra realistic with an ice cream” and “ultra realistic this,” which relate to suggestive scenarios. These details are further elaborated in a private “AI bot community system.”
Creating AI Videos: The guide briefly mentions using an “AI Video Generator” and prepared AI pictures to create short videos suitable for platforms like Reels and Stories. Face swapping can also be applied to video content.
Attracting Wealthy and Engaged Customers (Avoiding the “Broke Student” Problem):
The core strategy shifts from chasing general views and likes to targeting “high net worth men who actually spend money.”
The guide introduces “Mark,” who achieved success by targeting wealthy men instead of focusing on broad viral reach. His first big sale was $300 overnight after adopting this approach.
The goal is to attract very qualified customers and avoid the trap of racking up “1 million views” while generating “zero” sales.
The “Warm Up” Process for Instagram and Threads:
This is a critical multi-day process designed to signal to the Instagram algorithm the type of content and audience the AI influencer aims to attract, thereby avoiding bans and shadowbanning.
Day 1: Create a new Gmail account (not Proton or Yahoo) on a phone. Download Instagram and create a new account using this new Gmail. Scroll the feed for 15 minutes, only watching content of “sexy girls” and “OF (OnlyFans) models.” Use the search button to find accounts like “Sophie Rain” but only watch their content, do not follow or react. On the feed, only watch similar sexy content. If you see a relevant Reel, click the three dots and select “interested.” Set a profile picture of the AI influencer and choose a username that sounds like a real person’s name with an adult-related abbreviation (e.g., “Vanessa.AdultMRT”).
Day 2: Scroll for 15 minutes. Follow 5 sexy girls (content creators in the same niche). Like some of their sexy Reels (not too many). Look at some of their stories (no reactions).
Day 3: Scroll for 15 minutes. Follow 5 more sexy girls. Like some sexy Reels. Reply to some stories (e.g., with a heart emoji).
Day 4: Repeat Day 3. Begin very slowly to follow and unfollow “horny guys” who are likely to be the target demographic (40+ years old, not young students). Find these profiles by looking at the comment sections of popular models. Follow/unfollow about 10 such guys during the day.
Day 5: Repeat Day 4. Begin posting one Reel per day. Also, start posting one story per day (very slowly).
Day 8: Begin posting one IG post per day.
Friends (Threads): Create a Friends (Threads) account only on Day 5 of the Instagram account creation, doing it from within the Instagram app for a potential boost.
Week 1 (Friends): Maximum of five interactions per day: follow five target guys (40+), post one thread per day, comment on 10 viral threads (sniping for visibility by leaving provocative comments and AI influencer photos).
Week 2 (Friends): Increase interactions to 10 per day (double Week 1).
Week 3 (Friends): Triple interactions to 30 viral thread snipings, three threads per day, and 15 follows/unfollows.
Week 4 (Friends): Increase interactions further.
The warm-up process is crucial to make Instagram and Threads understand the account’s niche and target audience, leading to higher engagement from the right people and avoiding being flagged as spam or shown to irrelevant audiences.
Creating Engaging Content that Converts (Reels and Threads):
The guide emphasizes that simply posting beautiful content that goes viral is insufficient for generating sales. Content needs a “hook” and a “call to action” specifically targeting the desired demographic (40+ year old men).
Reels Strategies: Face Swap Content Example: Post a Reel (e.g., using a Bella Hadid clip with face swap) but add text with a hook directed at older men (e.g., “Old man where are you?”) to provoke reactions and comments. This increased engagement signals to Instagram that the content is good and should be shown to more people in the target audience.
Call to Actions in Reels: Can be placed as text in the Reel or as a pinned comment (e.g., “Send me your application in DM only if you’re over 40 years old,” “Send me a DM to see my secret pic only if you’re over 35 years old”). This encourages direct engagement and leads.
100% AI Generated Video Example: Use a trendy music track with an AI-generated video of the influencer and add a hook and call to action (e.g., “I can show you my surprise outfit for our next date by DMS”). The goal is to make guys feel invested and send DMs.
Trend Following: Joining a group that shares daily viral trends for AI influencers is recommended to automate content creation and ensure virality and reach to qualified people.
Threads Strategies: Friends is not for short-form scrolling but for reacting to other people’s posts and sharing opinions.
The main goal of Threads is to drive traffic to the main Instagram account.
Share beautiful photos and ask open questions or create debates to encourage engagement.
Example thread: “Where are the singles? Feels like everyone [is] in a good relationship except me and you.” The call to action is to invite discussion on Instagram (“can discuss on my IG”).
Thread Sniping: Commenting on viral threads with shocking or provocative comments and AI influencer photos to steal visibility and attract followers.
Consistent engagement on relevant threads, especially in the initial stages, can lead to significant follower growth.
Monetization Strategy (Beyond OnlyFans):
The guide explicitly states that putting an AI influencer on OnlyFans is “impossible” due to the platform’s verification requirements (requiring a live video). It warns against believing “gurus” showing OnlyFans results for AI models.
Fanmail is also cautioned against due to the risk of funds being frozen if fans complain about the AI nature of the content.
The Recommended Platform: Uncov (Specifically, a VIP Account): This is presented as the best AI-friendly platform where bans are virtually impossible with a VIP account.
Creating a VIP Uncov Account: The guide provides a step-by-step process:
Use the provided link.
Connect with Google.
Enter real personal information (for payment withdrawal and verification – customers won’t see this).
Use a fake girl name for the username (can be slightly more suggestive here, e.g., “Aubrey Sweet”).
Indicate you are a content creator/model (amateur).
Initially, link any Instagram account (can be changed later).
Upload a profile picture.
Declare “I am an AI.”
Confirm being over 18.
Choose USD as the payment currency for targeting the US market.
Set a subscription price (recommended $5-$7 initially).
Write a short description.
Verify the account via email.
Post a public photo.
Upload content (soft content for the feed).
Set up exclusive content (blurred for non-subscribers) to incentivize subscriptions.
Verify bank details and ID/passport for payouts (can take 24-48 hours).
Content Strategy on Uncov: Post soft content (bikini, lingerie) on the main feed.
Keep NSFW content for private sales.
Use collections to organize content.
Write enticing descriptions for blurred exclusive content.
Sales and Chatting Techniques on Uncov:
The guide emphasizes that the subscription fee alone won’t generate significant income; the real money is in private content sales through direct chatting.
It shares a personal anecdote of making $276 in one night with a single customer through effective chatting and sales.
The Art of Chatting: Focus on building relationships and emotional connection, not just selling content. Fans are looking for a “girlfriend experience.”
Step 1: Qualification: Quickly identify time-wasters by asking key questions in order:
Location (filter for higher-income countries).
Job (determine purchasing power indirectly).
Age (prioritize 40+).
Subtly introduce a “side hustle” (e.g., waitress with a small side gig for extra money) to gauge interest and justify sending paid content (e.g., to pay for studies). Portray an innocent and slightly naive persona.
Step 2: The Girlfriend Experience: Build a connection by learning about his passions and interests. Gradually steer the conversation towards intimacy using the “push and pull” technique (suggestive hints followed by casual conversation). Allow him to escalate the topic naturally. Time the sales appropriately (don’t push if he’s busy). Be prompt in responses (delays can lose sales).
Step 3: The Sales Window (within 1 hour of identifying a ready buyer): Personalize the chat based on his preferences.
Introduce media in stages: start with a soft, free teaser (“Do you like this? Would you like to see me more?”).
Send the VIP Uncov account link.
Adjust pricing based on his profile but maintain a staged approach to increase prices gradually.
Use sensory descriptions to heighten arousal.
Create suspense (“Wait for me, don’t come too soon, let’s play a game”).
If he refuses, understand the hesitation and adapt.
Techniques to Increase Sales: Anti-Selling: Position yourself as indifferent to the sale to create desire (“I’m not sure if I should send this, I’m quite shy”).
Scarcity: Make him feel special (“I don’t usually do this, but I feel a special connection with you”).
Good Copywriting (Storytelling): Use proper spelling, punctuation, and tone to make him feel unique.
Labeling After a Sale: Reinforce his confidence and masculinity (“You’re really daring, I admire that”).
Handling Objections: Respond positively (“Yes, but…”) and find solutions.
Top Mistakes to Avoid: Never promise a real-life meeting (play the “maybe someday” game).
Avoid conflict over price (respond with emotional disappointment).
Using Audio Messages: Tools like ElevenLabs can be used to send audio messages to convey emotion.
Practice is key to mastering chatting and converting effectively.
Optimizing Social Media Profiles for Conversion (Instagram Example):
Username: Real first name, fake family name (shortened).
Bio: Include age, a relatable detail (e.g., Astro sign), and a student persona (as a relatable argument for needing extra income). Avoid overtly sexual or cliché descriptions.
Feed Page: Curate a natural-looking feed that tells a story and creates a relatable personality (e.g., a student who enjoys picnics, painting, the beach). Use a mix of face-swapped and 100% AI-generated content, along with some lifestyle images from platforms like Pinterest to create context. Frame moments and locations in the photos as potential date scenarios to help guys project themselves.
Reels: Should primarily focus on driving traffic to the profile, not reside on the feed.
Highlights: Limit to three:
One to promote the Uncov link indirectly (e.g., a link emoji and a chili pepper leading to “VIP uncov”).
One showcasing the AI influencer’s “passions/hobbies/lifestyle” using Pinterest content to create a persona.
One titled “Me” with face-swapped photos of the influencer in casual settings.
Stories: Use stories to build the influencer’s life and personality. Include shots where at least a part of the “body” (e.g., hand, hair) is visible for authenticity. Follow a logical narrative consistent with the persona (e.g., a girl who likes picnics would have picnic-related stories).
Promoting the Uncov Link in Stories: Avoid directly posting the link in the story as Instagram penalizes this. Instead, use a story with a text overlay (e.g., “Exclusive content, DM ‘go’ or ❤️”) to encourage direct messages, which can then be automated with the platform link. Alternatively, mention “link in bio” in a text overlay.
Close Friends Feature: Add all followers to the “Close Friends” list using a secret tool. Posting a provocative “Close Friends” story (e.g., just out of the shower with a question) at 6 PM (when guys are likely alone after work) will make them feel special and encourage DMs, leading to sales opportunities.
Scaling and Growth (OFM Growth Plan and Community):
Once reaching $50k per month, the guide mentions a plan to hire an OnlyFans model and leverage the established AI influencer system for additional income through commissions.
Joining the “AI Vault” community is strongly recommended for accessing daily viral content trends, pre-made sales scripts, exclusive AI content resources, legal contracts, and peer support for faster scaling and growth.
Key Quotes:
“what if you could build a AI influencer that makes money 24 hours per day attract high paying customers and never worry about getting banned sounds too good to be true right so stick with me because in this ultimate free course I’ll show you exactly how to do it legally ethically and profitably”
“it’s impossible to put an AI influencer on Onlyfans but we have another solution for that so stop believing people talking about AI with FM and showing you Onlyfans results because I spent months researching this and I even created an AI influencer two weeks ago on my channel and I already made more than 10K with her”
“most people think AI model require coding skills or expensive setups it’s completely wrong”
“you don’t want to create like a too pretty okay AI model because when the girl is too beautiful she won’t make a lot of money based on my experience best Profile is trying to create next door girl profile”
“the goal is not only to be viral getting some views because views doesn’t bring you actually sales okay and sales is money you can make 1 million views and generate zero”
“you have to use the best AI platform and verify the account of your AI influencer and I’m about to show you how you learn the best way to sell AI content legally and without breaking any platforms rule so the best platform if you don’t want to get banned … is actually uncov but not a regular uncov account if you manage to create a VIP one you just cannot get banned”
“the art of chatting is about understanding your client you need to adapt your approach to each situation to maximize earnings from your fans understand who you are interacting with to be a strong rapport and emotional proximity”
“knowledge or course salon won’t make you money taking action will and that’s exactly why you should take a serious look at AI Vault”
Conclusion:
The “AI OFM: Create a Profitable AI Model from Scratch” guide presents a comprehensive strategy for building and monetizing AI influencers, emphasizing realism, targeted audience engagement, and legal, ethical practices. It deconstructs the process into actionable steps, from initial AI creation to nuanced sales techniques, highlighting the importance of understanding platform rules and focusing on building genuine connections with potential customers. The guide positions Uncov as the preferred platform for AI content creators and strongly advocates for joining the “AI Vault” community for accelerated growth and access to valuable resources.
Profitable AI Influencer: Building and Monetizing
Frequently Asked Questions: Building a Profitable AI Influencer
1. What is the core concept behind creating a profitable AI influencer? The fundamental idea is to build a hyperrealistic AI model that can attract high-paying customers and generate income 24/7 without the risks associated with real-life content creators (like getting banned on platforms). This involves legally and ethically creating AI-generated content and strategically targeting a specific, affluent audience.
2. What are the key steps in creating the AI influencer’s visual identity? The process begins with creating a realistic face using an AI image generator like Picasso, focusing on a “next door girl” profile rather than an overly perfect one to encourage engagement. Next, a reference model is found on Instagram for the body, ensuring she has a casual look, no visible tattoos or piercings, and a lot of consistent content. Twenty similar photos of the reference model are selected, and the AI influencer’s face is then face-swapped onto these body images using a tool within Picasso. Adjustments like eye color changes and upscaling for facial detail enhancement can further refine the images. Finally, these face-swapped images are used to train an AI model of your influencer within Picasso, allowing for the generation of new content featuring the same AI persona.
3. How can AI-generated content be created for the influencer? Once the AI model is trained, you can generate photos and short videos by providing text prompts describing the desired scene or activity (e.g., “Chloe in Paris”). The AI will then create images or videos featuring your trained AI influencer in that context. It’s recommended to generate multiple options per prompt to select the best results and to pay attention to details like unnatural features (e.g., extra fingers). Tools within the platform allow for upscaling images for better realism and even performing additional face swaps on generated content if the likeness isn’t consistent.
4. How do you attract the right kind of (high-paying) audience for your AI influencer? The strategy focuses on attracting high-net-worth men (around 40 years old and above) who have the disposable income to spend. This involves a “warm-up” process for new Instagram and Friends (Threads) accounts, where you initially engage only with content related to your niche (e.g., “OF girls”) to train the algorithm. You gradually start following similar accounts and engaging with their content. Importantly, you follow and unfollow profiles of older, affluent men found in the comment sections of popular accounts in your niche. This targeted approach helps Instagram and Friends show your content to the desired demographic.
5. What kind of content should be posted on Instagram and Friends to generate sales? Instead of just posting visually appealing content, the focus should be on reels and threads with hooks and calls to action that specifically target the desired audience. For Instagram reels, add text overlays that pose questions or make statements designed to elicit responses from older men (e.g., “Old man, where are you?”). Include calls to action like “Send me your application in DM only if you’re over 40 years old” or “DM me to see my secret pic only if you’re over 35.” For Friends, post engaging photos with open-ended questions or debate-starting topics to encourage comments and then subtly redirect traffic to your Instagram profile. “Sniping” involves commenting provocatively on viral threads to gain visibility.
6. How do you build engagement and a connection with your audience? To make the AI influencer feel more real and relatable, you need to create a persona with a backstory, hobbies, and preferences. Your Instagram feed should tell a story, showing the influencer engaging in everyday activities that a potential admirer could envision being a part of (e.g., picnics, art, relaxing at home). Use Instagram Stories to further showcase the “life” of the influencer, but avoid directly posting links in stories as it can hurt reach. Instead, use text-based calls to action in stories that encourage DMs for exclusive content or direct viewers to the link in the bio (used sparingly). Highlights should be curated to showcase the “me” (AI model pics), “passion/hobbies/lifestyle” (Pinterest images depicting activities), and a clear call to action for your sales platform.
7. What is the recommended platform for legally selling AI-generated content without getting banned, and how does it work? The recommended platform is a VIP account on Uncov (not a regular account). Unlike OnlyFans (which requires real person verification and prohibits AI) or Fanmail (which can freeze funds due to complaints about fake models), a VIP Uncov account is designed for AI influencers and aims to be ban-proof. To create a VIP account, use the provided link, fill in your real details (for payment purposes), and create a fake girl username for your AI influencer. Set a subscription price (around $5-$7) and start posting content. Focus on posting softer, blurred exclusive content that users need to subscribe to see, while saving more explicit NSFW content for private sales. Verify your account with your real ID and bank details to receive payouts.
8. What are the key strategies for effectively selling content through direct messaging (chatting)? Effective selling involves building relationships and understanding your clients. The process includes:
Qualification: Quickly identify serious buyers by asking about their location, job (indirectly gauging purchasing power), and age (targeting older demographics).
Girlfriend Experience: Build rapport by learning about their interests, subtly steering conversations towards intimacy using “push and pull” techniques, and being responsive.
Sales Window: Once interest is high, personalize the chat based on their preferences, introduce content in stages (starting with a free teaser), adjust pricing accordingly, use sensory language to heighten arousal, and handle objections with positive “yes, but” statements.
Utilize Sales Psychology: Employ techniques like “anti-selling” (acting indifferent), scarcity (making them feel special), and positive reinforcement after a sale to encourage repeat business.
Avoid Mistakes: Never promise real-life meetings and avoid arguing over price (respond with emotional disappointment instead).
Consider using pre-made sales scripts to guide your conversations and maximize conversions. Also, adding all your followers to your “close friends” list and posting a provoking story there can drive significant engagement and DMs, leading to sales.
Profitable AI Influencer Creation and Monetization
Based on the information in the source “01.pdf”, creating a profitable AI influencer involves a multi-step process encompassing model generation, content creation, audience attraction, and strategic monetization.
1. AI Model Creation:
The first step is to create the face of your AI influencer using an AI image generator like Picasso. It’s recommended to choose the “ultra realistic” option and a square aspect ratio suitable for social media. Generating multiple images (e.g., three) per prompt is advised to account for AI imperfections. The prompt should be detailed, but even a simple prompt allows the AI to make creative choices like outfits. It’s suggested to aim for a “next door girl” profile rather than an overly beautiful one, as this can lead to better engagement. You can refine the face by using the “upscale” feature to add more realistic details if it appears too flat.
Next, you need to create the body of your AI influencer. The source suggests using Instagram to find a reference model, ideally someone who looks casual, doesn’t have tattoos or piercings, and has a lot of existing content. It’s important to find models who likely have OnlyFans accounts (indicated by a link in their bio) as they tend to have more suitable content.
Once you have a reference model, the key is face swapping. Using Picasso’s face swap feature, you apply the generated face to images of the reference model’s body. You need to find around 20 consistent photos of the reference model from various angles (front, back, profile) to help the AI create a good clone. Pay close attention to the consistency of the reference photos (e.g., similar hair and makeup). You can download content from the reference model’s Instagram feed, reels, and stories.
After face swapping, you might need to refine the images. Tools like upscalers can add detail to the face, making it look more realistic. You can also use photo editing websites like PhotoEditor.Pho.to to make small corrections, such as changing eye color across all the photos to ensure consistency in your AI influencer’s appearance.
Finally, you train your AI influencer within Picasso by uploading the 20 prepared face-swapped images and giving your model a name. This process typically takes 20 to 30 minutes. Once trained, you can generate new images and videos of your AI influencer by providing text prompts, and the AI should consistently reproduce the trained model.
2. Content Generation:
After training, you can generate photos of your AI influencer in various scenarios by using descriptive prompts. It’s still recommended to generate multiple images per prompt to select the best results, checking for anomalies like extra fingers. You can further enhance the realism of generated images using the upscale option.
You can also create AI-generated videos using AI video generator tools by describing the desired content. Short videos are suitable for platforms like Reels and Stories. If the face in generated content doesn’t look consistent, you can perform another face swap.
3. Audience Attraction:
The source emphasizes attracting high-paying customers, specifically men around 40 years old with disposable income, rather than just chasing broad views.
For Instagram, the guide outlines a specific “warm-up” process for new accounts to avoid bans and target the right audience. This includes:
Creating a new Gmail account and a new Instagram account from a phone using that Gmail.
Scrolling the feed for 15 minutes daily, only watching sexy content to train the algorithm.
Searching for and watching content from established creators in the niche (e.g., Sophie Rain) without following or reacting initially.
Utilizing the “not interested” option on irrelevant content and “suggest more posts like this” on relevant reels.
Setting a profile picture and a non-cringeworthy username that resembles a real name with an adult theme.
Gradually starting to follow (5 per day from day 2), like, and view stories of sexy creators.
Slowly beginning to follow and unfollow targeted “horny guys” (40+ English speakers found in the comments of popular accounts) from day 4.
Only posting one reel per day from day 5, one IG post per day from day 8, and one story per day from day 5.
For Threads, the source recommends creating an account via Instagram after day 5 of the Instagram warm-up. The strategy involves:
Starting with a low number of interactions (5 per day in week 1).
Following targeted guys, posting one thread per day, and commenting on 10 viral threads (“sniping”). The goal of Threads is to drive traffic to your Instagram.
Gradually increasing interactions in subsequent weeks.
4. Monetization:
The source advises against using OnlyFans for AI influencers due to difficulties with verification and the risk of bans. Fanmail is also cautioned against due to potential fund freezes from complaints about fake content.
The recommended platform is Uncov, specifically a VIP account, which is considered AI-friendly and less prone to bans. The source provides a link to create a VIP Uncov account.
When setting up Uncov, use your real information for payment and ID verification but a fake female username for your AI influencer. Set a subscription price (e.g., $5-$7).
Content strategy on Uncov: Post soft, non-explicit content (bikini/lingerie) on the main feed as “exclusive content” that is blurred for non-subscribers, encouraging subscriptions. More explicit NSFW content should be sold through private sales/DMs.
Direct Sales via Chat: The source emphasizes that significant income comes from direct interaction and sales in DMs, not just subscription fees. Key steps in successful chatting for conversion include:
Qualification: Quickly identify serious buyers by asking about location, job, and age.
Building Rapport (“Girlfriend Experience”): Learn about their interests and gradually steer the conversation towards intimacy using push-and-pull techniques.
Timing the Sell: Don’t rush the sale; be prompt in responses when the fan is ready.
Personalization: Tailor the chatting experience to their preferences.
Staged Media: Start with a soft, free teaser before sending the Uncov link and gradually increasing prices.
Sensory Descriptions: Use vivid language to heighten arousal.
Employing Sales Tactics: Utilize anti-selling (acting indifferent), scarcity (making them feel special), and good copywriting (storytelling).
Labeling: Reinforce their confidence after a purchase.
Handling Objections: Respond positively and find solutions.
Avoiding Mistakes: Never promise real-life meetings and avoid arguing about price.
The source also suggests a powerful technique to boost views on promotional stories: add all followers to your “close friends” list (using a potential tool for automation) and then post exclusive, enticing content to this list, leading to more engagement and DMs.
Promote your Uncov link strategically through a dedicated highlight on your Instagram profile, using intriguing text and emojis rather than direct calls to action in the bio. You can also promote it subtly in stories by asking for DMs in response to a teaser, then sending the link in the DM.
By following these steps, you can create an AI influencer, attract a targeted audience, and monetize your content effectively through a combination of platform subscriptions and direct sales. The source highlights that success requires consistent effort, strategic engagement, and strong sales skills.
Creating AI Influencers: Face Swapping Techniques
Based on the information in the source “01.pdf”, face swapping is a key technique used in the creation of an AI influencer. The process involves taking a generated AI face and applying it to the body of a reference model. Here’s a breakdown of the steps involved:
Generating the AI Face: The first step is to create the face of your AI influencer using an AI image generator like Picasso. It’s recommended to use the “ultra realistic” option and a square aspect ratio. You can generate multiple images from a prompt to have more options.
Finding a Reference Model for the Body: The source suggests using Instagram to find a reference model. Ideal candidates are those who look casual, don’t have tattoos or piercings, and have a lot of existing content. A link in their bio often indicates they have an OnlyFans account, suggesting they have suitable content. It’s advised to find someone who isn’t overly popular (“next door girl” profile) and has consistent photos.
Acquiring Content of the Reference Model: Once a suitable reference model is found, you need to download content of them, including feed posts, reels, and stories. Aim for around 20 consistent photos from various angles (front, back, profile) to help the AI create a good clone. Consistency in appearance (hair, makeup) across these photos is important.
Performing the Face Swap: Using the face swap feature in Picasso, you apply the generated AI face to the downloaded images of the reference model’s body. You can copy the link of an Instagram photo and paste it into the face swap tool alongside your saved AI face. You can either do this one by one for all 20 photos or potentially use a bulk face swap option if available.
Refining the Face-Swapped Images: After the initial face swap, the results might need refinement.
Upscaling: If a face looks flat or lacks detail, you can use an upscaler tool to add more realistic details. This can significantly enhance the realism of the face.
Corrections: Photo editing websites like Phonto can be used for small corrections, such as changing eye color across all the photos to ensure consistency.
Training the AI Influencer: Finally, you train your AI influencer within Picasso by uploading the prepared 20 face-swapped images and giving your model a name. This process takes approximately 20 to 30 minutes. Once trained, the AI should consistently reproduce the trained model when generating new content from text prompts. If the face in newly generated content doesn’t look consistent, you can perform another face swap. Even if a generated image is good but has minor flaws (like an eye being misplaced), you can use face swap again for correction.
The goal of face swapping is to combine a unique and potentially appealing AI-generated face with a realistic and varied body type from a reference model who already has a substantial amount of content suitable for the intended purpose. This allows for the creation of an AI influencer with a consistent appearance across different poses and scenarios.
Attracting Paying Customers for an AI Influencer
Based on the information in the source “01.pdf”, attracting customers for an AI influencer involves specific strategies tailored to different social media platforms, with a focus on reaching high-paying individuals, particularly men around 40 years old with disposable income. The goal is to attract qualified customers who will spend money, rather than just generating a high number of views.
Here are the customer attraction strategies discussed in the source:
1. Instagram “Warm-up” Process:
The source outlines a detailed “warm-up” process for new Instagram accounts to avoid bans, get the algorithm to show the content to the right audience, and ultimately attract potential customers. This process involves the following steps:
Creating a New Account: Use a new Gmail account created on your phone to set up a new Instagram account.
Algorithm Training (Day 1-4):Scroll the feed for 15 minutes daily, watching only sexy content to train the algorithm.
Actively search for and watch content from established creators in the niche (e.g., Sophie Rain) without initially following or reacting.
Utilize the “not interested” option on irrelevant content and “suggest more posts like this” on relevant reels.
Profile Setup: Set a profile picture of your AI influencer and a non-cringeworthy username that resembles a real name with an adult theme (e.g., “Vanessa_mrt”).
Gradual Engagement (Day 2 onwards):Slowly start following 5 sexy girls per day and liking some of their reels.
Look at their stories without reacting initially.
From day 4, begin following and unfollowing targeted “horny guys” (40+ English speakers with disposable income) found in the comments of popular accounts (around 10 per day initially).
Content Posting (Day 5 onwards):Start posting one reel per day from day 5.
Begin posting one story per day from day 5.
Start posting one IG post per day from day 8.
The source emphasizes that following these steps precisely is crucial to avoid shadow bans, restrictions, and to ensure Instagram shows your content to the desired demographic.
2. Threads Strategy:
For Threads, the source recommends creating an account via Instagram after day 5 of the Instagram warm-up. The initial strategy involves:
Limited Interactions (Week 1): Engage in a maximum of five interactions per day. This includes:
Following five targeted guys (40+ year olds).
Posting one thread per day.
Commenting on 10 viral threads (“sniping”) to gain visibility. The goal of Threads at this stage is to drive traffic to your Instagram.
Gradual Increase in Interactions: Increase the number of interactions in subsequent weeks (10 per day in week 2, 30 in week 3, and even more in week 4) to avoid being flagged as a bot.
3. Content Creation for Engagement:
The source stresses that simply posting beautiful content is not enough to attract paying customers. The content strategy should include:
Reels with Hooks and Calls to Action: Add text to reels with a hook to grab attention and a call to action that specifically targets the desired audience (e.g., “Old man where are you?”) and encourages interaction (e.g., “Send me your application in DM only if you’re over 40 years old”).
Directing to DMs: Encourage viewers to send DMs for “secret pics” or to answer questions, which opens a channel for direct interaction and potential sales.
Adding Personality to the AI Influencer: Create a story and personality for your AI influencer to build a bond with the audience. This includes thinking about her hobbies, passions, and creating a life that resonates with potential customers.
Strategic Use of Feed Posts: Curate the Instagram feed with images that create a certain vibe and allow potential customers to imagine dating the AI influencer (e.g., picnic photos, coffee shop pictures).
Intriguing Story Highlights: Use only three story highlights: one to promote your Uncov link, one showcasing the AI’s “passions/hobbies/lifestyle,” and a third with “me” pics. Promote the Uncov link subtly through intriguing text and emojis rather than direct calls to action.
4. Leveraging “Close Friends” Feature:
A powerful technique mentioned is to add all followers to your “close friends” list using a potential automation tool. Posting exclusive, enticing content to this list ensures that your content is prioritized in their feeds, leading to more engagement and DMs. This creates a feeling of special connection and encourages interaction.
5. Promoting the Monetization Platform (Uncov):
Promote your Uncov VIP account link strategically through a dedicated highlight on your Instagram profile, using intriguing text and emojis.
Subtly promote the link in stories by asking for DMs in response to a teaser, then sending the link in the DM. Avoid directly posting the link in stories as Instagram’s algorithm dislikes users leaving the platform.
By implementing these multi-faceted strategies, the source suggests you can effectively attract a targeted audience of high-paying customers and build a profitable AI influencer.
Monetizing AI Influencers: Strategies and Platforms
Based on the information in the source “01.pdf”, there are several methods discussed for monetizing the content of an AI influencer. The primary recommended platform for direct monetization is Uncov VIP, as traditional platforms like OnlyFans have verification hurdles that AI influencers cannot overcome, and Fanmail carries risks of account freezes.
Here are the content monetization methods detailed in the source:
Subscription-based access to exclusive content on Uncov VIP: The core strategy involves creating a VIP account on Uncov and offering exclusive, often blurred or teased, content that fans can access by paying a subscription fee. The suggested initial price range is between $5 to $7. The more exclusive content available behind the paywall, the higher the likelihood of fans subscribing.
Direct sales of individual content through chat: Beyond the subscription, significant income can be generated through direct interactions in DMs on platforms like Instagram. By building a rapport with fans (the “girlfriend experience”) and understanding their preferences, you can offer and sell personalized photos and videos at varying price points. The source provides an example of a first customer spending $276 through such direct sales in a short period.
Leveraging the “Close Friends” feature on Instagram: By adding followers to a “Close Friends” list (potentially through automation), you can share exclusive, enticing content that is prioritized in their feeds. This fosters a sense of special connection and encourages them to engage via DMs, opening opportunities for sales on Uncov VIP.
Strategic promotion of the Uncov VIP link: While direct links in Instagram stories are discouraged due to algorithm penalties, the source advises using intriguing text and emojis in story highlights and in direct messages to guide interested fans to the Uncov VIP platform. Asking for DMs in response to teasers in stories and then sharing the link in the private message is also suggested.
Building relationships and employing sales techniques in DMs: The “art of chatting” is crucial for conversion. This involves:
Qualifying potential buyers by asking about their location, job, and age to identify those with disposable income.
Creating emotional proximity by learning about their interests and subtly introducing the AI influencer’s “side hustle”.
Using the “girlfriend experience” to build a connection and steer conversations towards intimacy.
Timing the sale appropriately and personalizing the experience based on the fan’s preferences.
Employing sales tactics like “anti-selling,” creating a sense of scarcity, and using compelling copywriting.
Handling objections positively.
Potential for affiliate income through Uncov: The source mentions that creators on VIP Uncov can earn 10% commission on all spending by their referred fans on the entire platform, offering a passive income stream.
Future scaling through real OnlyFans models (OFM Growth Plan): Once the AI influencer is generating significant income (e.g., $50k per month), the guide suggests using the established system to hire real OnlyFans models and take a commission on their earnings, creating an additional income stream.
The source emphasizes that simply attracting views is insufficient; the focus should be on converting qualified individuals into paying customers through these various monetization strategies. The combination of subscription fees and direct sales of exclusive content on a platform designed for this type of content (Uncov VIP), coupled with effective engagement and sales techniques on social media, forms the core of the monetization strategy outlined.
Instagram Growth Tactics for Monetizing an AI Influencer
Based on the information in the source “01.pdf”, several key Instagram growth tactics are discussed, primarily geared towards attracting high-paying customers, particularly men around 40 years old with disposable income, for an AI influencer. The goal is to cultivate a targeted audience that will engage and ultimately spend money on content.
Here are the Instagram growth tactics detailed in the source:
The “Warm-up” Process for New Accounts: This is a crucial multi-day strategy designed to train the Instagram algorithm and avoid bans, ensuring content is shown to the right audience. It involves:
Creating a new Gmail account on your phone specifically for the new Instagram account.
Algorithm Training (Days 1-4): Scrolling the feed for 15 minutes daily, watching only sexy content to signal the desired niche to Instagram. Actively searching for and watching content from established creators in the niche without initially following or reacting. Utilizing the “not interested” option on irrelevant content and “suggest more posts like this” on relevant reels.
Profile Setup: Using a profile picture of the AI influencer and a non-cringeworthy username that resembles a real name with an adult theme (e.g., “Vanessa_mrt”).
Gradual Engagement (Day 2 onwards): Slowly starting to follow 5 sexy girls per day and liking some of their reels. Looking at their stories without reacting initially. From day 4, beginning to follow and unfollow targeted “horny guys” (40+ English speakers with disposable income) found in the comments of popular accounts (around 10 per day initially).
Content Posting (Day 5 onwards): Starting to post one reel per day from day 5. Beginning to post one story per day from day 5. Starting to post one IG post per day from day 8. The source emphasizes the importance of following these steps precisely to avoid shadow bans and restrictions and to effectively target the desired demographic.
Creating Engaging Reels with a Purpose: Simply posting beautiful content is insufficient. Reels should be designed to attract the target audience and encourage interaction. This includes:
Adding text with a hook to grab attention and a call to action specifically for the desired demographic (e.g., “Old man where are you?”).
Encouraging DMs for “secret pics” or to answer questions, opening a channel for direct interaction and potential sales. Examples include “Send me your application in DM only if you’re over 40 years old” or “Tell me in DM where you’ll take me for a date and I’ll show you what I’ll be wearing”.
Using trending music to increase visibility.
Curating an Appealing Feed Page: The Instagram feed should create a certain vibe and allow potential customers to imagine a connection with the AI influencer. This involves:
Presenting the AI influencer as a person with a life and personality, not just a model. This can include photos suggesting hobbies, interests, and everyday activities (e.g., picnics, coffee shops, art).
Showing moments and places that could be dates, allowing men to project themselves into those scenarios.
Using a consistent aesthetic that aligns with the desired image (e.g., “next door girl” profile rather than a supermodel look).
Strategic Use of Instagram Stories: Stories should be used to engage followers and subtly promote the monetization platform without directly linking, as Instagram’s algorithm dislikes users leaving the platform.
Use intriguing text and emojis to encourage DMs (e.g., “Exclusive content if you want to see it DM go or DM the emoji heart”). This allows for direct messaging of the monetization link.
Alternatively, use text to tell users the link is in the bio.
Optimizing Instagram Story Highlights: Limit the number of highlights to three key areas for better conversion:
One highlight to promote the Uncov VIP link using intriguing text and emojis (e.g., a chili pepper next to a link emoji) rather than direct calls to action.
One highlight showcasing the AI’s passions, hobbies, and lifestyle to build personality.
A third highlight with “me” pics of the AI influencer.
Leveraging the “Close Friends” Feature: This is described as a very powerful technique.
Add all followers to the “close friends” list using a potential automation tool.
Post exclusive, enticing content to this list, ensuring it’s prioritized in followers’ feeds.
This creates a feeling of special connection and encourages DMs, leading to more engagement and potential sales. A suggested tactic is to post a provoking story (e.g., “I am pretty without makeup or do you prefer with it?”) to generate reactions and start conversations.
Driving Traffic from Threads (Created via Instagram): Create a Threads account after day 5 of the Instagram warm-up. The initial strategy involves:
Limited Interactions (Week 1): A maximum of five interactions per day, including following five targeted guys, posting one thread, and commenting on 10 viral threads (“sniping”) to gain visibility.
The primary goal of Threads at this stage is to drive traffic to your Instagram, not to sell content directly on Threads. Threads should share photos and ask open questions to encourage engagement and then direct users to discuss further on Instagram.
Consistency and Understanding the Algorithm:
Follow the “warm-up” process to signal the niche to Instagram and avoid being flagged.
Engage with content in the desired niche to refine the algorithm’s understanding of the account.
Focus on attracting genuine engagement from the target audience, as high engagement rates with the right demographic signal quality to Instagram.
By implementing these tactics, the source suggests a comprehensive approach to growing an Instagram account that effectively attracts the desired customer base for an AI influencer.
AI OFM: Create a Profitable AI Model from Scratch (Free Ultimate Guide)
The Original Text
what if you could build a AI influencer that makes money 24 hours per day attract high paying customers and never worry about getting banned sounds too good to be true right so stick with me because in this ultimate free course I’ll show you exactly how to do it legally ethically and profitably welcome to this game changing 1 hour course where you will learn how to create a hyper realistic AI model and use face swap the right way how to attract wealthy engage customers instead of broke students and how to sell AI generating content without getting banned on platforms like Onlyfans it’s impossible to put an AI influencer on Onlyfans but we have another solution for that so stop believing people talking about AI with FM and showing you Onlyfans results because I spent months researching this and I even created an AI influencer two weeks ago on my channel and I already made more than 10K with her and by the end of this video you have a complete roadmap to start making money with AI but before we dive in let me tell you a quick story I know someone let’s call him Jake to spend months perfecting a Naomi model he was on track to make six figures but one morning he woke up to find everything was gone banned account terminated no warning and his income totally wiped out overnight that’s when I realized if you don’t know the rules you’re playing with fire that’s why today I’m giving you the insider knowledge to avoid Jake’s mistakes and with something that lasts for the long term so most people think AI model require coding skills or expensive setups it’s completely wrong a creator I met was desperate to make money with AI he followed the hype downloaded a random face swap tool and thought he was set but what happened his video got flagged his model looked fake and instead of making money he ended up with comments like AI generated lol embarrassing right I will show you the only tool I’m using and you learn the easiest way to generate a photorealistic AI personal legally and professionally so right before being able to create multiple content of your AI influencer like you can see on my screen we have to go through the first step and what’s the first step before creating the whole body we have to begin from somewhere right we need to create first the face of your AI influencer and I will show you how it works we just want to go on the website Picasso if you want I have an affiliate link which gives you multiple free credits so you can try it first then I want you to go on AI image generator we’ll create together the face of your AI influencer you will see multiple options so I want you to choose ultra realistic one can choose an aspect ratio we never want to use landscape ones okay because we work with social medias and there are no landscape content on social media so just for the face I will choose square and then for the number of images what is that number when you will enter one prompt Picasso will show you multiple results so if you want every time you generate a prompt to have 3 4 or two results you just have to choose that number personally I recommend you to choose 3 because AI is not always perfect okay sometimes it makes mistake or like very awkward fingers so every time I want to generate free pictures at least so I can be sure that on the free pictures I will have one great result and then just enter prompt so that’s a prompt I like to use I will explain to you later why but basically you can just like copy or inspire from that prompt and right there there’s like the seed number we 
don’t want to touch it because my prompt is very very detailed I want to put realistic or realistic plus let’s say that my prompt was only this I can choose creative or creative plus because my prompt is very simple this AI will make some personal choices for example I didn’t detail like her outfits he will choose her outfit and make some propositions but because I like to work with a very very detailed prompt and then you just have to click right there generate just wait a few seconds okay so I did it three times to have multiple pictures to show you so let’s check it together look at that very very realistic right you don’t want to create like a too pretty okay AI model because when the girl is too beautiful she won’t make a lot of money based on my experience best Profile is trying to create next door girl profile I always work with that type of profile I have very good results I have all of my strategies with this type of profile that’s basically what I recommend you if you want to follow my well so now just check some results okay my favorite one is this one let’s do another generation to get like even more choices you know what I like this one she looks not too beautiful people will get attached okay when she’s too beautiful too high ticket some guys they don’t feel very confident talking to the girl and if you think that the face is a bit too flat you can click on up scale okay so now that I have like my AI face I just want to go like on my new face right there and then we’ll add the default face of our AI influencer so just click on it click there perfect okay so now that we have the face we want to create the body so there’s an amazing feature in Pikaso to train your AI influencer so after look you’ll be able to generate multiple times the same person for example her for example her or she’s at the coffee shop okay if you want to do that we need to upload 20 pictures so he can create a clone so how to do it I will show you right now we use Instagram to find a reference model for the body so my technique is just go on Instagram and for example like you have to find a Onifun model and there are very specific criteria you need for your reference model you don’t want her to have tattoos you want her to have a very casual body don’t want her to have some piercing on the body and then you want her to have a lot a lot of content so how to recognize that a girl has an onlyfan basically you will find in her bio a link it means that she has an onlyfan account and basically what I want you to do is like go on her following okay like that and try to find some other girls okay so let’s say her click on her okay she also have a link in the bio so she has an only fan so that’s great she has like a lot of content that’s cool she’s very active it’s really important also and she has a lot of stories but like look the problem is she has some tattoos and that would be very difficult for the AI to reproduce so just continue like that and you try to find other girls because only fan girls follow only fan girls she’s promoting and she’s very active but but she doesn’t have like a lot of stories I will continue we want her to have a lot of pictures on her feed a lot of reels it’s missing a bit some stories so I will just continue okay perfect I think I found like my um reference model first she’s very active means that she’s active so can we continue posting some content even for the next month she has like a link in the bio she’s not too popular okay like you don’t want to choose like a girl like Sophie 
Ryan okay Sophie Ryan is way too popular and then look she has a very very good feed and I like her feed because she doesn’t look like a supermodel she looks like a very casual girl she really has like this next door girl type of photos her photo are not too professional and that’s what I like that’s what I want she has like a lot of uh contents like that and also if I go look she has a lot a lot of reels and her reels have a lot of views so it means that because her reels have a lot of views I will have a high chance for my air influencer to have a lot of views if I inspire and use her reels and also look she has a lot a lot of photos and a lot of stories that’s really really cool and you will understand why by staying on this video okay so now what we want to do is choose 20 photos so how to choose the photo something which is very important if you want the clone of your influencer to works very well you have to choose 20 pictures of her where she looks the same on the 20 pictures okay like sometimes you know like the girls they put like they have different faces like they put some makeup we really want to have consistent 20 similar photos so the AI would be able to train and create an AI in front of us so we choose 20 pictures among the 20 photos you want to try to find some photos of like her profile from the front of her body from behind her like everything so really like the AI can create like something which is really really good so now I will take the time to choose the picture and to show you okay but like for example like what you don’t want to do like look on this photo her hair is very very curly okay if you are like making a comparison to this photo so I don’t want to choose this photo otherwise the AI will not understand that it’s the same person and he will get confused also like her eyes look a bit more like dark on this one so really pay attention to the pictures you’re choosing but we don’t want to make a clone of this girl right so what we do is we’ll face swap her face okay just like to inspire from her body so how to do that you click here you click like on the copy the link sorry it’s written in French and then you go back on Picasso okay and you go on face swap right there activate your default face okay so remember it’s the picture we created before the face and then right there you can just copy paste the link and then you just have to click on large face swap oh what you can do is like find first your 20 pictures okay by putting the link each time and then you will launch the Facebook for every of the 20 pictures so right now there’s not enough good photos of me like on the feed so what I will do is like I will download some stories okay because she has like a lot of pictures of her on her stories so um just go on like Instagram and put like the link of her profile and you will see that I can see all the post and just click on it to download okay so same things for story highlights and reels so I will go on highlights I just click on it and look I just have everything so now I just have to choose so I like this one that’s a cool photo or so of her okay so now that I have like 20 pictures of her don’t forget to activate this one and just click on large face swap so we do the face swap on like all content so that’s really really cool okay so now the face swap is finished if you don’t want like to download like one by one by one look you can just go there and click on book download so even if you make like 100 and you waited like a I don’t know 20 minutes for 100 
content can just like click on it and it will download everything it will create a files on your computer and it will send you an email so you just have to check your email and download the file okay I’m taking my files okay so now you might think okay that’s great everything is perfect no you have to check like this one doesn’t really looks really really great this one also look it’s a bit flat if a photo like is a bit like flat on the face okay what you can do is like you go to upscaler right there okay and I will show you the difference go there what this option will do is it will adds much more details on the face so the picture will feel much much more realistic I just wanted to show you like um the option of the of the Obscura okay so basically that was another photo uh and look the face is a bit too flat right uh looks a bit like AI and look with that much much more details crazy right so that’s really really good I did it with like um this content also but that’s so much better so what we want to do now we really want to prepare in a great way do 20 pictures there’s like another thing you can do if you want to like make some small correction okay we can go on this website so it’s called photo just click here okay upload a picture so let’s say that now I have this picture right so if for example you don’t want your AI influencer to have like brown eyes and you want her to have like green or blue eyes or whatever color you want red I don’t know I can go on eye tint for example if I want her to have green eyes just do exactly the same and basically you can just do that on like every of your pictures to make sure that you will have after an influencer with green eyes so now just take the time to make every rectification with all of the photos so what I will do now is I will check if on my photos there are some weird things because sometimes like the the eye with the face of goes on the side and like it’s very not natural okay so now that I’ve prepared all the 20 pictures to create your AI model okay and to train the AI get back to Picasso go on AI image generator so I already have like a few AI okay but like just go on plus create influencer enter a name Kroey and then you just have to upload the 20 pages right there so yeah basically I did everything right there and Picasso will ask you to put between 10 or 20 pictures I recommend you to put 20 always always it took so much time to do like all of these steps the better the quality of the first steps are and the better like the quality of the result you’ll get it takes usually between like 20 to 30 minutes so now you have time I will just wait a bit I will show you like how to actually create like 100% AI content and I’m talking about photos so now I will select for example Chloe let’s say that I want to make like a portrait and now I just have to describe what I want and I don’t have actually to describe how’s the girl I just have to put the name Chloe and that’s it let’s say photo of Chloe in Paris I didn’t put like a very precise uh prompt it just like very quickly show you um so yeah sometimes it makes like really really weird things that’s why I told you like every time to make like three to choose three pictures per prompt okay but like still look at the results in this one we have like Chloe in her bedroom but look at the restaurant this one is nice this one is nice and this one is bad yeah look at this one I went to like Louis Vuitton Paris and I had it even on this one and that’s cool um just be very cautious uh when you 
generate like some contents to check like if there’s not like 6 fingers or things like that okay they are very small details but you have to pay attention if you just want to make a small difference let’s say for example like yeah I just want like her dress to be red copy the prompt also this time you can copy Tuck the seed just copy the seed and then say like that’s a blue one and because you will like copy like the seed you will get very similar results there are some really results but like this one looks great and don’t forget guys because now it’s 100% AI if you want it to look more realistic just click on it guess so let’s say this one okay uh this one I will take this one and don’t forget to put upscale if you think that the face is a bit too flat and yeah now I have a new AI influencer and look if for example like I change it and I say like a photo of Emma in Paris instead of like Chloe so the one I just created that’s my AI influencer Emma perfect I just want to show you like the power of upscale look at how real she looks wow guys the upscale options it is so so great so now basically you can make unlimited X content X pictures of this AI influencer the only thing you have to do is look there’s a preset there’s ultra realistic with like an ice cream uh ultra realistic with like a peach and big balloons Ultra realistic with ice cream is basically if you want to make some pictures with her having like a skincare treatment okay so some ice cream on her face I think and for example you can ask her like to put her tongue out and having some ice cream on her face and to enjoy it I can only show it in the community rights so we put like some example of pictures and prompts I like to use and that is the ultra realistic this it means that you can see um the little kitty and you will also how to say it the football balloons okay so you’ll be able to make unlimited X pictures with her so let’s say for example like I want to put this I will only show uh the rest in the community I cannot show everything on YouTube sorry guys guys look at that look at that isn’t it crazy this result is crazy look at this I just don’t know why like she has like a small tattoo but anyways I don’t really really care and don’t forget guys if you think that her skin is like too perfect guys now you know my favorite option upscale much more details on the skin texture crazy that’s so so crazy wow you know what I will upscale it just to show you just to show you yeah I will upscale let’s upscale guys you guys know I like to upscale yeah crazy result you know what I mean it doesn’t feel like it’s an AI it feels like it’s she’s a real person like you know like this type of photo and that’s what you should be looking for not looking for like professional photo like a model no this type of photo look look at this one oh what looks so good wait we double it okay guys just very quickly I just wanted to show you like how it looks like when you use upscale look at these details like what the wow it just looks like a real person like I want to be honest man like I know some some real girls on IG they put so much filter they looks it looks fake compared to that girl this X content upscale it’s it’s crazy I wasn’t expecting to have such like beautiful new AI influencer like that I just want to work with her like long term oh really seriously I will think about that so guys now I will show you how to create AI video okay so basically you just go to AI Video Generator prepare the good AI pictures and you just describe like what 
you want to see in that short video and basically we don’t need longer video okay because mainly it would be for the risk or some story so we want it to be very short if you got like some new pictures generated and you think that the face doesn’t just doesn’t looks like the same girl you can like just do another face swap on it to make sure that you have exactly the same face okay that’s totally okay even for example if you generate like over like it’s content and you have like very very good picture but like there’s like an eye going like on the side or feel like that just use the face swap again and you will get the perfect result it’s finished so let’s look at the AI generated videos that’s so so cool look look at the details look at how the dress is moving so smooth incredible so yeah that type of content I will show you just after how to make it viral like viral and bring thousands of clients okay not just making some views because I don’t care about making some views we want to post some content which brings clients okay customers if you want to create hot let’s say like very very hot like hard okay videos content of your AI influencer I’m using another app but I won’t be able to show you guys uh here I will only show it in the AI bot community system but basically I’m using that and if you want the girl to have like some very swallowing very big bananas or putting some very big bananas let’s say in the in the rabbit hole and very hardcore things you can do it okay so I will share my best prompts how I’m doing the content only in AI Vault it’s finished let’s check it yeah I think she looks a bit different no she looks a bit I feel like she has like her cheeks are more fat I don’t know uh yeah like just put like can put some good music with that no that’s so so cool so why do some AI Frances like mine generate 20 k per month while others struggle it’s about who sees your content let me introduce you to Mark he had the near influencer project worked day and night creating content but after months nothing no sale no engagement then one tweet change everything instead of chasing likes or views on social medias he targeting high net worth men who actually spend money and his first big sale $300 overnight so the first step for that is to find these guys you learn how to attract 40 years old high income men who actually spend money on your AI influencer now the goal is not only to be viral getting some views because views doesn’t bring you actually sales okay and sales is money you can make 1 million views and generate zero okay so the goal will be to attract customers very qualified people and I’m going to show you an exact process to apply with your Instagram account if you want to only have 40 years old guys who actually have the money to spend for your influencer basically you don’t want to miss one step it’s really important otherwise you will have like to begin everything from the start so how to actually get a chance for your Instagram account to not get banned not get shadow banned get viral and mainly attract the good people okay so this process is called the warm up the first thing I want you to do is to create a new Gmail it’s really important and I want you to create a Gmail not proton not how Yahoo Gmail and I want you to create it from your phone and basically you will download Instagram and create a new account from this Gmail you don’t want to use like another one you use before for another Instagram account so everything has to be new and then just create a new account with 
Instagram okay and the first step I want you to do for the day one is to scroll 50 minutes okay I want you to go um on the feed page and to scroll and there’s something which which is very important if you want Instagram to understand who you are and what content you want to make so basically when you’ll be scrolling you only want to watch you don’t need to like you don’t need to comment but watch okay it’s about the watch time and only watch some sexy girls content okay awesome like off models it’s really really important okay so the main goal about like scrolling is the more you will consume this type of content and the more Instagram would show you only this type of content on your feed page how does that work okay when you create your account you will have a searching button okay and you can just type your niche right there and our niche is like oaf girls right so you can for example like Enter Sophie Rain or over like already known girl in this industry and basically you want like to go on her account I don’t want you to follow her uh like or react just watch her content okay really important and then I want you to go back right there on your phone go on the feed and scroll and I want you to only watch the same type of content some girls who are making dance and a little bit sexy or a little bit hot even if they are like some other contents which are like funny or what don’t watch it okay really important for the algorithm and you can also when you see real there will be like free dots you can click on the free dots and click on interested and then you will have this message we suggest more posts like this for 30 days so that way you will get more chance of having the specific type of content so just do it 15 minutes put a profile picture of your AI influencer and also enter name username so I will show you don’t worry some good example of actually a friend or Instagram and I will set up one so we can do the same for the username you don’t want to put like little an angel or Pinky sweet or like something which is very cringe like that you want to put a username which is based uh with the farm like a name let’s say like Cloe put like a shortened version of a family name so for example let’s say for myself I want to be called like Vanessa whoa sorry guys and then you can put adult m r t you know or something like that so people think actually the username is like just a regular name and a family name it’s like a real person from the day 2 is square for 15 minutes use the same technique okay to get a better feed with better propositions and then I want you to this time follow 5 sexy girls okay Sophie Rain or like over like um content creator in the same niche and actually I want you to follow them only five girls and then you can like some sexy reads they made not too much but you can start liking and then I don’t I want you like to look to some stories on their profile I don’t want you to react I don’t want you uh to put like an emoji to the story only look to some stories and then from the day 3 same thing score 15 minutes follow five sexy girls like some sexy Rezz and actually this time you can reply to some stories for example click on like uh the heart emoji day 4 you do exactly the same I told you right before and I want you to be to begin okay very slowly okay during the day to follow and unfollow horny guys and for example if you want to attract English speaker honey guys because you want to attract American guys because they have a lot of money you have to follow some English 
speaker guys and just check the profile to make sure like they are 40 years old and not like young students and broke guys how to find these guys can just go like on Sophie Ryan’s uh comments and you will find like a lot of profile so do this for example for 10 guys during the day or a little bit more day 5 do exactly the same thing follow unfollow some horny guys and then only from the day 5 you can start to post one reel per day only from the day 5 and don’t worry I will show you like what type of reel you should do it how to do some good reel that attracts good people and generate some leads and sales actually for you for the day 8 you can post some IG posts one per day and then that way you should be able to build your feed page and also from the day 5 you can start put very slowly one story per day okay it’s actually a great thing that was like the warm up from Instagram and if you don’t do this exact step you will have no chance to get viral with your AI influencer it will be dead because Instagram are very careful with like adult content type of influencer so you have to pay attention if you don’t want to get shadow banned restricted more than that if you want to attract the right people because if Instagram only show your content to so 40 years old alone guys they will like comment and follow more and because Instagram will see that the only few people they show your reels or your content to have a high engagement they will be like oh that’s really good content so I will like show it more to more people like that if you don’t do this process to make Instagram understand who you are and what type of creator you are and they show your content to some 80 years old girls 20 years old girls some feminist or even like some kids these type of people won’t interact and for Instagram because people when they actually see your content they don’t interact it means that you’re a bad creator and they won’t help you so it’s very important to do this you can do exactly the same thing for friends and friends is very powerful because it’s actually affiliate to Instagram create the friends account only from the day five of your Instagram account and from the day 5 you have to create the account from IG and that way you will have a boost because actually these two social medias are affiliate by meta so the week 1 I want you to do five interactions per day maximum so what are these type of interactions first I want you to follow five guys you know how that works you know the type of guys you have to follow if you only follow some 18 years old or 19 years old guys friends we show your content and recommend you to similar profile so try to only follow some 13 or actually 40 years old guys I want you also to post one friend per day and don’t worry I will also tell you about how to actually make good friends and then you can comment on 10 viral threads this one is really really important so let’s say that there’s a really pretty girls who made a viral threads okay with a lot of people who are commented write a comment for example let’s say a girl a girl said like yeah guys but like what are you looking for in a girl you know and there are like a lot of reply and comments and like you can put a shocking comments to try to steal some uh visibility some views so you can comment something like I know what they want they only want to see that and you put like a very provoking photo of your AI influencer so that way when people will like go in the comment section they will see your reply and they’ll be like wow 
who’s that crazy girl I will check her profile and only by using that it’s called sniping friends I will show you some example you can gain a lot a lot of visibility but do it only 10 times from the rig too you can do 10 interactions per day so basically you can do all of these actions I saw I showed you before and double it on the rig free you can triple it make 30 viral friend sniping a free friends per day and 15 follows unfollow and on the week 4 you can make it even more that’s how you should work on friends because if you don’t do it and you just start to do a lot you will get considered like a bot and not a real user so you will never be able to do this much interactions on your friend’s account even like in 1 year so that’s why you have to do this process otherwise there will be impossible for you to do so much things and also you will not get shadow banned or restricted actually so now I will show you some example on like how to do some Instagram reels and some example of Fred so now because you followed every single steps of the warm up I’m going to show you what type of content exactly to post if you actually want to generate some sales and to attract some customers because there’s so many air influencers who post content get viral get million of views but actually if you just post good content who get viral this is what would be happening even myself when I see a beautiful girl while I’m scrolling on TikTok or Instagram I will just scroll see a beautiful girl I will like and I will continue to scroll right so you will just get views and views don’t bring money so that’s exactly what type of content you should do and it’s so simple so first go on your Instagram accounts and go on post a real and I will show you two ways so the first way is to post a reel using face swap and the second way is to post a reel using 100% AI generated video I will show you both so first step is face swap content okay so let’s say I was just like posting this reel like that ladies and gentlemen the winner of H Q Hugo Boss Model of the Year Miss Bella Hadid if I just post this I can get some views but people will just see it and that’s all right so what we’ll do is I will add a text with a hook and a call to action so actually I will give some instructions to the 40 years old guys who will see this real so I can have more leads of customers so what I want to do is click here and you will click on other text I already prepared like a text just to show you okay now I prepared my text okay but right before I will change few things first let’s change the police and always use this one classic one okay and every time you’re posting real you have to post exactly the same settings otherwise you won’t have results okay and then I put this one okay not this one not this one not this one this one so it’s really easy to read and I don’t want to put it too big or too small so pay attention to that let’s say something like this and now what will happen is Joe I just added this text but people will have a lot of reactions some people will agree and some people will disagree okay and these guys because it’s a beautiful girl who said it they will react okay and comment the real and the more you get some comments the more Instagram will be like oh there’s a lot of comments there’s a lot of interactions so it’s a good content actually I will show it to more people and because I said old man where are you that real will help me reach the good qualified people I want to so there’s a higher chance that I will have more 
older men who will see this reel so that’s why you have to put a text and then you can add otherwise a call to action and a call to action you can put it at the end of the reel or you can actually put it in the comments right I will show you you don’t have to do like me and put like the call to action at the end of the video basically you can just like pin a comment send me your application in DM only if you’re over 40 years old for example to next like I told you okay send me a DM to see my secret pic only if you’re over 35 years old that’s one example of what type of real you should be doing and that way you’ll be posting your real and not only you have higher chance of getting a lot of views and being viral but also that will generate so much comments and also you will have to do nothing and you will get a lot of requests and a lot of DMS so then you just have like to answer to all of the comments and to answer like to the people who want to see a secret pic you just send them like a good photo and then you can start chatting with them to basically go on the next step selling the content and making money right but if you don’t do this and you only get some views that will bring nothing actually for your air influencer so that’s the whole difference ladies and gentlemen so for this one it’s 100% AI generated right so what we do is like try to find a trend music I just use an example I showed you before don’t forget not too small and not too big I can show you my surprise outfit for on Next Date for on Next Date by DMS and that’s basically the same thing you know I just have a great hook people want to watch it and then there’s a call to action so guys I’m not just like seeing this and shaking the profile or following but they want to say the DM you know we want like the guys to be invested in that that would be much more easier for us like to sell some content and also by doing all of the things I’m showing you you’re actually adding like so much personality to your AI influencer and that’s the best because we want the guys to fall in love you know just being beautiful is not enough oh I can instead of I can show you my surprise outfit you can say my secret outfit no so the guys will send DM hey uh tell me what’s this beautiful outfit you got for me you know oh how can I manage to be able to see that uh next outfit you know so much guys will respond that we answer that I know how it works you know and then just post it that’s all tell me in DM where you’ll take me for a date and I’ll show you what I’ll be wearing great the guys uh want to know it and it’s completely free so all of the guys want to see it they have nothing to lose it just take 2 seconds to send a DM you know so that’s what uh you should be doing ladies and gentlemen the winner of GQ Yoga Box Model of the Year Miss Bella Hadid ladies and gentlemen the winner guys I actually created a group where I’m sharing all the daily trains I’m using for all of my air influencer so you don’t have to do it and I can tell you my team is taking a lot of times to try to find each day the most performant trends so you can automate your air influencer and making sure you go viral but also like having good qualified people and generating some sales so you can also attract qualified traffic and become viral every single day with your AI influencer I will let the link in the description you should definitely check it to actually create your friends account so you want to do it from your profile on Instagram and by doing that your first friend 
will get a boost a massive boost so it’s really important to pay attention to your first friends and to really like take social media seriously I just did my first friends and the thing about friends is it’s not a short form content okay so it’s not about like watching some content and scrolling it’s actually about reacting to what other people are saying and sharing your opinion okay a bit like x uh Twitter before so right now I just made my first uh thread okay by following all the warm up you have to be very precise what you want to do is uh actually sharing some photos and asking open questions or just to create some debates okay so people are willing to share their opinion and everything so for example I share like a very beautiful photo of like an essay where are the singles feel like everyone for a good relationship except me and you will get a lot of guys who will actually answer oh yeah I’m I’m here if you want okay so you get a lot a lot of answer like really especially for like the first thread because you will get that that massive boost can discuss on my IG and then you just share like your Instagram because some people will not go like on your profile and click on the Instagram if there’s a good feeling like this okay people we actually see that comment and that will be the call to action because basically the main goal of Fred’s is sending the traffic you’ll get on that app on your actual Instagram you don’t want to sell content on friends the only goal direct the people to your main IG we can discuss on my IG if you have the same problem okay we can discuss on my IG if you have the same problem okay so if you’re also single just send me a DM and just by doing that guys we go crazy like think about like this opportunity how many guys have this opportunity you know especially when they see that there’s not a lot of comments because actually your friends is new so they will jump on this opportunity and you will get like lot a lot of follower that’s the type of things we want to do and don’t forget guys if you guys want every day to have the best friends okay and even like the best text and go to actions for your Instagram reels I created a group where I share daily the best trends for AI influencer to make sure that you will get viral so you don’t have to lose some times and you can automate your AI influencer look I just downloaded Freds okay and look on my feed page I have like a girl saying I want a boyfriend I want a male best friend okay and it was like yesterday literally and look at all the comments I need a female friend I want a baby girl I need a girlfriend so what you can do actually that’s why I was talking like about like doing some sniping okay on viral friends this one is not viral okay but basically when you follow all the warm up you can comment on all of these threads and you can do it like so much time if you follow exactly the warm up so I just have to say look I want confident guys it’s hard to only stay friends and just post it and look at all the people who are replying and then they will check because I have a girl’s name right they will check my profile and just follow and look I need a boyfriend age don’t mind look there are so many opportunities I want one also but older than me I’m 21 and you just continue you know and look there’s so much opportunity you know guys they just like to answer and to comment like all of these girls like friends right so it’s really easy look another one I want a boyfriend I just have to write exactly the same you want 
So the routine is: find viral threads and trends and comment on them, again and again. Even in your first month you can gain around 5,000 followers this way, and pushing those followers to your Instagram at the start gives the algorithm a massive boost. You'll find your first customers very fast, and I'll show you later how a single customer can be worth hundreds if you run the sale well; I'll walk you through my way of doing the sales, don't worry. Applying just these two social platforms is enough, in my experience, to make $20k per month with your AI influencer. And once more: I created a group where I share the best daily trends for your AI influencer, so you don't have to do this research yourself; it saves time, automates the job, and helps every post go viral. I'll leave the link in the description.

Because if you want to sell as much as possible to the same traffic, you have to squeeze all the juice from the lemon. Let me analyze an AI influencer I found on my feed with crazy traffic. I don't know the owner of the account, and I don't know if they'll ever see this video, but the lesson here is about conversion rate, not the amount of traffic. This is a huge AI influencer, hundreds of thousands of followers; honestly the owner did a great job with the algorithm and was very consistent, and clearly put in a lot of work. But big traffic doesn't mean she's making the most money from that traffic, and I'll show you why, along with an example of my own so you can apply the fixes, even if you're the owner of this account.

First, the bio: that's not what we want for an AI influencer; it doesn't feel like a real person. Now the reels: they get a lot of views, but they're completely different from what I showed you. No call to action, no hook, and for 3,000 likes there are fewer than 200 comments. That is very, very low. What's happening is that guys watch the reel and keep scrolling; they don't comment and they don't follow. Huge views, very few comments, very few DMs, which means most of those viewers, many of whom could have been customers, are wasted. Second, look at the comments: German, English, Spanish, Russian, French. I don't know which market the owner is trying to attract, but he isn't focusing on one, and that's a problem, because those viewers aren't one particular profile. I don't know if the owner did the warm-up process to target a particular audience, but since the buyers speak so many different languages, how do you manage the sales conversations? That's the problem.
That's why you should only focus on one market, and without on-screen text it's hard to target that market at all. Without calls to action, you get views but no customers; you never maximize the conversion from viewers to buyers. If this account had more comments from good calls to action, Instagram would push the reels much further, because the algorithm rewards interaction; a 50k-view reel like this should have at least 700 to 800 comments, and it has about 200.

Also, as the girl, you don't want to send the first DM on social media; it's very unnatural, unless you've manufactured a reason. That's another job the call to action does: "comment a flame emoji if you want to receive my free gift." The guy comments, you DM him the gift, and it isn't weird; now it's easy to just start talking with him.

Next problem: zero story highlights. That's part of why this AI influencer doesn't feel like a real girl; you feel no proximity with her. Real users, guys like me and girls too, create highlights. Without them she's just an image, a physique with no personality. When you want to make big money in this business, you have to understand that you're not here to sell content; you're here to sell a relationship. Your Instagram exists to create a bond with the audience so they get attached to the girl. Give your influencer a story and a personality: what does she do in life, what does she like to eat, what are her hobbies and passions, what does she dislike, what does she hate. You have to know everything and write a really good backstory. On this account you feel nothing; it's just a physical picture, that's all.

Now the stories. The owner posted one today promoting the sales link: tap the link, land on the platform, buy the content. Posting a daily story to promote your link is the right idea, but this account does it the wrong way, and it's a huge mistake: never put a link directly on the story. The way the Instagram algorithm works, it hates links in stories, because Instagram doesn't want users leaving the app for another website. When you attach a link like that, it gets detected and the story makes fewer and fewer views. I don't know if the owner of this account will see this, but with 400k followers he's surely getting very few story views whenever a link is attached; just test it yourself. The right way to promote the link is a plain story, a video with text like "exclusive content, DM me a heart emoji if you want to see it."
The guy then only has to send a message, and you can automate the rest, for example with a tool like ManyChat, so every guy who DMs the fire emoji automatically receives the link in his DMs. Your story keeps its reach, and the DMs still flow. Alternatively, write "exclusive content, the link is in my bio" as text only; the point is that Instagram punishes you when people leave the app, so only ever use text.

Now let me show you the difference when you optimize everything for conversion. I built a sample feed page so you can see how it works; it's not a warmed-up account, just an example. For the username: a real first name plus a fake family name, shortened, just like real girls do. In the bio I state the age first, 21, because a lot of guys like very young girls; then the kind of thing girls actually share, like an astrological sign. I make her a student and a girl-next-door type, because we'll use those as arguments during the sale. Notice what I did not write: nothing weird like "I have a naughty secret, do you want to discover me?" or "I'm very nice but I can be naughty if you want." That kind of bio is really, really off-putting. I'll show you how a good feed page skyrockets your conversion rate and how I made all this content, because you don't want to copy-paste your reference model; you optimize everything, especially when your influencer is new. Do it this way and even a very small amount of traffic will bring your first customers quickly.

These are 100% AI-generated pictures, and they look very natural. Here's the trick: I took one picture, generated a 10-second video from it, and then picked a few frames from that video. It looks like a perfect carousel, but it's actually one video cut multiple times. Some posts are face swaps; some are from Pinterest, because you don't want to show only the girl herself or she'll look narcissistic. We're building the feed page of a student girl: not rich, great personality, someone you'd want to talk to, someone with a life. She likes going to the beach, she's a bit of fun, she makes some art. Show her life; create her life. She isn't just a model with a beautiful physique, she's a person with a story, someone you want to discover. So the mix runs: face swap, face swap, Pinterest, 100% AI generated, Pinterest again, and so on, and the whole feed has to tell a coherent, logical story.
For example, I wanted this girl to be natural and simple: she likes to read, go on picnics, and paint. So of course I won't post a picture of her playing video games; it wouldn't make sense. Technically it's the same trick as before: generate a video, choose a few frames, and it reads as a great carousel even without a lot of followers. Look how much proximity you feel with this account purely because of the feed: you immediately get a sense of her personality and whether she seems interesting. That changes everything, because it feels like this girl is sharing parts of her life with you, and you want to be part of that life.

I'm not posting random pictures; I'm sharing moments, and I deliberately choose places and moments that could be dates. When a guy browses the feed he thinks: if I took her on a date, to a picnic, a coffee shop, a restaurant, or back to her place, this is what it would look like. I only choose pictures where the guy can imagine himself on the date, and that changes everything.

You don't want your reels on this curated feed page; the feed exists for conversion, not traffic. Traffic comes from the reels and from collaborations. The guy is scrolling, sees the reel (the post I showed you earlier, built from a few frames), lands on the page, and thinks "wow, she's beautiful, I want to take her on a picnic, I want to date her." Now he's invested and starts to stalk a bit, and he hits the highlights. Don't make the mistake of having many, many highlight stories; you only need three. More than three is useless; no guy takes the time to appreciate thousands of archived stories, so don't dump them all into highlights.

And notice there is no link in my bio. First, a bio link can get you flagged if you haven't followed the warm-up correctly. Second, it kills conversion: a guy who's getting interested in the girl checks the account, sees the link, and it's the classic "OnlyFans detected, opinion rejected." The moment he sees the link he thinks "she just wants money," and he forgets everything: her personality, everything she told him. So, three highlights only: one to promote your sales link; one for her passions, hobbies, and lifestyle; and a third one that's just "me," some pictures of the AI influencer.
For the "me" highlight I like to use face-swapped concert content from artists like Ariana Grande, Doja Cat, Billie Eilish, and SZA, plus one Pinterest shot at the end with a heart emoji. That's all the "me" highlight needs; the guy forms his opinion from a few images. The other highlights are mostly Pinterest content simply showing her life, because an AI influencer, like a real girl, has a life: she has friends, she goes out, she goes to restaurants, maybe she loves animals. You have to show that. Nobody, not even me on my own IG, posts only their own face; an account that does feels weird.

I also deliberately choose stories where you can see the girl's hand, her hair, or some part of her body. If you show her at a picnic or in a forest, at least one part of her body should be visible, otherwise it looks less real. A story that's only scenery and a black hat doesn't feel as authentic as one where you glimpse her hand. And all my stories stay logical and consistent with her backstory: I know my influencer likes picnics, going out with her friends, and cats, so that's what appears. One story shows her outside with leaves on the ground and a caption like "follow your dreams," just to let the guys know she's optimistic. You don't need a lot of content for this; you need good storytelling. All of this is psychological.

Finally, the highlight that promotes the link: I label it with a link emoji and a chili pepper. Just a link, a little spice; people are intrigued and go check. It's done the opposite way from the influencers shouting "link in bio, go there, link in bio." Especially as a beginner with few followers, this way the guys feel like they discovered someone. They get to know the girl, her likes, her personality, what she drinks, "oh, she also likes painting, that's cool, I've never tried painting," and then they stumble on the spicy highlight and realize she sells content, and it feels like a secret they just found. All of these psychological steps have a big impact on conversion, on attitude, and on how much a guy is willing to spend. That highlight is where you share your VIP Uncov, the platform where the content is actually sold.

One more bonus if you want your conversion rate to explode and to make sure some guys end up on your platform with no way around it: add all of your followers to your Close Friends list.
It's a very powerful technique. There's a tool I use for my influencers: whether I have 1,000, 2,000, 5,000, or 10,000 followers, I can pay for this app and it moves all of them into my Close Friends list. Now, to increase your number of leads from the same amount of traffic, post a close-friends story at 6 PM, and because of that tool every follower will see it.

Picture it from the guy's side (my account here is new, I just created it for this example). He followed a girl with a modest following, so she doesn't seem like someone chasing views; her feed page is lovely; he's discovered she also sells some content; she seems to have a great personality and he wants to talk to her. Then a close-friends story from her appears in his tray, and close-friends stories sit at the far left, the first ones you can tap, ahead of all the regular stories from accounts he follows. He's a lonely 40-year-old, and a pretty girl has put him in her Close Friends. That has never happened to him, not once. He will tap it, and he'll feel special and unique, like he has a particular relationship with that girl that none of the other girls he follows offer. That's a stronger connection and more proximity.

For the story itself, post something tastefully erotic, not trashy: say, your AI influencer just out of the shower. Don't put a link in it, and don't say "link in bio"; with 200 followers that looks desperate. Instead ask a question: "just got out of the shower... am I better with or without makeup?" You'll get a crazy amount of replies: guys will answer yes or no, send heart emojis, tell her she's beautiful either way. And DMs, for us, are sales or future sales. Your reels provoke DMs, and this story technique provokes DMs too. The guys come back from work at 6 PM, alone in the bedroom, just wanting to relax; they see a pretty girl's close-friends story, they answer, and the girl actually replies to their reaction. "I made the first move and it worked, she answered me; let's continue." He feels proud, and he'll give everything. Then comes the last step, the sale, and I'll show you exactly how this type of profile generates big numbers, where one good customer brings in hundreds if you're good at sales.
But first, let's recap the account we analyzed. On Threads, don't put the link in the bio; redirect to your IG instead, that's really important. And there are zero calls to action anywhere, which is a waste: with all that traffic she could reach ten times more qualified viewers, and this influencer could triple, quadruple, even 10x her actual income just by applying the advice above. I'm going to show you how I personally run the sale, and you'll understand why it makes so much money. So don't forget the technique: at 6 PM, post one provocative close-friends story, for example your AI influencer getting out of the shower with a caption like "am I prettier with or without makeup?" It generates a lot of reactions from guys, you start chatting with them, and you steer them to your VIP Uncov to buy your content. The more DMs and chatting you do, the more clients you will have.

Now, how do you put your AI influencer on OnlyFans? Guess what: you simply can't. Meet Lisa: she spent months building a stunning AI model, had funds coming in, was making bank, and got banned overnight. Her passive income was gone because she didn't use the right platform. You have to use an AI-friendly platform and verify your AI influencer's account properly, and I'm about to show you how to sell AI content legally without breaking any platform's rules. OnlyFans verification requires a video of yourself, so AI models are totally forbidden there; don't believe the gurus flashing OnlyFans dashboards while talking about AI influencers and "AI OFM," it's impossible. I also don't use Fanmail, because there are too many cases of funds getting frozen: if even one fan complains that your influencer looks too fake, or spots face-swapped content, your funds get frozen and the account gets banned.

The best AI-friendly platform, in my experience, is Uncov, and specifically a VIP Uncov account rather than a regular one; set up that way, you just can't get banned. There's a very cool bonus too: you earn 10% commission on everything your fans spend across the platform, even on other models and other AI influencers. To create the VIP account, use the link I'll leave under this video so you can access it for free. On the site, click "become a creator," connect with Google, and fill in the form. Use your real information for everything personal: real first name, real email, real password. Only the username must be fake, a girl's name. My influencer is Aubrey, so something like "Aubrey Sweet"; a short username is fine. Select that you're a creator.
For the category, choose content creator or model, and amateur, of course. If you don't have the AI model's Instagram account yet, that's fine; put any Instagram for now and change it later. Your real personal details matter because you'll use your ID to withdraw the money, and customers never see any of that information, only the username. Choose a profile picture. The form asks whether you are an AI; the platform will detect it anyway, so declare it, and confirm you're over 18. For the payout currency I pick USD, because I want to target the US market: American customers have money, and it's much easier to find big spenders there. For the subscription price, choose something between $5 and at most $7; I set $5, then write a short description.

Check your email to verify the account, then make your first public post. The platform is genuinely well built: you can upload 20 photos at once for the feed, it has its own cloud where you can store all your content, and overall it works a lot like OnlyFans. Crucially, it's made for AI creators, so you'll have zero problems long term, even ten years out with your AI influencer. I caption a post "sexy," create a collection, and publish.

On your public feed, post only very soft content. If explicit photos are visible for the $5 subscription, it will be difficult to sell them later for $40, $70, or $100; you have to set things up so the right price sits on the right content. Then post a lot of exclusive content. Exclusive posts appear blurred on the feed; a fan has to pay the subscription to unlock them. The more blurred exclusive posts a visitor sees, the better the math looks to him: "there are 20, 40 photos locked and it's only $5? That's worth it," and he subscribes. If he arrives and sees only five photos for $5, he has to think about it, and we don't want that; we want him to just pay the subscription. Upload the exclusive batch the same way, with small teasing descriptions so he really wants to see what's behind the blur, and tick "save these settings for the next post" so the options apply to the whole bulk upload; posting in bulk like that is so much faster. And remember, even the exclusive posts stay soft, bikini or lingerie, without showing too much skin.
Anything NSFW is reserved for the private sales; that's how the pricing ladder works. Once everything is posted, finish the account verification: confirm your email, enter your bank details so the money goes directly to your bank account, and upload your real ID or passport. Verification can take 24 to 48 hours, so take a good-quality photo of your ID the first time; you don't want to redo it and wait again. You can also create promo codes if you want. I did my own verification, and 48 hours later the account was approved.

While the warm-up process was still running, I found my first client, and I want to show you how it went. I got very little sleep; I really wanted to make good sales with this first customer so I could show you how it actually works and what results are possible. It's March 10th, around 1 PM; I worked through the night and, as you can see, went to bed at 5 AM. My first customer is Ricky, and refreshing the dashboard, my total from yesterday with this one guy is $276. To be honest, I wasn't able to find another customer yet, but I wanted to show you what's really possible. The sequence: I sent some free stuff, he bought a photo for $40, then a video for $130, then another piece of content for $170. I wanted to keep selling, but the guy stopped answering; I think he was simply done for the night. I still closed with warm messages so he'll want to come back, because the goal with every customer is repeat purchases, every week or every month. He also paid the $5 subscription, which isn't much; the subscription is not where the money is, the exclusive posts and the chatting are.

We started chatting around 4:00 AM and it ran 40 to 50 minutes, me glued to my phone and computer the whole time, and I kept going at the end in case there was more to make. If you're scared of sending content and quoting big prices, you will not make money. All the time invested in building the Instagram account, the warm-up process, the follow/unfollow, it's worth it: a couple hundred bucks from one customer is good money when you're a beginner. Now imagine having lots of customers, five guys like Ricky buying content from your influencer every day. For that you have to be good at sales, so let me give you my tips.
I can't show on YouTube the content I actually sell or every sentence I personally use, including what to say when a guy doesn't want to buy anymore so you can still land the sale; if you want to grab the scripts I'm using, I'll leave them in the description of this video.

Instagram chatting for converting on Uncov VIP: the art of chatting is about understanding your client. You need to adapt your approach to each situation to maximize earnings from your fans, and you'll encounter various fan personas that require different strategies; understand who you're interacting with so you can build strong rapport and emotional proximity. The key mindset is to focus on building relationships, not just selling content. If fans wanted raw adult content, they'd visit free adult sites; if they're here with you, they're looking for connection, and it's that emotional process you need to engage. With that mindset in place, here's how to apply it.

Step 1: qualification. As I explained, Instagram is full of time-wasters, so your goal is to eliminate them quickly. Ask the following key questions, in this order, since the first two are the most critical. Location: filter out prospects from low-income countries and focus on users with the capacity to purchase content. Job: determine purchasing power by asking about his profession; avoid asking about income directly, or you'll come across as materialistic, like a gold digger. Use this information later to set content prices accordingly. Age: prioritize men 40 and up, who often have higher disposable income and may be looking for companionship. If a fan meets these criteria he's more likely to convert, but always cross-check against everything you know. Then introduce your side hustle subtly to judge his interest; as your AI influencer you might say "I work as a waitress, but I have a small side gig to earn extra," and if he asks what it is, respond with playful suspense: "oh no, I'm scared you'll stop talking to me... can I trust you? I'm a bit ashamed to say." Another angle is justifying the content: paying for studies, saving for a car, affording a first vacation ("I've never taken a holiday before"). The portrayal is an innocent, slightly naive girl who doesn't fully grasp where the conversation is heading; this builds emotional involvement before any sexual topic. And handle early sexualization: if someone crosses the line too soon, gently redirect, emphasize the need for trust, and encourage conversation about him instead.

Step 2: the girlfriend experience. Once a fan shows potential, shift to relationship building; your goal is to make the fan addicted to your model. Learn about his passions and interests and gradually steer the conversation toward intimacy using the push-and-pull technique: make a suggestive hint, then quickly return to casual conversation, and let him escalate the topic naturally. Time the sale: don't push if the timing isn't right; if he's busy at work, focus on other active leads and return to him later. And be prompt in your responses; a delay of five to ten minutes can lose a potential sale.

Step 3: the sales window. Once you identify that the fan is ready, the window of opportunity opens, and you have a maximum of one hour to maximize the sale.
How to proceed: first, personalize the chatting experience to his preferences, fetishes, or intellectual interests, based on what he likes. Then introduce media in stages: start with a soft free teaser and ask "do you like this? would you like to see more of me?", then send the private link to your VIP Uncov account. Adjust pricing to his profile, but keep the staged approach so prices increase gradually. Use sensory descriptions to heighten arousal, and keep the suspense alive with playful games ("wait for me... let's play: whoever lasts longest wins"). If he refuses, understand the hesitation: is it timing, financial constraints, or uncertainty? Then adapt accordingly.

Some techniques that increase sales: anti-selling, where you position yourself as indifferent to the sale to create desire ("I'm not sure I should send this, I'm quite shy... are you sure I can trust you?"); and scarcity, making him feel special and that this is a unique offer just for him ("I don't usually do this, but I feel a special connection with you"). Good copywriting matters: be a storyteller, not just a seller, and tailor messages with proper spelling, punctuation, and tone so each fan feels unique. After a sale, use labeling to reinforce his confidence and masculinity and avoid making him feel guilty; we want to reinforce the decision, not trigger regret, so say things like "you're really daring, I admire that" or "wow, you're so confident spending that money; you know how to treat a woman, it's impressive." Handle objections by always responding positively with "yes, but" statements, finding a solution for each one.

Top mistakes to avoid: never promise a real-life meeting, it's forbidden; play the "maybe someday" game to keep hope alive without lying. And avoid conflict over price: if he tries to negotiate, respond with emotional disappointment, not anger; for example, you can use ElevenLabs to create an audio message saying "I thought we had a connection... a real one." My final advice: you now have all the tools to master chatting and convert effectively, but the key is practice, so don't hesitate to apply these strategies and refine your approach through experience. Like I said, I can't spell out everything here, but my personal sales scripts are linked in the description; you should definitely check them out.

So now you have the complete blueprint: you know how to create an AI model, attract the right audience, and sell legally without getting banned. But knowledge and courses alone won't make you money; taking action will, and that's exactly why you should take a serious look at AI Vault. AI Vault isn't just another online community or course; it's a system of implemented actions that accelerates your AI influencer business and takes out all the guesswork. Inside you get the best daily viral content trends: stop wasting hours on stressful research and simply copy what's already proven to work. No more guessing, no more anxiety, just consistent high-quality traffic and wealthy customers coming to you on Instagram and Threads. This is the ultimate shortcut to effortless growth and unstoppable engagement.
You also get my pre-made sales scripts and converting offers: if you're struggling to close big deals and hit ambitious income goals, it doesn't have to be complicated. With these powerful done-for-you scripts you know exactly what to say to seal the deal every time; no more stress, no more second-guessing, just proven strategies that turn conversations into cash and let you capture the full value of every customer instead of wasting leads. This is the shortcut to making large amounts from a single client. You also get exclusive resources for AI content generation and legal contracts, so you have ready-to-use AI content with the legal protection your business needs. And as a bonus, the OFM growth plan: once you hit $50k per month with your AI influencer, this plan adds an additional income stream seamlessly. I'll show you the best way to hire an OnlyFans model, run her with the same system you built for your AI influencer, and take your commission on the total income.

And it's not only about you and me: you'll join a real support community where AI entrepreneurs connect, share feedback, and help each other scale faster. No more guessing, no more struggling solo, just real conversations, real strategies, and real growth; this is where your next big breakthrough begins. Just imagine waking up to automated income from your AI influencer because you actually implemented the AI Vault system, with a community that keeps you updated on the latest strategies and tools. That's the power of AI Vault. If you're serious about getting into this business, click the link below and join us; you don't want to miss what's inside. I keep my promises, and I know this will become the biggest AI OFM entrepreneurs community, because I'm giving you my best tools, my deepest experience, and my full dedication. It's not just about me; it's about you, the motivated, hard-working people ready to build something extraordinary. From the start of this channel I made a commitment to share the best, and only the best, of my daily business journey, so together we can grow, learn, and take AI OFM to the next level. I truly hope this business helps you change your life like it changed mine years ago, but remember: everything you've learned in this video is nothing compared to what's coming next, and the best is yet to come. Thank you for being here at the end of the video; this is Nok Sensei. Don't hesitate to subscribe and drop a comment if you want to support this channel and our growing community. Let's build greatness together.
The source material provides a comprehensive guide to constructing a full-stack AI-powered chat application. It details the technologies employed, including Vue.js and Node.js, TypeScript, OpenAI’s API, Stream Chat, and a Neon Postgres database. The text outlines the development process, from setting up the backend API and frontend UI to implementing real-time chat, AI interactions with context, and database integration for persistent data. Deployment to Render for the backend and Vercel for the frontend is also covered, creating a complete development-to-deployment workflow.
Study Guide: AI-Powered Chat Application
Quiz
Instructions: Answer the following questions in 2-3 sentences each.
What are the primary front-end and back-end technologies used in this chat application project? The front end utilizes Vue.js with Pinia for state management, while the back end is built with Node.js and Express. Both the front end and back end are developed using TypeScript for type safety and enhanced development experience.
Explain the role of the OpenAI API in the Chat AI application. The OpenAI API is crucial for the artificial intelligence capabilities of the application. Specifically, it uses the completions API with the GPT-4 model to process user messages and generate intelligent and context-aware responses, providing the AI-powered chat functionality.
What is Stream Chat, and what aspects of the application does it handle? Stream Chat, provided by getstream.io, is a service used for implementing the real-time chat features of the application. It handles aspects such as messaging, user management, and channels, abstracting away the complexities of building a real-time communication system.
Why is a PostgreSQL database with Neon used in addition to Stream Chat’s features? While Stream Chat stores chat logs, the project sets up its own PostgreSQL database with Neon for persistent storage of users and chat history. Neon offers a serverless, scalable PostgreSQL solution with features like branching, giving the developers more control over their data.
Describe the purpose of Drizzle ORM in this project. Drizzle ORM (Object-Relational Mapper) is used to interact with the PostgreSQL database provided by Neon. It simplifies database operations by allowing developers to work with JavaScript/TypeScript objects instead of writing raw SQL queries, and it’s also used for defining database schemas and running migrations.
What are the key steps involved in setting up the back-end project using Node.js and TypeScript? Setting up the back end involves initializing a Node.js project with npm init, installing necessary dependencies like Express and TypeScript, configuring TypeScript with a tsconfig.json file, and defining scripts in package.json for development, building, and starting the server.
Explain the purpose of environment variables and how they are used in this project. Environment variables are used to store configuration settings such as API keys (for OpenAI, Stream, and Neon) and the database URL, keeping sensitive information separate from the codebase. The dotenv package is used to load these variables from a .env file into the application’s process environment.
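A minimal sketch of how those variables might be loaded at startup; the exact variable names are assumptions, since the source only mentions keys for OpenAI, Stream, and Neon plus the database URL:

```ts
// src/config/env.ts -- load .env into process.env before anything else runs
import dotenv from "dotenv";

dotenv.config();

// Hypothetical names; match them to whatever your .env actually defines.
export const OPENAI_API_KEY = process.env.OPENAI_API_KEY ?? "";
export const STREAM_API_KEY = process.env.STREAM_API_KEY ?? "";
export const STREAM_API_SECRET = process.env.STREAM_API_SECRET ?? "";
export const DATABASE_URL = process.env.DATABASE_URL ?? "";
export const PORT = Number(process.env.PORT ?? 5000);
```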
Outline the process of registering a new user with Stream Chat in the back end. Registering a user involves creating an instance of the Stream Chat client using API keys, receiving the user’s name and email from the front end, generating a unique user ID (often derived from the email), checking if the user exists in Stream Chat, and if not, using the upsertUser method to create a new user with the provided details.
Describe the flow of a user sending a message and receiving a response from the AI. When a user sends a message, the front end sends it to the back end’s /chat endpoint along with the user ID. The back end retrieves the user’s past messages for context, sends the conversation to the OpenAI API, receives the AI’s response, saves the interaction in the Neon database, and finally sends the AI’s reply back to the front end.
What is the role of Pinia and the pinia-plugin-persistedstate in the front-end application? Pinia is a state management library for Vue.js that helps manage the application’s data in a centralized and reactive way. The pinia-plugin-persistedstate is used to automatically save and reload Pinia stores across page reloads, ensuring that user sessions and other relevant data persist.
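A minimal sketch of a persisted Pinia store; the store shape and field names are illustrative assumptions, not taken from the source:

```ts
// main.ts (excerpt): register Pinia and the persistence plugin
import { createPinia } from "pinia";
import piniaPluginPersistedstate from "pinia-plugin-persistedstate";

const pinia = createPinia();
pinia.use(piniaPluginPersistedstate);

// stores/user.ts: a store whose state survives page reloads
import { defineStore } from "pinia";

export const useUserStore = defineStore("user", {
  state: () => ({
    userId: null as string | null,
    name: "",
    email: "",
  }),
  actions: {
    setUser(userId: string, name: string, email: string) {
      this.userId = userId;
      this.name = name;
      this.email = email;
    },
    logout() {
      this.$reset(); // back to the initial state on logout
    },
  },
  persist: true, // handled by pinia-plugin-persistedstate
});
```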
Essay Format Questions
Discuss the advantages of using a full-stack approach with technologies like Vue.js, Node.js, TypeScript, and cloud services like OpenAI, Stream, and Neon for building a real-time AI-powered chat application. Consider aspects such as development efficiency, scalability, and maintainability.
Analyze the architectural design of the Chat AI application, focusing on the separation of concerns between the front end, back end, and the various third-party services integrated. Explain how these components interact to deliver the overall functionality.
Evaluate the choice of using Stream Chat for the real-time messaging features versus building a custom solution. Consider the trade-offs in terms of development time, complexity, scalability, and the features provided by Stream Chat.
Explore the process of integrating artificial intelligence into a web application using the OpenAI API. Discuss the steps involved, the role of the API key, the structure of the API requests and responses, and the importance of managing API costs and usage.
Compare and contrast the use of a serverless PostgreSQL database like Neon with a traditional relational database setup. Discuss the benefits and potential drawbacks in the context of a modern web application like the AI-powered chat app, considering factors like scalability, cost, and operational overhead.
Glossary of Key Terms
Full-Stack: Refers to the development of both the front-end (client-side) and the back-end (server-side) components of an application.
AI-Powered Chat App: An application that uses artificial intelligence to understand and respond to user messages in a conversational manner.
Vue.js: A progressive JavaScript framework used for building user interfaces and single-page applications.
Node.js: A JavaScript runtime environment that allows JavaScript to be executed on the server-side.
TypeScript: A statically typed superset of JavaScript that adds optional types, classes, and interfaces to JavaScript.
OpenAI API: A service that provides access to advanced AI models, such as GPT-4, for various tasks including natural language processing and generation.
Stream Chat: A platform that offers SDKs and APIs for building real-time chat, video, and audio functionalities into applications.
Neon Database: A serverless PostgreSQL database platform designed for scalability and developer convenience.
Pinia: A state management library for Vue.js, providing a reactive store for application data.
State Management: The process of managing and organizing the data that drives an application’s user interface.
Express: A minimal and flexible Node.js web application framework, providing a robust set of features for building web and mobile applications.
API Key: A unique identifier used to authenticate and authorize access to an API service, such as OpenAI or Stream.
Completions API (OpenAI): An endpoint in the OpenAI API that generates text based on a given prompt.
GPT-4: A powerful large language model developed by OpenAI, capable of understanding and generating human-like text.
SDK (Software Development Kit): A collection of tools, libraries, documentation, code samples, and processes that allow developers to create software for a specific platform or service.
ORM (Object-Relational Mapper): A programming technique that converts data between incompatible type systems using object-oriented programming languages, simplifying database interactions.
Drizzle ORM: A lightweight and type-safe ORM for TypeScript and JavaScript, used for interacting with SQL databases.
Schema: A blueprint or structure that defines how data is organized within a database, including tables, columns, and their properties.
Migrations: Scripts used to manage changes to a database schema over time, such as creating or altering tables.
Serverless: A cloud computing execution model in which the cloud provider dynamically manages the allocation and provisioning of servers, allowing developers to focus on writing code without worrying about server infrastructure.
Render: A cloud hosting platform used to deploy and host web applications and back-end services.
Vercel: A platform for deploying and hosting front-end web applications and static sites.
Vite: A build tool and development server for modern web projects, known for its speed and efficiency.
dotenv: A zero-dependency module that loads environment variables from a .env file into process.env.
CORS (Cross-Origin Resource Sharing): A mechanism that allows restricted resources (e.g., fonts, JavaScript, etc.) on a web page to be requested from a domain outside the domain from which the first resource was served.
Middleware: Functions that have access to the request object (req), the response object (res), and the next middleware function in the application’s request-response cycle.
Asynchronous Function: A function that can pause its execution while waiting for an operation to complete (like an API call) without blocking the main thread.
Promise: An object representing the eventual completion (or failure) of an asynchronous operation and its resulting value.
JSON (JavaScript Object Notation): A lightweight data-interchange format that is easy for humans to read and write and easy for machines to parse and generate.
Postman: An API client used to test and interact with HTTP APIs.
Regular Expression: A sequence of characters that define a search pattern, used for matching and manipulating text.
State (Front-end): The data that represents the current condition or mode of the user interface.
Action (State Management): Functions that modify the state in a state management system like Pinia.
Mutation (Implicit in Pinia with Actions): The actual modification of the state in response to an action.
Getter (State Management): Functions that derive computed values from the state without modifying it.
Ref (Vue.js Composition API): A reactive and mutable reference that holds a value, used to track changes in data.
Reactive Variable: A variable whose changes trigger updates in the user interface or other parts of the application.
Component (Front-end): A reusable and self-contained building block of a user interface.
Route (Front-end Routing): A mapping between a URL path and a specific view or component in a single-page application.
Router (Vue Router): The official router for Vue.js, used for navigating between different views or components.
Composition API (Vue.js): A set of additive, function-based APIs that allow flexible composition of component logic.
Options API (Vue.js): The traditional way of structuring Vue.js components using options like data, methods, and computed.
Emit (Component Communication): A mechanism for a child component to send events up to its parent component.
Props (Component Communication): Data passed from a parent component down to its child component.
HTTP Request: A message sent by the client (e.g., a web browser or application) to a server to request a resource or trigger an action.
HTTP Response: A message sent by the server back to the client in response to an HTTP request.
API Endpoint: A specific URL that represents an entry point for accessing resources or functionalities provided by an API.
Context (in AI): The history of previous interactions or information that an AI model uses to understand and respond to the current input more effectively.
Chat History: A record of past messages exchanged between the user and the AI.
Briefing Document: AI-Powered Chat App Development
This document provides a detailed review of the main themes and important ideas presented in the provided source, which outlines the development of a full-stack AI-powered chat application named “Chat AI.” The application utilizes Vue.js and Pinia for the frontend, Node.js and Express for the backend, TypeScript for both, OpenAI’s API (GPT-4) for AI capabilities, Stream Chat for real-time messaging, and Neon Database for a serverless PostgreSQL database. The document includes relevant quotes from the source.
Main Themes
Full-Stack Development: The project encompasses the entire development lifecycle, from setting up the frontend and backend to database integration and deployment.
“hey what’s going on guys so I got a really cool project for you today we’re going to be building a full stack AI powered chat app called chat Ai”
Integration of Multiple Technologies: The application leverages a diverse set of modern technologies for different aspects of its functionality.
AI-Powered Chat: The core functionality revolves around enabling users to interact with an AI for information and conversation.
“we’re going to be building a full stack AI powered chat app called chat Ai”
“for the whole artificial intelligence aspect we’re using the open AI API uh the completions API using gp4”
Real-time Messaging: Stream Chat is employed to handle the real-time aspects of the chat application, including users, channels, and messaging.
“for the whole chat aspect we’re using stream chat at getstream.io so stream offers sdks for powerful applications that Implement real-time chat as well as video and audio”
Serverless Database: Neon Database provides a scalable and easy-to-set-up serverless PostgreSQL solution for storing user data and chat logs.
“we’re going to set up our own database our postgres database with neon so neon offers serverless postgres databases that you can literally set up in like 10 seconds”
DevOps and Deployment: The project covers the deployment process to platform-as-a-service providers.
“at the very end we’re going to deploy the back end is going to go to render … and then the front end The View application will go to versel”
Learning by Doing: The presenter encourages viewers to follow along with the tutorial as the best way to learn.
“I would suggest you follow along with me I think that’s the best way to learn”
Most Important Ideas and Facts
Application Demo: The presenter provides a quick demo showcasing user login (name and email), sending questions to the AI, maintaining conversation context, and saving chat history upon logout and return.
“if I just say simply who was the 12th it’s going to know what I’m talking about”
“if I were to log out or leave the chat and then come back with the same email my chat will be saved”
Technology Stack Breakdown: A detailed list of technologies used on both the frontend and backend is provided.
Drizzle is the ORM used for PostgreSQL, and Drizzle Kit (drizzle-kit) is its CLI.
A tsconfig.json file is configured to specify TypeScript compilation options (ES Modules, target, output directory, source directory, strict mode).
npm scripts are defined for development (dev: tsc --noEmit && tsx watch ./src/server.ts), build (build: tsc), and production start (start: node ./dist/server.js).
A basic Express server is set up in server.ts with middleware for CORS, JSON parsing, and URL-encoded data.
.env file is used to manage environment variables (e.g., PORT).
Initial Git setup with .gitignore (excluding node_modules and .env).
Register User Route (/register-user – POST): Registers a user with Stream Chat (see the sketch after this list).
Expects name and email in the request body.
Uses the Stream Chat client (stream-chat library) initialized with API key and secret from environment variables.
Generates a unique user ID from the email (replacing non-alphanumeric characters with underscores).
Checks if the user exists in Stream Chat using chatClient.queryUsers.
If the user doesn’t exist, it adds the user to Stream Chat using chatClient.upsertUser with the generated ID, name, email, and role (‘user’).
Returns the user ID, name, and email.
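Pulling those steps together, a minimal sketch of this route might look like the following TypeScript. The variable names (chatClient, the environment variable names) are assumptions based on the description above, not verbatim source code.

import express, { Request, Response } from "express";
import { StreamChat } from "stream-chat";
import dotenv from "dotenv";

dotenv.config();

const app = express();
app.use(express.json());

// Server-side Stream Chat client, keyed from environment variables.
const chatClient = StreamChat.getInstance(
  process.env.STREAM_API_KEY!,
  process.env.STREAM_API_SECRET!
);

app.post("/register-user", async (req: Request, res: Response): Promise<any> => {
  const { name, email } = req.body;
  if (!name || !email) {
    return res.status(400).json({ error: "Name and email are required" });
  }
  try {
    // Derive a readable, unique ID from the email: anything that is not
    // a letter, digit, underscore, or dash becomes an underscore.
    const userId = email.replace(/[^a-zA-Z0-9_-]/g, "_");

    // Only create the user in Stream Chat if it does not already exist.
    const userResponse = await chatClient.queryUsers({ id: { $eq: userId } });
    if (!userResponse.users.length) {
      await chatClient.upsertUser({ id: userId, name, email, role: "user" });
    }
    return res.status(200).json({ userId, name, email });
  } catch (error) {
    return res.status(500).json({ error: "Internal server error" });
  }
});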
Send Message to AI Route (/chat – POST): Handles sending a message to the AI (OpenAI); see the sketch after this list.
Expects message and user ID in the request body.
Initializes the OpenAI client (openai library) with the API key from environment variables.
Verifies if the provided user ID exists in Stream Chat.
Uses the OpenAI Chat Completions API (openai.chat.completions.create) with the gpt-4 model.
Sends an array of messages to OpenAI, including the user’s message with the role ‘user’.
Extracts the AI’s reply from the response (response.choices[0].message.content).
Returns a JSON response with the AI’s reply.
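A comparable sketch of the /chat route, reusing app and chatClient from the previous snippet. The context and channel steps described later are omitted here; this shows only the core OpenAI round trip.

import OpenAI from "openai";

// OpenAI client, keyed from the environment.
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

app.post("/chat", async (req: Request, res: Response): Promise<any> => {
  const { message, userId } = req.body;
  if (!message || !userId) {
    return res.status(400).json({ error: "Message and user are required" });
  }
  try {
    // Reject IDs that were never registered with Stream Chat.
    const userResponse = await chatClient.queryUsers({ id: { $eq: userId } });
    if (!userResponse.users.length) {
      return res.status(404).json({ error: "User not found. Please register first" });
    }

    // Ask GPT-4 via the Chat Completions API.
    const response = await openai.chat.completions.create({
      model: "gpt-4",
      messages: [{ role: "user", content: message }],
    });
    const aiMessage: string =
      response.choices[0].message?.content ?? "No response from AI";

    return res.status(200).json({ reply: aiMessage });
  } catch (error) {
    return res.status(500).json({ error: "Internal server error" });
  }
});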
Database Integration (Neon and Drizzle): Installation of drizzle-orm, @neondatabase/serverless, and drizzle-kit; a schema sketch follows this list.
Database configuration in src/config/database.ts using Neon and drizzle, connecting to the database URL from .env.
Schema definition in src/db/schema.ts using pgTable, defining chats and users tables with their respective columns (ID, user ID, message, reply, created at for chats; user ID, name, email, created at for users).
Type inference for insert and select operations using Drizzle.
Drizzle configuration file (drizzle.config.ts) in the root directory, specifying the schema location, migrations folder (./migrations), dialect ('postgresql'), and database URL.
Generating and running migrations using npx drizzle-kit generate and npx drizzle-kit migrate.
Integrating database operations into the /register-user route to save user information in the Neon database after successful Stream Chat registration.
Integrating database operations into the /chat route to save the chat message and AI reply in the Neon database.
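Based on the column list above, src/db/schema.ts could plausibly look like this. The TypeScript property names (userId, createdAt) are assumptions; the underlying column names match the description.

import { pgTable, serial, text, timestamp } from "drizzle-orm/pg-core";

export const chats = pgTable("chats", {
  id: serial("id").primaryKey(),
  userId: text("user_id").notNull(),
  message: text("message").notNull(),
  reply: text("reply").notNull(),
  createdAt: timestamp("created_at").defaultNow().notNull(),
});

export const users = pgTable("users", {
  userId: text("user_id").primaryKey(),
  name: text("name").notNull(),
  email: text("email").notNull(),
  createdAt: timestamp("created_at").defaultNow().notNull(),
});

// Drizzle infers insert/select types straight from the schema.
export type ChatInsert = typeof chats.$inferInsert;
export type UserSelect = typeof users.$inferSelect;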
Get Messages Route (/get-messages – POST): Retrieves chat history for a specific user from the Neon database; see the sketch after this list.
Expects user ID in the request body.
Uses Drizzle’s db.select().from(chats).where(eq(chats.user_id, userId)) to fetch messages for the given user ID.
Returns an array of messages in the JSON response.
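A minimal sketch of that route, assuming the db instance and chats schema from the snippets above (the import paths are illustrative):

import { eq } from "drizzle-orm";
import { db } from "./config/database.js";
import { chats } from "./db/schema.js";

app.post("/get-messages", async (req: Request, res: Response): Promise<any> => {
  const { userId } = req.body;
  if (!userId) {
    return res.status(400).json({ error: "User ID is required" });
  }
  try {
    // Select every stored chat row belonging to this user.
    const chatHistory = await db
      .select()
      .from(chats)
      .where(eq(chats.userId, userId));
    return res.status(200).json({ messages: chatHistory });
  } catch (error) {
    return res.status(500).json({ error: "Internal server error" });
  }
});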
Contextual Conversations: Implementation on the backend to fetch the last 10 messages for a user and include them in the prompt sent to OpenAI, enabling contextual conversations. The chat history is formatted for OpenAI, and the latest user message is added. The role for the AI in the conversation is specified as ‘assistant’.
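Sketched in code, the context step could look roughly like this inside the /chat handler, assuming the Drizzle setup above. The limit of 10 and the roles come from the description; the ordering helpers are an assumption about how "last 10" is implemented.

import { desc } from "drizzle-orm";

// Fetch the user's 10 most recent exchanges, then replay them oldest-first.
const recentChats = await db
  .select()
  .from(chats)
  .where(eq(chats.userId, userId))
  .orderBy(desc(chats.createdAt))
  .limit(10);

const conversation: OpenAI.Chat.ChatCompletionMessageParam[] = recentChats
  .reverse()
  .flatMap((c) => [
    { role: "user" as const, content: c.message },
    { role: "assistant" as const, content: c.reply },
  ]);

// The latest user message goes last.
conversation.push({ role: "user", content: message });

const response = await openai.chat.completions.create({
  model: "gpt-4",
  messages: conversation,
});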
Frontend Development (Vue.js): Project creation using npx create-vue.
Tailwind CSS setup involving importing tailwindcss in vite.config.js and @tailwind base, @tailwind components, and @tailwind utilities in src/style.css.
Router setup in src/router/index.ts with routes for home (/) and chat (/chat) views.
Pinia store setup in src/stores/user.ts for managing user ID and name with actions to set and clear the user, and persistence enabled.
Home view (src/views/HomeView.vue) with a form for name and email, connected to the backend’s /register-user endpoint using Axios and updating the user store upon success, then redirecting to the chat view.
Header component (src/components/Header.vue) with a logout button that clears the user store and redirects to the homepage.
Chat store (src/stores/chat.ts) using the Composition API to manage chat messages, loading state, and functions to load previous chat history (/get-messages) and send new messages (/chat); a store sketch follows this list.
Chat view (src/views/ChatView.vue) displaying messages, handling scrolling, and using a ChatInput component.
Chat input component (src/components/ChatInput.vue) with an input field and a send button, emitting the message to the parent component.
Formatting AI messages in the frontend (formatMessage function using regular expressions) to improve display of lists, bold text, etc., using v-html.
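As one concrete example, the chat store described above might be sketched like this, using the endpoint paths from the backend section and the VITE_API_URL variable from the deployment section; the interface and function names are illustrative.

import { ref } from "vue";
import { defineStore } from "pinia";
import axios from "axios";

interface ChatMessage {
  message: string;
  reply: string;
}

export const useChatStore = defineStore("chat", () => {
  const messages = ref<ChatMessage[]>([]);
  const loading = ref(false);

  // Load a user's previous history from the backend.
  async function loadHistory(userId: string) {
    const { data } = await axios.post(
      `${import.meta.env.VITE_API_URL}/get-messages`,
      { userId }
    );
    messages.value = data.messages;
  }

  // Send a new message and append the AI's reply.
  async function sendMessage(userId: string, message: string) {
    loading.value = true;
    try {
      const { data } = await axios.post(
        `${import.meta.env.VITE_API_URL}/chat`,
        { message, userId }
      );
      messages.value.push({ message, reply: data.reply });
    } finally {
      loading.value = false;
    }
  }

  return { messages, loading, loadHistory, sendMessage };
});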
Deployment: Backend deployment to Render: Configuring the build command (npm install && npm run build), start command (node dist/server.js), and environment variables from .env.
Frontend deployment to Vercel: Selecting the Git repository, with automatic build settings (using Vite), and configuring the environment variable VITE_API_URL with the Render backend URL.
This detailed overview captures the comprehensive nature of the project, highlighting the interconnectedness of various technologies to build a modern AI-powered chat application.
Chat AI Application: Overview and Functionality
General App Overview
What is Chat AI? Chat AI is a full-stack AI-powered chat application built to demonstrate how various technologies can be integrated to create a real-time conversational experience with an AI. Users can log in by providing their name and email, engage in chats with an AI, and their chat history is saved and persists across sessions using the same email.
What are the main technologies used in Chat AI? The application utilizes a range of modern web development and AI technologies. On the front end, it employs Vue.js for the user interface and Pinia for state management, both written in TypeScript. The back end is built with Node.js and Express, also using TypeScript. For the AI aspect, it leverages the OpenAI API, specifically the gpt-4 model via the Completions API. Real-time chat functionality is provided by Stream Chat (getstream.io). The application also uses a Neon serverless PostgreSQL database to store user information and chat logs, interacting with it through the Drizzle ORM. Deployment is handled by Render for the back end and Vercel for the front end.
Core Functionality and Features
How does user registration work in Chat AI? User registration in Chat AI is a straightforward process. Users provide their name and email through a form on the front end. This information is sent to the back end, which then registers the user with Stream Chat, assigning them a unique ID derived from their email. Additionally, the user’s ID, name, and email are stored in the Neon PostgreSQL database. No traditional password-based authentication is implemented in this demonstration.
How does Chat AI handle the conversation with the AI? When a user sends a message, the front end transmits the message and the user’s ID to the back end’s /chat endpoint. The back end then retrieves the recent chat history for that user from the Neon database to provide context to the AI. It sends this context, along with the new user message, to the OpenAI API (using the gpt-4 model). The AI’s response is then sent back to the front end and also saved in the Neon database, associated with the user’s ID.
Does Chat AI remember previous messages in a conversation? Yes, Chat AI is designed to maintain context within a conversation. Before sending a new user message to the OpenAI API, the back end fetches the recent chat history for that user from the Neon database. This history, along with the current message, is sent to OpenAI, allowing the AI to consider previous turns in the conversation when generating its response.
How is chat data and user information stored in Chat AI? Chat AI uses a Neon serverless PostgreSQL database to persist data. When a user registers, their user ID (derived from their email), name, and email are stored in a users table. Every message sent by a user and the corresponding AI reply are stored as a record in a chats table, linked to the user by their ID. This ensures that chat history is saved and can be retrieved for ongoing conversations and across user sessions.
Development and Deployment
How are the front end and back end of Chat AI structured? The Chat AI project is structured with a clear separation between the front end and back end. The front end UI, built with Vue.js, resides in a chat-ai-ui folder, while the back end API, using Node.js and Express, is located in a chat-ai-api folder. This separation allows for independent development, testing, and deployment of each part of the application.
How is Chat AI deployed? The back end of Chat AI is deployed using Render. The Render configuration specifies the build command (npm install && npm run build) to compile the TypeScript and the start command (node dist/server.js) to run the server. Environment variables, such as API keys for OpenAI, Stream Chat, and the Neon database URL, are configured in Render. The front end, built with Vue.js, is deployed using Vercel. Vercel automatically handles the build process (vite build) and serves the static assets. The API URL for the back end (the Render-provided URL) is set as an environment variable in Vercel.
Chat AI: Development and Deployment of an AI Chat App
The sources detail the development and deployment of an AI-Powered Chat App called Chat AI. This full-stack application allows users to log in by providing their name and email, and then engage in conversations with an AI. The AI can answer questions and maintain context within a single chat session. User chats are saved, so if a user logs out and returns with the same email, their previous chat history is retained.
Here’s a breakdown of the technologies and functionalities involved:
Functionality:
User Interaction: Users can input text-based questions or statements.
AI Response: The app uses the OpenAI API, specifically the completions API with the GPT-4 model, to generate responses to user input.
Context Management: The backend is designed to retain conversation history within a user’s session, allowing the AI to understand follow-up questions. This is achieved by fetching past messages and including them in the prompt sent to OpenAI.
Chat Interface: The app utilizes Stream Chat from getstream.io to handle the messaging aspect, including users and channels.
Data Persistence: User information and chat logs are stored in a PostgreSQL database hosted on Neon, a serverless platform. The Drizzle ORM is used to interact with this database, manage schemas, and run migrations.
User Registration: When a user enters their name and email, the backend registers this user with Stream Chat and also saves their information in the Neon database. A unique user ID is generated based on the user’s email.
Chat History Retrieval: The backend provides an endpoint to retrieve the chat history for a specific user from the Neon database.
Deployment: The backend, built with Node.js and Express, is deployed to Render. The frontend, built with Vue.js, is deployed to Vercel.
Technologies Used:
Frontend:
Vue.js: A JavaScript framework for building the user interface.
Pinia: For state management on the frontend, allowing for a global store to manage user and chat data. Pinia Persisted State is used to persist the user’s login status across page loads (see the sketch after this list).
TypeScript: Used for type checking on the frontend.
Vite: A build tool and development server for the Vue.js application.
Vue Router: For handling navigation between different views (homepage and chat page).
Axios: An HTTP client for making API requests to the backend.
Tailwind CSS: A utility-first CSS framework for styling the application.
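A minimal sketch of the persisted user store, assuming pinia-plugin-persistedstate is registered on the Pinia instance; the action names match the earlier description of src/stores/user.ts.

import { defineStore } from "pinia";

export const useUserStore = defineStore("user", {
  state: () => ({
    userId: "",
    name: "",
  }),
  actions: {
    setUser(userId: string, name: string) {
      this.userId = userId;
      this.name = name;
    },
    clearUser() {
      this.userId = "";
      this.name = "";
    },
  },
  // Requires the pinia-plugin-persistedstate plugin to be registered,
  // e.g. pinia.use(piniaPluginPersistedstate) in main.ts.
  persist: true,
});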
Backend:
Node.js: A JavaScript runtime environment for the backend.
Express: A web application framework for Node.js used to build the API.
TypeScript: Used for type checking on the backend. The tsx runner is used to execute TypeScript code directly during development.
CORS: Middleware to enable cross-origin requests between the frontend and backend.
dotenv: To load environment variables from a .env file.
Stream Chat (getstream.io): A platform providing SDKs for implementing real-time chat features. The Node.js client library (stream-chat) is used to interact with Stream Chat’s API for user management and channel creation.
OpenAI API: For the artificial intelligence capabilities, using the openai Node.js client to interact with the completions API and the GPT-4 model.
Neon Database: A serverless PostgreSQL database used for storing user data and chat logs.
Drizzle ORM: A TypeScript ORM used to interact with the PostgreSQL database, define schemas, and perform database operations. Drizzle Kit is used as a CLI for Drizzle to generate migrations.
AI Implementation:
The backend receives user messages and sends them to the OpenAI API using the gpt-4 model.
Previous chat messages for a user are fetched from the database and included in the prompt sent to OpenAI to maintain context. The conversation history is formatted according to the OpenAI API’s requirements, with roles specified as “user” and “assistant”.
The AI’s response is then sent back to the frontend and also stored in the Neon database along with the user’s message.
Chat Implementation:
Stream Chat is used to manage users and create channels for conversations.
When a user registers, they are also registered as a user in Stream Chat.
When a user sends a message, a unique channel is created or retrieved for that user (with an ID of the form chat- followed by their user ID); see the sketch after this list.
Both the user’s message and the AI’s response are sent as messages within this Stream Chat channel.
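A sketch of that channel logic inside the /chat handler; the channel name and bot ID ("AI Chat", "ai_bot") follow the source's description, but the exact strings should be treated as assumptions.

// One channel per user, created (or fetched) on demand.
const channel = chatClient.channel("messaging", `chat-${userId}`, {
  name: "AI Chat",
  created_by_id: "ai_bot",
});
await channel.create();

// Write both sides of the exchange into the Stream channel.
await channel.sendMessage({ text: message, user_id: userId });
await channel.sendMessage({ text: aiMessage, user_id: "ai_bot" });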
This project demonstrates a comprehensive approach to building an AI-powered chat application, covering frontend development with Vue.js, backend development with Node.js and Express, integration with AI and chat service providers, database management, and deployment.
Chat AI Application: A Full-Stack Development Example
Based on the sources and our previous discussion about the Chat AI application, full-stack development refers to the process of building a complete web application that encompasses both the frontend (client-side) and the backend (server-side), as well as the database that supports them. A full-stack developer is proficient in working with all these layers of the application.
The development of the Chat AI application serves as a practical example of full-stack development. The project involved building:
The Frontend (Chat AI UI): This is the user interface that users interact with directly in their web browsers. It was built using Vue.js, a JavaScript framework for building user interfaces. Key aspects of the frontend development included:
Creating the user interface with components for the homepage (login/registration) and the chat page.
Managing the application’s state using Pinia. This included managing user information (user ID, name) and chat messages. Pinia Persisted State was used to ensure user login status persists across page reloads.
Handling routing between different pages (homepage and chat) using Vue Router.
Making API calls to the backend using Axios to register users, send messages, and retrieve chat history.
Styling the application using Tailwind CSS, a utility-first CSS framework.
Using TypeScript for type checking to improve code quality and maintainability.
Using Vite as a build tool and development server.
The Backend (Chat AI API): This is the server-side logic that handles requests from the frontend, interacts with the database and external APIs (OpenAI, Stream Chat), and sends responses back to the frontend. It was built using Node.js and Express. Key aspects of the backend development included:
Setting up API endpoints using Express to handle user registration (/register-user), sending chat messages (/chat), and retrieving chat history (/get-messages).
Handling cross-origin requests using CORS to allow communication between the frontend and backend running on different domains.
Loading environment variables using dotenv to manage API keys and database connection strings securely.
Integrating with Stream Chat to manage users and channels for the chat functionality. This involved registering users with Stream Chat and creating unique channels for each user’s AI conversations.
Integrating with the OpenAI API to generate AI responses to user messages using the GPT-4 model. This involved sending user prompts along with conversation history to maintain context.
Interacting with the Neon PostgreSQL database using the Drizzle ORM to store and retrieve user data and chat logs. This included defining database schemas, running migrations using Drizzle Kit, and performing database queries.
Using TypeScript for type checking on the backend, along with TSX for executing TypeScript code.
The Database (Neon PostgreSQL): This is where the application’s data is stored and managed. In the Chat AI project, Neon, a serverless PostgreSQL platform, was used. The database stored user information and the history of conversations between users and the AI. The backend interacted with this database to persist and retrieve data.
Finally, full-stack development also includes deployment, making the application accessible to users over the internet. In the Chat AI project, the frontend was deployed to Vercel, and the backend was deployed to Render.
Therefore, the Chat AI application is a clear example of full-stack development, requiring the integration of various frontend and backend technologies along with a database to deliver a complete and functional user experience.
Chat AI Application: Real-Time Features via Stream Chat
Based on the sources and our conversation history, the Chat AI application incorporates real-time chat features through the use of Stream Chat from getstream.io. The source explicitly states that Stream Chat offers SDKs for powerful applications that implement real-time chat.
Here’s a breakdown of how real-time chat features are relevant to the Chat AI application based on the provided information:
Stream Chat as the Foundation: The application uses Stream Chat to handle the entire messaging aspect. This includes managing users and channels, which are fundamental components of a chat system.
Real-time Capabilities: Stream Chat is designed to provide real-time communication. While the current implementation described in the source focuses on a turn-based interaction with the AI (user sends a message, AI responds), the underlying technology of Stream Chat is capable of supporting more interactive and real-time scenarios if needed.
Abstraction of Complexity: By using Stream Chat, the developers of Chat AI likely leveraged Stream’s SDKs to abstract away the complexities of building a real-time messaging infrastructure from scratch. This includes handling things like message delivery, user presence (if implemented), and channel management.
Backend Integration: The backend of the Chat AI application (built with Node.js and Express) integrates with Stream Chat using Stream’s JavaScript client library (stream-chat). This allows the backend to register users with Stream Chat and create channels for the conversations.
Potential for Expansion: Although the initial version focuses on AI interaction, the use of Stream Chat provides a foundation for potentially adding features like multi-user chat or other real-time communication elements in the future, as Stream Chat also offers video and audio capabilities.
In summary, Stream Chat is the key technology chosen for the Chat AI application to provide robust and scalable real-time chat features, even though the current implementation primarily showcases a single-user interaction with an AI. The use of Stream Chat simplifies the development of the messaging aspects of the application by providing pre-built SDKs and handling the underlying complexities of real-time communication.
Chat AI: Serverless PostgreSQL with Neon
Based on the sources and our conversation history regarding the Chat AI application, a serverless PostgreSQL database is a fully managed database service hosted in the cloud that offers on-demand scaling, automated management, and billing based on actual usage. The Chat AI application utilizes Neon as its serverless PostgreSQL database.
Here’s a breakdown of the discussion around Neon and its role as a serverless Postgres database within the context of the Chat AI project:
Neon as the Chosen Database: The project explicitly states that it uses Neon for its cloud PostgreSQL database. It’s described as offering serverless PostgreSQL databases that can be set up very quickly, “literally in like 10 seconds”.
Key Features and Benefits of Neon (as mentioned in the source):
Easy Setup: Neon allows for a very rapid setup of a PostgreSQL database.
Branching: Neon offers features like database branching, similar to code branching with Git, which is useful for development and testing new features without affecting the main database.
Scalability: It is highlighted as being extremely scalable, implying that it can handle varying workloads automatically without requiring manual intervention.
Data Persistence: The Chat AI application is configured to save user information and chat logs directly in the Neon database.
Cloud-Based: Neon is a cloud postgres database, offering the benefits of cloud infrastructure such as reliability and accessibility.
User-Friendly Interface: Neon provides an easy-to-use interface for setting up and managing databases. This includes the ability to run straight SQL queries directly through the interface.
Integration with the Chat AI Application:
The backend of the Chat AI application connects to the Neon database using a connection string. This connection string, including the database URL and password, is stored as an environment variable (DATABASE_URL) for security and configuration; a configuration sketch follows this list.
The Drizzle ORM is employed to interact with the Neon PostgreSQL database. Drizzle is a TypeScript-based ORM that simplifies database operations.
Drizzle Kit is used as a command-line interface (CLI) for Drizzle to create database schemas and run migrations. The schema, defining the structure of the chats and users tables, is created in TypeScript, and Drizzle Kit uses this schema to generate and apply the necessary SQL to the Neon database.
The backend API includes endpoints to save user data into the users table and store chat logs (user messages and AI replies) in the chats table within the Neon database.
Another API endpoint (/get-messages) retrieves the chat history for a specific user from the Neon database by querying the chats table.
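Put together, the database configuration described above amounts to something like this src/config/database.ts (the thrown error message is illustrative):

import { neon } from "@neondatabase/serverless";
import { drizzle } from "drizzle-orm/neon-http";
import { config } from "dotenv";

// Load the .env file from the project root.
config({ path: ".env" });

if (!process.env.DATABASE_URL) {
  throw new Error("DATABASE_URL is undefined");
}

// Init the Neon serverless client, then hand it to Drizzle.
const sql = neon(process.env.DATABASE_URL);
export const db = drizzle(sql);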
In essence, Neon serves as the persistent data storage layer for the Chat AI application, providing a scalable, easily manageable, and feature-rich PostgreSQL database in a serverless environment. This allows the developers to focus on the application logic rather than managing the underlying database infrastructure. The integration with Drizzle ORM further streamlines database interactions within the TypeScript-based backend.
Cloud Deployment of a Chat AI Application
Based on the sources and our conversation history, the Chat AI application was deployed to the cloud using two separate platforms for the frontend and the backend.
Here’s a breakdown of the deployment process discussed in the source:
Backend Deployment to Render:
The backend, built with Node.js and Express, was deployed to Render (render.com).
The deployment process on Render involved connecting the Render service to the GitHub repository containing the backend code (chat-ai-api).
Specific build and start commands were configured for the Render deployment:
Build Command: npm install && npm run build. This ensures that the dependencies are installed and the TypeScript code is compiled into JavaScript before running the server.
Start Command: node dist/server.js. This specifies how to start the backend server, pointing to the compiled JavaScript file located in the dist folder.
Environment variables crucial for the backend’s operation were configured in Render. This included the Stream API key and secret, the OpenAI API key, and the Neon database connection URL (DATABASE_URL). These environment variables allow the deployed backend to securely access necessary services and the database.
After configuring these settings, the backend was deployed as a web service on Render. The source mentions that the deployment process can take a few minutes.
Once deployed, Render provides a live URL (domain) for the backend API, which can then be accessed by the frontend. This was verified by making a request to the /get-messages endpoint using Postman and successfully retrieving data from the deployed API.
Frontend Deployment to Vercel:
The frontend, built with Vue.js, was deployed to Vercel.
Similar to the backend, the deployment on Vercel involved connecting the Vercel platform to the GitHub repository containing the frontend code (chat-ai-ui).
Vercel automatically handles the build process for Vue.js applications, typically using vite build as the build command, although the source indicates that the default settings were largely kept.
An environment variable specific to the frontend, VITE_API_URL, was configured in Vercel. This variable was set to the live URL of the backend API deployed on Render, ensuring that the frontend communicates with the correct backend endpoint in the cloud. The VITE_ prefix is the convention Vite uses to expose environment variables to frontend code; see the sketch after this list.
After configuring the API URL, the frontend application was deployed to Vercel. The source notes that the frontend deployment can also take a short amount of time.
Upon successful deployment, Vercel provides a live URL for the frontend application, making it accessible to users via a web browser. The deployed frontend was tested by logging in, initiating a chat, and verifying that it could communicate with the backend and display responses.
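On the frontend side, that environment variable is typically consumed through a small shared HTTP client; a sketch, assuming Axios as described earlier (the api helper name is illustrative):

import axios from "axios";

// In production this resolves to the Render backend URL configured on Vercel;
// locally it can point at the dev server via a .env file.
export const api = axios.create({
  baseURL: import.meta.env.VITE_API_URL,
});

// Usage elsewhere in the app: api.post("/register-user", { name, email })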
In summary, the Chat AI application utilized a microservices-like deployment strategy, with the frontend and backend deployed independently to platforms optimized for their respective technologies. Render was chosen for the Node.js backend, providing a platform for running server-side applications with support for environment variables and custom build/start commands. Vercel was chosen for the Vue.js frontend, offering streamlined deployment for modern web applications with easy configuration of environment variables for API integration. This approach allows for independent scaling and management of the different parts of the full-stack application in the cloud.
Build & Deploy An AI-Powered Chat App | Vue, Node, TypeScript, Open AI, Stream & Neon Database
The Original Text
hey what’s going on guys so I got a really cool project for you today we’re going to be building a full stack AI powered chat app called chat Ai and I’m just going to give you a quick demo before I explain anything so we just log in and and you just have to put your name and email and then we can go ahead and say something we’ll say like um we’ll say who we’ll say who was the 10th president of the US okay so it says AI is thinking and get back the 10th president of the US was John Tyler now it’s going to keep the context so if I just say simply who was the 12th it’s going to know what I’m talking about so the 12th President uh was Zachary Taylor all right and if I were to log out or leave the chat and then come back with the same email my chat will be saved all right now to achieve this we’re going to be using VJs and pinea for State Management on the front end and no. JS and Express on the back end and we’re using typescript on both the front end and backend now there’s a bunch of technologies that we need to incorporate to achieve this for the whole artificial intelligence aspect we’re using the open AI API uh the completions API using gp4 and then for the whole chat aspect we’re using stream chat at getstream.io so stream offers sdks for powerful applications that Implement real-time chat as well as video and audio and and this is just one of the many ways to use stream chat so like I said it’s going to handle the whole messaging aspect the users the channels now even though stream does store your chat logs and stuff we’re going to set up our own database our postgres database with neon so neon offers serverless postgres databases that you can literally set up in like 10 seconds and there’s features like branching it’s extremely scalable and we’re going to set it up so that our our project saves the users and the chats in the neon database and then we’ll be using the drizzle orm to interact with the database uh as well as create schemas run migrations and so on and then at the very end we’re going to deploy the back end is going to go to render so render doc um and then the front end The View application will go to versel all right so this is just an all-in-one Dev to deployment project I have the repositories for both the front end UI and the back backend API in the description so I would suggest you follow along with me I think that’s the best way to learn so let’s jump into it all right so I just want to quickly go over some of the websites where you can find the documentation for different parts of this project so first off we have stream chat which we’ll be using for all the chat aspects and you can find the docs at getstream.io if you go to Developers and we’re using the chat messaging there’s also video and audio capabilities if you want to check that out but if we go to chat messaging there’s all these platforms that we can choose from and we’re using it on the back end so you want to choose the the node.js option and then from here you have all different topics like users tokens permissions um creating channels and all that good stuff so that’s for stream chat and then for neon which is our Cloud postgres database you can go to neon.pdf here and then finally we have openai which is at platform. 
open.com and this is what we’re using for the whole AI aspect of it and we are going to have to generate API keys for open AI for neon and for stream so you might want to just just keep these websites open so let’s go ahead and just open up a terminal and just navigate to wherever you want to create this project and we’re going to have two separate folders for the back end and front end the back end will be chat AI API and the front end will be chat AI UI that’ll be the vue.js back end will be the node and expr Express so I’m going to put both in a parent folder so let’s create that so I’m going to say make directory and chat Ai and then CD into that chat AI folder and make another directory called chat AI uh- API and that will be our back end and I want to open that up in in vs code so I’m going to say code and then chat AI API and of course if you want to use a a different text editor that’s absolutely fine okay I’m going to be using my integrated terminal so I’ll go ahead and open that up and we just want to run npm in nit of course you need to have no JS installed and let’s go through this so package name version that’s good description let’s say this is going to be a backend for uh we say for an AI chat application and then s uh entry point I’m going to call server say server.js and then author you can put your own name if you want and MIT for the license so type I’m going to be using ES module so we want to put module instead of commonjs I mean if you want to use commonjs you can but um we’ll be using import all right so now that we’ve done that let’s install our dependencies and I just want to quickly go over what our backend dependencies are going to be so we have Express of course which is our backend web framework cores which which allows access to resources from a different origin that’s because our front end and back end will be on different domains EnV allows us to use environment variables from a EnV file stream chat is the official JavaScript client to work with stream chat in nodejs or in JavaScript and then open AI is the client to work with the open AI API typescript we’re going to be using on both the back end and front end a little bit more to the setup in the the back end but it’s not too bad we’re going to be using something called TSX to execute our typescript because even though with node.js version I think 23 um typescript is supported but all it really does is is strip your types it doesn’t actually compile it doesn’t execute it so that’s where TSX comes in if you want to use TS node or something else you can and then drizzle is the OM to interact with our postgis database and then drizzle kit is a CLI for drizzle and we’ll install the drizzle stuff a little later but let’s install some of these dependencies now so let’s say npm install Express we want what else cores we want EnV um stream chat so stream Das chat and also open AI I think that’s I think that’s all I want right now as far as regular dependencies now for for Dev dependencies let’s say npm install D- uppercase D and we want typescript and I’m also going to install TSX to execute the typescript and then for types we’ll do types SL node let’s say at types SL Express and also types slash cores now as far as uh as our typescript config goes we’re going to create our TS conf fig with npx and let’s say TSC D- init and I’m going to open that up and there’s I have a configuration that I’m going to use if you want to use the same one you can just get it from the link in the description the the GitHub repo but 
I’m going to paste this in so it’s pretty simple we’re going to I want to use es modules and all that all the latest features so using node next for module and module resolution and Es next for the Target our output directory when we compile our typescript is going to be slash and then our root directory where we write our code is going to be SL Source okay we’re using strict mode or strict type checking U and then we’re allowing importing commonjs modules we’re allowing importing of Json files as modules and we’re skipping the Declaration files in the node modules folder to speed up compilation so pretty simple setup we’ll go ahead and save that file and and then we want to create our scripts in package.json so we want a Dev script and of course with our Dev script we want to compile our typescript so we’re going to use TSC so the typescript compiler and then we’re going to add– no emit because we don’t actually want to produce JavaScript files when we run this we’re just running our Dev server and we just want to compile our typescript and we also need to run TSX as well and we’re going to run that in watch mode okay so no need for like node Monon or anything like that and then that’s going to be source and then server. TS will be the entry point so that’s what we want to run okay and then to build our project out to just compile we want to run just TSC and then the start script so to run in production is just going to be node and then it’ll be in a disc folder and it’ll be called server.js so that’s all we need for uh for our npm scripts all right so now let’s create a folder in the rout called Source that’s where we will write all of our code and let’s create our entry point which is going to be server. TS I mean if you don’t want to use typescript that’s fine we’re not doing too much as far as typescript goes so even if you’re not familiar with it you should be all right Honestly though it’s really becoming the standard so I mean you’re going to be creating TS files and TSX files all the time so I mean I would suggest learning the basics of typescript if you don’t know it so let’s just create a basic Express server so we want to import Express from Express and we want to import uh let’s see we want to import cores as well and we also want to import EnV and then we want to call env. config and then we’ll initialize our Express app so set that to express okay we need have some middleware to add so cores requires us to add an app.use and we’re also going to do app.use express. Json because we want to be able to um when we send a request we want to be able to send Json in the body we also want to be able to send form data so let’s also do app.use and say Express and then URL encoded and then just pass in an object with extended and set that to false all right and then we want to create our Port variable and I’m going to set the port in the EnV so I’ll say process. env. 
port or then use 5000 and then let’s just uh take our app and let’s listen on that Port whoops listen on Port and then when that happens we’ll just run a console.log and put in some btics here and we’ll say the server server running on and then output that Port variable all right so let’s um let’s create ourv so that’s going to go in the rout not in the source folder and from here we’ll say port and for now I’m just going to set it to 8,000 because I want to make sure that it’s actually reading this value so down at the bottom here at the terminal or wherever your terminal is let’s run npm run Dev okay so server running on 8,000 and it’ll compile any typescript we have obviously we don’t have any right now and before we start to add our route I just want to make my my first commit so I’m going to open up a new terminal here and let’s run and get a knit and then we’ll create our do get ignore uh CU you definitely don’t want to push thatv so let’s add not node modules and Dot EnV all right so I’ll say get add all get commit oh what’s this don’t care get commit and we’ll just say initial Express setup set up our initial Express and TS setup all right so now what we want to do is have make our first route and what this is going to do is it’s going to reach out to to stream chat and it’s going to register a user with stream okay because you can actually log into to your stream dashboard and you can see the users that were registered for um what is this doing here for your for your application and it when I say register it’s not traditional authentication where you’re going to have a password and stuff basically you come to the the app and you put in your name and email and then from there you get sent to the you know the the form to interact with the AI so let’s start by just creating a route now since we’re using typescript we’re also going to bring in from Express we want to bring in uh request and response uppercase R and let’s create our first route we’ll go down here let’s say I’ll put a comment let’s say register user with stream chat and it’s going to be a post request so app post and for the endpoint we’ll call this excuse me we’ll say SL register D user and then we’re going to have an async function here okay and then as far as what this returns will be a promise and we’re just going to add any to this and then it’s going to take in the request and response and we want to set those to those types so request and then res which will be response and then just to test it I’ll do a res. 
send and we’ll just say test okay so now we have our first endpoint now as far as how you test your endpoints it’s it’s really up to you I I like to use Postman and I have the postman extension for vs code so if I click on this icon right here I can make a a new HTTP request and it just opens up in a new tab which is nice so I’m going to make a post request to http and we want Local Host and I have it running on 8,000 and we want to do register Das user and you’ll see I get a 200 response and I get test so we know that that that route is working we also want to send an a name and email so why don’t we do a little bit of validation here we’ll say if or first of all let’s get the name and email so that’ll be in the request body so let’s say cons and then we’ll destructure the name and email from requestbody and then we’ll say if not name or not email then we want to send back an error so so we’ll return res Dot and let’s do a status of 400 which is a user error and then we’ll attach the Json uh let’s just put an error here and we’ll say name name and email are required and then after the if I’ll just do a res. status say 200 and Json message success all right so let’s try that out so if I just send as is I get the error but if I send a name and email in the body I can either do Json data or I can use the form URL encoded so I’m just going to add name and email and then send and I get success all right so now we want to start to work with stream we installed that you should have the stream client so if you just look in your dependencies you should have stream chat that’s what we’re going to use now in order to create an instance we have to we we need our API key so we’re going to have to go to the getstream.io and just you can log in with with either GitHub or Google so I’m already logged in I’m going to go to my dashboard and you don’t have to pay anything or enter any credit card info for this we want to create a new app and this has to be unique so I’m just going to say chat Dash uh we’ll say chat Ai and I’ll just do das Brad you could put your own name whatever it just it has to be unique and then just choose the locations for the the chat and video storage and feed storage that’s closest to you and click create app and then you’ll have your keys right here we have a key and a secret we need both of these so I’m going to copy the key I’m going to go into myv and we’re going to add stream API key and set that to the key and then we also want the stream API secret which we can get from right here just going to copy that okay so I’ll paste that in now that we have that we should be able to create an instance so let’s bring in the library so import or the client and that’s called stream chat and then we’re going to create it or initialize it right above our row let’s say initialize initialize the stream stream chat or stream client so we’ll say const we’ll call this chat client and I’m going to set that to the the stream chat stream chat Dot and then it’s get instance and then that’s going to take in your API key and your API secret so let’s say process Dov Dot and then stream API key and I’m going to put a bang on the end of this which is a nonnull assertion so I’m basically telling typescript that this is it’s not going to be null or undefined it’s definitely there and then we’ll do the same with the secret that gets passed in as well okay so now we should have our chat client initialize now we can use it in our route and the first thing I want to do is is create an ID because when you 
create a user it needs to have a unique ID and it’s up to you on what you want that ID to be I mean you could install a package like uuid but what I want to do is take the email you don’t have to type this out but let’s say that the email is Brad gmail.com then I want the ID to be brador gmailcom so that way we have a unique ID but it’s also readable it’s understandable all right so let’s um let’s do that so I actually want to after we do the if here let’s wrap this in a TR catch and I’ll move this this success I’m going to move that into the try and then let’s copy it and then in the catch if something goes wrong then I’m going to send a 500 error so let’s change the status to 500 and for the message we’ll just say or it’s actually going to be error so error we’ll say internal server error all right now in the try let’s generate the ID so we want to say user ID and we have access to the email that they enter so we’re going to use replace and replace takes in it’ll take in what we want to replace and what we want to replace it with so and we pass in a regular expression I know a lot of you guys and including myself hate regular Expressions but this is pretty simple so in Brackets we’re going to use the uh the carrot so basically we’re saying if not if it doesn’t match whatever I type here which is going to be a lowercase A to Z so a low lowercase letter or an uppercase A to Z or 0 to9 or an underscore or a dash if it’s anything other than that I want to replace it with an underscore and I want this I want it to be Global so I’m going to put SLG so the second argument you pass in is what you want to replace it with which in our case is going to be an underscore okay so that’ll generate the user ID in fact we can go ahead and do a console log of user ID just to check and then if I make a request again with this Brad at Gmail and I send if we look down in the console brador gmailcom and again if you want to do something different for IDs you can so yeah let’s get rid of the console log and now I want to check to see if the user exists in stream chat so let’s say check if user exists and we can do that let’s put this in user response and this is a synchronous so we want to do await and then chat client and then there’s a method called query users so we want to use that and then what we can do is pass in an object where we want to match the ID and we set that to an object with this money sign EQ so we’re saying if it equals the user ID and we can do a console log of that as well and then just go ahead and send and if we look down here you see it gives us an object with a duration and the user’s array which is empty because that the user doesn’t exist that um in my case it would be the Brad gmailcom doesn’t exist so what we can do after that after we check if the user exists or after we set that variable we’ll say if not user response. users. length so we’re saying if that array is empty then we want to add new user to stream so we can do that with the uh there’s a few different methods we can use I’m going to use upsert user which will create or update a user so let’s say chat client. 
upsert we want to do it’s just a single user so upsert user and then we’re going to pass in an object I’m going to assign the ID to the user ID which will be that formatted email the name set that to name and the email and then I’m also going to add the role of user because there’s there’s different roles there’s an admin user as well in fact if we look at the the docs here if I search for roll permissions let’s see yeah so right here buil-in roles so user is a default user role you have guest um you have admin which is a role for users that can perform administrative tasks with elevated permissions so we just want a a regular user now I should also mention that this is where you you can also generate a token so let’s see um we could call right here create token however we’re we’re using this on the server side with an API key so we don’t need to do this but if you were using this from you know react or view or some kind of front end then you would want to create a token and you’d want to save that and then send that with your um you know the rest of your requests so but we don’t have to do that so yeah I just want to return now basically just return the user ID the name and the email so let’s go right under that if actually we already have this we might as well just use that um yeah we’ll get rid of the message and let’s send the user ID the name and email and I think that should do it so we can try it out now so I’m going to come back here I’m going to register the user Brad gmail.com let’s click Send and I get back a 200 I get back my that’s my user id formatted from my email my name and the actual email now you should be able to actually log into your dashboard so if I go to my chat AI app here and under chat messaging if I go to Explorer you can see I have my chat AI Brad app has an app ID app name if I click users I got Brad 90 not sure where that came from but right here is the the user that I just created and it has all these fields it has the ID the name language role created at updated that if the user banned if they’re online if they’re invisible so lots of information uh and you can delete users too in fact I’ll delete that Brad 90 user oh it’s an original dashboard user okay so I can’t delete that all right so we’re able to to register users now for the chat and later on we’re also going to implement our neon postgres database so that it saves users there as well and it also saves all the chats all the logs but before we do that let’s create our chat route chat route so that we can send a question or or whatever it is we want to send to the AI and use the open AI API to respond to that so let’s first of all bring in the open AI client so up at the top here we’ll say import and it’s going to be open AI from open Ai and then we need to initialize open AI here as well just like we did with stream so right here I’ll say initialize open AI now we are going to need an API key so why don’t we do that real quick I’m going to jump over to platform. 
openai and log in so from here if we click on settings go to API Keys you should see them here um I’ll just create a new one let’s say chat and for project I’ll just use um project one okay so I’m going to copy that and then I’m going to go ahead and add that to myv so here let’s say opencore AI uh actually no let’s do open AI underscore and then API uncore key and then set that okay so now what we can do is initialize here let’s say const open a excuse me open AI set that to new uppercase o Open Ai and then pass in our API key which is going to be process.env do open aore API key okay so that’ll initialize that now let’s create our our chat route so I’m going to go under the register user and let’s say um what should I say here let’s say send message to Ai and this is going to be a post request as well so app post and the route is going to be just slash chat and then let’s say async okay I’m going to return say promise and any and then we’re going to pass in the request and res response all right so the first thing I want to do is get from the body there should be a message and there should be the user ID because when you register a user you get the ID and then you’re going to send that along to the chat route so let’s say const and let’s get the message and the user ID from the request. body and then we want to just make sure that that exists so we’ll say if not message or not user ID then we’re going to want to return let’s say status 400 and Json we’ll say message and user are required all right then we’re going to go under that if statement and let’s open up a TR catch and in the catch we’re going to I’ll just copy this response here except we’re going to change this to a 500 and then for the error for the error we’ll just say internal server error and in the try first thing we’re going to do is let’s say verify user exists so we’ll say const user response we want to set that to a wait on the chat client and then we’re going to use Query query users pass in an object we want the ID to match the user ID okay after we do that let’s check that response we’ll say if not user response remember it has an array called users so we’re going to check that we’re going to check the length and basically if it’s an empty array then we know the user isn’t hasn’t been found it doesn’t exist so let’s return res. status and four we’ll do 404 because it’s a not the user’s not found and then we’ll we’ll do Json and let’s pass in an error and for the error we’ll say user not found and we’ll say please register first all right now before we do anything else let’s just let’s just check if that works so we’ll just do a simple res. send and just say success okay so when we make our request now to slash chat it should reach out to stream and uh in the body I don’t have uh I don’t have anything so I should get this message and user are required so let’s add in the message I’m going to say what is the capital of Massachusetts and then for the user ID for the user ID I’m going to put a user that doesn’t exist I’ll just do one two three and if I send that I get user not found please register first now we know that the user ID for me brador gmailcom we know that that exists so let’s try that out and we get success so so far so good now what we want to do is start to work with open Ai and we’re going to use the chat completions API which will work like chat GPT you send it a prompt and it sends you a response so let’s go right here where I have the res. 
send and delete that and let’s send the message to open Ai and we’re going to be using the gp4 model so we’ll say con response and set that to await open AI and it’s going to be chat dot completions Dot and then create and then we want to pass in an object that has the model that we want to use which in this case is going to be GPT you have all these different options we’re going to do gp-4 so that’s the model we want and then we send messages which is going to be an array and we’re going to pass in an object here with a rle of the user and then the content is going to be the message all right so whatever we add in the message which in my request is just what’s the capital of Massachusetts now I want to show you what that gives us so why don’t we just do a console log of the response and then as far as what we return I’ll just let’s just do uh yeah we’ll just say res. send success okay I just want to see what what this gives us so let’s come back over here and I’m going to send the same response with the message and the correct user ID we get success but let’s take a look in the in the console here and we get this object has an ID blah blah blah what we care about is this right here this a choices array and there’s an one object in there with a message and we can’t see it here we just see object um yeah I don’t think we can see that so why don’t we log that so we got console log response and then it’s going to be dot choices which is an array we want the first and only item in that array and we want the message okay let’s send it again and there we go so we get an object with the role is assistant okay so it’s the the AI That’s responding has a role of assistant and then content is what we’re looking for the capital of Massachusetts is Boston all right so it’s as easy as that to to create a prompt and get back a response so now obviously we want to return that response from the endpoint so let’s come back in here we know how to access it now right with this it’s actually message. content that will give us the exact you you know what we’re looking for so let’s get rid of the console log here around this and let’s put this into a variable we’ll say const AI message and um let’s type that it’s going to be a string and set it to that so response choices message content now I am going to use um optional chaining here for for uh message so just add a question mark there and then we also want to use a nullish coalescing operator because if that happens to be null or whatever then we’re just going to make it no response from AI that’ll get rid of any typescript errors now before we actually send this AI message back from this endpoint we need to create a channel which is used for managing conversations in fact if we go to the docs here and we search for Channel and then creating channels so it shows us how to do that we store a reference in a variable using our client and then. channel pass in the type which is going to be messaging okay there’s different types if we come down here and look at type you have um live stream messaging team gaming Commerce messaging is is for like you know one-on-one conversations or group chats um that’s uh typical type for things like that and we’re having a one-on-one chat with it’s just not with a user it’s with the AI so that’s what we’re going to use and then once we store the reference we can then call channel. create and then we can actually do channel. 
send message as well which will send the message through through the channel it’ll get stored and so on so let’s um let’s do that let’s go right below the AI message and let’s say actually I’m going to just put a comment here let’s say create channel or it’s actually create or get channel and we’re going to create the reference with our chat client. channel the type is going to be messaging and we can also pass in a unique ID which I’m going to use back ticks and then just say chat Dash and then the um the user ID so that’ll be a unique identifier and then we want to pass in an object with the name of the channel which I’m going to call we’ll call it AI chat and then we also need to add this created uncore byor ID which if you were chatting with another user then it would be that user but since we’re using an AI we’re going to call it AI bot all right so that will will create the reference now we need to call channel. create like I just showed you in the docs and then after that we can do uh sorry this needs a wait and then after that we can do await channel. send message and pass in an object with the text which will be the AI message so AI message and then the user ID and make sure you do user uncore ID that’s what the key is it’s not camel case it’s underscore and then that’s going to be the AI bot that sends this message okay now as far as what we want to respond with let’s do res. J actually we’ll do status 200. Json and then pass in um an object oops passing an object with a reply and that reply will be the AI AI message oops not Al message AI okay so yeah that should do it and then I just want to do a console log here as well if there is an error let’s put um error generating AI response and then also the error all right cool so let’s try that out I’m going to come over here and I have I have my message I have my user id let’s go ahead and send and we get an object with the reply the capital of Massachusetts is Boston and what’s cool is now if we go back to the stream dashboard and if we go to you know chat messaging Explorer we have the AI bot user here and you can see under channels we have messaging so it’s that’s the type and then we have the unique identifier which is chat D Brad Gmail com because I set that right here right that’s the unique identifier and then we should be able to see any messages that are in through that channel so we have one message here and it shows the text which is the capital of Massachusetts is Boston so whatever the AI sent us back so pretty cool now what I’d like to do is Implement our own database I mean we do have the the you know you can see the chats and using stuff through stream but a lot of times you want to do more with it so you’ll want to store the chat logs in your own database so I want to expand this to to do that and also store our users so you want to create uh a postgres database through neon so I’m going to go ahead and log in here all right and then we’re going to go to well yeah I guess we’ll create a new project so once you log in and this interface is is so easy to use and it’s so easy to set up a database it’s basically just a couple clicks so I do want to create a new project I’ll call this uh tutorial and you can choose your postr G version I’m going to stick with 17 your database name I’m going to call this chat chat Aid DB and I’ll just choose AWS and then create and you can do a lot from this interface I mean you can run straight SQL queries there’s branching so just like you have branching with GitHub with your code you 
You have branching with your databases too: if you want to, say, add a new feature and don't want to affect the main branch, you can create a new branch, work with that, and once it's set and it's what you want, merge the branches — really cool. What we want to do now is click Connect, which gives us our connection string. Copy it (click "show password" first so the full string is copied), and we want to store that database string in our .env, so let's go in there, call it DATABASE_URL, and paste it in.

Now that we have that, we need a way to interact with our database, and that's where Drizzle comes in. Drizzle is an ORM; it's TypeScript-based and really easy to use. One thing we do have to do, since we're using Neon, is use the Neon serverless database adapter, so we have to install that as well. Let's come down to the terminal and run npm install drizzle-orm and @neondatabase/serverless — that allows us to use Drizzle with Neon's infrastructure. We also want to install Drizzle Kit as a dev dependency, so npm install -D drizzle-kit; that's a CLI we can run migrations from, among other things.

Now that those are installed, there are a few files we need to create. One is our database config file, which I'm going to put in the source folder: create a new folder called config, and in it a database.ts file. This is where we configure that Neon database adapter. Let's import a couple of things: first neon, from @neondatabase/serverless; then drizzle, which we bring in from drizzle-orm/neon-http; and from the dotenv package we want the config function, because we're going to be using environment variables. We load them by calling config(), and since this file lives down in src/config, I'm going to specify the path to the .env file by passing an object with path: '.env' — the file is in the root, and that's where the process starts. Then we just check for the database URL: if not process.env.DATABASE_URL, throw a new Error saying "DATABASE_URL is undefined". Next we initialize the Neon client — I'll put a comment, "init the neon client" — in a variable called sql, set to neon(process.env.DATABASE_URL).
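Assembled, the config file looks roughly like this — a sketch that also includes the Drizzle initialization covered in the next paragraph:

```ts
// src/config/database.ts
import { neon } from '@neondatabase/serverless';
import { drizzle } from 'drizzle-orm/neon-http';
import { config } from 'dotenv';

// Load environment variables; the path is relative to the project root.
config({ path: '.env' });

if (!process.env.DATABASE_URL) {
  throw new Error('DATABASE_URL is undefined');
}

// Init the Neon client, then hand it to Drizzle and export the db instance.
const sql = neon(process.env.DATABASE_URL);
export const db = drizzle(sql);
```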
Then we need to initialize Drizzle, and this one is going to be exported because we'll be using it in other files: we'll call the variable db and set it to drizzle(), passing in the sql variable, which is the Neon client. We can close that file up.

Now we want to create our schema. If you've used Mongoose or Sequelize or Prisma, you've created a model or a schema of your data; here we define the schema, and then we can use Drizzle Kit to run a migration that looks at it and creates the tables for us. I'm going to put this in the source folder, in a folder called db, and in there a schema.ts file. Let's start by importing what we need from drizzle-orm/pg-core: pgTable, which creates Postgres tables for us, plus any field types we want to use — serial, which is what our primary-key IDs will be, then text and timestamp; I believe those are the only ones we need.

For any table we want to create, we export a pgTable() call that takes the name of the table and then all the fields. For the chats, let's say export const chats = pgTable('chats', ...) and pass in an object with the fields we want. The id is a serial field named 'id', and I want that to be the primary key, so we tack on .primaryKey(). Next is the user ID, which is text named 'user_id', and that's also going to be .notNull(). Then we have the message of the chat, text named 'message'; then the reply — the message sent back from the AI — which is text as well, 'reply', and .notNull(). The last thing I want is createdAt, a timestamp named 'created_at', and I'll add .defaultNow(), which uses the current timestamp — don't forget the parentheses — and that should also be .notNull(). So that's our chats table.

Now let's do the users, because I want to store users as well: export const users = pgTable('users', ...) with all our fields. First is the user ID, text with 'user_id' for the column name, and this one is the primary key, so we add that. Then the name: text, 'name' for the column name, .notNull(). Then email, the same shape with 'email', and a createdAt that's the same as before — I'll just copy it over. So those are the two schemas and the two tables we want to create.

Now, Drizzle is really great when it comes to TypeScript: we get type inference for Drizzle queries. When we insert a chat, for instance, it's structured a specific way with a specific type, and we can define that here — type inference for the chat insert, the chat select, the user insert, and the user select.
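Here's the schema file so far, as I understand it from the walkthrough — a sketch; I've assumed `message` is notNull, which the narration doesn't state explicitly:

```ts
// src/db/schema.ts
import { pgTable, serial, text, timestamp } from 'drizzle-orm/pg-core';

export const chats = pgTable('chats', {
  id: serial('id').primaryKey(),
  userId: text('user_id').notNull(),
  message: text('message').notNull(), // assumed notNull
  reply: text('reply').notNull(),
  createdAt: timestamp('created_at').defaultNow().notNull(),
});

export const users = pgTable('users', {
  userId: text('user_id').primaryKey(),
  name: text('name').notNull(),
  email: text('email').notNull(),
  createdAt: timestamp('created_at').defaultNow().notNull(),
});
```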
We want to export these so we can use them outside of this file. We'll say export type ChatInsert = typeof chats.$inferInsert — dollar sign, then inferInsert — and then the same for the select: export type ChatSelect = typeof chats.$inferSelect. Then we do the same thing for users: copy both of those down, change the first one to UserInsert, making sure it points at users, and the second one to UserSelect. That takes care of the type inference for inserts and selects.

Now we want a config file for Drizzle Kit, because it needs to know where the schema is, where the migrations will go, things like that. This goes in the root — not in src, because the root is where it looks — and we'll call it drizzle.config.ts. We import config from dotenv, since we'll be using environment variables to get the database URL, and defineConfig from drizzle-kit. Call config(), again specifying the path to the .env in the root. Then export default defineConfig() with a few things. First is schema, which points at what we just created — from the root, ./src/db/schema.ts. Then out, which is the migrations folder — ./migrations. Then the dialect: you can use Drizzle with different databases, MySQL and so on (you can see the options in the type hints), and we're using Postgres. And for dbCredentials, that's an object where we just provide the url, which is process.env.DATABASE_URL with a non-null assertion (a bang) on the end. That's our config, so we can close it up.

Now, since that Drizzle config is in the root directory, we'll probably have an issue with TypeScript, because it's set up to compile the source folder. What we can do is exclude that file: in tsconfig.json, after compilerOptions, we add an exclude array and just put that one file in it, drizzle.config.ts.
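For reference, the type exports and the Kit config look something like this — a sketch; note that in recent drizzle-kit versions the dialect value is written 'postgresql':

```ts
// Inferred types, exported from src/db/schema.ts:
export type ChatInsert = typeof chats.$inferInsert;
export type ChatSelect = typeof chats.$inferSelect;
export type UserInsert = typeof users.$inferInsert;
export type UserSelect = typeof users.$inferSelect;
```

```ts
// drizzle.config.ts, in the project root.
import { config } from 'dotenv';
import { defineConfig } from 'drizzle-kit';

config({ path: '.env' });

export default defineConfig({
  schema: './src/db/schema.ts', // where the tables are defined
  out: './migrations',          // where generated migrations land
  dialect: 'postgresql',
  dbCredentials: { url: process.env.DATABASE_URL! },
});
```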
And that just cleared up the error that was there. So now we have our schema created, which means we're ready to create our migration. It's two steps: we first generate it with drizzle-kit generate, and then run drizzle-kit migrate, which actually creates the tables for us based on the schema. Let's try it: come down to the terminal and run npx drizzle-kit generate — that creates your SQL migration file, and you'll see there's a migrations folder now. Then, to actually migrate, npx drizzle-kit migrate. Hmm — "you can only connect to remote Neon/Vercel Postgres instances through a WebSocket." I'm not exactly sure why it's saying that, but it's only a warning, so let's go to the Neon dashboard and check. Under Tables we have our chats table — you can see the fields: id, user_id, message and so on — and our users table. So it worked: it created our tables from our schema, which is exactly what should have happened, so I'm not going to worry about that warning right now.

Now let's integrate the database into our endpoints — into register-user and into chat — because again, we want to save the users and we want to save the chats. First let's bring in what we need up top. We import our db from our database config. One thing worth mentioning: when you're using TypeScript with Node through tsx, with the configuration we have, when you import a local file — and this is the first local file we've imported; the rest have just been package modules — the path says .js even though it's a .ts file. You can't write the import with a .ts extension; that gives you an error. Even though the path says .js, everything to do with TypeScript compilation still works fine; it's just the syntax we have to use. In addition to that, we bring in our schemas, chats and users, from ./db/schema.js — again .js even though it's a .ts file. Then there's a utility called eq, for comparing values — we'll use it to match users and so on — which comes from drizzle-orm. And we also want the ChatCompletionMessageParam type, which comes from openai/resources — we don't need the .mjs extension there; that should work.

Now that we have our imports, let's figure out where we actually want to use the database. We have register-user, which creates a user with Stream, but we also want to save the users in our own database. So I'll go down right above the response, after we deal with all the Stream stuff, and check for an existing user in the database — I know we checked Stream, but we want to check our own database too. So: const existingUser, set to await db — this is where we use the db from our config file — and then I'll chain a few methods on it, the first being .select().
We select from the users table, then .where() — this is just the syntax of Drizzle, which I like; pretty clean. The where is where we use that eq utility: we pass in eq(users.userId, userId), looking for rows where the user ID column equals the user ID — which, remember, is what we create from the email when a user registers. That result lands in our variable. Under it, let's say: if not existingUser.length, then first we'll just do a console.log — I'll use backticks — saying `User ${userId} does not exist in the database. Adding them.`, because that's what we're going to do. Under that line, await db.insert(users).values(...), passing an object with the userId, the name, and the email. That should be it, so save, and let's come back to Postman and send a request: a POST to /register-user, with a name and an email in the form-data body. I send that and get back what I'm supposed to. Now check the database: over in Neon, reload, and there's the user — user_id (which is also the primary key), name, email, and created_at. So now we don't just have users in Stream's chat explorer; we have them in our own database, which we can do whatever we want with. (I had deleted this user earlier, so there it is again.)

In addition to the users, we also want to store the chats. Back in server.ts, go down to the chat endpoint and figure out where to use the database there. We get the user response and then check it, so let's go right under that and add a comment: "check user in database". I can just copy the existingUser line from before and add it here, and take the block that follows it too — except we're not going to create the user this time, so get rid of the insert, and instead of a console.log we want to return an error if the user isn't there: return res.status(404).json({ error: 'User not found in database, please register first' }). That way, if the user isn't in our Neon Postgres database, the request doesn't go through — the user has to exist in Stream and in our database. Now let's go under where we get the AI message and save the chat to the database (I'll add a comment: "save chat to database").
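Here are both pieces just described, pulled together — a sketch of handler fragments, assuming userId, name, and email are in scope from earlier in each handler:

```ts
import { eq } from 'drizzle-orm'; // goes up top with the other imports

// /register-user — after the Stream work, check our own DB and insert if missing:
const existingUser = await db.select().from(users).where(eq(users.userId, userId));
if (!existingUser.length) {
  console.log(`User ${userId} does not exist in the database. Adding them.`);
  await db.insert(users).values({ userId, name, email });
}

// /chat — the same lookup, but return a 404 instead of inserting:
const dbUser = await db.select().from(users).where(eq(users.userId, userId));
if (!dbUser.length) {
  return res
    .status(404)
    .json({ error: 'User not found in database, please register first' });
}
```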
To save the chat, we can just say await db.insert(chats) — we want to insert into the chats table — and for values we add the userId, the message, and reply, which we set to aiMessage; we're taking whatever the AI gave us and storing it as the reply. That's it, so let's try it out. I'll come over here and make a request to /chat, changing things up: for the message, let's do something a little more difficult, like "create a simple REST API with Python", and for the user ID I'll put my user, brad_gmail_com. It might take a couple of seconds — obviously, the more involved the answer, the longer it takes. We get our reply, and it has a bunch of newline characters and such for formatting; when we create our UI with Vue, we'll have this displaying nicely. "Here's a simple example of how to create a REST API using Flask, a micro web framework in Python," and it goes through and gives us the steps. So we know that's working. Let's make sure it got saved to the database: over in Neon, under chats, there it is — the message, "create a simple REST API...", and the reply.

Now that we've implemented a database, I'd like to add one more route, to get the messages of a specific user. We're going to need it for our UI, because obviously, when you're sending messages as a user, you only want your own messages. So let's create a new route — we want to get the chat history for a user: app.post(), with /get-messages as the endpoint, and an async handler returning Promise<any>, taking req for the request and res for the response. First, it's going to take a user ID, so destructure userId from req.body, because it needs to be sent with the body. Then check: if there's no userId, return res.status(400) with an error saying "User ID is required". If the user ID is there, open a try/catch and create a variable, chatHistory, set to await db.select() — I'll go to the next line — .from(chats), since we want to select from chats, and .where(), using eq again because we're comparing: where chats.userId equals the userId. Then we can just return res.status(200).json() and pass { messages: chatHistory }.
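Here's this step assembled — a sketch; the catch block reflects the error handling that gets finished in the next paragraph, and the Request/Response types are the Express ones already imported in the file:

```ts
// Saving the exchange in /chat, right after aiMessage is built:
await db.insert(chats).values({ userId, message, reply: aiMessage });

// The chat-history route:
app.post('/get-messages', async (req: Request, res: Response): Promise<any> => {
  const { userId } = req.body;
  if (!userId) {
    return res.status(400).json({ error: 'User ID is required' });
  }
  try {
    const chatHistory = await db
      .select()
      .from(chats)
      .where(eq(chats.userId, userId));
    return res.status(200).json({ messages: chatHistory });
  } catch (error) {
    console.log('Error fetching chat history', error);
    return res.status(500).json({ error: 'Internal server error' });
  }
});
```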
If there's an error, we res.status(500) and pass an error saying "Internal server error", and I also want a console.log there: "Error fetching chat history", along with the error itself. So that's our get-messages; let's try it out. I'll make a POST request to /get-messages with just the userId in the body. Send, and it gives me back an array — you can see id, userId, message, and then the reply. I should have my other one here — actually no, I deleted it, so let's do another chat. For the message, let's say "what is the most popular JS framework?" — 2021 is the knowledge cutoff for this model — and it says the most popular JavaScript framework is React. Now if I go back to /get-messages with the user ID, I should have both: "create a simple REST API" and "what is the most popular JS framework". So we have three routes: we register a user, we use that user to chat — to ask the AI a question — and we have an endpoint to get all the messages of a specific user.

That's pretty much it as far as the back end goes. There are a couple of things I want to add to it later, while we're doing the front end, because I'll be able to explain them better once we have the UI. And I know that backend development like this can be a little tricky and a little weird, because you're not looking at a user interface for the results of what you're doing — you're just seeing a bunch of data — so it can be tough for some people. If you've been confused along the way, don't worry about it; as you do more of it, it'll click. But now what I'd like to do is jump into the front end: create a whole new folder for our chat AI UI and start to use Vue.js.

So now we get into the front end, and like I said, we'll be using Vue.js version 3, plus a few other dependencies, so let me show you those real quick. Vue; Vite for our dev server and environment; Pinia for our state-management library, so we can create stores for our users and our chats, with our actions and our state in those stores; and axios for my HTTP library — although if you want to use fetch, that's fine too; it really doesn't matter, it's just preference. Then Tailwind CSS for the styling — we'll be adding quite a few Tailwind classes to make it look nice. Vue Router is the official router for Vue, and we'll have just two routes, two pages: the homepage, with the form that has the email and the name so you can enter your name to start chatting, and then of course the chat page, where you interact with the AI. And TypeScript, which we don't have to set up at all — it just works out of the box with Vite. So those are our frontend dependencies.

Now, I have the back end running — you definitely want that — and mine is running on port 8000. I'm going to go into my chat-ai folder; you can see the back end in there, which is what's running over here. I want to create a folder alongside it, in the chat-ai folder — not inside the API folder; that's the back end, and this is completely separate.
alongside that let’s go ahead and run npx and then I’m going to use create V and let’s call this chat D ai- UI okay this is the user interface part of our application and then we’re going to choose view I’m going to choose typescript although if you you want to use JavaScript that’s fine too and I mean even if you don’t know typescript it’s we’re not doing that much so you should be fine and and we already used it in the back end um and it’s much easier to set up I mean there really is no setup in the front end it just works so now let’s go ahead and CD into chat ai- UI and then let’s run our npm npm install to install our initial dependencies that come with v and then we’ll install a couple other dependencies as well all right so let’s go npm install and view- router we want pinf for our state management and there’s also a plugin called uh persisted State and we want to use this because we want our state our user to persist across page loads so we’re going to install pinea Das plugin Das persisted State and then also axios so those are our front end those are our regular dependencies and then for Dev dependencies it’s just tailwind and the plug-in for V for Tailwind so let’s do Dash uppercase D and then Tailwind CSS and then also at uh what is it at Tailwind CSS SLV and Tailwind version 4 is super easy to get set up with v okay so now that we have our dependency set up I’m going to actually just open this folder up in vs code and I’m going to run the dev server from the integrated terminal so from here let’s say NP say npm run Dev and it’s going to be 5173 for the port by default and I’m just going to make this going to just bring this over here make this a little smaller okay so this is just a landing page we’re going to get rid of the the boiler plate um what I do want to do is set up Tailwind which is really easy we just need to go into our V config first and we’re going to import uh Tailwind CSS from and then it’s going to be this at tailwind cssv and then we just need to add the plugin to the array here so Tailwind CSS parentheses and then the only other thing we need to do is go into our main stylesheet so in the source folder I want to go into style CSS and we can get rid of all this other stuff and just simply do at import and then in quotes take Tailwind CSS and that’s it and you can see Tailwinds working because there’s it’s all the same font size there’s no padding or or margin on the body so Tailwind is working all right now let’s just clear this up a little bit I don’t want the hello world component so we can completely delete that okay and then in the app. 
view we don’t want that I’ll leave the script tag there and then in the template uh actually we can get rid of this the scope style and then in the template for now let’s just have oops let’s just have an H1 and we’ll just say my app okay so just kind of clear everything out and in the assets we don’t need the view SVG you can get rid of that now I do have a little robot icon and logo that I want to use so let me just find that real quick you guys can get this from the the GitHub repository that that’s in the description let me just find it real quick uh let’s see Dev I’m just trying to find it off screen where we chat AI UI all right so it’s going to be in you have the fabicon which is in the root so just bring that over to your root and then you have um in the source assets folder there’s a robot PNG you’re going to bring that into your assets folder so it’s just this little robot guy all right now to add the fabcon we can go into the index HTML in our um in the root and I’m just going to change the page title to let’s do chat Ai and then for the fabicon we’ll just add a link and let’s change the real to Icon We’ll add a type of image slash it’s an Ico and then slash favicon Ico and then you should see the little robot in the tab all right so right off the bat I just want to set up routing okay because we’re using view router we’re going to have the home route we’re going to have a chat route so let’s start by creating those pages or those views so in the source we’re going to create a folder called views and then in there we’re going to create a file file called home let’s call it home view. viw Vue and then let’s create another file called chat view. viw okay and then I’m just going to have a script tag and then we’re going to add our setup so as far as vuejs goes this is not an intro to view I’m not going to explain the the basics um I have a view course A View crash course on YouTube that I just did it like a couple months ago so it’s really up to date if you don’t know anything or you know very little about VJs I would suggest watching that I mean you can watch it after if you want so you can kind of understand what you did but I would suggest watching it before but basically with the composition API which is a kind of a more Modern Way of of building view components you basically have to have in your script you would export uh a setup function right and then you do all your JavaScript all your state stuff you would do in here but you can do a shortcut by just adding setup here that’s why I’m doing that and then I’m also just going to add Lang and since I’m using typescript I’m going to add Lang TS and then I’m not even going to put anything in there for now then we have our template so basically the HTML we want to show on this in this component or in this page which right now I don’t really care we’ll just do an H1 and we’ll just say chat or chat page all right and then for the home view I’m just going to grab that and we’ll just change this to homepage all right because I I just want to get these created so that we can create set up our router so we can close those up for now and then for the router we’re going to have a folder in the source folder called router and then in that will have a file called index.ts and this is where we set up our routes now to do that we need to import a couple things so first off we need create router from view router and then we also want a function called create web history which allows us to use the the HTML 5 history API to to do routing instead 
For the home view I'll just grab the same thing and change it to "Homepage" — I just want these created so that we can set up our router. We can close those up for now, and for the router, we'll have a folder in the source folder called router, and in that a file called index.ts. This is where we set up our routes. To do that we need to import a couple of things: first, createRouter from vue-router, and also a function called createWebHistory, which allows us to use the HTML5 History API to do routing instead of actual page loads. We also want to bring in any views or components we want to load, which would be the HomeView and the ChatView. Then we create an array for our routes, where each route is an object with a path and a component: in this case I want the path to be just '/' for the homepage, and we can load a component — our views are our components — so let's set that to HomeView. We do the same thing for the chat view: '/chat', with the component ChatView. Then we export const router — this is what we bring into other files — and set it to createRouter(), which takes an object with history, set to createWebHistory() (again, that uses the HTML5 History API, and it's a function, so you want your parentheses), and you just pass in your routes as well. That's it for that file.

A couple of other things before we can actually use our router: we need to initialize it in main.ts, which is basically our entry point. In this file we bring in router — not createRouter; we already used that in the router file we just created — from './router'. Then we have to use it. Where we have createApp, which bootstraps the entire application and mounts it to the element with the id app, I'm going to put the app in a variable and mount it down below with app.mount — that way we can take that app object and call app.use(router).
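Assembled, the two files look roughly like this — a sketch, following the default create-vite entry point:

```ts
// src/router/index.ts
import { createRouter, createWebHistory } from 'vue-router';
import HomeView from '../views/HomeView.vue';
import ChatView from '../views/ChatView.vue';

const routes = [
  { path: '/', component: HomeView },
  { path: '/chat', component: ChatView },
];

export const router = createRouter({
  history: createWebHistory(), // HTML5 History API — no full page loads
  routes,
});
```

```ts
// src/main.ts
import { createApp } from 'vue';
import './style.css';
import App from './App.vue';
import { router } from './router';

const app = createApp(App);
app.use(router);
app.mount('#app');
```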
In the main App.vue, where we have that h1, we want to replace it with RouterView. Save, and now we see "Homepage" because we're on the home route; if I go to /chat, we're on the chat page. If you want a quick navigation, we can add a nav and use RouterLink — RouterLink takes a to attribute, so one with to="/" and one with to="/chat" — and then clicking Chat takes us to the chat page, and Home takes us back to the homepage. We do need to set up our store, but before we do that let's just get the form displayed on the homepage — and we can get rid of this nav; we don't need it.

So let's go into the HomeView. I'm going to bring in that robot image, since it's going to be our logo: import robotImage from '../assets/robot.png'. Then let's start adding elements down in the template and use our Tailwind classes. We'll have a div that wraps everything — get rid of the old div — with h-screen; let's make it a flexbox and align everything to the center, so items-center and justify-center. (I'm not going to explain what every Tailwind class does — I have Tailwind crash courses, and a premium course if you want that — but it's pretty obvious what most of these classes do.) Background gray, shade 900, and then text-white, which gives us this dark background. Inside that, another div with a class of p-8, bg-gray-800 to get a little lighter, rounded-lg, shadow-lg, w-full, and a max width of max-w-md. There's our container for the form. In that div, let's have an h1 with text-2xl for the sizing, font-semibold, mb-4, and text-center, saying "Welcome to Chat AI". Oh, I forgot the robot — above the h1 I'm going to have my image. Since the source points to a variable — it's dynamic — we need to bind it: we could either do v-bind:src or just put a colon, :src, set to the robotImage variable. I'll add a couple of classes on this as well: mx-auto for margin on the x-axis, w-24, h-24, and mb-4 — and there we go, the little robot guy.

Under the h1 we want our inputs. For the classes here we'll do w-full, p-2, mb-2, bg-gray-700 to make it a little lighter, text-white, rounded-lg, and I'm also going to add a focus style — on focus I want outline-none. Then let's add a placeholder that says "Name", and we want to bind this input to a reactive variable — a piece of component state called name — which we do with v-model="name". We create that name variable up top in the script: const name, set to ref().
we’re creating a reactive value so ref and then whatever the default is will go in here which is going to be an empty string now that ref we do have to import that from view so let’s bring that in that should be lowercase R okay there we go so now that gets bound to that if I put in you know hello here it’s going to show here because that is bound to that input okay and then we’ll have some other values here as well like the the uh email and let’s see do I want to add the rest of the stuff now um yeah uh might as well just copy this down so we’re going to have a loading state so basically when we reach out to our API we’re going to set loading to true and then when we get the result it’ll be set to back to false so let’s set this to ref default value will be false um and then I also want for ER if we have an error I want to have that in our state as well that’ll just be an empty string by default as well and then let’s come back down let’s create our email input so I’m going to take this copy it down let’s change the type to email and let’s see we’ll change placeholder to email and the V model to email so now this input is going to pertain or is going to be bound to this variable if I save it we see the email and then let’s create the button okay so right here let’s say say button and as far as classes go let’s do with- full let’s do padding to we’ll do BG blue Das 500 and rounded large all right and inside that I want it to say start chat but if it’s loading then I want it to show logging in okay and it’s not actually a login but you know what I mean creating the user or just getting the user so if you want to show something Dynamic uh within your view template you use double curly braces so here I can put a JavaScript expression like if it’s loading then show logging in dot dot dot else then show start chat so now we have our start chat button and then I also want to make this button disabled if loading is true so it’s going to be um Dynamic so I’m going to do colon disabled because what I’m setting this to is a is a variable right it’s loading so if I were to set loading up here I know a lot of you guys know this stuff but for those of you that are kind of new to view if I set that now it’s disabled and it says logging in so I’ll set that back to false all right now let’s see um for the error I want to show that down here in a paragraph So in the paragraph we can use a v if directive which is just like an if statement whatever I put in here in here will only show if this is true so for V if we just want to set that to error so if error is true then I just want to show the actual error and we’ll just add a class uh let’s add a class of text Dash red we’ll do red 400 and text Center and let’s do margin top two okay so if I have an error which we can test out by just putting something in here then it will show like that all right cool now that button is going to call a function so let’s go to the button here we’ll say at click so when we click this we’re going to call call a function called create user which doesn’t exist yet so we’re going to go up here and we want to create we want to create the create user function so here let’s say const create user we’ll set that to async an async function and let’s let’s check for the the name and email so we’ll say if not name and we can access the value with DOT value or if not email. value okay so if either one of those are not you know not added then I want to set an error so we’ll say error. 
value and we’ll set that to let’s say name and email are required and then we’ll just return okay and then I think you know what what I think that’s as far as I want to go because we don’t have our store yet cuz what we’re going to do is send a request to well we could you know what we’ll send the request I’m trying to think of how I want how the order I want to do this in do we want to do the store first yeah you know what before we do the request let’s do the store I mean we can test out this little validation if I were to click Start chat without putting anything then it’s going to give us an error but yeah let’s create our store so so basically when the data comes back from SL register user from our back end it’s going to be stored in our user store which we have yet to create so we’re using pinea which means we have to initialize it so we’re going to go into our main.ts which is right here and couple things we need to do um yeah we’ll go right here and let’s import here create pinea from pinea and then we also need that plugin which is pinea plugin persisted State that’s going to come from oops it’s going to come from uh this right here okay so we want to bring those in and then let’s go above this app let’s say const pinea set that to the create pinea function and then we should be able to take that pinea object and say do use and we can use the plugin so pinea plugin persisted State we want to pass that in and then the only other thing we need to do is use it just like we did the router so copy that down and pass in pinea okay I don’t have the code right in front of me but I’m pretty sure that’s that’s right all right so we’ll close that up now to create our store let’s go into the source folder create a folder called stores and for each resource we’ll have a file so I want to create a file called user. TS so this is where our Global State goes as well as any actions which are going to be functions that mutate the state in some way um so what we’re going to do is import Define store and that’s going to be from pinea all right and then we’re going to export let’s export const and we’re going to call this use user store and we want to set that to that defined store and then that’s going to take in a name of our of our store so user and then we pass in an object and this is where we can Define our state which is going to be set to an arrow function and some and it’s going to be set to an OB it’s going to return an object in that object we’re going to have our user ID and we’re using typescript so I’m going to say null as string or null so what we’re doing is using a type assertion and we’re defining it as null to begin with but we’re saying it can be null or string and then we’re going to do the same thing with the name okay so if you remember when we hit that register user route it sends back the user ID the name and the email we don’t need the email um in the the the project so I’m not going to store it in the in the store if you want to you can but I’m just going to leave it out for now so in addition to our state we want to have our actions so let’s put a comma here and then actions and actions is an object with functions in it to manipulate the state in some way so let’s create a function called set user and set user is going to take in a data object so for the type it’s going to be an object that has a user ID which will be a string and also name which will be a string and we can a access this right these values in our state we can access with this keyword so we can say this. 
So this.userId is set to data.userId, and then we do the same thing with the name: this.name = data.name. Under setUser we also want a logout, because we need a way to clear this state: here we'll say this.userId = null, and then this.name = null. The last thing we're going to do is persist the state, meaning the user will stay even if the page refreshes or reloads: right after the actions, put a comma and add persist: true, with a little comment saying "keep user logged in across page reloads". That should do it — that's our user store, and we'll have a chat store later on.

Now let's go back to our HomeView. We want to import the useUserStore we just created, from '../stores/user', and since we'll be redirecting the user as well, also import useRouter from vue-router. We have to initialize both of these: const userStore = useUserStore() — ah, it was only complaining because I hadn't used it yet — and const router = useRouter(). Now we have access to the user store, so let's make our request down in createUser. After the validation if-statement, first set loading.value to true, because we're now loading — we're making a request — and make sure error.value is cleared. Then I'm going to use a try/catch for the request.

Now, as far as the URL we make the request to: for me it's http://localhost:8000, but when you go into production that will obviously change, so we should put it in an environment variable. Go into the root — not src, the root — and create a .env. Here the variables are prefixed with VITE_, so VITE_API_URL, set to http://localhost:8000 (if you ran the back end on a different port, make sure you put that). That way we have one central place for the URL. Now let's make the request — I'm using axios, by the way, so I have to import that as well. I'm going to destructure the data (if you're using fetch, you'd do the equivalent here): const { data } = await axios.post(), using backticks for the URL — we access the env variable with import.meta.env.VITE_API_URL, and after the closing curly brace, /register-user — so we're making a request to our back end. After the backtick, a comma, and then we pass in the data we're sending: the name, which we access with name.value, and the email, email.value. Once that comes back, we can set the user data in our store: userStore.setUser(), passing the userId set to data.userId and the name set to data.name. So we're saving it to our store, and the last thing we want to do in the try is redirect.
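Before finishing createUser, here's the user store and the Pinia wiring from the paragraphs above, pulled together — a sketch:

```ts
// src/stores/user.ts
import { defineStore } from 'pinia';

export const useUserStore = defineStore('user', {
  state: () => ({
    userId: null as string | null,
    name: null as string | null,
  }),
  actions: {
    setUser(data: { userId: string; name: string }) {
      this.userId = data.userId;
      this.name = data.name;
    },
    logout() {
      this.userId = null;
      this.name = null;
    },
  },
  // Keep user logged in across page reloads.
  persist: true,
});
```

```ts
// Added to src/main.ts:
import { createPinia } from 'pinia';
import piniaPluginPersistedstate from 'pinia-plugin-persistedstate';

const pinia = createPinia();
pinia.use(piniaPluginPersistedstate);
app.use(pinia);
```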
Back in createUser, the redirect is router.push('/chat'). In the catch, let's set error.value to "Something went wrong, please try again" — and let's make the catch parameter err, because we can't have it be error when the state is also called error. Then we want loading set back to false no matter what, error or not, so we can add a finally and set loading.value to false there. That should do it. Just to kind of reiterate: we click the button, it calls createUser, checks the name and email inputs, and sets loading to true; then we make our request to the back end we already created, sending the name and email; we get back the user ID, the name, and the email; we store the user ID and the name in our state — our frontend UI state; then we redirect to /chat and set loading back to false.

So let's try this out. I'll come over here, refresh the page, and enter "John" (or John Doe) and john@gmail.com. Start Chat — it says "Logging in…" and redirects me to the chat page. Now I can check this in a few places: Stream, or Neon — we'll do both. Back in the Stream Explorer, if we look at users, you'll see john_gmail_com right here; that's who I just registered as. And if we go to our Neon console, under users: john_gmail_com. So now our front end is connecting to our back end, sending a request, signing up — setting the user in Stream and sending it to our own database, so we have our own store of users to do whatever we want with — and setting up a channel.

Now what we need to do is create the chat page, so we can interact with the AI. Before that, though, I'd like to have a header at the top, because we do need a way to log out — I say "log out", but it's not actual authentication, you know what I mean; it just clears the user, the user ID and the name.
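For reference before we build the header, here's the finished createUser flow from above — a sketch:

```ts
// createUser in HomeView.vue.
const createUser = async () => {
  if (!name.value || !email.value) {
    error.value = 'Name and email are required';
    return;
  }
  loading.value = true;
  error.value = '';
  try {
    // Hit the backend route we built earlier.
    const { data } = await axios.post(
      `${import.meta.env.VITE_API_URL}/register-user`,
      { name: name.value, email: email.value }
    );
    // Keep the returned user in the Pinia store, then head to the chat page.
    userStore.setUser({ userId: data.userId, name: data.name });
    router.push('/chat');
  } catch (err) {
    error.value = 'Something went wrong, please try again';
  } finally {
    loading.value = false;
  }
};
```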
So let's create a component: in the components folder, create a Header.vue file. We'll add our script with setup and lang="ts", and then our template. In the template I'm going to have a div with py-4, px-6, bg-gray-800, and shadow-md, made a flexbox with justify-between and items-center. Within that div I want the robot image, which I need to bring in — up top, import robotImage from '../assets/robot.png'. For the source, add a colon since it's a dynamic variable, :src="robotImage", plus a couple of classes — w-8 and h-8 — and we'll just say "Chat AI" for the alt.

Now we want to bring this header in. It's not finished yet, but I just want to be able to see it, and it only needs to be on the chat page, not the homepage, so we're not going to put it in the main App file; we'll put it right in the chat view. In ChatView, import Header and get rid of the h1. I do want a wrapping div, though, with a few classes: flex, flex-col, h-screen, bg-gray-900, and text-white — and we put our Header inside. There we go; looks pretty good. We can close up the HomeView, and back in the header let's finish this up. Under the image I'll put an h1 with classes of text-lg and font-semibold that says "Chat AI". It sits way over on the right for now, because justify-between spreads out whatever's there — but once I put a logout button on the end, the title gets pushed over into the middle. So let's add the button: text-gray-400, and on hover, hover:text-white, saying "Logout". Now, when we click it, let's add an event handler: @click calls a function named logout. The logout behavior is the action in the store, right? If I go to my user store, we have this logout action, and we can bring it into our header and use it. Up at the top — whenever you want to use the user store, you need to bring in useUserStore — and I'm also going to bring in useRouter from vue-router. Then initialize both: const userStore = useUserStore(), and const router = useRouter(). For the logout, clicking the button first calls our local logout function, so we'll create that, and from there we take the user store and call the logout action — simply userStore.logout() — and then we just redirect with router.push('/'), back to the homepage.
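Here's the component assembled from that description — a sketch:

```vue
<!-- src/components/Header.vue -->
<script setup lang="ts">
import { useRouter } from 'vue-router';
import robotImage from '../assets/robot.png';
import { useUserStore } from '../stores/user';

const userStore = useUserStore();
const router = useRouter();

// Clear the stored user, then send them back to the homepage.
const logout = () => {
  userStore.logout();
  router.push('/');
};
</script>

<template>
  <div class="py-4 px-6 bg-gray-800 shadow-md flex justify-between items-center">
    <img :src="robotImage" class="w-8 h-8" alt="Chat AI" />
    <h1 class="text-lg font-semibold">Chat AI</h1>
    <button class="text-gray-400 hover:text-white" @click="logout">Logout</button>
  </div>
</template>
```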
When we call logout, it clears the state, because again, in our user store it sets the user ID and the name back to null. When we log in, we hit our back end, get the data back, and set it; when we log out, we clear it. So let's do it — Logout — and there we go.

Now we want to do the chat: we want to be able to actually interact with the AI. I'm going to log back in as John, so I should be able to just use john@gmail.com. We want to work on getting the messages, and John is a new account, so he doesn't have any yet — let's add some. I'll go to Postman (or open whatever HTTP client you want) and make a POST request to /chat. In the body you want to include your user ID, which for me is john_gmail_com — make sure the key is lowercase user_id — and then we want to send a message. For the message I'll say "What is the capital of Maine?", and send: the capital of Maine is Augusta. Let's send another message — "What is the most popular programming language?", curious to see what it says. The most popular language is JavaScript, though it may vary based on the specific industry or use case; Python and Java are also widely used. All right, so we have some chats now, and we want to get these and have them display on the chat page.

What I'm going to do is create a new store for chats: in stores, create a file called chat.ts. Now, you can set your store up in different ways — in the user store we used the object style, where we have an object with state, actions, and any other options — but we can also use the Composition API, with reactive variables and ref, like we did in our components. That's what I'm going to do with the chat store, because it's a little more complicated. Let's start off by importing defineStore — we want to bring that in from pinia — and also ref, from vue, since I'll be using reactive variables; then axios, because I'm going to do the HTTP requests from here as well; and the user store, because we're going to need the ID of the user that's logged in: useUserStore, from './user'.

Now, we're going to be formatting our chat messages a certain way — meaning the message we send as well as the message we get back — and I'm going to use interfaces for that. A TypeScript interface is basically a type where we define certain fields a value has to match. Let's call the first one ChatMessage: it has to have a message, which is going to be a string, and a reply, which will be a string. Then another interface, FormattedMessage: when we get our messages back we need to know which one is the user and which is the AI, so it has a role, a string that's either 'user' or 'AI', and then the content, which will be either the message or the reply — a string. Those are our interfaces; you could put them in separate files if you want, but there are only two fields each, so I'll just keep them here. Then export const useChatStore, just like we did with useUserStore.
We set it to defineStore(), which takes the name — we'll call it 'chat' — and then, actually, we don't want an object here; you can use an object like we did in the user store, but I'm going to pass a function, and inside it we can use the Composition API. Let's create a variable for messages and set it to ref(); I'll use TypeScript generics here to define what a message should be — an object with a role, which will be a string, and content, which will be a string — and it's going to be an array of those, so the brackets go after the curly brace. ref is a function, so we want our parentheses, and the default value goes in there: an empty array. The other thing I want is isLoading, set to ref(false) by default. Then we initialize our user store, because we need that user ID: const userStore = useUserStore().

Next we need a function that loads previous chat messages; let's call it loadChatHistory, and it's going to be asynchronous. The first thing I'll do is check for the user ID — or rather, if not userStore.userId, we just want to return. Then we open up a try/catch and make our request. The endpoint we're hitting is get-messages, because we want to get our messages: const { data }, destructured from axios — await axios.post(), using backticks. The API URL is in the .env file, and we access it with import.meta.env.VITE_API_URL; outside the curly brace, /get-messages. So we hit that endpoint, and then we pass in an object, because we need to send the user ID with it to get this user's messages: userId, set to userStore.userId.
Now, the way the messages come back — in fact, we can just check this out. If I make a new POST request to get-messages, with my user ID john_gmail_com in the body, I get back my messages, the two I just sent. They come back in an array called messages, each an object with id, userId, message, reply, and createdAt, where message is what I sent and reply is what the AI sent. I want to manipulate this data into an array that matches that FormattedMessage interface: a role, which will be either the AI or the user, and content, which will be either the message or the reply depending on which it is. So we have to map through and return an array with those two fields — and we also have to flatten it, because each row turns into two entries. With plain map, we'd get an array of little arrays — one with role 'user' and its content, another with role 'AI' and its content — nested inside the outer array, and we don't want that; we want one flat array with those objects in it. The method that does both is flatMap, which is just a standard JavaScript method.

So let's come down here and take messages, our reactive variable from above, and set its value: messages.value = data.messages — which initially looks like that raw array of rows — and on that we run .flatMap() (I'll go to the next line). flatMap takes a function; we'll call the parameter msg, typed as the ChatMessage interface, and we declare what it returns with a colon: a FormattedMessage array, matching the role-and-content shape. The function returns an array of two objects: one with role 'user' and content set to msg.message — remember, the message field is what the user said — and, copying that down, one with role 'AI' and content set to msg.reply, since the reply is what the AI said. Then I want to filter it, because all we want to show is entries that actually have content — the messages and the replies, which are now in content. So on the next line, .filter(), taking msg — which is now a FormattedMessage; no brackets, since we're going through them one at a time — and returning msg.content, so anything without content gets dropped. And for the catch, let's just do a console.error saying "Error loading chat history", along with the error. Okay, cool — we have our chat store and our loadChatHistory, so now we want to call it.
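Here's the whole store pulled together — a sketch, including the return statement that comes up in a moment:

```ts
// src/stores/chat.ts — composition-style Pinia store.
import { defineStore } from 'pinia';
import { ref } from 'vue';
import axios from 'axios';
import { useUserStore } from './user';

interface ChatMessage {
  message: string;
  reply: string;
}

interface FormattedMessage {
  role: string; // 'user' or 'AI'
  content: string;
}

export const useChatStore = defineStore('chat', () => {
  const messages = ref<{ role: string; content: string }[]>([]);
  const isLoading = ref(false);
  const userStore = useUserStore();

  // Load previous chat messages for the logged-in user.
  const loadChatHistory = async () => {
    if (!userStore.userId) return;
    try {
      const { data } = await axios.post(
        `${import.meta.env.VITE_API_URL}/get-messages`,
        { userId: userStore.userId }
      );
      // Each DB row becomes two entries (the user's message and the AI's
      // reply); flatMap flattens them into one array, and the filter drops
      // anything without content.
      messages.value = data.messages
        .flatMap((msg: ChatMessage): FormattedMessage[] => [
          { role: 'user', content: msg.message },
          { role: 'AI', content: msg.reply },
        ])
        .filter((msg: FormattedMessage) => msg.content);
    } catch (error) {
      console.error('Error loading chat history', error);
    }
  };

  return { messages, isLoading, loadChatHistory };
});
```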
Now we want to call this within our ChatView. Let's go to ChatView and bring in a couple of things at the top. We want this to happen when the component mounts, so I'm going to import onMounted, and I'm also going to bring in a function called nextTick, which lets us wait until the DOM has finished updating before doing something; both of those come from vue. Then we bring in both stores, useUserStore and useChatStore, and while we're at it, useRouter from vue-router. I think that's everything for now.

Next we initialize a few things: const userStore = useUserStore(), const chatStore = useChatStore(), and const router = useRouter(). Then I want to make sure the user is logged in; again, I know I've said this, but if you want to incorporate real authentication with a password you can. I just didn't want to focus on authentication, because that's such a huge topic on its own; I wanted to focus on the AI side of things. So we'll just check: if there's no userStore.userId, we take the router and push to '/'.

I also want the view to auto-scroll to the bottom, because we'll be able to scroll back through the chats and I want it to start at the latest message. So let's create a function called scrollToBottom. Inside it we use nextTick, which again waits until the DOM has been updated, and pass it a function. To scroll down we grab the chat container, which we haven't created in the template yet but will shortly: document.getElementById with an id of chat-container. Then we check that the chat container exists and set its scrollTop to its scrollHeight.

Now, like I said, when the component mounts, that is, when we land on the page, we want to call the loadChatHistory we just created in the store. So under scrollToBottom let's add onMounted; we pass it a function that calls chatStore.loadChatHistory(), which returns a promise, so I'll chain a .then and scroll to the bottom with scrollToBottom.
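A sketch of the ChatView script section as just described, assuming the stores live under @/stores:

```vue
<script setup lang="ts">
import { onMounted, nextTick } from 'vue'
import { useRouter } from 'vue-router'
import { useUserStore } from '@/stores/user' // paths are assumptions
import { useChatStore } from '@/stores/chat'

const userStore = useUserStore()
const chatStore = useChatStore()
const router = useRouter()

// Ensure the user is "logged in" (a simple presence check, not real auth)
if (!userStore.userId) {
  router.push('/')
}

// Scroll the chat container to the newest message once the DOM has updated
const scrollToBottom = () => {
  nextTick(() => {
    const chatContainer = document.getElementById('chat-container')
    if (chatContainer) {
      chatContainer.scrollTop = chatContainer.scrollHeight
    }
  })
}

// Load history when the component mounts, then jump to the bottom
onMounted(() => {
  chatStore.loadChatHistory().then(scrollToBottom)
})
</script>
```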
Let's see, why is it saying this does not exist on type Store? Did I not save it? Oh, I didn't return it. At the bottom of the store we need to return an object with the messages, the isLoading, and the loadChatHistory; that's it for now. Good, that error goes away.

Now we want to add the output, so let's go right under the header and add the chat messages container: a div with an id of chat-container and a few classes: flex-1, overflow-y-auto, p-4, and space-y-4. In here we'll have a div with a v-for; the v-for directive lets you loop over something, in our case the chat messages, and output elements based on them. I'll give this div a class of flex and items-start. On it we add the v-for: open some parentheses and say (msg, index), so we get the index too, and loop over chatStore.messages. We also need to bind a unique key, so :key set to the index.

Then we want a conditional class, because if it's the user I want the bubble aligned one way, and if it's the AI, the other way. In addition to class, we can add :class to make it dynamic and put a JavaScript expression in there: if msg.role equals 'user', add justify-end, else add justify-start. That handles the conditional alignment. Inside that div we'll have another div, the message bubble, with classes max-w-xs, px-4, py-2, rounded-lg, and on medium screens and up, md:max-w-md. This one gets a conditional class as well, because I want a different color depending on whether it's the user or the AI: if msg.role equals 'user', use bg-blue-600 with text-white, else bg-gray-700 and text-white. And inside we output the content, msg.content, which will be either the reply if it's the AI's message or the message if it's the user's.
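The corresponding template markup might look like this (a sketch using the classes just described):

```vue
<!-- Chat messages -->
<div id="chat-container" class="flex-1 overflow-y-auto p-4 space-y-4">
  <div
    v-for="(msg, index) in chatStore.messages"
    :key="index"
    class="flex items-start"
    :class="msg.role === 'user' ? 'justify-end' : 'justify-start'"
  >
    <div
      class="max-w-xs px-4 py-2 rounded-lg md:max-w-md"
      :class="msg.role === 'user' ? 'bg-blue-600 text-white' : 'bg-gray-700 text-white'"
    >
      {{ msg.content }}
    </div>
  </div>
</div>
```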
So hopefully that makes sense. Let's go ahead and try it. There we go, looks pretty good. "What's the capital of Maine?" The capital of Maine is Augusta. "What's the most popular programming language?" It gives us the answer. Cool.

The next step is being able to actually ask something, because right now we're only seeing the messages that are already there. So let's go back to our chat store and, under the last function, add a new one: send a new message to the AI. We'll create a function called sendMessage, set it to async, and it takes in a message, which is a string. In the function, first a guard: if there's no message (and I'll add .trim() onto that to strip the whitespace) or no userStore.userId, we just return. So if there's no message or no user, we bail out.

Then we take messages, our reactive value from above, and push onto messages.value an object (not an array) with a role of 'user', because it's coming from the user, and a content of message. And this is where we use isLoading: set isLoading.value to true, because we're about to make our request.

Now let's add a try/catch. In the try we make the request: destructure data from await axios.post, and the endpoint we're hitting is the chat endpoint, so in backticks, import.meta.env.VITE_API_URL followed by /chat. Then we pass in the data we want to send, which is the message and the user_id, set from userStore.userId. When that comes back, we push onto messages.value again, this time with a role of 'ai', because this is the response, and a content of data.reply; when we send a chat, the response is formatted basically the same way as get-messages.

In the catch, first let's console.error 'Error sending message' along with the error, and then, as far as the messages go, let's push a fallback so the user sees something: an object with a role of 'ai' and a content of 'Error: unable to process request'. We also want to set isLoading back to false no matter what, so we add a finally and set isLoading.value to false there. And make sure we return sendMessage from the store.
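Putting sendMessage together, still inside the same Pinia setup store sketched earlier (it reuses that store's messages, isLoading, userStore, and axios):

```ts
// sendMessage: optimistic push of the user's message, then the AI reply
const sendMessage = async (message: string) => {
  if (!message.trim() || !userStore.userId) return

  // Show the user's message immediately
  messages.value.push({ role: 'user', content: message })
  isLoading.value = true

  try {
    const { data } = await axios.post(
      `${import.meta.env.VITE_API_URL}/chat`,
      { message, user_id: userStore.userId }
    )
    messages.value.push({ role: 'ai', content: data.reply })
  } catch (error) {
    console.error('Error sending message:', error)
    messages.value.push({ role: 'ai', content: 'Error: unable to process request' })
  } finally {
    isLoading.value = false
  }
}
```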
Okay, now we're ready to use that. Back in ChatView, under the second closing div after the message content, we'll add a div with flex and justify-start, and this one gets a v-if bound to chatStore.isLoading, because I want to show a loading state. If it's loading, I just want it to say the AI is thinking. So inside, a div with a background of bg-gray-700, text-white, px-4, py-2, and rounded-lg, and within that a span using the animate-pulse class that says "AI is thinking...". That's the loading state.

Next we want the chat input. Let's see, we've got one, two, three closing divs; just above the last one is where I want it to go. I want this to be its own component, so let's create a new file in the components folder called ChatInput.vue, with a script tag, setup, and a lang of TS. We import ref from vue and set up a message variable: const message = ref(''), empty by default. Then we have to send this up a level, because we're embedding the ChatInput component inside the ChatView and the message needs to be sent up, so we need to emit it. We'll say const emit = defineEmits and pass in an array with one event name, 'send'. Then we declare our sendMessage, set to an arrow function; that's what we'll call when we submit the input.

Before filling that in, let's create the template: a wrapper div with p-4, bg-gray-800, and flex, and inside it our input with a few classes: flex-1, p-2, rounded-lg, bg-gray-700, text-white, plus a focus class at the end, focus:outline-none, because I don't want an outline. We bind the input to the message variable with v-model="message" and add a placeholder, lowercase, of "send a message". I also want to call sendMessage when we hit Enter (we'll have a button too, but Enter should work as well), so add @keyup.enter="sendMessage". Underneath the input (just put a slash there, it's self-closing), we'll have the button: classes of ml-2, px-4, py-2, a color of bg-blue-500, and rounded-lg, and the button just says Send. And of course, when we click it we want it to call sendMessage, so @click="sendMessage".

That should do it for the template; now we just have to add our sendMessage logic, and all it really needs to do is emit the message up to the ChatView. First a check: if there's no message.value (trimmed), we return. Then we use that emit: we emit 'send' and pass along message.value, and then we clear the input by setting message.value to an empty string.
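Assembled, a sketch of components/ChatInput.vue as described:

```vue
<script setup lang="ts">
import { ref } from 'vue'

const message = ref('')
// The custom 'send' event carries the message up to the parent ChatView
const emit = defineEmits(['send'])

const sendMessage = () => {
  if (!message.value.trim()) return
  emit('send', message.value)
  message.value = '' // clear the input after sending
}
</script>

<template>
  <div class="p-4 bg-gray-800 flex">
    <input
      v-model="message"
      placeholder="send a message"
      class="flex-1 p-2 rounded-lg bg-gray-700 text-white focus:outline-none"
      @keyup.enter="sendMessage"
    />
    <button class="ml-2 px-4 py-2 bg-blue-500 rounded-lg" @click="sendMessage">
      Send
    </button>
  </div>
</template>
```

In ChatView it then gets wired up as `<ChatInput @send="chatStore.sendMessage" />`, with the "AI is thinking..." indicator shown above it while chatStore.isLoading is true.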
Now we need to embed the ChatInput into the ChatView. Let's import it, then come down and go right above the last div and embed it: `<ChatInput @send="chatStore.sendMessage" />`. We use @send because, remember, from ChatInput we emitted a custom event called send; when that fires, basically whenever we hit the button or press Enter, we need a handler for it, and the handler is simply chatStore.sendMessage. That's it; now we have our input down at the bottom, so let's try it out. "Who was the 10th president of the US?" AI is thinking... and it comes back: the 10th president was John Tyler. So this is working.

Now, one thing I'd like to add, and we'll have to do this on the back end, is context. You'll see the problem if I follow up with "What years was he president?": the AI thinks and then responds that, to answer my query, it needs to know the specific person. It doesn't have context; it doesn't look back and see that I just asked about John Tyler. So we have to add that functionality on the back end.

This is the back end here; I'm going to open VS Code in the back-end project. So this is my Express API, and we'll go to server.ts; the place we want to work is the chat endpoint. If you want to break these up into separate files, you can; I know this is a fairly long file. Let me figure out where to put this. We check for the message and the user ID, query the users table to make sure the user exists, and then we set our response. So let's go after we check the user in the database and before we send the response.

What we want to do is fetch the user's past messages for context. Right now, when we hit /chat, we send it an object with the user role and whatever the current message is, and it sends back the reply. Instead of sending only the current message, we want to also feed OpenAI the last however many messages, 10, 20, whatever you want to set it to, along with the current one. That way it has context, and when we ask "What years did he serve?" it can look back at the last few messages and respond accordingly.
To fetch the user's messages, we can come down to get-messages, copy the chat-history block of code from there, and paste it in right before we send the request to OpenAI; it just selects from chats where the user matches the user ID. I'm going to get rid of the semicolon because I also want to add an orderBy, ordering by chats.createdAt, and I'm going to limit it to the last 10 messages. If you want more, that's fine, but remember: the more you include, the longer the request will take.

Now, like I said, we need to send not just the current message but the chat history too, so let's create a variable for it, with a comment first: format the chat history for OpenAI, because it has to be formatted in a certain way. I'm going to call the variable conversation, because that's what it is; a bunch of messages in context is a conversation. For the type we use ChatCompletionMessageParam, and it's an array, so brackets on the end. I believe we have to bring that in... oh, it was already imported at the top; it comes from OpenAI's resources.

So back to where we were: we set conversation to chatHistory.flatMap, again because it needs to be a flattened array. The callback takes each chat and returns an array of two entries: the user role with a content of chat.message, and the AI role with a content of chat.reply, because we need both our messages and the AI's messages in the conversation in order to have context. Then under that, another comment: add the latest user message to the conversation.
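Pulling the whole context block together, here is a sketch of how it might look inside server.ts, including the push and the model call described next. The query-builder syntax (db, chats, eq) and the surrounding variables (message, userId, openai) are assumptions based on what appears on screen, and the sketch already uses 'assistant' for the AI side, which is the type fix discussed just below:

```ts
// Imported at the top of server.ts (the video notes it was already there)
import type { ChatCompletionMessageParam } from 'openai/resources'

// Grab the user's recent history (ordered by createdAt, limited to 10)
const chatHistory = await db
  .select()
  .from(chats)
  .where(eq(chats.userId, userId))
  .orderBy(chats.createdAt)
  .limit(10)

// Format the chat history for OpenAI: each row becomes two conversation entries
const conversation: ChatCompletionMessageParam[] = chatHistory.flatMap((chat) => [
  { role: 'user', content: chat.message },
  { role: 'assistant', content: chat.reply }, // OpenAI expects 'assistant', not 'ai'
])

// Add the latest user message to the conversation
conversation.push({ role: 'user', content: message })

// Send the whole conversation so the model can resolve follow-up questions
const completion = await openai.chat.completions.create({
  model: 'gpt-4', // the video mentions GPT-4; use whichever model you configured
  messages: conversation,
})
const reply = completion.choices[0].message.content
```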
We push onto conversation the latest message, which is what we were passing in on its own before; we still want to pass that in, along with the history, so we just push that same object on. Now we can come down to the OpenAI call and, instead of passing only the single message in messages, pass conversation, cast as a ChatCompletionMessageParam array. Let's see... there's still an error on content. Oh, right here: this shouldn't be 'ai', it should be 'assistant', because when you're dealing with OpenAI there are only a couple of values you can use for role; there's 'user', and what we're dealing with here is 'assistant', since the AI is the assistant. That clears up the error.

Now that we've done that, it should have context, and we shouldn't have to change anything on the front end, because it's still returning the same thing, the AI response; it just has context now. Let's run the back end again, since I'd stopped it: npm run dev. Back in the app I'll ask, "Who was the 11th president of the US?" James K. Polk. Now, "What years did he serve?" There we go: James K. Polk served as president from March 4, 1845 to 1849. Cool, so now we have context, and I think that's really important; you don't want to just ask one question and be done, you want to be able to hold a conversation.

All right, we're just about done, but there's one other thing I want to do. Say I ask, "Give me a list of the top five cities in the US when it comes to crime rate." I guess it could interpret that as lowest crime or highest, but it answers; the problem is it's not formatted very nicely. We have "number one, Albuquerque, number two, Baton Rouge" all run together, and I want it to show as a proper list. What we can do is add a simple formatMessage function to our ChatView, using regular expressions to replace certain patterns with tags and so on. Let's put it right above scrollToBottom, with a comment: format AI messages for better display. We'll call it formatMessage, and it takes in text, which is a string. First we check: if there's no text, just return an empty string. Then we return the text run through a chain of replace methods with some regular expressions. I'm not going to type these out; I'll paste them in, and you can get them from the repo if you don't want to type them. Basically, we preserve line breaks (replacing newlines with an HTML line-break tag), bold text, inline code, bullet points, et cetera. To use it, down where we output the content, instead of the curly-brace interpolation we put the v-html directive on that bubble div and set it to formatMessage(msg.content). And now, as you can see, the list is nice and clear.
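The exact regexes are pasted in from the video's repo and aren't shown, but a minimal version covering the cases mentioned might look like this. Note that v-html renders raw HTML, so this approach assumes the AI output is trusted or sanitized:

```ts
// Format AI messages for better display: a minimal sketch of the pasted-in helper
const formatMessage = (text: string): string => {
  if (!text) return ''
  return text
    .replace(/\*\*(.*?)\*\*/g, '<strong>$1</strong>') // bold text
    .replace(/`([^`]+)`/g, '<code>$1</code>')         // inline code
    .replace(/^\s*[-*]\s+(.*)$/gm, '&bull; $1')       // bullet points
    .replace(/\n/g, '<br>')                           // preserve line breaks
}
```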
That's pretty much it, guys. If I log out and log in as, what was it, brad@gmail.com, I have some messages there, and you can see the code from when I asked for a REST API; it's not formatted perfectly, even if I make it a little bigger so you can see it better, but it's readable. So that's it: we now have an AI chatbot that works pretty well and gives you GPT-4-powered answers. The last thing I'd like to do is actually deploy this, and since our back end and front end are completely separate, we're going to host the back end with Render and the front end with Vercel. All right, let's get into that.

All right guys, so we're going to go ahead and deploy this project, and we have the back end, which we're deploying to Render. I'm going to go to render.com, create a new web service, connect my repos, and choose the chat AI API repo. Then there are just some options we need to set. For one, the build command: if we look at our package.json, we're using TypeScript, and that TypeScript needs to be compiled on the server, so in addition to npm install we need it to run npm run build; right here we'll append "&& npm run build". The other thing is the start command: the compiled server.js is going to be in the dist folder, so instead of node server.js we say node dist/server.js. Then we just have the environment variables: in our .env, copy everything except the port, click "Add from .env", paste them in, and add those variables. That's it; click Deploy Web Service, and this can take a minute or two. All right, it says the build was successful and your server is live; this is the domain here, and we can test it real quick. I'll go into Postman and make a new POST request to that domain (that first paste didn't copy; there we go) with /get-messages on the end, add a user_id of john@gmail.com in the body, hit Send, and there we go, we're getting the messages back, so we know the API is live.

Now the front end. I'm going to go to Vercel and log in with GitHub, click Add New Project, and this time select the chat AI UI repo. For the build and output settings I believe we can leave everything as is, but for the environment variables we want to add VITE_API_URL (not key, what am I doing), because this is going to be our endpoint, and we don't want localhost:8000 or whatever; we paste in the Render URL instead. I think that's it, so let's click Deploy; again, this can take a minute or two. All right, that took about ten seconds. We'll continue and check it out: click through, and we're at the live site. I'll log in as, let's say, John with john@gmail.com, start a chat, and I should see my chats, because remember, we're using the same database. Let's try it out: "What is the capital of Texas?" AI is thinking... the capital of Texas is Austin. So we have this fully deployed, and you can see just how easy that was.

But that's it; hopefully you guys enjoyed this. I may do another video where we create a React front end, or I might save that for my React course, I'm really not sure yet. And I may add on to this: actual authentication, protecting the messages themselves, password auth and so on. But I think this is a really good start,
and hopefully you learned quite a bit from it. Thanks for watching; if you liked it, leave a like, and I'll see you next time.
The provided text details the development of an AI voice assistant for educational content using various free tools and services. The process begins with setting up a Next.js application and integrating UI components using Shadcn. Authentication is implemented with Stytch, followed by the creation of a dashboard featuring interactive coaching options. The text then explains how to establish real-time speech-to-text functionality with AssemblyAI, connect to AI models via OpenRouter, and utilize Amazon Polly for text-to-speech conversion. Finally, it covers user profile management, token usage tracking, deployment on Vercel, and putting Stytch into production.
Study Guide: AI Voice Assistant for Educational Content
Quiz
What is the primary purpose of AssemblyAI in this project?
Describe the function of OpenRouter in the context of the AI voice assistant.
What are the two main ways to host Convex, as mentioned in the source? Briefly explain each.
Explain the role of Shadcn UI in the development of the user interface.
What is StackOS, and what problem does it solve in this project?
Describe the folder structure created when a new Next.js application is initiated using the command npx create-next-app@latest.
What is the significance of the middleware.jsx file in this application?
Explain the purpose of React Context in managing user data within the application.
How is user authentication handled before a user can access the dashboard?
What is the role of Amazon Polly in the final stage of the AI voice assistant’s functionality?
Quiz Answer Key
AssemblyAI is used to convert live audio input from the user into text with high accuracy and low latency, enabling real-time transcription of speech.
OpenRouter serves as a platform to explore and utilize multiple Large Language Models (LLMs) from various AI providers, allowing the application to leverage different AI models for generating responses.
The two main ways to host Convex are Convex Cloud, where the database is set up and managed on Convex’s own platform, and self-hosting, where Convex can be hosted on the developer’s own infrastructure using Docker.
Shadcn UI is a Tailwind CSS-based UI component library that provides pre-built, customizable components to rapidly build the user interface of the AI voice assistant.
StackOS is an authentication service provider used to add user sign-up and sign-in functionality to the application, ensuring that only authenticated users can access protected parts like the dashboard.
The initial Next.js application structure includes an app folder for routes and layouts, public for static assets, next.config.mjs for Next.js configuration, and package.json for project dependencies and scripts. The global.css handles global styles, and with Tailwind CSS v4, it also includes color variables.
The middleware.jsx file is used to protect specific routes of the application. It checks if a user is authenticated before allowing access to those routes, redirecting unauthenticated users to the sign-in page.
React Context provides a way to share data (like user information and authentication status) across different components in the application without the need for prop drilling, making the data accessible in various parts of the UI.
User authentication is handled using StackOS. When a user attempts to access a protected route, the application checks their authentication status. If authenticated, they are granted access; otherwise, they are redirected to the sign-in/sign-up page.
Amazon Polly is used to convert the text generated by the AI model (as a response to the user’s speech) back into spoken audio, allowing the AI voice assistant to provide verbal feedback to the user.
Essay Format Questions
Discuss the key technologies and services (AssemblyAI, OpenRouter, Convex, Shadcn UI, StackOS, Amazon Polly) chosen for this AI voice assistant project. Explain the specific role and benefits of each in the overall functionality of the application.
Outline the process of setting up and implementing user authentication and authorization in this Next.js application using StackOS. Detail the steps involved in protecting routes and managing user sessions.
Compare and contrast the two methods of hosting Convex databases (Convex Cloud and self-hosting with Docker). Discuss the advantages and disadvantages of each approach in the context of developing and deploying this AI voice assistant.
Describe the workflow of a user interacting with the AI voice assistant, from speaking into the microphone to receiving a spoken response. Detail the steps involved in speech-to-text conversion, AI model processing, and text-to-speech synthesis, highlighting the technologies used at each stage.
Evaluate the importance of state management in this AI voice assistant application, focusing on the use of React Context for managing user data and conversation history. Discuss how effective these strategies are for maintaining application state and facilitating component communication.
Glossary of Key Terms
AssemblyAI: A platform that provides APIs for converting speech to text and offers features like live audio transcription with high accuracy and low latency.
Convex: An open-source, real-time database service that offers features like type safety, serverless functions, and self-hosting capabilities.
Docker: A platform for building, sharing, and running applications in isolated environments called containers, used here for self-hosting Convex.
Large Language Model (LLM): An AI model with a vast number of parameters, trained on a massive corpus of text, capable of understanding and generating human-like language (e.g., Gemini, GPT).
Next.js: A React framework for building server-rendered and statically generated web applications, offering features like file-system routing and API routes.
OpenRouter: A platform that aggregates multiple AI LLMs, allowing developers to access and utilize various models through a single API.
React Context: A feature in React that allows sharing state between components without explicitly passing props through every level of the component tree.
Shadcn UI: A collection of reusable UI components built using Tailwind CSS, designed for easy customization and integration into React applications.
StackOS (Stackup): An authentication service provider that simplifies the process of adding user sign-up, sign-in, and user management to web applications.
Tailwind CSS: A utility-first CSS framework that provides low-level utility classes to style HTML directly in the markup.
Amazon Polly: A cloud-based service that converts text into lifelike speech, supporting various voices and languages.
Self-hosting: The practice of hosting an application or service on one’s own infrastructure rather than using a third-party platform’s managed service.
API Key: A unique identifier used to authenticate requests to an API, ensuring that only authorized users or applications can access the service.
Middleware: A layer of software that acts as a bridge between an operating system or database and applications, often used for tasks like authentication and request handling.
Mutation (Convex): A type of function in Convex that modifies data in the database.
Query (Convex): A type of function in Convex that reads data from the database without making any changes.
React Hook: A feature in React that lets you use state and other React features without writing classes (e.g., useState, useEffect, useContext).
Speech to Text (STT): The process of converting spoken audio into written text.
Text to Speech (TTS): The process of converting written text into spoken audio.
Building an AI Voice Assistant for Education
## Briefing Document: AI Voice Assistant for Educational Content Development
**Date:** October 26, 2023 (based on content references)
**Subject:** Review of Source Material for Building an AI Voice Assistant
**Introduction:**
This briefing document summarizes the main themes, important ideas, and key facts presented in the provided source (“01.pdf”). The source outlines the initial steps and technologies involved in developing an AI-powered voice assistant specifically tailored for educational content creation. The document highlights the planned features, the technology stack, and the setup procedures for the frontend, authentication, backend database, speech-to-text, text-to-speech, and basic UI components.
**Main Themes and Important Ideas:**
1. **Goal:** To develop an AI voice assistant that can be used for creating educational content. This involves converting speech to text, leveraging AI models for content generation or assistance, and converting text back to speech.
2. **Technology Stack:** The project intends to utilize a free-to-use technology stack, including:
* **Frontend Framework:** Next.js (version 14 or 15 based on “at latest” mention, specifically using the app router).
* **Backend Database:** Convex (with options for both cloud and self-hosting using Docker).
* **Streaming Speech-to-Text:** AssemblyAI (aiming for 90% accuracy and <600ms latency, with a $50 credit offer).
* **AI Model Exploration:** OpenRouter (to access multiple LLM models).
* **Text-to-Speech:** Amazon Polly (free to use).
3. **Project Setup (Next.js):** The source details the initial steps for creating a Next.js application using `npx create-next-app@latest`. It covers:
* Project naming (“AI coaching voice agent”).
* Choosing not to use TypeScript.
* Using ESLint.
* Selecting Tailwind CSS (mentioning the shift in version 4 where `tailwind.config.js` is integrated into `global.css`).
* Opting for the app router.
* Describing the basic folder structure (`app`, `public`, `next.config.js`, `package.json`).
* Running the development server using `npm run dev`.
* Basic page creation (`page.js`) and folder-based routing.
> “Now simply go to the terminal, click New Terminal (or open the terminal panel from the bottom), type the command npm run dev, and hit Enter; you will see your application running on this URL with the 3000 port number.”
4. **UI Component Library (Shadcn UI):** The process of integrating Shadcn UI is explained:
* Visiting `ui.shadcn.com` and following the Next.js installation guide (specifically for Tailwind CSS version 4).
* Initializing Shadcn UI using `npx shadcn-ui@latest init`.
* Selecting base color themes (e.g., neutral).
* Forcefully adding the library with React 19 (choosing the `--force` option in the prompt).
* Customizing component styles through `global.css` by modifying color variables.
> “Simply copy this command in order to initialize shadcn; I will open another terminal and we’ll execute this shadcn command.”
> “Whenever you want to use any shadcn UI component, you have to install that component first. In this case, let’s say you want to use the Button component, so I will copy this into the terminal and paste it.”
5. **Authentication (Stytch):** The source outlines the integration of Stytch for user authentication:
* Creating a free account on `stytch.com`.
* Creating a new project and selecting sign-in options (Google, GitHub, email/password, etc.).
* Copying the `.env` variables provided by Stytch and creating a `.env.local` file in the project root.
* Installing the Stytch Next.js SDK using `npm install @stytch/nextjs`.
* Initializing Stytch using the provided command (which might require `sudo` on macOS due to permission issues).
* Stytch automatically creates a `handler/stytch/page.js` route for signup and sign-in.
* The `<StytchProvider>` is added to `layout.js`.
* Using the `<UserButton>` component from `@stytch/nextjs` to display user profile options.
* Implementing route protection using middleware (`middleware.jsx`) and the `@stytch/nextjs/server` package.
> “In order to add authentication we are going to use Stack. When I searched Google for the best authentication service providers, there are a lot of providers like Clerk, and Stack is an alternative to Clerk.”
> “Simply go to the Stack site and create a new account; and again, one more important thing: it is free to use.”
> “I will create this file called middleware, and it needs to be a .jsx file, and we’ll simply copy this line of code.”
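Since the source names the auth SDK inconsistently (Stytch in the summary, “Stack” in the quotes), here is a service-agnostic sketch of the route-protection pattern being described. The cookie check is a placeholder standing in for the real SDK's session verification, and the redirect target follows the handler route mentioned elsewhere in the source:

```js
// middleware.jsx - redirect unauthenticated users away from protected routes
import { NextResponse } from "next/server";

export function middleware(request) {
  // Placeholder check: the real auth SDK would verify the session here
  const loggedIn = Boolean(request.cookies.get("session"));
  if (!loggedIn) {
    // Send unauthenticated users to the sign-up/sign-in handler route
    return NextResponse.redirect(new URL("/handler/signup", request.url));
  }
  return NextResponse.next();
}

// Only guard the dashboard routes
export const config = { matcher: ["/dashboard/:path*"] };
```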
6. **Backend Database (Convex):** The source details setting up Convex as the backend database:
* Creating a free account on `convex.dev`.
* Creating a new project.
* Installing the Convex client library using `npm install convex`.
* Initializing Convex in the project using `npx convex dev` and linking it to the created project.
* A `convex` folder is created in the project.
* Wrapping the application with the `<ConvexProvider>` in a custom `provider.js` (marked as `use client`).
* Creating a schema (`schema.js`) to define database tables and their columns using Convex’s type system (`V.string`, `V.number`, `V.optional`).
* Convex automatically generates unique IDs and creation timestamps.
* **Self-Hosting Convex:** The document briefly mentions the option for self-hosting using Docker, referring to a separate guide for detailed instructions. This involves downloading a `docker-compose.yml` file, running `docker compose up`, generating an admin key, and configuring environment variables (`.env.local`) with the self-hosted Convex URL and admin key.
> “In order to store application data we need a database, and for that we are going to use Convex. Convex is an open-source database for your application.”
> “Simply click Create Project and give the project a name; we’ll say ‘AI coaching voice agent’ and click Create. Once it is created, your database is ready.”
> “Simply copy this npm install convex command, go to the terminal, and inside a new terminal execute the command.”
> “Inside the Explorer you will see a new folder created called convex, and inside that you will have some files; you don’t need to worry about any of these files right now.”
> “Inside the schema we’ll simply say export default defineSchema, and inside defineSchema you need to write a table name, let’s say users; here we’ll say defineTable.”
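Based on the fields mentioned in this section (name, email, credits, and an optional subscriptionId), the schema described might look roughly like this:

```js
// convex/schema.js - a sketch of the users table described above
import { defineSchema, defineTable } from "convex/server";
import { v } from "convex/values";

export default defineSchema({
  users: defineTable({
    name: v.string(),
    email: v.string(),
    credits: v.number(), // set to 50,000 at insert time per the source
    subscriptionId: v.optional(v.string()),
  }),
  // Convex adds _id and _creationTime to every document automatically
});
```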
7. **Saving User Information:** The process of saving user information to the Convex database upon successful authentication is described:
* Creating a Convex mutation function (`users.js`) called `createUser` to insert user data into the `users` table.
* The function checks if a user with the given email already exists before inserting a new record.
* It takes `name` and `email` as arguments and defaults `credits` to 50,000.
* The `subscriptionId` field in the schema is marked as optional.
* A custom `AuthProvider` component (marked as `use client`) is created to fetch user information using Stytch’s `useUser` hook.
* A React state (`user`) from `useUser` provides user details (display name, primary email, etc.).
* A Convex mutation hook (`useMutation`) is used to call the `createUser` function.
* The `createUser` mutation is called within the `AuthProvider` when user information is available, passing the user’s display name and email.
> “Next, once the user is authenticated for the very first time, we are going to save the user’s information to our database so that we can keep track of all the user activity and everything; that’s what we are going to see next.”
> “Inside convex we need to write a function, so I will create a new file called users.js, and inside that we’ll create export const createUser. Inside this createUser we are going to insert the user into our database, and for that it’s obviously a mutation.”
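A sketch of the createUser mutation as described: check for an existing user by email, otherwise insert with the default 50,000 credits (the duplicate check via filter/collect is my reading of the steps):

```js
// convex/users.js - a sketch of the createUser mutation described above
import { mutation } from "./_generated/server";
import { v } from "convex/values";

export const createUser = mutation({
  args: { name: v.string(), email: v.string() },
  handler: async (ctx, args) => {
    // Check whether a user with this email already exists
    const existing = await ctx.db
      .query("users")
      .filter((q) => q.eq(q.field("email"), args.email))
      .collect();
    if (existing.length > 0) return existing[0];

    // Insert a new user with the default 50,000 credits
    return await ctx.db.insert("users", { ...args, credits: 50000 });
  },
});
```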
8. **State Management (React Context):** The source introduces the use of React Context for sharing user data across the application:
* Creating a `_context` folder to store context-related files.
* Creating a `userContext.jsx` file using `createContext`.
* Wrapping the application’s children within the `AuthProvider` with the `UserContext.Provider`.
* Storing the fetched user data in a React state (`userData`, `setUserData`).
* Passing the `userData` and `setUserData` to the `UserContext.Provider`’s `value`.
* Components can then use `useContext(UserContext)` to access the shared user data.
> “Once we get the user information from the database, we need to save it in a state so that we can share it across the application in different components, and for that we are going to use React Context.”
> “Simply go to the app folder and inside it create a folder called _context. We start the name with an underscore because Next.js will then not treat it as a route.”
> “Now inside that I will give the context a name; I will keep the context name the same as the file name, and we’ll say createContext. That’s all; that’s how easily you can create the context.”
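The context itself is tiny; a sketch of the file and the provider/consumer pattern described above (state names follow the source):

```jsx
// app/_context/UserContext.jsx - shared user state for the whole app
import { createContext } from "react";

export const UserContext = createContext(null);

// In the AuthProvider, wrap children so any component can read the user:
//   <UserContext.Provider value={{ userData, setUserData }}>
//     {children}
//   </UserContext.Provider>
// and consume it elsewhere with:
//   const { userData } = useContext(UserContext);
```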
9. **Dashboard UI Structure:** The source begins to outline the basic structure of the dashboard screen:
* Creating a `main` folder to house authenticated application routes.
* A `layout.jsx` within `main` serves as the layout for the dashboard.
* A `dashboard` folder with a `page.jsx` represents the main dashboard page.
* Route protection using `middleware.jsx` to redirect unauthenticated users away from `/dashboard` routes.
* An `_components` folder within `main` to store shared UI components (e.g., `app header`).
* The `AppHeader` component includes a logo (loaded from `public/logo.svg`) and the Stytch `<UserButton>`.
* A `FeatureAssistant` component is introduced for displaying core functionalities as a grid of options with icons and names.
* A list of coaching options with names and image icons is defined in `services/options.jsx`.
* Basic styling using Tailwind CSS classes is applied for layout (flex, grid, padding, margin, etc.) and visual appearance.
* Sections for “Your previous lectures” and “Interview feedback” are planned.
> “It’s time to design the dashboard screen, where we are going to add the header at the top, and we will make sure this header stays constant on all the pages the user is authorized to see.”
> “Inside this main folder I’m going to create another folder called _components, and inside it we’ll create AppHeader.jsx.”
> “Inside main, let’s create a new folder called dashboard, and inside this dashboard we can create a page.jsx file.”
10. **Enabling Microphone Access:** The initial steps for enabling microphone access using `record-rtc` are outlined:
* Installing the `record-rtc` library.
* Creating a `connect to server` function to initiate recording using `navigator.mediaDevices.getUserMedia`.
* Using React state (`enableMic`, `setEnableMic`) to toggle between “Connect” and “Disconnect” buttons.
* Using `useRef` to manage the `recorder` object.
* A `disconnect` function is created to stop recording and reset the recorder state.
* Addressing potential server-side rendering issues with `record-rtc` by dynamically importing it.
> “Now it’s time to enable the microphone, and this is the first thing we want to implement. I’m going to make your life easier by providing the source code so you can enable the microphone; it’s quite straightforward.”
> “We need to call this connectToServer method when the user clicks the Connect button, so here we say onClick and then call connectToServer.”
> “const recorder = useRef(null); initially it will be null. Make sure to import useRef as well.”
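Putting those pieces together, a sketch of the connect/disconnect logic as a small hook. The record-rtc options (time slice, sample rate, mime type) are typical values rather than confirmed ones, and the dynamic import is the SSR workaround the source mentions:

```jsx
// A sketch of the microphone connect/disconnect flow described above
import { useRef, useState } from "react";

export function useMicrophone(onAudioChunk) {
  const recorder = useRef(null);
  const [enableMic, setEnableMic] = useState(false);

  const connectToServer = async () => {
    setEnableMic(true);
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
    // Dynamic import avoids record-rtc breaking during server-side rendering
    const RecordRTC = (await import("recordrtc")).default;
    recorder.current = new RecordRTC(stream, {
      type: "audio",
      mimeType: "audio/webm;codecs=pcm",
      recorderType: RecordRTC.StereoAudioRecorder,
      timeSlice: 250,          // emit a Blob roughly every 250ms
      desiredSampRate: 16000,  // matches the transcriber's expected sample rate
      numberOfAudioChannels: 1,
      ondataavailable: (blob) => onAudioChunk?.(blob),
    });
    recorder.current.startRecording();
  };

  const disconnect = () => {
    recorder.current?.stopRecording(() => {
      recorder.current = null;
    });
    setEnableMic(false);
  };

  return { enableMic, connectToServer, disconnect };
}
```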
11. **Speech-to-Text with AssemblyAI:** The source introduces AssemblyAI for real-time speech-to-text conversion:
* Creating an account on `assemblyai.com`.
* Installing the AssemblyAI JavaScript SDK using `npm install assemblyai`.
* Explaining the concept of streaming speech-to-text for live audio transcription with low latency.
* Setting up a WebSocket connection to AssemblyAI’s streaming endpoint.
* Sending audio data (Blobs) to the WebSocket.
* Handling incoming messages, including `partial_transcript` and `final_transcript` types.
* Updating a React state (`transcribe`, `setTranscribe`) to display the transcribed text in real time.
* Storing final transcripts in a `conversation` state.
> “The next step is to convert our speech to text, and for that we are going to use AssemblyAI, where we are going to stream the speech to text in real time.”
> “Simply go to assemblyai.com, or click the link in the description and you’ll jump to the specific page.”
> “The very first thing we need to do is install AssemblyAI, so I will copy this assemblyai command and, inside our terminal, make sure to install it.”
> “We’ll set up a WebSocket connection to this URL, and then we’ll pass the key and the format.”
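A rough sketch of that WebSocket flow. The temporary token (minted server-side from the API key), the state setters, and the base64 payload encoding are all assumptions; the message types follow AssemblyAI's v2 real-time API as described in the source:

```js
// Real-time transcription sketch: stream mic audio in, receive transcripts out
export function startTranscription({ token, setTranscribe, setConversation }) {
  const socket = new WebSocket(
    `wss://api.assemblyai.com/v2/realtime/ws?sample_rate=16000&token=${token}`
  );

  socket.onmessage = (event) => {
    const data = JSON.parse(event.data);
    if (data.message_type === "PartialTranscript") {
      setTranscribe(data.text); // live feedback while the user is still speaking
    } else if (data.message_type === "FinalTranscript") {
      setTranscribe("");
      setConversation((prev) => [...prev, { role: "user", content: data.text }]);
    }
  };

  // Forward each audio Blob from the recorder as base64 JSON (encoding assumed)
  const sendAudioChunk = async (blob) => {
    if (socket.readyState !== WebSocket.OPEN) return;
    const buffer = await blob.arrayBuffer();
    const base64 = btoa(String.fromCharCode(...new Uint8Array(buffer)));
    socket.send(JSON.stringify({ audio_data: base64 }));
  };

  return { socket, sendAudioChunk };
}
```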
12. **AI Model Integration with OpenRouter:** The source details using OpenRouter to access various LLM models:
* Creating an account on `openrouter.ai`.
* Generating an API key.
* Installing the Open AI SDK (`npm install openai`).
* Initializing the Open AI client with the OpenRouter API key.
* Using the `chat.completions.create` method to interact with an AI model (e.g., Gemini Pro 2.0 experimental).
* Passing user input (topic and message) and a prompt (defined in `services/options.jsx`) to the AI model.
* Replacing a placeholder (`user input`) in the prompt with the actual user topic.
* Sending the conversation history (last two messages) to the AI model for context.
* Handling potential browser security errors (`Dangerous allow browser`) by moving the Open AI API call to a server-side API route (implicitly suggested by the later fix).
* Updating the `conversation` state with the AI model’s response.
> “We are going to use openrouter.ai. OpenRouter contains a lot of different AI models which you can use for free.”
> “Simply make sure to sign up with your account and then select the model you want to use.”
> “Install this openai package, so let’s copy this into the terminal and make sure to install it, and once it installs, make sure to import OpenAI.”
> “using the chat.completions.create method to interact with an AI model”
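A sketch of that call using the openai SDK pointed at OpenRouter. The env var name, the model id, and the prompt placeholder token are assumptions; the dangerouslyAllowBrowser flag corresponds to the browser-security error the source mentions hitting:

```js
// Calling an LLM through OpenRouter with the openai SDK
import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.NEXT_PUBLIC_OPENROUTER_API_KEY, // env var name assumed
  baseURL: "https://openrouter.ai/api/v1",
  // Needed when the call runs in the browser; moving the call into a
  // server-side API route (as the source later does) is the safer fix
  dangerouslyAllowBrowser: true,
});

export async function askModel(prompt, topic, userMessage) {
  const completion = await openai.chat.completions.create({
    model: "google/gemini-2.0-pro-exp:free", // model id is an assumption
    messages: [
      // The coaching prompt with the user's topic substituted for its placeholder
      { role: "assistant", content: prompt.replace("{user_input}", topic) },
      { role: "user", content: userMessage },
    ],
  });
  return completion.choices[0].message.content;
}
```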
13. **Text-to-Speech with Amazon Polly:** The source explains integrating Amazon Polly for converting text to speech:
* Creating an AWS account and accessing the Amazon Polly service.
* Generating AWS access keys (Access Key ID and Secret Access Key) with the necessary permissions for Polly.
* Installing the AWS SDK client for Polly (`@aws-sdk/client-polly`).
* Initializing a Polly client with the AWS region and credentials.
* Using the `SynthesizeSpeechCommand` to convert text to an audio stream in MP3 format.
* Selecting a voice ID based on the chosen expert.
* Converting the audio stream to a buffer.
* Creating an audio element in the browser and setting its source to a data URL representing the MP3 audio.
* Playing the generated audio.
> “It’s time to convert our text to speech, and for that we are going to use AWS Amazon Polly.”
> “Simply search on Google for Amazon Polly; it’s free to use.”
> “install the AWS SDK client for Polly (`@aws-sdk/client-polly`)”
> “using the `SynthesizeSpeechCommand` to convert text to an audio stream in MP3 format”
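Putting the Polly steps together, a sketch using AWS SDK v3. The region, default voice, and browser playback via a base64 data URL are assumptions based on the flow described:

```js
// Text-to-speech sketch with Amazon Polly (AWS SDK v3)
import { PollyClient, SynthesizeSpeechCommand } from "@aws-sdk/client-polly";

const polly = new PollyClient({
  region: "us-east-1", // region is an assumption
  credentials: {
    accessKeyId: process.env.AWS_ACCESS_KEY_ID,
    secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY,
  },
});

export async function speak(text, voiceId = "Joanna") {
  const { AudioStream } = await polly.send(
    new SynthesizeSpeechCommand({
      Text: text,
      OutputFormat: "mp3",
      VoiceId: voiceId, // the source picks the voice based on the selected expert
    })
  );

  // Convert the stream to bytes, then play it as a base64 data URL
  const bytes = await AudioStream.transformToByteArray();
  let binary = "";
  bytes.forEach((b) => (binary += String.fromCharCode(b)));
  const audio = new Audio(`data:audio/mp3;base64,${btoa(binary)}`);
  audio.play();
}
```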
14. **User Interface Enhancements:** The source covers various UI improvements:
* Implementing a dialog for user input upon selecting a coaching option.
* Displaying the selected coaching option name as the dialog title.
* Adding a text area for users to enter their topic.
* Showing a list of coaching experts with their names and avatars (loaded from `public`).
* Allowing users to select an expert, visually indicating the selection with a border.
* Adding “Cancel” and “Next” buttons in the input dialog.
* Creating a chat box to display the conversation history, differentiating user and assistant messages with styling.
* Implementing scrolling for long conversations within the chat box.
* Adding a “View Summary” section with basic formatting.
* Creating a profile dialog accessible via the user button in the header, displaying user information (profile picture, name, email), token usage with a progress bar, current plan, and an upgrade option.
15. **Token Management:** The source details the initial implementation of token usage tracking:
* Adding a `credits` field to the user schema in Convex.
* Creating a Convex mutation function (`updateUserToken`) to update a user’s credits.
* Using a React mutation hook (`useMutation`) to call the `updateUserToken` function.
* Calculating token usage based on the length of user and AI-generated messages.
* Updating the user’s credits in Convex and the local `userData` state after each message exchange.
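A sketch of what that credit-update mutation might look like, extending the users.js file from earlier (argument names assumed from the schema):

```js
// convex/users.js - a sketch of the token/credit update described above
import { mutation } from "./_generated/server";
import { v } from "convex/values";

export const updateUserToken = mutation({
  args: { id: v.id("users"), credits: v.number() },
  handler: async (ctx, args) => {
    // Patch only the credits field on the given user document
    await ctx.db.patch(args.id, { credits: args.credits });
  },
});
```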
16. **Deployment (Vercel):** The source briefly explains deploying the Next.js application to Vercel:
* Creating a Vercel account and linking the project repository (GitHub, GitLab, Bitbucket).
* Vercel automatically detects the Next.js project and handles the build and deployment process.
* Environment variables (API keys, database URLs) need to be configured in the Vercel project settings.
* Stytch and Convex production settings need to be enabled.
**Key Facts:**
* The project aims for a free-to-use technology stack.
* AssemblyAI offers a $50 credit for new users.
* Stytch provides a free authentication service.
* Convex offers both cloud and self-hosting options.
* OpenRouter allows access to various free AI models.
* Amazon Polly is free to use for text-to-speech.
* Next.js version 15 was the latest at the time of recording.
* Tailwind CSS version 4 integrates its configuration into `global.css`.
* Convex automatically generates IDs and timestamps.
**Quotes:**
* *“With the help of AssemblyAI’s streaming speech-to-text, you can convert live audio into text with up to 90% accuracy and less than 600 milliseconds of latency. How cool.”*
* *“All the services that we are going to use to build this application are free to use.”*
* *“AssemblyAI will give you $50 of credit when you join via the link in the description.”*
* *“Convex is an open-source database for your application. It also provides a lot of different features, like real-time updates; type safety is included; if you want to run any specific task you have cron jobs; to store files they also provide file-storage functionality; and there is much more.”*
* *“Stack is an alternative to Clerk, so I thought to give Stack a try and see how it works. Simply go to the Stack site and create a new account, and again, one more important thing: it is free to use.”*
* *“We use Amazon Polly to convert our text to speech.”*
**Conclusion:**
The source provides a comprehensive overview of the initial stages of developing an AI voice assistant for educational content. It lays out a clear technology roadmap, details the setup procedures for key components, and begins to implement core functionalities such as frontend structure, authentication, database integration, and basic speech-to-text capabilities. The emphasis on free-to-use services and the step-by-step instructions make this a practical guide for developers looking to build similar applications. The subsequent parts of the source likely delve deeper into AI model integration, text-to-speech implementation, UI enhancements, and deployment.
AI Educational Voice Assistant: Technology and Implementation
Q1: What technologies and services are used to build the AI voice assistant for educational content?
The AI voice assistant will utilize several key technologies and services:
Next.js: A React framework used to build the web application, taking advantage of its features like folder-based routing and component structure.
Shadcn UI: A Tailwind CSS-based UI component library for creating a consistent and customizable user interface.
Stack o: An authentication service provider offering features like user sign-up, sign-in (including social logins), and user management, which is free to use.
Convex: An open-source, real-time database for storing application data, with options for both cloud-hosted and self-hosted deployments using Docker.
AssemblyAI: A speech-to-text service with a streaming API capable of converting live audio into text with high accuracy and low latency.
OpenRouter: A platform that provides access to multiple large language models (LLMs), allowing the application to explore and use various AI models.
Amazon Polly: A text-to-speech service used to convert the AI model’s responses back into spoken audio.
All the core services and technologies mentioned are free to use, making it accessible for development.
Q2: How is the Next.js application set up, and what is the purpose of the key files and folders?
The Next.js application is set up using the npx create-next-app@latest command, which installs the latest version of Next.js (version 15 in the source). During setup, the user is prompted to configure options like using TypeScript (no), ESLint (no), Tailwind CSS (yes – installing version 4), and the app router (yes).
Key files and folders include:
app folder: Contains all the application’s pages, routes, and layouts. page.js is the default home page, and layout.js is the root layout defining the structure for all pages. global.css contains global styles, including Tailwind CSS variables.
public folder: Stores static assets like images and fonts, which can be directly referenced in the application without specifying a path.
next.config.mjs: Contains configuration settings for the Next.js application.
package.json: Holds metadata about the application, including its name, version, scripts for running and building the application, and dependencies with their versions.
middleware.jsx: Used for protecting routes by checking user authentication status before allowing access.
convex folder: Created after setting up Convex, containing files related to the database schema and functions.
Q3: How is authentication implemented in the application?
Authentication is implemented using Stack o, a free authentication service provider. The process involves:
Creating a new project on the Stack o platform and configuring sign-in options (e.g., Google, GitHub, email/password).
Copying the provided environment variables (Stack o project ID, public API key, and secret API key) into the application’s .env.local file.
Installing the Stack o Next.js SDK using npm install @stack-o/react.
Initializing Stack o in the application, which automatically adds necessary files and components like handlers/stack/page.js, loading.js, layout.js (with a <StackOProvider>), and stack.js.
Utilizing the /handler/signup route provided by Stack o for user registration and login.
Using the <UserButton> component from @stack-o/react/stack to display user profile information and options like account settings and sign out.
Protecting routes using a middleware.jsx file, which intercepts requests and redirects unauthenticated users to the sign-in page using Stack o’s functionalities.
Q4: How is the Convex database set up and used for storing application data?
Convex is set up as the application’s database. The process involves:
Creating a new project on the Convex platform (or opting for self-hosting via Docker).
Installing the Convex client SDK in the Next.js application using npm install convex.
Initializing Convex in the project by running npx convex dev and linking it to the created Convex project. This generates a convex folder in the application.
Wrapping the application within a <ConvexProvider> in a provider.js file (marked as a client component) to provide Convex context to the application. This provider is then used in the root layout.
Defining the database schema in a schema.js file within the convex folder. This involves exporting a default DefineSchema and specifying tables with their columns and data types using Convex’s v (validator) object. Convex automatically handles unique IDs and creation timestamps.
Creating Convex functions (queries for fetching data and mutations for modifying data) in files within the convex folder to interact with the database.
The application uses Convex functions to check if a user exists upon login and to save new user information, including name, email, and initial credits, into the “users” table.
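A sketch of the schema.js described above, using Convex’s defineSchema/defineTable API (field names follow the source; subscriptionId is made optional, as described later in the transcript):

// convex/schema.js: sketch of the users table described above
import { defineSchema, defineTable } from "convex/server";
import { v } from "convex/values";

export default defineSchema({
  users: defineTable({
    name: v.string(),
    email: v.string(),
    credits: v.number(),
    // Only set once the user subscribes, so it is optional
    subscriptionId: v.optional(v.string()),
  }),
});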
Q5: How is self-hosting of the Convex database achieved?
Self-hosting of the Convex database can be achieved using Docker. The steps include:
Installing Docker Desktop on the local machine.
Downloading the docker-compose.yml file provided in the Convex self-hosting documentation and placing it in the root directory of the project (renaming it back to docker-compose.yml if the browser saved it under a different name).
Running the command docker-compose up in the terminal, which sets up the Convex backend and dashboard using Docker.
Generating an admin key by running a specific Docker command provided in the documentation.
Accessing the Convex dashboard locally (usually at localhost:6791) and logging in using the generated admin key.
Updating the environment variables in the .env.local file to point to the self-hosted Convex instance (the self-hosted URL and admin-key variables, e.g., CONVEX_SELF_HOSTED_URL and CONVEX_SELF_HOSTED_ADMIN_KEY) and commenting out the cloud-based Convex URLs.
Running npx convex dev again to connect the application to the local Convex instance.
This setup allows developers to run and manage their Convex database on their own infrastructure.
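A rough sketch of the resulting .env.local (the variable names follow the Convex self-hosting docs; the backend port shown is the documented default and may differ in your setup):

# .env.local: cloud deployment commented out, self-hosted values added
# CONVEX_DEPLOYMENT=dev:your-deployment-name
# NEXT_PUBLIC_CONVEX_URL=https://your-deployment.convex.cloud
CONVEX_SELF_HOSTED_URL=http://127.0.0.1:3210
CONVEX_SELF_HOSTED_ADMIN_KEY=<admin key generated by the Docker command>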
Q6: How is speech-to-text functionality implemented?
Speech-to-text functionality is implemented using AssemblyAI’s streaming speech-to-text service. The process involves:
Installing the AssemblyAI client library using npm install assemblyai.
Creating an AssemblyAI account and obtaining an API key.
Implementing a function to connect to the microphone using the RecordRTC library. This function initializes the microphone and starts recording audio.
Establishing a WebSocket connection to AssemblyAI’s streaming endpoint (wss://api.assemblyai.com/v2/realtime/ws?sample_rate=16000) when the user clicks a “connect” button. The connection is authorized with the API key or, preferably, a short-lived temporary token fetched from a server-side route.
Sending the audio data in chunks (Blobs) over the WebSocket connection as the user speaks.
Receiving real-time transcription results from AssemblyAI over the WebSocket. These results include “partial transcripts” for immediate feedback and “final transcripts” when AssemblyAI detects a pause in speech.
Updating the application’s state with the received transcripts to display the spoken text on the screen in real time.
Handling the “disconnect” event to close the WebSocket connection and stop the microphone recording.
The final transcript is used to pass the user’s spoken input to the AI model for processing.
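A browser-side sketch of this flow (the state callbacks and the token route are illustrative names, not from the source; the message_type values follow AssemblyAI’s v2 realtime protocol):

// Sketch: connect to AssemblyAI realtime and handle transcripts
async function connectToTranscriber(onPartial, onFinal) {
  // Fetch a short-lived token from our own server route (illustrative path)
  const { token } = await (await fetch("/api/get-token")).json();

  const socket = new WebSocket(
    `wss://api.assemblyai.com/v2/realtime/ws?sample_rate=16000&token=${token}`
  );

  socket.onmessage = (event) => {
    const msg = JSON.parse(event.data);
    if (msg.message_type === "PartialTranscript") {
      onPartial(msg.text); // immediate feedback while the user speaks
    } else if (msg.message_type === "FinalTranscript") {
      onFinal(msg.text);   // emitted when AssemblyAI detects a pause
    }
  };

  return socket; // the caller closes it on "disconnect"
}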
Q7: How is text sent to the AI model and the response handled?
Text is sent to the AI model using OpenRouter, a platform that provides access to various LLMs. The process involves:
Signing up for an OpenRouter account and obtaining an API key.
Installing the openai npm package (npm install openai) and importing the OpenAI class; OpenRouter exposes an OpenAI-compatible API, so no separate SDK is required.
Initializing the OpenAI client with the OpenRouter API key and a custom base URL (https://openrouter.ai/api/v1).
Creating a function that takes the user’s input text (obtained from AssemblyAI), the selected coaching option, and potentially conversation history as arguments.
Within this function, constructing a message payload with roles (“user” plus “assistant” or “system”) and content. The “assistant” message carries a prompt tailored to the selected coaching option, whose topic placeholder is replaced with the user’s actual input.
Using the OpenAI client’s chat.completions.create method to send the message payload to a selected AI model (e.g., Gemini 2.0 Pro Experimental).
Receiving the AI model’s response, which typically contains generated text.
Parsing the response and updating the application’s state to display the AI-generated text in the chat interface.
The application can be configured to send only the last user message or a history of the conversation to the AI model to provide more context.
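A sketch of the call, assuming a client initialized as described above (the environment-variable name and model id are examples, not from the source; check OpenRouter’s catalog for current ids):

// Sketch: send the transcribed text to an LLM via OpenRouter
import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.NEXT_PUBLIC_OPENROUTER_API_KEY, // illustrative variable name
  baseURL: "https://openrouter.ai/api/v1",
  dangerouslyAllowBrowser: true, // dev-only shortcut discussed later in this document
});

async function askModel(coachingPrompt, userInput) {
  const completion = await openai.chat.completions.create({
    model: "google/gemini-2.0-pro-exp-02-05:free", // example free model id
    messages: [
      { role: "assistant", content: coachingPrompt }, // prompt for the selected coaching option
      { role: "user", content: userInput },           // final transcript from AssemblyAI
    ],
  });
  return completion.choices[0].message.content;
}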
Q8: How is text-to-speech functionality implemented to vocalize the AI’s responses?
Text-to-speech functionality is implemented using Amazon Polly. The process involves:
Installing the AWS SDK client for Polly using npm install @aws-sdk/client-polly.
Setting up an AWS account and creating an IAM user with programmatic access and the AmazonPollyFullAccess policy.
Obtaining the AWS access key ID and secret access key for the created IAM user and storing them as environment variables in .env.local.
Creating a function that takes the text to be synthesized (the AI model’s response) and the desired voice ID (based on the selected expert) as arguments.
Within this function, initializing a Polly client with the AWS region and credentials.
Creating a SynthesizeSpeechCommand with parameters including the text, output format (MP3), and voice ID.
Sending the command to the Polly client to synthesize the speech.
Receiving an audio stream in the response.
Converting the audio stream into an audio buffer.
Creating an audio element in the browser and setting its source to a data URL representing the audio buffer (e.g., data:audio/mpeg;base64,…).
Playing the audio, allowing the user to hear the AI’s response.
The application uses different voice IDs based on the “expert” selected by the user, providing a personalized voice for the AI assistant.
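A sketch of this flow with the AWS SDK v3 (the region and environment-variable names are assumptions; “Joanna” is one of Polly’s built-in voices):

// Sketch: synthesize the AI's reply with Amazon Polly and get a playable URL
import { PollyClient, SynthesizeSpeechCommand } from "@aws-sdk/client-polly";

const pollyClient = new PollyClient({
  region: "us-east-1", // assumed region
  credentials: {
    accessKeyId: process.env.NEXT_PUBLIC_AWS_ACCESS_KEY_ID,         // illustrative names
    secretAccessKey: process.env.NEXT_PUBLIC_AWS_SECRET_ACCESS_KEY,
  },
});

async function synthesize(text, voiceId) {
  const command = new SynthesizeSpeechCommand({
    Text: text,            // the AI model's response
    OutputFormat: "mp3",
    VoiceId: voiceId,      // e.g. "Joanna", chosen per the selected expert
  });
  const { AudioStream } = await pollyClient.send(command);
  const bytes = await AudioStream.transformToByteArray();
  const blob = new Blob([bytes], { type: "audio/mpeg" });
  return URL.createObjectURL(blob); // set as the <audio> element's src
}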
AI Voice Agent: Speech-to-Text with AssemblyAI
The sources discuss speech-to-text technology extensively in the context of building an AI-powered voice agent for educational purposes. The application aims to teach users about specific topics, conduct mock interviews, provide question-and-answer learning, and even assist in language learning. A key feature of this application is the ability to convert speech to text.
Here’s a breakdown of the speech-to-text aspects based on the sources:
Core Functionality: One of the fundamental aspects of the AI voice agent is learning how to convert speech to text. This allows users to interact with the agent using their voice.
Real-time Transcription: During interactions like mock interviews, a live speech-to-text display shows the user’s words in real time. This feature is crucial for a natural conversational flow.
Technology Used: The application utilizes AssemblyAI to implement the speech-to-text functionality.
AssemblyAI’s streaming speech-to-text feature is specifically employed to convert live audio into text with high accuracy (up to 90%, per the source) and low latency (under 600 milliseconds).
The development process includes learning how to integrate AssemblyAI’s streaming speech-to-text from the basics.
Workflow Integration: The application’s workflow involves several steps: getting microphone access, converting spoken words into text with AssemblyAI, and then processing this text to get responses from an AI model.
Once the user speaks, the audio needs to be converted to text, which is then passed to the AI for generating an answer.
Implementation Details: The application uses the RecordRTC library to handle microphone access and audio recording in the browser (see the sketch at the end of this section).
The recorded audio is then sent to AssemblyAI for transcription.
The implementation involves setting up a WebSocket connection with AssemblyAI’s streaming endpoint and sending audio buffers for real-time transcription.
AssemblyAI sends back transcript data, which includes both partial transcripts (for real-time display) and final transcripts (once the user pauses speaking).
The application includes logic to process these transcripts and update the user interface to show the text in real time.
The final transcript is used to capture the complete user input for processing by the AI model.
Cost Considerations: AssemblyAI provides a $15 credit to new users who join through a specific link, making it free to get started with the speech-to-text integration.
Token Management: The transcribed text (the output of speech-to-text) feeds into token management: the length of the conversation (in words) is used to track user usage and, potentially, paid plans.
In summary, speech-to-text is a foundational component of this AI-powered educational voice agent. AssemblyAI is the chosen technology for accurate, real-time transcription of user speech, enabling a natural and interactive learning experience. The development process includes learning to integrate and use AssemblyAI’s streaming capabilities effectively.
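The microphone side described above might look like the following sketch with RecordRTC (the chunk-forwarding helper is illustrative; AssemblyAI expects each chunk encoded per its realtime protocol before sending):

// Sketch: capture 16 kHz mono audio chunks with RecordRTC
import RecordRTC from "recordrtc";

async function startMicrophone(sendAudioChunk) {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const recorder = new RecordRTC(stream, {
    type: "audio",
    recorderType: RecordRTC.StereoAudioRecorder,
    mimeType: "audio/webm;codecs=pcm",
    timeSlice: 250,              // hand over a Blob every 250 ms
    desiredSampRate: 16000,      // match the sample_rate in the WebSocket URL
    numberOfAudioChannels: 1,
    ondataavailable: (blob) => sendAudioChunk(blob), // forward to the WebSocket
  });
  recorder.startRecording();
  return recorder; // the caller stops it on "disconnect"
}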
Educational Voice Agent: AI Model Integration
Based on the sources, AI models are central to the functionality of the educational voice agent being built. The agent relies on these models to provide a variety of learning experiences, including:
Answering questions.
Facilitating mock interviews.
Delivering topic-based lectures.
Assisting with language learning.
Generating feedback on mock interviews.
Creating notes for topic-based lectures.
The application aims to be flexible in its use of AI models, using OpenRouter to explore and integrate multiple large language models (LLMs). The developers intend to connect to models such as Gemini, ChatGPT (OpenAI), DeepSeek, and Claude.
Here are further details about the use and integration of AI models:
Model Selection: The project uses OpenRouter, which provides access to a wide range of AI models, both free and paid. One specific model mentioned as being used (at least for demonstration purposes) is Gemini 2.0 Pro Experimental, noted as free to use.
Prompt Engineering: To get relevant responses from the AI models, prompt engineering is crucial. The application uses prompts that are tailored based on the coaching option selected by the user. These prompts include the user’s chosen topic, and a template that gets populated with the actual user input. The prompt can also be designed to influence the length and specificity of the AI’s responses.
Interaction via API: The application uses the openai SDK to interact with the AI models through the OpenRouter API. This involves initializing the OpenAI client with an API key obtained from the OpenRouter platform and OpenRouter’s base URL. The API key is stored as an environment variable for security.
Passing Context: For more coherent conversations, the application has been designed to pass the conversation history (or at least the last few messages) to the AI model. This allows the AI to provide contextually relevant responses based on the ongoing interaction.
Generating Feedback and Notes: A separate AI model (or potentially the same model with a different prompt) is used to generate feedback and notes based on the completed conversations. The summary prompt defined for each coaching option guides the AI in generating these summaries.
Client-side vs. Server-side Implementation: Initially, the code for interacting with the AI model runs directly on the client side. Because this risks exposing the API key, the source suggests handling the AI-model interaction on the server side instead, by creating an API endpoint. As a development-only quick fix, the source sets dangerouslyAllowBrowser: true to bypass the browser security warning the OpenAI client raises when used client-side.
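A sketch of what such a server-side endpoint could look like in the App Router (the route path, request shape, and variable name are illustrative, not from the source):

// app/api/ai-model/route.jsx: hypothetical server-side endpoint keeping the key private
import { NextResponse } from "next/server";
import OpenAI from "openai";

export async function POST(req) {
  const { model, messages } = await req.json();
  const openai = new OpenAI({
    apiKey: process.env.OPENROUTER_API_KEY, // server-only; never shipped to the browser
    baseURL: "https://openrouter.ai/api/v1",
  });
  const completion = await openai.chat.completions.create({ model, messages });
  return NextResponse.json(completion.choices[0].message);
}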
In summary, AI models are the intelligence behind the voice agent, enabling it to understand user input, provide educational content, and generate feedback and notes. The application leverages the OpenRouter platform to access a variety of LLMs and uses prompt engineering and conversation history to guide the AI’s responses. The interaction with these models is facilitated by the openai SDK.
AI Voice Agent: Text-to-Speech with Amazon Polly
Based on the sources, text-to-speech (TTS) is a crucial component of the AI-powered voice agent for educational purposes. It serves as the mechanism to convey the AI agent’s responses to the user in an audible format, contributing to a more natural and interactive learning experience.
Here’s a breakdown of the text-to-speech aspects discussed in the sources:
Core Functionality: The application aims to convert text responses generated by the AI models into speech, allowing users to hear the information rather than just reading it. This is essential for creating a voice-based interaction.
Technology Used: The application utilizes Amazon Polly, an AWS service that converts text into lifelike speech. The source describes Amazon Polly as free to use (AWS offers a free tier).
Implementation Details: The implementation uses the AWS SDK client for Polly (@aws-sdk/client-polly), which is installed as a project dependency.
A PollyClient is initialized with AWS credentials (access key ID and secret access key) and the desired AWS region. These credentials need to be obtained from an AWS account with appropriate permissions for Amazon Polly. It is recommended to store these keys as environment variables for security.
To convert text to speech, a SynthesizeSpeechCommand is created. This command requires several parameters:
Text: The text that needs to be converted into speech. This text comes from the responses generated by the AI model.
OutputFormat: The desired format for the synthesized speech, which is set to MP3 in this application.
VoiceId: The voice to be used for synthesizing the speech. The application intends to select the voice based on the coaching expert chosen by the user. The expert’s name is used to determine the VoiceId for Amazon Polly.
The pollyClient.send() method is used to send the SynthesizeSpeechCommand to the Amazon Polly service. This returns an audio stream.
The audio stream is then transformed into a byte array (transformToByteArray).
The byte array is converted into an audio blob (new Blob) with the correct MIME type (audio/mpeg).
Finally, the audio blob is converted into a URL that can be used by an <audio> element in the browser using URL.createObjectURL().
An <audio> tag is added to the user interface with the src attribute set to the generated audio URL and autoplay enabled, so the AI agent’s response is played automatically.
Voice Selection: The VoiceId parameter in the SynthesizeSpeechCommand is linked to the coaching expert selected by the user. The application attempts to use the expert’s name to choose a corresponding voice from Amazon Polly.
Error Handling: The code includes a try…catch block to handle potential errors during the text-to-speech conversion process, logging any errors to the console.
In summary, Amazon Polly is the chosen service for converting the AI agent’s textual responses into audible speech. The implementation involves using the AWS SDK, configuring credentials and voice settings, sending the text for synthesis, and then handling the resulting audio stream to play it in the user’s browser. This text-to-speech capability is essential for creating a fully interactive voice agent.
AI Voice Agent: A Next.js Educational SaaS Application
The sources detail the development of an AI-powered voice agent for educational purposes built using React and Next.js. This is described as a full-stack SaaS application being built from scratch. Here’s a comprehensive discussion of the Next.js application based on the provided information:
Foundation: The application is built upon the Next.js framework, utilizing its features for server-side rendering or static site generation, routing, and API endpoints. The initial step in the development process involves creating a new Next.js application using the command npx create-next-app@latest. Using @latest ensures that the latest version of Next.js (version 15) is installed.
Project Setup: The setup prompts cover the project name, TypeScript (chosen as ‘no’), ESLint (chosen as ‘no’), Tailwind CSS (chosen as ‘yes’), a src directory (chosen as ‘no’), and the App Router (chosen as ‘yes’, which is highlighted as very important).
Project Structure: After creation, the project structure includes key folders and files:
app folder: This folder contains all the pages, routes, and layouts of the application. Next.js uses a folder-based routing system, where the folder structure within the app directory defines the application’s routes. Files named page.js or page.jsx are treated as route handlers.
public folder: This folder is used to store static assets such as images, fonts, and other files that can be directly accessed by the browser.
globals.css: This file contains the global styles applied to the application. With Tailwind CSS version 4, the Tailwind color variables are included directly in this file, eliminating the need for a separate tailwind.config.js file.
layout.js (or layout.jsx): This file defines the root layout of the application. It contains the <html> and <body> tags, and all pages are rendered within this layout. Custom fonts and meta tags for SEO are also typically added here. Specific layouts can also be created within subdirectories (e.g., main/layout.jsx for the dashboard layout) to apply different structures to different sections of the application.
page.js (or page.jsx) in the app directory: This is the default page of the application, rendered when the user navigates to the root path (/).
next.config.mjs: This file contains configuration related to the Next.js application.
package.json: This file stores information about the application, including its name, version, scripts for running and building the application, and a list of dependencies and devDependencies along with their versions. The source notes that after installing Tailwind CSS version 4, it and its related PostCSS dependency are listed here.
Routing: Next.js employs a folder-based routing system, making it easy to define routes based on the directory structure within the app folder. Next.js handles routing without manual configuration, and the source covers nested routing and dynamic routing as well (an illustrative route layout appears after this list).
UI Components: The application integrates shadcn/ui, a Tailwind CSS-based UI component library. The library is initialized with npx shadcn@latest init, and components are installed individually with npx shadcn@latest add <component-name> (the CLI package was renamed from shadcn-ui to shadcn, and the newer CLI is the one that supports Tailwind v4). Installed components are generated as .jsx files inside a components folder, which can then be imported and used in the application. The theme and base color of the UI can be customized.
Authentication: Stack Auth is used as the authentication service provider for the Next.js application. The integration involves installing the Stack Auth SDK with npm install @stackframe/stack. Environment variables provided by Stack Auth are stored in a .env.local file. Stack Auth provides pre-built UI components for sign-up and sign-in, such as the UserButton, SignIn, and SignUp components. Middleware can be used to protect specific routes by checking the user’s authentication status.
Data Storage: Convex is used as the open-source database for the application. The Convex client library is installed with npm install convex, and a Convex development environment is started with npx convex dev. Convex uses a schema defined in schema.js to create tables. Mutations (for writes) and queries (for reads) are defined as functions within the convex folder. React hooks like useMutation and useQuery from convex/react are used to call these Convex functions from React components.
State Management: React Context is used for state management, specifically to share user data across components. A UserContext is created, and a provider wraps the application to make user information accessible throughout the component tree.
Layouts: Custom layouts are implemented to provide a consistent structure for different parts of the application. For example, a dashboardLayout is created within the main folder to provide a specific layout for all routes under /main/dashboard.
API Routes: Next.js allows the creation of API routes within the app/api directory, which can handle server-side logic. For example, an API route at /api/get-token/route.jsx is created to fetch a temporary token from AssemblyAI.
Deployment: The application is deployed on Vercel, a platform optimized for Next.js applications. The deployment process involves pushing the code to a Git repository (like GitHub) and connecting it to a Vercel project; Vercel handles the build and deployment automatically. Stack Auth also requires setting the application to production mode in its dashboard and adding the production domain.
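As promised above, an illustrative layout of the folder-based routing (folder names hypothetical, not taken from the source):

app/
  page.jsx                    ->  /                               (default page)
  dashboard/
    page.jsx                  ->  /dashboard                      (nested route)
    discussion-room/
      [roomId]/
        page.jsx              ->  /dashboard/discussion-room/123  (dynamic route)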
In summary, the Next.js application being developed leverages many key features of the framework, including its routing system, component-based architecture, layout capabilities, API route functionality, and integration with third-party services and UI libraries to build a comprehensive AI-powered educational tool.
Stack Auth User Authentication in a Next.js Application
Based on the sources, user authentication is a critical aspect of the AI-powered voice agent application, ensuring that only authorized users can access the dashboard and keeping their data secure. The application implements user authentication using Stack Auth, which is presented as a free alternative to Clerk.
Here’s a detailed discussion of user authentication as described in the sources:
Authentication Service Provider: The application utilizes Stack Auth as its authentication service provider, chosen for its ease of use and because it is free.
Integration Process: Integrating Stack Auth involves the following steps:
Creating a Stack Auth Account and Project: The developer creates a new account on the Stack Auth website (stack-auth.com) and then creates a new project in the Stack Auth dashboard.
Selecting Sign-in Options: During project creation, the developer chooses which sign-in methods to enable, such as Google, GitHub, and email/password.
Obtaining Environment Variables: Stack Auth provides environment variables (the project ID and API keys) upon project creation, which need to be copied.
Installing the Stack Auth SDK: The Stack Auth SDK (@stackframe/stack) is installed into the Next.js project using npm install @stackframe/stack.
Storing Environment Variables: The copied environment variables are pasted into a .env.local file in the root directory of the Next.js application.
Initializing Stack Auth: Running the setup wizard from the documentation (npx @stackframe/init-stack@latest) initializes Stack Auth for the application; this may require sudo on macOS due to file permissions. The wizard adds the necessary files: handler/[...stack]/page.js, loading.js, an updated layout.js (with the StackProvider), and stack.js.
Authentication Flow: The application enforces authentication with a standard flow:
When a user attempts to access protected areas (like the dashboard), the application first checks if the user is authenticated.
If the user is already authenticated, they are redirected to the dashboard.
If the user is not authenticated, they are redirected to the sign-in/sign-up page provided by Stack Auth (/handler/sign-up).
After successful sign-in, the user is redirected back to the dashboard.
Protecting Routes: The application uses middleware (middleware.jsx) to protect specific routes from unauthorized access. By listing route paths in the middleware, the application ensures that only authenticated users can access them; the source indicates that all routes under /dashboard will be protected. The middleware uses the stackServerApp instance (exported from the generated stack.js) together with NextResponse from next/server.
Stack Auth UI Components: Stack Auth provides pre-built React components that simplify integrating authentication features into the user interface:
UserButton: This component displays the user’s profile image and provides options for account settings and signing out.
SignIn and SignUp components: These dedicated components provide ready-to-use sign-in and sign-up forms.
User Information Retrieval: The @stackframe/stack package provides the useUser hook, which lets React components access the authenticated user’s information, such as the display name, primary email address, and profile image URL. This hook is used on the client side.
Saving User Information to the Database: Upon the first successful login, the application saves the user’s information (display name, primary email) into the Convex database using a mutation called createUser. This function checks if the user already exists based on their email; if not, a new user record is inserted with default credits.
Account Settings and Sign Out: Stack Auth provides a dedicated account settings page that users can access, and the UserButton component offers a straightforward way for users to sign out of the application.
In summary, user authentication in this Next.js application is handled by the Stack Auth service, which provides a comprehensive set of features including UI components, route protection via middleware, and user management. The integration involves a simple setup process, and the application leverages Stack Auth’s capabilities to secure access and manage user sessions. Upon successful authentication, the user’s information is also stored in the application’s database (Convex) for further use.
🎙️Build & Deploy an AI Voice Agent for Education | Next.js, React, Tailwind, Convex, AssemblyAI
The Original Text
With the help of React, Next.js, and the power of AI, we are going to build an AI-powered voice agent for educational purposes. This is a game-changing SaaS application, and we are going to build it completely from scratch. Hey there, and welcome back to the TubeGuruji channel. Today we are building an AI-powered voice agent: you can learn about a specific topic, give a mock interview, practice question-and-answer sessions, or even learn new languages with it. This is a full-stack SaaS application built completely from scratch. After building it, you will know how to convert text to speech and speech to text, how to get a response from the AI, and how to present it to the user. We are going to design and build this application step by step with modern UI trends. Let me walk through the demo, and then we'll talk about the tech stack we'll use to build it.

Here we have this beautiful landing screen. Clicking "Get Started" redirects you to the sign-in screen, where you can sign in with GitHub, with Google, or with email and password. Let's sign in with Google. After signing in you land on the dashboard; if you refresh, you'll see a nice fading animation across the whole list. Then you have all these different coaching options, and you'll learn how to add more of them: topic-based lectures, mock interviews, question-and-answer preparation, learning a new language, and even a meditation option if you want to improve your concentration while studying. Let's select the mock interview. You need to give a topic for the interview; say you're a full-stack developer who wants to interview as a frontend developer, specifically in React.js. Then you select a coaching expert (you have Joanna, Salli, Joey, and so on, and I'll show you how that works too) and click Next. You land on the discussion-room page with a connect option. Right now my camera is disabled, but once you enable it you'll see your webcam view at the bottom. We connect with Joanna and start the conversation: "Hi Joanna, how are you?" "Good morning! I'm doing well, thank you. Ready to dive into your interview?" "Yes, I'm pretty excited." "Great to hear. Let's start with the basics: what is React, and why would you use it in a project?" "React is a web framework to develop web applications, and it's a client-side library as well." "Good start. To clarify, React is a JavaScript library for building user interfaces. Can you explain its key features?" And that's how you can have a two-way conversation. After the conversation you have the option to generate feedback. Notice that while you talk, a live speech-to-text transcript appears; we'll implement that feature with AssemblyAI, which lets us stream speech to text.
On the right-hand side you see the whole conversation between you and the voice agent, along with an option to generate feedback and notes based on it: if you're giving an interview, it generates feedback for your interview plus suggestions to improve; if you're taking a topic-based lecture, it gives you notes. Let's generate the feedback and notes; this uses the AI models we're going to learn to connect to our application. Once the feedback and notes are saved you get a notification and can go back to the dashboard, where you can view the feedback (or the notes, for a lecture) anytime. Clicking "View feedback" navigates to the view-summary screen, showing your conversation in detail along with key points and the gaps where you're lacking; on the right-hand side you see the conversation the feedback was based on. We'll implement all of this from scratch.

But not only that: we're converting this application into a SaaS. In the profile you'll see a token-based architecture, where we update tokens based on the user's usage, and users can upgrade to a paid plan; with the Razorpay payment gateway we add the payment integration as well. Right now I'm on a paid plan, which is why I can use up to 50,000 tokens, and how many tokens you grant on the free or paid plan is completely up to you. There's also an account settings option where users can update their profile, email, and more, and we'll get all of that within a single component without writing any code. How cool, right? The application is fully responsive and works on all kinds of devices. It's not only a SaaS application: building it teaches you a lot of technologies that are very important for getting a job in the IT industry, and the finished app is useful for students and job seekers preparing for exams and interviews.

Let's talk about the tech stack. Along with React and Next.js we'll use Tailwind CSS version 4; shadcn for the UI component library; the Convex database (and we'll also learn how to do Convex self-hosting); AssemblyAI for streaming speech to text, which converts live audio into text with up to 90% accuracy and under 600 milliseconds of latency (and we'll learn it from the basics); OpenRouter for the AI models, where you can explore multiple LLMs; and last but not least, Amazon Polly to convert our text to speech. Everything we use to build this application is free to use. If you want access to all of my exclusive courses and source code, visit tubeguruji.com, where you'll find all the projects and their source code; you can join the TubeGuruji Pro membership for just under $15. AssemblyAI will also give you $15 of credit if you join through the link in the description. I'll put all the important links in the description so they're easy to access. So, without any further delay, let's begin developing the AI voice assistant for educational content.

Now let's create the React/Next.js application. Go to the folder where you want the project, open a terminal there, and type npx create-next-app@latest. Because we mentioned @latest, it installs the latest version of Next.js, which is Next.js 15. Say yes to proceed, and it asks for a project name; we'll say "ai-coaching-voice-agent" (we can rename it later). Then: TypeScript? No. ESLint? No. Tailwind CSS? Yes. Tailwind recently launched version 4, so you won't see a separate Tailwind config file, but saying yes installs it, and we'll see the new changes introduced in version 4. App Router? Yes, this is very important. Say no to the remaining two questions. It installs the important dependencies (react, react-dom, next) plus the Tailwind CSS dev dependencies. Once the project is ready, open the folder in VS Code and click "Yes, I trust the author". You'll see all the files and folders created with your application; let me walk through them one by one.

First is the app folder, which contains all your pages, routes, and layouts. Inside it, globals.css contains all the styles applied to the application; with Tailwind CSS version 4 the Tailwind color variables are also included in globals.css, so you won't find a separate tailwind.config.js file anymore. Then layout.js is the root layout of your application, with the html and body tags through which all pages render; you'll also see the meta tags, which are very important for adding SEO, and some custom fonts applied at the top, which you can update later. Then the page.js file is the default page of your application, the one that renders when we run the app. Next is the public folder, where we save all our images, assets, and fonts; you can use them directly without extra configuration. Then there is next.config.mjs,
which contains the configuration related to the Next.js application. package.json contains the name and version of your application, the scripts to run and build it, and the dependencies and devDependencies along with their versions; whenever you install a new dependency, it shows up here with its version. You can see Tailwind CSS is now on version 4, along with the related PostCSS package. Now open a terminal (from the menu or from the bottom panel) and type npm run dev; your application starts running on port 3000. Open localhost:3000 in the browser and you'll see your application running. As I said in the beginning, this is the default page, which is nothing but the page.js inside the app folder. Let me bring the editor side by side, delete all the existing lines, and add a div with an h2 saying "Subscribe to TubeGuruji" (if you haven't subscribed to the channel, please do). Save, and the text updates right away. All routes and pages need to be files named page.js, and the route name is simply the folder name; we'll cover that, along with nested routing, dynamic routing, and more. Since Next.js uses folder-based routing, it's very easy, and you don't need any configuration for routing.

Next we want to add the UI component library, shadcn. shadcn is a Tailwind CSS-based UI component library and is very popular among developers. Go to ui.shadcn.com, open the documentation, and select Next.js as the framework, because that's what we want to install for. Note at the top that the current guide is for Tailwind CSS version 4; if you're using version 3 or another version, follow the older instructions instead. Copy the init command, open another terminal, and run it. Say yes to proceed, then select a base color: neutral, gray, or another; we'll select Neutral. Because we're using React version 19 we have to force the install, so choose "use --force". Alternatively, if you don't want to run with force next time, go to package.json and change the react version to 18; it won't affect your application, but make sure to run npm i after updating the version so the package you specified actually gets installed. Once shadcn is ready, you'll see two new things in the Explorer: a lib folder containing a utils.js file, and a components.json file with the shadcn configuration, where you'll find the theme, the base color, the style, and everything else.

Let's test a component. Whenever you want to use a shadcn UI component, you have to install it first. Say you want the Button component: copy its add command and paste it in the terminal. Once the Button component is installed, you'll see a button.jsx file inside the components folder; you can customize it whenever you want. Back in page.js, add the Button component and give it the text "Subscribe". In the application you'll now see a Subscribe button; it looks like a ghost button without much color because we chose the neutral palette. If you open the global CSS, you'll see all the colors with their respective color codes; you can replace any of them with your own hex code. Set one to a black hex value, save, and the button color changes to black. You can customize all the other styling too; for example, set variant to destructive on the Button and you get the destructive style. Whatever theme you want, just update the color combinations and you're good to go.

It's time to add authentication to our application. This is very important: if you're developing any SaaS application, you have to protect it from unauthorized users and keep all the data secure. So consider a user who wants to access our dashboard or application: first we check whether the user is authenticated. If they already are, we redirect them to the dashboard; if not, we redirect them to the sign-in/sign-up page, and only after a successful sign-in do we redirect them to the dashboard. As simple as that. To add authentication we're going to use Stack Auth. When I searched on Google for the best authentication service providers, there were a lot of options like Clerk, and Stack Auth is an alternative to Clerk, so I thought I'd give it a try and see how it works. One more important thing: it's free to use, so you don't need to pay anything. Go to stack-auth.com and create a new account. After signing in you land on the projects page, where you can create a new project. Click "new project" and give it a name; we'll say "AI Coaching Voice Agent" (you can give whatever name you want and change it later). Choose the sign-in options you want to add; right now I'm keeping sign in with Google, sign in with GitHub, and email/password. Click "create project". After the project is created you get the environment variables; copy them, and inside the root directory create a .env.local file,
paste in the three keys we copied, and save. Click continue, and this is the dashboard for your application. Once users log in and use your app, you can see live activity here (which country, which location), and at the bottom the daily sign-ups and active users. The Users tab lists all your users, and there are a lot of other things you can explore; you can also modify the different SSO providers, enabling or disabling them. Now go to the documentation page, because that's where we integrate from, and click "Setup and Installation". The setup wizard is the recommended way and very easy, but you can also do it manually. Copy the wizard command and paste it into the project's terminal. It asks whether you want to proceed; say yes, and it initializes Stack Auth for us. On macOS you might get a permission error (on Windows you might not); in that case run the command with sudo, which just grants permission to write the files. It then detects the Next.js project and asks whether you want to install Stack Auth; say yes, press enter, and the installation runs. After a successful installation a page opens automatically; just close it. You'll find some new files were created: in the folder structure there is a new handler folder, containing the stack handler folder with its page.js, added by Stack Auth; a loading.js file; the layout.js, which now wraps the app in the Stack provider context and theme components; and, last and important, the stack.js file. You don't need to worry about any of these files; I'm just showing you which files were added and where they came from, and step three of the docs lists the same additions.

Now open localhost:3000/handler/sign-up, and you'll see that our application has a dedicated sign-up screen. How cool, right? You can manage all of this from your dashboard, which is where you enable or disable providers. I'll select an account and log in, and after logging in you're redirected back to localhost:3000. That's how simple it is to add this authentication service provider. You can also go to the account settings page, where you'll see your account details along with a sidebar (we won't use this one directly). In the docs there is a dedicated component called UserButton, plus SignIn and SignUp components and a lot of others. Let's try the UserButton: in page.js, add the UserButton component (note that it's imported from @stackframe/stack) and save. Refresh localhost:3000 and you'll see the user's profile image; click it and you get the account settings option, sign out, and the user's email. Perfect, right? That's how you add authentication.

One more important thing: you need to protect some routes. In the docs, the useUser hook gives you the user information (we'll use that later), and if you scroll down there's a "protecting pages" section. There are three ways to protect a page: a middleware file, a client component, or a server component. I think middleware is the best way, because you keep everything in one file; whenever you want to protect a route, you just mention it there. So I'll create a file called middleware (it needs to be .jsx) and copy in the snippet from the docs. Make sure the stack server app is imported from the generated stack file, and import NextResponse as well, then save. Right now we're not protecting any route, because we only have the default page; once the dashboard and everything else are set up, we'll add the routes we want to protect inside middleware
.jsx. Next, once a user is authenticated for the very first time, we'll save their information to our database so we can keep track of all the user activity. To store application data we need a database, and for that we're going to use Convex. Convex is an open-source database for your application that provides a lot of features: real-time updates, built-in type safety, cron jobs for running scheduled tasks, file storage, and more. I've been using Convex for a long time and it works perfectly in any application; with Convex you can build web and mobile applications while keeping your data in one place. Convex gives you two ways to run it: Convex Cloud, where you set up the database on their own platform, and self-hosting, where you host Convex on your own premises or platform. How cool, right? In this chapter we're going to see both: enabling Convex on their platform, and self-hosting with Docker.

First, go to convex.dev and create a new account or log in (you can see I already have a lot of projects created with Convex). Click "create project" and give it a name; we'll say "AI Coaching Voice Agent" and click create. Once it's created, your database is ready. Under Data you can create and view the tables (we're going to write a schema to generate them); whatever functions you write will appear under the Functions tab; and you can upload files, schedule cron jobs, and find everything else there. The best thing about Convex is that it keeps the dev environment and the production environment separate, and there's a preview environment as well, so your data stays specific to each environment. With Convex ready, click the link in the description for the Convex + Next.js documentation. First you need to install Convex: since we already created the Next.js application, just copy npm install convex and run it in a new terminal. After that, run npx convex dev (in my case I had to run the npm install with sudo first). Choose the existing project, make sure to select the correct one, "ai-coaching-voice-agent", and press enter. You'll see it say that your Convex functions are ready, and in the Explorer a new folder called convex appears with some files inside; you don't need to worry about any of those files right now.

The next important step is to wrap your application inside the ConvexProvider, so the children render through it. In the app folder, create a new file called provider.jsx with a default component, renamed to Provider with a capital P, that accepts and renders children. I'm keeping this provider in a separate file from the layout so the provider can stay on the client side while layout.js stays on the server side; so mark it with "use client" and save, and make sure to render the children. Then, in layout.js, wrap the children inside this Provider; nothing changes on the application side. Now copy the import statement from the docs into the provider, along with the line from their client example that initializes the Convex client (I'll just remove the exclamation mark from the TypeScript version). The NEXT_PUBLIC_CONVEX_URL it relies on was added automatically to your .env.local, together with the Convex deployment value; this is important because we'll need it later. Finally, wrap the children in the ConvexProvider (a sketch of this file follows below). You don't need to implement the rest of that docs section, because we're keeping it as simple as possible; Convex is now ready, and anytime you hit an error, you can check the terminal.

Now let's create a new table. In the convex folder, create a new file called schema.js. Inside the schema we'll say export default defineSchema, and inside defineSchema you write each table name. We'll define a users table with defineTable, listing all the columns you want: a name column of type v.string() (make sure to import the Convex validator as v), an email column as v.string() again, a credits column of type v.number() so we can give users some default credits, and a subscriptionId column of type v.string() for when a user subscribes to our application with a payment. You can always come back later and add as many columns as you want. As soon as you save, you'll see that the Convex functions regenerate, and if I go to our Convex data: boom, here we have the new table. Cool, right? Under Schema you'll see the schema you wrote, and if I click Add, you'll see the fields you can fill in; those are nothing but your columns. Now you might wonder why we didn't add an ID column: Convex automatically generates a unique ID for each record, and it also records the creation time, so you don't need to create those columns. Save it, and that's how you create and set up Convex. This first method runs Convex on their own cloud platform, which you can't move, because it's running on Convex Cloud.
in the description which is regarding the self hosting and click on this self hosting guide okay I will also put the link in the description now on this particular documentation you will find everything in the detail how to uh host the convex on the docker with the help of Docker how to connect the convex to the postgrad SQL database using the neon and there are lot of other uh things is mentioned inside this documentation let’s do one by one for because first we want to host uh convex on our platform so with the help of Docker we are going to do so so simply go to the doer. and if you’re not familiar don’t worry just follow this particular video and you will be okay so click download Docker desktop and make sure to select your operating system and downloaded it now I already downloaded so I will just open this Docker see so I will open the docker and uh in the meantime it’s opening but yeah this is how it looks like okay if you see I already have this Docker sorry this particular application running on this Docker but um let’s go to the documentation and you need to download this Docker Das compose file so just click on it make sure to download now I already have multiple files but I will edit the name as well so I will copy that and will paste inside the root directory now I will just rename this to only Docker compose okay something like this after that let’s go back to the documentation again you need to run this command called Docker compose up so inside the terminal we’ll add a new terminal and just paste this command now it will set up the back end and dashboard for you with the with the help of Docker and convex so we’ll wait to finish and after that you need to generate the admin key okay which using this command so copy this command and once it is finished creating the back end and the dashboard if you see now it’s running right now inside the new terminal we’ll just run this command which will generate the admin key see now go to this particular URL local 67 91 and on this particular uh URL you need to add this admin key so copy this admin key and simply paste it here and click login and boom now if you see same UI you will see on the convex Cloud as well that also running on your Local Host it means it’s running on your local and you don’t need to worry about the cloud how many users you have to do nothing about that you can just run convex locally how cool right now next thing if you go back to this documentation that you need to add these two variable so let’s go back to your environment file so I will go to this enemy. 
So open .env.local, and make sure to comment out both of the existing URLs, because those were for the cloud deployment. In their place we’ll add the self-hosted URL and the self-hosted admin key; whatever admin key was generated, copy it and paste it as the value. When you go to production you just update the self-hosted URL; how to do that is also covered in the documentation.

We already have Convex installed, so there’s nothing new to install, but you do need to restart the dev process: it’s still running against the cloud, so I’ll stop it and run npx convex dev again, and boom, that’s all. Now if I go to localhost:6791 and open the data tab, you’ll see the users table, and whatever you saw on the Convex cloud platform, the same thing appears in your local Convex as well. That’s how you self-host Convex. If you have any questions about self-hosting, you can always reach out on the Convex Discord channel, ask in the comment section, or ask on my TubeGuruji Discord channel as well.
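For reference, the .env.local ends up looking roughly like this (variable names as in the Convex self-hosting guide; double-check them against the docs for your version):

```
# .env.local
# CONVEX_DEPLOYMENT=...                 # cloud value, commented out
# NEXT_PUBLIC_CONVEX_URL=...            # cloud value, commented out
CONVEX_SELF_HOSTED_URL=http://127.0.0.1:3210
CONVEX_SELF_HOSTED_ADMIN_KEY=<paste the generated admin key>
```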
Now, once a user successfully signs in or signs up, we need to save their information to our database. The database is already set up; we just need the save logic, and we also need to make sure we only store the user if they’re new. If the user already exists in our database, we don’t store them again.

So let’s go back to the application. Inside the convex folder we need to write a function, so I’ll create a new file called users.js, and inside it export const createUser. Since we’re inserting a user into the database this is a mutation, so we write mutation (if you were fetching data, it would be a query); make sure it’s imported from the generated server path. Then we define the arguments: the user’s name with a type of v.string(), and the user’s email, also v.string(); right now only these two parameters are necessary. Then we add a handler, an async arrow function that receives ctx and args.

First we check whether the user already exists, and only if not do we add a new one. To check, we write const userData = await ctx.db.query("users"), because we want to fetch records; you give the table name, then .filter(), and we can match on the email ID: inside the filter we say q.eq(q.field("email"), args.email), which compares the email field from the table against the email from our arguments and returns the record if they match. If userData is empty, i.e. userData.length === 0, there’s no existing user, so we insert the record: const result = await ctx.db.insert("users", data), where instead of writing the values inline I define a constant called data containing name: args.name, email: args.email, and credits, where I’ll give a default credit of 50,000.

One more thing I’m going to update: in the schema I’ll make subscriptionId an optional field by wrapping that column in v.optional(), so you don’t need to pass data for it if you don’t have any. That leaves three mandatory fields, and those are the ones we pass; you can pass the data object directly or destructure it, it’s up to you. Once the insert succeeds we return data rather than result, because result only contains the inserted record’s ID; we’ll just console.log that. But if userData already has a record, we return userData[0], since the query gives us a list and we just return the first item. Save it, and that’s how easily you create a function with all the logic inside it.
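Here’s a sketch of that function as described (default credits and naming per the video):

```js
// convex/users.js -- insert the user only if the email is not already in the table
import { mutation } from "./_generated/server";
import { v } from "convex/values";

export const createUser = mutation({
  args: {
    name: v.string(),
    email: v.string(),
  },
  handler: async (ctx, args) => {
    // check whether a user with this email already exists
    const userData = await ctx.db
      .query("users")
      .filter((q) => q.eq(q.field("email"), args.email))
      .collect();

    if (userData.length === 0) {
      const data = {
        name: args.name,
        email: args.email,
        credits: 50000, // default credits for a new user
      };
      const result = await ctx.db.insert("users", data);
      console.log(result); // insert() returns only the new record's _id
      return data;
    }
    return userData[0]; // user already exists; return the stored record
  },
});
```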
Next, let’s go to the app folder. Alongside provider.jsx we’ll create another provider, AuthProvider.jsx, whose job is to check whether the user is new. First define the AuthProvider component (you can rename it if you like, but AuthProvider is fine), accept the children, render them, and then wrap the children inside this AuthProvider in provider.jsx. If I go to localhost:3000 you won’t see any change, because we’re just rendering the same children through the AuthProvider.

Inside the AuthProvider we first get the user information with const user = useUser(), a hook that comes from the @stackframe/stack package. If you’re curious what it returns, console.log the user once the information is available and check the inspect panel: inside the object you’ll see the display name, the primary email address, the profile image URL, and many other things. Perfect.

Now, to save the user information we’ll create a method called createNewUser, but first define the mutation: const createUser = useMutation(api.users.createUser), where useMutation is imported from convex/react and you pass it the generated api reference. Inside createNewUser we say const result = await createUser(...), passing the values: the name, which you can get from user.displayName, and the email, which is user.primaryEmail; you can always check the user object to see exactly where the email lives. Make the function async, and call createNewUser only when the user information is available.

Let’s go back to the application and refresh. Right now we get an error about the useUser hook: it has to run on the client side, so mark the component with "use client" and test again. The next error comes from the loading.tsx file we added; it complains about a missing Suspense boundary. To fix it, go to provider.jsx and wrap the tree in Suspense from React, with a fallback, say a p tag that just reads “Loading”, shown whenever loading is in progress. Refresh and you’ll briefly see the loading text, then land on the home screen, and in the console you’ll see the ID logged from the createUser Convex function, plus the record information we expect. Not only that: if I open the database, inside users there’s a new record with the credits, email, and name; we don’t have a subscription ID, but we do have the creation time and the ID, and it’s the same ID printed in the console, because that’s the inserted record’s ID we logged inside createUser. We don’t need that log anymore, so I’ll remove it.

Refresh once more: this time it just returns the existing data. You’ll see the ID and all the information, but it won’t insert another record, because the function checks whether the email already exists and acts accordingly. That’s how easily you can write a Convex function to create a new user and check whether one already exists; the logic lives in the Convex function itself, so you don’t have to write it inside your client component.
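A sketch of the auth provider at this point (the Stack hook and the mutation wiring as described; the exact import aliases are this project’s layout):

```jsx
// app/AuthProvider.jsx (sketch) -- saves the signed-in user once their info is available
"use client";
import { useEffect } from "react";
import { useUser } from "@stackframe/stack";
import { useMutation } from "convex/react";
import { api } from "@/convex/_generated/api";

export default function AuthProvider({ children }) {
  const user = useUser(); // displayName, primaryEmail, profileImageUrl, ...
  const createUser = useMutation(api.users.createUser);

  useEffect(() => {
    const createNewUser = async () => {
      // the mutation itself skips the insert when the email already exists
      const result = await createUser({
        name: user.displayName,
        email: user.primaryEmail,
      });
      console.log(result);
    };
    if (user) createNewUser();
  }, [user]);

  return children;
}
```

In provider.jsx, the Suspense boundary (with the Loading fallback) wraps the ConvexProvider, which in turn wraps this AuthProvider around the children.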
Once we get the user information from the database, we need to keep it in a state so we can share it across the application between different components, and for that we’re going to use React context. Context is a state-management tool that helps us share data across components instead of passing it from one component to another. Let me show you how to define it.

Go to the app folder and create a folder called _context; we start the name with an underscore so Next.js won’t treat it as a route. Inside it create a UserContext.jsx file, and inside that I’ll keep the context name the same as the file name and write createContext. That’s all; that’s how easily you create the context.

Next, go to the AuthProvider and wrap the children inside this UserContext. The context has a Provider, so add UserContext.Provider, and then we pass it a value. Define the state with const [userData, setUserData] = useState(), and once we have the user information from the mutation result, call setUserData(result); then pass { userData, setUserData } as the provider value. In whichever component you want to use this user data, you just consume the UserContext with the useContext hook and access it; you’ll want to learn exactly how, and we’ll get there, but for now make sure you’re passing userData into the UserContext provider.
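A minimal sketch of the context file, plus an illustrative consumer helper (useUserData is my own naming for illustration, not something defined in the video):

```jsx
// app/_context/UserContext.jsx -- underscore folders are ignored by the Next.js router
import { createContext, useContext } from "react";

export const UserContext = createContext(null);

// illustrative helper: any component under AuthProvider can read the shared record
export function useUserData() {
  return useContext(UserContext); // { userData, setUserData }
}
```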
Now it’s time to design the dashboard screen. We’ll add a header at the top, and on every route the user is authorized to see, we’ll keep that header constant. The dashboard content holds the options, a profile button where the user can see their credits, account settings, and a lot of other things, and at the bottom the user can see their previous lectures and interview feedback. It’s a simple, clean screen, and obviously I’ll share all of these assets with you.

Back in the application, inside the app folder I’ll create a new folder called main; whatever application routes we want to keep behind authentication go inside this main folder. First I’ll create a layout.jsx dedicated to the dashboard screens: add a default template and call it DashboardLayout. It renders the children, and Next.js automatically detects that this layout file applies to every route inside the main folder; save it. Then inside main create a dashboard folder with a page.jsx, add a default template, and call it Dashboard; you could call it Workspace as well. Now if I go to the /dashboard route, we land on the dashboard screen. Perfect.

But what if the user isn’t authenticated? Let me sign the user out; it asks to confirm leaving the site, and then if I go to the dashboard it still lets me in, which we don’t want. If you remember, when we added authentication we created the middleware.jsx file where we list the routes we want to protect, so I’ll add the dashboard and everything after it there and save. Now if I refresh, it navigates us to the sign-in screen, because it checks whether the user is authenticated, and boom, we’re redirected. Only when you sign in can you access the dashboard, and once I sign in I can reach it again. That’s how you protect your routes from unauthorized users.

Now for the header. Inside the main folder create another folder called _components, and in it AppHeader.jsx with a default template. Make sure to add this AppHeader inside the layout, because we want to keep it throughout the dashboard; add it there and save, and you’ll see the header text at the top. First we want a logo, and I’ll use a logo placeholder site to get one; it’s just a placeholder you can replace with an actual logo. Pick a simple, clean logo, click it to copy the SVG, then go to the public folder, create a logo.svg file, paste in the SVG code, and save so we can use it inside the AppHeader.

In AppHeader add the Image tag from next/image, which is very helpful for image optimization, and give it a src. In the src you just write the file name, /logo.svg, and Next.js automatically resolves it from the public folder. Whatever width and height you pass are what the optimizer uses; I’ll give 200 by 200 and save. Back in the application (after making sure everything is saved), there’s the logo; you can of course adjust the width and height to your requirements. We also want a user button on the right-hand side, so add the UserButton component, which is imported from @stackframe/stack, and save; it reloads (not sure why it’s slow), and there’s the profile button.

To push it right and give the header some breathing room, add a className: padding of 3 and a small shadow (shadow-sm), then make the div flex with justify-between and items-center so the profile button, with its account settings and other options, sits on the right-hand side. Cool, right? That’s how our header is ready.
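A sketch of the header as built (the video’s “main” folder; written as a “(main)” route group here so it stays out of the URL, which is an assumption about the project structure):

```jsx
// app/(main)/_components/AppHeader.jsx (sketch)
import Image from "next/image";
import { UserButton } from "@stackframe/stack";

export default function AppHeader() {
  return (
    // padding, a light shadow, and flex/justify-between push the user button right
    <div className="p-3 shadow-sm flex justify-between items-center">
      {/* next/image optimizes the file; /logo.svg resolves from the public folder */}
      <Image src="/logo.svg" alt="logo" width={200} height={200} />
      <UserButton />
    </div>
  );
}
```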
Moving on to the next part, the workspace section. I’ll create a component inside the dashboard folder, since it belongs to the dashboard: a new _components folder inside dashboard, and in it FeatureAssistants.jsx (you can name it anything) with a default template. In page.jsx I’ll render FeatureAssistants and save, and back in the application you’ll see it displayed.

Now we want some padding and margin so everything stays centered, with the padding adapting to the screen size. I’ll do this in the main layout.jsx, applying the padding to the children so it covers every route we add inside main: a className with padding of 10, a top margin of about 20, then on medium screens padding of 20, on large screens 32, on extra-large px-48, and on 2xl px-56; save. Once the padding is in, the FeatureAssistants content moves inward; you can push a value higher (72, say) just to see the change; this screen actually falls under the xl size, so that’s the one to nudge. See the sketch just below for how the layout file ends up.

After that we want the “My Workspace” text, with “Welcome back” plus the username underneath, and a button on the right-hand side. In FeatureAssistants I’ll add an h2 reading My Workspace, and below it the welcome-back line with the username, which we can get from the useUser() hook; remember to mark this component "use client" as well. Then render user.displayName and save: there’s My Workspace, and on the second line Welcome Back with the name of the user.

Let’s apply some styling; a className, of course, since we’re using Tailwind CSS: font-medium with text-gray-500 for the first line, and for the h2, text-3xl and font-bold; that’s how it looks. On the right-hand side we add a button: wrap things in a div, add a Button that just says Profile (name the button whatever you want), and it appears, currently just below the text; to move it right, wrap both parts in a div with flex, justify-between, and items-center; save, check, and boom.

Now we want the primary color to be a particular kind of blue, so that whatever components we add from shadcn pick up a primary color that matches our logo. To change the primary color for all components, go to globals.css, where you’ll find all the color codes, and search for primary; right now it’s the black we added, but I’ll change the value to #1FA4EF; remember to add the colon, and save.
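Putting the layout pieces together, a sketch of what the main layout looks like at this point (padding values as chosen above; the “(main)” route-group path is my assumption again):

```jsx
// app/(main)/layout.jsx (sketch) -- one place for the header and the responsive padding,
// applied to every route under the "main" folder
import AppHeader from "./_components/AppHeader";

export default function DashboardLayout({ children }) {
  return (
    <div>
      <AppHeader />
      <div className="p-10 mt-20 md:px-20 lg:px-32 xl:px-48 2xl:px-56">
        {children}
      </div>
    </div>
  );
}
```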
Once you save, refresh the application, and boom, the button color has changed; from now on, any component you add will use this blue as its primary color.

Next we want to display all the option cards. For each of them I’ve already added images to the public folder; if you look inside it you’ll find interview.png, language.png, and so on, and as I told you, I’ll share those with you. We’ll create one list holding the name, the icon, and any other fields we need. First create a new folder called services, inside it an Options.jsx file, and in it export a constant; I’ll name the list ExpertsList for now. Inside it add entries with a name, say Lecture on Topic, plus the icon that goes with it; I already have lecture.png, so the icon is /lecture.png. For this particular screen, if you observe the design, we only want the image and the name, which is why these are the only two fields. Then Mock Interview with interview.png, Question Answer Prep with qa.png, another one for Languages with language.png (you can name them whatever you want), and finally Meditation, to help focus on studying, with meditation.png. Save, and make sure the list is exported so you can use it.

With the options defined, go back to the dashboard component where FeatureAssistants lives, because this is where we display them. Add a div, and inside it map over the list: ExpertsList.map((option, index) => ...) with an arrow function. Inside, add a div, and within it an Image with src={option.icon}, an alt of option.name, a width of, say, 150 and a height of 150, and in the className set the rendered size, say h-[70px] and w-[70px]; the width and height props are what next/image uses to optimize the image. Save and go back to the application, and boom, all the images show up. The error you saw in the console appears because we need to provide a key, so pass key={index} and save. After that we also want the label, so add option.name below the image and save, and each option shows its name.

Now we want these in a grid: on the wrapping div add a className making it a grid, three columns by default, five columns on large screens, and maybe six on extra-large; save, and now everything sits in a grid. See? Perfect.
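A sketch pulling the list and the grid together (the video later renames this list to CoachingOptions, so I’ll use that name in these sketches; the two-column mobile tweak and the card styling are the ones added just below):

```jsx
// services/Options.jsx -- only name + icon are needed on the dashboard cards
export const CoachingOptions = [
  { name: "Lecture on Topic", icon: "/lecture.png" },
  { name: "Mock Interview", icon: "/interview.png" },
  { name: "Question Answer Prep", icon: "/qa.png" },
  { name: "Languages", icon: "/language.png" },
  { name: "Meditation", icon: "/meditation.png" },
];
```

```jsx
// app/(main)/dashboard/_components/FeatureAssistants.jsx -- the options grid portion (sketch)
import Image from "next/image";
import { CoachingOptions } from "@/services/Options";

export default function FeatureAssistants() {
  return (
    <div className="grid grid-cols-2 md:grid-cols-3 lg:grid-cols-5 gap-10 mt-10">
      {CoachingOptions.map((option, index) => (
        <div key={index}
             className="p-3 bg-secondary rounded-3xl flex flex-col justify-center items-center">
          {/* width/height feed next/image's optimizer; the class sets the rendered size */}
          <Image src={option.icon} alt={option.name} width={150} height={150}
                 className="h-[70px] w-[70px]" />
          <h2 className="mt-2">{option.name}</h2>
        </div>
      ))}
    </div>
  );
}
```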
Time to style the cards. On each card’s div add a className: a padding of 3, a background of the secondary color, rounded-3xl corners, then flex with flex-col, justify-center, and items-center so everything sits in the middle. Give the grid some gap too, say gap-10, and a top margin; I tried 5 and 10, and 5 looks fine too. For the label I’ll just add a top margin of 2 and save. One important check: open the inspect panel and resize the window; on smaller screens the mobile view looks better with two columns instead of three, so mark the base as grid-cols-2. Perfect.

Next we want the bottom section, which has two parts. In page.jsx, below the options, I’ll add a div holding two components: History.jsx and Feedback.jsx, each with a default template, and render both here, History first, then Feedback. On their wrapping div add a className: grid with one column on small screens and two columns from medium up, plus a gap of 10. Back on the page you’ll see History and Feedback; give the section a top margin of 10, later bumped to 20.

Inside History add the heading “Your Previous Lectures” with a className of font-bold and text-lg; I’ll check how it displays and increase it to text-xl. Copy the same h2 into Feedback and change the text to just “Feedback”, and save. Under History we obviously don’t have anything yet, so add a simple placeholder (we’ll update it later): “You don’t have any previous lectures”, with text-gray-400 in the className; do the same in Feedback with “You don’t have any previous interview feedback”, and save. Go back, and there it is. Perfect.

That’s how our dashboard is ready. The only thing I’m still concerned about is the space at the top: inside the layout I’ll adjust the top margin, trying a few values and settling on mt-14, which looks good. So that’s how you build the dashboard, guys; simple, with no logic yet, we just display the images.
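Pulling the page together, a sketch of the dashboard page at this point (the header comes from the layout, so the page only composes the sections):

```jsx
// app/(main)/dashboard/page.jsx (sketch)
import FeatureAssistants from "./_components/FeatureAssistants";
import History from "./_components/History";
import Feedback from "./_components/Feedback";

export default function Dashboard() {
  return (
    <div>
      <FeatureAssistants />
      <div className="grid grid-cols-1 md:grid-cols-2 gap-10 mt-20">
        <History />
        <Feedback />
      </div>
    </div>
  );
}
```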
One last important touch for this screen: some animation on the cards. Whenever we hover over a card we want the image to move a little, and it’s quite easy: go to the FeatureAssistants component and, on the image, add hover:rotate-12 to rotate it about 12 degrees, plus cursor-pointer, and save. Hover over a card and you’ll see this beautiful movement; that’s exactly why we added it. For a smooth animation, also add transition-all; once that Tailwind class is in, the motion is very smooth compared to before.

We can also animate the cards in when the application loads. For a shadcn-style animation library, go to magicui.design and open the components; you’ll see lots of different components, and for these cards I’d definitely prefer Blur Fade. If I refresh their preview, you can see how it looks. It’s very easy to install: copy the npm command and execute it inside the VS Code terminal (in my case I had to run it with sudo, since macOS is a little strict about permissions). Their docs show exactly how to use it: you wrap your component inside BlurFade and provide a delay. So I’ll wrap the card div inside BlurFade, remembering to close the tag, give it a key of option.icon, and set the delay based on the index so the cards appear one after another. Make sure to import BlurFade from the magicui import path, then save. Go back, refresh the screen, and look how beautifully the cards animate in, boom. That’s how you add the animation; you can explore more of them there, and we’ll use a few so the application looks more interactive, because animation gives your app a whole different effect.
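A sketch of the animated grid (the BlurFade import path is whatever the magicui install created in your project, and whether it is a named or default export depends on the version, so treat both as assumptions; the stagger formula is illustrative):

```jsx
// hover rotate + staggered entrance on the dashboard cards (sketch)
"use client";
import Image from "next/image";
import { BlurFade } from "@/components/magicui/blur-fade"; // path/export per your install
import { CoachingOptions } from "@/services/Options";

export default function AnimatedOptions() {
  return (
    <div className="grid grid-cols-2 md:grid-cols-3 lg:grid-cols-5 gap-10 mt-10">
      {CoachingOptions.map((option, index) => (
        // delay grows with the index, so the cards fade in one after another
        <BlurFade key={option.icon} delay={0.25 + index * 0.05} inView>
          <div className="p-3 bg-secondary rounded-3xl flex flex-col items-center cursor-pointer">
            <Image src={option.icon} alt={option.name} width={150} height={150}
                   className="h-[70px] w-[70px] hover:rotate-12 transition-all" />
            <h2 className="mt-2">{option.name}</h2>
          </div>
        </BlurFade>
      ))}
    </div>
  );
}
```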
Now, when the user clicks any of these options we want to open a dialog, and in that dialog we’ll accept the topic name the user wants to have a conversation about, plus a choice of tutor guide; in the mockup you can see three guides, and that’s roughly how we want to show them on this dialog. Back in the application we have these five different kinds of agents, you could say, and clicking any of them should open the dialog. Obviously we’re not going to create a different dialog for each of them; we’ll create only one, and the only thing that changes is the information about which agent the user selected, which we pass into that dialog.

So, very first thing, create a new component called UserInputDialog.jsx with a default template. For the dialog itself I’m going to use the shadcn Dialog component: go to shadcn and search for the Dialog component; click it and you can see how the dialog opens. First you need to install it, so in a new terminal run the add command for the dialog component, and while it installs, copy the import statement along with the example, paste them in, and save. If you look at the example, the DialogTrigger is what opens the dialog, on the click of an Open button; but we want to open it on the click of our option cards, so simply have the component accept children and render those children inside the DialogTrigger.

Now go to FeatureAssistants (one more thing: I updated the list name to CoachingOptions; the rename doesn’t really matter, I just renamed that particular list), and wrap each card inside UserInputDialog, so the card becomes the children, and therefore the trigger, for this dialog. Along with the children I’m also going to pass some data: coachingOption={option}, with the prop accepted on the other side; save it. Go back, refresh once, and open any of them; the dialog opens, though the styling is a bit off, and we’ll fix that next.
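Here’s the shape of that wrapper so far (a sketch; the title, description, and asChild details get filled in over the next steps):

```jsx
// app/(main)/dashboard/_components/UserInputDialog.jsx (sketch)
// One shared dialog: the clicked card arrives as children and becomes the trigger,
// and the selected option's data arrives via the coachingOption prop.
"use client";
import {
  Dialog, DialogContent, DialogDescription,
  DialogHeader, DialogTitle, DialogTrigger,
} from "@/components/ui/dialog";

export default function UserInputDialog({ children, coachingOption }) {
  return (
    <Dialog>
      <DialogTrigger asChild>{children}</DialogTrigger>
      <DialogContent>
        <DialogHeader>
          <DialogTitle>{coachingOption.name}</DialogTitle>
          <DialogDescription asChild>
            <div className="mt-3">
              {/* topic textarea, expert picker, and buttons are added next */}
            </div>
          </DialogDescription>
        </DialogHeader>
      </DialogContent>
    </Dialog>
  );
}
```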
To fix the styling, I copied the content over and removed the extra class names; we still want the dialog centered on the screen, and we don’t need the padding. I’m just updating the style, nothing else, and yes, that’s pretty cool. Perfect.

Next, in UserInputDialog I’ll drive the DialogTitle from coachingOption.name, since that’s what we passed in: click Question Answer and the title reads Question Answer; select Mock Interview and it shows Mock Interview, and so on. Then, inside the DialogDescription, I’ll simply add a div with the text “Enter a topic to master your skills in”, followed by that name, and save. Open it and you’ll see the prompt, but there’s an error, the hydration error: the fix is to add asChild to the description, and if I reopen the dialog, the error is no longer present. I’ll also give the div a top margin of 3.

Then we want to add a textarea, and shadcn has a Textarea component: copy its install command, run it, and once it’s installed, add the Textarea after the h2 with a placeholder of “Enter your topic here”. Go back and you’ll see the text area; perfect. I’ll give it a little top margin of 2, and for the h2 I’ll make the text black.

Next we want to show the teachers, the expert guides the user can pick. For that I’ll copy a similar h2, paste it with a top margin of 5, and then define the options themselves in services/Options.jsx: export const CoachingExpert, a list where each entry has a name. The name is obviously the person’s name, but we’re also going to use that same name to pick the voice: later in this course we’ll use Amazon Polly, a text-to-speech AI service, so make sure you keep names that match Polly voices (I’ll also show you how to get the full list of names). For the avatar, the first entry points at t1, which turns out to be an .avif file (interesting), already in our public folder; I’ll paste the entry a couple more times, naming the second one Salli and the third Matthew, with t3 as a .jpg, and save. Now let’s render these experts in the dialog.
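The list as just typed (the first expert’s name isn’t spoken clearly in the video, so “Joanna” is my placeholder, and t2.jpg is an assumed file name; the other two follow Amazon Polly’s voice names):

```jsx
// services/Options.jsx (addition) -- names double as Amazon Polly voice IDs later
export const CoachingExpert = [
  { name: "Joanna", avatar: "/t1.avif" }, // placeholder name; use a real Polly voice
  { name: "Salli", avatar: "/t2.jpg" },   // avatar file name assumed
  { name: "Matthew", avatar: "/t3.jpg" },
];
```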
Map over CoachingExpert with (expert, index) and an arrow function; inside, add an Image tag (not a text tag, sorry) with src={expert.avatar}, alt={expert.name}, and a width of around 100 with a matching height; those values just feed the optimizer. Back in the application you can see the options. Perfect. I’ll also wrap each one in a div with key={index} and an h2 for the name. For the wrapping div, make it a grid: three columns on small screens and five from medium up, with a gap of, say, 6; and that’s how it looks. For the image, add a className: rounded-2xl, then give it a proper rendered size of h-[80px] and w-[80px] with object-cover so it doesn’t distort; this is how it looks. Give the outer div a top margin of 3 for some space, and center the name with text-center.

Whenever we hover over an expert we want a little zoom effect, so on hover I’ll scale it to 105, and for a smooth animation add transition-all; hover over one and you can see it. I also forgot to add the cursor-pointer, so add that too; now hovering shows the pointer. Once the user selects someone we want to show a border, so to save the selection write const [selectedExpert, setSelectedExpert] = useState(), and on click call setSelectedExpert(expert.name); note we’re saving just the name of the selected expert.
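A sketch of the picker with the selection state and the conditional border we’re about to refine (extracted as its own component purely for illustration):

```jsx
// expert picker (sketch): save the clicked name, then style the selected avatar
"use client";
import { useState } from "react";
import Image from "next/image";
import { CoachingExpert } from "@/services/Options";

export default function ExpertPicker() {
  const [selectedExpert, setSelectedExpert] = useState();
  return (
    <div className="grid grid-cols-3 md:grid-cols-5 gap-6 mt-3">
      {CoachingExpert.map((expert, index) => (
        <div key={index} onClick={() => setSelectedExpert(expert.name)}>
          {/* the border condition lives on the image, as refined just below */}
          <Image src={expert.avatar} alt={expert.name} width={100} height={100}
                 className={`rounded-2xl h-[80px] w-[80px] object-cover p-1
                   hover:scale-105 transition-all cursor-pointer
                   ${selectedExpert === expert.name ? "border-2 border-primary" : ""}`} />
          <h2 className="text-center">{expert.name}</h2>
        </div>
      ))}
    </div>
  );
}
```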
For the border itself, add the className conditionally; make sure to use curly braces, since it depends on a condition. The condition: if selectedExpert === expert.name, then with the && operator apply a border, say border-2; I’ll also give a padding of 1 around it and rounded corners, and save. Let’s see how it looks: open the dialog, select one, and you’ll see the border on the selected expert. Perfect. Actually, instead of styling the div, let’s apply it to the image only: wrap the image’s className in backticks (a template literal), write the condition there, and remove it from the div; save. Now when I select one, the border is visible but very tight, so keep the border and add a padding of 1; select again and the chosen one stands out. Perfect. You can even change the border color, say border-primary; I think that’s much better. See? Perfect.

Then at the bottom we want two buttons: after the experts div add one more div with a Cancel button, given variant="ghost", and a Next button. Give the div flex with a gap of 5 and justify-end so they sit on the right-hand side, plus a top margin of 5, and our simple, clean dialog is ready.

Lastly, make sure that when the user enters the topic we save it in a state: define const [topic, setTopic] = useState(), and on the textarea add an onChange handler that receives the event and calls setTopic(e.target.value), then save.
Perfect. Now if I open the dialog, the user can enter a topic, pick an expert, and click Next; but we have to make sure Next is only enabled when both the topic and the selected expert are set. So on the button add a disabled condition: disable it if there’s no topic or no selected expert, i.e. disabled={!topic || !selectedExpert}. Open the dialog and it starts disabled; enter a topic and select an expert, and boom, it’s enabled; remove the topic and it disables again. Perfect. On Cancel we want to close the dialog, and there’s a component for that: DialogClose, imported from components/ui/dialog; wrap the Cancel button inside it, and don’t forget to mark it asChild. Save, go back to the application, click Cancel, and see, it closes.

Next doesn’t close the dialog yet, and that’s deliberate: on the click of Next we want to save the user’s choices, i.e. which coaching option they picked (Mock Interview, in my case), the topic they entered, and the expert they selected. I also just noticed we kept the old wording on the expert-section label, so let’s go to the dialog and update it to “Select your coaching expert”; now it reads right.

To save this information we need a table in our database, so go to the schema.js file inside the convex folder and create a new table called DiscussionRoom with defineTable, providing the column names. The first column I’ll name coachingOption (you could say coachingType), of type v.string() since it’s a string; then the topic name, again v.string(); then the expertName, also of type string; and then conversation, where we’ll save the back-and-forth between the expert and the user. That one is an optional field, so wrap it in v.optional(), and it’s of type v.any() because we’re going to save JSON data into this column. As soon as you save, you’ll see the DiscussionRoom table created inside Convex.
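The schema with the new table added (the users table carries over from earlier, with subscriptionId now optional as changed before):

```js
// convex/schema.js (updated) -- conversation is optional and typed v.any()
// because the raw JSON transcript will be stored in it later
import { defineSchema, defineTable } from "convex/server";
import { v } from "convex/values";

export default defineSchema({
  users: defineTable({
    name: v.string(),
    email: v.string(),
    credits: v.number(),
    subscriptionId: v.optional(v.string()),
  }),
  DiscussionRoom: defineTable({
    coachingOption: v.string(),
    topic: v.string(),
    expertName: v.string(),
    conversation: v.optional(v.any()),
  }),
});
```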
Now let’s create a new function file called discussionRoom.js; whatever functions relate to a discussion room, mutations or queries, we’ll keep in this one file. Inside it, export const CreateNewRoom as a mutation. For the arguments I’ll simply copy the fields from the schema, since we want all of them except conversation, so remove that one (and remember to import v). Then the handler, async with ctx and args: const result = await ctx.db.insert("DiscussionRoom", ...), providing the table name and then all the fields: coachingOption from args.coachingOption, topic from args.topic, and expertName from args.expertName. Once the record is inserted successfully, result is nothing but the inserted record’s ID, so return it; this return is very important, because we’ll pass that ID into a dynamic route we create later.

Now go to UserInputDialog and define the mutation there: const createDiscussionRoom = useMutation(api.discussionRoom.CreateNewRoom). Then create a new method, const onClickNext (it belongs to the Next button): inside, const result = await createDiscussionRoom(...), passing the topic name we already have in state, the coachingOption, which we get from coachingOption.name since we’re saving just the name, and the expertName, which is nothing but the selectedExpert state; mark the function async and attach it to the button’s onClick. Also, while the data is inserting, we’ll show a loading state, initially false: set it to true when Next is clicked and back to false when it finishes, and whenever loading is true, show a loading indicator, a loader icon with animate-spin. I’ll also disable the button while loading so the user can’t click multiple times. I think that’s all we need; I’ll console.log the result as well, and save.

Back in the application, refresh once and let’s try it: say you want a lecture on a topic, open it, give the topic name, “I want to learn React JS basics”, select a coaching expert, let’s pick Matthew, open the inspect panel so we can observe the result, and click Next. The first attempt throws an error asking “are you running npx convex dev?”; we are, and the real culprit is that we forgot to save the file (you can see the unsaved dot on the tab). Save, try again with “basic React JS”, click Next, and boom: inside the console we get the ID of the record we just inserted, with its generated unique ID. In the table you can see the coachingOption, the (empty) conversation, the expertName, the topic, and the creation time, all now saved. Perfect; that’s exactly what we wanted. As soon as this ID is generated we want to navigate to a new screen where the user can start learning from our AI voice agent.
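A sketch of both sides of that flow (NextButton is an illustrative extraction, not a separate component in the video, and Loader2 from lucide-react is my assumed spinner icon):

```js
// convex/discussionRoom.js (sketch)
import { mutation } from "./_generated/server";
import { v } from "convex/values";

export const CreateNewRoom = mutation({
  args: {
    coachingOption: v.string(),
    topic: v.string(),
    expertName: v.string(),
  },
  handler: async (ctx, args) => {
    const result = await ctx.db.insert("DiscussionRoom", {
      coachingOption: args.coachingOption,
      topic: args.topic,
      expertName: args.expertName,
    });
    return result; // the inserted record's _id -- we route with it next
  },
});
```

```jsx
// the relevant client-side pieces of UserInputDialog (sketch)
"use client";
import { useState } from "react";
import { useMutation } from "convex/react";
import { api } from "@/convex/_generated/api";
import { Loader2 } from "lucide-react"; // assumed spinner icon
import { Button } from "@/components/ui/button";

export function NextButton({ topic, coachingOption, selectedExpert }) {
  const createDiscussionRoom = useMutation(api.discussionRoom.CreateNewRoom);
  const [loading, setLoading] = useState(false);

  const onClickNext = async () => {
    setLoading(true);
    const result = await createDiscussionRoom({
      topic,
      coachingOption: coachingOption.name,
      expertName: selectedExpert,
    });
    console.log(result); // the new DiscussionRoom id
    setLoading(false);
  };

  return (
    <Button disabled={!topic || !selectedExpert || loading} onClick={onClickNext}>
      {loading && <Loader2 className="animate-spin" />} Next
    </Button>
  );
}
```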
Before that, let’s understand the workflow of our application. The very first step is to connect to the server and get access to the microphone: once you open the application, we have to make sure the browser grants mic access so you can talk. The next step is converting whatever you say into text, and for that we’re going to use AssemblyAI, which is the sponsor of this video. AssemblyAI has a feature called streaming speech-to-text, which gives us text in real time: as soon as you start speaking you get the text immediately, without any lag, and that’s the reason we’re using it. Once we get the microphone access, we also start a session with AssemblyAI, which we’ll learn about later.

The next step is getting an answer from an AI model whenever the user asks a question: once we have the text, we pass it, along with a prompt, to Gemini or any other model. I’m going to show you how to connect to different AI models, like Gemini, OpenAI’s ChatGPT, DeepSeek, Claude, and many others, completely for free. Once the model is set up, we get the answer back in the form of text, and then we convert that text into speech, this time using Amazon Polly, an AWS service that converts your text to speech, and again it’s also free. Once that’s ready, we play the audio to the user. That’s how all AI voice agents work, and this is the simplest version of the workflow; while implementing it we’ll follow a step-by-step process, and each time I’ll tell you where we are in the flow so you understand how to implement and integrate all of these AI pieces.

Now, once we’ve added all this information to our database, we need to close the dialog and navigate to the new screen, so we want to create a new route. This is what the new screen will look like when the user selects any of these options: an image, so you feel like you’re talking with some assistant or agent; on the right-hand side, the chat section, so as soon as you start speaking you see the chat between you and the assistant in real time, without typing anything; and a Connect Now button, which connects to the server, enables the microphone, and everything else.

First, inside UserInputDialog we have to make sure that once we save the information we close the dialog, and we can do that programmatically. Define the state const [openDialog, setOpenDialog] = useState(false); when the user’s information is saved successfully, simply call setOpenDialog(false); and on the Dialog set open={openDialog} along with its companion prop, onOpenChange={setOpenDialog}. Save, make sure everything is in order, and let’s test this out.
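A minimal sketch of that controlled-dialog pattern (the trigger still opens the dialog, but we can now close it programmatically after the save succeeds):

```jsx
// controlled shadcn Dialog (sketch)
"use client";
import { useState } from "react";
import { Dialog, DialogContent, DialogTrigger } from "@/components/ui/dialog";

export function ControlledDialogDemo({ children }) {
  const [openDialog, setOpenDialog] = useState(false);
  const onSaved = () => setOpenDialog(false); // call after the mutation resolves
  return (
    <Dialog open={openDialog} onOpenChange={setOpenDialog}>
      <DialogTrigger asChild>{children}</DialogTrigger>
      <DialogContent>
        <button onClick={onSaved}>Next</button>
      </DialogContent>
    </Dialog>
  );
}
```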
Go back, refresh once, open one of the options, add a topic, “I want an interview on React”, select any coaching expert, and click Next; boom, and obviously it’s inserted into our table as well. That’s how it works.

Now, as soon as it’s saved we want to navigate to the new route, so let’s create that route first. Inside the main folder I’m going to create a new folder called discussion-room, and inside it a dynamic route: the room ID, which will be different every time. So inside discussion-room, create a folder whose name is the room ID wrapped in square brackets, and inside that a page.jsx; add a default template, call the component DiscussionRoom, and save. Let me show you how it works: if I go to /discussion-room/1, it navigates to that particular page, and you can see the Discussion Room text showing. On this URL you can pass any ID, it doesn’t matter, because it’s dynamic; that’s the reason, whenever you want to create a dynamic route, you give the folder name in square brackets, and whatever you pass in the URL fills that segment.

In order to get the room ID, you just write const { roomid } = useParams(); the name has to match that folder name, and useParams is the hook we’re using. I’ll also make this component client-side and save. Then I’ll console.log the roomid so you can check whether it’s correct: open the inspect panel, go to the console, and you’ll see the ID; this one matches that one. Perfect. But what we actually want is to navigate here with the ID of the record we just created. So back in UserInputDialog, as soon as you click Next, we navigate: define const router = useRouter() from next/navigation, and then call router.push('/discussion-room/' + result), where result is nothing but the inserted ID we got back from the mutation. Save, and let’s test: choose Lecture on Topic, enter “I want to learn about the history of India”, select any coaching expert, hit Next, and boom, you’ll see our path along with the ID, which is the record ID we got from Convex, and we land on the discussion room screen. That’s how you route it, guys.

Now we need to build this screen to match the design, and one more important thing, if you observe: we have the header, but you don’t need to add it, because this route already uses the dashboard layout file. First, though, with the help of this room ID we need to get the record information from the database.
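A sketch of the dynamic page so far (folder named discussion-room/[roomid] per the video; the bracketed folder name becomes the URL segment, and useParams exposes it under the same name):

```jsx
// app/(main)/discussion-room/[roomid]/page.jsx (sketch)
"use client";
import { useParams } from "next/navigation";

export default function DiscussionRoom() {
  const { roomid } = useParams();
  console.log(roomid); // the Convex record id taken from the URL
  return <div>Discussion Room</div>;
}
```

And on the dialog side, the navigation is just router.push('/discussion-room/' + result) with the router obtained from next/navigation’s useRouter.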
So go to the convex/discussionRoom.js file, and here we write a new function to get the room details, the discussion-room or record details, you could say. Write export const GetDiscussionRoom; this one isn’t a mutation (sorry, not an arrow-in-a-mutation), it’s just a read, so we pass a query. Inside the query we accept the arguments, and the argument is just the ID: v.id(...), where you have to tell it which table the ID belongs to, which is nothing but DiscussionRoom, so copy that name and paste it in. Then add the handler, async with ctx and args: const result = await ctx.db.get(args.id). Notice you don’t name a table here; since we already declared which table the ID comes from (that’s what v.id("DiscussionRoom") did), Convex automatically detects where you’re fetching from. Once we have the result, just return it and save. As simple as that.

Now, inside the page.jsx you have two options for fetching: you can use the useConvex hook or the useQuery hook. I’m going to simply use useQuery: const discussionRoomData = useQuery(api.discussionRoom.GetDiscussionRoom, { id: roomid }), passing the room ID we already have as the argument. Console.log discussionRoomData and save. Back in the application, open the inspect panel just to verify whether we got the data, refresh the screen once, and yes, if you see, we got it: the coaching option, the expert name, and the topic name. Perfect.

With the help of this information (the record itself we now have), we still need the expert’s full details: the record only stores the name, so if I go to our options, from the name we need to fetch the avatar and the rest. So let’s write const expert = CoachingExpert.find(item => item.name === discussionRoomData.expertName) (the field is actually called expertName, and make sure CoachingExpert is imported), then console it to check whether we’re getting it, and save. On refresh, though, it says the expert name is undefined, because it takes some time to fetch the data. So the other option is to do the lookup inside a useEffect that depends on discussionRoomData, adding a condition so it only runs when the data is available; whenever discussionRoomData updates, that line of code executes. And since we need to keep the expert’s information around, save it in state: const [expert, setExpert] = useState() (make sure it’s useState), and call setExpert with the expert we found. Perfect, then save.
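The query and the page-side fetch as sketches (useQuery re-renders when the data arrives, which is why the expert lookup lives in a guarded effect):

```js
// convex/discussionRoom.js (addition) -- v.id("DiscussionRoom") ties the argument
// to the table, so ctx.db.get needs no table name
import { query } from "./_generated/server";
import { v } from "convex/values";

export const GetDiscussionRoom = query({
  args: { id: v.id("DiscussionRoom") },
  handler: async (ctx, args) => {
    return await ctx.db.get(args.id);
  },
});
```

```jsx
// app/(main)/discussion-room/[roomid]/page.jsx (sketch of the fetch logic)
"use client";
import { useEffect, useState } from "react";
import { useParams } from "next/navigation";
import { useQuery } from "convex/react";
import { api } from "@/convex/_generated/api";
import { CoachingExpert } from "@/services/Options";

export default function DiscussionRoom() {
  const { roomid } = useParams();
  const discussionRoomData = useQuery(api.discussionRoom.GetDiscussionRoom, { id: roomid });
  const [expert, setExpert] = useState();

  useEffect(() => {
    // the query returns undefined until the data arrives, so guard the lookup
    if (discussionRoomData) {
      setExpert(CoachingExpert.find((item) => item.name === discussionRoomData.expertName));
    }
  }, [discussionRoomData]);

  return <div>{discussionRoomData?.coachingOption}</div>;
}
```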
Go back and refresh the screen, and now we got the data with the avatar and the name. Let’s add an h2 at the top and give it the name from the discussion-room record, which is the coachingOption, right? That’s what we saved; let me check, and yes, you see “Lecture on Topic”. Let’s apply some font styling; a className (not style, sorry): text-lg and font-bold. Then add a div with two inner divs and a top margin of 5; per the mockup, the first section is the avatar panel and the right one is the chat section.

First, the image: we fetch it from expert.avatar; give it a width and height of 200 for now, just to make sure it displays on the screen. Right away there are two errors: the data isn’t there on the first render, so use the optional-chaining operator (expert?.avatar), and “expert is not defined”, obviously because the state variable starts with a small e; fix that, and we have the image, beautiful. Style it with a className: h-[80px] and w-[80px], rounded-full, and object-cover for the image, and save. Perfect.

Now let’s divide this whole div into columns: make it a grid, two columns when the screen is smaller and four columns when it’s larger. Out of those four columns I’m going to assign three to the avatar panel, so it needs col-span-3, and one column to the chat section, only on larger screens; and let’s give a gap of 10. If I look now you won’t see much change, but you will shortly. Then style the avatar container with some Tailwind classes: a height (I first wrote 60 pixels, more on that in a second), a background of the secondary color, a border, rounded-4xl corners, and flex with flex-col, items-center, and justify-center. Hmm, that’s not what I expected, but let’s see why: two problems. I gave the col-span class an incorrect name, and fixing that restores the width; and the height still looked off because 60 pixels isn’t what we want; it needs to be 60vh, the vertical viewport height. Change that, and boom, that’s how it should look. Perfect.
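A skeleton of where this layout ends up (a fragment of the page’s return; spans, the name, and the pulse effect are the bits added in the very next steps, and the exact class values are per the video):

```jsx
// discussion room layout skeleton (sketch)
<div className="mt-5 grid grid-cols-2 lg:grid-cols-4 gap-10">
  {/* avatar panel: relative bounds the absolutely-positioned user button added next */}
  <div className="lg:col-span-3 h-[60vh] bg-secondary border rounded-4xl
                  flex flex-col items-center justify-center relative">
    <Image src={expert?.avatar} alt="avatar" width={200} height={200}
           className="h-[80px] w-[80px] rounded-full object-cover animate-pulse" />
    <h2 className="text-gray-500">{expert?.name}</h2>
  </div>
  {/* chat panel */}
  <div className="h-[60vh] bg-secondary border rounded-4xl
                  flex flex-col items-center justify-center">
    <h2>Chat Section</h2>
  </div>
</div>
```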
Add an h2 tag with expert?.name and save; now the name shows, so you know who you're talking with. Style it with text-gray-500. For the image we can also add an animate-pulse effect: it glows and fades, something like a "talking / on a call" effect. On the right-hand side of the panel we want to show your own profile image, so add a div containing the UserButton, with a class name of p-5, a gray background (bg-gray-200), px-10, and rounded-lg. We want it in the corner, so make it absolute with bottom-10 and right-10. It jumps to the right edge of the whole screen, though, and we want it here inside the panel: mark the panel div as relative, which bounds the positioning, and now it sits inside. Perfect.

Next, the chat box on the right. I'll copy the same panel div and paste it: the h-[60vh] and the rest are fine, just put "Chat Section" in the h2. It drops to the bottom at first; the reason is the leftover col-span, so remove that and the chat section appears on the right. Beautiful, but it's quite small, so change the spans: make this one two columns and that one three, which looks much better.

Below this we'll add a button labeled Connect. We need to wrap things in a div (making sure the existing content goes in one div so the button can go in another), then add the button with the text "Connect", and give the wrapper a class name of mt-5 flex items-center justify-center. Save, and there's the button; it's what will connect us to the server so the conversation can start. I'll also write one message just below the chat section: wrap it in another div with an h2 saying "At the end of your conversation we will automatically generate feedback/notes from your conversation". Generating feedback and notes from the user's conversation is something we'll build later; I didn't include it in the workflow diagram, but it's an important part. Save, style the message with mt-5, text-gray-400, and a smaller text size, and that looks much better. Finally, there's a lot of empty space at the top, so on the parent div add a top margin of about 12, but negative: -mt-12 is how you write it in Tailwind CSS.
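For reference, here's roughly where the markup lands after all that fiddling. This is a sketch only: the Tailwind classes approximate the final state, the exact nesting may differ from yours, and UserButton is assumed to be Clerk's component from earlier in the build:

```jsx
<div className="-mt-12">
  <h2 className="text-lg font-bold">{DiscussionRoomData?.coachingOption}</h2>
  <div className="mt-5 grid grid-cols-1 lg:grid-cols-3 gap-10">
    <div>
      {/* Expert panel -- relative so the UserButton can be pinned inside it */}
      <div className="h-[60vh] bg-secondary border rounded-4xl
                      flex flex-col items-center justify-center relative">
        <img src={expert?.avatar} alt="avatar"
             className="h-[80px] w-[80px] rounded-full object-cover animate-pulse" />
        <h2 className="text-gray-500">{expert?.name}</h2>
        <div className="p-5 bg-gray-200 px-10 rounded-lg absolute bottom-10 right-10">
          <UserButton />
        </div>
      </div>
      <div className="mt-5 flex items-center justify-center">
        <Button>Connect</Button>
      </div>
    </div>
    {/* Chat column, spanning the remaining width on large screens */}
    <div className="lg:col-span-2">
      <div className="h-[60vh] bg-secondary border rounded-4xl
                      flex flex-col items-center justify-center">
        <h2>Chat Section</h2>
      </div>
      <h2 className="mt-5 text-gray-400 text-sm">
        At the end of your conversation we will automatically generate
        feedback/notes from your conversation
      </h2>
    </div>
  </div>
</div>
```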
With that, the extra margin is gone; much better, everything fits on one screen with no scrolling. Now it's time to enable the microphone, the first thing we want to implement. I'm going to make your life easier by providing the source code for this, and it's quite straightforward. Copy the snippet from the documentation (I'll keep the link in the description). It needs to run when we click Connect, so create const connectToServer = () => {...} as an arrow function and paste the code inside; there's a lot of stray whitespace, so clean that up and save. Wire it up on the Connect button with onClick={connectToServer}.

We also have to show a Disconnect button once connected, to disconnect from the microphone and the server. So create a state: const [enableMic, setEnableMic] = useState(false). When enableMic is false we show the Connect button (so the condition is the opposite), otherwise we show a Disconnect button. For that one I'll use the destructive variant; let me grab the variant name from the button component and paste it in. It calls a new disconnect method, which I'll write here as well. Save it.

The snippet records from the browser with RecordRTC, so we have to import that first: search "recordrtc" on Google to find the npm package, copy the install command, and run it (I needed sudo). Once it's installed you can import it easily, and you'll see it importing from recordrtc. I'll also comment out the transcriber line for now, since we don't have that yet; other than that we're good. At the top, make sure to define the recorder, because this is very important: const recorder = useRef(null), importing useRef too, so we can refer to it to start the recording and, once we're finished, stop it. Inside disconnect, accept the event and call e.preventDefault() (this needs to be there), then recorder.current.pauseRecording().
We also set recorder.current back to null, make sure enableMic goes to false on disconnect and to true when connecting, then save. Back in the application I'll refresh. One more thing I forgot: at the top declare let silenceTimeout. This is needed to check whether the user has gone silent: if the user pauses, it means the AI now has to do its work, because the user is waiting for the AI's answer. That's why we add it.

I don't know why, but we got an internal server error. Refreshing again shows it's because we added RecordRTC: Next.js first runs the file on the server, and only then checks whether it's a client component. To fix it I import RecordRTC dynamically, with server-side rendering set to false (inside connectToServer I've temporarily commented the call to test this out). I think we're good; let's test. Clicking Connect says permission denied, microphone not allowed, because I had manually disabled it. Once you enable it, reload the application and click Connect: the microphone turns on and the tab shows it's in use, which means whatever we speak is being captured. To verify we're recording, I'll console-log the buffer: click Connect and you'll see it start recording; as soon as I speak the values change, and clicking Disconnect pauses the recording so nothing more is captured. That's how you connect and enable the microphone, and you can connect and disconnect the call depending on your requirement.
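Here's a sketch of the finished connect/disconnect pair. The RecordRTC options follow the documentation snippet the video copies; the lazy import inside the handler is one way of keeping Next.js from ever evaluating the library during server-side rendering, and the hooks would of course live inside the component:

```jsx
import { useRef, useState } from "react";

const recorder = useRef(null);                       // inside the component in practice
const [enableMic, setEnableMic] = useState(false);   // toggles Connect/Disconnect
let silenceTimeout;                                  // fires when the user stops talking

const connectToServer = async () => {
  setEnableMic(true);
  // Lazy import: RecordRTC touches browser APIs, so it must never run on the server
  const RecordRTC = (await import("recordrtc")).default;

  navigator.mediaDevices.getUserMedia({ audio: true }).then((stream) => {
    recorder.current = new RecordRTC(stream, {
      type: "audio",
      mimeType: "audio/webm;codecs=pcm",
      recorderType: RecordRTC.StereoAudioRecorder,
      timeSlice: 250,               // hand us a blob every 250 ms
      desiredSampRate: 16000,       // matches AssemblyAI's expected rate
      numberOfAudioChannels: 1,
      bufferSize: 4096,
      audioBitsPerSecond: 128000,
      ondataavailable: async (blob) => {
        clearTimeout(silenceTimeout);          // reset the silence timer on every chunk
        const buffer = await blob.arrayBuffer();
        console.log(buffer);                   // later: realtimeTranscriber.current?.sendAudio(buffer)
        silenceTimeout = setTimeout(() => console.log("user paused"), 2000);
      },
    });
    recorder.current.startRecording();
  });
};

const disconnect = (e) => {
  e.preventDefault();
  recorder.current?.pauseRecording();
  recorder.current = null;
  setEnableMic(false);
};
```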
The next step is converting our speech to text, and for that we'll use AssemblyAI, streaming the speech to text in real time. It's very useful: whatever text we get back from AssemblyAI, we'll pass on to the AI model. Go to assemblyai.com or click the link in the description. AssemblyAI is a platform for converting speech to text, and the product we want is streaming speech-to-text: it happens in real time, so as soon as you start speaking, the text appears on screen. It offers lots of other features too, which you can test in their playground; AssemblyAI is honestly one of my favorite platforms, and I used it in one of my previous videos. Create a new account if you don't have one and log in. On the home screen you'll see code examples for transcribing your first audio file, plus plenty of other models; what we want is transcribing a live audio stream. The very first thing is installing the SDK, so copy the assemblyai install command and run it in your terminal. Once installed there's sample code you could use as-is, but we're going to write it a little differently, in a way fully compatible with React and Next.js.

Let me also walk you through the AssemblyAI dashboard. Whenever you call the AssemblyAI API you'll see the usage here, along with how much it cost; you also get $50 of credit when you join, and analytics depending on how much you use. In the account section you'll find the billing area with an option to add funds, plus all the pricing. The most interesting part: you can set an alert. Right now mine is set to $1, so whenever I'm about to reach $1 of spend I get a notification and can take action accordingly. Under API keys you'll find your API key, which we'll want later in this project, and the documentation link takes you to the AssemblyAI API docs, which help you integrate with any kind of application or platform.

First, copy the API key, go back to your project, and paste it into your .env.local file: ASSEMBLY_API_KEY equals the key, then save. I'll close all these tabs for now. With the package installed, go to the discussion-room page.jsx and define const realtimeTranscriber = useRef(null), initially null. Inside connectToServer we initialize it first: realtimeTranscriber.current = new RealtimeTranscriber(...), which takes a token. We need to generate a fresh token every time a session starts; generating a new one each time helps prevent unauthenticated API calls to AssemblyAI. It also takes a sample rate, and AssemblyAI's recommendation is 16,000. Once you supply the token it gets initialized automatically, but the token has to be generated on the server side. So inside the app folder create a new folder called api, inside it another folder called getToken, and in there a route.jsx: this is the API endpoint we're creating, and in it we'll call AssemblyAI for the temporary token. It's quite simple: export an async function (this is a GET request, so call it GET), then const token = await... but let me define the client first: const assemblyAi = new AssemblyAI(...), making sure to import it, passing the API key from process.env (copy the ASSEMBLY_API_KEY name from the environment file and paste it). Then, with the client ready, we call the realtime token API.
We say assemblyAi.realtime.createTemporaryToken, and you can even set the token's expiration; let's say five or six minutes, something like that. Once you have the token, just return it: NextResponse.json(token), and save. That's how you create the API endpoint.

With the endpoint ready, go back to page.jsx; we need to call this HTTP endpoint to get the token. Inside the services folder let's create one file, GlobalServices.jsx, and write the method there (no default export needed, just the method). To make an HTTP call to this particular endpoint we'll use axios, the HTTP client library, so install it first: npm i axios. Once it's installed, write export const getToken = async () => {...}, an arrow function; inside, const result = await axios.get('/api/getToken'), then return result.data, which holds the token. Save it, and back in the page simply call it: await getToken(), and you'll see it importing from our GlobalServices. That's how you generate the token and then use it: every time you start a session, it generates a new token for you.
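Here's the whole token round-trip as a sketch. The route keeps the real key on the server and hands the browser a short-lived token; the env-var name and the five-to-six-minute expiry (360 seconds) are assumptions matching the video:

```js
// app/api/getToken/route.jsx
import { AssemblyAI } from "assemblyai";
import { NextResponse } from "next/server";

const assemblyAi = new AssemblyAI({ apiKey: process.env.ASSEMBLY_API_KEY });

export async function GET() {
  // Temporary token: the browser can open the realtime socket without ever seeing the key
  const token = await assemblyAi.realtime.createTemporaryToken({ expires_in: 360 });
  return NextResponse.json(token);
}
```

```js
// services/GlobalServices.jsx
import axios from "axios";

export const getToken = async () => {
  const result = await axios.get("/api/getToken");
  return result.data; // the temporary token string
};
```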
Now let's write the socket-style event handlers for AssemblyAI: realtimeTranscriber.current.on(...). For the event name I'll copy from the streaming TypeScript example in the docs; it's "transcript". So we write .on('transcript', async (transcript) => {...}), an async arrow function, and inside, for now, just console-log the transcript so we can test. Don't forget to connect: await realtimeTranscriber.current.connect(); the connect method opens the connection to AssemblyAI. Save it. When you disconnect, it's also very important to disconnect from AssemblyAI: realtimeTranscriber.current.close(), something like that; make disconnect async and save.

Let's test up to this point and check whether we get any data from AssemblyAI. Refresh, connect, and talk a bit... I'm not sure anything is happening; we're only getting the buffer logs from the mic and nothing else. What's missing is that whenever you speak, the buffer has to be passed to the socket, meaning to this transcript stream: only when you send it can AssemblyAI detect the change and respond. So first check that realtimeTranscriber.current exists, then send the buffer: realtimeTranscriber.current.sendAudio(buffer). Whatever is in the buffer is just the encoded audio of what we speak, and that's what we pass along. Save, refresh the screen, open the inspect panel and console, and connect: now we're getting data. Beautiful. Stop it, and look closely: the transcript objects come in different types. Whatever you say arrives in the text field as a PartialTranscript, but once it finalizes there's also a message type called FinalTranscript containing your complete sentence. The partial transcript is useful when you want to show text in real time; the final transcript is useful when you want the complete text once the user has paused for a few hundred milliseconds or a second.
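In code, the wiring described above looks something like this (a sketch; the getToken import path is an assumption, and the handler body is filled in next):

```jsx
import { useRef } from "react";
import { RealtimeTranscriber } from "assemblyai";
import { getToken } from "@/services/GlobalServices"; // assumed path

const realtimeTranscriber = useRef(null); // inside the component in practice

// Inside connectToServer, before the recorder starts:
realtimeTranscriber.current = new RealtimeTranscriber({
  token: await getToken(), // fresh temporary token per session
  sampleRate: 16000,       // AssemblyAI's recommended rate
});
realtimeTranscriber.current.on("transcript", async (transcript) => {
  console.log(transcript); // partial and final messages both arrive here
});
await realtimeTranscriber.current.connect();

// Inside ondataavailable, forward every encoded chunk to the socket:
if (realtimeTranscriber.current) realtimeTranscriber.current.sendAudio(buffer);

// Inside disconnect, close the socket along with the recorder:
await realtimeTranscriber.current.close();
```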
So at this point the simple part works: we can get the data in the transcript. The remaining piece is showing it on screen as soon as we start talking. Let's write the logic to update the state in real time. Define let texts = {} (curly braces), and a state: const [transcribe, setTranscribe] = useState() (I considered calling it realTimeText, but transcribe it is). Inside the transcript handler we first collect the text: store each piece in texts keyed by transcript.audio_start (you'll find that audio_start field on the transcript object), with transcript.text as the value. Then const keys = Object.keys(texts), sort the keys (we're effectively sorting by time, since that's how entries were saved), and loop for (const key of keys): if texts[key] exists, append it to a msg variable, which we define at the top of the handler. Every time the handler executes we rebuild msg, and once we have the whole message we save it in state with setTranscribe. To show it, add a div after the panel and render the transcribe value; save.

Now let's test this. I'm not sure it'll work, but connect again (it takes a moment to connect; we should add a loading indicator for that) and... something's missing, we have an error: "keys is not defined". Maybe a wrong variable name somewhere; yes, one spot needed to be keys, we typed it incorrectly. Connect again, and as soon as I start speaking you'll see the text printing on screen in real time. How cool is that! That's AssemblyAI giving you speech-to-text in real time, and look how easy it was: we didn't add a lot of code, we just had to take the data from the transcript event and assemble it properly so we can display it on screen.

I'll tidy up how it's displayed in a moment, but for the chat section we only want to keep the final transcript, not the partial ones. To do that, inside the handler (let me disconnect first, otherwise it keeps capturing everything I say, and refresh) add a check on transcript.message_type. The message type has two values, partial and final, and we want the final one. Define a list state: const [conversation, setConversation] = useState([]). When a final transcript arrives, push it: setConversation(prev => [...prev, { role: 'user', content: transcript.text }]), spreading the previous values and appending the new one. The role is user here, because we're adding what the user spoke.
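Here's the full transcript handler as a sketch, combining the live caption and the conversation push. The numeric comparator in the sort is my addition (Object.keys returns strings, and audio_start values are timestamps), and setConversation/setTranscribe are the states defined above:

```jsx
const texts = {}; // partials keyed by their audio_start time, newest overwriting oldest

realtimeTranscriber.current.on("transcript", async (transcript) => {
  let msg = "";

  if (transcript.message_type === "FinalTranscript") {
    // A finished utterance: append it to the conversation as a user turn
    setConversation((prev) => [...prev, { role: "user", content: transcript.text }]);
  }

  texts[transcript.audio_start] = transcript?.text;
  const keys = Object.keys(texts);
  keys.sort((a, b) => a - b); // time order reconstructs the sentence
  for (const key of keys) {
    if (texts[key]) msg += texts[key];
  }
  setTranscribe(msg); // the live caption shown on screen
});
```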
So whatever final transcript we get, we push it with the role "user" and whatever text AssemblyAI gave us as the content; as simple as that, then save. With the conversation state in place, the only remaining piece is displaying it as a chat, and once we have the user's side of the conversation we can move on to getting the answer from the AI.

Moving to the next section: getting the response from the AI model. We already have the text the user spoke, converted by AssemblyAI, and now we pass that text to an AI model. You can use any model here: Gemini, OpenAI, DeepSeek, Claude, or any other, and I'll show you how to get API access for free. Also, inside options.jsx, where we have the coaching options with their names and icons, I've added a prompt field. I'm going to share this file with you so you can use these prompts. Each prompt contains a user-topic placeholder, and obviously we'll replace that string with the actual topic the user entered. (I'll also rename the file so it's lowercase, then save.) Now close all these files; inside services, in GlobalServices, we create a new method: const AIModel = () => {...}, an arrow function where we'll write the logic for getting data from the AI model.

For that we're going to use openrouter.ai. OpenRouter hosts a lot of different AI models you can use for free (they have paid APIs too; it's up to you which one you use), and you can even try models out in their chat, completely free. In the Models section you'll find a bunch: this Gemma 3 1B, for example, is completely free, or you can filter by free models, where you'll see DeepSeek, Google Gemini, and lots of others. Use whichever you want. Sign up for an account, then select the model you want; say I want the free Gemini 2.0 experimental: select it and you'll see all its information. Then, inside the API section, create an API key: click Create API key (I already created one for the voice agent, so I'll copy mine, but creating one is completely free). Select TypeScript and you get a simple code example you can use directly in your application; but if you'd rather use a third-party SDK, click Framework Documentation, which shows how to use the OpenAI SDK, and that's what we'll use. So first make sure to install the openai package: copy the command, run it in the terminal, and once it's installed import OpenAI at the top. Then I'll copy everything as-is, starting with the client initialization.
Paste the initialization just above our method, so the client is created once. We don't want the default headers, so remove them, and replace the placeholder with the actual API key: I keep mine in .env.local, so copy that environment-variable name and reference it with process.env. With OpenAI initialized, go back to the documentation and copy the simplest completion example you can use (take the console.log too, to verify we're getting data), make the method async, and fill in the model you want to use: go back to the model's page on OpenRouter and copy the model name from there. And just to show you again, this one is completely free: you don't need to pay anything or add any card details.

Now, this AIModel method will accept two main fields: the user's topic (whatever topic the user selected) and, let's call it the instruction, the coaching option the user selected; if the user picked a given option, that's what we receive here. I think that's all we need for now. I'll comment out the completion call for a moment, because from the coaching option we first need to get the prompt: const option = CoachingOptions.find(item => item.name === coachingOption) (I had to check what we called the list; it's CoachingOptions). Whatever coaching-option name you pass in, if it matches, we have the option, and from it the prompt comes out as option.prompt.
In that prompt we need to replace the user-topic keyword with the real value: replace the user_topic string with the actual topic the user entered, which we already have. With the final prompt ready, uncomment the completion call. I'm going to add one more message with the role "assistant" (you could also use "system") whose content is this prompt. What that does: every time you send a request, it carries this prompt, so the answers keep relating to that particular instruction. That's why we add it. We also need to accept one more field, message: whatever the user just said gets passed along here too, then save. This is the simplest version that works; later we'll update this message field to get more accurate responses, because if you're talking with the AI voice agent for ten minutes, we obviously need to pass the history so it sends back relevant answers.

Once you're done, save this file, and go back to your component, the discussion page. Inside the final-transcript branch, once we have the final text, we call the AI model to get the answer: const aiResp = await AIModel(...), making sure to import it, and pass the three fields. First the topic, from DiscussionRoomData.topic; then the coaching option, DiscussionRoomData.coachingOption, the one we selected; and last the message, which is just the transcript.text we received. I'll bring that onto its own line, then console.log the aiResp and save.
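Assembled, the service method looks roughly like this. The model ID, the {user_topic} placeholder spelling, the env-var name, and the options import path are all assumptions; substitute whichever free model you picked on OpenRouter (dangerouslyAllowBrowser is explained right below):

```js
// services/GlobalServices.jsx -- a sketch of the OpenRouter call
import OpenAI from "openai";
import { CoachingOptions } from "@/services/Options"; // assumed path

const openai = new OpenAI({
  baseURL: "https://openrouter.ai/api/v1",
  apiKey: process.env.NEXT_PUBLIC_OPENROUTER_API_KEY, // assumed env-var name
  dangerouslyAllowBrowser: true, // only because we call it client-side; an API route is safer
});

export const AIModel = async (topic, coachingOption, msg) => {
  // Find the selected option and inject the user's topic into its prompt
  const option = CoachingOptions.find((item) => item.name === coachingOption);
  const PROMPT = option.prompt.replace("{user_topic}", topic);

  const completion = await openai.chat.completions.create({
    model: "google/gemini-2.0-flash-exp:free", // any free OpenRouter model works here
    messages: [
      { role: "assistant", content: PROMPT }, // the standing instruction
      { role: "user", content: msg },
    ],
  });

  console.log(completion.choices[0].message);
  return completion.choices[0].message;
};
```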
In the browser you might now see an error about dangerouslyAllowBrowser, because we're using the OpenAI client directly on the client side. You can either move the call to the server side by creating an API route, or simply pass dangerouslyAllowBrowser: true in the client options; after that, refreshing shows no error.

Also, when we click Connect we want to show a loading state, so we can disable the button and signal that the page is connecting. Create const [loading, setLoading] = useState(false); set it to true when the user clicks Connect, and back to false once connected successfully. On the Connect button, show a loader icon when loading is true, with the animate-spin class so it spins. Do the same for Disconnect: set loading to true on click and false once disconnected successfully, show the same spinner, and make sure the button is disabled whenever loading is true so the user can't click it again and again.

Now let's test. Open the inspect panel and the console so we can observe the result. Clicking Connect shows the loading state and disables the button, and now I can speak: "hey bro, how are you"... "hey, hi there"... and wow, you can see the role, a message, and we got the response. Perfect. (I talk a lot, which is why the answer doesn't always come immediately; disconnecting and trying again also works.) Explaining while testing makes the answers slower because I'm saying random things, so let me just have a conversation with it and observe: "hi Joanna, how are you? can you tell me a little more about India and its culture?" Now I disconnect, and you can see we got the response: there's a lot here; I asked about Indian culture and it says "let's explore the fantastic culture of India", plus plenty more. Pretty cool.

But notice the response is very long, and we have to minimize it. To do that you need to update our prompt, and that's very important: many people wonder why they aren't getting the exact response they expect, and updating this particular prompt is the answer to getting an exact one. You can put the prompt into ChatGPT and tell it you want it updated into a specific format, but don't worry, I've updated it for you so you can use it directly. Boom: the updated prompt is quite similar, but it adds that the answer must stay within about 120 characters. Once you mention that, which is the important part, then
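The loading wiring described a moment ago comes out something like this. Button from shadcn/ui and Loader2Icon from lucide-react are assumptions based on the components already used in this build:

```jsx
{!enableMic ? (
  <Button onClick={connectToServer} disabled={loading}>
    {loading && <Loader2Icon className="animate-spin" />} Connect
  </Button>
) : (
  <Button variant="destructive" onClick={disconnect} disabled={loading}>
    {loading && <Loader2Icon className="animate-spin" />} Disconnect
  </Button>
)}
```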
from then on it won't give you a big answer and you won't have to wait a long time. Look how long the answer was before; we don't want that, because this is a conversation between the AI assistant and us. You can also try the demo of this application (link in the description) to get an idea of how it works.

Now that we get an answer, we need to save it into our conversation list: we already save the user's turns, and now we also want the AI's responses. So after getting the AI response, call setConversation again: the previous values spread with ...prev, plus the aiResp. I'm adding the AI response as-is because it already arrives in the same structure we're using: role "assistant" when it's from the AI, plus a content field, which is exactly what we want, and that's why it can go straight into the conversation. Save it.

Next we show the whole conversation inside our chat box: whenever you speak, that text gets added to the chat section, and when the AI assistant gives a reply, that reply shows up too. For the chat-box design, here's how it will look: the messages the user speaks get added on one side and the AI's responses on the other. It's not that difficult. Let's go back to the application; we already have the chat-box section, so I'm going to turn it into a component. Inside the [roomid] folder create an _components folder, and in it a ChatBox.jsx file with a default component template. Cut the chat markup from the page, paste it inside ChatBox, and import the ChatBox component back in the page; I think we need to wrap it in a div, something like that, then save both files. Make sure nothing changes on the UI side: refresh once, and everything is as it was. Beautiful.

The ChatBox needs the conversation, so pass the conversation state as a prop. And for now, just for testing and design purposes, I'll seed two default messages: one with the role "assistant" (not "AI", it should be assistant) and the content "Hi", and another with the role "user" and the content "Hello". I'm only adding these so we have something to display and can design it according to our requirement.
Inside ChatBox, make sure to accept the conversation prop; now we iterate this list of conversation entries to display it. Add a div, and inside it render the conversation: conversation.map((item, index) => ...), each entry as a div containing an h2 with item.content, then save. Looking at the chat box, we have the "Hi" and "Hello", but they're showing in the center of the screen; remove the items-center justify-center classes and they move to the top-left corner. While we're here I'll make the rounding a little smaller (I think that looks good), add some padding, say p-4, and change the style a little.

Next, if a message is from the AI we want to show it on the left side, otherwise on the right. So add a condition: if item.role === 'assistant', render one h2 with item.content (we'll style it in a second); otherwise, when the item role is user, render a second h2. For the assistant h2, set the class name: p-1 px-2, background color primary, white text, inline-block (that's important), and, if you like, rounded-md for the corners. Save and look: that's what an assistant message will look like. Beautiful. Do the same for the user h2; I'll copy the classes as-is but swap bg-primary for bg-gray-200 and keep the text black, still inline-block and rounded-md. It just shows directly below the other message, though, and we don't want that; also give both bubbles mt-1 for a little top margin. To push user messages to the right, I tried justify-end on its own and nothing happened, because the style has to go on the wrapping div as a flex container. So on that div add a conditional class name: always flex, and inside a template literal with a dollar-sign expression, if item.role === 'user' add justify-end. Now "Hello" sits on the right and "Hi" on the left. Perfect; that's how you add it.

Now obviously, once you start speaking, the messages will display here, and if you have more than, say, ten or twenty messages, you need a scroller so the user can scroll properly. For that, we can actually drop the extra inner div, keep only the top one, and give it overflow-auto; save, and I think that will be fine. One more thing before testing: inside GlobalServices we never returned anything from AIModel, so make sure the method returns the response message; otherwise the UI side won't get it.
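Here's the ChatBox component as sketched so far; the class names approximate what's described above:

```jsx
// _components/ChatBox.jsx -- a sketch
function ChatBox({ conversation }) {
  return (
    <div>
      <div className="h-[60vh] bg-secondary border rounded-4xl p-4 overflow-auto">
        {conversation.map((item, index) => (
          // Flex row; user turns get pushed to the right edge
          <div key={index} className={`flex ${item.role === "user" && "justify-end"}`}>
            {item.role === "assistant" ? (
              <h2 className="p-1 px-2 mt-1 bg-primary text-white inline-block rounded-md">
                {item.content}
              </h2>
            ) : (
              <h2 className="p-1 px-2 mt-1 bg-gray-200 inline-block rounded-md">
                {item.content}
              </h2>
            )}
          </div>
        ))}
      </div>
      <h2 className="mt-5 text-gray-400 text-sm">
        At the end of your conversation we will automatically generate
        feedback/notes from your conversation
      </h2>
    </div>
  );
}

export default ChatBox;
```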
In page.jsx, once we have the response we're already adding it with setConversation, so it gets displayed inside our chat box as well. And look: I asked some questions, got some answers, and they're appearing in the chat box, pretty cool, with everything attributed correctly, which message is from the assistant and which from the user. You can obviously add more styling; in the ChatBox I'll just add a bit more top margin. You could also change the font size, but I'll keep it as it is; I think that's the better look. That's all we need here.

One more cosmetic thing: the scroll bar. If you want to keep the scroll bar, keep it; if you want to hide it, you can do that too. In Tailwind CSS you can't hide the scroll bar directly, so we use a third-party package: search Google for the tailwind scrollbar-hide npm package and install it. Normally you'd register it in the Tailwind config file, but we don't have that file in this setup, so I don't know whether it will work; let's just try. It does list Tailwind v4 CSS support, which is pretty cool, so I'll add the scrollbar-hide class and see whether it takes effect... I don't think it's working, but maybe it will after a refresh. If it doesn't, just leave the scroll bar as it is; it's no big deal.

Now one last important thing for this part. We're currently passing only one single message to the global API service, but instead we could pass the last two or three messages, or even the complete conversation, to the AI model. It's quite simple, inside this page.
jsx: instead of passing just the latest text, pass conversation context. Let's try passing the last two messages: const lastTwoMsg = conversation.slice(-2); slice(-2) gives you the last two entries, and we pass that instead. Inside the AIModel method, rename the parameter to lastTwoConversation, and in the messages array spread it in with ...lastTwoConversation: the spread destructures the list so its items land directly inside messages, then save.

I also made one quick fix while the video was paused. Inside connectToServer, we were making the AI model call from within the final-transcript handler; I moved that call into a useEffect instead, defining a fetchData async function inside the effect and calling it there. The reason: the effect executes whenever the conversation changes, but I also make sure the last message was added by the user before making the AI model call. That check is necessary because the effect fires on every conversation state change, including when we append the AI's own reply. Make sure you add this particular fix; it's very, very important.
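Here's a sketch of that rearrangement: the effect watches the conversation and only calls the model when the newest turn came from the user:

```jsx
useEffect(() => {
  async function fetchData() {
    // Guard: only react to turns the *user* just added, never to our own appends
    if (conversation[conversation.length - 1]?.role === "user") {
      const lastTwoMsg = conversation.slice(-2); // just enough context per request
      const aiResp = await AIModel(
        DiscussionRoomData.topic,
        DiscussionRoomData.coachingOption,
        lastTwoMsg
      );
      // ...then convert aiResp.content to speech and append aiResp to the conversation
    }
  }
  fetchData();
}, [conversation]);
```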
Now it's time to convert our text to speech, and for that we're going to use Amazon Polly on AWS. We've completed the other steps one by one, and this is one of the last: converting the text back to speech. Simply search Google for Amazon Polly; it's free to use for our purposes, so you don't need to pay anything. The page lists everything Polly gives you: real-life voices, customizable output, and a lot of other features. Sign in to your AWS account and search for Amazon Polly in the console search bar. Once you're in, you can even play around with it: test the different engines (with the standard engine, make sure to select a specific voice, and there are many), click "Listen", and try different engines along with different voices and different languages. Pretty cool.

Now, how do we enable this in code? First you need the Polly SDK: search Google for "AWS Polly npm" and open the first npm package, @aws-sdk/client-polly; copy the install command, go back to your project, and run it. Once it's installed, inside our GlobalServices create a new method, maybe after the existing ones: const ConvertTextToSpeech = async (text) => {...}, an arrow function accepting the text you want to convert. Inside, const pollyClient = new PollyClient(...), making sure to import it from @aws-sdk/client-polly. You need to provide a region, whichever one you want to use; you can find the region name in your AWS console. Right now I have us-east-1 selected, but you could pick anything; for Mumbai, say, you'd type ap-south-1. After that you provide the credentials: first an access key ID, and second a secret access key. We need to generate both and get them from AWS.

To generate them, go to your account menu and click Security credentials, then open Users. I already have a voice-agent user I created earlier, but you can create a new one: click Create user and give it a name, say "ai-coaching-voice-agent" or whatever you want, then click Next. Choose "Attach policies directly" and search for "polly": you'll see the Amazon Polly access policy; click it and then Next. Review all the changes (this step is required) and click Create user. Once the user is created, open it, go to Security credentials, and create an access key (scrolling down shows multiple options, but an access key is what we need). Choose "Application running outside AWS", since we're running outside AWS, click Next, optionally add a tag value (it's not mandatory, I believe), and create the access key. Boom, there's the access key we're going to use. Copy it, go to your environment file, and paste it: NEXT_PUBLIC_AWS_ACCESS_KEY_ID equals the key, and then one more, NEXT_PUBLIC_AWS_SECRET_KEY, pasting the secret there. Save, then click Done. You can even download the keys, but make sure you copy them now: you won't be able to view them again and would have to create everything from scratch. With that, Amazon Polly is enabled inside AWS.

Now let's use the keys: process.env with the access-key name and the secret-key name (I'll copy both names across), then save. I'll also make sure to export the method so we can use it later. Once your Polly client is initialized, build the command: const command = new SynthesizeSpeechCommand(...), imported from the same SDK package. You provide the Text (whatever text is passed into the method), the OutputFormat (it supports several, but we want "mp3"), and a VoiceId. There are lots of voices, and which one we use depends on the option the user picked, since we already pass the expert's name around; so accept the expert name as a second parameter and pass it here as the VoiceId. After this, inside a try/catch block, we need to generate the audio stream.
We write const { AudioStream } = await pollyClient.send(command): this sends the configuration to the Polly client and gives us back the audio stream, destructured as AudioStream (note the capital A; that's the exact field name on the response). That stream needs converting into bytes: const audioArrayBuffer = await AudioStream.transformToByteArray(). Then wrap it into a blob: const audioBlob = new Blob([audioArrayBuffer], { type: 'audio/mp3' }), making sure to wrap the buffer in an array. Finally, turn the blob into something you can play: URL.createObjectURL(audioBlob), and return that audio URL. As simple as that. If anything goes wrong, just console the error in the catch block. That's the complete logic for generating text-to-speech; save the file.
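Here's the whole method as a sketch. The region and the env-var names follow the video, and the VoiceId mapping assumes each coaching expert is named after a real Polly voice (for example "Joanna"):

```js
// services/GlobalServices.jsx -- sketch of the Polly call
import { PollyClient, SynthesizeSpeechCommand } from "@aws-sdk/client-polly";

export const ConvertTextToSpeech = async (text, expertName) => {
  const pollyClient = new PollyClient({
    region: "us-east-1",
    credentials: {
      accessKeyId: process.env.NEXT_PUBLIC_AWS_ACCESS_KEY_ID,
      secretAccessKey: process.env.NEXT_PUBLIC_AWS_SECRET_KEY,
    },
  });

  const command = new SynthesizeSpeechCommand({
    Text: text,
    OutputFormat: "mp3",
    VoiceId: expertName, // must be a valid Polly voice name
  });

  try {
    const { AudioStream } = await pollyClient.send(command);
    // In the browser the SDK response stream exposes transformToByteArray()
    const audioArrayBuffer = await AudioStream.transformToByteArray();
    const audioBlob = new Blob([audioArrayBuffer], { type: "audio/mp3" });
    return URL.createObjectURL(audioBlob); // a URL an <audio> element can play
  } catch (e) {
    console.error(e);
  }
};
```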
Inside page.jsx, once the AI response is ready we generate the audio: const url = await ConvertTextToSpeech(aiResp.content, DiscussionRoomData.expertName) (what did we name it... ConvertTextToSpeech, right), passing the content from the AI response and the expert's name from the discussion-room data, then console-log the URL. The URL needs to live in state, so add const [audioUrl, setAudioUrl] = useState(), and at the bottom, inside the useEffect, call setAudioUrl(url). As simple as that; save. One more reminder: in GlobalServices, the destructuring of AudioStream needs the capital A, because we're pulling that exact field off whatever pollyClient.send returns; that part is important.

Now let's test. I'll open the inspect panel and check the console to see whether we get the URL. Connect and talk: "hey hi, I am TubeGuruji"... and it looks like the value we passed isn't correct: the "Sally" voice isn't working. We have a couple of names to choose from, so let me go back, open the dashboard, and create a completely new room: a topic-based lecture, "I want to learn React Native basics", selecting Joanna, then click Next. Connect, and speak: "hi Joanna, how are you", and there's the URL. Perfect.

Once we have the URL we need to play it in an audio tag; we're already saving it in the audioUrl state, so, maybe after the image tag and the expert name, I'll add an audio element: provide the source as audioUrl, close the tag, give it a type of audio/mp3, and make sure to set autoPlay, so as soon as the audio URL is ready it plays automatically; you don't need to do anything. Save it and restart, and now once we start talking it also gives us the response as voice, which is what this is all about. I'll keep the console open in case of errors and try it: "Hi Joanna, how are you?" "I'm doing great, thanks! How about you? Ready to dive into some React Native basics?" "Yes, for sure." "Awesome! Let's start with what React Native is: it's a framework for building mobile apps using JavaScript and React. Excited?" "Yes, quite excited." "Great to hear! Do you have any specific topics in mind, like components or navigation?" "Can you tell me how to create the React application?" "Sure, to create a React Native app, use the command..." "No, um, I have to go. Okay, bye." "No problem, have a great day! Feel free to reach out if you have more questions. Bye!"

How cool is that: you get the answer from the AI instantly, within a second. We've connected all the pieces together: first we connected the microphone, then we converted that speech to text with the help of AssemblyAI, then we got the response from the AI model, then we converted the text back to speech, and then we play it, and the last step repeats automatically from the start. And everything we implemented is free. So that's how it works. You have one task now: as soon as we talk, we're showing the live transcript here; either style it to fit your requirements, or add a text box so the speech lands in it and gets sent once you stop. It's up to you how you want it; I'll leave it to you. If you have any questions or doubts up to this point, let me know in the comments or ask on my Discord channel, because there are a lot more things we're going to implement.

Now it's time to save the user's conversation with the AI into our database. As you know, we already have a conversation column in the DiscussionRoom table, and that's where we want to save all of this conversation. To do that, we simply need to write a function and then call it. Go back to Convex, and inside the DiscussionRoom file write a new function to update the record: export const UpdateConversation = mutation(...). This is an update, so obviously it comes under mutation. For arguments we pass the record ID, id: v.id('DiscussionRoom'), giving the table name, plus the conversation we want to save, which is of type v.any(). Then define the handler, async with ctx and args, making sure the id and conversation are what's passed in. Inside the handler we can directly write await ctx.db.patch(...); patch is what Convex uses to update a record. You pass the ID of the record you want to update,
which is args.id, and then the fields you want to update: in this case conversation, so we simply add { conversation: args.conversation }. That's all; that's how easily you can do the update-record functionality with Convex.

Now simply go back to our discussion-room page.jsx and define the mutation: const updateConversation = useMutation(api.DiscussionRoom.UpdateConversation), giving the API path. We call it when the user disconnects the conversation, and only then: await updateConversation({ id: DiscussionRoomData._id, conversation }), where the ID comes from the discussion-room data's _id and the conversation is the state we already have. That's how easily you can do it, and that's all there is to updating the conversation.

Now let's test this out: connect, add some conversation, and once there are a few messages, disconnect. As soon as I disconnect, it also saves to our database: after disconnecting, go to Convex, and inside the DiscussionRoom table you'll see the user conversation; opening it shows the content and the role for each turn, depending on whether it's the user or the assistant, all saved to our discussion room's conversation column. That's how you do it, guys.
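The mutation and its call site, sketched:

```js
// convex/DiscussionRoom.js
import { mutation } from "./_generated/server";
import { v } from "convex/values";

export const UpdateConversation = mutation({
  args: {
    id: v.id("DiscussionRoom"),
    conversation: v.any(),
  },
  handler: async (ctx, args) => {
    // patch rewrites only the listed fields on the existing document
    await ctx.db.patch(args.id, { conversation: args.conversation });
  },
});
```

```jsx
// page.jsx -- inside the disconnect handler
const updateConversation = useMutation(api.DiscussionRoom.UpdateConversation);

await updateConversation({
  id: DiscussionRoomData._id,
  conversation: conversation,
});
```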
This application won't just let you hold a conversation with an AI voice assistant; it will also give you feedback or notes, depending on your choice. Once you click Disconnect, you get the option to generate feedback or notes, depending on whether you were doing interview question-answer practice or a topic-based lecture. When you click it, we send all the conversation to the AI model along with a prompt (and the same for notes); the model responds with the generated feedback or notes, we save them to our database, and later the user can access the conversation and the notes or feedback any time from the dashboard. How cool, right? That's what we're going to build next.

The very first thing: inside options.jsx I added a summaryPrompt to each option. For the topic-based lecture, learn-language, and meditation options it's the same kind of prompt, telling the model to generate notes based on the conversation; for the mock interview and question-answer options it asks for feedback along with the areas for improvement. Obviously, if needed, you can refine these prompts to get the exact output you want from the AI model.

Once we have the summary prompts, go to our GlobalServices file. We already have the AIModel function there, and we could actually reuse it, since we'd pass a topic and a coaching option and pull the prompt from the option, but it's cleaner to keep this separate. So copy it, paste it just below, and name it AIModelToGenerateFeedbackAndNotes. You could still pass a topic, but it's not really required here, so remove it; keep the coachingOption, which is required, and add the conversation. From the option name we get the option, and from that we want the summaryPrompt this time; no topic needed. The rest stays the same: pass the prompt together with the conversation. I'm not sure yet whether it works exactly as written, but let's try it out.

Once that's ready, go to our page.jsx. On disconnect we can set a state: const [enableFeedbackNotes, setEnableFeedbackNotes] = useState(false), initially false, and set it to true once we disconnect. At the bottom, inside the ChatBox is where we need the button, so pass enableFeedbackNotes down as a prop (accepting it inside the ChatBox), hide the existing hint when it's false, and add a button saying "Generate Feedback/Notes", then save. Inside the ChatBox itself we can write const GenerateFeedbackNotes = ... and call it from the button's onClick. In it, call our Global Service method: const result = await AIModelToGenerateFeedbackAndNotes(coachingOption, conversation). We don't have coachingOption in the ChatBox yet; we'll add it in a moment: pass the user's selected coaching option down from the DiscussionRoomData (that's what we saved), accept it inside the ChatBox, pass the conversation we already have, and make the function async.
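A rough sketch of that service function, assuming the same OpenAI-compatible chat client (`openai`) configured for the main AIModel earlier in the tutorial; the client setup and model name below are placeholders, not the confirmed ones:

```js
// services/GlobalServices.js — sketch of the feedback/notes helper.
// `openai` and the model id are assumptions; CoachingOptions is the
// options array that now carries each option's summaryPrompt.
export const AIModelToGenerateFeedbackAndNotes = async (coachingOption, conversation) => {
  // look up the option the user selected to get its summary prompt
  const option = CoachingOptions.find((item) => item.name === coachingOption);

  const completion = await openai.chat.completions.create({
    model: 'google/gemini-2.0-flash-001', // placeholder model id
    messages: [
      ...conversation,                                      // full chat history
      { role: 'assistant', content: option.summaryPrompt }, // feedback/notes prompt
    ],
  });
  return completion.choices[0].message; // caller reads .content
};
```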
Once we have the result we read result.content, and save. Now define a loading state here: const [loading, setLoading] = useState(false), initially false; set it to true once generation starts and back to false when it finishes. While loading is true we show a loader icon, with animate-spin in its className to give it some animation, and we also make sure the button is disabled while loading is true. Save it.

We obviously don't have a conversation yet, so I'll pause the video, have a short conversation, and then we'll see whether it works. After disconnecting, the Generate button appears (it's not correctly aligned yet; we'll need some margin-top and so on). When I click Generate Feedback/Notes it starts generating, which takes a few seconds, and we check the console to verify it's working. It threw an error saying the conversation variable was not defined; looking at the Global Service, we forgot to include the conversation in the model call, which is what caused the error. I think we need to start again, and this time, clicking Generate Feedback and Notes: boom, we got the result. Pretty cool, right? It contains some asterisks, meaning bold markdown; we'll convert that into a properly formatted display later, but all the information is there. In this case the session was question-answer, so it generated feedback; if you'd selected a lecture it would generate notes from your conversation instead.

Now that it generates, we need to save it into our database, and right now we don't have a column for it. You could create a new column or even a new table, or save it into the same conversation column, but we'll add a new column. Before that, let's align the button, because right now it looks off: in the ChatBox give it a className with margin-top of 7 (mt-7) and make its width full (w-full).

Next, go to the schema file inside convex to add the new column. Inside the DiscussionRoom table I'll add a column called summary, make it an optional field, and give it type v.any(); you could use a string, but since there's no dedicated text type here, any will be fine. Save it, and you'll see the new summary column get added, currently unset.
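A sketch of the schema change (only the relevant table is shown; the other fields are reconstructed from the columns mentioned elsewhere in the video):

```js
// convex/schema.js — sketch: DiscussionRoom with the new summary column
import { defineSchema, defineTable } from 'convex/server';
import { v } from 'convex/values';

export default defineSchema({
  DiscussionRoom: defineTable({
    topic: v.string(),
    coachingOption: v.string(),
    expertName: v.string(),
    conversation: v.optional(v.any()),
    summary: v.optional(v.any()), // NEW: generated feedback or notes
  }),
});
```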
Now, obviously, let's go back to the DiscussionRoom file inside convex, where we need to write a method to update the summary. Copy the existing UpdateConversation mutation, rename it UpdateSummary, and change the field it accepts and patches from conversation to summary. Just make those small changes, and that's all you need.

After this, go to the ChatBox. Once the summary is generated, first define the mutation: const updateSummary = useMutation(api.DiscussionRoom.UpdateSummary). We need to pass an ID; we can get the room ID directly inside the ChatBox the same way we got it in page.jsx: const { roomid } = useParams(). Once we have it, call await updateSummary({ id: roomid, summary: result.content }), and save. You can wrap all of this in a try/catch block, so if you get an error it won't break your application; set loading back to false in both the try and the catch, then save. I think that's all we need for now.

One more thing: whenever you save information or disconnect, we should show some kind of notification so you know whether the feedback or notes were generated. But first let's test: we already have the conversation, so I'll click Generate Feedback/Notes again and check whether it saves to the database. And there it is: inside the DiscussionRoom record, boom, the summary column has our summary. Beautiful; exactly what we wanted. Once we generate the feedback and notes, we're able to save them successfully.

Now the notification. From shadcn we have a component called Sonner that acts like a toast message. To add it, copy its npm command and install it, then add the Toaster inside your layout file; I'll add it in the root layout, importing Toaster from the Sonner component. Once that's in, go to the ChatBox and, whenever the content is ready, show a toast saying "Feedback/Notes saved", and in the error path show "Internal server error, try again". We can also show toasts in page.jsx when the user connects: on connecting to the server show "Connected", and on disconnect show "Disconnected". Save it; perfect. I'll try this one more time (it refreshed everything, but anyway): once you try it you'll see the toast notifications, so the user knows whether the feedback was generated or not.
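Here's a sketch of the full save-and-notify flow inside the ChatBox (it assumes `coachingOption` and `conversation` arrive as props and that the `loading` state is defined in this component):

```jsx
// ChatBox.jsx — sketch: save the generated summary and notify the user
import { useParams } from 'next/navigation';
import { useMutation } from 'convex/react';
import { toast } from 'sonner';
import { api } from '@/convex/_generated/api';

const { roomid } = useParams(); // same dynamic segment as the page
const updateSummary = useMutation(api.DiscussionRoom.UpdateSummary);

const GenerateFeedbackNotes = async () => {
  setLoading(true);
  try {
    const result = await AIModelToGenerateFeedbackAndNotes(coachingOption, conversation);
    await updateSummary({ id: roomid, summary: result.content });
    toast('Feedback/Notes saved!');
  } catch (e) {
    toast('Internal server error, try again!'); // don't crash the page on failure
  }
  setLoading(false);
};
```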
Now it's time to display all the previous history: every lecture and every mock interview the user attended, shown on the dashboard along with the feedback or notes generated by the AI. For that we'll fetch data from the DiscussionRoom table, but we have to make sure we only fetch records belonging to that particular user, and unfortunately we aren't saving any user information in DiscussionRoom yet. So we need a new column, uid, to record which user created each record.

I'll close the app for now and open the Convex schema. Inside it I'll add a new column, uid; for now it has to be an optional field, with type v.id('users'): you give it the table name, and it automatically links to our users table. Save, and you'll see the new uid column added to the DiscussionRoom table, currently empty. For testing I'll copy a user's ID and paste it into a couple of records manually; it needs to be the exact ID string, so paste it as is.

Obviously, from now on we have to make sure the uid gets saved, so some changes are needed. Inside the DiscussionRoom file, where we create a new room, accept a uid argument of type v.id('users') (again making sure to give the table name) and pass args.uid into the inserted record; from now on it will be saved. But when you click to create a discussion room, we also have to pass the user ID: in the UserInputDialog, on the click of Next, where we create the room, make sure to pass the uid. To get the user information, use the hook you already created: const { userData } = useContext(UserContext), and from userData take userData._id. Save it, and now whenever you create a new room it will automatically save the user ID for us.

Now, for displaying the previous lectures and history (or feedback), this is the design we're going for. You have two options: reuse the existing option icons for the display, or add new abstract images for each record. I'm going to use the abstract images, which give your dashboard a slightly different look, alongside the topic name, the coaching option name, and the time you last attended. Inside options.jsx, for each option object I added an abstract field pointing to ab1.png, ab2.png, and so on; I already added those files inside our public folder. That's what we're going to use.

Let's go back to our dashboard. We already have the History.jsx page, and inside it we need to fetch all the records belonging to this particular user from the DiscussionRoom table, so define const GetDiscussionRooms = async () => { ... }.
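A sketch of the two changes just described (the topic/expertName fields are assumptions carried over from earlier parts of the build):

```js
// convex/schema.js — sketch: link each room to its creator
// (added inside the DiscussionRoom table definition)
uid: v.optional(v.id('users')),

// convex/DiscussionRoom.js — sketch: save the uid when a room is created
export const CreateNewRoom = mutation({
  args: {
    topic: v.string(),
    coachingOption: v.string(),
    expertName: v.string(),
    uid: v.id('users'),
  },
  handler: async (ctx, args) => {
    // insert returns the new record's ID
    return await ctx.db.insert('DiscussionRoom', {
      topic: args.topic,
      coachingOption: args.coachingOption,
      expertName: args.expertName,
      uid: args.uid,
    });
  },
});
```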
In the DiscussionRoom file inside convex, we already have GetDiscussionRoom, but it fetches by ID, and we want all of a user's discussion rooms. So write a new method, or just copy the existing one, paste it, and rename it GetAllDiscussionRoom. For arguments it takes uid, which needs to be v.id('users'). Then, instead of ctx.db.get, I'll filter: use ctx.db.query with the table name you want to fetch from, then .filter with the logic for which column to match. Inside the filter callback, q.eq compares two values, so we write q.eq(q.field('uid'), args.uid), giving the field name (uid) and comparing it against our argument. If it matches, .collect() selects all the records satisfying that condition. Once we have the result we simply return it, then save.

Now back to History.jsx. Since this is a query we want to run based on the user ID, first define the Convex client with the useConvex hook. Along with that we need the user ID, which you can get from const { userData } = useContext(UserContext). We'll call the method inside a useEffect, and we want it to execute only when the user data is available, which is why we write userData && GetDiscussionRooms() with userData in the dependency array. Inside GetDiscussionRooms, write const result = await convex.query(...), passing the GetAllDiscussionRoom API and the parameter uid: userData._id; since we use await, make the function async. For now we'll just console.log the result to verify we're getting data in this History component.

Back in the application, open the inspect panel and the console. Because we're using useEffect, we need to make this a client component, so add 'use client' at the top. And there it is: we got the records, each with its coaching option, expert name, topic, and the other fields. Now let's keep this in state: inside History create const [discussionRoomList, setDiscussionRoomList] = useState([]), save, and after fetching call setDiscussionRoomList(result).
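Both pieces as a sketch (hook and context names follow the transcript; whether userData is destructured from the context may differ in your code):

```js
// convex/DiscussionRoom.js — fetch every room created by one user
export const GetAllDiscussionRoom = query({
  args: { uid: v.id('users') },
  handler: async (ctx, args) => {
    return await ctx.db
      .query('DiscussionRoom')
      .filter((q) => q.eq(q.field('uid'), args.uid))
      .collect(); // all records matching the condition
  },
});
```

```jsx
// History.jsx — fetch once the user data is available
const convex = useConvex();
const { userData } = useContext(UserContext);
const [discussionRoomList, setDiscussionRoomList] = useState([]);

useEffect(() => {
  userData && GetDiscussionRooms();
}, [userData]);

const GetDiscussionRooms = async () => {
  const result = await convex.query(api.DiscussionRoom.GetAllDiscussionRoom, {
    uid: userData._id,
  });
  setDiscussionRoomList(result);
};
```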
Next, the render: if discussionRoomList.length === 0, show an h2 with a placeholder message. Otherwise, show the list: discussionRoomList.map((item, index) => ...). Inside, add a div, and within it another div holding an h2 with item.topic (I verified the field name; topic is the one we want), followed by another h2 with item.coachingOption, then save. Going back and refreshing, you'll see the whole list of records on screen.

But we have to make sure only lectures, meaning Topic Based Lecture, Learn Language, and Meditation, are displayed under Previous Lectures; Mock Interview and Question Answer records will be shown on the feedback side, because for lectures we show the notes and for the others we show the feedback. That's why we differentiate the two. Also add key={index}. As I said, we add a condition here: check item.coachingOption against the exact option names, chaining them with the OR operator, and wrap the row in an && so it only renders on a match; bring it onto its own lines for readability and save. Going back, you'll see only the two lecture records. Beautiful. For the topic h2 use className font-bold, and for the option use text-gray-400.

As I told you, we also want an image, so define a method: const GetAbstractImages = (option) => ..., an arrow function that takes the option. It does CoachingOptions.find(item => item.name === option) to get the matching coaching option, then returns coachingOption.abstract, the abstract image path. Wrap the row in another div and add an Image tag whose src calls this method, with alt="abstract" and a width and height of 70 (or, say, 54 for now), and save.

Going back, we get an error: coachingOption.abstract is undefined. Use the optional chaining operator, and if it's missing fall back to ab1.png; I think that will be good. Save and test, and now the images show. Both records are topic-based lectures, which is why they share the same image; that's fine. Now some styles: give the image a rounded-full corner, and give the row div flex, a gap of 7, and items-center; a sketch of the whole list follows.
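Here's that list as a sketch (the empty-state text is my placeholder, the exact option-name strings must match your options file, and the 50px sizing is the value the video settles on just below; Image is next/image):

```jsx
// History.jsx — sketch of the filtered list render
const GetAbstractImages = (option) => {
  const coachingOption = CoachingOptions.find((item) => item.name === option);
  return coachingOption?.abstract; // an image path such as '/ab1.png'
};

{discussionRoomList.length === 0 ? (
  <h2 className='text-gray-400'>You don't have any previous lectures</h2>
) : (
  discussionRoomList.map((item, index) => (
    (item.coachingOption === 'Topic Based Lecture' ||
      item.coachingOption === 'Learn Language' ||
      item.coachingOption === 'Meditation') && (
      <div key={index} className='flex gap-7 items-center mt-5'>
        <Image
          src={GetAbstractImages(item.coachingOption) ?? '/ab1.png'}
          alt='abstract'
          width={70}
          height={70}
          className='rounded-full h-[50px] w-[50px]'
        />
        <div>
          <h2 className='font-bold'>{item.topic}</h2>
          <h2 className='text-gray-400'>{item.coachingOption}</h2>
        </div>
      </div>
    )
  ))
)}
```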
We obviously need to size the image: say 70px height and 70px width; that's too big, so let's make it 50. Perfect. Now give it some margin from the top: on the row div add mt-5. We also want a border at the bottom side only, so add border-b (I'll make it 1 pixel) with padding-bottom of 3, and also margin-bottom of 4. Save it; perfect, that's what we wanted.

Another important thing: when we hover over a row we want to show a button labeled "View Notes". To get that hover effect, first mark the row as a group. Then, inside that row div, add a Button saying "View Notes" with the outline variant, so you see an outlined button. We want it on the right-hand side, so make the row flex justify-between with items-center. But we only want it to show when you hover over the item, so inside the button's className add invisible, and on group hover make it visible with group-hover:visible. Now the button is invisible, but hover over any of the items and it appears. Notice you don't have to hover over the button itself: because we marked the row as a group, hovering anywhere on that item (anywhere in the group) reveals the button. Also add cursor-pointer and save; perfect, you'll see the cursor change as well. The same thing has to be implemented for the feedback list.

One fix: we're getting the same image for every row because we're not passing the option into the method; we need to pass item.coachingOption
to GetAbstractImages so each row gets its own result. Also, you can order by creation time from the DiscussionRoom query, and right now we aren't showing when each record was created. So I'll add another h2 with item._creationTime (that's the built-in field name, with the underscore) and save; you'll see the time, but it's a raw timestamp, which we'll change in a moment. I'll make it text-gray-400 for now.

To render it like "20 minutes ago", "30 minutes ago", or "24 hours ago", add a library called moment.js: simply type npm i moment and hit enter, and it installs moment.js, which helps us convert any kind of date into a specific format. On momentjs.com you'll find the different options; if you use fromNow(), it calculates exactly that relative style: "303 days ago", "9 hours ago", "29 minutes ago". How cool; that's what we want. We already have the creation time, so wrap it with the moment library (make sure to import it), call .fromNow(), and save; once you do, you'll see "21 hours ago", "a day ago", something like that.

For the ordering, we want the latest record at the top, so inside GetAllDiscussionRoom in the DiscussionRoom file, add .order('desc') to the query chain and save. Refresh the screen and the latest record is now on top. I'll also shrink the font size of the time with text-sm. Perfect; a sketch of the timestamp markup follows.
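```jsx
// History.jsx — human-friendly timestamps (npm i moment)
import moment from 'moment';

// _creationTime is Convex's built-in creation timestamp (milliseconds)
<h2 className='text-gray-400 text-sm'>
  {moment(item._creationTime).fromNow()} {/* e.g. "21 hours ago" */}
</h2>
```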
Now implement the same thing for the feedback side. I'll just copy everything from History.jsx as it is, including all the import statements (it's easiest to copy it wholesale), paste it into the Feedback component, and save. It now shows on that side too, but since we copied it we need to change a few things: the heading becomes "Your Previous Feedback", and most importantly the coaching-option condition must be updated. Here we only want to show Mock Interview and Question Answer Preparation, so use those two names and remove the third; everything else stays as it is. Save, and right now we have just one matching record, the React.js question-answer session. Instead of "View Notes" we'll say "View Feedback"; save, hover over the row, and you'll see the View Feedback button. Perfect. That's how you display the history of your lectures, your interviews, and everything else you can review later.

At this point you've already built 80% of the application; you've learned a lot, and whatever new feature you want from here, you can add yourself. Now we're going to add one more feature: when you click View Notes or View Feedback, we navigate to a new screen (you already know how navigation works). As per the mockup, I don't have an exact design for it, but at the top we'll show some basic information, similar to the history rows; at the bottom we'll display the saved feedback or notes; and on the right-hand side we'll add a chat box showing the conversation history. Obviously you can't talk there, it's for display only, and we can reuse the existing ChatBox component; we just need one new component for the feedback and notes.

As I said, you already know a lot of this, so let's simply create the new route. Inside the (main) folder I'll add a new folder; we could call it view-discussion-room or view-feedback-notes, but let's simply say view-summary, that's easiest. Inside it we want a dynamic route, so add a [roomid] folder (these actually need to be folders, so make sure of that), and then a new page.jsx with a default template saying "View Summary"; save it.

Inside this ViewSummary page, first get the room ID: const { roomid } = useParams(). Then we need the discussion room information for that room ID, and we already have a query for it, so copy the DiscussionRoomData useQuery from our DiscussionRoom page.jsx, since we're passing the roomid the same way, and paste it here. Just for confirmation, I'll also console.log the DiscussionRoomData.

Now, on the click of View Feedback or View Notes we want to navigate to this page. Go to the dashboard's History.jsx, where we have that button, and use a Link tag; it's similar to the anchor tag in HTML/CSS, but it's the optimized one for Next.js, which is why we use it. The href is '/view-summary/' plus the ID, which is simply item._id; save it. Back in the application, clicking a feedback row's button doesn't navigate, and the reason is we only added the Link for the lecture list, not the feedback one; clicking a lecture's View Notes does navigate, though the target page errors. To fix the page, mark it with 'use client', since we're using the roomid hook, and we still need to add the Link inside the Feedback component.
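Both pieces as a sketch (the (main) route-group path follows the transcript's "inside the main" folder; adjust to your tree):

```jsx
// app/(main)/view-summary/[roomid]/page.jsx — sketch of the new route
'use client';
import { useParams } from 'next/navigation';
import { useQuery } from 'convex/react';
import { api } from '@/convex/_generated/api';

function ViewSummary() {
  const { roomid } = useParams();
  // reuse the existing by-ID query to load this room's record
  const DiscussionRoomData = useQuery(api.DiscussionRoom.GetDiscussionRoom, {
    id: roomid,
  });
  console.log(DiscussionRoomData); // confirm the data arrives

  return <div>View Summary</div>;
}
export default ViewSummary;
```

```jsx
// History.jsx — navigate to that route from each record
<Link href={'/view-summary/' + item._id}>
  <Button variant='outline' className='invisible group-hover:visible cursor-pointer'>
    View Notes
  </Button>
</Link>
```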
Inside Feedback.jsx, wrap the button in a Link with href '/view-summary/' plus item._id, just like before, and save. Now we can test again: clicking View Feedback didn't seem to refresh at first (I made sure the file was saved), but now, boom, it navigates to /view-summary with the ID. In the inspect panel's console, let's make sure we're getting all the data: there's the object with the coaching option, expert name, ID, conversation, and lots of other information. Beautiful.

Now let's go to this page.jsx. As I told you, we divide the screen in two, and at the top we show the basic information from the mockup, so let's add that quickly. First add a div, then inside it another div with an Image tag. For the source, I'll copy the GetAbstractImages logic from the Feedback component, since we want the abstract image for the coaching option, and pass DiscussionRoomData.coachingOption the same way; give it alt="abstract", a width and height of 100, and a className of w-[70px] h-[70px] rounded-full, then save. Going back, the image shows; beautiful. Then, inside that div, copy the topic/option/time block from the history page and adjust it: DiscussionRoomData.topic, the coachingOption, and the creation time; save. Make sure to import the moment library, and make sure to add the ?. optional chaining operator. Going back we hit an error at first; after a refresh, the data shows. Beautiful.

Now some styling. For the outer div, add className flex gap-7 items-center. As I told you, I want the timestamp on the right-hand side, so let's do one thing: I'll pull that h2 outside, wrap the rest in one more div, keep the sizes as they are, and make the outer div flex justify-between with items-end. See? Perfect; I think this is good. One more thing: make the heading text-lg.

Next, as I said, we divide the content into two columns: add a div with className grid grid-cols-1 for smaller screens and lg:grid-cols-4 for larger ones; of those four columns, assign three to the first div with col-span-3, and add gap-5, then save. One column is for the notes and one for the chat box. As you know, we already have the ChatBox component, and we'll use it as is; it needs the conversation, which you can get from the DiscussionRoomData.
So pass conversation from DiscussionRoomData?.conversation, and likewise coachingOption from DiscussionRoomData?.coachingOption; we don't want the generate button here, so set enableFeedbackNotes={false}, then save. You'll see the chat box appear. Make sure you make these fields optional with the ?. operator: initially the DiscussionRoomData is empty, which is exactly why each field needs the optional chaining operator.

We also have an error: "cannot read property map of undefined". Looking at the inspect panel, the simple fix is to render the chat box only when we have the data: wrap it as DiscussionRoomData?.conversation && <ChatBox ... />, and boom, there's our chat box component. Perfect.

Somehow it's coming out very small, though; the grid sizing needed a look. The col-span-3 was the culprit, so I adjusted the grid a little: I made it five columns, assigning three to the first div and two to the second (col-span-3 and col-span-2), so the chat box gets a bit more space. I think we're good on this part now; see the sketch after this paragraph. I'll also give this div mt-5 for some spacing.

On the left-hand side we'll show the notes, so we need a component for them. Inside view-summary I'll add an _components folder, and inside that a SummaryBox.
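The final layout as a sketch (it references the SummaryBox component built in the next step):

```jsx
// view-summary page.jsx — sketch of the two-column layout
<div className='grid grid-cols-1 lg:grid-cols-5 gap-5 mt-5'>
  <div className='col-span-3'>
    <SummaryBox summary={DiscussionRoomData?.summary} />
  </div>
  <div className='col-span-2'>
    {/* render only once the record (and its conversation) has loaded */}
    {DiscussionRoomData?.conversation && (
      <ChatBox
        conversation={DiscussionRoomData?.conversation}
        coachingOption={DiscussionRoomData?.coachingOption}
        enableFeedbackNotes={false}
      />
    )}
  </div>
</div>
```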
Create SummaryBox.jsx, add a default template, and save; then drop this SummaryBox into the page. We need to pass it the summary, so write summary={DiscussionRoomData?.summary} and save; inside the SummaryBox, make sure to accept it as the summary prop. For now render it in an h2, and once you do you'll see the summary, but it's completely unformatted, so we need to format it.

To format that text, use the react-markdown library, which converts your unformatted text into a perfectly formatted rendering. Simply copy its npm command and install it; once installed, you just need to wrap your summary: add ReactMarkdown (make sure to import it) around the text you want formatted. After a refresh: boom, some of the text is bold and everything is well formatted.

If you still want a little spacing between the lines, add a className of text-base/8: after the slash you provide the line height, so with text-base/8 you'll see the line spacing change. That's good. You can also make the box scrollable so you don't have to scroll the complete page: on the wrapper div add a className with a height of 60vh, similar to our chat box, and overflow-auto, then save. Now you scroll just the notes area to view them; beautiful, and it saves you from scrolling the whole page. A sketch of the component follows.
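A sketch of the component (note: I've put the text classes on the wrapper div, since recent react-markdown versions don't accept a className prop directly):

```jsx
// view-summary/_components/SummaryBox.jsx — sketch (npm i react-markdown)
import React from 'react';
import ReactMarkdown from 'react-markdown';

function SummaryBox({ summary }) {
  return (
    // 60vh + overflow-auto keeps the notes scrollable, like the chat box;
    // text-base/8 sets the font size with a roomier line height
    <div className='h-[60vh] overflow-auto text-base/8'>
      <ReactMarkdown>{summary}</ReactMarkdown>
    </div>
  );
}
export default SummaryBox;
```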
I'll make one more change inside this page.jsx: add a className with a negative margin-top of 10 (-mt-10) so everything sits a little higher; I think that's much better. Perfect. Above the scroll area, add an h2 saying "Summary of Your Conversation"; give it a className of text-lg (rather than a font class) plus font-bold, and let's see how it looks. Perfect. Where the content scrolls, give it margin-bottom of 6 for some space. And do the same on the right-hand side: I'll copy that heading, paste it inside the chat-box column's div, and label it "Your Conversation". I think that's all; save and test. Perfect, here's the conversation; we don't need the chat box's placeholder hint here, since it's not relevant, so I'll remove that text from the ChatBox (I'll just leave it empty; I don't want to show anything). And that's pretty much all for the view-summary screen: that's how you show it, as simple as that. It's not rocket science, but the most interesting part is that the user can see what answers they gave and what feedback they received inside the summary.

Now, something very important for building an application of this size: we need to keep track of all the conversations and the tokens. We already give the user some default tokens when they create an account, and whenever the user makes some usage, i.e. starts a conversation, we need to update the token balance (so the user can purchase more later if needed). To update the tokens, we count the length of the conversation: if the user speaks five words, we count those five words and update the balance accordingly.

So go to the DiscussionRoom page.jsx, where all our logic lives, and create a new method: const updateUserToken = ..., an arrow function. To update anything, we first need a mutation, so inside the users file in the convex folder write a new method: export const UpdateUserToken = mutation(...). It gets two arguments: the id, which is simply the user's record ID, and credits, the updated credit balance we want to write, of type v.number(). Then in the handler we simply call await ctx.db.patch(...), passing the record you want to update
(args.id) and the field you want to update, in this case credits: args.credits. Perfect; that's how you update it, as simple as that.

Now, back at the top of the page, register that mutation: const updateUserTokenMethod = useMutation(api.users.UpdateUserToken). Make sure to flesh out the updateUserToken method: inside it, write const result = await updateUserTokenMethod({...}), providing two important values. One is the user ID, which we get from userData._id; if you don't have userData defined here, get it from the context: const { userData, setUserData } = useContext(UserContext). Make the function async. For the credits, we obviously have userData.credits, but the problem is we need to calculate the new value first, which means computing a token count.

This method needs to be called from the places where the conversation changes, since our useEffect runs whenever the conversation updates. So at the bottom, wherever you generate the final transcript, call updateUserToken with just that text and nothing else (awaiting it if needed); and you also need to update whenever the AI generates a response, so after appending the AI message to the conversation, call updateUserToken(aiResp.content) with the generated message. With these two calls we update both the user-spoken tokens and the AI-generated tokens.

Inside updateUserToken, compute const tokenCount = text.trim() ? text.trim().split(/\s+/).length : 0: the regex splits on whitespace, .length counts the words, and if the text is empty the count is zero. A sketch of both pieces follows.
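```js
// convex/users.js — sketch of the credits mutation
export const UpdateUserToken = mutation({
  args: {
    id: v.id('users'),
    credits: v.number(), // the new, already-decremented balance
  },
  handler: async (ctx, args) => {
    await ctx.db.patch(args.id, { credits: args.credits });
  },
});
```

```jsx
// DiscussionRoom page.jsx — count words and decrement the balance.
// The hook is named updateUserTokenMethod here (my choice) to avoid
// clashing with the local function name.
const updateUserTokenMethod = useMutation(api.users.UpdateUserToken);

const updateUserToken = async (text) => {
  // crude estimate: one token per whitespace-separated word
  const tokenCount = text.trim() ? text.trim().split(/\s+/).length : 0;
  const newCredits = Number(userData.credits) - Number(tokenCount);

  await updateUserTokenMethod({ id: userData._id, credits: newCredits });
  // keep the client-side copy of the user in sync too
  setUserData((prev) => ({ ...prev, credits: newCredits }));
};
```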
Then subtract that token count from the user's current credits; make sure both values are converted to numbers, so credits ends up as a number. That's all you need to do. Also make sure to update our hook: call setUserData(prev => ({ ...prev, credits: ... })), spreading all the fields from the previous value but overwriting the credits field with the new one. Perfect; make sure to update it like this, then save.

From now on, whenever you start a conversation it also updates your tokens, and that is very, very important. Here I had a short conversation, and in our database the credits updated to 49,928, depending on how much I talked. How cool; that's how you do it. You can also implement the same thing when you generate the feedback and notes: just call this method on disconnect, where we already call UpdateConversation, once you get the result; it's up to you. If you have any questions or doubts, let me know in the comments, or ask on my Discord channel.

Now it's time to implement the profile section. On the click of Profile we'll show a dialog where the user can update their account settings and user name, check how many credits are left, and has the option to upgrade and join the membership. First, the dialog: go to shadcn and search for the Dialog component. We already installed it, so you don't need to install it again; just copy the import statement. We'll create a new component for this: inside our dashboard components folder, add ProfileDialog.jsx with a default template, paste the Dialog import statements, and copy the Dialog usage example in as well, then save.

We want this dialog to open on the click of the Profile button, so go to the page.jsx with the feature assistant, where that button lives, and wrap it in <ProfileDialog>...</ProfileDialog>; the button becomes the children. Inside ProfileDialog, accept children and render them in the DialogTrigger, making sure to mark it asChild so you don't get a hydration error. Once you save, go back, and on the click of Profile the dialog opens. Pretty cool; that's how it works, as simple as that. Here's a sketch of the wrapper.
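A sketch under the transcript's description (the DialogDescription wrapping is my reading of "inside this dialog description"; asChild there keeps the markup valid since the description renders a paragraph element):

```jsx
// dashboard/_components/ProfileDialog.jsx — sketch of the wrapper
import {
  Dialog, DialogContent, DialogDescription,
  DialogHeader, DialogTitle, DialogTrigger,
} from '@/components/ui/dialog';
import Credits from './Credits';

function ProfileDialog({ children }) {
  return (
    <Dialog>
      {/* asChild renders our existing Profile button as the trigger,
          avoiding the nested-button hydration error */}
      <DialogTrigger asChild>{children}</DialogTrigger>
      <DialogContent>
        <DialogHeader>
          <DialogTitle>Profile</DialogTitle>
          <DialogDescription asChild>
            <div>
              <Credits />
            </div>
          </DialogDescription>
        </DialogHeader>
      </DialogContent>
    </Dialog>
  );
}
export default ProfileDialog;
```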
Now, inside the DialogDescription, I'm going to render a new component from the dashboard components folder called Credits.jsx, where we'll show the credits. Import it into the DialogDescription and save; going back and clicking Profile, you'll see just the text "Credits" for now.

Let's build out the Credits component; this is where we show how many credits are left and the option to upgrade. First, get the user information: const { userData } = useContext(UserContext), and save. Inside the main div, add another div with an Image whose source is the user's profile picture. I first tried userData?.picture, with a width and height of 60, but on saving, the profile picture isn't visible; checking the field, it turns out we aren't saving the picture in our own table. To get the user's picture, use the authentication hook instead: const user = useUser() from '@stackframe/stack', and the field you need is user?.profileImageUrl; that's what displays the picture.

Opening the profile now, we get an error stating that you need to add this hostname to your next.config file, because it's a third-party image URL and we need to whitelist the host. Simply copy the hostname from the error, open next.config.mjs, add an images section with a domains list, and add whatever domain you want to whitelist. Once you've added it, refresh or restart your application; in the meantime I'll check whether it displays, and here it is, perfect. Now just add some styling: give the image rounded-full and save. A sketch of the config change follows.
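```js
// next.config.mjs — whitelist the third-party image host.
// The hostname below is a placeholder; use the one from your error message.
const nextConfig = {
  images: {
    domains: ['lh3.googleusercontent.com'],
  },
};

export default nextConfig;
```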
After this, on the right-hand side I want to show the user's name: user?.displayName, then another tag with user?.primaryEmail. Give the name h2 text-lg and font-bold; perfect. To bring them into one line, make the email gray with text-gray-500, and make the wrapper div flex gap-5 items-center, something like this. Then I'll put in one horizontal rule: after that div add an hr with a className of my-3 for some space.

Next, another div showing the token usage. For now, an h2 saying "Token Usage" with a className of font-bold, and just below it, how much has been used; as a placeholder you can say anything, like "30,000 / 50,000". Then we want a progress bar displaying the actual progress. To add one, go to the shadcn components and search for the Progress component; it's very easy: copy the import statement, make sure to install it, and then use it. Add <Progress value={33} /> from the components; as their example shows, you define the value, so for now 33 means 33% complete. Perfect; save it, go back to the profile, and there it is. For the progress bar add some styling: margin-y of 5 for space (perfect; I later reduced it to 3 or 4).

Then we want to show the current plan information. Add a div with an h2 saying "Current Plan" and another h2 saying, for example, "Free Plan", and save. Right now we're just building the UI; we'll add the real conditions in a moment. Make the label font-bold, make the row flex justify-between items-center, and for the plan badge add a className of p-1, bg-secondary, and a rounded-lg corner; let's also add px-2. Perfect. For the row, add margin-top of 3. Perfect.

Then we want a card showing the option to upgrade. Add a div; inside it, another div with an h2 saying "Pro Plan" and another h2 with how many tokens that plan gives, "50,000 Tokens"; make them font-bold and make sure to wrap those two in one div. Then add another h2 to display the amount: "$10/month", also font-bold. Bring everything into one line with className flex justify-between; that's the layout, but obviously we need some spacing, so add margin-top of 5 and padding of 5, and save. Perfect. Let's also add a border and make the corners rounded-2xl. At the bottom of the card we want a button: first add a horizontal rule with my-3, then a button with a wallet icon that says
"Upgrade"; you can include the amount as well, so "$10 Upgrade", then save. That's how it looks; let's make the button full width by adding w-full to its className. And with that, our profile/token dialog is ready.

Now, the question is how to show which users are on the free plan and which are paid. If you remember, when we create the user record we added a subscriptionId: if the user has a valid subscriptionId, we show that they're on a paid plan. We also already store the credits, so from userData we can show how many credits the user has left. Simply render userData?.credits in place of the used-token placeholder; it wasn't displaying at first because the field is credits with a lowercase c, and once fixed, there it is. You can then show the actual balance against that 50,000 cap.

But before that, for the cap after the slash, add a condition. If userData?.subscriptionId exists (I double-checked the field name, and that's correct), show "50,000 Tokens"; otherwise show, let's say, "5,000 Tokens". Save, and now the user shows only 5,000 tokens. By default I had seeded 50,000 credits, but let's add a subscriptionId, say "12345" (as a string, so quote it); refresh, and the token cap now reads 50,000. Good; that's how you wire it up.

Next, the progress needs to be calculated from the same data. So write a method, const calculateProgress = () => {...}, call it in the Progress component's value, and have it return the figure. First check whether the user is paid: if userData?.subscriptionId exists, compute against the 50,000 cap. My first attempt returned 50,000 minus the credits, which was wrong in two ways: the bar expects a value between 0 and 100, and we should divide, not subtract. The correct formula is the number of tokens divided by the maximum (50,000 here), times 100. Saving and checking: the logic is correct, but the percentage barely moves; it shows 99.85% since I've hardly used anything. Set the credits to 40,000 in the dashboard, refresh, and the progress updates to 80%. See? That's correct, so that's good. And obviously, once you reach zero the bar empties out on the left: no tokens left, so you need to upgrade. A sketch of the calculation follows.
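```jsx
// Credits.jsx — sketch of the progress calculation described above.
// The 5,000 free-tier cap is inferred from the plan labels; the video
// only walks through the paid branch explicitly.
const calculateProgress = () => {
  const maxToken = userData?.subscriptionId ? 50000 : 5000;
  return (Number(userData?.credits) / maxToken) * 100;
};

// usage: <Progress value={calculateProgress()} className='my-3' />
```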
Next, we should show "Paid Plan" when the user already has a subscriptionId, so put a condition around the plan label: copy it as it is, paste it, keep "Free Plan" in one branch and "Paid Plan" in the other (I had the spelling wrong at first, so fix that). Now, if the user is already on a paid plan, they'll see Paid Plan displayed.

If the user wants to upgrade, clicking the button should take them to a payment gateway. I've already added the Razorpay payment gateway, which is compatible with Indian currency, but you can also add Stripe; for that, watch my Build Personal AI Assistant video, which has a dedicated payment-integration chapter that's 99% the same as what we've built here. You can also get the source code of that application, or of this one, from the TubeGuru website. In the source code, clicking Upgrade navigates to the Razorpay payment gateway so you can easily make a payment; I'll include the payment integration in the source code so it's ready to use. Two very important points once the payment succeeds: make sure to update the subscriptionId in your users column, and update the credits.

Now, suppose you want to show your own video on screen: we obviously need to enable the webcam, and that's what we'll integrate now. It's up to you whether to display the webcam on the big screen or the small one; let's say we want it on the smaller one. The easiest way is to install the react-webcam library; copy its install command and run it. Once installed, simply add it: for testing I'll comment out the existing element, add a div, and put a <Webcam /> inside it, then save. Give the Webcam a height, say 170, and a width of 250; you can also provide a className, and I'll make the corners rounded-2xl, then save. Going back: there we have it, we can see the webcam. Cool.

Next, we want this in the right-hand corner, like the mockup, so add absolute to the className, then set bottom-10 and right-10, and save; now it sits at the bottom right. I think we can decrease the size, so I'll make it about 80 by 130, a little smaller; that's much better. That's how you add it. If you want more customization, say clicking it to see the bigger image, you can do that too; it's up to you. For now I'll keep it simple, but if you want to know how, let me know in the comments and we can add it to the source code.
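The final placement as a sketch (using the smaller size the video settles on):

```jsx
// DiscussionRoom page.jsx — picture-in-picture webcam (npm i react-webcam)
import Webcam from 'react-webcam';

<div className='absolute bottom-10 right-10'>
  <Webcam height={80} width={130} className='rounded-2xl' />
</div>
```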
available inside the source code as well; if you want them, you can access those too.
The first thing we are going to do is push all our code to GitHub, and then connect GitHub to Vercel, because Vercel is a cloud platform that hosts your site for free. So let’s go to GitHub first and create a new repo; I will name it “ai-coaching-voice-agent”, you can keep it public or private, it’s up to you, and then create the repo. Once the repo is created, set the origin to this repo: copy the URL and go to your terminal. First initialize git; once it is initialized, set the remote origin, then run the git add command so all your files get staged, give the commit message (“initial commit”), and push the change. If you are pushing for the first time, make sure to push with the command GitHub shows; once the code is pushed, it will be available in your repo. Many users ask me why we are not pushing directly to Vercel. You can do that too, but keeping all your code on GitHub helps you update it later, means you will never lose your code, and makes it very easy to share. So once it’s pushed, make sure it is available on GitHub.
Now go to Vercel, and over here you have the option to click Add New, then Project. You have to connect your GitHub repo to it; mine is already connected, so you will see the name of your project (because we just pushed, it says “just now”), then click Import. Here you can give the project name, and Vercel automatically selects the framework as well. Since we are also using Convex, you have to make a small modification while building the Vercel project. If I go to the Convex documentation, there are a couple of steps to follow: first, override the build command with npx convex deploy --cmd 'npm run build'. Copy that, go back to your project, and under Build and Output Settings just override the build command and paste what we copied. Then you need to get the Convex deploy key: go to the Convex dashboard, select the project, and make sure to select production mode. Inside production mode, go to Settings, where you have the URL and deploy key; let’s generate the production deploy key. As I’m using Convex Cloud on their own platform, you can do this directly from the dashboard; I’ll name it “production key” and save it. Once this key is generated, copy it, because you need to add it as an environment variable (the Convex deploy key). So after overriding the build command, click Environment Variables, then go back to your project and open the .env.
local file, copy all the environment keys like this, and simply paste them here. Boom. Then I will add one more key, named CONVEX_DEPLOY_KEY, paste the production deploy key we copied from Convex, and simply click Deploy. If you face any issue during deployment, just check the log and fix the error, as simple as that; you don’t need to worry if it doesn’t deploy correctly or you get an error. Now we wait for the deployment to finish: right now it’s building, then it will install all the dependencies, and then it will deploy our application to the cloud. And boom, our application is now live, and here is the preview. If I open the application, we now have a dedicated domain, and here you can access our application. So guys, that’s how easily you can deploy your application to production on the cloud.
One more important thing: go to the Stack Auth dashboard, because we need to put the authentication into production mode as well, and it’s quite easy (I already did it). First, before the project settings, let’s go to Domains, because you need to add your domain: click Add New Domain and add your Vercel domain; make sure the URL is correct, and note that you don’t need your own dedicated domain. I already added mine, so it says the domain already exists. Once you add it, make sure to disable the development setting. Then you can go to Project Settings, enable production mode, and boom, you are good to go. That’s how easily you can put Stack Auth into production mode as well.
So guys, that’s all for this video. If you really liked it, press the like button; if you haven’t subscribed to our channel, please do subscribe, and don’t forget to press the notification bell icon. Once you press it, you will get all of my updates and won’t miss any. See you in the next video.
The provided text introduces fundamental concepts and practical applications of machine learning and deep learning. It explains various learning paradigms like supervised, unsupervised, and reinforcement learning, alongside common algorithms such as linear regression, decision trees, support vector machines, and clustering techniques. The material further explores neural networks, convolutional neural networks, recurrent neural networks (specifically LSTMs), and large language models, detailing their architecture, training processes, and diverse applications in areas like image recognition, natural language processing, autonomous vehicles, and healthcare. Practical code examples using Python libraries like TensorFlow and Keras illustrate the implementation of these concepts, including image classification, stock price prediction, and real-time mask detection.
Machine Learning Study Guide
Quiz
Explain the difference between a positive and a negative relationship between variables in the context of linear relationships. Provide a brief real-world example for each.
In linear regression, what is the significance of the mean values of X and Y (X̄ and Ȳ) in relation to the best-fit line?
Describe the purpose of calculating entropy in the context of decision trees. What does a high or low entropy value indicate about the data?
Explain the concept of Information Gain and its role in the construction of a decision tree. How is it used to determine the splitting of data?
What is the fundamental goal of a Support Vector Machine (SVM) algorithm in classification? How does it aim to achieve this goal?
Define the term “hyperplane” in the context of SVMs. Why is this concept important when dealing with data that has more than two features?
In K-Means clustering, what are cluster centroids and how are they iteratively updated during the algorithm’s process?
Explain the “elbow method” and how it can be used to determine the optimal number of clusters (K) in a K-Means clustering analysis.
Describe the purpose of the sigmoid function in logistic regression. How does it transform the output of a linear equation for classification tasks?
Explain the concept of “nearest neighbors” in the K-Nearest Neighbors (KNN) algorithm. How does the value of K influence the classification outcome?
Quiz Answer Key
A positive relationship means that as one variable increases, the other variable also tends to increase (positive slope), such as speed and distance traveled in a fixed time. A negative relationship means that as one variable increases, the other tends to decrease (negative slope), such as speed and the time it takes to cover a constant distance.
The linear regression model’s best-fit line should always pass through the point representing the mean value of X and the mean value of Y (X̄, Ȳ). This point serves as a central tendency around which the regression line is fitted to minimize error.
Entropy in decision trees is a measure of randomness or impurity within a dataset. High entropy indicates a mixed or chaotic dataset with no clear class separation, while low entropy indicates a more homogeneous dataset where the classes are well-defined.
Information Gain measures the reduction in entropy after a dataset is split based on an attribute. It guides the decision tree construction by selecting the attribute that yields the highest information gain for each split, effectively increasing the purity of the resulting subsets.
The fundamental goal of an SVM is to find the optimal hyperplane that best separates data points belonging to different classes. It achieves this by maximizing the margin, which is the distance between the hyperplane and the nearest data points (support vectors) from each class.
A hyperplane is a decision boundary in an N-dimensional space that separates data points into different classes. In SVMs with more than two features, the decision boundary becomes a hyperplane (a line in 2D, a plane in 3D, etc.) necessary to separate the data effectively in higher-dimensional space.
Cluster centroids are the mean vectors of the data points within each cluster in K-Means. Initially, they can be chosen randomly or strategically. During the iterative process, each data point is assigned to the nearest centroid, and then the centroids are recalculated as the mean of all data points assigned to that cluster.
The elbow method is a technique to find the optimal K by plotting the within-cluster sum of squares (WSS) against the number of clusters (K). The “elbow” point, where the rate of decrease in WSS slows sharply, suggests a good balance between minimizing WSS and avoiding overfitting with too many clusters.
The sigmoid function in logistic regression is an S-shaped curve that takes any real-valued number and maps it to a probability value between 0 and 1. This transformation allows the linear output of the regression equation to be interpreted as the probability of belonging to a particular class in a classification problem.
In KNN, the “nearest neighbors” are the K data points in the training set that are closest to a new, unlabeled data point based on a distance metric (e.g., Euclidean distance). The value of K determines how many neighbors are considered when classifying the new point; a majority vote among these K neighbors determines the class assigned to the new data point.
Essay Format Questions
Compare and contrast linear regression and logistic regression. Discuss the types of problems each algorithm is best suited for and explain the key differences in their approaches and outputs.
Explain the process of building a decision tree, including the concepts of entropy and information gain. Discuss the advantages and potential limitations of using decision trees for classification.
Describe the core principles behind the Support Vector Machine algorithm. Elaborate on the role of the hyperplane and margin, and discuss scenarios where SVMs might be a particularly effective classification technique.
Outline the steps involved in the K-Means clustering algorithm. Discuss the importance of choosing an appropriate value for K and explain methods like the elbow method used for this purpose.
Consider a real-world problem where multiple machine learning algorithms could be applied (e.g., predicting customer churn, classifying emails as spam). For two different algorithms discussed in the sources (e.g., decision trees and logistic regression), explain how each algorithm could be used to address the problem and discuss potential strengths and weaknesses of each approach in this context.
Glossary of Key Terms
Positive Relationship: A relationship between two variables where an increase in one variable is associated with an increase in the other.
Negative Relationship: A relationship between two variables where an increase in one variable is associated with a decrease in the other.
Linear Regression: A statistical method used to model the relationship between a dependent variable and one or more independent variables by fitting a linear equation.
Mean: The average of a set of numbers, calculated by summing all the values and dividing by the count of the values.
Linear Regression Model: A mathematical equation (typically in the form y = mx + c for simple linear regression) that represents the best linear relationship between the independent and dependent variables.
Slope (m): The rate of change of the dependent variable with respect to the independent variable in a linear equation. It indicates the steepness and direction of the line.
Coefficient (c or b): The y-intercept of a linear equation, representing the value of the dependent variable when the independent variable is zero.
Scatter Plot: A type of plot that displays pairs of values as points on a Cartesian coordinate system, used to visualize the relationship between two variables.
Entropy: A measure of randomness or impurity in a dataset, often used in the context of decision trees.
Information Gain: The reduction in entropy achieved by splitting a dataset on a particular attribute, used to determine the best splits in a decision tree.
Decision Tree: A tree-like structure used for classification or regression, where each internal node represents a test on an attribute, each branch represents the outcome of the test, and each leaf node represents a class label or a predicted value.
Support Vector Machine (SVM): A supervised learning algorithm used for classification and regression. It works by finding the hyperplane that best separates the different classes in the data.
Hyperplane: A decision boundary in an N-dimensional space that separates data points belonging to different classes in an SVM.
Margin: The distance between the separating hyperplane and the nearest data points (support vectors) in an SVM. The goal is to maximize this margin.
Support Vectors: The data points that lie closest to the hyperplane and are crucial for defining the margin in an SVM.
K-Means Clustering: An unsupervised learning algorithm that aims to partition n observations into k clusters, in which each observation belongs to the cluster with the nearest mean (cluster centroid).
Cluster Centroid: The mean of the data points assigned to a particular cluster in K-Means.
Elbow Method: A heuristic method used to determine the optimal number of clusters (K) in K-Means by plotting the within-cluster sum of squares (WSS) against different values of K and looking for an “elbow” in the plot.
Logistic Regression: A statistical model that uses a sigmoid function to model the probability of a binary outcome. It is used for binary classification problems.
Sigmoid Function: A mathematical function that produces an “S” shaped curve, often used in logistic regression to map any real value into a probability between 0 and 1.
K-Nearest Neighbors (KNN): A supervised learning algorithm used for classification and regression. It classifies a new data point based on the majority class among its k nearest neighbors in the training data.
Nearest Neighbors: The data points in the training set that are closest to a new, unlabeled data point based on a distance metric.
K (in KNN): The number of nearest neighbors considered when classifying a new data point in the KNN algorithm.
Briefing Document: Review of Machine Learning Concepts and Algorithms
This briefing document summarizes the main themes and important ideas presented in the provided excerpts, covering fundamental concepts in machine learning, linear regression, decision trees, support vector machines (SVMs), K-Means clustering, logistic regression, K-Nearest Neighbors (KNN), recurrent neural networks (RNNs) and Long Short-Term Memory (LSTM) networks, convolutional neural networks (CNNs), and transfer learning.
1. Foundational Machine Learning Concepts
The sources introduce fundamental concepts like positive and negative relationships between variables, illustrated with the example of a bicyclist. A positive relationship means “as distance increases, so does speed,” while a negative relationship means “as the speed increases, time decreases.”
The importance of data in machine learning is emphasized throughout. Different algorithms require different formats and preprocessing of data to function effectively.
2. Linear Regression
Linear regression is presented as a method for finding the best-fit line through a set of data points using the formula “y = MX + C.” The process involves:
Calculating the mean of the x and y values. “remember mean is basically the average.”
Finding the slope (m) using the formula m = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)², where x̄ and ȳ are the means of x and y.
Calculating the y-intercept (c) by plugging the mean values and the calculated slope into the equation (e.g., with a computed slope of 2, substitute the means into y = 2x + c and solve for c).
Predicting new values using the derived regression equation.
Evaluating the error between the predicted and actual values: “our goal is to reduce this error; we want to minimize that error value on our linear regression model,” minimizing the distance between the line and the data points.
The concept extends to multiple dimensions: y = mx + c covers only two dimensions, but the formula generalizes by adding one term per feature. A minimal numeric sketch of the two-dimensional case follows.
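To make these steps concrete, here is a small sketch of the slope/intercept arithmetic in NumPy; the toy data values are illustrative, not from the source:

```python
import numpy as np

# Toy data: the x/y values below are made up for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 6.2, 8.1, 9.9])

x_mean, y_mean = x.mean(), y.mean()

# Slope: m = sum((x - x̄)(y - ȳ)) / sum((x - x̄)^2)
m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)

# Intercept: the best-fit line passes through (x̄, ȳ), so c = ȳ - m·x̄
c = y_mean - m * x_mean

y_pred = m * x + c
mse = np.mean((y - y_pred) ** 2)  # the error we are trying to minimize
print(f"y = {m:.2f}x + {c:.2f}, MSE = {mse:.3f}")
```

Extending to more features replaces the single slope with one coefficient per feature, which is what a library routine such as sklearn’s LinearRegression computes for you.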
3. Decision Trees
Decision trees are described as “a tree-shaped algorithm used to determine a course of action.” Key concepts include:
Splitting data based on different attributes to make decisions. Each branch represents a possible outcome.
The challenge of determining the optimal split, especially with complex data. “how do you know what to split where do you split your data what if this is much more complicated data?”
Entropy as “a measure of Randomness or impurity in the data set.” Lower entropy is desired.
Information Gain as “the measure of decrease in entropy after the data set is split.” Higher information gain indicates a better split.
The mathematical calculation of entropy uses the probabilities of outcomes (e.g., playing golf or not): entropy is denoted I(p, n), where p is the probability that you play a game of golf and n is the probability that you do not (see the worked calculation after this list).
Building the decision tree by selecting the attribute with the highest information gain for each split. “we choose the attribute with the largest Information Gain as the root node and then continue to split each sub node with the largest Information Gain that we can compute.”
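Here is a small sketch of the entropy and information-gain arithmetic, assuming the classic play-golf counts (9 “play” days, 5 “don’t play” days, split on a three-valued outlook attribute); the counts are the textbook example rather than numbers taken verbatim from the source:

```python
import numpy as np

def entropy(p, n):
    """Entropy I(p, n) for a node with p positive and n negative examples."""
    total = p + n
    probs = [c / total for c in (p, n) if c > 0]  # skip zero counts to avoid log(0)
    return -sum(q * np.log2(q) for q in probs)

# Parent node: 9 "play" vs. 5 "don't play" days.
parent = entropy(9, 5)

# Splitting on outlook gives three branches with (play, don't play) counts:
branches = [(2, 3), (4, 0), (3, 2)]
total = sum(p + n for p, n in branches)
weighted = sum((p + n) / total * entropy(p, n) for p, n in branches)

info_gain = parent - weighted  # reduction in entropy from this split
print(f"parent entropy = {parent:.3f}, information gain = {info_gain:.3f}")
```

Running this reproduces the familiar values (parent entropy ≈ 0.940, gain ≈ 0.247); the attribute with the largest such gain becomes the split at each node.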
4. Support Vector Machines (SVMs)
SVMs are introduced as a “widely used classification algorithm” that “creates a separation line which divides the classes in the best possible manner.” Key ideas include:
Finding the optimal hyperplane that maximizes the margin between different classes. “The goal is to choose a hyperplane…with the greatest possible margin between the decision line and the nearest point within the training set.”
Support vectors as the data points closest to the hyperplane, which influence its position and orientation.
The concept of a hyperplane extending to multiple dimensions when dealing with more than two features. “One of the reasons we call it a hyperplane versus a line is that a lot of times we’re not looking at just weight and height we might be looking at 36 different features or dimensions.”
A practical example of classifying muffin and cupcake recipes based on ingredients using Python’s sklearn library. This demonstrates data loading, visualization using seaborn and matplotlib, data preprocessing (creating labels and features), model training using svm.SVC with a linear kernel, and visualizing the decision boundary and support vectors.
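A condensed sketch of that muffin-versus-cupcake idea with sklearn; the ingredient percentages and labels below are hypothetical stand-ins for the recipe dataset described above:

```python
import numpy as np
from sklearn import svm

# Hypothetical recipes as [flour %, sugar %]; 0 = muffin, 1 = cupcake.
# Muffins here skew toward more flour, cupcakes toward more sugar.
X = np.array([[55, 10], [50, 12], [52, 11], [35, 30], [38, 28], [36, 32]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = svm.SVC(kernel="linear")  # linear kernel, as in the walkthrough
clf.fit(X, y)

print(clf.predict([[40, 25]]))  # classify a new recipe
print(clf.support_vectors_)     # the points closest to the hyperplane
```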
5. K-Means Clustering
K-Means clustering is presented as an unsupervised learning algorithm for grouping data points into clusters based on their similarity. Key steps include:
Selecting initial cluster centroids, either randomly or by choosing the farthest apart points.
Assigning each data point to the closest cluster based on the distance to the centroids (often Euclidean distance).
Recalculating the centroids of each cluster as the mean of the points assigned to it.
Repeating the assignment and centroid recalculation until the cluster assignments no longer change (convergence).
The elbow method is introduced as a way to determine the optimal number of clusters (K) by plotting the within-cluster sum of squares (WSS) against the number of clusters and looking for an “elbow” in the graph.
A use case of clustering cars into brands based on features like horsepower and cubic inches is mentioned, using Python with libraries like numpy, pandas, and matplotlib.
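A minimal sketch of K-Means plus the elbow method with sklearn; the synthetic horsepower/cubic-inch values below stand in for the car dataset mentioned above:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Synthetic car features [horsepower, cubic inches] in three loose groups.
X = np.vstack([
    rng.normal([90, 120], 10, (30, 2)),
    rng.normal([150, 250], 15, (30, 2)),
    rng.normal([220, 400], 20, (30, 2)),
])

# Elbow method: compute WSS (sklearn's inertia_) for K = 1..8;
# plotting these against K reveals the bend described above.
for k in range(1, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(k, round(km.inertia_, 1))
```

On data like this, the WSS drops steeply up to K = 3 and then flattens, which is the “elbow” suggesting three clusters.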
6. Logistic Regression
Logistic regression is described as “the simplest classification algorithm used for binary or multi classification problems.” It differs from linear regression by predicting categorical outcomes using the sigmoid function. Key concepts include:
The sigmoid function (P = 1 / (1 + e^-y)) which transforms the linear regression output into a probability between 0 and 1, generating an “S-shaped” curve.
The logarithmic (logit) transformation of the sigmoid: ln(p / (1 − p)) = mx + c.
A threshold value (typically 0.5) to classify the outcome. Probabilities above the threshold are rounded to 1 (e.g., pass, malignant), and those below are rounded to 0 (e.g., fail, benign).
A use case of classifying tumors as malignant or benign using a dataset with multiple features and Python’s pandas, seaborn, and matplotlib libraries. The process includes data loading, exploration, preprocessing, model building using sklearn.linear_model.LogisticRegression, training, and evaluation.
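A stripped-down sketch of that workflow with sklearn’s LogisticRegression; the feature values and labels below are hypothetical, not the tumor dataset itself:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical tumor measurements [radius, texture]; 1 = malignant, 0 = benign.
X = np.array([[14, 20], [20, 25], [9, 14], [22, 27], [8, 12], [18, 22]])
y = np.array([1, 1, 0, 1, 0, 1])

model = LogisticRegression()
model.fit(X, y)

# predict_proba applies the sigmoid to the linear score, giving a
# probability in [0, 1]; the default 0.5 threshold turns it into a label.
print(model.predict_proba([[15, 18]]))
print(model.predict([[15, 18]]))
```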
7. K-Nearest Neighbors (KNN)
KNN is presented as a simple classification algorithm that classifies a new data point based on the majority class of its K nearest neighbors in the feature space. Key aspects include:
Choosing a value for K, the number of neighbors to consider.
Calculating the distance (e.g., Euclidean distance) between the new data point and all existing data points: d = √((x − a)² + (y − b)²).
Selecting the K nearest neighbors based on the calculated distances.
Assigning the new data point to the majority class among its K nearest neighbors. “majority of neighbors are pointing towards normal.”
A use case of predicting diabetes using a dataset and Python’s pandas and sklearn libraries. The process involves data loading, preprocessing (handling missing values by replacing with the mean), splitting data into training and testing sets, scaling features using StandardScaler, training a KNeighborsClassifier, making predictions, and evaluating the model using metrics like the confusion matrix, F1 score, and accuracy.
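A compact sketch of that same pipeline (split, scale, fit, evaluate) with sklearn; the random features below are a stand-in for the diabetes data, and the preprocessing of missing values is omitted:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix, f1_score, accuracy_score

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))            # stand-in features
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # stand-in labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=1)

scaler = StandardScaler().fit(X_tr)      # scale so no feature dominates the distance
X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)

knn = KNeighborsClassifier(n_neighbors=11)  # K = 11 neighbors vote on each point
knn.fit(X_tr, y_tr)
pred = knn.predict(X_te)

print(confusion_matrix(y_te, pred))
print(f1_score(y_te, pred), accuracy_score(y_te, pred))
```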
8. Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) Networks
RNNs are introduced as neural networks designed to handle sequential data. Key characteristics include:
Recurrent connections that allow information to persist across time steps. “RNNs are distinguished by their feedback loops.”
The challenge of vanishing and exploding gradients in standard RNNs, making it difficult to learn long-range dependencies.
LSTMs are presented as a type of RNN that addresses the vanishing gradient problem. “LSTMs are a special kind of RNN, capable of learning long-term dependencies.”
LSTM architecture involves forget gates, input gates, and output gates to control the flow of information through the cell state.
Forget gate (f_t): decides which information from the previous time step to delete as no longer important.
Input gate (i_t): determines which new information to let through based on its significance at the current time step.
Output gate (o_t): allows the retained information to influence the output at the current time step.
A use case of predicting stock prices using an LSTM network and Python’s Keras library (running on TensorFlow). The process includes data loading, feature scaling (MinMaxScaler), creating time series data with specified time steps, reshaping data for the LSTM layer, building a sequential LSTM model with dropout regularization, compiling the model, training it on the historical stock prices, and making predictions for future prices.
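A minimal Keras sketch of that LSTM setup; a synthetic sine wave stands in for the historical stock prices, and the layer sizes, window length, and epoch count are illustrative choices:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense

# Stand-in price series; a real run would load historical closing prices.
prices = np.sin(np.linspace(0, 50, 600)).reshape(-1, 1)

scaler = MinMaxScaler()
scaled = scaler.fit_transform(prices)

# Build supervised windows: 60 past steps -> next value.
time_steps = 60
X = np.array([scaled[i - time_steps:i, 0] for i in range(time_steps, len(scaled))])
y = scaled[time_steps:, 0]
X = X.reshape(X.shape[0], time_steps, 1)  # (samples, time steps, features)

model = Sequential([
    LSTM(50, return_sequences=True, input_shape=(time_steps, 1)),
    Dropout(0.2),          # dropout regularization, as described above
    LSTM(50),
    Dropout(0.2),
    Dense(1),              # next-step prediction
])
model.compile(optimizer="adam", loss="mean_squared_error")
model.fit(X, y, epochs=5, batch_size=32, verbose=0)
print(model.predict(X[-1:]))  # predict the step after the last window
```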
9. Convolutional Neural Networks (CNNs)
CNNs are introduced as a powerful type of neural network particularly effective for image recognition. Key components and concepts include:
Convolutional layers that use filters (kernels) to extract features from the input image. “The basic building block of a CNN is the convolutional layer.”
Pooling layers that reduce the spatial dimensions of the feature maps, making the network more robust to variations in the input. “The pooling layer’s function is to progressively reduce the spatial size of the representation.”
Activation functions (e.g., ReLU) applied to the output of convolutional layers.
Flattening the feature maps before feeding them into fully connected layers for classification.
The success of CNNs in tasks like image classification, object detection, and image segmentation.
A use case of building a CNN to classify images from the CIFAR-10 dataset (10 classes of objects) using Python’s TensorFlow and Keras libraries. The process involves loading the dataset, preprocessing (normalizing pixel values and one-hot encoding labels), building a CNN model with convolutional layers, pooling layers, dropout, flattening, and dense layers, compiling the model with an optimizer and loss function, and training it on the CIFAR-10 training data. Helper functions for one-hot encoding and setting up images are also described.
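A condensed Keras sketch of the CIFAR-10 classifier just described; the layer sizes and epoch count are illustrative, not the source’s exact architecture (the dataset downloads on first run):

```python
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0          # normalize pixels
y_train, y_test = to_categorical(y_train, 10), to_categorical(y_test, 10)

model = Sequential([
    Conv2D(32, (3, 3), activation="relu", input_shape=(32, 32, 3)),
    MaxPooling2D((2, 2)),                 # shrink the feature maps
    Conv2D(64, (3, 3), activation="relu"),
    MaxPooling2D((2, 2)),
    Dropout(0.25),
    Flatten(),                            # flatten before the dense head
    Dense(128, activation="relu"),
    Dense(10, activation="softmax"),      # one probability per CIFAR-10 class
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=3, batch_size=64,
          validation_data=(x_test, y_test))
```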
10. Transfer Learning
Transfer learning is presented as a technique to improve the performance of a model on a new, smaller dataset by leveraging knowledge learned from a pre-trained model on a large, related dataset. Key ideas include:
Using a pre-trained base model (e.g., a CNN trained on ImageNet) as a feature extractor.
Freezing the weights of the pre-trained layers to prevent them from being updated during the initial training on the new dataset. “Loop over all the layers in the base model and freeze them so they will not be updated during the first training process.”
Adding a new classification head (e.g., dense layers) specific to the new task.
Training only the weights of the new head on the smaller dataset.
Optionally, unfreezing some of the later layers of the base model for fine-tuning after the head has been trained.
A use case of using a pre-trained ResNet50 model (available in TensorFlow.Keras.applications) for a mask detection task. The process involves loading the pre-trained base model, freezing its layers, adding a custom classification head, compiling the model, training it on a dataset of images with and without masks (using data augmentation to increase the training data), evaluating the model’s performance (precision, recall, F1-score, accuracy), and saving the trained model.
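A minimal sketch of this freeze-the-base, train-the-head pattern in Keras; the two-unit mask/no-mask head and its sizes are assumptions, and the data loading and augmentation described above are omitted:

```python
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.models import Model
from tensorflow.keras.layers import GlobalAveragePooling2D, Dense

# Pre-trained base without its ImageNet classification head.
base = ResNet50(weights="imagenet", include_top=False, input_shape=(224, 224, 3))

# Freeze every base layer so the first training pass fits only the new head.
for layer in base.layers:
    layer.trainable = False

x = GlobalAveragePooling2D()(base.output)
x = Dense(128, activation="relu")(x)
output = Dense(2, activation="softmax")(x)   # hypothetical mask / no-mask classes

model = Model(inputs=base.input, outputs=output)
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_images, train_labels, ...)  # train the head on the mask dataset
```

Fine-tuning would then set `layer.trainable = True` on a few of the last base layers and recompile with a low learning rate, matching the optional step described above.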
11. Ethical Considerations
The example of classifying tumors (malignant or benign) with logistic regression briefly touches upon ethical considerations in the medical domain. Even with high probability predictions, the user would likely seek professional medical confirmation (“I’m guessing that you’re going to go get it tested anyways”). This highlights the importance of understanding the context and limitations of machine learning models, especially in high-stakes applications.
Overall, the provided excerpts offer a foundational overview of several key machine learning algorithms and concepts, illustrated with practical examples and code snippets using popular Python libraries. They emphasize the importance of data preprocessing, model selection, training, and evaluation in building effective machine learning solutions for various types of problems.
Frequently Asked Questions about Machine Learning Algorithms
1. What is the fundamental idea behind linear regression? Linear regression aims to model the relationship between a dependent variable (the one we want to predict) and one or more independent variables (the features we use for prediction) by fitting a linear equation (a straight line in two dimensions, or a hyperplane in higher dimensions) to the observed data. The goal is to find the line that best represents the trend in the data, allowing us to predict the dependent variable for new values of the independent variables.
2. How do decision trees work for classification? Decision trees are tree-like structures where each internal node represents a test on an attribute, each branch represents the outcome of the test, and each leaf node represents a class label (the prediction). To classify a new instance, we start at the root node and follow the branches corresponding to the outcomes of the tests at each node until we reach a leaf node, which provides the classification. The tree is built by recursively splitting the data based on the attribute that provides the most information gain (or the largest reduction in entropy), aiming to create subsets that are increasingly pure with respect to the target class.
3. What is the core principle of the Support Vector Machine (SVM) algorithm for classification? The primary goal of an SVM is to find the optimal hyperplane that best separates data points belonging to different classes in a dataset. This “best” hyperplane is the one that maximizes the margin, which is the distance between the hyperplane and the nearest data points from each class (called support vectors). By maximizing this margin, the SVM aims to create a decision boundary that generalizes well to unseen data, reducing the risk of misclassification.
4. Can you explain the concepts of entropy and information gain in the context of decision trees? Entropy is a measure of the impurity or randomness within a dataset. A dataset with a mix of different classes has high entropy, while a dataset with only one class has low (ideally zero) entropy. Information gain is the reduction in entropy achieved after splitting the dataset on a particular attribute. When building a decision tree, the attribute with the highest information gain is chosen as the splitting criterion at each node, because it leads to the most significant decrease in impurity in the resulting subsets.
5. How does the K-Means clustering algorithm group data points? K-Means clustering is an iterative algorithm that aims to partition a dataset into K distinct, non-overlapping clusters. It starts by randomly initializing K centroids (representing the center of each cluster). Then, it repeatedly performs two steps: (1) assigning each data point to the cluster whose centroid is nearest (using a distance metric like Euclidean distance), and (2) recalculating the centroids of each cluster as the mean of all the data points assigned to that cluster. This process continues until the centroids no longer move significantly, indicating that the clusters have stabilized. The “elbow method” can be used to help determine an appropriate value for K.
6. What is the role of the sigmoid function in logistic regression? In logistic regression, the sigmoid function (also known as the logistic function) is used to transform the linear combination of input features into a probability between 0 and 1. While linear regression can produce continuous output values, logistic regression is used for classification tasks where we need to predict the probability of an instance belonging to a particular class. The sigmoid function maps any real-valued number to a value between 0 and 1, which can be interpreted as the probability of the event occurring. A threshold (often 0.5) is then used to classify the instance into one of the two classes.
7. How do Recurrent Neural Networks (RNNs) handle sequential data differently from standard feedforward networks? Standard feedforward neural networks process each input independently, without memory of past inputs in a sequence. RNNs, on the other hand, are designed to process sequences of data by maintaining an internal state (or memory) that is updated as each element of the sequence is processed. This allows RNNs to capture dependencies and patterns across time steps in the input sequence. They achieve this through recurrent connections, where the output of a neuron at one time step can be fed back as input to the neuron (or other neurons in the network) at the next time step.
8. What are Long Short-Term Memory (LSTM) networks, and what problem do they address in RNNs? Long Short-Term Memory (LSTM) networks are a specific type of RNN architecture that is designed to address the vanishing gradient problem, which can make it difficult for standard RNNs to learn long-range dependencies in sequential data. LSTMs introduce a more complex memory cell with mechanisms called “gates” (input gate, forget gate, and output gate) that control the flow of information into, out of, and within the cell state. These gates allow LSTMs to selectively remember relevant information over long sequences and forget irrelevant information, enabling them to learn complex patterns in tasks like natural language processing and time series analysis where long-term context is important.
Supervised Learning: Concepts and Applications
Supervised learning is a method used to enable machines to classify or predict objects, problems, or situations based on labeled data that is fed to the machine. In supervised learning, you already know the answer for a lot of the information coming in.
Here’s a breakdown of key aspects of supervised learning based on the sources:
Labeled Data: Supervised learning relies on labeled data for training the machine learning model. This means that for each input data point, there is a corresponding correct output or target variable provided.
Direct Feedback: During the training process, the model receives direct feedback based on the labeled data. This feedback helps the model learn the relationship between the inputs and the correct outputs.
Prediction of Outcomes: The goal of supervised learning is to train a model that can predict the outcome for new, unseen data based on the patterns it learned from the labeled training data.
Examples: The sources provide several examples of tasks that can be addressed using supervised learning:
Predicting whether someone will default on a loan.
Predicting whether you will make money on the stock market.
Classification, where you want to predict a category, such as whether a stock price will increase or decrease (a yes/no answer or a 0/1 outcome).
Regression, where you want to predict a quantity, such as predicting the age of a person based on height, weight, health, and other factors.
Building a classifier using Support Vector Machines (SVM) to classify if a recipe is for a cupcake or a muffin.
Classifying a tumor as malignant or benign based on features, which can be done using logistic regression.
Comparison with Unsupervised Learning:
The sources explicitly contrast supervised learning with unsupervised learning:
In supervised learning, the data is labeled, and there is direct feedback to the model. The aim is to predict a specific outcome.
In unsupervised learning, the data is unlabeled, and there is no feedback provided during training. The goal is to find hidden structures in the data and group the data together to discover relationships.
The sources also suggest that supervised and unsupervised learning can be used together. For instance, you might use unsupervised learning to find connected patterns in unlabeled image data, and then label those groups. This labeled data can then be used to train a supervised learning model to predict what’s in future images.
In summary, supervised learning is a powerful approach in machine learning that leverages labeled data to train models for prediction and classification tasks, relying on direct feedback to learn the underlying relationships within the data.
Understanding Unsupervised Learning: Concepts and Techniques
Unsupervised learning is a type of machine learning where a model is trained on unlabeled data to find hidden patterns and structure within the data. Unlike supervised learning, there are no target variables or correct answers provided during the training process, and the model does not receive direct feedback on its predictions. The goal is to discover inherent relationships, similarities, and groupings in the data without prior knowledge of what these might be.
Here’s a breakdown of key aspects of unsupervised learning based on the sources:
Unlabeled Data: Unsupervised learning algorithms work with datasets that do not have predefined labels or categories. The algorithm must learn the underlying structure of the data on its own.
Finding Hidden Patterns: The primary objective of unsupervised learning is to identify hidden patterns, structures, or relationships that might not be immediately obvious in the unlabeled data.
No Direct Feedback: Since the data is unlabeled, there is no feedback mechanism that tells the model whether its findings are correct or incorrect. The evaluation of unsupervised learning models often relies on subjective interpretation of the discovered patterns or on downstream tasks that utilize the discovered structures.
Clustering: One of the main applications of unsupervised learning is clustering, which involves grouping data points into clusters based on their feature similarity. The aim is to create groups where data points within a cluster are more similar to each other than to those in other clusters.
K-means clustering is highlighted as a commonly used clustering tool and an example of unsupervised learning. It works by defining a specified number (K) of clusters and assigning random centroids. It then iteratively computes the distance of data points to these centroids, forms new clusters based on minimum distances, and recalculates the centroids until the cluster centroids stop changing.
Hierarchical clustering is another clustering algorithm that creates a tree-like structure (dendrogram) by either agglomerating similar data points from the bottom up or dividing them from the top down.
Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is a density-based algorithm that identifies clusters based on the density of data points and can also handle outliers by labeling them as noise.
Dimensionality Reduction: Unsupervised learning can also be used for dimensionality reduction, which aims to reduce the number of variables in a dataset while retaining the most important information.
Principal Component Analysis (PCA) is mentioned as a dimensionality reduction technique that transforms data into a smaller set of uncorrelated variables (principal components) to capture the most variance in the data.
Autoencoders, a type of neural network, can also be used for dimensionality reduction by learning efficient representations of data.
Anomaly Detection: Unsupervised learning techniques can be employed to detect anomalies or unusual data points that deviate significantly from the normal patterns in the data.
Association Rule Mining: While not detailed extensively, the sources mention association algorithms as another type of unsupervised learning problem, focusing on discovering relationships or associations between variables in large datasets.
Deep Learning: Unsupervised learning principles are also applied in deep learning using algorithms like autoencoders and generative models for tasks such as clustering, dimensionality reduction, and anomaly detection.
Relationship with Supervised Learning:
As mentioned in our previous discussion, supervised learning uses labeled data for prediction. The sources highlight that unsupervised learning is used when the data is unlabeled and the goal is to discover inherent structure. However, the sources also note that these two approaches can be complementary. For example, unsupervised learning can be used to preprocess data or discover initial groupings, which can then inform the labeling process for subsequent supervised learning tasks.
In summary, unsupervised learning is a valuable set of techniques for exploring and understanding unlabeled data by identifying hidden patterns, groupings, and reductions in dimensionality, providing insights without relying on prior knowledge of the data’s categories or outcomes.
Reinforcement Learning: Agent-Environment Interaction and Reward Maximization
Reinforcement learning is an important type of machine learning where an agent learns how to behave in an environment by performing actions and seeing the result. This learning process aims to enable the agent to maximize a reward signal over time.
Here’s a breakdown of key aspects of reinforcement learning based on the sources:
Agent and Environment: In reinforcement learning, there is an agent that interacts with an environment. The agent is the learner that takes actions. The environment is the setting in which the agent operates and to which it responds.
Actions and Results: The agent learns by taking actions within the environment. After each action, the agent receives feedback in the form of a new state of the environment and a reward (or punishment).
Learning by Trial and Error: Similar to how humans learn from experience, reinforcement learning involves a process of trial and error. The agent explores different actions and learns which actions lead to positive rewards and which lead to negative rewards.
Maximizing Rewards: The ultimate goal of the agent is to learn a policy – a mapping from states to actions – that maximizes the cumulative reward it receives over time.
Examples: The sources provide an intuitive example of a baby learning not to touch fire after experiencing the pain of being burned. This illustrates the concept of learning through actions and their consequences. Other examples of tasks where reinforcement learning is used include:
Robotics
Game playing, using algorithms like Deep Q Networks
Optimizing shipping routes for a logistics company by considering fuel prices, traffic, and weather (mentioned in the context of “agentic AI”, which builds upon reinforcement learning principles).
Relation to Other Machine Learning Types: The sources classify reinforcement learning as one of the basic divisions of machine learning, alongside supervised and unsupervised learning. Deep learning AI can also be applied using reinforcement learning methods.
Current State and Future Potential: The sources describe reinforcement learning as being in its “infant stages” but also highlight it as having potentially the “biggest machine learning demand out there right now or in the future”. This suggests that while it’s a developing field, it holds significant promise for creating intelligent systems.
In essence, reinforcement learning focuses on training agents to make optimal decisions in dynamic environments by learning from the consequences of their actions, aiming to achieve long-term goals through the accumulation of rewards.
Understanding Neural Networks: Foundations and Applications
Neural networks are a fundamental component of deep learning and are inspired by the structure and function of the human brain. They consist of interconnected layers of artificial neurons (or units) that work together to process information.
Here’s a detailed discussion of neural networks based on the sources:
Biological Inspiration: Artificial neural networks (ANNs) are biologically inspired by the animal brain and its interconnected neurons. They aim to simulate the human brain using artificial neurons. A biological neuron receives inputs through dendrites, processes them in the cell nucleus, and sends output through a synapse. An artificial neuron has analogous components: inputs, a processing unit involving weights and biases, and an output.
Perceptron: The Basic Unit: A perceptron can be considered one of the fundamental units of neural networks. It can consist of at least one neuron and can function as a basic binary classifier. A basic perceptron receives inputs, multiplies each input by a weight, adds a bias, and then passes the result through an activation function to produce an output (e.g., 0 or 1, indicating whether the neuron is “activated” or not).
Structure of Neural Networks:
A fully connected artificial neural network typically includes an input layer, one or more hidden layers, and an output layer.
The input layer receives data from external sources.
Each neuron in the hidden layers computes a weighted sum of its inputs (from the previous layer) and applies an activation function to the result before passing it to the next layer.
The output layer produces the network’s response.
Weights are associated with the connections between neurons, and these weights are adjusted during training to optimize the network’s performance.
A bias is added to the weighted sum in each neuron. Unlike weights (which are per input), there is one bias per neuron, and its value is also adjusted during training.
Activation functions in each neuron decide whether a neuron should be “fired” or not, determining the output (e.g., zero or one) based on the weighted sum of inputs plus the bias. Common activation functions mentioned include ReLU and Sigmoid.
Training Process:
The training process involves feeding labeled data (input and expected output) into the network.
The network makes a prediction, which is compared to the actual (labeled) output.
The difference between the predicted and actual output is the error, which is measured by a cost function.
This error is then fed back through the network in a process called backpropagation, which helps in adjusting the weights and biases of the neurons.
The goal of training is to minimize the cost function, and an optimization technique called gradient descent is commonly used for this purpose, iteratively adjusting weights and biases. The learning rate in gradient descent determines the step size for these adjustments (a minimal sketch follows this list).
This is an iterative process that continues until the error is minimized to a satisfactory level or a specified number of iterations (epochs) is reached.
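As promised above, a bare-bones sketch of gradient descent on a one-dimensional cost function; the cost C(w) = (w − 3)² and the learning rate are illustrative stand-ins for a network’s full cost surface:

```python
# Gradient descent on C(w) = (w - 3)^2, whose minimum is at w = 3.
w, lr = 0.0, 0.1           # initial weight and learning rate (step size)
for epoch in range(50):
    grad = 2 * (w - 3)     # derivative dC/dw at the current weight
    w -= lr * grad         # step against the gradient
print(w)                   # converges toward 3 as epochs accumulate
```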
Logical Functions: Early research showed that single-layer perceptrons could implement basic logical functions like AND and OR by adjusting the weights and biases. However, implementing the XOR gate required a multi-layer perceptron (MLP) with at least one hidden layer, which overcame an early roadblock in neural network development.
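A tiny sketch of those logic gates with a single artificial neuron; the weights and biases below are hand-picked examples, not learned values:

```python
import numpy as np

def perceptron(x, w, b):
    """Fire (1) if the weighted sum plus bias crosses zero, else 0."""
    return int(np.dot(w, x) + b > 0)

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]

# One neuron each for AND and OR, using hand-picked weights and biases.
for name, w, b in [("AND", np.array([1.0, 1.0]), -1.5),
                   ("OR",  np.array([1.0, 1.0]), -0.5)]:
    print(name, [perceptron(np.array(x), w, b) for x in inputs])

# XOR is not linearly separable: no single (w, b) yields [0, 1, 1, 0],
# which is why a hidden layer (an MLP) is required.
```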
Types of Neural Networks: The sources describe several common architectures in deep learning:
Feedforward Neural Networks (FNN): The simplest type, where information flows linearly from input to output. They are used for tasks like image classification, speech recognition, and Natural Language Processing (NLP). Sequential models in Keras are an example of this, where layers are stacked linearly.
Convolutional Neural Networks (CNN): Designed specifically for image and video recognition. They automatically learn features from images through convolutional operations, making them ideal for image classification, object detection, and image segmentation. CNNs involve layers like convolutional layers, ReLU layers, and pooling (reduction) layers.
Recurrent Neural Networks (RNN): Specialized for processing sequential data, time series, and natural language. They maintain an internal state to capture information from previous inputs, making them suitable for tasks like speech recognition, NLP, and language translation. Long Short-Term Memory (LSTM) networks are a type of RNN.
Deep Neural Networks (DNN): Neural networks with multiple layers of interconnected nodes (including multiple hidden layers) that enable the automatic discovery of complex representations from raw data. CNNs and RNNs with multiple layers are considered DNNs.
Deep Belief Networks (DBN): Mentioned as one of the types of neural networks.
Autoencoders: A type of neural network used for learning efficient data representations, typically for dimensionality reduction or anomaly detection.
Applications of Deep Learning and Neural Networks: Deep learning, powered by neural networks, has numerous applications across various domains:
Autonomous Vehicles: CNNs process data from sensors and cameras for object detection, traffic sign recognition, and driving decisions.
Healthcare Diagnostics: Analyzing medical images (X-rays, MRIs, CT scans) for early disease detection.
Natural Language Processing (NLP): Enabling sophisticated text generation, translation, and sentiment analysis (e.g., Transformer models like ChatGPT).
Image Enhancement: Features like in-painting and out-painting in tools like Stable Diffusion.
Face Mask Detection: Building models to check if a person is wearing a mask.
Relationship with Deep Learning, Machine Learning, and AI:
Deep learning is a subset of machine learning, which in turn is a branch of artificial intelligence.
Neural networks, particularly deep neural networks with multiple layers, are the main component of deep learning.
Unlike traditional machine learning, deep learning models can automatically discover representations (features) from raw data, eliminating the need for manual feature extraction.
Tools and Platforms:
TensorFlow is highlighted as a popular open-source platform developed and maintained by Google for developing deep learning applications using neural networks. It supports both CPUs and GPUs for computation and uses tensors (multi-dimensional arrays) and graphs to represent and execute computations.
Keras is presented as a high-level API that can run on top of TensorFlow (and other backends), making it straightforward to build neural network models, including sequential and functional models. Keras simplifies the process of defining layers (like dense, activation, dropout), compiling the model with optimizers and loss functions, and training it on data.
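A minimal Keras Sequential sketch of the workflow just described (define layers, compile, inspect); the layer sizes and the 20-feature input are assumptions for illustration:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

# A small feedforward classifier: stacked Dense layers with dropout.
model = Sequential([
    Dense(64, activation="relu", input_shape=(20,)),  # 20 input features
    Dropout(0.2),
    Dense(32, activation="relu"),
    Dense(1, activation="sigmoid"),                   # binary output
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()  # inspect the layer stack and parameter counts
```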
In summary, neural networks are powerful computational models inspired by the human brain, forming the core of deep learning. They learn complex patterns from data through interconnected layers of neurons with adjustable weights and biases, trained using techniques like backpropagation and gradient descent. With various architectures tailored for different types of data, neural networks have enabled significant advancements across a wide range of applications in artificial intelligence.
Deep Learning: Foundations, Methods, and Applications
Deep learning is presented in the sources as a subset of machine learning, which itself is a branch of artificial intelligence. It is defined as a type of machine learning that imitates how humans gain certain types of knowledge. Unlike traditional machine learning models that require manual feature extraction, deep learning models automatically discover representations from raw data. This capability is primarily achieved through the use of neural networks, particularly deep neural networks that consist of multiple layers of interconnected nodes.
Here’s a more detailed discussion of deep learning based on the sources:
Core Component: Neural Networks: Neural networks are the main component of deep learning. These networks are inspired by the structure and function of the human brain, consisting of interconnected layers of artificial neurons. Deep learning utilizes deep neural networks, meaning networks with multiple hidden layers. These layers enable the network to transform input data into increasingly abstract and composite representations. For instance, in image recognition, initial layers might detect simple features like edges, while deeper layers recognize more complex structures like shapes and objects.
Types of Deep Learning: Deep learning AI can be applied using supervised, unsupervised, and reinforcement machine learning methods.
Supervised learning in deep learning involves training neural networks to make predictions or classify data using labeled datasets. The network learns by minimizing the error between its predictions and the actual targets through a process called backpropagation. Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) are common deep learning algorithms used for tasks like image classification, sentiment analysis, and language translation.
Unsupervised learning in deep learning involves neural networks discovering patterns or clusters in unlabeled datasets without target variables. Algorithms like Autoencoders and generative models are used for tasks such as clustering, dimensionality reduction, and anomaly detection.
Reinforcement learning in deep learning (Deep Reinforcement Learning) involves an agent learning to make decisions in an environment to maximize a reward signal over time. Algorithms like Deep Q-Networks are used for tasks such as robotics and gameplay.
Training Deep Learning Models: Training deep learning models often requires significant data and computational resources. The process typically involves:
Data Pre-processing: Transforming textual data into a numerical representation (tokenization, encoding). Applying techniques like scaling, normalization, and encoding to make data more usable.
Random Parameter Initialization: Initializing the model’s parameters randomly before training.
Feeding Numerical Data: Inputting the numerical representation of the text data into the model.
Loss Function Calculation: Measuring the discrepancy between the model’s predictions and the actual targets using a loss function.
Parameter Optimization: Adjusting the model’s parameters (weights and biases) through optimization techniques like gradient descent to minimize the loss.
Iterative Training: Repeating the training process over multiple iterations (epochs) until the model achieves satisfactory accuracy.
Advantages of Deep Learning:
High Accuracy: Achieves state-of-the-art performance in tasks like image recognition and natural language processing.
Automated Feature Engineering: Automatically discovers and learns relevant features from data without manual intervention.
Scalability: Can handle large and complex datasets and learn from massive amounts of data.
Makes processes quicker and simpler for data scientists to gather, analyze, and interpret massive amounts of data.
Disadvantages of Deep Learning:
High Computational Requirements: Requires significant data and computational resources (like GPUs) for training.
Need for Large Labeled Datasets: Often requires extensive labeled data for supervised learning, which can be costly and time-consuming to obtain.
Overfitting: Can overfit to the training data, leading to poor performance on new, unseen data.
Applications of Deep Learning: Deep learning is revolutionizing various industries and has a wide range of applications:
Machine Learning: Deep learning is a subfield of machine learning, distinguished by the use of deep neural networks and automatic feature learning.
Artificial Intelligence (AI): Deep learning is a powerful technique within the broader field of AI, enabling systems to perform complex tasks that previously required human intelligence.
Tools and Platforms for Deep Learning:
TensorFlow: An open-source platform developed by Google, widely used for developing deep learning applications. It supports both CPUs and GPUs and uses tensors for data manipulation.
PyTorch: Another popular open-source machine learning framework often used for deep learning research and development.
Keras: A high-level API that can run on top of TensorFlow (and other backends), simplifying the process of building and training neural networks.
In conclusion, deep learning, powered by multi-layered neural networks, represents a significant advancement in AI. Its ability to automatically learn intricate patterns from vast amounts of data has led to remarkable progress in numerous fields, making it a crucial technology in the ongoing AI revolution.
Artificial Intelligence Full Course 2025 | Artificial Intelligence Tutorial | AI Course | Simplilearn
The Original Text
hello everyone and welcome to the Artificial Intelligence Full Course by Simplilearn. AI, or artificial intelligence, is changing how machines work, teaching them to think, learn, and make decisions like humans. You already see AI in action with Siri, Alexa, Netflix recommendations, and even self-driving cars. By 2025, AI will be even bigger, with industries like healthcare, finance, and tech relying on it to boost innovation. This means huge job opportunities and high salaries, with AI professionals earning up to 6 to 10 LPA in India and around $100,000 in the US. In this course you will learn the basics of AI, including neural networks, deep learning, and recurrent neural networks, the technology powering modern AI. You’ll also explore career opportunities in AI and get expert tips to prepare for job interviews and build the skills needed to succeed in this fast-growing field. But before we continue: if you are interested in mastering the future of technology, the Professional Certificate Course in Generative AI and Machine Learning is your perfect opportunity. Offered in collaboration with the E&ICT Academy, this 11-month online, live, and interactive program provides hands-on experience with cutting-edge tools like generative AI, machine learning, and ChatGPT, and you’ll also gain practical experience through 15-plus projects, integrated labs, and live masterclasses delivered by esteemed IIT faculty. So hurry up and find the course link in the description box below and in the pinned comments. Let’s get started.
Liam, a 19-year-old freshman, recently joined an Ivy League college to study history and political science. While reading about thinkers and scholars of the early 20th century, he stumbled upon a name: Alan Turing. Liam was fascinated by Alan and realized that Alan is considered the father of modern computer science, whose work eventually led to the invention of the computer he knows today. But there was something even more fascinating about Alan. Although Alan Turing was famous for his work developing the first modern computers and decoding the encryption of German Enigma machines during the Second World War, he also devised a detailed procedure known as the Turing test, forming the basis for artificial intelligence. Liam had his mind blown by this fact; he realized that AI is not a modern phenomenon but a concept that has existed, at least as a thought experiment, since the early 1950s. Liam used AI tools like ChatGPT, Perplexity, and Consensus on a daily basis for his research. He had a smartphone that he used for multiple tasks: using Siri or Google Assistant to find local food places, using autocorrect in apps like Instagram and WhatsApp, and even AI photo-editing features. He realized that AI has seeped into almost every aspect of his life, from making trivial decisions like where to have his morning coffee, to complex AI tools like ChatGPT for his research, to even his father’s self-driving Tesla that he used whenever he got a chance. Artificial intelligence, or AI, in the 21st century has become a very subtle technology that exists in every human’s life without them even realizing it. But what is this AI? Does it mean robots in a completely dystopian AI-warlord future? Not really. Let us dive a little deeper into understanding everything about AI. Artificial intelligence, or AI, is like giving computers the ability to think and learn, much like humans do. Imagine teaching a friend how to solve puzzles, and then that friend can solve different types of puzzles on their own. AI works similarly: it helps
AI works similarly: it helps computers understand and carry out tasks that typically need human intelligence. These tasks include recognizing faces in photos, chatting with us through smart assistants like Siri or Google Assistant, and even driving cars. Think of AI as a smart helper that makes our daily lives easier. It can learn from data, make decisions, and improve itself over time. This means AI isn't just about robots taking over the world; it's about using smart technology to assist us in various ways, making complex tasks simpler and everyday routines smoother.

AI has found its way into many areas of our lives, often making things easier without us even realizing it. In healthcare, AI helps doctors by quickly analyzing medical images like X-rays to detect issues faster than the human eye can. In finance, AI works to keep our money safe by spotting unusual activities in our bank accounts that could indicate fraud. When you stream shows on Netflix, AI suggests movies and series based on what you've watched and liked before. In retail, AI manages stock and predicts which items will be popular, ensuring that store shelves are filled with what customers need. Even in our homes, AI is at work through smart devices like thermostats that learn your schedule and adjust the temperature automatically, or lights that turn on when you enter a room. AI touches so many parts of our daily lives, making things more convenient and efficient.

One of the best-known and most widely used AI applications today is ChatGPT, an advanced AI developed by OpenAI that can chat with you just like a human. Imagine having a friend who knows almost everything and can help you with any question or topic; that's what ChatGPT does. But how does it work? ChatGPT is powered by something called a transformer model, a type of machine learning model that learns patterns in language by looking at a vast amount of text data from books, websites, and other sources. Think of it like reading millions of books and remembering important information from all of them. When you ask ChatGPT a question, it doesn't just pull out a random answer: it looks at the words you used, understands the context, and predicts what a good response would be based on what it has learned. For example, if you ask about the weather, it understands you are looking for current weather conditions and gives you relevant information; if you ask it to help with homework, it draws on its knowledge to explain concepts clearly. ChatGPT uses a process called deep learning, which works a bit like our brains: it breaks sentences down into smaller parts and looks at how those parts fit together. This helps it understand not just the meaning of individual words but also how they combine to convey a complete idea, which is why ChatGPT can handle complex questions and give answers that make sense. To make sure it provides useful and accurate information, ChatGPT was trained on a diverse range of topics; this training helps it recognize and generate text on anything from science and history to entertainment. In daily life, it's like having an encyclopedia and a friendly tutor rolled into one.

Similar to ChatGPT, a plethora of other tools and applications are being developed every day, each trained for a particular purpose on a different kind of data set: for example DALL-E, which has been trained on a vast data set of text and images from the internet; Stable Diffusion, which has been trained on a variety of images and corresponding text descriptions; Tesla Autopilot, which has been trained on sensor data from Tesla vehicles and driving data; and so on.
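To make the "predict a good continuation" idea a little more concrete, here is a tiny, hedged sketch using the small open GPT-2 model via the Hugging Face transformers library. This is not ChatGPT's actual model or API, just an assumed stand-in that shows how a transformer language model extends a prompt one predicted token at a time (it requires the transformers package and a backend such as PyTorch installed):

```python
# Illustration only: GPT-2 is a small open model standing in for far larger
# systems like ChatGPT. Requires: pip install transformers torch
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
result = generator("The weather today is", max_new_tokens=15, num_return_sequences=1)
print(result[0]["generated_text"])  # the model continues the prompt word by word
```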
AI is a remarkable technology that holds great promise for the future, offering solutions to some of the world's most pressing challenges. Imagine a future where AI takes care of routine tasks, giving us more time to be creative and focus on what we love. AI can help in many ways, from improving medical treatments to making our daily lives more efficient. However, it's essential to use AI responsibly: that means creating guidelines and rules to ensure AI is developed and used in ways that benefit everyone. By embracing AI and understanding its potential, we can look forward to a future where technology and human creativity go hand in hand. AI is not just about smart gadgets; it's about opening new possibilities and making our world a better place. The future of AI is bright, filled with opportunities for innovation and progress, helping us achieve things we never thought possible.

So let's talk about whether AI is a good career or not. You have probably heard a lot about artificial intelligence: it's everywhere, and it's shaking up industries all over the world. But here's the big question: is AI a good career choice? Yes, absolutely. Take Elon Musk, for example. We all know him as the guy behind Tesla and SpaceX, but did you know he also co-founded OpenAI? Even he is diving into AI, and that just shows how massive this field is becoming. And guess what: AI isn't just for tech geniuses; there's room for everyone. Let's talk numbers. AI jobs have been growing like crazy, up to 32% in recent years, and the pay is pretty sweet, with roles offering over $100,000 a year. So whether you're into engineering, research, or even the ethical side of things, AI has something for you. Plus, the skills you pick up in AI can be used in all sorts of industries, making it a super flexible career choice.

Now, AI is a big field and there are tons of different jobs you can go for, so let's break down some of the key roles. First up, machine learning engineers. These folks are like the backbone of AI: they build models that can analyze huge amounts of data in real time. If you've got a background in data science or software engineering, this could be your thing; the average salary is around $131,000 in the US. Then there's the data scientist, the detective of the AI world. They dig into data to find patterns that help businesses make smart decisions. If you're good with programming and statistics, this is a great option, and you can make about $105,000 a year. Next we've got business intelligence developers, who process and analyze data to spot trends that guide business strategies. If you enjoy working with data and have a background in computer science, this role might be for you; the average salary here is around $87,000 per year. Then we've got research scientists, the ones pushing AI to new heights by asking innovative questions and exploring new possibilities. It's a bit more academic, often needing advanced degrees, but it's super rewarding, with salaries around $100,000. Next up are big data engineers and architects, the folks who make sure all the different parts of a business's technology talk to each other smoothly. They work with tools like Hadoop and Spark, and they need strong programming and data visualization skills; the average salary is one of the highest in AI, around $151,000 a year. Then we have the AI software engineer. These engineers build the software that powers AI applications; they need to be really good at coding and have a solid understanding of both software engineering and AI.
If you enjoy developing software and want to be part of the AI revolution, this could be your role; the average salary is around $108,000. Now, if you're more into designing systems, you might want to look at becoming a software architect. They design and maintain entire systems, making sure everything is scalable and efficient; with expertise in AI and cloud platforms, software architects can earn a hefty salary of about $150,000 a year. Let's not forget the data analyst. They have been around for a while, but their role has evolved big time with AI: they now prepare data for machine learning models and create super insightful reports. If you're skilled in SQL, Python, and data visualization tools like Tableau, this could be a great fit; the average salary is around $65,000, but it can go much higher in tech companies. Another exciting role is robotics engineer. These engineers design and maintain AI-powered robots, from factory robots to robots that help in healthcare. They usually need advanced degrees in engineering and strong skills in AI, machine learning, and IoT (Internet of Things); the average salary of a robotics engineer is around $87,000, and with experience it can go even higher. Last but not least, we have NLP engineers. NLP stands for natural language processing, and these engineers specialize in teaching machines to understand human language; think of voice assistants like Siri or Alexa. To get into this role you'll need a background in computational linguistics and programming skills; the average salary of an NLP engineer is around $78,000, and it can go even higher as you gain experience. So you can see the world of AI is full of exciting opportunities: whether you're into coding, designing systems, working with data, or even building robots, there's a role for you in this fast-growing field.

So what skills do you actually need to land an entry-level AI position? First off, you need a good understanding of AI and machine learning concepts. You'll need programming skills in languages like Python, Java, and R, and knowing your way around tools like TensorFlow and PyTorch will give you an edge too. Don't forget SQL, pandas, and big data technologies like Hadoop and Spark, which are super valuable; plus, experience with AWS and Google Cloud is often required.

So which industries are hiring AI professionals? AI professionals are in high demand across a wide range of industries; here are some of the top sectors hiring AI talent. Technology companies like Microsoft, Apple, Google, and Facebook are leading the charge in AI innovation. Consulting firms like PwC, KPMG, and Accenture are looking for AI experts to help businesses transform. Healthcare organizations are using AI to revolutionize patient care and treatment. Retail giants like Walmart and Amazon leverage AI to improve customer experiences. And media companies like Warner and Bloomberg are using AI to analyze and predict trends in the media industry. AI is not just the future, it's the present; with the right skills and determination you can carve out a rewarding career in this exciting field. Whether you're drawn to technical challenges or strategic possibilities, there's a role in AI that's perfect for you, so start building your skills, stay curious, and get ready to be part of the AI revolution.

Now let's look at the steps to get an AI engineer job. To thrive in this field, developing a comprehensive skill set is crucial; the field encompasses many specialized areas, but there are certain
core skills that are essential across most roles, and here is how you can build them. The first is technical skills. AI roles rely heavily on technical expertise, particularly in programming, data handling, and working with AI-specific and cloud-specific tools. Here are some key areas to focus on. First, programming languages: proficiency in general-purpose languages like Python and R is fundamental. Python in particular is widely used in AI for its simplicity and its robust libraries, such as TensorFlow and PyTorch, which are crucial for machine learning and deep learning tasks. Second, database management: understanding how to manage and manipulate large data sets is essential in AI. Familiarity with database management systems like Apache Cassandra, Couchbase, and DynamoDB will allow you to store, retrieve, and process data efficiently. Third, data analysis and statistics: strong data analysis skills are a must. Tools like MATLAB, Excel, and pandas are invaluable for statistical analysis, data manipulation, and visualizing trends in data, all of which are critical for developing AI models. Fourth, cloud AI platforms: knowledge of cloud-based platforms such as Microsoft Azure AI, Google Cloud AI, and IBM Watson is increasingly important. These platforms provide pre-built models, tools, and infrastructure that can accelerate AI development and deployment.

The second is industry knowledge. While technical skills form the backbone of your AI expertise, understanding the industry context is equally important. For example, knowing how AI integrates with digital marketing goals and strategies can be a significant advantage if you are working in, or targeting, industries like e-commerce or advertising. Industry-specific knowledge allows you to apply AI solutions more effectively and communicate their value to stakeholders.

The third is workplace, or soft, skills. In addition to technical and industry-specific skills, developing soft skills is essential for success in AI roles, or any role. These skills, often honed through experience, include: first, communication. Clearly articulating complex AI concepts to non-technical stakeholders is crucial; whether you are explaining how a machine learning model works or presenting data-driven insights, effective communication ensures your work is understood and valued. Second, collaboration: AI projects often require teamwork across diverse fields, including data science, software development, and others. Third, analytical thinking: AI is fundamentally about problem solving, and you will need strong analytical thinking skills to approach challenges logically, break them down into manageable parts, and develop innovative solutions. Fourth, problem solving: AI projects frequently run into unexpected challenges, whether a technical bug or an unforeseen data issue, and strong problem-solving skills will help you navigate these hurdles and keep projects on track. Building these skills can be achieved through various methods, including self-study, online courses, boot camps, and formal education; additionally, working on real projects, contributing to open-source AI initiatives, and seeking mentorship can provide practical experience and further enhance your expertise.

The next step is to learn advanced topics. As you advance in your machine learning journey, it is important to delve into more advanced topics; these areas will deepen your understanding and help you tackle complex problems. Some key topics to focus on
are: first, deep learning and neural networks; second, ensemble learning techniques; third, generative models and adversarial learning; fourth, recommendation systems and collaborative filtering; and fifth, time-series analysis and forecasting.

Now let's look at machine learning projects. Work on real-world projects to apply your knowledge: focus on data collection and preparation, capstone projects in image recognition and NLP, predictive modeling, and anomaly detection. Practical experience is key to solidifying your skills.

The next step is certification. If you already hold an undergraduate degree in a field related to AI, enrolling in specialized courses to enhance your technical skills can be highly beneficial; even if you don't have a degree, earning certifications can show potential employers that you are committed to your career goals and actively investing in your professional development. You can unleash your career potential with our artificial intelligence and machine learning courses, tailored for diverse industries and roles at top global firms; the programs feature key tools to enhance your AI knowledge and help you become a sought-after professional.

The next step is continuous learning and exploration. Stay updated with the latest developments by following industry leaders, engaging in online communities, and working on personal projects, and pursue advanced learning through courses and certifications to keep your skills sharp.

Now let's look at some AI career opportunities and their salaries. The job market for machine learning professionals is booming, and the average annual salary for AI engineers can vary based on location, experience, and company. Key roles include machine learning engineer, data scientist, NLP engineer, computer vision engineer, and AI/ML researcher. So let's see how much they earn. Machine learning engineers earn about $153,000 in the US and ₹11 lakh per annum in India. Data scientists earn about $150,000 in the US and ₹12 lakh per annum in India. NLP engineers earn about $117,000 in the US and ₹7 lakh per annum in India. Computer vision engineers earn around $126,000 in the US and ₹650,000 per annum in India. AI/ML researchers earn about $130,000 in the US and around ₹9 lakh per annum in India. Note that these figures vary from website to website and change frequently.

The last step is to start applying for entry-level jobs. When you feel confident in your training, begin researching and applying for jobs. Many entry-level AI positions, like software engineer or developer roles, are labeled "entry level" or "junior" in the job description, and jobs that require less than three years of experience are usually suitable for those just starting out. If you need additional support in your job search, consider applying for internships, taking on freelance projects, or participating in hackathons to further hone your skills; these opportunities not only provide valuable feedback on your work but also help you build connections that could benefit your career in the future. With this we have come to the end of this part; if you have any question or doubt, feel free to ask in the comment section below, and our team of experts will help you as soon as possible.

Here are a few soundbites on where AI is heading: "AI will pretty much touch everything we do." "It's more likely to be correct and grounded in reality." "Talk to the AI about how to do
better; it's a very deep philosophical conversation, a bit above my pay grade." "I'm going to say something that sounds completely opposite of what people feel. You probably recall that over the course of the last 10 or 15 years, almost everybody who sits on a stage like this would tell you it is vital that your children learn computer science, that everybody should learn how to program. In fact, it's almost exactly the opposite: it is our job to create computing technology such that nobody has to program, and the programming language is human. Everybody in the world is now a programmer. This is the miracle of artificial intelligence."

From its humble beginnings in the 1950s, AI has evolved from simple problem solving and symbolic reasoning to the advanced machine learning and deep learning techniques that power some of the most innovative applications we see today. AI is not just a buzzword; it is a revolutionary force reshaping industries, enhancing daily life, and creating unmatched opportunities across various sectors. AI is changing numerous fields. In healthcare it aids in early disease diagnosis and personalized treatment plans. In finance it transforms money management with robo-advisors and fraud detection systems. The automotive industry is seeing the rise of autonomous vehicles that navigate traffic and recognize obstacles, while retail and e-commerce benefit from personalized shopping experiences and optimized supply chain management.

One of the most exciting developments in AI is the rise of advanced conversational tools like ChatGPT-4o, Google Gemini, and generative models. These tools represent the pinnacle of conversational AI, capable of understanding and generating human-like text with remarkable accuracy. ChatGPT-4 can assist in writing, brainstorming ideas, and even tutoring, making it a valuable resource for students, professionals, and creatives. Similarly, Google Gemini takes AI integration to the next level, enhancing search capabilities, providing insightful responses, and integrating seamlessly into our digital lives. Generative AI, a subset of AI, is also making waves by creating new content from scratch: tools like DALL-E, which generates images from textual descriptions, and GPT-3, which can write coherent and creative text, are just the beginning. These technologies are changing fields like art, design, and content creation, enabling the generation of unique and personalized outputs that were previously unimaginable.

Beyond specific industries, AI applications extend to everyday life. Voice-activated assistants like Siri and Alexa and smart home devices learn our preferences and adjust our environments accordingly; AI is embedded in the technology we use daily, making our lives more convenient, connected, and efficient. So join us as we explore the future of AI, examining the breakthroughs, the challenges, and the endless possibilities that lie ahead. Whether you are a tech enthusiast, a professional in the field, or simply curious about what's next, this will give you a comprehensive look at how AI is shaping our world and what we can expect in the years to come. Without any further ado, let's get started.

So how will AI impact the future? The first impact is enhanced business automation: AI is transforming business automation, with 55% of organizations adopting AI technology. Chatbots and digital assistants handle customer
interactions and basic employee inquiries, speeding up decision making. The second is job disruption: automation may displace jobs, with as much as a third of tasks potentially automated. While roles like secretaries are at risk, demand for machine learning specialists is rising; AI is more likely to augment skilled and creative positions, emphasizing the need for upskilling. The third is data privacy issues: training AI models requires large data sets, raising privacy concerns. The FTC is investigating OpenAI for potential violations, and the Biden-Harris administration introduced an AI Bill of Rights to promote data transparency. The fourth is increased regulation: AI's impact on intellectual property, along with ethical concerns, is leading to increased regulation; lawsuits and government guidelines on responsible AI use could reshape the industry. The fifth is climate change concerns: AI optimizes supply chains and reduces emissions, but the energy needed for AI models may increase carbon emissions, potentially negating the environmental benefits. Understanding these impacts helps us prepare for AI's future challenges and opportunities.

Now let's see which industries AI will impact the most. The first is manufacturing: AI enhances manufacturing with robotic arms and predictive sensors, improving tasks like assembly and equipment maintenance. The second is healthcare: AI changes healthcare by quickly identifying diseases, streamlining drug discovery, and monitoring patients through virtual nursing assistants. The third is finance: AI helps banks and financial institutions detect fraud, conduct audits, and assess loan applications, while traders use AI for risk assessment and smart investment decisions. The fourth is education: AI personalizes education by digitizing textbooks, detecting plagiarism, and analyzing student emotions to tailor the learning experience. The fifth is customer service: AI-powered chatbots and virtual assistants provide data-driven insights, enhancing customer service interactions. These industries are experiencing significant changes due to AI, driving innovation and efficiency across various sectors.

Now let's look at some risks and dangers of AI, because AI offers many benefits but also poses significant risks. The first is job loss: from 2023 to 2028, 44% of workers' skills will be disrupted, and without upskilling, AI could lead to higher unemployment and fewer opportunities for marginalized groups. The second is human bias: AI often reflects the biases of its trainers, such as facial recognition favoring lighter skin tones; unchecked, these biases can perpetuate social inequalities. The third is deepfakes and misinformation: deepfakes blur reality, spreading misinformation with dangerous consequences; they can be used for political propaganda, financial fraud, and compromising reputations. The fourth is data privacy: AI trained on public data risks breaches that expose personal information. A 2024 Cisco survey found 48% of businesses put non-public information into AI tools, with 69% concerned about intellectual property and legal rights; breaches could expose millions of consumers' data. The fifth is automated weapons: AI in automated weapons can fail to distinguish between soldiers and civilians, posing severe threats; misuse could endanger large populations. Understanding these risks is crucial for responsible AI development and use. As we explore the future of AI, it's clear that its impact will be profound and far-reaching: AI will change industries, enhance efficiency, and drive innovation, but it also brings significant challenges, including job displacement,
biases, privacy concerns, misinformation, and the ethical implications of automated weapons. To harness AI's potential responsibly, we must invest in upskilling our workforce, address biases in AI systems, protect data privacy, and develop regulations that ensure ethical AI use.

We've looked at a lot of examples of machine learning, so let's see if we can give a more concrete definition. What is machine learning? Machine learning is the science of making computers learn and act like humans by feeding them data and information, without being explicitly programmed. We have a nice little diagram here. We start with an ordinary system, your computer (nowadays you can even run a lot of this on a cell phone, because phones have advanced so much), and with artificial intelligence and machine learning it takes the data, learns from what happened before, and predicts what's going to come next. The biggest development in machine learning right now is that it then improves on that: how do we find a new solution? So we go from descriptive (learning about the data and understanding how it fits together), to predictive (forecasting what's going to happen next), to prescriptive (coming up with a new solution).

When we're working on machine learning, there are a number of different diagrams people have published for what steps to go through. A lot of it is domain specific: if you're working on photo identification versus language versus medical or physics problems, some of the steps are switched around a little or new ones are added; they're very specific to the domain. This is a very general diagram. First, define your objective; it is very important to know what you want to predict. Then collect the data: once you've defined an objective, you need to collect the data that matches it, and you spend a lot of time in data science collecting data. The next step is preparing the data: you have to make sure the data going in is clean; there's the old saying, bad data in, bad answers out. Once everything coming in has been cleaned, you select the algorithm (in this case I think we'll be working with SVM, the support vector machine) and you train that algorithm. Then you test the model: does it work, and is it a valid model for what we're doing? Once you've tested it, you run your prediction, or your choice, or whatever output it produces. And once everything is set and you've done lots of testing, you deploy the model. Remember I said domain specific: this is a very general scope. With a lot of models you get halfway through and realize your data is missing something, and you have to go collect new data, because a test somewhere along the line tells you you're not getting the answers you need. So a lot of domain-specific detail becomes part of this model, but it's a very good model to start with.
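Those steps are easier to see in code. Here is a minimal sketch of the same workflow in Python with scikit-learn; the file name my_data.csv and the column name label are placeholder assumptions rather than files from this course, and real data preparation is usually far more involved than a single dropna:

```python
# A minimal sketch of the general workflow above, using scikit-learn.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# 1. Define the objective: predict the "label" column.  2. Collect the data.
data = pd.read_csv("my_data.csv")  # placeholder file name

# 3. Prepare the data: here just dropping missing rows (assuming numeric features).
data = data.dropna()
X = data.drop(columns=["label"])
y = data["label"]

# 4. Select the algorithm (a support vector machine) and hold out a test set.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = SVC()

# 5. Train the model.
model.fit(X_train, y_train)

# 6. Test the model and 7. run predictions on data it has never seen.
predictions = model.predict(X_test)
print("accuracy:", accuracy_score(y_test, predictions))
# 8. Deploy only after the accuracy (and other checks) look good.
```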
We also have some basic divisions of what machine learning does that are important to know. Do you want to predict a category? That's classification: for instance, whether a stock price will increase or decrease. In other words, I'm looking for a yes/no answer: is it going up or going down? In that case we'd say "going up" is true and "not going up" is false, meaning it's going down; it's a yes/no, 0/1 answer. Do you want to predict a quantity? That's regression. So remember, we just did classification, and now we're looking at regression; these are the two major divisions in what the data is doing. An example is predicting the age of a person based on height, weight, health, and other factors: based on those different factors, you might guess how old a person is.

Then there are a lot of domain-specific tasks. Do you want to detect an anomaly? That's anomaly detection, and it's very popular right now. For instance, you might want to detect money-withdrawal anomalies: to know when someone is making a withdrawal on an account that might not be their own. We bring this up because it's really big right now. If you're predicting whether to buy a stock or not, you want to know whether what's going on in the stock market is an anomaly, in which case you should use a different prediction model, because something else is going on and you need to pull in new information, or whether it's just the norm and you'll get your normal return on the money invested. So being able to detect anomalies is very big in data science these days. Another question that comes up, on what we call untrained data, is: do you want to discover structure in unexplored data? That's called clustering. For instance, finding groups of customers with similar behavior, given a large database of customer data containing demographics and past buying records. We might notice that anybody wearing a certain set of shoes shops at certain stores or makes certain purchases; having that information helps us group people together so we can then explore that group and figure out what to market to them, if you're in the marketing world. And that works in just about any arena: you might want to group people based on their investments and financial background before you even start deciding whether to give them a loan, grouping them together based on unknown data. You don't know what the data is going to tell you, but you want to cluster the people who belong together.

Let's take a quick detour for quiz time; oh, my favorite. We're going to have a couple of questions here, and we'll be posting the answers in part two of this tutorial, so let's take a look. Hopefully you'll get them all right, and it'll get you thinking about how to process data. Can you tell what's happening in the following cases? Picture yourself sitting there with your cup of coffee, your checklist, and your pen, trying to figure out your next step in your data science analysis. A: grouping documents into different categories based on the topic and content of each document; very big these days, whether it's legal documents, sports documents, or analyzing newspaper postings, and having it automated is a huge thing in today's world. B: identifying handwritten digits in images correctly; we want to know what characters someone is writing out in
their handwriting. C: behavior of a website indicating that the site is not working as designed. D: predicting the salary of an individual based on his or her years of experience; an HR hiring setup there. Stay tuned for part two, where we'll answer these questions, or simply write a note at the bottom to Simplilearn and they'll follow up with you. Back to our regular content.

These last few bring us into the next topic, which is another way of dividing the types of machine learning: supervised, unsupervised, and reinforcement learning. Supervised learning is a method used to enable machines to classify and predict objects, problems, or situations based on labeled data fed to the machine. Here we have a jumble of data with circles, triangles, and squares, and we label them: this is a circle, this is a triangle, this is a square. We run our model training, and it trains on those labels, so we know the answer. That's very important in supervised learning: you already know the answer for a large portion of the information coming in. So we have a huge group of labeled data, and then new data comes in; the model now knows the difference between a circle, a square, and a triangle, and when we send in, say, a square and a circle, it predicts that the top one is a square and the next one is a circle. You can see how this applies to predicting whether someone is going to default on a loan (I was talking about banks earlier), or supervised learning on the stock market, whether you're going to make money or not, which is always important. And if you are looking to make a fortune on the stock market, keep in mind it is very difficult to get all the data correct: the market fluctuates in ways that are really hard to predict, so it's quite a roller-coaster ride; if you're running machine learning on the stock market, you quickly realize you really have to dig for new data.

So we have supervised learning, and naturally we also need unsupervised learning. In unsupervised learning, the machine learning model finds the hidden patterns in unlabeled data. In this case, instead of telling it what a circle, a triangle, and a square are, it looks at the shapes and groups them together on its own, perhaps by the number of corners: it notices that some have three corners, some have four, and some have none, and it filters and groups them accordingly. We talked about that earlier with the group of people out shopping: we want to group them together to find out what they have in common. Once you understand what people have in common, maybe one of them is a customer at your store, or five of them are, and they have a lot in common with five others who are not yet customers. How do you market to those five who aren't customers yet but fit the demographic of people who shop there? You'd like them shopping at your store, not the one next door. Of course this is a simplified version: you can see the difference between a triangle and a circle very easily, which might not be so easy in marketing.
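To make the contrast concrete, here is a small sketch with scikit-learn on six made-up 2-D points: the supervised model is handed the group labels, while the unsupervised one has to discover the groups on its own. The points, the labels, and the choice of k-nearest-neighbors and k-means are illustrative assumptions, not taken from the lesson's slides:

```python
# Supervised vs. unsupervised on the same tiny, made-up data set.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.cluster import KMeans

points = np.array([[1.0, 1.1], [1.2, 0.9], [5.0, 5.2], [5.1, 4.8],
                   [9.0, 1.0], [8.8, 1.2]])

# Supervised: we also supply the answers (0 = circle, 1 = square, 2 = triangle),
# train on them, and then predict the label of a brand-new point.
labels = np.array([0, 0, 1, 1, 2, 2])
clf = KNeighborsClassifier(n_neighbors=1).fit(points, labels)
print(clf.predict([[5.0, 5.0]]))   # -> [1], it lands in the "square" group

# Unsupervised: no labels at all. KMeans just groups nearby points together,
# and it's up to us to interpret what each cluster means.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(points)
print(km.labels_)                  # cluster ids such as [2 2 0 0 1 1], not names
```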
Reinforcement learning is an important type of machine learning where an agent learns how to behave in an environment by performing actions and seeing the results. Here the agent is a baby, and it's actually fitting that they used an infant for this slide, because reinforcement learning is very much in its infant stages, yet it's probably the biggest machine learning demand out there right now and over the next few years: reinforcement learning and how to make it work for us. In the example, the action is the baby reaching into the fire (hopefully just a little candle, not a giant fire pit like it looks here); the new state is that the baby is sad and crying because it got burned. Then the baby, called the agent because it's the one taking the actions, takes another action: it doesn't go into the fire, it goes a different direction, and now the new state is that the baby is happy, laughing, and playing. Reinforcement learning is easy to understand because that's one of the ways we humans learn: you burn yourself on the stove, you don't touch the stove anymore. In the big picture, having a machine learning program or an AI that can do this is huge, because now we're starting to learn how to learn, and that's a big jump in the world of computing and machine learning.

Let's go back over supervised versus unsupervised learning, because understanding this will come up in any project you work on. In supervised learning we have labeled data and direct feedback: someone has already gone in and said, yes, that's a triangle; no, that's not a triangle. And we predict an outcome: new data comes in and we know what it's going to be. With unsupervised training, the data is not labeled, so we really don't know what it is, and there is no feedback: we're not telling the model whether it's right or wrong, whether something is a triangle or a square, whether to go left or right. All we're doing is finding hidden structure in the data, grouping the data to find out what connects to what. And you can use the two together. Imagine you have an image and you're not sure what you're looking for: you run unsupervised learning on the unstructured data to find the things that group together, then somebody looks at those groups and labels them; now you can take that labeled data and train something to predict what's in the picture. So they go back and forth, and you can start connecting these different tools together to make a bigger picture.

There are many interesting machine learning algorithms; let's have a look at a few of them to give you a flavor of what's out there, covering some of the most important ones currently in use: linear regression, the decision tree, and the support vector machine.

Let's start with a closer look at linear regression. Linear regression is perhaps one of the most well-known and well-understood algorithms in statistics and machine learning. Linear regression is a linear model, that is, a model that assumes a linear relationship between the input variable x and the single output variable y. You'll recognize this from your algebra classes: y = mx + c. Imagine we are predicting distance traveled (y) from speed (x).
Our linear regression model representation for this problem would be y = m·x + c, or distance = m·speed + c, where m is the coefficient (the slope) and c is the y-intercept. We're going to look at two variations of this. First, time is constant. We have a bicyclist (wearing safety gear, thank goodness) whose speed is 10 m/s, and over a certain amount of time his distance is 36 km. A second bicyclist goes twice the speed, 20 m/s, and you can guess that if he's going twice the speed in the same time, he covers twice the distance; that's easy to compute: 36 × 2 = 72 km. And if you asked how far somebody going three times that speed, 30 m/s, would travel, you could compute the distance in your head without needing a computer; but we want to do this for more complicated data, so it's useful to compare the two. Let's look at what this looks like on a graph. In the linear regression model we plot distance against speed, and m is the slope of the line. Notice the line has a positive slope: as speed increases, distance also increases, so the variables have a positive relationship. Following the line, or just tripling the 36 km, the third bicyclist has traveled roughly 108 km. One of the key definitions here is "positive relationship": the slope of the line is positive, and as speed increases, so does distance.

Now our second example, where distance is constant. At a speed of 10 m/s, a rider covers the fixed distance in 100 seconds. The second bicyclist is still doing 20 m/s; since he's going twice the speed, he covers the distance in about half the time, 50 seconds. And you can probably guess the third: 100 divided by 3, since he's going three times the speed, gives about 33.33 seconds. If we put that into a linear regression graph with distance held constant, we see the relationship between speed and time: as speed goes up, the time needed to cover the same distance goes down. Now m is a negative slope: as speed increases, time decreases, so the variables have a negative relationship. Again, the definitions "positive relationship" and "negative relationship" depend on the slope of the line.

With that simple formula in hand, let's look at the mathematical implementation of linear regression, and let's take this data: suppose we have the data set x = 1, 2, 3, 4, 5, a standard series, and y = 3, 2, 2, 4, 3. When we plot these points on a graph we get a nice scattering, and you could probably eyeball a line through the middle of it, but we're going to calculate that exact line with linear regression. First we compute the mean of x (the mean is just the average): we add 1 + 2 + 3 + 4 + 5 and divide by 5, which simply comes out as 3. Then we do the same for y: we add up all those numbers and divide by 5, and we end up with a mean of 2.8.
Here x̄ denotes the mean of the x values and ȳ the mean of the y values. When we plot the means, we can draw ȳ = 2.8 and x̄ = 3 on the graph (shown in a different color, with dashed lines), and it's important to note that the linear regression line must pass through that point. Now let's find the regression equation for the best-fit line. We're fitting y = mx + c, so we need the slope m and the intercept c. The slope is m = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², and we can compute it easily by building a few columns; computers are really good at iterating through data like this. For example, with x = 1 and x̄ = 3, x − x̄ = −2; with x = 2, x − x̄ = −1; and so on. We fill in the columns for x − x̄ and y − ȳ, and from those we compute (x − x̄)² and (x − x̄)(y − ȳ). The next step, as you'd guess, is to sum the columns: we get a total of 10 for Σ(x − x̄)² and a total of 2 for Σ(x − x̄)(y − ȳ). Plugging those in gives m = 2/10 = 0.2, so the slope of our line is 0.2.

Next we calculate the value of c, which tells us where the line crosses the y-axis. Remember, I mentioned earlier that the regression line has to pass through the mean point we plotted, (x̄, ȳ) = (3, 2.8). Since we know that point, we can plug it into y = 0.2x + c: 2.8 = 0.2 × 3 + c, and solving for c gives c = 2.2. With both values we can plot our regression line, y = 0.2x + 2.2, and from this equation we can compute new values. Let's predict the values of y for x = 1, 2, 3, 4, 5 and plot the points; remember, those were our original x values, so now we're going to see what y the model thinks they have, not what they actually are. Plugging in, the predicted values (call them yp) come out as 2.4 at x = 1, 2.6 at x = 2, and so on. When we plot the predicted values along with the actual values, we can see the difference, and this is one of the most important things with linear regression and all of these models: understanding the error. We can calculate the error for each of our values; plotting x, y, and the predictions with little connecting lines shows what the error looks like between the points. Our goal is to reduce this error, to minimize the error of the linear regression model. There are lots of ways to measure and minimize the distance between the line and the data points, like the sum of squared errors, the sum of absolute errors, or the root mean square error. We keep moving the line through the data points to make sure the best-fit line has the least squared distance between the data points and the regression line.
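Here is a quick numpy check of that arithmetic; the y values 3, 2, 2, 4, 3 are as read from the example above:

```python
# Reproducing the worked linear regression example with numpy.
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([3, 2, 2, 4, 3])          # the data set from the example

m = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
c = y.mean() - m * x.mean()            # the line must pass through (x-bar, y-bar)
print(m, c)                            # -> 0.2 2.2, i.e. y = 0.2x + 2.2

y_pred = m * x + c                     # predicted values: 2.4, 2.6, 2.8, 3.0, 3.2
print(((y - y_pred) ** 2).sum())       # the sum of squared errors we try to minimize
```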
To recap: with a very simple linear regression model, we first figure out the formula of the line through the middle, then slowly adjust the line to minimize the error. Keep in mind this is a very simple case. Even though the math stays much the same, it gets more complex as we add dimensions: here we have only two dimensions, y = mx + c, but you can extend that to many different features, plotting a linear regression model on all of them and using the same kinds of formulas to minimize the error.

Let's go ahead and take a look at decision trees, a very different way to solve problems from the linear regression model. A decision tree is a tree-shaped algorithm used to determine a course of action, where each branch of the tree represents a possible decision, occurrence, or reaction. We have data that tells us whether it is a good day to play golf. If we open this data in a spreadsheet, we see the outlook (rainy, overcast, sunny), the temperature (hot, mild, cool), the humidity, whether it's windy, and whether I liked to play golf that day, yes or no; we're taking a census. Certainly I wouldn't want a computer telling me when to play golf, but imagine planning your day the night before and your device tells you tomorrow morning would be a good day for golf and the afternoon wouldn't; that becomes very useful, and we see this in a lot of applications now that give you suggestions and let you know what would fit for the next day, the next purchase, the next mail-out. In this case: is tomorrow a good day for playing golf, based on the incoming weather?

Let's determine whether you should play golf when the day is sunny and windy; that's the forecast for tomorrow. Suppose we draw our tree like this: we start with humidity. If the humidity is normal, you're going to play golf, and if the humidity is really high, we look at the outlook. Whether the outlook is sunny, overcast, or rainy changes what you choose to do: with very high humidity and sun you're probably not playing golf, because you'd be out there miserable, fighting the mosquitoes that came out to join you; if it's rainy, you probably don't want to play in the rain; but if it's slightly overcast, with just the right shadow, that's a good day to play golf and be outside on the green. In this example you could probably build the tree yourself, because it's a very simple data set. But the question is: how do you know where to split your data? What if the data is much more complicated, something a human wouldn't intuitively understand? In studying cancer, for example, they take about 36 measurements of the cancerous cells, each measurement representing how bulbous a cell is, how extended it is, how sharp the edges are, things we'd have no feel for as humans. How do we decide how to split that data, and is the resulting decision tree the right one? To answer that, we should calculate entropy and information gain, two
important vocabulary terms: entropy and information gain. Entropy is a measure of randomness or impurity in the data set, and entropy should be low; we want the chaos to be as low as possible, so we aren't confused by mixed data. Information gain is the measure of the decrease in entropy after the data set is split, also known as entropy reduction, and information gain should be high; we want each split to give us as much information as possible.

Let's look at entropy from the mathematical side. We denote entropy as I(p, n), where p is the count of days you play a game of golf and n the count of days you don't. You don't have to memorize these formulas (there are a few variants depending on what you're working with), but it's important to recognize where the numbers come from, so you're not lost when you see it, unless you're building your own decision tree code from scratch. The formula is I(p, n) = -(p/(p+n)) · log₂(p/(p+n)) - (n/(p+n)) · log₂(n/(p+n)). Let's break down what that looks like when we compute it. The entropy of the target class over the whole data set is the whole entropy, entropy(play golf). Going back to the data, we simply count the yeses and nos for playing golf in our complete set: we find 5 days we did play golf and 9 days we did not. Adding those together, 9 + 5 = 14, so the values we plug into the formula are p = 5/14 and n = 9/14. Now 5/14 ≈ 0.36 and 9/14 ≈ 0.64, and the full equation gives -0.36 · log₂(0.36) - 0.64 · log₂(0.64) ≈ 0.94. So we have an entropy value of 0.94 for the whole set of data we're working with, and we want our splits to make that entropy go down.

Just as we calculated the entropy of the whole set, we can calculate the entropy of playing golf given the outlook: overcast, rainy, or sunny. For sunny we count 3 yes and 2 no out of 5 sunny days, giving I(3, 2); for overcast, I(4, 0); for rainy, I(2, 3). Weighting each by how often it occurs: E(play golf, outlook) = (5/14) · I(3, 2) + (4/14) · I(4, 0) + (5/14) · I(2, 3), and computing that gives 0.693, the entropy of just the part that has to do with the forecast. Similarly we can calculate the entropy of the other predictors, like temperature, humidity, and wind. Then we look at the gain for outlook: how much are we going to gain from this split? Gain(outlook) = entropy(play golf) - entropy(play golf, outlook): we take the original 0.94 for the whole set minus
the outlook entropy of 0.693, and we end up with a gain of 0.247. This is our information gain. Remember how we defined entropy and information gain: the higher the information gain, the lower the entropy, the better. The information gain of the other three attributes can be calculated the same way: the gain for temperature is 0.029, the gain for humidity is 0.152, and the gain for windy is 0.048. A quick comparison shows that 0.247 is the greatest information gain, so that's the split we want.

Now let's build the decision tree. The outlook (sunny, overcast, or rainy) is our first split, because it gives us the most information gain, and we continue down the tree the same way: we choose the attribute with the largest information gain as the root node, then continue to split each sub-node on the attribute with the largest information gain we can compute. That's a bit of a tongue twister to say, but it's a very easy model to view: we split on outlook in three directions; if the outlook is overcast, we play; and we can split the others further down. If the outlook is sunny and it's windy, we're not going to play; if it's not windy, we'll play. So we can easily build a nice decision tree that guesses what we'd like to do tomorrow and gives us a nice recommendation for the day.

So: is it a good day to play golf when it's sunny and windy? Remember the original question: tomorrow's weather report is sunny and windy. Walking down the tree, outlook sunny, then windy: we're not going to play golf tomorrow. Our little smartwatch pops up and says, "I'm sorry, tomorrow is not a good day for golf; it's going to be sunny and windy." And if you're a huge golf fan you might go, uh-oh, it's not a good day to play golf; we can watch a golf game at home and sit in front of the TV instead of being out playing in the wind.
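For readers who want to verify the golf numbers, here is a small sketch of the same entropy and information-gain arithmetic in plain Python; the counts come from the example above, and the helper function entropy is my own, not part of the course code:

```python
# Checking the entropy and information-gain figures from the golf example.
import math

def entropy(p, n):
    """I(p, n) = -p/(p+n)*log2(p/(p+n)) - n/(p+n)*log2(n/(p+n)); a pure class contributes 0."""
    total = p + n
    result = 0.0
    for count in (p, n):
        if count:  # skip empty classes so log2(0) never happens
            f = count / total
            result -= f * math.log2(f)
    return result

whole = entropy(5, 9)  # 5 play days, 9 no-play days -> about 0.94

# Outlook splits the 14 days into sunny (3 yes, 2 no), overcast (4, 0), rainy (2, 3).
outlook = (5/14) * entropy(3, 2) + (4/14) * entropy(4, 0) + (5/14) * entropy(2, 3)
print(round(whole, 2), round(outlook, 3))   # ~0.94 and ~0.694
print(round(whole - outlook, 3))            # information gain for outlook, ~0.247
```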
Now that we've looked at decision trees, let's look at the third of the algorithms we're investigating: the support vector machine. The support vector machine (SVM) is a widely used classification algorithm, and the idea behind it is simple: the algorithm creates a separation line that divides the classes in the best possible manner, for example dog or cat, disease or no disease. Suppose we have labeled sample data giving the height and weight of males and females, a new data point arrives, and we want to know whether it is male or female. We start by drawing decision lines: if we consider decision line one, we classify the individual as male, and if we consider decision line two, it's female. This person lies in the middle of the two groups, so it's a little confusing which line to trust; we need to know which line divides the classes correctly, but how? The goal is to choose a hyperplane (one of the key words used with support vector machines) with the greatest possible margin between the decision line and the nearest points in the training set. Here we have our support vectors, the nearest points on either side, and we draw a line between them. The margin is the distance between the hyperplane and the nearest data point from either set, and it should be equidistant to the two points we're comparing. Drawing the hyperplanes, we observe that line one has the maximum distance margin, so it classifies the new data point correctly, and our result is that the new data point is male. One reason we say "hyperplane" rather than "line" is that we're often not looking at just weight and height: we might be looking at 36 different features or dimensions, and the hyperplane then cuts the data in a multi-dimensional way, each plane continuing to cut the data down until we get the best fit or match.

Let's understand this with the help of an example. I always start with a problem statement when putting some code together, and we're going to do some coding now. Problem statement: classifying muffin and cupcake recipes using support vector machines; the cupcake versus the muffin. Let's have a look at our data set. We have the different recipes: a muffin recipe has a certain amount of flour (I'm not sure what measurement 55 is in; maybe ounces), plus milk, sugar, butter, egg, baking powder, vanilla, and salt, and based on these measurements we want to guess whether we're making a muffin or a cupcake. Notice that we don't have just two features like the height and weight in the male/female example: here we're looking at eight different features to guess muffin or cupcake. What is the difference between a muffin and a cupcake? It turns out muffins have more flour, while cupcakes have more butter and sugar: the cupcake is a bit more of a dessert, while the muffin is a bit more of a fancy bread. But how do we do that in Python? How do we code something that goes through recipes and figures out what each one is? (And I really just want to say "cupcakes versus muffins" like some big professional wrestling billing.)

For our cupcakes-versus-muffins project we'll be working in Python. There are many versions of Python and many different editors; that's one of the strengths and weaknesses of Python, it just has so much attached to it, and it's one of the more popular data science programming languages. In this case we'll use Anaconda and a Jupyter notebook. The Anaconda Navigator has all kinds of useful tools, and once you're in it you can change environments; I have a number of environments on here, and we'll use a Python 3.6 environment. It doesn't matter too much which version you use; I usually stay with 3.x because it's current, unless a project is very specifically in version 2.x (2.7 is what most people use in version two). Once we're in the Jupyter notebook editor, I go up and create a new file, in this case "SVM muffin versus cupcake," and we start with our packages for data analysis. There are a few very standard packages we almost always use.
very common and then we’re going to import pandas as PD and numpy deals with number arrays there’s a lot of cool things you can do with the numpy uh setup as far as multiplying all the values in an array in an numpy array data array pandas I can’t remember if we’re using it actually in this data set I think we do as an import it makes a nice data frame and the difference between a data frame and a nump array is that a data frame is more like your Excel spreadsheet you have columns you have indexes so you have different ways of referencing it easily viewing it and there’s additional features you can run on a data frame and pandas kind of sits on numpy so they you need them both in there and then finally we’re working with the support Vector machine so from sklearn we’re going to use the sklearn model import svm support Vector machine and then as a data scientist you should always try to visualize your data some data obviously is too complicated or doesn’t make any sense to the human but if it’s possible it’s good to take a second look at it so that you can actually see what you’re doing now for that we’re going to use two packages we’re going to import matplot library. pyplot as PLT again very common and we’re going to import caborn as SNS and we’ll go ahead and set the font scale in the SNS right in our import line that’s with this um semi colon followed by a line of data we’re going to set the SNS and these are great because the the C born sits on top of matap plot Library just like Panda sits on numpy so it adds a lot more features and uses and control we’re obviously not going to get into matplot library and caborn that’ be its own tutorial we’re really just focusing on the svm the support Vector machine from sklearn and since we’re in Jupiter notebook uh we have to add a special line in here for our matplot library and that’s your percentage sign or Amber sign matplot Library in line now if you’re doing this in just a straight code Project A lot of times I use like notepad++ and I’ll run it from there you don’t have to have that line in there because it’ll just pop up as its own window on your computer depending on how your computer set up because we’re running this in the jupyter notebook as a browser setup this tells it to display all of our Graphics right below on the page so that’s what that line is for remember the first time I ran this I didn’t know that and I had to go look that up years ago it’s quite a headache so map plot library in line is just because we’re running this on the web setup and we can go ahead and run this make sure all our modules are in they’re all imported which is great if you don’t have them import you’ll need to go ahead and pip use the PIP or however you do it there’s a lot of other install packages out there although pip is the most common and you have to make sure these are all installed on your python setup the next step of course is we got to look at the data can’t run a model for predicting dat data if you don’t have actual data so to do that let me go ahe and open this up and take a look and we have our uh cupcakes versus muffins and it’s a CSV file or CSV meaning that it’s comma separated variable and it’s going to open it up in a nice uh spread sheet for me and you can see up here we have the type we have muffin muffin muffin cupcake cupcake cupcake and then it’s broken up into flour milk sugar butter egg baking powder vanilla and salt so we can do is we can go ahead and look at the this data also in our python let us create a variable recipes 
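For reference, the setup just described, collected into one notebook cell, would look roughly like this; the font-scale value is an assumption, since the exact number isn't stated:

```python
import numpy as np               # number arrays
import pandas as pd              # data frames (sits on top of numpy)
from sklearn import svm          # support vector machine models

import matplotlib.pyplot as plt  # base plotting library
import seaborn as sns; sns.set(font_scale=1.2)  # nicer plots, sits on matplotlib

# Jupyter only: render plots inline, right below each cell
%matplotlib inline
```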
equals we’re going to use our pandas module. read CSV remember was a comma separated variable and the file name happened to be cupcakes versus muffins oops I got double brackets there do it this way there we go cupcakes versus muffins because the program I loaded or the the place I saved this particular Python program is in the same folder we can get by with just the file name but remember if you’re storing it in a different location you have to also put down the full path on there and then because we’re in pandas we’re going to go ahead and you can actually in line you can do this but let me do the full print you can just type in recipes. head in the Jupiter notebook but if you’re running in code in a different script you need to go ahead and type out the whole print recipes. head and Panda’s knows that that’s going to do the first five lines of data and if we flip back on over to the spreadsheet where we opened up our CSV file uh you can see where it starts on line two this one calls it zero and then 2 34 5 6 is going to match go and close that out cuz we don’t need that anymore and it always starts at zero and these are it automatically indexes it since we didn’t tell it to use an index in here so that’s the index number for the leand side and it automatically took the top row as uh labels so Panda’s using it to read a CSV is just really slick and fast one of the reasons we love our pandas not just because they’re cute and cuddly teddy bears and let’s go ahead and plot our data and I’m not going to plot all of it I’m just going to plot the uh sugar and flour now obviously you can see where they get really complicated if we have tons of different features and so you’ll break them up and maybe look at just two of them at a time to see how they connect and to plot them we’re going to go ahead and use Seaborn so that’s our SNS and the command for that is SNS dolm plot and then the two different variables I’m going to plot is flour and sugar data equals recipes the Hue equals type and this is a lot of fun because it knows that this is pandas coming in so this is one of the powerful things about pandas mixed with Seaborn and doing graphing and then we’re going to use a pallet set one there’s a lot of different sets in there you can go look them up for Seaborn or do a regular a fit regular equals false so we’re not really trying to fit anything and it’s a scatter kws a lot of these settings you can look up in Seaborn half of these you could probably leave off when you run them somebody played with this and found out that these were the best settings for doing a Seaborn plot let’s go ahead and run that and because it does it in line it just puts it right on the page and you can see right here that just based on sugar and flour alone there’s a definite split and we use these models because you can actually look at it and say hey if I drew a line right between the middle of the blue dots and the red dots we’d be able to do an svm and and a hyperplane right there in the middle then the next step is to format or pre process our data and we’re going to break that up into two parts we need to type label and remember we’re going to decide whether it’s a muffin or cupcake well a computer doesn’t know muffin or cupcake it knows zero and one so what we’re going to do is we’re going to create a type label and from this we’ll create a numpy array andp where and this is where we can do some logic we take our recipes from our Panda and wherever type equals muffin it’s going to be zero and then if it doesn’t 
The next step is to format, or preprocess, our data, and we're going to break that up into two parts. First we need a type label. Remember, we're deciding whether something is a muffin or a cupcake, but a computer doesn't know "muffin" or "cupcake"; it knows zero and one. So we create a type label as a numpy array using np.where, which is where we can do some logic: we take our recipes frame, and wherever the type equals muffin it becomes zero, and wherever it doesn't equal muffin (which means cupcake) it becomes one. This is the answer column: when we do our training, remember, we have to have training labels, and this is what we train against, zero or one, muffin or not.

Then we create our recipe features. If you remember correctly from right up here, the first column is the type, our muffin-or-cupcake answer, so we really don't need the type column in the features, and in pandas we can easily sort that out. We take recipes.columns, a function built into pandas, with .values to convert it to just the column titles going across the top. We don't want the first one, and since indexing always starts at zero, we slice from one to the end, then make a list out of it, which converts it to a list of strings. Let's take a look and make sure the features look right. Let me run that; I forgot the s on recipes, so we'll add the s and run again, and we can see we have flour, milk, sugar, butter, egg, baking powder, vanilla, and salt, which matches what we printed up above, everything but the type. So we have our features and we have our label.

Now, the recipe features are just the titles of the columns; we actually need the ingredient amounts. At this point we have a couple of options. We could run it over all the ingredients, and when you're doing this for real, usually you do, but for our example we want to limit it so you can easily see what's going on; with all the ingredients there would be seven or eight different dimensions built into the hyperplane, and we only want to look at a cut you can actually see. So we take our recipes frame, select just flour and sugar (again, you can replace that with the full recipe features list and do all of them), and convert that to values. We don't need to make a list out of it, because these aren't strings, they're actual numbers. We can print ingredients to see what it looks like: just the amounts of flour and sugar, the two sets of plot points. And just for fun, if we swap in all the recipe features, you'll see it makes a nice block of columns of data; it strips out all the labels and we have just the values. But because we want to be able to view this easily in a plot later on, let's go back to just flour and sugar, and when we run it you'll see it's just the two columns.
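Here's roughly what that preprocessing looks like as code, again assuming the Type/Muffin naming from the spreadsheet:

```python
# A numeric label: muffin = 0, cupcake = 1
type_label = np.where(recipes['Type'] == 'Muffin', 0, 1)

# All column names except the first one ('Type') are features
recipe_features = recipes.columns.values[1:].tolist()
print(recipe_features)   # ['Flour', 'Milk', 'Sugar', ...]

# Train on just two features so the plot stays readable;
# swap in recipe_features here to use all eight ingredients instead
ingredients = recipes[['Flour', 'Sugar']].values
```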
So the next step is to fit our model. We'll just call it model, and it's an SVM; we're using the class called SVC, and in this case we set kernel equals linear, so it's using a specific setup. If you go to the reference on the sklearn website for svm, you'll see there are about eight of them: three for regression, three for classification, one for detecting outliers, and another that's a little more specialized. SVC and SVR are the two most commonly used, standing for support vector classifier and support vector regression. Remember, regression predicts an actual value, a float or whatever you're working toward, while a classifier gives a yes/no, true/false answer; for this we want zero/one, muffin/cupcake. So we create our model, and once we have it created we call model.fit; this is very common, especially in sklearn, where all the models are followed by the fit command. What we put into the fit, what we're training with, is the ingredients, which in this case we limited to just flour and sugar, and the type label, muffin or cupcake. In a more serious data science series you'd want to split your data into training data and test data (they even do something where they split it into thirds and rotate which part is used for training and which for testing), but we won't get into that today; it's not overly complicated, just an extra step we'll skip because this is a very simple data set. Let's go ahead and run this... and I got an error, so let me fix that real quick: it's capital SVC, and it turns out I typed it lowercase. There we go; run it again and you'll see it prints out all this information automatically. These are the defaults of the model; notice we changed the kernel to linear, and there's our kernel='linear' on the printout. There are other settings you can mess with, but we don't really need any of them right now.
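The model creation and fit, consolidated into one cell:

```python
# SVC is the support vector *classifier*; a linear kernel gives us a
# straight separating line in our two feature dimensions
model = svm.SVC(kernel='linear')
model.fit(ingredients, type_label)   # features in, 0/1 labels as the answer
```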
Next we're going to dig a little bit into our newly trained model, so we can show what it learned on a graph. Let's get the separating line out: we'll use w for our variable and set it to the model's coefficients, model.coef_[0]. So what the heck is that? We're digging into the model; it's already trained and predicting, and this is the math behind it. The w holds two coefficients, and if you remember y = mx + c, these coefficients are connected to that, except in more dimensions it's a plane rather than a line. We don't want to spend too much time here, because you can get lost in the confusion of the math, but if you're a math whiz this is great: we compute a = -w[0] / w[1] (remember, there are two different values in there), and that's basically the slope we're generating. Then we build xx. What is xx? We set it to a numpy linspace, creating a line of values running from 30 to 60, just a set of numbers for x. And if you remember correctly, we have our formula y equals the slope times x plus the intercept, so we can compute yy as the slope times each value in that array; that's the neat thing about numpy, because when I do a * xx, where xx is a whole numpy array of values, it multiplies a across all of them. Then we subtract the model's intercept term, the c from y = mx + c; that's where all these numbers come from, and it's a little confusing because we're digging values out of these different arrays. Then we plot the parallels to the separating hyperplane that pass through the support vectors: we set b to one of the model's support vectors and compute yy_down = a * xx + (b[1] - a * b[0]); then we point b at the support vector on the other side and compute yy_up the same way. We can run this to load the variables, and if you want to understand a little more of what's going on, print yy: you'll see it's an array, a line of values matching our x range from 30 to 60, and the same goes for yy_up and yy_down, which we'll plot in a minute so you can see what they look like. Let's delete that print and run it again so the variables load with a nice clean slate. I'm just going to copy our Seaborn lmplot of flour and sugar from before and run it real quick so you can remember what it looks like: just the straight scatter graph. And then, one of the neat things, because Seaborn sits on top of pyplot, we can use pyplot for the line going through, and that is simply plt.plot with our xx and yy, the two corresponding sets of values; somebody played with this and figured out that a line width of two and the color black would look nice. Let's run the whole thing with the pyplot call in there, and you can see the corresponding line drawn between the sugar and the flour, the muffin versus the cupcake. We also generated the support-vector lines, yy_down and yy_up, so let's take a look and see what those look like.
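As code, the boundary-and-margins math just walked through would look something like this; it follows the standard geometry of a linear SVC, where the boundary satisfies w[0]*x + w[1]*y + intercept = 0:

```python
# Dig the separating line back out of the trained model so we can draw it
w = model.coef_[0]
a = -w[0] / w[1]                          # slope of the separating line
xx = np.linspace(30, 60)                  # x values spanning our flour amounts
yy = a * xx - model.intercept_[0] / w[1]  # y = slope * x + intercept

# Parallel lines through the support vectors (the margins)
b = model.support_vectors_[0]
yy_down = a * xx + (b[1] - a * b[0])
b = model.support_vectors_[-1]
yy_up = a * xx + (b[1] - a * b[0])
```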
To make it a little more fun, we can pass in 'k--', which just tells it to make a black dashed line, and if we're doing the down line we also want the up one, so here's yy_up as well. When we run that, it adds both lines, and this is what you expect: the dashed lines go through the nearest data points, the nearest muffin and the nearest cupcake, and your SVM line goes right down the middle, giving a nice split in our data. You can see how easy it is to tell, based just on sugar and flour, which recipe is a muffin and which is a cupcake.

Now let's create a function to predict muffin or cupcake. I've got my recipes I pulled off the internet, and I want to see whether each is a muffin or a cupcake, so we need a function to push them through. We create a function with def (for definition; that's how you do it in Python) and call it muffin_or_cupcake. Remember, we're just doing flour and sugar today, not all the ingredients, and that actually is a pretty good split; you really don't need all the ingredients when you know the flour and sugar. Then an if-else statement: if model.predict on the flour and sugar values equals zero (it's very common in sklearn to have a .predict where you put the data in and it returns a value), we print "you're looking at a muffin recipe"; else, meaning it's one, "you're looking at a cupcake recipe". That's pretty straightforward. And of course, having created a function, we should run something through it, so let's send it the values 50 and 20; muffin or cupcake, I don't know what it is. Let's run it and see what it gives us, and it says: you're looking at a muffin recipe. So it very easily predicts which recipe we're looking at. Let's also plot this on the graph so we can see what it actually looks like. I'm just going to copy and paste the plotting from below, nothing different from what we did before; if I run it you'll see it has all the points and the lines on there, and then we want to add another point with plt.plot. If I remember correctly, our test was 50 and 20, and then somebody went in and decided on 'yo' for yellow (it comes out a kind of orange-yellow) with marker size nine; those are settings you can play with, and somebody else played with them to come up with a setup that looks good. And you can see it there on the graph: clearly a muffin. In this round of cupcakes versus muffins, the muffin has won. If you'd like to run your own muffin-cupcake contender series, you certainly can: send a note down below and the team at Simply Learn will send you the data they used for the muffins and cupcakes. That's true of any of the data; we didn't actually run a plot on the men-versus-women set earlier, but you can also request that and run it on your own setup. So, to go back over what we did for our support vector machine code: we ran a predict on a recipe's flour and sugar (the recap slide says 40 parts flour and 20 parts sugar, which I think was different from the one we ran live) to tell whether it's a muffin or a cupcake. Hence we have built a classifier using SVM that is able to classify whether a recipe is for a cupcake or a muffin.
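Putting the prediction pieces together, a minimal version of the function and the final plot might look like this; the test values 50 and 20 are the ones used in the demo:

```python
# Push a new recipe through the trained model; we trained on two features,
# so the function takes flour and sugar only
def muffin_or_cupcake(flour, sugar):
    if model.predict([[flour, sugar]])[0] == 0:
        print("You're looking at a muffin recipe!")
    else:
        print("You're looking at a cupcake recipe!")

muffin_or_cupcake(50, 20)   # -> muffin

# Re-draw the scatter plus the hyperplane, then mark the new point
sns.lmplot(x='Flour', y='Sugar', data=recipes, hue='Type',
           palette='Set1', fit_reg=False, scatter_kws={'s': 70})
plt.plot(xx, yy, linewidth=2, color='black')   # separating line
plt.plot(xx, yy_down, 'k--')                   # margin through nearest muffin
plt.plot(xx, yy_up, 'k--')                     # margin through nearest cupcake
plt.plot(50, 20, 'yo', markersize=9)           # our mystery recipe
```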
That wraps up our cupcake versus muffin. Today, in our second tutorial, we're going to cover K-means and logistic regression, along with going over the quiz questions we had during our first tutorial. What's in it for you? We're going to cover clustering: what clustering is, and K-means clustering, one of the most commonly used clustering tools out there, including a flowchart for understanding how K-means functions, and then an actual live Python demo clustering cars based on brands. Then we'll cover logistic regression: what logistic regression is, the logistic regression curve and sigmoid function, and then another Python code demo to classify a tumor as malignant or benign based on its features.

Let's start with clustering. Suppose we have a pile of books of different genres, and we divide them into groups like fiction, horror, and education. (As we can see from this young lady, she is definitely into heavy horror; you can just tell by those eyes and the Canadian maple leaf on her shirt.) Organizing objects into groups based on similarity is clustering. In this case, looking at the books, we're clustering things into known categories, but you can also use clustering to explore data: you might not know the categories, you just know you need to divide the data up in some way to conquer it and organize it better. Here, though, we'll be looking at clustering into specific categories.

We're going to use K-means clustering, probably the most commonly used clustering tool in the machine learning library. K-means clustering is an example of unsupervised learning; if you remember from our previous session, that's used when you have unlabeled data, so we don't know the answer yet. We have a bunch of data that we want to cluster into groups, defining clusters in the data based on feature similarity. We've introduced a couple of terms here: unsupervised learning and unlabeled data we've already talked about, and features are just the different attributes of the data. With books we can easily see fiction and horror and history, but a lot of times with data that information isn't so easy to see when we first look at it, and K-means is one of those tools for finding the things that connect and match with each other.

Suppose we have these data points and want to assign them to clusters. When I look at them, I would probably group them into two clusters just by eye; two groups of the data kind of come together. In K-means we pick K clusters and assign random centroids to them, where K here represents two different clusters. Then we compute the distance from each object to the centroids, form new clusters based on minimum distance, and recalculate the centroids; so we figure out the best position for each centroid, then move it and recalculate the distances. We repeat those two steps iteratively until the cluster centroids stop changing their positions and become static. Once the clusters become static, the K-means clustering algorithm is said to be converged. That's another term you see throughout machine learning: converged means whatever math we're using to figure out the answer has come to a stable solution.

Shall we see the flowchart, to make this a bit more sense as a nice, easy step-by-step? We start; we choose K (we'll look at the elbow method for choosing it in just a moment); we assign random centroids to the clusters (sometimes you pick the centroids yourself, because you might look at the data on a graph and say, oh, these are probably the central points); we compute the distance from the objects to the centroids; we form new clusters based on minimum distance and calculate their centroids; then we compute the distances from the objects to the new centroids, and we go back and repeat those last two steps. As the centroids are moved around, objects can switch from one centroid to the other, and we continue until it has converged.

Let's see an example of this. Suppose we have a data set of seven individuals and their scores on two topics, A and B: the subject column refers to the person taking the test, and then we have what they scored on the first topic and on the second. We take the two farthest-apart points, (1, 1) and (5, 7), as the initial cluster centroids; remember, we talked about selecting them randomly, or you can place them and pick the farthest apart, and either works, depending on what kind of data you're working with and what you know about it. Each point is then assigned to the closest cluster with respect to its distance from the centroids. You can measure each of those distances with the Pythagorean theorem, since you know the x and y differences and can compute the diagonal from them (or you could just hold a ruler to your monitor; that would be kind of silly, but if you're eyeballing it, it would work), and you can see how the points naturally come together in certain areas. Now we again calculate the centroid of each cluster. Cluster one has three points (one, two, three), so its centroid moves from (1, 1) to roughly the center of those points, (1.8, 2.3); and for cluster two, the overall mean vector of its points comes out at (4.1, 5.4). Having moved the centroids, we compare each individual's Euclidean distance to its own cluster mean and to the opposite cluster's mean, and we can build a nice chart of those distances.
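If you want to verify that worked example, here's a from-scratch sketch of those steps in numpy. The seven subjects' scores are assumed from the classic two-cluster exercise the slides appear to follow; they reproduce the centroids quoted above, (1.8, 2.3) and (4.1, 5.4), on the first pass:

```python
import numpy as np

points = np.array([[1.0, 1.0], [1.5, 2.0], [3.0, 4.0], [5.0, 7.0],
                   [3.5, 5.0], [4.5, 5.0], [3.5, 4.5]])
centroids = np.array([[1.0, 1.0], [5.0, 7.0]])   # two farthest-apart points

while True:
    # assign each point to its nearest centroid (Euclidean distance)
    dists = np.linalg.norm(points[:, None] - centroids[None, :], axis=2)
    labels = dists.argmin(axis=1)
    # recompute each centroid as the mean of its cluster
    new_centroids = np.array([points[labels == k].mean(axis=0) for k in (0, 1)])
    if np.allclose(new_centroids, centroids):    # converged: centroids static
        break
    centroids = new_centroids

print(labels)      # which cluster each subject ends up in
print(centroids)   # final cluster centroids
```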
Comparing those distances, only individual 3 is nearer to the mean of the opposite cluster, cluster two, than to its own cluster one; you can see it circled there in the middle of the diagram. When we moved the centroids of the clusters over, that one point became closer to the other group of individuals, so individual 3 is relocated to cluster two, resulting in a new partition, and we regenerate all the numbers for how close each point is to the new clusters and find the actual cluster centroids. Moving the centroids over again, you can see we've now formed two very distinct clusters, and on comparing each individual's distance to its own cluster mean and to that of the opposite cluster, we find that the data points are stable. Hence we have our final clusters.

Now, if you remember, I brought up a concept earlier: on the K-means algorithm, choosing the right value of K will mean fewer iterations, and to find the appropriate number of clusters in a data set we use the elbow method. The within-cluster sum of squares (WCSS) is defined as the sum of the squared distances between each member of a cluster and its centroid. So what you do is plot the number of clusters against the WCSS, running the same K-means algorithm for each cluster count, and you can find the optimal number of clusters from the resulting graph; that's why it's called the elbow method. You just look for where the elbow is in the slope, and you have a clear answer. On this data we guessed at two just by looking, and the elbow confirms we want to start with K = 2. A lot of times people end up computing K-means for K = 2, 3, 4, 5 until they find the value that sits at the elbow joint; and sometimes you can just look at the data, and if you're really good with that specific domain (remember, I mentioned domain last time), you'll know where to start guessing at what that K value is.

So let's take this into a use case: using K-means clustering to cluster cars into brands, using parameters such as horsepower, cubic inches, model year, and so on. We're going to use the cars data set, which has information about three brands of cars: Toyota, Honda, and Nissan. We'll go back to my favorite tool, the Anaconda Navigator with the Jupyter Notebook, so let's flip over there. In our notebook I'm going to paste in the basic code we usually start a lot of these with; we're not going to go too much into it, because we've already discussed numpy (the number array), pandas (the pandas data frame), and matplotlib (for the graphing). And don't forget, if you're using the Jupyter Notebook you need the %matplotlib inline line so that it plots everything on the screen; if you're using a different Python editor you probably don't need it, because you'll get a pop-up window on your computer. We'll run this to load our libraries and our setup. The next step, of course, is to look at our data, which I've already opened up in a spreadsheet.
You can see here we have the miles per gallon, cylinders, cubic inches, horsepower, weight in pounds (how heavy it is), the time it takes to get to 60 (my car is probably at about 80 or 90 on this one), and the year; you can see these are actually kind of older cars, running from 1971 up through the 80s, and then the brand: Toyota, Honda, Nissan. Coming back to the notebook, we import the data: dataset equals pandas read_csv of the cars CSV file. (Remember, you can always request the data files for these, either in the comments here on the YouTube video or at simplylearn.com.) I put the cars CSV in the same folder where I stored the code, so I don't have to put the full path; if you store them in different folders, you do have to change this and double-check your names. We run this and, having chosen the name dataset somewhat arbitrarily (because, you know, it's a data set), we've now imported our cars CSV.

As you know, we have to prep the data, so the plan is to create the X data, the frame we're going to try to figure out what's going on with, then fill in any missing values and check for nulls. There are a number of ways to do the filling, but we'll do it in a simple loop so you can actually see what's going on: for i in X.columns, we go through each of the columns (a lot of times I'll make explicit lists of the columns and do it that way, because I might remove certain columns or want some processed differently), and for each one we call fillna, which is a pandas command. The question is what to fill the missing data with; we definitely don't want to just put in a number that doesn't actually mean something, so one of the tricks is to fill with the column's mean, converted to an integer to keep it on par with the rest of the data. (Watch your brackets here: a lot of editors will auto-close one bracket for you, so make sure the second bracket of a double bracket gets in there, and the mean call has its own brackets too; I was so busy closing one set that I forgot the mean's. That's the kind of thing that happens regularly.) Once we've done that, we loop through again and check that everything is filled in correctly: we print X[i].isnull(), which returns where the values are null, and sum it up to see how many null lines each column has.

But before running that, we need X itself, and with X what we want to do is remove the last column, because that holds the brand; that's exactly what we're trying to see if the clustering can recover. There are so many different ways to sort the X out. For one, we could use dataset.iloc, one of the features in pandas, taking all the rows and all but the last column of the data set, and at that point add .values to just convert it to values; let me put a print X down here (it's a capital X we chose), and if I run this you can see it's just the values. We could also take out the .values, and then it stays a frame. What I like to do instead of iloc, which works on integers, is more commonly to use dataset.columns; remember, that lists all the columns. If I come in here (let me just mark that in red) and print dataset.columns, you can see I have my index here: mpg, cylinders, everything including the brand, which we don't want. The way to get rid of the brand is to take dataset.columns of everything but the last one, [:-1]; now if I print this, you'll see the brand disappears. So I can take dataset indexed by those columns and put it right in as the columns we're going to look at; let's unmark this, and now if I do an X.head I have a new data frame, and you can see we have all the different columns except for the brand at the end.

It turns out that when you start playing with this data set, you're going to get an error later on saying it cannot convert string to float; for some reason, the way these values were recorded, they must have been stored as strings. There's a neat feature in pandas to convert them: convert_objects, with convert_numeric equals true. (And yes, I did have to go look that up; I don't have it memorized. If I'm working with these a lot I remember them, but depending on where I'm at and what I'm doing, I usually have to look it up.) I run that and... oops, I must have missed something; let me double-check my spelling, and you'll see I missed the first underscore in convert_objects. Run it again and now everything is converted into numeric values, because that's what we'll be working with down here.

The next part is to go through the data and eliminate null values. Most people working with small data pools discover afterwards that they have a null value and have to go back and do this, so be aware: whenever we're formatting this data, things are going to pop up, and sometimes you go backwards to fix them. That's fine; it's just part of exploring the data and understanding what you have. (And I should have done this earlier, but let me increase the size of my window one notch; there we go, easier to see.) So, for i in X.columns, we page through all the columns, and we take X[i] and alter it with fillna; pandas has the fillna, and that just fills in any non-existent, missing data (brackets up again). There are a lot of different ways to fill this data; if you have a really large data set, some people just void out that data and look at it later in a separate exploration. One of the tricks we can do is take the column and find its mean (the mean call goes inside the brackets), so we fill the non-existing entries with the mean. The problem is that the mean returns a decimal float, and some of these columns certainly shouldn't be decimals, so you need to be a little careful; for this example we fill with the integer version of the mean, which keeps it on par with the other data that isn't a decimal. Then we double-check. A lot of times you do the check first, then the fill, then the check again, just to make sure you did it right: we go through and test for missing data by taking the X[i] column and calling isnull, which goes through all the rows of each column and returns wherever there's a null value, and then we sum that; isnull is a pandas command, and so is sum. We run it, and you'll see that all the columns have zero null values. So we've now tested and double-checked: our data is nice and clean, no null values, everything turned into numbers, and the last column removed from our data.
At this point we're actually going to start using the elbow method to find the optimal number of clusters, so we're getting into the sklearn part, the K-means clustering. (I'll zoom it up one more notch so you can see what I'm typing.) From sklearn.cluster we import KMeans; I always forget to capitalize the K and the M when I do this, so capital K, capital M. We create an empty array, wcss; if you remember from the elbow-method slide, the within-cluster sum of squares is the sum of the squared distances between each member of a cluster and its centroid, so we're watching how that value changes as we increase K. We'll run this over a range of cluster counts, about ten of them, and the first thing inside the loop is to create the actual kmeans object (all lowercase) from the KMeans class we just imported. The variable we really care about is n_clusters, which we set equal to i; that's the most important one, because we're looking at how increasing the number of clusters changes our answer. There are a lot of settings to KMeans, and our guys in the back did a great job playing with some of them. The common ones you see in a lot of code are how you init your K-means, with k-means++, which is just a tool that lets the model itself be smart about how it picks its initial centroids; a max iteration of 300, so we only iterate at most 300 times; and a random state of zero. You really don't need to worry too much about these when you're first learning this; as you start digging deeper, you'll find they're shortcuts that speed up the process. The big one we're working with is n_clusters equals i, because we're literally going to train our K-means once per candidate K, ten or eleven times. If you're working with big data, you know the first thing you do is run a small sample so you can test all your stuff on it, and you can already see the problem: if I'm going to iterate through a terabyte of data that many times, with the K-means itself iterating through the data multiple times each run, that's a heck of a process. So you've got to be a little careful with this; a lot of times you can find your optimal number with the elbow method on a sample of the data, especially if you're working with larger data sources. Inside the loop we take our kmeans and just fit it; as with anything in sklearn, it's very common that you fit your model, and the variable we're fitting, if you remember correctly, is our capital X. Once it's fit, we go back to the array we made and append the value we're looking for onto the end; it's not the fit result itself we're appending, but the inertia the fit generates, so kmeans.inertia_ pulls out that specific value. And let's get a visual on this. We'll do our plt.plot, where we're plotting first the x-axis, the range of cluster counts, and then wcss for our y-axis. It's always nice to give a plot a title, so we'll call it "The Elbow Method", and let's get some labels: plt.xlabel with "Number of clusters", and plt.ylabel with "WCSS", since that's what's on the plot. Finally, we display our graph, which is simply plt.show().
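The elbow loop as code; note that the range starts at 1, since KMeans won't accept n_clusters=0, which is presumably what the spoken "range" aside was driving at:

```python
from sklearn.cluster import KMeans

wcss = []
for i in range(1, 11):
    kmeans = KMeans(n_clusters=i, init='k-means++',
                    max_iter=300, random_state=0)
    kmeans.fit(X)
    wcss.append(kmeans.inertia_)   # within-cluster sum of squares

plt.plot(range(1, 11), wcss)
plt.title('The Elbow Method')
plt.xlabel('Number of clusters')
plt.ylabel('WCSS')
plt.show()
```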
There we go, and because we have it set to inline, it appears right on the page; hopefully I didn't make a typo. You can see we get a very nice graph, with a clear elbow joint right at two and again around three and four, and after that there's not very much change. Now, as a data scientist looking at this, I would try either three or four, and I'd actually run both to see what the output looks like; they've already tried this in the back, so we're just going to use three. Let's go ahead and apply the K-means to the cars data set: basically we copy the code we looped through up above, kmeans = KMeans with the number of clusters set to three, since that's what we're going to look for. (You could do three and four on this and graph them just to see how they come out differently; it would be kind of curious to look at.) We create our own variable for the answers, y_kmeans (whoops, I typed a double equals there), and set it equal to the kmeans, but we're not going to do a fit; we're going to do a fit_predict, which is the setup you want to use here. Usually, with models, you see a fit and then a separate predict, but we want to both fit and predict the K-means in one go, and that's fit_predict, with our capital X as the data we're working with. And before we plot, we do a little pandas trick: we take our X and call as_matrix with columns equal to None, converting it into a plain matrix of the data. Let's run that; you'll see a little warning pop up, because things are always being updated and there are minor changes between versions, and in future versions it's .values instead of as_matrix. But as_matrix works just fine for right now; you'll just want to update that later on.

Before we dive into plotting this data, I always like to take a look and see what I am plotting, so let's look at y_kmeans; I'll just print it out down here, and we see an array of answers, 2, 1, 0, 2, 1, 2, so it's assigning each row of data to one of the three clusters it thinks the data falls into. Then let's print X, and we'll see that X is an array, a matrix of the different values. It's very hard to plot all the different values in the matrix, so we're only going to look at the first two, positions zero and one; if you were doing a full presentation in front of a board meeting, you might do it a little differently and dig deeper into the different aspects, because these are all the columns we cleaned up, but we'll only look at columns zero and one to keep it easy. Let's clear that exploration out of here and bring up our plot. We're going to do a scatter plot, plt.scatter, and this looks a little complicated, so let's explain what's going on. We take the X values where y_kmeans equals zero, the first cluster, and use value zero for the x-axis; then we do the same thing, y_kmeans equals zero but the second column, for the y-axis, so we're only looking at the first two columns of the data. The guys in the back played with this a little to make it pretty, and they discovered it looks good with a size of 100 for the dots; we're going to use red for this one, and when they looked at the data that came out, this cluster was definitely the Toyotas, so we'll label it Toyota. (Again, that's something you'd really have to explore in the data yourself, playing with the numbers and seeing what fits.) I'll hit enter and paste in the next two lines, the next two brands, Nissan and Honda: the same scatter calls, looking at where y_kmeans equals one and where y_kmeans equals two, and again just the first two columns, zero and one. Finally, let's put the centroids on there: another scatter plot, pulling straight from the kmeans model we created with cluster_centers_, taking all the rows in the first column and all the rows in the second column (zero and one, because you always start at zero). They played with the sizing here too, to make it look good: a size of 300, the color yellow, and a label of Centroids, since it's good to have labels. Then we add one more thing, a title with plt.title (you always want to make your graphs look pretty); we'll call it "Clusters of car make". One of the features of the plot library is that you can add a legend, and it brings it in automatically, since we've already labeled the different pieces with Toyota, Nissan, Honda, and Centroids. And finally plt.show so we can actually see it; remember it's inline, so if you're using an editor other than the Jupyter Notebook you'll get a pop-up.

And there's our nice set of clusters: Honda in green, Toyota in red, Nissan in purple, and you can see where it put the centroids to separate them. Now, we could plot a lot of other data here, because we only looked at the first two columns (column one and two, or 0 and 1 as you label them in computer scripting), but you can see these two columns alone form very distinct clusters. If you were exploring new data, you might work almost in reverse: take a look and ask, well, what makes these groups different? You start looking at the data and pulling apart the columns to find out why the first group is set up the way it is. Maybe you're doing loans, and you want to know why one group isn't defaulting on their loans, why the last group is defaulting, and why the middle group is defaulting 50% of the time; you start finding ways to manipulate the data and pull out the answers you want.
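Consolidated, that clustering-and-plotting cell might look like this; the brand-to-cluster labels are the ones read off the demo's output (they depend on the random seed), and .values stands in for the deprecated as_matrix:

```python
# Fit three clusters and tag each row of X with its cluster number
kmeans = KMeans(n_clusters=3, init='k-means++', max_iter=300, random_state=0)
y_kmeans = kmeans.fit_predict(X)

X = X.values   # plain numpy matrix (the as_matrix step in the demo)

# Plot the first two feature columns, one scatter call per cluster
plt.scatter(X[y_kmeans == 0, 0], X[y_kmeans == 0, 1], s=100, c='red',    label='Toyota')
plt.scatter(X[y_kmeans == 1, 0], X[y_kmeans == 1, 1], s=100, c='purple', label='Nissan')
plt.scatter(X[y_kmeans == 2, 0], X[y_kmeans == 2, 1], s=100, c='green',  label='Honda')

# The centroids themselves, big and yellow so they stand out
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
            s=300, c='yellow', label='Centroids')
plt.title('Clusters of car make')
plt.legend()
plt.show()
```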
Now that you've seen how to use K-means for clustering, let's move on to the next topic: logistic regression. The logistic regression algorithm is the simplest classification algorithm, used for binary or multi-class classification problems. (And our little horror-book fan from Canada is back; that's actually really scary when you think about it, with those big eyes.) In the previous tutorial we learned about linear regression and dependent and independent variables, so to brush up: y = mx + c, the very basic algebraic relation between y and x. The dependent variable y is the target class variable we are going to predict, and the independent variables x1 up through xn are the features, or attributes, we're going to use to predict the target class. We know what a linear regression looks like, but using that graph we cannot divide the outcome into categories; it's really hard to categorize values like 1.5, 3.6, 9.8. For example, a linear regression graph can tell us that with an increase in the number of hours studied, a student's marks will increase, but it will not tell us whether the student will pass or not. In cases where we need the output as a categorical value, we use logistic regression, and for that we use the sigmoid function. You can see here we have marks from 0 to 100 against the number of hours studied, which is what they're comparing in this example, and we would usually fit a line y = mx + c. When we use the sigmoid function instead, we have p = 1 / (1 + e^(-y)), which generates a sigmoid curve; and when you take Ln, the natural logarithm (I always thought it should be "nl" rather than "ln"), which is the inverse of e, we get ln(p / (1 - p)) = mx + c, the sigmoid-curve relation we're looking for. If we zoom in on the function, you'll see it heads toward one or toward zero depending on your x value. If the probability is greater than 0.5, the value is automatically rounded up to one, indicating the student will pass; so if they're doing a certain amount of studying, they will probably pass. The threshold value sits at 0.5, usually right in the middle, and if the probability is less than 0.5, the value is rounded down to zero, indicating the student will fail.
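A tiny sketch of that sigmoid thresholding, with made-up slope and intercept values purely for illustration:

```python
import numpy as np

def sigmoid(y):
    """p = 1 / (1 + e^(-y)), squashing any real y into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-y))

m, c = 0.9, -4.0                 # illustrative slope/intercept, not fitted values
hours = np.array([1.0, 4.0, 8.0])
p = sigmoid(m * hours + c)
print(p)                          # probabilities of passing
print((p >= 0.5).astype(int))     # thresholded at 0.5: 0 = fail, 1 = pass
```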
This, of course, ignores the outliers, like that one student who's just a natural genius and doesn't need any studying to memorize everything. That's not me, unfortunately; I have to study hard to learn new stuff.

Our problem statement: classify whether a tumor is malignant or benign. This is actually one of my favorite data sets to play with, because it has so many features, and when you look at them they really are hard to understand; you can't just look at them and know the answer, so it gives you a chance to dive into what data looks like when you aren't able to understand its specific domain. I also want to remind you that in the domain of medicine, if I told you my classifier was really good, say 90% or 95%, at classifying whether a tumor is malignant or benign, I'm guessing you're going to go get it tested anyway, because you know it's that serious; it's an all-or-nothing situation. So why would you do it? Referencing the domain is important: it might help the doctor know where to look just by understanding what kind of tumor it is, so it might aid them with something they missed before. Let's go ahead and dive into the code, and I'll come back to the domain part in just a minute.

For this use case we do our normal imports: numpy, pandas, Seaborn, the matplotlib library, and matplotlib inline, since I'm going to switch over to Anaconda. Let's flip over there and get started. I've opened a new window in my Anaconda Jupyter Notebook; by the way, you don't have to use Anaconda for the Jupyter Notebook, I just love the interface and all the tools Anaconda brings. So we have import numpy as np for our numpy number array, our pandas as pd, Seaborn as sns to help us with our graphs (so many really nice tools in both Seaborn and matplotlib), matplotlib.pyplot as plt, and of course the line to do it inline, and we just run that so it's all set up. We're just going to call our data "data" (not creative today), equal to pd.read_csv; this happens to be a CSV file, and I renamed the file for this part of the tutorial, so use whatever name you saved it under.

Before we go any further, let's open up the data and see what it looks like in a spreadsheet. When I pop it open locally, this is just a CSV file, comma-separated values. We have an ID, which I guess categorizes or references which test was done; the diagnosis, M for malignant and B for benign, the two options, and that's what we're going to try to predict and test; and then measurements like the radius mean (or average), texture mean, perimeter mean, area mean, and smoothness. I don't know about you, but unless you're a doctor in the field, most of this stuff is opaque; you can guess what "concave" means just from the term, but I really wouldn't know what it means in the measurements they're taking. They have all kinds of things: how smooth the growth is, its symmetry, and these are all float values. Page through them real quick and you'll see there are, I believe, 36 columns in this one, if I remember correctly; lots of different values and measurements they take when they go in and look at the tumorous growth. Back in our code: I put this file in the same folder as the notebook, so, as before, if you have it in a different location you'll want the full path. We print the first five lines of data with data.head(), and we can see pretty much what we just looked at: an ID, a diagnosis, and, going all the way across, all the measurement columns displayed nicely.
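As code, with data.csv standing in as a placeholder for whatever you named the file:

```python
# Load the tumor data set and peek at it ('data.csv' is a placeholder name)
data = pd.read_csv('data.csv')
print(data.head())   # id, diagnosis (M/B), then the measurement columns
```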
While we're exploring the data, our Seaborn, which we referenced as sns, makes it very easy to do a joint plot; you'll notice it's very similar to matplotlib, because it's sitting on top of that library. The joint plot does a lot of work for us, and we're just going to look at the first two columns we're interested in, the radius mean and the texture mean, with data=data to tell it which frame we're plotting from. Let's just run that, and it generates a really nice graph with all kinds of cool things to look at: we have the texture mean and the radius mean on the axes, obviously, and one of the cool things is the histograms along the sides, showing where the most common radius mean and the most common texture mean come up. (It gets a little confusing, because each measurement is already an average over an individual growth, and the histogram then shows how common each of those averages is.) And that's only two columns, so let's dig a little deeper into Seaborn: they also have a heat map. If you're not familiar with heat maps, a heat map just means it's in color, that's all; I guess the original ones plotted heat density on something, and ever since, it's just been called a heat map. We take our data and get the corresponding correlation numbers to put into the heat map, and that's simply data.corr(), a pandas expression (remember, we're working in a pandas data frame, and that's one of its cool tools). Pull that into the heat map, and you'll see we're now looking at all the different features: the ID, the texture, the area, the compactness, the concave points. If you look down the middle of this chart, the diagonal going from the upper left to the bottom right, it's all white; that's because when you compare texture to texture, they're identical, a perfect correlation of one. And you'll see that when you compare, say, area to texture, it's almost black: those have almost no correspondence; they don't form a linear relationship or anything you can look at and call connected, just very scattered data. This is really just a nice graph for getting a quick look at your data; it doesn't so much change what you do as help you verify. When you get an answer later, or start looking at individual pieces, you might go, hey, that doesn't match; according to our heat map these should not correlate with each other, and if they do, you're going to have to start asking why, and what else is coming in. It does show some really cool information here, though: for instance, if you go across the top row from the ID, there's no one feature that lights up and says, hey, if the area is a certain size then it's going to be benign or malignant; there are some that sort of add up, and for the question we're trying to answer, whether it's malignant or benign, that's a big hint.
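The two exploration plots as code; the column names radius_mean and texture_mean match the standard version of this tumor data set, and note that newer pandas may need numeric_only=True in corr() to skip the text diagnosis column:

```python
# Joint plot of the first two measurement columns: scatter plus histograms
sns.jointplot(x='radius_mean', y='texture_mean', data=data)

# Heat map of the full correlation matrix; the white diagonal is each
# feature correlating perfectly with itself
plt.figure(figsize=(12, 9))
sns.heatmap(data.corr())
plt.show()
```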
That hint tells us as data scientists that we can't solve this with any one feature; it's going to take all the features, or many of them, to come up with a solution. While we're exploring the data, let's check one more thing: data.isnull(). We want to check for null values. If you remember from earlier in this tutorial we did it a little differently, adding things up ourselves, but with pandas you can do it really quickly: data.isnull() and a sum, and it goes across all the columns. When I run this, every column comes back with no null data. Just to rehash these last few steps: we've done a lot of exploration. We looked at the first two columns and saw how they plot with the Seaborn joint plot, which shows both the histograms and the data on the x-y coordinates, and obviously you can do the same in more detail with other column pairs. Then we did the Seaborn heat map, sns.heatmap of the data correlations, and you could see it doing a nice job showing bright spots where features correlate with each other, and the areas where they don't. And finally we checked whether the data has any null values, any missing data. That's a very important step, because things will crash later on if you forget it. It will remind you with that nice error about null values, so it's not a big deal if you miss it, but it's no fun being ten steps into a big process and having to go back and remember where you pulled the data in.
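In code, continuing with the data frame from the sketch above, that check really is one line:

```python
# Count the missing values in every column; all zeros means nothing to fill
print(data.isnull().sum())
```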
Now we need to pull out our X and our y. There are a lot of options here: we could certainly make X all the columns except the first two, since the first two are the ID and the diagnosis, and that would work. But what we're actually going to do is focus on the "worst" measurements: the worst radius, worst texture, worst perimeter, worst area, worst smoothness, worst compactness, and so on. One reason to start dividing your data up like this is that sometimes the same information comes in twice, and if two of the measurements going into the model are really the same measurement, the model may overweight them and let them overpower the other features, because it's basically taking that information in twice. That's a little past the scope of this tutorial; what I want you to take away is that we're dividing the data into pieces, and our team in the back said, hey, let's just look at the worst. So I create a list, radius_worst, texture_worst, perimeter_worst, the worst of the worst, and put those columns into my X. X is still a pandas data frame, just those columns; and note it's not X on the right-hand side, it's the data: X equals data indexed by that list of columns. If we take that, we also need the answers we already know in y, and if you remember correctly that's just the diagnosis: all we care about is whether it's diagnosed benign or malignant. Since the diagnosis is a single column we can just do data['diagnosis'], oh, I forgot to put the brackets, there we go. We can also quickly do X.head() and y.head() to see what those look like; if you run both in one cell without a print it only shows the last one, I forgot about that. The y head is just M's, because the first rows are all malignant, and the X head is the first five rows of radius_worst, texture_worst, perimeter_worst, area_worst, and so on. I'll go ahead and take that out. So moving to the next step, we've built our two data sets: the answer, and the features we want to look at.
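A sketch of that selection, assuming the Kaggle "_worst" column names (in that file, "concave points_worst" really does contain a space):

```python
# Features: the "worst of the worst" measurements only (assumed names)
worst_columns = ["radius_worst", "texture_worst", "perimeter_worst",
                 "area_worst", "smoothness_worst", "compactness_worst",
                 "concavity_worst", "concave points_worst",
                 "symmetry_worst", "fractal_dimension_worst"]
X = data[worst_columns]   # still a pandas data frame, just these columns
y = data["diagnosis"]     # the answer column: B (benign) or M (malignant)

print(X.head())
print(y.head())
```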
3 that’s testore size random State it’s always nice to kind of switch a random State around but not that important what this means is that the test size is we’re going to take 30% of the data and we’re going to put that into our test variables our y test and our X test and we’re going to do 70% into the X train and the Y train so we’re going to use 70% of the data to train our model and 30% to test it let’s go ahead and run that and load those up so now we have all our stuff split up and all our data ready to go now we get to the actual Logistics part we’re actually going to do our create our model so let’s go ahead and bring that in from sklearn we’re going to bring in our linear model and we’re going to import logistic regression that’s the actual model we’re using and this we’ll call it log model o the real model and let’s just set this equal to our logistic regression that we just imported so now we have a variable log model set to that class for us to use and with most the uh models in the sklearn we just need to go ahead and fix it fit do a fit on there and we use our X train that we separated out with our y train and let’s go ahead and run this so once we’ve run this we’ll have a model that fits this data that’s 70% of our training data uh and of course it prints this out that tells us all the different variables that you can set on there there’s a lot of different choices you can make but for word do we’re just going to let all the defaults sit we don’t really need to mess with those on this particular example and there’s nothing in here that really stands out as super important until you start fine-tuning it but for what we’re doing the basics will work just fine and then let’s we need to go ahead and test out our model is it working so let’s create a VAR variable y predict and this is going to be equal to our log model and we want to do a predict again very standard format for the sklearn library is taking your model and doing a predict on it and we’re going to test why predict against the Y test so we want to know what the model thinks it’s going to be that’s what our y predict is and with that we want the capital x x test so we have our train set and our test set and now we’re going to do our y predict and let’s go ahead and run that and if we uh print y predict let me go ahead and run that you’ll see it comes up and it PRS a prints a nice array of uh B and M for B9 and malignant for all the different test data we put in there so it does pretty good we’re not sure exactly how good it does but we can see that it actually works and it’s functional was very easy to create you’ll always discover with our data science that as you explore this you spend a significant amount of time prepping your data and making sure your data coming in is good uh there’s a saying good data in good answers out bad data in bad answers out that’s only half the thing that’s only half of it selecting your models becomes the next part as far as how good your models are and then of course fine-tuning it depending on what model you’re using so we come in here we want to know how good this came out so we have our y predict here log model. predict X test so for deciding how good our model is we’re going to go from the SK learn. 
metrics we’re going to import classification report and that just reports how good our model is doing and then we’re going to feed it the model data and let’s just print this out and we’ll take our classification report and we’re going to put into there our test our actual data so this is what we actually know is true and our prediction what our model predicted for that data on the test side and let’s run that and see what that does so we pull that up you’ll see that we have um a Precision for B9 and malignant B&M and we have a Precision of 93 and 91 a total of 92 so it’s kind of the average between these two of 9 two there’s all kinds of different information on here your F1 score your recall your support coming through on this and for this I’ll go ahead and just flip back to our slides that they put together for describing it and so here we’re going to look at the Precision using the classification report and you see this is the same print out I had up above some of the numbers might be different because it does randomly pick out which data we’re using so this model is able to predict the type of tumor with 91 %c accuracy so when we look back here that’s you will see where we have uh B9 in mland it actually is 92 coming up here but we’re looking about a 92 91% precision and remember I reminded you about domain so we’re talking about the domain of a medical domain with a very catastrophic outcome you know at 91 or 92% Precision you’re still going to go in there and have somebody do a biopsy on it very different than if you’re investing money and there’s a 92% chance you’re going to earn 10% and 8% chance you’re going to lose 8% you’re probably going to bet the money because at that odds it’s pretty good that you’ll make some money and in the long run you do that enough you definitely will make money and also with this domain I’ve actually seen them use this to identify different forms of cancer that’s one of the things that they’re starting to use these models for because then it helps the doctor know what to investigate so that wraps up this section we’re finally we’re going to go in there and let’s discuss the ANW to the quiz asked in machine learning tutorial part one can you tell what’s happening in the following cases grouping documents into different categories based on the topic and content of each document this is an example of clustering where K means clustering can be used to group the documents by topics using bag of words approach so if You’ gotten in there that you’re looking for clustering and hopefully you had at least one or two examples like K means that are used for clustering different things then give yourself a two thumbs up B identify handwritten digits in images correctly this is an example of classification the traditional approach to solving this would be to extract digit dependent features like curvature of different digits Etc and then use a classifier like svm to distinguish between images again if you got the fact that it’s a classification example give yourself a thumb up and if you’re able to go hey let’s use svm or another model for this give yourself those two thumbs up on it C behavior of a website indicting that the site is not working as designed this is an example of anomaly detection in this case the algorithm learns what is normal and what is not normal usually by observing the logs of the website give yourself a thumbs up if you got that one and just for a bonus can you think of another example of anomaly detection one of the ones I use for my own 
One of the ones I use in my own business is detecting anomalies in stock markets. Stock markets are very fickle and behave erratically, so finding those erratic areas and then tracking down why they're erratic, was something released on social media, was something else released, shows how knowing where the anomaly is can help you figure out the answer in another area. D: predicting the salary of an individual based on his or her years of experience. This is an example of regression; the problem can be mathematically defined as a function between the independent variable, years of experience, and the dependent variable, the salary of an individual. If you guessed regression, give yourself a thumbs up, and if you remembered the independent and dependent variable terminology, give yourself two. Summary: to wrap it up, we went over what K-means is; we walked through assigning random centroids to the clusters, computing the distances, finding the minimum centroids, and looping until the clusters converge; and we looked at the elbow method for choosing K, running our clusters across a number of values and finding the best spot. We did a nice example of clustering cars with K-means, and even though we only looked at the first two columns to keep it simple and easy to graph, we can easily extrapolate that to all the different columns and see how they fit together. And we looked at what logistic regression is, discussed the sigmoid function, and went through an example of classifying tumors with logistic regression. I hope you enjoyed part two of machine learning, and thank you for joining us today. For more information visit www.simplilearn.com. Again, my name is Richard Kirshner, a member of the Simplilearn team. If you have any questions or comments, write them below the YouTube video or visit us at simplilearn.com, and we'll be happy to supply you with the data sets or other information as requested.

Today we're going to cover K-nearest neighbors, also referred to as KNN. KNN is really a fundamental place to start in machine learning: it's the basis of a lot of other things, and the logic behind it is easy to understand and incorporated in other forms of machine learning. So today: what's in it for you, why do we need KNN, what is KNN, how do we choose the factor K, when do we use KNN, how does the KNN algorithm work, and then we'll dive into my favorite part, the use case: predicting whether a person will have diabetes or not. That's a very common and popular data set for testing out models and learning how to use the different models in machine learning. By now we all know machine learning models make predictions by learning from the past data available: we have our input values, our machine learning model builds on those inputs of what we already know, and then we use that to create a predicted output. Is that a dog? asks the little kid watching the black cat cross their path. No dear, you can differentiate between a cat and a dog based on their characteristics: cats have sharp claws they use to climb, smaller ears, meow and purr, and don't love to play around.
Dogs have duller claws, bigger ears, bark, and love to run around; you usually don't see a cat running around with people, although I do have a cat that does, where dogs do. And we can look at those features: we can evaluate how sharp the claws are and how long the ears are, and we can usually sort cats from dogs based on just those two characteristics. Now tell me if this one is a cat or a dog; usually little kids know cats and dogs by now, unless they live somewhere without many of either. If we look at the sharpness of the claws and the length of the ears, we can see that this animal has smaller ears and sharper claws than the other animals; its features are more like a cat's, so it must be a cat, and it goes in the cat group. Because KNN is based on feature similarity, we can do classification using a KNN classifier: our input value, the picture of the black cat, goes into our trained model, and it predicts that this is a cat. So what is the KNN algorithm? K-nearest neighbors is what it stands for: one of the simplest supervised machine learning algorithms, mostly used for classification. We want to know, is this a dog or not a dog, a cat or not a cat. It classifies a data point based on how its neighbors are classified: KNN stores all available cases and classifies new cases based on a similarity measure. And here we've gone from cats and dogs right into wine, another favorite of mine. We have a measurement of sulfur dioxide versus chloride level for the different wines they've tested, and where each falls on that graph. K in KNN is a parameter that refers to the number of nearest neighbors to include in the majority voting process. So if we add a new glass of wine, red or white, we want to know what its neighbors are; in this case we set k = 5, and we'll talk about K in just a minute. A data point is classified by the majority vote of its five nearest neighbors, so here the unknown point would be classified as red, since four out of five neighbors are red. So how do we choose K? How did we know k = 5, the value we just put in there? The KNN algorithm is based on feature similarity, and choosing the right value of K, a process called parameter tuning, is important for better accuracy. At k = 3 we classify the question mark in the middle, is it a square or a triangle, by looking at its three nearest neighbors, and we'd call it a square; at k = 7 we'd classify it as a triangle, depending on the surrounding data. As K changes, depending on where that point sits, your answer can change drastically. So how do we choose the factor K? You'll find this throughout machine learning, choosing these factors: it's that face you get, oh my gosh, did I choose the right K, did I set my values right in whatever machine learning tool I'm using, so that I don't have a huge bias in one direction or the other. In terms of KNN, if you choose K too low, the prediction gets noisy: it's based on just the couple of points right next to the new one, and you might get a skewed answer.
couple things and it’s going to pick those things and you might get a skewed answer and if your K is too big then it’s going to take forever to process so you’re going to run into processing issues and resource issues so what we do the most common use and there’s other options for choosing K is to use the square root of n so N is a total number of values you have you take the square root of it in most cases you also if it’s an even number so if you’re using uh like in this case squares and triangles if it’s even you want to make your K value odd that helps it select better so in other words you’re not going to have a balance between two different factors that are equal so usually take the square root of N and if it’s even you add one to it or subtract one from it and that’s where you get the K value from that is the most common use and it’s pretty solid it works very well when do we use KNN we can use KNN when data is labeled so you need a label on it we know we have a group of pictures with dogs dogs cats cats data is Noise free and so you can see here here when we have a class and we have like underweight 140 23 Hello Kitty normal that’s pretty confusing we have a high variety of data coming in so it’s very noisy and that would cause an issue data set is small so we’re usually working with smaller data sets where I you might get into gig of data if it’s really clean doesn’t have a lot of noise because KNN is a lazy learner I.E it doesn’t learn a discriminative function from the training set so it’s very lazy so if you have very complicated data and you have a large amount of it you’re not going to use the KNN but it’s really great to get a place to start even with large data you can sort out a small sample and get an idea of what that looks like using the KNN and also just using for smaller data sets KNN works really good how does the KNN algorithm work consider a data set having two variables height in centimeters and weight in kilograms and each point is classified as normal or underweight so we can see right here we have two variables you know true false they’re either normal or they’re not their underweight on the basis of the given data we have to classify the below set as normal or underweight using KNN so if we have new data coming in that says 57 kilg and 177 cm is that going to be normal or underweight to find the nearest neighbors will calculate the ukian distance according to the ukan distance formula the distance between two points in the plane with the coordinates XY and ab is given by distance D equals the square Ro T of x – a^ 2 + y – b^ 2 and you can remember that from the two edges of a triangle we’re Computing the third Edge since we know the X side and the yide let’s calculate it to understand clearly so we have our unknown point and we placed it there in red and we have our other points where the data is scattered around the distance D1 is a square root of 170 – 167 2 + 57 – 51 2ar which is about 6. 
Similarly, we calculate the Euclidean distance of the unknown data point from all the points in the data set, and because we're dealing with a small amount of data, that's not hard to do; it's quick for a computer, and the math isn't complicated. So we've calculated the Euclidean distance of the unknown point, where x1 and y1 equal 57 and 170, from all the points whose class we already know. Now let's find the nearest neighbors at k = 3: the three closest neighbors all say normal, which is pretty self-evident when you look at the graph. Normal, normal, normal, three votes for normal, so the majority of neighbors point to normal; hence, per the KNN algorithm, the class of (57, 170) should be normal. To recap KNN: a positive integer k is specified along with a new sample; we select the k entries in our database closest to the new sample; we find the most common classification of those entries; and that's the classification we give to the new sample. As you can see, it's pretty straightforward: we're just looking for the closest things that match what we've got. So let's see what that looks like in a Python use case: predict diabetes. The objective is to predict whether a person will be diagnosed with diabetes or not, and we have a data set of 768 people who were or were not diagnosed. Let's open that file and look at the data. It's in a simple spreadsheet format, comma separated, a very common way to receive data. We have columns A through I: eight columns each holding a particular attribute, and then the ninth column, the outcome, which is whether they have diabetes. As a data scientist, the first thing you should notice is the insulin column: if someone is taking insulin, they have diabetes, that's why they're taking it, and that could cause issues in some machine learning packages, but for a basic KNN setup this works fine. The next thing you notice is that it didn't take much to open: scrolling to the bottom there are 768 rows, a small data set that easily fits into the RAM of a regular desktop computer; you can look at it and manipulate it without taxing the machine, no enterprise hardware needed. Before we import the tools we need, a word about the IDE: you can certainly use any editor for Python, but for visual demos I like Anaconda with the Jupyter Notebook. A quick look at the Anaconda Navigator, the new release, which is really nice: under Home I can choose my application, and we'll be using Python 3.6; I have a couple of different versions on this particular machine. If I go under Environments, I can create a unique environment for each project, which is nice.
There's even a little button there for installing packages: click it, open the terminal, and a simple pip install adds whatever packages you're working with. Back under Home, we launch our notebook. Like the old cooking shows, I've already prepared a lot of this so we don't have to wait; it takes a few minutes to open a browser window, in this case Chrome, my default. Since the script is pre-done, you'll see a number of tabs open at the top; we're working in the first one. And since we're using KNN to predict whether a person will have diabetes or not, let's put that title in. I'll insert a cell below, then go back to the top cell and change its type to Markdown, which means it won't run as Python; run it and the title comes up in nice big letters, a nice reminder of what we're working on. By now you should be familiar with doing the imports: we import pandas as pd and numpy as np, the pandas data frame and the numpy number array, two very powerful general Python tools. Then we have train_test_split; by now you should be familiar with splitting the data, part of it for training our model and the rest for testing how good it is. From preprocessing we take the StandardScaler, so we don't have a bias from really large numbers: remember, in this data the number of pregnancies stays small while the insulin value can get up to 256, and 256 versus 6 would skew the results, so we standardize the columns onto a comparable scale. Then the actual tool, the KNeighborsClassifier we're going to use, and finally three tools that are all about testing our model: the confusion matrix, the F1 score, and the accuracy score. So we have our two general Python modules and six sklearn-specific imports, and we need to run the cell so everything is actually imported.
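Those import lines, as described, would look like this:

```python
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix, f1_score, accuracy_score
```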
On to the next step: we load the database using pandas. We looked at the data in a simple spreadsheet, but I usually like to pull it up in Python too, so we can see what we're working with. Here dataset equals pd.read_csv of the diabetes file, which I put in the same folder as my IPython script; if yours lives in a different folder, you'll need the full path. We can also do a quick length of the dataset with Python's len; let's print that. In a Jupyter notebook a bare expression on the last line of a cell prints automatically, but in most other setups you want the explicit print. Then we look at the actual data: since we're in pandas we can simply do dataset.head(), and again I'll add the print. If you put several of these in a row, dataset one head, dataset two head, only the last one displays, so I usually keep the print statement; with a single data frame, as in most projects, it doesn't really matter either way. When we run it, we see the 768 rows we expected, with pandas adding row labels on the left. Remember, head only shows the first five rows, zero through four, and a quick look confirms it matches what we saw before: pregnancies, glucose, blood pressure, all the way to age, and then the outcome at the end.
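A sketch of that load-and-look step, assuming the file is saved as diabetes.csv beside the notebook:

```python
# Assumed file name; use the full path if it lives elsewhere
dataset = pd.read_csv("diabetes.csv")

print(len(dataset))     # 768 rows
print(dataset.head())   # first five rows: pregnancies ... outcome
```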
In this next step we're going to create a list of columns that can't legitimately be zero: there's no such thing as zero skin thickness, zero blood pressure, or zero glucose, you'd be dead. A zero in those columns isn't a real measurement, it means the data is missing, so we're going to replace that information. First we create the list, with the values we talked about: glucose, blood pressure, skin thickness, and so on. Listing the columns you need to apply some transformation to is a very common pattern. There are pandas tools that handle missing values more directly, but we're going to do it explicitly: dataset[column] = dataset[column].replace(0, np.nan). np.nan stands for "not a number", meaning the value doesn't exist, so the first thing we do is replace each zero with NaN: if it's a zero, the person is hopefully not dead, we just didn't get the data. Next we compute the mean as an integer from that column using .mean with skipna=True, a pandas call that skips the NaNs, and then we replace all the NaNs in the column with that mean. Why? The mean is the average person, so if we don't know a value, one of the standard tricks is to substitute the most typical value for it; that way we can still use the rest of the row in our computations, and the missing value is effectively taken out of the equation. Run it; it doesn't print anything, we're still preparing our data. The first few rows had nothing to change, so you won't see a difference there, but we can certainly check a column: print dataset['Glucose'] and it lists all the glucose levels going down, and thankfully nothing looks like missing data, at least in the rows shown; Jupyter skips a bunch in the middle when there are too many lines. Let me remove that check and zero that cell back out.
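Here's what that imputation might look like, assuming the standard Pima diabetes column names (Glucose, BloodPressure, and so on):

```python
# Columns where a literal zero can only mean the measurement is missing
zero_not_accepted = ["Glucose", "BloodPressure", "SkinThickness",
                     "BMI", "Insulin"]

for column in zero_not_accepted:
    # Mark the zeros as missing, then fill them with the column mean
    dataset[column] = dataset[column].replace(0, np.nan)
    mean = int(dataset[column].mean(skipna=True))
    dataset[column] = dataset[column].replace(np.nan, mean)

print(dataset["Glucose"])   # spot-check: no zeros left in the column
```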
Before proceeding any further, we need to split the data set into our training and testing data, so we have something to train with and something to test on. Notice what we did with the pandas code here: dataset.iloc[:, 0:8] says, within the data set, take all rows, that's what the colon means, but only columns 0 through 7, the eight feature columns; the slice doesn't include its endpoint. Column 8, the ninth column, is the outcome we printed earlier, and that's not part of the training data, that's the answer, so for y we take just that last column, dataset.iloc[:, 8]. Then, remembering that we imported train_test_split from sklearn, we simply pass in our X and our y. We use random_state=0, which just seeds the shuffle so it's reproducible, and test_size=0.2, which means we take 20% of the data and set it aside so we can test later. Again, running it isn't very exciting; so much of this is prepping the data, but once it's prepped, the actual modeling code is quick and easy. We're almost there; we just need to scale the data. If you remember, we're fitting the data with a StandardScaler, so that instead of one column running from 5 to 303 and the next from 1 to 6, every column is standardized onto a comparable scale. We only fit the scaler on the training set, but we make sure the test set is transformed the same way: we create sc_X, assign the StandardScaler to it, then X_train = sc_X.fit_transform(X_train), creating the scaler on the training data, and X_test = sc_X.transform(X_test); the test data isn't part of training the transformer, it just gets transformed. Run that, and look at what we've done across these three steps: we replaced the zeros in key columns that shouldn't be zero with the column means, so those rows fit our data model; we split the data into training and test sets; and we scaled the feature data. Note that we never transform y_train or y_test; the labels never need scaling, only the data going in.
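Those three prep steps, sketched in code:

```python
# Columns 0..7 are the eight features; column 8 (Outcome) is the answer
X = dataset.iloc[:, 0:8]
y = dataset.iloc[:, 8]

# Hold back 20% for testing; random_state=0 makes the shuffle repeatable
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0, test_size=0.2)

# Fit the scaler on the training features only, then apply it to both sets
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test)
```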
square root let’s do y train there we go it’s actually supposed to be XT train before we do this let’s go ahead and do import math and do math square root length of Y test and when I run that we get 12.49 I want to see show you where this number comes from we’re about to use 12 is an even number so if you know if you’re ever voting on things remember the neighbors all vote don’t want to have even number of neighbors voting so we want to do something odd and let’s just take one away we’ll make it 11 let me delete this out of here that’s one of the reasons I love Jupiter notebook because you can flip around and do all kinds of things on the fly so we’ll go ahead and put in our classifier we’re creating our classifier now and it’s going to be the K neighbors classifier n neighbors equal 11 remember we did 12 minus 1 for 11 so we have an odd number of neighbors P equal 2 because we’re looking for is it are they diabetic or not and we’re using the ukian metric there are other means of measuring the distance you could do like square square means value there’s all kinds of measure this but the ukian is the most common one and it works quite well it’s important to evaluate the model let’s use the confusion Matrix to do that and we’re going to use the confusion Matrix wonderful tool and then we’ll jump into the F1 score and finally accuracy score which is probably the most commonly used quoted number when you go into a meeting or something like that so let’s go ahead and paste that in there and we’ll set the cm equal to confusion Matrix y test y predict so those are the two values we’re going to put in there and let me go aead and run that and print it out and the way you interpret this is you have the Y predicted which would be your title up here we could do uh let’s just do p predicted across the top and actual going down actual it’s always hard to to write in here actual that means that this column here down the middle that’s the important column and it means that our prediction said 94 and prediction and the actual agreed on 94 and 32 this number here the 13 and the 15 those are what was wrong so you could have like three different if you’re looking at this across three different variables instead of just two you’d end up with the third row down here and the column going down the middle so in the first case we have the the and I believe the zero is a 94 people who don’t have diabetes the prediction said that 13 of those people did have diabetes and were at high risk and the 32 that had diabetes it had correct but our prediction said another 15 out of that 15 it classified as incorrect so you can see where that classification comes in and how that works on the confusion Matrix then we’re going to go ahead and print the F1 score let me just run that and you see we get a 69 in our F1 score the F1 takes into account both sides of the balance of false positives where if we go ahead and just do the accuracy account and that’s what most people think of is it looks at just how many we got right out of how many we got wrong so a lot of people when you’re data scientist and you’re talking to other data scientists they’re going to ask you what the F1 score the F score is if you’re talking to the general public or the U decision makers in the business they’re going to ask what the accuracy is and the accuracy is always better than the the F1 score but the F1 score is more telling it lets us know that there’s more false positives than we would like on here but 82% not too bad for a quick flash look at 
people’s different statistics and running an sklearn and running the knnn the K nearest neighbor on it so we have created a model using KNN which can predict whether a person will have diabetes or not or at the very least whether they should go get a checkup and have their glucose checked regularly or not the print accur score we got the 0818 was pretty close to what we got and we can pretty much round that off and just say we have an accuracy of 80% tells it is a pretty fair fit in the model so what is deep learning deep learning is a subset of machine learning which itself is a branch of artificial intelligence unlike traditional machine learning models which require manual feature extraction deep learning models automatically discovers representation from raw data so this is made possible through neural networks particularly deep neural networks which consist of multiple layers of interconnected nodes so these neural network are inspired by the structure and the function of human brain each layer in the network transform the input data into more abstract and composite representation for instance in image recognition the initial layer might detect simple features like edges and textures while the deeper layer recognizes more complex structure like shapes and objects so one of the key advantage of deep learning is its ability to handle large amount of unstructured data such as images audios and text making it extremely powerful for various application so stay tuned as we delve deeper into how these neural networks are trained the types of deep learning models and some exciting application that are shaping our future types of deep learning deep learning AI can be applied supervised unsupervised and reinforcement machine learning using various methods for each the first one supervised machine learning in supervised learning the neural network learns to make prediction or classify that data using label data sets both input features and Target variables are provided and the network learns by minimizing the error between its prediction and the actual targets a process called back propagation CNN and RNN are the common deep learning algorithms used for tasks like image classification sentiment analysis and language translation the second one unsupervised machine learning in unsupervised machine learning the neural network discovers Ms or cluster in unlabeled data sets without Target variables it identifies hidden pattern or relationship within the data algorithms like Auto encoders and generative models are used for tasks such as clustering dimensionality reduction and anomaly detection the third one reinforcement machine learning in this an agent learns to make decision in an environment to maximize a reward signal the agent takes action observes the records and learns policies to maximize cumulative rewards over time deep reinforement learning algorithms like deep Q networks and deep deterministic poly gradient are used for tasks such as Robotics and game playay moving forward let’s see what are the artificial neural networks artificial neural networks Ann inspired by the structure and the function of human neurons consist of interconnected layers of artificial neurals or units the input layer receives data from the external resources and it passes to one or more hidden layers each neuron in these layers computes a weighted sum of inputs and transfers the result to the next layer during training the weight of these connection are adjusted to optimize the Network’s performance a fully 
A fully connected artificial neural network includes an input layer, one or more hidden layers, and an output layer; each neuron in a hidden layer receives input from the previous layer and sends its output to the next, and the process continues until the final output layer produces the network's response. Moving forward, let's see the types of neural networks. Deep learning models can automatically learn features from data, making them ideal for tasks like image recognition, speech recognition, and natural language processing. The most common architectures in deep learning are: first, feedforward neural networks (FNNs), the simplest type, where information flows linearly from input to output; they're widely used for image classification, speech recognition, and natural language processing (NLP). Second, convolutional neural networks (CNNs), designed specifically for image and video recognition: CNNs automatically learn features from images, making them ideal for image classification, object detection, and image segmentation. Third, recurrent neural networks (RNNs), specialized for processing sequential data such as time series and natural language: they maintain an internal state that captures information from previous inputs, making them suitable for speech recognition, NLP, and language translation. Now let's see some deep learning applications. First, autonomous vehicles: deep learning is changing the development of self-driving car algorithms; CNNs process data from sensors and cameras to detect objects, recognize traffic signs, and make driving decisions in real time, enhancing safety and efficiency on the road. Second, healthcare diagnostics: deep learning models are being used to analyze medical images such as X-rays, MRIs, and CT scans with high accuracy, helping in the early detection and diagnosis of diseases like cancer, improving treatment outcomes and saving lives. Third, NLP: recent advancements powered by deep learning models like Transformers and ChatGPT have led to more sophisticated, human-like text generation, translation, and sentiment analysis; applications include virtual assistants, chatbots, and automated customer service. Fourth, deepfake technology: deep learning techniques are used to create highly realistic synthetic media known as deepfakes; while this technology has entertainment and creative applications, it also raises ethical concerns around misinformation and digital manipulation. Fifth, predictive maintenance: in industries like manufacturing and aviation, deep learning models predict equipment failures before they occur by analyzing sensor data; this proactive approach reduces downtime, lowers maintenance costs, and improves operational efficiency. Now let's weigh some advantages and disadvantages of deep learning. A first disadvantage is high computational requirements: deep learning needs significant data and computing resources for training. Against that, a first advantage is high accuracy: it achieves state-of-the-art performance in tasks like image recognition and natural language processing. A second disadvantage is that it often requires large labeled data sets for training, which can be costly and time consuming to assemble, while a second advantage is automated feature engineering: it discovers and learns relevant features from data without manual intervention. A third disadvantage is overfitting: deep learning can overfit the training data, leading to poor performance on new, unseen data.
A third advantage is scalability: deep learning can handle large, complex data sets and learn from massive amounts of data. In conclusion, deep learning is a transformative leap in AI; by mimicking human neural networks it has changed healthcare, finance, autonomous vehicles, and NLP.

Today we'll take you through the exciting road map of becoming an AI engineer. As artificial intelligence continues to revolutionize various industries, AI engineers stand at the forefront of this technological wave. These professionals are essential in crafting intelligent systems that address complex business challenges. AI projects often stumble due to poor planning, subpar architecture, or scalability issues, and AI engineers play a crucial role in overcoming these hurdles by merging cutting-edge AI technologies with strategic insights. In this video we'll guide you through the essentials of becoming an AI engineer. Let's start with the basics: what does an AI engineer do? An AI engineer builds AI models using machine learning algorithms and deep learning neural networks. These models are pivotal in generating business insights that influence organizational decision making; from developing applications that leverage sentiment analysis for contextual advertising to creating systems for visual recognition and language translation, the scope of an AI engineer's work is vast and impactful. To succeed as an AI engineer you need a blend of technical prowess and soft skills. Now let's break down the eight-month plan. Month one: computer science fundamentals and beginner Python. Before we delve into AI, it's crucial to establish a strong foundation in computer science. This month, focus on the following topics. Data representation: understanding bits and bytes, how text and numbers are stored, and the binary number system is foundational for everything in computing; this knowledge helps in comprehending how computers interpret and process data. Next comes computer networks: learn the basics, including IP addresses and internet routing protocols; it's essential to understand how data travels across networks using UDP, TCP, and HTTP, which form the backbone of the internet and the World Wide Web. Next, programming basics: begin with variables, strings, numbers, conditionals, loops, and algorithm fundamentals; these will let you write and understand simple programs. Simultaneously you'll start with Python, the preferred language for AI: learn about variables, numbers, strings, lists, dictionaries, sets, tuples, and control structures like if conditionals and for loops; then move on to functions and modules, understanding how to create functions, including lambda functions, and how to add functionality to your projects with pip install. Next comes file handling and exceptions: practice reading from and writing to files, and handle exceptions to make your programs more robust. Finally, grasp the basics of classes and objects, which are crucial for writing organized and efficient code. This comprehensive overview sets the stage for the more complex programming tasks you'll encounter in the following months. In month two, you'll move on to data structures, algorithms, and advanced Python.
Building on the foundations from month one, we now delve into data structures and algorithms. Familiarize yourself with Big-O notation to understand the efficiency of different algorithms and data structures, and learn about arrays, linked lists, hash tables, stacks, queues, trees, and graphs; mastering these structures will let you store and manipulate data effectively. Next come algorithms: explore binary search, bubble sort, quick sort, merge sort, and recursion, all essential for optimizing your code. In parallel, advance your Python skills: dive into inheritance, generators, iterators, list comprehensions, decorators, multithreading, and multiprocessing, topics that will enable you to write more efficient and scalable code. This month's learning prepares you to handle complex data operations and improve your coding efficiency. In month three, you'll move on to version control, SQL, and data manipulation; the focus shifts to collaboration and data management. Number one, version control: understand the importance of version control systems, especially Git and GitHub. Learn basic commands such as add, commit, and push; learn how to handle branches and revert changes; and understand concepts like HEAD, diff, and merge. These skills are invaluable for tracking changes and collaborating with other developers. Next, pull requests: master the art of creating and managing pull requests to contribute to collaborative projects. Then we dive into SQL for managing databases: start with SQL basics, learning about relational databases and how to perform basic queries, then move on to advanced queries, understanding techniques such as CTEs, subqueries, and window functions. Then come joins and database management: study the different join types, left, right, inner, and full, and learn how to create databases, manage indexes, and write stored procedures. Additionally, use NumPy and pandas for data manipulation and learn basic data visualization techniques; this comprehensive skill set will be crucial as you move into more advanced data science topics. In month four comes math and statistics for AI. Mathematics and statistics are the backbone of AI, and this month is dedicated to these critical subjects. First, learn about descriptive versus inferential statistics, continuous versus discrete data, nominal versus ordinal data, measures of central tendency like mean, median, and mode, and measures of dispersion like variance and standard deviation. After that, understand the basics of probability and delve into the normal distribution, correlation, and covariance, then move on to advanced concepts: the central limit theorem, hypothesis testing, p-values, confidence intervals, and so on. In parallel, study linear algebra and calculus: in linear algebra, learn about vectors, matrices, eigenvalues, and eigenvectors; in calculus, cover the basics of integral and differential calculus. This mathematical foundation is essential for developing and understanding AI models, setting you up for success as you transition into machine learning. In month five comes exploratory data analysis (EDA) and machine learning. With a solid foundation in math and statistics, you are now ready to delve into machine learning. Number one, preprocessing: learn how to handle NA values, treat outliers, perform data normalization, and conduct feature engineering.
You should also understand encoding techniques such as one-hot and label encoding. You'll explore supervised and unsupervised learning with a focus on regression and classification, learning about linear models like linear and logistic regression and nonlinear models like decision trees, random forests, and so on. Then understand how to evaluate models using metrics such as mean squared error and mean absolute error for regression, and accuracy, precision, recall, and the like for classification. Then comes hyperparameter tuning: learn about techniques like GridSearchCV and RandomizedSearchCV for optimizing your models. After that, move on to unsupervised learning: study clustering techniques like K-means and hierarchical clustering, and delve into dimensionality reduction with PCA. This month's focus on EDA and model building prepares you for more complex AI applications; transitioning to the next phase, you'll begin deploying these models in real-world scenarios. In month six come MLOps and machine learning projects: this month covers the operational aspects of machine learning alongside practical work. In MLOps basics, learn about APIs, particularly FastAPI for Python server development; understand DevOps fundamentals, including CI/CD pipelines and containerization with Docker and Kubernetes; and gain familiarity with at least one cloud platform, like AWS or Azure. In month seven comes deep learning. First, neural networks: learn about neural networks, including forward and backward propagation, and build multilayer perceptrons. Then move on to advanced architectures: explore convolutional neural networks (CNNs) for image data and sequence models like RNNs and LSTMs. This deep learning knowledge will be crucial as you move into specialized areas of AI in the final month. In the eighth and final month comes NLP or computer vision: you have the option to specialize in either natural language processing or computer vision. On the NLP track, learn about regex, text representation methods like CountVectorizer, TF-IDF, bag of words, and Word2Vec embeddings, and text classification with Naive Bayes; familiarize yourself with the fundamentals of libraries like spaCy and NLTK, and work on an end-to-end NLP project. On the computer vision track, focus on basic image processing techniques like filtering, edge detection, and image scaling and rotation; utilize libraries like OpenCV, build on the CNN knowledge from the previous month, and practice data preprocessing and augmentation. By the end of this month you should have a solid foundation in your chosen specialization, ready to embark on your AI engineering career. In conclusion, adopting AI is more than just a trend; it's a strategic move that can transform your organization's approach to machine learning.

Hey everyone, welcome to Simplilearn. Today's video will compare and contrast artificial intelligence, deep learning, machine learning, and data science. Before moving on, let me ask you two interesting queries. Which among the following is not a branch of artificial intelligence: data analysis, machine learning, deep learning, or neural networks?
Hey everyone, welcome to Simplilearn. Today's video will compare and contrast artificial intelligence, deep learning, machine learning, and data science. But before we get started, consider subscribing to Simplilearn's YouTube channel and hit the bell icon; that way you'll be the first to get notified when we post similar content. Before moving on, let me ask you two interesting queries. The first: which among the following is not a branch of artificial intelligence: data analysis, machine learning, deep learning, or neural networks? And the second query: what is the main difference between machine learning and deep learning? Please leave your answers in the comments section below and stay tuned to get the answers.

First we will unwrap deep learning. Deep learning was first introduced in the 1940s; it did not develop suddenly but slowly and steadily over seven decades, with many theses and discoveries made on deep learning from the 1940s to the 2000s. Thanks to companies like Facebook and Google, the term deep learning has gained popularity, which may give the perception that it is a relatively new concept. Deep learning can be considered a type of machine learning and artificial intelligence (AI) that imitates how humans gain certain types of knowledge. Deep learning includes statistics and predictive modeling, and it makes processes quicker and simpler, which is advantageous to data scientists gathering, analyzing, and interpreting massive amounts of data. Having discussed the fundamentals, let's move on to the different types of deep learning. Neural networks are the main component of deep learning, and they come in three main types: artificial neural networks (ANNs), convolutional neural networks (CNNs), and recurrent neural networks (RNNs). Artificial neural networks are biologically inspired by the animal brain. Convolutional neural networks surpass other neural networks when given inputs such as images, voice, or audio. Recurrent neural networks use sequential data, or series of data. Convolutional and recurrent neural networks are used in natural language processing, speech recognition, image recognition, and much more.

Machine learning: the evolution of ML started with the mathematical modeling of neural networks, which served as the basis for the invention of machine learning. In 1943, neuroscientist Warren McCulloch and logician Walter Pitts attempted to quantitatively map out how humans make decisions and carry out thinking processes; therefore, the term machine learning is not new. Machine learning is a branch of artificial intelligence and computer science that uses data and algorithms to imitate how humans learn, gradually increasing the system's accuracy. There are three types of machine learning. The first is supervised learning: machines are trained using labeled data and predict outputs based on that data. Next comes unsupervised learning: models are not supervised using a labeled training dataset; it is comparable to the learning process that occurs in the human brain when learning something new. The third type is reinforcement learning: here the agent learns from feedback, learning how to behave in a given environment based on its actions and their results; this can be observed in robotics.
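Since reinforcement learning is only described at a high level here, a toy example may help. The following Python sketch (an invented one-dimensional gridworld, not from the video) shows an agent learning from reward feedback with tabular Q-learning, one of the algorithms named later in this compilation:

```python
import numpy as np

# Toy 1-D gridworld, purely illustrative: states 0..4, reward only at state 4.
n_states, n_actions = 5, 2          # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.5, 0.9             # learning rate and discount factor

rng = np.random.default_rng(0)
for episode in range(200):
    state = 0
    while state != 4:               # episode ends at the goal state
        # Random exploration; Q-learning is off-policy, so it still
        # learns the value of the greedy policy from this feedback.
        action = int(rng.integers(n_actions))
        next_state = max(0, state - 1) if action == 0 else min(4, state + 1)
        reward = 1.0 if next_state == 4 else 0.0
        # Update the action value from the observed reward
        Q[state, action] += alpha * (
            reward + gamma * Q[next_state].max() - Q[state, action]
        )
        state = next_state

print(Q.round(2))  # "move right" ends up more valuable in every state
```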
Now coming to the evolution of AI: the potential of artificial intelligence wasn't explored until the 1950s, although the idea had been known for centuries. The term artificial intelligence has been around for decades, but it wasn't until British polymath Alan Turing posed the question of why machines couldn't use knowledge like humans do to solve problems and make decisions that the field took shape. We can define artificial intelligence as a technique for getting a computer or a computer-controlled robot to work and act like humans. Now let's have a glance at the types of artificial intelligence. Weak AI performs only specific tasks, like Apple's Siri, Google Assistant, and Amazon's Alexa; you might have used all of these technologies, but the types I am mentioning next are still experimental. General AI, which can also be called artificial general intelligence (AGI), is equivalent to human intelligence; hence an AGI system is capable of carrying out any task that a human can. Strong AI aspires to build machines that are indistinguishable from the human mind. Both general and strong AI are hypothetical right now, and rigorous research is going on in this area. There are many branches of artificial intelligence, including machine learning, deep learning, natural language processing, robotics, expert systems, and fuzzy logic. Therefore, the correct answer for which is not a branch of artificial intelligence is option A, data analysis.

Now that we have covered deep learning, machine learning, and artificial intelligence, the final topic is data science. Concepts like deep learning, machine learning, and artificial intelligence can be considered subsets of data science. Let us cover the evolution of data science. The phrase data science was coined in the early 1960s to characterize a new profession that would enable the comprehension and analysis of the massive volumes of data being gathered at the time. Since its beginnings, data science has expanded to incorporate ideas and methods from other fields, including artificial intelligence, machine learning, and deep learning. Data science can be defined as the domain of study that handles vast volumes of data using modern tools and techniques to find unseen patterns, derive meaningful information, and make business decisions. Therefore, data science comprises machine learning, artificial intelligence, and deep learning.

Hello everyone, I am M, and welcome back to Simplilearn's YouTube channel. These days we often ask Siri, "Hey Siri, how far is the nearest fuel station?" Whenever we ask Siri something, the powerful speech recognition system gets to work and converts the audio into its textual form; this is then sent to the Apple servers for further processing, where machine learning algorithms are run to understand the user's intent, and finally Siri tells you the answer. This happens because of machine learning algorithms. Think about it: not too long ago, most tasks were done by people; whether it was building things, performing surgeries, or even playing games like chess, humans were in control. But now things are changing fast: almost all manual tasks are becoming automated, meaning machines and computers are taking over those jobs. This shift is redefining what we consider manual work. Machine learning, a type of artificial intelligence, is at the heart of this transformation. There are many different machine learning algorithms out there, each designed to help computers learn and get better at tasks; from playing chess like grandmasters to performing delicate surgeries with amazing precision, these algorithms are making technology smarter and more personal every day. Now that we have covered a brief introduction to ML, I want you to quickly check out the quiz attached below in the description section; take a moment to answer, and let me know your thoughts in the comments section as well. In today's video, we are going to cover the top ten machine learning algorithms that every aspiring machine learning engineer should know. Whether you are building models to predict the future, analyzing data, or creating smart apps, mastering these algorithms will help you make the most of machine learning. So let's get started with: what is an algorithm? In computer programming, an algorithm is a set of well-defined instructions to solve a particular problem.
An algorithm takes a bunch of inputs and delivers the desired result. Most of us use Snapchat to apply filters to our faces while making videos or capturing photographs, but do you know how Snapchat recognizes your face and puts filters on it? Even if there are multiple faces, it applies a filter to every face accurately. This became possible with the help of face recognition, which uses machine learning algorithms to detect faces and apply the required filters to them (see the short OpenCV sketch at the end of this overview). That is the basic idea of how an algorithm works, so let's move ahead and see how algorithms work in machine learning.

So how do algorithms work? Everyone knows an algorithm is a step-by-step process for approaching a particular problem, and there are numerous examples of algorithms, from sorting sets of numbers, to finding routes through maps, to showing data on a screen. Let's understand this with an example. Every algorithm is built on inputs and outputs, and Google's search algorithm is no different: the input is the search field, and the output is the page of results that appears when you enter a particular phrase or keyword, also known as the SERP, or search engine results page. Google has an algorithm so it can sort results from various websites and provide the user with the best result. When you start typing, the search box will attempt to guess what you are looking for; in order to better understand what the user wants, the algorithm tries to gather as many suggestions as possible. The results that best match the query are then ranked: Google chooses which websites will rank, and in what position, using more than 200 ranking variables. Again, there is a quiz attached below in the description section; take a moment to answer it and let me know your thoughts in the comments.

Moving forward, let's see the types of machine learning. Machine learning is classified into supervised learning, unsupervised learning, and reinforcement learning. There are two sorts of problems in supervised learning: classification and regression. Machine learning algorithms that fall under classification include decision tree algorithms, the KNN algorithm, logistic regression, the Naive Bayes algorithm, and the support vector machine (SVM) algorithm; for the regression type, the algorithms are linear regression, regression trees, nonlinear regression, and Bayesian linear regression. Talking about unsupervised learning, there are two sorts of problems: clustering and association. Algorithms that fall under clustering include k-means clustering and principal component analysis, while algorithms that fall under association include the Apriori algorithm and FP-Growth. In reinforcement learning there are two types, positive reinforcement and negative reinforcement, and reinforcement learning algorithms are mainly used in AI and gaming applications; the main algorithms used are Q-learning, State-Action-Reward-State-Action (SARSA), deep Q-networks (DQN), and the Markov decision process.
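To make the Snapchat face-filter idea above concrete, here is a minimal, illustrative sketch using OpenCV's bundled Haar-cascade face detector; the image path is a placeholder, and real filter apps use far more sophisticated models than this classic detector:

```python
import cv2

# Load OpenCV's bundled pre-trained frontal-face Haar cascade
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
face_detector = cv2.CascadeClassifier(cascade_path)

# "photo.jpg" is a placeholder path for this sketch
image = cv2.imread("photo.jpg")
if image is None:
    raise SystemExit("replace photo.jpg with a real image path")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Detect every face in the frame, then "apply a filter" (here, just a box)
faces = face_detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imwrite("photo_with_filter.jpg", image)
print(f"detected {len(faces)} face(s)")
```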
After discussing what an algorithm is and its types, let's now look at some popular machine learning algorithms. The first is linear regression; the second, logistic regression; the third, decision trees; the fourth, SVM (support vector machine); the fifth, PCA (principal component analysis); the sixth, k-means clustering; the seventh, random forest; the eighth, autoencoders; the ninth, DBSCAN, known as density-based spatial clustering of applications with noise; and the last one, hierarchical clustering. Now let's see these algorithms one by one.

First we have linear regression: a statistical method used to model the relationship between a dependent variable, known as the target variable, and one or more independent variables, the predictors. It assumes a linear relationship between the inputs and the output. A real-life example is house price prediction: predicting house prices based on features like size, location, and number of rooms; for example, "on average, larger houses cost more" is the kind of linear trend identified by this algorithm. Some applications are real estate price prediction, sales forecasting, and stock price prediction.

The second is logistic regression: a classification algorithm used to predict binary outcomes, that is, yes or no, true or false. It uses a logistic function to model the probability of a particular class. A real-life example is email spam filtering: identifying spam emails based on certain features (keywords, sender, number of links, for instance); an email with "claim your free gift now" is classified as spam. Applications include email spam filtering, medical diagnosis, customer churn prediction, and many more.

The third is decision trees: a flowchart-like tree structure used to make decisions, where each node in the tree represents a decision based on a feature and each branch represents a possible outcome. A real-life example is a loan approval process: a bank using decision trees might ask, "is the applicant's credit score above 700?" and proceed with further questions to approve or deny the loan. Applications are loan approval, medical diagnosis, and marketing campaign analysis.

The fourth is random forest: an ensemble method that combines multiple decision trees to improve accuracy; each tree casts a vote on the outcome, and the majority vote determines the final decision. A real-life example is medical diagnosis: diagnosing disease based on patient data like age, cholesterol level, and blood pressure, where each decision tree in the forest makes a prediction and the majority vote decides the diagnosis. Applications are healthcare disease prediction, fraud detection, and customer segmentation.

Fifth we have the support vector machine (SVM): a classification algorithm that finds the optimal boundary to separate data into different classes, often used for binary classification. A real-life example is image recognition, specifically face detection: an SVM can detect faces in an image by classifying regions of the image as either face or non-face based on pixel values. Applications are facial recognition, speech recognition, and handwritten digit recognition.

Now let's move forward and see some unsupervised learning algorithms. Number one is k-means clustering: a clustering algorithm that groups data into a specified number, k, of clusters based on similarity; the goal is to minimize the distance between data points within each cluster. A real-life example is customer segmentation in marketing: grouping customers into segments like high spenders and frequent shoppers based on their purchasing behavior, to personalize marketing efforts. Applications are customer segmentation, market basket analysis, and social media grouping.
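Here is a small illustrative scikit-learn sketch of the customer-segmentation example just described; the customer data is synthetic, generated only for the demo:

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic customer data, invented for this sketch:
# columns = [annual spend in dollars, store visits per month]
rng = np.random.default_rng(42)
high_spenders   = rng.normal([9000, 2],  [800, 0.5], size=(50, 2))
frequent_buyers = rng.normal([3000, 12], [500, 2.0], size=(50, 2))
occasional      = rng.normal([1000, 1],  [300, 0.5], size=(50, 2))
X = np.vstack([high_spenders, frequent_buyers, occasional])

# Group customers into k = 3 segments by minimizing within-cluster distance
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("cluster centers (spend, visits):")
print(kmeans.cluster_centers_.round(1))
print("segment of a new customer:", kmeans.predict([[8500, 3]])[0])
```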
At number seven we have hierarchical clustering: a clustering algorithm that creates a tree-like structure, also known as a dendrogram, by grouping similar data points; it can be either agglomerative (bottom-up) or divisive (top-down). A real-life example is gene clustering in healthcare: clustering genes with similar expression patterns to study cancer cells, where the dendrograms help researchers identify genes that behave similarly in response to treatment. Applications are gene expression analysis, customer behavior analysis, and document clustering.

At number eight we have DBSCAN, whose full form is density-based spatial clustering of applications with noise: a density-based clustering algorithm that identifies clusters based on the density of data points. It can also handle noise (outliers) by labeling such points as noise. A real-life example is identifying crime hotspots: detecting areas with frequent criminal activity by clustering locations based on crime density, with outliers being excluded. Applications are crime hotspot detection, anomaly detection, and geospatial analysis.

At number nine we have principal component analysis (PCA): a dimensionality reduction technique that transforms data into a smaller set of uncorrelated variables, the principal components, chosen to capture the most variance in the data. A real-life example is image compression: compressing images by reducing the number of variables while retaining the key features that preserve most of the image's information, thus reducing storage space (a short sketch follows at the end of this list). Applications are data compression, dimensionality reduction, and data visualization.

The last one we have is autoencoders: a type of neural network used to learn efficient representations of data, typically for dimensionality reduction or anomaly detection; it encodes input data into a compressed representation and then reconstructs it back. A real-life example is fraud detection in financial transactions: by training the autoencoder on normal transaction data, an outlier transaction is flagged as potentially fraudulent. Applications are fraud detection, image denoising, and recommendation systems.

These algorithms are part of many real systems that we interact with daily, from predicting what products you might want to buy online to detecting fraud in your bank account, and they are used in various industries such as healthcare, finance, retail, and security.
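As promised in the PCA entry above, here is a minimal scikit-learn sketch of PCA-style compression, using the library's built-in 8x8 digit images as stand-in data:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

# 8x8 grayscale digit images flattened to 64 features each
X, _ = load_digits(return_X_y=True)

# Keep only the principal components explaining ~95% of the variance
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)               # the "compressed" representation
X_restored = pca.inverse_transform(X_reduced)  # approximate reconstruction

print("original features:", X.shape[1])        # 64
print("components kept:", pca.n_components_)   # far fewer than 64
print("variance explained:", pca.explained_variance_ratio_.sum().round(3))
```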
LLMs: if you have ever wondered how machines can now understand and generate human-like text, you are in the right place. From chatbots like ChatGPT to AI assistants that power search engines, LLMs are transforming how we interact with technology. One of the most exciting advancements in this space is Google Gemini (alongside OpenAI's ChatGPT), a cutting-edge large language model designed to push the boundaries of what AI can achieve. In this video we will explore what LLMs are, how they work, and why models like Gemini are critical for the future of AI. Google Gemini is part of a new wave of AI models that are smarter, faster, and more efficient; it is designed to understand context better, offer more accurate responses, and integrate deeply into services like Google Search and Google Assistant, providing more human-like interactions. We will break down the science behind LLMs, including their massive training datasets, the Transformer architecture, and how models like Gemini use deep learning innovations to change industries; plus, we will compare Google Gemini to other popular LLMs, such as OpenAI's models, showing how each of these technologies is used to power chatbots, virtual assistants, and other AI applications. By the end of this video, you will have a clear understanding of how large language models like Gemini work, their key features, and what they mean for the future of AI. Don't forget to like, subscribe, and hit the bell icon to never miss any update from Simplilearn.

So what are large language models? Large language models like GPT-4 (Generative Pre-trained Transformer 4) and Google Gemini are sophisticated AI systems designed to comprehend and generate human-like text. These models are built using deep learning techniques and are trained on vast datasets collected from the internet. They leverage self-attention mechanisms to analyze relationships between words, or tokens, allowing them to capture context and produce coherent, relevant responses. LLMs have significant applications, including powering virtual assistants, chatbots, content creation, language translation, and supporting research and decision making; their ability to generate fluent, contextually appropriate text has advanced natural language processing and improved human-computer interaction.

Now let's see what large language models are used for. Large language models are utilized in scenarios with limited or no domain-specific data available for training. These scenarios include both few-shot and zero-shot approaches, which rely on the model's strong inductive bias and its capability to derive meaningful representations from a small amount of data, or even no data at all.

Now let's see how large language models are trained. Large language models typically undergo pre-training on a broad, all-encompassing dataset that shares statistical similarities with the dataset specific to the target task. The objective of pre-training is to enable the model to acquire high-level features that can later be applied during the fine-tuning phase for a specific task. The training process of an LLM involves several steps. The first is text preprocessing: the textual data is transformed into a numerical representation that the LLM can effectively process; this conversion may involve techniques like tokenization, encoding, and creating input sequences. The second is random parameter initialization: the model's parameters are initialized randomly before the training process begins. The third is inputting the numerical data: the numerical representation of the text is fed into the model for processing; the model's architecture, typically based on Transformers, allows it to capture the contextual relationships between the words or tokens in the text. The fourth is loss function calculation: a loss function measures the discrepancy between the model's predictions and the actual next word or token in a sentence; the LLM aims to minimize this loss during training. The fifth is parameter optimization: the model's parameters are adjusted through optimization techniques, which involve calculating gradients and updating the parameters accordingly, gradually improving the model's performance. The last is iterative training: the training process is repeated over multiple iterations, or epochs, until the model's outputs achieve a satisfactory level of accuracy on the given task or dataset. By following this training process, large language models learn to capture linguistic patterns, understand context, and generate coherent responses, enabling them to excel at various language-related tasks.

The next topic is how large language models work. Large language models leverage deep neural networks to generate output based on patterns learned from the training data. Typically, a large language model adopts a Transformer architecture, which enables the model to identify relationships between words in a sentence irrespective of their position in the sequence. In contrast to RNNs, which rely on recurrence to capture token relationships, Transformer networks employ self-attention as their primary mechanism: self-attention calculates attention scores that determine the importance of each token with respect to the other tokens in the text sequence, facilitating the modeling of intricate relationships within the data.
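Here is a toy NumPy sketch of the scaled dot-product self-attention just described; the sequence length, dimensions, and random weights are invented purely to show the computation:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over a token sequence X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # token-vs-token relevance
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over each row
    return weights @ V, weights

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8                # 4 toy "tokens", 8-dim embeddings
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))

output, attn = self_attention(X, Wq, Wk, Wv)
print("attention weights (each row sums to 1):")
print(attn.round(2))
```

Note that the attention scores depend only on the token contents, not their positions, which is why real Transformers add positional encodings to the embeddings.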
Next, let's see the applications of large language models, which span various domains. Here are some notable ones. The first is natural language processing (NLP): large language models are used to improve natural language understanding tasks such as sentiment analysis, named entity recognition, text classification, and language modeling. The second is chatbots and virtual assistants: large language models power conversational agents, chatbots, and virtual assistants, providing more interactive and human-like user interactions. The third is machine translation: large language models have been used for automatic language translation, enabling text translation between different languages with improved accuracy. The fourth is sentiment analysis: LLMs can analyze and classify the sentiment or emotion expressed in a piece of text, which is valuable for market research, brand monitoring, and social media analysis. The fifth is content recommendation: these models can be employed to provide personalized content recommendations, enhancing user experience and engagement on platforms such as news websites or streaming services. These applications highlight the potential impact of large language models across domains, improving language understanding and automation.

This video is on Stable Diffusion, one of the most advanced AI tools for generating stunning, photorealistic images from just text. Whether you are describing a vibrant sunset, a futuristic city, or a surreal dreamscape, Stable Diffusion can turn your imagination into reality within seconds. The latest version, Stable Diffusion XL, brings even higher-quality results thanks to a larger network and improved techniques. Not only can you generate images, you can also enhance them with features like inpainting, where you edit parts of an image, or outpainting, which expands an image beyond its original borders. So how does it work? The AI starts by breaking an image down into noise, then cleverly reverses that process to recreate a clear and detailed picture. We will also show you how to write effective prompts to get the best results from Stable Diffusion, whether you're using the web-based version or running it on your own computer; and yes, you can even use it for commercial purposes. Stick around, because I will be giving you a live demo and showing you step by step how to create your own images with this powerful tool. So without any further ado, let's get started.
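The demo below uses the web UI, but for reference here is a short programmatic sketch (not shown in the video) using Hugging Face's diffusers library to run the same SDXL base model; it assumes a CUDA-capable GPU and will download several gigabytes of weights:

```python
import torch
from diffusers import DiffusionPipeline

# Load the SDXL base weights from Hugging Face (several GB, as noted below)
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
)
pipe.to("cuda")  # a CUDA-capable GPU is required, as in the local-install demo

# The same kind of prompt used in the demo
prompt = "an astronaut riding a horse, photorealistic"
image = pipe(prompt).images[0]
image.save("astronaut.png")
```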
Hello guys, welcome back to the demo part on Stable Diffusion. First I will open Stability AI, the artificial intelligence company that launched the Stable Diffusion text-to-image generator. They have multiple models: image models, video models, audio, 3D, and language models. Let's go to the image models: there are two series, the SD3 series and the SDXL series. There's SD3 Large, SD3 Large Turbo, and SD3 Medium; in the SDXL series there's Stable Diffusion XL, SDXL Turbo, and Japanese Stable Diffusion XL. There are two ways of using Stable Diffusion: the first is that you can install Stable Diffusion locally and use it, but there are some requirements your system should meet; in particular, there should be a GPU, an NVIDIA graphics card or another capable graphics card. Here you can also use the API, get the license, download the code, or read about Stable Diffusion XL.

I will show you how to download and install Stable Diffusion; for now I don't have a graphics card on my system, so I can't run it, but I will show you properly how to set it up. First, I will give you this Hugging Face link, where SDXL is their recently launched model; here you can read about all the configuration and installation details. First you have to install the latest version of Python, and second, after installing Python, you have to install Git (not GitHub Copilot, just Git with Git Bash); this you can install on Windows, macOS, or Linux. Then you go to "Files and versions" and download the SDXL base 1.0 checkpoint; I have already downloaded it, and the file is around 6.5 GB. After downloading it, search for "stable diffusion web UI"; I will give you this link as well. Download the zip file and unzip it; a folder will appear, and inside you'll find the webui-user batch file for Windows; run this batch file (if you are a Mac user, you can run the shell script instead). You just double-click it and everything will be installed; it will launch a page, and the installation can take around half an hour, because it downloads multiple files totaling roughly 4 to 5 GB. When I run it, you can see this page: I am running Stable Diffusion locally, version 1.7.0. Now, remember the first file we downloaded: go to Downloads, copy the downloaded checkpoint, then go into the web UI folder, into "models", then "Stable-diffusion", and paste your file there. By default it installs with the v1-5 pruned model, but we want to use the latest model, which is why we copy this in. After installing, just refresh, and you can see two models there; you can select either. I select the new one and write "astronaut riding horse", but it gives me an error when generating: "found no NVIDIA driver on your system". I don't have any drivers installed, since I don't have a graphics card, but if you have one it will run smoothly and give you all the outputs. But again, we can use Stable Diffusion online, on the web, as well.
Here, see, there is a Stable Diffusion 2.1 demo; I will give you this link as well. Let me write the same prompt again, "astronaut riding horse", and generate the image. It will take about 11 seconds, as it shows, so we have to wait; it's scanning, almost done processing; and see, an astronaut riding a horse. Now let's go to ChatGPT and ask for some funny prompts: "give me some funny text-to-image generator prompts". The first one, oh, this one is cool; let's copy it and paste it here: "a penguin dressed as a pirate searching for treasure on an ice floe with a parrot that only squawks". See, the pirate hat is there. Let's try something else: "a robot trying to blend in" and "a grumpy bear sitting in a therapist's office"; this one is cool, I guess. Let me run it again; it takes some 11 seconds here, but if you run it locally it will definitely work faster, I'm sure, because on other systems it runs smoothly. See: "a grumpy bear sitting in a therapist's office discussing its feelings". So this is how you can use Stable Diffusion, locally and on the web; and this is better than DALL-E, because DALL-E, again, is expensive.

It was November 30, 2022. Sam Altman, Greg Brockman, and Ilya Sutskever would never have thought that with the push of a button they would completely alter the lives of all human beings living on the Earth, and of future generations to come. On November 30, the OpenAI team launched ChatGPT; ChatGPT was born that day. Albeit a very small event in the history of the internet's evolution, it can no less be marked as one of the most significant events of the modern IT industry. ChatGPT, a text-based chatbot that replies to questions asked of it, is built on the GPT large language model. But what was so different? I mean, the Google search engine, YouTube, the Firefox browser: they have all been doing the same for years, so how is ChatGPT any different, and why is it such a big deal? Well, for starters, ChatGPT was not returning indexed websites that had been SEO-tuned and optimized to rank at the top; ChatGPT was able to comprehend the nature, tone, and intent of the query and generate text-based responses to the questions asked. It was like talking to a chatbot on the internet, minus the out-of-context responses. With the knowledge of 1.7 trillion parameters, it was no shock that a computing system as efficient and prompt as ChatGPT would have its own set of quirks, and so it did: it was bound by the parameters of the language model it was trained on, and it was limited to giving outdated results, since its last training data was from September. Still, ChatGPT made waves in the tech community and continues to do so; just have a look at the Google Trends data on ChatGPT. Every day, new content is being published on ChatGPT and hundreds of AI tools; the sheer interest that individuals and enterprises across the globe have shown in ChatGPT and AI tools is immense. AI, AI, AI, generative AI, generative AI, AI, AI, AI.

Now here comes the fun part. ChatGPT, or for that matter any large language model, runs on neural networks trained on millions, billions, and even trillions of data parameters. These chatbots generate responses to user queries based on the input given; while a model may generate similar responses for identical or similar queries, it can also produce different responses based on the specific context, phrasing, and quality of input provided by each user. Additionally, ChatGPT is designed to adapt its language and tone to match the style and preferences of each user.
So its responses may vary in wording and tone depending on the individual user's communication style and preferences; every user has their own unique style of writing and communication, and ChatGPT's responses vary based on the input given to it. This is where prompt engineers come into play. Prompt engineers are experts at prompt engineering; sounds like a circular definition, right? Well, let's break it down. First, let's understand what prompts are: prompts are any text-based input given to the model as a query. This includes the questions asked, the tone of the query, the context given for the query, and the format of output expected; here is a quick example for your understanding. Now that we have discussed what a prompt is, let us understand who a prompt engineer is and why it has become the job of the future. Broadly speaking, a prompt engineer is a professional who is capable of drafting queries, or prompts, in such a way that large language models like GPT, PaLM, LLaMA, BLOOM, etc. generate the response that is expected. These professionals are skilled at crafting accurate and contextual prompts, which allows the model to generate the desired results; here is a quick example for you. Prompt engineers are experts not only on the linguistic front: they also have extensive domain knowledge, are well versed in the functioning of neural networks and natural language processing, and bring knowledge of scripting languages and data analysis. Leading job platforms like Indeed and LinkedIn already list many prompt engineer positions; in the United States alone, job postings for this role run into the thousands, reflecting the growing demand. The salary of prompt engineers is also compelling, with a range that spans from $50,000 to over $150,000 per year, depending on experience and specialization. There are multiple technical concepts that a prompt engineer must be well versed in to be successful, such as multimodality, tokens, weights, parameters, and Transformers, to name a few. Whether it's healthcare, defense, IT services, or the ed-tech industry, the need for skilled prompt engineers is on the rise; there are already several thousand job openings in this field, and the demand will continue to grow. So if you want to hop on this amazing opportunity and become an expert prompt engineering professional, now is the time. Let us know in the comments what you think about prompt engineering, and if you want to know more about the skills needed to become a prompt engineer, make sure to like and share this video with your friends and family and tell them about this amazing new job opportunity.
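As a concrete illustration of the contrast between a plain prompt and an engineered one, here is a small sketch using the openai Python client; the model name and both prompts are placeholders invented for this example, and the same idea applies to any LLM API:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# A vague prompt versus an engineered prompt: same topic, but the second
# specifies role, audience, constraints, and output format explicitly.
vague_prompt = "Write about electric cars."
engineered_prompt = (
    "You are an automotive journalist writing for first-time buyers. "
    "In exactly 3 bullet points, each under 20 words, summarize the main "
    "trade-offs of owning an electric car in a cold climate."
)

for prompt in (vague_prompt, engineered_prompt):
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name for this sketch
        messages=[{"role": "user", "content": prompt}],
    )
    print(response.choices[0].message.content, "\n---")
```

Running both typically shows the engineered prompt producing a tighter, more predictable response, which is the whole point of the craft.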
Hello everyone, I am M, and welcome to today's video, where we will be talking about LLM benchmarks: tools used to test and measure how well large language models like GPT and Google Gemini perform. If you have ever wondered how AI models are evaluated, this video will explain it in simple terms. LLM benchmarks are used to check how good these models are at tasks like coding, answering questions, translating languages, or summarizing text. These tests use sample data and specific measurements to see how well the model performs; for example, the model might be given a few examples (few-shot learning) or none at all (zero-shot learning) to see how it handles new tasks. Now the question arises: why are these benchmarks important? They help developers understand where a model is strong and where it needs improvement, and they also make it easier to compare different models, helping people choose the best one for their needs. However, LLM benchmarks do have some limits: they don't always predict how well a model will work in real-world situations, and sometimes models can overfit, meaning they perform well on test data but struggle in practical use. We will also cover how LLM leaderboards rank different models based on their benchmark scores, giving us a clear picture of which models are performing best. So stay tuned as we dive into how LLM benchmarks work and why they are so important for advancing AI; without any further ado, let's get started.

So what are LLM benchmarks? LLM benchmarks are standardized tools used to evaluate the performance of large language models. They provide a structured way to test LLMs on specific tasks or questions, using sample data and predefined metrics to measure their capabilities. These benchmarks assess various skills, such as coding, common-sense reasoning, and NLP tasks like machine translation, question answering, and text summarization. The importance of LLM benchmarks lies in their role in advancing model development: they track the progress of an LLM, offering quantitative insights into where the model performs well and where improvement is needed. This feedback is crucial for guiding the fine-tuning process, allowing researchers and developers to enhance model performance; additionally, benchmarks offer an objective comparison between different LLMs, helping developers and organizations choose the best model for their needs.

So how do LLM benchmarks work? LLM benchmarks follow a clear and systematic process: they present a task for the LLM to complete, evaluate its performance using specific metrics, and assign a score based on how well the model performs. Here is a breakdown of how this process works. The first step is setup: LLM benchmarks come with pre-prepared sample data, including coding challenges, long documents, math problems, and real-world conversations; the tasks span areas like common-sense reasoning, problem solving, question answering, summary generation, and translation, all presented to the model at the start of testing. The second step is testing, where the model is tested in one of three ways. Few-shot: the LLM is provided with a few examples before being prompted to complete a task, demonstrating its ability to learn from limited data. Zero-shot: the model is asked to perform a task without any prior examples, testing its ability to understand new concepts and adapt to unfamiliar scenarios. Fine-tuned: the model is trained on a dataset similar to the one used in the benchmark, aiming to enhance its performance on the specific task involved. The third step is scoring: after the task is completed, the benchmark compares the model's output with the expected answer and generates a score, typically ranging from 0 to 100, reflecting how accurately the LLM performed.

Moving forward, let's see key metrics for benchmarking LLMs. LLM benchmarks use various metrics to assess the performance of large language models; here are some commonly used ones. The first is accuracy, or precision, which measures the percentage of correct predictions made by the model. The second is recall, also known as sensitivity, which measures the number of true positives, reflecting the correct positive predictions made by the model. The third is the F1 score, which combines precision and recall into a single metric, weighing them equally to account for false positives and false negatives; F1 scores range from zero to one, where one indicates perfect precision and recall. The fourth is exact match, which tracks the percentage of predictions that exactly match the correct answer; this is especially useful for tasks like translation and question answering. The fifth is perplexity, which gauges how well a model predicts the next word or token; a lower perplexity score indicates better task comprehension by the model. The sixth is BLEU (Bilingual Evaluation Understudy), which is used for evaluating machine translation by comparing n-grams (sequences of adjacent text elements) between the model's output and a human-produced translation. These quantitative metrics are often combined for a more thorough evaluation; in addition, human evaluation introduces qualitative factors like coherence, relevance, and semantic meaning, providing a nuanced assessment. However, human evaluation can be time-consuming and subjective, making a balance between quantitative and qualitative measures important for comprehensive evaluation.
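Here is a tiny pure-Python sketch of the first four metrics above, computed on invented benchmark labels just to show the formulas:

```python
# Toy benchmark scoring sketch: 1 = correct/positive, 0 = incorrect/negative.
# The labels below are invented purely to illustrate the formulas.
expected  = [1, 0, 1, 1, 0, 1, 0, 0]
predicted = [1, 0, 1, 0, 0, 1, 1, 0]

tp = sum(e == 1 and p == 1 for e, p in zip(expected, predicted))
fp = sum(e == 0 and p == 1 for e, p in zip(expected, predicted))
fn = sum(e == 1 and p == 0 for e, p in zip(expected, predicted))

precision = tp / (tp + fp)                          # 3 / 4 = 0.75
recall = tp / (tp + fn)                             # 3 / 4 = 0.75
f1 = 2 * precision * recall / (precision + recall)  # 0.75
exact_match = sum(e == p for e, p in zip(expected, predicted)) / len(expected)

print(f"precision={precision:.2f} recall={recall:.2f} "
      f"F1={f1:.2f} exact-match={exact_match:.2f}")
```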
Now, moving forward, let's see some limitations of LLM benchmarking. While LLM benchmarks are valuable for assessing model performance, they have several limitations that prevent them from fully predicting real-world effectiveness; here are a few. The first is bounded scoring: once a model achieves the highest possible score on a benchmark, that benchmark loses its utility and must be updated with more challenging tasks to remain a meaningful assessment tool. The second is broad datasets: LLM benchmarks often rely on sample data from diverse subjects and tasks, so this wide scope may not effectively evaluate a model's performance on edge cases, specialized fields, or specific use cases where more tailored data would be needed. The third is finite assessment: benchmarks only test a model's current skills, and as LLMs evolve and new capabilities emerge, new benchmarks must be created to measure these advancements. The fourth is overfitting: if an LLM is trained on the same data used for benchmarking, it can lead to overfitting, where the model performs well on the test data but struggles with real tasks; this results in scores that don't truly represent the model's broader capabilities.

So now, what are LLM leaderboards? LLM leaderboards publish rankings of LLMs based on a variety of benchmarks; they provide a way to keep track of the many LLMs out there and compare their performance, and they are especially helpful in deciding which model to use. Here are some examples. In this one you can see OpenAI's model leading, GPT-4o second, and Llama (the 405B-parameter model) third, with Claude 3.5 Sonnet also placed; this is "best in multitask reasoning". What about "best in coding"? Here OpenAI o1 is leading, the second is Claude 3.5 Sonnet, and in third position there is GPT-4o. Next come the fastest and most affordable models: the fastest are Llama 8B, then Llama 70B, and third Gemini 1.5 Flash; in lowest latency, Llama is leading again; and in cheapest models, Llama 8B is leading again, with Gemini 1.5 Flash second and GPT-4o mini third. Moving forward, let's see standard benchmarks compared between Claude 3 Opus and GPT-4: in general they are about equal; in reasoning Claude 3 Opus is leading, in coding GPT-4 is leading, in math GPT-4o is leading, in tool use Claude 3 Opus is leading, and in multilingual tasks Claude 3 Opus is leading.
Today we will discuss the booming topic of this era: multimodal AI. Let's understand it with an example. Imagine you are showing a friend your vacation photos: you might describe the sights you saw, the sounds you heard, and even your emotions. This is how humans naturally understand the world, by combining information from different sources; multimodal AI aims to do the same thing. Let's break down "multimodal AI" first: "multimodal" refers to different ways of communicating information, like text, speech, images, and video, while "AI" stands for artificial intelligence, systems that can learn and make decisions. So multimodal AI is a type of AI that can process and understand information from multiple sources, just like you do when you look at your vacation photos. Now that we have understood what multimodal AI is, let's go a bit further: it is obvious that multimodal AI is not the only AI out there, so what is the big deal about multimodal AI that everyone is talking about? That is what we will discuss in this segment.

Let's understand the difference between multimodal AI and generative AI. While both are exciting advancements in AI, they differ in their approach to data and functionality. On focus: generative AI creates new data similar to the data it's trained on, while multimodal AI's focus is to understand and process information from multiple sources, that is, text, speech, image, and video data. On data types: generative AI primarily works with a single data type, like text (writing poems) or images (generating realistic portraits), whereas multimodal AI works with diverse data types, enabling a more comprehensive understanding of the world. On examples: generative AI covers chatbots, text generation models, and image editing tools, whereas multimodal AI examples cover virtual assistants, medical diagnosis systems, and autonomous vehicles. On strengths: generative AI can produce creative and innovative content, automate repetitive tasks, and personalize your experience, whereas multimodal AI's strengths are that it provides a more human-like understanding of the world and improves accuracy. In essence, generative AI excels at creating new data, while multimodal AI excels at understanding and utilizing existing data from diverse sources; they can be complementary, with generative models being used to create new data for multimodal AI systems to learn from, improving their understanding of the world.

Next, let's understand the benefits of multimodal AI. Multimodal AI offers developers and users AI with more advanced reasoning, problem-solving, and generation capabilities, and these advancements offer endless possibilities for how next-generation applications can change the way we work and live. For developers looking to start building, the Vertex AI Gemini API offers features such as enterprise security, data residency, performance, and technical support; existing Google Cloud customers can start prompting with Gemini in Vertex AI right now.

Next, let's see multimodal AI's big challenges. Multimodal AI is powerful but faces hurdles. The first is data overload: managing and storing massive, diverse data is expensive and complex. The second is the meaning mystery: teaching AI to understand subtle differences in meaning, like sarcasm, is tricky. The third is data alignment: ensuring data points from different sources stay in tune with each other is challenging. The fourth is data scarcity: limited and potentially biased datasets hinder effective training. The fifth is the missing-data blues: what happens when data is missing, like distorted audio? And the last one is the black-box blues: understanding how the AI makes decisions can be difficult. These challenges must be addressed to unlock the full potential of multimodal AI.
Next, let's see the future of multimodal AI and why it is important. Multimodal AI and multimodal models represent a leap forward in how developers build and expand the functionality of AI in the next generation of applications. For example, Gemini can understand, explain, and generate high-quality code in the world's most popular programming languages, like Python, Java, C++, and Go, freeing developers to work on building more fully featured applications. Multimodal AI's potential also brings the world closer to AI that is less like smart software and more like an expert helper or assistant.

OpenAI is one of the main leaders in the field of generative AI, with ChatGPT being one of the most popular and widely used examples; ChatGPT is powered by OpenAI's GPT family of large language models (LLMs). In August and September 2024, there were rumors about a new model from OpenAI, code-named Strawberry; at first it was unclear whether it was the next version of GPT-4o or something different. On September 12, OpenAI officially introduced the o1 model. Hi, I am Mik. In this video we will discuss OpenAI's o1 model and its types; after that, we will run some basic prompts using o1-preview and o1-mini, and at the end we will see a comparison between the o1 models and GPT-4o. So without any further ado, let's get started.

What is OpenAI o1? The OpenAI o1 family is a group of LLMs that have been improved to handle more complex reasoning. These models are designed to offer a different experience from GPT-4o, focusing on thinking through problems more thoroughly before responding; unlike older models, o1 is built to solve challenging problems that require multiple steps and deep reasoning. The o1 models use a technique called chain-of-thought prompting, which allows the model to think through problems step by step. OpenAI o1 consists of two models, o1-preview and o1-mini: the o1-preview model is meant for more complex tasks, while o1-mini is a smaller, more affordable version.

So what can OpenAI o1 do? o1 can handle many tasks just like other GPT models from OpenAI, such as answering questions, summarizing content, and creating new material; however, o1 is especially good at more complex tasks. The first is enhanced reasoning: the o1 models are designed for advanced problem solving, particularly in subjects like science, technology, engineering, and math. The second is brainstorming and ideation: with its improved reasoning, o1 is great at coming up with creative ideas and solutions in various fields. The third is scientific research: o1 is suited for tasks like annotating cell sequencing data or solving the complex math needed in areas like quantum optics. The fourth is coding: the o1 models can write and fix code, performing well on coding tests like HumanEval and Codeforces, and helping developers build multi-step workflows. The fifth is mathematics: o1 is much better at math than previous models, scoring 83% on a qualifying exam for the International Mathematics Olympiad, compared to GPT-4o's 13%; it also did well in other math competitions like AIME, making it useful for generating complex formulas for physics. And the last is self-checking: o1 can check the accuracy of its own responses, helping to improve the reliability of its answers. You can use the OpenAI o1 models in several ways: ChatGPT Plus and Team users have access to the o1-preview and o1-mini models and can manually choose them in the model picker.
Although free users don't have access to the o1 models yet, OpenAI is planning to offer o1-mini to them in the future. Developers can also use these models through OpenAI's API, and they are available on third-party platforms like Microsoft Azure AI Studio and GitHub Models.

So, guys, I have ChatGPT open here with the GPT-4o model and o1-preview, as you can see; I have the Plus plan, the paid version of ChatGPT, so I can access o1-preview and o1-mini. We will go with the o1-preview model, put the same prompts into both it and GPT-4o, and see what differences come up; we will do some math questions, some coding, some advanced reasoning, and quantum physics as well. Let's start: I have some prompts already written. The first one is number theory, so I will copy it, paste it into both models, and run it in 4o and o1-preview. Here you can see o1 is thinking; this is what I was saying about chain of thought. These are the steps in its chain of thought: first it is breaking down the primes, then identifying the GCD. Now see the difference between the outputs. 4o's output is simply: 561 is not a prime number, and the GCD (greatest common divisor) of 48 and 180 is 12. o1-preview, on the other hand, gives the output step by step: first, determine whether 561 is a prime number; the number 561 is not a prime number, it is a composite number, because it has these factors; then the greatest common divisor, where it finds 12; and then the answer: no, 561 is not prime, because of this, and the greatest common divisor of 48 and 180 is 12. Just see the difference between the two models; this is why the o1 models are so strong for math, coding, advanced reasoning, quantum physics, these kinds of things.

Let's go to our second test. Here, if you look, you can see the attach-file option in ChatGPT-4o, where you can upload from your computer, but in o1 there is no attach-file option; this is one drawback, one small limitation. Let me open the question I have; I will copy it and run it in both. See, 4o starts giving the answer immediately, while o1 is still thinking, solving the equation, then analyzing the relationships. o1 takes time, but it gives you a more accurate, more step-by-step answer: here you can see "solve for x" worked through in steps, in a well-structured way; o1-preview presents it in a good structured form, as does o1-mini. 4o just wrote answers for questions one and two directly: for the first, simply "the root, we know, is this, so x = 3", and for the second, "expanding the left-hand side, this, this, this"; o1, by contrast, laid out question one, solve for x, with step one, step two, step three, then the answer x = 3, and for the second question wrote "step one: square both sides of the given equation, start by squaring both sides", and so on. 4o's answer is written, but not laid out as well; this is why o1 is better for math.
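Incidentally, the two math answers from the demo are easy to verify in plain Python:

```python
import math

# Verify the two results from the demo prompt
print(math.gcd(48, 180))  # 12, the greatest common divisor found by both models

def is_prime(n: int) -> bool:
    """Trial division up to the integer square root of n."""
    if n < 2:
        return False
    return all(n % d for d in range(2, math.isqrt(n) + 1))

print(is_prime(561))  # False: 561 = 3 * 11 * 17, so it is composite
```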
Now let's check the coding part. I have one question; let me see what output each model gives. I will copy it into both and run them. See, 4o starts giving an answer right away, while o1 is still adjusting the parameters and finishing the code generation, because o1 will think first, then analyze, and only after that give you the answer. Here the 4o code is done, while o1 is still working through step one: set up the development environment, pip install NumPy, then this, then this. I will ask both, "give me the code in one block," so I can just copy and paste; then I will open an online compiler, the W3Schools compiler, copy the code from each model, and paste it in. It gives something; yeah, cool. Now you can see the difference between the outputs: this is the output from 4o, and this is the output from o1-preview; o1 takes time, but it gives you a more accurate result, presented well.

Moving on, let's see an advanced reasoning question, a logical puzzle, the first one. I will copy it and paste it into 4o and into o1-preview; I'm not comparing with o1-mini, because the two o1 models are essentially the same, just slightly different, and this way we can see more of the difference between the older model and the new one. See, 4o's answer ends quickly, but o1 explains it in a better way: "thought for 7 seconds", then an explanation with case one, then case two, a conclusion covering both scenarios, and a summary, while 4o gives one small explanation and that's it. They created o1-preview to describe things more thoroughly. Now let's try some scientific reasoning as well; let me copy it in. 4o starts answering immediately; o1 thought for 16 seconds. So again I will say that o1 is much better than GPT-4o for reasoning, math, coding, and quantum physics, these kinds of advanced reasoning tasks, while GPT-4o is great for generative text, like content writing, marketing copy, and emails.

Now let's see the comparison between the o1 models and GPT-4o. When new models are released, their capabilities are revealed through benchmark data in the technical reports. The new OpenAI models excel at complex reasoning tasks: o1 surpasses human PhD-level accuracy in physics, chemistry, and biology on the GPQA benchmark. Coding becomes easier with o1, as it ranks in the 89th percentile on competitive programming questions from Codeforces. The model is also outstanding in math: on a qualifying exam for the International Mathematics Olympiad (IMO), GPT-4o solved only 13% of problems, while o1 achieved 83%; this is truly next-level. On the standard ML benchmarks it shows huge improvements across the board (MMLU measures multitask accuracy, and GPQA measures reasoning capability). For human evaluation, OpenAI asked people to compare o1-mini with GPT-4o on difficult, open-ended tasks across different topics, using the same method as the o1-preview versus GPT-4o comparison: like o1-preview, o1-mini was preferred over GPT-4o for tasks that require strong reasoning skills, but GPT-4o was still favored for language-based tasks.
On model speed, as a concrete example, we compared responses from GPT-4o, o1-mini, and o1-preview on a word-reasoning question: while GPT-4o did not answer correctly, both o1-mini and o1-preview did, and o1-mini reached the answer around 3-5x faster. As for limitations and what's next: due to its specialization in STEM (science, technology, engineering, and math) reasoning, o1-mini's factual knowledge on non-STEM topics such as dates, biographies, and trivia is comparable to that of small LLMs such as GPT-4o mini. OpenAI says it will improve these limitations in future versions, as well as experiment with extending the models to other modalities and specialties outside of STEM.

On July 25, OpenAI introduced SearchGPT, a new search tool changing how we find information online. Unlike traditional search engines, which require you to type in specific keywords, SearchGPT lets you ask questions in natural, everyday language, just like having a conversation. This is a big shift from how we were used to searching the web: instead of thinking in keywords and hoping to find the right result, you can now ask SearchGPT exactly what you want to know, and it will understand the context and give you direct answers. It is designed to make searching easier and more intuitive, without going through links and pages. But with this new way of searching, there are some important questions to consider: can SearchGPT compete with Google, the search giant we all know? What makes SearchGPT different from AI Overviews, another recent search tool? And how does it compare to ChatGPT, OpenAI's popular conversational AI? In this video we are going to explore these questions and more: we will look at what makes SearchGPT special, how it compares to other tools, and why it might change the way we search for information. Whether you are new to tech or just curious, this video will break it down in simple words; stick around to learn more about SearchGPT. So without any further ado, let's get started.

So what is SearchGPT? SearchGPT is a new search engine prototype developed by OpenAI, designed to enhance the way we search for information using AI. Unlike a typical chatbot like ChatGPT, SearchGPT isn't just about having a conversation: it's focused on improving the search experience, with some key features. The first is direct answers: instead of simply showing you a list of links, SearchGPT delivers direct answers to your questions. For example, if you ask "what are the best wireless noise-cancelling headphones in 2024?", SearchGPT will summarize the top choices, highlighting their pros and cons based on expert reviews and user opinions; this approach is different from traditional search engines, which typically provide a list of links leading to various articles or videos. The second is relevant sources: SearchGPT's responses come with clear citations and links to the original sources, ensuring transparency and accuracy; this way, you can easily verify the information and delve deeper into the topic if you want. The third is conversational search: SearchGPT allows you to have a back-and-forth dialogue with the search engine; you can ask follow-up questions or refine your original query based on the responses you receive, making your search experience more interactive and personalized.

Now let's jump into the next topic, which is SearchGPT versus Google. SearchGPT is being talked about as a major competitor to Google in the future, so let's break down how they differ in their approach to search. The first difference is conversational versus keyword-based search: SearchGPT uses a conversational interface, allowing users to ask questions in natural language and refine their queries through follow-up questions,
This creates a more interactive search experience. Google, on the other hand, relies on keyword-based search, where users enter specific terms to find relevant web pages. The second difference is direct answers versus a list of links: one of SearchGPT's standout features is its ability to provide direct answers to questions; it summarizes information from various sources and clearly cites them, so you don't have to click through multiple links. Google typically presents a list of links, leaving users to sift through the results to find the information they need. The third is AI-powered understanding versus keyword matching: SearchGPT uses AI to understand the intent behind your question, offering more relevant results even if your query isn't perfectly worded, while Google's primary method is keyword matching, which can sometimes lead to less accurate results, especially for complex queries. The fourth is dynamic context versus isolated searches: SearchGPT maintains context across multiple interactions, allowing for more personalized responses, whereas Google treats each search as a separate query without remembering previous interactions. And the last is real-time information versus indexed web pages: SearchGPT aims to provide the latest information using real-time data from the web, whereas Google's index is comprehensive but may include outdated or less relevant information.

Now let's jump into the next topic, which is SearchGPT versus AI Overviews. SearchGPT and AI Overviews both use AI, but they approach search and information delivery differently. It's also worth noting that both tools are still being developed, so their features and capabilities may evolve and even overlap as they grow. Here are the differences. The first is source attribution: SearchGPT provides clear and direct citations linked to the original sources, making it easy for users to verify the information.
With AI Overviews, links are included, but the citations may not always be clear or directly associated with specific claims. The second is transparency and control: SearchGPT promises greater transparency by offering publishers control over how their content is used, including the option to opt out of AI training, while AI Overviews offer less transparency regarding the selection of content and the summarization process used. The next is scope and depth: SearchGPT strives to deliver detailed and comprehensive answers, pulling from a broad range of sources including potential multimedia content, whereas AI Overviews offer a concise summary of key points, often with links for further exploration, but with a more limited scope.

Now let's jump into the next part: SearchGPT versus ChatGPT. SearchGPT and ChatGPT, both developed by OpenAI, share some core features but serve different purposes. Here are some differences. The first is primary purpose: SearchGPT is designed for search, providing direct answers and sources from the web, whereas ChatGPT focuses on conversational AI, generating text responses. The second is information sources: SearchGPT relies on real-time information from the web, whereas ChatGPT's knowledge is based on its training data, which might be out of date. The third is response format: SearchGPT prioritizes concise answers with citations and source links, whereas ChatGPT is more flexible, generating longer text, summaries, creative content, code, and so on. The next is use cases: SearchGPT is ideal for fact-finding, research, and tasks requiring up-to-date information, whereas ChatGPT is suitable for creative writing, brainstorming, drafting emails, and other open-ended tasks.

Now the question arises: when will SearchGPT be released? SearchGPT is currently in a limited prototype phase, meaning it's not yet widely available; OpenAI is testing it with a select group to gather feedback and improve the tool. If you are interested in trying SearchGPT, you can join the waitlist on its web page, but you will need a ChatGPT account. A full public release by the end of 2024 is unlikely, as OpenAI hasn't set a timeline; it's more probable that SearchGPT features will gradually be added to ChatGPT in 2024 or 2025, with a potential standalone release later, based on testing and feedback.

Sora is here. OpenAI has introduced Sora, an advanced AI tool for creating videos, now available at sora.com. Earlier this year Sora was launched to turn text into realistic videos, showcasing exciting progress in AI technology. Now OpenAI has released Sora Turbo, a faster and more powerful version available to ChatGPT Plus and Pro users. Sora lets users create videos in up to 1080p quality, up to 20 seconds long, and in different formats like widescreen, vertical, or square. It includes tools like a storyboard for precise control, and options to remix or create videos from scratch. There is also a community section with featured and recent videos to spark ideas. ChatGPT Plus users can make up to 50 videos per month at 480p resolution, while Pro users get access to more features like higher resolution and longer video duration. While Sora Turbo is much faster, OpenAI is still working to improve areas like handling complex actions and making the technology more affordable. To ensure safe and ethical use, Sora includes features like visible watermarks, content moderation, and metadata to identify videos created with Sora. Sora makes it easier for people to create and share stories through video, and OpenAI is excited to see how users will explore new creative possibilities with this powerful tool.
Welcome to the demo part of Sora. This is the landing page when you log in to Sora. Let me tell you, I have the ChatGPT Plus version, not the Pro version, so I have some 721 credits left; later on I will tell you what the credits are. Let's explore a bit. These are some recent videos which I have created or tested, and this Featured section shows what all the users of Sora are creating, so we can learn from it or generate new ideas, like this parrot, which is very cool for learning. These are the saved videos, and these are all videos and uploads.

Now let's come to the credits part. You can see I have 721 credits left. If you go to the OpenAI help page, you can see the credit costs: credits are used to generate videos with Sora, and as far as I can tell, a 480p square 5-second video takes only 20 credits, a 10-second one takes 40, and so on, with higher resolutions like 720p costing more per video. Note that requesting multiple variations at once is charged at the same rate as running separate generation requests.

With this plus icon you can upload an image or a video, so you can, for example, upload an image and create a video from that image. This "choose from library" option is your personal library. This option is for variations, and these are basically presets, like Balloon World, Stop Motion, Archival, Film Noir, or Cardboard & Papercraft. This is the resolution: 480p is the fastest for video generation, 720p takes around 4x longer, and 1080p around 8x longer; 1080p is only available in ChatGPT's Pro version. Since I'm just showing you a demo, I will choose the fastest version. This is the duration, how long you want the video: 5, 10, or 15 seconds, with 20 seconds available in the ChatGPT Pro version. And this is how many variations you want; I will select only two, because more variations cost more credits. These credits are on a monthly basis, I believe. The Recut, Remix, Blend, and Loop features for editing content will also cost more credits. See here: ChatGPT Plus gets up to 50 priority videos (1,000 credits) per month, up to 720p resolution and 5-second duration; ChatGPT Pro gets up to 500 priority videos (10,000 credits), unlimited relaxed videos, up to 1080p resolution, 20-second duration, and downloads without watermark. Here, I believe, you can download with a watermark; we'll see. But ChatGPT Pro is $200 per month, so yes, it's expensive.

Let's do something creative. I will write here: "polar bear enjoying the Sahara Desert." You can use a storyboard, or you can create videos directly. Let me show you the storyboard first: frame by frame, you can give different prompts. Here I give "polar bear with family", then "playing with the sand", and later it will create the whole video; in the third frame you can describe more, or you can add an image. This is a story created by ChatGPT. Let's create.
Added to the queue. It's very fast, actually; almost done. See: "with family", and you can see it playing with the sand. These are the two variations; you can choose either one, and I like this one the most. Here you can edit your story again. Recut lets you trim or extend this video in a new storyboard: basically, the Recut feature allows creators to pinpoint and isolate the most impactful frames in a video, extending them in either direction to build out a complete scene. Then Remix: the Remix feature allows users to reimagine existing videos by altering their components without losing the essence of the original; you can add or remove certain things, say if I want to remove the polar bear, or change colors, or tweak visual elements. And Blend: this feature allows you to combine different videos; if I upload a video, it will blend this particular video with the one I upload. And the last one, Loop: as the name says, this feature makes it easy to create seamless repetitions of the video, which is ideal for background visuals, music videos, and the like.

So this is how you can create a video in about two minutes, just by giving a prompt. This marks it as a favorite, so you can save it. There are sharing options: copy link, or unpublish, and you can download. As I told you, downloading without a watermark is available only in the Pro version, so I can download this with a watermark, as a video in one click, or as a GIF. You can also add it to a folder, and this shows the notification activity.

Let's create one more: "monkey with family driving a car in space." I will choose the 16:9 format; it takes more of my credits, but that's okay. Add it to the queue. If you go to favorites, the earlier one appears there, because I chose it. And if you ask how Sora works: like text-to-image generative AI models such as DALL·E 3, Stable Diffusion, and Midjourney, Sora is a diffusion model, which means it starts with each frame of the video consisting of static noise. Oh, the result is cartoonish, but see, if you want a Lamborghini or a Tesla, you can add that to the prompt. So this is how you can generate videos with Sora in a quick two minutes.

Now, just type NotebookLM in the browser and it will land here. This is the landing page, and I'll give you an overview of the website. When you scroll down, you will see how people are using NotebookLM: to power studying, organize their thinking, and spark new ideas. You will also see some reviews of what people are saying, like "NotebookLM blew our minds", and basically all good reviews. There is also NotebookLM Plus: if you click on it, you will see the premium features and the subscription plans. It is free for individuals to get started, and these are the extras you get if you subscribe to NotebookLM Plus. I'll go back to the overview section and click "Try NotebookLM". When you click on it, you get here. You can go to settings; I have enabled dark mode because it is soothing to the eyes.
You can also switch to light mode or follow the device theme. Here you can click on this to change the view, and even click on these boxes to get a box-type view. Now I will click on "create new", and as soon as I do, it asks me to upload the sources. I will just close this to show how it looks; it looks good. Now it's time to upload the files. When you click on "Add source" and scroll down, you will see three ways to upload files: Google Drive, a link, and pasted text. You can also see the source limit here: you cannot upload more than 50 sources. Fine, I'll upload three medical reports: Jane's and Michael's reports are done, and I'll upload John's. These are random medical reports I have collected from the internet just to show you how it works; you can also add YouTube video links and Drive links.

When you click on a particular report, you will get a source guide, basically a summary, plus some key topics. You will also see some pre-written prompts here, like "What factors contributed to Jane Smith's anemia diagnosis?", so I will choose this one. You can see it has provided an insight: Jane Smith's anemia diagnosis is based on her low hemoglobin level of 9.5, and this is a clickable reference to the passage it was taken from. She is also experiencing fatigue and pale skin, which are common symptoms of anemia, with the same kind of reference. These references are basically the proof that it has taken all the insights from these three reports. If you want NotebookLM to ignore one particular report, you can unselect it here, and if you want everything selected, click here. When I chose this pre-selected prompt, it used the resources provided and found helpful insights from them, as you can see on the screen. In some cases the content will be longer, with several references; this one is very short, so it has provided only one reference. You can add more sources by clicking here, but remember it should not exceed the limit; I have uploaded only three, so I can add 47 more.

You will also see this "save to note" option, which means you can save an answer for future reference; if I save it, it will appear over here. You can also give feedback on a particular response, good or bad, and you can copy it and paste it somewhere else. This is very helpful for students who study from different materials and may get puzzled writing down notes: they can save notes for future reference, or copy from here and paste elsewhere. You can also delete a note. On the right side you will see a notes section, where you can add a note and write down what you have understood or anything important to you, basically acting as your notepad.

Now coming to the surprising part: the Studio over here. Previously it was called the notebook guide, but they have recently updated the features. It's basically a guide for you, containing a study guide, briefing documents, an FAQ, and a timeline.
If we click on the briefing document, it creates a brief document using the resources you have provided, so you get a briefing out of the three resources. I'll click on this and you can see John, Michael, and Jane, all three, are covered, and it has given an overview of all three reports here, basically a summary of them. You can also get a study guide prepared for you, which will clear up your understanding even more.

Now I'm very excited, as I'm going to show you a magical thing: this can actually convert your resources, or rather their summaries, into a podcast. A podcast, if you don't know, is like a radio show you can listen to anytime online; it covers different topics like stories, discussions, or information you want to know. Here it says "Audio Overview: a deep-dive conversation between two hosts, in English only". You can even customize it, but I will just generate it; it may take a few minutes, so stick around. It's almost an 8-minute audio about Jane Smith's anemia diagnosis, so let's hear it: "Okay, so we've got a stack of medical reports here." "All right, and we're going to take a look at three different patients." "Sounds good. We have Jane Smith, she's 29 years old." "Okay, we've got John Doe, he is 45." "Right, and we've got Michael Johnson, and he is 52." "Okay, a good spread." "Yeah, and you know, it's really interesting how these cases, even though they're at different stages of life, each offer a window into some pretty common health challenges." "Yeah, definitely. So are you ready to dive in?" "I'm ready, let's do it." "Okay, so first up we've got Jane Smith..."

It's incredible how it turned the ordinary resources I provided, just three medical reports of Jane, John, and Michael, into a real podcast. All thanks to AI: it has made the material conversational, and you get a whole overview of the three reports in a real podcast style. You can click on the three dots to change the playback speed, download from here, or delete it if you don't like it, depending on your requirement. The best thing is that you can give your feedback and help it grow, which makes sense because there's always room to upgrade. Now you must be thinking about why this is helpful: it can obviously be used as a podcast maker, but it's also helpful for people who remember concepts better by listening than by studying in a mundane routine. Suppose you have 50 sources: it might be difficult to read them line by line, document by document, so it's better to generate a summary, and even better to listen to a podcast and get an overview of all the sources. It was definitely a remarkable experience, converting reports into a podcast.

Now you must be wondering who benefits from Google NotebookLM. NotebookLM is for everyone who works with information: students can simplify studying by summarizing notes and organizing resources; content creators can turn ideas into engaging podcasts or easily structure their research; professionals can save time by managing reports, presentations, or complex data. Whether you are learning, creating, or working on big projects, NotebookLM helps you do it faster, smarter, and with less effort. I can foresee that note-making is about to hit new heights, and the way we have been doing it might soon be a thing of the past, with AI stepping in. Google NotebookLM is just the start of this exciting journey; it's still in its early stages but will only get better from here.
I'm thrilled to see the amazing things it can do, and I hope you are too.

Think about this: you're about to create something amazing, an AI that can think, learn, and grow in ways we only dreamed of. And here's the best part: you don't need to be an AI expert to make it happen. What if you could use LangChain, a tool that connects the most advanced language models to real-time data, allowing you to build AI applications that are both smart and flexible? It sounds like something out of science fiction, but with LangChain it's real. As large language models quickly become the backbone of many applications, LangChain has emerged as a game-changing tool, transforming the way we use these powerful technologies. Today we are diving into LangChain, the framework that makes AI development easier for everyone. Whether you want to understand user questions with one LLM, create humanlike responses with another, or pull data insights, LangChain makes it all happen. But LangChain is about more than making AI easy to use; it's about getting these models to work together seamlessly. LangChain simplifies what could be a complex process into a simple, powerful system. From smart chatbots to enhancing data for machine learning, the possibilities with LangChain are endless. So why has LangChain become one of the fastest-growing open-source projects ever, and how can you use LangChain to get ahead in the world of AI?

Let's first start by understanding what LangChain is. LangChain is an open-source framework designed to help developers build AI-powered applications using large language models, or LLMs, like GPT-4. What really sets LangChain apart is its ability to link these powerful models with external data sources and other components. This allows you to create sophisticated natural language processing (NLP) applications that can do much more than just understand and generate text: they can interact with live data, databases, and other software tools. Now you might be asking: is LangChain a Python library? Yes, it is. LangChain is available as a Python library, which means you can easily integrate it into your existing Python projects, but it doesn't stop there: LangChain is also available in JavaScript and TypeScript, making it accessible to a wide range of developers. Whether you're working on a web app, a backend system, or a standalone tool, LangChain fits right in.

So why should we use LangChain, and why is it such a big deal? Developing AI applications typically requires using multiple tools and writing a lot of complex code: you need to manage data retrieval, processing, integration with language models, and more. This can be time-consuming and complicated, especially if you're not deeply familiar with AI. LangChain simplifies the entire process, allowing you to develop, deploy, and even manage AI applications more easily and efficiently. Let's break this down with an example. Imagine you're building a chatbot that needs to provide real-time weather updates. Without LangChain, you would need to manually connect your bot to a weather API, fetch the data, process it, and then format the response. With LangChain, the process becomes much more straightforward: you can focus on what matters most, building the features and functionality of your application, while LangChain handles the complex integrations behind the scenes.

So let's discuss the key features of LangChain. LangChain is packed with features that make it incredibly powerful and flexible; let's take a closer look at some of the key components.
First, we have model interaction. LangChain allows you to interact with any language model seamlessly: it manages the inputs and outputs to these models, ensuring that you can integrate them into your application without a hitch. For example, if you want to use GPT-4 to generate responses to customer inquiries, LangChain makes it easy to plug that model into your workflow. Next, we have data connection and retrieval. One of LangChain's strengths is its ability to connect to external data sources: whether you need to pull data from a database, a web API, or even a file system, LangChain simplifies the process. You can retrieve, transform, and use data from almost any source, making your AI applications more robust and versatile. Next, we have chains. LangChain introduces the concept of chains, where you link multiple models and components together to perform complex tasks. For example, you might have a chain where one component retrieves data, another processes it, and a third generates a humanlike response; this chaining ability lets you build workflows that would otherwise require extensive coding (a small sketch follows below). Next, we have agents. Agents are like the decision makers in LangChain: they decide the best course of action based on the input they receive. For example, an agent could determine which language model to use based on the type of query it's handling, making your application smarter and more adaptive. Then we have memory. LangChain supports both short-term and long-term memory, so that your AI can remember past interactions. This is particularly useful for applications like chatbots, where maintaining context over multiple interactions significantly improves the user experience. Imagine you're building a virtual assistant: it needs to remember previous interactions to provide relevant responses, and with the help of LangChain you can easily implement memory so that the assistant knows what you have talked about before, making the conversation more natural and engaging.

So what are the integrations supported by LangChain? LangChain is designed to work seamlessly with a wide variety of integrations, making it extremely versatile for different use cases. For LLM providers, LangChain supports integration with major providers like OpenAI, Hugging Face, and Cohere, which means you can easily incorporate the latest and most powerful language models into your applications. For data sources, LangChain can connect to a variety of sources such as Google Search, Wikipedia, and cloud platforms like AWS, Google Cloud, and Azure, making it easy to retrieve and use the most up-to-date information. Vector databases are used for handling large volumes of complex data, such as images or long text; LangChain integrates with vector databases like Pinecone, which store data as high-dimensional vectors, allowing for efficient and accurate retrieval. This is particularly useful for applications that need to search through large data sets quickly. For example, say you are building an application that needs to analyze thousands of documents to find relevant information: with LangChain, you can integrate a vector database like Pinecone to store your documents as vectors and quickly search through them using powerful language models. This capability can save you a lot of time and make your application much more effective.
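To make the chaining idea concrete before we get to prompt templates, here is a minimal sketch of a LangChain chain that pipes a prompt template into a model and then into an output parser. This is not code from the video: the package split (langchain-core, langchain-openai), the model name, and the example topic are my assumptions for illustration, and LangChain's API does change between versions.

```python
# A minimal LangChain chain sketch: prompt -> model -> output parser.
# Assumes `pip install langchain-core langchain-openai` and that the
# OPENAI_API_KEY environment variable is set.
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

# Component 1: a prompt template that structures the user's input.
prompt = ChatPromptTemplate.from_template(
    "Summarize the latest trends in {topic} in three bullet points."
)

# Component 2: the language model that generates a response.
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Component 3: a parser that turns the model's message into a plain string.
# The | operator links the three components into one runnable chain.
chain = prompt | llm | StrOutputParser()

print(chain.invoke({"topic": "vector databases"}))
```

Each piece can be swapped independently, which is the point of chains: replace the model, add a retrieval step in the middle, or bolt on memory without rewriting the rest.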
Now the question arises: how do you create prompts in LangChain? Creating prompts in LangChain is much easier with something called a prompt template. A prompt template acts as a set of instructions for the language model, and these templates can be customized to varying levels of detail. For example, you might design a prompt template to ask simple questions, or you could create more detailed instructions that guide the language model to produce high-quality responses. Let's walk through how you can create a prompt using LangChain in Python. Step one is installing LangChain: first, you'll need to have Python installed on your system, and once that's set up, you can install LangChain by opening your terminal and running the command pip install langchain. The next step is adding integrations: LangChain often requires at least one integration to function properly, and a common choice is OpenAI's language model API. To use the OpenAI API, you'll need to create an account on the OpenAI website and obtain your API key; after that, install OpenAI's Python package and provide your API key. The next step is importing and using a prompt template. Now that you have LangChain and the necessary integration set up, you can start creating your prompts. LangChain offers a pre-made prompt template class that allows you to structure your text in a way that the language model can easily understand; a sketch is given below. In this example, the prompt template takes two variables, an objective and a content subject, and uses them to generate a prompt. The output might be something like "Tell me an interesting fact about zebras"; the language model would then take this prompt and return a relevant fact about zebras based on the given objective. This is a simple but powerful way to generate dynamic prompts that can be adapted to a wide range of tasks, from answering questions to generating creative content.
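The commands and prompt code shown on screen aren't reproduced in the transcript, so here is a hedged reconstruction of that prompt-template step, using the two variables the narration mentions; the exact import path varies by LangChain version.

```python
# Hedged reconstruction of the on-screen prompt-template example.
# Install first with: pip install langchain
from langchain_core.prompts import PromptTemplate

# A template with the two variables the narration mentions:
# an objective and a content subject.
template = PromptTemplate(
    input_variables=["objective", "content"],
    template="Tell me {objective} about {content}.",
)

# Filling in the variables produces the final prompt string,
# which would then be sent to a language model.
prompt = template.format(objective="an interesting fact", content="zebras")
print(prompt)  # -> Tell me an interesting fact about zebras.
```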
Let's now talk about how to develop applications with LangChain. Building applications with LangChain is straightforward and involves a few key steps. First, define your application: know exactly what problem it's solving, and identify the necessary components, like language models, data sources, and user interaction. The next step is to build the functionality using LangChain's components, such as prompts, chains, and agents; this is where you create the logic that drives your application, like processing user input or retrieving data. Then comes customizing your application to meet specific needs: LangChain's flexibility allows you to tweak prompts, integrate additional data sources, and fine-tune models for optimal performance. Before going live, it's crucial to test and deploy your application; testing helps catch any issues, and LangChain makes debugging easy, so you can deploy with confidence. For example, let's build a chatbot using LangChain. First we define it: a chatbot that answers questions about technology trends. We then create the functionality by setting up a prompt and a chain to process input. Next, we customize it by integrating a new API to pull in the latest information. Finally, we test and deploy the chatbot to ensure it responds accurately to users.

Let's now look at some examples and use cases of LangChain. LangChain offers endless possibilities across various industries: you can create customer service chatbots that manage queries and transactions, or coding assistants that suggest code snippets and debug issues. In healthcare, LangChain can assist doctors with diagnosis and patient data management, helping them make quicker, more informed decisions. In marketing and e-commerce, it can analyze consumer behavior, generate product recommendations, and craft compelling product descriptions. So LangChain is a powerful framework that makes AI development accessible and efficient.

Now, as I mentioned, one of the secret sauces of deep learning is neural networks, so let's see what a neural network is. Neural networks are based on our biological neurons: the whole concept of deep learning and artificial intelligence is inspired by the human brain, which consists of billions of tiny cells called neurons. This is how a biological neuron looks, and this is how an artificial neuron looks. A neural network is like a simulation of the human brain: the brain has billions of biological neurons, and we are trying to simulate it using artificial neurons. A biological neuron has dendrites, and the corresponding component in an artificial neuron is its inputs: the biological neuron receives inputs through the dendrites. Then there is the cell nucleus, which is basically the processing unit; the artificial neuron has an equivalent piece which, based on the weights and biases (we will see what exactly weights and biases are as we move on), processes the input and produces an output. In a biological neuron the output is sent through a synapse, and the artificial neuron has an equivalent in the form of its output. Biological neurons are also interconnected, billions of them, and in the same way artificial neurons are interconnected: the output of one neuron is fed as an input to another neuron, and so on.

Now, one of the very basic units of a neural network is the perceptron. So what is a perceptron? A perceptron can be considered one of the fundamental units of neural networks. It consists of at least one neuron; sometimes it can be more than one, but you can create a perceptron with a single neuron. It can be used to perform certain functions: it can act as a basic binary classifier and can be trained to do basic binary classification. This is how a basic perceptron looks, and it is nothing but a neuron: you have inputs x1, x2, up to xn, a summation function, and then what is known as an activation function. Based on the weighted sum of the inputs, the activation function gives an output of either 0 or 1, so we say the neuron is either activated or not. Each of the inputs is multiplied by a weight, a bias is added, and that whole thing is fed to the activation function, which produces an output. If the output is correct, it is accepted; if it is wrong, the error is fed back, and the neuron adjusts its weights and biases to give a new output, and so on and so forth. That is known as the training process of a neuron, or a neural network.

There's a concept called perceptron learning, one of the most basic learning processes. The way it works is somewhat like this: you have inputs x1 to xn, and each of these inputs is multiplied by a weight.
The formula is the weighted sum Σ(wi · xi), the sum of all the products of the inputs and their weights, with a bias b added on top. The bias does not depend on the input values; it is common to the whole neuron. However, the bias value keeps changing during the training process, and once training is complete, the values of the weights w1, w2, and so on, and the value of the bias, are fixed. That is basically the whole training process, and that is what is known as perceptron training: the weights and biases keep changing until you get accurate output. The summation Σ(wi · xi) + b is passed through the activation function, and based on that the neuron either fires or not, producing an output. That output is compared with the actual or expected value, also known as the labeled information; this is the process of supervised learning, where the correct output is already known. By comparing them we know whether there is an error, and if there is, the error is fed back and the weights and biases are updated accordingly until the error is reduced to the minimum. This iterative process is known as perceptron learning, or the perceptron learning rule. The whole idea is to update the weights and the bias of the perceptron until the error is minimized. The error need not be zero, and may never reach zero, but the idea is to keep changing the weights and bias so that the error is the minimum possible. The iteration continues until either the error is zero, which is an unlikely situation, or it is the minimum possible within the given conditions.

Now, in 1943, two scientists, Warren McCulloch and Walter Pitts, came up with an experiment in which they were able to implement logical functions like AND, OR, and NOR using neurons, and that was a significant breakthrough. They were able to implement some of the most common logic gates, which take two inputs, like A and B, and give a corresponding result: for an AND gate the output is A·B, for an OR gate it is A+B, and so on. They were able to do this using a single-layer perceptron. For most of these gates a single-layer perceptron was sufficient, except for XOR, and we will see why in a little bit.

This is how an AND gate works: with inputs A and B, the neuron should fire only when both inputs are 1. So for 0,0 the output should be 0; for 0,1 it is again 0; for 1,0 again 0; and for 1,1 the output should be 1. How do we implement this with a neuron? It was found that by changing the values of the weights it is possible to achieve this logic. For example, if we have equal weights like 0.7 and 0.7 and take the weighted sum, then 0.7 × 0 plus 0.7 × 0 gives 0, and so on, and only in the last case, when both inputs are 1, do we get a value (1.4) that is greater than the threshold of 1. So only in that case does the neuron get activated and produce an output; in all the other cases there is no output, because the weighted sum stays below the threshold of 1. This is the implementation of an AND gate using a single perceptron, or a single neuron; a small code sketch follows.
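Here is a small Python sketch of the single-neuron gate just described, using the equal weights of 0.7 and the threshold of 1 from the narration; the OR-gate weights of 1.2 discussed next are included too.

```python
def perceptron(x1, x2, w1, w2, threshold=1.0):
    """Fire (return 1) when the weighted sum reaches the threshold."""
    return 1 if (w1 * x1 + w2 * x2) >= threshold else 0

cases = [(0, 0), (0, 1), (1, 0), (1, 1)]

# AND gate: with equal weights of 0.7, only the input (1, 1) reaches
# the threshold of 1, since 0.7 + 0.7 = 1.4.
print([perceptron(x1, x2, 0.7, 0.7) for x1, x2 in cases])  # [0, 0, 0, 1]

# OR gate (discussed next): with weights of 1.2, any single active
# input already contributes 1.2, which crosses the threshold.
print([perceptron(x1, x2, 1.2, 1.2) for x1, x2 in cases])  # [0, 1, 1, 1]
```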
Similarly, consider an OR gate. For an OR gate, the output should be 1 if either of the inputs is 1: 0,1 results in 1, and in fact every case gives 1 except 0,0. How do we implement this with a perceptron? Once again, suppose we have a perceptron with weights of 1.2. In the first case, when both inputs are 0, the output is 0. In the second case, 0 and 1, the weighted sum is 1.2 × 0 plus 1.2 × 1 = 1.2; for 1,0 it is likewise 1.2; and in the last case, when both inputs are 1, the sum is 2.4. All of these cross the threshold, so the gate fires in every case except 0,0. During the training process these weights keep changing, and at the point where w1 = 1.2 and w2 = 1.2 the system learns that it gives the correct output. That is the implementation of an OR gate using a single neuron, or a single-layer perceptron.

Now, the XOR gate was one of the challenging ones. They tried to implement an XOR gate with a single-layer perceptron, but it was not possible, and this was like a roadblock in the progress of neural networks. Subsequently, it was realized that an XOR gate can be implemented using a multi-layer perceptron, or MLP. In this case there are two layers instead of a single layer: x1 and x2 are the inputs, there is a hidden layer (which is why the units are denoted h3 and h4), and the output of that layer is fed to the output unit o5, with a threshold applied there. In the numerical calculation, the weights are 20 and -20 from x1, and again 20 and -20 from x2. These inputs are fed into h3 and h4: across the four input cases, h3 gives 0, 1, 1, 1, and h4 gives 1, 1, 1, 0. If you now look at the final output, using a sigmoid with a threshold of 1, you will see that when the inputs are equal the output is 0, and when they differ the output is 1. That is exactly XOR: only when exactly one of the inputs is 1 do you get an output of 1, and if both inputs are 1 or both are 0, the output is 0. It is called "exclusive or" because only one of the inputs may be 1 for the output to be 1, and this condition is satisfied by the network. So the XOR gate is a special implementation requiring a multi-layer perceptron; a sketch follows.
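And here is a hedged sketch of that two-layer XOR network. The ±20 weights match the narration, but the exact biases (-10, 30, -30) are my assumptions, chosen so that one hidden unit behaves like OR, the other like NAND, and the output like AND, which together give XOR; the values on the video's slide may differ.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def xor_mlp(x1, x2):
    h3 = sigmoid(20 * x1 + 20 * x2 - 10)   # hidden unit acting like OR
    h4 = sigmoid(-20 * x1 - 20 * x2 + 30)  # hidden unit acting like NAND
    o5 = sigmoid(20 * h3 + 20 * h4 - 30)   # output acting like AND(h3, h4)
    return round(o5)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, "->", xor_mlp(x1, x2))   # 0, 1, 1, 0: exactly XOR
```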
Now that we have a good idea about perceptrons, let's take a look at what a neural network is. We have seen what a perceptron is and what a neuron is; a neural network is nothing but a network of these neurons. There are different types of neural networks, about five of them: artificial neural networks, convolutional neural networks, recurrent neural networks, deep neural networks, and deep belief networks. Each of these types can solve a special kind of problem: convolutional neural networks, for example, are very good at image processing and image recognition, whereas RNNs are very good for speech recognition and text analysis. So each type has some special characteristics and is good at performing certain special kinds of tasks.

What are some of the applications of deep learning? Deep learning is used extensively in gaming today. You must have heard about AlphaGo, an AI created by a startup called DeepMind, which was acquired by Google; AlphaGo defeated the human world champion Lee Sedol at the game of Go. Gaming is an area where deep learning is used extensively and where a lot of research happens. In addition, there are special neural networks called generative adversarial networks, which can be used for synthesizing images, music, or text; for instance, a network can be trained to compose a certain kind of music. Then there are autonomous cars: you must be familiar with Google's self-driving car, and today a lot of automotive companies are investing in this space. Deep learning is a core component of autonomous cars: the cars are trained to recognize the road, the lane markings, the signals, and any objects or obstructions in front, and all of this involves deep learning, so that is another major application. Then there are robots: we have seen several robots, including Sophia, who was given citizenship by Saudi Arabia, and there are several such humanlike robots whose underlying technology is deep learning. Medical diagnostics and healthcare is another major area where deep learning is being used. Within healthcare diagnostics there are multiple areas where deep learning, image recognition, and image processing can be applied, for example cancer detection. As you may be aware, if cancer is detected early it can often be cured, and one of the challenges is the availability of specialists who can diagnose cancer from diagnostic images and various scans. The idea is to train a neural network to perform some of these activities so that the load on the cancer specialists, the oncologists, comes down. There is a lot of research happening here, and there are already quite a few applications claimed to perform better than human beings in this space, whether for lung cancer, breast cancer, and so on. Healthcare is a major area where deep learning is being applied.

Let's take a look at the inner workings of a neural network. How does an artificial neural network identify, say, shapes? Can we train a neural network to identify shapes like squares, circles, and triangles when images are fed to it? This is how it works. Any image is nothing but digital information about pixels. In this particular case, let's say this is an image of 28 × 28 pixels showing a square. There is a certain way in which the pixels are lit up, and each pixel has a value from 0 to 255: 0 indicates that it is black, or dark, and 255 indicates that it is completely white, or lit up. That value is a measure of how each pixel is lit up. So this image consists of the information of 784 pixels; everything inside the image can be compressed into those 784 pixel values, and the way each pixel is lit up provides information about what the image is. We can train neural networks to use that information and identify the images, so let's take a look at how this works.
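As a tiny illustration of the flattening idea that follows, here is a NumPy sketch; the random array is just a stand-in for a real 28 × 28 image.

```python
import numpy as np

# A stand-in for a 28x28 grayscale image: values 0 (black) to 255 (white).
image = np.random.randint(0, 256, size=(28, 28))

# Flatten the grid into a single vector of 28 * 28 = 784 inputs,
# scaled to the 0..1 range the narration describes.
inputs = image.flatten() / 255.0

print(image.shape, "->", inputs.shape)  # (28, 28) -> (784,)
```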
Each input neuron holds a pixel's value: close to 1 means white, and close to 0 means black. This is an animation of how the whole thing works. One way of doing it is to flatten the image and feed the complete set of 784 pixel values as input to our neural network. The network can consist of several layers: an input layer, a few hidden layers, and an output layer. The input layer takes the values of the 784 pixels as input, and the output can be one of three types, or three classes: a square, a circle, or a triangle. During the training process, when you first feed this image, the network will probably say it's a circle or a triangle; as part of training, we then send that error back, and the weights and biases of the neurons are adjusted until it correctly identifies that this is a square. That is the whole training mechanism happening here. Now let's take a look at a circle: the same way, you feed the 784 pixels, there is a certain pattern in which the pixels are lit up, and the neural network is trained to identify that pattern. During training it would probably at first identify it incorrectly as a square or a triangle; again the error is fed back and the weights and biases are adjusted until it finally gets the image correct. The same goes for a triangle: you feed another image consisting of a triangle and repeat the training process. Now we have trained our neural network to classify these images as a triangle, a circle, or a square, so if you feed it a new image, it will be able to identify which of the three it is. What is important to observe is that when you feed a new image, the shape need not be in exactly the same position: the neural network actually identifies patterns, so even if the triangle is positioned in a corner or at the side rather than in the middle, it will still be identified as a triangle. That is the whole idea behind pattern recognition.

So how does this training process work? Here is a quick view. We have seen that a neuron receives inputs and computes the weighted sum, which is Σ(wi · xi) plus the bias; this is fed to the activation function, which in turn gives an output. During training, when you feed these images, the network will initially misclassify: a square will be identified as a triangle, a triangle as a square, and so on. That error information is fed back. Initially the weights can be random, maybe all zero, and then they slowly keep changing: as part of the training process, the values of the weights w1, w2, up to wn change in such a way that toward the end of training the network identifies the images correctly. Until then the weights are adjusted, and that is known as the training process. These weights are numeric values, something like 0.5 or 0.35, and they can be positive or negative.
The value coming in is the pixel value, which as we have seen can be scaled between 0 and 1, or kept between 0 and 255, with 0 being black, 255 being white, and all the other shades in between. So these are all numerical values: the product wi · xi is a numerical value, and the bias is also a numerical value. We need to keep in mind that the bias is fixed per neuron, it doesn't change with the inputs, whereas there is one weight per input; that is an important point to note. The bias does keep changing during training, though: initially it has a random value, but as part of the training process the weights w1, w2, up to wn and the value of b change, and once training is complete, those values are fixed for that particular neuron. There will be multiple such neurons, possibly in multiple layers, and that is the way the training process works. Here is another example of a multi-layer network: there are two hidden layers in between, so values come from the input layer, pass through the hidden layers, and reach the output layer. As you can see, there are weights and biases for each neuron in each layer, all of which keep changing during the training process; at the end of training all these weights have a certain value, and that is a trained model, with those values fixed once training is completed.

Then there is something known as the activation function. Every neuron has an activation function, and there are different types in use: it could be ReLU, it could be sigmoid, and so on. The activation function is what decides whether a neuron should fire or not, so whether the output should be 0 or 1 is decided by the activation function. The activation function takes as its input the weighted sum, remember we talked about Σ(wi · xi) + b, and the output can be either 0 or 1. The different types of activation functions are covered in an earlier video you might want to watch.

As part of the training process, we feed the inputs, the labeled or training data, and the network gives an output, the predicted output, which we denote y-hat. There is also the labeled data: for supervised learning we already know what the output should be, so that is the actual output. Before training is complete there will obviously be error, and that is measured by what is known as the cost function: the difference between the predicted output and the actual output is the error. The cost function can be defined in different ways; in this case it is the average of the squares of the errors, and when all the squared errors are added, this is sometimes called the sum of squared errors, or SSE. That error is then fed back, in what is known as backward propagation, or backpropagation, which helps the network adjust its weights and biases; the weights and biases get updated until the error value, the cost function, is at its minimum.
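To tie the pieces together, here is a toy sketch of one neuron's forward pass and its squared-error cost; the input values, weights, and bias are made up for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, 0.1, 0.9])    # inputs (e.g. scaled pixel values)
w = np.array([0.4, -0.2, 0.7])   # one weight per input
b = 0.1                          # one bias for the whole neuron

y_hat = sigmoid(np.dot(w, x) + b)  # weighted sum + bias -> activation
y = 1.0                            # labeled (actual) output

sse = (y - y_hat) ** 2             # squared error for this one output
print(y_hat, sse)
```

Backpropagation would use the gradient of this cost to nudge w and b, which is exactly what the next section on gradient descent covers.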
Now, there is an optimization technique used here called gradient descent. This algorithm works so that the error, the cost function, is minimized. There's a lot of mathematics behind this, for example finding the local minima and the global minimum using differentiation, but the idea is this: as part of training, the whole goal is to bring down the error. Say this is the cost function; at certain points its output is very high, so the weights, and of course the bias, have to be adjusted in such a way that the cost function is minimized, and the optimization technique used for that is gradient descent. Then there is what is known as the learning rate: with gradient descent you need to specify the learning rate, and it should be optimal, because if you have a very high learning rate the optimization will not converge, since at some point it will cross over to the other side, while if you have a very low learning rate it might take forever to converge. So you need to come up with an optimal value of the learning rate, and once that is done, gradient descent optimization reduces the error function, which is effectively the end of the training process.

Here is another view of gradient descent. This is the output of your cost function, which has to be minimized using the gradient descent algorithm, and these are the parameters, a weight being one of them. Initially we start with certain random values, so the cost will be high; then the weights keep changing in such a way that the cost function comes down. At some point it may reach the minimum value and then start to increase; that is where the gradient descent algorithm decides that it has reached the minimum and tries to stay there. This is known as the global minimum. Sometimes, though, these curves are not as clean as the nicely drawn one used for explanation; they can be pretty erratic, with a local minimum here, then a peak, and so on. The whole idea of gradient descent optimization is to identify the global minimum and to find the weights and the bias at that particular point. In another example you can have multiple local minima: as the curve comes down, a point may appear to be the minimum value, but it is not; the actual global minimum is elsewhere, and the gradient descent algorithm will make an effort to reach that level and not get stuck at the local minimum. The algorithm knows how to identify the global minimum, and that's what it does during the training process.
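Here is a minimal sketch of gradient descent on a one-parameter cost; the quadratic cost (w - 3)^2 is just a stand-in whose global minimum we know is at w = 3, so you can see the learning-rate behavior described above.

```python
def cost(w):
    return (w - 3) ** 2       # toy cost function, minimum at w = 3

def gradient(w):
    return 2 * (w - 3)        # derivative of the cost with respect to w

w = 0.0                       # arbitrary starting weight
learning_rate = 0.1           # too high overshoots, too low crawls

for step in range(25):
    w -= learning_rate * gradient(w)   # step against the gradient

print(w, cost(w))             # w ends up very close to 3
```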
Now, to implement deep learning there are multiple platforms and languages available, but the most common platform nowadays is TensorFlow, and that's the reason we created this tutorial for TensorFlow. We will take you through a quick demo of how to write TensorFlow code using Python. TensorFlow is an open-source platform created by Google, so let's look at the details. It is a library, a Python library, though it is also supported in other languages like Java and R; Python is the most common language used with it. It is a library for developing deep learning applications, especially using neural networks, and it consists of primarily two parts, if you will: the tensors, and the graphs, or the flow, which is the reason for the name TensorFlow.

So what are tensors? Tensors are like multi-dimensional arrays; that's one way of looking at them. First of all, you can have what is known as a scalar, which is just a number; then a one-dimensional array, which is a list of numbers; then a two-dimensional array, which is like a matrix. Beyond that it sometimes gets difficult to visualize, such as a three-dimensional array, but TensorFlow can handle many more dimensions. Its ability to handle multi-dimensional arrays is the strength of TensorFlow: it makes deep learning computation much faster, and that's why TensorFlow is used for developing deep learning applications. So TensorFlow is a deep learning tool, and this is the way it works: the data flows in the form of tensors, and the way the programming traditionally works is that you first create a graph of the computation, and then you execute that graph in the form of what is known as a session; we will see this in the TensorFlow code as we move forward. All the data is managed and manipulated in tensors, and the processing happens using these graphs.

There are certain terms to know, such as the rank of a tensor. The rank of a tensor is its dimensionality, in a way: for a scalar, just one number, the rank is 0; for a one-dimensional vector the rank is 1; for a two-dimensional array, typically a matrix, the rank is 2; for a three-dimensional array the rank is 3; and it can go beyond three as well, so you can store multi-dimensional arrays in the form of tensors.

What are some of the properties of TensorFlow? Today it is one of the most popular deep learning platforms or libraries. It is open source, developed and maintained by Google. One of the most important things about TensorFlow is that it can run on CPUs as well as GPUs. A GPU is a graphics processing unit, just as a CPU is a central processing unit. In earlier days the GPU was used primarily for graphics, which is how the name came about; it cannot perform generic activities as efficiently as a CPU, but it can perform iterative actions and computations extremely fast, much faster than a CPU, so GPUs are very well suited for computational work. In deep learning there is a lot of iterative computation, in the form of matrix multiplication and so on, so GPUs fit very well, and TensorFlow supports both GPU and CPU. There is a certain way of writing code in TensorFlow, which we will see as we go into the code. Of course, TensorFlow can be used for traditional machine learning as well; that would be overkill, but just for understanding, it may be a good idea to start by writing code for a normal machine learning use case, so that you get a hang of how TensorFlow code works, and then move into neural networks. That is just a suggestion; if you're already familiar with how TensorFlow works, you can go straight to the neural networks part.
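Before the handwritten-digit example, here is a short TensorFlow sketch of tensors and ranks, matching the scalar, vector, and matrix progression above. Note it uses modern eager-mode TensorFlow 2.x rather than the graph-and-session style just mentioned.

```python
import tensorflow as tf

scalar = tf.constant(7)                  # rank 0: a single number
vector = tf.constant([1, 2, 3])          # rank 1: a one-dimensional array
matrix = tf.constant([[1, 2], [3, 4]])   # rank 2: a two-dimensional array
cube = tf.ones([2, 3, 4])                # rank 3: a three-dimensional array

for t in (scalar, vector, matrix, cube):
    print(tf.rank(t).numpy(), t.shape)   # rank and shape of each tensor
```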
Of course, TensorFlow can be used for traditional machine learning as well. That would be overkill, but just for understanding it may be a good idea to start by writing code for a normal machine learning use case, so that you get a hang of how TensorFlow code works, and then move into neural networks. That is just a suggestion; if you are already familiar with how TensorFlow works, you can go straight into the neural networks part.

In this tutorial we will take the use case of recognizing handwritten digits. This is like the hello world of deep learning, and the MNIST database is a nice little database that has images of handwritten digits, nicely formatted. Very often in deep learning and neural networks we end up spending a lot of time preparing the data for training; with the MNIST database we can avoid that, because the data is already in the right format and can be used directly for training. MNIST also offers a bunch of built-in utility functions that we can call straight away without writing our own, and that is one of the reasons the MNIST database is so popular for training purposes: when people first want to learn about deep learning and TensorFlow, this is the database that gets used.

It has a collection of 70,000 handwritten digits. A large part of them are for training, then there is a test set, just as in any machine learning process, and then a validation set, and all of them are labeled: you have the images and their labels. The images look somewhat like this: they are handwritten samples collected from a lot of individuals. People wrote the digits 0 through 9, images of those were taken, and they were formatted in such a way that they are very easy to handle. That is the MNIST database.

The way we are going to implement this in TensorFlow is to feed in the data, especially the training data, along with the label information. The images are stored in the form of pixel information, as we saw in one of the previous slides: an image is nothing but an arrangement of pixels, and each pixel value says whether the pixel is lit up, not lit up, or somewhere in between. That is how the images are stored and how they are fed into the neural network for training. Once the network is trained, when you provide a new image it will be able to identify the digit, within a certain error, of course.

For this we will use one of the simpler neural network configurations, with a softmax output layer, and for simplicity we will flatten the pixels: instead of taking them in a two-dimensional arrangement, we flatten them out. Each image is 28 by 28, so there are 784 pixels. Pixel number 1 starts at the top left and the first row runs up to pixel 28; the second row starts at 29 and runs up to 56, and so on, down to pixel number 784. We take all these pixels, flatten them out, and feed them as one single line into our neural network. The output is what is known as a softmax layer: once trained, it identifies what digit the image shows. In this output layer there are 10 neurons, each signifying a digit, and at any given time, when you feed in an image, only one of these 10 neurons gets activated. For example, if the network is trained properly and you feed in a nine, the "nine" neuron gets activated.
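Since the softmax layer is doing the deciding here, a tiny NumPy sketch may help. The ten scores below are made-up numbers standing in for the output layer's raw values:

```python
import numpy as np

def softmax(logits):
    # exponentiate (shifted by the max for numerical stability), then normalize
    e = np.exp(logits - np.max(logits))
    return e / e.sum()

# Hypothetical raw scores for the digits 0..9; the last neuron is strongest.
logits = np.array([0.1, 0.3, 0.2, 0.0, 0.1, 0.2, 0.1, 0.4, 0.2, 2.5])
probs = softmax(logits)
print(probs.argmax())   # 9: the "nine" neuron wins
print(probs.sum())      # 1.0: softmax outputs behave like probabilities
```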
So you get an output from that neuron. Let me use a pen or a laser pointer to show you here. Say the network has been trained: if you feed in a nine, this neuron gets activated; if you feed a one to the trained network, this other neuron gets activated; if you feed a two, that neuron gets activated, and so on. I hope you get the idea. So this is one type of output layer and activation function, known as a softmax layer, and that is what we will be using here; it is one of the simpler ones, for quick and easy understanding.

This is how the code looks. We will go into our lab environment in the cloud and show you there directly, but very quickly, let me run you through it here first, and then we will go into the Jupyter notebook where the actual code is and run it as well.

As a first step: we are using Python here, so the syntax is Python, and the first step is to import the TensorFlow library. We do this with the line of code `import tensorflow as tf`. The name `tf` is just for convenience; you can give it any name, and once you do this, TensorFlow is available as an object named `tf`, and you can call its methods and access its attributes, and so on.

The MNIST database is actually an integral part of TensorFlow, which is another reason we always use this example as a first step. You simply import the MNIST database with one line of code, and you slightly modify the call so that the labels come back in the format known as one-hot (one_hot=True), which means the label information is stored like an array. Let me use the pen to show what exactly that is. With one-hot encoding, each label is stored in the form of an array of 10 digits. Let's say the number is 8. In that case, all the values are zero: this is the array at position 0, this is position 1, position 2, and so on, except position 8, which is 1, because our input is eight; position 9 is again zero. So one-hot encoding loads the labels in such a way that only one of the positions has a value of one, and based on which position that is, we know the label. In this case the eighth position is one, therefore we know the value of this sample is eight. Similarly, if the label is a two, then position 0 is zero, position 1 is also zero, position 2 is one, because this indicates the number two, then position 3 is zero, and so on. That is the significance of one_hot=True.
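A quick sketch of what one-hot encoding produces; this is just plain NumPy illustrating the layout described above, not the TensorFlow loader itself:

```python
import numpy as np

def one_hot(label, num_classes=10):
    vec = np.zeros(num_classes)
    vec[label] = 1.0        # only the position matching the label is set
    return vec

print(one_hot(8))  # [0. 0. 0. 0. 0. 0. 0. 0. 1. 0.] -> the digit 8
print(one_hot(2))  # [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.] -> the digit 2
```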
Then we can check how the data looks by displaying it. As I mentioned earlier, what you see is pretty much digital form, just numbers, because these are all pixel values, so you will not really see an image in this format; there is a way to visualize the image, which I will show you in a bit. This also tells you how many images are in each set: there are 55,000 images in training, 10,000 in the test set, and 5,000 in validation, so altogether there are 70,000 images.

Let's move on. We can view the actual image using the matplotlib library; this is the code for viewing the images, and you can view them in color or in grayscale. The cmap parameter tells it how we want to view them. We can also check the maximum and minimum of the pixel values: the maximum is one, because the values are scaled, where one means white and zero means black, and anything in between is a shade between black and white.

To train the model, there is a certain way you write your TensorFlow code. The first step is to create some placeholders, and then you create a model; in this case we will use the softmax model, one of the simplest ones. Placeholders exist primarily to get data from outside into the neural network; this is a very common mechanism. Then of course you have variables, which, remember, are your weights and biases. In our case there are 10 neurons, and each neuron has 784 weights, because each neuron takes all the inputs. If we go back to our slide here, every neuron receives all 784 inputs: the first neuron receives all 784, the second neuron also receives all 784, and so on, and each of these inputs needs to be multiplied by a weight. That is what we are talking about here: the weights form a matrix of 784 values for each of the 10 neurons, a 784-by-10 matrix. Similarly, there are biases, and remember, I mentioned the bias is one per neuron, not one per input, unlike the weights, so there are only 10 biases, because there are only 10 neurons. That is why we create a variable for the biases as well.

Here is something a little new in TensorFlow. Unlike regular programming languages, where everything is a variable, here values can be of three different types. You have placeholders, which are primarily used for feeding data; you have variables, which can change during the course of computation; and a third type, not shown here, is constants, which are fixed numbers. In a regular programming language you may have everything as variables, or at most variables and constants, but in TensorFlow you have three different types: placeholders, variables, and constants.

Then you create what is known as a graph. TensorFlow programming consists of graphs and tensors, as I mentioned earlier: the data can ultimately be considered a tensor, and the graph describes how to execute the whole implementation.
The execution plan is stored in the form of a graph, and in this case what we are doing in the graph is a matrix multiplication. Remember, `tf` was created as our TensorFlow object, and TensorFlow has a matrix multiplication function known as matmul, so that is what is being used here: we use TensorFlow's matrix multiplication to multiply our input values x with the weights W, and then we just add the bias b, giving x*W + b. This is very similar to one of the earlier slides where we saw the sum of x_i times w_i: the matrix multiplication multiplies all the input values with their corresponding weights, and then the bias is added. That is the graph we created.

Then we need to define our loss function and our optimizer. Here again we use TensorFlow's APIs: tf.nn.softmax_cross_entropy_with_logits is the API we will use for the loss, and tf.reduce_mean averages that loss across the batch, giving the error value we want to reduce. For the optimizer, the mechanism that reduces the error, we are using the gradient descent optimizer we discussed a couple of slides earlier, and for that you need to specify the learning rate: how fast you come down the cost curve. Remember the slide we saw earlier; this again needs to be tested and tried to find the optimum level. It shouldn't be very high, in which case the optimization will not converge, and it shouldn't be very low, because then it will take very long. You define the optimizer and then call its minimize method, and that is what will kick-start the training process.

So far we have only been creating the graph. In order to actually execute that graph, we create what is known as a session and then run that session, and we specify how many iterations we want the training to run; in this case we are saying a thousand steps, which is the exit condition, an exit strategy in a way. So training will run for a thousand iterations, and once that is done we can evaluate the model using some of the techniques shown here.
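Pulling these pieces together, here is a sketch of the graph being described: placeholders, variables, the matmul, the loss, and the optimizer. It is written TF1-style via tf.compat.v1 so it runs on current TensorFlow installs; the learning rate of 0.5 is the value commonly used in this classic MNIST softmax example, so treat it as an assumption:

```python
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()          # restore the graph-and-session style

# Placeholders feed data in from outside; variables are learned.
x      = tf.placeholder(tf.float32, [None, 784])   # flattened 28x28 images
y_true = tf.placeholder(tf.float32, [None, 10])    # one-hot labels

W = tf.Variable(tf.zeros([784, 10]))  # one weight per input, per neuron
b = tf.Variable(tf.zeros([10]))       # one bias per neuron

y = tf.matmul(x, W) + b               # the graph: x*W + b (the logits)

# Loss: softmax cross entropy, averaged over the batch with reduce_mean.
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_true, logits=y))

# Gradient descent optimizer; minimize() creates the training op.
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
```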
So let us get into the code quickly and see how it works. This is our cloud environment. You can install TensorFlow on your local machine as well; I am showing this demo on our existing cloud, but there is a separate video on how to set up your TensorFlow environment, which you can watch if you want a local install, or you can go for any cloud service, for example Google Cloud, Amazon, or Cloud Labs, and run and try the code there. Okay, it has started, so we will log in.

All right, this is our deep learning tutorial code, and this is our TensorFlow environment, so let's get started. We have seen a little bit of a code walkthrough in the slides; now you will see the actual code in action. The first thing we need to do is import TensorFlow, and then we import the data, adjusting the call so that one-hot encoding is set to true, as I explained earlier, so the label values are stored appropriately. If we check the type of the data, you can see it is a Python Datasets object, and the images are stored as an array of type float32. If we check the numbers: there are 55,000 training images, then 10,000 test images, and 5,000 validation images.

Now let's take a quick look at the data itself and visualize it; we will use matplotlib for this. If we take a look at the shape (shape gives us the dimensions of the tensors, or the arrays, if you will), the training data set comes back as 55,000 by 784, and remember, 784 is nothing but 28 times 28. We can take just one image, the first image, and check its shape; obviously its size is just 784. Similarly, we can look at the data of the first image itself. This is how it shows: a large part of it will be zeros, because, as you can imagine, in the image only certain areas are written on and the rest is blank. The values are scaled, so they lie between zero and one: at certain locations there are values, and at other locations there are zeros. That is how the data is stored and loaded.

If we want to actually view the handwritten image, this is how you do it: you reshape the data, and matplotlib has a feature to show images, the function called imshow; if you pass the parameters appropriately you will be able to see the different images. I can change the index to pick which image we are looking at: at position 5000 there is a three; at position 5 there is an eight; at position 50, again an eight. By the way, if you are wondering how I am executing this code, in case you are not familiar with Jupyter notebooks: Shift+Enter executes each cell individually, and if you want to execute the entire program you can go to the menu and say Run All. Here again we can check the maximum and minimum of the pixel values; as I mentioned, the data is scaled, so the values lie between zero and one.
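For reference, the viewing code just described looks roughly like the following sketch; it assumes `mnist` is the dataset object loaded earlier with one_hot=True:

```python
import matplotlib.pyplot as plt

# Reshape the 784 flattened pixel values back into a 28x28 grid and show it.
image = mnist.train.images[5000].reshape(28, 28)
plt.imshow(image, cmap="gray")   # grayscale: 0 = black, 1 = white
plt.show()

# The pixel values are scaled, so they sit between 0.0 and 1.0.
print(mnist.train.images.min(), mnist.train.images.max())
```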
Now this is where we create our model. The first thing is to create the required placeholders and variables, and that is what we are doing here, as we saw in the slides: we create one placeholder for the input, and we create two variables for the weights and biases. These two variables are actually matrices: the weights variable has 784 by 10 values, where the 10 is one per neuron, since there are 10 neurons, and the 784 is for the pixel inputs, which is 28 times 28. The biases, as I mentioned, are one per neuron, so there will be 10 of them, stored in a variable named b. And this is the graph, which is basically the matrix multiplication of x with W, with the bias added for each of the neurons, and the whole idea is to minimize the error. Let me just execute this code.

Then we define y_true, which holds the label values. This is another placeholder; we had x as one placeholder and y_true as a second, and it will have values in the form of 10-digit arrays. Since we said one-hot encoded, the position that has the value one indicates the label for that particular number. Then we have cross entropy, which is nothing but the loss function, and the optimizer; we have chosen gradient descent as our optimizer. Then the training process itself, which is nothing but minimizing the cross entropy, which again is nothing but the loss function. We define all of this in the form of a graph.

Remember, up to here we have not actually executed any TensorFlow code; we are just preparing the graph, the execution plan. That is how TensorFlow code works, so the whole structure and format of this code will be quite different from how we normally program. Even people with programming experience may find this a little difficult to understand at first, and it needs quite a bit of practice, so you may want to watch this video a couple of times to understand the flow, because the way TensorFlow programming is done is different from normal programming. Some of you who have done, say, Spark programming to some extent will find it easier to follow, although even there the comparison is loose: in Spark the code itself is pretty straightforward and only the execution behind the scenes happens differently, whereas in TensorFlow even the code has to be written in a completely different way. The code does not get executed in the order you have written it; that is something you need to understand, and a little practice is needed.

So far, what we have done up to here is set up the variables, defined what kind of network we want to use (softmax, for example), loaded the data, viewed the data, and prepared everything, but we have not yet executed anything in TensorFlow. The next step is execution, and the first step for any execution in TensorFlow is to initialize the variables. Any time you have variables defined in your code, you have to run this piece of code: you basically create what is known as a node for initialization. Note that you are still not executing anything here; you have just created the initialization node. So let us go ahead and create that.

From here onwards is where you actually execute your code in TensorFlow, and in order to execute the code, what you need is a TensorFlow session: tf.Session() gives you a session. There are a couple of different ways to do this, but one of the most common methods is what is known as a with block: you write `with tf.Session() as sess:`, with a colon at the end marking the start of the block, and the indentation tells how far the block goes; the session is valid until the block finishes executing. That is the purpose of creating the with block. Inside it, you say sess.run(init).
sess.run will execute the node specified inside it. sess is basically an instance of the session: we said tf.Session(), so an instance of the session gets created and we call it sess, and then we run one of the nodes of the graph within it. One of the nodes here is init, so we say: run that particular node, and that is when the initialization of the variables happens. If you have any variables in your code (in our case W is a variable and b is a variable), you have to run this initialization, otherwise you will get an error. That is what this line is doing.

Then, within this with block, we specify a for loop, and we are saying we want the system to iterate for a thousand steps and perform the training; that is what this loop does: run training for a thousand iterations. What it is doing, basically, is fetching the data, these images. Remember, there are about 55,000 training images, but we cannot get all of them in one shot, because that would take up a lot of memory and there would be performance issues. So this is a very common way of performing deep learning training: you always train in batches. We may have 55,000 images, but we always process them in batches of 100, or maybe 500, depending on the size of your system and so on. In this case we are saying: get me 100 images at a time, and only the training images. Remember, we use only the training data for training, and the test data for testing. You are probably familiar with machine learning, so you may be aware of this, but in case you are not: this is not specific to deep learning. In machine learning in general you have what is known as a training data set and a test data set; you typically split your available data into two parts, use the training data set for training, and then, to see how well the model has been trained, use the test data set to check or test the validity or the accuracy of the model. That is what we are doing here.

You will observe that we are actually calling an MNIST helper function here: mnist.train.next_batch. This is the advantage of using the MNIST database, because it provides some very nice helper functions that are readily available; otherwise, we would have had to write a piece of code ourselves to fetch this data in batches, which is itself a lengthy exercise. We can avoid all that by using the MNIST database, and that is why we use it for the initial learning phase. So when we fetch, it brings the images into one variable and the labels into another, and then you use this batch of 100 images to run one training step.
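Here is a sketch of that execution phase: the init node, the session's with block, and the batched training loop. It continues the graph from the earlier sketch, so x, y_true, train_step, and the `mnist` helper object are assumed to be defined as above:

```python
init = tf.global_variables_initializer()   # node that will initialize W and b

with tf.Session() as sess:                 # the session lives for this block
    sess.run(init)                         # must run before anything else
    for step in range(1000):               # 1000 iterations is the exit condition
        # fetch the next mini-batch of 100 training images and labels
        batch_x, batch_y = mnist.train.next_batch(100)
        sess.run(train_step, feed_dict={x: batch_x, y_true: batch_y})
```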
What sess.run is doing here is running the training mechanism: it passes the batch of images through the neural network, finds the output (which, initially, will obviously be mostly wrong), and all that feedback is given back to the neural network, whereby all the W's and b's get updated, until it reaches a thousand iterations. In this case the exit criterion is a thousand steps, but you can also specify something like a target accuracy rate as the exit criterion. Essentially, the feedback given to each neuron says: this particular image was wrongly predicted, so update your weights and biases. That runs for a thousand iterations, and typically by the end of them the model will have learned to recognize these handwritten images; obviously it will not be 100% accurate.

Once that is done, after the thousand iterations, you then test the accuracy of the model using the test data set, and that is what we are trying to do here. The code may appear a little complicated if you are seeing it for the first time, since you need to understand the various TensorFlow methods, but it is basically comparing the output with what is actually there, that is all it is doing. You take your test data, find the actual value and the predicted value, and check whether they are equal (tf.equal), count how many are correct, and so on, and based on that the accuracy is calculated. That is what we are trying to see: how accurate the model is in predicting these digits.

Okay, so let us run this. The entire thing is in one cell, so we will run it in one shot; it may take a little while. Not bad: it has finished the thousand iterations, and what we see here as output is the accuracy, which is around 91%. That is pretty good for such a short exercise; within such a short time we got over 90% accuracy. However, in real life this is probably not sufficient, and there are other ways to increase the accuracy, which we will see in some of the later tutorials: how to change the hyperparameters, like the number of neurons or the number of layers, and so on, so that the accuracy can be pushed beyond 90%.
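The evaluation step being described might look like the following sketch, run inside the same session block once the training loop finishes; `y` is the logits node from the model sketch earlier:

```python
    # Compare the predicted digit with the true digit for every test image.
    correct = tf.equal(tf.argmax(y, 1), tf.argmax(y_true, 1))
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
    print(sess.run(accuracy, feed_dict={x: mnist.test.images,
                                        y_true: mnist.test.labels}))
    # For this simple softmax model the printed accuracy is around 0.91.
```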
Hello and welcome to the TensorFlow Object Detection API tutorial. In this video I will walk you through the TensorFlow code to perform object detection in a video, so let's get started. The first part is importing all the libraries we need: NumPy, imageio, datetime, PIL, matplotlib, and so on, so we import all of these. Then there are a bunch of variables holding paths for the files and folders; this is regular stuff, so let's keep moving. Then we import matplotlib and make it inline, plus a few more imports. There are some warnings here which we can just ignore; if I run the cell once again, they go away.

From here onwards we do the model preparation. We are going to use an existing neural network model rather than train a new one, because training really would take a long time and a lot of computation resources, and it is really not required: there are already models that have been trained. In this case it is SSD with MobileNet, a model trained to detect objects, readily available as open source, so we can actually use it. If you want to use other models, a few more are available; you can click on this link, and let me just take you there. You will see a lot of other pre-trained models. We chose this particular one because it is one of the faster models; it may not be the most accurate, but it is fast, while some of the others take a little longer and may be more accurate, so you can play around with them.

So we will be using that model, and this line of code imports it. It is also known as a frozen model; that is the term we use. We download and import it, and then we will actually use it in our code. With these two cells we have downloaded and imported the model, and once it is available locally we load it into our program, that is, into memory. You then need to perform a couple of additional steps, basically mapping numbers to text. As you may be aware, when we build a model and run predictions, the model will not give text; the output of the model is usually a number, so we need to map that number to a label. For example, if the network predicts that the output is 5, we know that 5 means it is an airplane, things like that. This mapping is done in the next cell.

Let's keep moving. Then we have helper code that loads the data, or rather loads the images, and transforms them into NumPy arrays. This was also used for doing object detection in still images, and we are going to reuse it, because a video is nothing but a sequence of frames, which in turn are images, so we can pretty much reuse the same code we used for detecting objects in an image. This is where the actual detection starts. Here is the path where the images are stored (once again, we are reusing the code we wrote for images), and this is the extension; that was done for about two or three images, and we will continue to use it. I'll skip this section and go down.

This is the cell where we actually load the video, convert it into frames, and then, frame by frame, detect the objects. In this code there are a few lines which, once an object is found, draw a box around each of those objects. The name of the input video file is traffic, with the extension .mp4, and we have a video reader object from the imageio library, which lets us read and write videos. The video that we are going to use is traffic.mp4.
You can use any .mp4 file, but in our case I picked a video that has cars. Let me just show you: in this object detection folder I have the .mp4 file, and I'll quickly play it. It's a little slow... okay, here we go. This is the video, a relatively short one, for this particular demo, and what our code will do once we run it is detect each of these cars and annotate them as cars. In this particular video we only have cars; later on we can try another video (I think I have one with a cat here), but let's first check with this traffic video. Let me go back.

So we will be reading the video file, and then analyzing it frame by frame. We will be reading the frames at 10 frames per second, which is the rate we specify here, analyzing each one, annotating it, and then writing it back out. You will see that we end up with a video file named something like traffic_annotated, and we will watch that annotated video. So let's go back and run through this piece of code, and then we will come back and see the annotated video. This might take a little while, so I will pause after running this particular cell and come back to show you the results. All right, let's go ahead and run it; it is running now.

It is also important that at the end you close the video writer. It is similar to a file pointer: when you open a file, you should also make sure you close it, so that it doesn't hog resources. It is very similar here: at the end of it, the last line of code should be video_writer.close(). All right, I'll pause now and see you in a little bit.
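The read-annotate-write loop being described boils down to something like this sketch. Note that `detect_and_annotate` is a hypothetical stand-in for the detection-and-box-drawing code, not a real imageio or TensorFlow function:

```python
import imageio

video_reader = imageio.get_reader("traffic.mp4")
video_writer = imageio.get_writer("traffic_annotated.mp4", fps=10)

for frame in video_reader:                 # the reader hands us one frame at a time
    annotated = detect_and_annotate(frame) # hypothetical: run detection, draw boxes
    video_writer.append_data(annotated)    # write the annotated frame back out

video_writer.close()                       # always close, like a file pointer
```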
All right, the processing is done; the hourglass has disappeared, which means the video has been processed. So let's go back and check the annotated video. In the file manager, this was the original traffic.mp4, and now we also have traffic_annotated.mp4, so let's run it and see how it looks. You can see each of these cars getting detected. Let me pause and show you: here it says car, 70%. Let it go a little further; it detects something on top. What is that? A truck? I think because of the board on top it somehow thinks there is a truck. Let's play some more and see if it detects anything else. This again looks like a car; yes, a car, with a confidence level of 69%, and this is again a car. Basically, right to the end it detects each and every car that passes by.

Now we can quickly repeat this process for another video. Let me just show you the other video, which is a cat. The cat is not really moving much; it is just standing there, staring, and moving a little, slowly, and our network will detect that this is a cat. Even when the cat moves a bit in the other direction, it will continue to detect and show that it is a cat. Okay, so that is how the original video looks; let's go ahead and change our code to analyze this one and see if our network detects the cat. I'll close this and go back to my code. All we need to do is change "traffic" to "cat"; the extension will be picked up automatically, because it is given here, and then it will run through.

Very quickly, once again, what the code is doing: video_reader has a neat little feature, an interface, whereby you can say `for frame in video_reader`, and it will provide the video frame by frame in a loop; you take each frame it gives you and analyze it as if it were an individual image. That is the way it works, and it makes this very easy to handle. All right, so let's once again run just this cell; the rest of the code remains the same. It will take a little while; the hourglass has come back, so I will pause and return in a bit.

All right, the processing is done; let's go and check the annotated video. Here we have cat_annotated.mp4; let's play it. You can see it is detecting the cat, and in the beginning you also saw it detect something else here; it looks like it detected one more object. Let's go back and see what it detected. Let's see... it is too small for me to read, but it is trying to detect something; I think it says it is a car, I don't know. In this video there is pretty much only one object, which is the cat. Let's wait a little and see if it continues to detect the cat when it turns around and moves; that is going to happen in a little bit, and there we go: in spite of turning the other way, our network is still able to detect that it is a cat. Let me freeze and show you: it actually still continues to detect it as a cat. And that's pretty much it; I think that is the only object it detects in this particular video.

Okay, so I'll close this; that's pretty much it. Thank you very much for watching this video, and have a great day. In case you have any questions, please put them below the video here and we will be more than happy to get back to you; make sure you include your email ID so that we can contact you. Thank you once again, bye-bye.
Today we're going to be covering the convolutional neural network tutorial. Do you know how deep learning recognizes the objects in an image? This particular neural network is how image recognition works; it is really one of the biggest building blocks for image recognition, and it does it using a convolutional neural network. Here we have the basic picture: a hummingbird. The pixels of the image are fed as input, so the graphic goes into the input layer; then you have all your hidden layers, and then your output layer, where one of the output neurons is going to light up and say: it's a bird. We're going to go into depth, and we'll actually go back and forth over this a number of times today, so if you're not catching everything about the image yet, don't worry; we're going to get into the details.

So, the input layer accepts the pixels of the image as input, in the form of arrays. You can see up here where each block of the bird has been labeled in different arrays; we'll dive into what that looks like and how those matrices are set up. The hidden layers carry out feature extraction by performing certain calculations and manipulations: this is the part that reorganizes the picture in multiple ways until we get data that is easy for the neural network to read. A hidden layer uses a matrix filter and performs the convolution operation to detect patterns in the image, and remember that convolution means to coil or to twist, so we're going to twist the data around, alter it, and use that operation to detect new patterns. There are multiple kinds of hidden layers: the convolution layer; the ReLU layer (pronounced "ray-loo", the rectified linear unit, which has to do with the activation function that is used); and the pooling layer, which also uses multiple filters to detect edges, corners, eyes, feathers, beak, etc. Just as the term says, pooling is pulling information together, and we'll look at that a lot closer, so if it's a little confusing now, we'll dig in deep and get you squared away. Finally, there is a fully connected layer that identifies the object in the image: the different hidden layers feed into this final area, and that is where we have one node, one neuron, that lights up and says it's a bird.

What's in it for you: we're going to cover an introduction to the CNN, what a convolutional neural network is, and how a CNN recognizes images, digging deeper into the individual layers of the convolutional neural network; finally, we'll do a use case implementation using the CNN.

We'll begin our introduction to the CNN by introducing a pioneer of convolutional neural networks, Yann LeCun. He has been the director of Facebook's AI Research group, and he built the first convolutional neural network, called LeNet, in 1988, so these have been around for a while and have had a chance to mature over the years. It was used for character recognition tasks like reading zip code digits; imagine processing mail and automating that process. A CNN is a feed-forward neural network that is generally used to analyze visual images by processing data with a grid-like topology; a CNN is also known as a ConvNet. Very key here is that we are looking at images; that is what this architecture was designed for, and you'll see the different layers as we dig in.
Since we're using TensorFlow and Keras in our code later on, you'll see that some of these layers also appear in a lot of other neural network frameworks, but in this case they are very central to processing images, in a way that captures multiple features and really drills down into them.

In this example here you see flowers of two varieties, an orchid and a rose. I think the orchid is more dainty and beautiful, and the rose smells quite beautiful; I have a couple of rose bushes in my yard. They go into the input layer; that data is then sent to all the different nodes in the next layer, one of the hidden layers, which, based on its different weights and its setup, passes out new values. Those values are then multiplied by their weights and go to the next hidden layer, and so on, until you reach the output layer, where one node comes out and says it's an orchid and the other comes out and says it's a rose, depending on how well the network was trained.

What separates the CNN from other neural networks is the convolution operation, which forms the basis of any convolutional neural network. In a CNN, every image is represented in the form of arrays of pixel values. Here we have a real image of the digit 8, which then gets put into its pixel-value representation in the form of an array, in this case a two-dimensional array. In the final form, we transform the digit 8 into a representation of zeros and ones, where the ones represent the black part of the eight and the zeros represent the white background.

To understand how the convolution operation works, we're going to take a side step and look at matrices. To simplify, we'll take two matrices, A and B, of one dimension; separate this from the images for a moment and focus just on the matrix aspect, and then we'll bring it back together and see what it looks like when we put the pieces of the convolution operation together. We've set up two arrays: A = [5, 3, 2, 5, 9, 7] and B = [1, 2, 3]. In the convolution, we start by multiplying them element-wise over the first window of A: 5×1 = 5, 3×2 = 6, and 2×3 = 6. Since the two arrays aren't the same size, we just match B against the first three elements of A. That may seem a little confusing, but remember, a computer gets to repeat this process hundreds of times, so we're not just forgetting the other numbers; we'll bring them back in shortly. Then we take the sum of the products: 5 + 6 + 6 = 17, so the very first digit in our A*B matrix is 17. And as I said, we don't forget the other digits: we move one position over, take the next window, [3, 2, 5], and multiply it by B, so 3×1 = 3, 2×2 = 4, 5×3 = 15, and summing them gives 22, the second digit of our A*B product. We continue the same way with the remaining windows, [2, 5, 9] and then [5, 9, 7], so with this short matrix B we've now covered every stretch of A that matches the three elements of B, and A*B works out to [17, 22, 39, 44].
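That whole computation fits in a few lines of NumPy; the values of A are as reconstructed from the worked example above:

```python
import numpy as np

A = np.array([5, 3, 2, 5, 9, 7])
B = np.array([1, 2, 3])

# Slide B across A: multiply each window element-wise and sum the products.
result = [int(np.dot(A[i:i + len(B)], B)) for i in range(len(A) - len(B) + 1)]
print(result)   # [17, 22, 39, 44]
```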
In a little bit we're going to cover where we use this math, this multiplying of matrices, and how it works, but it's important to understand that we are going through the matrix and multiplying its different parts, matching the smaller matrix against the larger one. Where a lot of people get lost is: what's going on here with these matrices, oh, scary math! Not really that scary when you break it down. We're looking at a section of A and comparing it to B, and bringing the value down into one matrix, A*B; when you break it down in your mind like that, you realize we're just taking these two matrices, comparing them, and reducing the information in a way that will help the computer see different aspects of it.

Let's go ahead and flip back over to our images, going to the most basic two-dimensional image you can get. Consider the following two images: the image for the backslash symbol (when you press backslash, this image is processed), and the image for the forward slash, which is its mirror opposite. Very basic: we have four pixels going in; it can't get any more basic than that. Here we have a little more complicated picture: we take a real image of a smiley face and represent it in the form of black and white pixels, so if this were an image in the computer, it would be black and white, and just as before we convert it into zeros and ones. Where the first example was just a matrix of four dots, we now have a significantly larger image coming in; don't worry, we're going to bring this all together in just a little bit.

Now, the layers in a convolutional neural network. We have our convolution layer, which really is the central aspect of processing images in the convolutional neural network; that's why we have it. That feeds into the ReLU layer, which, as we talked about, is the rectified linear unit; the ReLU is how that layer is activated, the math behind what makes the neurons fire, and you'll see it in a lot of other neural networks too. (With large amounts of incoming data you'll often see other machinery as well, such as the Adam optimizer; here, because we're processing a small amount of data per image, ReLU works great.) Then you have the pooling layer, where you're pulling the data together; pooling is the standard neural network term, though I like to use the word "reduce", so if you're coming from the map-and-reduce side, you'll see that we're mapping all this data through all these layers and then reducing it, pulling it together. And finally we have the fully connected layer, which is where our output comes out.

So we've started to look at matrices, we've started to look at the convolution layer and where it fits in, and we've taken a look at images; now we're going to focus more on the convolution layer, since this is a convolutional neural network. A convolution layer has a number of filters that perform the convolution operation, and every image is considered as a matrix of pixel values. Consider the following 5×5 image whose pixel values are only zeros and ones.
Now obviously, when we're dealing with color there are all kinds of things that come into the processing, but we want to keep it simple and just keep it black and white. So we have our image pixels, and we slide the filter matrix over the image, computing the dot product, to detect patterns. Right here you're going to ask: where does this filter come from? This is a bit confusing, because the filter is derived later on; we build the filters when we program and train our model, so you don't need to worry about what the filter actually is. What you do need to understand about how a convolution layer works is what the filter is doing. And you'll have many filters, not just one; you'll have lots of filters looking for different aspects, so one filter might be looking just for edges, another for different parts. We'll cover that in a little more detail in a minute; right now we're just focusing on how the filter works as a matrix.

Remember, earlier we talked about multiplying matrices together. Here we have our two-dimensional matrix, and you can see we take the filter and multiply it against the upper-left region of the image: 1×1, 1×0, 1×1, and so on. We multiply those all together, then sum them, and we end up with a convolved feature of 4. We keep sliding the filter matrix over the image, computing the dot product at each position to detect patterns: slide it over one notch, compute the next value, and so on, all the way through, until we have a new matrix. This new, smaller matrix is the reduced image as seen through that filter: whatever that filter is filtering for, the output looks at just those features, reduced down to a smaller matrix.

Once the feature maps are extracted, the next step is to move them to the ReLU layer. The ReLU layer first performs an element-wise operation: for each map coming in, if there are negative pixels, it sets all the negative pixels to zero. You can see this in the nice graph, where the negatives are zeroed out and you keep any value that goes from zero up to whatever comes out of the matrix. This introduces nonlinearity to the network. Up until now everything has been linear, in the sense that a feature has a value on a line. Say the feature is the edge of the beak, or the backslash we saw; it might have a value from -10 to 10. A value of 1 says "this might be a beak, it might be an edge"; a value of -5 says "no, we're not even going to look at it", so it gets clamped to zero. So we end up with an output, a rectified feature map, and remember, we're not just running one filter on this image, we're running a number of filters, so we end up with rectified feature maps that show just the features coming through and how they weigh in from our filters.

Here we have an input: it looks like a toucan, a very exotic bird. The real image is scanned through multiple convolution and ReLU layers for locating features; you can see up here it has been turned into a black-and-white image, and in this case we're looking for a feature in the upper right-hand corner. That box scans over the image, and a lot of the time it doesn't scan one pixel at a time; it will skip by two, three, or four pixels to speed up the process. That's one way you can compensate if you don't have enough computational resources for large images.
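Here is the 2-D version of that sliding computation, with ReLU applied afterward. The 5×5 image and the 3×3 cross-shaped filter are the commonly used slide values for this example, so treat the exact numbers as assumptions:

```python
import numpy as np

image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]])
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])

# Slide the 3x3 filter over the 5x5 image (stride 1), taking a dot product
# at each position; a 5x5 image and a 3x3 filter leave a 3x3 feature map.
feature_map = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        feature_map[i, j] = np.sum(image[i:i + 3, j:j + 3] * kernel)

rectified = np.maximum(feature_map, 0)   # ReLU: negative values become zero
print(feature_map[0, 0])                 # 4.0, the convolved feature from above
```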
And it's not just one filter slowly going across the image; multiple filters have been programmed in, so you're looking at a lot of different filters going over the different aspects of the image, sliding across and each forming a new matrix. One more aspect to note about the ReLU layer: we don't just have one ReLU coming in either; not only do we have multiple features going through, but we're generating multiple ReLU layers for locating the features. That's very important to note: we have quite a bundle, multiple filters and multiple ReLUs, which brings us to the next step in the forward propagation: the pooling layer.

The rectified feature map now goes through a pooling layer. Pooling is a down-sampling operation that reduces the dimensionality of the feature map; that's all we're trying to do. We're trying to take a huge amount of information and reduce it down toward a single answer: this is a specific kind of bird, this is an iris, this is a rose. So you have a rectified feature map coming in, and we apply max pooling with a 2×2 filter and a stride of two. If you remember, I talked earlier about not going one pixel at a time; that's where the stride comes in. Instead of moving over by one each time and looking at every possible window, we go by two, skipping every other pixel, and we end up with a 2×2 pooled feature map. In this small example that means 16 values are pooled down to 4, so we're continually filtering and reducing our data to get to something we can manage, and over here you can see the maxima: 3, 4, 1, and 2.

In max pooling we're looking for the max value, which is a little different from what we were doing before: coming from the rectified feature map, we find the max value in each window and pool those features together. Instead of thinking of this as a map of the image, think of it as how valuable a feature is in that area, how much feature value we have, and we just want the best, the maximum, feature for that area. Maybe one piece of the beak filter says "I see a 1 in this patch", then it skips over and says "I see a 3 in this one", and another patch rates a 4. We don't want to sum them, because then five patches of 1 would add up to 5, while four 0s and one 10 would score similarly, yet that 10 says "this is definitely a beak" where the 1s say "probably not a beak". A slightly strange analogy, since we're looking at a bird, but you can see how the pooled feature map comes down: we're just looking for the max value in each of those matrices. The pooling layer uses the different filters to identify different parts of the image, like edges, corners, body, feathers, eyes, beak, etc. I know I focused mainly on the beak, but obviously each feature could be a different part of the bird coming in.
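As a sketch, max pooling a 4×4 rectified feature map with a 2×2 window and a stride of two looks like this; the input values are made up, arranged so the block maxima come out to the 3, 4, 1, 2 mentioned above:

```python
import numpy as np

# A 4x4 rectified feature map (made-up values).
fmap = np.array([[1, 3, 2, 4],
                 [0, 1, 2, 1],
                 [1, 0, 2, 1],
                 [0, 1, 1, 2]])

# Carve the map into 2x2 blocks (stride 2) and keep each block's maximum.
pooled = fmap.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)   # [[3 4]
                #  [1 2]]  (16 values pooled down to 4)
```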
So let's take a look at the structure of a convolutional neural network so far. This is where we're at right now: we have our input image coming in, and then we use our filters, multiple filters being developed to twist and change that data. We multiply the matrices: we take that little filter, maybe a 2×2, and multiply it by each piece of the image (and if we step by two, then by every other piece of the image). That generates multiple convolution layers, so we have a number of convolution layers set up, looking at the data. We then take those convolution layers and run them through the ReLU setup, and once we've done that we have multiple ReLU layers going on; then we take those layers and pool them, so now we have the pooling layers, with multiple poolings going on.

Up until this point we may be dealing with multiple dimensions. Some data setups that aren't images can have three, four, five, six, seven dimensions; right now we're looking at 2-D image dimensions coming into the pooling layer. So the next step is to reduce those dimensions, to flatten them. Flattening is the process of converting all of the resultant two-dimensional arrays from the pooled feature maps into a single, long, continuous linear vector. Over here you see a pooled feature map, maybe that's the bird's wing, with the values 6, 8, 4, 7; we flatten it out into the vector [6, 8, 4, 7], and we find that we don't just flatten each pooled feature map individually, we string all of them together into one long linear vector. So now we've gone through the convolutional part of the network: all we've done is take all those different pooling layers, flatten them out, and combine them into a single linear vector going into the next setup.

After we've done the flattening, a quick note, because we've covered so much: once you get to that step, you might look at it and think, boy, that looks like the normal input to most neural networks, and you're correct, it is. Once we have the flattened matrix from the pooling layer, that becomes our input: the pooled, flattened output is fed as input to the fully connected layer to classify the image. As our flattened matrix comes in (the pixels from the flattened matrix fed as input, back to our toucan, or whatever kind of bird that is; I need one of these networks to identify what kind of bird it is), it goes into our forward propagation network, with the different weights coming down across the layers, and finally it selects: that's a bird, and not a dog or a cat. The final layer there, in red, is our output layer, the one that says bird, cat, or dog.
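Since the whole stack has now been described (convolution plus ReLU, pooling, flatten, fully connected, output), here is what it looks like assembled in Keras, which the code later in this tutorial uses. The layer counts and sizes here are illustrative assumptions, not the exact ones from the demo:

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu",
                  input_shape=(32, 32, 3)),      # convolution + ReLU
    layers.MaxPooling2D((2, 2)),                 # 2x2 max pooling, stride 2
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),                            # pooled maps -> one long vector
    layers.Dense(64, activation="relu"),         # fully connected layer
    layers.Dense(10, activation="softmax"),      # one output per class
])
model.summary()
```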
After the flattening, a quick recap, because we've covered so much. The structure of the network so far: convolution, where we filter the image by multiplying matrices, giving us the convolutional layers; ReLU to compute the values going out into the pooling; numerous convolution layers creating numerous pooling layers, where pooling keeps the max value, the best value, to send forward; and then everything from the pooling layers flattened and combined into a single input for the final layer. Once you get to that step, you might look at it and think, 'that looks like the normal input to most neural networks,' and you're correct: it is.

Once we have the flattened matrix from the pooling layers, that becomes our input: it is fed into a fully connected layer to classify the image. You can see the pixels from the flattened matrix fed in, back to our toucan, or whatever kind of bird that is (I'd need one of these networks to identify it). It goes into the forward-propagation network with the different weights on each connection, and finally it selects that this is a bird, not a dog or a cat, even though the diagram isn't labeled. The final layer there in red is our output layer, the one that says bird, cat, or dog.

One more recap of everything so far: the input image is filtered, the two matrices multiplied for all the filters, to create our convolution layers, and there are multiple layers in there because we're building layers off the different filters. Those go through the ReLU activation, which feeds the pooling, and the pooling looks for the best, the max value, coming in from the convolution. We take that layer and flatten it, it goes into our fully connected neural network, and then to the output. Here we can see the entire process of how the CNN recognizes a bird, which is kind of nice because it shows the little pixels and where they go: the filter generating the convolution network, that filter showing up in the bottom part of the network, the ReLU feeding the pooling, the pooling finding the best values, and so on, all the way to the fully connected layer at the end, the classification and output layer. So there's a classification network at the end.

We've covered a lot of theory up to now, and each of these steps has to be broken down in code. Putting that together can be a little complicated, not because each step is overly complicated, but because we have five different steps with substeps inside them, so we're going to break it down and walk through it in code. In our use case implementation of the CNN, we'll be using the CIFAR-10 dataset from the Canadian Institute for Advanced Research to classify images across 10 categories. Unfortunately it won't tell us whether a bird is a toucan or some other kind of bird, but we do get to find out whether it can categorize an image as a ship, frog, deer, bird, airplane, automobile, cat, dog, horse, or truck. And if you've seen anything in the news about self-driving cars, you can see why this kind of processing is so important in today's world and so cutting-edge in commercial deployment. This is really cool stuff, we're starting to see it just about everywhere in industry, so it's a great time to be playing with it and figuring it all out. Let's dive in and see what it looks like when we actually write our script.

Before we go on, one more quick look at what we have: the keys of data_batch_1. Remember that in a Jupyter notebook I can get by without a print statement; if I put a variable at the end of a cell, it just displays the variable. Since data_batch_1 is a dictionary, you can see its keys: the batch label, the labels, the data, and the file names. That shows how the dataset is broken up.
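For reference, this is roughly how a CIFAR-10 batch file is unpickled; it follows the loading recipe published with the dataset, and the file path is an assumption about where you extracted the archive:

```python
import pickle

def unpickle(file):
    # Each CIFAR-10 batch file is a pickled dictionary.
    with open(file, 'rb') as fo:
        return pickle.load(fo, encoding='bytes')

data_batch_1 = unpickle('cifar-10-batches-py/data_batch_1')
print(data_batch_1.keys())
# dict_keys([b'batch_label', b'labels', b'data', b'filenames'])
```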
For the next step, step four as we're calling it, we want to display an image using the matplotlib library. There are many ways to display the images, but matplotlib is really good for this, and we'll also do our first reshape of the data so you get a glimpse of what that means. We start by importing matplotlib.pyplot as plt; if you remember, pyplot is like a canvas we paint onto. There's my percent-sign %matplotlib inline command so it shows up in my notebook, and of course we import numpy as np for our numeric Python arrays. Let's set X equal to the data in data_batch_1, which pulls all the image data into X. Because this is just one long stream of binary data, we need to do a little reshaping.

We have 10,000 images, which looks correct, and here's an interesting thing; it took me a little while, and I had to go research it myself to figure out what's going on with this data. Each picture is 32x32 and in color, so there are three color channels. I don't know why the data is laid out this way; it probably has to do with how it was originally encoded, since most formats put the three channels last. What we do is take the data, one long stream of values, break it into 10,000 pieces, break each of those into three pieces, and each of those three is 32 by 32. You could look at it like an old-fashioned projector with a red, a green, and a blue projector added together, where each one is a 32x32 plane; that's probably how it was originally formatted. Conventions have changed, so we transpose the array to move the three, the channel axis, to the end. So the first part reshapes the data from a single line of values into 10,000 by 3 by 32 by 32, and then the transpose moves the color channels to the last place, giving us the image index, then 32x32, then the three color values, the way we process images now.

Then comes astype, and this is really important: we're going to use an unsigned 8-bit integer. You'll see a lot of examples try this with a float or float64, but remember that a float uses a lot of memory: once you switch away from 8-bit integers, the amount of RAM this loads goes way up. You can try the other types and see what happens if you have a lot of RAM on your computer, but for this exercise uint8 works just fine. Let's run that; now our X variable is loaded with all the images from data_batch_1. And just to show what we were talking about with astype, if we take X[0] and look for its max value (oops, I said 128 earlier, it's 255), you'll see it doesn't go over 255. We're keeping each value down to a single byte, 0 to 255, versus a float value, which would blow the size up enormously.
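Here is a sketch of that reshape, transpose, and cast, assuming X starts as the raw data array from the batch dictionary:

```python
import numpy as np

X = data_batch_1[b'data']  # shape (10000, 3072): one long row of bytes per image

# Break each row into 3 color planes of 32x32, then move the channel
# axis to the end so the shape becomes (10000, 32, 32, 3).
X = X.reshape(10000, 3, 32, 32).transpose(0, 2, 3, 1).astype('uint8')

print(X.shape)     # (10000, 32, 32, 3)
print(X[0].max())  # never above 255, since uint8 holds 0..255
```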
Since we're using matplotlib, we can take our canvas and just call plt.imshow (im for image, show) and look at what X[0] is. I'm not sure what that is, but you can see it's a very low-resolution image, broken down to minimal pixels. If we do the same for index 1, hopefully it's a little easier to see; let's hit run on that, and it's probably a semi truck, a good guess. Instead of typing the same line over and over, I can just go back up: index 3 looks like a dump truck unloading, and so on. You can look at any of the 10,000 images. Jump to 55 and there's some kind of animal looking at us, probably a dog, and just for fun, one more: a nice car for image number four. So we can page through all the different images easily; they've been reshaped to fit our view and the format matplotlib uses.

The next step is to start creating some helper functions. We'll start with a one-hot encoder to help us process the data: remember that labels can't stay as raw category numbers, so we have to convert them, and one-hot encoding is how. Then we'll create a CifarHelper class, with an init and a setup for the images, and run that code so you can see what it looks like; after that we get to the fun part, where we actually create our model, our actual neural network model.

So let's create our own one-hot encoder. It takes a vector coming in and vals equal to 10, meaning there are 10 possible labels, and it returns an output array. We don't treat the labels as plain numbers, because a car isn't 'one more than' a horse; it would be bizarre to have horse equal 0, car equal 1, plane equal 2, cat equal 3, because then what would a cat plus a car equal? Instead we create a numpy array of zeros with 10 slots, and each slot is 0 or 1: in one column, 1 means it's a cat and 0 means it's not; in the next, 1 means it's a car and 0 means it's not. So instead of one output with a value from 0 to 9, you have 10 outputs, each 0 or 1. That's what the one-hot encoder does, and we'll use it in code in just a minute.
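A minimal version of that encoder; the function name is illustrative:

```python
import numpy as np

def one_hot_encode(vec, vals=10):
    """Turn integer labels (0..9) into rows of 10 zeros with a single 1."""
    n = len(vec)
    out = np.zeros((n, vals))
    out[range(n), vec] = 1  # set the column matching each label to 1
    return out

print(one_hot_encode([0, 3]))
# [[1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
#  [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]]
```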
Now for the next helpers. We have a few of these helper functions to build, and when you're working with a complicated Python project, dividing it into separate definitions and classes is very important; otherwise it becomes really ungainly to work with. The next helper is a class, and there's a lot in it, so we'll break it down; let me add a blank line or two to make it more readable. We create the CifarHelper class and start by initializing it.

There's a lot going on in the init, so let's start with self.i equals zero. That will come into play a little later, so we'll come back to it. Next we initialize our training batches: when we loaded the archive there was a meta batch, which we don't need, but we do need data_batch_1 through data_batch_5, and we do not want the testing batch mixed in with them. So self.all_train_batches is a list of those five, and the test batch goes separately into self.test_batch. We also initialize the training images and training labels, and the test images and test labels; this just sets up those variables.

Then we create another definition, which sets up the images. We could have made this all part of the init, since it's all just setup, but breaking it up makes it easier to read, and easier to see what's going on as the different pieces execute, with a nice print statement saying, hey, we're now running this step. In here we set self.training_images to a numpy vstack over the data arrays, d[b"data"] for d in self.all_train_batches, which points right back to the five batches above. They're not files anymore; data_batch_1 points to the actual data, so self.training_images stacks them all into one numpy array. It's also nice to compute the training length, the total number of training images. Then we take self.training_images (let me switch marker colors, I'm getting a little heavy with the markers up here), and this should look familiar. Where did we see this? When we wanted to look at the images in matplotlib above, we had to reshape the data, and we're doing the same thing here: based on the training length, the total number of images since we stacked them all together into one large array, we view each image as our three projectors, each displaying 32 by 32.
Then we transpose that around so the image index stays in the same place, followed by 32x32, and then our three color values at the end. And of course we divide by 255, which brings all the data into the 0-to-1 range; so we end up with a 0-to-1 array of all the pictures, 32x32x3. Next we take self.training_labels and pump them through the one-hot encoder we just made, stacking the labels from all the batches together; again, instead of horse equals 1 and dog equals 2 (so that horse plus dog would equal 3, which would be cat, which makes no sense), each label becomes an array of 10 values that are each 0 or 1.

Then we set up the test images and labels, and it's exactly what we just did with the training set: we stack the image arrays, get their length so we know how many images there are (you could certainly count by hand, but it's nice to let the computer do it, especially if the data ever changes on the other end and you're using other data), and again we reshape, transpose, and one-hot encode, so the test images end up in the same format. So now we have a definition that sets up all our images.

The next step is batching, the next_batch method. Let me do another breakout here, because batches are really important to understand; they threw me for a little loop when I started working with TensorFlow and Keras. We have our data coming in, 10,000 photos per batch file, and we don't want all of them at once, so we break them into batch sizes; remember the overall shape is the number of photos by 32 by 32 by 3. In this case we'll use batches of 100, so we want just the first 100 photos to start. And remember we set self.i equal to zero: x takes the images from self.i up to self.i plus the batch size, which we set to 100, so just the first 100 images, and then we reshape x to 100x32x32x3. We've already formatted the data as 32x32x3; this just makes sure x has the data in the correct order and shape. The y, just like the x, is our labels: the training labels from self.i to self.i plus the batch size, because self.i keeps changing. Finally we increment self.i by the batch size, so the next time we call next_batch we get the next hundred. So we return x and y, x being the photograph data and y being the label, one-hot encoded of course.
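Putting that together, here is a condensed sketch of the helper class as described, assuming data_batch_1 through data_batch_5, test_batch, and one_hot_encode already exist in the notebook:

```python
import numpy as np

class CifarHelper:
    def __init__(self):
        self.i = 0  # cursor for batching; incremented by next_batch
        self.all_train_batches = [data_batch_1, data_batch_2, data_batch_3,
                                  data_batch_4, data_batch_5]
        self.test_batch = [test_batch]
        self.training_images = None
        self.training_labels = None
        self.test_images = None
        self.test_labels = None

    def set_up_images(self):
        print("Setting up training images and labels")
        self.training_images = np.vstack([d[b"data"] for d in self.all_train_batches])
        train_len = len(self.training_images)
        # Reshape to (N, 3, 32, 32), move channels last, scale to 0..1.
        self.training_images = self.training_images.reshape(
            train_len, 3, 32, 32).transpose(0, 2, 3, 1) / 255
        self.training_labels = one_hot_encode(
            np.hstack([d[b"labels"] for d in self.all_train_batches]), 10)

        print("Setting up test images and labels")
        self.test_images = np.vstack([d[b"data"] for d in self.test_batch])
        test_len = len(self.test_images)
        self.test_images = self.test_images.reshape(
            test_len, 3, 32, 32).transpose(0, 2, 3, 1) / 255
        self.test_labels = one_hot_encode(
            np.hstack([d[b"labels"] for d in self.test_batch]), 10)

    def next_batch(self, batch_size):
        # Slice out the next batch_size images and labels, then advance.
        x = self.training_images[self.i:self.i + batch_size].reshape(
            batch_size, 32, 32, 3)
        y = self.training_labels[self.i:self.i + batch_size]
        self.i = (self.i + batch_size) % len(self.training_images)
        return x, y
```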
If you remember, if horse were label zero, the encoding would have a 1 in the zero position, since this is the horse, and everything else in the array would be zero; let me just draw lines through there, and there's our array, a little hard to see. So let's finish loading this. Armed with all this setup, we create a variable ch holding a CifarHelper, and then we call ch.set_up_images(). Again, we could have put all of set_up_images under the init, but breaking it into two parts makes it much more readable, and if you're doing other work there are reasons to separate the setup like that. Let's run it, and you can see it prints 'setting up training images and labels' and 'setting up test images'; that's one of the reasons we broke it up, so that while you're testing you have print statements telling you what's going on, which is really nice. They did a good job with this setup; I like the way it's broken up on the back end. One quick note: to pull the next batch, we run batch = ch.next_batch(100), because we're using a batch size of 100. We'll come back to that; just remember it's part of the code we'll be using in a minute.

Now we're ready to create our model. First we import tensorflow as tf; I'll run that so it's loaded. You can see we got a warning here because TensorFlow keeps growing: they're deprecating how one of the float64 values is treated as np.float64. It's nothing to worry about, because it doesn't affect what we're working on; we've set all our data to 0-to-1 values. Do keep in mind that the 0-to-1 data we converted from 0-to-255 is still a float, but it works with either numpy float64 or the numpy float dtype, so the deprecation doesn't affect our code as written.

In TensorFlow (let me increase the font size for a moment so you can get a better view of what we're typing), we set a couple of placeholders. We set x equal to tf.placeholder with tf.float32; we just talked about float64 versus numpy floats, and float32 has more than enough significant digits for what we're working with. Since it's a placeholder, we set the shape to [None, 32, 32, 3]: None because at this point we're just holding the place for however many images each batch brings in, and 32x32x3 because that's what we reshaped our data to fit. Then y_true is a tf.float32 placeholder with shape [None, 10], where 10 is the 10 different labels, an array of 10. And we create one more placeholder, hold_prob, for the hold probability; it doesn't need a shape. This placeholder is for what we call dropout: if you remember from the theory, we drop out a fraction of the nodes during training, which helps cut down on overfitting, so we need a placeholder for that as well.
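Those placeholders in TensorFlow 1.x syntax, as used in this walkthrough:

```python
import tensorflow as tf

# Input images: batch size unknown (None), each image 32x32 with 3 channels.
x = tf.placeholder(tf.float32, shape=[None, 32, 32, 3])
# True labels: one-hot vectors of length 10.
y_true = tf.placeholder(tf.float32, shape=[None, 10])
# Probability of keeping a node during dropout (no fixed shape needed).
hold_prob = tf.placeholder(tf.float32)
```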
We'll run that so all three placeholders are loaded. When you use Keras, some of this happens automatically, but we're in TensorFlow directly; Keras sits on top of TensorFlow. Next we create some more helper functions: something to initialize the weights, something to initialize the bias (remember, each layer has a bias going in), our conv2d and our max pooling, so we have our pooling layer and our convolutional layer, and then our normal fully connected layer. We'll put those all into definitions and see what they look like in code. You can also grab some of these helpers from the MNIST examples that come with TensorFlow; a lot of them are already in there, but we're going to write our own.

We create init_weights, and one reason we do this by hand is so you start thinking about what goes on in the back end: even though there are ways to automate it, sometimes these have to be tweaked and you have to put in your own setup. We won't tweak them here, just recreate them for our code. For the weights, the shape comes in and random numbers come out: we initialize a random truncated normal distribution based on the shape, with a standard deviation of 0.1, wrapped in a tf.Variable. That's all the weights are. You might change that, use a higher standard deviation, or in some cases load preset weights, though that's pretty rare and usually for testing how weights configure against another model. We also need to initialize the bias, as a constant, in this case 0.1; a lot of times the bias is just set to 1, with the weights added on top, but we'll use 0.1.

Then we want conv2d, which returns a 2D convolution; this will be a building block of a layer. What's going on in conv2d is that our data comes in and gets filtered with strides. If you remember, strides work like this: here's our image, we look at one patch, and with a stride of one we slide over and look at the next patch, continuing across all the filter positions. The other thing to note is that our data comes in as 32 by 32 by 3, and the convolution works across those three channels together with the 32x32 spatial layout, so this is a very important function: it's reducing our data down and it connects directly into the convolutional layer. You have your pre-formatting and setup in conv2d, and then the actual convolutional layer that goes through on top of it. In the convolutional layer we call init_weights with the filter shape and init_bias sized to the output channels, because each output feature needs its own bias; and there are our three input channels again in the first layer's shape.
Then we return tf.nn.relu of the conv2d result plus the bias: the ReLU activation applied to conv2d(input, W) + b. So the convolutional layer has conv2d feeding into it, and its input is the filtered data plus the bias. That's quite a mouthful, but these two functions are the keys to creating the convolutional layers: conv2d coming in, and the convolutional layer that steps through and creates all those filters we saw.

Then of course we have our pooling, because after each pass through the convolutional layer we want to pool the data. Let me clear all my marks, it's getting a little crazy, and in fact let's jump back to that slide for a second. We have our image coming in, and we create our convolutional layer with all the filters; a filter comes in and looks at, say, these four boxes, and then with a step of two it moves to the next four boxes, and so on. The convolutional layers we generate use the ReLU function; there are other activation functions out there, but ReLU is the one that works best, at least so far, and I'm sure that will change. Then we have our pooling, and if you remember, the pooling takes the max: if the filter outputs were a one here, a two here, another one here, and a three here, three is the max, so that region contributes a three to the pooled array, and if the max in the next region is two, that's what goes into the pooling there. Again, we're reducing the data down as small as we can, and finally we'll flatten it into a single array that goes into our fully connected layer.

You can see that here in the code, where we create our normal full layer. At some point the pooling output goes through a flattening process and is fed into the fully connected layers. We have the input size: you'll see input_layer.get_shape(), which just gets the shape of whatever comes in, and the initial weights are sized from that input size. And make sure you init the bias, always put your bias on there, sized to the output. The function returns tf.matmul(input_layer, W) + b: just a normal fully connected layer, which is exactly what that return line means.
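Collected together, here is a sketch of those helper definitions in TensorFlow 1.x, consistent with the walkthrough rather than a verbatim copy of the notebook:

```python
def init_weights(shape):
    # Random starting weights from a truncated normal, stddev 0.1.
    return tf.Variable(tf.truncated_normal(shape, stddev=0.1))

def init_bias(shape):
    # Bias initialized to a small constant rather than 1.
    return tf.Variable(tf.constant(0.1, shape=shape))

def conv2d(x, W):
    # 2D convolution, stride 1 in every direction, zero-padded ('SAME').
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2by2(x):
    # 2x2 max pooling with stride 2: keep the strongest response per window.
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                          strides=[1, 2, 2, 1], padding='SAME')

def convolutional_layer(input_x, shape):
    # shape = [filter_h, filter_w, channels_in, channels_out]
    W = init_weights(shape)
    b = init_bias([shape[3]])
    return tf.nn.relu(conv2d(input_x, W) + b)

def normal_full_layer(input_layer, size):
    # Weight matrix sized from whatever shape comes in.
    input_size = int(input_layer.get_shape()[1])
    W = init_weights([input_size, size])
    b = init_bias([size])
    return tf.matmul(input_layer, W) + b
```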
That was a lot of steps, so let's run it so the helpers are all loaded, and now that the heavy lifting is done we get to do the easy part: creating the layers. We'll create two convolutional layers, then flatten, with a reshape of the pooling output, and then our full layer at the end.

Let's start with the first convolutional layer. Let me run it real quick, and I want you to notice the 3 and the 32 here: this is important, because coming into this layer we have three color channels, and we're producing 32 features. The 4 and 4, which you can play with, is your filter size; remember, the filter steps over the image and filters it according to your stride. For this particular setup and this image size, 4x4 works just fine. Once the convolutional layer is set up, you also need to pool it, and you'll see the pooling infers its shape from whatever comes in: here we have max pooling 2 by 2, and we feed in convo_1, the convolutional layer we just created. And convo_1, as you can see, takes x, the placeholder from above, so it knows to look at the input and set the data up accordingly so everything matches. I think I already ran this; let me run it again.

If we're going to do one layer, let's do a second: convo_2, also a convolutional layer, and you'll see we feed in convo_1_pooling. So it goes from convo_1 into convo_1_pooling, from convo_1_pooling into convo_2, and from convo_2 into convo_2_pooling. We'll run that so these variables are all loaded into memory.

For the flattening layer: we have 64 features coming out of convo_2, and after the 4x4 filters and two poolings the image is down to 8x8, so we do 8 x 8 x 64, which is 4,096; that's how many values come through on the flat layer. We reshape convo_2_pooling into a single layer 4,096 long; that's what that means there. Run it, and now we've created the variable convo_2_flat. Then comes our first full layer, the start of the final neural network, with the flat layer going in, again using ReLU for the activation. Notice we call normal_full_layer, the definition we created; its input comes straight from convo_2_flat, which tells it how big the incoming data is, and we set the output size to 1,024, which is how big the layer coming out is. We'll run this, and now we have full_layer_one.
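The layer assembly as a sketch; the sizes follow the walkthrough (4x4 filters, 32 then 64 features, two 2x2 poolings taking 32x32 down to 8x8):

```python
# First convolutional layer: 3 channels in, 32 feature maps out.
convo_1 = convolutional_layer(x, shape=[4, 4, 3, 32])
convo_1_pooling = max_pool_2by2(convo_1)

# Second convolutional layer: 32 feature maps in, 64 out.
convo_2 = convolutional_layer(convo_1_pooling, shape=[4, 4, 32, 64])
convo_2_pooling = max_pool_2by2(convo_2)

# Two 2x2 poolings reduce 32x32 to 8x8, so flatten to 8*8*64 = 4096 values.
convo_2_flat = tf.reshape(convo_2_pooling, [-1, 8 * 8 * 64])

# Fully connected layer with 1024 units and ReLU activation.
full_layer_one = tf.nn.relu(normal_full_layer(convo_2_flat, 1024))
```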
With full_layer_one we also want to define full_one_dropout to go with it: full_layer_one comes in, with keep_prob equal to hold_prob, the placeholder we created earlier. During training this means we're not training every weight on every pass, only a percentage of them each time, which helps cut down on overfitting. Let me run that. Finally we create y_pred, which is a normal full layer over full_one_dropout with 10 outputs, because we have 10 labels. In this network we could have added additional layers; that's another option to play with, and instead of 1,024 you can use other numbers for what comes out going into the next stage. We're only doing the one layer and the one dropout, but you can see that if we did another layer it would be really easy: feed full_one_dropout into full_layer_two, give full_layer_two its own dropout, and then switch that in for the y prediction. For right now this is great; this particular dataset is tried and true, and we know this will work on it. If we just type y_pred and run it, we see it's a Tensor object with shape (?, 10) and dtype float32, a quick way to double-check what we're working with.
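The dropout and output layers from that discussion:

```python
# Randomly keep only hold_prob of the activations during training,
# which combats overfitting.
full_one_dropout = tf.nn.dropout(full_layer_one, keep_prob=hold_prob)

# Final layer: 10 logits, one per CIFAR-10 label.
y_pred = normal_full_layer(full_one_dropout, 10)
```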
So now everything is set up all the way to y_pred. Next we want to apply the loss function, create the optimizer and a train step, and create a variable to initialize all the global TF variables. Before we dive into the loss function, one quick recap of a couple of knobs: we pointed out above that you can change the 4x4 filter size, and the numbers you use there have a huge impact on how well your model fits. The same goes for the 1,024 here: if you raise that number you may get a better fit, or you might overfit, and if you lower it you use fewer resources. Generally you stay on the powers of two, 2, 4, 8, 16 and so on, so the next one down would be 512; you can use any number there, but those are the conventional choices for this data.

The next step is a way of tracking how good the model is, which we call the loss function, and we'll use a cross-entropy loss. Before defining it exactly, look at what we feed it: our true labels and our predicted labels, two probability distributions, the one we know is true and the one the model thinks is right. In information theory, the cross-entropy between two probability distributions over the same underlying set of events measures the average number of bits needed to identify an event drawn from the set. That's a mouthful; really we're just looking at the amount of error, how much of the prediction matches the truth, and we take the average, which is what the reduce-mean means here.

Once we know the loss while training, we feed it back through the backpropagation setup, so we want to optimize it. Here's our optimizer: we create it with the Adam optimizer (remember, there are a lot of different ways of optimizing the updates, and Adam is the most popular), so our optimizer equals tf.train.AdamOptimizer. If you don't remember what the learning rate is, let me pop back to this: you have all your weights on the different nodes, and the error is sent back through the network in reverse; we take that error and adjust the weights based on the formulas, in this case Adam's. We don't want to adjust a weight so it exactly fits the last data that came through, because then the model would be biased toward whatever we sent through most recently. Instead we multiply by 0.001 and make a very small shift, so our delta-W is only 0.001 of the full change Adam computes.

Then we set up training: train equals our optimizer's minimize of the cross entropy, and we make sure to run the cell so it's loaded. We're almost ready to train the model, but we need one more variable, one that initializes all the global TF variables. tf.global_variables_initializer() is a TensorFlow operation that goes through all the setup we've defined and initializes those variables. It's kind of a magic one, because it's all hidden in TensorFlow's back end; all you need to know is that you must create this initialization operation and run it once your setup is in place. We'll run this piece of code, and then we're ready to train our data.
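Those pieces in code; softmax cross-entropy is the standard choice here, though the exact call in the original notebook may differ slightly:

```python
# Average cross-entropy between true one-hot labels and predicted logits.
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_true, logits=y_pred))

# Adam optimizer with a small learning rate: each update shifts the
# weights by only a small fraction of the computed correction.
optimizer = tf.train.AdamOptimizer(learning_rate=0.001)
train = optimizer.minimize(cross_entropy)

# Operation that initializes every TF variable defined above.
init = tf.global_variables_initializer()
```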
Now we run the model by creating a graph session. 'Graph session' is a TensorFlow term you'll see come up; it throws me a little because I always think of GraphX in Spark, or just general graphing, but here it means a TF session. Let's paste the code in and walk through what's going on. We start with 'with tf.Session() as sess:', the session we're creating, and right off the bat we run the global variables initializer we created, so our variables are initialized.

Then we have 'for i in range(500)'. What's going on here? We loop 500 times, and each time batch = ch.next_batch(100) loads 100 pictures, so we are literally processing 500 times 100, which is 50,000 pictures. In each step we do a session run on train, the step we created from the optimizer, and feed it the feed dictionary: x equals batch[0] coming in, y_true equals batch[1], and hold probability 0.5.

Then, just so we can keep track of what's going on, every 100 steps we print 'currently on step ..., accuracy is ...'. We build matches as tf.equal of tf.argmax(y_pred, 1) and tf.argmax(y_true, 1), which marks how many predictions match the truth, and for acc we take those matches, cast them to float (that's what tf.cast does), and take the mean, because we just want the average accuracy. Then we print the result of a session run on that accuracy with its feed dictionary over the test set.
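The training loop as described: 500 steps of 100 images, printing test accuracy every 100 steps, with hold_prob set to 1.0 during evaluation so no nodes are dropped:

```python
with tf.Session() as sess:
    sess.run(init)

    for i in range(500):
        batch = ch.next_batch(100)
        sess.run(train, feed_dict={x: batch[0], y_true: batch[1],
                                   hold_prob: 0.5})

        if i % 100 == 0:
            print('Currently on step {}'.format(i))
            print('Accuracy is:')
            # Fraction of test images whose predicted label matches the truth.
            matches = tf.equal(tf.argmax(y_pred, 1), tf.argmax(y_true, 1))
            acc = tf.reduce_mean(tf.cast(matches, tf.float32))
            print(sess.run(acc, feed_dict={x: ch.test_images,
                                           y_true: ch.test_labels,
                                           hold_prob: 1.0}))
```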
Let's run it; this will take a little while on my old laptop. You can see we're currently on step zero, and it takes a moment to get through the accuracy computation: at step zero the accuracy is about 0.128. As it runs, you don't need to watch the whole thing; the accuracy moves up and down a little, and at first glance it looked like we lost accuracy at step two, but I was actually reading that backwards. The way this works is that the closer we get to one, the more accurate we are, and you can see we've gone from roughly 0.1 up to 0.39. We'll pause here and come back to see what happens when the full run is done.

All right, now that we've prepared the meal, put it in the oven, and pulled out the finished dish, as in the old cooking shows, let's discuss this accuracy and how to interpret it. We've done a couple of things. First, we defined accuracy, and it helps to contrast it with loss: plotted over training, loss gives you a curve that falls, while accuracy gives you a curve that rises; that's how good the model is doing. An accuracy of exactly one would be phenomenal, practically unheard of, just as a loss of exactly zero is unheard of; the curves approach those limits without reaching them.

So how do we read our numbers? The accuracy we defined is the fraction of test images classified correctly, on a scale from 0 to 1. With ten categories, random guessing scores around 0.1, and that's right where we started; remember the 0.128 at the top of the run. The magic number for this exercise is 0.5: being over 0.5 means the model gets more than half of the test images right, which is pretty solid for a small network on this dataset, and as you push toward 0.95 you're approaching a near-perfect classification. Remember, this is accuracy; if we were looking at loss, the direction flips and we'd want it low instead of high. We finished this model at 0.5135, so still good, and when they ran it on the other end (remember there's a lot of randomness that goes into the weights) they got 0.5251, a little better than ours, which is fine; yours will come out a little better or worse depending on that randomness. So we've created the whole model, trained it, and tested it every 100th run to see how accurate it is.

Welcome to the RNN tutorial, the recurrent neural network. Let's first talk about the feed-forward neural network. In a feed-forward network, information flows only in the forward direction, from the input nodes, through the hidden layers if any, to the output nodes; there are no cycles or loops in the network. You can see here we have our input layer going straight into the hidden layers, each node connecting forward to the next hidden layer and then to the output layer, with a nice simplified version showing a predicted output; the input is usually referred to as x and the output as y. Decisions are based on the current input only: no memory of the past, no future scope.

So why a recurrent neural network? Consider the issues with feed-forward networks. One of the biggest is that, without any concept of memory or time, a feed-forward network doesn't know how to handle sequential data; it considers only the current input. But if you have a series of inputs, where something three points back affects what's happening now, and what you output affects what happens next, that matters:
whatever I put out as an output is going to affect the next one. A feed-forward network doesn't look at any of that; it just looks at what's coming in, and it cannot memorize previous inputs.

The solution: you'll see here where it says recurrent neural network, with our x on the bottom going to h going to y; that's the feed-forward part, but right in the middle there's a value c. It's a whole other process: it memorizes what's going on in the hidden layers, and the hidden layers feed into the next step. Your hidden layer might have an output that goes off to y, but that output also goes back into the next prediction coming in. This is what lets the network handle sequential data: it considers the current input and also the previously received inputs.

And if we're going to look at general drawings and solutions, we should also look at applications of the RNN. Image captioning: an RNN is used to caption an image by analyzing the activities present in it, 'a dog catching a ball in midair.' That's very tough; we have a lot of systems that analyze images of a dog and images of a ball, but this adds one more feature, the act of actually catching the ball in midair. Time series prediction: any time-series problem, like predicting the price of a stock in a particular month, can be solved using an RNN, and we'll dive into that in our use case and actually look at some stock data. One thing you should know about analyzing stock today is that it is very difficult: if you're analyzing the whole market, the New York Stock Exchange in the US produces, counting all the individual trades and fluctuations by the second, somewhere in the neighborhood of three terabytes a day of data. We'll only look at one stock, and even analyzing one stock is really tricky; we'll give you a little jump on it, which is exciting, but don't expect to get rich off it immediately.

Another application of the RNN is natural language processing: text mining and sentiment analysis can be carried out using an RNN. Notice that the phrase 'natural language processing,' with those three words streamed together in that order, is very different from saying 'processing language naturally'; the time series of words is very important. When we're analyzing sentiment, switching the words around can change the whole value of a sentence: just counting the words might give you one sentiment, while the actual order gives you a completely different one. 'When it rains, look for rainbows; when it's dark, look for stars.' Both of these are positive sentiments, and that depends on the order the sentences unfold in.

And machine translation: given an input in one language, an RNN can be used to translate the input into a different language as output. I'm very linguistically challenged myself, but if you study languages and you're good with them, you know right away that in English you say 'big cat,' while in Spanish the adjective follows the noun, literally 'cat big,' so getting the right order, and all the parts of speech that depend on word order, is essential to know. In this little diagram a person is speaking English, which I guess is denoted by the flags (I have a flag, I own it? No),
and they're speaking in English and it's getting translated into Chinese, Italian, French, German, and Spanish. Some of the tools coming out are just so cool: somebody like me, who's very linguistically challenged, can now travel to places I would never have considered, because something can translate my English back and forth readily and I'm not stuck with a communication gap.

So let's dive into what a recurrent neural network is. A recurrent neural network works on the principle of saving the output of a layer and feeding it back to the input in order to help predict the output of the layer. It sounds a little confusing, but it will make more sense as we break it down. Usually we draw a forward-propagation network with the input layer, the hidden layers, and the output layer; with a recurrent neural network we turn that on its side, so x comes up from the bottom into the hidden layer and on to y. They usually draw it very simplified: x to h, with c as a loop, then to y, where a, b, and c are the parameters. Digging closer into h and how it works from left to right: c goes in and x goes in, with x heading upward and c coming in from the side, while y goes out and the new c goes out; that's where it gets a little confusing. So here we have x_t and c coming in, and y and c going out, where c carries h_(t-1). The output y and the hidden value h are connected, but they're not necessarily the same value, because h can be its own thing. We usually write this as h_t = f_c(h_(t-1), x_t): the new state h_t is a function f with parameter c applied to h_(t-1), the old state and the last h output, combined with x_t, the input vector at time step t.

Now we need to cover the types of recurrent neural networks. The first and most common is one-to-one, a single input and a single output; a one-to-one network is usually known as a vanilla neural network, used for regular machine learning problems. Why vanilla? Because vanilla is considered the basic flavor: it's a slang term rather than the formal one, but people will usually know what you're talking about. Then there's one-to-many: a single input and multiple outputs, as in the image captioning we looked at earlier, where it's not just a dog but a dog and a ball in the air. A many-to-one network takes in a sequence of inputs; an example is sentiment analysis, where a given sentence is classified as expressing positive or negative sentiment, like 'if it rains, look for a rainbow,' which is positive overall even though 'rain' alone might read negative if you were just adding up words. And beyond one-to-one, many-to-one, and one-to-many, there are many-to-many networks, which take in a sequence of inputs and generate a sequence of outputs; the example is machine translation, a lengthy sentence coming in in English and going out in the different languages. A wonderful tool, and a very complicated set of computations.
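To ground the recurrence h_t = f_c(h_(t-1), x_t), here is a minimal NumPy sketch of a single vanilla RNN step with a tanh activation; the dimensions and variable names are illustrative:

```python
import numpy as np

# Illustrative sizes: 3-dimensional input, 4-dimensional hidden state.
rng = np.random.default_rng(0)
W_h = rng.normal(size=(4, 4))  # recurrent weights (the "c" parameters)
W_x = rng.normal(size=(4, 3))  # input weights
b = np.zeros(4)

def rnn_step(h_prev, x_t):
    # The new state depends on the old state AND the current input.
    return np.tanh(W_h @ h_prev + W_x @ x_t + b)

h = np.zeros(4)                      # initial state
for x_t in rng.normal(size=(5, 3)):  # a sequence of 5 input vectors
    h = rnn_step(h, x_t)             # the state carries memory across steps
print(h)
```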
If you're a translator, you realize just how difficult it is to translate between languages.

One of the biggest things you need to understand when working with this kind of neural network is what's called the vanishing gradient problem. While training an RNN, your slope can be either too small or very large, and this makes training difficult. When the slope is too small, the problem is known as a vanishing gradient, and the slide shows it nicely as a loss of information through time: if not enough information is pushed forward, that information is lost, and when you go to train, you start losing, say, the third word in the sentence, or the model doesn't quite follow the full logic of what you're working on. The exploding gradient problem, ah, this is the one everyone runs into with this particular network: it's when the slope tends to grow exponentially instead of decaying. The issues gradient problems cause: long training times, poor performance, and bad accuracy, and I'll add one more: on a lower-end computer, a test model will lock up and give you a memory error.

To see the problem, consider two examples of predicting the next word in a sequence: 'the person who took my bike and ___ was a thief,' and 'the students who got into engineering with ___ were from Asia.' You can see the x values going in, the previous values carried forward, and the error backpropagated as in any neural network. In order to predict the missing word, the RNN must memorize the previous context, for instance whether the subject was a singular or a plural noun: 'was a thief' goes with the singular person, while 'were from Asia' goes with the plural students. It can sometimes be difficult for the error to backpropagate all the way to the beginning of a long sequence to predict what the output should be.

So when you run into the gradient problem, we need a solution, and for the exploding gradient there are three different solutions, depending on what's going on. One is identity initialization: find a way to minimize what the network has to track, so that rather than identifying everything, it carries just the important information. Next is truncating the backpropagation: instead of propagating the error through the entire sequence, we cut off how far back it travels, keeping those layers smaller. And finally there's gradient clipping: while training, we clip the gradient to a fixed range, narrowing the updates in the training model.
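As an illustration of gradient clipping in the same TensorFlow 1.x style used earlier; this is a sketch, not part of the course notebook, and `loss` stands for whatever loss the RNN is trained on:

```python
optimizer = tf.train.AdamOptimizer(learning_rate=0.001)

# Compute gradients explicitly instead of calling minimize() directly.
grads_and_vars = optimizer.compute_gradients(loss)

# Clip every gradient into [-1, 1] so a single step can never explode.
clipped = [(tf.clip_by_value(g, -1.0, 1.0), v)
           for g, v in grads_and_vars if g is not None]

train = optimizer.apply_gradients(clipped)
```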
When you have a vanishing gradient, we can look at weight initialization, similar in spirit to the identity approach, but we set up more weights so the network can better identify the different aspects of what's coming in. We can also look at choosing the right activation function; that's huge, because we might be activating on one thing when we need to limit it. We haven't talked too much about activation functions, so we'll only touch on them minimally; there are a lot of choices out there. And finally there are long short-term memory networks, the LSTMs, and we can make adjustments to those: just as we can clip the gradient on the way out, we can also increase the size of the memory network so it handles more information.

One of the most common problems in today's setups is what they call long-term dependencies. Suppose we try to predict the last word in the text 'the clouds are in the ___.' You probably said 'sky,' and we don't need any further context; it's pretty clear the last word is going to be sky. Now suppose we try to predict the last word in 'I have been staying in Spain for the last 10 years; I can speak fluent ___.' Maybe you said Portuguese or French? No, you probably said Spanish. The word we predict depends on words from much earlier: we need the context of Spain to predict the last word. It's entirely possible for the gap between the relevant information and the point where it's needed to become very large, and LSTMs help us solve this problem.

LSTMs are a special kind of recurrent neural network capable of learning long-term dependencies; remembering information for long periods of time is their default behavior. All recurrent neural networks have the form of a chain of repeating modules. In standard RNNs, this repeating module has a very simple structure, such as a single tanh layer. LSTMs also have a chain-like structure, but the repeating module is different: instead of a single neural network layer, there are four interacting layers communicating in a very special way.

As you can see, the deeper we dig into this, the more complicated the diagrams get. Note that you have x_(t-1), x_t, and x_(t+1) coming in, h_(t-1) and h_t coming in and h_(t+1) going out, and of course the outputs on the other side. In the middle we have our tanh, and it occurs in two different places: when we're computing the step at t+1 we're getting the tanh contribution from step t, and that value in turn carried information from the step at t-1. The short of it is that as you look at these layers, the module doesn't just feed the next layer and loop back into itself; it also carries forward into the layer after that. We're stacking these up, and it can get very complicated as it grows; it grows in memory and in the resources it takes, but it's a very powerful tool for handling long, complicated sequential information like the sentences we just looked at.

Looking at our long short-term memory network, there are three steps of processing in the LSTM. The first is to forget irrelevant parts of the previous state.
One of the most common problems in today's setups is what they call long-term dependencies. Suppose we try to predict the last word in the text "the clouds are in the..." You probably said "sky," and here we don't need any further context; it's pretty clear the last word is going to be "sky." Now suppose we try to predict the last word in "I have been staying in Spain for the last 10 years, I can speak fluent..." Maybe you said Portuguese or French? No, you probably said Spanish. The word we predict depends on the previous few words in context; here we need the context of Spain to predict the last word. It's possible for the gap between the relevant information and the point where it's needed to become very large, and LSTMs help us solve this problem.

LSTMs are a special kind of recurrent neural network capable of learning long-term dependencies; remembering information for long periods of time is their default behavior. All recurrent neural networks have the form of a chain of repeating modules of neural network connections. In a standard RNN this repeating module has a very simple structure, such as a single tanh layer. LSTMs also have a chain-like structure, but the repeating module is different: instead of a single neural network layer there are four interacting layers communicating in a very special way. As you can see, the deeper we dig into this, the more complicated the graphs get. In the diagram, note that you have x at t−1, x at t, and x at t+1 coming in, h at t−1 and h at t coming in and h at t+1 going out, and on the other side the output. In the middle we have tanh, but it occurs in two different places: when we're computing the step at t+1, we're getting the tanh not only from x at t but also from the value carried forward from x at t−1. The short of it is that as you look at these layers, the signal doesn't just propagate through the first layer, into the second, and back into itself; it also flows on into the third layer. We're stacking these up, and it can get very complicated as it grows in size; it also grows in memory and in the amount of resources it takes. But it's a very powerful tool for handling long, complicated sequential information like the sentences we were just looking at.

When we look at our long short-term memory network, there are three steps of processing in the LSTM. First, we want to forget irrelevant parts of the previous state: a lot of words, like "is" and "a," don't play a huge part in the language unless we're checking something like whether a noun is plural, so we want to get rid of them. Second, we selectively update the cell-state values, so that we only update the values that reflect what we're working on. Third, we output only certain parts of the cell state, limiting what goes out as well. Let's dig a little deeper and see what this really looks like.

Step one decides how much of the past the cell should remember. The first step in the LSTM is to decide which information to omit from the cell at that particular time step, and it's decided by a sigmoid function: it looks at the previous state h_{t−1} and the current input x_t and computes

f_t = σ(W_f · [h_{t−1}, x_t] + b_f)

with, as in any neural network, a bias term b_f. Here f_t is the forget gate: it decides which information from the previous time step is unimportant and should be deleted. Suppose the LSTM is fed the following. Previous input: "Alice is good in physics. John, on the other hand, is good in chemistry." Current input: "John plays football well. He told me yesterday over the phone that he had served as a captain of his college football team." The forget gate realizes there might be a change in context after encountering the first full stop, and compares that against the current input sentence. The next sentence talks about John, so the information on Alice is deleted — that's important to note. If we're going to continue on with John, he becomes the primary information we're tracking: the position of the subject is vacated and assigned to John. So in this step we've weeded out a whole bunch of information, and we're only passing along information about John, since that's now the topic.

Step two decides how much this unit should add to the current state. In the second layer there are two parts: one is a sigmoid function and the other is a tanh. The sigmoid decides which values to let through (0 or 1); the tanh gives weightage to the values that are passed, setting their level of importance between −1 and 1. The two formulas are

i_t = σ(W_i · [h_{t−1}, x_t] + b_i)
C̃_t = tanh(W_C · [h_{t−1}, x_t] + b_C)

where i_t is the input gate, which determines which information to let through based on its significance in the current time step. If this seems a little complicated, don't worry: a lot of the programming is already done by the time we get to the case study, but understanding that this is part of the program matters when you're trying to figure out what to set your settings at. You should also notice the resemblance to a feed-forward neural network: a value times a weight, plus a bias — the same essential step in any neural network layer, whether we're propagating information from one layer to the next or just doing a straightforward forward pass.
Let's take a quick look at what this looks like from the human standpoint. Consider the current input x_t: "John plays football well. He told me yesterday over the phone that he had served as a captain of his college football team." The input gate analyzes what's important: "John plays football" and "he was a captain of his college team" matter; "he told me over the phone yesterday" is less important, hence it is forgotten. This process of adding some new information is done via the input gate. This example is in human form, and we'll look at training in a minute, but as a human being, if I wanted to pull information out of a conversation — maybe it's a voice assistant listening in — how do I weed out the fact that he was talking to me on the phone yesterday? Maybe in some context that's important, but here it's not. I want to know that John plays football and that he served as captain of the college football team; those are the two takeaways. We measure a lot of this from the human viewpoint, and that's also how we try to train these networks so we can understand them.

Finally, step three decides what part of the current cell state makes it to the output. First we run a sigmoid layer, which decides what parts of the cell state make it to the output; then we put the cell state through tanh, to push the values between −1 and 1, and multiply it by the output of the sigmoid gate:

o_t = σ(W_o · [h_{t−1}, x_t] + b_o)
h_t = o_t × tanh(C_t)

Here o_t is the output gate, which allows past information to impact the output in the current time step. Consider predicting the next word in: "John played tremendously well against the opponent and won for his team. For his contributions, brave ___ was awarded player of the match." There could be a lot of choices for the empty space. The current input "brave" is an adjective, and adjectives describe a noun, so "John" could be the best output after "brave." Pull just the nouns out of the sentence: "team" doesn't look right, because that's not really the subject we're talking about; "brave contributions," "brave match" — none of those fit either. So you can train this neural network until it looks at the sentence and goes: John is who we're talking about, "brave" is an adjective, John is the best output. Big thumbs up for John.
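To tie the three gates together, here's a minimal NumPy sketch of a single LSTM cell step — not how any library implements it internally, just the equations above with illustrative names (W_f, W_i, W_c, W_o are assumed weight matrices of shape (hidden, hidden + input), and the b_* biases have shape (hidden,)):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_f, b_f, W_i, b_i, W_c, b_c, W_o, b_o):
    z = np.concatenate([h_prev, x_t])     # [h_{t-1}, x_t]
    f_t = sigmoid(W_f @ z + b_f)          # forget gate: what to drop from c_{t-1}
    i_t = sigmoid(W_i @ z + b_i)          # input gate: what new info to let in
    c_hat = np.tanh(W_c @ z + b_c)        # candidate cell values in (-1, 1)
    c_t = f_t * c_prev + i_t * c_hat      # selectively update the cell state
    o_t = sigmoid(W_o @ z + b_o)          # output gate: what part of c_t to expose
    h_t = o_t * np.tanh(c_t)              # new hidden state
    return h_t, c_t
```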
And then of course we jump into my favorite part, the case study: a use-case implementation of LSTM. Let's predict stock prices using an LSTM network: based on stock price data from 2012 to 2016, we're going to try to predict the stock prices of 2017. This will be a narrow set of data — we're not going to do the whole stock market. It turns out the New York Stock Exchange generates roughly three terabytes of data per day: all the trades up and down of all the different stocks, second to second or finer. We're going to limit that to some very basic fundamental information, so don't think you're going to get rich off this today; but it is a step forward in how to start processing something like stock prices, a very valid use for machine learning in today's markets.

Let's dive in. We're going to import our libraries, import the training set, and get the scaling going. If you've watched any of our other tutorials, a lot of these pieces will look very familiar, because it's a very similar setup. As a reminder, we'll be using Anaconda with Jupyter Notebook. In my Anaconda Navigator, under Environments, I've set up a "keras" Python 3.6 environment. One nice thing about Anaconda, especially the newer versions — I remember a year ago fighting with different versions of Python and different environments — is that it now has a clean interface, and I have this installed on both an Ubuntu Linux machine and on Windows; it works fine on either. You can open a terminal window from the environment, and that's where you'd use pip to install your different modules. We've already pre-installed them, so we don't need to do that here, but if they aren't in your particular environment, you'll need to. And of course you don't need Anaconda or Jupyter; use whatever favorite Python IDE you like. I'm just a big fan of this setup because it keeps everything separate — you can see I've installed an environment specifically for Keras, since we'll be working with Keras on top of TensorFlow. Back on Home, with that environment selected, we click Launch under Jupyter Notebook. In my notebook I've already set a lot of things up — like the old cooking shows, we want all our tools ready so you're not waiting for them to load. Under New you can create a new Python 3 notebook, which is what I did, and under File you can rename it; I'm calling this one "RNN Stock."

Now let's start diving into the code — the exciting part. We've looked at the tool, and you might be using a different one, which is fine. The first half is kind of boring when we hit the Run button, because we're just importing: numpy as np, which gives us the NumPy array; the matplotlib plotting library, because we'll do some plotting at the end; and pandas as pd for our data set. When I hit Run it doesn't do anything except load those modules.
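That first cell is roughly this (a sketch of the imports as described):

```python
import numpy as np                 # numeric arrays
import matplotlib.pyplot as plt   # plotting at the end
import pandas as pd               # reading and slicing the CSV data
```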
Just a quick note — let me do a quick sketch of how the work divides up. The first part is data prep, and there's a lot of prepping involved: depending on your system, you'll find that maybe half the code we write is all about the data prep. I overlap that part with Keras in the sketch because Keras has its own preset pieces already built in, which is really nice — a couple of the usual prep steps come with the Keras setup, and we'll see them as we go through the stock example. The last part is evaluation, and if you're presenting to shareholders or a classroom, the evaluation is the next biggest piece. The actual Keras model code here is a little longer than in some other packages, where you might have three lines and that's it, with everything else in your pre-processing and data: since Keras is cutting edge and you load the individual layers, there are a few more lines, but Keras is also more robust. And then you spend a lot of time on the evaluation, because you want something to present and say: here's what I did, here's what it looks like. That's the general overview; let's see what the next block of code looks like.

Here we have dataset_train, read with pd.read_csv from the Google stock price training CSV, and beneath it training_set = dataset_train.iloc[...], which sorts out part of the data. What's going on here? Let's look at the actual file. Ignoring the extra files in the folder, I already have a train set and a test set split out. That's important to notice, because usually we do this split as part of pre-processing: we take about 20% of the data out so we can test with it, and train on the rest — that's what we use to create the neural network, and the held-out portion is how we find out how good it is. I opened the file in a basic text editor (you could certainly use Excel or any other spreadsheet), and you can see it's comma-separated values with a Date, Open, High, Low, Close, and Volume — the standard, most basic set of stock information, all free to download. In this case we downloaded it from Google, which is why we call it the Google stock price; these are Google's stock values starting in 2012. One more note on the read_csv line: the original version pointed at a full path like /home/ubuntu/Downloads/...; I changed that because the CSV sits in the same folder where I saved this notebook, so I don't need any special paths.
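The loading cell, as a sketch (the file name follows the transcript's naming; adjust the path if yours differs):

```python
# Read the training file and keep only the "Open" column as a 2-D array.
dataset_train = pd.read_csv('Google_Stock_Price_Train.csv')
training_set = dataset_train.iloc[:, 1:2].values   # all rows, column 1 only
```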
Now for the iloc line. We're in pandas, and a DataFrame basically looks like a spreadsheet; iloc pulls specific locations. The first index says we're pulling all the rows in the data, and the second says which columns. Remember from our data that columns always start with zero: 0 is the Date, 1 is Open, 2 is High. The slice 1:2 stops before its end index, so it selects just column 1, the Open price — we'll come back to that detail when we build the training windows. You can certainly extrapolate and do this across all the columns, but for the example let's limit it a little so we can focus on just a few key aspects of the stock. Run the code — and as I said, the first half is very boring: whenever you hit Run it doesn't show anything, because we're still just loading data and setting up.

Now that we've loaded our data, we want to scale it — what they call feature scaling — and for that we pull MinMaxScaler from sklearn.preprocessing (scikit-learn). You have to remember that we want to get rid of biases in the data. Let me draw a quick graph: suppose one stock has a value of 100 and another has a value of 5 — you start to get a bias between the raw magnitudes of different stocks. So we say 100 is going to be the max and 5 the min, and we squish everything in between down — I like the word squish — so it sits between zero and one. It's just a simple subtraction and division: scaled = (x − 5) / 95, which sends 100 to 1 and 5 to 0. Once we've created our scaler with a feature range of 0 to 1, we take our training set and produce training_set_scaled using the scaler's fit_transform. We keep the scaler object sc around, because we have to scale the testing set the same way later, when we test the model and see how it works. Click Run again — still no output, because we're just setting up variables.
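As a sketch, the scaling cell:

```python
from sklearn.preprocessing import MinMaxScaler

# Squish every open price into [0, 1]; keep the fitted scaler object so the
# test data can later be transformed with the same min and max.
sc = MinMaxScaler(feature_range=(0, 1))
training_set_scaled = sc.fit_transform(training_set)
```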
Next we create the data structure with 60 timesteps and one output. First we create our X_train and y_train variables and set them to empty Python lists — it's important to keep track of what kind of array you're working with. Then we loop: for i in range(60, 1258). There are our 60 timesteps, and the reason we start at 60 is that each sample reaches back 60 rows; anything lower would run off the start of the data and give us an index error. On each pass we append a slice of training_set_scaled — scaled values between zero and one — to X_train.
When i equals 60, the slice i−60:i runs from 0 to 59; when i is 61, it's 1 to 60; then 2 to 61, and so on. Note that 0 to 59 is still a count of 60 values, because a slice doesn't include its end index — important to remember as we look at this. The 0 after the comma in training_set_scaled[i−60:i, 0] means we take only the first column. We did originally slice columns 1:2 from the CSV, but since the slice doesn't count the second index, that left us with just the Open value — so Open is all we're looking at here. Then y_train.append(training_set_scaled[i, 0]): if the window holds values 0 through 59 (60 of them), the label appended to y_train is value number 60, the next one up. This marches all the way to 1258, which is the length of the data we're loading. So we've loaded two arrays: one filled with 60-value windows, and one holding the single value that follows each window. You want to think of this as a time sequence: here's open, open, open, open — what's the next one in the series? We're looking at the Google stock opens and asking, given 0 through 59, what's 60? Given 1 through 60, what's 61? Given 2 through 61, what's 62? And so on, going up. Once the loop has run, we set X_train and y_train to np.array(X_train) and np.array(y_train), converting them back into NumPy arrays so we can use all the cool tools we get with NumPy, including reshaping.

So we take X_train and reshape it — wow, what the heck does reshape mean? It means we have an array that's so-many rows by 60 columns. X_train.shape[0] gets one dimension — 1258 − 60 = 1198 samples — and X_train.shape[1] gets the other, the 60 timesteps; we're just making sure the data is formatted correctly, grouped into 1198 windows of 60. The extra 1 on the end is the innermost level of the shape: when you're dealing with shapes, NumPy looks at them as nested layers, like the branches of a tree down to a leaf, and here each leaf is a single value — our one feature. That's where np.reshape comes in, using the existing shape values to form the new one.
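Here's a sketch of that windowing and reshaping, matching the description above:

```python
# Build sliding windows: each sample is 60 consecutive scaled opens and
# its label is the open that immediately follows.
X_train, y_train = [], []
for i in range(60, 1258):                       # 1258 rows in the training set
    X_train.append(training_set_scaled[i-60:i, 0])
    y_train.append(training_set_scaled[i, 0])

X_train, y_train = np.array(X_train), np.array(y_train)

# Keras LSTMs expect (samples, timesteps, features); we have one feature.
X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))
```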
We'll run that piece of code — again, no real output — and then import the Keras modules we need. From keras.models we import the Sequential model, since we're dealing with sequential data, and from keras.layers we bring in the three layers we'll use: Dense, LSTM (which is what we're focusing on), and Dropout. We'll discuss these three layers more in just a moment, but with the LSTM you do want the Dropout, and the final layer will be the Dense. Run it, and you'll see we get what looks like an error — read it closely and it's not actually an error, it's a warning. What does the warning mean? These things come up all the time when you're working with cutting-edge modules that are being updated constantly. All it's saying is that the h5py module Keras uses is going to change at some point, so if you update your Keras later, you'd better make sure h5py is updated too, otherwise you'll get an error down the road. You could run an update on h5py right now if you wanted; not a big deal, and we won't worry about it today.

I said we were going to jump in and look at what those layers mean, and I meant it. We start by initializing the RNN: regressor = Sequential(), because we're using the sequential model. Run that to load it. Then we start adding our LSTM layers, each with Dropout regularization — LSTM then Dropout, LSTM then Dropout, and so on. What the heck is that doing? The Dropout randomly switches off part of the network during each training pass, so we're not just shoving everything through the network unchecked; we'll dig into exactly how it protects us in a moment. I'll add all four LSTM/Dropout pairs now — and as you can see, each of these cells runs without any output. So let's take a closer look at the first LSTM layer: units=50, where units is a positive integer giving the dimensionality of the output space — that's what goes out into the next layer, so we might have 60 values coming in but 50 going out. return_sequences=True, because this is sequence data and we want to keep passing the sequence along. And then you have to tell it what shape the input is in, which we get by just asking the data: input_shape is built from X_train's shape, which makes it really easy — you don't have to remember whether it was 60 or whatever else, you just let the data tell the regressor. We follow the LSTM with a Dropout layer, and understanding the Dropout layer is where this gets kind of exciting.
One of the things that can happen is that we overtrain our network: the neural network memorizes data so specifically that it has trouble predicting anything outside that narrow realm. To fix for that, each time we run through training we take 0.2 — that is, 20% — of our neurons and just turn them off, training only on the others, and the selection is random. Those nodes come back in the next training cycle, where we randomly pick a different 20%, so we don't overtrain any particular nodes. Now look at the differences as we go from the first layer to the second, third, and fourth. The first thing is we don't have to give an input shape: the output of the previous layer is 50 units, and because it's the next layer in the stack, Keras automatically knows 50 is coming in — out of the LSTM, through the Dropout, into the next LSTM, and so on. So for the next layers we don't tell it the shape; it understands automatically, and we keep the units the same, still 50, still a sequence coming through.

The next piece of code is what brings it all together: the output layer, a Dense layer. Remember we imported three layer types — LSTM, Dropout, and Dense — and Dense just says: bring this all down into one output. Instead of putting out a sequence, at this point we just want to know the answer. Run that. And notice that all we've been doing is setting things up one step at a time: way up top we brought in our data and our modules, formatted the data for training — our X_train and y_train, the source data and the answers we already know — reshaped it, imported our Keras layers, and built the model, four LSTM layers plus the final Dense. Keras is a little different from a lot of other systems, which put all this in one line and do it automatically but don't give you options for how the layers interface or how the data comes in. Keras is cutting edge for exactly this reason: there are a few extra steps in building the model, but they have a huge impact on the output and on what we can do with these models. So we've brought in our Dense and we have our full model, our regressor; now we need to compile it, and then fit the training data, so the pieces all come together and the regressor is trained and ready to be used.
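Put together, the stack described above looks like this — a sketch of the four LSTM/Dropout pairs; note the final LSTM omits return_sequences, an assumption needed so a single vector reaches the Dense layer:

```python
from keras.models import Sequential
from keras.layers import Dense, LSTM, Dropout

regressor = Sequential()

# First LSTM layer needs the input shape; later layers infer it.
regressor.add(LSTM(units=50, return_sequences=True,
                   input_shape=(X_train.shape[1], 1)))
regressor.add(Dropout(0.2))
regressor.add(LSTM(units=50, return_sequences=True))
regressor.add(Dropout(0.2))
regressor.add(LSTM(units=50, return_sequences=True))
regressor.add(Dropout(0.2))
regressor.add(LSTM(units=50))        # final LSTM returns one vector per sample
regressor.add(Dropout(0.2))
regressor.add(Dense(units=1))        # one number out: the predicted open
```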
Let's compile. If you've looked at any of our other neural-network tutorials, you'll recognize the optimizer, adam: Adam is an optimizer well suited to big data. There are a couple of other optimizers out there, beyond the scope of this tutorial, but Adam will work nicely for this. And loss='mean_squared_error': when we're training, this is what we base the loss on — how bad is our error, measured as the mean squared error — while the Adam optimizer drives the weight updates. You don't have to know the math behind them, but it certainly helps to know what they're doing and where they fit into the bigger model.

Then, finally, we do the fit — fitting the RNN to the training set: regressor.fit(X_train, y_train, epochs=100, batch_size=32). We know what these are: X_train is our data coming in, y_train the answers we're looking for, our sequential input. Epochs is how many times we go over the whole data set — each row of X_train, each a time sequence of 60. And batch size is another place Keras really shines: if you were pulling this from a large file, instead of trying to load it all into RAM it can pick up smaller batches and load those incrementally. We're not worried about pulling from a file today, because this isn't big enough to strain the computer's resources. But imagine doing a lot more than one column of one stock — in this case Google — across all the stocks, and with not just the open but open, close, high, and low; you can easily find yourself with about 13 different variables, times 60 because it's a time sequence, and suddenly you're loading a gigabyte into RAM. Unless you're on multiple computers or a cluster, you'll start running into resource problems. For this, we don't have to worry about that.
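The compile-and-fit cell, sketched:

```python
# Mean squared error suits a continuous target; Adam is a solid default.
regressor.compile(optimizer='adam', loss='mean_squared_error')

# 100 passes over the data, 32 windows per batch.
regressor.fit(X_train, y_train, epochs=100, batch_size=32)
```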
Let's run it — this will take a little while on my computer, an older dual-core laptop; give it a second to kick in. There we go: it reports the first epoch running through all the data, batching 32 lines at a time through the 1198 samples, at roughly 13 seconds an epoch — call it 20 to 30 minutes of runtime on this machine. So, like any good cooking show, I'll go get a coffee and come back. (I also had some other things running in the background, so you'll see some epoch times jump up to 15 or 19 seconds.) Scrolling through, we've run all 100 epochs. So what does all this mean? The first thing you'll notice is the loss: it declines until it settles around 0.0014, and it hit about 0.0014 three times in a row at the end, so we guessed our epoch count pretty well — the loss had stopped improving.

To find out what we've got, we load up the test data — the data we never processed. We've labeled this part three: making the predictions and visualizing the results. The first thing is to read the data in from our test CSV — you'll see I've changed the path for my computer — and call it real_stock_price, again taking just the one column with iloc: all the rows, and just the values from that one location, the stock's Open. Run that, and it's loaded.

Then we create our inputs, and this should all look familiar — it's the same thing we did when we prepped the training data. We take dataset_total, a little pandas concat of the training set and the test set. Remember, the tail end of the training set is part of the data going in; let me visualize that. Here's our train data, running up to its last row, but each value we predict is built from a window 60 across, so the first test predictions have to reach back into the last rows of the training data — the last 60 of them, since the first test day's window is built entirely from training rows. That's what this first bit of setup does.
So: real_stock_price comes from dataset_test with iloc, just the first column, the Open price. Then dataset_total is a pandas concat of dataset_train['Open'] and dataset_test['Open']. Notice we're referencing the columns by their label, 'Open', here, whereas earlier we referenced them by numeric position — pandas is great that way, lots of versatility. Go back up and run that. Next, inputs = dataset_total over the range from len(dataset_total) − len(dataset_test) − 60 to the end — everything from 60 rows before the test data onward. You'll see why this works when we graph it: normally you run your test set and training set completely separately, but here we'll simply look at the stretch we didn't train on to see how well it tracks. Then inputs gets reshaped, as we did before, and transformed — remember, the same 0-to-1 transform from our scaler. Then we build X_test: for i in range(60, 80), appending inputs[i−60:i, 0] — which, remember, is a 60-value window — taking just the first column, our Open column. Once again we convert X_test to a NumPy array and do the same reshape we did before.

Then we get down to the final two lines, and here we have something new — let me highlight them. predicted_stock_price = regressor.predict(X_test): we're predicting over our prepared windows. And then we take this prediction and inverse the transform: remember we put everything between zero and one, and a float between 0 and 1 isn't going to mean very much to look at. I want the dollar amounts — I want to know what the cash value is. Run it, and you'll see it goes much quicker than the training. That's what's so wonderful about these neural networks: once you've put them together, it takes just a second to run the same network that took us, what, half an hour to train.
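Sketching part three up to the prediction (the test file name mirrors the training file's naming; treat it as illustrative):

```python
# Load the test file and keep its real opens for comparison.
dataset_test = pd.read_csv('Google_Stock_Price_Test.csv')
real_stock_price = dataset_test.iloc[:, 1:2].values

# Stitch train + test opens together so the first test windows can reach
# back into the last 60 training days, then scale with the SAME scaler.
dataset_total = pd.concat((dataset_train['Open'], dataset_test['Open']), axis=0)
inputs = dataset_total[len(dataset_total) - len(dataset_test) - 60:].values
inputs = inputs.reshape(-1, 1)
inputs = sc.transform(inputs)               # transform only: no refitting

X_test = []
for i in range(60, 80):                     # 20 test days in this dataset
    X_test.append(inputs[i-60:i, 0])
X_test = np.array(X_test)
X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))

predicted_stock_price = regressor.predict(X_test)
predicted_stock_price = sc.inverse_transform(predicted_stock_price)  # back to dollars
```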
Now we plot the data: what we think the price will be, against the real data — what the Google stock actually did. Let's pull the code up. We have plt — remember from the very beginning, that's matplotlib.pyplot imported as plt. plt is one of the things that always threw me when doing graphs in Python, because I expect to create an object and load a class into it; instead, plt works like a canvas you put things on — if you've done HTML5, think of the canvas element. We plot real_stock_price in bright red, labeled "Real Google Stock Price," then the predicted stock in blue, labeled "Predicted Google Stock Price." We give it a title, because it's always nice to give your graph a title, especially if you're going to present it to shareholders in the office. The x label is Time — it's a time series; we didn't put the actual dates and times on, but we know the points are incremented by time — and the y label is the actual stock price. plt.legend() tells it to build the legend, so the color red and "Real Google Stock Price" show up together, and plt.show() gives us the actual graph. Let's run it and see what that looks like.
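The plotting cell, as a sketch:

```python
plt.plot(real_stock_price, color='red', label='Real Google Stock Price')
plt.plot(predicted_stock_price, color='blue', label='Predicted Google Stock Price')
plt.title('Google Stock Price Prediction')
plt.xlabel('Time')                  # points are in time order; dates omitted
plt.ylabel('Google Stock Price')
plt.legend()
plt.show()
```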
And there's a nice graph. Let's talk a little about it before we wrap up. There's the legend I was telling you about, showing which line is which, and the title. Along the bottom is the time sequence; we didn't put the actual times in — we could have plotted the x axis as dates, since we know them — and keep in mind this is only the last slice of the data, probably around 20% of it or less. The real Google price has a little jump up and then comes down, and you'll see the prediction, instead of turning down there, just didn't go up as high and didn't drop as low. So our prediction has the same pattern, but the overall values are pretty far off as stock prices go. Then again, we're only looking at one column, the open price — we're not looking at how much volume was traded, and as I pointed out earlier, stock data right off the bat has open, high, low, close, and volume, then the adjusted open, adjusted high, adjusted low, and adjusted close, which use a special formula for what the stock is really worth based on its value, and from there all kinds of other inputs. So for one small aspect — the opening price — we did a pretty good job: the curve follows the real curve pretty well. It has little jumps and bends that don't quite match up — this bend here doesn't quite line up with that bend there — but it's pretty darn close; we have the basic shape, and the prediction isn't too far off. You can imagine that as we add more data and look at different aspects within the specific domain of stocks, we should get a better representation each time we drill in deeper. Of course, this took my computer half an hour to train, so you can imagine that running it across all those variables would take quite a bit longer — not so good for a quick tutorial like this.

So let's turn to: what is Keras? We'll also go all the way through this into a couple of tutorials, because rolling up your sleeves is where you really learn. Keras is a high-level deep learning API written in Python for easy implementation of neural networks. It uses deep learning frameworks such as TensorFlow, PyTorch, etc. as backends to make computation faster. This is really nice, because as a programmer there is so much stuff out there, evolving so fast, that it can get confusing; having a high-level layer of order on top means we can view and easily program these different neural networks. It's really powerful to have something out quickly and to be able to start testing your models and seeing where you're going. So Keras works by using complex deep learning frameworks — TensorFlow, PyTorch, and others — as a backend for fast computation while providing a user-friendly, easy-to-learn front end. You have the Keras API specification, and under it implementations like tf.keras for TensorFlow, Theano-backed Keras, and so on, all sitting on top of the TensorFlow workflow. Like I said, it organizes everything: the heavy lifting is still done by TensorFlow, or whatever underlying package you put in there. That's really nice, because you don't have to dig as deeply into the heavy-end internals while still having a very robust package you can get up and running rather quickly — and it doesn't detract from processing time, since all the heavy lifting stays in packages like TensorFlow; Keras is the organization on top of it.

The working principle of Keras: Keras uses computational graphs to express and evaluate mathematical expressions — expressing complex problems as combinations of simple mathematical operators, like a remainder, a multiplication, or raising x to a power. That's useful for calculating derivatives by backpropagation: with neural networks, we send the error back up the graph to figure out how to change the weights, and the graph structure makes that easy without hand-writing everything — no banging your head. It also makes distributed computation easier to implement, and for solving complex problems you specify the inputs and outputs and make sure all the nodes are connected. As your layers get complicated — and some of today's setups get very complicated, as we'll see in just a second — this makes it really easy to start spinning things up and trying out different models.

On to Keras models. First there's the sequential model: a linear stack of layers, where the previous layer leads into the next. If you've done anything similar — even sklearn with its neural networks and propagation — this should look familiar.
You have your input layer going into layer one, then layer two, then the output layer, and it's useful for simple classifier or decoder models. You can see from the code how easy it is: model = keras.Sequential(...), with Dense layers, each with an activation — they're using ReLU in this particular example — and a name, layer one, layer two, and so forth. They just feed right into each other, so it's really easy to stack them, as you can see, and Keras automatically takes care of everything else for you.

Then there's the functional model, and this is really where things are at. It's newer — make sure you update your Keras, or you'll run into an error trying it, because it's a fairly recent release. It handles multi-input and multi-output models: complex models that fork into two or more branches. In the code you create explicit inputs — keras.Input(shape=(32, 32, 3)), a 32×32 image with three channels (or four if you have an alpha channel) — then chain dense layers: layers.Dense(64, activation='relu') applied to the inputs, another Dense 64 with ReLU applied to that, then outputs = layers.Dense(10)(x), and finally model = keras.Model(inputs=inputs, outputs=outputs, name=...), adding a little name on there; this should look similar to what you saw before. But if you look at the graph on the right, it's a lot easier to see what's going on: there are two different inputs — one way to think of it is that one is a small image and one is a full-sized image — and each feeds into its own node, because each is looking for something different, before the branches come together. You can start to get an idea of the uses for this kind of split and this kind of setup: multiple streams of information coming in that are very different even though they overlap, where you don't want to send them through the same sub-network. And they're finding that this trains faster and gets better results, depending on how you split the data up and how you fork the models coming down.
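Here's a compact sketch of both styles. The layer sizes and the model name are illustrative, and I've added a Flatten layer — an assumption beyond the spoken example — so the Dense classifier head is well-formed on an image-shaped input:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Sequential: a linear stack, each layer feeding the next.
seq_model = keras.Sequential([
    layers.Dense(64, activation='relu', name='layer1'),
    layers.Dense(64, activation='relu', name='layer2'),
    layers.Dense(10, name='output'),
])

# Functional: explicit input/output tensors, which allows forks and merges.
inputs = keras.Input(shape=(32, 32, 3))          # 32x32 image, three channels
x = layers.Flatten()(inputs)                     # added so Dense gets a vector
x = layers.Dense(64, activation='relu')(x)
x = layers.Dense(64, activation='relu')(x)
outputs = layers.Dense(10)(x)
func_model = keras.Model(inputs=inputs, outputs=outputs, name='demo_model')
```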
Now, if you're already looking at Keras, you can probably answer "what are neural networks?" — but it's always good to get on the same page, and for those who don't fully understand them, to dive in a little with a quick overview. Neural networks are deep learning algorithms modeled after the human brain. They use multiple neurons — which are mathematical operations — to break down and solve complex mathematical problems. Just like a biological neuron, one node fires in and fires out to all the other nodes, and eventually everything comes down to your output layer. You can see the really standard picture: an input layer, a hidden layer, and an output layer.

One of the biggest parts of any data processing is the pre-processing, so we always have to touch on that. A neural network, like many of these models, is something of a black box when you first start using it: you put your data in, train it, test it, and see how good it was — and you have to pre-process that data, because bad data in means bad output. For our pre-processing example we'll create our own data set with Keras. The data represents a clinical trial conducted on 2100 patients, ranging from ages 13 to 100, with half the patients under 65 and the other half over 65 years of age. We want to find the possibility of a patient experiencing side effects due to their age — you can think of this in today's world with COVID. We'll do this hands-on, since, as I said, that's how you really absorb most of this. So let's bring up Anaconda and open a Jupyter notebook for the Python code. If you're not familiar with these, you can use pretty much any setup; I just like them for doing demos and showing people — especially shareholders — because they make a nice visual. Anaconda has a lot of cool tools — they just added Datalore and IBM Watson Studio Cloud to the framework — but we'll be in Jupyter Notebook. (I use JupyterLab for large projects with multiple pieces, since it has multiple tabs; the Notebook works fine for what we're doing.) It opens in our browser window, because that's how Jupyter Notebook is set to run, and we go under New, create a new Python 3 notebook, and it appears untitled. We'll give it a title — let's call it "Keras tutorial," with a capital — and rename it.

The first thing we want to do is get some pre-processing tools in, so we import a few things: numpy, some random-number generation, and sklearn — I mentioned your scikit toolkit; if you're installing it, scikit-learn is what to look up. It should be a tool of anybody doing data science: if you're not familiar with the scikit-learn toolkit, it's huge, and there are so many things in it that we keep going back to. We also create train_labels and train_samples lists for training our data. And just as a note on what we're actually doing — this is a fun thing you can do — we change a cell's type from Code to Markdown, which is nice for documenting examples once you've built them. Our example setup: an experimental drug was tested on 2100 individuals between 13 and 100 years of age, half the participants under 65; 95% of participants under 65 experienced no side effects, while 95% of participants over 65 did experience side effects. That's where we're starting — a quick example, because we'll do another one with more complicated information later. So we generate the data: for i in range(...), appending random integers to the train samples and labels. Let me go ahead and run that.
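A sketch of that generation step — the 50/1000 loop counts are an assumption chosen to produce the 2100 total and the 95% proportions described, and the list names follow the transcript:

```python
from random import randint

train_samples = []   # ages
train_labels = []    # 1 = side effects, 0 = none

for _ in range(50):
    # The ~5% outliers: young with side effects, old without.
    train_samples.append(randint(13, 64)); train_labels.append(1)
    train_samples.append(randint(65, 100)); train_labels.append(0)

for _ in range(1000):
    # The ~95% majority: young without side effects, old with them.
    train_samples.append(randint(13, 64)); train_labels.append(0)
    train_samples.append(randint(65, 100)); train_labels.append(1)
```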
(You can certainly ask Simplilearn for a copy of this code — they'll send you one — or zoom in on the video to see exactly how we appended the train samples.) I do this kind of thing all the time: I was recently working on errors that follow a bell-shaped curve, a standard distribution, so what did I do? I generated data on a standard distribution to see what it looks like and how my code processes it, since that was the baseline I was looking for. Here, we're just generating random data for our setup. We can print some of it: print the first five entries of train_samples, and you see 49, 85, 41, 68, 19 — just the random numbers generated in there. We generated significantly more than that, of course; a quick print of the length (you could also use shape if you're in NumPy, though len is just fine here) shows 2100, like we said in the setup. We can do the same for train_labels, and sure enough there are 2100 of those too, labeled 1, 0, 1, 0, ... — 1 if they have symptoms, 0 if none.

Next we take train_labels and train_samples and convert them into NumPy arrays, and we also shuffle them. Shuffling just randomizes the order — that's all it does. We've already generated the data randomly, so here it's arguably overkill and not really necessary, but in a larger package where the data is coming in, it's often organized somehow, and you randomize it just to make sure the input doesn't follow a pattern that might create a bias in your model. Then we create a scaler — MinMaxScaler with feature_range=(0, 1) — and compute scaled_train_samples by fitting and transforming the data; that's the age. So the 49, 85, 41 we saw up above get moved to values between zero and one. This is true of any neural network: you really want to convert the data so it lies between 0 and 1, otherwise you create a bias. A raw value like 100 creates a bias — the math behind it gets really complicated, but with all the multiplication and addition going on inside, that higher-end value keeps multiplying through, exerts a huge pull on how the model fits, and the model doesn't fit as well.
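Sketched, with sklearn's shuffle standing in for the randomizing step described:

```python
import numpy as np
from sklearn.utils import shuffle
from sklearn.preprocessing import MinMaxScaler

# Convert to arrays and shuffle labels and samples in unison,
# so no ordering pattern sneaks into training.
train_samples = np.array(train_samples)
train_labels = np.array(train_labels)
train_labels, train_samples = shuffle(train_labels, train_samples)

# Scale ages from [13, 100] down into [0, 1].
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_train_samples = scaler.fit_transform(train_samples.reshape(-1, 1))
```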
One of the fun things in Jupyter Notebook: if the last line of a cell is just a bare variable, it prints automatically. So we look at the first five scaled samples, and you see values like 0.9..., 0.79... — everything between zero and one, which shows we scaled it properly and it looks good. It really helps to do these printouts halfway through. You never know what's going on in there; I don't know how many times I've gotten deep in and found out that data sent to me, which I thought was scaled, was not — and then had to go back, track it down, and figure it out.

So let's create our artificial neural network, and for that, this is where we start diving into TensorFlow and Keras. If you don't know the history of TensorFlow, it helps to jump in — we'll just use Wikipedia; careful, don't quote Wikipedia in anything formal or you'll get in trouble, but it's a good place to start. Google Brain built DistBelief as a proprietary machine learning system, and TensorFlow became its open-source successor. So TensorFlow was a Google product, then it was open-sourced, and now it's probably become the de facto standard for neural networks, with a huge following. There are some other setups — scikit-learn has its own little neural network — but TensorFlow is the most robust one out there right now, and Keras sitting on top of it makes it a very powerful tool: we can leverage the ease with which Keras builds a sequential setup on top of TensorFlow.

So here we do our import of TensorFlow, and from the second line down the rest is all Keras: from tensorflow we import keras; from tensorflow.keras.models, Sequential — the specific kind of model we'll look at in a second, which, remember, means one layer feeds into the next with no funky splits; from tensorflow.keras.layers, our Activation and Dense layers; and our optimizer, Adam. How you optimize your data is a big thing to be aware of: there are a number of optimizers out there, and when you first start, Adam is as good as any — it's usually assigned to bigger data, though it usually works just fine on smaller data too — and depending on what you're doing, your different layers might have different activations on them. Finally, down here, we bring in the metrics — tensorflow.keras.metrics' categorical crossentropy — so we can see how everything performs when we're done. (You'll see us go back and forth between TensorFlow and scikit-learn, which also has a lot of really good metrics for measuring these things; at the end of the story, it's about how good your model does.) Go ahead and load all of that.
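The import cell, sketched in TensorFlow 2.x style:

```python
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Activation, Dense
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.metrics import categorical_crossentropy
```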
When I say I spend hours on four lines of code, here's what's going on. What we have here is a model, and it's a Sequential model; remember we mentioned sequential up above, where data flows from one layer to the next. Our first layer is the input layer, a Dense layer, which is usually written as just Dense with its units, input shape, and activation: 16 units coming in, the input shape, and the activation. And this is where it gets interesting, because we have ReLU on two of these layers and a softmax activation on the last one. There are so many different options for what these activations mean and how they function: how does ReLU behave, how does softmax behave, and they do very different things. We're not going to go deep into activations here; that is what you can really spend hours on, studying the different activation functions. Some of it is almost like playing with it as an artist: you start getting a feel for them. For example, the tanh (hyperbolic tangent) activation takes a huge amount of processing, so you don't see it everywhere, yet it can produce a better solution, especially when you're analyzing word documents and tokenizing the words; so you'll see people shift from one activation to another, because you're trying to build a better model, but on a huge dataset an expensive activation will simply take too long to process. Then you see things like softmax. ReLU, to sketch it quickly, outputs zero for anything below zero and rises linearly above that; there are also "leaky" variants with a slight negative slope below zero so that errors can propagate better during training. All these little details make a surprisingly large difference to your model. One of the really cool things about data science, to my mind, is what I call "build the fail": the design idea is that you want the model as a whole to work end to end first, so you can run the whole pipeline, get to the end, and test the quality of your setup, down where we compute the cross entropy and the other metrics. You need a fully functional set of code so that when you run it, you can test your model and say: this model works better than that model, and this is why. Then you can start swapping models in and out. So when I say a huge amount of time goes into pre-processing data, it's roughly an 80/20 split between pre-processing and the model; you'll spend a lot of time on the model too, but once you have the whole code and flow down, your models get more and more robust as you experiment with different inputs, data streams, and everything else. Then we can print a simple model.summary(): here's our sequential model, each layer, its output shape, and its parameter count. This is one of the nice things about Keras: everything is set out clearly and easy to read.
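As a rough sketch of the model being described (the 16-unit first layer is from the video; the 32-unit hidden layer and the 2-unit softmax output are assumptions based on the two classes discussed):

```python
# A minimal sketch of the sequential model described in the video.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential([
    Dense(16, input_shape=(1,), activation="relu"),  # one feature: the scaled age
    Dense(32, activation="relu"),                    # hidden layer (size assumed)
    Dense(2, activation="softmax"),                  # two classes: symptoms / no symptoms
])
model.summary()  # prints each layer, its output shape, and its parameter count
```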
So once we have our model built, the next thing we want to do is train it. The next step, of course, is model training, and a lot of times this is just paired with the model definition because it's so straightforward; it's nice to print the model summary alongside it so you have a record. Here's our model, and the keyword in Keras is compile: we compile with the Adam optimizer and a learning rate. That's another term we're skimming right over that really becomes the meat of the setup: the learning rate, here set to 0.01. Depending on what you're doing, the learning rate can push you toward overfitting or underfitting, so it's worth looking into; I know we have a number of tutorials on overfitting and underfitting that are really worth reading once you get to that point in your understanding. Then we have our loss, sparse categorical cross entropy; this is the quantity Keras will drive down during training. And we're tracking the accuracy metric. So we run that, and now that we've compiled our model we want to fit it. Here's our model.fit: we pass our scaled train samples, our train labels, and a validation split (in this case we're using 10% of the data for validation), plus a batch size. Batch size is another number you play with; it doesn't make a huge difference to how the model works, but it does affect how long training takes, and it can nudge the bias a little. Most of the time a batch size falls somewhere between 10 and 100, depending on how much data you're processing. We also shuffle the data, run through 30 epochs, and set verbose to 2.
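A minimal sketch of the compile-and-fit step as described; the 0.01 learning rate is the value pointed at on screen, and the rest mirrors the settings just discussed:

```python
# Compile and fit, using the settings discussed in the video.
from tensorflow.keras.optimizers import Adam

model.compile(optimizer=Adam(learning_rate=0.01),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

model.fit(scaled_train_samples, train_labels,
          validation_split=0.1,   # hold out 10% of the data for validation
          batch_size=10,
          epochs=30,
          shuffle=True,
          verbose=2)
```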
74 it’s going up we want the accuracy would be ideal if it made it all the way to one but we also the loss is more important because it’s a balance um you can have 100% accuracy and your model doesn’t work because it’s overfitted uh again you w’t look up overfitting and underfitting models and we went ahead and went through uh 30 epics it’s always fun to kind of watch your code going um to be honest I usually uh um the first time I run it I’m like Ah that’s cool I get to see what it does and after the second time of running it I’m like i’ like to just not see that and you can repress those of course in your code uh repress the warnings in the printing and so the next step is going to be building a test set and predicting it now uh so here we go we want to go ahead and build our test set and we have just like we did our training set a lot of times you just split your your initial set set up uh but we’ll go ahead and do a separate set on here and this is just what we did above uh there’s no difference as far as um the randomness that we’re using to build this set on here uh the only difference is that we already um did our scaler up here well it doesn’t matter because the the data is going to be across the same thing but this should just be just transform down here instead of fit transform uh because you don’t want to refit your data um on your testing data there we go and now we’re just transforming it because you never want to transform the test data um easy mistake to make especially on an example like this where we’re not doing um you know we’re randomizing the data anyway so it doesn’t matter too much because we’re not expecting something weird and then we went ahead and do our predictions the whole reason we built the model as we take our model we predict and we’re going to do here’s our xcal data batch size 10 verbose and now we have our predictions in here and we could go ahead and do a um oh we’ll print predictions and then I guess I could just put down predictions and five so we can look at the first five of the predictions and what we have here is we have our age and uh the prediction on this age versus what what we think it’s going to be what what we think is going to going to have uh symptoms or not and the first thing we notice is that’s hard to read because we really want a yes no answer uh so we’ll go ahead and just uh round off the predictions using the argmax um the numpy argmax uh for predictions so it just goes to a zer1 and if you remember this is a Jupiter notebook so I don’t have to put the print I can just just put in uh rounded predictions and we’ll just do the first five and you can see here 0 1 0 0 0 so that’s what the predictions are that we have coming out of this um is no symptoms symptoms no symptoms symptoms no symptoms and just as uh we were talking about at the beginning we want to go ahead and um take a look at this there we go confusion matrixes for accuracy check um most important part when you get down to the end of the Story how accurate is your model before you go and play with the model and see if you can get a better accuracy out of it and for this we’ll go ahead and use theit um the SK learn metric uh s kit being where that comes from import confusion Matrix uh some iteration tools and of course a nice map plot library that makes a big difference so it’s always nice to um have a nice graph to look at um pictures worth a thousand words um and then we’ll go ahead and do call it CM for confusion Matrix y true equals test labels y predict rounded 
predictions and we’ll go ahead and load in our cm and I’m not going to spend too much time on the plotting um going over the different plotting code um you can spend uh like whole we have whole tutorials on how to do your different plotting on there uh but we do do have here is we’re going to do a plot confusion Matrix there’s our CM our classes normalized false title confusion Matrix cmap is going to be in blues and you can see here we have uh to the nearest cmap titles all the different pieces whether you put tick marks or not the marks the classes the color bar um so a lot of different information on here as far as how we’re doing the printing of the of the confusion Matrix you can also just dump the confusion Matrix into a caborn and real quick get an output it’s worth knowing how to do all this uh when you’re doing a presentation to the shareholders you don’t want to do this on the Fly you want to take the time to make it look really nice uh like our guys in the back did and uh let’s go ahead and do this forgot to put together our CM plot labels we’ll go and run that and then we’ll go ahead and call the little the definition for our mapping and you can see here plot confusion Matrix that’s our the the little script we just wrote and we’re going to dump our data into it um so our confusion Matrix our classes um title confusion Matrix and let’s just go ahead and run that and you can see here we have our basic setup uh no side effects 195 had side effects uh 200 no side effects that had side effects so we predicted the 10 of them who had actually had side effects and that’s pretty good I mean I I don’t know about you but you know that’s 5% error on this and this is because there’s 200 here that’s where I get 5% is uh divide these both by by two and you get five out of a 100 uh you can do the same kind of math up here not as quick on the flight it’s 15 and 195 not an easily rounded number but you can see here where they have 15 people who predicted to have no uh with the no side effects but had side effects kind of setup on there and these confusion Matrix are so important at the end of the day this is really where where you show uh whatever you’re working on comes up and you can actually show them hey this is how good we are or not how messed up it is so this was a uh I spent a lot of time on some of the parts uh but you can see here is really simple uh we did the random generation of data but when we actually built the model coming up here uh here’s our model summary and we just have the layers on here that we built with our model on this and then we went ahead and trained it and ran the prediction now we can get a lot more complicated uh let me flip back on over here because we’re going to do another uh demo so that was our basic introduction to it we talked about the uh oops here we go okay so implementing a neural network with coros after creating our samples and labels we need to create our carass neural network model we will be working with a sequen model which has three layers and this is what we did we had our input layer our hidden layers and our output layers and you can see the input layer uh coming in uh was the age Factor we had our hidden layer and then we had the output are you going to have symptoms or not so we’re going to go ahead and go with something a little bit more complicated um training our model is a two-step process we first compile our model and then we train it in our training data set uh so we have compiling compiling converts the code into a form of 
So that was the demo. I spent a lot of time on some of the parts, but you can see it's really simple: we randomly generated the data, then actually built the model; here's our model summary with the layers we defined, and then we trained it and ran the prediction. Now we can get a lot more complicated, so let me flip back over here, because we're going to do another demo; that was our basic introduction. To recap implementing a neural network with Keras: after creating our samples and labels, we create our Keras neural network model. We worked with a sequential model with three layers, and that's what we built: an input layer, a hidden layer, and an output layer. The input coming in was the age factor, we had our hidden layer, and the output was whether you're going to have symptoms or not. Next we'll go with something a little more complicated. Training a model is a two-step process: we first compile our model, and then we train it on our training dataset. Compiling converts the model into a form the machine can execute; we used Adam in the last example, a gradient-descent-based algorithm, to optimize the model. Then we trained it, which means letting it learn on the training data. That's exactly what we just did: if you remember from our code, here's the model we created, summarized; we come down here and compile it, which tells Keras we're ready to build this model and use it; and then we train it, the part where we fit the model and feed the information in, and it goes through the training. And of course we scaled the data first, which was really important. Then you saw we created a confusion matrix with Keras and scikit-learn: since we're performing classification on our data, we need a confusion matrix to check the results. A confusion matrix breaks down the various misclassifications as well as the correct classifications so you can get at the accuracy: true positives, false positives, true negatives, and false negatives, which is what we went over. Scrolling down to the end, we printed it out, and you can see a nice print-out of our confusion matrix. The blue cells are the ones we want to be the biggest numbers, because those are the correct predictions, and then we have our false predictions: people with no side effects predicted to have side effects, and vice versa. Now, on to saving and loading models with Keras; we're going to dive into a more complicated demo. You might say, well, that was already a lot of complication, but if you break it down, we randomized some data, created the Keras setup, compiled it, trained it, predicted, and ran our matrix. So we're going to dive into something a little more fun: face mask detection with Keras. We'll build a Keras model to check whether a person is wearing a mask, in real time. This might be important at the front of a store; it's something that could be very useful today for making sure people are safe. So we're looking at mask versus no mask, and let's start with a little bit about the data. In my dataset, under "with mask", there are simply a number of images of people in masks; and again, if you want this data, contact Simplilearn and they can send you the images of people with and without masks so you can try it on your own.
This is just such a wonderful example to work through. Before I dive into the mask detection, and speaking of being current, with COVID around and people wearing masks, one note: for this particular example I had to update to Python 3.8. It might run on 3.7, I'm not sure; I essentially skipped 3.7 and installed 3.8, so I'll be running Python 3.8. You also want to make sure your TensorFlow is up to date, because of what they call functional layers; that's where the network can split, if you remember from earlier. Let's take a look: the functional model, with functional layers, lets us feed different nodes into different layers and split them, a very powerful tool and very popular right now at the leading edge of neural networks and building better models. So, having upgraded to Python 3.8, let's open that up and go through our next example, which includes multiple layers, trains a model to recognize whether someone is wearing a mask or not, and then saves that model so we can use it in real time. We're doing almost a full end-to-end development of a product here; of course this is a very simplified version, and a real one would need a lot more, like verifying that what you're looking at is actually someone's face, and all kinds of other things. So let's jump into the code and open up a new Python 3 notebook. We'll call this one "train mask": we're going to train our mask detector and save it. Mask detection here is not to be confused with masking data, which is a little different; we're talking about a physical mask on your face. From the Keras stack we have a lot of imports to do, and I'm not going to dig too deep into them, just point out a few. Let me grab something to draw with. We have our image-processing imports; image processing, let me underline that, deals with how we bring images in, because most images are a square grid where each cell holds three values for the three color channels, and Keras and TensorFlow do a really good job of handling that, so you don't have to do all the heavy lifting yourself. We have MobileNet and AveragePooling2D, which again concern how we handle and pool images. Dropout is a cool thing worth looking up as you get deeper into Keras and TensorFlow: during training it automatically drops out certain nodes, nodes that turn out to create more bias than help and that add processing time, so removing them actually improves the model. Then we have Flatten, which takes that big array with its three color channels and flattens it into a one-dimensional array instead of a two-dimensional grid times three. Dense and Input we used in the previous example, so those should look a little familiar, along with our model imports and our Adam optimizer. There's some pre-processing on the input that goes along with bringing the data in, and more pre-processing with img_to_array and load_img.
It looks like a lot of work to import all these different modules, but the truth is they do everything for you: you're not hand-coding the pre-processing, you're letting the software do it. We'll also be working with to_categorical, which is just a conversion from a number to a category (a 0 or 1 on its own doesn't really mean anything; it stands for true/false), and LabelBinarizer, which similarly transforms our labels. Then there's train_test_split, classification_report, and our imutils utilities; let me scroll down, because this one is a little different. It's not part of TensorFlow or scikit-learn like the imports above; it belongs with OpenCV. We'll actually have another tutorial going out on OpenCV, so if you want to know more you'll get a glimpse of it in this code, especially in the second piece where we reload the model and hook it up to a video camera, which we'll do in this session. OpenCV is usually referenced as cv2, and imutils handles things like rotating pictures around and resizing them. Then there's the matplotlib library for plotting, because it's nice to have a graph telling us how well we're doing, our numpy for number arrays, and plain os access. Wow, that was a lot of imports; like I said, I spent a little time going through them but didn't want to go too deep. Next I'm going to create some variables we need to initialize: the learning rate, the number of epochs to train for, and the batch size. If you remember, we talked about the learning rate; here it's 1e-4, and a lot of times it's 0.001 or 0.0001, usually somewhere in that range depending on what you're doing. The epochs value is how many times we go through all of the data. I have it set to 2; the actual setup calls for 20, and 20 works great, but the reason I'm using 2 is that it takes a long time to process. One of the downsides of Jupyter is that it isolates everything to a single kernel, so even though I'm on an 8-core processor with 16 dedicated threads, only one thread runs this no matter what, and it takes a lot longer, even though TensorFlow otherwise scales up really nicely. The batch size is how many pictures we load and process at once; again, these are numbers you have to learn to play with depending on your data and what's coming in. The last thing is the directory with the dataset we're going to run, which just holds images of masks and no masks. If we go in, you'll see the dataset folder: "with mask" contains images of people with masks on their faces, and, going back up, "without mask" holds the opposite. They look a bit skewed because they've been formatted to very similar shapes; they're mostly squares, with some slightly different, and that's an important thing to do with a lot of these datasets, getting the images as close to uniform as you can. The image processing we run up here, with the Keras layers and image utilities, does such a wonderful job of converting these that a lot of it we barely have to touch.
There are just a couple of pieces to walk through. Now we load the images: we create data and labels lists, where data holds the features going in, our pictures, and labels holds what comes out. Then, for each category in the directory listing (and if you remember, I just flashed that at you: "with mask" or "without mask" are the two options), we load each image and append the image itself and its label, building one big array. You can see this could be an issue if you had more data at some point; thankfully I have 32 GB of RAM, though you could do this with a lot less, and probably 16 or even 8 GB would easily load all of it. There's also a conversion going on in here: as I mentioned, we resize all the images so our data comes in identical. And you can see with our labels we have "without mask", "without mask", "without mask"; the other value would be "with mask", and those are the only two going in. Then we need to change that to a one-hot encoding. Up here we had labels and data; we want the labels to be categorical, so we binarize them and convert them to a categorical array, and run that. If we look at the last ten labels, say labels[-10:], we now see pairs like [1, 0], [1, 0], and so on. I just realized this might not make sense if you've never done it before, so let me run it again: each pair encodes the two classes, one column for "with mask" and one for "without mask", matching what we saw above. As with any data processing, when the values represent a thing rather than an actual numeric or regression quantity, you want a one-hot encoding rather than a single 0/1 output column; it trains much more cleanly. Now we do our train_test_split into trainX, testX, trainY, testY, making sure it's randomized, taking 20% of the data for testing and the rest for training the model. And here's something that's become so cool: when training, we can augment the data. What does augment mean? If I rotate the images around, zoom in and out, shear them a little, flip them horizontally, set a fill mode, all these different alterations, it's kind of like increasing the number of samples I have. If all my samples are perfect, what happens when we only get part of a face, or a face tilted sideways? All those little shifts cause problems if you train on only a standard, clean set. So we create an augmenter, an ImageDataGenerator, which will rotate, zoom, and do all kinds of useful things; this ImageDataGenerator and all of its different features are really worth looking up.
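A minimal sketch of the dataset preparation being described, assuming a dataset/ folder with with_mask/ and without_mask/ subfolders; the 224x224 size matches the discussion, and the augmentation values are illustrative:

```python
# Load the mask dataset, one-hot the labels, split, and build the augmenter.
import os
import numpy as np
from tensorflow.keras.preprocessing.image import img_to_array, load_img, ImageDataGenerator
from tensorflow.keras.utils import to_categorical
from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split

DIRECTORY = "dataset"
CATEGORIES = ["with_mask", "without_mask"]

data, labels = [], []
for category in CATEGORIES:
    path = os.path.join(DIRECTORY, category)
    for img_name in os.listdir(path):
        image = load_img(os.path.join(path, img_name), target_size=(224, 224))
        data.append(img_to_array(image))
        labels.append(category)

# One-hot encode the string labels: binarize, then expand to two columns.
lb = LabelBinarizer()
labels = to_categorical(lb.fit_transform(labels))
data = np.array(data, dtype="float32")
labels = np.array(labels)

trainX, testX, trainY, testY = train_test_split(
    data, labels, test_size=0.20, stratify=labels, random_state=42)

# Augmentation: rotate, zoom, shift, shear, and flip to simulate imperfect shots.
aug = ImageDataGenerator(rotation_range=20, zoom_range=0.15,
                         width_shift_range=0.2, height_shift_range=0.2,
                         shear_range=0.15, horizontal_flip=True,
                         fill_mode="nearest")
```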
It has a lot of different features. A lot of times on my first pass through a model I'll leave the augmentation out, because of that idea I mentioned, "build the fail", which is just cool to know: you build the whole process end to end first, and then you start adding these refinements in so you can train your model better. So we run this, and then we need to load the base model; you'd probably have gotten an error if you hadn't put this next piece in (I haven't run this part myself from scratch; the folks in the back prepared it). For our base model we use MobileNetV2, and the big thing right here is include_top=False: that leaves off MobileNetV2's own classification layers at the top of the network, so we can attach our own, which is exactly what comes next, the construction of the head of the model that will be placed on top of the base model. You'll see a warning here that I'm mostly ignoring: it has to do with the weights versus the input shape of the pictures, and it's just saying it will switch some things to defaults and auto-shape parts of it for you. You should be aware of that; with this kind of imagery, where we're already augmenting by moving everything around and flipping it, it's not a bad thing, but with other data, in a different domain, it might be. So we take our base model and build the head: headModel equals the base model's output, then an AveragePooling2D with pool size 7x7, then a Flatten, so we're flattening the image features (the pooling has to do with how the spatial data gets condensed; we'll look at that a little more when we get to the lower-level processing). Then a Dense layer, which is just what you'd think by now, then a Dropout of 0.5. Dropout says we'll drop out a certain fraction of nodes while training; when you actually use the model it uses all the nodes, but dropping some during training helps stop biases from forming, a really cool feature they discovered a while back. Then another Dense layer, this time with softmax activation; there are lots of activation options, and softmax is a real popular one, as is ReLU. We could do a whole talk on activation formulas, what their uses are and how they work, but when you're starting out you'll mostly use ReLU and softmax, since they're some of the basic setups and a good place to start. Finally, model = Model(inputs=baseModel.input, outputs=headModel); we're still assembling the model here, so we run that. Then we loop over all the layers in the base model and freeze them so they will not be updated during the first training pass: for each layer in baseModel.layers we set layer.trainable = False.
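A minimal sketch of the transfer-learning model as described; the 128-unit dense layer in the head is an assumption, and the rest follows the steps above:

```python
# MobileNetV2 without its top, plus a small custom classification head.
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.layers import AveragePooling2D, Flatten, Dense, Dropout, Input
from tensorflow.keras.models import Model

baseModel = MobileNetV2(weights="imagenet", include_top=False,
                        input_tensor=Input(shape=(224, 224, 3)))

headModel = baseModel.output
headModel = AveragePooling2D(pool_size=(7, 7))(headModel)
headModel = Flatten(name="flatten")(headModel)
headModel = Dense(128, activation="relu")(headModel)   # head size assumed
headModel = Dropout(0.5)(headModel)                    # drop half the nodes in training
headModel = Dense(2, activation="softmax")(headModel)  # mask / no mask

model = Model(inputs=baseModel.input, outputs=headModel)

# Freeze the pretrained base so only the new head trains at first.
for layer in baseModel.layers:
    layer.trainable = False
```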
A lot of times when you work through data you want to jump in partway through; I'm honestly not sure why they froze it for this particular example without looking deeper, though I do something similar a lot when I'm working with series, specifically stock data, where I want it to iterate through the first 30 data points before doing anything. It's worth noting, too, that freezing a pretrained base like this is the standard transfer-learning move: the new head gets trained first without disturbing the base's learned weights. Then we compile the model: Adam with the initial learning rate and a decay of the initial learning rate divided by the number of epochs; our loss is binary cross entropy, which we'll see printed out; the optimizer is opt; and the metric is accuracy. It's the same pattern as before, not a huge jump from the previous code. Having gone through all of this, we fit the model: train the head of the network, print "training head", and run. I skipped ahead a little in time here, because at roughly 80 seconds per epoch it takes a couple of minutes to get through on a single kernel. One thing I want you to notice while it finishes processing: we have our augmenter in play up here, so any time trainX and trainY go in, there's randomness jiggling what enters the network. Of course we're batching as well, so it processes however many images we set in the batch size at a time, and we have the steps per epoch (trainX over the batch size) and the validation data, our testX and testY going in, with validation steps. One of the important things to know about validation: when your training data and your validation data show about the same accuracy, that's when you want to stop, because it means your model isn't biased. If your training accuracy is running well above your validation accuracy, something in there probably has a bias and the model is overfitting; that's what the validation data and validation steps are really about. Let me check whether it's done processing; it looks like we've gone through our two epochs. Again, you could run this for about 20 epochs with this amount of data and it would give you a nicely refined model at the end; we're stopping at 2 because I really don't want to sit around all afternoon running on a single thread.
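A minimal sketch of the compile-and-fit step, assuming the hyperparameters discussed earlier (learning rate 1e-4, 20 epochs, batch size 32); note the decay argument reflects older Keras versions, and newer ones prefer a learning-rate schedule:

```python
# Compile with Adam + decay, then fit on augmented batches.
from tensorflow.keras.optimizers import Adam

INIT_LR, EPOCHS, BS = 1e-4, 20, 32

opt = Adam(learning_rate=INIT_LR, decay=INIT_LR / EPOCHS)  # decay: legacy-style arg
model.compile(loss="binary_crossentropy", optimizer=opt, metrics=["accuracy"])

# aug.flow applies the random augmentations to each batch on the fly.
H = model.fit(aug.flow(trainX, trainY, batch_size=BS),
              steps_per_epoch=len(trainX) // BS,
              validation_data=(testX, testY),
              validation_steps=len(testX) // BS,
              epochs=EPOCHS)
```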
So now that we've done this, we need to evaluate our model and see how good it is, and to do that we make our predictions: these are predictions on our testX, to see what the model thinks. Now it's evaluating the network. Then we go down here and turn the output into an index; remember it's either zero or one, you have two outputs, wearing a mask or not wearing a mask, so we take the argmax at the end and collapse those predictions to a 0 or 1. To finish that off, let me put this right in here and do it all in one shot: we show a nicely formatted classification report so we can see what it looks like, and there we have it: precision around 97% with a mask, there's our F1 score and support, and around 97% without a mask. That's a pretty high bar: roughly three people in a hundred will sneak into the store without a mask while the model thinks they have one, and about three people with a mask will get flagged so the person at the front goes, hey, look at this person, you might not have a mask, if, say, this is set up at the front of a store. And of course one of the other cool things is that if someone is walking into the store and you capture multiple pictures of them, on the back end you could average the predictions across those pictures and make sure they match. Then comes an important step, and this is just cool, I love doing this stuff: we take our model and save it. model.save, with the name mask_detector.model, saving in the H5 format. So the model we just programmed has been saved; now anyone can load it up and use it for whatever they need, and if I get more information later and build a better model, it's really easy to push that update out to the actual end user. This is true of so many things: maybe I take a model that's running predictions that make money for a company, and as my model gets better I keep updating it and shipping it out. And here we have a nice graph of the training loss and accuracy as we go through the epochs. We only ran the couple of epochs, but you can see the training loss, training accuracy, validation loss, and validation accuracy start to come together; that's the convergence people talk about, and it's what scikit-learn means when its neural networks report on convergence: the loss and accuracy curves for training and validation converging. As you can see up here, they still haven't converged all the way, which would be a cue for me to keep training.
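A minimal sketch of the evaluate-and-save step; BS, testX, testY, and lb are the names from the earlier sketches:

```python
# Evaluate on the held-out set and save the trained model to disk.
import numpy as np
from sklearn.metrics import classification_report

predIdxs = np.argmax(model.predict(testX, batch_size=BS), axis=1)
print(classification_report(testY.argmax(axis=1), predIdxs,
                            target_names=lb.classes_))

# Save in HDF5 format so the detector script can load it back with load_model.
model.save("mask_detector.model", save_format="h5")
```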
What we want to do now is create a new Python 3 program. We just built "train mask", so now we're going to import that model and use it, and get a live-action view of both myself in the afternoon and my office background, which is still in the middle of reconstruction for another month. We'll call this one "mask detector", and we grab a few imports: the MobileNetV2 preprocess_input, since we still need that; the TensorFlow img_to_array; load_model, which is where most of the action is; cv2, which is OpenCV (again, I'm not going to dig too deep into that; we'll flash a little OpenCV code at you, and we have a tutorial on it coming out); our numpy; imutils, which goes along with the OpenCV side; and then of course time and the operating system module. Those are the pieces we set up, and then we create, and this takes just a moment, the function that does all the heavy lifting: detect_and_predict_mask, taking a frame, faceNet, and maskNet. The frame comes in from the camera; faceNet is a pre-trained face detector handled through OpenCV that finds where faces are, and maskNet is the mask model we just trained. The function detects the face and boxes it off so we know what region we'll be processing through our model. There's the frame shape, where h and w are just height and width, and then what they call a blob: cv2.dnn.blobFromImage(frame, ...), which reformats the frame coming in, literally from my camera, and I'll show you that little piece of code in a minute. We pass the blob through the network and obtain the face detections: faceNet.setInput(blob), then detections = faceNet.forward(), and we can print detections.shape. Then we initialize our list of faces, their corresponding locations, and the list of predictions from our face mask network, and we loop over the detections. This is a little more work than you might think; what happens if you have a whole crowd of faces? So we loop through the detections and their shapes, take the probability associated with each detection, and filter out weak detections by requiring the confidence to exceed a minimum; remember it runs zero to one, so a minimum confidence of 0.5 is pretty reasonable. Then we compute bounding boxes for each detected object. If I'm zipping through this, it's because we'll cover OpenCV separately and I really want to stick to the Keras part; you can get a copy of this code from Simplilearn and take it apart, or look out for the OpenCV video. We create a box around what we hope is a face, ensure the bounding box falls within the dimensions of the frame, extract the face ROI, and convert it from BGR to RGB channel order. That's an OpenCV quirk rather than a real issue, but it's about the color order, and I don't know how many times I've forgotten to check it, because all kinds of fun things happen when red becomes blue and blue becomes red. Then we resize the face crop, run img_to_array, preprocess the input, and append the face along with its x and y location. That was a huge amount and I skipped over a ton of it, but the bottom line is that we build a box around the face, because OpenCV does a decent job of finding faces, and that boxed crop is what goes into the model to check for a mask. And then finally we get down to the prediction step.
predictions = maskNet.predict(faces, batch_size=32): the candidate face crops we collected, an array of faces if you will, go through the mask model as a batch, and for each one we're asking: does this face have a mask on it? That prediction is the big thing we're working toward. The function then returns the locations and the predictions: the location just tells us where on the picture each face is, and the prediction tells us what it is, mask or no mask. With all that loaded up, we load our serialized face detector model from disk. We have the path it was saved under (obviously you'll use a different path depending on where and how you saved things), plus the weights path, and then faceNet = cv2.dnn.readNet(prototxtPath, weightsPath) loads it. Let me run that, and right below it, because I always hate separating these, we load the actual mask detector model from disk, the model we saved in the last part; run that too. So those are all the different pieces we need for our model. The next part is opening up our video, and this is just kind of fun because it's all part of the OpenCV video setup; let me put it all in as one block. We're going to open our video stream, start it, and run it until we're done, and this is where we get some real live action, which is what I like about working with images and videos: it's all right there in front of you, visual, and you can see what's going on. We start our video stream with source zero, which grabs the main camera I have hooked up, print "starting video", and loop over the frames from the video stream (there's a little redundancy in the comments there; I'll leave it the way they had it in the code). So, while True: we grab a frame from the threaded video stream and resize it to a maximum width of 400 pixels; we read the frame from the stream and resize it. Then, remembering that our procedure returns the location and the prediction, we call detect_and_predict_mask, sending it the frame, the faceNet, and the maskNet, all the different pieces, and it returns our locations and predictions. Then, for each box and prediction in the locations and predictions, where the box, again, is the OpenCV rectangle defined by its two corner points, we unpack the box and the prediction pair, mask and without-mask, create our label, "Mask" or "No Mask" with its probability, and pick a color for the box: green if the label is mask, red otherwise.
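A minimal sketch of the detector script's main pieces, assuming the detect_and_predict_mask helper described above; the face-detector filenames are placeholders for whatever prototxt and weights files you use:

```python
# Load both models, open the camera, and run the detection loop.
import cv2
import imutils
from imutils.video import VideoStream
from tensorflow.keras.models import load_model

# Pre-trained OpenCV face detector (placeholder filenames) and our saved model.
faceNet = cv2.dnn.readNet("deploy.prototxt", "face_detector.caffemodel")
maskNet = load_model("mask_detector.model")

vs = VideoStream(src=0).start()   # source 0: the main camera
while True:
    frame = imutils.resize(vs.read(), width=400)
    (locs, preds) = detect_and_predict_mask(frame, faceNet, maskNet)

    for (box, pred) in zip(locs, preds):
        (startX, startY, endX, endY) = box
        (mask, withoutMask) = pred
        label = "Mask" if mask > withoutMask else "No Mask"
        color = (0, 255, 0) if label == "Mask" else (0, 0, 255)
        label = f"{label}: {max(mask, withoutMask) * 100:.2f}%"
        cv2.putText(frame, label, (startX, startY - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.45, color, 2)
        cv2.rectangle(frame, (startX, startY), (endX, endY), color, 2)

    cv2.imshow("Frame", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):  # press q to stop
        break

cv2.destroyAllWindows()
vs.stop()
```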
This will all make a lot more sense when I hit the run button. We display the label and the bounding box rectangle on the output frame, then show it with cv2.imshow("Frame", frame), and key = cv2.waitKey(1) just waits for the next frame to come through from the feed; we keep doing this until we hit the stop button, pretty much. So, are you ready for this? Let's see if it works. We've "distributed" our model, loaded it up into our deployment code, hooked it into the camera, and now we run it. There it goes; you can see the data streaming down below while we wait for the popup, and there I am in my office with my funky headset on, my unfinished wall in the background, and it says up there: No Mask. Oh no, I don't have a mask on! I wonder what happens if I cover my mouth; you can see the no-mask confidence dip a little. I wish I'd brought a mask into my office, it's up at the house, but you can see it's reporting a 95 to 98% chance that I don't have a mask on, and it's true, I don't. And this could genuinely be deployed: it's actually an excellent little piece of script you could install on a video feed, a security camera or something, and you'd have this really neat setup checking whether people have masks on when they enter a store, public transportation, or wherever masks are required. Let me go ahead and stop that. If you want a copy of this code, definitely give us a holler. We'll be going into OpenCV in another video, which is why I skipped a lot of the OpenCV code here and really focused on the Keras side: saving the model, loading the model back up, and processing a streaming video through it so you can see that the model works; we now have a working model hooked into the video camera, which is just pretty cool and a lot of fun. So, as promised, we rolled up our sleeves and did a lot of coding today: the basic demo up above for pulling in Keras, and then a Keras model that pulls in data to check whether someone is wearing a mask or not, a fully running application that's very useful in today's world. Today we're talking about must-have Python AI projects and how to build them, which can really help you sharpen your skills and stand out in the growing field of artificial intelligence. First, quickly: what is Python? Python is one of the most popular programming languages for AI because of its simplicity and the powerful libraries it offers, like TensorFlow, Keras, and PyTorch, and building projects in Python is a great way to get started if you want to break into the AI industry. Artificial intelligence is transforming industries like healthcare, finance, and even entertainment, and companies are now looking for experts who know how to apply AI to real-world problems. In this video we'll explore beginner-level through advanced-level projects, designed to give you hands-on experience in building intelligent systems, analyzing data, and even automating tasks. So without any further ado, let's get started with the beginner-level projects. Number one, we have fake news detection using machine learning. In today's world, fake news is a major concern, causing misinformation to spread rapidly across social media and news platforms.
Detecting fake news is crucial to maintaining the integrity of the information we consume. This project aims to build a machine learning model that can identify fake news articles by analyzing their textual content; by learning from existing datasets of real and fake news, the model will be able to classify articles into those two categories, helping media outlets and social media platforms reduce the spread of misinformation. It's a perfect introduction to natural language processing (NLP), since it involves text data manipulation, feature extraction, and supervised learning, and it can be adapted for real-time use on websites or social platforms to flag suspicious articles and point users toward more reliable information. So how do you build it? The first step is data collection: use a dataset like LIAR or FakeNewsNet, which contain labeled real and fake news articles; you can find datasets on Kaggle. The second step is preprocessing: clean the text by removing stop words, punctuation, and special characters, then tokenize and stem the words using NLTK or spaCy. The third step is feature extraction: use TF-IDF or bag-of-words to convert the text into numerical data for machine learning models. The fourth step is model training: train a classifier like logistic regression, naive Bayes, or random forest on the dataset. The fifth step is evaluation: evaluate the model using accuracy, precision, recall, and F1-score metrics to determine how well it classifies fake versus real news. Tools you can use: NLTK, scikit-learn, and pandas. Skills you'll gain: text preprocessing, NLP, and classification models; and if you want us to make a video on this project, please comment down below. Number two, we have image recognition using a CNN. Image recognition is one of the core applications of deep learning and computer vision, used across industries ranging from healthcare to autonomous vehicles. This project will guide you through building an image classifier using CNNs (convolutional neural networks), a deep learning architecture designed specifically for image recognition tasks. The goal is to create a model that can accurately classify images, such as differentiating between cats and dogs. By working on this project you'll gain a solid understanding of the fundamental concepts of CNNs, such as convolution layers, pooling, and activation functions, and it also teaches essential skills like image preprocessing, dataset handling, and model evaluation, which apply to more advanced computer vision projects later. To build it: first, import a dataset; use something like CIFAR-10 or the Kaggle cats-versus-dogs set with labeled images. The second step is preprocessing: resize, normalize, and augment the images using libraries like OpenCV or PIL to prepare the dataset. The third step is model architecture: design a basic CNN with convolutional, pooling, and fully connected layers using Keras or TensorFlow. The fourth step is training: split the dataset into training and validation sets and train the CNN to classify the images. The fifth step is model evaluation: use accuracy, precision, and a confusion matrix to evaluate how well the model predicts the correct class label. Tools for this project: Keras, TensorFlow, OpenCV, and pandas. Skills you'll gain: image preprocessing, CNN architecture, and model evaluation; and if you want us to make a video on image recognition using CNNs, please comment down below.
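As a hedged sketch of such a classifier, here is a small CNN on CIFAR-10 (which ships with Keras); the layer sizes are illustrative, not a prescription:

```python
# A small CNN image classifier trained on CIFAR-10.
import tensorflow as tf
from tensorflow.keras import layers, models

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0   # normalize pixels to [0, 1]

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(32, 32, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),          # 10 CIFAR-10 classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))
```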
Next we move to the intermediate-level projects. First, we have an AI-based recipe recommendation system. Recommendation systems have become an integral part of modern digital platforms, from e-commerce websites suggesting products to streaming services recommending shows and movies. In this project you'll build a recipe recommendation system that suggests dishes based on the ingredients a user has on hand. It demonstrates how recommendation algorithms such as content-based filtering and collaborative filtering can provide personalized suggestions, and you'll also learn how to preprocess and clean textual data such as ingredient lists and implement a machine learning algorithm that matches user inputs against a recipe database. It's an excellent project for understanding how recommendation systems work and how they can be applied across industries, from food tech to personalized content recommendations. The first step is data collection: use web scraping tools like Beautiful Soup to scrape recipes from websites, or use a dataset like Recipe1M. The second step is preprocessing: normalize and clean the ingredient data by standardizing ingredient names and handling missing values. The third step is the recommendation algorithms: implement content-based filtering and collaborative filtering to recommend recipes; content-based filtering matches ingredient lists, while collaborative filtering uses user preferences. The fourth step is model training: use cosine similarity to match user-provided ingredients against the recipe ingredients in the dataset. The final step is the interface: create a simple web interface using Flask where users can input ingredients and receive recipe recommendations. Tools: Beautiful Soup, pandas, scikit-learn, and Flask. Skills you'll gain from this project: web scraping, data cleaning, and recommendation systems; and if you'd like a video on this project, please comment down below.
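A minimal sketch of the cosine-similarity matching step this project describes, with a toy three-recipe database:

```python
# Vectorize ingredient lists and rank recipes against a user's ingredients.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

recipes = {
    "pancakes": "flour egg milk butter sugar",
    "omelette": "egg butter cheese salt",
    "salad": "lettuce tomato cucumber olive oil",
}

vectorizer = TfidfVectorizer()
recipe_matrix = vectorizer.fit_transform(recipes.values())

# What the user has on hand, vectorized with the same vocabulary.
query = vectorizer.transform(["egg milk flour"])
scores = cosine_similarity(query, recipe_matrix).ravel()

# Rank recipes by similarity to the user's ingredients.
for name, score in sorted(zip(recipes, scores), key=lambda p: -p[1]):
    print(f"{name}: {score:.2f}")
```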
Number four, we have a chatbot with sentiment analysis. Chatbots have transformed how businesses and services interact with users, providing 24/7 support and personalized responses. In this project you'll build a conversational chatbot that can analyze the sentiment behind user inputs and respond accordingly. By incorporating sentiment analysis, the chatbot will understand not only the content of the user's messages but also the emotional tone, such as whether the user is happy, frustrated, or neutral, which allows it to adjust its tone and responses to improve user satisfaction; for example, it could offer a more empathetic response if it detects negative sentiment. This project gives you hands-on experience building a conversational AI system while learning to integrate machine learning techniques like sentiment analysis, and the skills you develop apply to customer service, healthcare, education, and more. The first step is the chatbot framework: use tools like Dialogflow or Rasa to build a conversational chatbot capable of handling various user intents. The second step is sentiment analysis: integrate a sentiment analysis model using pre-trained tools like VADER or BERT. The third step is the conversational flow: adjust the chatbot's responses based on the sentiment (positive, negative, or neutral) detected in the user's input. The fourth step is integration and deployment: build an interface, a website or messaging platform, where users can interact with the chatbot in real time, and deploy it on a website or app so users can engage with it and receive sentiment-aware responses. Tools: Dialogflow, Rasa, VADER, Transformers, and Flask. Skills you'll gain: sentiment analysis, chatbot deployment, and conversational AI; and if you'd like a video on this project, please comment down below. Now let's look at some advanced-level projects. First up is AI-powered image colorization. Image colorization is a fascinating application of deep learning that transforms black-and-white images into color by predicting and applying realistic colors to grayscale images. This project explores how CNNs and GANs can be used to learn a mapping between grayscale and colored images: you gather a dataset of color images, convert them to grayscale, and train the model to generate color versions. It's especially useful in areas such as film restoration, photography, and artistic creation, where colorization can breathe new life into old black-and-white images; more broadly, it highlights the power of deep learning in understanding and generating complex visual data, giving you insight into how these models work for tasks like image generation, video prediction, and beyond. The skills you learn here are also useful for other creative AI applications like style transfer and image synthesis. The first step is data collection: take a dataset of color images, convert them to grayscale, and use the grayscale images as inputs while training the model to output the colorized versions. The second step is preprocessing: normalize the images' pixel values and resize them for training. The third step is model architecture: implement a U-Net model or a generative adversarial network (GAN), which are well suited to image generation tasks like colorization. The fourth step is training and evaluation: train the model with grayscale images as input and color images as output, using mean squared error for guidance, and evaluate with visual inspection and peak signal-to-noise ratio (PSNR). The last step is deployment: create a web interface where users can upload black-and-white images and get them colorized. Tools: TensorFlow, Keras, OpenCV, and Flask. Skills you'll gain: deep learning, CNNs, GANs, and image preprocessing; and if you want us to make a video on the AI-powered image colorization project, please comment down below.
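As a toy sketch of the idea (a real colorizer would use a much deeper U-Net or a GAN, as described above; sizes here are illustrative), a tiny encoder-decoder that maps one-channel grayscale images to three-channel color images might look like this:

```python
# A toy grayscale-to-color encoder-decoder, trained with MSE guidance.
from tensorflow.keras import layers, Model

inp = layers.Input(shape=(128, 128, 1))                              # grayscale input
x = layers.Conv2D(32, 3, activation="relu", padding="same")(inp)
x = layers.MaxPooling2D()(x)                                          # downsample 128 -> 64
x = layers.Conv2D(64, 3, activation="relu", padding="same")(x)
x = layers.UpSampling2D()(x)                                          # upsample back to 128
out = layers.Conv2D(3, 3, activation="sigmoid", padding="same")(x)    # RGB in [0, 1]

model = Model(inp, out)
model.compile(optimizer="adam", loss="mse")  # mean squared error, as the project suggests

# Training pairs: grayscale versions as X, the original color images as y,
# both scaled to [0, 1]:
# model.fit(X_gray, y_color, epochs=20, batch_size=32)
```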
So now let's look at some advanced-level projects. First, we have AI-powered image colorization. Image colorization is a fascinating application of deep learning that transforms black-and-white images into color by predicting and applying realistic colors to grayscale images. This project explores how CNNs and GANs can be used to learn a mapping between grayscale and colored images. You will gather a dataset of colored images, convert them to grayscale, and train a model to generate the color versions. This is especially useful in areas such as film restoration, photography, and artistic creation, where colorization can breathe new life into old black-and-white images. It also highlights the power of deep learning in understanding and generating complex visual data, giving you insight into how these models work for tasks like image generation, video prediction, and beyond. The skills you learn here will also be useful for other creative AI applications like style transfer and image synthesis.

So now let's see how to build this project. The first step is data collection: use a dataset of colored images, convert them to grayscale, and use the grayscale images as inputs while training the model to output colorized versions. The second step is pre-processing: normalize image pixel values and resize the images for training. The third step is the model architecture: implement a U-Net model or a generative adversarial network (GAN), both of which are well suited to image generation tasks like colorization. The fourth step is training and evaluation: train the model with grayscale images as input and colored images as output, using a mean squared error loss for guidance, and evaluate with visual inspection and peak signal-to-noise ratio (PSNR). The last step is deployment: create a web interface where users can upload black-and-white images and get them colorized. The tools you will use are TensorFlow, Keras, OpenCV, and Flask, and you will gain skills in deep learning, CNNs, GANs, and image pre-processing. If you would like a dedicated video on AI-powered image colorization, please comment down below.

And last, we have object detection using YOLO (You Only Look Once). Object detection is one of the most popular computer vision applications, allowing machines to recognize and locate multiple objects within an image or video stream in real time. YOLO is a state-of-the-art object detection algorithm known for its speed and accuracy. This project involves building a real-time object detection system capable of identifying multiple objects in images or video feeds and drawing bounding boxes around them. Object detection has widespread use in fields such as security surveillance, autonomous driving, and augmented reality, where systems need to understand and interact with their surroundings in real time. By working on this project you will learn how to pre-process image data, format bounding box labels, and train a YOLO model using a dataset like COCO or Pascal VOC. You will also gain valuable experience deploying an object detection system that processes video streams, giving you the skills to build applications for dynamic environments, from traffic monitoring to industrial robotics.

So now let's see how to build this project. The first step is to import the dataset: use a dataset like COCO or Pascal VOC, which contains labeled objects in images with bounding boxes. The second step is pre-processing: resize images, normalize pixel values, and ensure the bounding box labels are appropriately formatted. The third step is the model architecture: use the YOLO architecture, which splits images into a grid and predicts bounding boxes and class probabilities for each object. The fourth step is training and evaluation: train the YOLO model on the labeled data using a framework like Darknet, and evaluate it using metrics like Intersection over Union (IoU) and mean Average Precision (mAP). The last step is deployment: develop a system that can process video streams in real time, detecting objects and drawing bounding boxes around them, as in the sketch below. The tools you will use are OpenCV, TensorFlow, and Darknet, and you will gain skills in object detection, the YOLO architecture, and real-time video processing. If you would like a dedicated video on this project, please comment down below.
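Here is a minimal inference-side sketch using OpenCV's DNN module. It assumes Darknet config and weight files (e.g. yolov4-tiny) and a class-name file have already been downloaded; those file names are placeholders, and training itself would happen separately in Darknet.

```python
# Minimal sketch of real-time YOLO inference with OpenCV's DNN module.
# File names are placeholder assumptions for pre-downloaded model files.
import cv2

net = cv2.dnn_DetectionModel("yolov4-tiny.cfg", "yolov4-tiny.weights")
net.setInputParams(size=(416, 416), scale=1 / 255.0, swapRB=True)
classes = open("coco.names").read().splitlines()

cap = cv2.VideoCapture(0)  # webcam stream
while True:
    ok, frame = cap.read()
    if not ok:
        break
    class_ids, scores, boxes = net.detect(frame, confThreshold=0.5,
                                          nmsThreshold=0.4)
    for cid, score, box in zip(class_ids, scores, boxes):
        x, y, w, h = box
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.putText(frame, f"{classes[int(cid)]}: {float(score):.2f}",
                    (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
    cv2.imshow("YOLO detection", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```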
In conclusion, these Python AI projects not only help you build hands-on skills but also provide a solid foundation for advancing your career in artificial intelligence. Whether you are working on fake news detection, image recognition, or advanced tools like chatbots and object detection systems, these projects offer the real-world applications that companies are looking for. Start small, keep learning, and as you complete each project you will be better prepared to take on the challenges of the growing AI field.

Imagine this: you are using a calculator app on your phone, and it gives you the answer to a complex math problem faster than you can blink. Pretty standard, right? But what if, instead of just crunching numbers, that app could actually think through the problem, breaking it down like a human would, considering the best approach, and even explaining why it made certain decisions? Sounds futuristic, doesn't it? Well, we are not too far from that reality. Today we are diving into OpenAI's latest project, code-named Strawberry, a new AI model that is pushing the boundaries of reasoning and problem solving. In this video we will break down what makes Strawberry special, how it works, and why it could change the game for AI systems moving forward.

First off, what exactly is Strawberry? According to recent reports, OpenAI is preparing to release this new AI model in the next couple of weeks, and it is set to improve on things like reasoning and problem solving. Previously known as Q* (Q-star), the model is designed to be much better at thinking through problems than previous versions. So what makes Strawberry different from what we have used before? One of the most interesting things about Strawberry is that it uses something called System 2 thinking. The idea comes from the famous psychologist Daniel Kahneman, and it refers to a more careful, slower way of thinking, like when you really focus on solving a tricky problem. Instead of answering questions instantly, Strawberry takes about 10 to 20 seconds to process its thoughts. This extra time helps it avoid mistakes and give more accurate answers. But the model doesn't just think slowly; it has abilities that make it stand out. It is built to handle advanced reasoning and solve mathematical problems, areas where AI systems usually struggle, by breaking complex problems down step by step. And here is something interesting: it might even be added to future versions of ChatGPT, possibly as a model named Orion or GPT-5. If that happens, ChatGPT could become smarter and more reliable at solving tough problems.

Now here is where it gets really fascinating: there is research that might help us understand how Strawberry improves its thinking. You might have heard of something called STaR, which stands for Self-Taught Reasoner. This is a method where an AI can actually teach itself to think better. Here is how it works: STaR starts with a few examples where the AI is shown how to solve problems step by step; then the AI tries solving problems on its own, getting better as it goes, improving by looking at its mistakes and learning from them. This could be what is happening with Strawberry: it may be using a similar method to teach itself how to reason better and solve complex problems. But the AI isn't just thinking better; it is also learning to break down problems in a very human-like way. Strawberry uses something called chain-of-thought reasoning: when faced with a complex problem, it breaks it down into smaller, manageable steps, much as we do when solving a puzzle. Instead of jumping straight to an answer, it takes the time to go through each step, making the solution more understandable and accurate. This is especially useful in math, where Strawberry is expected to be really strong.

With all this potential, what does the future hold for AI models like Strawberry? What's next for OpenAI? Well, Strawberry is just the beginning. There is talk of a future model called Orion, which could be the next big version after GPT-4 and GPT-4o, and it may even use what Strawberry learns to get better at solving problems. But here is the thing: training these advanced models is expensive. Training GPT-4, for example, reportedly cost over $100 million. And although OpenAI's Sam Altman has said the era of simply making bigger models is coming to an end, it is clear that models like Strawberry are focused on becoming smarter and more efficient. So what does all of this mean for the future of AI and how we use it? Strawberry could represent a huge leap in AI's ability to reason and solve complex problems. With its focus on slower, more deliberate thinking and its potential connection to the STaR method, it is paving the way for smarter, more reliable AI systems, and this is just the start. As we move forward with models like Orion, the possibilities are endless. And that's a wrap on OpenAI's exciting new model, Strawberry. It is clear that this AI could bring major advancements in reasoning and problem solving, and we can't wait to see how it all unfolds. What are your thoughts on Strawberry? Let us know in the comment section below.
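OpenAI has not published how Strawberry reasons internally, so the following is only an illustrative contrast between a direct prompt and a chain-of-thought style prompt with a worked rationale, the kind of step-by-step exemplar the STaR method bootstraps from.

```python
# Illustrative contrast between a direct prompt and a chain-of-thought
# prompt. These are plain strings; how Strawberry implements its internal
# reasoning has not been published, so this is only a sketch of the idea.
direct_prompt = "What is 17% of 460?"

cot_prompt = (
    "What is 17% of 460? Think step by step:\n"
    "1. 10% of 460 is 46.\n"
    "2. 7% of 460 is 32.2.\n"
    "3. Add them: 46 + 32.2 = 78.2.\n"
    "So the answer is 78.2."
)
```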
Today we are diving into the fascinating world of Google Quantum AI. We will break it down step by step: what Google Quantum AI is, how it differs from classical computers, why it is a game-changer, and the real problems it is solving. We will also explore the latest developments, the innovative hardware, the challenges the team faces, and why, despite the hurdles, it is still an incredibly exciting field with a bright future. Stick with me, because by the end you will be amazed at how this technology is shaping tomorrow. So let's get started.

The universe operates on quantum mechanics, constantly adapting and evolving to overcome the hurdles it encounters. Quantum computing mirrors this dynamic nature: it doesn't just work within its environment, it responds to it. This unique trait opens the door to groundbreaking solutions for tomorrow's toughest challenges. So what is Google Quantum AI? Quantum AI is Google's leap into the future of computing: a cutting-edge project where they are building powerful quantum computers and exploring how these machines can solve problems that traditional computers struggle with, or can't solve at all. If you are not aware, classical computers use bits, each either zero or one, and solve tasks step by step, which is great for everyday use. Quantum computers use qubits, which can be zero, one, or both simultaneously, allowing them to tackle certain complex problems much faster. Think of it this way: suppose you are trying to design a new medicine to fight a disease. A regular computer would analyze molecules step by step, which could take years, whereas Google Quantum AI aims to simulate how molecules interact at the quantum level almost instantly. This speeds up drug discovery, potentially saving millions of lives by finding treatments faster.

Now you must be wondering why it is so necessary. Google Quantum AI matters because some problems are simply too big and complex for regular computers to solve efficiently: challenges like developing life-saving medicines, creating unbreakable cybersecurity, optimizing traffic systems, or even understanding how the universe works. Regular computers can take years or even centuries to crack these problems, while quantum computers could solve them in minutes or hours. So what problems is it actually solving? Here are some of them. Number one, drug discovery: simulating molecules to find new treatments faster. Then cybersecurity: developing ultra-secure encryption systems to keep your data safe. AI advancements: training AI models more quickly and accurately. Climate modeling: understanding climate change to create better solutions for global warming. In simple terms, Google Quantum AI is here to tackle the impossible problems and bring futuristic solutions to today's challenges. It is like upgrading the world's brain to think smarter and faster.

Google Quantum AI has been at the forefront of quantum computing advancements, pushing boundaries from the groundbreaking Sycamore processor to the latest innovation, Willow. In 2019 Google introduced Sycamore, a 53-qubit processor that achieved something called quantum supremacy. Qubits, or quantum bits, are at the core of quantum computers. Unlike regular bits, which are either zero or one, qubits can be zero, one, or both at once; this is called superposition, and it allows quantum computers to process vast amounts of data simultaneously. They are powerful but fragile, needing precise control, and they hold the key to solving complex problems. Sycamore solved a problem in just 200 seconds that would have taken the world's fastest supercomputer over 10,000 years. This was a big moment: it showed quantum computers could do things classical computers couldn't. After Sycamore, scientists recognized a key issue: quantum computers are very sensitive to errors, and even small disturbances can corrupt calculations. To fix this, Google started working on error correction, making their systems more accurate and reliable for real-world use. In 2024 Google launched Willow, a 105-qubit processor. This chip is smarter and more powerful, and it can correct errors as they happen. Willow shows how much closer we are to building quantum computers that can solve practical problems.
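To make superposition concrete, here is a tiny sketch that models a single qubit as a two-element state vector in NumPy; squaring the amplitudes (the Born rule) gives the measurement probabilities, so an equal superposition yields 50/50 outcomes, unlike a classical bit.

```python
# A qubit sketched as a 2-element complex state vector with NumPy.
import numpy as np

zero = np.array([1, 0], dtype=complex)   # |0>
one = np.array([0, 1], dtype=complex)    # |1>
plus = (zero + one) / np.sqrt(2)         # equal superposition of |0> and |1>

def measure_probabilities(state):
    """Born rule: probability of each outcome is the squared amplitude."""
    return np.abs(state) ** 2

print(measure_probabilities(zero))   # [1. 0.]   -> always measures 0
print(measure_probabilities(plus))   # [0.5 0.5] -> 0 or 1, each 50%
```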
Google's logical qubits have reached a major breakthrough: they now operate below the critical quantum error correction threshold. Sounds exciting, right? But what does this mean? Let's break it down. Quantum computers use qubits, which are very powerful but also very fragile: they can easily be disrupted by noise or interference, causing errors. To make quantum computers practical, they need to correct these errors while running complex calculations. This is where logical qubits come in: they group multiple physical qubits together to create a more stable and reliable unit for computing. The error correction threshold is like a magic line: if errors can be corrected faster than they appear, the system becomes scalable and much more reliable. By getting their logical qubits to operate below this threshold, Google has shown that its quantum computers can handle errors effectively, paving the way for larger and more powerful quantum systems.

So what is the hardware approach at Google Quantum AI that made this possible? Google Quantum AI's hardware approach focuses on making quantum computers stable and reliable for practical use. They group qubits, the building blocks of quantum computers, so that they work together, allowing the system to fix errors as they happen. By keeping the chips at extremely cold temperatures, they reduce interference, which keeps calculations accurate. This setup helps the system handle bigger and more complex tasks, like simulating molecules for drug discovery, improving AI models, and creating stronger encryption for data security. It is a big step toward making quantum computing a tool for solving real-world problems.

While Google Quantum AI has achieved incredible milestones, it still faces some key limitations. Fragile qubits: qubits are extremely sensitive to noise and interference, which can cause errors; keeping them stable requires ultra-cold temperatures and precise control. Error correction challenges: although Google has made progress, quantum error correction still isn't perfect and needs more work before quantum computers can reliably scale to real-world problems. Limited applications: right now quantum computers are great for specialized problems like optimization and simulation, but for everyday computing tasks classical computers are still better. Hardware complexity: building and maintaining a quantum computer is incredibly expensive and complicated; the advanced cooling systems and infrastructure make it hard to expand these systems widely. Still in early stages: quantum computers, including Google's, remain in the experimental phase and are not yet ready for large-scale practical use in industries.
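The threshold idea can be illustrated with a classical toy: a 3-bit repetition code, where a bit is copied three times and recovered by majority vote. The "logical" error rate beats the physical rate only when the physical error probability is below the code's threshold. Real quantum error correction uses far more sophisticated codes (e.g. surface codes), so this is only an analogy.

```python
# Toy classical analogy for the error-correction threshold: a 3-bit
# repetition code. Each copy flips independently with probability p;
# majority voting fails only if 2 or more copies flip, which is rarer
# than p itself whenever p < 0.5 (this code's threshold).
import random

def logical_error_rate(p, trials=100_000):
    errors = 0
    for _ in range(trials):
        flips = sum(1 for _ in range(3) if random.random() < p)
        if flips >= 2:  # majority vote decodes the wrong bit
            errors += 1
    return errors / trials

for p in (0.01, 0.1, 0.3):
    print(f"physical error {p:.2f} -> logical error {logical_error_rate(p):.4f}")
```

Analytically the logical error rate is 3p²(1−p) + p³, e.g. about 0.028 at p = 0.1, already well below the physical rate.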
But despite its challenges, Google Quantum AI is paving the way for a future where quantum computing tackles problems that regular computers can't handle, like finding new medicines, predicting climate change, and building smarter AI. It is an exciting start to a whole new era of technology, full of possibilities we are just beginning to explore. The future of Google Quantum AI is incredibly exciting, with the potential to solve real-world problems traditional computers can't touch. It is set to revolutionize industries like healthcare, by speeding up drug discovery; finance, through advanced optimization; and energy, with better materials modeling. Quantum AI could also lead to breakthroughs in AI itself by training smarter models faster, and to unbreakable encryption for stronger data security. As Google improves its hardware and error correction, its quantum systems will become more powerful and reliable, paving the way for large-scale practical applications. The possibilities are endless, and Google Quantum AI is at the forefront of shaping a transformative future.

Artificial intelligence, or AI, is transforming our world, making things faster and more efficient. But what happens when AI makes mistakes? When AI is biased, it can have serious consequences for companies and for people's lives. Imagine missing out on a job, being wrongly identified in a photo, or being unfairly treated, all because a computer program made a bad decision. These mistakes don't just harm individuals; they can affect entire communities without anyone realizing it. AI bias, also called algorithmic bias, happens when AI systems unintentionally favor one group over another. Take healthcare, for example: if the data used to train an AI system doesn't include enough women or people from minority groups, the system might not work as well for them. This can lead to incorrect medical predictions, such as giving Black patients less accurate results than white patients. In job hiring, AI can unintentionally promote certain stereotypes, as when job ads use terms like "ninja," which may attract more men than women even though the term isn't a requirement for the job. Even in creative areas like image generation, AI can reinforce biases: when asked to create pictures of CEOs, AI systems often generate mostly white men, leaving out women and people of color. In law enforcement, AI tools sometimes rely on biased data, which can unfairly target minority communities. In this video we will explore some well-known examples of AI bias and how these mistakes affect people and society, from healthcare to hiring and even criminal justice. AI bias is something we need to understand and fix, so let's dive in and learn how these biases happen and what can be done to stop them. Without any further ado, let's get started.

So what is AI bias? AI bias, also called machine learning bias, happens when human biases affect the data used to train AI systems, causing unfair or inaccurate results. When AI bias isn't fixed, it can hurt a business's success and prevent some people from fully participating in the economy or society. Bias makes AI less accurate, which reduces its effectiveness; businesses may struggle to benefit from systems that give unfair results, and scandals from AI bias can lead to loss of trust, especially among groups like people of color, women, people with disabilities, and the LGBTQ community. AI models often learn from data that reflects society's biases, and this can lead to unfair treatment of marginalized groups in areas like hiring, policing, and credit scoring. As the Wall Street Journal notes, businesses still find it challenging to address these widespread biases as AI use grows.

Moving forward, let's look at some sources of AI bias. Distorted outcomes can negatively affect both organizations and society as a whole, so here are some common forms of AI bias. The first is algorithmic bias: if the problem or question is not well defined, or the feedback provided to the machine learning algorithm is inadequate, the results may be inaccurate or misleading. The second is cognitive bias: since AI systems rely on human input, they can be affected by unconscious human biases, which may influence either the dataset or the model's behavior. The third is confirmation bias: this occurs when the AI overly depends on existing beliefs or trends in the data, reinforcing prior biases and failing to detect new patterns or trends.
The fourth is exclusion bias: this arises when important data is omitted from the dataset, often because the developer overlooked new or crucial factors. The fifth is measurement bias: this stems from incomplete data, such as when a dataset fails to represent the entire population. For instance, if a college analyzed only its graduates to determine success factors, it would overlook the reasons why other students dropped out.

Moving forward, let's see how to avoid bias. Here is a checklist of six process steps that can help keep AI programs free of bias. First, choose the right model: ensure diverse stakeholders select the training data in supervised models, and integrate bias detection tools in unsupervised models. Second, use accurate data: train the AI with complete, balanced data that reflects the true demographics. Third, build a diverse team: a varied team helps spot biases and should include innovators, creators, implementers, and end users. Fourth, watch data processing: biases can appear during any phase of processing, so stay vigilant throughout. Fifth, monitor regularly: continuously test models and commission independent assessments to detect and fix biases, for example with a simple selection-rate check like the sketch below. And last, check the infrastructure: ensure the technological tools and systems are functioning properly to avoid hidden biases. In conclusion, AI bias poses serious challenges by amplifying existing societal biases, affecting individuals and businesses alike; from healthcare to hiring, AI systems can unintentionally reinforce stereotypes and inequalities.
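As one concrete monitoring check, here is a minimal sketch of the disparate impact ratio: the selection rate of one group divided by another's. The toy hiring data is illustrative; a ratio below roughly 0.8 is a common warning sign (the "four-fifths rule").

```python
# Minimal sketch of one bias check: the disparate impact ratio on toy
# hiring data. Group labels and outcomes here are made up for illustration.
import pandas as pd

df = pd.DataFrame({
    "group": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "hired": [1, 1, 1, 0, 1, 0, 0, 0],
})

rates = df.groupby("group")["hired"].mean()   # selection rate per group
ratio = rates["B"] / rates["A"]
print(rates.to_dict())
print(f"disparate impact ratio: {ratio:.2f}")  # flag for review if < 0.8
```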
Imagine you are managing a global supply chain company, where you have to handle orders, shipments, and demand forecasting, but unexpected issues keep arising: parts shortages, transport delays, and changes in demand. Instead of relying on manual adjustments, what if an AI agent could handle everything automatically? This AI wouldn't just suggest actions; it would decide, execute, and continuously improve its strategies. That's the power of agentic AI. With that said, I welcome you all to today's tutorial on what agentic AI is.

Let us start by understanding the first wave of artificial intelligence, which was predictive analytics, or, we could say, data analytics and forecasting. Predictive AI focused on analyzing historical data, identifying patterns, and making forecasts about future events. These models do not generate any new content; instead, they predict outcomes based on statistical models and machine learning. Technically, how did it work? Basically, an ML model takes structured data, which could be past user activity, transaction records, or sensor readings. For example, consider Netflix users' watch history: movie genres, watch time, and user ratings. Next comes feature engineering and pre-processing: extracting key features like watch-time trends, preferred genres, and watch frequency, and applying scaling, normalization, and encoding techniques to make the data more usable for the ML model. Then the model itself, for example time-series forecasting models like ARIMA or LSTMs, predicts future movie preferences based on the historical data, and as the output, Netflix's AI recommends new shows or movies based on similar user patterns. That is how the Netflix recommendation pipeline works with a machine learning model at its core, and that was the first wave of AI.

Now let's discuss the second wave of AI, which was content creation and conversational AI. LLMs like ChatGPT became very popular during this second wave. Generative AI takes input data and produces new content such as text, images, videos, or even code; these models learn from patterns in large datasets and generate human-like outputs. Let's briefly understand how this technology works. First there is the data input, a prompt from the user; suppose we give ChatGPT a prompt such as "write an article on AI." Next comes tokenization and pre-processing: the input text is split into smaller parts, so "write" becomes one token, "an" the next, and so on for the other words. These tokens are then converted into word embeddings, numerical vectors that represent words in a high-dimensional space. Then comes neural network processing: the LLM processes the input using attention mechanisms (in models like GPT-4, BERT, and LLaMA), and with the help of self-attention layers it understands the context and predicts the next word. As a result, you get the generated output. That was the generative AI phase, the second evolution of AI. A simplified sketch of the tokenize-and-embed step follows.
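This sketch is deliberately simplified: real LLMs use subword tokenizers (e.g. BPE) and learned embeddings, whereas here whitespace tokens are mapped to small random vectors just to show the shape of the data the network actually sees.

```python
# Simplified sketch of the tokenize-then-embed step. Whitespace tokens
# and random 8-dimensional vectors stand in for a real subword tokenizer
# and a trained embedding table.
import numpy as np

prompt = "write an article on AI"
tokens = prompt.split()                        # toy whitespace tokenizer
vocab = {tok: i for i, tok in enumerate(tokens)}

rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(len(vocab), 8))  # toy embedding table

token_ids = [vocab[tok] for tok in tokens]
embedded = embedding_table[token_ids]          # shape: (num_tokens, 8)
print(token_ids)       # e.g. [0, 1, 2, 3, 4]
print(embedded.shape)  # (5, 8)
```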
Now let's talk about the third wave: agentic AI, or autonomous AI agents. Agentic AI goes beyond text generation; it integrates decision making, action execution, and autonomous learning. These AI systems don't just respond to prompts; they independently plan, execute, and optimize processes. You can think of it like this. The first step is the user input, or receiving a goal: the user provides a high-level instruction, for example "optimize warehouse shipments for maximum efficiency." Unlike generative AI, which would generate text, agentic AI executes real-world actions. The second step is querying databases: the AI pulls real-time data from multiple sources. It could be a traditional database, SQL or NoSQL, for fetching inventory levels or shipment history; a vector database for unstructured data like past customer complaints; or external APIs connecting to forecasting services, fuel-price APIs, or supplier ERP systems. The third step is LLM decision making: after querying the databases, the AI agent processes the data through an LLM-based reasoning engine, applying decision rules such as: if inventory is low, automate supplier restocking orders; if shipment costs have increased, reroute shipments through cheaper vendors; and if weather conditions impact a route, adjust the delivery schedules. You can see how agentic AI behaves in the decision-making process. The next step is action execution via APIs: the AI executes tasks without human intervention, triggering an API call to reorder stock from a supplier, updating warehouse robot workflows to prioritize fast-moving products, or even sending emails and notifications to logistics partners about upcoming changes. Finally, it is continuously learning through the data flywheel: the AI monitors the effectiveness of its actions (was the restocking efficient? did rerouting shipments reduce costs?) and the flywheel continuously improves future decisions, using reinforcement learning and fine-tuning to optimize its logic. A toy sketch of the decision-rule step follows.
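The sketch below hard-codes the data-fetching functions as stubs standing in for database queries and forecasting APIs; a real agent would route such decisions through an LLM-based reasoning engine rather than fixed if-statements.

```python
# Toy sketch of the decision rules described above. All functions are
# illustrative stubs, not a real supply chain integration.
def get_inventory_level() -> float:
    return 0.12  # stub: 12% of capacity remaining

def shipment_cost_increased() -> bool:
    return True  # stub: carrier raised prices

def reorder_stock():
    print("API call: supplier restocking order placed")

def reroute_shipments():
    print("API call: shipments rerouted through cheaper vendor")

def run_agent_step():
    if get_inventory_level() < 0.20:   # rule: inventory low -> restock
        reorder_stock()
    if shipment_cost_increased():      # rule: cost up -> cheaper route
        reroute_shipments()

run_agent_step()
```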
Now let's have a quick recap comparing these three waves of AI. Predictive AI's main focus was forecasting trends, generative AI creates content, and agentic AI, the current stage, makes decisions and takes action; you can see how AI evolved across these stages. In terms of learning approach, predictive AI analyzed historical data, generative AI learned from patterns (as in text and image generation), and agentic AI uses reinforcement learning and self-learning to improve. Looking at user involvement, in predictive AI a human asks for a forecast, in generative AI a human provides the prompts, but in agentic AI the need for human input becomes minimal. As for the underlying technology, predictive AI used machine learning and time-series analytics, generative AI uses Transformers like GPT, LLaMA, and BERT, and agentic AI combines LLMs plus APIs plus autonomous execution. To keep the examples in mind: predictive AI is Netflix's recommendation model, generative AI is ChatGPT writing articles, and agentic AI is AI incorporated into a supply chain. I hope this gives you a good idea of the three waves of AI.

Now let us move ahead and understand the exact difference between generative AI and agentic AI. First, let's dive deeper into generative AI. Generative AI models take an input query, process it using an LLM (large language model), and return a static response without taking any further action. For example, a chatbot like ChatGPT takes input from the user; suppose I give an input like "write a blog post on AI in healthcare." When this query reaches the large language model, the model tokenizes the input, retrieves relevant knowledge from its training data, and generates text based on learned patterns. We give the prompt, the LLM processes it, and we get the output. That is how generative AI works, and well-known examples include the GPT models for human-like language generation, Codex for advanced code generation, and DALL-E for realistic image generation.

Let's discuss DALL-E a bit. DALL-E is OpenAI's realistic image generation model, part of the generative AI family alongside GPT and Codex. It is a deep learning model designed to generate realistic images from text prompts, creating highly detailed and creative visuals based on the descriptions users provide. Some of its key aspects: text-to-image generation, where users input text prompts and DALL-E generates unique images from those descriptions; highly realistic and creative output, including photorealistic images, artistic illustrations, and even surreal or imaginative visuals; and customization and variability, allowing variations of an image and edits based on text instructions in multiple styles. So let me show you one example of how generative AI works for image generation. I have opened DALL-E; let's give it a prompt and see what it generates. Say we want "a futuristic city at sunset filled with neon skyscrapers, flying cars, and holographic billboards; the streets are bustling with humanoid robots and people wearing high-tech gear." Now let's wait a few seconds while DALL-E creates the image. There it is: you can see the image was generated by AI from our prompt; we gave the input and got the output based on it. This is one of the amazing generative AI tools you can explore.
Now let's discuss agentic AI: autonomous decision making and action execution. Unlike generative AI, agentic AI does not just generate responses; it also executes tasks autonomously based on the given query. For example, take AI managing warehouse inventory, and suppose we want to optimize warehouse shipments for the next quarter. First, the agent receives its goal. Then the AI agent queries external data sources, for example inventory databases or a logistics API, and retrieves real-time inventory levels and demand forecasts. Next it makes autonomous decisions, and the outcomes are kept under observation by the agent: it analyzes the current warehouse stock and product demand for the next quarter, checks supplier availability, and automates restocking if inventory falls below a given threshold. In this supply chain example, the output might look something like: current inventory level at 75% of capacity; demand forecast showing a 30% increase expected in Q2; reordering initiated. As we saw, in generative AI the user gives a prompt and an LLM generates the output, but agentic AI takes action beyond just generating text: in this scenario it is querying the inventory databases, automating the purchase orders, selecting the optimal shipping providers for the company, and continuously refining its strategies based on real-time feedback.

Let's recap once more. In terms of function, generative AI is concerned with producing written or visual content, and even code, from pre-existing input; agentic AI is all about decision making and taking actions toward a specific goal, focused on achieving objectives by interacting with the environment and making autonomous decisions. Generative AI relies on existing data to predict and generate content based on the patterns it learned during training, but it does not adapt or evolve from its experiences; agentic AI, by contrast, is adaptive: it learns from its actions and experiences, improving over time by analyzing feedback and adjusting its behavior to meet objectives more effectively. With generative AI, human input is essential: the prompt is what drives the LLM to generate the output. Once you set up agentic AI, it requires minimal human involvement: it operates autonomously, making decisions and adapting to changes without continuous human guidance.
It can even learn in real time; that is the beauty of agentic AI. We gave examples of generative AI (prompting ChatGPT or DALL-E); one example of agentic AI is a supply chain management system. Now let's dig into the technical side of how agentic AI actually works.

Agentic AI works through a four-step process. The first step is perceiving: gathering and processing information from databases, sensors, and digital environments. The next step is reasoning: using a large language model as the decision-making engine to generate solutions. The third step is acting: integrating with external tools and software to autonomously execute tasks. And finally there is learning: continuously improving through a feedback loop, also known as the data flywheel. Let's explore each step one by one.

Start with perceiving, the first step, where agentic AI gathers its inputs. The AI collects data from multiple sources: databases, both traditional and vector databases; APIs, fetching real-time information from external systems; IoT sensors, for real-world applications like robotics and logistics; and user inputs, such as text commands, voice commands, or chatbot interactions. Technically, three things happen during perceiving. The first is data extraction: the AI agent queries structured databases (SQL or NoSQL) for relevant records, and uses vector databases to retrieve semantic data for context-aware responses, for example similar past complaints. The second is feature extraction and pre-processing: the AI filters the relevant features from the raw data; for example, a fraud detection AI scans transaction logs for anomalies. The third is entity recognition and object detection: the AI uses computer vision to detect objects in images, and applies named entity recognition, a technique for extracting critical terms from text. So perceiving consists of data extraction, feature extraction and pre-processing, and entity recognition and object detection.

Let's take a simple example: an AI-based customer support system. Consider an agentic AI assistant for customer service, and say a customer asks, "Where is my order?" The AI queries multiple sources: the e-commerce order database to retrieve the order status, the logistics API to track the real-time shipment location, and the customer interaction history to provide a personalized response. The result is that the AI fetches the tracking details, identifies any delays, and suggests the best course of action.
The next step is reasoning. The AI's understanding, decision making, and problem solving are what make agentic AI so capable. Once the AI has perceived the data, it starts reasoning over it: the LLM acts as the reasoning engine, orchestrating AI processes and integrating with specialized models for various functions. The key components used in reasoning are, first, LLM-based decision making: AI agents use LLMs like GPT-4, Claude, or LLaMA to interpret user intent and generate a response, coordinating with smaller AI models for domain-specific tasks such as financial prediction or medical diagnostics. Second, retrieval-augmented generation (RAG): the AI enhances accuracy by retrieving proprietary data from the company's databases; for example, instead of relying only on GPT-4's built-in knowledge, the AI can fetch company-specific policies to generate accurate answers. Third, AI workflow and planning: multi-step reasoning where the AI breaks a complex task down into logical steps; for example, if asked to automate a financial report, the AI retrieves the transaction data, analyzes the trends, and formats the results. For a concrete use case, consider a logistics company using agentic AI to optimize shipping routes. A supply chain manager asks the AI agent to find the best shipping route to reduce delivery cost; the AI processes real-time fuel prices, traffic conditions, and weather reports, and using the LLM plus data retrieval it identifies the optimized routes and selects the cheapest carrier. The result is that the AI chooses the best delivery option, cutting cost and improving efficiency. So after perceiving comes reasoning.

Now let's move on to the third step, acting. In this step the AI takes autonomous actions: unlike generative AI, which stops at generating content, agentic AI takes real-world action. How does the AI execute tasks autonomously? First comes integration with APIs and software: the AI can send automated API calls to business systems, for example reordering stock through the supplier's API when an inventory level drops. It can also automate workflows: the AI executes multi-step workflows without human supervision, for instance handling insurance claims by verifying documents, checking policies, and approving payouts. Finally, the AI operates within predefined business rules to prevent unauthorized actions (this is where work on ethical AI comes in); for example, the AI can automatically process claims up to, say, $10,000, but requires human approval for higher amounts. A small sketch of such a guardrail follows.
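This is a minimal sketch of the business-rule guardrail just described; the $10,000 limit comes from the example in the text, and the function names are illustrative.

```python
# Sketch of a predefined business rule: the agent may auto-approve
# claims up to a limit but must escalate larger ones to a human.
AUTO_APPROVE_LIMIT = 10_000  # dollars, from the example above

def process_claim(amount: float) -> str:
    if amount <= AUTO_APPROVE_LIMIT:
        return f"Claim of ${amount:,.2f} auto-approved by agent"
    return f"Claim of ${amount:,.2f} escalated for human approval"

print(process_claim(4_200))
print(process_claim(25_000))
```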
So in insurance and policy processing, agentic AI can be really helpful. As another example, say we have an agentic AI managing an IT support system. Suppose a user reports, "My email server is down." The AI can diagnose the issue, restart the server, and confirm the resolution; if the issue is unresolved, the AI escalates it to a human technician. The result is that the AI fixes issues autonomously, reducing downtime. That is where the act step comes into the picture.

Now for the final step: learning. With the help of the data flywheel, the system learns continuously; the flywheel is the feedback loop. How does the AI learn over time? First, through data collection: the AI logs successful and failed actions; for example, if users correct AI-generated responses, the AI learns from those corrections. Second, through model fine-tuning and reinforcement learning: the AI adjusts its decision-making models to improve future accuracy, using reinforcement learning to optimize workflows based on past performance. Third, through automated data labeling and self-correction: the AI labels and categorizes past interactions to refine its knowledge base; for example, it can autonomously update frequently asked answers based on recurring user queries. In this way the AI learns over time. As an example, consider AI optimizing financial fraud detection: a bank runs an AI-powered fraud detection system that analyzes financial transactions and flags suspicious activity; when flagged transactions turn out to be false alarms, the AI learns to reduce those false alerts, so over time it improves its fraud detection accuracy while minimizing disruption for customers. The AI gets smarter over time, reducing both false alerts and financial fraud.

Let's have a quick recap of what we just covered. Agentic AI works in four steps: perceiving, where the AI gathers data from databases, sensors, and APIs; reasoning, where it uses an LLM to interpret the task, apply logic, and generate solutions; acting, where the AI integrates with external systems and automates the task; and learning, where the AI improves over time via a feedback loop, the data flywheel.

Now let's look at this diagram and understand what it shows. The first thing you see is an AI agent: an autonomous system capable of perceiving its environment, making decisions, and executing actions without human intervention. The AI agent acts as the central intelligence in the diagram: it interacts with the user and various data sources, processes input, queries databases, makes decisions using a large language model, executes actions, and learns from feedback.
The next component is the LLM. Large language models are advanced AI models trained on massive amounts of text data to understand, generate, and reason over natural language. Here the LLM acts as the reasoning engine: it interprets user inputs and makes informed decisions, retrieves relevant data from the databases, generates responses, and can coordinate with multiple AI models for different tasks like content generation, prediction, or decision making. When a user asks a chatbot something like "What is my account balance?", the LLM processes the query, retrieves the relevant data, and responds with the bank balance accordingly.

Now look at the kinds of databases the LLM interacts with: a traditional database and a vector database. The AI agent queries the structured, traditional database for things like customer records, inventory data, or transaction logs; traditional databases store well-defined, structured information. For example, when a banking assistant processes a query like "show my last five transactions," it fetches the information from a traditional SQL-based database. Then there is the vector database, a specialized database for storing unstructured data such as text embeddings, images, or audio representations. Unlike traditional databases, which store exact values, vector databases store data as points in a high-dimensional mathematical space, allowing AI models to search for semantically similar data instead of exact matches. The AI retrieves contextual information from the vector database, which enhances decision making and improves the AI's memory by letting the system search for conceptually similar past interactions. For example, the customer support chatbot we discussed could query a vector database to find similar past tickets when responding to a customer query, and a recommendation engine could use one to find products similar to a user's past preferences. Some popular vector databases are Facebook's FAISS (Facebook AI Similarity Search), Pinecone, and Weaviate. A minimal similarity-search sketch follows.
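In the sketch below, random vectors stand in for real text embeddings (e.g. of past support tickets); in a production system an embedding model would produce these vectors, and a managed service like Pinecone or Weaviate could replace the local FAISS index.

```python
# Minimal FAISS sketch: index a few vectors and search for the nearest
# neighbours of a query vector. Random vectors are placeholders for
# real text embeddings.
import faiss
import numpy as np

dim = 64
rng = np.random.default_rng(0)
ticket_vectors = rng.normal(size=(1000, dim)).astype("float32")

index = faiss.IndexFlatL2(dim)   # exact L2-distance index
index.add(ticket_vectors)

query = rng.normal(size=(1, dim)).astype("float32")
distances, ids = index.search(query, 3)   # 3 most similar "tickets"
print(ids[0], distances[0])
```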
Next is the action component, which refers to the AI agent's ability to execute tasks autonomously after the reasoning is done. The AI integrates with external tools, APIs, or automation software to complete the given task; it does not just provide information, it actually performs the action. For example, in customer support the AI can automatically reset a user's password after verifying their identity, and in finance it can approve a loan based on predefined eligibility criteria.

Then we have the data flywheel: a continuous feedback loop in which the AI learns from past interactions, refines its models, and keeps improving over time. Every time the AI interacts with data, takes an action, or receives feedback, that information is fed back into the model, creating a self-improving system that grows smarter over time. The data flywheel allows the AI to learn from every interaction and become more efficient by continuously optimizing its responses and refining its strategies. This can be used in fraud detection, where the AI learns from past fraud cases and detects new fraudulent patterns more effectively, and chatbots can likewise learn from user feedback to improve their responses. Finally, there is model customization: fine-tuning the AI models for specific business needs or industry requirements. AI models are not static; they can be adapted and optimized for specific tasks, and custom fine-tuning improves accuracy in domain-specific applications like finance, healthcare, or cybersecurity. A financial institution might fine-tune an LLM to generate investment advice from historical market trends, and a healthcare provider might fine-tune an AI model to interpret medical reports and recommend treatments. Based on this diagram, you should now have a good idea of how agentic AI works.

As for the future of agentic AI, I would say it looks very promising, because these systems keep improving themselves and finding new ways to be useful. With better machine learning algorithms and smarter decision making, these AI systems will become more independent, handling complex tasks on their own. Industries like healthcare, finance, and customer service have already started to see the impact AI agents can make, from personalization to resource management and much more. As these systems continue to learn and adapt, I think they will open up even more possibilities, helping businesses grow and improving how we live and work. In conclusion, agentic AI is paving the way for new opportunities. Unlike earlier waves of AI, which assisted with generating content, predicting from data, or responding to queries, agentic AI can perform tasks independently with minimal human effort. Agentic AI has become self-reliant in decision making, and it is making a big difference in industries like healthcare, logistics, and customer service, enabling companies to be more efficient and, as a result, provide better services to their clients. That wraps up the full course. If you have any doubts or questions, you can ask them in the comment section below, and our team of experts will reply as soon as possible. Thank you, and keep learning with Simplilearn.
This course teaches users how to leverage the open-source tool, Ollama, to run large language models (LLMs) locally on their personal computers. The instructor, Paulo, covers Ollama's setup, customization, REST API integration, and Python libraries. Several practical applications are demonstrated, including a grocery organizer, a RAG system, and an AI recruiter agency. The course emphasizes hands-on learning alongside theoretical concepts, requiring basic programming and AI knowledge. Key features highlighted include model management, a unified interface, and cost efficiency through local execution.
Ollama Local LLM Applications Study Guide
Quiz
Instructions: Answer each question in 2-3 sentences.
What is Ollama and what is its primary function?
According to the course, why is it beneficial to run large language models locally?
What are RAG systems and how do they relate to large language models?
What are parameters in the context of large language models, and how do they impact performance?
Describe the role of context length in a large language model.
What does the term “quantization” refer to in relation to LLMs?
What is a model file and how can it be used with Ollama?
How can the Ollama REST API be utilized?
What is the purpose of Langchain in building AI applications with Ollama?
Briefly explain how Ollama agents can be leveraged to build more complex applications?
Answer Key
Ollama is an open-source tool designed to simplify the process of running large language models locally on your personal computer. Its main function is to manage the installation, execution, and customization of these models, making advanced AI accessible to a wider audience.
Running large language models locally with Ollama offers benefits such as being free, providing more control over models, and ensuring better privacy since your data does not need to be sent to external servers. This approach allows you to experiment without relying on cloud-based services.
RAG systems, or Retrieval Augmented Generation systems, combine document retrieval with large language models to enhance the models’ knowledge. They work by retrieving relevant information from a knowledge base to augment the prompt so that the LLM can provide responses grounded in your specific data.
Parameters in large language models refer to the internal weights and biases the model learns during training. More parameters generally mean a more complex model with a greater capacity to understand and respond accurately, but also require more computational resources.
Context length refers to the maximum number of tokens a large language model can process at once in a single input. A longer context length allows the model to handle larger documents, conversations, and can capture dependencies across text spans.
Quantization is a technique used to reduce the size of a neural network model by reducing the precision of its weights. This leads to smaller models, faster processing, and lower memory usage.
A model file is a configuration file used in Ollama to customize a large language model. It allows developers to modify parameters like temperature and system messages, tailoring the model to perform specific tasks.
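For reference, a minimal Modelfile might look like the sketch below; the base model name and system message are illustrative, not part of the course material.

```
FROM llama3
PARAMETER temperature 0.3
SYSTEM You are a concise assistant that organizes grocery lists by category.
```

A custom model is then built from it with `ollama create grocery-helper -f Modelfile` and run like any other model.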
The Ollama REST API provides an interface to interact with Ollama models through HTTP requests. It allows developers to programmatically generate responses, manage models, and use them in applications without needing the command line interface.
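As a small illustration, here is one way to call the generate endpoint from Python; it assumes a local Ollama server on its default port (11434) and that the "llama3" model has already been pulled, both of which are assumptions for this sketch.

```python
# Minimal sketch of calling the Ollama REST API from Python.
# Assumes Ollama is running locally and "llama3" has been pulled.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",
        "prompt": "Name three uses of local LLMs.",
        "stream": False,  # return one JSON object instead of a stream
    },
)
print(resp.json()["response"])
```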
Langchain is a framework that simplifies building applications with large language models. It provides tools to load documents, generate embeddings, manage vector databases, and create chains of operations to manage the complexities of LLM applications.
Ollama agents, similar to AI agents in general, are components that act autonomously to complete a specific task or a complex series of steps, often using large language models and other tools. They can be used to create complex workflows such as resume analysis or automated recruiting processes.
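To make the answers above concrete, here is a minimal sketch of talking to a local model through the Ollama Python library. It assumes the Ollama server is running and a model such as llama3.2 has already been pulled; the model name and prompts are illustrative, not taken from the course.

```python
# Minimal chat call against a locally running Ollama server.
# Assumes: `pip install ollama` and `ollama pull llama3.2` have been done.
import ollama

response = ollama.chat(
    model="llama3.2",
    messages=[
        {"role": "system", "content": "You are succinct and informative."},
        {"role": "user", "content": "Explain quantization in one sentence."},
    ],
)
print(response["message"]["content"])
```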
Essay Questions
Instructions: Answer each question in a well-structured essay format, citing relevant details from the course material.
Discuss the benefits and drawbacks of running large language models locally compared to using cloud-based services. What trade-offs should developers consider when making this decision?
Explain the process of building a RAG system using Ollama, emphasizing the roles of different components like embedding models, vector databases, and large language models. How does LangChain contribute to the development of these systems?
Compare and contrast using the Ollama CLI, the REST API, and a UI-based interface for interacting with large language models. What scenarios are each most suited for and why?
Describe how a model file can be used to customize a large language model within Ollama. Provide examples of how changes to settings like temperature and system messages can impact model output.
Analyze how AI agents and autonomous systems can be used to build complex workflows with Ollama. Discuss the design considerations and benefits of adopting agent-based approaches for specialized tasks.
Glossary
Agent: In the context of AI, an agent refers to a software component that can operate autonomously to complete a specific task or series of tasks, often leveraging large language models.
API (Application Programming Interface): A set of protocols, routines, and tools for building software applications. In this context, it refers to the REST API offered by Ollama for programmatic interaction with LLMs.
CLI (Command Line Interface): A text-based interface for interacting with a computer program or operating system; in the case of Ollama, it provides direct access to the models through commands.
Context Length: The maximum number of tokens an LLM can process at once in a single input. A longer context length allows the model to handle longer texts and capture dependencies more effectively.
Embeddings: Numerical vector representations of text or other data that capture the semantic meaning and relationships between different pieces of data. Used to allow computers to perform computation on linguistic data.
Extensibility: Refers to the ability to add custom models or extensions to Ollama.
Hallucination: A phenomenon in LLMs where the model generates information that is factually incorrect or does not align with the provided context, often sounding confidently correct.
LangChain: An open-source framework for developing applications with large language models. Provides a unified abstraction for loading documents, embedding, and managing vector databases.
LLM (Large Language Model): A machine learning model trained on a vast amount of text data, capable of understanding and generating human-like text.
Model File: A configuration file used in Ollama to customize LLMs. It allows developers to modify parameters like temperature and system messages, tailoring the model to specific tasks.
Multi-Modal Model: A type of LLM that can understand and process multiple types of data, such as text and images.
Ollama: An open-source tool that simplifies running large language models locally on a personal computer. It manages model downloads, execution, and customization, allowing advanced language processing without external services.
Parameters: The internal weights and biases learned by a neural network during training. They determine how the model processes input data and generates output. More parameters generally indicate a more complex model.
Quantization: A technique used to reduce the size and computational demands of a neural network model by reducing the precision of its weights.
RAG (Retrieval Augmented Generation): A system that combines document retrieval with large language models. It enhances the model’s knowledge by retrieving relevant information from a knowledge base, allowing the model to give informed responses.
REST API (Representational State Transfer API): A way to interact with web services by sending HTTP requests; Ollama’s REST API allows interaction with LLMs without the command line.
Vector Database (Vector Store): A database that stores data as vector embeddings, specifically designed to handle similarity search (an embedding sketch follows this glossary).
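As a rough illustration of the Embeddings and Vector Database entries above, the sketch below turns sentences into vectors with a local embedding model and compares them with cosine similarity, standing in for what a real vector store does at scale. The nomic-embed-text model name is an assumption; any pulled embedding model works.

```python
# Embeddings as vectors: semantically similar text -> similar vectors.
# Assumes: `pip install ollama numpy` and `ollama pull nomic-embed-text`.
import ollama
import numpy as np

def embed(text: str) -> np.ndarray:
    # Convert text into a numerical vector capturing its meaning.
    return np.array(ollama.embeddings(model="nomic-embed-text", prompt=text)["embedding"])

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

a = embed("How do I run a model locally?")
b = embed("Running LLMs on my own machine")
c = embed("My favorite pasta recipe")

print(cosine(a, b))  # related questions -> higher score
print(cosine(a, c))  # unrelated text -> lower score
```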
Ollama: Local LLM Development Course
Briefing Document: Ollama – Local LLM Development
Introduction:
This document reviews a mini-course focused on using Ollama, an open-source tool that enables the local running of large language models (LLMs) on personal computers. The course, created by Paulo, aims to teach developers and other interested individuals how to leverage Ollama for building AI solutions without relying on paid cloud services. The course emphasizes a hands-on approach balanced with theoretical understanding.
Main Themes and Key Ideas:
Ollama: Local, Free LLMs: Ollama is presented as a solution to the problem of accessing and using large language models, which often involves paid cloud services. It allows developers to download, run, and interact with various LLMs locally on their machines for free. “The idea here is very simple: right now, if you want to use a large language model, most likely you’ll have to use OpenAI, ChatGPT, and so forth, and many others out there that are paid. The thing is, with Ollama you don’t have to pay for anything. It’s free, and that’s the beauty.”
Simplified LLM Management: Ollama simplifies the process of managing, installing, and executing different LLMs via a command-line interface (CLI). It abstracts away the technical complexities involved in setting up and running these models. “Ollama abstracts away the technical complexity involved when we want to set up these models, which makes advanced language processing accessible to a broader audience such as developers, researchers, and hobbyists.”
Local Control and Privacy: By running models locally, users maintain control over their data and ensure privacy, as data is not sent to external servers. This addresses the data privacy concerns associated with cloud-based LLM services. “When we run our own models locally, we are making sure that our data doesn’t need to be sent to external servers.”
Key Features of Ollama:
Model Management: Easy download and switching between various LLMs.
Unified Interface: Consistent set of commands for interacting with models.
Extensibility: Support for adding custom models and extensions.
Performance Optimizations: Effective utilization of local hardware, including GPU acceleration.
Use Cases:
Development and Testing: Testing various LLMs to determine optimal performance for specific applications.
RAG (Retrieval Augmented Generation) Systems: Building RAG systems powered by local models for information retrieval and context-aware responses. “The idea is that we’re going to be able to build RAG systems, so retrieval augmented generation systems, that are powered solely by Ollama models.”
Privacy-focused Applications: Ensuring data privacy by running models on local hardware.
Course Audience: The course is targeted towards developers, AI engineers, open-minded learners, machine learning engineers, and data scientists who are interested in local LLM application development. It assumes a basic understanding of programming, particularly Python, as well as general knowledge of AI, machine learning, and LLMs. “This course is for developers, AI engineers, open-minded learners, machine learning engineers, and so forth, as well as data scientists. So if you are somebody who is willing to put in the work and wants to learn about Ollama and build local LLM applications, then this course is for you.”
Course Structure: The course includes a mix of theory and hands-on learning, with an emphasis on practical application. It begins with the fundamentals and then transitions to hands-on projects where students build AI solutions using Ollama. “Most of my courses have this mixture of two things. I have theory, where we talk about the fundamental concepts, the lingo, and so forth, and I have hands-on, because it’s all about actually doing things. That way you actually understand and know how to get things done; that’s the whole point.”
Development Environment: Requires Python installed, a code editor (VS Code is recommended), and a willingness to learn. “This is all about Python, which means you’ll have to have Python installed, and you also have to have some sort of code editor.”
Ollama Installation and Usage: The course demonstrates how to install Ollama on different operating systems (macOS, Linux, Windows). It also shows how to download and run models, and how to interact with them through a command-line interface.
Understanding Model Parameters: The course touches upon important model properties, such as parameter counts (3.2B, 1B), context length, embedding length, and quantization. It clarifies that a higher number of parameters improves accuracy but increases the computational requirements. “When we talk about parameters, 3B or 2B or 10B or 7B and so forth, these are numbers inside a neural network that it adjusts to learn how to turn inputs into correct outputs.”
Ollama Commands: The course introduces several key Ollama commands, such as list, remove, pull, and run, and the use of the model file for customizing models.
REST API: The course demonstrates that behind the command-line interface there is a REST API that you can interact with to get responses (see the sketch after this list).
UI-Based Interface: The course introduces a third-party tool, Mistral, which allows you to interact with Ollama models through a UI.
Python Library: The course also explores the use of Ollama through a Python library, which makes it easier to integrate Ollama into applications. “We want to be able to create local large language model applications using Ollama models, and so for that we need a way for us to be able to use Python.”
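As a hedged illustration of that REST API, the sketch below sends a single-turn request to the default localhost:11434 endpoint; the model name is a placeholder for whatever you have pulled.

```python
# Calling the REST API that the Ollama CLI sits on top of.
# /api/generate is the single-turn completion endpoint.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.2",   # placeholder: any locally pulled model
        "prompt": "Why is the sky blue?",
        "stream": False,       # one JSON object instead of a token stream
    },
)
print(resp.json()["response"])
```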
Practical Applications:
Grocery List Organizer: Creating a tool that categorizes grocery items from a plain text list (a minimal sketch follows this list).
RAG Systems: Building a full RAG system using LangChain, allowing users to interact with their own documents. “We’re going to build RAG systems with Ollama; with Ollama, of course, we can build more complex large language model applications.”
AI Recruiter Agency: Developing an AI-powered recruitment tool for processing resumes and providing candidate recommendations using an agent-based system.
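A minimal sketch of the grocery-organizer idea, not the course’s exact code: hand a plain text list to a local model and ask for categorized output. The model name and category names are illustrative.

```python
# Grocery list organizer: categorize items with a local model.
import ollama

items = "milk, bananas, chicken breast, rice, yogurt, spinach, bread"

result = ollama.generate(
    model="llama3.2",  # illustrative model choice
    prompt=(
        "Organize these grocery items into categories such as Produce, "
        f"Dairy, Meat, and Pantry, as a bulleted list:\n{items}"
    ),
)
print(result["response"])
```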
Key Quotes:
“Ollama is an open-source tool that simplifies running large language models locally on your personal computer.”
“The idea is that we’re going to be able to use Ollama to customize our models, meaning that we are able to use different flavors of models so we can test them around, and all of that is actually going to be free.”
“The idea is that Ollama sits at the center and allows us developers to pick different large language models depending on the situation, depending on what we want to do.”
“The main point here, of course, is that we have this model management in one place; we’re able to easily download and switch between different large language models.”
“The idea is that you find something that will work for you.”
“Ollama, as we know, is a platform that allows you to run large language models locally, which is really awesome.”
“Ollama models support these tasks: text generation, code generation, and multimodal applications.”
“The power that we have right now is that all of this is local.”
“We have our own box that we can pass sensitive documents and all those things into, without worrying about prices.”
“The great thing here is that it supports various models tailored for different tasks, including text generation, code generation, and multimodal applications.”
“We can now use sort of a backend combination of the REST API through the Python library, the Ollama Python library.”
“Agents are a really good way to build complex applications.”
Conclusion:
This course provides a comprehensive introduction to Ollama, demonstrating its potential for local LLM development. By emphasizing hands-on experience and practical applications, the course equips developers with the knowledge and skills needed to create AI solutions that respect privacy and reduce costs. It demonstrates the practical applications of Ollama for tasks such as building a grocery list categorizer, creating RAG systems, and building a complex agent-based AI application.
Ollama: A Guide to Local LLMs
Frequently Asked Questions about Ollama
What is Ollama and what problem does it solve? Ollama is an open-source tool designed to simplify the process of running large language models (LLMs) locally on your own hardware. It addresses the problem of needing to rely on paid cloud-based services like OpenAI or complex setup procedures when using LLMs. By abstracting away technical complexities, Ollama makes advanced language processing accessible to a broader audience such as developers, researchers, and hobbyists, providing a free and private alternative to cloud services.
Who is this course about Ollama for? This course is tailored for developers, AI engineers, open-minded learners, machine learning engineers, and data scientists who are willing to put in the work to learn about Ollama and build local LLM applications. It assumes a basic understanding of programming (especially Python) and some fundamental knowledge of AI, machine learning, and LLMs.
What are some key features of Ollama? Ollama has several key features including:
Model Management: Easily download and switch between different large language models.
Unified Interface: Interact with various models using one consistent set of commands through the command-line interface (CLI).
Extensibility: Supports adding custom models and extensions.
Performance Optimizations: Effectively utilize your hardware, including GPU acceleration where available.
What are parameters in the context of large language models? Parameters are the internal weights and biases that a model learns during training and determine how the model processes input data and generates output. The number of parameters (e.g., 3.2B) reflects the complexity and capacity of the model, with more parameters typically leading to better performance but also requiring more computational resources. Models like Llama are designed with efficiency in mind, performing well even at smaller scales.
What are use cases for Ollama? Ollama has a wide range of use cases, including:
Development and testing: Allows developers to test and switch between models when creating applications.
Building retrieval augmented generation (RAG) systems: Enables the creation of free, local RAG systems.
Privacy-focused data processing: Keeps data locally, eliminating the need to send information to external servers.
Custom AI solutions: Allows building tailored large language model applications with free models and control over your data and environment.
How do you install and run models with Ollama? To install Ollama, you download the appropriate version for your operating system (macOS, Linux, or Windows). Once installed, you can download and run specific models directly using the CLI, e.g., ollama run llama3:latest to get the latest Llama 3 model. Models are managed through the CLI, which allows for downloading, removing, and listing available models. You can then interact with the models directly through the terminal shell.
Can Ollama models be customized, and how is that done? Yes, Ollama models can be customized by creating a model file, where you can specify model parameters, such as temperature, and system messages. You can create a new version of an existing model using the ollama create command, which uses your defined model file to implement the desired customization, allowing you to fine-tune your models for specific purposes.
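As a rough sketch of that flow, driven from Python so the whole thing stays in one script: the FROM, PARAMETER, and SYSTEM directives are Ollama’s model file syntax, while the base model and the brief-assistant name are illustrative choices, not the course’s exact code.

```python
# Create a customized model from a model file, then run it via the CLI.
import subprocess

modelfile = """\
FROM llama3.2
PARAMETER temperature 0.3
SYSTEM You are a succinct, informative assistant. Answer in two sentences.
"""

with open("Modelfile", "w") as f:
    f.write(modelfile)

# Equivalent to typing: ollama create brief-assistant -f Modelfile
subprocess.run(["ollama", "create", "brief-assistant", "-f", "Modelfile"], check=True)
# Afterwards: `ollama run brief-assistant` uses the customized settings.
```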
Besides the CLI, how else can you interact with Ollama models? Ollama models can also be interacted with using the REST API, accessible at localhost:11434 when Ollama is running. The REST API allows you to generate responses, chat with models, or fetch metadata using tools like curl, or with JSON payloads in Python. Additionally, user-friendly interfaces like the Mistral app allow you to interact with locally running Ollama models through a GUI, making the experience similar to using ChatGPT, and can integrate with document knowledge bases via retrieval augmented generation (RAG). In addition, client libraries such as the Python library provide an abstracted way of interacting with the REST API, which makes building LLM applications with your own local models even simpler.
Ollama: Local Large Language Model Toolkit
Ollama is a tool that simplifies running large language models locally on a personal computer [1, 2]. It is an open-source tool designed to make advanced language processing accessible to a broader audience, including developers, researchers, and hobbyists [2].
Ollama’s applications include:
Building local large language model applications: Ollama allows users to customize models and build applications using them [1].
Creating retrieval augmented generation (RAG) systems: Ollama enables the creation of RAG systems powered by its models [1].
Model management: Ollama allows users to easily download and switch between different large language models [3].
Development and testing: Developers can test applications that integrate large language models without setting up different environments [3].
Education and research: Ollama provides a platform for learning and experimentation without the barriers associated with cloud services [3].
Secure applications: Ollama is suitable for industries where data privacy is critical, such as healthcare and finance, because models are run locally [4].
Customization: Ollama allows for greater flexibility in customizing and fine-tuning models [5].
Ollama addresses the challenges of accessibility, privacy, and cost in the realm of large language models [4]. By enabling local execution, it makes AI technologies more practical for a range of applications [4].
Specific real-world applications include:
Grocery list organizer: Ollama can categorize and sort grocery items [1, 6].
AI recruiter agency: Ollama can be used to build an AI-powered recruitment agency that extracts information from resumes, analyzes candidate profiles, matches candidates with suitable positions, screens candidates, and provides detailed recommendations [1, 7-9].
Ollama supports various models tailored for different tasks, including text generation, code generation, and multimodal applications [10]. Ollama can be used through a command line interface (CLI), a user interface (UI), or a Python library [11].
Key features of Ollama include:
Model management: The ability to easily download and switch between models [3].
Unified interface: Interacting with models using a consistent set of commands [3].
Extensibility: The ability to add custom models and extensions [3].
Performance optimization: Utilization of local hardware, including GPU acceleration [3].
Cost-efficiency: Eliminating the need for cloud-based services and associated costs [5].
Reduced latency: Faster response times due to local execution [5].
Enhanced privacy and security: Data does not need to be sent to external servers [5].
Ollama uses a command line interface (CLI) to manage model installation and execution [12]. The tool abstracts away the technical complexity involved in setting up models, making it accessible to a wider audience [12].
Local LLMs with Ollama: Accessibility, Privacy, and Applications
Local Large Language Models (LLMs) can be run on your personal computer using tools like Ollama, an open-source tool that simplifies this process [1]. Ollama is designed to make advanced language processing more accessible for developers, researchers, and hobbyists [2].
Key aspects of local LLMs and their applications include:
Accessibility: Ollama makes it easier for a broad range of users to utilize LLMs, without requiring specialized knowledge of machine learning frameworks [2].
Privacy and Security: Running models locally means that your data is not sent to external servers, which enhances privacy and security [3]. This can be especially important for applications dealing with sensitive information [4, 5].
Cost-Efficiency: Local LLMs eliminate the need for cloud-based services, which means you don’t have to pay for API calls or server usage [4].
Reduced Latency: Local execution of models reduces delays associated with network communications, leading to faster response times [4].
Customization: You have greater flexibility in customizing and fine-tuning models to suit specific needs without limitations from third-party services [4].
Model Management: Ollama provides a central place to download, manage, and switch between different LLMs [6].
Ollama uses a command-line interface (CLI) to manage models, which abstracts away technical complexities [3]. Ollama also has a REST API that can be used to interact with models [7].
Applications of local LLMs using Ollama include:
Building local LLM applications, with the ability to customize models [1].
Creating Retrieval Augmented Generation (RAG) systems [1]. RAG systems use documents or data to generate responses, thereby augmenting the knowledge of the LLM [3].
Development and testing of applications that integrate LLMs [6].
Education and research, providing a platform for learning and experimentation [5].
Secure applications in industries like healthcare and finance, where data privacy is crucial [5].
Creating tools that use function calling, which helps LLMs perform more tasks [8] (see the sketch after this list).
Customizing models for specific purposes [4].
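Here is a hedged sketch of the function-calling idea from the list above, using the Ollama Python library’s tools parameter. It assumes a tool-capable model such as llama3.1; the weather function and its JSON schema are purely illustrative.

```python
# Function calling: let the model decide to invoke a local Python function.
import ollama

def get_weather(city: str) -> str:
    return f"Sunny and 22°C in {city}"  # stand-in for a real weather API

response = ollama.chat(
    model="llama3.1",  # assumes a model with tool support
    messages=[{"role": "user", "content": "What's the weather in Lisbon?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
)

# If the model chose to call the tool, run it with the arguments it produced.
for call in response.message.tool_calls or []:
    if call.function.name == "get_weather":
        print(get_weather(**call.function.arguments))
```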
Ollama supports a variety of models tailored for different tasks, including text generation, code generation, and multimodal applications [9].
Examples of real-world applications include:
Grocery list organizers that can categorize and sort items [1, 10].
AI recruiter agencies that can extract information from resumes, analyze candidate profiles, match them to positions, screen them, and provide recommendations [1, 11, 12].
In summary, local LLMs, especially when used with tools like Ollama, provide a way to utilize large language models in a private, cost-effective, and flexible manner [2, 4]. They allow for the development of various applications across diverse fields by letting people use LLMs locally [4].
Ollama: Customizing Local LLMs
Model customization is a key feature when using local large language models (LLMs) with tools like Ollama [1]. Ollama is designed to allow users greater flexibility in modifying and fine-tuning models to better suit their specific needs, without being limited by third-party services [1].
Here’s a breakdown of how model customization works with Olama:
Flexibility: Local execution of models allows for greater flexibility in customizing models [1]. You can adjust models to meet specific requirements without the constraints imposed by third-party services [1].
Fine-tuning: Ollama enables the fine-tuning of models to better suit specific needs [1].
Model Files: Model files allow for modification and customization of models. These files contain specific instructions and parameters for the model. For example, you can set the temperature of a model, which influences its creativity or directness, and add system messages to instruct the model on how to behave [2, 3].
Creating Custom Models: With Ollama, you can create customized versions of models by specifying a base model and adding parameters through model files [3]. This process allows you to tailor a model’s behavior to your specific needs [3].
Extensibility: Ollama supports adding custom models and extensions [4]. This allows you to integrate models or functionalities that are not available in the standard Ollama library [4].
Parameters: You can customize a model by adjusting parameters like temperature, which affects the creativity of the model [3]. The system message parameter, for example, can instruct the model to be succinct and informative [3].
Model Management: Ollama provides a central place to manage different models, which can be used interchangeably. You can easily download and switch between different large language models, allowing for testing and selection of the model that best suits your needs [4, 5].
Practical examples of model customization include:
Adjusting model behavior: By using a model file, you can instruct a model to be more succinct and informative [3]. This is useful in a variety of applications where you need specific responses from the model [3].
Creating specialized models: You can use a base model and customize it to create a model designed for a specific purpose [3]. This is helpful when you need a model with a focused skill set for a specific task [3].
Testing and switching models: Ollama makes it easy to switch between different models to determine which one performs best for a particular use case. You can test various models to find the one that works for you [4, 5].
Adapting to different tasks: You can switch between models tailored for various tasks including text generation, code generation, and multimodal applications. You can select the best model for the task you want to perform [6].
By allowing this level of customization, Ollama makes it possible to tailor LLMs to very specific applications. The ability to modify models, combined with local execution, gives developers and researchers a versatile way to use the power of LLMs in various settings [1].
Retrieval Augmented Generation Systems
Retrieval Augmented Generation (RAG) systems are a way to enhance the capabilities of large language models (LLMs) by allowing them to access and use external data sources to generate responses [1, 2]. This approach helps to overcome some limitations of LLMs, such as their limited knowledge base and tendency to “hallucinate,” by providing them with relevant, up-to-date information from a custom knowledge base [2, 3].
Here’s how RAG systems work:
Indexing:
Document Loading: Documents in various formats (e.g., PDF, text, URLs, databases) are loaded into the system [4].
Preprocessing: The loaded documents are parsed and preprocessed. This typically involves breaking the text into smaller, manageable chunks [2-4].
Embedding: These text chunks are converted into numerical representations called embeddings using an embedding model [2-5]. These embeddings capture the semantic meaning of the text, allowing for similarity comparisons [4, 6].
Vector Storage: The generated embeddings are stored in a vector database or vector store, which is designed for efficient storage and retrieval of these high-dimensional vectors [2-4, 7].
Retrieval and Generation:
Query Embedding: When a user asks a question (the query), that question is also converted into an embedding using the same embedding model [2, 4, 5].
Similarity Search: The query embedding is used to search the vector database for the most similar document embeddings [2, 5, 6]. This search retrieves the most relevant chunks of text related to the query [4, 5].
Context Integration: The retrieved document chunks and the original query are combined and passed to the LLM [2, 3, 5].
Response Generation: The LLM uses the provided context and the query to generate a coherent and informed response [2, 3, 5].
Key components of RAG systems include [7]:
Large Language Model (LLM): The core component responsible for generating the final response [7]. It leverages its knowledge and reasoning capabilities and is good at predicting, summarizing, and brainstorming [3].
Document Corpus: The collection of documents that serve as the knowledge base for the system [7].
Embedding Model: Used to convert both the document chunks and the queries into vector embeddings [2-4].
Vector Database: A specialized database for storing and efficiently searching through the vector embeddings [2-4, 7].
Retrieval Mechanism: The process that identifies and retrieves the most relevant document chunks in relation to the query [7].
Prompt Engineering: Designing prompts that effectively instruct the LLM on how to utilize the provided context to generate answers [8, 9].
Tools like LangChain can simplify the development of RAG systems [7]. LangChain provides abstractions for document loading, splitting, embedding, and integration with various LLMs and vector databases.
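The sketch below condenses the indexing and retrieval/generation stages into a few lines of LangChain with local Ollama models. Package layout varies by LangChain version, and the file name and model names are placeholders, so treat this as a shape, not the course’s exact code.

```python
# RAG in miniature: load -> split -> embed -> store -> retrieve -> generate.
# Assumes: langchain, langchain-community, and chromadb are installed.
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_community.llms import Ollama
from langchain.chains import RetrievalQA

# Indexing: load a document, chunk it, embed the chunks, store the vectors.
docs = TextLoader("my_notes.txt").load()  # placeholder document
chunks = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50).split_documents(docs)
store = Chroma.from_documents(chunks, OllamaEmbeddings(model="nomic-embed-text"))

# Retrieval + generation: similar chunks are stuffed into the LLM's prompt.
qa = RetrievalQA.from_chain_type(
    llm=Ollama(model="llama3.2"),
    retriever=store.as_retriever(),
)
print(qa.invoke({"query": "What do my notes say about quantization?"})["result"])
```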
Benefits of RAG systems:
Enhanced Accuracy: RAG systems provide LLMs with external context, which reduces the occurrence of generating responses that are not based on any supporting information [2, 3].
Up-to-Date Information: By using external knowledge bases, RAG systems can provide more current information than the LLM might have been trained on [3].
Customization: RAG systems can be tailored to specific domains or use cases by using domain-specific documents [2, 3].
Reduced Hallucination: The use of external data helps the LLM to avoid making up information [2, 3].
Improved Transparency: Since the LLM is grounded in retrieved data, it’s easier to trace the source of its answers [5].
Ollama can be used to build RAG systems with local LLMs [1, 2]. By enabling local execution of both LLMs and embedding models, Ollama provides a cost-effective and private way to build RAG systems [1, 2, 10]. Ollama also supports various models that can be used for both embeddings and language generation, allowing for flexibility in the development process [11].
In summary, RAG systems combine the knowledge and reasoning capabilities of LLMs with the specificity of external data sources. These systems are useful when you need an LLM to reason about specific, custom or up-to-date information. This approach enhances the performance of LLMs in many different application scenarios [5, 7].
AI-Powered Recruitment Agencies
AI can be used to build recruitment agencies using tools like Ollama and the Swarm framework, which allows for the creation of AI agents that perform tasks delegated to them [1, 2]. This setup can automate many parts of the recruitment process, drawing on the power of large language models (LLMs) and AI [2].
Here’s how AI recruitment systems work:
AI Agents: Specialized AI agents are created to perform different tasks in the recruitment process [2]. Each agent is designed with specific instructions and capabilities, and can delegate tasks to other agents [2-4].
Base Agent: All agents are built from a base agent, which has the core functionalities needed for the agent to work, such as the connection to the local LLM [3, 5].
Task Delegation: Agents delegate tasks to other agents, allowing for a structured and efficient workflow.
Local LLMs: Local LLMs, powered by tools like Ollama, are used in the backend, eliminating the need for API calls and third-party services [1, 3, 5].
Key agents in an AI recruitment system include [4, 6-8]:
Extractor Agent: Extracts information from resumes, focusing on personal information, work experience, education, skills, and certifications [6, 7]. It converts the raw text into a structured format.
Matcher Agent: Matches candidate profiles with job positions based on skills, experience, location, and other criteria [7, 8]. It uses the extracted information from the resume and available job listings to find suitable matches.
Screener Agent: Screens candidates based on qualifications, alignment, experience, and other factors, generating a screening report [6].
Profile Enhancer Agent: Enhances candidate profiles based on the extracted information [8].
Recommender Agent: Generates final recommendations based on the analysis, extracted information and other factors [4].
Orchestrator Agent: Coordinates the entire recruitment workflow, delegates tasks to other agents, manages the flow of information, maintains context, and aggregates results from each stage [4, 9].
Here are the steps in an AI recruitment system (a sketch of the agent handoff pattern follows these steps):
Resume Upload: A resume is uploaded to the system [2, 10].
Information Extraction: The extractor agent extracts information from the resume [6, 7, 10].
Analysis: The orchestrator sends the extracted information to the analyzer agent [9, 11].
Matching: The matcher agent compares the extracted resume information with available job listings to identify potential matches [7, 8].
Screening: The screener agent performs a screening of the candidate, generating a report [4, 6].
Recommendation: The recommender agent provides final recommendations [4].
Result Output: A comprehensive report is generated with a breakdown of skills, job matches, and recommendations.
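As promised above, here is a heavily simplified sketch of the agent handoff pattern using OpenAI’s experimental Swarm library pointed at a local Ollama server, which exposes an OpenAI-compatible endpoint under /v1. Agent names, instructions, and the resume file are illustrative, and only two of the agents are shown.

```python
# Two-agent handoff: an extractor parses the resume, then hands off to a matcher.
from openai import OpenAI
from swarm import Swarm, Agent

# Point Swarm at Ollama's OpenAI-compatible endpoint (no real API key needed).
client = Swarm(client=OpenAI(base_url="http://localhost:11434/v1", api_key="ollama"))

matcher = Agent(
    name="Matcher",
    model="llama3.2",
    instructions="Match the extracted profile to suitable job openings.",
)

def hand_off_to_matcher():
    """Called by the extractor once the resume is parsed."""
    return matcher

extractor = Agent(
    name="Extractor",
    model="llama3.2",
    instructions="Extract skills, experience, and education from the resume text.",
    functions=[hand_off_to_matcher],
)

resume_text = open("resume.txt").read()  # placeholder input
result = client.run(agent=extractor, messages=[{"role": "user", "content": resume_text}])
print(result.messages[-1]["content"])
```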
This system can provide:
Skill Analysis: A detailed analysis of a candidate’s skills, expertise, and experience [10, 11].
Job Matches: Identification of potential job matches based on skills and experience, along with match scores and location [10, 11].
Screening Results: A summary of the candidate’s qualifications and experience relevant to the job [10, 11].
Final Recommendations: Recommendations for the candidate to enhance their profile, including developing specific skills or gaining further education [10, 11].
Key benefits of an AI recruitment system:
Efficiency: AI agents can process numerous resumes quickly and efficiently, saving recruiters time.
Automation: Many steps of the recruitment process are automated, reducing the need for manual tasks.
Cost Reduction: Local LLMs eliminate costs associated with API calls and cloud-based services [3, 5, 12].
Customization: The system can be customized to fit specific needs, including using different LLMs or embeddings models [4, 5, 13].
Context Maintenance: The system maintains context throughout the process ensuring that each agent has all of the necessary information.
Scalability: The system can be easily scaled to handle multiple resumes.
In conclusion, AI recruitment systems powered by local LLMs and agent frameworks like Swarm can streamline the hiring process by automating various tasks, providing comprehensive analysis of candidates, and reducing costs. The flexibility and customization of these systems, combined with the power of LLMs, make them a useful tool for modern recruitment agencies.
This course material introduces prompt engineering, focusing on practical application rather than rote memorization of prompts. It explains how large language models (LLMs) function, emphasizing the importance of understanding their underlying mechanisms—like tokens and context windows—to craft effective prompts. The course uses examples and exercises to illustrate how prompt design impacts LLM outputs, covering various techniques like using personas and custom instructions. It stresses the iterative nature of prompt engineering and the ongoing evolution of the field. Finally, the material explores the potential of LLMs and the ongoing debate surrounding artificial general intelligence (AGI).
Prompt Engineering Study Guide
Quiz
Instructions: Answer the following questions in 2-3 sentences each.
What is the main focus of the course, according to the instructor?
Why is prompt engineering a skill, not a career, in the instructor’s opinion?
How did the performance of large language models change as they got larger?
What is multimodality, and what are four things a leading LLM can do?
What is the purpose of the playground mentioned in the course?
What are tokens, and how are they used by large language models?
What is temperature in the context of language models, and how does it affect outputs?
Explain the “reversal curse” phenomenon in large language models.
What are the two stages of training for large language models?
How does the system message influence the model’s behavior?
Quiz Answer Key
The main focus of the course is working with large language models, teaching how to use this new technology effectively in various aspects of work and life. It is not focused on selling pre-made prompts but on understanding the models themselves.
The instructor believes that prompt engineering is a skill that enhances any job, not a standalone career. He argues that it’s a crucial skill for efficiency, not a profession in itself.
As models increased in size, performance at certain tasks did not increase linearly but instead skyrocketed, with new abilities emerging that weren’t present in smaller models. This was an unexpected and non-linear phenomenon.
Multimodality is the ability of LLMs to understand and generate not only text but also other modalities, such as images, internet content, and code. LLMs can accept and generate text, accept images, browse the internet, and execute Python code.
The playground is a tool that allows users to experiment with and test the different settings of large language models. It is a space where one can fine-tune and better understand the model’s outputs.
Tokens are the way that LLMs understand and speak; they are smaller pieces of words that the model analyzes. LLMs determine the sequence of tokens most statistically probable to follow your input, based on training data.
Temperature is a setting that controls the randomness of the output of large language models. Lower temperature makes the output more predictable and formalistic, while higher temperature introduces randomness and can lead to creativity or gibberish.
The reversal curse refers to the phenomenon where an LLM can know a fact but fail to provide it when asked in a slightly reversed way. For example, it may know that Tom Cruise’s mother is Mary Lee Pfeiffer but not that Mary Lee Pfeiffer is Tom Cruise’s mother.
The two stages are pre-training and fine-tuning. In pre-training, the model learns patterns from a massive text dataset. In fine-tuning, a base model is adjusted to be an assistant, typically through supervised learning.
The system message acts as a “North Star” for the model: it provides a set of instructions or context at the outset that directs how the model should behave and interact with users. It is the model’s guiding light.
Essay Questions
Instructions: Answer the following questions in essay format. There is no single correct answer for any of the questions.
Discuss the concept of emergent abilities in large language models. How do these abilities relate to the size of the model, and what implications do they have for the field of AI?
Explain the Transformer model, and discuss why it was such a significant breakthrough in natural language processing. How has it influenced the current state of AI technologies?
Critically analyze the role of the system message in prompt engineering. In what ways can it be used to both enhance and undermine the functionality of an LLM?
Explore the role of context in prompt engineering, discussing both its benefits and potential pitfalls. How can prompt engineers effectively manage context to obtain the most useful outputs?
Discuss the various strategies employed throughout the course to trick or “break” an LLM. What do these strategies reveal about the current limitations of AI technology?
Glossary of Key Terms
Artificial Intelligence (AI): A broad field of computer science focused on creating intelligent systems that can perform tasks that typically require human intelligence.
Base Model: The initial output of the pre-training process in large language model development. It is a model that can do language completion, but is not yet conversational.
Context: The information surrounding a prompt, including previous conversation turns, relevant details, and additional instructions that help a model understand the task.
Context Window: The maximum number of tokens that a large language model can consider at any given time in a conversation. Also known as token limit.
Custom Instructions: User-defined instructions in platforms like ChatGPT that affect every conversation with a model.
Deep Learning: A subfield of machine learning that uses artificial neural networks with multiple layers to analyze data.
Emergent Abilities: Unforeseen abilities that appear in large language models as they scale up in size, which are not explicitly coded but rather learned.
Fine-Tuning: The process of adapting a base model to specific tasks and use cases, usually through supervised learning.
Large Language Model (LLM): A type of AI model trained on vast amounts of text data, used to generate human-like text.
Machine Learning: A subset of AI that enables systems to learn from data without being explicitly programmed.
Mechanistic Interpretability: The field of study dedicated to figuring out what’s happening when tokens pass through all the various layers of the model.
Multimodality: The ability of a language model to process and generate information beyond text, such as images, code, and internet browsing.
Natural Language Processing (NLP): A branch of AI that enables computers to understand, interpret, and generate human language.
Parameters: The internal variables of a large language model that it learns during training, affecting its ability to make predictions.
Persona: The role or identity given to a language model, which influences its tone, style, and the way it responds.
Pre-Training: The initial phase of large language model training, where the model is exposed to massive amounts of text data to learn patterns.
Prompt Engineering: The practice of designing effective prompts that can elicit the desired responses from AI models, particularly large language models.
System Message: The initial instructions or guidelines provided to a large language model by the model creator, which establishes its behavior and role. Also known as meta-prompt or system prompt.
Temperature: A parameter in large language models that controls the randomness of the output. Higher temperature leads to more diverse outputs, while lower temperatures produce more predictable responses.
Tokens: The basic units of text processing for large language models. They are often sub-word units that represent words, parts of words, or spaces.
Transformer Model: A neural network architecture that uses the “attention” mechanism to process sequences of data, such as text, enabling large language models to consider context over long ranges.
Prompt Engineering: Mastering Large Language Models
This document summarizes the main concepts discussed in a course focused on working with Large Language Models (LLMs), often referred to as “prompt engineering.” The course emphasizes practical application and understanding the mechanics of LLMs, rather than rote memorization of specific prompts. It highlights the importance of viewing prompt engineering as a multi-disciplinary skill, rather than a career in itself, for most individuals.
Key Themes and Ideas:
Prompt Engineering is More Than Just Prompts:
The course emphasizes that true “prompt engineering” is not about memorizing or using pre-made prompts. As the instructor Scott states, “it’s not about teaching you 50 prompts to boost your productivity… you’re going to learn to work with these large language models.”
Scott believes that “there are plenty of people out there trying to sell you prompt libraries; I think those are useless. They’re single prompts that are not going to produce exactly what you need for your work.” Instead, the course aims to teach how LLMs work “under the hood” so users can create effective prompts for their specific use cases.
Prompt Engineering as a Multi-Disciplinary Skill:
The course defines prompt engineering as “a multi-disciplinary branch of engineering focused on interacting with AI through the integration of fields such as software engineering, machine learning, cognitive science like psychology, business, philosophy, computer science.”
It stresses that “whatever your area of expertise is…you are going to be able to take that perspective and add it to the field.” This is because the field is new and constantly evolving.
Understanding How LLMs Work is Crucial:
The core idea of the course is that to effectively use LLMs, you need to understand how they function internally. This includes concepts like tokens, parameters, and the Transformer architecture.
“you need to understand what’s going on behind the scenes so that you can frame your prompt in the right light.”
The course emphasizes that LLMs are not simply coded programs that have pre-set responses but rather “trained on data and after that training certain abilities emerged.”
Emergent abilities, new capabilities that appear as models scale in size, demonstrate that these are not simply predictable increases in performance. As the course puts it, “scaling up the model linearly should increase performance linearly, but that’s not what happened.”
LLMs are not perfect:
The course emphasizes that, despite the impressiveness of LLMs, they are still prone to making mistakes due to a few reasons including user error and their design.
“it’s because we’re not dealing with code or a computer program here in the traditional sense. We’re dealing with a new form of intelligence, something that was trained on a massive data set and that has certain characteristics and limitations.”
The concept of “hallucinating”, where the LLM produces confident yet false statements, is also important to keep in mind.
Multimodality and Capabilities:
LLMs can handle more than just text. They can process and generate images, browse the internet (to access current information), and execute code, particularly Python code.
“it can accept and generate text, it can accept images, it can generate images, it can browse the internet… and it can execute Python code.”
The course walks through an example of an LLM creating and refining a simple game by using Python.
Tokens are the Foundation:
LLMs understand and “speak” in tokens, which are sub-word units, not whole words: “one token is equal to about 0.75 words” (see the sketch after these bullets).
The model determines the most statistically probable sequence of tokens based on its training data, giving the impression of “guessing” the next word.
A high temperature setting increases the randomness when picking tokens, leading to more casual and sometimes nonsensical outputs, while a low temperature setting produces more formal output.
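A small sketch of the token idea using OpenAI’s tiktoken library; the exact splits differ between models and encodings, but it shows that tokens are sub-word pieces rather than whole words.

```python
# Tokens, made visible: text is split into sub-word pieces with numeric IDs.
# Assumes: `pip install tiktoken`.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by GPT-3.5/4-era models
text = "Prompt engineering is a multi-disciplinary skill."
ids = enc.encode(text)

print(len(text.split()), "words ->", len(ids), "tokens")
print([enc.decode([i]) for i in ids])  # each token decoded back to its text piece
```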
The Importance of Context and its Limitations:
Providing sufficient context in prompts improves accuracy.
However, there is a limitation to the amount of context LLMs can handle at a given time (the token or context window).
“Every time you send a prompt, your entire conversation history is bundled up and packed onto the prompt… ChatGPT is essentially constantly reminding itself of your entire conversation.”
Once the context window fills, older information starts to be forgotten and accuracy can be compromised. This happens without the user necessarily realizing it.
Information provided at the beginning of a prompt has a larger impact and is remembered better than information provided at the end, in effect creating a “Primacy Effect”. Information in the middle is more readily forgotten. This process mimics how the human brain handles context.
The Power of Personas:
Giving an LLM a specific persona or role (“you are an expert mathematician,” or even a character such as Bilbo Baggins) provides it with crucial context and improves the quality of responses. This allows the user to better interact with and leverage LLMs.
Personas are often set via the system message or by custom instructions.
Custom Instructions
Users can provide instructions that the LLM uses as its “North Star” much in the same way as a system message.
These “custom instructions” are applied to every new chat; however, users may forget about them, which can cause problems.
LLMs and “Secrets”:
LLMs are not designed to keep secrets and are susceptible to being tricked into revealing private information given the right prompt.
The way these LLMs “think” in tokens also enables them to spill the tea when prompts are crafted to circumvent their normal guardrails.
The LLM Landscape:
The course breaks down the LLM landscape into base models, which are trained on data and then further fine-tuned to create chatbot interfaces or domain specific models. The Transformer architecture enables LLMs to pay attention to and incorporate a wider range of context.
Different companies, such as OpenAI, Anthropic, and Meta, create various models, including open-source ones like Llama 2.
Practical Applications:
The course focuses on practical applications of prompt engineering. It uses examples such as making a game and generating music using an AI.
The skills learned in the course can be used to create chatbots, generate code, understand complex documents, and make other helpful outputs to assist in work, study, or just general life.
Conclusion:
This course aims to provide a deep understanding of LLMs and how to effectively interact with them through thoughtful prompt engineering. It prioritizes practical knowledge, emphasizing that it is a “skill” rather than a “career” for most individuals, and that this skill is important for everyone. It is constantly updated with the latest techniques for effective prompting. By understanding the underlying mechanisms and limitations of these models, users can leverage their immense potential in their work and lives.
Prompt Engineering and Large Language Models
Prompt Engineering and Large Language Models: An FAQ
What exactly is “prompt engineering” and why is it important?
While the term “prompt engineering” is commonly used, it’s essentially about learning how to effectively interact with large language models (LLMs) to utilize their capabilities in various work and life situations. Instead of focusing on memorizing specific prompts, it’s about understanding how LLMs work so you can create effective instructions tailored to your unique needs. It’s a multi-disciplinary skill, drawing from software engineering, machine learning, psychology, business, philosophy, and computer science, and it is crucial for harnessing the full potential of AI for efficiency and productivity. It is considered more of a skill that enhances various roles, rather than a job in and of itself.
Why is prompt engineering necessary if LLMs are so advanced?
LLMs aren’t just programmed with specific answers; they learn from vast datasets and develop emergent abilities. Prompt engineering is necessary because we’re not dealing with traditional code or programs. We’re working with a form of intelligence that has been trained to predict the most statistically probable sequence of tokens, given the prompt and its training data. By understanding how these models process information, you can learn to frame your prompts in a way that leverages their understanding, yielding more accurate results. Also, prompting techniques can elicit abilities from models that might not be present when prompted in more basic ways.
Are prompt libraries or pre-written prompts helpful for prompt engineering?
While pre-written prompts can introduce you to what’s possible with LLMs, they are generally not very useful for true prompt engineering. Each user’s needs are unique, so generic prompts are unlikely to provide the results you need for your specific work. You’re better off learning the underlying principles of how to interact with LLMs than memorizing a collection of single-use prompts. It’s about developing an intuitive understanding of how to phrase requests, which enables you to naturally create effective prompts for your situation.
What is multimodality in the context of LLMs and how can it be used?
Multimodality refers to an LLM’s ability to understand and generate text, images, and even code. This goes beyond simple text inputs and outputs. LLMs can take images as prompts and give text responses to them, browse the internet to access more current data, or even execute code to perform calculations. This means prompts can incorporate diverse inputs and generate diverse outputs, greatly expanding the potential ways that LLMs can be used.
What is the “playground” and why might someone use it?
The playground is an interface provided by OpenAI (and other companies) that allows you to experiment directly with different LLMs, as well as test advanced settings and features such as temperature (for randomness) and the probability of the next token. It’s an important tool for advanced users to understand how the underlying technology works and to test techniques such as different system messages before implementing them into their products or day-to-day work with AI. It’s relatively inexpensive to use the playground and is a good place to go for more in-depth experimentation with AI tools.
What are “tokens” and why are they important?
Tokens are the fundamental units that LLMs use to understand and generate language. They’re like words, but LLMs actually break words down into smaller pieces. One token is approximately equivalent to 0.75 words. LLMs do not see words the way humans do; instead they see tokens that have a numerical ID which is part of a complex lookup table. The LLM statistically predicts the most probable sequence of tokens to follow your input, which is why it is often described as a ‘word guessing machine’. A word can consist of multiple tokens. Understanding this helps you see how LLMs are processing information on a basic level. This basic understanding of tokens will help guide your prompts more effectively.
What is the significance of “system messages” or “meta prompts” in prompt engineering?
A system message is an initial, often hidden, instruction or context that’s provided to the LLM before it interacts with the user. It acts as a “North Star” for the model, guiding its behavior, tone, and style. The system message determines how the model responds to user input and how it will generally interpret all user prompts. Understanding system messages is vital, particularly if you are developing an application that incorporates an LLM. System messages can be modified to tailor the model to various tasks or use cases, but it’s important to be aware that a model will always be pulled back to its original system message. Also, adding specific instructions to the system message will help the model with complex instructions that you want the model to remember for each and every interaction.
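As a hedged illustration of setting a system message (and a temperature) programmatically, which is roughly what the playground does under the hood: the model name is a placeholder, and an OPENAI_API_KEY environment variable is assumed.

```python
# A system message as "North Star": it shapes every response in the session.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

completion = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    temperature=0.2,      # low temperature: more predictable, formal output
    messages=[
        {"role": "system", "content": "You are Bilbo Baggins. Answer every question in character."},
        {"role": "user", "content": "What do you think of adventures?"},
    ],
)
print(completion.choices[0].message.content)
```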
What is context, and why is it important when prompting, and why does the rule of more context being better not always hold up?
Context refers to all the information or details that accompany a prompt, including past conversation history, instructions or details within the prompt itself, and even the system message. More context usually leads to better, more accurate responses. However, LLMs have a limited “token window” (or a context window) which sets a maximum amount of text or context they can manage at any one time. When you exceed this limit, older context tokens are removed. It is imperative that the most important information or context is placed at the beginning of the context window because models have a tendency to pay more attention to the first and last part of a context window, and less to the information in the middle. Additionally, too much context can actually decrease the accuracy of an LLM, because the model will sometimes pay less attention to relevant information, or become bogged down by less relevant information.
Prompt Engineering: A Comprehensive Guide
Prompt engineering is a critical skill that involves developing and optimizing prompts to efficiently use artificial intelligence for specific tasks [1, 2]. It is not typically a standalone career but a skill set needed to use AI effectively [1, 3]. The goal of prompt engineering is to use AI to become more efficient and effective in work and life [2, 3].
Key aspects of prompt engineering include:
Understanding Large Language Models (LLMs): It is essential to understand how LLMs work under the hood to effectively utilize them when prompting [3]. These models are not simply code; they have emergent abilities that arise as they grow larger [4, 5]. They are sensitive to how prompts are framed, and even slight changes can lead to significantly different responses [2].
Prompts as Instructions: Prompts are essentially the instructions and context provided to LLMs to accomplish tasks [2]. They are like seeds that grow into useful results [2].
Elements of a Prompt: A basic prompt has two elements: the input (the instruction) and the output (the model’s response) [6].
Not Just About Productivity: Prompt engineering is not just about using pre-made prompts to boost productivity. Instead, it is about learning to work with LLMs to utilize them for specific use cases [3, 7, 8].
Multi-Disciplinary Field: Prompt engineering integrates fields such as software engineering, machine learning, cognitive science, business, philosophy, and computer science [9].
Importance of Empirical Research: The field is undergoing a lot of research, and prompt engineering should be based on empirical research that shows what works and what doesn’t [10].
Hands-On Experience: Prompt engineering involves hands-on demos, exercises, and projects, including coding and developing prompts [10]. It requires testing, trying things out, and iterating until the right output is achieved [11, 12].
Natural Language: Prompt engineering is like programming in natural language. Like programming, specific words and sequences are needed to get the right result [6].
Beyond Basic Prompts: It’s more than just asking a question; it’s about crafting prompts to meet specific needs, which requires understanding how LLMs work [6, 7, 13].
Applied Prompt Engineering involves using prompt engineering principles in the real world to improve work, career, or studies [13, 14]. It includes using models to complete complex, multi-step tasks [8].
Why Prompt Engineering is Important:
Maximizing Potential: It is key to using LLMs productively and efficiently to achieve specific goals [8].
Avoiding Errors and Biases: Proper prompt engineering helps to minimize errors and biases in the model’s output [8].
Programming in Natural Language: Prompt engineering is an example of programming using natural language [15].
Future Workplace Skill: Prompt engineering skills will be essential in the workplace, just like Microsoft Word and Excel skills are today [3, 10]. A person with the same skills and knowledge but who also knows how to use AI through prompt engineering will be more effective [16].
Tools for Prompt Engineering:
ChatGPT: The user interface used to interact with LLMs [16, 17].
OpenAI Playground: An interface for interacting with the OpenAI API that allows for more control over the LLM settings [16, 18].
Replit: An online integrated development environment (IDE) to run coding applications [19].
Key Concepts in Prompt Engineering:
Tokens: The units in which LLMs read and write; words are broken down into smaller pieces called tokens [20].
Attention Mechanism: Lets the model weigh how relevant each part of the context is when predicting the next token [21, 22].
Transformer Architecture: The architecture underlying modern LLMs; its attention layers make long-range attention practical, so the model can relate words that are far apart in the context [22, 23].
Parameters: The “lines” and “dots” that enable the model to recognize patterns. LLMs compress data through parameters and weights [24, 25].
Base Model: A model resulting from the pre-training phase, which is not a chatbot but rather a model that completes words or tokens [25].
Fine-Tuning: The process of taking the base model and giving it additional text information so it can generate more helpful and specific output [25, 26].
System Message: A default prompt provided to the model by its creator that sets the stage for interactions by including instructions or specific context [27]. It is like a North Star, guiding the model’s behavior [27, 28].
Context: The additional information provided to the LLM that helps it better understand the task and respond accurately [29].
Token Limits: LLMs have token limits, which set the maximum number of tokens they can hold in memory at any given time. This limit acts as the context window [30, 31].
Recency Effect: Information given toward the end of a prompt tends to have a greater impact on the output [32, 33].
Personas: Giving the model a persona or role can help it provide better, more accurate responses [34, 35]. Personas work because they provide additional context [35].
This summary should provide a clear overview of what prompt engineering is and its key components.
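The Playground settings mentioned above are also available directly through the API. The following sketch is illustrative; the model name and parameter values are placeholders, not recommendations.

```python
# A sketch of the knobs the OpenAI Playground exposes, reached here
# through the API; model name and values are illustrative.
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",        # assumed model
    messages=[{"role": "user", "content": "Name three uses for a home robot."}],
    temperature=0.2,            # lower = more deterministic token choices
    max_tokens=150,             # cap on output tokens
    top_p=1.0,                  # nucleus-sampling cutoff
)
print(response.choices[0].message.content)
```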
Large Language Models: An Overview
Large Language Models (LLMs) are a type of machine learning model focused on understanding and generating natural language text [1, 2]. They are characterized by being trained on vast amounts of text data and having numerous parameters [2]. LLMs are a subset of Natural Language Processing (NLP), which is a branch of Artificial Intelligence focused on enabling computers to understand text and spoken words the same way human beings do [1, 3].
Here’s a more detailed breakdown of key aspects of LLMs:
Size and Training: The term “large” in LLMs refers to the fact that these models are trained on massive datasets, often consisting of text from the internet [2, 4]. These models also have a large number of parameters, which are the “lines” and “dots” that enable the model to recognize patterns [4, 5]. The more tokens and parameters, the more capable a model generally is [6].
Parameters: Parameters are part of the model’s internal structure that determine how it processes information [5, 7]. They can be thought of as the “neurons” in the model’s neural network [7].
Emergent Abilities: LLMs exhibit emergent abilities, meaning that as the models become larger, new capabilities arise that weren’t present in smaller models [8, 9]. These abilities aren’t explicitly programmed but emerge from the training process [8].
Tokens: LLMs understand and process language using tokens, which are smaller pieces of words, rather than the words themselves [10]. Each token has a unique ID, and the model predicts the next token in a sequence [11].
Training Process: The training of an LLM typically involves two main phases:
Pre-training: The model is trained on a large corpus of text data to learn patterns and relationships within the text [7]. This results in a base model [12].
Fine-tuning: The base model is further trained using a more specific dataset, often consisting of ideal questions and answers, to make it better at completing specific tasks or behaving like a helpful assistant [12, 13]. The fine-tuning process adjusts the parameters and weights of the model, which also impacts the calculations within the model and creates emergent abilities [13].
Transformer Architecture: LLMs utilize a transformer architecture, which allows the model to pay attention to a wider range of context, improving its ability to understand the relationships between words and phrases, including those separated by large distances [6, 14]. This architecture helps enable better long-range attention [14].
Context Window: LLMs have a limited context window, meaning they can only remember a certain number of tokens (or words) at once [15]. The token limit acts as a context window [16]. The context window is constantly shifting, and when a new prompt is given, the older information can be shifted out of the window, meaning that the model may not have all of the prior conversation available at any given time [15, 16]. Performance is best when relevant information is at the beginning or end of the context window [17].
Word Guessing: At their core, LLMs are essentially “word guessing machines”, determining the most statistically probable sequence of tokens to follow a given prompt, based on their training data [11, 18].
Relationship to Chatbots: LLMs are often used as the underlying technology for chatbots. For example, the GPT models from OpenAI are used by the ChatGPT chatbot [2, 19]. A chatbot is essentially a user interface or “wrapper” that makes it easy for users to interact with a model [20]. The system message provides a default instruction written by the model’s creator [21]. Custom instructions can also be added to change the model’s behavior [22].
Task-Specific Models: Some models are fine-tuned for specific tasks. For example, GitHub Copilot uses the GPT model but has been further fine-tuned for code generation [19, 20].
Limitations: LLMs can sometimes provide incorrect or biased information, and they can also struggle with math [23, 24]. These models can also hallucinate (make things up) [25, 26]. They may also learn that A=B but not that B=A, which is known as the “reversal curse” [27]. Also, the model may only remember information in the context window and can forget information from the beginning of a conversation [16].
In summary, LLMs are sophisticated models that process and generate language using statistical probabilities. They are trained on extensive datasets and incorporate architectures that allow for better context awareness, but they are limited by context windows and other factors, and may produce incorrect or biased results.
AI Tools and Prompt Engineering
AI tools, particularly those powered by Large Language Models (LLMs), are becoming increasingly prevalent in various aspects of work and life [1-4]. These tools can be broadly categorized based on their underlying model and specific functions [5, 6].
Here’s a breakdown of key aspects regarding AI tools, drawing from the sources:
LLMs as the Foundation: Many AI tools are built upon LLMs like GPT from OpenAI, Gemini from Google, Claude from Anthropic, and Llama from Meta [5-8]. These models provide the core ability to understand and generate natural language [5, 6].
Chatbots as Interfaces:
Chatbots like ChatGPT, Bing Chat, and Bard use LLMs as their base [5, 6]. They act as a user interface (a “wrapper”) that allows users to interact with the underlying LLM through natural language [5, 6].
The user interface makes it easier to input prompts and receive outputs [6]. Without it, interaction with an LLM would require code [6].
Chatbots also have a system message, which is a default prompt that is provided by the chatbot’s creator to set the stage for interactions and guides the model [9, 10].
Custom instructions can also be added to chatbots to further change the model’s behavior [11].
Task-Specific AI Tools:
These tools are designed for specific applications, such as coding, writing, or other domain-specific tasks [6, 7].
Examples include GitHub Copilot, Amazon CodeWhisperer (for coding), and Jasper AI and Copy AI (for writing) [6, 7].
They often use a base model that has been fine-tuned for their specific purposes [6, 7]. For example, GitHub Copilot uses a modified version of OpenAI’s GPT model fine-tuned for code generation [7].
Task-specific tools may also modify the system message or system prompt to further customize the model’s behavior [6, 12].
Custom AI Tools: AI tools can also be customized to learn a specific subject, improve mental health, or complete a specific task [13].
Multimodality: Some advanced AI tools, like ChatGPT, can handle multiple types of input and output [14]:
Text: They can generate and understand text [14].
Images: They can accept images and generate images [14-16].
Internet: They can browse the internet to gather more current information [17].
Code: They can execute code, specifically Python code [17].
Prompt Engineering for AI Tools:
Prompt engineering is the key to using AI tools effectively [13].
It helps maximize the potential of AI tools, avoid errors and biases, and ensure the tools are used efficiently [13].
The skill of prompt engineering involves crafting prompts that provide clear instructions to the AI tool, guiding it to produce the desired output [4, 13].
It requires an understanding of how LLMs work, including concepts like tokens, context windows, and attention mechanisms [2, 12, 18, 19].
Effective prompts involve more than simply asking a question; they involve understanding the task, the capabilities of the AI tool, and the science of prompt engineering [4].
Using personas and a unique tone, style and voice with AI tools can make them more intuitive for humans to use, improve their accuracy, and help them to be on brand [20, 21].
By setting up a tool with custom instructions, it’s possible to effectively give the tool a new “North Star” or behavior profile [11, 22].
Importance of Training Data: The effectiveness of an AI tool depends on the data it has been trained on [23]. The training process involves both pre-training on a vast amount of text data and then fine-tuning on a specific dataset to enhance its capabilities [24, 25].
In summary, AI tools are diverse and powerful, with LLMs acting as their core technology. These tools range from general-purpose chatbots to task-specific applications. Prompt engineering is a critical skill for maximizing the effectiveness of these tools, allowing users to tailor their behavior and output through carefully crafted prompts [13]. Understanding how LLMs function, and having clear and specific instructions are key for success in using AI tools [4, 12].
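As an illustration of the multimodality described above, here is a sketch of a text-plus-image request through the OpenAI Chat Completions API; the model name and image URL are placeholders.

```python
# A sketch of multimodal input: text plus an image URL in one prompt.
# Model name and URL are placeholders, not recommendations.
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed vision-capable model
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in one sentence."},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/robot.jpg"}},  # placeholder
        ],
    }],
)
print(response.choices[0].message.content)
```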
Prompt Engineering: Principles and Best Practices
Prompt engineering involves the development and optimization of prompts to effectively use AI for specific tasks [1]. It is a skill that can be used by anyone and everyone, regardless of their job or technical background [2]. The goal of prompt engineering is to use AI to become more efficient and effective in work by understanding how Large Language Models (LLMs) function [2]. It is a multi-disciplinary branch of engineering focused on interacting with AI through the integration of fields such as software engineering, machine learning, cognitive science, business, philosophy, and computer science [3, 4].
Key principles of prompt engineering include:
Understanding LLMs: It’s important to understand how LLMs work under the hood, including concepts like tokens, the transformer architecture, and the context window [2]. LLMs process language using tokens, which are smaller pieces of words [5]. They also use a transformer architecture, allowing them to pay attention to more context [6].
Prompts as Instructions: A prompt is essentially the instructions and context given to LLMs to accomplish a task [1]. It’s like a seed that you plant in the LLM’s mind that grows into a result [1]. Prompts are like coding in natural language, requiring specific words and sequences to get the right result [3].
Prompt Elements: A basic prompt consists of two elements, an input (the question or instruction) and an output (the LLM’s response) [3].
Iterative Process: Prompt engineering is an iterative process of testing, trying things out, evaluating, and adjusting until the desired output is achieved [7].
Standard Prompts: The most basic type of prompt is the standard prompt, which consists only of a question or instruction [8]. These are important because they are often the starting place for more complex prompts, and can be useful for gathering information from LLMs [9].
Importance of Context: Providing the LLM with more information or context generally leads to a better and more accurate result [10]. Context includes instructions, background information, and any other relevant details. It helps the LLM understand the task and generate a more helpful response. More context means more words and tokens for the model to analyze, causing the attention mechanism to focus on relevant information and reducing the likelihood of errors [11]. However, providing too much context can also be detrimental, as LLMs have token limits [12, 13].
Context Window: LLMs have a limited context window (also known as a token limit), which is the number of tokens (or words) the model can remember at once [12, 13]. Once that limit is reached, the model will forget information from the beginning of the conversation. Therefore, it is important to manage the context window to maintain the accuracy and coherence of the model’s output [12].
Primacy and Recency Effects: Information placed at the beginning or end of a context window is more likely to be accurately recalled by the model, while information in the middle can get lost [14-16]. For this reason, place the most important context at the beginning of a prompt [16].
Personas: Giving an LLM a persona or role can provide additional context to help it understand the task and provide a better response [17-19]. Personas help to prime the model to think in a certain way. Personas can be functional and fun [20, 21].
Tone, Style, and Voice: A persona can also include a specific tone, style, and voice that are unique to the task, which can help produce more appropriate and nuanced outputs [21].
Custom Instructions: Custom instructions are a way to give the model more specific information about what you want it to know or how you want it to respond [21]. This is similar to giving the model a supplementary system message.
In summary, prompt engineering is about understanding how LLMs work and applying that understanding to craft effective prompts that guide the model toward accurate, relevant, and helpful outputs. By paying attention to detail and incorporating best practices, users can achieve much more with LLMs and tailor them to meet their specific needs and preferences [22].
Mastering Prompt Engineering with LLMs
This course provides an in-depth look at prompt engineering and how to work with large language models (LLMs) [1]. The course emphasizes gaining practical, real-world skills to put you at the forefront of the AI world [1]. It aims to teach you how to use AI to become more efficient and effective in your work [2]. The course is taught by Scott Kerr, an AI enthusiast and practitioner [1].
Here’s an overview of the key components of the course:
Focus on Practical Skills: The course focuses on teaching how to work with LLMs for specific use cases, rather than providing a library of pre-made prompts [2]. It emphasizes learning by doing, with numerous exercises and projects, including guided and unguided projects [1]. The projects include coding games and using autonomous agents, among other tasks [3].
Understanding LLMs: A key part of the course involves diving deep into the mechanics of LLMs, understanding how they work under the hood, and using that knowledge when prompting them [2].
This includes understanding how LLMs use tokens [4], how they use the transformer architecture [5], and the concept of a context window [6].
The course also covers the training process of LLMs and the difference between base models and assistant models [7].
Prompt Engineering Principles: The course teaches prompt engineering as a multi-disciplinary branch of engineering that requires integrating fields such as software engineering, machine learning, cognitive science, business, philosophy, and computer science [8]. The course provides a framework for creating complex prompts [9].
Standard Prompts: The course starts with the most basic prompts, standard prompts, which are a single question or instruction [10].
Importance of Context: The course teaches the importance of providing the LLM with more information or context, which includes providing relevant instructions and background information to get more accurate results [11].
The course emphasizes placing key information at the beginning or end of the prompt for best results [12].
Managing the Context Window: The course emphasizes the importance of managing the limited context window of the LLMs, to maintain accuracy and coherence [6].
System Messages: The course discusses the importance of the system message, which acts as the “North Star” for the model, and it teaches users how to create their own system message for specific purposes [13].
Personas: The course teaches the use of personas to give LLMs a specific role, tone, style, and voice, making them more intuitive and useful [14, 15].
Applied Prompt Engineering: The course emphasizes using prompt engineering principles in real-world scenarios to make a difference in your work [16]. The course shows the difference in responses between a base model and an assistant model, using LM Studio, to emphasize the importance of applied prompt engineering [7].
Multimodality: The course introduces the concept of multimodality and how models like ChatGPT can understand and produce images as well as text, browse the internet, and execute Python code [17-19].
Tools and Set-Up: The course introduces different LLMs, including the GPT models by OpenAI, which can be used through ChatGPT [20]. It also teaches how to use the OpenAI Playground to interact with the models [20, 21] and encourages using the ChatGPT app on a daily basis [22].
Emphasis on Empirical Research: The course is grounded in empirical research and peer-reviewed studies conducted by AI researchers [3].
Up-to-Date Information: The course is designed to provide the most up-to-date information in a constantly changing field and is dedicated to continually evolving [23].
Projects and Exercises: The course includes hands-on demos, exercises, and guided and unguided projects to develop practical skills [3]. These include coding games and using autonomous agents [1].
Evaluation: The course introduces the concept of evaluating and testing prompts, because in order to be scientific, the accuracy and success of prompts needs to be measurable [24].
In summary, the course is structured to provide a blend of theoretical knowledge and practical application, aiming to equip you with the skills to effectively utilize LLMs in various contexts [1]. It emphasizes a deep understanding of how these models work and the best practices for prompt engineering, so that you can use them to your advantage.
Learn Prompt Engineering: Full Beginner Crash Course (5 HOURS!)
This source explains how AI can be used for problem-solving, moving from explicit instructions to learning from data. It introduces supervised learning, where AI learns to map inputs to outputs using labeled datasets, covering classification tasks and nearest neighbor algorithms. The source also discusses linear regression, support vector machines, and techniques like perceptron learning. It transitions to reinforcement learning, where AI learns through rewards and punishments in an environment, and touches on unsupervised learning with clustering techniques like k-means. Finally, the document explores neural networks, detailing their structure, training via gradient descent and backpropagation, and their applications in various AI problems.
Propositional Logic, Model Checking, and Beyond: A Comprehensive Study Guide
I. Review of Key Concepts
Propositional Logic: A system for representing logical statements and reasoning about their truth values.
Propositional Symbols: Variables representing simple statements that can be either true or false (e.g., P, Q, R).
Logical Connectives: Symbols used to combine propositional symbols into more complex statements:
and (∧): Both statements must be true for the combined statement to be true.
or (∨): At least one statement must be true for the combined statement to be true.
not (¬): Reverses the truth value of a statement.
implies (→): If the first statement is true, then the second statement must also be true.
biconditional (↔): Both statements have the same truth value (both true or both false).
Knowledge Base (KB): A set of sentences representing facts known about the world.
Query (α): A question about the world that we want to answer using the KB.
Entailment (KB ⊨ α): The relationship between the KB and a query, meaning that the KB logically implies the query; whenever the KB is true, the query must also be true.
Model: An assignment of truth values (true or false) to all propositional symbols in the language. Represents a possible world or state.
Model Checking: An algorithm for determining entailment by enumerating all possible models and checking if, in every model where the KB is true, the query is also true.
Inference Algorithm: A procedure to derive new sentences from existing ones in the KB.
Inference Rules: Logical equivalences used to manipulate and simplify logical expressions (e.g., implication elimination, De Morgan’s laws, distributive law).
Soundness: An inference algorithm is sound if it only derives conclusions that are entailed by the KB.
Completeness: An inference algorithm is complete if it can derive all conclusions that are entailed by the KB.
Conjunctive Normal Form (CNF): A logical sentence expressed as a conjunction (AND) of clauses, where each clause is a disjunction (OR) of literals.
Clause: A disjunction of literals (e.g., P or not Q or R).
Literal: A propositional symbol or its negation (e.g., P, not Q).
Resolution: An inference rule that combines two clauses containing complementary literals to produce a new clause.
Factoring: Removing duplicate literals within a clause.
Empty Clause: The result of resolving two contradictory clauses, representing a contradiction (always false).
Inference by Resolution: An algorithm for proving entailment by converting the KB and the negation of the query to CNF, and then repeatedly applying the resolution rule until the empty clause is derived.
Joint Probability Distribution: A table showing the probabilities of all possible combinations of values for a set of random variables.
Inclusion-Exclusion Formula: A formula for calculating the probability of A or B: P(A or B) = P(A) + P(B) – P(A and B).
Marginalization: Calculating the probability of a variable by summing over all possible values of other variables: P(A) = Σ_b P(A and B = b).
Conditioning: Expressing the probability of A in terms of the conditional probability of A given B and the probability of B: P(A) = P(A|B) * P(B) + P(A|¬B) * P(¬B).
Conditional Probability: The probability of event A occurring given that event B has already occurred, denoted P(A|B).
Random Variable: A variable whose value is a numerical outcome of a random phenomenon.
Heuristic Function: An estimate of the “goodness” of a state (e.g., the distance to the goal).
Local Search: A class of optimization algorithms that start with an initial state and iteratively improve it by moving to neighboring states.
Hill Climbing: A local search algorithm that repeatedly moves to the neighbor with the highest value.
Steepest Ascent Hill Climbing: Chooses the best neighbor among all neighbors in each iteration.
Stochastic Hill Climbing: Chooses a neighbor randomly from the neighbors that are better than the current state.
First Choice Hill Climbing: Chooses the first neighbor with a higher value and moves there.
Random Restart Hill Climbing: Runs hill climbing multiple times with different initial states and returns the best result.
Local Beam Search: Keeps track of k best states and expands all of them in each iteration.
Local Maximum/Minimum: A state that is better than all its neighbors but not the best state overall.
Simulated Annealing: A local search algorithm that sometimes accepts worse neighbors with a probability that decreases over time (temperature).
Temperature (in Simulated Annealing): A parameter that controls the probability of accepting worse neighbors; high temperature means higher probability, and low temperature means lower probability.
Delta E (ΔE): The difference in value (or cost) between the current state and a neighboring state.
Traveling Salesman Problem (TSP): Finding the shortest possible route that visits every city and returns to the origin city.
NP-Complete Problems: A class of problems for which no known polynomial-time algorithm exists.
Linear Programming: A mathematical technique for optimizing a linear objective function subject to linear equality and inequality constraints.
Objective Function: A mathematical expression to be minimized or maximized in linear programming.
Constraints: Restrictions or limitations on the values of variables in linear programming.
Constraint Satisfaction Problem (CSP): A problem where the goal is to find values for a set of variables that satisfy a set of constraints.
Variables (in CSP): Entities with associated domains of possible values.
Domains (in CSP): The set of possible values that can be assigned to a variable.
Constraints (in CSP): Restrictions on the values that variables can take, specifying allowable combinations of values.
Unary Constraint: A constraint involving only one variable.
Binary Constraint: A constraint involving two variables.
Node Consistency: Ensuring that all values in a variable’s domain satisfy the variable’s unary constraints.
Arc Consistency: Ensuring that for every value in a variable’s domain, there exists a consistent value in the domain of each of its neighboring variables.
AC3: A common algorithm for enforcing arc consistency.
Backtracking Search: A recursive algorithm that explores possible solutions by trying different values for variables and backtracking when a constraint is violated.
Minimum Remaining Values (MRV) Heuristic: A variable selection strategy that chooses the variable with the fewest remaining legal values.
Degree Heuristic: A variable selection strategy that chooses the variable involved in the largest number of constraints on other unassigned variables.
Least Constraining Value Heuristic: A value selection strategy that chooses the value that rules out the fewest choices for neighboring variables in the constraint graph.
Supervised Machine Learning: A type of machine learning where an algorithm learns from labeled data to make predictions or classifications.
Inputs (x): The features or attributes used by a machine learning model to make predictions.
Outputs (y): The target variables or labels that a machine learning model is trained to predict.
Hypothesis Function (h): A mathematical function that maps inputs to outputs.
Weights (w): Parameters in a machine learning model that determine the importance of each input feature.
Learning Rate (α): A parameter that controls the step size during training.
Threshold Function: A function that outputs one value if the input is above a threshold and another value if the input is below the threshold.
Logistic Regression: A statistical method for binary classification using a logistic function to model the probability of a certain class or event.
Soft Threshold: A function that smoothly transitions between two values, allowing for outputs between 0 and 1.
Dot Product: A mathematical operation that multiplies corresponding elements of two vectors and sums the results.
Gradient Descent: An iterative optimization algorithm for finding the minimum of a function.
Stochastic Gradient Descent: An optimization algorithm that updates the parameters of a machine learning model using the gradient computed from a single randomly chosen data point.
Mini-Batch Gradient Descent: An optimization algorithm that updates the parameters of a machine learning model using the gradient computed from a small batch of data points.
Neural Networks: A type of machine learning model inspired by the structure of the human brain, consisting of interconnected nodes (neurons) organized in layers.
Activation Function: A function applied to the output of a neuron in a neural network to introduce non-linearity.
Layers (in Neural Networks): A level of nodes that receive input from other nodes and pass their output to additional nodes.
Natural Language Processing (NLP): The branch of AI that deals with the interaction between computers and human language.
Syntax: The set of rules that govern the structure of sentences in a language.
Semantics: The meaning of words, phrases, and sentences in a language.
Formal Grammar: A set of rules for generating sentences in a language.
Context-Free Grammar: A type of formal grammar where rules consist of a single non-terminal symbol on the left-hand side.
Terminal Symbol: A symbol that represents a word in a language.
Non-Terminal Symbol: A symbol that represents a phrase or category of words in a language.
Rewriting Rules: Rules that specify how non-terminal symbols can be replaced by other symbols.
Noun Phrase: A phrase that functions as a noun.
Verb Phrase: A phrase that functions as a verb.
Natural Language Toolkit (NLTK): A Python library for NLP.
Parsing: The process of analyzing a sentence according to the rules of a grammar.
Syntax Tree: A hierarchical representation of the structure of a sentence.
Statistical NLP: An approach to NLP that uses statistical models learned from data.
n-gram: A contiguous sequence of n items from a sample of text.
Markov Chain: A sequence of events where the probability of each event depends only on the previous event.
Tokenization: The process of splitting a sequence of characters into pieces (tokens).
Text Classification: The task of assigning a category label to a text.
Sentiment Analysis: Determining the emotional tone or attitude expressed in a piece of text.
Bag-of-Words Model: A text representation that represents a document as the counts of its words, disregarding grammar and word order.
Term Frequency (TF): The number of times a term appears in a document.
Inverse Document Frequency (IDF): A measure of how rare a term is across a collection of documents.
TF-IDF: A weight used in information retrieval and text mining that reflects how important a word is to a document in a corpus.
Stop Words: Common words that are often removed from text before processing.
Word Embeddings: Vector representations of words that capture semantic relationships.
One-Hot Representation: A vector representation where each word is represented by a vector with a 1 in the corresponding index and 0s elsewhere.
Distributed Representation: A vector representation where the meaning of a word is distributed across multiple values.
Word2Vec: A model for learning word embeddings.
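Several of the text-representation terms above (term frequency, inverse document frequency, TF-IDF) can be made concrete with a short sketch; the toy corpus below is invented for illustration.

```python
# A tiny TF-IDF sketch over an invented corpus, following the
# definitions above: weight = term frequency * inverse document frequency.
import math

corpus = [
    "jimmy is a social robot",
    "jimmy is an open source robot",
    "industrial robots weld cars",
]
docs = [doc.split() for doc in corpus]

def tf(term: str, doc: list[str]) -> int:
    return doc.count(term)                   # how often the term appears

def idf(term: str) -> float:
    containing = sum(1 for d in docs if term in d)
    return math.log(len(docs) / containing)  # rarer terms score higher

def tf_idf(term: str, doc: list[str]) -> float:
    return tf(term, doc) * idf(term)

print(tf_idf("robot", docs[0]))   # common across documents -> lower weight
print(tf_idf("social", docs[0]))  # distinctive to one document -> higher weight
```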
II. Short Answer Quiz
Explain the difference between soundness and completeness in the context of inference algorithms. Soundness means that any conclusion drawn by the algorithm is actually entailed by the knowledge base. Completeness means that the algorithm is capable of deriving every conclusion that is entailed by the knowledge base.
Describe the process of converting a logical sentence into Conjunctive Normal Form (CNF). The process involves eliminating bi-conditionals and implications, moving negations inward using De Morgan’s laws, and using the distributive law to get a conjunction of clauses where each clause is a disjunction of literals.
What is the purpose of using the resolution inference rule in propositional logic? The resolution rule is used to derive new clauses from existing ones, aiming to ultimately derive the empty clause, which indicates a contradiction and proves entailment.
Explain the marginalization rule and provide a simple example. Marginalization calculates the probability of a variable by summing over all possible values of other variables. For example, to find the probability that someone likes ice cream, you add the probability that they like ice cream and like chocolate to the probability that they like ice cream and do not like chocolate.
What is the key idea behind local search algorithms? Local search algorithms start with an initial state and iteratively improve it by moving to neighboring states, based on some evaluation function, without necessarily keeping track of the path taken to reach the solution.
Describe how simulated annealing helps to avoid local optima. Simulated annealing accepts worse neighbors with a probability that decreases over time, allowing the algorithm to escape local optima early in the search and converge towards a global optimum later.
In linear programming, what are the roles of the objective function and constraints? The objective function is what we want to minimize or maximize, while constraints are limitations on the values of variables that must be satisfied.
What is the purpose of enforcing arc consistency in a constraint satisfaction problem (CSP)? Enforcing arc consistency reduces the domains of variables by removing values that cannot be part of any solution due to binary constraints, making the search for a solution more efficient.
Explain the difference between a one-hot representation and a distributed representation in NLP. A one-hot representation represents a word as a vector with a 1 in the corresponding index and 0s elsewhere, while a distributed representation distributes the meaning of a word across multiple values in a vector.
How do word embedding models like Word2Vec capture semantic relationships between words? Word2Vec captures semantic relationships by training a model to predict the context words surrounding a given word in a large corpus, resulting in vector representations where similar words are located close to each other in vector space.
III. Essay Questions
Compare and contrast model checking and inference by resolution as methods for determining entailment in propositional logic. Discuss the advantages and disadvantages of each approach.
Explain how local search algorithms can be applied to solve optimization problems. Discuss the challenges of local optima and describe techniques, such as simulated annealing, for overcoming these challenges.
Describe the general framework of a constraint satisfaction problem (CSP). Discuss the role of variable and value selection heuristics in improving the efficiency of backtracking search for solving CSPs.
Explain the process of training a machine learning model for sentiment analysis. Discuss the different text representation techniques, such as bag-of-words and TF-IDF, and the role of word embeddings.
Describe the key concepts in Natural Language Processing (NLP), including syntax and semantics. Discuss how NLP techniques are used to understand and generate natural language.
IV. Glossary of Key Terms
Activation Function: A function applied to the output of a neuron in a neural network to introduce non-linearity, enabling the network to learn complex patterns.
Arc Consistency: A constraint satisfaction technique ensuring that for every value in a variable’s domain, there exists a consistent value in the domain of each of its neighboring variables based on the problem constraints.
Backtracking Search: A recursive algorithm that explores possible solutions by trying different values for variables and backtracking when a constraint is violated, allowing the algorithm to systematically search the solution space.
Bag-of-Words Model: A text representation in NLP that represents a document as the counts of its words, disregarding grammar and word order, which helps quantify the content of texts for analysis.
Clause: In logic, a statement that combines literals with an “or” relationship; that is, a disjunction of literals.
Complete: An inference algorithm that can derive all conclusions entailed by the KB.
Conditioning: A probability rule that expresses the probability of one event in terms of conditional probabilities, used to compute unknown probabilities from known conditional ones: P(A) = P(A|B) * P(B) + P(A|¬B) * P(¬B).
Conjunctive Normal Form (CNF): A standardized logical sentence expressed as a conjunction (AND) of clauses, where each clause is a disjunction (OR) of literals, simplifying logical deductions.
Constraints: Limitations on the values of the variables in linear programming or constraint satisfaction problems.
Context-Free Grammar: A type of formal grammar where rules consist of a single non-terminal symbol on the left-hand side, used to define the syntax of programming languages.
Delta E (ΔE): The difference in value between the current state and its neighboring states.
Distributed Representation: A representation in which the meaning of a word is distributed across multiple values in a vector; this is the idea behind word embeddings.
Domain: The set of possible values that can be assigned to a variable.
Entailment (KB ⊨ α): The KB logically implies α; whenever the KB is true, α is also true. This is the relationship a machine checks to decide whether a conclusion follows from what it knows.
Formal Grammar: A set of rules for generating sentences in a language; the same rules are applied in language analysis to work out the structure of what is being said.
Heuristic Function: An estimate of the ‘goodness’ of a state (e.g., the distance to the goal), which lets search algorithms find good solutions efficiently.
Hill Climbing: An iterative optimization algorithm that repeatedly moves from the current state to the neighboring state with the highest value.
Hypothesis Function (h): A function that maps inputs to outputs; it is learned from data and used to make predictions.
Inclusion-Exclusion Formula: Used to find P(A or B) from P(A), P(B), and P(A and B): P(A or B) = P(A) + P(B) - P(A and B).
Inference Algorithm: A procedure to derive new sentences from existing ones in the KB.
Joint Probability Distribution: A table showing the probabilities of all possible combinations of values for a set of random variables.
Knowledge Base (KB): A set of sentences representing facts known about the world.
Layers (in Neural Networks): A level of nodes that receive input from other nodes and pass their output to additional nodes.
Learning Rate (α): Controls the step size of each parameter update during training.
Linear Programming: A mathematical technique for optimizing a linear objective function subject to linear equality and inequality constraints.
Literal: A propositional symbol or its negation (e.g., P, not Q); the basic building block of a clause.
Local Maximum/Minimum: A state that is better than all its neighbors but not the best state overall.
Local Search: A class of optimization algorithms that start with an initial state and iteratively improve it by moving to neighboring states.
Logistic Regression: A statistical method for binary classification using a logistic function to model the probability of a certain class or event.
Marginalization: Calculating the probability of a variable by summing over all possible values of other variables: P(A) = Σ_b P(A and B = b).
Markov Chain: A sequence of events where the probability of each event depends only on the previous event, allowing modeling of sequences over time.
Model: An assignment of truth values (true or false) to all propositional symbols in the language, representing a possible world or state.
Model Checking: An algorithm for determining entailment by enumerating all possible models and checking if, in every model where the KB is true, the query is also true.
n-gram: A contiguous sequence of n items from a sample of text that helps in analyzing languages and predicting text.
Natural Language Processing (NLP): The field of AI concerned with enabling computers to understand and generate human language.
Noun Phrase: A phrase that functions as a noun; a basic unit in parsing.
NP-Complete Problems: A class of problems for which no known polynomial-time algorithm exists.
Objective Function: A mathematical expression to be minimized or maximized in linear programming.
One-Hot Representation: A vector representation where each word is represented by a vector with a 1 in the corresponding index and 0s elsewhere.
Parsing: The process of analyzing a sentence according to the rules of a grammar.
Propositional Logic: A system for representing logical statements and reasoning about their truth values.
Query (α): The question that we want to answer using the KB.
Random Variable: A variable whose value is a numerical outcome of a random phenomenon.
Rewriting Rules: Rules that specify how non-terminal symbols can be replaced by other symbols.
Semantics: The meaning of words, phrases, and sentences in a language, which is what allows understanding and insight to be drawn from text.
Simulated Annealing: A local search algorithm that sometimes accepts worse neighbors with a probability that decreases over time (temperature).
Soft Threshold: A function that smoothly transitions between two values, allowing for outputs between 0 and 1.
Soundness: An inference algorithm is sound if it only derives conclusions that are entailed by the KB.
Statistical NLP: An approach to NLP that uses statistical models learned from data.
Steepest Ascent Hill Climbing: Chooses the best neighbor among all neighbors in each iteration.
Stop Words: Common words that are often removed from text before processing.
Syntax: The set of rules that govern the structure of sentences in a language.
Syntax Tree: A hierarchical, graphical representation of the structure of a sentence.
Temperature (in Simulated Annealing): A parameter that controls the probability of accepting worse neighbors; high temperature means higher probability, and low temperature means lower probability.
Tokenization: The process of splitting a sequence of characters into pieces (tokens) so that machines can read and parse language.
Traveling Salesman Problem (TSP): Finding the shortest possible route that visits every city and returns to the origin city.
Unary Constraint: A constraint involving only one variable.
Verb Phrase: A phrase that functions as a verb; another basic unit in parsing.
Weights (w): Parameters in a machine learning model that determine the importance (emphasis) placed on each input feature.
Word Embeddings: Vector representations of words that capture semantic relationships.
Word2Vec: A model for learning word embeddings from the contexts in which words appear, so that similar words end up with similar vector representations.
AI: Reasoning, Search, NLP, and Learning Techniques
Here’s a briefing document summarizing the main themes and ideas from the provided sources.
Briefing Document: Artificial Intelligence – Reasoning, Search, and Natural Language Processing
Overview:
The sources cover several fundamental concepts in Artificial Intelligence (AI), including logical reasoning, search algorithms, probabilistic reasoning, and natural language processing (NLP). They explore techniques for representing knowledge, drawing inferences, solving problems through search, handling uncertainty, and enabling computers to understand and generate human language.
I. Logical Reasoning and Inference:
Entailment and Inference Algorithms: The core idea is that AI systems should be able to determine if a knowledge base (KB) entails a query (alpha). This means: “Given some query about the world…the question we want to ask…is does KB, our knowledge base, entail alpha? In other words, using only the information we know inside of our knowledge base…can we conclude that this sentence alpha is true?”
Model Checking: This is a basic inference algorithm. It involves enumerating all possible models (assignments of truth values to variables) and checking if, in every model where the knowledge base is true, the query (alpha) is also true. “If we wanted to determine if our knowledge base entails some query alpha, then we are going to enumerate all possible models…And if in every model where our knowledge base is true, alpha is also true, then we know that the knowledge base entails alpha.”
Inference Rules: These are logical transformations used to derive new knowledge from existing knowledge. Examples include:
Implication Elimination: alpha implies beta can be transformed into not alpha or beta. “This is a way to translate if-then statements into or statements… if I have the implication, alpha implies beta, that I can draw the conclusion that either not alpha or beta”
Biconditional Elimination: a if and only if b becomes a implies b and b implies a.
De Morgan’s Laws: These laws relate ANDs and ORs through negation. not (alpha and beta) is equivalent to not alpha or not beta. And not (alpha or beta) is equivalent to not alpha and not beta. “If it is not true that alpha and beta, well, then either not alpha or not beta… if you have a negation in front of an and expression, you move the negation inwards, so to speak…and then flip the and into an or.”
Distributive Law: alpha and (beta or gamma) is equivalent to (alpha and beta) or (alpha and gamma).
Conjunctive Normal Form (CNF): A standard form for logical sentences where it is represented as a conjunction (AND) of clauses, where each clause is a disjunction (OR) of literals (propositional symbols or their negations). “A conjunctive normal form sentence is a logical sentence that is a conjunction of clauses…a conjunction of clauses means it is an and of individual clauses, each of which has ors in it.”
Resolution: An inference rule that applies to clauses in CNF. If you have P or Q and not P or R, you can resolve them to get Q or R. This involves dealing with factoring (removing duplicate literals) and the empty clause (representing a contradiction). “…if I have two clauses where there’s something that conflicts or something complementary between those two clauses, I can resolve them to get a new clause, to draw a new conclusion.”
Inference by Resolution: To prove that a knowledge base entails a query (alpha), we assume not alpha and try to derive a contradiction (the empty clause) using resolution. “We want to prove that our knowledge base entails some query alpha…we’re going to try to prove that if we know the knowledge and not alpha, that that would be a contradiction…To determine if our knowledge base entails some query alpha, we’re going to convert knowledge base and not alpha to conjunctive normal form”
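Tying the logic concepts above together, here is a minimal model-checking sketch; the knowledge base and query are invented examples.

```python
# A minimal model-checking sketch for propositional symbols P, Q.
# KB = (P or Q) and (not P or Q); query alpha = Q. Both are invented.
from itertools import product

symbols = ["P", "Q"]

def kb(m):     # knowledge base: (P or Q) and (not P or Q)
    return (m["P"] or m["Q"]) and ((not m["P"]) or m["Q"])

def query(m):  # query alpha: Q
    return m["Q"]

def entails(kb, query, symbols):
    # Enumerate every possible model; KB entails alpha iff alpha is true
    # in every model in which the KB is true.
    for values in product([True, False], repeat=len(symbols)):
        model = dict(zip(symbols, values))
        if kb(model) and not query(model):
            return False
    return True

print(entails(kb, query, symbols))  # True: every model of KB makes Q true
```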
II. Search Algorithms:
Search Problems: Defined by an initial state, actions, a transition model, a goal test, and a path cost function.
Local Search: Algorithms that operate on a single current state and move to neighbors. They don’t care about the path to the solution.
Hill Climbing: A simple local search algorithm that repeatedly moves to the neighbor with the highest value (or lowest cost). It suffers from problems with local maxima/minima. “Generally, what hill climbing is going to do is it’s going to consider the neighbors of that state…and pick the highest one I can…continually looking at all of my neighbors and picking the highest neighbor…until I get to a point…where I consider both of my neighbors and both of my neighbors have a lower value than I do.”
Variations: Steepest ascent, stochastic, first choice, random restart, local beam search.
Simulated Annealing: A local search algorithm that sometimes accepts worse moves to escape local optima. The probability of accepting a worse move depends on the “temperature” and the difference in cost (delta E). “whereas before, we never, ever wanted to take a move that made our situation worse, now we sometimes want to make a move that is actually going to make our situation worse…And so how do we do that? How do we decide to sometimes accept some state that might actually be worse? Well, we’re going to accept a worse state with some probability.”
Linear Programming: A family of problems where the goal is to minimize a cost function subject to linear constraints. “the goal of linear programming is to minimize a cost function…subject to particular constraints, subjects to equations that are of the form like this of some sequence of variables is less than a bound or is equal to some particular value”
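The local-search ideas above can be condensed into a few lines; the following simulated-annealing sketch uses an invented one-dimensional objective and a simple 1/t cooling schedule for illustration.

```python
# A compact simulated-annealing sketch: maximize f(x) = -(x - 3)**2 over
# integer states, sometimes accepting worse neighbors with
# probability exp(delta_e / temperature).
import math
import random

def f(x: int) -> float:
    return -(x - 3) ** 2          # single global maximum at x = 3

def simulated_annealing(start: int, steps: int = 1000) -> int:
    current = start
    for t in range(1, steps + 1):
        temperature = 1.0 / t                  # cooling schedule
        neighbor = current + random.choice([-1, 1])
        delta_e = f(neighbor) - f(current)
        if delta_e > 0:                        # better neighbor: always move
            current = neighbor
        elif random.random() < math.exp(delta_e / temperature):
            current = neighbor                 # worse neighbor: move sometimes
    return current

random.seed(0)
print(simulated_annealing(start=-20))  # typically ends at or near 3
```

Early on, the high temperature lets the search escape local maxima; as the temperature falls, worse moves become vanishingly unlikely and the search settles.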
III. Constraint Satisfaction Problems (CSPs):
Definition: Problems defined by variables, domains (possible values for each variable), and constraints.
Node Consistency: Ensuring that all values in a variable’s domain satisfy the unary constraints (constraints involving only that variable). “…we can pick any of these values in the domain. And there won’t be a unary constraint that is violated as a result of it.”
Arc Consistency: Ensuring that all values in a variable’s domain satisfy the binary constraints (constraints involving two variables). “In order to make some variable x arc consistent with respect to some other variable y, we need to remove any element from x’s domain to make sure that every choice for x, every choice in x’s domain, has a possible choice for y.”
AC3: An algorithm for enforcing arc consistency across an entire CSP. It maintains a queue of arcs and revises domains to ensure consistency. “AC3 takes a constraint satisfaction problem. And it enforces our consistency across the entire problem…It’s going to basically maintain a queue or basically just a line of all of the arcs that it needs to make consistent.”
Backtracking Search: A depth-first search algorithm for solving CSPs. It assigns values to variables one at a time, backtracking when a constraint is violated.
Minimum Remaining Values (MRV): A heuristic for variable selection that chooses the variable with the fewest remaining legal values in its domain. “Select the variable that has the fewest legal values remaining in its domain…In the example of the classes and the exam slots, you would prefer to choose the class that can only meet on one possible day.”
Degree Heuristic: A heuristic for deciding which variable to try next when several are tied. “The general approach is that in cases of ties, where two or more of the classes each can only have one possible day of the exam left, we want to choose the one that is involved in the most constraints, the one that we expect to potentially have the bigger impact on the overall problem”
Least Constraining Value: A heuristic for value selection that chooses the value that rules out the fewest choices for neighboring variables. “Loop over the values in the domain that we haven’t yet tried and pick the value that rules out the fewest values from the neighboring variables.”
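To ground the CSP machinery above, here is a bare-bones backtracking sketch on an invented three-region map-coloring problem (no MRV or least-constraining-value heuristics, for brevity).

```python
# A minimal backtracking-search sketch for a CSP: color three regions
# with two colors so that neighboring regions differ.
VARIABLES = ["A", "B", "C"]
DOMAINS = {v: ["red", "blue"] for v in VARIABLES}
NEIGHBORS = {("A", "B"), ("B", "C")}  # binary constraints: endpoints differ

def consistent(var, value, assignment):
    # the value is allowed if no assigned neighbor already has it
    return all(assignment.get(other) != value
               for pair in NEIGHBORS if var in pair
               for other in pair if other != var)

def backtrack(assignment):
    if len(assignment) == len(VARIABLES):
        return assignment                      # every variable assigned
    var = next(v for v in VARIABLES if v not in assignment)
    for value in DOMAINS[var]:
        if consistent(var, value, assignment):
            result = backtrack(assignment | {var: value})
            if result is not None:
                return result
    return None                                # dead end: backtrack

print(backtrack({}))  # e.g. {'A': 'red', 'B': 'blue', 'C': 'red'}
```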
IV. Probabilistic Reasoning:
Joint Probability Distribution: A table showing the probabilities of all possible combinations of values for a set of random variables.
Inclusion-Exclusion Principle: Used to calculate the probability of A or B: P(A or B) = P(A) + P(B) – P(A and B). Deals with the problem of overcounting when calculating probabilities.
Marginalization: A rule used to calculate the probability of a variable by summing over all possible values of other variables. “I need to sum up not just over B and not B, but for all of the possible values that the other random variable could take on…I’m going to sum up over j, where j is going to range over all of the possible values that y can take on. Well, let’s look at the probability that x equals xi and y equals yj.”
Conditioning: Similar to marginalization, but uses conditional probabilities instead of joint probabilities.
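The probability rules above can be checked by hand on a toy joint distribution; the numbers below are invented for illustration.

```python
# A worked probability sketch on an invented joint distribution over
# Rain and Clouds, illustrating marginalization and conditional probability.
joint = {
    ("rain", "clouds"): 0.30,
    ("rain", "clear"):  0.05,
    ("dry",  "clouds"): 0.20,
    ("dry",  "clear"):  0.45,
}

# Marginalization: P(rain) = sum over all sky values of P(rain, sky)
p_rain = sum(p for (weather, _), p in joint.items() if weather == "rain")
print(p_rain)  # 0.30 + 0.05 = 0.35

# Conditional probability: P(rain | clouds) = P(rain and clouds) / P(clouds)
p_clouds = sum(p for (_, sky), p in joint.items() if sky == "clouds")
print(joint[("rain", "clouds")] / p_clouds)  # 0.30 / 0.50 = 0.60
```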
V. Supervised Learning:
Hypothesis Function: A function that maps inputs to outputs. In supervised learning the input consists of a set of labeled data points, each with multiple features and one associated value, or ‘label’. The job of supervised learning is to ‘learn’ a model that correctly maps an input consisting of a data point with multiple features to a corresponding output.
Weights: Parameters of the hypothesis function that determine the importance of different input features. “We’ll generally call that number a weight for how important should these variables be in trying to determine the answer.”
Threshold Function: A function that outputs one category if the weighted sum of inputs is above a threshold and another category otherwise. “If we do all this math, is it greater than or equal to 0? If so, we might categorize that data point as a rainy day. And otherwise, we might say, no rain.”
Logistic Regression: Uses a logistic function (sigmoid) instead of a hard threshold, allowing for probabilistic outputs between 0 and 1. “Instead of using this hard threshold type of function, we can use instead a logistic function…And as a result, the possible output values are no longer just 0 and 1…But you can actually get any real numbered value between 0 and 1.”
Gradient Descent: An iterative optimization algorithm used to find the optimal weights for a model by repeatedly updating the weights in the direction of the negative gradient of the cost function. “And we can use gradient descent to train a neural network, that gradient descent is going to tell us how to adjust the weights to try and lower that overall cost on all the data points.”
Stochastic Gradient Descent: Updates the weights based on a single randomly chosen data point at each iteration.
Mini-Batch Gradient Descent: Updates the weights based on a small batch of data points at each iteration.
Neural Networks: A network of interconnected nodes (neurons) organized in layers. Each connection has a weight. Neural networks take an input and ‘learn’ to modify the weight of each connection to accurately map an input to an output. A simple neural network consists of an input layer and an output layer, while more complex neural networks consist of several hidden layers between input and output. “we create a network of nodes…and if we want, we can connect all of these nodes together such that every node in the first layer is connected to every node in the second layer…And each of these edges has a weight associated with it.”
Activation Function: A function applied to the output of each node in a neural network to introduce non-linearity. “You take the inputs, you multiply them by the weights, and then you typically are going to transform that value a little bit using what’s called an activation function.”
Multi-Class Classification: A classification problem with more than two categories. Can be handled using neural networks with multiple output nodes, each representing the probability of belonging to a particular class.
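To make the threshold and logistic functions above concrete, here is a minimal Python sketch (the weights and feature values are invented for illustration, not learned from data):

```python
import math

def weighted_sum(weights, inputs, bias=0.0):
    # Multiply each feature by its weight and add everything up.
    return bias + sum(w * x for w, x in zip(weights, inputs))

def threshold_classify(weights, inputs):
    # Hard threshold: the output is one category or the other, nothing in between.
    return "rain" if weighted_sum(weights, inputs) >= 0 else "no rain"

def logistic(weights, inputs):
    # Sigmoid: squashes the weighted sum into a probability between 0 and 1.
    return 1 / (1 + math.exp(-weighted_sum(weights, inputs)))

# Two made-up features (say, humidity and pressure) with made-up weights.
print(threshold_classify([2.0, -1.0], [0.9, 1.1]))  # 'rain' (sum = 0.7 >= 0)
print(round(logistic([2.0, -1.0], [0.9, 1.1]), 3))  # ~0.668, a soft probability
```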
VI. Natural Language Processing (NLP):
Syntax: The structure of language.
Semantics: The meaning of language. “While syntax is all about the structure of language, semantics is about the meaning of language. It’s not enough for a computer just to know that a sentence is well-structured if it doesn’t know what that sentence means.”
Formal Grammar: A system of rules for generating sentences in a language.
Context-Free Grammar (CFG): A type of formal grammar that defines rules for rewriting non-terminal symbols into terminal symbols (words) or other non-terminal symbols. “a context-free grammar is some system of rules for generating sentences in a language…We’re going to give the computer some rules that we know about language and have the computer use those rules to make sense of the structure of language.”
NLTK (Natural Language Toolkit): A Python library for NLP tasks.
N-grams: Contiguous sequences of n items (characters or words) from a sample of text.
Tokenization: The process of splitting a sequence of characters into pieces, such as words.
Markov Chain: A sequence of values where one value can be predicted based on the preceding values. Can be used for language generation. “Recall that a Markov chain is some sequence of values where we can predict one value based on the values that came before it…we can use that to predict what word might come next in a sequence of words.”
Text Classification: The problem of assigning a category or label to a piece of text.
Sentiment Analysis: A specific text classification task that involves determining the sentiment (positive, negative, neutral) of a piece of text.
Bag of Words: A representation of text as a collection of words, disregarding grammar and word order, but keeping track of word frequencies. “With the bag of words representation, I’m just going to keep track of the count of every single word, which I’m going to call features.”
TF-IDF (Term Frequency-Inverse Document Frequency): A weighting scheme that assigns higher weights to words that are frequent in a document but rare in the overall corpus.
One-Hot Representation: A vector representation of a word where one element is 1 and all other elements are 0. “Each of these words now has a distinct vector representation. And this is what we often call a one-hot representation, a representation of the meaning of a word as a vector with a single 1 and all of the rest of the values are 0.”
Distributed Representation: A vector representation of a word where the meaning is distributed across multiple values, ideally in such a way that similar words have similar vector representations.
Word Embeddings: Distributed representations of words that capture semantic relationships.
Word2Vec: A model for generating word embeddings based on the context in which words appear. “we’re going to define the meaning of a word based on the words that appear around it, the context words around it…we’re going to say is because the words breakfast and lunch and dinner appear in a similar context, that they must have a similar meaning.”
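Several of these ideas take only a few lines with NLTK (mentioned above). A minimal sketch, assuming NLTK’s ‘punkt’ tokenizer data can be downloaded:

```python
from collections import Counter
import nltk

nltk.download("punkt", quiet=True)  # tokenizer models, needed once

text = "Robots read. Robots learn. Robots read and learn together."

# Tokenization: split the character sequence into words.
tokens = [t.lower() for t in nltk.word_tokenize(text) if t.isalpha()]

# Bag of words: keep only word frequencies, discarding order and grammar.
bag = Counter(tokens)
print(bag.most_common(3))            # e.g. [('robots', 3), ('read', 2), ('learn', 2)]

# N-grams: contiguous sequences of n tokens (bigrams here).
print(list(nltk.ngrams(tokens, 2))[:3])
```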
This briefing document provides a high-level overview of the concepts covered in the sources. It highlights key definitions, algorithms, and techniques used in AI.
Natural Language Processing, Machine Learning and Problem Solving: FAQ
1. What is the core concept of “entailment” in the context of knowledge bases and inference algorithms, and how does model checking help determine entailment?
Entailment refers to whether a knowledge base (KB) logically implies a query (alpha). In other words, can you conclude that alpha is true solely based on the information within the KB? Model checking is an algorithm that answers this by enumerating all possible models (assignments of true/false to propositional symbols). If, in every model where the KB is true, alpha is also true, then the KB entails alpha. Essentially, it exhaustively checks if alpha must be true whenever the KB is true.
2. Explain the model checking algorithm, including how it enumerates models and determines if a knowledge base entails a query.
The model checking algorithm involves the following steps:
Enumerate all possible models: List every possible combination of truth values (true or false) for all propositional symbols in the knowledge base and query.
Evaluate the knowledge base in each model: Determine if the knowledge base (KB) is true or false in each of the enumerated models.
Check the query in models where the KB is true: For every model where the KB is true, check if the query (alpha) is also true.
Determine entailment: If alpha is true in every model where the KB is true, then the KB entails alpha. If there exists at least one model where the KB is true but alpha is false, then the KB does not entail alpha. (A minimal code sketch of this procedure follows below.)
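A minimal model-checking sketch (our own toy encoding, not the course’s library: a sentence is a function from a model to True/False, and every assignment is enumerated with itertools.product):

```python
from itertools import product

def entails(symbols, kb, query):
    """Return True if every model that makes kb true also makes query true."""
    for values in product([True, False], repeat=len(symbols)):
        model = dict(zip(symbols, values))   # one possible world
        if kb(model) and not query(model):   # KB true but query false: counterexample
            return False
    return True

# KB: (P -> Q) and P.   Query: Q.   (P -> Q is encoded as 'not P or Q'.)
kb = lambda m: (not m["P"] or m["Q"]) and m["P"]
query = lambda m: m["Q"]
print(entails(["P", "Q"], kb, query))        # True: the KB entails Q
```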
3. What are inference rules in propositional logic, and give examples of implication elimination, biconditional elimination, and De Morgan’s laws?
Inference rules are logical equivalences that allow you to transform logical sentences into different, but logically equivalent, forms. This is useful for drawing new conclusions from existing knowledge. Here are some examples:
Implication Elimination: alpha implies beta is equivalent to not alpha or beta. This replaces an implication with an OR statement.
Biconditional Elimination: alpha if and only if beta is equivalent to (alpha implies beta) and (beta implies alpha). This breaks down a biconditional into two implications.
De Morgan’s Laws: not (alpha and beta) is equivalent to not alpha or not beta. The negation of a conjunction is the disjunction of the negations.
not (alpha or beta) is equivalent to not alpha and not beta. The negation of a disjunction is the conjunction of the negations.
4. Describe the conjunctive normal form (CNF) and explain the steps to convert a logical formula into CNF.
Conjunctive Normal Form (CNF) is a standard logical format where a sentence is represented as a conjunction (AND) of clauses, and each clause is a disjunction (OR) of literals. A literal is either a propositional symbol or its negation. The steps to convert a formula to CNF are:
Eliminate Biconditionals: Replace all alpha <-> beta with (alpha -> beta) ^ (beta -> alpha).
Eliminate Implications: Replace all alpha -> beta with ~alpha v beta.
Move Negations Inwards: Use De Morgan’s laws to move negations inward, so they apply only to literals (e.g., ~ (alpha ^ beta) becomes ~alpha v ~beta).
Distribute ORs over ANDs: Use the distributive law to transform the expression into a conjunction of clauses (e.g., alpha v (beta ^ gamma) becomes (alpha v beta) ^ (alpha v gamma)).
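As a short worked example in the same notation: to convert (P v Q) -> R to CNF, there are no biconditionals to eliminate, so first eliminate the implication to get ~(P v Q) v R, then move the negation inward with De Morgan’s law to get (~P ^ ~Q) v R, and finally distribute the OR over the AND to reach (~P v R) ^ (~Q v R), a conjunction of two clauses.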
5. Explain the resolution inference rule and the resolution algorithm for proving entailment. What is “inference by resolution,” and how does the empty clause relate to contradiction?
The resolution inference rule states that if you have two clauses, alpha OR beta and ~alpha OR gamma, you can infer beta OR gamma. It essentially eliminates a complementary pair of literals (alpha and ~alpha) and combines the remaining literals into a new clause. “Inference by resolution” uses this rule repeatedly to derive new clauses.
The resolution algorithm for proving entailment involves:
Negate the query: To prove KB entails alpha, assume ~alpha.
Convert to CNF: Convert KB AND ~alpha into CNF.
Resolution Loop: Repeatedly apply the resolution rule to pairs of clauses in the CNF formula, adding any new clauses generated back into the set of clauses. If factoring is needed, remove any duplicate literals from the resulting clause.
Check for Empty Clause: If, at any point, you derive the “empty clause” (a clause with no literals, representing “false”), this means you’ve found a contradiction.
Determine Entailment: If you derive the empty clause, then KB entails alpha (because KB AND ~alpha leads to a contradiction, so it must be the case that if KB is true, then alpha must be true). If you can no longer derive new clauses and haven’t found the empty clause, then KB does not entail alpha.
The empty clause signifies a contradiction because it represents a situation where both P and NOT P are true, which is impossible. Finding the empty clause through resolution proves that the initial assumption (the negated query) was inconsistent with the knowledge base.
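A minimal sketch of inference by resolution (an illustrative toy encoding of our own, with clauses as frozensets of literal strings; not the course’s distribution code):

```python
from itertools import combinations

def resolve(c1, c2):
    """Return all clauses obtained by resolving c1 and c2.
    A clause is a frozenset of literals; a literal is a string like 'P' or '~P'."""
    resolvents = []
    for lit in c1:
        comp = lit[1:] if lit.startswith("~") else "~" + lit
        if comp in c2:
            # Combine remaining literals; sets drop duplicates (factoring) for free.
            resolvents.append(frozenset((c1 - {lit}) | (c2 - {comp})))
    return resolvents

def entails(kb_clauses, negated_query_clauses):
    """Check KB |= alpha: put KB AND ~alpha in CNF, look for the empty clause."""
    clauses = set(kb_clauses) | set(negated_query_clauses)
    while True:
        new = set()
        for c1, c2 in combinations(clauses, 2):
            for r in resolve(c1, c2):
                if not r:              # empty clause: contradiction found
                    return True
                new.add(r)
        if new <= clauses:             # nothing new: no contradiction derivable
            return False
        clauses |= new

# KB: (P v Q) ^ (~P v Q).  Query alpha = Q, so we add ~Q and look for a contradiction.
kb = [frozenset({"P", "Q"}), frozenset({"~P", "Q"})]
print(entails(kb, [frozenset({"~Q"})]))   # True: the KB entails Q
```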
6. Explain the inclusion-exclusion principle and the marginalization rule in probability theory, providing examples of their application.
Inclusion-Exclusion Principle: This principle calculates the probability of A OR B. The formula is: P(A or B) = P(A) + P(B) – P(A and B). Subtracting P(A and B) corrects for the overcounting that occurs when A and B can happen together.
Example: The probability of rolling a 6 on a red die (A) OR a 6 on a blue die (B). If you just add P(A) + P(B), you’re double-counting the case where both dice show 6. Subtracting P(A and B) (the probability of both dice showing 6) corrects for this.
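With two fair dice, this works out to P(A or B) = 1/6 + 1/6 – 1/36 = 11/36, rather than the 12/36 that naive addition would give.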
Marginalization Rule: This rule allows you to calculate the probability of one variable (A) by summing over all possible values of another variable (B). The formula is: P(A) = Σ_b P(A and B = b), where b ranges over every value that B can take on.
Example: the probability of it being cloudy (A), given the joint distribution over cloudiness and raininess (B). We calculate P(cloudy) by summing P(cloudy and rainy) + P(cloudy and not rainy). In other words, we consider every possible case for B and add up the probability that A occurs in each of those cases. This is useful for recovering an individual (unconditional) probability from a joint probability distribution.
7. Describe the hill climbing algorithm, including its pseudocode, and discuss its limitations (local optima). Also explain variations like stochastic hill climbing and random restart hill climbing.
The hill climbing algorithm is a local search technique used to find a maximum (or minimum) of a function. Its pseudocode is as follows:
Start with a current state (often random).
Loop:
a. Find the neighbor of the current state with the highest (or lowest) value.
b. If the neighbor is better than the current state, move to the neighbor (current = neighbor).
c. If the neighbor is not better, terminate and return the current state.
A major limitation of hill climbing is that it can get stuck in local optima: points that are better than their immediate neighbors but not the best overall solution.
Variations:
Stochastic Hill Climbing: Randomly choose among the neighbors with better values, rather than always picking the single best neighbor. This can help the search move across plateaus (regions of the search space with roughly equal value), but it still cannot escape a true local optimum.
Random Restart Hill Climbing: Run the hill climbing algorithm multiple times from different random starting states. Keep track of the best solution found across all runs. This increases the chance of finding the global optimum by exploring different regions of the search space.
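A minimal sketch of hill climbing with random restarts (illustrative code of our own; the toy objective and neighbor function are invented for the example):

```python
import random

def hill_climb(initial, neighbors, value):
    """Basic hill climbing: move to the best neighbor until no neighbor is better."""
    current = initial
    while True:
        best = max(neighbors(current), key=value, default=current)
        if value(best) <= value(current):
            return current            # a local (possibly not global) optimum
        current = best

def random_restart(n_restarts, random_state, neighbors, value):
    """Run hill climbing from several random starts; keep the best result found."""
    best = None
    for _ in range(n_restarts):
        result = hill_climb(random_state(), neighbors, value)
        if best is None or value(result) > value(best):
            best = result
    return best

# Toy 1-D example: maximize f(x) = -(x - 3)^2 over the integers.
f = lambda x: -(x - 3) ** 2
print(random_restart(5, lambda: random.randint(-10, 10),
                     lambda x: [x - 1, x + 1], f))   # expect 3
```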
8. Explain the simulated annealing algorithm and how it can potentially escape local optima compared to simple hill climbing.
Simulated Annealing is a metaheuristic optimization algorithm for finding the global minimum of a function that may possess several local minima. It works by first picking a random state and computing its cost, then picking a neighbor of that state and computing its cost as well. If the neighbor’s cost is better, the neighbor becomes the new current state. The twist is that even when the neighbor’s cost is worse, there is still some probability of moving to that worse neighbor, in order to dislodge the search from a local optimum.
This probability is governed by a temperature. At the beginning, the temperature is high, so there is a better chance of dislodging yourself and exploring the search space, even if that temporarily leads to worse results. As the algorithm iterates, the temperature goes down, and the search gradually shifts from exploring and dislodging to simply looking for better neighbors.
Simulated annealing thus improves on simple hill climbing: because hill climbing never moves to a state that may lead to worse results, it gets stuck in the local optima described above, whereas simulated annealing’s occasional downhill moves let it escape them. A minimal sketch follows.
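A minimal Python sketch of simulated annealing under those assumptions (the starting temperature, decay rate, and step count are arbitrary choices, not values from the sources):

```python
import math, random

def simulated_annealing(initial, neighbor, cost, t0=10.0, decay=0.95, steps=1000):
    """Minimize cost(state); occasionally accept worse neighbors to escape local minima."""
    current, t = initial, t0
    for _ in range(steps):
        candidate = neighbor(current)
        delta = cost(candidate) - cost(current)
        # Always accept improvements; accept a worse move with probability e^(-delta/T).
        if delta < 0 or random.random() < math.exp(-delta / t):
            current = candidate
        t *= decay                    # the temperature falls, so exploration tapers off
    return current

# Toy example: minimize (x - 7)^2 by nudging x up or down.
result = simulated_annealing(0.0,
                             lambda x: x + random.uniform(-1, 1),
                             lambda x: (x - 7) ** 2)
print(round(result))                  # usually 7
```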
Supervised Learning: Classification, Regression, and Evaluation
Supervised learning is a type of machine learning where a computer is given access to a dataset of input-output pairs and learns a function that maps inputs to outputs. The computer uses the data to train its model and understand the relationships between inputs and outputs. The goal is for the AI to learn to predict outputs based on new input data.
Key aspects of supervised learning:
Input-output pairs: The computer is provided with a dataset where each data point consists of an input and a corresponding desired output.
Function mapping: The goal is to find a function that accurately maps inputs to outputs, allowing the computer to make predictions on new, unseen data.
Training: The computer uses the provided data to train its model, adjusting its internal parameters to minimize the difference between its predictions and the actual outputs.
Classification and regression are two common tasks within supervised learning.
Classification: Aims to map inputs into discrete categories. An example would be classifying a banknote as authentic or counterfeit based on its features.
Regression: Aims to predict continuous output values. For example, predicting sales based on advertising spending.
Implementation and evaluation
Libraries such as Scikit-learn in Python provide tools to implement supervised learning algorithms.
The data is typically split into training and testing sets. The model is trained on the training set and evaluated on the testing set to assess its ability to generalize to new data.
Holdout cross-validation splits the data into a training set and a testing set: the training set is used to train the machine learning model, and the testing set is used to measure how well the trained model performs on data it has not seen.
K-fold cross-validation divides the data into k different sets and runs k different experiments, each time holding out one set for testing and training on the remaining k – 1 sets.
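A brief scikit-learn sketch of both evaluation styles (the course’s examples use a banknote dataset; the library’s built-in iris dataset stands in here, and the classifier choice is ours):

```python
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
model = KNeighborsClassifier(n_neighbors=3)

# Holdout cross-validation: train on one split, evaluate on the held-out split.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
model.fit(X_train, y_train)
print("holdout accuracy:", model.score(X_test, y_test))

# K-fold cross-validation: k experiments, each holding out a different fold.
print("5-fold accuracies:", cross_val_score(model, X, y, cv=5))
```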
Machine Learning: Algorithms, Techniques, and Applications
Machine learning involves enabling computers to learn from data and experiences without explicit instructions. Instead of programming a computer with explicit rules, machine learning allows the computer to learn patterns from data and improve its performance on a specific task.
Key aspects of machine learning:
Learning from Data: Machine learning algorithms use data to identify patterns, make predictions, and improve decision-making.
Algorithms and Techniques: Machine learning encompasses a wide range of algorithms and techniques that enable computers to learn from data.
Pattern Recognition: Machine learning algorithms identify underlying patterns and relationships within data.
Machine learning comes in different forms, including supervised learning, reinforcement learning and unsupervised learning.
Supervised learning involves training a model on a labeled dataset consisting of input-output pairs, enabling the model to learn a function that maps inputs to outputs.
Reinforcement learning involves training an agent to make decisions in an environment to maximize a reward signal.
Unsupervised learning involves discovering patterns and relationships in unlabeled data without explicit guidance. Clustering is a task performed in unsupervised learning that involves organizing a set of objects into distinct clusters or groups of similar objects.
Neural networks are a popular tool in machine learning inspired by the structure of the human brain and can be very effective at certain tasks. A neural network is a mathematical model for learning inspired by biological neural networks. Artificial neural networks can model mathematical functions and learn network parameters.
TensorFlow is a library that can be used for creating neural networks, modeling them, and running them on sample data.
Machine learning has a wide variety of applications including: recognizing faces in photos, playing games, understanding human language, spam detection, search and optimization problems, and more.
Neural Networks: Models, Training, and Applications
Neural networks are a popular tool in modern machine learning that draw inspiration from the way human brains learn and reason. They are a type of model that is effective at learning from some set of input data to figure out how to calculate some function from inputs to outputs.
Key aspects of neural networks:
Mathematical Model: A neural network is a mathematical model for learning inspired by biological neural networks.
Units: Instead of biological neurons, neural networks use units inside of the network. The units can be represented like nodes in a graph.
Layers: Neural networks are composed of multiple layers of interconnected nodes or units, including an input layer, one or more hidden layers, and an output layer.
Weights: Connections between units are defined by weights. The weights determine how signals are passed between connected nodes.
Activation Functions: Activation functions introduce non-linearity into the network, allowing it to learn complex patterns and relationships in the data.
Backpropagation: Backpropagation is a key algorithm that makes training multi-layered neural networks possible. The backpropagation algorithm is used to adjust the weights in the network during training to minimize the difference between predicted and actual outputs.
Versatility: Neural networks are versatile tools applicable to a number of domains.
There are different types of neural networks, each designed for specific tasks:
Feed-forward neural networks have connections that only move in one direction. The inputs pass through hidden layers and ultimately produce an output.
Convolutional neural networks (CNNs) are designed for processing grid-like data, such as images. CNNs apply convolutional layers and pooling layers to extract features from images.
Recurrent neural networks (RNNs) are designed for processing sequential data, such as text or time series. RNNs have connections that loop back into themselves, allowing them to maintain a hidden state that captures information about the sequence. The long short-term memory (LSTM) network is a popular type of RNN.
Training Neural Networks:
Gradient descent is a technique used to train neural networks by minimizing a loss function. Gradient descent involves iteratively adjusting the weights of the network based on the gradient of the loss function with respect to the weights.
Stochastic gradient descent randomly chooses one data point at a time to calculate the gradient based on, instead of calculating it based on all of the data points.
Mini-batch gradient descent divides the data set up into small batches, groups of data points, to calculate the gradient based on.
Overfitting occurs when a neural network is too complex and fits the training data too closely, resulting in poor generalization to new data.
Dropout is a technique used to combat overfitting by randomly removing units from the neural network during training.
TensorFlow is a library that can be used for creating neural networks, modeling them, and running them on sample data.
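As a hedged sketch of how this might look in TensorFlow’s Keras API (the layer sizes and dropout rate here are arbitrary choices, not values from the sources):

```python
import tensorflow as tf

# A small feed-forward network with one hidden layer and dropout,
# along the lines described above.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, input_shape=(4,), activation="relu"),  # hidden layer
    tf.keras.layers.Dropout(0.2),   # randomly drop 20% of units while training
    tf.keras.layers.Dense(3, activation="softmax"),  # one output per class
])

# Compiling picks the gradient-descent variant (Adam here) and the loss to minimize.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(X_train, y_train, epochs=20) would then train with mini-batches.
```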
Understanding Gradient Descent in Neural Networks
Gradient descent is an algorithm inspired by calculus for minimizing loss when training a neural network. In the context of neural networks, “loss” refers to how poorly a hypothesis function models data.
Key aspects of gradient descent:
Loss Function: Gradient descent aims to minimize a loss function, which quantifies how poorly the neural network performs.
Gradient Calculation: The algorithm calculates the gradient of the loss function with respect to the network’s weights. The gradient indicates the direction in which the weights should be adjusted to reduce the loss.
Weight Update: The weights are updated by taking a small step in the direction opposite to the gradient. The size of this step can vary and is chosen when training the neural network.
Iterative Process: This process is repeated iteratively, adjusting the weights little by little based on the data points, with the aim of converging towards a good solution.
There are variations to the standard gradient descent algorithm:
Stochastic Gradient Descent: Instead of looking at all data points at once, stochastic gradient descent randomly chooses one data point at a time to calculate the gradient. This provides a less accurate gradient estimate but is faster to compute.
Mini-Batch Gradient Descent: This approach is a middle ground between standard and stochastic gradient descent, where the data set is divided into small batches and the gradient is calculated based on these batches.
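A minimal NumPy sketch covering all three variants (illustrative code of our own; X and y are assumed to be NumPy arrays, and the learning rate and epoch count are arbitrary choices):

```python
import numpy as np

def gradient_descent(X, y, lr=0.1, epochs=100, batch_size=None):
    """Minimize mean squared error for a linear model y ~ X @ w.
    batch_size=None -> full-batch; 1 -> stochastic; k -> mini-batch."""
    w = np.zeros(X.shape[1])
    n = len(X)
    for _ in range(epochs):
        idx = np.random.permutation(n)        # shuffle so batches differ each epoch
        step = batch_size or n
        for start in range(0, n, step):
            batch = idx[start:start + step]
            error = X[batch] @ w - y[batch]
            grad = 2 * X[batch].T @ error / len(batch)  # gradient of MSE w.r.t. w
            w -= lr * grad                              # small step against the gradient
    return w
```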
Understanding Neural Network Hidden Layers
Hidden layers are intermediate layers of artificial neurons or units within a neural network between the input layer and the output layer.
Here’s more about hidden layers and how they contribute to neural network functionality:
Structure and Function In a neural network, the input layer receives the initial data, and the output layer produces the final result. The hidden layers lie in between, performing complex transformations on the input data to help the network learn non-linear relationships.
Nodes and Connections Each hidden layer contains a certain number of nodes or units, each connected to the nodes in the preceding and following layers. The connections between nodes have weights, which are adjusted during training to optimize the network’s performance.
Activation Each unit calculates its output value from a linear combination of all its inputs, typically transformed by an activation function. Layering units in this way gives the network the ability to model more complex functions.
Backpropagation: One of the challenges of neural networks is training neural networks that have hidden layers inside of them. The input data provides values for all of the inputs, and what the value of the output should be. However, the input data does not provide what the values for all of the nodes in the hidden layer should be. The key algorithm that makes training the hidden layers of neural networks possible is called backpropagation.
Deep Neural Networks: Neural networks that contain multiple hidden layers are called deep neural networks. The presence of multiple hidden layers allows the network to model more complex functions. Each layer can learn different features of the input, and these features can be combined to produce the desired output. However, complex networks are at greater risk of overfitting.
Dropout: Dropout is a technique that can combat overfitting in neural networks. It involves temporarily removing units from the network during training to prevent over-reliance on any single node.
Harvard CS50’s Artificial Intelligence with Python – Full University Course
The Original Text
This course from Harvard University explores the concepts and algorithms at the foundation of modern artificial intelligence, diving into the ideas that give rise to technologies like game-playing engines, handwriting recognition, and machine translation. You’ll gain exposure to the theory behind graph search algorithms, classification, optimization, reinforcement learning, and other topics in artificial intelligence and machine learning. Brian Yu teaches this course. Hello, world. This is CS50, and this is an introduction to artificial intelligence with Python with CS50’s own Brian Yu. This course picks up where CS50 itself leaves off and explores the concepts and algorithms at the foundation of modern AI. We’ll start with a look at how AI can search for solutions to problems, whether those problems are learning how to play a game or trying to find driving directions to a destination. We’ll then look at how AI can represent information, both knowledge that our AI is certain about, but also information and events about which our AI might be uncertain, learning how to represent that information, but more importantly, how to use that information to draw inferences and new conclusions as well. We’ll explore how AI can solve various types of optimization problems, trying to maximize profits or minimize costs or satisfy some other constraints before turning our attention to the fast-growing field of machine learning, where we won’t tell our AI exactly how to solve a problem, but instead, give our AI access to data and experiences so that our AI can learn on its own how to perform these tasks. In particular, we’ll look at neural networks, one of the most popular tools in modern machine learning, inspired by the way that human brains learn and reason as well before finally taking a look at the world of natural language processing so that it’s not just us humans learning to learn how artificial intelligence is able to speak, but also AI learning how to understand and interpret human language as well. We’ll explore these ideas and algorithms, and along the way, give you the opportunity to build your own AI programs to implement all of this and more. This is CS50. All right. Welcome, everyone, to an introduction to artificial intelligence with Python. My name is Brian Yu, and in this class, we’ll explore some of the ideas and techniques and algorithms that are at the foundation of artificial intelligence. Now, artificial intelligence covers a wide variety of types of techniques. Anytime you see a computer do something that appears to be intelligent or rational in some way, like recognizing someone’s face in a photo, or being able to play a game better than people can, or being able to understand human language when we talk to our phones and they understand what we mean and are able to respond back to us, these are all examples of AI, or artificial intelligence. And in this class, we’ll explore some of the ideas that make that AI possible. So we’ll begin our conversations with search, the problem of we have an AI, and we would like the AI to be able to search for solutions to some kind of problem, no matter what that problem might be. Whether it’s trying to get driving directions from point A to point B, or trying to figure out how to play a game, given a tic-tac-toe game, for example, figuring out what move it ought to make. After that, we’ll take a look at knowledge. 
Ideally, we want our AI to be able to know information, to be able to represent that information, and more importantly, to be able to draw inferences from that information, to be able to use the information it knows and draw additional conclusions. So we’ll talk about how AI can be programmed in order to do just that. Then we’ll explore the topic of uncertainty, talking about ideas of what happens if a computer isn’t sure about a fact, but maybe is only sure with a certain probability. So we’ll talk about some of the ideas behind probability, and how computers can begin to deal with uncertain events in order to be a little bit more intelligent in that sense as well. After that, we’ll turn our attention to optimization, problems of when the computer is trying to optimize for some sort of goal, especially in a situation where there might be multiple ways that a computer might solve a problem, but we’re looking for a better way, or potentially the best way, if that’s at all possible. Then we’ll take a look at machine learning, or learning more generally, and looking at how, when we have access to data, our computers can be programmed to be quite intelligent by learning from data and learning from experience, being able to perform a task better and better based on greater access to data. So your email, for example, where your email inbox somehow knows which of your emails are good emails and which of your emails are spam. These are all examples of computers being able to learn from past experiences and past data. We’ll take a look, too, at how computers are able to draw inspiration from human intelligence, looking at the structure of the human brain, and how neural networks can be a computer analog to that sort of idea, and how, by taking advantage of a certain type of structure of a computer program, we can write neural networks that are able to perform tasks very, very effectively. And then finally, we’ll turn our attention to language, not programming languages, but human languages that we speak every day. And taking a look at the challenges that come about as a computer tries to understand natural language, and how it is some of the natural language processing that occurs in modern artificial intelligence can actually work. But today, we’ll begin our conversation with search, this problem of trying to figure out what to do when we have some sort of situation that the computer is in, some sort of environment that an agent is in, so to speak, and we would like for that agent to be able to somehow look for a solution to that problem. Now, these problems can come in any number of different types of formats. One example, for instance, might be something like this classic 15 puzzle with the sliding tiles that you might have seen. Where you’re trying to slide the tiles in order to make sure that all the numbers line up in order. This is an example of what you might call a search problem. The 15 puzzle begins in an initially mixed up state, and we need some way of finding moves to make in order to return the puzzle to its solved state. But there are similar problems that you can frame in other ways. Trying to find your way through a maze, for example, is another example of a search problem. You begin in one place, you have some goal of where you’re trying to get to, and you need to figure out the correct sequence of actions that will take you from that initial state to the goal. 
And while this is a little bit abstract, any time we talk about maze solving in this class, you can translate it to something a little more real world. Something like driving directions. If you ever wonder how Google Maps is able to figure out what is the best way for you to get from point A to point B, and what turns to make at what time, depending on traffic, for example, it’s often some sort of search algorithm. You have an AI that is trying to get from an initial position to some sort of goal by taking some sequence of actions. So we’ll start our conversations today by thinking about these types of search problems and what goes in to solving a search problem like this in order for an AI to be able to find a good solution. In order to do so, though, we’re going to need to introduce a little bit of terminology, some of which I’ve already used. But the first term we’ll need to think about is an agent. An agent is just some entity that perceives its environment. It somehow is able to perceive the things around it and act on that environment in some way. So in the case of the driving directions, your agent might be some representation of a car that is trying to figure out what actions to take in order to arrive at a destination. In the case of the 15 puzzle with the sliding tiles, the agent might be the AI or the person that is trying to solve that puzzle to try and figure out what tiles to move in order to get to that solution. Next, we introduce the idea of a state. A state is just some configuration of the agent in its environment. So in the 15 puzzle, for example, any state might be any one of these three, for example. A state is just some configuration of the tiles. And each of these states is different and is going to require a slightly different solution. A different sequence of actions will be needed in each one of these in order to get from this initial state to the goal, which is where we’re trying to get. So the initial state, then, what is that? The initial state is just the state where the agent begins. It is one such state where we’re going to start from. And this is going to be the starting point for our search algorithm, so to speak. We’re going to begin with this initial state and then start to reason about it, to think about what actions might we apply to that initial state in order to figure out how to get from the beginning to the end, from the initial position to whatever our goal happens to be. And how do we make our way from that initial position to the goal? Well, ultimately, it’s via taking actions. Actions are just choices that we can make in any given state. And in AI, we’re always going to try to formalize these ideas a little bit more precisely, such that we could program them a little bit more mathematically, so to speak. So this will be a recurring theme. And we can more precisely define actions as a function. We’re going to effectively define a function called actions that takes an input, s, where s is going to be some state that exists inside of our environment. And actions of s is going to take the state as input and return as output the set of all actions that can be executed in that state. And so it’s possible that some actions are only valid in certain states and not in other states. And we’ll see examples of that soon, too. So in the case of the 15 puzzle, for example, there are generally going to be four possible actions that we can do most of the time. 
We can slide a tile to the right, slide a tile to the left, slide a tile up, or slide a tile down, for example. And those are going to be the actions that are available to us. So somehow our AI, our program, needs some encoding of the state, which is often going to be in some numerical format, and some encoding of these actions. But it also needs some encoding of the relationship between these things. How do the states and actions relate to one another? And in order to do that, we’ll introduce to our AI a transition model, which will be a description of what state we get after we perform some available action in some other state. And again, we can be a little bit more precise about this, define this transition model a little bit more formally, again, as a function. The function is going to be a function called result that this time takes two inputs. Input number one is s, some state. And input number two is a, some action. And the output of this function result is it is going to give us the state that we get after we perform action a in state s. So let’s take a look at an example to see more precisely what this actually means. Here is an example of a state, of the 15 puzzle, for example. And here is an example of an action, sliding a tile to the right. What happens if we pass these as inputs to the result function? Again, the result function takes this board, this state, as its first input. And it takes an action as a second input. And of course, here, I’m describing things visually so that you can see visually what the state is and what the action is. In a computer, you might represent one of these actions as just some number that represents the action. Or if you’re familiar with enums that allow you to enumerate multiple possibilities, it might be something like that. And this state might just be represented as an array or two-dimensional array of all of these numbers that exist. But here, we’re going to show it visually just so you can see it. But when we take this state and this action, pass it into the result function, the output is a new state. The state we get after we take a tile and slide it to the right, and this is the state we get as a result. If we had a different action and a different state, for example, and pass that into the result function, we’d get a different answer altogether. So the result function needs to take care of figuring out how to take a state and take an action and get what results. And this is going to be our transition model that describes how it is that states and actions are related to each other. If we take this transition model and think about it more generally and across the entire problem, we can form what we might call a state space. The set of all of the states we can get from the initial state via any sequence of actions, by taking 0 or 1 or 2 or more actions in addition to that, so we could draw a diagram that looks something like this, where every state is represented here by a game board, and there are arrows that connect every state to every other state we can get to from that state. And the state space is much larger than what you see just here. This is just a sample of what the state space might actually look like. And in general, across many search problems, whether they’re this particular 15 puzzle or driving directions or something else, the state space is going to look something like this. We have individual states and arrows that are connecting them. 
And oftentimes, just for simplicity, we’ll simplify our representation of this entire thing as a graph, some sequence of nodes and edges that connect nodes. But you can think of this more abstract representation as the exact same idea. Each of these little circles or nodes is going to represent one of the states inside of our problem. And the arrows here represent the actions that we can take in any particular state, taking us from one particular state to another state, for example. All right. So now we have this idea of nodes that are representing these states, actions that can take us from one state to another, and a transition model that defines what happens after we take a particular action. So the next step we need to figure out is how we know when the AI is done solving the problem. The AI needs some way to know when it gets to the goal that it’s found the goal. So the next thing we’ll need to encode into our artificial intelligence is a goal test, some way to determine whether a given state is a goal state. In the case of something like driving directions, it might be pretty easy. If you’re in a state that corresponds to whatever the user typed in as their intended destination, well, then you know you’re in a goal state. In the 15 puzzle, it might be checking the numbers to make sure they’re all in ascending order. But the AI needs some way to encode whether or not any state they happen to be in is a goal. And some problems might have one goal, like a maze where you have one initial position and one ending position, and that’s the goal. In other more complex problems, you might imagine that there are multiple possible goals. That there are multiple ways to solve a problem, and we might not care which one the computer finds, as long as it does find a particular goal. However, sometimes the computer doesn’t just care about finding a goal, but finding a goal well, or one with a low cost. And it’s for that reason that the last piece of terminology that we’ll use to define these search problems is something called a path cost. You might imagine that in the case of driving directions, it would be pretty annoying if I said I wanted directions from point A to point B, and the route that Google Maps gave me was a long route with lots of detours that were unnecessary that took longer than it should have for me to get to that destination. And it’s for that reason that when we’re formulating search problems, we’ll often give every path some sort of numerical cost, some number telling us how expensive it is to take this particular option, and then tell our AI that instead of just finding a solution, some way of getting from the initial state to the goal, we’d really like to find one that minimizes this path cost. That is, less expensive, or takes less time, or minimizes some other numerical value. We can represent this graphically if we take a look at this graph again, and imagine that each of these arrows, each of these actions that we can take from one state to another state, has some sort of number associated with it. That number being the path cost of this particular action, where some of the costs for any particular action might be more expensive than the cost for some other action, for example. Although this will only happen in some sorts of problems. In other problems, we can simplify the diagram and just assume that the cost of any particular action is the same. 
And this is probably the case in something like the 15 puzzle, for example, where it doesn’t really make a difference whether I’m moving right or moving left. The only thing that matters is the total number of steps that I have to take to get from point A to point B. And each of those steps is of equal cost. We can just assume it’s of some constant cost like one. And so this now forms the basis for what we might consider to be a search problem. A search problem has some sort of initial state, some place where we begin, some sort of action that we can take or multiple actions that we can take in any given state. And it has a transition model. Some way of defining what happens when we go from one state and take one action, what state do we end up with as a result. In addition to that, we need some goal test to know whether or not we’ve reached a goal. And then we need a path cost function that tells us for any particular path, by following some sequence of actions, how expensive is that path. What does its cost in terms of money or time or some other resource that we are trying to minimize our usage of. And the goal ultimately is to find a solution. Where a solution in this case is just some sequence of actions that will take us from the initial state to the goal state. And ideally, we’d like to find not just any solution but the optimal solution, which is a solution that has the lowest path cost among all of the possible solutions. And in some cases, there might be multiple optimal solutions. But an optimal solution just means that there is no way that we could have done better in terms of finding that solution. So now we’ve defined the problem. And now we need to begin to figure out how it is that we’re going to solve this kind of search problem. And in order to do so, you’ll probably imagine that our computer is going to need to represent a whole bunch of data about this particular problem. We need to represent data about where we are in the problem. And we might need to be considering multiple different options at once. And oftentimes, when we’re trying to package a whole bunch of data related to a state together, we’ll do so using a data structure that we’re going to call a node. A node is a data structure that is just going to keep track of a variety of different values. And specifically, in the case of a search problem, it’s going to keep track of these four values in particular. Every node is going to keep track of a state, the state we’re currently on. And every node is also going to keep track of a parent. A parent being the state before us or the node that we used in order to get to this current state. And this is going to be relevant because eventually, once we reach the goal node, once we get to the end, we want to know what sequence of actions we use in order to get to that goal. And the way we’ll know that is by looking at these parents to keep track of what led us to the goal and what led us to that state and what led us to the state before that, so on and so forth, backtracking our way to the beginning so that we know the entire sequence of actions we needed in order to get from the beginning to the end. The node is also going to keep track of what action we took in order to get from the parent to the current state. And the node is also going to keep track of a path cost. In other words, it’s going to keep track of the number that represents how long it took to get from the initial state to the state that we currently happen to be at. 
And we’ll see why this is relevant as we start to talk about some of the optimizations that we can make in terms of these search problems more generally. So this is the data structure that we’re going to use in order to solve the problem. And now let’s talk about the approach. How might we actually begin to solve the problem? Well, as you might imagine, what we’re going to do is we’re going to start at one particular state, and we’re just going to explore from there. The intuition is that from a given state, we have multiple options that we could take, and we’re going to explore those options. And once we explore those options, we’ll find that more options than that are going to make themselves available. And we’re going to consider all of the available options to be stored inside of a single data structure that we’ll call the frontier. The frontier is going to represent all of the things that we could explore next that we haven’t yet explored or visited. So in our approach, we’re going to begin the search algorithm by starting with a frontier that just contains one state. The frontier is going to contain the initial state, because at the beginning, that’s the only state we know about. That is the only state that exists. And then our search algorithm is effectively going to follow a loop. We’re going to repeat some process again and again and again. The first thing we’re going to do is if the frontier is empty, then there’s no solution. And we can report that there is no way to get to the goal. And that’s certainly possible. There are certain types of problems that an AI might try to explore and realize that there is no way to solve that problem. And that’s useful information for humans to know as well. So if ever the frontier is empty, that means there’s nothing left to explore. And we haven’t yet found a solution, so there is no solution. There’s nothing left to explore. Otherwise, what we’ll do is we’ll remove a node from the frontier. So right now at the beginning, the frontier just contains one node representing the initial state. But over time, the frontier might grow. It might contain multiple states. And so here, we’re just going to remove a single node from that frontier. If that node happens to be a goal, then we found a solution. So we remove a node from the frontier and ask ourselves, is this the goal? And we do that by applying the goal test that we talked about earlier, asking if we’re at the destination. Or asking if all the numbers of the 15 puzzle happen to be in order. So if the node contains the goal, we found a solution. Great. We’re done. And otherwise, what we’ll need to do is we’ll need to expand the node. And this is a term of art in artificial intelligence. To expand the node just means to look at all of the neighbors of that node. In other words, consider all of the possible actions that I could take from the state that this node is representing and what nodes could I get to from there. We’re going to take all of those nodes, the next nodes that I can get to from this current one I’m looking at, and add those to the frontier. And then we’ll repeat this process. So at a very high level, the idea is we start with a frontier that contains the initial state. 
And we’re constantly removing a node from the frontier, looking at where we can get to next and adding those nodes to the frontier, repeating this process over and over until either we remove a node from the frontier and it contains a goal, meaning we’ve solved the problem, or we run into a situation where the frontier is empty, at which point we’re left with no solution. So let’s actually try and take the pseudocode, put it into practice by taking a look at an example of a sample search problem. So right here, I have a sample graph. A is connected to B via this action. B is connected to nodes C and D. C is connected to E. D is connected to F. And what I’d like to do is have my AI find a path from A to E. We want to get from this initial state to this goal state. So how are we going to do that? Well, we’re going to start with a frontier that contains the initial state. This is going to represent our frontier. So our frontier initially will just contain A, that initial state where we’re going to begin. And now we’ll repeat this process. If the frontier is empty, no solution. That’s not a problem, because the frontier is not empty. So we’ll remove a node from the frontier as the one to consider next. There’s only one node in the frontier. So we’ll go ahead and remove it from the frontier. But now A, this initial node, this is the node we’re currently considering. We follow the next step. We ask ourselves, is this node the goal? No, it’s not. A is not the goal. E is the goal. So we don’t return the solution. So instead, we go to this last step, expand the node, and add the resulting nodes to the frontier. What does that mean? Well, it means take this state A and consider where we could get to next. And after A, what we could get to next is only B. So that’s what we get when we expand A. We find B. And we add B to the frontier. And now B is in the frontier. And we repeat the process again. We say, all right, the frontier is not empty. So let’s remove B from the frontier. B is now the node that we’re considering. We ask ourselves, is B the goal? No, it’s not. So we go ahead and expand B and add its resulting nodes to the frontier. What happens when we expand B? In other words, what nodes can we get to from B? Well, we can get to C and D. So we’ll go ahead and add C and D from the frontier. And now we have two nodes in the frontier, C and D. And we repeat the process again. We remove a node from the frontier. For now, I’ll do so arbitrarily just by picking C. We’ll see why later, how choosing which node you remove from the frontier is actually quite an important part of the algorithm. But for now, I’ll arbitrarily remove C, say it’s not the goal. So we’ll add E, the next one, to the frontier. Then let’s say I remove E from the frontier. And now I check I’m currently looking at state E. Is it a goal state? It is, because I’m trying to find a path from A to E. So I would return the goal. And that now would be the solution, that I’m now able to return the solution. And I have found a path from A to E. So this is the general idea, the general approach of this search algorithm, to follow these steps, constantly removing nodes from the frontier, until we’re able to find a solution. So the next question you might reasonably ask is, what could go wrong here? What are the potential problems with an approach like this? And here’s one example of a problem that could arise from this sort of approach. Imagine this same graph, same as before, with one change. 
The change being now, instead of just an arrow from A to B, we also have an arrow from B to A, meaning we can go in both directions. And this is true in something like the 15 puzzle, where when I slide a tile to the right, I could then slide a tile to the left to get back to the original position. I could go back and forth between A and B. And that’s what these double arrows symbolize, the idea that from one state, I can get to another, and then I can get back. And that’s true in many search problems. What’s going to happen if I try to apply the same approach now? Well, I’ll begin with A, same as before. And I’ll remove A from the frontier. And then I’ll consider where I can get to from A. And after A, the only place I can get to is B. So B goes into the frontier. Then I’ll say, all right, let’s take a look at B. That’s the only thing left in the frontier. Where can I get to from B? Before, it was just C and D. But now, because of that reverse arrow, I can get to A or C or D. So all three, A, C, and D, all of those now go into the frontier. They are places I can get to from B. And now I remove one from the frontier. And maybe I’m unlucky, and maybe I pick A. And now I’m looking at A again. And I consider, where can I get to from A? And from A, well, I can get to B. And now we start to see the problem. But if I’m not careful, I go from A to B, and then back to A, and then to B again. And I could be going in this infinite loop, where I never make any progress, because I’m constantly just going back and forth between two states that I’ve already seen. So what is the solution to this? We need some way to deal with this problem. And the way that we can deal with this problem is by somehow keeping track of what we’ve already explored. And the logic is going to be, well, if we’ve already explored the state, there’s no reason to go back to it. Once we’ve explored a state, don’t go back to it. Don’t bother adding it to the frontier. There’s no need to. So here’s going to be our revised approach, a better way to approach this sort of search problem. And it’s going to look very similar, just with a couple of modifications. We’ll start with a frontier that contains the initial state, same as before. But now we’ll start with another data structure, which will just be a set of nodes that we’ve already explored. So what are the states we’ve explored? Initially, it’s empty. We have an empty explored set. And now we repeat. If the frontier is empty, no solution, same as before. We remove a node from the frontier. We check to see if it’s a goal state, return the solution. None of this is any different so far. But now what we’re going to do is we’re going to add the node to the explored state. So if it happens to be the case that we remove a node from the frontier and it’s not the goal, we’ll add it to the explored set so that we know we’ve already explored it. We don’t need to go back to it again if it happens to come up later. And then the final step, we expand the node and we add the resulting nodes to the frontier. But before, we just always added the resulting nodes to the frontier. We’re going to be a little clever about it this time. We’re only going to add the nodes to the frontier if they aren’t already in the frontier and if they aren’t already in the explored set. So we’ll check both the frontier and the explored set, make sure that the node isn’t already in one of those two. And so long as it isn’t, then we’ll go ahead and add it to the frontier, but not otherwise. 
And so that revised approach is ultimately what’s going to help make sure that we don’t go back and forth between two nodes. Now, the one point that I’ve kind of glossed over here so far is this step here, removing a node from the frontier. Before, I just chose arbitrarily. Like, let’s just remove a node and that’s it. But it turns out it’s actually quite important how we decide to structure our frontier, how we add and how we remove our nodes. The frontier is a data structure and we need to make a choice about in what order are we going to be removing elements. And one of the simplest data structures for adding and removing elements is something called a stack. And a stack is a data structure that is a last in, first out data type, which means the last thing that I add to the frontier is going to be the first thing that I remove from the frontier. So the most recent thing to go into the stack or the frontier in this case is going to be the node that I explore. So let’s see what happens if I apply this stack-based approach to something like this problem, finding a path from A to E. What’s going to happen? Well, again, we’ll start with A and we’ll say, all right, let’s go ahead and look at A first. And then notice this time, we’ve added A to the explored set. A is something we’ve now explored. We have this data structure that’s keeping track. We then say from A, we can get to B. And all right, from B, what can we do? Well, from B, we can explore B and get to both C and D. So we added C and then D. So now, when we explore a node, we’re going to treat the frontier as a stack, last in, first out. D was the last one to come in. So we’ll go ahead and explore that next and say, all right, where can we get to from D? Well, we can get to F. And so all right, we’ll put F into the frontier. And now, because the frontier is a stack, F is the most recent thing that’s gone in the stack. So F is what we’ll explore next. We’ll explore F and say, all right, where can we get to from F? Well, we can’t get anywhere, so nothing gets added to the frontier. So now, what was the new most recent thing added to the frontier? Well, it’s now C, the only thing left in the frontier. We’ll explore that from which we can see, all right, from C, we can get to E. So E goes into the frontier. And then we say, all right, let’s look at E. And E is now the solution. And now, we’ve solved the problem. So when we treat the frontier like a stack, a last in, first out data structure, that’s the result we get. We go from A to B to D to F. And then we sort of backed up and went down to C and then E. And it’s important to get a visual sense for how this algorithm is working. We went very deep in this search tree, so to speak, all the way until the bottom where we hit a dead end. And then we effectively backed up and explored this other route that we didn’t try before. And it’s this going very deep in the search tree idea, this way the algorithm ends up working when we use a stack that we call this version of the algorithm depth first search. Depth first search is the search algorithm where we always explore the deepest node in the frontier. We keep going deeper and deeper through our search tree. And then if we hit a dead end, we back up and we try something else instead. But depth first search is just one of the possible search options that we could use. It turns out that there’s another algorithm called breadth first search, which behaves very similarly to depth first search with one difference. 
Instead of always exploring the deepest node in the search tree, the way depth-first search does, breadth-first search is always going to explore the shallowest node in the frontier. So what does that mean? Well, it means that instead of using a stack, which depth-first search, or DFS, used, where the most recent item added to the frontier is the one we’ll explore next, in breadth-first search, or BFS, we’ll instead use a queue, where a queue is a first-in, first-out data type, where the very first thing we add to the frontier is the first one we’ll explore; they effectively form a line, or a queue, where the earlier you arrive in the frontier, the earlier you get explored. So what would that mean for the same exact problem, finding a path from A to E? Well, we start with A, same as before. Then we’ll go ahead and explore A and say, where can we get to from A? Well, from A, we can get to B, same as before. From B, same as before, we can get to C and D. So C and D get added to the frontier. This time, though, we added C to the frontier before D, so we’ll explore C first. So C gets explored. And from C, where can we get to? Well, we can get to E. So E gets added to the frontier. But because D arrived in the frontier before E, we’ll look at D next. So we’ll explore D and say, where can we get to from D? We can get to F. And only then will we say, all right, now we can get to E. And so what breadth-first search, or BFS, did is we started here, we looked at both C and D, and then we looked at E. Effectively, we’re looking at things one away from the initial state, then two away from the initial state, and only then things that are three away from the initial state, unlike depth-first search, which just went as deep as possible into the search tree until it hit a dead end and then ultimately had to back up. So these now are two different search algorithms that we could apply in order to try and solve a problem. And let’s take a look at how these would actually work in practice with something like maze solving, for example. So here’s an example of a maze. These empty cells represent places where our agent can move. These darkened gray cells represent walls that the agent can’t pass through. And ultimately, our agent, our AI, is going to try to find a way to get from position A to position B via some sequence of actions, where those actions are left, right, up, and down. What will depth-first search do in this case? Well, depth-first search will just follow one path. If it reaches a fork in the road where it has multiple different options, depth-first search is just, in this case, going to choose one. It doesn’t have a real preference. But it’s going to keep following one until it hits a dead end. And when it hits a dead end, depth-first search effectively goes back to the last decision point and tries the other path, fully exhausting this entire path. And when it realizes that, OK, the goal is not here, then it turns its attention to this path. It goes as deep as possible. When it hits a dead end, it backs up and then tries this other path, keeps going as deep as possible down one particular path. And when it realizes that that’s a dead end, then it’ll back up and then ultimately find its way to the goal. And maybe we got lucky, and maybe we made a different choice earlier on. But ultimately, this is how depth-first search is going to work. It’s going to keep following until it hits a dead end. And when it hits a dead end, it backs up and looks for a different solution.
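To make the difference concrete, here’s a tiny runnable sketch comparing the two on the A-through-F example from the walkthrough, with the graph encoded as a dictionary (an assumption about the exact edges). The only difference between the two algorithms is where we remove from the list: the end for a stack, the front for a queue.

```python
# The example graph: A leads to B; B leads to C and D; C leads to E; D leads to F.
graph = {"A": ["B"], "B": ["C", "D"], "C": ["E"], "D": ["F"], "E": [], "F": []}

def explore(start, goal, as_stack):
    frontier, explored, order = [start], set(), []
    while frontier:
        # Stack: remove the last node added. Queue: remove the first.
        state = frontier.pop() if as_stack else frontier.pop(0)
        order.append(state)
        if state == goal:
            return order
        explored.add(state)
        for neighbor in graph[state]:
            if neighbor not in frontier and neighbor not in explored:
                frontier.append(neighbor)

print(explore("A", "E", as_stack=True))   # DFS: ['A', 'B', 'D', 'F', 'C', 'E']
print(explore("A", "E", as_stack=False))  # BFS: ['A', 'B', 'C', 'D', 'E']
```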
And so one thing you might reasonably ask is, is this algorithm always going to work? Will it always actually find a way to get from the initial state to the goal? And it turns out that as long as our maze is finite, as long as there are only finitely many spaces where we can travel, then, yes, depth-first search is going to find a solution, because eventually, it’ll just explore everything. If the maze happens to be infinite and there’s an infinite state space, which does exist in certain types of problems, then it’s a slightly different story. But as long as our maze has finitely many squares, we’re going to find a solution. The next question, though, that we want to ask is, is it going to be a good solution? Is it the optimal solution that we can find? And the answer there is not necessarily. And let’s take a look at an example of that. In this maze, for example, we’re again trying to find our way from A to B. And you notice here there are multiple possible solutions. We could go this way, or we could go up, in order to make our way from A to B. Now, if we’re lucky, depth-first search will choose this way and get to B. But there’s no reason necessarily why depth-first search would choose between going up or going to the right. It’s sort of an arbitrary decision point, because both are going to be added to the frontier. And ultimately, if we get unlucky, depth-first search might choose to explore this path first, because it’s just a random choice at this point. It’ll explore, explore, explore, and it’ll eventually find the goal via this particular path, when in actuality there was a better path. There was a more optimal solution that used fewer steps, assuming we’re measuring the cost of a solution based on the number of steps that we need to take. So depth-first search, if we’re unlucky, might end up not finding the best solution when a better solution is available. So that’s DFS, depth-first search. How does BFS, or breadth-first search, compare? How would it work in this particular situation? Well, the algorithm is going to look very different visually in terms of how BFS explores. Because BFS looks at shallower nodes first, the idea is going to be, BFS will first look at all of the nodes that are one away from the initial state. Look here and look here, for example, just at the two nodes that are immediately next to this initial state. Then it’ll explore nodes that are two away, looking at this state and that state, for example. Then it’ll explore nodes that are three away, this state and that state. Whereas depth-first search just picked one path and kept following it, breadth-first search, on the other hand, is exploring all of the possible paths kind of at the same time, bouncing back between them, looking deeper and deeper at each one, but making sure to explore the shallower ones, the ones that are closer to the initial state, earlier. So we’ll keep following this pattern, looking at things that are four away, looking at things that are five away, looking at things that are six away, until eventually we make our way to the goal. And in this case, it’s true we had to explore some states that ultimately didn’t lead us anywhere, but the path that we found to the goal was the optimal path. This is the shortest way that we could get to the goal. And so what might happen then in a larger maze? Well, let’s take a look at something like this and how breadth-first search is going to behave.
Well, breadth-first search, again, will just keep following the states until it reaches a decision point. It could go either left or right. And while DFS just picked one and kept following that until it hit a dead end, BFS, on the other hand, will explore both. It’ll look at this node, then this node, and this node, then that node, so on and so forth. And when it hits a decision point here, rather than pick one, left or right, and explore that path, it’ll again explore both, alternating between them, going deeper and deeper. We’ll explore here, and then maybe here and here, and then keep going. Explore here and slowly make our way, you can visually see, further and further out. Once we get to this decision point, we’ll explore both up and down until ultimately we make our way to the goal. And what you’ll notice is, yes, breadth-first search did find our way from A to B by following this particular path, but it needed to explore a lot of states in order to do so. And so we see some trade-offs here between DFS and BFS, that in DFS, there may be some cases where there are some memory savings as compared to a breadth-first approach, where breadth-first search in this case had to explore a lot of states. But maybe that won’t always be the case. So now let’s actually turn our attention to some code and look at the code that we could actually write in order to implement something like depth-first search or breadth-first search in the context of solving a maze, for example. So I’ll go ahead and go into my terminal. And what I have here inside of maze.py is an implementation of this same idea of maze solving. I’ve defined a class called Node that in this case is keeping track of the state, the parent, in other words, the node that came before it, and the action. In this case, we’re not keeping track of the path cost, because we can calculate the cost of the path at the end, after we’ve found our way from the initial state to the goal. In addition to this, I’ve defined a class called a StackFrontier. And if you’re unfamiliar with classes, a class is a way for me to define how to generate objects in Python. It refers to an idea of object-oriented programming, where the idea here is that I would like to create an object that is able to store all of my frontier data, and I would like to have functions, otherwise known as methods, on that object that I can use to manipulate the object. And so what’s going on here, if you’re unfamiliar with the syntax, is I have a function that initially creates a frontier that I’m going to represent using a list. And initially, my frontier is represented by the empty list. There’s nothing in my frontier to begin with. I have an add function that adds something to the frontier by appending it to the end of the list. I have a function that checks if the frontier contains a particular state. I have an empty function that checks if the frontier is empty. If the frontier is empty, that just means the length of the frontier is 0. And then I have a function for removing something from the frontier. I can’t remove something from the frontier if the frontier is empty, so I check for that first. But otherwise, if the frontier isn’t empty, recall that I’m implementing this frontier as a stack, a last-in, first-out data structure, which means the last thing I add to the frontier, in other words, the last thing in the list, is the item that I should remove from this frontier. So what you’ll see here is that I take the last item of the list.
And if you index into a Python list with negative 1, that gets you the last item in the list. Since 0 is the first item, negative 1 kind of wraps around and gets you to the last item in the list. So we take that item and call it node. We update the frontier here on line 28 to remove that node from the list. And then we return the node as a result. So this class here effectively implements the idea of a frontier. It gives me a way to add something to a frontier and a way to remove something from the frontier as a stack. I’ve also, just for good measure, implemented an alternative version of the same thing called a QueueFrontier, which, in parentheses you’ll see here, inherits from the StackFrontier, meaning it’s going to do all the same things that the StackFrontier did, except the way we remove a node from the frontier is going to be slightly different. Instead of removing from the end of the list the way we would in a stack, we’re instead going to remove from the beginning of the list. self.frontier[0] will get me the first node in the frontier, the first one that was added, and that is going to be the one that we return in the case of a queue. Then under here, I have a definition of a class called Maze. This is going to handle the process of taking a maze-like text file and figuring out how to solve it. So it will take as input a text file that looks something like this, for example, where we see hash marks that are here representing walls, and I have the character A representing the starting position and the character B representing the ending position. And you can take a look at the code for parsing this text file right now. That’s the less interesting part. The more interesting part is this solve function here. The solve function is going to figure out how to actually get from point A to point B. And here we see an implementation of the exact same idea we saw a moment ago. We’re going to keep track of how many states we’ve explored, just so we can report that data later. But I start with a node that represents just the start state. And I start with a frontier that, in this case, is a StackFrontier. And given that I’m treating my frontier as a stack, you might imagine that the algorithm I’m using here is now depth-first search, because depth-first search, or DFS, uses a stack as its data structure. And initially, this frontier is just going to contain the start state. We initialize an explored set that initially is empty. There’s nothing we’ve explored so far. And now here’s our loop, that notion of repeating something again and again. First, we check if the frontier is empty by calling that empty function that we saw the implementation of a moment ago. And if the frontier is indeed empty, we’ll go ahead and raise an exception, or a Python error, to say, sorry, there is no solution to this problem. Otherwise, we’ll go ahead and remove a node from the frontier by calling frontier.remove and update the number of states we’ve explored, because now we’ve explored one additional state. So we say self.num_explored += 1, adding 1 to the number of states we’ve explored. Once we remove a node from the frontier, recall that the next step is to see whether or not it’s the goal, the goal test. And in the case of the maze, the goal test is pretty easy. I check to see whether the state of the node is equal to the goal.
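Reconstructed from that description, a sketch of the Node, StackFrontier, and QueueFrontier classes might look roughly like this; the actual maze.py may differ in small details.

```python
class Node():
    def __init__(self, state, parent, action):
        self.state = state    # the current position
        self.parent = parent  # the node that came before this one
        self.action = action  # the action taken to get here from the parent

class StackFrontier():
    def __init__(self):
        self.frontier = []  # initially, the frontier is the empty list

    def add(self, node):
        self.frontier.append(node)  # append to the end of the list

    def contains_state(self, state):
        return any(node.state == state for node in self.frontier)

    def empty(self):
        return len(self.frontier) == 0

    def remove(self):
        if self.empty():
            raise Exception("empty frontier")
        node = self.frontier[-1]            # last in, first out
        self.frontier = self.frontier[:-1]  # remove that node from the list
        return node

class QueueFrontier(StackFrontier):
    # Inherits everything from StackFrontier except how we remove a node.
    def remove(self):
        if self.empty():
            raise Exception("empty frontier")
        node = self.frontier[0]            # first in, first out
        self.frontier = self.frontier[1:]
        return node
```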
Initially, when I set up the maze, I set up this value called goal, which is a property of the maze, so I can just check to see if the node’s state is actually the goal. And if it is the goal, then what I want to do is backtrack my way towards figuring out what actions I took in order to get to this goal. And how do I do that? Well, recall that every node stores its parent, the node that came before it that we used to get to this node, and also the action used in order to get there. So I can create this loop where I’m constantly just looking at the parent of every node and keeping track, for all of the parents, of what action I took to get from the parent to this current node. So this loop is going to keep repeating this process of looking through all of the parent nodes until we get back to the initial state, which has no parent, where node.parent is going to be equal to None. As I do so, I’m going to be building up the list of all of the actions that I’m following and the list of all the cells that are part of the solution. But I’ll reverse them, because I’m building them up going from the goal back to the initial state, building the sequence of actions from the goal to the initial state, and I want the sequence of actions from the initial state to the goal. And that is ultimately going to be the solution. So all of that happens if the current state is equal to the goal. And otherwise, if it’s not the goal, well, then I’ll go ahead and add this state to the explored set to say, I’ve explored this state now. No need to go back to it if I come across it in the future. And then this logic here implements the idea of adding neighbors to the frontier. I’m saying, look at all of my neighbors, and I implemented a function called neighbors that you can take a look at. And for each of those neighbors, I’m going to check, is the state already in the frontier? Is the state already in the explored set? And if it’s not in either of those, then I’ll go ahead and add this new child node, this new node, to the frontier. So there’s a fair amount of syntax here, but the key is not to understand all the nuances of the syntax. Feel free to take a closer look at this file on your own to get a sense for how it is working. But the key is to see how this is an implementation of the same pseudocode, the same idea that we were describing a moment ago on the screen when we were looking at the steps that we might follow in order to solve this kind of search problem. So now let’s actually see this in action. I’ll go ahead and run maze.py on maze1.txt, for example. And what we’ll see is here, we have a printout of what the maze initially looked like, and then here down below is after we’ve solved it. We had to explore 11 states in order to do it, and we found a path from A to B. And in this program, I also happen to generate a graphical representation of this. So I can open up maze.png, which is generated by this program, which shows you the maze: the darker color here is the walls, red is the initial state, green is the goal, and yellow is the path that was followed. We found a path from the initial state to the goal. But now let’s take a look at a more sophisticated maze to see what might happen instead. Let’s look now at maze2.txt. We’re now here. We have a much larger maze. Again, we’re trying to find our way from point A to point B. But now you’ll imagine that depth-first search might not be so lucky. It might not get to the goal on the first try.
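Here, again as a sketch reconstructed from the description rather than copied from the file itself, is roughly what that whole solve method looks like, assuming a Maze class with start and goal properties and that neighbors method returning (action, state) pairs.

```python
def solve(self):
    """Finds a path through the maze, if one exists."""
    self.num_explored = 0

    # Initialize the frontier to just the starting position.
    start = Node(state=self.start, parent=None, action=None)
    frontier = StackFrontier()  # a stack frontier means depth-first search
    frontier.add(start)

    # Initialize an empty explored set.
    self.explored = set()

    while True:
        # If nothing is left in the frontier, there is no path.
        if frontier.empty():
            raise Exception("no solution")

        # Remove a node from the frontier and count it as explored.
        node = frontier.remove()
        self.num_explored += 1

        # Goal test: backtrack through the parents to build the solution.
        if node.state == self.goal:
            actions, cells = [], []
            while node.parent is not None:
                actions.append(node.action)
                cells.append(node.state)
                node = node.parent
            actions.reverse()
            cells.reverse()
            self.solution = (actions, cells)
            return

        # Mark the node as explored, then add its neighbors to the frontier.
        self.explored.add(node.state)
        for action, state in self.neighbors(node.state):
            if not frontier.contains_state(state) and state not in self.explored:
                child = Node(state=state, parent=node, action=action)
                frontier.add(child)
```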
Depth-first search might have to follow one path, then backtrack and explore something else a little bit later. So let’s try this. We’ll run python maze.py on maze2.txt, this time trying this other maze. And now, depth-first search is able to find a solution. Here, as indicated by the stars, is a way to get from A to B. And we can represent this visually by opening up this maze. Here’s what that maze looks like, and highlighted in yellow is the path that was found from the initial state to the goal. But how many states did we have to explore before we found that path? Well, recall that in my program, I was keeping track of the number of states that we’ve explored so far. And so I can go back to the terminal and see that, all right, in order to solve this problem, we had to explore 399 different states. And in fact, I can make one small modification to the program: where we output this image at the end, I added an argument called show_explored. And if I set show_explored equal to True and rerun this program, python maze.py, running it on maze2, and then I open the maze, what you’ll see here, highlighted in red, are all of the states that had to be explored to get from the initial state to the goal. Depth-first search, or DFS, didn’t find its way to the goal right away. It made a choice to first explore this direction. And when it explored this direction, it had to follow every conceivable path all the way to the very end, even this long and winding one, in order to realize that, you know what? That’s a dead end. And instead, the program needed to backtrack. After going this direction, it must have gone this direction. It got lucky here by just not choosing this path, but it got unlucky here, exploring this direction, exploring a bunch of states it didn’t need to, and then likewise exploring all of this top part of the graph when it probably didn’t need to do that either. So all in all, depth-first search here is really not performing optimally, probably exploring more states than it needs to. It finds an optimal solution, the best path to the goal, but the number of states it needed to explore in order to do so, the number of steps it had to take, was much higher. So let’s compare. How would breadth-first search, or BFS, do on this exact same maze instead? And in order to do so, it’s a very easy change. The algorithm for DFS and BFS is identical, with the exception of what data structure we use to represent the frontier: in DFS, I used a StackFrontier, last in, first out, whereas in BFS, I’m going to use a QueueFrontier, first in, first out, where the first thing I add to the frontier is the first thing that I remove. So I’ll go back to the terminal, rerun this program on the same maze, and now you’ll see that the number of states we had to explore was only 77, as compared to almost 400 when we used depth-first search. And we can see exactly why. We can see what happened if we open up maze.png now and take a look. Again, the yellow highlight is the solution that breadth-first search found, which incidentally is the same solution that depth-first search found. They’re both finding the best solution. But notice all the white unexplored cells. There were far fewer states that needed to be explored in order to make our way to the goal, because breadth-first search operates a little more shallowly. It’s exploring things that are close to the initial state without exploring things that are further away.
So if the goal is not too far away, then breadth-first search can actually behave quite effectively on a maze that looks a little something like this. Now, in this case, both BFS and DFS ended up finding the same solution, but that won’t always be the case. And in fact, let’s take a look at one more example. For instance, maze3.txt. In maze3.txt, notice that here there are multiple ways that you could get from A to B. It’s a relatively small maze, but let’s look at what happens. I’ll go ahead and turn off show_explored so we just see the solution. If I use BFS, breadth-first search, to solve maze3.txt, well, then we find a solution, and if I open up the maze, here is the solution that we found. It is the optimal one. With just four steps, we can get from the initial state to what the goal happens to be. But what happens if we tried to use depth-first search, or DFS, instead? Well, again, I’ll go back up to my QueueFrontier, where the QueueFrontier means that we’re using breadth-first search, and I’ll change it to a StackFrontier, which means that now we’ll be using depth-first search. I’ll rerun python maze.py, and now you’ll see that we find a solution, but it is not the optimal solution. This instead is what our algorithm finds. Maybe depth-first search could have found the optimal solution. It’s possible, but it’s not guaranteed: if we just happen to be unlucky, if we choose this state instead of that state, then depth-first search might find a longer route to get from the initial state to the goal. So we do see some trade-offs here, where depth-first search might not find the optimal solution. So at that point, it seems like breadth-first search is pretty good. Is that the best we can do, where it’s going to find us the optimal solution, and we don’t have to worry about situations where we might end up finding a longer path to the solution than what actually exists? Well, think back to the larger maze, where the goal was far away from the initial state, and we might have to take lots of steps in order to get from the initial state to the goal. What ended up happening there is that this algorithm, BFS, ended up exploring basically the entire graph, having to go through the entire maze in order to find its way from the initial state to the goal state. What we’d ultimately like is for our algorithm to be a little bit more intelligent. And what would it mean for our algorithm to be a little bit more intelligent in this case? Well, let’s look back to where breadth-first search might have been able to make a different decision, and consider human intuition in this process as well. What might a human do when solving this maze that is different than what BFS ultimately chose to do? Well, the very first decision point that BFS made was right here, when it made five steps and ended up in a position where it had a fork in the road. It could either go left or it could go right. In these initial couple steps, there was no choice. There was only one action that could be taken from each of those states. And so the search algorithm did the only thing that any search algorithm could do, which is keep following one state after the next. But this decision point is where things get a little bit interesting. Depth-first search, that very first search algorithm we looked at, chose to say, let’s pick one path and exhaust that path. See if anything that way has the goal. And if not, then let’s try the other way.
Breadth-first search took the alternative approach of saying, you know what, let’s explore things that are shallow, close to us, first. Look left and right, then back left and back right, so on and so forth, alternating between our options in the hopes of finding something nearby. But ultimately, what might a human do if confronted with a situation like this of go left or go right? Well, a human might visually see that, all right, I’m trying to get to state B, which is way up there, and going right just feels like it’s closer to the goal. It feels like going right should be better than going left, because I’m making progress towards getting to that goal. Now, of course, there are a couple of assumptions that I’m making here. I’m making the assumption that we can represent this grid as a two-dimensional grid where I know the coordinates of everything. I know that A is at coordinate 0, 0, and B is at some other coordinate pair, and I know what coordinate I’m at now. So I can calculate that, yeah, going this way, that is closer to the goal. And that might be a reasonable assumption for some types of search problems, but maybe not in others. But for now, we’ll go ahead and assume that I know what my current coordinate pair is, and I know the x, y coordinate of the goal that I’m trying to get to. And in this situation, I’d like an algorithm that is a little bit more intelligent, that somehow knows that I should be making progress towards the goal, and this is probably the way to do that, because in a maze, moving in the coordinate direction of the goal is usually, though not always, a good thing. And so here we draw a distinction between two different types of search algorithms, uninformed search and informed search. Uninformed search algorithms are algorithms like DFS and BFS, the two algorithms that we just looked at, which are search strategies that don’t use any problem-specific knowledge to be able to solve the problem. DFS and BFS didn’t really care about the structure of the maze, or anything about the way a maze works, in order to solve the problem. They just look at the actions available and choose from those actions, and it doesn’t matter whether it’s a maze or some other problem; the way they try to solve the problem is really fundamentally going to be the same. What we’re going to take a look at now is an improvement upon uninformed search. We’re going to take a look at informed search. Informed search algorithms are going to be search strategies that use knowledge specific to the problem to be able to better find a solution. And in the case of a maze, this problem-specific knowledge is something like: if I’m in a square that is geographically closer to the goal, that is better than being in a square that is geographically further away. And this is something we can only know by thinking about this problem and reasoning about what knowledge might be helpful for our AI agent to know a little something about. There are a number of different types of informed search. Specifically, first, we’re going to look at a particular type of search algorithm called greedy best-first search. Greedy best-first search, often abbreviated GBFS, is a search algorithm that, instead of expanding the deepest node like DFS or the shallowest node like BFS, is always going to expand the node that it thinks is closest to the goal. Now, the search algorithm isn’t going to know for sure whether it is the closest thing to the goal.
Because if we knew what was closest to the goal all the time, then we would already have a solution; with the knowledge of what is close to the goal, we could just follow those steps in order to get from the initial position to the solution. But if we don’t know the solution, meaning we don’t know exactly what’s closest to the goal, instead we can use an estimate of what’s closest to the goal, otherwise known as a heuristic, just some way of estimating whether or not we’re close to the goal. And we’ll do so using a heuristic function, conventionally called h of n, that takes a state as input and returns our estimate of how close we are to the goal. So what might this heuristic function actually look like in the case of a maze-solving algorithm? Where we’re trying to solve a maze, what does the heuristic look like? Well, the heuristic needs to answer a question: between these two cells, C and D, which one is better? Which one would I rather be in if I’m trying to find my way to the goal? Well, any human could probably look at this and tell you, you know what, D looks like it’s better. Even if the maze is convoluted and you haven’t thought about all the walls, D is probably better. And why is D better? Well, because if you ignore the walls, so let’s just pretend the walls don’t exist for a moment and relax the problem, so to speak, D, just in terms of coordinate pairs, is closer to the goal. It’s fewer steps that I would need to take to get to the goal as compared to C, even if you ignore the walls. If you just know the xy-coordinate of C and the xy-coordinate of the goal, and likewise you know the xy-coordinate of D, you can calculate that D, just geographically, ignoring the walls, looks like it’s better. And so this is the heuristic function that we’re going to use. And it’s something called the Manhattan distance, one specific type of heuristic, where the heuristic is how many squares, vertically and horizontally, I need to travel, not allowing myself to go diagonally, just up, down, left, or right: how many steps do I need to take to get from each of these cells to the goal? Well, as it turns out, D is much closer. There are fewer steps. It only needs to take six steps in order to get to that goal. Again, here, ignoring the walls. We’ve relaxed the problem a little bit. We’re just concerned with, if you do the math to subtract the x values from each other and the y values from each other, what is our estimate of how far away we are. We can estimate that D is closer to the goal than C is. And so now we have an approach. We have a way of picking which node to remove from the frontier. At each stage in our algorithm, we’re going to remove a node from the frontier, and we’re going to explore the node that has the smallest value for this heuristic function, the smallest Manhattan distance to the goal. So what would this actually look like? Well, let me first label this graph, label this maze, with a number representing the value of this heuristic function, the value of the Manhattan distance from any of these cells. So from this cell, for example, we’re one away from the goal. From this cell, we’re two away from the goal, three away, four away. Here, we’re five away, because we have to go one to the right and then four up. From somewhere like here, the Manhattan distance is two. We’re only two squares away from the goal geographically, even though in practice, we’re going to have to take a longer path. But we don’t know that yet.
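That calculation is easy to write down. As a quick sketch in Python, with states as (x, y) coordinate pairs:

```python
def manhattan_distance(state, goal):
    """Heuristic h(n): estimated distance to the goal, ignoring the walls."""
    (x1, y1), (x2, y2) = state, goal
    # Subtract the x values from each other and the y values from each other.
    return abs(x1 - x2) + abs(y1 - y2)

# A cell two columns and three rows away from the goal gets an estimate of 5.
print(manhattan_distance((2, 3), (0, 0)))  # 5
```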
The heuristic is just some easy way to estimate how far we are away from the goal. And maybe our heuristic is overly optimistic. It thinks that, yeah, we’re only two steps away, when in practice, once you consider the walls, it might be more steps. So the important thing here is that the heuristic isn’t a guarantee of how many steps it’s going to take. It is estimating. It’s an attempt at trying to approximate. And it does seem generally the case that the squares that look closer to the goal have smaller values for the heuristic function than squares that are further away. So now, using greedy best-first search, what might this algorithm actually do? Well, again, for these first five steps, there’s not much of a choice. We start at this initial state A, and we say, all right, we have to explore these five states. But now we have a decision point. Now we have a choice between going left and going right. And before, DFS and BFS would just pick arbitrarily, because it just depends on the order you throw these two nodes into the frontier, and we didn’t specify what order you put them into the frontier, only the order you take them out. Here, though, we can look at 13 and 11 and say, all right, this square is a distance of 11 away from the goal, according to our heuristic, according to our estimate. And this one, we estimate to be 13 away from the goal. So between those two options, between these two choices, I’d rather have the 11. I’d rather be 11 steps away from the goal, so I’ll go to the right. We’re able to make an informed decision, because we know a little something more about this problem. So then we keep following: 10, 9, 8. Between the two 7s, we don’t really have a way to decide between those, so then we do just have to make an arbitrary choice. And you know what, maybe we choose wrong. But that’s OK, because now we can still say, all right, let’s try this 7. We say 7, 6. We have to make this choice, even though it increases the value of the heuristic function. But now we have another decision point, between 6 and 8. And really, we’re also considering this 13, but that’s much higher. Between 6, 8, and 13, well, the 6 is the smallest value, so we’d rather take the 6. We’re able to make an informed decision that going this way, to the right, is probably better than going down. So we turn this way, we go to 5. And now we find a decision point where we’ll actually make a decision that we might not want to make, but there’s unfortunately not too much of a way around this. We see 4 and 6. 4 looks closer to the goal, right? It’s going up, and the goal is further up. So we end up taking that route, which ultimately leads us to a dead end. But that’s OK, because we can still say, all right, now let’s try the 6, and now follow this route that will ultimately lead us to the goal. And so this now is how greedy best-first search might try to approach this problem, by saying, whenever we have a decision between multiple nodes that we could explore, let’s explore the node that has the smallest value of h of n, this heuristic function that is estimating how far I have to go. And it just so happens that in this case, we end up doing better in terms of the number of states we needed to explore than BFS needed to. BFS explored all of this section and all of that section, but we were able to eliminate that by taking advantage of this heuristic, this knowledge about how close we are to the goal, or some estimate of that idea. So this seems much better.
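One way to implement this, as a sketch rather than anything taken from the lecture’s maze.py, is a frontier backed by a priority queue, keyed on h of n, that keeps the same add/remove interface as the stack and queue frontiers:

```python
import heapq
import itertools

class GreedyFrontier():
    """A frontier that always removes the node with the smallest h(n).

    heuristic is a function from a state to its estimated distance to the
    goal, such as the manhattan_distance function above.
    """
    def __init__(self, heuristic):
        self.heuristic = heuristic
        self.frontier = []
        self.counter = itertools.count()  # tie-breaker, so ties never compare nodes

    def add(self, node):
        priority = self.heuristic(node.state)
        heapq.heappush(self.frontier, (priority, next(self.counter), node))

    def contains_state(self, state):
        return any(node.state == state for _, _, node in self.frontier)

    def empty(self):
        return len(self.frontier) == 0

    def remove(self):
        if self.empty():
            raise Exception("empty frontier")
        _, _, node = heapq.heappop(self.frontier)
        return node
```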
So wouldn’t we always prefer an algorithm like this over an algorithm like breadth-first search? Well, maybe one thing to take into consideration is that we need to come up with a good heuristic; how good the heuristic is, is going to affect how good this algorithm is, and coming up with a good heuristic can oftentimes be challenging. But the other thing to consider is to ask the question, just as we did with the prior two algorithms: is this algorithm optimal? Will it always find the shortest path from the initial state to the goal? And to answer that question, let’s take a look at this example for a moment. Again, we’re trying to get from A to B. And again, I’ve labeled each of the cells with their Manhattan distance from the goal, the number of squares, up and to the side, you would need to travel in order to get from that square to the goal. And let’s think about, would greedy best-first search, which always picks the smallest number, end up finding the optimal solution? What is the shortest solution? And would this algorithm find it? And the important thing to realize is that right here is the decision point. We’re estimated to be 12 away from the goal. And we have two choices. We can go to the left, which we estimate to be 13 away from the goal. Or we can go up, where we estimate it to be 11 away from the goal. And between those two, greedy best-first search is going to say the 11 looks better than the 13. And in doing so, greedy best-first search will end up finding this path to the goal. But it turns out this path is not optimal. There is a way to get to the goal using fewer steps. And it’s actually this way, this way that ultimately involved fewer steps, even though it meant, at this moment, choosing the worse option between the two, or what we estimated to be the worse option based on the heuristic. And so this is what we mean when we say this is a greedy algorithm. It’s making the best decision locally. At this decision point, it looks like it’s better to go here than it is to go to the 13. But in the big picture, it’s not necessarily optimal. It might find a solution when, in actuality, there was a better solution available. So we would like some way to solve this problem. We like the idea of this heuristic, of being able to estimate the distance between us and the goal. That helps us to be able to make better decisions and to eliminate having to search through entire parts of this state space. But we would like to modify the algorithm so that we can achieve optimality, so that it can be optimal. And what is the way to do this? What is the intuition here? Well, let’s take a look at this problem. In this problem, greedy best-first search found us this solution here, this long path. And the reason why it wasn’t great is because, yes, the heuristic numbers went down pretty low, but later on, they started to build back up: 8, 9, 10, 11, all the way up to 12 in this case. And so how might we go about trying to improve this algorithm? Well, one thing that we might realize is that if we go all the way through this algorithm, through this path, and we end up going to the 12, and we’ve had to take this many steps, who knows how many steps that is, just to get to this 12, we could have also, as an alternative, taken far fewer steps, just six steps, and ended up at this 13 here. And yes, 13 is more than 12, so it looks like it’s not as good, but it required far fewer steps.
It only took six steps to get to this 13, versus many more steps to get to this 12. And while greedy best-first search says, oh, well, 12 is better than 13, so pick the 12, we might more intelligently say, I’d rather be somewhere that heuristically looks like it takes slightly longer if I can get there much more quickly. And we’re going to encode that idea, this general idea, into a more formal algorithm known as A star search. A star search is going to solve this problem by, instead of just considering the heuristic, also considering how long it took us to get to any particular state. So the distinction is this: in greedy best-first search, if I am in a state right now, the only thing I care about is, what is the estimated distance, the heuristic value, between me and the goal? Whereas A star search will take into consideration two pieces of information. It’ll take into consideration, how far do I estimate I am from the goal, but also, how far did I have to travel in order to get here? Because that is relevant, too. So A star search will work by expanding the node with the lowest value of g of n plus h of n. h of n is that same heuristic that we were talking about a moment ago, which is going to vary based on the problem. But g of n is going to be the cost to reach the node, how many steps I had to take, in this case, to get to my current position. So what does that search algorithm look like in practice? Well, let’s take a look. Again, we’ve got the same maze. And again, I’ve labeled them with their Manhattan distance. This value is the h of n value, the heuristic estimate of how far each of these squares is away from the goal. But now, as we begin to explore states, we care not just about this heuristic value, but also about g of n, the number of steps I had to take in order to get there. And I care about summing those two numbers together. So what does that look like? On this very first step, I have taken one step. And now I am estimated to be 16 steps away from the goal. So the total value here is 17. Then I take one more step. I’ve now taken two steps. And I estimate myself to be 15 away from the goal, again, a total value of 17. Now I’ve taken three steps. And I’m estimated to be 14 away from the goal, so on and so forth. Four steps, an estimate of 13. Five steps, estimate of 12. And now here’s a decision point. I could either have taken six steps with a heuristic of 13, for a total of 19, or I could have taken six steps with a heuristic of 11, for a total of 17. So between 19 and 17, I’d rather take the 17, the 6 plus 11. So so far, no different than what we saw before. We’re still taking this option, because it appears to be better. And I keep taking this option, because it appears to be better. But it’s right about here that things get a little bit different. Now I could have taken 15 steps and be at an estimated distance of 6 from the goal. So 15 plus 6, total value of 21. Alternatively, I could have taken six steps, because this square is five steps away, so this one is six steps away, and be at an estimated distance of 13. So 6 plus 13, that’s 19. So here, we would evaluate g of n plus h of n to be 19, 6 plus 13, whereas here, we would be at 15 plus 6, or 21. And so the intuition is, 19 is less than 21, so pick here. The idea is, ultimately, I’d rather have taken fewer steps and be at a 13 than have taken 15 steps and be at a 6, because the latter means I’ve had to take more steps in order to get there. Maybe there’s a better path this way.
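In code, assuming the Node and GreedyFrontier sketches from above, the only change A star needs is the priority: g of n plus h of n instead of h of n alone. A rough sketch:

```python
import heapq

class AStarFrontier(GreedyFrontier):
    """Like GreedyFrontier, but ranks nodes by g(n) + h(n) rather than h(n) alone."""

    def add(self, node):
        # g(n): the cost to reach this node, here the number of steps taken,
        # recovered by walking back up the chain of parents.
        g, ancestor = 0, node
        while ancestor.parent is not None:
            g += 1
            ancestor = ancestor.parent
        priority = g + self.heuristic(node.state)
        heapq.heappush(self.frontier, (priority, next(self.counter), node))
```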
So instead, we’ll explore this route. Now if we go one more, this is seven steps, plus 14 is 21. So between those two, it’s sort of a toss-up. We might end up exploring that one anyways. But after that, as the totals along that path start to get bigger and the heuristic values down this way start to get smaller, you’ll find that we’ll actually keep exploring down this path. And you can do the math to see that at every decision point, A star search is going to make a choice based on the sum of how many steps it took me to get to my current position and how far I estimate I am from the goal. So while we did have to explore some of these states, the ultimate solution we found was, in fact, an optimal solution. It did find us the quickest possible way to get from the initial state to the goal. And it turns out that A star is an optimal search algorithm under certain conditions. The conditions are that h of n, my heuristic, needs to be admissible. What does it mean for a heuristic to be admissible? Well, a heuristic is admissible if it never overestimates the true cost. h of n always needs to either get it exactly right in terms of how far away I am, or it needs to underestimate. So we saw an example from before where the heuristic value was much smaller than the actual cost it would take. That’s totally fine, but the heuristic value should never overestimate. It should never think that I’m further away from the goal than I actually am. And meanwhile, to make a stronger statement, h of n also needs to be consistent. And what does it mean for it to be consistent? Mathematically, it means that for every node, which we’ll call n, and successor, the node after me, that I’ll call n prime, where it takes a cost of c to make that step, the heuristic value of n needs to be less than or equal to the heuristic value of n prime plus the cost; in symbols, h(n) ≤ h(n′) + c. It’s a lot of math, but in words, what that ultimately means is that if I am here at this state right now, the heuristic value from me to the goal shouldn’t be more than the heuristic value of my successor, the next place I could go to, plus however much it would cost me to just make that step, from one state to the next. And so this is just making sure that my heuristic is consistent between all of these steps that I might take. So as long as this is true, then A star search is going to find me an optimal solution. And this is where much of the challenge of solving these search problems can sometimes come in: A star search is a known algorithm, and you could write the code fairly easily, but choosing the heuristic can be the interesting challenge. The better the heuristic is, the better I’ll be able to solve the problem and the fewer states I’ll have to explore. And I need to make sure that the heuristic satisfies these particular constraints. So all in all, these are some examples of search algorithms that might work, and certainly there are many more than just these. A star, for example, does have a tendency to use quite a bit of memory, so there are alternative approaches to A star that ultimately use less memory than this version happens to use, and there are other search algorithms that are optimized for other cases as well. But so far, we’ve only been looking at search algorithms where there is one agent. I am trying to find a solution to a problem. I am trying to navigate my way through a maze. I am trying to solve a 15 puzzle. I am trying to find driving directions from point A to point B.
Sometimes in search situations, though, we’ll enter an adversarial situation, where I am an agent trying to make intelligent decisions, and there’s someone else who is fighting against me, so to speak, who has opposite objectives: I am trying to succeed, and someone else wants me to fail. And this is most popular in something like a game, a game like tic-tac-toe, where we’ve got this 3 by 3 grid, and x and o take turns, either writing an x or an o in any one of these squares. And the goal is to get three x’s in a row if you’re the x player, or three o’s in a row if you’re the o player. And computers have gotten quite good at playing games, tic-tac-toe very easily, but even more complex games. And so you might imagine, what does an intelligent decision in a game look like? So maybe x makes an initial move in the middle, and o plays up here. What does an intelligent move for x now become? Where should you move if you were x? And it turns out there are a couple of possibilities. But if an AI is playing this game optimally, then the AI might play somewhere like the upper right, where in this situation, o has the opposite objective of x: x is trying to win the game by getting three in a row diagonally here, and o is trying to stop that objective. And so o is going to place here to try to block. But now, x has a pretty clever move. x can make a move like this, where now x has two possible ways that x can win the game. x could win the game by getting three in a row across here, or x could win the game by getting three in a row vertically this way. So it doesn’t matter where o makes their next move. o could play here, for example, blocking the three in a row horizontally, but then x is going to win the game by getting three in a row vertically. And so there’s a fair amount of reasoning that’s going on here in order for the computer to be able to solve a problem. And it’s similar in spirit to the problems we’ve looked at so far. There are actions. There’s some sort of state of the board and some transition from one state to the next. But it’s different in the sense that this is now not just a classical search problem, but an adversarial search problem. I am the x player, trying to find the best moves to make, but I know that there is some adversary that is trying to stop me. So we need some sort of algorithm to deal with these adversarial types of search situations. And the algorithm we’re going to take a look at is an algorithm called minimax, which works very well for these deterministic games where there are two players. It can work for other types of games as well, but we’ll look right now at games where I make a move, then my opponent makes a move, and I am trying to win, and my opponent is trying to win also. Or in other words, my opponent is trying to get me to lose. And so what do we need in order to make this algorithm work? Well, any time we try and translate this human concept of playing a game, winning, and losing to a computer, we want to translate it into terms that the computer can understand. And ultimately, the computer really just understands numbers. And so we want some way of translating a game of x’s and o’s on a grid to something numerical, something the computer can understand. The computer doesn’t normally understand notions of win or lose, but it does understand the concept of bigger and smaller.
And so what we might do is take each of the possible ways that a tic-tac-toe game can unfold and assign a value, or a utility, to each one of those possible ways. In a tic-tac-toe game, and in many types of games, there are three possible outcomes. The outcomes are: o wins, x wins, or nobody wins. So player one wins, player two wins, or nobody wins. And for now, let’s go ahead and assign each of these possible outcomes a different value. We’ll say o winning, that’ll have a value of negative 1. Nobody winning, that’ll have a value of 0. And x winning, that will have a value of 1. So we’ve just assigned numbers to each of these three possible outcomes. And now we have two players. We have the x player and the o player. And we’re going to go ahead and call the x player the max player, and we’ll call the o player the min player. And the reason why is because, in the minimax algorithm, the max player, which in this case is x, is aiming to maximize the score. These are the possible options for the score: negative 1, 0, and 1. x wants to maximize the score, meaning, if at all possible, x would like this situation, where x wins the game, and we give it a score of 1. But if this isn’t possible, if x needs to choose between these two options, negative 1, meaning o winning, or 0, meaning nobody winning, x would rather that nobody wins, a score of 0, than a score of negative 1, o winning. So this notion of winning and losing and tying has been reduced mathematically to just this idea of trying to maximize the score. The x player always wants the score to be bigger. And on the flip side, the min player, in this case o, is aiming to minimize the score. The o player wants the score to be as small as possible. So now we’ve taken this game of x’s and o’s and winning and losing and turned it into something mathematical, something where x is trying to maximize the score and o is trying to minimize the score. Let’s now look at all of the parts of the game that we need in order to encode it in an AI, so that an AI can play a game like tic-tac-toe. So the game is going to need a couple of things. We’ll need some sort of initial state, which we’ll, in this case, call s0, which is how the game begins, like an empty tic-tac-toe board, for example. We’ll also need a function called player, where the player function is going to take as input a state, here represented by s, and the output of the player function is going to be whose turn it is. We need to be able to give a tic-tac-toe board to the computer, run it through a function, and have that function tell us whose turn it is. We’ll need some notion of actions that we can take. We’ll see examples of that in just a moment. We need some notion of a transition model, same as before: if I have a state and I take an action, I need to know what results as a consequence of it. I need some way of knowing when the game is over. This is equivalent to kind of like a goal test, but I need some terminal test, some way to check to see if a state is a terminal state, where a terminal state means the game is over. In a classic game of tic-tac-toe, a terminal state means either someone has gotten three in a row or all of the squares of the tic-tac-toe board are filled. Either of those conditions makes it a terminal state. In a game of chess, it might be something like when there is checkmate, or if checkmate is no longer possible, that becomes a terminal state.
And then finally, we’ll need a utility function, a function that takes a state and gives us a numerical value for that terminal state, some way of saying, if x wins the game, that has a value of 1. If o has won the game, that has a value of negative 1. If nobody has won the game, that has a value of 0. So let’s take a look at each of these in turn. The initial state, we can just represent in tic-tac-toe as the empty game board. This is where we begin. It’s the place from which we begin this search. And again, I’ll be representing these things visually, but you can imagine this really just being like an array, or a two-dimensional array, of all of these possible squares. Then we need the player function that, again, takes a state and tells us whose turn it is. Assuming x makes the first move, if I have an empty game board, then my player function is going to return x. And if I have a game board where x has made a move, then my player function is going to return o. The player function takes a tic-tac-toe game board and tells us whose turn it is. Next up, we’ll consider the actions function. The actions function, much like it did in classical search, takes a state and gives us the set of all of the possible actions we can take in that state. So let’s imagine it’s o’s turn to move in a game board that looks like this. What happens when we pass it into the actions function? The actions function takes this state of the game as input, and the output is a set of possible actions: I could move in the upper left, or I could move in the bottom middle. So those are the two possible action choices that I have when I begin in this particular state. Now, just as before, when we had states and actions, we need some sort of transition model to tell us, when we take this action in this state, what is the new state that we get. And here, we define that using the result function, which takes a state as input as well as an action. And when we apply the result function to this state, saying, let’s let o move in this upper left corner, the new state we get is this resulting state where o is in the upper left corner. And now, this seems obvious to someone who knows how to play tic-tac-toe. Of course, you play in the upper left corner; that’s the board you get. But all of this information needs to be encoded into the AI. The AI doesn’t know how to play tic-tac-toe until you tell the AI how the rules of tic-tac-toe work. And defining this function here allows us to tell the AI how this game actually works and how actions actually affect the outcome of the game. So the AI needs to know how the game works. The AI also needs to know when the game is over, by defining a function called terminal that takes as input a state s, such that if we take a game that is not yet over and pass it into the terminal function, the output is false. The game is not over. But if we take a game that is over, because x has gotten three in a row along that diagonal, and pass that into the terminal function, then the output is going to be true, because the game now is, in fact, over. And finally, we’ve told the AI how the game works in terms of what moves can be made and what happens when you make those moves. We’ve told the AI when the game is over. Now we need to tell the AI what the value of each of those states is. And we do that by defining this utility function that takes a state s and tells us the score, or the utility, of that state.
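In Python, one plausible encoding of all of those pieces, with the board as a 3 by 3 list of lists, might look like this; the function names follow the lecture, but the representation and the details are assumptions.

```python
X, O, EMPTY = "X", "O", None

def initial_state():
    """s0: the empty game board."""
    return [[EMPTY] * 3 for _ in range(3)]

def player(board):
    """Whose turn it is; x moves first, so x moves whenever the counts are equal."""
    x_count = sum(row.count(X) for row in board)
    o_count = sum(row.count(O) for row in board)
    return X if x_count == o_count else O

def actions(board):
    """The set of all (row, column) squares still open in this state."""
    return {(i, j) for i in range(3) for j in range(3) if board[i][j] is EMPTY}

def result(board, action):
    """Transition model: the new board after the current player takes the action."""
    i, j = action
    new_board = [row[:] for row in board]  # copy, so the original board is unchanged
    new_board[i][j] = player(board)
    return new_board

def winner(board):
    """X or O if someone has three in a row; None otherwise."""
    lines = [[(i, 0), (i, 1), (i, 2)] for i in range(3)]           # rows
    lines += [[(0, j), (1, j), (2, j)] for j in range(3)]          # columns
    lines += [[(0, 0), (1, 1), (2, 2)], [(0, 2), (1, 1), (2, 0)]]  # diagonals
    for line in lines:
        values = [board[i][j] for i, j in line]
        if values[0] is not EMPTY and values.count(values[0]) == 3:
            return values[0]
    return None

def terminal(board):
    """True if someone has three in a row or every square is filled."""
    return winner(board) is not None or all(
        cell is not EMPTY for row in board for cell in row
    )

def utility(board):
    """1 if x has won the game, negative 1 if o has won, 0 otherwise."""
    w = winner(board)
    return 1 if w == X else -1 if w == O else 0
```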
So again, we said that if x wins the game, that utility is a value of 1, whereas if o wins the game, the utility of that is negative 1. And the AI needs to know, for each of these terminal states where the game is over, what is the utility of that state. So if I give you a game board like this, where the game is, in fact, over, and I ask the AI to tell me what the value of that state is, it could do so. The value of the state is 1. Where things get interesting, though, is if the game is not yet over. Let’s imagine a game board like this, where in the middle of the game, it’s o’s turn to make a move. So how do we know it’s o’s turn to make a move? We can calculate that using the player function. We can say player of s, pass in the state, and o is the answer. So we know it’s o’s turn to move. And now, what is the value of this board, and what action should o take? Well, that’s going to depend. We have to do some calculation here. And this is where the minimax algorithm really comes in. Recall that x is trying to maximize the score, while o is trying to minimize the score. So o would like to minimize the total value that we get at the end of the game. And because this game isn’t over yet, we don’t really know just yet what the value of this game board is. We have to do some calculation in order to figure that out. And so how do we do that kind of calculation? Well, in order to do so, we’re going to consider, just as we might in a classical search situation, what actions could happen next and what states those will take us to. And it turns out that in this position, there are only two open squares, which means there are only two open places where o can make a move. o could either make a move in the upper left, or o could make a move in the bottom middle. And minimax doesn’t know right out of the box which of those moves is going to be better, so it’s going to consider both. But now, we sort of run into the same situation. Now, I have two more game boards, neither of which is over. What happens next? And now, it’s in this sense that minimax is what we’ll call a recursive algorithm. It’s going to now repeat the exact same process, although now considering it from the opposite perspective. If I am the o player, I’m going to put myself in my opponent’s shoes, my opponent the x player, and consider: what would my opponent do if they were in this position? What would my opponent, the x player, do if they were in that position? And what would then happen? Well, the other player, my opponent, the x player, is trying to maximize the score, whereas I am trying to minimize the score as the o player. So x is trying to find the maximum possible value that they can get. And so what’s going to happen? Well, from this board position, x only has one choice. x is going to play here, and they’re going to get three in a row. And we know that that board, x winning, has a value of 1. If x wins the game, the value of that game board is 1. And so from this position, if this state can only ever lead to this state, it’s the only possible option, and this state has a value of 1, then the maximum possible value that the x player can get from this game board is also 1. From here, the only place we can get to is a game with a value of 1, so this game board also has a value of 1. Now we consider this one over here. What’s going to happen now? Well, x needs to make a move. The only move x can make is in the upper left, so x will go there.
And in this game, no one wins the game. Nobody has three in a row. And so the value of that game board is 0. Nobody has won. And so again, by the same logic, if from this board position the only place we can get to is a board where the value is 0, then this state must also have a value of 0.

And now here comes the choice part, the idea of trying to minimize. I, as the o player, now know that if I make this choice moving in the upper left, that is going to result in a game with a value of 1, assuming everyone plays optimally. And if I instead play in the lower middle, choose this fork in the road, that is going to result in a game board with a value of 0. I have two options. I have a 1 and a 0 to choose from, and I need to pick. And as the min player, I would rather choose the option with the minimum value. So whenever a player has multiple choices, the min player will choose the option with the smallest value. The max player will choose the option with the largest value. Between the 1 and the 0, the 0 is smaller, meaning I'd rather tie the game than lose the game. And so this game board, we'll say, also has a value of 0, because if I am playing optimally, I will pick this fork in the road. I'll place my o here to block x's three in a row, x will move in the upper left, and the game will be over, and no one will have won the game.

So this is now the logic of minimax: to consider all of the possible options that I can take, all of the actions that I can take, and then to put myself in my opponent's shoes. I decide what move I'm going to make now by considering what move my opponent will make on the next turn. And to do that, I consider what move I would make on the turn after that, so on and so forth, until I get all the way down to the end of the game, to one of these so-called terminal states. In fact, this very decision point, where I, as the o player, am trying to make a decision, might have just been a part of the logic that the x player, my opponent, was using the move before me. This might be part of some larger tree, where x is trying to make a move in this situation, and needs to pick between three different options in order to make a decision about what happens. And the further and further away we are from the end of the game, the deeper this tree has to go. Because every level in this tree is going to correspond to one move, one move or action that I take, one move or action that my opponent takes, in order to decide what happens.

And in fact, it turns out that if I am the x player in this position and I recursively do this logic, I see I have a choice, three choices, in fact, one of which leads to a value of 0. If I play here, and if everyone plays optimally, the game will be a tie. If I play here, then o is going to win, and I'll lose playing optimally. Or I could play here, where I, the x player, can win. Well, between a score of 0 and negative 1 and 1, I'd rather pick the board with a value of 1, because that's the maximum value I can get. And so this board would also have a value of 1. And so this tree can get very, very deep, especially as the game starts to have more and more moves. And this logic works not just for tic-tac-toe, but any of these sorts of games, where I make a move, my opponent makes a move, and ultimately, we have these adversarial objectives. And we can simplify the diagram into a diagram that looks like this.
This is a more abstract version of the minimax tree, where these are each states, but I'm no longer representing them as exactly like tic-tac-toe boards. This is just representing some generic game that might be tic-tac-toe, might be some other game altogether. Any of these green arrows that are pointing up, that represents a maximizing state. I would like the score to be as big as possible. And any of these red arrows pointing down, those are minimizing states, where the player is the min player, and they are trying to make the score as small as possible.

So if you imagine in this situation, I am the maximizing player, this player here, and I have three choices. One choice gives me a score of 5, one choice gives me a score of 3, and one choice gives me a score of 9. Well, then between those three choices, my best option is to choose this 9 over here, the choice that maximizes my score out of all three options. And so I can give this state a value of 9, because among my three options, that is the best choice that I have available to me. So that's my decision now; you can imagine it's like one move away from the end of the game.

But then you could also ask a reasonable question, what might my opponent do two moves away from the end of the game? My opponent is the minimizing player. They are trying to make the score as small as possible. Imagine what would have happened if they had to pick which choice to make. One choice leads us to this state, where I, the maximizing player, am going to opt for 9, the biggest score that I can get. And one leads to this state, where I, the maximizing player, would choose 8, which is then the largest score that I can get. Now the minimizing player, forced to choose between a 9 or an 8, is going to choose the smallest possible score, which in this case is an 8. And that is then how this process would unfold, that the minimizing player in this case considers both of their options, and then all of the options that would happen as a result of that.

So this now is a general picture of what the minimax algorithm looks like. Let's now try to formalize it using a little bit of pseudocode. So what exactly is happening in the minimax algorithm? Well, given a state s, we need to decide what to do. If it's the max player's turn, then max is going to pick an action a in actions of s. Recall that actions is a function that takes a state and gives me back all of the possible actions that I can take. It tells me all of the moves that are possible. The max player is going to specifically pick an action a in this set of actions that gives me the highest value of min value of result of s and a. So what does that mean? Well, it means that I want to pick the option that gives me the highest score of all of the actions a. But what score is that going to have? To calculate that, I need to know what my opponent, the min player, is going to do if they try to minimize the value of the state that results. So we say, what state results after I take this action? And what happens when the min player tries to minimize the value of that state? I consider that for all of my possible options. And after I've considered that for all of my possible options, I pick the action a that has the highest value. Likewise, the min player is going to do the same thing but backwards. They're also going to consider what are all of the possible actions they can take if it's their turn. And they're going to pick the action a that has the smallest possible value of all the options.
And the way they know what the smallest possible value of all the options is is by considering what the max player is going to do, by saying, what's the result of applying this action to the current state? And then what would the max player try to do? What value would the max player calculate for that particular state? So everyone makes their decision based on trying to estimate what the other person would do.

And now we need to turn our attention to these two functions, max value and min value. How do you actually calculate the value of a state if you're trying to maximize its value? And how do you calculate the value of a state if you're trying to minimize the value? If you can do that, then we have an entire implementation of this minimax algorithm. So let's try it. Let's try and implement this max value function that takes a state and returns as output the value of that state if I'm trying to maximize the value of the state.

Well, the first thing I can check for is to see if the game is over. Because if the game is over, in other words, if the state is a terminal state, then this is easy. I already have this utility function that tells me what the value of the board is. If the game is over, I just check, did x win, did o win, is it a tie? And this utility function just knows what the value of the state is. What's trickier is if the game isn't over. Because then I need to do this recursive reasoning about thinking, what is my opponent going to do on the next move? And I want to calculate the value of this state. And I want the value of the state to be as high as possible. And I'll keep track of that value in a variable called v. And if I want the value to be as high as possible, I need to give v an initial value. And initially, I'll just go ahead and set it to be as low as possible. Because I don't know what options are available to me yet. So initially, I'll set v equal to negative infinity, which seems a little bit strange. But the idea here is I want the value initially to be as low as possible. Because as I consider my actions, I'm always going to try and do better than v. And if I set v to negative infinity, I know I can always do better than that.

So now I consider my actions. And this is going to be some kind of loop where, for every action in actions of state (recall actions is a function that takes my state and gives me all the possible actions that I can use in that state), for each one of those actions, I want to compare it to v and say, all right, v is going to be equal to the maximum of v and this expression. So what is this expression? Well, first it is: get the result of taking the action in the state, and then get the min value of that. In other words, I want to find out, from that state, what is the best that the min player can do, because they're going to try and minimize the score. So whatever the resulting score is of the min value of that state, compare it to my current best value and just pick the maximum of those two, because I am trying to maximize the value. In short, what these three lines of code are doing is going through all of my possible actions and asking the question, how do I maximize the score given what my opponent is going to try to do? After this entire loop, I can just return v, and that is now the value of that particular state. And for the min player, it's the exact opposite of this, the same logic just backwards. To calculate the minimum value of a state, first we check if it's a terminal state. If it is, we return its utility.
Otherwise, we're going to now try to minimize the value of the state given all of my possible actions. So I need an initial value for v, the value of the state. And initially, I'll set it to infinity, because I know I can always get something less than infinity. So by starting with v equals infinity, I make sure that the very first value I find will be less than this value of v. And then I do the same thing, loop over all of my possible actions. And for each of the results that we could get when the max player makes their decision, let's take the minimum of that and the current value of v. So after all is said and done, I get the smallest possible value of v, which I then return back to the user.

So that, in effect, is the pseudocode for minimax. That is how we take a game and figure out what the best move to make is, by recursively using these max value and min value functions, where max value calls min value, min value calls max value, back and forth, all the way until we reach a terminal state, at which point our algorithm can simply return the utility of that particular state.

So what you might imagine is that this is going to start to be a long process, especially as games start to get more complex, as we start to add more moves and more possible options and games that might last quite a bit longer. So the next question to ask is, what sort of optimizations can we make here? How can we do better in order to use less space or take less time to be able to solve this kind of problem? And we'll take a look at a couple of possible optimizations.
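Before looking at those optimizations, here is that pseudocode as runnable Python, a minimal sketch that assumes the tic-tac-toe helpers sketched earlier (player, actions, result, terminal, utility):

import math

def minimax(board):
    # Return the optimal action for the player whose turn it is.
    if terminal(board):
        return None
    if player(board) == X:
        # Max player: pick the action whose resulting min-value is largest.
        return max(actions(board), key=lambda a: min_value(result(board, a)))
    else:
        # Min player: pick the action whose resulting max-value is smallest.
        return min(actions(board), key=lambda a: max_value(result(board, a)))

def max_value(board):
    # Value of a state when the maximizing player is to move.
    if terminal(board):
        return utility(board)
    v = -math.inf  # start as low as possible; any real option will beat this
    for action in actions(board):
        v = max(v, min_value(result(board, action)))
    return v

def min_value(board):
    # Value of a state when the minimizing player is to move.
    if terminal(board):
        return utility(board)
    v = math.inf  # start as high as possible; any real option will beat this
    for action in actions(board):
        v = min(v, max_value(result(board, action)))
    return v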
But for one, we'll take a look at this example. Again, returning to these up arrows and down arrows, let's imagine that I now am the max player, this green arrow. I am trying to make this score as high as possible. And this is an easy game where there are just two moves. I make a move, one of these three options. And then my opponent makes a move, one of these three options, based on what move I make. And as a result, we get some value. Let's look at the order in which I do these calculations and figure out if there are any optimizations I might be able to make to this calculation process. I'm going to have to look at these states one at a time. So let's say I start here on the left and say, all right, now I'm going to consider, what will the min player, my opponent, try to do here? Well, the min player is going to look at all three of their possible actions and look at their value, because these are terminal states. They're the end of the game. And so they'll see, all right, this node has a value of four, this one a value of eight, this one a value of five. And the min player is going to say, well, all right, between these three options, four, eight, and five, I'll take the smallest one. I'll take the four. So this state now has a value of four. Then I, as the max player, say, all right, if I take this action, it will have a value of four. That's the best that I can do, because the min player is going to try and minimize my score. So now, what if I take this option? We'll explore that next, and now explore what the min player would do if I choose this action. And the min player is going to say, all right, what are the three options? The min player has options between nine, three, and seven. And three is the smallest among nine, three, and seven. So we'll go ahead and say this state has a value of three. So now I, as the max player, have explored two of my three options. I know that one of my options will guarantee me a score of four, at least. And one of my options will guarantee me a score of three.

And now I consider my third option and say, all right, what happens here? Same exact logic. The min player is going to look at these three states, two, four, and six, and see the minimum possible option is two. So the min player wants the two. Now I, as the max player, have calculated all of the information by looking two layers deep, by looking at all of these nodes. And I can now say, between the four, the three, and the two, you know what? I'd rather take the four. Because if I choose this option, if my opponent plays optimally, they will try and get me to the four. But that's the best I can do. I can't guarantee a higher score. Because if I pick either of these two options, I might get a three or I might get a two. And it's true that down here is a nine. And that's the highest score out of any of the scores. So I might be tempted to say, you know what? Maybe I should take this option, because I might get the nine. But if the min player is playing intelligently, if they're making the best moves at each possible option they have when they get to make a choice, I'll be left with a three. Whereas, playing optimally, I could have guaranteed that I would get the four. So that is, in effect, the logic that I would use as a minimax player trying to maximize my score from that node there.

But it turns out it took quite a bit of computation for me to figure that out. I had to reason through all of these nodes in order to draw this conclusion. And this is for a pretty simple game where I have three choices, my opponent has three choices, and then the game's over. So what I'd like to do is come up with some way to optimize this. Maybe I don't need to do all of this calculation to still reach the conclusion that, you know what, this action to the left, that's the best that I could do. Let's go ahead and try again and try to be a little more intelligent about how I go about doing this.

So first, I start the exact same way. I don't know what to do initially, so I just have to consider one of the options and consider what the min player might do. Min has three options, four, eight, and five. And between those three options, min says four is the best they can do, because they want to try to minimize the score. Now I, the max player, will consider my second option, making this move here, and considering what my opponent would do in response. What will the min player do? Well, the min player is going to, from that state, look at their options. And they would say, all right, nine is an option, three is an option. And if I am doing the math from this initial state, doing all this calculation, when I see a three, that should immediately be a red flag for me. Because when I see a three down here at this state, I know that the value of this state is going to be at most three. It's going to be three or something less than three, even though I haven't yet looked at this last action, or even further actions if there were more actions that could be taken here. How do I know that? Well, I know that the min player is going to try to minimize my score. And if they see a three, the only way this could be something other than a three is if this remaining thing that I haven't yet looked at is less than three, which means there is no way for this value to be anything more than three, because the min player can already guarantee a three and they are trying to minimize my score. So what does that tell me?
Well, it tells me that if I choose this action, my score is going to be three or maybe even less than three if I'm unlucky. But I already know that this action will guarantee me a four. And so given that I know that this action guarantees me a score of four and this action means I can't do better than three, if I'm trying to maximize my options, there is no need for me to consider this triangle here. There is no value, no number that could go here that would change my mind between these two options. I'm always going to opt for this path that gets me a four as opposed to this path where the best I can do is a three if my opponent plays optimally. And this is going to be true for all the future states that I look at, too. If I look over here at what the min player might do over here, and I see that this state is a two, I know that this state is at most a two, because the only way this value could be something other than two is if one of these remaining states is less than a two, and so the min player would opt for that instead. So even without looking at these remaining states, I as the maximizing player can know that choosing this path to the left is going to be better than choosing either of those two paths to the right, because this one can't be better than three, this one can't be better than two, and so four in this case is the best that I can do. So I can do this kind of cut, and I can say now that this state has a value of four.

In order to do this type of calculation, I was doing a little bit more bookkeeping, keeping track of things, keeping track all the time of what is the best that I can do, what is the worst that I can do, and for each of these states saying, all right, well, if I already know that I can get a four, then if the best I can do at this state is a three, no reason for me to consider it. I can effectively prune this leaf and anything below it from the tree. And it's for that reason this approach, this optimization to minimax, is called alpha-beta pruning. Alpha and beta stand for these two values that you'll have to keep track of: the best you can do so far and the worst you can do so far. And pruning is the idea of, if I have a big, long, deep search tree, I might be able to search it more efficiently if I don't need to search through everything, if I can remove some of the nodes to try and optimize the way that I look through this entire search space.
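In code, that bookkeeping amounts to threading two extra values through the recursion. A sketch under the same assumptions as the minimax code above:

import math

def max_value(board, alpha=-math.inf, beta=math.inf):
    # alpha: best score the max player is guaranteed so far;
    # beta: best score the min player is guaranteed so far.
    if terminal(board):
        return utility(board)
    v = -math.inf
    for action in actions(board):
        v = max(v, min_value(result(board, action), alpha, beta))
        if v >= beta:
            return v  # the min player already has a better option elsewhere: prune
        alpha = max(alpha, v)
    return v

def min_value(board, alpha=-math.inf, beta=math.inf):
    if terminal(board):
        return utility(board)
    v = math.inf
    for action in actions(board):
        v = min(v, max_value(result(board, action), alpha, beta))
        if v <= alpha:
            return v  # the max player already has a better option elsewhere: prune
        beta = min(beta, v)
    return v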
So alpha-beta pruning can definitely save us a lot of time as we go about the search process by making our searches more efficient. But even then, it's still not great as games get more complex. Tic-tac-toe, fortunately, is a relatively simple game. And we might reasonably ask a question like, how many total possible tic-tac-toe games are there? You can think about it. You can try and estimate how many moves are there at any given point, how many moves long can the game last. It turns out there are about 255,000 possible tic-tac-toe games that can be played. But compare that to a more complex game, something like a game of chess, for example. Far more pieces, far more moves, games that last much longer. How many total possible chess games could there be? It turns out that after just four moves each, four moves by the white player, four moves by the black player, there are 288 billion possible chess games that can result from that situation, after just four moves each.

And going even further, if you look at entire chess games and how many possible chess games there could be as a result, there are more than 10 to the 29,000 possible chess games, far more chess games than could ever be considered. And this is a pretty big problem for the minimax algorithm, because the minimax algorithm starts with an initial state, considers all the possible actions, and all the possible actions after that, all the way until we get to the end of the game. And that's going to be a problem if the computer is going to need to look through this many states, which is far more than any computer could ever do in any reasonable amount of time. So what do we do in order to solve this problem? Instead of looking through all these states, which is totally intractable for a computer, we need some better approach. And it turns out that better approach generally takes the form of something called depth-limited minimax, where normally minimax is depth-unlimited. We just keep going layer after layer, move after move, until we get to the end of the game. Depth-limited minimax is instead going to say, you know what, after a certain number of moves, maybe I'll look 10 moves ahead, maybe I'll look 12 moves ahead, but after that point, I'm going to stop and not consider additional moves that might come after that, just because it would be computationally intractable to consider all of those possible options.

But what do we do after we get 10 or 12 moves deep, when we arrive at a situation where the game's not over? Minimax still needs a way to assign a score to that game board or game state to figure out what its current value is, which is easy to do if the game is over, but not so easy to do if the game is not yet over. So in order to do that, we need to add one additional feature to depth-limited minimax called an evaluation function, which is just some function that is going to estimate the expected utility of a game from a given state. So in a game like chess, if you imagine that a game value of 1 means white wins, negative 1 means black wins, 0 means it's a draw, then you might imagine that a score of 0.8 means white is very likely to win, though certainly not guaranteed. And you would have an evaluation function that estimates how good the game state happens to be. And depending on how good that evaluation function is, that is ultimately what's going to constrain how good the AI is. The better the AI is at estimating how good or how bad any particular game state is, the better the AI is going to be able to play that game. If the evaluation function is worse, not as good at estimating what the expected utility is, then it's going to be a whole lot harder.

And you can imagine trying to come up with these evaluation functions. In chess, for example, you might write an evaluation function based on how many pieces you have as compared to how many pieces your opponent has, because each piece has a value. And your evaluation function probably needs to be a little bit more complicated than that to consider other possible situations that might arise as well. And there are many other variants on minimax that add additional features in order to help it perform better under these larger, more computationally intractable situations where we couldn't possibly explore all of the possible moves. So we need to figure out how to use evaluation functions and other techniques to be able to play these games ultimately better.
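Depth-limited minimax with an evaluation function can be sketched the same way; evaluate here is a hypothetical placeholder for whatever game-specific estimate you design:

import math

def evaluate(board):
    # Placeholder evaluation function: estimate the expected utility of a
    # non-terminal state (for chess, perhaps a weighted piece count).
    return 0

def max_value(board, depth):
    if terminal(board):
        return utility(board)
    if depth == 0:
        return evaluate(board)  # search budget exhausted: estimate instead
    v = -math.inf
    for action in actions(board):
        v = max(v, min_value(result(board, action), depth - 1))
    return v

def min_value(board, depth):
    if terminal(board):
        return utility(board)
    if depth == 0:
        return evaluate(board)
    v = math.inf
    for action in actions(board):
        v = min(v, max_value(result(board, action), depth - 1))
    return v

# For example, max_value(initial_state(), depth=10) looks at most 10 moves ahead.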
But this now was a look at this kind of adversarial search, these search problems where we have situations where I am trying to play against some sort of opponent. And these search problems show up all over the place throughout artificial intelligence. We've been talking a lot today about more classical search problems, like trying to find directions from one location to another. But any time an AI is faced with trying to make a decision, like what do I do now in order to do something that is rational, or do something that is intelligent, or trying to play a game, like figuring out what move to make, these sorts of algorithms can really come in handy. It turns out that for tic-tac-toe, the solution is pretty simple, because it's a small game. xkcd has famously put together a web comic that will tell you exactly the optimal move to make no matter what your opponent happens to do. This type of thing is not quite as possible for a much larger game like checkers or chess, for example, where chess is totally computationally intractable for most computers to be able to explore all the possible states. So we really need our AI to be far more intelligent about how they go about trying to deal with these problems and how they go about taking this environment that they find themselves in and ultimately searching for one of these solutions. So this, then, was a look at search in artificial intelligence. Next time, we'll take a look at knowledge, thinking about how it is that our AIs are able to know information, reason about that information, and draw conclusions, all in our look at AI and the principles behind it. We'll see you next time.

[INTRO MUSIC]

All right, welcome back, everyone, to an introduction to artificial intelligence with Python. Last time, we took a look at search problems, in particular, where we have AI agents that are trying to solve some sort of problem by taking actions in some sort of environment, whether that environment is trying to take actions by playing moves in a game or whether those actions are something like trying to figure out where to make turns in order to get driving directions from point A to point B. This time, we're going to turn our attention more generally to just this idea of knowledge, the idea that a lot of intelligence is based on knowledge, especially if we think about human intelligence. People know information. We know facts about the world. And using that information that we know, we're able to draw conclusions, reason about the information that we know in order to figure out how to do something or figure out some other piece of information that we conclude based on the information we already have available to us.

What we'd like to focus on now is the ability to take this idea of knowledge and being able to reason based on knowledge and apply those ideas to artificial intelligence. In particular, we're going to be building what are known as knowledge-based agents, agents that are able to reason and act by representing knowledge internally. Somehow inside of our AI, they have some understanding of what it means to know something. And ideally, they have some algorithms or some techniques they can use based on that knowledge that they know in order to figure out the solution to a problem or figure out some additional piece of information that can be helpful in some sense. So what do we mean by reasoning based on knowledge to be able to draw conclusions?
Well, let's look at a simple example drawn from the world of Harry Potter. We take one sentence that we know to be true. Imagine: if it didn't rain, then Harry visited Hagrid today. So one fact that we might know about the world. And then we take another fact. Harry visited Hagrid or Dumbledore today, but not both. So it tells us something about the world, that Harry either visited Hagrid but not Dumbledore, or Harry visited Dumbledore but not Hagrid. And now we have a third piece of information about the world, that Harry visited Dumbledore today. So we now have three pieces of information, three facts, inside of a knowledge base, so to speak: information that we know.

And now we, as humans, can try and reason about this and figure out, based on this information, what additional information can we begin to conclude? And well, looking at these last two statements, Harry either visited Hagrid or Dumbledore but not both, and we know that Harry visited Dumbledore today, well, then it's pretty reasonable that we could draw the conclusion that, you know what, Harry must not have visited Hagrid today. Because based on a combination of these two statements, we can draw this inference, so to speak, a conclusion that Harry did not visit Hagrid today. But it turns out we can even do a little bit better than that, get some more information, by taking a look at this first statement and reasoning about that. This first statement says, if it didn't rain, then Harry visited Hagrid today. So what does that mean? In all cases where it didn't rain, then we know that Harry visited Hagrid. But if we also know now that Harry did not visit Hagrid, then that tells us something about our initial premise that we were thinking about. In particular, it tells us that it did rain today. Because we can reason: if it didn't rain, then Harry would have visited Hagrid. But we know for a fact that Harry did not visit Hagrid today. So it's this kind of reasoning, this sort of logical reasoning, where we use logic based on the information that we know in order to reach conclusions, that is going to be the focus of what we're going to be talking about today. How can we make our artificial intelligence logical, so that it can perform the same kinds of deduction, the same kinds of reasoning that we've been doing so far?

Of course, humans reason about logic generally in terms of human language. I just now was speaking in English, talking in English about these sentences and trying to reason through how it is that they relate to one another. We're going to need to be a little bit more formal when we turn our attention to computers and being able to encode this notion of logic and truth and falsehood inside of a machine. So we're going to need to introduce a few more terms and a few symbols that will help us reason through this idea of logic inside of an artificial intelligence. And we'll begin with the idea of a sentence. Now, a sentence in a natural language like English is just something that I'm saying, like what I'm saying right now. In the context of AI, though, a sentence is just an assertion about the world in what we're going to call a knowledge representation language, some way of representing knowledge inside of our computers. And the way that we're going to spend most of today reasoning about knowledge is through a type of logic known as propositional logic. There are a number of different types of logic, some of which we'll touch on.
But propositional logic is based on a logic of propositions, or just statements about the world. And so we begin in propositional logic with a notion of propositional symbols. We will have certain symbols that are oftentimes just letters, something like P or Q or R, where each of those symbols is going to represent some fact or sentence about the world. So P, for example, might represent the fact that it is raining. And so P is going to be a symbol that represents that idea. And Q, for example, might represent Harry visited Hagrid today. Each of these propositional symbols represents some sentence or some fact about the world.

But in addition to just having individual facts about the world, we want some way to connect these propositional symbols together in order to reason more complexly about other facts that might exist inside of the world in which we're reasoning. So in order to do that, we'll need to introduce some additional symbols that are known as logical connectives. Now, there are a number of these logical connectives. But five of the most important, and the ones we're going to focus on today, are these five up here, each represented by a logical symbol. Not is represented by this symbol here; and is represented by sort of an upside-down V; or is represented by a V shape. Implication, and we'll talk about what that means in just a moment, is represented by an arrow. And biconditional, again, we'll talk about what that means in a moment, is represented by these double arrows. But these five logical connectives are the main ones we're going to be focusing on in terms of thinking about how it is that a computer can reason about facts and draw conclusions based on the facts that it knows. But in order to get there, we need to take a look at each of these logical connectives and build up an understanding for what it is that they actually mean.

So let's go ahead and begin with the not symbol, so this not symbol here. And what we're going to show for each of these logical connectives is what we're going to call a truth table, a table that demonstrates what this word not means when we attach it to a propositional symbol or any sentence inside of our logical language. And so the truth table for not is shown right here. If P, some propositional symbol, or some other sentence even, is false, then not P is true. And if P is true, then not P is false. So you can imagine that placing this not symbol in front of some sentence of propositional logic just says the opposite of that. So if, for example, P represented it is raining, then not P would represent the idea that it is not raining. And as you might expect, if P is false, meaning if the sentence, it is raining, is false, well then the sentence not P must be true. The sentence that it is not raining is therefore true. So not, you can imagine, just takes whatever is in P and it inverts it. It turns false into true and true into false, much analogously to what the English word not means, just taking whatever comes after it and inverting it to mean the opposite.

Next up, and also very English-like, is this idea of and, represented by this upside-down V shape or this pointed shape. As opposed to just taking a single argument the way not does, where we have P and we have not P, and is going to combine two different sentences in propositional logic together. So I might have one sentence P and another sentence Q, and I want to combine them together to say P and Q.
And the general logic for what P and Q means is it means that both of its operands are true. P is true and also Q is true. And so here's what that truth table looks like. This time we have two variables, P and Q. And when we have two variables, each of which can be in two possible states, true or false, that leads to two squared or four possible combinations of truth and falsehood. So we have P is false and Q is false. We have P is false and Q is true. P is true and Q is false. And then P and Q both are true. And those are the only four possibilities for what P and Q could mean. And in each of those situations, this third column here, P and Q, is telling us a little bit about what it actually means for P and Q to be true. And we see that the only case where P and Q is true is in this fourth row here, where P happens to be true, Q also happens to be true. And in all other situations, P and Q is going to evaluate to false. So this, again, is much in line with what our intuition of and might mean. If I say P and Q, I probably mean that I expect both P and Q to be true.

Next up, also potentially consistent with what we mean, is this word or, represented by this V shape, sort of an upside-down and symbol. And or, as the name might suggest, is true if either of its arguments are true. As long as P is true or Q is true, then P or Q is going to be true. Which means the only time that P or Q is false is if both of its operands are false. If P is false and Q is false, then P or Q is going to be false. But in all other cases, at least one of the operands is true. Maybe they're both true, in which case P or Q is going to evaluate to true. Now, this is mostly consistent with the way that most people might use the word or, in the sense of speaking the word or in normal English, though there are sometimes cases when we might say or where we mean P or Q, but not both, where we mean, sort of, it can only be one or the other. It's important to note that this symbol here, this or, means P or Q or both, that those are totally OK. As long as either or both of them are true, then the or is going to evaluate to be true as well. It's only in the case where all of the operands are false that P or Q ultimately evaluates to false. In logic, there's another symbol known as the exclusive or, which encodes this idea of exclusivity of one or the other, but not both. But we're not going to be focusing on that today. Whenever we talk about or, we're always talking about either or both, in this case, as represented by this truth table here.

So those now are not, and, and or. And next up is what we might call implication, as denoted by this arrow symbol. So we have P and Q. And this sentence here will generally read as P implies Q. And what P implies Q means is that if P is true, then Q is also true. So I might say something like, if it is raining, then I will be indoors. Meaning, it is raining implies I will be indoors, as the logical sentence that I'm saying there. And the truth table for this can sometimes be a little bit tricky. So obviously, if P is true and Q is true, then P implies Q. That's true. That definitely makes sense. And it should also stand to reason that when P is true and Q is false, then P implies Q is false. Because if I said to you, if it is raining, then I will be indoors, and it is raining, but I'm not indoors? Well, then it would seem to be that my original statement was not true. P implies Q means that if P is true, then Q also needs to be true.
And if it's not, well, then the statement is false. What's also worth noting, though, is what happens when P is false. When P is false, the implication makes no claim at all. If I say something like, if it is raining, then I will be indoors, and it turns out it's not raining, then in that case, I am not making any statement as to whether or not I will be indoors. P implies Q just means that if P is true, Q must be true. But if P is not true, then we make no claim about whether or not Q is true at all. So in either case, if P is false, it doesn't matter what Q is. Whether it's false or true, we're not making any claim about Q whatsoever. We can still evaluate the implication to true. The only way that the implication is ever false is if our premise, P, is true, but the conclusion that we're drawing, Q, happens to be false. So in that case, we would say P does not imply Q.

Finally, the last connective that we'll discuss is this biconditional. You can think of a biconditional as a condition that goes in both directions. So originally, when I said something like, if it is raining, then I will be indoors, I didn't say what would happen if it wasn't raining. Maybe I'll be indoors, maybe I'll be outdoors. This biconditional, you can read as an if and only if. So I can say, I will be indoors if and only if it is raining, meaning if it is raining, then I will be indoors, and if I am indoors, it's reasonable to conclude that it is also raining. So this biconditional is only true when P and Q are the same. So if P is true and Q is true, then this biconditional is also true. P implies Q, but also the reverse is true: Q also implies P. So if P and Q both happen to be false, we would still say it's true. But in either of the other two situations, this P if and only if Q is going to ultimately evaluate to false. So a lot of trues and falses going on there, but these five basic logical connectives are going to form the core of the language of propositional logic, the language that we're going to use in order to describe ideas, and the language that we're going to use in order to reason about those ideas in order to draw conclusions.
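Since these five connectives are just functions from truth values to truth values, it can help to see them spelled out as code. A small sketch using plain Python booleans rather than any logic library; the dictionary of lambdas is only an illustration device:

from itertools import product

# One plain Python function per connective; "P implies Q" is only false
# when P is true and Q is false, and the biconditional is truth-value equality.
connectives = {
    "not P":       lambda p, q: not p,
    "P and Q":     lambda p, q: p and q,
    "P or Q":      lambda p, q: p or q,
    "P implies Q": lambda p, q: (not p) or q,
    "P iff Q":     lambda p, q: p == q,
}

for name, f in connectives.items():
    print(name)
    for p, q in product([False, True], repeat=2):
        print(f"  P={p!s:<5} Q={q!s:<5} -> {f(p, q)}")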
So let's now take a look at some of the additional terms that we'll need to know about in order to go about trying to form this language of propositional logic and writing AI that's actually able to understand this sort of logic. The next thing we're going to need is the notion of what is actually true about the world. We have a whole bunch of propositional symbols, P and Q and R and maybe others, but we need some way of knowing what actually is true in the world. Is P true or false? Is Q true or false? So on and so forth. And to do that, we'll introduce the notion of a model. A model just assigns a truth value, where a truth value is either true or false, to every propositional symbol. In other words, it's creating what we might call a possible world. So let me give an example. If, for example, I have two propositional symbols, P is it is raining and Q is it is a Tuesday, a model just takes each of these two symbols and assigns a truth value to them, either true or false. So here's a sample model. In this model, in other words, in this possible world, P is true, meaning it is raining, and Q is false, meaning it is not a Tuesday. But there are other possible worlds or other models as well. There is some model where both of these variables are true, some model where both of these variables are false. In fact, if there are n variables that are propositional symbols like this that are either true or false, then the number of possible models is 2 to the n, because each of these variables within my model could be set to either true or false if I don't know any information about it.

So now that I have the symbols and the connectives that I'm going to need in order to construct these pieces of knowledge, we need some way to represent that knowledge. And to do so, we're going to allow our AI access to what we'll call a knowledge base. And a knowledge base is really just a set of sentences that our AI knows to be true, some set of sentences in propositional logic that are things that our AI knows about the world. And so we might tell our AI some information, information about a situation that it finds itself in, or a situation about a problem that it happens to be trying to solve. And we would give that information to the AI that the AI would store inside of its knowledge base. And what happens next is the AI would like to use that information in the knowledge base to be able to draw conclusions about the rest of the world.

And what do those conclusions look like? Well, to understand those conclusions, we'll need to introduce one more idea, one more symbol. And that is the notion of entailment. So this sentence here, with this double turnstile and these Greek letters, this is the Greek letter alpha and the Greek letter beta. And we read this as alpha entails beta. And alpha and beta here are just sentences in propositional logic. And what this means is that alpha entails beta means that in every model, in other words, in every possible world in which sentence alpha is true, then sentence beta is also true. So if something entails something else, if alpha entails beta, it means that if I know alpha to be true, then beta must therefore also be true. So if my alpha is something like I know that it is a Tuesday in January, then a reasonable beta might be something like I know that it is January. Because in all worlds where it is a Tuesday in January, I know for sure that it must be January, just by definition. This first statement or sentence about the world entails the second statement. And we can reasonably use deduction based on that first sentence to figure out that the second sentence is, in fact, true as well.

And ultimately, it's this idea of entailment that we're going to try and encode into our computer. We want our AI agent to be able to figure out what the possible entailments are. We want our AI to be able to take these three sentences, sentences like, if it didn't rain, Harry visited Hagrid; that Harry visited Hagrid or Dumbledore, but not both; and that Harry visited Dumbledore. And just using that information, we'd like our AI to be able to infer, or figure out, that using these three sentences inside of a knowledge base, we can draw some conclusions. In particular, we can draw the conclusions here that, one, Harry did not visit Hagrid today, and, two, that it did, in fact, rain today. And this process is known as inference. And that's what we're going to be focusing on today, this process of deriving new sentences from old ones. I give you these three sentences, you put them in the knowledge base of, say, the AI, and the AI is able to use some sort of inference algorithm to figure out that these two sentences must also be true. And that is how we define inference.
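Stated as code, the definition of entailment reads almost word for word. A minimal sketch, where for illustration a sentence is represented as a Python function from a model (a dict mapping symbol names to truth values) to a truth value; that representation is an assumption of this sketch, not the course's:

from itertools import product

def entails(kb, query, symbols):
    # alpha entails beta iff beta is true in every model where alpha is true.
    for values in product([False, True], repeat=len(symbols)):  # all 2^n models
        model = dict(zip(symbols, values))
        if kb(model) and not query(model):
            return False  # a world where the knowledge holds but the query fails
    return True

# "It is a Tuesday in January" entails "It is January":
kb = lambda m: m["Tuesday"] and m["January"]
query = lambda m: m["January"]
print(entails(kb, query, ["Tuesday", "January"]))  # True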
So let's take a look at an inference example to see how we might actually go about inferring things in a human sense before we take a more algorithmic approach to see how we could encode this idea of inference in AI. And we'll see there are a number of ways that we can actually achieve this. So again, we'll deal with a couple of propositional symbols. We'll deal with P, Q, and R. P is it is a Tuesday. Q is it is raining. And R is Harry will go for a run. Three propositional symbols that we are just defining to mean this. We're not saying anything yet about whether they're true or false. We're just defining what they are.

Now, we'll give ourselves or an AI access to a knowledge base, abbreviated to KB, the knowledge that we know about the world. We know this statement. All right. So let's try to parse it. The parentheses here are just used for precedence, so we can see what associates with what. But you would read this as P and not Q implies R. All right. So what does that mean? Let's take it piece by piece. P is it is a Tuesday. Q is it is raining, so not Q is it is not raining. And R is Harry will go for a run. So the way to read this entire sentence, in human natural language at least, is if it is a Tuesday and it is not raining, then Harry will go for a run. So if it is a Tuesday and it is not raining, then Harry will go for a run. And that is now inside of our knowledge base. And let's now imagine that our knowledge base has two other pieces of information as well. It has information that P is true, that it is a Tuesday. And we also have the information not Q, that it is not raining, that this sentence Q, it is raining, happens to be false. And those are the three sentences that we have access to: P and not Q implies R, P, and not Q.

Using that information, we should be able to draw some inferences. P and not Q is only true if both P and not Q are true. All right, we know that P is true and we know that not Q is true. So we know that this whole expression is true. And the definition of implication is, if this whole thing on the left is true, then this thing on the right must also be true. So if we know that P and not Q is true, then R must be true as well. So the inference we should be able to draw from all of this is that R is true, and we know that Harry will go for a run, by taking this knowledge inside of our knowledge base and being able to reason based on that idea.
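That chain of reasoning can be checked directly with plain Python booleans. A tiny sketch of the same inference:

# With P true and Q false, the implication (P and not Q) -> R can only
# be satisfied if R is true, which is exactly the inference drawn above.
P, Q = True, False
for R in (False, True):
    implication = (not (P and not Q)) or R  # material implication
    kb_holds = implication and P and (not Q)
    print(f"R={R}: knowledge base satisfied? {kb_holds}")
# Only R=True satisfies the knowledge base, so R (Harry will go for a run) is entailed.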
And so this ultimately is the beginning of what we might consider to be some sort of inference algorithm, some process that we can use to try and figure out whether or not we can draw some conclusion. And ultimately, what these inference algorithms are going to answer is the central question about entailment. Given some query about the world, something we're wondering about the world, and we'll call that query alpha, the question we want to ask using these inference algorithms is: does KB, our knowledge base, entail alpha? In other words, using only the information we know inside of our knowledge base, the knowledge that we have access to, can we conclude that this sentence alpha is true? And that's ultimately what we would like to do. So how can we do that? How can we go about writing an algorithm that can look at this knowledge base and figure out whether or not this query alpha is actually true? Well, it turns out there are a couple of different algorithms for doing so. And one of the simplest, perhaps, is known as model checking.

Now, remember that a model is just some assignment of all of the propositional symbols inside of our language to a truth value, true or false. And you can think of a model as a possible world, that there are many possible worlds where different things might be true or false, and we can enumerate all of them. And the model checking algorithm does exactly that. So what does our model checking algorithm do? Well, if we wanted to determine if our knowledge base entails some query alpha, then we are going to enumerate all possible models. In other words, consider all possible values of true and false for our variables, all possible states in which our world can be in. And if in every model where our knowledge base is true, alpha is also true, then we know that the knowledge base entails alpha. So let's take a closer look at that sentence and try and figure out what it actually means. If we know that in every model, in other words, in every possible world, no matter what assignment of true and false to variables you give, whenever our knowledge is true, what we know to be true is true, this query alpha is also true, well, then it stands to reason that as long as our knowledge base is true, then alpha must also be true. And so this is going to form the foundation of our model checking algorithm. We're going to enumerate all of the possible worlds and ask ourselves, whenever the knowledge base is true, is alpha true? And if that's the case, then we know alpha to be true. And otherwise, there is no entailment. Our knowledge base does not entail alpha.

All right. So this is a little bit abstract, but let's take a look at an example to try and put real propositional symbols to this idea. So again, we'll work with the same example. P is it is a Tuesday. Q is it is raining. R is Harry will go for a run. Our knowledge base contains these pieces of information: P and not Q implies R. We also know P, it is a Tuesday, and not Q, it is not raining. And our query, our alpha in this case, the thing we want to ask, is R. We want to know, is it guaranteed, is it entailed, that Harry will go for a run? So the first step is to enumerate all of the possible models. We have three propositional symbols here, P, Q, and R, which means we have 2 to the third power, or eight possible models: false, false, false; false, false, true; false, true, false; false, true, true; et cetera. Eight possible ways you could assign true and false to all of these symbols. And we might ask, in each one of them, is the knowledge base true? Here are the set of things that we know. To which of these worlds could this knowledge base possibly apply? In which world is this knowledge base true?

Well, in the knowledge base, for example, we know P. We know it is a Tuesday, which means we know that these first four rows, where P is false, none of those are going to work for this particular knowledge base. Our knowledge base is not true in those worlds. Likewise, we also know not Q. We know that it is not raining. So any of these models where Q is true, like these two and these two here, those aren't going to work either, because we know that Q is not true. And finally, we also know that P and not Q implies R, which means that when P is true, as P is true here, and Q is false, as Q is false in these two, then R must be true. And if ever P is true, Q is false, but R is also false, well, that doesn't satisfy this implication here. That implication does not hold true under those situations.
So we could say that, for our knowledge base, we can conclude under which of these possible worlds our knowledge base is true and under which of the possible worlds our knowledge base is false. And it turns out there is only one possible world where our knowledge base is actually true. In some cases, there might be multiple possible worlds where the knowledge base is true. But in this case, it just so happens that there's only one, one possible world where we can definitively say something about our knowledge base. And in this case, we would look at the query. The query is R: is R true? R is true, and so, as a result, we can draw that conclusion. And so this is the idea of model checking: enumerate all the possible models, and look in those possible models to see whether or not, whenever our knowledge base is true, the query in question is true as well.
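The same eight-model check can be run with the entails sketch from earlier, again treating sentences as functions of a model just for illustration:

symbols = ["P", "Q", "R"]  # P: it is a Tuesday; Q: it is raining; R: Harry will run

def kb(m):
    # (P and not Q) implies R, together with P and not Q.
    implication = (not (m["P"] and not m["Q"])) or m["R"]
    return implication and m["P"] and not m["Q"]

query = lambda m: m["R"]
print(entails(kb, query, symbols))  # True: the single model satisfying the KB has R true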
So let's now take a look at how we might actually go about writing this in a programming language like Python. Take a look at some actual code that would encode this notion of propositional symbols and logic and these connectives like and and or and not and implication and so forth, and see what that code might actually look like. So I've written in advance a logic library that's more detailed than we need to worry about entirely today. But the important thing is that we have one class for every type of logical symbol or connective that we might have. So we just have one class for logical symbols, for example, where every symbol is going to represent and store some name for that particular symbol. And we also have a class for not that takes an operand. So we might say not one symbol, to say something is not true, or some other sentence is not true. We have one for and, one for or, so on and so forth.

And I'll just demonstrate how this works. And you can take a look at the actual logic.py later on. But I'll go ahead and call this file harry.py. We're going to store information about this world of Harry Potter, for example. So I'll go ahead and import from my logic module. I'll import everything. And in this library, in order to create a symbol, you use capital-S Symbol. And I'll create a symbol for rain, to mean it is raining, for example. And I'll create a symbol for Hagrid; Harry visited Hagrid is what this symbol is going to mean. So this symbol means it is raining. This symbol means Harry visited Hagrid. And I'll add another symbol called Dumbledore, for Harry visited Dumbledore. Now, I'd like to save these symbols so that I can use them later as I do some logical analysis. So I'll go ahead and save each one of them inside of a variable, like rain, hagrid, and dumbledore, though you could call the variables anything. And now that I have these logical symbols, I can use logical connectives to combine them together. So for example, if I have a sentence like And(rain, hagrid), for example, which is not necessarily true, but just for demonstration, I can now try and print out sentence.formula, which is a function I wrote that takes a sentence in propositional logic and just prints it out so that we, the programmers, can now see this in order to get an understanding for how it actually works. So if I run python harry.py, what we'll see is this sentence in propositional logic, rain and Hagrid. This is the logical representation of what we have here in our Python program: an And whose arguments are rain and hagrid. So we're saying rain and Hagrid by encoding that idea.

And this is quite common in Python object-oriented programming, where you have a number of different classes, and you pass arguments into them in order to create a new And object, for example, in order to represent this idea. But now what I'd like to do is somehow encode the knowledge that I have about the world in order to solve that problem from the beginning of class, where we talked about trying to figure out who Harry visited and trying to figure out if it's raining or if it's not raining.

And so what knowledge do I have? I'll go ahead and create a new variable called knowledge. And what do I know? Well, I know the very first sentence that we talked about was the idea that if it is not raining, then Harry will visit Hagrid. So all right, how do I encode the idea that it is not raining? Well, I can use Not and then the rain symbol. So here's me saying that it is not raining. And now the implication is that if it is not raining, then Harry visited Hagrid. So I'll wrap this inside of an Implication to say: if it is not raining, this first argument to the implication, then Harry visited Hagrid. So I'm saying Implication, the premise is that it's not raining, and if it is not raining, then Harry visited Hagrid. And I can print out knowledge.formula to see the logical formula equivalent of that same idea. So I run python harry.py. And this is the logical formula that we see as a result, which is a text-based version of what we were looking at before, that if it is not raining, then that implies that Harry visited Hagrid.

But there was additional information that we had access to as well. In this case, we had access to the fact that Harry visited either Hagrid or Dumbledore. So how do I encode that? Well, this means that in my knowledge, I've really got multiple pieces of knowledge going on. I know one thing and another thing and another thing. So I'll go ahead and wrap all of my knowledge inside of an And. And I'll move things on to new lines just for good measure. But I know multiple things. So I'm saying knowledge is an And of multiple different sentences. I know multiple different sentences to be true. One such sentence that I know to be true is this implication, that if it is not raining, then Harry visited Hagrid. Another such sentence that I know to be true is Or(hagrid, dumbledore). In other words, Hagrid or Dumbledore is true, because I know that Harry visited Hagrid or Dumbledore. But I know more than that, actually. That initial sentence from before said that Harry visited Hagrid or Dumbledore, but not both. So now I want a sentence that will encode the idea that Harry didn't visit both Hagrid and Dumbledore. Well, the notion of Harry visiting Hagrid and Dumbledore would be represented like this, an And of hagrid and dumbledore. And if that is not true, if I want to say not that, then I'll just wrap this whole thing inside of a Not.

So now these three lines, line 8 says that if it is not raining, then Harry visited Hagrid. Line 9 says Harry visited Hagrid or Dumbledore. And line 10 says Harry didn't visit both Hagrid and Dumbledore, that it is not true that both the Hagrid symbol and the Dumbledore symbol are true. Only one of them can be true. And finally, the last piece of information that I knew was the fact that Harry visited Dumbledore. So these now are the pieces of knowledge that I know, one sentence and another sentence and another and another. And I can print out what I know just to see it a little bit more visually.
And here now is a logical representation of the information that my computer is internally representing using these various different Python objects. Again, take a look at logic.py if you want to see exactly how it implements this, but there's no need to worry too much about all of the details. We're saying that if it is not raining, then Harry visited Hagrid. We're saying that Hagrid or Dumbledore is true. We're saying it is not the case that Hagrid and Dumbledore are both true. And we also know that Dumbledore is true. So this long logical sentence represents our knowledge base; it is the thing that we know. And now what we'd like to do is use model checking to ask a query, a question like: based on this information, do I know whether or not it's raining? We as humans were able to logic our way through it and figure out that, based on these sentences, yes, it must have been raining. But now we'd like for the computer to do that as well. So let's take a look at the model checking algorithm, which is going to follow the same pattern that we drew out in pseudocode a moment ago. I've defined a function in logic.py, which you can take a look at, called model_check. model_check takes two arguments: the knowledge that I already know, and the query. The idea is that in order to do model checking, I need to enumerate all of the possible models, and for each of the possible models, I need to ask: is the knowledge base true, and is the query true? So the first thing I need to do is somehow enumerate all of the possible models, meaning for all of the symbols that exist, I need to assign true and false to each one of them in every combination. And here is the way we're going to do that. I've defined another helper function internally that we'll get to in just a moment. But this function starts by getting all of the symbols in both the knowledge and the query, figuring out what symbols I'm dealing with. In this case, the symbols are rain and hagrid and dumbledore, but there might be other symbols depending on the problem, and we'll soon see examples where we need additional symbols to represent the problem. And then we're going to run this check_all function, a helper function that recursively calls itself, checking every possible configuration of propositional symbols. So we start by looking at this check_all function. What do we do? The line "if not symbols" means we've finished assigning all of the symbols; we've assigned every symbol a value. At the start we haven't done that, but if we ever do, then we check: in this model, is the knowledge true? That's what the next line is saying. If we evaluate the knowledge's propositional logic formula using the model's assignment of truth values, and the knowledge is true, then we should return true only if the query is true, because if the knowledge is true, we want the query to be true as well in order for there to be entailment. Otherwise, there is no entailment: there is no entailment if there's ever a situation where what we know in our knowledge base is true, but the query, the thing we're asking, happens to be false.
So this line here is checking that same idea: in all worlds where the knowledge is true, the query must also be true. Otherwise, we can just return true, because if the knowledge isn't true, then we don't care. This is equivalent to when we were enumerating the table from a moment ago: in all of the situations where the knowledge base wasn't true, those seven rows, we didn't care whether or not our query was true. We only check whether the query is true when the knowledge base is actually true, which was just the one green highlighted row. So that logic is encoded using that statement there. Otherwise, if we haven't assigned all of the symbols yet, which at the very start we haven't, then the first thing we do is pop one of the symbols. I make a copy of the symbols first, so as to preserve the existing set, and then pop one symbol off of the remaining symbols, picking one of the symbols to consider. Then I create one copy of the model where that symbol is true, and a second copy of the model where that symbol is false. So I now have two copies of the model, one where the symbol is true and one where the symbol is false, and I need to make sure that the entailment holds in both of those models. So I recursively call check_all on the model where the symbol is true, and check_all on the model where the symbol is false. Again, you can take a look at that function to get a sense of how exactly this logic works. But in effect, what it's doing is recursively calling this check_all function again and again, and on every level of the recursion, we're saying: pick a new symbol that we haven't yet assigned, assign it to true and assign it to false, and then check that the entailment holds in both cases. Because ultimately, I need to check every possible world; I need to take every combination of symbols and try every combination of true and false in order to figure out whether the entailment relation actually holds. So that function we've written for you. But in order to use that function inside of harry.py, what I'll write is something like this: I would like to model check based on the knowledge, and then I provide as a second argument the query, the thing I want to ask. What I want to ask in this case is: is it raining? So model_check takes two arguments. The first argument is the information that I know, this knowledge, which in this case is the information given to me at the beginning. And the second argument, rain, encodes the query: based on this knowledge, do I know for sure that it is raining? I can print out the result of that, and when I run this program, I see that the answer is true. Based on this information, I can conclusively say that it is raining, because using this model checking algorithm, we were able to check that in every world where this knowledge is true, it is raining. In other words, there is no world where this knowledge is true and it is not raining. So we can conclude that it is, in fact, raining.
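Based on that description, here is a reconstruction of what model_check in logic.py might look like. It's a sketch: it assumes that each sentence class provides a symbols() method returning the set of symbol names it mentions, and an evaluate(model) method returning its truth value under a given assignment, both of which the walkthrough alludes to:

```python
def model_check(knowledge, query):
    """Check whether the knowledge base entails the query by
    enumerating every possible model (a sketch reconstructed
    from the description above)."""

    def check_all(knowledge, query, symbols, model):
        # If the model assigns a value to every symbol...
        if not symbols:
            # ...then wherever the knowledge base is true,
            # the query must be true as well.
            if knowledge.evaluate(model):
                return query.evaluate(model)
            # If the knowledge base is false in this world, we don't care.
            return True
        else:
            # Pick one of the remaining unassigned symbols.
            remaining = symbols.copy()
            p = remaining.pop()

            # One copy of the model where that symbol is true...
            model_true = model.copy()
            model_true[p] = True

            # ...and one where it is false.
            model_false = model.copy()
            model_false[p] = False

            # Entailment must hold in both branches.
            return (check_all(knowledge, query, remaining, model_true) and
                    check_all(knowledge, query, remaining, model_false))

    # All symbols appearing in either the knowledge or the query.
    symbols = set.union(knowledge.symbols(), query.symbols())
    return check_all(knowledge, query, symbols, dict())
```

Called as print(model_check(knowledge, rain)) with the knowledge base sketched earlier, this would print True.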
And this sort of logic can be applied to a number of different types of problems. If you're confronted with a problem where some sort of logical deduction can be used to solve it, you might think about what propositional symbols you would need in order to represent that information, and what statements in propositional logic you might use in order to encode the information that you know. And this process of taking a problem and figuring out what propositional symbols to use in order to encode it, how to represent it logically, is known as knowledge engineering: software engineers and AI engineers take a problem and figure out how to distill it down into knowledge that is representable by a computer. And if we can take any general-purpose problem, some problem we find in the human world, and turn it into a problem that computers know how to solve, using any number of different variables, then we can take a computer that is able to do something like model checking, or some other inference algorithm, and actually figure out how to solve that problem. So now we'll take a look at a few examples of knowledge engineering in practice, of taking some problem and figuring out how we can apply logical symbols and logical formulas to encode that idea. And we'll start with a very popular board game in the US and the UK known as Clue. Now, in the game of Clue, there are a number of different factors going on. But the basic premise, if you've never played it before, is that there are a number of different people; for now, we'll just use three: Colonel Mustard, Professor Plum, and Miss Scarlet. There are a number of different rooms, like a ballroom, a kitchen, and a library. And there are a number of different weapons: a knife, a revolver, and a wrench. Three of these, one person, one room, and one weapon, are the solution to the mystery: the murderer, the room they were in, and the weapon they used. At the beginning of the game, all of these cards are randomly shuffled together, and three of them, one person, one room, and one weapon, are placed into a sealed envelope that we don't get to see. We would like to figure out, using some sort of logical process, what's inside the envelope: which person, which room, and which weapon. And we do so by looking at some, but not all, of the other cards, to try to figure out what might be going on. So this is a very popular game, but let's now try to formalize it and see if we could train a computer to play this game by reasoning through it logically. To do this, we'll begin by thinking about what propositional symbols we're ultimately going to need. Remember, again, that a propositional symbol is just some symbol, some variable, that can be either true or false in the world. In this case, the propositional symbols are really just going to correspond to each of the possible things that could be inside the envelope. Mustard is a propositional symbol that will be true if Colonel Mustard is inside the envelope, if he is the murderer, and false otherwise. Likewise Plum, for Professor Plum, and Scarlet, for Miss Scarlet. And likewise for each of the rooms and for each of the weapons: we have one propositional symbol for each of these ideas.
Then, using those propositional symbols, we can begin to create logical sentences, knowledge that we know about the world. For example, we know that someone is the murderer, that one of the three people is, in fact, the murderer. How would we encode that? Well, we don't know for sure who the murderer is, but we know it is one person or the second person or the third person. So I could say something like Or(mustard, plum, scarlet). This piece of knowledge encodes that one of these three people is the murderer. We don't know which, but one of these three things must be true. What other information do we know? We know that one of the rooms must be the room in the envelope: the crime was committed either in the ballroom or the kitchen or the library. Again, right now we don't know which, but this is knowledge we know at the outset, knowledge that one of these three must be inside the envelope. And likewise, we can say the same thing about the weapon: it was either the knife or the revolver or the wrench, so one of those weapons must be the weapon in the envelope. Then, as the game progresses, people get various cards, and using those cards, you can deduce information. If someone gives you a card, so that, for example, I have the Professor Plum card in my hand, then I know the Professor Plum card can't be inside the envelope. I know that Professor Plum is not the criminal, so I know a piece of information like Not(plum): the Plum propositional symbol is false. And sometimes I might not know for sure that a particular card is out of the envelope, but someone will make a guess, say Colonel Mustard in the library with the revolver, and a card might be revealed to refute it that I don't get to see. But since that revealed card must be either Colonel Mustard or the library or the revolver, I know that at least one of them can't be in the envelope. So I know something like: it is either not Mustard, or it is not the library, or it is not the revolver. Maybe more than one of these is false, but at least one of Mustard, the library, and the revolver must, in fact, be false. And so this now is a propositional logic representation of the game of Clue, a way of encoding the knowledge inside this game using propositional logic that a computer algorithm, something like the model checking we saw a moment ago, can actually look at and understand. So let's now take a look at some code to see how this might work in practice. I'm going to open up a file called clue.py, which I've started already, and what we'll see is that I've defined a couple of things. I've defined some symbols initially: notice I have a symbol for Colonel Mustard, a symbol for Professor Plum, and a symbol for Miss Scarlet, all of which I've put inside of this list of characters. I have symbols for the ballroom, the kitchen, and the library inside of a list of rooms. And then I have symbols for the knife, the revolver, and the wrench; these are my weapons. All of these characters and rooms and weapons together are my symbols. And now I also have this check_knowledge function, and what the check_knowledge function does is take my knowledge and try to draw conclusions about what I know.
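As a sketch of what clue.py might contain at this point (the exact symbol names are my own guesses, and cprint is assumed to come from the termcolor package):

```python
# clue.py: a sketch, assuming the logic.py module from earlier
import termcolor
from logic import *

mustard = Symbol("ColMustard")
plum = Symbol("ProfPlum")
scarlet = Symbol("MsScarlet")
characters = [mustard, plum, scarlet]

ballroom = Symbol("ballroom")
kitchen = Symbol("kitchen")
library = Symbol("library")
rooms = [ballroom, kitchen, library]

knife = Symbol("knife")
revolver = Symbol("revolver")
wrench = Symbol("wrench")
weapons = [knife, revolver, wrench]

symbols = characters + rooms + weapons

def check_knowledge(knowledge):
    """For every symbol, report YES if it is entailed,
    MAYBE if it hasn't been ruled out, and nothing if it's ruled out."""
    for symbol in symbols:
        if model_check(knowledge, symbol):
            termcolor.cprint(f"{symbol}: YES", "green")
        elif not model_check(knowledge, Not(symbol)):
            print(f"{symbol}: MAYBE")
```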
So, for example, we'll loop over all of the possible symbols, and we'll check: do I know that that symbol is true? A symbol is going to be something like Professor Plum or the knife or the library. And if I know that it is true, in other words I know that it must be the card in the envelope, then I'm going to print out the word YES using a function called cprint, which prints things in color, and I'll print it in green, just to make it very clear to us. If we're not sure that the symbol is true, maybe I can check to see whether I'm sure that the symbol is not true, like if I know for sure that it is not Professor Plum, for example. I do that by running model_check again, this time checking whether my knowledge entails Not the symbol, whether I know for sure that the symbol is false. And if I don't know for sure that the symbol is false, because I say "if not model_check", then I'll print out MAYBE next to the symbol: maybe the symbol is true, maybe it's not, I don't actually know. So what knowledge do I actually have? Let's try to represent my knowledge now. My knowledge is a couple of things, so I'll put them in an And. I know that one of the three people must be the criminal, so I add Or(mustard, plum, scarlet); this is my way of encoding that it is either Colonel Mustard or Professor Plum or Miss Scarlet. I know that it must have happened in one of the rooms, so I add Or(ballroom, kitchen, library). And I know that one of the weapons must have been used as well, so I add Or(knife, revolver, wrench). So that might be my initial knowledge: it must have been one of the people, it must have been in one of the rooms, and it must have been one of the weapons. I can see what that knowledge looks like as a formula by printing out knowledge.formula(). So I'll run python clue.py, and here now is the information that I know in logical format: it is Colonel Mustard or Professor Plum or Miss Scarlet; it is the ballroom, the kitchen, or the library; and it is the knife, the revolver, or the wrench. But I don't know much more than that; I can't really draw any firm conclusions. In fact, if I run my check_knowledge function on my knowledge, the function I just wrote that loops over all of the symbols and tries to see what conclusions can be drawn about each of them, and then run clue.py to see what it is that I know, it seems that I don't really know anything for sure. All three people are maybes, all three rooms are maybes, all three weapons are maybes. I don't know anything for certain just yet. But now let me add some additional information and see if additional knowledge can help us logically reason our way through this process. We are just going to provide the information; our AI is going to take care of doing the inference and figuring out what conclusions it's able to draw. So I start with some cards, and those cards tell me something. If I have the Colonel Mustard card, for example, I know that the Mustard symbol must be false: in other words, Mustard is not the one in the envelope, not the criminal.
So I can use something this library supports: every And in this library supports .add, which is a way of adding an additional logical sentence to an And clause. So I can say knowledge.add(Not(mustard)): I happen to know, because I have the Colonel Mustard card, that Colonel Mustard is not the suspect. And maybe I have a couple of other cards too. Maybe I also have a card for the kitchen, so I know it's not the kitchen, and another card that tells me it is not the revolver. So I have three cards, Colonel Mustard, the kitchen, and the revolver, and I encode that into my AI by saying: it's not Colonel Mustard, it's not the kitchen, and it's not the revolver. I know those to be true. So now, when I rerun clue.py, we'll see that I've been able to eliminate some possibilities. Before, I wasn't sure if it was the knife or the revolver or the wrench; the knife was a maybe, the revolver was a maybe, and the wrench was a maybe. Now I'm down to just the knife and the wrench. Between those two, I don't know which one it is; they're both maybes. But I've been able to eliminate the revolver, which I know to be false, because I have the revolver card. Additional information might be acquired over the course of this game, and we would represent that just by adding knowledge to the knowledge set, or knowledge base, that we've been building here. Suppose, for example, we additionally got the information that someone made a guess, say Miss Scarlet in the library with the wrench, and we know that a card was revealed, which means that at minimum one of those three cards, either Miss Scarlet or the library or the wrench, must not be inside of the envelope. So I could add some knowledge with knowledge.add, and I'm going to add an Or clause, because I don't know for sure which one it's not, but I know one of them is not in the envelope: either it's not Scarlet, or it's not the library, or it's not the wrench, and Or supports multiple arguments like this. So at least one of Scarlet, the library, and the wrench needs to be false. I don't know which; maybe it's multiple, maybe it's just one, but at least one, I know, needs to hold. And now if I rerun clue.py, I don't actually have any additional information just yet, nothing I can say conclusively. I still know that maybe it's Professor Plum, maybe it's Miss Scarlet; I haven't eliminated any options. But let's imagine I get some more information: someone shows me the Professor Plum card, for example. So I go back and say knowledge.add(Not(plum)). I have the Professor Plum card, so I know Professor Plum is not in the envelope. I rerun clue.py, and now I'm able to draw some conclusions. I've been able to eliminate Professor Plum, and the only person left remaining is Miss Scarlet. So I know, yes, Miss Scarlet: this variable must be true. And I've been able to infer that based on the information I already had. Now, between the ballroom and the library, and between the knife and the wrench, I'm still not sure. So let's add one more piece of information. Let's say I know that it's not the ballroom; someone has shown me the ballroom card. Which means at this point, I should be able to conclude that it's the library. Let's see. I'll say knowledge.add(Not(ballroom)), and we'll go ahead and run that.
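Gathered in one place, the sequence of updates from this walkthrough might look like the following sketch, building on the symbols and the check_knowledge function from before:

```python
# Initial constraints: one person, one room, one weapon is in the envelope.
knowledge = And(
    Or(mustard, plum, scarlet),
    Or(ballroom, kitchen, library),
    Or(knife, revolver, wrench)
)

# Cards in our own hand rule possibilities out directly.
knowledge.add(Not(mustard))
knowledge.add(Not(kitchen))
knowledge.add(Not(revolver))

# A refuted guess: at least one of these three is not in the envelope.
knowledge.add(Or(Not(scarlet), Not(library), Not(wrench)))

# Cards other players showed us later in the game.
knowledge.add(Not(plum))
knowledge.add(Not(ballroom))

check_knowledge(knowledge)
```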
And it turns out that after all of this, not only can I conclude that it's the library, but I also know that the weapon was the knife. That might have been a trickier inference, something I wouldn't have realized immediately, but the AI, via this model checking algorithm, is able to draw that conclusion: we know for sure that it must be Miss Scarlet in the library with the knife. And how did we know that? Well, we know it from that Or clause up above, that it's either not Scarlet, or it's not the library, or it's not the wrench. Given that we know it is Miss Scarlet, and we know it is the library, the only remaining option is that it is not the wrench, which means it must be the knife. So we as humans can now go back and reason through that, even though it might not have been immediately clear. And that's one of the advantages of using an AI or some sort of algorithm to do this: the computer can exhaust all of these possibilities and figure out what the solution actually should be. For that reason, it's often helpful to be able to represent knowledge in this way: knowledge engineering, situations where we can use a computer to represent knowledge and draw conclusions based on that knowledge. And any time we can translate something into propositional logic symbols like this, this type of approach can be useful. You might be familiar with logic puzzles, where you have to puzzle your way through trying to figure something out. Here is what a classic logic puzzle might look like: Gilderoy, Minerva, Pomona, and Horace each belong to a different one of the four houses: Gryffindor, Hufflepuff, Ravenclaw, and Slytherin. And then we have some information: Gilderoy belongs to Gryffindor or Ravenclaw, Pomona does not belong to Slytherin, and Minerva belongs to Gryffindor. Using those pieces of information, we need to draw some conclusions about which person should be assigned to which house. And again, we can use the exact same idea to implement this. So we need some propositional symbols. In this case, the propositional symbols are going to get a little more complex, although we'll see ways to make this cleaner later on. We'll need 16 propositional symbols, one for each combination of person and house. Remember, every propositional symbol is either true or false: GilderoyGryffindor is either true or false, either he's in Gryffindor or he is not; likewise GilderoyHufflepuff; and so on for every combination of person and house we could come up with. We have a propositional symbol for each one of those. Using these symbols, we can begin to think about what types of logical sentences we can say about the puzzle. Before we even think about the information we were given, we can think about the premise of the problem: every person is assigned to a different house. What does that tell us? Well, it tells us sentences like this: PomonaSlytherin implies Not PomonaHufflepuff. If Pomona is in Slytherin, then we know that Pomona is not in Hufflepuff.
And we know this for all four people and for all combinations of houses: no matter what person you pick, if they're in one house, then they're not in some other house. So I'll have a whole bunch of knowledge statements of this form: if we know Pomona is in Slytherin, then we know Pomona is not in Hufflepuff. We were also given the information that each person is in a different house, so I also have pieces of knowledge that look something like this: MinervaRavenclaw implies Not GilderoyRavenclaw. If they're all in different houses, then if Minerva is in Ravenclaw, we know that Gilderoy is not in Ravenclaw as well. And I have a whole bunch of similar sentences expressing that idea for other people and other houses too. In addition to sentences of these forms, I also have the knowledge that was given to me, information like Gilderoy is in Gryffindor or in Ravenclaw, which would be represented as GilderoyGryffindor or GilderoyRavenclaw. Then, using these sorts of sentences, I can begin to draw some conclusions about the world. So let's see an example of this; we'll actually try to implement this logic puzzle and see if we can figure out the answer. I'll open up puzzle.py, where I've already started to implement this idea. I've defined a list of people and a list of houses, and I've so far created one symbol for every combination of person and house. That's what this double for loop is doing: looping over all people, looping over all houses, and creating a new symbol for each of them. Then I've added some information. I know that every person belongs to a house, so for every person I've added the information that person-Gryffindor or person-Hufflepuff or person-Ravenclaw or person-Slytherin: one of those four things must be true, because every person belongs to a house. What other information do I know? I also know that there's only one house per person; no person belongs to multiple houses. So how does this work? Well, this is going to be true for all people, so I'll loop over every person, and then I need to loop over all different pairs of houses. The idea is that I want to encode that if Minerva is in Gryffindor, then Minerva can't be in Ravenclaw. So I'll loop over all houses, h1, and I'll loop over all houses again, h2, and as long as they're different, h1 not equal to h2, I'll add to my knowledge base this piece of information: an implication, in other words an if-then, saying that if the person is in h1, then they are not in house h2. So these lines encode the notion that for every person, if they belong to one house, then they are not in another house. And the other piece of logic we need to encode is the idea that every house can only have one person: if Pomona is in Hufflepuff, then nobody else is allowed to be in Hufflepuff either. That's the same logic, but sort of backwards: I loop over all of the houses and loop over all different pairs of people. So I loop over people once, loop over people again, and only do this when the people are different, p1 not equal to p2, and I add the knowledge that, as given by the implication, if person one belongs to the house, then it is not the case that person two belongs to the same house. So here I'm just encoding the knowledge that represents the problem's constraints: I know that everyone's in a different house, and I know that any person can only belong to one house.
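Here is a sketch of how puzzle.py might encode those constraints, together with the given facts and the final query loop described next. It assumes an And can start out empty and grow via .add, as in the Clue example:

```python
# puzzle.py: a sketch, assuming the logic.py module from earlier
from logic import *

people = ["Gilderoy", "Minerva", "Pomona", "Horace"]
houses = ["Gryffindor", "Hufflepuff", "Ravenclaw", "Slytherin"]

# One symbol per combination of person and house, 16 in all.
symbols = [Symbol(f"{p}{h}") for p in people for h in houses]

knowledge = And()

# Every person belongs to at least one house.
for p in people:
    knowledge.add(Or(*[Symbol(f"{p}{h}") for h in houses]))

# Only one house per person.
for p in people:
    for h1 in houses:
        for h2 in houses:
            if h1 != h2:
                knowledge.add(Implication(Symbol(f"{p}{h1}"), Not(Symbol(f"{p}{h2}"))))

# Only one person per house.
for h in houses:
    for p1 in people:
        for p2 in people:
            if p1 != p2:
                knowledge.add(Implication(Symbol(f"{p1}{h}"), Not(Symbol(f"{p2}{h}"))))

# The information we were given.
knowledge.add(Or(Symbol("GilderoyGryffindor"), Symbol("GilderoyRavenclaw")))
knowledge.add(Not(Symbol("PomonaSlytherin")))
knowledge.add(Symbol("MinervaGryffindor"))

# Print every symbol the knowledge base entails.
for symbol in symbols:
    if model_check(knowledge, symbol):
        print(symbol)
```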
And I can now take my knowledge and try to print out the information that I happen to know. So I'll print out knowledge.formula(), just to see this in action, and I'll skip ahead for now; we'll come back to this in a second. Let's print out the knowledge that I know by running python puzzle.py. It's a lot of information, a lot to scroll through, because there are 16 different variables all going on. But the basic idea, if we scroll up to the very top, is that I see my initial information: Gilderoy is either in Gryffindor, or Gilderoy is in Hufflepuff, or Gilderoy is in Ravenclaw, or Gilderoy is in Slytherin, and then way more information as well. So this is quite messy, more than we really want to be looking at, and soon we'll see ways of representing this a little more nicely using logic. But for now, we can just say these are the variables we're dealing with. And now we'd like to add some information. The information we're going to add is that Gilderoy is in Gryffindor or he is in Ravenclaw; that knowledge was given to us. So I'll say knowledge.add, and I add Or(GilderoyGryffindor, GilderoyRavenclaw): one of those two things must be true. I also know that Pomona was not in Slytherin, so I can say knowledge.add Not of the Pomona-Slytherin symbol. And then I can add the knowledge that Minerva is in Gryffindor by adding the MinervaGryffindor symbol. So those are the pieces of knowledge that I know. And this loop at the bottom just loops over all of my symbols, checks to see if the knowledge entails each symbol by calling this model_check function again, and if it does, if we know the symbol is true, we print out the symbol. So now I can run python puzzle.py, and Python is going to solve this puzzle for me. We're able to conclude that Gilderoy belongs to Ravenclaw, Pomona belongs to Hufflepuff, Minerva to Gryffindor, and Horace to Slytherin, just by encoding this knowledge inside the computer, although it was quite tedious to do in this case, and as a result we were able to get the conclusion from it as well. And you can imagine this being applied to many sorts of different deductive situations, not only these situations with Harry Potter characters. If you've ever played games like Mastermind, where you're trying to figure out which order different colors go in and trying to make predictions about it: let's play a simplified version of Mastermind, where there are four colors, red, blue, green, and yellow, and they're in some order, but I'm not telling you what order. You just have to make a guess, and I'll tell you how many of the four colors you got in the right position. In this simplified version of the game, you might make a guess like red, blue, green, yellow, and I would tell you something like: two of those four are in the correct position, but the other two are not. And then you could reasonably make another guess, say blue, red, green, yellow, switching two of them around, and this time maybe I tell you: none of those are in the correct position. And the question then is: what is the correct order of these four colors? We as humans could begin to reason this through.
All right, well, if none of the colors in the second guess were correct, but two in the first guess were correct, it must have been because I switched the red and the blue, which means red and blue in the first guess must have been correct, and green and yellow must not be correct. You can begin to do this sort of deductive reasoning, and we can equivalently try to encode it inside of our computer as well. It's going to be very similar to the logic puzzle we just did a moment ago, so I won't spend too much time on this code. But again, we have a whole bunch of colors and four different positions in which those colors can be, and then we have some additional knowledge, and I encode all of that knowledge; you can take a look at this code on your own time. I just want to demonstrate that when we run this code, running python mastermind.py, we ultimately are able to compute that red is in the 0 position, blue in the 1 position, yellow in the 2 position, and green in the 3 position as the ordering of those symbols. Now, what you might have noticed is that this process was taking quite a long time. And in fact, model checking is not a particularly efficient algorithm. What I need to do in order to model check is take all of my possible variables and enumerate all of the possibilities they could be in: if I have n variables, I have 2 to the n possible worlds that I need to look through in order to perform the model checking algorithm. And this quickly becomes intractable, especially as we get to larger and larger problems with many, many more variables at play. Here we only have a relatively small number of variables, so this sort of approach can actually work. But as the number of variables increases, model checking becomes a less and less practical way of solving these sorts of problems. So while it might have been OK for something like Mastermind, where we could conclude that this is indeed the correct sequence with all four in the correct position, what we'd like are better ways to make inferences, rather than just enumerating all of the possibilities. To do so, we'll transition next to the idea of inference rules: rules that we can apply to take knowledge that already exists and translate it into new forms of knowledge. The general way we'll structure an inference rule is with a horizontal line: anything above the line represents a premise, something that we know to be true, and anything below the line is the conclusion we can arrive at after we apply the logic of the inference rule. We'll introduce some of these inference rules by demonstrating them in English first, and then translating them into the world of propositional logic so you can see what they actually look like. So for example, let's imagine that I have access to two pieces of information. I know that if it is raining, then Harry is inside. And let's say I also know that it is raining. Then most of us could reasonably look at this information and conclude that Harry must be inside. This inference rule is known as modus ponens, and it's phrased more formally in logic like this.
If we know that alpha implies beta, in other words, if alpha, then beta, and we also know that alpha is true, then we should be able to conclude that beta is also true. We can apply this inference rule to take those two pieces of information and generate this new piece of information. Notice that this is a totally different approach from model checking, where the approach was to look at all of the possible worlds and see what's true in each of them. Here, we're not dealing with any specific world; we're just dealing with the knowledge that we know and what conclusions we can arrive at based on that knowledge: I know that alpha implies beta, and I know alpha, and the conclusion is beta. And this should seem like a relatively obvious rule; of course, if alpha implies beta and we know alpha, then we can conclude that beta is also true. That's going to be true for many, maybe even all, of the inference rules we'll take a look at: you should be able to look at them and say, yeah, of course that's going to be true. But it's putting them all together, figuring out the right combination of inference rules to apply, that ultimately allows us to generate interesting knowledge inside of our AI. So that's modus ponens, this application of implication: if we know alpha, and we know that alpha implies beta, then we can conclude beta. Let's take a look at another example, something fairly straightforward: Harry is friends with Ron and Hermione. Based on that information, we can reasonably conclude that Harry is friends with Hermione; that must also be true. This inference rule is known as and elimination. What and elimination says is that if we have a situation where alpha and beta are both true, if I have the information alpha and beta, well then, just alpha is true, or likewise, just beta is true. If I know that both parts are true, then each of those parts must also be true. Again, something obvious from the point of view of human intuition, but a computer needs to be told this kind of information. For it to be able to apply the inference rule, we need to tell the computer that this is an inference rule it can use, so that it has access to it and can apply it to translate information from one form to another. Let's take a look at another example of an inference rule: it is not true that Harry did not pass the test. A bit of a tricky sentence to parse, so I'll read it again: it is not true, or it is false, that Harry did not pass the test. Well, if it is false that Harry did not pass the test, then the only reasonable conclusion is that Harry did pass the test. And this, instead of and elimination, is what we call double negation elimination: if we have two negatives inside of our premise, then we can just remove them altogether. They cancel each other out; one turns true to false, and the other turns false back into true. Phrased a little more formally, we say that if the premise is not not alpha, then the conclusion we can draw is just alpha; we can say that alpha is true. We'll take a look at a couple more of these. If I have: if it is raining, then Harry is inside, how do I reframe this? Well, this one is a little trickier. But if I know that if it is raining, then Harry is inside, then I can conclude that one of two things must be true: either it is not raining, or Harry is inside. Now, this one's trickier, so let's think about it a little bit.
This first premise, if it is raining, then Harry is inside, is saying that if I know it is raining, then Harry must be inside. So what is the other possible case? Well, if Harry is not inside, then I know that it must not be raining. So one of those two situations must be true: either it's not raining, or it is raining, in which case Harry is inside. So the conclusion I can draw is that either it is not raining, or Harry is inside. And so this is a way to translate if-then statements into or statements. This is known as implication elimination, and it's similar to what we actually did at the beginning, when we were first looking at those very first sentences about Harry and Hagrid and Dumbledore. Phrased a little more formally, this says that if I have the implication alpha implies beta, I can draw the conclusion not alpha or beta, because there are only two possibilities: either alpha is true, or alpha is not true. One of those possibilities is that alpha is not true; but if alpha is true, then we can conclude that beta must be true. So either alpha is not true, or alpha is true, in which case beta is also true. This is one way to turn an implication into a statement using or. In addition to eliminating implications, we can also eliminate biconditionals. Let's take an English example, something like: it is raining if and only if Harry is inside. And this if-and-only-if really sounds like the biconditional, that double arrow sign we saw in propositional logic not too long ago. What does this actually mean if we were to translate it? It means that if it is raining, then Harry is inside, and if Harry is inside, then it is raining; the implication goes in both directions. This is what we would call biconditional elimination: I can take a biconditional, alpha if and only if beta, and translate it into (alpha implies beta) and (beta implies alpha). So many of these inference rules take logic that uses certain symbols and turn it into different symbols: taking an implication and turning it into an or, or taking a biconditional and turning it into implications. Another example would be something like this: it is not true that both Harry and Ron passed the test. All right, how do we translate that? What does it mean? Well, if it is not true that both of them passed the test, then the reasonable conclusion is that at least one of them didn't pass. So the conclusion is that either Harry did not pass the test, or Ron did not pass the test, or both; this is not an exclusive or. If it is true that it is not true that both Harry and Ron passed the test, then either Harry didn't pass the test or Ron didn't pass the test. This type of law is one of De Morgan's laws, quite famous in logic, and the idea is that we can turn an and into an or: we can take this and, that both Harry and Ron passed the test, and turn it into an or by moving the nots around. So if it is not true that Harry and Ron both passed the test, then either Harry did not pass the test or Ron did not pass the test. The way we frame that more formally using logic is to say: if it is not true that alpha and beta, then either not alpha or not beta.
The way I like to think about this is that if you have a negation in front of an and expression, you move the negation inwards, so to speak, moving the negation into each of the individual sentences, and then flip the and into an or. So the negation moves inwards and the and flips into an or: I go from not (alpha and beta) to not alpha or not beta. And there's a reverse of De Morgan's law that goes in the other direction. If I say it is not true that Harry or Ron passed the test, meaning neither of them passed the test, then the conclusion I can draw is that Harry did not pass the test and Ron did not pass the test. So in this case, instead of turning an and into an or, we're turning an or into an and, but the idea is the same. This, again, is another example of De Morgan's laws, and the way it works is that if I have not (alpha or beta), the same logic applies: I move the negation inwards, and this time I flip the or into an and. So if it is not true that alpha or beta, then I can say not alpha and not beta, moving the negation inwards to draw that conclusion. So those are De Morgan's laws, and there are a couple of other inference rules worth taking a look at. One is the distributive law, which works this way: if I have alpha and (beta or gamma), then, much as you can distribute operations like multiplication over addition in math, I can do a similar thing here and say (alpha and beta) or (alpha and gamma); I've been able to distribute the and throughout this expression. So this is an example of the distributive property, or the distributive law, as applied to logic, in much the same way you would distribute a multiplication over an addition. This works the other way too: if I have alpha or (beta and gamma), I can distribute the or throughout the expression and say (alpha or beta) and (alpha or gamma). So the distributive law works in that direction as well, and it's helpful if I want to take an or and move it into the expression; we'll see an example soon of why we might actually care to do that. All right, so now we've seen a lot of different inference rules, and the question is: how can we use those inference rules to actually draw conclusions, to actually prove something about entailment, proving that, given some initial knowledge base, some query is true? Well, one way to think about it is to think back to what we talked about last time, when we talked about search problems. Recall that search problems have some sort of initial state; actions that you can take from one state to another, as defined by a transition model that tells you how to get from one state to another; a test to see if you're at a goal; and a path cost function that measures how many steps you had to take, or how costly the solution you found was. Now that we have these inference rules, which take some set of sentences in propositional logic and get us a new set of sentences in propositional logic, we can actually treat those sets of sentences as states inside of a search problem.
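Before moving on to that search framing, here are the inference rules from this section collected for reference, written in the premise-over-conclusion notation described above (everything above the line is a premise; below it is the conclusion):

```latex
\begin{align*}
&\text{Modus ponens:}
    && \frac{\alpha \rightarrow \beta, \quad \alpha}{\beta} \\[4pt]
&\text{And elimination:}
    && \frac{\alpha \land \beta}{\alpha} \\[4pt]
&\text{Double negation elimination:}
    && \frac{\neg(\neg\alpha)}{\alpha} \\[4pt]
&\text{Implication elimination:}
    && \frac{\alpha \rightarrow \beta}{\neg\alpha \lor \beta} \\[4pt]
&\text{Biconditional elimination:}
    && \frac{\alpha \leftrightarrow \beta}{(\alpha \rightarrow \beta) \land (\beta \rightarrow \alpha)} \\[4pt]
&\text{De Morgan's laws:}
    && \frac{\neg(\alpha \land \beta)}{\neg\alpha \lor \neg\beta}
       \qquad
       \frac{\neg(\alpha \lor \beta)}{\neg\alpha \land \neg\beta} \\[4pt]
&\text{Distributive laws:}
    && \frac{\alpha \land (\beta \lor \gamma)}{(\alpha \land \beta) \lor (\alpha \land \gamma)}
       \qquad
       \frac{\alpha \lor (\beta \land \gamma)}{(\alpha \lor \beta) \land (\alpha \lor \gamma)}
\end{align*}
```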
So if we want to prove that some query is true, prove that some logical theorem is true, we can treat theorem proving as a form of a search problem. We begin in some initial state, where that initial state is the knowledge base I begin with, the set of all of the sentences that I know to be true. What actions are available to me? The actions are any of the inference rules that I can apply at any given time. The transition model tells me, after I apply an inference rule, the new set of all of the knowledge that I have: the old set of knowledge, plus the additional inference that I've been able to draw, just as when we applied those inference rules and got some conclusion, that conclusion got added to our knowledge base. The transition model encodes that. What is the goal test? Our goal test checks whether we have proved the statement we're trying to prove, whether the thing we're trying to prove is inside of our knowledge base. And the path cost function, the thing we're trying to minimize, might be the number of inference rules we needed to use, the number of steps, so to speak, inside of our proof. So here we've been able to apply the same types of ideas we saw last time with search problems to proving things about knowledge, by taking our knowledge and framing it in terms we can understand as a search problem, with an initial state, with actions, with a transition model. This shows a couple of things, one being how versatile search problems are: the same types of algorithms that we use to solve a maze, or to figure out how to get from point A to point B with driving directions, can also be used to prove theorems, taking some starting knowledge base and trying to prove something about that knowledge. So this, yet again, is a second way, in addition to model checking, to try to prove that certain statements are true. But it turns out there's yet another way to apply inference, and we'll talk about it now. It's not the only way, but it's certainly one of the most common: it's known as resolution. Resolution is based on another inference rule, quite a powerful one, that will let us prove anything that can be proven about a knowledge base. It's based on this basic idea: let's say I know that either Ron is in the Great Hall or Hermione is in the library, and let's say I also know that Ron is not in the Great Hall. Based on those two pieces of information, what can I conclude? Well, I could pretty reasonably conclude that Hermione must be in the library. How do I know that? These two statements about Ron are what we'll call complementary literals: literals that complement each other, that are opposites of each other, and so seem to conflict. The first sentence tells us that either Ron is in the Great Hall or Hermione is in the library; so if we know that Ron is not in the Great Hall, that rules out the first possibility, which means Hermione must be in the library. We can frame this as a more general rule known as the unit resolution rule: if we know P or Q, and we also know not P, then we can reasonably conclude Q. If P or Q is true, and we know that P is not true, the only possibility is for Q to be true.
And this, it turns out, is quite a powerful inference rule in terms of what it can do, in part because we can quickly generalize it. This Q doesn't need to be just a single propositional symbol; it could be multiple symbols, all chained together in a single clause, as we'll call it. So if I had something like P or Q1 or Q2 or Q3, and so on and so forth up until Qn, so n different other variables, and I also have not P, then what happens when these two complement each other is that the two clauses resolve, so to speak, to produce a new clause that is just Q1 or Q2 all the way up to Qn. And in an or, the order of the arguments doesn't actually matter: the P doesn't need to be the first thing; it could have been in the middle. But the idea is that if I have P in one clause and not P in the other clause, then one of the remaining things must be true; I've resolved the clauses to produce a new one. It turns out we can generalize this idea even further, and get even more power out of the resolution rule. Let's take another example. Say I know the same piece of information, that either Ron is in the Great Hall or Hermione is in the library, and the second piece of information I know is that Ron is not in the Great Hall or Harry is sleeping. So it's not just a single literal this time; I have two full clauses, and we'll define clauses more precisely in just a moment. What do I know here? Well, for a propositional statement like Ron is in the Great Hall, there are only two possibilities. Either Ron is in the Great Hall, in which case, based on resolution, we know that Harry must be sleeping; or Ron is not in the Great Hall, in which case, based on the same rule, we know that Hermione must be in the library. Based on those two premises in combination, I can conclude that either Hermione is in the library or Harry is sleeping. So again, because the two complementary literals conflict with each other, I know that one of the remaining two must be true. You can take a closer look and reason through that logic; make sure you convince yourself that you believe this conclusion. Stated more generally, this resolution rule says that if we know P or Q is true, and we also know that not P or R is true, we can resolve these two clauses together to get a new clause, Q or R: either Q or R must be true. And again, as in the last case, Q and R don't need to be single propositional symbols; they could be multiple symbols. If I had a clause P or Q1 or Q2 or Q3, and so on up until Qn, where n is just some number, and likewise a clause not P or R1 or R2, and so on up until Rm, where m, again, is just some other number, then I can resolve these two clauses together to conclude Q1 or Q2 up until Qn, or R1 or R2 up until Rm: one of all those remaining literals must be true. And this is just a generalization of the same rule we saw before. Each of these things is what we're going to call a clause, where a clause is formally defined as a disjunction of literals. A disjunction means a bunch of things connected with or; a conjunction, meanwhile, is things connected with and. And a literal is either a propositional symbol or the negation of a propositional symbol: something like P, or Q, or not P, or not Q. Those are all propositional symbols or negations of propositional symbols.
And we call those literals. So a clause is just something like P or Q or R, for example. Meanwhile, what this gives us the ability to do is to turn any logical sentence into something called conjunctive normal form. A conjunctive normal form sentence is a logical sentence that is a conjunction of clauses. Recall, again, that conjunction means things connected to one another using and, so a conjunction of clauses means an and of individual clauses, each of which has ors in it. Something like this: (A or B or C) and (D or not E) and (F or G). Everything in parentheses is one clause; all of the clauses are connected to each other using an and; and everything within a clause is separated using an or. This is just a standard form that we can translate a logical sentence into to make it easy to work with and easy to manipulate. And it turns out that we can take any sentence in logic and turn it into conjunctive normal form just by applying inference rules and transformations to it. So let's take a look at how we can actually do that. What is the process for taking a logical formula and converting it into conjunctive normal form, otherwise known as CNF? The process looks a little something like this. We need to take all of the symbols that are not part of conjunctive normal form, the biconditionals and the implications and so forth, and turn them into something that looks more like conjunctive normal form. The first step is to eliminate biconditionals, those if-and-only-if double arrows. And we know how to eliminate biconditionals, because we saw an inference rule to do just that: any time I have an expression like alpha if and only if beta, I can turn it into (alpha implies beta) and (beta implies alpha). Likewise, I can eliminate implications, the if-then arrows, using the inference rule we saw before too: take alpha implies beta and turn it into not alpha or beta, because that is logically equivalent. Then we can move nots inwards, because we don't want nots on the outsides of our expressions. Conjunctive normal form requires that it's just clause and clause and clause and clause; any nots need to be immediately next to propositional symbols. We can move those nots around using De Morgan's laws, taking something like not (A and B) and turning it into not A or not B, for example. After that, all we'll be left with are ands and ors, and those are easy to deal with: we can use the distributive law to distribute the ors so that the ors end up on the inside of the expression, so to speak, and the ands end up on the outside. So that's the general pattern for taking a formula and converting it into conjunctive normal form. Let's now take a look at an example of how we would do this, and then explore why it is that we would want to. Take this formula, for example: (P or Q) implies R. I'd like to convert it into conjunctive normal form, where it's all ands of clauses, and every clause is a disjunction, things joined by ors. So what's the first thing I need to do? Well, this is an implication, so let me remove that implication. Using the implication elimination rule, I can turn (P or Q) implies R into not (P or Q) or R.
So that's the first step: I've gotten rid of the implication. Next, I can get rid of the not on the outside of this expression too, moving the nots inwards so they're closer to the literals themselves, by using De Morgan's laws. De Morgan's law says that not (P or Q) is equivalent to not P and not Q. Again, we're just applying the inference rules we've already seen in order to translate these statements. And now I have two things separated by an or, where the thing on the inside is an and. What I'd really like is to move the ors so that the ors are on the inside, because conjunctive normal form means I need clause and clause and clause. To do that, I can use the distributive law: if I have (not P and not Q) or R, I can distribute the or R to both parts to get (not P or R) and (not Q or R). And this, now, is in conjunctive normal form: it is a conjunction, an and, of clauses, each of which is a disjunction, things separated by ors. So this process can be used on any formula to take a logical sentence and turn it into conjunctive normal form, where I have clause and clause and clause and so on. So why is this helpful? Why do we care about converting all these sentences into this form? It's because once they're in this form, these clauses are the inputs to the resolution inference rule we saw a moment ago: if I have two clauses where there's something complementary, something that conflicts between them, I can resolve them to get a new clause, to draw a new conclusion. And we call this process inference by resolution, using the resolution rule to draw some sort of inference. It's based on the same idea: if I have one clause, P or Q, and another clause, not P or R, then I can resolve the two clauses together to get Q or R, a new piece of information that I didn't have before. Now, a couple of key points are worth noting before we talk about the actual algorithm. One is this: imagine we have P or Q or S, and we also have not P or R or S. The resolution rule says that because this P conflicts with this not P, we resolve to put everything else together and get Q or S or R or S. But it turns out that the double S, the or S here and the or S there, is redundant; it doesn't change the meaning of the sentence. So when we do this resolution process, we'll usually also do a process known as factoring, where we take any duplicate variables that show up and eliminate them: Q or S or R or S just becomes Q or R or S. The S only needs to appear once; there's no need to include it multiple times. One final question worth considering: what happens if I try to resolve P and not P together? If I know that P is true and I know that not P is true, resolution says I can merge these clauses together and look at everything else. Well, in this case there is nothing else, so I'm left with what we might call the empty clause; I'm left with nothing. And the empty clause is always false; the empty clause is equivalent to just being false. That's pretty reasonable, because it's impossible for both P and not P to hold at the same time. P is either true or it's not true, which means that if P is true, then not P must be false, and if not P is true, then P must be false. There is no way for both of them to hold at the same time.
So if ever I try to resolve P and not P, it's a contradiction, and I'll end up getting the empty clause, which we can call equivalent to false. And this idea, that if I resolve two contradictory terms I get the empty clause, is the basis for our inference by resolution algorithm. Here's how we're going to perform inference by resolution, at a very high level. We want to prove that our knowledge base entails some query alpha, that based on the knowledge we have, we can prove conclusively that alpha is going to be true. How are we going to do that? Well, we're going to try to prove that the knowledge base together with not alpha leads to a contradiction. And this is a common technique in computer science more generally, this idea of proving something by contradiction. If I want to prove that something is true, I can do so by first assuming that it is false and showing that that leads to a contradiction. And if the thing I'm trying to prove, when I assume it's false, leads to a contradiction, then it must be true. That's the logical idea behind a proof by contradiction. And that's what we're going to do here. We want to prove that this query alpha is true, so we're going to assume that it's not true. We're going to assume not alpha, and we're going to try to derive a contradiction. If we do get a contradiction, well, then we know that our knowledge entails the query alpha. If we don't get a contradiction, there is no entailment. This is the idea of a proof by contradiction: assume the opposite of what you're trying to prove, and if you can demonstrate that that's a contradiction, then what you're proving must be true. But more formally, how do we actually do this? How do we check that the knowledge base and not alpha together lead to a contradiction? Well, here is where resolution comes into play. To determine if our knowledge base entails some query alpha, we're going to convert (knowledge base and not alpha) to conjunctive normal form, that form where we have a whole bunch of clauses that are all anded together. And once we have these individual clauses, we can keep checking to see if we can use resolution to produce a new clause. We can take any pair of clauses and check: is there some literal in one that is the complement of a literal in the other? For example, a P in one clause and a not P in another clause, or an R in one clause and a not R in another clause. Whenever I have that situation, where after converting to conjunctive normal form I see two clauses that I can resolve to produce a new clause, I'll do so. This process occurs in a loop: I keep checking to see if I can use resolution to produce a new clause, and I keep using those new clauses to try to generate more new clauses after that. Now, it may just so happen that eventually we produce the empty clause, the clause we were talking about before. If I resolve P and not P together, that produces the empty clause, and the empty clause we know to be false, because there's no way for P and not P to both be true simultaneously. So if ever we produce the empty clause, then we have a contradiction. And if we have a contradiction, that's exactly what we were trying to find in a proof by contradiction. If we have a contradiction, then we know that our knowledge base must entail this query alpha.
And we know that alpha must be true. And it turns out, though we won't go into the proof here, that you can show the converse: if you don't produce the empty clause, then there is no entailment. If we run into a situation where there are no more new clauses to add, we've done all the resolution that we can do, and yet we still haven't produced the empty clause, then there is no entailment in this case. And this now is the resolution algorithm. It can look very abstract, especially this idea of what it even means to have the empty clause, so let's take a look at an example and actually try to prove some entailment using this inference by resolution process. So here's our question. We have this knowledge base. Here is the knowledge that we know: (A or B), and (not B or C), and (not C). And we want to know if all of this entails A. So this whole logical sentence is our knowledge base, and our query alpha is just the propositional symbol A. So what do we do? Well, first, we want to prove by contradiction, so we want to assume that A is false and see if that leads to some sort of contradiction. So here is what we're going to start with: A or B, and not B or C, and not C. This is our knowledge base. And we're going to assume not A; we're going to assume that the thing we're trying to prove is, in fact, false. And so this is now in conjunctive normal form, and I have four different clauses: A or B; not B or C; not C; and not A. And now, I can begin to pick pairs of clauses and apply the resolution rule to them. Looking at these four clauses, I see two clauses I can resolve, because complementary literals show up in them: there's a C here, and a not C here. So just looking at these two clauses, if I know that not B or C is true, and I know that C is not true, well, then I can resolve them to conclude not B. I can generate this new clause as a new piece of information that I now know to be true. And all right, now I can repeat the process. Can I use resolution again to get some new conclusion? Well, it turns out I can. I can use that new clause I just generated, along with A or B. There are complementary literals: this B conflicts with that not B. And so if I know that A or B is true, and I know that B is not true, well, then the only remaining possibility is that A must be true. So now I have A; that is a new clause that I've been able to generate. And now, I can do this one more time. I'm looking for two clauses that can be resolved, and you might programmatically do this by just looping over all possible pairs of clauses and checking for complementary literals in each. And here, I find two clauses, not A and A, that conflict with each other. And when I resolve these two together, well, this is the same as when we were resolving P and not P from before: I get rid of the As, and I'm left with the empty clause. And the empty clause we know to be false, which means we have a contradiction, which means we can safely say that this whole knowledge base does entail A. If this sentence is true, then we know that A, for sure, is also true. So this, using inference by resolution, is an entirely different way to take some statement and try to prove that it is, in fact, true.
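(As a sketch, and not the lecture's own code: the whole loop can be written compactly in Python if we represent each clause as a frozenset of literal strings, with "~" marking negation. All the names here are my own choices for illustration.)

    def negate(literal):
        # "~A" becomes "A", and "A" becomes "~A".
        return literal[1:] if literal.startswith("~") else "~" + literal

    def resolve(c1, c2):
        """Yield each clause produced by resolving clause c1 with clause c2."""
        for lit in c1:
            if negate(lit) in c2:
                # Sets keep no duplicates, so factoring happens automatically.
                yield frozenset((c1 - {lit}) | (c2 - {negate(lit)}))

    def entails(clauses, query):
        """Proof by contradiction: does the knowledge base entail the query?"""
        clauses = set(clauses) | {frozenset({negate(query)})}
        while True:
            new = set()
            for c1 in clauses:
                for c2 in clauses:
                    if c1 != c2:
                        for resolvent in resolve(c1, c2):
                            if not resolvent:   # empty clause: contradiction
                                return True
                            new.add(resolvent)
            if new <= clauses:                  # nothing new: no entailment
                return False
            clauses |= new

    # The example above: (A or B) and (~B or C) and (~C) entails A.
    kb = [frozenset({"A", "B"}), frozenset({"~B", "C"}), frozenset({"~C"})]
    print(entails(kb, "A"))  # True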
So instead of enumerating all of the possible worlds that we might be in, in order to figure out in which cases the knowledge base is true and in which cases the query is true, we use this resolution algorithm to say: let's keep trying to figure out what conclusions we can draw and see if we reach a contradiction. And if we reach a contradiction, then that tells us something about whether our knowledge actually entails the query or not. And it turns out there are many different algorithms that can be used for inference. What we've just looked at here are just a couple of them. And in fact, all of this is based on one particular type of logic, propositional logic, where we have these individual symbols and we connect them using and, or, not, implication, and biconditionals. But propositional logic is not the only kind of logic that exists. And in fact, we saw that there are limitations in propositional logic, especially in examples like the Mastermind example, or the logic puzzle where people belonged to different Hogwarts houses and we were trying to figure out who belonged to which house. There were a lot of different propositional symbols that we needed just to represent some fairly basic ideas. So the final topic that we'll take a look at, just before we end class today, is one more type of logic, different from propositional logic, known as first-order logic, which is a bit more powerful than propositional logic and is going to make it easier for us to express certain types of ideas. In propositional logic, if we think back to that puzzle with the people and the Hogwarts houses, we had a whole bunch of symbols, and every symbol could only be true or false. We had a symbol for MinervaGryffindor, which was true if Minerva was in Gryffindor and false otherwise, and likewise for MinervaHufflepuff and MinervaRavenclaw and MinervaSlytherin and so forth. But this was starting to get quite redundant. We wanted some way to be able to express that there is a relationship between these propositional symbols, that Minerva shows up in all of them. And also, I would have liked not to need so many different symbols to represent what really was a fairly straightforward problem. So first-order logic gives us a different way of dealing with this idea, by giving us two different types of symbols. We're going to have constant symbols, which represent objects, like people or houses. And then predicate symbols, which you can think of as relations or functions that take an input and evaluate to true or false, that tell us whether or not some property of some constant, or some pair of constants, or multiple constants, actually holds. So we'll see an example of that in just a moment. For now, in this same problem, our constant symbols are objects, things like people or houses. So Minerva, Pomona, Horace, and Gilderoy are all constant symbols, as are my four houses: Gryffindor, Hufflepuff, Ravenclaw, and Slytherin. Predicates, meanwhile, these predicate symbols, are properties that might hold true or false of these individual constants. So Person might hold true of Minerva, but it would be false for Gryffindor, because Gryffindor is not a person. And House is going to hold true for Ravenclaw, but it's not going to hold true for Horace, for example, because Horace is a person.
And BelongsTo, meanwhile, is going to be a relation that relates people to their houses, telling me whether someone belongs to a house or not. So let's take a look at some examples of what a sentence in first-order logic might actually look like. A sentence might look something like this: Person(Minerva), with Minerva in parentheses, Person being a predicate symbol and Minerva being a constant symbol. This sentence in first-order logic effectively means Minerva is a person, or the person property applies to the Minerva object. So if I want to say something like Minerva is a person, here is how I express that idea using first-order logic. Meanwhile, I can say something like House(Gryffindor) to likewise express the idea that Gryffindor is a house. And all of the same logical connectives that we saw in propositional logic are going to work here too: and, or, implication, biconditional, not. In fact, I can use not to say something like not House(Minerva), and this sentence in first-order logic means Minerva is not a house; it is not true that the house property applies to Minerva. Meanwhile, in addition to predicate symbols that just take a single argument, some of our predicate symbols are going to express binary relations, relations between two of their arguments. So I could say something like BelongsTo, with two inputs, Minerva and Gryffindor, to express the idea that Minerva belongs to Gryffindor. And so now here's the key difference, or one of the key differences, between this and propositional logic. In propositional logic, I needed one symbol for MinervaGryffindor, one symbol for MinervaHufflepuff, and one symbol for every other combination of person and house. In this case, I just need one symbol for each of my people and one symbol for each of my houses, and then I can express as a predicate something like BelongsTo(Minerva, Gryffindor), to express the idea that Minerva belongs to Gryffindor house. So already we can see that first-order logic is quite expressive, able to express these sorts of sentences using the constant symbols and predicates that already exist, while minimizing the number of new symbols that I need to create. I can just use eight symbols, four for people and four for houses, instead of 16 symbols for every possible combination of the two. But first-order logic gives us a couple of additional features that we can use to express even more complex ideas, and these additional features are generally known as quantifiers. There are two main quantifiers in first-order logic, the first of which is universal quantification. Universal quantification lets me express the idea that something is going to be true for all values of a variable: for all values of x, some statement is going to hold true. So what might a sentence using universal quantification look like? Well, we're going to use this upside-down A to mean "for all." So upside-down A, x, means for all values of x, where x is any object, this is going to hold true: BelongsTo(x, Gryffindor) implies not BelongsTo(x, Hufflepuff). So let's try to parse this out. This means that for all values of x, if this holds true, if x belongs to Gryffindor, then this does not hold true: x does not belong to Hufflepuff. So translated into English, this sentence is saying something like: for all objects x, if x belongs to Gryffindor, then x does not belong to Hufflepuff.
Or, phrased even more simply: anyone in Gryffindor is not in Hufflepuff, a simplified way of saying the same thing. So this universal quantification lets us express the idea that something is going to hold true for all values of a particular variable. In addition to universal quantification, though, we also have existential quantification. Whereas universal quantification said that something is going to be true for all values of a variable, existential quantification says that some expression is going to be true for some value of a variable, at least one value of the variable. So let's take a look at a sample sentence using existential quantification. One such sentence looks like this: there exists an x, where this backwards E stands for "exists," such that House(x) and BelongsTo(Minerva, x). In other words, there exists some object x where x is a house and Minerva belongs to x. Or, phrased a little more succinctly in English, I'm here just saying: Minerva belongs to a house. There's some object that is a house, and Minerva belongs to it. And combining universal and existential quantification, we can create far more sophisticated logical statements than we were able to just using propositional logic. I could combine these to say something like this: for all x, Person(x) implies there exists a y such that House(y) and BelongsTo(x, y). All right, so a lot of stuff going on there, a lot of symbols. Let's try to parse it out and understand what it's saying. Here we're saying that for all values of x, if x is a person, then this is true. So in other words, I'm saying for all people, calling that person x, this statement is going to be true. What statement is true of all people? Well, there exists a y that is a house, so there exists some house, and x belongs to y. In other words, I'm saying that for every person out there, there exists some house such that x, the person, belongs to y, the house. Phrased more succinctly, I'm saying that every person belongs to a house: for all x, if x is a person, then there exists a house that x belongs to. And so we can now express much more powerful ideas using this idea of first-order logic. And it turns out there are many other kinds of logic out there. There's second-order logic and other higher-order logics, each of which allows us to express more and more complex ideas. But all of it, in this case, is really in pursuit of the same goal, which is the representation of knowledge. We want our AI agents to be able to know information, to represent that information, whether that's using propositional logic or first-order logic or some other logic, and then be able to reason based on that: to draw conclusions, make inferences, and figure out whether there's some sort of entailment relationship, by using some sort of inference algorithm, something like inference by resolution or model checking or any number of these other algorithms that we can use in order to take information that we know and translate it into additional conclusions. So all of this has helped us to create AI that is able to represent information about what it knows and what it doesn't know.
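(Here is a small sketch, again not from the lecture, of how quantified sentences over a finite domain can be checked mechanically in Python: universal quantification maps onto all, existential quantification onto any. The particular house assignment below is made up purely for illustration.)

    people = {"Minerva", "Pomona", "Horace", "Gilderoy"}             # constant symbols
    houses = {"Gryffindor", "Hufflepuff", "Ravenclaw", "Slytherin"}  # constant symbols

    # An illustrative (invented) assignment for the BelongsTo relation.
    belongs = {("Minerva", "Gryffindor"), ("Pomona", "Hufflepuff"),
               ("Horace", "Slytherin"), ("Gilderoy", "Ravenclaw")}

    def house(x):
        return x in houses

    def belongs_to(x, y):
        return (x, y) in belongs

    # For all x: BelongsTo(x, Gryffindor) implies not BelongsTo(x, Hufflepuff).
    # An implication "A implies B" is written as "(not A) or B".
    print(all(not belongs_to(x, "Gryffindor") or not belongs_to(x, "Hufflepuff")
              for x in people))                                            # True

    # There exists an x such that House(x) and BelongsTo(Minerva, x).
    print(any(house(x) and belongs_to("Minerva", x)
              for x in people | houses))                                   # True

    # For all x: Person(x) implies there exists a y with House(y) and
    # BelongsTo(x, y) -- in other words, every person belongs to a house.
    print(all(any(belongs_to(x, y) for y in houses) for x in people))      # True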
Next time, though, we'll take a look at how we can make our AI even more powerful by not just encoding information that we know for sure to be true or not true, but also taking a look at uncertainty: looking at what happens if an AI thinks that something might be probable, or maybe not very probable, or somewhere in between those two extremes, all in pursuit of trying to build our intelligent systems to be even more intelligent. We'll see you next time. Thank you. All right, welcome back, everyone, to An Introduction to Artificial Intelligence with Python. Last time, we took a look at how it is that AI inside of our computers can represent knowledge. We represented that knowledge in the form of logical sentences in a variety of different logical languages. And the idea was that we wanted our AI to be able to represent knowledge or information and somehow use those pieces of information to derive new pieces of information by inference, to take some information and deduce additional conclusions based on the information that it already knew for sure. But in reality, when we think about computers and we think about AI, very rarely are our machines going to be able to know things for sure. Oftentimes, there's going to be some amount of uncertainty in the information that our AIs or our computers are dealing with, where they might believe something with some probability, as we'll soon discuss what probability is all about and what it means, but not entirely for certain. And we want to use the information they have some knowledge about, even if they don't have perfect knowledge, to still be able to make inferences, still be able to draw conclusions. So you might imagine, for example, in the context of a robot that has some sensors and is exploring some environment, it might not know exactly where it is or exactly what's around it, but it does have access to some data that can allow it to draw inferences with some probability. There's some likelihood that one thing is true or another. Or you can imagine contexts where there is a little bit more randomness and uncertainty, something like predicting the weather, where you might not be able to know tomorrow's weather with 100% certainty, but you can probably infer with some probability what tomorrow's weather is going to be, based on today's weather and yesterday's weather and other data that you might have access to as well. And so oftentimes, we can distill this in terms of possible events that might happen and what the likelihood of those events is. This comes up a lot in games, for example, where there is an element of chance. So imagine rolling a die: you're not sure exactly what the roll is going to be, but you know it's going to be one of the possibilities from 1 to 6. And so here now, we introduce the idea of probability theory. And what we'll take a look at today begins with the mathematical foundations of probability theory, getting an understanding of some of the key concepts within probability, and then dives into how we can use probability, and the ideas we look at mathematically, to represent ideas in terms of models that we can put into our computers, in order to program an AI that is able to use information about probability to draw inferences and make judgments about the world with some probability or likelihood of being true.
So probability ultimately boils down to this idea that there are possible worlds, which we're here representing using the little Greek letter omega. And the idea of a possible world is that when I roll a die, there are six possible worlds that could result from it: I could roll a 1, or a 2, or a 3, or a 4, or a 5, or a 6. And each of those is a possible world, and each of those possible worlds has some probability of being true: the probability that I do roll a 1, or a 2, or a 3, or something else. And we represent that probability like this, using the capital letter P, and then, in parentheses, whatever it is that we want the probability of. So this right here would be the probability of some possible world, as represented by the little letter omega. Now, there are a couple of basic axioms of probability that become relevant as we consider how we deal with probability and how we think about it. First and foremost, every probability value must range between 0 and 1, inclusive. So the smallest value any probability can have is the number 0, which represents an impossible event; something like: I roll a die, and the roll that I get is a 7. If the die only has the numbers 1 through 6, the event that I roll a 7 is impossible, so it has probability 0. And on the other end of the spectrum, probability can range all the way up to the number 1, meaning an event is certain to happen; for example, that I roll a die and the number is less than 10. That is an event that is guaranteed to happen if the only sides on my die are 1 through 6. And probabilities can take any real number in between these two values, where, generally speaking, a higher value means an event is more likely to take place, and a lower value means the event is less likely to take place. And the other key rule for probability looks a little bit like this. This sigma notation, if you haven't seen it before, refers to summation, the idea that we're going to be adding up a whole sequence of values. And this sigma notation is going to come up a couple of times today, because as we deal with probability, oftentimes we're adding up a whole bunch of individual values or individual probabilities to get some other value. But what this notation means is that if I sum up the probabilities of all of the possible worlds omega that are in big Omega, which represents the set of all the possible worlds, what I ultimately get is the number 1. So if I take all the possible worlds and add up each of their probabilities, I should get the number 1 at the end, meaning all the probabilities just need to sum to 1. So, for example, if you imagine I have a fair die with the numbers 1 through 6 and I roll the die, each one of these rolls has an equal probability of taking place, and that probability is 1 over 6. So each of these probabilities is between 0 and 1, 0 meaning impossible and 1 meaning certain, and if you add up the probabilities of all of the possible worlds, you get the number 1. And we can represent any one of those probabilities like this: the probability that we roll the number 2, for example, is just 1 over 6. Every six times we roll the die, we'd expect that about one time the die comes up as a 2. Its probability is not certain, but it's also more than nothing.
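(To make the two axioms concrete, here is a tiny sketch in Python, using Fraction so the arithmetic stays exact; nothing here is from the lecture's own code.)

    from fractions import Fraction

    # The six possible worlds for one fair die, each equally likely.
    worlds = {roll: Fraction(1, 6) for roll in range(1, 7)}

    # Axiom 1: every probability lies between 0 and 1, inclusive.
    print(all(0 <= p <= 1 for p in worlds.values()))  # True

    # Axiom 2: the probabilities over all possible worlds sum to 1.
    print(sum(worlds.values()) == 1)                  # True

    # The probability of rolling a 2 is 1/6.
    print(worlds[2])                                  # 1/6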
And so this is all fairly straightforward for just a single die. But things get more interesting as our models of the world get a little bit more complex. Let's imagine now that we're not just dealing with a single die, but that we have two dice: I have a red die here and a blue die there, and I care not just about what each individual roll is, but about the sum of the two rolls. In this case, the sum of the two rolls is the number 3. How do I begin to reason about what the probability looks like if, instead of having one die, I now have two dice? Well, we could first consider what all of the possible worlds are. And in this case, all of the possible worlds are just every combination of the red and blue die that I could come up with. For the red die, it could be a 1 or a 2 or a 3 or a 4 or a 5 or a 6, and for each of those possibilities, the blue die, likewise, could also be a 1 or a 2 or a 3 or a 4 or a 5 or a 6. And it just so happens that in this particular case, each of these possible combinations is equally likely. That's not always going to be the case: if you imagine more complex models that we could try to build of things in the real world, it's probably not going to be true that every single possible world is equally likely. But in the case of fair dice, where in any given roll any one number has just as good a chance of coming up as any other number, we can consider all of these possible worlds to be equally likely. But even though all of the possible worlds are equally likely, that doesn't necessarily mean that their sums are equally likely. So if we consider the sum of each pair, 1 plus 1 is 2, 2 plus 1 is 3, and so on for each possible pair of numbers, we can notice some patterns here: it's not the case that every sum comes up equally often. If you consider 7, for example: what's the probability that when I roll two dice, their sum is 7? There are several ways this can happen; there are six possible worlds where the sum is 7. It could be a 1 and a 6, or a 2 and a 5, or a 3 and a 4, a 4 and a 3, and so forth. But if you instead consider the probability that I roll two dice and the sum of those two rolls is 12, looking at this diagram, there's only one possible world in which that can happen: the possible world where both the red die and the blue die come up as sixes, to give us a sum total of 12. So based on just taking a look at this diagram, we see that some of these probabilities are different: the probability that the sum is a 7 must be greater than the probability that the sum is a 12. And we can represent that even more formally by saying, OK, the probability that we sum to 12 is 1 out of 36. Out of the 36 equally likely possible worlds, 6 squared because we have six options for the red die and six options for the blue die, only one of them sums to 12. Whereas, on the other hand, for the probability that two dice rolls sum up to the number 7: out of those 36 possible worlds, there were six worlds where the sum was 7. And so we get 6 over 36, which we can simplify as a fraction to just 1 over 6.
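(The same counting argument can be checked by brute-force enumeration of the 36 worlds; a quick sketch, with names of my own choosing.)

    from fractions import Fraction
    from itertools import product

    # Every combination of a red die roll and a blue die roll.
    worlds = list(product(range(1, 7), range(1, 7)))

    def p_sum(total):
        favorable = sum(1 for red, blue in worlds if red + blue == total)
        return Fraction(favorable, len(worlds))

    print(p_sum(7))   # 1/6: six of the 36 worlds sum to 7
    print(p_sum(12))  # 1/36: only red = 6, blue = 6 sums to 12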
So here now, we're able to represent these different ideas of probability, some events that might be more likely and other events that are less likely. And these sorts of judgments, where we're figuring out just in the abstract what the probability is that something takes place, are generally known as unconditional probabilities: some degree of belief we have in some proposition, some fact about the world, in the absence of any other evidence. Without knowing any additional information: if I roll a die, what's the chance it comes up as a 2? Or if I roll two dice, what's the chance that the sum of those two rolls is a 7? But usually, when we're thinking about probability, especially when we're thinking about training an AI to intelligently know something about the world and make predictions based on that information, it's not unconditional probability that our AI is dealing with, but rather conditional probability: probability where, rather than having no prior knowledge, we have some initial knowledge about the world and how the world actually works. So conditional probability is the degree of belief in a proposition given some evidence that has already been revealed to us. So what does this look like? Well, in terms of notation, we're going to represent conditional probability as the probability of A, then this vertical bar, then B. And the way to read this is that the thing on the left-hand side of the vertical bar is what we want the probability of. Here, I want the probability that A is true, that it is the event that actually does take place. And on the right side of the vertical bar is our evidence, the information that we already know for certain about the world; for example, that B is true. So the way to read this entire expression is: what is the probability of A given B, the probability that A is true, given that we already know that B is true. And this type of judgment, conditional probability, the probability of one thing given some other fact, comes up quite a lot when we think about the types of calculations we might want our AI to be able to do. For example, we might care about the probability of rain today given that we know that it rained yesterday. We could think about the probability of rain today just in the abstract: what is the chance that today it rains? But usually, we have some additional evidence. I know for certain that it rained yesterday, and so I would like to calculate the probability that it rains today given that I know it rained yesterday. Or you might imagine that I want to know the probability that my optimal route to my destination changes given the current traffic conditions. So whether or not traffic conditions change, that might change the probability that this route is actually the optimal route. Or you might imagine, in a medical context, I want to know the probability that a patient has a particular disease given the results of some tests that have been performed on that patient. I have some evidence, the results of that test, and I would like to know the probability that the patient has a particular disease. So this notion of conditional probability comes up everywhere, as we begin to think about what we would like to reason about, and about how to reason a little more intelligently by taking into account evidence that we already have.
We're more able to get an accurate result for the likelihood that someone has this disease if we know this evidence, the results of the test, as opposed to just calculating the unconditional probability, saying: what is the probability they have the disease, without any evidence to back up our result one way or the other. So now that we've got this idea of what conditional probability is, the next question we have to ask is: all right, how do we calculate conditional probability? How do we figure out, mathematically, if I have an expression like this, how to get a number from it? What does conditional probability actually mean? Well, the formula for conditional probability looks a little something like this. The probability of A given B, the probability that A is true given that we know that B is true, is equal to this fraction: the probability that A and B are both true, divided by just the probability that B is true. And the way to think about this intuitively is that if I want to know the probability that A is true given that B is true, well, I want to consider all the ways they could both be true, out of only the worlds where B is already true. I can ignore all the cases where B isn't true, because those aren't relevant to my ultimate computation; they're not relevant to what it is that I want to get information about. So let's take a look at an example. Let's go back to that example of rolling two dice and the idea that those two dice might sum up to the number 12. We discussed earlier that the unconditional probability that two rolled dice sum to 12 is 1 out of 36, because out of the 36 possible worlds that I might care about, in only one of them is the sum of those two dice 12: it's only when red is 6 and blue is also 6. But let's say now that I have some additional information. I now want to know the probability that the two dice sum to 12, given that I know the red die was a 6. So I already have some evidence: I already know the red die is a 6. I don't know what the blue die is; that information isn't given to me in this expression. But given the fact that I know the red die rolled a 6, what is the probability that we sum to 12? And so we can begin to do the math using that expression from before. Here, again, are all of the possibilities, all of the possible combinations of the red die being 1 through 6 and the blue die being 1 through 6. And I might consider first: all right, what is the probability of my evidence, my B variable, the probability that the red die is a 6? Well, the probability that the red die is a 6 is just 1 out of 6. So those worlds are really the only worlds that I care about here now. All the rest of them are irrelevant to my calculation, because I already have this evidence that the red die was a 6, so I don't need to care about all of the other possibilities that could result. So now, in addition to the probability that the red die rolled a 6, the other piece of information I need in order to calculate this conditional probability is the probability that both of my events, A and B, are true: the probability that the red die is a 6 and the two dice sum to 12. So what is the probability that both of these things happen? Well, it only happens in one of these 36 cases, and it's the case where both the red and the blue die are equal to 6.
This is a piece of information that we already knew, and so this probability is equal to 1 over 36. And so to get the conditional probability that the sum is 12, given that I know the red die is equal to 6, well, I just divide these two values, and 1 over 36 divided by 1 over 6 gives us a probability of 1 over 6. Given that I know the red die rolled a value of 6, the probability that the sum of the two dice is 12 is also 1 over 6. And that probably makes intuitive sense to you, too, because if the red die is a 6, the only way for me to get to a 12 is if the blue die also rolls a 6, and we know that the probability of the blue die rolling a 6 is 1 over 6. So in this case, the conditional probability seems fairly straightforward. But this idea of calculating a conditional probability by looking at the probability that both of these events take place is an idea that's going to come up again and again. This is the definition now of conditional probability, and we're going to use that definition as we think about probability more generally, to be able to draw conclusions about the world. This, again, is that formula: the probability of A given B is equal to the probability that A and B take place, divided by the probability of B. And you'll see this formula sometimes written in a couple of different ways. You could imagine algebraically multiplying both sides of this equation by the probability of B to get rid of the fraction, and you'll get an expression like this: the probability of A and B is just the probability of B times the probability of A given B. Or you could represent this equivalently, since A and B in this expression are interchangeable, A and B being the same thing as B and A: you could also represent the probability of A and B as the probability of A times the probability of B given A, just switching all of the As and Bs. These three are all equivalent ways of representing what joint probability means. And so you'll sometimes see all of these equations, and they might be useful to you as you begin to reason about probability and to think about what values might hold in the real world. Now, sometimes when we deal with probability, we don't just care about a Boolean event, like did this happen or did this not happen. Sometimes we might want the ability to represent variables in a probability space, where some variable might take on multiple different possible values. And in probability theory, we call such a variable a random variable. A random variable in probability is just some variable in probability theory that has some domain of values that it can take on. So what do I mean by this? Well, I might have a random variable that is just called Roll, for example, that has six possible values. Roll is my variable, and the possible values, the domain of values that it can take on, are 1, 2, 3, 4, 5, and 6. And I might like to know the probability of each. In this case, they happen to all be the same. But for other random variables, that might not be the case. For example, I might have a random variable to represent the weather, where the domain of values it could take on are things like sun or cloudy or rainy or windy or snowy, and each of those might have a different probability. And I care about knowing what the probability is that the weather equals sun, or that the weather equals clouds, for instance.
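(Circling back to the dice example for a moment, as an aside that isn't in the lecture itself: the whole conditional computation, P(sum is 12 | red is 6) = P(sum is 12 and red is 6) / P(red is 6), can be checked by enumeration.)

    from fractions import Fraction
    from itertools import product

    worlds = list(product(range(1, 7), range(1, 7)))  # 36 equally likely worlds
    n = len(worlds)

    # P(B): the red die is a 6.
    p_b = Fraction(sum(1 for red, blue in worlds if red == 6), n)

    # P(A and B): the dice sum to 12 and the red die is a 6.
    p_a_and_b = Fraction(sum(1 for red, blue in worlds
                             if red + blue == 12 and red == 6), n)

    # P(A | B) = P(A and B) / P(B)
    print(p_a_and_b / p_b)  # 1/6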
And I might like to do some mathematical calculations based on that weather information. Other random variables might be something like traffic: what are the odds that there is no traffic, or light traffic, or heavy traffic? Traffic, in this case, is my random variable, and the values that it can take on are none, light, and heavy. And I, the person doing these calculations, the person encoding these random variables into my computer, need to make the decision as to what these possible values actually are. You might imagine, for example, that for a flight, if I care about whether or not my flight is on time, my flight has a couple of possible values that it could take on: my flight could be on time, my flight could be delayed, or my flight could be canceled. So Flight, in this case, is my random variable, and these are the values that it can take on. And often, I want to know something about the probability that my random variable takes on each of those possible values. And this is what we then call a probability distribution. A probability distribution takes a random variable and gives me the probability of each of the possible values in its domain. So in the case of this flight, for example, my probability distribution might look something like this. My probability distribution says the probability that the random variable Flight is equal to the value on time is 0.6; or, put into more English, human-friendly terms, the likelihood that my flight is on time is 60%, for example. And in this case, the probability that my flight is delayed is 30%, and the probability that my flight is canceled is 10%, or 0.1. And if you sum up all of these possible values, the sum is going to be 1, right? If you take all of the possible worlds, here my three possible worlds for the value of the random variable Flight, and add them all up together, the result needs to be the number 1, per that axiom of probability theory that we've discussed before. So this now is one way of representing the probability distribution for the random variable Flight. Sometimes you'll see it represented a little bit more concisely, since this is pretty verbose for really just trying to express three possible values, and so often you'll instead see the same idea represented using a vector. And all a vector is, is a sequence of values: as opposed to just a single value, I might have multiple values. And so instead, I could represent this idea this way: bold P, so a larger P, generally meaning the probability distribution of this variable Flight, is equal to this vector represented in angle brackets. The probability distribution is 0.6, 0.3, and 0.1. And I would just have to know that this probability distribution is in the order on time, delayed, and canceled to know how to interpret this vector: the first value in the vector is the probability that my flight is on time, the second value is the probability that my flight is delayed, and the third value is the probability that my flight is canceled. And so this is just an alternate, more concise way of representing the same idea. But oftentimes, you'll see us just talk about a probability distribution over a random variable, and whenever we talk about that, what we're really doing is trying to figure out the probabilities of each of the possible values that that random variable can take on.
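(A minimal sketch of the two notations, not from the lecture: the same distribution once as an explicit mapping and once as a vector whose order, on time, delayed, canceled, the reader just has to know.)

    flight_dist = {"on time": 0.6, "delayed": 0.3, "canceled": 0.1}
    flight_vector = (0.6, 0.3, 0.1)  # order: on time, delayed, canceled

    # Per the axioms, the values of the distribution must sum to 1.
    print(abs(sum(flight_dist.values()) - 1.0) < 1e-9)  # True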
But this vector notation is just a little bit more succinct, even though it can sometimes be a little confusing, depending on the context in which you see it. So we'll start to look at examples where we use this sort of notation to describe probability and to describe events that might take place. A couple of other important ideas to know with regard to probability theory. One is this idea of independence. And independence refers to the idea that knowledge of one event doesn't influence the probability of another event. So, for example, in the context of my two dice rolls, where I had the red die and the blue die, those two events, the red roll and the blue roll, are independent. Knowing the result of the red die doesn't change the probabilities for the blue die. It doesn't give me any additional information about what the value of the blue die is ultimately going to be. But that's not always going to be the case. You might imagine that in the case of weather, something like clouds and rain, those are probably not independent: if it is cloudy, that might increase the probability that later in the day it's going to rain. So some information informs some other event or some other random variable. So independence refers to the idea that one event doesn't influence the other, and if they're not independent, then there might be some relationship. So, mathematically, formally, what does independence actually mean? Well, recall this formula from before: the probability of A and B is the probability of A times the probability of B given A. And the more intuitive way to think about this is that to know how likely it is that A and B both happen, well, let's first figure out the likelihood that A happens, and then, given that we know A happens, figure out the likelihood that B happens, and multiply those two things together. But if A and B are independent, meaning knowing A doesn't change anything about the likelihood that B is true, well, then the probability of B given A, the probability that B is true given that I know A is true, well, knowing that A is true shouldn't make a difference if these two things are independent; A shouldn't influence B at all. So the probability of B given A is really just the probability of B, if it is true that A and B are independent. And so this right here is one definition of what it means for A and B to be independent: the probability of A and B is just the probability of A times the probability of B. Any time you find two events A and B where this relationship holds, then you can say that A and B are independent. So an example of that might be the dice that we were taking a look at before. Here, if I wanted the probability of red being a 6 and blue being a 6, well, that's just the probability that red is a 6 multiplied by the probability that blue is a 6; both sides are equal to 1 over 36, so I can say that these two events are independent. But what wouldn't be independent would be a case like this: the probability that the red die rolls a 6 and the red die rolls a 4. If you just naively took: OK, red die 6, red die 4, well, if I'm only rolling the die once, you might imagine the naive approach is to say, each of these has a probability of 1 over 6, so multiply them together, and the probability is 1 over 36.
But of course, if you're only rolling the red die once, there's no way you could get two different values for the red die. It couldn't be both a 6 and a 4, so the probability should be 0. But if you were to multiply the probability of red 6 times the probability of red 4, well, that would equal 1 over 36. And of course, that's not true, because we know there is no way, probability 0, that when we roll the red die once, we get both a 6 and a 4; only one of those possibilities can actually be the result. And so we can say that the event that the red roll is 6 and the event that the red roll is 4, those two events, are not independent. If I know that the red roll is a 6, I know that the red roll cannot possibly be a 4, so these things are not independent. And instead, if I wanted to calculate the probability, I would need to use the regular definition of the probability of two events taking place, with the conditional probability. And the probability of this, well, the probability of the red roll being a 6, that's 1 over 6. But what's the probability that the roll is a 4 given that the roll is a 6? Well, that's just 0, because there's no way for the red roll to be a 4 given that we already know the red roll is a 6. And so the value, if we do all that multiplication, is the number 0. So this idea of conditional probability is going to come up again and again, especially as we begin to reason about multiple different random variables that might be interacting with each other in some way. And this gets us to one of the most important rules in probability theory, which is known as Bayes' rule. And it turns out that just using the information we've already learned about probability, and applying a little bit of algebra, we can actually derive Bayes' rule for ourselves. It's a very important rule when it comes to inference and thinking about probability in the context of what a computer, or a mathematician, can do by having access to information about probability. So let's go back to these equations to derive Bayes' rule ourselves. We know that the probability of A and B, the likelihood that A and B both take place, is the likelihood of B times the likelihood of A given that we know B is already true. And likewise, the probability of A and B is the probability of A times the probability of B given that we know A is already true. This is a symmetric relationship, where the order doesn't matter: A and B means the same thing as B and A. And so in these equations, we can just swap out A and B to represent the exact same idea. So we know that these two equations are already true; we've seen that already. And now let's just do a little bit of algebraic manipulation. Both of these expressions on the right-hand side are equal to the probability of A and B. So what I can do is take these two right-hand sides and set them equal to each other: if they're both equal to the probability of A and B, then they must be equal to each other. So the probability of A times the probability of B given A is equal to the probability of B times the probability of A given B. And now all we're going to do is a little bit of division. I'm going to divide both sides by P of A, and now I get what is Bayes' rule: the probability of B given A is equal to the probability of B times the probability of A given B, divided by the probability of A.
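(Before we go on to use Bayes' rule, a quick check, not from the lecture itself, of the two independence claims from a moment ago, verified by enumeration over the 36 dice worlds; the names are invented.)

    from fractions import Fraction
    from itertools import product

    worlds = list(product(range(1, 7), range(1, 7)))

    def p(event):
        # Probability of an event: the fraction of worlds where it holds.
        return Fraction(sum(1 for w in worlds if event(w)), len(worlds))

    red_6 = lambda w: w[0] == 6
    blue_6 = lambda w: w[1] == 6
    red_4 = lambda w: w[0] == 4

    # Independent: P(red 6 and blue 6) equals P(red 6) * P(blue 6).
    print(p(lambda w: red_6(w) and blue_6(w)) == p(red_6) * p(blue_6))  # True

    # Not independent: P(red 6 and red 4) is 0, not 1/36.
    print(p(lambda w: red_6(w) and red_4(w)))  # 0
    print(p(red_6) * p(red_4))                 # 1/36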
Now, sometimes in Bayes' rule, you'll see the order of these two factors switched: instead of B times A given B, it'll be A given B times B. That ultimately doesn't matter, because in multiplication you can switch the order of the two things you're multiplying without changing the result. But this here is the most common formulation of Bayes' rule: the probability of B given A is equal to the probability of A given B, times the probability of B, divided by the probability of A. And this rule, it turns out, is really important when it comes to trying to infer things about the world, because it means you can express one conditional probability, the conditional probability of B given A, using knowledge about the probability of A given B, the reverse of that conditional probability. So let's first do a little bit of an example with this, just to see how we might use it, and then explore what this means a little bit more generally. So we're going to construct a situation where there are two events that I care about: the idea that it's cloudy in the morning, and the idea that it's rainy in the afternoon. Those are two different possible events that could take place: cloudy in the morning, or the AM, and rainy in the PM. And what I care about is: given clouds in the morning, what is the probability of rain in the afternoon? A reasonable question I might ask: in the morning, I look outside, or an AI's camera looks outside, and sees that there are clouds in the morning, and we want to figure out the probability that in the afternoon there is going to be rain. Of course, in the abstract, we don't have access to this kind of information, but we can use data to begin to try to figure it out. So let's imagine now that I have access to some pieces of information. I know that 80% of rainy afternoons start out with a cloudy morning. And you might imagine that I could have gathered this data just by looking at data over a period of time: I know that 80% of the time when it's raining in the afternoon, it was cloudy that morning. I also know that 40% of days have cloudy mornings, and I also know that 10% of days have rainy afternoons. And now, using this information, I would like to figure out: given clouds in the morning, what is the probability that it rains in the afternoon? I want to know the probability of afternoon rain given morning clouds. And I can do that, in particular, using this fact: if I know that 80% of rainy afternoons start with cloudy mornings, then I know the probability of cloudy mornings given rainy afternoons. So using the reverse conditional probability, I can figure that out. Expressed in terms of Bayes' rule, this is what that looks like: the probability of rain given clouds is the probability of clouds given rain, times the probability of rain, divided by the probability of clouds. Here I'm just substituting in for the values of A and B from that equation of Bayes' rule from before. And then I can just do the math. I have this information: I know that 80% of the time, if it was raining, then there were clouds in the morning, so 0.8 here. The probability of rain is 0.1, because 10% of days were rainy, and 40% of days were cloudy. I do the math, and I can figure out the answer is 0.2. So the probability that it rains in the afternoon, given that it was cloudy in the morning, is 0.2 in this case.
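(The arithmetic, as a quick sketch in Python with exact fractions; the numbers are the ones from the example above.)

    from fractions import Fraction

    p_clouds_given_rain = Fraction(8, 10)  # 80% of rainy afternoons start cloudy
    p_rain = Fraction(1, 10)               # 10% of days have rainy afternoons
    p_clouds = Fraction(4, 10)             # 40% of days have cloudy mornings

    # Bayes' rule: P(rain | clouds) = P(clouds | rain) * P(rain) / P(clouds)
    print(p_clouds_given_rain * p_rain / p_clouds)  # 1/5, i.e. 0.2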
And this now is an application of Bayes' rule: the idea that using one conditional probability, we can get the reverse conditional probability. And this is often useful when one of the conditional probabilities might be easier for us to know about or easier for us to have data about; using that information, we can calculate the other conditional probability. So what does this look like? Well, it means that knowing the probability of cloudy mornings given rainy afternoons, we can calculate the probability of rainy afternoons given cloudy mornings. Or, more generally, if we know the probability of some visible effect, some effect that we can see and observe, given some unknown cause that we're not sure about, well, then we can calculate the probability of that unknown cause given the visible effect. So what might that look like? Well, in the context of medicine, for example, I might know the probability of some medical test result given a disease. Like, I know that if someone has a disease, then some percentage of the time the medical test result will show up a certain way, for instance. And using that information, I can then calculate: all right, given that I know the medical test result, what is the likelihood that someone has the disease? The first is the piece of information that is usually easier to know, easier to immediately have access to data for, and the second is the information that I actually want to calculate. Or I might want to know, for example, that some percentage of counterfeit bills have blurry text around the edges, because counterfeit printers aren't nearly as good at printing text precisely. So I have some information: given that something is a counterfeit bill, some percentage of the time it has blurry text, for example. And using that information, I can then calculate the piece of information that I might actually want to know: given that I know there's blurry text on a bill, what is the probability that that bill is counterfeit? So given one conditional probability, I can calculate the other conditional probability as well. And so now we've taken a look at a couple of different types of probability. We've looked at unconditional probability, where I just ask what the probability of this event occurring is, given no additional evidence that I might have access to. And we've also looked at conditional probability, where I have some sort of evidence and I would like to use that evidence to calculate some other probability as well. And the other kind of probability that will be important for us to think about is joint probability, which is when we're considering the likelihood of multiple different events occurring simultaneously. And so what do we mean by this? For example, I might have probability distributions that look a little something like this. Say I want to know the probability distribution of clouds in the morning, and that distribution looks like this: 40% of the time, C, which is my random variable here, is equal to cloudy, and 60% of the time, it's not cloudy. So here is just a simple probability distribution that is effectively telling me that 40% of the time, it's cloudy. I might also have a probability distribution for rain in the afternoon, where 10% of the time, or with probability 0.1, it is raining in the afternoon, and with probability 0.9, it is not raining in the afternoon.
And using just these two pieces of information, I don't actually have a whole lot of information about how these two variables relate to each other. But I could if I had access to their joint probability, meaning for every combination of these two things, morning cloudy and afternoon rain, morning cloudy and afternoon not rain, morning not cloudy and afternoon rain, and morning not cloudy and afternoon not rain: if I had access to values for each of those four, I'd have more information. So that information would be organized in a table like this, and this, rather than just a probability distribution, is a joint probability distribution. It tells me the probability of each of the possible combinations of values that these random variables can take on. So if I want to know the probability that on any given day it is both cloudy and rainy, well, I would say: all right, we're looking at the cases where it is cloudy and the cases where it is raining, and the intersection of that row and that column is 0.08. So that is the probability that it is both cloudy and rainy, using that information. And using this joint probability table, I can begin to derive other pieces of information, about things like conditional probability. So I might ask a question like: what is the probability distribution of clouds given that I know that it is raining? Meaning I know for sure that it's raining: tell me the probability distribution over whether it's cloudy or not, given that I already know that it is, in fact, raining. And here I'm using C to stand for that random variable. I'm looking for a distribution, meaning the answer to this is not going to be a single value. It's going to be two values, a vector of two values, where the first value is the probability of clouds, the second value is the probability that it is not cloudy, and the sum of those two values is going to be 1, because when you add up the probabilities of all of the possible worlds, the result that you get must be the number 1. And well, what do we know about how to calculate a conditional probability? Well, we know that the probability of A given B is the probability of A and B divided by the probability of B. So what does this mean? Well, it means that I can calculate the probability distribution of clouds, given that it's raining, as the probability of clouds and rain divided by the probability of rain. And this comma here, in the probability distribution of clouds and rain, stands in for the word "and"; you'll see the logical operator "and" and the comma used interchangeably. This means the probability distribution over the clouds, together with the fact that it is raining, divided by the probability of rain. And the interesting thing to note here, and what we'll often do in order to simplify our mathematics, is that dividing by the probability of rain is just dividing by some numerical constant. It is some number. Dividing by the probability of rain is just dividing by some constant, or, in other words, multiplying by the inverse of that constant. And it turns out that oftentimes we can just not worry about what the exact value of this constant is, and just know that it is, in fact, a constant value. And we'll see why in a moment.
So instead of expressing this as this joint probability divided by the probability of rain, sometimes we'll just represent it as alpha times the numerator here, the probability distribution of C, this variable, and the fact that we know that it is raining, for instance. So all we've done here is said this value of 1 over the probability of rain, that's really just a constant we're going to divide by, or equivalently multiply by the inverse of, at the end. We'll just call it alpha for now and deal with it a little bit later. But the key idea here now, and this is an idea that's going to come up again, is that the conditional distribution of C given rain is proportional to, meaning just some factor multiplied by, the joint probability of C and rain being true. And so how do we figure this out? Well, this is going to be the joint probability that it is cloudy and raining, which is 0.08, and the joint probability that it's not cloudy and raining, which is 0.02. And so we get alpha times that distribution: 0.08 for cloudy and rain, 0.02 for not cloudy and rain. But of course, 0.08 and 0.02 don't sum up to the number 1. And we know that in a probability distribution, if you consider all of the possible values, they must sum up to a probability of 1. And so we know that we just need to figure out some constant to normalize, so to speak, these values, something we can multiply or divide by to get it so that all these probabilities sum up to 1. And it turns out that if we multiply both numbers by 10, then we can get that result of 0.8 and 0.2. The proportions are still equivalent, but now 0.8 plus 0.2 sum up to the number 1. So take a look at this and see if you can understand step by step how it is we're getting from one point to another. The key idea here is that by using the joint probabilities, these probabilities that it is both cloudy and rainy and that it is not cloudy and rainy, I can take that information and figure out the conditional probability given that it's raining: what is the chance that it's cloudy versus not cloudy, just by multiplying by some normalization constant, so to speak. And this is what a computer can begin to use to be able to work with these various different types of probabilities. And it turns out there are a number of other probability rules that are going to be useful to us as we begin to explore how we can actually use this information to encode into our computers some more complex analysis that we might want to do about probability and distributions and random variables that we might be interacting with. So here are a couple of those important probability rules. One of the simplest rules is just this negation rule. What is the probability of not event A? So A is an event that has some probability, and I would like to know what is the probability that A does not occur. And it turns out it's just 1 minus P of A, which makes sense. Because if those are the two possible cases, either A happens or A doesn't happen, then when you add up those two cases, you must get 1, which means that P of not A must just be 1 minus P of A. Because P of A and P of not A must sum up to the number 1. They must include all of the possible cases. We've seen an expression for calculating the probability of A and B. We might also reasonably want to calculate the probability of A or B. What is the probability that one thing happens or another thing happens?
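Going back to the normalization step above, here is a minimal sketch in plain Python of how the constant alpha turns the joint values 0.08 and 0.02 into the conditional distribution 0.8 and 0.2:

```python
# Normalizing the joint values from the table: P(cloudy, rain) = 0.08 and
# P(not cloudy, rain) = 0.02. Dividing by their sum plays the role of alpha.

joint = {"cloudy": 0.08, "not cloudy": 0.02}
alpha = 1 / sum(joint.values())          # here 1 / 0.10 = 10
conditional = {c: alpha * p for c, p in joint.items()}
print(conditional)  # roughly {'cloudy': 0.8, 'not cloudy': 0.2}
```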
So for example, I might want to calculate: if I roll two dice, a red die and a blue die, what is the likelihood that the red die, call it A, comes up as a 6, or the blue die, B, comes up as a 6, like one or the other? And what you might imagine you could do, and the wrong way to approach it, would be just to say, all right, well, the red die comes up as a 6 with probability 1 over 6. The same for the blue die, it's also 1 over 6. Add them together, and you get 2 over 6, otherwise known as 1/3. But this suffers from a problem of overcounting, in that we've double counted the case where both A and B, both the red die and the blue die, come up as a 6. I've counted that instance twice. So to resolve this, the actual expression for calculating the probability of A or B uses what we call the inclusion-exclusion formula. I take the probability of A, add it to the probability of B. That's all the same as before. But then I need to exclude the cases that I've double counted. So I subtract from that the probability of A and B. And that gets me the result for A or B. I consider all the cases where A is true and all the cases where B is true. And if you imagine this as like a Venn diagram of cases where A is true and cases where B is true, I just need to subtract out the middle to get rid of the cases that I have overcounted by double counting them inside of both of these individual expressions. One other rule that's going to be quite helpful is a rule called marginalization. So marginalization is answering the question of how do I figure out the probability of A using some other variable that I might have access to, like B? Even if I don't know additional information about it, I know that B, some event, can have two possible states: either B happens or B doesn't happen, assuming it's a Boolean, true or false. And what that means is that for me to be able to calculate the probability of A, there are only two cases. Either A happens and B happens, or A happens and B doesn't happen. And those are disjoint, meaning they can't both happen together. Either B happens or B doesn't happen. They're disjoint or separate cases. And so I can figure out the probability of A just by adding up those two cases. The probability that A is true is the probability that A and B is true, plus the probability that A is true and B isn't true. So by marginalizing, I've looked at the two possible cases that might take place, either B happens or B doesn't happen. And in either of those cases, I look at what's the probability that A happens. And if I add those together, well, then I get the probability that A happens as a whole. So take a look at that rule. It doesn't matter what B is or how it's related to A. So long as I know these joint distributions, I can figure out the overall probability of A. And this can be a useful way, if I have a joint distribution, like the joint distribution of A and B, to just figure out some unconditional probability, like the probability of A. And we'll see examples of this soon as well. Now, sometimes these might not just be events that either happened or didn't happen, like B is here. They might be some broader probability distribution where there are multiple possible values. And so here, in order to use this marginalization rule, I need to sum up not just over B and not B, but over all of the possible values that the other random variable could take on. And so here, we'll see a version of this rule for random variables.
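To check the inclusion-exclusion computation above, here is a small sketch using Python's fractions module; the exact answer for the two-dice example is 11/36, not 1/3:

```python
from fractions import Fraction

# Inclusion-exclusion on the two-dice example:
# P(red is 6 or blue is 6) = P(red 6) + P(blue 6) - P(both 6).
p_red = Fraction(1, 6)
p_blue = Fraction(1, 6)
p_both = Fraction(1, 36)             # independent dice

print(p_red + p_blue - p_both)       # 11/36

# Brute-force check over all 36 equally likely outcomes.
outcomes = [(r, b) for r in range(1, 7) for b in range(1, 7)]
count = sum(1 for r, b in outcomes if r == 6 or b == 6)
print(Fraction(count, len(outcomes)))  # 11/36 as well
```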
And it's going to include that summation notation to indicate that I'm summing up, adding up, a whole bunch of individual values. So here's the rule. It looks a lot more complicated, but it's actually exactly the same rule. What I'm saying here is that if I have two random variables, one called x and one called y, well, the probability that x is equal to some value x sub i, this is just some value that this variable takes on. How do I figure it out? Well, I'm going to sum up over j, where j is going to range over all of the possible values that y can take on. Let's look at the probability that x equals xi and y equals yj. So it's the exact same rule. The only difference here is that now I'm summing up over all of the possible values that y can take on, saying let's add up all of those possible cases and look at this joint distribution, this joint probability, that x takes on the value I care about together with each of the possible values for y. And if I add all those up, then I can get this unconditional probability that x is equal to some value x sub i. So let's take a look at this rule, because it does look a little bit complicated. Let's try and put a concrete example to it. Here again is that same joint distribution from before. I have cloudy, not cloudy, rainy, not rainy. And maybe I want to know some individual probability, like what is the probability that it is cloudy. Well, marginalization says that if I have this joint distribution and I want to know what is the probability that it is cloudy, well, I need to consider the other variable, the variable that's not here, the idea that it's rainy. And I consider the two cases: either it's raining or it's not raining. And I just sum up the values for each of those possibilities. In other words, the probability that it is cloudy is equal to the sum of the probability that it's cloudy and it's rainy and the probability that it's cloudy and it is not raining. And so these now are values that I have access to. These are values that are just inside of this joint probability table. What is the probability that it is both cloudy and rainy? Well, it's just the intersection of these two here, which is 0.08. And the probability that it's cloudy and not raining is, all right, here's cloudy, here's not raining: it's 0.32. So it's 0.08 plus 0.32, which gives us 0.4. That is the unconditional probability that it is, in fact, cloudy. And so marginalization gives us a way to go from these joint distributions to just some individual probability that I might care about. And you'll see a little bit later why it is that we care about that and why that's actually useful to us as we begin doing some of these calculations. The last rule we'll take a look at before transitioning to something a little bit different is this rule of conditioning, very similar to the marginalization rule. It says that, again, if I have two events, a and b, but instead of having access to their joint probabilities, I have access to their conditional probabilities, how they relate to each other. Well, again, if I want to know the probability that a happens, and I know that there's some other variable b, either b happens or b doesn't happen, then I can say that the probability of a is the probability of a given b times the probability of b, meaning b happened. And given that I know b happened, what's the likelihood that a happened? And then I consider the other case, that b didn't happen. So here's the probability that b didn't happen.
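Here is the cloudy example as a small Python sketch. One caveat: the 0.58 entry below is not read out above, but it is implied, since the four joint values must sum to 1:

```python
# The joint table from the example, keyed by (clouds, rain) value pairs.
joint = {
    ("cloudy", "rain"): 0.08,
    ("cloudy", "no rain"): 0.32,
    ("not cloudy", "rain"): 0.02,
    ("not cloudy", "no rain"): 0.58,   # implied: all four entries sum to 1
}

# Marginalization: P(cloudy) = sum over all values of the other variable.
p_cloudy = sum(p for (c, r), p in joint.items() if c == "cloudy")
print(p_cloudy)  # 0.4
```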
And here's the probability that a happens, given that I know that b didn't happen. And this is really the equivalent rule, just using conditional probability instead of joint probability, where I'm saying let's look at both of these two cases and condition on b. Look at the case where b happens, and look at the case where b doesn't happen, and look at what probabilities I get as a result. And just as in the case of marginalization, where there was an equivalent rule for random variables that could take on multiple possible values in a domain of possible values, here, too, conditioning has the same equivalent rule. Again, there's a summation to mean I'm summing over all of the possible values that some random variable y could take on. But if I want to know what is the probability that x takes on this value, then I'm going to sum up over all the values j that y could take on, and say, all right, what's the chance that y takes on that value yj? And multiply it by the conditional probability that x takes on this value, given that y took on that value yj. So it's the equivalent rule, just using conditional probabilities instead of joint probabilities. And using the equation we know relating joint and conditional probabilities, we can translate between these two. So all right, we've seen a whole lot of mathematics, and with it we've now laid the foundation. And no need to worry if you haven't seen probability in too much detail up until this point. These are the foundations of the ideas that are going to come up as we begin to explore how we can now take these ideas from probability and begin to apply them to represent something inside of our computer, something inside of the AI agent we're trying to design that is able to represent information and probabilities and the likelihoods between various different events. So there are a number of different probabilistic models that we can generate, but the first of the models we're going to talk about are what are known as Bayesian networks. And a Bayesian network is just going to be some network of connected random variables that represents the dependence between those random variables. The odds are that most random variables in this world are not independent of each other; there's some relationship between the things that are happening that we care about. If it is rainy today, that might increase the likelihood that my flight or my train gets delayed, for example. There is some dependence between these random variables, and a Bayesian network is going to be able to capture those dependencies. So what is a Bayesian network? What is its actual structure, and how does it work? Well, a Bayesian network is going to be a directed graph. And again, we've seen directed graphs before. They are individual nodes with arrows or edges that connect one node to another node, pointing in a particular direction. And so this directed graph is going to have nodes as well, where each node in this directed graph is going to represent a random variable, something like the weather, or something like whether my train was on time or delayed. And we're going to have an arrow from a node x to a node y to mean that x is a parent of y. So that'll be our notation: if there's an arrow from x to y, x is going to be considered a parent of y. And the reason that's important is because each of these nodes is going to have a probability distribution that we're going to store along with it, which is the distribution of x given some evidence, given the parents of x.
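Restating the two forms of the conditioning rule in notation, first for a Boolean event b and then for a random variable Y that ranges over values y_j:

```latex
P(a) = P(a \mid b)\,P(b) + P(a \mid \lnot b)\,P(\lnot b)

P(X = x_i) = \sum_{j} P(X = x_i \mid Y = y_j)\,P(Y = y_j)
```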
So the more intuitive way to think about this is that the parents can be thought of as causes for some effect that we're going to observe. And so let's take a look at an actual example of a Bayesian network and think about the types of logic that might be involved in reasoning about that network. Let's imagine for a moment that I have an appointment out of town, and I need to take a train in order to get to that appointment. So what are the things I might care about? Well, I care about getting to my appointment on time: whether I make it to my appointment and am able to attend it, or I miss the appointment. And you might imagine that that's influenced by the train, that the train is either on time or it's delayed, for example. But that train itself is also influenced. Whether the train is on time or not depends maybe on the rain. Is there no rain? Is it light rain? Is there heavy rain? And it might also be influenced by other variables too. It might be influenced as well by whether or not there's maintenance on the train track, for example. If there is maintenance on the train track, that probably increases the likelihood that my train is delayed. And so we can represent all of these ideas using a Bayesian network that looks a little something like this. Here I have four nodes representing four random variables that I would like to keep track of. I have one random variable called rain that can take on three possible values in its domain, either none or light or heavy, for no rain, light rain, or heavy rain. I have a variable called maintenance for whether or not there is maintenance on the train track, which has two possible values, just either yes or no. Either there is maintenance or there's no maintenance happening on the track. Then I have a random variable for the train, indicating whether or not the train was on time. That random variable has two possible values in its domain: the train is either on time or the train is delayed. And then finally, I have a random variable for whether I make it to my appointment. For my appointment down here, I have a random variable called appointment that itself has two possible values, attend and miss. And so here are the possible values. Here are my four nodes, each of which represents a random variable, each of which has a domain of possible values that it can take on. And the arrows, the edges pointing from one node to another, encode some notion of dependence inside of this graph: whether I make it to my appointment or not is dependent upon whether the train is on time or delayed. And whether the train is on time or delayed is dependent on two things, given by the two arrows pointing at this node. It is dependent on whether or not there was maintenance on the train track. And it is also dependent upon whether or not it is raining. And just to make things a little complicated, let's say as well that whether or not there is maintenance on the track, this too might be influenced by the rain. If there's heavier rain, well, maybe it's less likely that there's going to be maintenance on the train track that day, because they're more likely to want to do maintenance on the track on days when it's not raining, for example. And so these nodes might have different relationships between them. But the idea is that we can come up with a probability distribution for any of these nodes based only upon its parents. And so let's look node by node at what this probability distribution might actually look like.
And we'll go ahead and begin with this root node, this rain node here, which is at the top and has no arrows pointing into it, which means its probability distribution is not going to be a conditional distribution. It's not based on anything. I just have some probability distribution over the possible values for the rain random variable. And that distribution might look a little something like this. None, light, and heavy each have a probability. Here I'm saying the likelihood of no rain is 0.7, of light rain is 0.2, of heavy rain is 0.1, for example. So here is a probability distribution for this root node in this Bayesian network. And let's now consider the next node in the network, maintenance. Track maintenance is yes or no. And the general idea of what this distribution is going to encode, at least in this story, is the idea that the heavier the rain is, the less likely it is that there's going to be maintenance on the track. Because the people that are doing maintenance on the track probably want to wait until a day when it's not as rainy in order to do the track maintenance, for example. And so what might that probability distribution look like? Well, this now is going to be a conditional probability distribution. Here are the three possible values for the rain random variable, which I'm here just going to abbreviate to R: either no rain, light rain, or heavy rain. And for each of those possible values, either there is yes track maintenance or no track maintenance. And those have probabilities associated with them. I see here that if it is not raining, then there is a probability of 0.4 that there's track maintenance and a probability of 0.6 that there isn't. But if there's heavy rain, then here the chance that there is track maintenance is 0.1 and the chance that there is not track maintenance is 0.9. Each of these rows is going to sum up to 1, because each row represents a different value of whether or not it's raining, one of the three possible values that that random variable can take on, and each is associated with its own probability distribution that is ultimately going to add up to the number 1. So that there is our distribution for this random variable called maintenance, about whether or not there is maintenance on the train track. And now let's consider the next variable. Here we have a node inside of our Bayesian network called train that has two possible values, on time and delayed. And this node is going to be dependent upon the two nodes that are pointing towards it: whether the train is on time or delayed depends on whether or not there is track maintenance, and it depends on whether or not there is rain. Heavier rain probably means it's more likely that my train is delayed. And if there is track maintenance, that also probably means it's more likely that my train is delayed as well. And so you could construct a larger probability distribution, a conditional probability distribution, that instead of conditioning on just one variable, as was the case here, is now conditioning on two variables, conditioning both on rain, represented by R, and on maintenance, represented by M. Again, each of these rows has two values that sum up to the number 1, one for whether the train is on time, one for whether the train is delayed. And here I can say something like, all right, if I know there was light rain and track maintenance, well, OK, that would be R is light and M is yes.
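Written out as plain Python dicts, the distributions described so far look like the following. The light-rain row is not read out above, so those two numbers are illustrative assumptions:

```python
# The root node's unconditional distribution, as stated above.
p_rain = {"none": 0.7, "light": 0.2, "heavy": 0.1}

# P(maintenance | rain): each inner dict is one row and sums to 1.
p_maintenance = {
    "none":  {"yes": 0.4, "no": 0.6},
    "light": {"yes": 0.2, "no": 0.8},   # assumed, not read out above
    "heavy": {"yes": 0.1, "no": 0.9},
}
```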
Well, then there is a probability of 0.6 that my train is on time, and a probability of 0.4 that the train is delayed. And you can imagine gathering this just by looking at real-world data, looking at data about, all right, if I knew that it was light rain and there was track maintenance, how often was the train delayed or not delayed? And you could begin to construct this thing. The interesting challenge is intelligently figuring out how you might go about ordering these things, what things might influence other nodes inside of this Bayesian network. And the last thing I care about is whether or not I make it to my appointment. So did I attend or miss the appointment? Ultimately, whether I attend or miss the appointment is influenced by track maintenance, but only indirectly: if there is track maintenance, well, then my train is more likely to be delayed. And if my train is more likely to be delayed, then I'm more likely to miss my appointment. But what we encode in this Bayesian network are just what we might consider to be the more direct relationships. So the train has a direct influence on the appointment. And given that I know whether the train is on time or delayed, knowing whether there's track maintenance isn't going to give me any additional information that I didn't already have. If I know the value of train, these other nodes up above aren't really going to influence the result. And so here we might represent it using another conditional probability distribution that looks a little something like this. The train can take on two possible values: either my train is on time or my train is delayed. And for each of those two possible values, I have a distribution for what are the odds that I'm able to attend the meeting and what are the odds that I miss the meeting. And obviously, if my train is on time, I'm much more likely to be able to attend the meeting than if my train is delayed, in which case I'm more likely to miss that meeting. So all of these nodes put together represent this Bayesian network, this network of random variables whose values I ultimately care about, and that have some sort of relationship between them, some sort of dependence, where these arrows from one node to another indicate some dependence, such that I can calculate the probability of some node given the parents that happen to exist there. So now that we've been able to describe the structure of this Bayesian network and the relationships between each of these nodes, by associating each of the nodes in the network with a probability distribution, whether that's an unconditional probability distribution in the case of a root node like rain, or a conditional probability distribution in the case of all of the other nodes, whose probabilities are dependent upon the values of their parents, we can begin to do some computation and calculation using the information inside of those tables. So let's imagine, for example, that I just wanted to compute something simple like the probability of light rain. How would I get the probability of light rain? Well, light rain, rain here, is a root node. And so if I wanted to calculate that probability, I could just look at the probability distribution for rain and extract from it the probability of light rain, just a single value that I already have access to. But we could also imagine wanting to compute more complex joint probabilities, like the probability that there is light rain and also no track maintenance.
This is a joint probability of two values, light rain and no track maintenance. And the way I might do that is first by starting by saying, all right, well, let me get the probability of light rain. But now I also want the probability of no track maintenance. But of course, this node is dependent upon the value of rain. So what I really want is the probability of no track maintenance, given that I know that there was light rain. And so the expression for calculating this idea, the probability of light rain and no track maintenance, is really just the probability of light rain times the probability that there is no track maintenance, given that I know that there already is light rain. So I take the unconditional probability of light rain, multiply it by the conditional probability of no track maintenance, given that I know there is light rain. And you can continue to do this again and again for every variable that you want to add into this joint probability that I might want to calculate. If I wanted to know the probability of light rain and no track maintenance and a delayed train, well, that's going to be the probability of light rain, multiplied by the probability of no track maintenance given light rain, multiplied by the probability of a delayed train given light rain and no track maintenance. Because whether the train is on time or delayed is dependent upon both of these other two variables, I have two pieces of evidence that go into the calculation of that conditional probability. And each of these three values is just a value that I can look up in one of the individual probability distributions that are encoded into my Bayesian network. And if I wanted a joint probability over all four of the variables, something like the probability of light rain and no track maintenance and a delayed train and missing my appointment, well, that's going to be multiplying four different values, one from each of these individual nodes. It's going to be the probability of light rain, then of no track maintenance given light rain, then of a delayed train given light rain and no track maintenance. And then finally, for this node here, for whether I make it to my appointment or not, it's not dependent upon these two variables, given that I know whether or not the train is on time. I only need to care about the conditional probability that I miss my appointment, given that the train happens to be delayed. And so that's represented here by four probabilities, each of which is located inside of one of these probability distributions for each of the nodes, all multiplied together. And so I can take an expression like that and figure out what the joint probability is by multiplying a whole bunch of these individual probabilities from the Bayesian network. But of course, just as with last time, what I really want to do is to be able to get new pieces of information, and here, too, this is what we're going to want to do with our Bayesian network. In the context of knowledge, we talked about the problem of inference. Given things that I know to be true, can I draw conclusions, make deductions about other facts about the world that I also know to be true? And what we're going to do now is apply the same sort of idea to probability. Using information I have some knowledge about, whether some evidence or some probabilities, can I figure out, not the values of other variables for certain, but the probabilities of other variables taking on particular values?
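As a sketch of that chain-rule computation in Python: of the four factors below, only P(light) = 0.2 has been stated so far, and P(miss | delayed) = 0.4 is read out later in this walkthrough; the middle two are assumed placeholders meant to show the shape of the calculation, not source values:

```python
# P(light rain, no maintenance, delayed train, miss appointment) as a
# product of one factor per node, each conditioned only on its parents.

p_light = 0.2                     # from the rain distribution above
p_no_given_light = 0.8            # assumed: P(maintenance=no | light)
p_delayed_given_light_no = 0.3    # assumed: P(train=delayed | light, no)
p_miss_given_delayed = 0.4        # stated later in this walkthrough

joint = (p_light
         * p_no_given_light
         * p_delayed_given_light_no
         * p_miss_given_delayed)
print(joint)  # 0.0192 with these numbers
```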
And so here, we introduce the problem of inference in a probabilistic setting, in a case where variables might not necessarily be true for sure, but they might be random variables that take on different values with some probability. So how do we formally define what exactly this inference problem actually is? Well, the inference problem has a couple of parts to it. We have some query, some variable x that we want to compute the distribution for. Maybe I want the probability that I miss my train, or I want the probability that there is track maintenance, something that I want information about. And then I have some evidence variables. Maybe it's just one piece of evidence. Maybe it's multiple pieces of evidence. But I've observed certain variables for some sort of event. So for example, I might have observed that it is raining. This is evidence that I have. I know that there is light rain, or I know that there is heavy rain. And that is evidence I have. And using that evidence, I want to know what is the probability that my train is delayed, for example. And that is a query that I might want to ask based on this evidence. So I have a query, some variable, and evidence, which are some other variables that I have observed inside of my Bayesian network. And of course, that does leave some hidden variables, Y. These are variables that are not evidence variables and not query variables. So you might imagine, in the case where I know whether or not it's raining, and I want to know whether my train is going to be delayed or not, the hidden variable, the thing I don't have access to, is something like, is there maintenance on the track? Or am I going to make or not make my appointment, for example? These are variables that I don't have access to. They're hidden because they're not things I observed, and they're also not the query, the thing that I'm asking. And so ultimately, what we want to calculate is the probability distribution of x given e, the event that I observed. So given that I observed some event, I observed that it is raining, I would like to know the distribution over the possible values of the train random variable. Is it on time? Is it delayed? What's the likelihood of each? And it turns out we can do this calculation just using a lot of the probability rules that we've already seen in action. Ultimately, we're going to take a look at the math at a little bit of a high level, at an abstract level, and we can allow computers and programming libraries that already exist to do some of this math for us. But it's good to get a general sense for what's actually happening when this inference process takes place. Let's imagine, for example, that I want to compute the probability distribution of the appointment random variable given some evidence, given that I know that there was light rain and no track maintenance. So there's my evidence, these two variables that I observe the values of. I observe the value of rain: I know there's light rain. And I know that there is no track maintenance going on today. And what I care about knowing, my query, is this random variable appointment. I want to know the distribution of this random variable appointment: what is the chance that I'm able to attend my appointment? What is the chance that I miss my appointment, given this evidence? And the hidden variable, the information that I don't have access to, is this variable train.
This is information that is not part of the evidence that I see, not something that I observe. But it is also not the query that I'm asking for. And so what might this inference procedure look like? Well, if you recall back from when we were defining conditional probability and doing math with conditional probabilities, we know that a conditional probability is proportional to the joint probability. And we remembered this by recalling that the probability of A given B is just some constant factor alpha multiplied by the probability of A and B. That constant factor alpha turns out to be 1 over the probability of B. But the important thing is that it's just some constant multiplied by the joint distribution, the probability that all of these individual things happen. So in this case, I can take the probability of the appointment random variable given light rain and no track maintenance and say that is just going to be proportional, some constant alpha, multiplied by the joint probability: the probability of a particular value for the appointment random variable and light rain and no track maintenance. Well, all right, how do I calculate this probability of appointment and light rain and no track maintenance, when I really need all four of these variables to calculate a joint distribution across everything, because appointment depends upon the value of train? Well, in order to do that, here I can begin to use that marginalization trick: there are only two ways I can get any configuration of appointment, light rain, and no track maintenance. Either this particular setting of variables happens and the train is on time, or this particular setting of variables happens and the train is delayed. Those are the two possible cases that I would want to consider. And if I add those two cases up, well, then I get the result just by adding up all of the possibilities for the hidden variable, or variables, if there are multiple. But since there's only one hidden variable here, train, all I need to do is iterate over all the possible values for that hidden variable train and add up their probabilities. So this probability expression here becomes the probability over appointment, light rain, no track maintenance, and the train being on time, plus the probability over appointment, light rain, no track maintenance, and the train being delayed. So I take both of the possible values for train and go ahead and add them up. These are just joint probabilities, which we saw earlier how to calculate, just by going parent by parent and multiplying those probabilities together. And then you'll need to normalize them at the end, speaking at a high level, to make sure that everything adds up to the number 1. So the formula for how you do this, in a process known as inference by enumeration, looks a little bit complicated, but ultimately it looks like this. And let's now try to distill what it is that all of these symbols actually mean. Let's start here. What I care about knowing is the probability of x, my query variable, given some sort of evidence. What do I know about conditional probabilities? Well, a conditional probability is proportional to the joint probability. So it is some alpha, some normalizing constant, multiplied by this joint probability of x and evidence. And how do I calculate that?
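In notation, the full inference-by-enumeration rule being described here, with X the query variable, e the evidence, and y ranging over the values of the hidden variables, is:

```latex
P(X \mid e) = \alpha\, P(X, e) = \alpha \sum_{y} P(X, e, y)
```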
Well, to do that, I'm going to marginalize over all of the hidden variables, all the variables that I don't directly observe the values for. I'm basically going to iterate over all of the possibilities that could happen and just sum them all up. And so I can translate this into a sum over all y, which ranges over all the possible hidden variables and the values that they could take on, and adds up all of those possible individual probabilities. And that is going to allow me to do this process of inference by enumeration. Now, ultimately, it's pretty annoying if we as humans have to do all this math for ourselves. But it turns out this is where computers and AI can be particularly helpful: we can program a computer to understand a Bayesian network, to be able to understand these inference procedures, and to be able to do these calculations. Using the information you've seen here, you could implement a Bayesian network from scratch yourself. But it turns out there are a lot of libraries, especially written in Python, that make it easier to do this sort of probabilistic inference, to be able to take a Bayesian network and do these sorts of calculations, so that you don't need to know and understand all of the underlying math, though it's helpful to have a general sense for how it works. You just need to be able to describe the structure of the network and make queries in order to be able to produce the result. And so let's take a look at an example of that right now. It turns out that there are a lot of possible libraries that exist in Python for doing this sort of inference. It doesn't matter too much which specific library you use; they all behave in fairly similar ways. But the library I'm going to use here is one known as pomegranate. And here inside of model.py, I have defined a Bayesian network, just using the structure and the syntax that the pomegranate library expects. And what I'm effectively doing is just, in Python, creating nodes to represent each of the nodes of the Bayesian network that you saw me describe a moment ago. So here on line four, after I've imported pomegranate, I'm defining a variable called rain that is going to represent a node inside of my Bayesian network. It's going to be a node that follows this distribution, where there are three possible values: none for no rain, light for light rain, heavy for heavy rain. And these are the probabilities of each of those taking place: 0.7 is the likelihood of no rain, 0.2 for light rain, 0.1 for heavy rain. Then after that, we go to the next variable, the variable for track maintenance, for example, which is dependent upon that rain variable. And this, instead of being an unconditional distribution, is a conditional distribution, as indicated by a conditional probability table here. And the idea is that this is conditional on the distribution of rain. So if there is no rain, then the chance that there is, yes, track maintenance is 0.4. If there's no rain, the chance that there is no track maintenance is 0.6. Likewise, for light rain, I have a distribution. For heavy rain, I have a distribution as well. I'm effectively encoding the same information you saw represented graphically a moment ago, but I'm telling this Python program that the maintenance node obeys this particular conditional probability distribution. And we do the same thing for the other random variables as well. Train was a node inside my network that had a conditional probability table with two parents.
It was dependent not only on rain, but also on track maintenance. And so here I'm saying something like, given that there is no rain and, yes, track maintenance, the probability that my train is on time is 0.8, and the probability that it's delayed is 0.2. And likewise, I can do the same thing for all of the other possible values of the parents of the train node inside of my Bayesian network, by saying, for all of those possible values, here is the distribution that the train node should follow. Then I do the same thing for appointment, based on the distribution of the variable train. Then at the end, what I do is actually construct this network, by describing what the states of the network are and by adding edges between the dependent nodes. So I create a new Bayesian network, add states to it, one for rain, one for maintenance, one for the train, one for the appointment, and then I add edges connecting the related pieces. Rain has an arrow to maintenance because rain influences track maintenance. Rain also influences the train. Maintenance also influences the train. And train influences whether I make it to my appointment. And then bake just finalizes the model and does some additional computation. The specific syntax of this is not really the important part. Pomegranate just happens to be one of several different libraries that can all be used for similar purposes, and you could describe and define a library for yourself that implemented similar things. But the key idea here is that someone can design a library for a general Bayesian network that has nodes that are based upon their parents, and then all a programmer needs to do, using one of those libraries, is to define what those nodes and what those probability distributions are, and we can begin to do some interesting logic based on it. So let's try doing that joint probability calculation that we did by hand before, by going into likelihood.py, where here I'm importing the model that I just defined a moment ago. And here I'd just like to calculate model.probability, which calculates the probability for a given observation. And I'd like to calculate the probability of no rain, no track maintenance, my train is on time, and I'm able to attend the meeting. So this is sort of the optimal scenario: there is no rain and no maintenance on the track, my train is on time, and I'm able to attend the meeting. What is the probability that all of that actually happens? I can calculate that using the library and just print out its probability. And so I'll go ahead and run python likelihood.py. And I see that, OK, the probability is about 0.34. So about a third of the time, everything goes right for me in this case: no rain, no track maintenance, train is on time, and I'm able to attend the meeting. But I could experiment with this and try to calculate other probabilities as well. What's the probability that everything goes right up until the train, but I still miss my meeting? So no rain, no track maintenance, train is on time, but I miss the appointment. Let's calculate that probability. And all right, that has a probability of about 0.04. So about 4% of the time, the train will be on time, there won't be any rain, no track maintenance, and yet I'll still miss the meeting. And so this is really just an implementation of the calculation of the joint probabilities that we did before.
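For reference, here is a condensed sketch of what model.py might look like, written against the older pomegranate (v0.x) API that this walkthrough appears to use. Only some of the conditional probability table rows are read out above; the rows marked as assumed are illustrative fill-ins (chosen to be consistent with the outputs quoted in this walkthrough, but not stated directly):

```python
from pomegranate import (Node, DiscreteDistribution,
                         ConditionalProbabilityTable, BayesianNetwork)

# Root node: unconditional distribution over rain.
rain = Node(DiscreteDistribution({
    "none": 0.7, "light": 0.2, "heavy": 0.1
}), name="rain")

# Maintenance is conditional on rain; rows are [rain, maintenance, prob].
maintenance = Node(ConditionalProbabilityTable([
    ["none", "yes", 0.4], ["none", "no", 0.6],
    ["light", "yes", 0.2], ["light", "no", 0.8],   # assumed
    ["heavy", "yes", 0.1], ["heavy", "no", 0.9],
], [rain.distribution]), name="maintenance")

# Train has two parents; rows are [rain, maintenance, train, prob].
train = Node(ConditionalProbabilityTable([
    ["none", "yes", "on time", 0.8], ["none", "yes", "delayed", 0.2],
    ["none", "no", "on time", 0.9], ["none", "no", "delayed", 0.1],    # assumed
    ["light", "yes", "on time", 0.6], ["light", "yes", "delayed", 0.4],
    ["light", "no", "on time", 0.7], ["light", "no", "delayed", 0.3],  # assumed
    ["heavy", "yes", "on time", 0.4], ["heavy", "yes", "delayed", 0.6],  # assumed
    ["heavy", "no", "on time", 0.5], ["heavy", "no", "delayed", 0.5],    # assumed
], [rain.distribution, maintenance.distribution]), name="train")

# Appointment depends only on train; values read out in this walkthrough.
appointment = Node(ConditionalProbabilityTable([
    ["on time", "attend", 0.9], ["on time", "miss", 0.1],
    ["delayed", "attend", 0.6], ["delayed", "miss", 0.4],
], [train.distribution]), name="appointment")

model = BayesianNetwork()
model.add_states(rain, maintenance, train, appointment)
model.add_edge(rain, maintenance)
model.add_edge(rain, train)
model.add_edge(maintenance, train)
model.add_edge(train, appointment)
model.bake()

# likelihood.py equivalent: joint probability of one full assignment.
print(model.probability([["none", "no", "on time", "attend"]]))
```

With these numbers, the printed probability comes out to about 0.34, matching the result described above.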
What this library is likely doing is first figuring out the probability of no rain, then figuring out the probability of no track maintenance given no rain, then the probability that my train is on time given both of these values, and then the probability that I miss my appointment given that I know that the train was on time. So this, again, is the calculation of that joint probability. And it turns out we can also begin to have our computer solve inference problems as well, to begin to infer, based on evidence that we see, the likelihood of other variables also being true. So let's go into inference.py, for example, where here, I'm again importing that exact same model from before, importing all the nodes and all the edges and the probability distributions that are encoded there as well. And now there's a function for doing some sort of prediction. And here, into this model, I pass in the evidence that I observe. So here, I've encoded into this Python program the evidence that I have observed: I have observed the fact that the train is delayed. And that is the value for one of the four random variables inside of this Bayesian network. And using that information, I would like to be able to draw inferences about the values of the other random variables that are inside of my Bayesian network. I would like to make predictions about everything else. So all of the actual computational logic is happening in just these three lines, where I'm making this call to this prediction. Down below, I'm just iterating over all of the states and all the predictions and printing them out so that we can visually see what the results are. But let's find out: given that the train is delayed, what can I predict about the values of the other random variables? Let's go ahead and run python inference.py. I run that, and all right, here is the result that I get. Given the fact that I know that the train is delayed, this is evidence that I have observed. Well, given that, there is a 46% chance that there was no rain, a 31% chance there was light rain, and a 23% chance there was heavy rain. And I can see a probability distribution over track maintenance and a probability distribution over whether I'm able to attend or miss my appointment. Now, we know that whether I attend or miss the appointment is only dependent upon the train being delayed or not delayed. It shouldn't depend on anything else. So let's imagine, for example, that I knew that there was heavy rain. That shouldn't affect the distribution for making the appointment. And indeed, if I go up here and add some evidence, say that I know that the value of rain is heavy, that is evidence that I now have access to. I now have two pieces of evidence: I know that the rain is heavy, and I know that my train is delayed. I can calculate the probability by running this inference procedure again and seeing the result. I know that the rain is heavy. I know my train is delayed. The probability distribution for track maintenance changed: given that I know that there's heavy rain, now it's more likely that there is no track maintenance, 88%, as opposed to 64% from before. And now, what is the probability that I make the appointment? Well, that's the same as before. It's still going to be attend the appointment with probability 0.6, miss the appointment with probability 0.4, because it was only dependent upon whether my train was on time or delayed.
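A sketch of what inference.py might look like, with the same caveats about the pomegranate v0.x API; the printing logic below is one plausible way to display the predictions:

```python
from model import model  # the Bayesian network sketched above

# Ask for the distribution over every variable, given the evidence.
predictions = model.predict_proba({"train": "delayed"})

for node, prediction in zip(model.states, predictions):
    if isinstance(prediction, str):
        # Evidence variables come back as their fixed observed value.
        print(f"{node.name}: {prediction}")
    else:
        print(f"{node.name}")
        for value, probability in prediction.parameters[0].items():
            print(f"    {value}: {probability:.4f}")
```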
And so this here is implementing that idea of that inference algorithm, to be able to figure out, based on the evidence that I have, what we can infer about the values of the other variables that exist as well. So inference by enumeration is one way of doing this inference procedure, just looping over all of the values the hidden variables could take on and figuring out what the probability is. Now, it turns out this is not particularly efficient. And there are definitely optimizations you can make by avoiding repeated work. If you're calculating the same sort of probability multiple times, there are ways of optimizing the program to avoid having to recalculate the same probabilities again and again. But even then, as the number of variables gets large, and as the number of possible values those variables could take on gets large, we're going to start to have to do a lot of computation, a lot of calculation, to be able to do this inference. And at that point, it might start to get unreasonable in terms of the amount of time that it would take to be able to do this sort of exact inference. And it's for that reason that oftentimes, when it comes to probability and things we're not entirely sure about, we don't always care about doing exact inference and knowing exactly what the probability is. If we can approximate the inference procedure, do some sort of approximate inference, that can be pretty good as well. If I don't know the exact probability, but I have a general sense for the probability, one that I can make increasingly accurate with more time, that's probably pretty good, especially if I can get it even faster. So how could I do approximate inference inside of a Bayesian network? Well, one method is through a procedure known as sampling. In the process of sampling, I'm going to take a sample of all of the variables inside of this Bayesian network here. And how am I going to sample? Well, I'm going to sample one of the values from each of these nodes according to their probability distribution. So how might I take a sample of all these nodes? Well, I'll start at the root. I'll start with rain. Here's the distribution for rain. And I'll go ahead and, using a random number generator or something like it, randomly pick one of these three values. I'll pick none with probability 0.7, light with probability 0.2, and heavy with probability 0.1. So I'll randomly just pick one of them according to that distribution. And maybe in this case, I pick none, for example. Then I do the same thing for the other variable. Maintenance also has a probability distribution. And I'm going to sample. Now, there are three conditional distributions here, one per row. But I'm only going to sample from this first row here, because I've observed already in my sample that the value of rain is none. So given that rain is none, I'm going to sample from this distribution to say, all right, what should the value of maintenance be? And in this case, maintenance is going to be, let's just say, yes, which happens 40% of the time in the event that there is no rain, for example. And we'll sample all of the rest of the nodes in this way as well. I want to sample from the train distribution, and I'll sample from this first row here, where there is no rain, but there is track maintenance. And I'll sample: 80% of the time, I'll say the train is on time; 20% of the time, I'll say the train is delayed. And finally, we'll do the same thing for whether I make it to my appointment or not.
Did I attend or miss the appointment? We'll sample based on this distribution and maybe say that in this case, I attend the appointment, which happens 90% of the time when the train is actually on time. So by going through these nodes, I can very quickly just do some sampling and get a sample of the possible values that could come up from going through this entire Bayesian network, according to those probability distributions. And where this becomes powerful is if I do this not once, but thousands or tens of thousands of times, and generate a whole bunch of samples, all using this distribution. I get different samples. Maybe some of them are the same. But I get a value for each of the possible variables that could come up. And so then, if I'm ever faced with a question, a question like, what is the probability that the train is on time, I could do an exact inference procedure. This is no different than the inference problem we had before, where I could just marginalize, look at all the possible other values of the variables, and do the computation of inference by enumeration to find out this probability exactly. But I could also, if I don't care about the exact probability, just sample it, approximate it, to get close. And this is a powerful tool in AI: if we don't need to be right 100% of the time, or don't need to be exactly right, but just need to be right with some probability, we can often do so more effectively, more efficiently. And so if here now are all of those possible samples, I'll highlight the ones where the train is on time. I'm ignoring the ones where the train is delayed. And in this case, six out of the eight samples have the train arriving on time. And so maybe in this case, I can say that six out of eight is my estimate of the likelihood that the train is on time. And with eight samples, that might not be a great prediction. But if I had thousands upon thousands of samples, then this could be a much better inference procedure to be able to do these sorts of calculations. So this is a direct sampling method: just do a bunch of samples and then figure out what the probability of some event is. Now, this from before was an unconditional probability: what is the probability that the train is on time? And I did that by looking at all the samples and figuring out, right, here are the ones where the train is on time. But sometimes what I want to calculate is not an unconditional probability, but rather a conditional probability, something like, what is the probability that there is light rain, given that the train is on time, something to that effect. And to do that kind of calculation, well, here are all the samples that I have, and I want to calculate a probability distribution given that I know that the train is on time. So to be able to do that, I can look at the two cases where the train was delayed and ignore or reject them, exclude them from the possible samples that I'm considering. And now I want to look at the remaining cases, where the train is on time. Here are the cases where there is light rain. And I say, OK, these are two out of the six possible cases. That can give me an approximation for the probability of light rain, given the fact that I know the train was on time.
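Here is the whole direct-sampling procedure as a self-contained plain-Python sketch. As before, the CPT rows marked as assumed are not read out in this walkthrough:

```python
import random

# The network's distributions, as plain dicts. Rows marked "assumed"
# are illustrative fill-ins, not values stated in this walkthrough.
p_rain = {"none": 0.7, "light": 0.2, "heavy": 0.1}
p_maintenance = {"none": {"yes": 0.4, "no": 0.6},
                 "light": {"yes": 0.2, "no": 0.8},    # assumed
                 "heavy": {"yes": 0.1, "no": 0.9}}
p_train = {("none", "yes"): {"on time": 0.8, "delayed": 0.2},
           ("none", "no"): {"on time": 0.9, "delayed": 0.1},    # assumed
           ("light", "yes"): {"on time": 0.6, "delayed": 0.4},
           ("light", "no"): {"on time": 0.7, "delayed": 0.3},   # assumed
           ("heavy", "yes"): {"on time": 0.4, "delayed": 0.6},  # assumed
           ("heavy", "no"): {"on time": 0.5, "delayed": 0.5}}   # assumed
p_appointment = {"on time": {"attend": 0.9, "miss": 0.1},
                 "delayed": {"attend": 0.6, "miss": 0.4}}

def draw(dist):
    """Pick one value according to a {value: probability} dict."""
    values = list(dist)
    return random.choices(values, weights=[dist[v] for v in values])[0]

def sample():
    """Sample every node in order, parents before children."""
    rain = draw(p_rain)
    maintenance = draw(p_maintenance[rain])
    train = draw(p_train[(rain, maintenance)])
    appointment = draw(p_appointment[train])
    return {"rain": rain, "maintenance": maintenance,
            "train": train, "appointment": appointment}

# Estimate an unconditional probability by counting samples.
N = 10000
on_time = sum(sample()["train"] == "on time" for _ in range(N))
print(on_time / N)  # approximates P(train = on time)
```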
And I do that in almost exactly the same way, just by adding an additional step: when I take each sample, I reject all of the samples that don't match my evidence and only consider the samples that do match my evidence, the thing I want to make some sort of calculation about. And it turns out, using the libraries that we've had for Bayesian networks, we can begin to implement this same sort of idea, implement rejection sampling, which is what this method is called, to be able to figure out some probability, not via direct inference, but instead by sampling. So what I have here is a program called sample.py. It imports the exact same model. And what I define first is a function to generate a sample. And the way I generate a sample is just by looping over all of the states. The states need to be in the correct order, so that I sample parents before their children. But effectively, if it is a conditional distribution, I'm going to sample based on the parents. And otherwise, I'm just going to directly sample the variable, like rain, which has no parents; it's just an unconditional distribution. I keep track of all those parent samples and return the final sample. The exact syntax of this, again, is not particularly important. It just happens to be part of the implementation details of this particular library. The interesting logic is down below. Now that I have the ability to generate a sample, if I want to know the distribution of the appointment random variable, given that the train is delayed, well, then I can begin to do calculations like this. Let me take 10,000 samples and assemble all my results in this list called data. I'll go ahead and loop n times, in this case, 10,000 times. I'll generate a sample. And I want to know the distribution of appointment, given that the train is delayed. So according to rejection sampling, I'm only going to consider samples where the train is delayed. If the train is not delayed, I'm not going to consider those values at all. So I'm going to say, all right, if I take the sample and look at the value of the train random variable, and the train is delayed, well, let me go ahead and add to my data that I'm collecting the value of the appointment random variable that it took on in this particular sample. So I'm only considering the samples where the train is delayed, and for each of those samples, considering what the value of appointment is. And then at the end, I'm using a Python class called Counter, which quickly counts up all the values inside of a data set, so I can take this list of data and figure out how many times my appointment was made and how many times my appointment was missed. And so this here, with just a couple lines of code, is an implementation of rejection sampling. And I can run it by going ahead and running python sample.py. And when I do that, here is the result I get. This is the result of the counter: 1,251 times, I attended the meeting, and 856 times, I missed the meeting. And you can imagine that by doing more and more samples, I'll be able to get a better and better, more accurate result. And this is a randomized process. It's going to be an approximation of the probability. If I run it another time, you'll notice the numbers are similar, 1,272 and 905, but they're not identical, because there's some randomization, some likelihood that things might be higher or lower.
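Building on the sample() helper from the sketch above, the rejection-sampling loop itself is only a few lines:

```python
from collections import Counter

# Rejection sampling, in the spirit of sample.py: keep only the samples
# that match the evidence (train delayed), then count the query variable.
N = 10000
data = []
for _ in range(N):
    s = sample()                      # from the previous sketch
    if s["train"] == "delayed":       # reject samples that don't match
        data.append(s["appointment"])

print(Counter(data))  # e.g. Counter({'attend': ..., 'miss': ...})
```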
And so this is why we generally want to try and use more samples, so that we can have a greater amount of confidence in our result, be more sure that the result we're getting accurately reflects or represents the actual underlying probabilities that are inherent inside of this distribution. And so this, then, was an instance of rejection sampling. And it turns out there are a number of other sampling methods that you could use as well. One problem that rejection sampling has is that if the evidence you're looking for is a fairly unlikely event, well, you're going to be rejecting a lot of samples. If I'm looking for the probability of x given some evidence e, and e is very unlikely to occur, like maybe once every 1,000 times, then I'm only going to be considering 1 out of every 1,000 samples that I do, which is a pretty inefficient method for trying to do this sort of calculation. I'm throwing away a lot of samples, and it takes computational effort to be able to generate those samples. So I'd like to not have to do something like that. So there are other sampling methods that can try and address this. One such sampling method is called likelihood weighting. In likelihood weighting, we follow a slightly different procedure, and the goal is to avoid needing to throw out samples that didn't match the evidence. And so what we'll do is we'll start by fixing the values for the evidence variables. Rather than sample everything, we're going to fix the values of the evidence variables and not sample those. Then we're going to sample all the other non-evidence variables in the same way, just using the Bayesian network, looking at the probability distributions, sampling all the non-evidence variables. But then what we need to do is weight each sample by its likelihood. If our evidence is really unlikely, we want to make sure that we've taken into account how likely the evidence was to actually show up in the sample. If I have a sample where the evidence was much more likely to show up than in another sample, then I want to weight the more likely one higher. So we're going to weight each sample by its likelihood, where likelihood is just defined as the probability of all the evidence: given that particular sample, what is the probability that the evidence would occur? Before, all of our samples were weighted equally; they all had a weight of 1 when we were calculating the overall average. In this case, we're going to weight each sample, multiply each sample by its likelihood, in order to get a more accurate distribution. So what would this look like? Well, if I ask the same question, what is the probability of light rain, given that the train is on time, when I do the sampling procedure, I'm going to start by fixing the evidence variable. I'm already going to have in my sample that the train is on time. That way, I don't have to throw out anything. I'm only generating samples where I know the values of my evidence variables are what I observed them to be. So I'll go ahead and sample from rain. And maybe this time, I sample light rain instead of no rain. Then I'll sample from track maintenance and say, maybe, yes, there's track maintenance. Then for train, well, I've already fixed it in place. Train was an evidence variable, so I'm not going to bother sampling again. I'll just go ahead and move on. I'll move on to appointment and go ahead and sample from appointment as well.
So now I’ve generated a sample. I’ve generated a sample by fixing this evidence variable and sampling the other three. And the last step is now weighting the sample. How much weight should it have? And the weight is based on how probable it is that the train was actually on time, that this evidence actually happened, given the values of these other variables: light rain and the fact that, yes, there was track maintenance. Well, to do that, I can just go back to the train variable and say, all right, if there was light rain and track maintenance, the likelihood of my evidence, the likelihood that my train was on time, is 0.6. And so this particular sample would have a weight of 0.6. And I could repeat the sampling procedure again and again. Each time, every sample would be given a weight according to the probability of the evidence that I see associated with it. And there are other sampling methods that exist as well, but all of them are designed to try and get at the same idea, to approximate the inference procedure of figuring out the value of a variable. So we’ve now dealt with probability as it pertains to particular variables that have these discrete values. But what we haven’t really considered is how values might change over time. We’ve considered something like a variable for rain, where rain can take on values of none or light rain or heavy rain. But in practice, usually when we consider values for variables like rain, we like to consider them over time: how do the values of these variables change? What do we do when we’re dealing with uncertainty over a period of time? This can come up in the context of weather, for example, if I have sunny days and I have rainy days, and I’d like to know not just what is the probability that it’s raining now, but what is the probability that it rains tomorrow, or the day after that, or the day after that. And so to do this, we’re going to introduce a slightly different kind of model. Here, we’re going to have a random variable not just for the weather, but one for every possible time step. And you can define time step however you like. A simple way is just to use days as your time step. And so we can define a variable called x sub t, which is going to be the weather at time t. So x sub 0 might be the weather on day 0. x sub 1 might be the weather on day 1, so on and so forth. x sub 2 is the weather on day 2. But as you can imagine, if we start to do this over longer and longer periods of time, there’s an incredible amount of data that might go into this. If you’re keeping track of data about the weather for a year, now suddenly you might be trying to predict the weather tomorrow, given 365 days of previous pieces of evidence. And that’s a lot of evidence to have to deal with and manipulate and calculate. Probably nobody knows what the exact conditional probability distribution is for all of those combinations of variables. And so when we’re trying to do this inference inside of a computer, when we’re trying to reasonably do this sort of analysis, it’s helpful to make some simplifying assumptions, some assumptions about the problem that we can just assume are true, to make our lives a little bit easier. Even if they’re not totally accurate assumptions, if they’re close to accurate or approximate, they’re usually pretty good. And the assumption we’re going to make is called the Markov assumption, which is the assumption that the current state depends only on a finite fixed number of previous states.
So the current day’s weather depends not on all the previous days’ weather for the rest of all of history; rather, the current day’s weather I can predict just based on yesterday’s weather, or just based on the last two days’ weather, or the last three days’ weather. But oftentimes, we’re going to deal with just the one previous state that helps to predict this current state. And by putting a whole bunch of these random variables together, using this Markov assumption, we can create what’s called a Markov chain, where a Markov chain is just some sequence of random variables where each variable’s distribution follows that Markov assumption. And so we’ll do an example of this where the Markov assumption is, I can predict the weather. Is it sunny or rainy? And we’ll just consider those two possibilities for now, even though there are other types of weather. But I can predict each day’s weather just from the prior day’s weather: using today’s weather, I can come up with a probability distribution for tomorrow’s weather. And here’s what this weather might look like. It’s formatted in terms of a matrix, as you might describe it, as rows and columns of values, where on the left-hand side, I have today’s weather, represented by the variable x sub t. And over here in the columns, I have tomorrow’s weather, represented by the variable x sub t plus 1, day t plus 1’s weather instead. And what this matrix is saying is, if today is sunny, well, then it’s more likely than not that tomorrow is also sunny. Oftentimes, the weather stays consistent for multiple days in a row. And for example, let’s say that if today is sunny, our model says that tomorrow, with probability 0.8, it will also be sunny. And with probability 0.2, it will be raining. And likewise, if today is raining, then it’s more likely than not that tomorrow is also raining. With probability 0.7, it’ll be raining. With probability 0.3, it will be sunny. So this matrix, this description of how it is we transition from one state to the next state, is what we’re going to call the transition model. And using the transition model, you can begin to construct this Markov chain by just predicting, given today’s weather, what’s the likelihood of tomorrow’s weather happening. And you can imagine doing a similar sampling procedure, where you take this information, you sample what tomorrow’s weather is going to be, and using that, you sample the next day’s weather. And the result of that is you can form this Markov chain of x0, x1, x2, and so on: day 0 is sunny, the next day is sunny, maybe the next day it changes to raining, then raining, then raining. And the pattern that this Markov chain follows, given the distribution that we had access to, this transition model here, is that when it’s sunny, it tends to stay sunny for a little while. The next couple of days tend to be sunny too. And when it’s raining, it tends to stay raining as well. And so you get a Markov chain that looks like this, and you can do analysis on this. You can say, given that today is raining, what is the probability that tomorrow is raining? Or you can begin to ask probability questions like, what is the probability of this sequence of five values, sun, sun, rain, rain, rain, and answer those sorts of questions too. And it turns out there are, again, many Python libraries for interacting with models like this, with distributions and random variables that are based on previous variables according to this Markov assumption.
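Even without a library, the transition model just described fits in a few lines. Here is a minimal sketch of sampling such a chain, using the numbers from the matrix above and a 50-50 starting distribution (the same distributions described next):

    import random

    START = {"sun": 0.5, "rain": 0.5}
    TRANSITIONS = {
        "sun": {"sun": 0.8, "rain": 0.2},   # sunny days tend to stay sunny
        "rain": {"sun": 0.3, "rain": 0.7},  # rainy days tend to stay rainy
    }

    def sample_chain(n):
        # Sample the first state, then repeatedly sample each next state
        # from the transition-matrix row for the current state.
        state = random.choices(list(START), weights=list(START.values()))[0]
        states = [state]
        for _ in range(n - 1):
            row = TRANSITIONS[state]
            state = random.choices(list(row), weights=list(row.values()))[0]
            states.append(state)
        return states

    print(sample_chain(50))   # 50 days of simulated weather

Each next state is drawn only from the row for the current state, which is exactly the Markov assumption at work.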
And pomegranate, too, has ways of dealing with these sorts of variables. So I’ll go ahead and go into the chain directory, where I have some information about Markov chains. And here, I’ve defined a file called model.py, where I’ve defined the model in a very similar syntax. And again, the exact syntax doesn’t matter so much as the idea that I’m encoding this information into a Python program so that the program has access to these distributions. I’ve here defined some starting distribution. Every Markov model begins at some point in time, and I need to give it some starting distribution. And so we’ll just say, you know, at the start, you can pick 50-50 between sunny and rainy. We’ll say it’s sunny 50% of the time, rainy 50% of the time. And then down below, I’ve here defined the transition model, how it is that I transition from one day to the next. And here, I’ve encoded that exact same matrix from before: that if it was sunny today, then with probability 0.8, it will be sunny tomorrow, and it’ll be rainy tomorrow with probability 0.2. And I likewise have another distribution for if it was raining today instead. And so that alone defines the Markov model. You can begin to answer questions using that model. But one thing I’ll just do is sample from the Markov chain. It turns out there is a method built into this Markov chain library that allows me to sample 50 states from the chain, basically just simulating 50 instances of weather. And so let me go ahead and run this: python model.py. And when I run it, what I get is that it’s going to sample from this Markov chain 50 states, 50 days’ worth of weather that it’s just going to randomly sample. And you can imagine sampling many times to be able to get more data, to be able to do more analysis. But here, for example, it’s sunny two days in a row, then rainy a whole bunch of days in a row, before it changes back to sun. And so you get this model that follows the distribution that we originally described, the distribution where sunny days tend to lead to more sunny days and rainy days tend to lead to more rainy days. And that, then, is a Markov model. And Markov models rely on us knowing the values of these individual states. I know that today is sunny or that today is raining. And using that information, I can draw some sort of inference about what tomorrow is going to be like. But in practice, this often isn’t the case. It often isn’t the case that I know for certain what the exact state of the world is. Oftentimes, the exact state of the world is unknown, but I’m able to somehow sense some information about that state. A robot or an AI doesn’t have exact knowledge about the world around it, but it has some sort of sensor, whether that sensor is a camera, or sensors that detect distance, or just a microphone that is sensing audio, for example. It is sensing data, and that data is somehow related to the state of the world, even if our AI doesn’t actually know what the underlying true state of the world is. And for that, we need to get into the world of sensor models: the way of describing how it is that we relate the hidden state, the underlying true state of the world, to the observation, what it is that the AI knows or has access to. And so, for example, a hidden state might be a robot’s position. If a robot is exploring new, uncharted territory, the robot likely doesn’t know exactly where it is. But it does have an observation.
It has robot sensor data, where it can sense how far away possible obstacles around it are. And using that information, using the observed information that it has, it can infer something about the hidden state, because what the true hidden state is influences those observations. Whatever the robot’s true position is affects, or has some effect upon, the sensor data that the robot is able to collect, even if the robot doesn’t actually know for certain what its true position is. Likewise, if you think about a voice recognition or speech recognition program that listens to you and is able to respond to you, something like Alexa or what Apple and Google are doing with their voice recognition as well, you might imagine that the hidden state, the underlying state, is what words are actually spoken. The true nature of the world contains you saying a particular sequence of words, but your phone or your smart home device doesn’t know for sure exactly what words you said. The only observation that the AI has access to is some audio waveforms. And those audio waveforms are, of course, dependent upon this hidden state. And you can infer, based on those audio waveforms, what the words spoken likely were. But you might not know with 100% certainty what that hidden state actually is. And it might be a task to try and predict: given this observation, given these audio waveforms, can you figure out what the actual words spoken are? And likewise, on a website, you might imagine that true user engagement is information you don’t directly have access to. But you can observe data, like website or app analytics, about how often this button was clicked or how often people are interacting with a page in a particular way. And you can use that to infer things about your users as well. So this type of problem comes up all the time when we’re dealing with AI and trying to infer things about the world: often AI doesn’t really know the hidden true state of the world. All the AI has access to is some observation that is related to the hidden true state. But it’s not direct. There might be some noise there. The audio waveform might have some additional noise that might be difficult to parse. The sensor data might not be exactly correct. There’s some noise that might not allow you to conclude with certainty what the hidden state is, but can allow you to infer what it might be. And so the simple example we’ll take a look at here is imagining the hidden state as the weather, whether it’s sunny or rainy, and imagining you are programming an AI inside of a building that maybe has access to just a camera inside the building. All you have access to is an observation as to whether or not employees are bringing umbrellas into the building; you can detect whether something is an umbrella or not. And using that information, you want to predict whether it’s sunny or rainy, even if you don’t know what the underlying weather is. So the underlying weather might be sunny or rainy. And if it’s raining, obviously people are more likely to bring an umbrella. And so whether or not people bring an umbrella, your observation, tells you something about the hidden state. And of course, this is a bit of a contrived example, but the idea is to think about this more generally: any time you observe something, that observation has to do with some underlying hidden state.
And so to try and model this type of idea, where we have these hidden states and observations, rather than just use a Markov model, which has state, state, state, state, each of which is connected by that transition matrix that we described before, we’re going to use what we call a hidden Markov model. It is very similar to a Markov model, but this is going to allow us to model a system that has hidden states that we don’t directly observe, along with some observed event that we do actually see. And so in addition to that transition model, which we still need, saying, given the underlying state of the world, if it’s sunny or rainy, what’s the probability of tomorrow’s weather, we also need another model that, given some state, is going to give us an observation: green, yes, someone brings an umbrella into the office, or red, no, nobody brings umbrellas into the office. And so the observation might be that if it’s sunny, then odds are nobody is going to bring an umbrella to the office. But maybe some people are just being cautious, and they do bring an umbrella to the office anyways. And if it’s raining, then with much higher probability, people are going to bring umbrellas into the office. But maybe if the rain was unexpected, people didn’t bring an umbrella. And so it might have some other probability as well. And so using the observations, you can begin to predict, with reasonable likelihood, what the underlying state is, even if you don’t actually get to observe the underlying state, if you don’t get to see what the hidden state is actually equal to. This here we’ll often call the sensor model. These are also often called the emission probabilities, because the underlying state emits some sort of emission that you then observe. And so that can be another way of describing that same idea. And the sensor Markov assumption that we’re going to use is the assumption that the evidence variable, the thing we observe, the emission that gets produced, depends only on the corresponding state, meaning I can predict whether or not people will bring umbrellas entirely based on whether it is sunny or rainy today. Of course, again, this assumption might not hold in practice: in practice, whether or not people bring umbrellas might depend not just on today’s weather, but also on yesterday’s weather and the day before. But for simplification purposes, it can be helpful to apply this sort of assumption, just to allow us to be able to reason about these probabilities a little more easily. And if we’re able to approximate it, we can still often get a very good answer. And so what these hidden Markov models end up looking like is a little something like this, where now, rather than just have one chain of states, like sun, sun, rain, rain, rain, we instead have this upper level, which is the underlying state of the world: is it sunny or is it rainy? And those are connected by that transition matrix we described before. But each of these states produces an emission, produces an observation that I see: that on this day, it was sunny and people didn’t bring umbrellas. And on this day, it was sunny, but people did bring umbrellas. And on this day, it was raining and people did bring umbrellas, and so on and so forth. And so each of these underlying states, represented by x sub t for t equals 0, 1, 2, and so on, produces some sort of observation or emission, which is what the e stands for: e sub 0, e sub 1, e sub 2, and so on.
And so this, too, is a way of trying to represent this idea. And what you want to think about is that these underlying states are the true nature of the world, like the robot’s position as it moves over time, and that they produce some sort of sensor data that might be observed, or what people are actually saying, where you use the emission data, the audio waveforms you detect, to process that data and try and figure it out. And there are a number of possible tasks that you might want to do given this kind of information. And one of the simplest is trying to infer something about the future or the past or about these sorts of hidden states that might exist. And so the tasks that you’ll often see, and we’re not going to go into the mathematics of these tasks, are all based on the same idea of conditional probabilities and using the probability distributions we have to draw these sorts of conclusions. One task is called filtering, which is: given observations from the start until now, calculate the distribution for the current state. Meaning, given information from the beginning of time until now about which days people brought an umbrella and which days they didn’t, can I calculate the probability of the current state, that today, is it sunny or is it raining? Another task that might be possible is prediction, which is looking towards the future: given observations about people bringing umbrellas from the beginning of when we started counting time until now, can I figure out the distribution that tomorrow, is it sunny or is it raining? And you can also go backwards as well, via smoothing, where I can say: given observations from start until now, calculate the distributions for some past state. Like, I know that people brought umbrellas yesterday and that people brought umbrellas today. And so, given two days’ worth of data of people bringing umbrellas, what’s the probability that it was raining yesterday? The fact that I know that people brought umbrellas today might inform that inference as well. It might influence those probabilities. And there’s also a most likely explanation task, in addition to other tasks that might exist as well, which is combining some of these: given observations from the start up until now, figure out the most likely sequence of states. And this is what we’re going to take a look at now, this idea that if I have all these observations, umbrella, no umbrella, umbrella, no umbrella, can I calculate the most likely states of sun, rain, sun, rain, and whatnot that actually represent the true weather that would produce these observations? And this is quite common when you’re trying to do something like voice recognition, for example: you have these emissions of the audio waveforms, and you would like to calculate, based on all of the observations that you have, what is the most likely sequence of actual words, or syllables, or sounds that the user actually made when they were speaking to this particular device, or other tasks that might come up in that context as well. And so we can try this out by going ahead and going into the HMM directory, HMM for Hidden Markov Model. And here, what I’ve done is I’ve defined a model where this model first defines my possible states, sun and rain, along with their emission probabilities, the observation model or the emission model, where here, given that I know that it’s sunny, the probability that I see people bring an umbrella is 0.2, and the probability of no umbrella is 0.8.
And likewise, if it’s raining, then people are more likely to bring an umbrella: umbrella has probability 0.9, no umbrella has probability 0.1. So the actual underlying hidden states are sun and rain, but the things that I observe, the observations that I can see, are either umbrella or no umbrella. To this, then, I also need to add a transition matrix, same as before, saying that if today is sunny, then tomorrow is more likely to be sunny, and if today is rainy, then tomorrow is more likely to be raining. As before, I give it some starting probabilities, saying that at first, there’s a 50-50 chance for whether it’s sunny or rainy. And then I can create the model based on that information. Again, the exact syntax of this is not so important, so much as it is the data that I am now encoding into a program, such that now I can begin to do some inference. So I can give my program, for example, a list of observations: umbrella, umbrella, no umbrella, umbrella, umbrella, so on and so forth, no umbrella, no umbrella. And I would like to figure out the most likely explanation for these observations. Is it likely that it was rain, rain, rain the whole way through, or is it more likely that one of these days was actually sunny and then it switched back to being rainy? And that’s an interesting question. We might not be sure, because it might just be that it just so happened that on this rainy day, people decided not to bring an umbrella. Or it could be that it switched from rainy to sunny back to rainy, which doesn’t seem too likely, but it certainly could happen. And using the data we give to the hidden Markov model, our model can begin to predict these answers, can begin to figure it out. So we’re going to go ahead and just run the prediction on these observations, and then, for each of those predictions, go ahead and print out what the prediction is. And this library just so happens to have a function called predict that does this prediction process for me. So I’ll run python sequence.py. And the result I get is this: the prediction, based on the observations, of what all of those states are likely to be. In this case, it thinks that what most likely happened is that it was rainy, then sunny for a day, and then went back to being rainy. But in different situations, if it was rainy for longer, maybe, or if the probabilities were slightly different, you might imagine that it’s more likely that it was rainy all the way through, and it just so happened that on one rainy day, people decided not to bring umbrellas. And so here, too, Python libraries can begin to allow for this sort of inference procedure. And by taking what we know and by putting it in terms of these tasks that already exist, these general tasks that work with hidden Markov models, any time we can take an idea and formulate it as a hidden Markov model, formulate it as something that has hidden states and observed emissions that result from those states, we can take advantage of these algorithms that are known to exist for trying to do this sort of inference. So now we’ve seen a couple of ways that AI can begin to deal with uncertainty. We’ve taken a look at probability and how we can use probability to describe numerically things that are likely or more likely or less likely to happen than other events or other variables.
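Before moving on: the standard algorithm behind this kind of most likely explanation query is the Viterbi algorithm, and it is compact enough to sketch directly. The numbers below are the ones just described; the function itself is a minimal, library-free illustration of the idea, not the library’s actual implementation:

    # The umbrella HMM's tables, as described above:
    STATES = ["sun", "rain"]
    START = {"sun": 0.5, "rain": 0.5}
    TRANS = {"sun": {"sun": 0.8, "rain": 0.2},
             "rain": {"sun": 0.3, "rain": 0.7}}
    EMIT = {"sun": {"umbrella": 0.2, "no umbrella": 0.8},
            "rain": {"umbrella": 0.9, "no umbrella": 0.1}}

    def most_likely_states(observations):
        # Viterbi: track, for each state, the most probable path ending there.
        probs = {s: START[s] * EMIT[s][observations[0]] for s in STATES}
        paths = {s: [s] for s in STATES}
        for obs in observations[1:]:
            new_probs, new_paths = {}, {}
            for s in STATES:
                # Best previous state to have transitioned from.
                prev = max(STATES, key=lambda p: probs[p] * TRANS[p][s])
                new_probs[s] = probs[prev] * TRANS[prev][s] * EMIT[s][obs]
                new_paths[s] = paths[prev] + [s]
            probs, paths = new_probs, new_paths
        best = max(STATES, key=lambda s: probs[s])
        return paths[best]

    observations = ["umbrella", "umbrella", "no umbrella", "umbrella", "umbrella"]
    print(most_likely_states(observations))

At each step, it keeps, for each possible state, only the single most probable path ending in that state, which is what keeps the computation tractable.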
And using probability, we can begin to construct these standard types of models, things like Bayesian networks and Markov chains and hidden Markov models, that all allow us to describe how particular events relate to other events, or how the values of particular variables relate to other variables, not for certain, but with some sort of probability distribution. And by formulating things in terms of these models that already exist, we can take advantage of Python libraries that implement these sorts of models already and allow us just to be able to use them to produce some sort of resulting effect. So all of this, then, allows our AI to begin to deal with these sorts of uncertain problems, so that our AI doesn’t need to know things for certain but can infer, based on the information it does have, things that it doesn’t know for certain. Next time, we’ll take a look at additional types of problems that we can solve by taking advantage of AI-related algorithms, even beyond the world of the types of problems we’ve already explored. We’ll see you next time. OK. Welcome back, everyone, to an introduction to artificial intelligence with Python. Now, so far, we’ve taken a look at a couple of different types of problems. We’ve seen classical search problems, where we’re trying to get from an initial state to a goal by figuring out some optimal path. We’ve taken a look at adversarial search, where we have a game-playing agent that is trying to make the best move. We’ve seen knowledge-based problems, where we’re trying to use logic and inference to draw some additional conclusions. And we’ve seen some probabilistic models as well, where we might not have certain information about the world, but we want to use the knowledge about probabilities that we do have to be able to draw some conclusions. Today, we’re going to turn our attention to another category of problems, generally known as optimization problems, where optimization is really all about choosing the best option from a set of possible options. And we’ve already seen optimization in some contexts, like game playing, where we’re trying to create an AI that chooses the best move out of a set of possible moves. But what we’ll take a look at today is a category of problems, and algorithms to solve them, that can be used in order to deal with a broader range of potential optimization problems. And the first of the algorithms that we’ll take a look at is known as local search. And local search differs from the search algorithms we’ve seen before in the sense that the search algorithms we’ve looked at so far, things like breadth-first search or A-star search, for example, generally maintain a whole bunch of different paths that we’re simultaneously exploring, looking at a bunch of different paths at once, trying to find our way to the solution. On the other hand, local search is a search algorithm that’s really just going to maintain a single node, looking at a single state. And we’ll generally run this algorithm by maintaining that single node and then moving ourselves to one of the neighboring nodes throughout this search process. And this is generally useful in contexts unlike the problems we’ve seen before, such as a maze-solving situation, where we’re trying to find our way from the initial state to the goal by following some path. Local search is most applicable when we really don’t care about the path at all, and all we care about is what the solution is.
And in the case of solving a maze, the solution was always obvious. You could point to the solution. You know exactly what the goal is, and the real question is, what is the path to get there? But local search is going to come up in cases where figuring out exactly what the solution is, exactly what the goal looks like, is actually the heart of the challenge. And to give an example of one of these kinds of problems, we’ll consider a scenario where we have two types of buildings, for example: houses and hospitals. And in a world that’s formatted as this grid, where we have a whole bunch of houses, a house here, a house here, two houses over there, our goal might be to try and find a way to place two hospitals on this map. So maybe a hospital here and a hospital there. And the problem now is, we want to place two hospitals on the map, but we want to do so with some sort of objective. And our objective in this case is to try and minimize the distance of any of the houses from a hospital. So you might imagine, all right, what’s the distance from each of the houses to their nearest hospital? There are a number of ways we could calculate that distance. But one way is using a heuristic we’ve looked at before, which is the Manhattan distance, this idea of how many rows and columns you would have to move inside of this grid layout in order to get to a hospital, for example. And it turns out, if you take each of these four houses and figure out, all right, how close are they to their nearest hospital, you get something like this, where this house is three away from a hospital, this house is six away, and these two houses are each four away. And if you add all those numbers up together, you get a total cost of 17, for example. So for this particular configuration of hospitals, a hospital here and a hospital there, that state, we might say, has a cost of 17. And the goal of this problem, the one we would like to apply a search algorithm to, is: can you find a way to minimize that cost, minimizing the total you get if you sum up all of the distances from all the houses to their nearest hospital? And if we think about this problem a little bit more abstractly, abstracting away from this specific problem and thinking more generally about problems like it, you can often formulate these problems by thinking about them as a state-space landscape, as we’ll soon call it. Here in this diagram of a state-space landscape, each of these vertical bars represents a particular state that our world could be in. So, for example, each of these vertical bars represents a particular configuration of two hospitals. And the height of this vertical bar is generally going to represent some function of that state, some value of that state. So maybe in this case, the height of the vertical bar represents the cost of this particular configuration of hospitals, in terms of the sum total of all the distances from all of the houses to their nearest hospital. And generally speaking, when we have a state-space landscape, we want to do one of two things. We might be trying to maximize the value of this function, trying to find a global maximum, so to speak, of this state-space landscape, a single state whose value is higher than all of the other states that we could possibly choose from.
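Before going further, the hospital cost function just described is worth writing down concretely, since the heights of those vertical bars are exactly its values. A minimal sketch, with made-up house and hospital coordinates purely for illustration:

    def manhattan(a, b):
        # Manhattan distance between two (row, col) cells: rows plus columns moved.
        return abs(a[0] - b[0]) + abs(a[1] - b[1])

    def cost(houses, hospitals):
        # Sum, over every house, of the distance to its nearest hospital.
        return sum(min(manhattan(house, h) for h in hospitals) for house in houses)

    # Made-up coordinates, just to show the calculation:
    houses = [(0, 0), (2, 7), (5, 1), (6, 6)]
    hospitals = [(1, 2), (4, 5)]
    print(cost(houses, hospitals))   # the total cost of this configuration

Each house contributes only its distance to whichever hospital is nearest, so moving one hospital changes some houses’ contributions and leaves others untouched.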
And generally in this case, when we’re trying to find a global maximum, we’ll call the function that we’re trying to optimize an objective function: some function that measures, for any given state, how good that state is, such that we can take any state, pass it into the objective function, and get a value for how good that state is. And ultimately, our goal is to find one of these states that has the highest possible value for that objective function. An equivalent but reversed problem is the problem of finding a global minimum: some state that, after you pass it into this function, has a value that is lower than all of the other possible values that we might choose from. And generally speaking, when we’re trying to find a global minimum, we call the function that we’re calculating a cost function. Generally, each state has some sort of cost, whether that cost is a monetary cost, or a time cost, or, in the case of the houses and hospitals we’ve been looking at just now, a distance cost, in terms of how far away each of the houses is from a hospital. And we’re trying to minimize the cost, to find the state that has the lowest possible value of that cost. So these are the general types of ideas we might be trying to go for within a state-space landscape: trying to find a global maximum, or trying to find a global minimum. And how exactly do we do that? Recall that in local search, we generally operate this algorithm by maintaining just a single state, some current state represented inside of some node, maybe inside of a data structure, where we’re keeping track of where we are currently. And then, ultimately, what we’re going to do is, from that state, move to one of its neighbor states, in this case represented in this one-dimensional space by just the state immediately to the left or to the right of it. But for any different problem, you might define what it means for there to be a neighbor of a particular state. In the case of the hospitals, for example, that we were just looking at, a neighbor might be moving one hospital one space to the left or to the right or up or down: some state that is close to our current state, but slightly different, and that, as a result, might have a slightly different value in terms of its objective function or in terms of its cost function. So this is going to be our general strategy in local search: to be able to take a state, maintaining some current node, and move where we’re looking in the state-space landscape, in order to try to find a global maximum or a global minimum somehow. And perhaps the simplest of the algorithms that we could use to implement this idea of local search is an algorithm known as hill climbing. And the basic idea of hill climbing is, let’s say I’m trying to maximize the value of my state. I’m trying to figure out where the global maximum is. I’m going to start at a state. And generally, what hill climbing is going to do is consider the neighbors of that state; from this state, all right, I could go left or I could go right, and this neighbor happens to be higher and this neighbor happens to be lower. And in hill climbing, if I’m trying to maximize the value, I’ll generally pick the highest one I can between the state to the left and right of me. This one is higher. So I’ll go ahead and move myself to consider that state instead.
And then I’ll repeat this process, continually looking at all of my neighbors and picking the highest neighbor, doing the same thing, looking at my neighbors, picking the highest of my neighbors, until I get to a point like right here, where I consider both of my neighbors and both of my neighbors have a lower value than I do. This current state has a value that is higher than any of its neighbors. And at that point, the algorithm terminates. And I can say, all right, here I have now found the solution. And the same thing works in exactly the opposite way for trying to find a global minimum. But the algorithm is fundamentally the same. If I’m trying to find a global minimum and say my current state starts here, I’ll continually look at my neighbors, pick the lowest value that I possibly can, until I eventually, hopefully, find that global minimum, a point at which when I look at both of my neighbors, they each have a higher value. And I’m trying to minimize the total score or cost or value that I get as a result of calculating some sort of cost function. So we can formulate this graphical idea in terms of pseudocode. And the pseudocode for hill climbing might look like this. We define some function called hill climb that takes as input the problem that we’re trying to solve. And generally, we’re going to start in some sort of initial state. So I’ll start with a variable called current that is keeping track of my initial state, like an initial configuration of hospitals. And maybe some problems lend themselves to an initial state, some place where you begin. In other cases, maybe not, in which case we might just randomly generate some initial state, just by choosing two locations for hospitals at random, for example, and figuring out from there how we might be able to improve. But that initial state, we’re going to store inside of current. And now, here comes our loop, some repetitive process we’re going to do again and again until the algorithm terminates. And what we’re going to do is first say, let’s figure out all of the neighbors of the current state. From my state, what are all of the neighboring states for some definition of what it means to be a neighbor? And I’ll go ahead and choose the highest value of all of those neighbors and save it inside of this variable called neighbor. So keep track of the highest-valued neighbor. This is in the case where I’m trying to maximize the value. In the case where I’m trying to minimize the value, you might imagine here, you’ll pick the neighbor with the lowest possible value. But these ideas are really fundamentally interchangeable. And it’s possible, in some cases, there might be multiple neighbors that each have an equally high value or an equally low value in the minimizing case. And in that case, we can just choose randomly from among them. Choose one of them and save it inside of this variable neighbor. And then the key question to ask is, is this neighbor better than my current state? And if the neighbor, the best neighbor that I was able to find, is not better than my current state, well, then the algorithm is over. And I’ll just go ahead and return the current state. If none of my neighbors are better, then I may as well stay where I am, is the general logic of the hill climbing algorithm. But otherwise, if the neighbor is better, then I may as well move to that neighbor. 
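Rendered as Python, that pseudocode might look like the sketch below. The problem object, with its initial_state, neighbors, and value helpers, is hypothetical, something you would define for your specific problem; this is just the loop structure, written for the maximizing case, with the random tie-breaking just mentioned:

    import random

    def hill_climb(problem):
        current = problem.initial_state()   # or a randomly generated state
        while True:
            # problem.neighbors and problem.value are hypothetical helpers.
            candidates = problem.neighbors(current)
            best_value = max(problem.value(n) for n in candidates)
            # Break ties randomly among equally good neighbors.
            neighbor = random.choice(
                [n for n in candidates if problem.value(n) == best_value])
            if problem.value(neighbor) <= problem.value(current):
                return current              # no neighbor is better: terminate
            current = neighbor              # otherwise, move to the better neighbor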
So you might imagine setting current equal to neighbor, where the general idea is, if I’m at a current state and I see a neighbor that is better than me, then I’ll go ahead and move there. And then I’ll repeat the process, continually moving to a better neighbor until I reach a point at which none of my neighbors are better than I am. And at that point, we’d say the algorithm can just terminate there. So let’s take a look at a real example of this with these houses and hospitals. So we’ve seen now that if we put the hospitals in these two locations, that has a total cost of 17. And now we need to define, if we’re going to implement this hill climbing algorithm, what it means to take this particular configuration of hospitals, this particular state, and get a neighbor of that state. And a simple definition of neighbor might be just: let’s pick one of the hospitals and move it by one square, to the left or right or up or down, for example. And that would mean we have six possible neighbors from this particular configuration. We could take this hospital and move it to any of these three possible squares, or we could take this hospital and move it to any of those three possible squares. And each of those would generate a neighbor. And what I might do is say, all right, here are the locations and the distances between each of the houses and their nearest hospital. Let me consider all of the neighbors and see if any of them can do better than a cost of 17. And it turns out there are a couple of ways that we could do that, and it doesn’t matter which we pick; we could randomly choose among whichever of them are the best. But one such possible way is by taking a look at this hospital here and considering the directions in which it might move. If we hold the other hospital constant, and we take this hospital and move it one square up, for example, that doesn’t really help us. It gets closer to the house up here, but it gets further away from the house down here. And it doesn’t really change anything for the two houses along the left-hand side. But if we take this hospital on the right and move it one square down, it’s the opposite problem: it gets further away from the house up above, and it gets closer to the house down below. The right move, it turns out, is to take this hospital and move it one square to the left. By moving it one square to the left, we move it closer to both of these houses on the right without changing anything about the houses on the left. For them, the other hospital is still the closer one, so they aren’t affected. So we’re able to improve the situation by picking a neighbor that results in a decrease in our total cost. And so we might do that: move ourselves from this current state to a neighbor by just taking that hospital and moving it. And at this point, there’s not a whole lot more that can be done with this hospital. But there are still other optimizations we can make, other neighbors we can move to that are going to have a better value. If we consider this other hospital, for example, we might imagine that right now it’s a bit far up, and that both of these houses are a little bit lower. So we might be able to do better by taking this hospital and moving it one square down, so that now, instead of a cost of 15, we’re down to a cost of 13 for this particular configuration. And we can do even better by taking the hospital and moving it one square to the left. Now, instead of a cost of 13, we have a cost of 11, because this house is one away from the hospital. This one is four away.
This one is three away. And this one is also three away. So we’ve been able to do much better than that initial cost that we had using the initial configuration, just by taking every state and asking ourselves the question: can we do better by just making small, incremental changes, moving to a neighbor, moving to a neighbor, and moving to a neighbor after that? And now, at this point, we can see that the algorithm is going to terminate. There’s actually no neighbor we can move to that is going to improve the situation, that will get us a cost that is less than 11. Because if we take this hospital and move it up or to the right, well, that’s going to make it further away. If we take it and move it down, that doesn’t really change the situation: it gets further away from this house but closer to that house. And likewise, the same story is true for this other hospital. Any neighbor we move it to, up, left, down, or right, is either going to make it further away from the houses and increase the cost, or it’s going to have no effect on the cost whatsoever. And so the question we might now ask is: is this the best we could do? Is this the best placement of the hospitals we could possibly have? And it turns out the answer is no, because there’s a better way that we could place these hospitals. And in particular, there are a number of ways you could do this. But one of the ways is by taking this hospital here and moving it to this square, for example, moving it diagonally by one square, which was not part of our definition of neighbor. We could only move left, right, up, or down. But this is, in fact, better. It has a total cost of 9. It is now closer to both of these houses. And as a result, the total cost is less. But we weren’t able to find it, because in order to get there, we had to go through a state that actually wasn’t any better than the current state that we had been in previously. And so this appears to be a limitation, or a concern you might have, as you go about trying to implement a hill climbing algorithm: it might not always give you the optimal solution. If we’re trying to maximize the value of any particular state, if we’re trying to find the global maximum, a concern might be that we could get stuck at one of the local maxima, highlighted here in blue, where a local maximum is any state whose value is higher than any of its neighbors. If we ever find ourselves at one of these two states when we’re trying to maximize the value of the state, we’re not going to make any changes. We’re not going to move left or right. We’re not going to move left here, because those states are worse. But yet, we haven’t found the global optimum. We haven’t done as well as we could do. And likewise, in the case of the hospitals, what we’re ultimately trying to do is find a global minimum, find a value that is lower than all of the others. But we have the potential to get stuck at one of the local minima: any of these states whose value is lower than all of its neighbors, but still not as low as the global minimum. And so the takeaway here is that it’s not always going to be the case that, when we run this naive hill climbing algorithm, we’re always going to find the optimal solution. There are things that could go wrong.
If we started here, for example, and tried to maximize our value as much as possible, we might move to the highest possible neighbor, move to the highest possible neighbor, move to the highest possible neighbor, and stop, and never realize that there’s actually a better state way over there that we could have gone to instead. And other problems you might imagine, just by taking a look at this state-space landscape, are these various different types of plateaus, something like this flat local maximum here, where all six of these states each have the exact same value. In the case of the algorithm we showed before, none of the neighbors are better, so we might just get stuck at this flat local maximum. And even if you allowed yourself to move to one of the neighbors, it wouldn’t be clear which neighbor you would ultimately move to, and you could get stuck here as well. And there’s another one over here. This one is called a shoulder. It’s not really a local maximum, because there are still places where we can go higher, and not a local minimum, because we can go lower. So we can still make progress, but it’s still this flat area where, if you have a local search algorithm, there’s potential to get lost, unable to make some upward or downward progress, depending on whether we’re trying to maximize or minimize, and therefore another potential for us to end up with a solution that might not actually be the optimal one. And so because of this potential, the potential that hill climbing has to not always find us the optimal result, it turns out there are a number of different varieties and variations on the hill climbing algorithm that help to solve the problem better, depending on the context. Depending on the specific type of problem, some of these variants might be more applicable than others. What we’ve taken a look at so far is a version of hill climbing generally called steepest ascent hill climbing, where the idea of steepest ascent hill climbing is that we are going to choose the highest valued neighbor in the case where we’re trying to maximize, or the lowest valued neighbor in cases where we’re trying to minimize. But generally speaking, if I have five neighbors and they’re all better than my current state, I will pick the best one of those five. Now, sometimes that might work pretty well. It’s sort of a greedy approach of trying to take the best operation at any particular time step. But it might not always work. There might be cases where actually I want to choose an option that is slightly better than me, but maybe not the best one, because that later on might lead to a better outcome ultimately. So there are other variants that we might consider of this basic hill climbing algorithm. One is known as stochastic hill climbing. And in this case, we choose randomly from all of our higher valued neighbors. So if I’m at my current state and there are five neighbors that are all better than I am, rather than choosing the best one, as steepest ascent would do, stochastic hill climbing will just choose randomly from among them, thinking that if it’s better, then it’s better, and maybe there’s a potential to make forward progress, even if it is not locally the best option I could possibly choose. First choice hill climbing, behaving on a similar idea, ends up just choosing the very first higher valued neighbor that it finds: rather than consider all of the neighbors, as soon as we find a neighbor that is better than our current state, we’ll go ahead and move there.
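Both variants are small twists on a single step of the basic loop. A sketch, using the same hypothetical problem helpers as before:

    import random

    def stochastic_step(problem, current):
        # Stochastic hill climbing: move to a random better neighbor, if any.
        better = [n for n in problem.neighbors(current)
                  if problem.value(n) > problem.value(current)]
        return random.choice(better) if better else current

    def first_choice_step(problem, current):
        # First choice hill climbing: move to the first better neighbor found.
        for n in problem.neighbors(current):
            if problem.value(n) > problem.value(current):
                return n
        return current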
There may be some efficiency improvements there, and it maybe has the potential to find a solution that the other strategies weren’t able to find. But with all of these variants, we still suffer from the same potential risk: this risk that we might end up at a local minimum or a local maximum. And we can reduce that risk by repeating the process multiple times. So one variant of hill climbing is random restart hill climbing, where the general idea is that we’ll conduct hill climbing multiple times. If we apply steepest ascent hill climbing, for example, we’ll start at some random state, try and figure out how to solve the problem, and figure out what is the local maximum or local minimum we get to. And then we’ll just randomly restart and try again: choose a new starting configuration, try and figure out what the local maximum or minimum is, and do this some number of times. And then, after we’ve done it some number of times, we can pick the best one out of all of the ones that we’ve taken a look at. So that’s another option we have access to as well. And then, although I said that local search will generally just keep track of a single node and then move to one of its neighbors, there are variants of hill climbing that are known as local beam searches, where rather than keep track of just one current best state, we’re keeping track of the k highest valued neighbors, such that rather than starting at one random initial configuration, I might start with 3 or 4 or 5, randomly generate all of their neighbors, and then pick the 3 or 4 or 5 best of all of the neighbors that I find, and continually repeat this process, with the idea being that now I have more options that I’m considering, more ways that I could potentially navigate myself to the optimal solution that might exist for a particular problem. So let’s now take a look at some actual code that can implement some of these kinds of ideas, something like steepest ascent hill climbing, for example, for trying to solve this hospital problem. So I’m going to go ahead and go into my hospitals directory, where I’ve actually set up the basic framework for solving this type of problem. I’ll go ahead and go into hospitals.py, and we’ll take a look at the code we’ve created here. I’ve defined a class that is going to represent the state space. So the space has a height, and a width, and also some number of hospitals. So you can configure how big your map is and how many hospitals should go there. We have a function for adding a new house to the state space, and then some functions that are going to get me all of the available spaces, for when I want to randomly place hospitals in particular locations. And here now is the hill climbing algorithm. So what are we going to do in the hill climbing algorithm? Well, we’re going to start by randomly initializing where the hospitals are going to go. We don’t know where the hospitals should actually be, so let’s just randomly place them. So here I’m running a loop for each of the hospitals that I have. I’m going to go ahead and add a new hospital at some random location. So I basically get all of the available spaces, and I randomly choose one of them as where I would like to add this particular hospital. I have some logging output, and I generate some images, which we’ll take a look at a little bit later. But here is the key idea. So I’m going to just keep repeating this algorithm. I could specify a maximum of how many times I want it to run, or I could just run it up until it hits a local maximum or local minimum.
And now we’ll basically consider all of the hospitals that could potentially move. So consider each of the two hospitals, or more hospitals if there are more than that. And consider all of the places where that hospital could move to, some neighboring cell that we could move that hospital to. And then see: is this going to be better than where we were currently? If it is going to be better, then we’ll go ahead and update our best neighbor and keep track of this new best neighbor that we found. And then afterwards, we can ask ourselves the question: if the best neighbor cost is greater than or equal to the cost of the current set of hospitals, meaning the cost of our best neighbor is at least the current cost, so our best neighbor is no better than our current state, well, then we shouldn’t make any changes at all, and we should just go ahead and return the current set of hospitals. But otherwise, we can update our hospitals in order to change them to one of the best neighbors. And if there are multiple that are all equivalent, I’m here using random.choice to say go ahead and choose one randomly. So this is really just a Python implementation of that same idea that we were just talking about: this idea of taking a current state, some current set of hospitals, generating all of the neighbors, looking at all of the ways we could take one hospital and move it one square to the left or right or up or down, and then figuring out, based on all of that information, which is the best neighbor, or the set of all the best neighbors, and then choosing one of those. And each time, we go ahead and generate an image as we do that. And so now, if we look down at the bottom, I’m going to randomly generate a space with height 10 and width 20. And I’ll say go ahead and put three hospitals somewhere in the space. I’ll randomly generate 15 houses that I just go ahead and add in random locations. And now I’m going to run this hill climbing algorithm in order to try and figure out where we should place those hospitals. So we’ll go ahead and run this program by running python hospitals.py. And we see that, when we started, our initial state had a cost of 72, but we were able to continually find neighbors that decreased that cost, decreasing it to 69, 66, 63, so on and so forth, all the way down to 53, the best neighbor we were able to ultimately find. And we can take a look at what that looked like by just opening up these files. So here, for example, was the initial configuration. We randomly selected a location for each of these 15 different houses and then randomly selected locations for one, two, three hospitals that were just located somewhere inside of the state space. And if you add up all the distances from each of the houses to their nearest hospital, you get a total cost of 72. And so now the question is: what neighbors can we move to that improve the situation? And it looks like the first one the algorithm found was by taking this hospital that was over there on the right and just moving it to the left. And that probably makes sense, because if you look at the houses in that general area, really these five houses look like they’re probably the ones that are going to be closest to this hospital over here. Moving it to the left decreases the total distance, at least to most of these houses, though it does increase that distance for one of them.
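The neighbor generation this walkthrough relies on is also easy to sketch. Assuming hospitals is a set of (row, col) tuples, and that houses, height, and width describe the grid (all hypothetical names, mirroring but not copied from the lecture’s hospitals.py):

    def neighbors(hospitals, houses, height, width):
        # All configurations reachable by moving one hospital one square
        # up, down, left, or right into a free, on-grid cell.
        results = []
        for (r, c) in hospitals:
            for (dr, dc) in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
                cell = (r + dr, c + dc)
                on_grid = 0 <= cell[0] < height and 0 <= cell[1] < width
                free = cell not in hospitals and cell not in houses
                if on_grid and free:
                    results.append((hospitals - {(r, c)}) | {cell})
        return results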
And so we’re able to make these improvements to the situation by continually finding ways that we can move these hospitals around, until we eventually settle at this particular state that has a cost of 53, where we’ve figured out a position for each of the hospitals, and now none of the neighbors that we could move to are actually going to improve the situation. We can take this hospital and this hospital and that hospital and look at each of the neighbors, and none of those are going to be better than this particular configuration. And again, that’s not to say that this is the best we could do. There might be some other configuration of hospitals that is a global minimum, and this might just be a local minimum that is the best of all of its neighbors, but maybe not the best in the entire possible state space. And you could search through the entire state space by considering all of the possible configurations for hospitals. But ultimately, that’s going to be very time intensive, especially as our state space gets bigger and there might be more and more possible states. It’s going to take quite a long time to look through all of them. And so being able to use these sorts of local search algorithms can often be quite good for trying to find the best solution we can. And especially if we don’t care about doing the best possible, and we just care about doing pretty well and finding a pretty good placement of those hospitals, then these methods can be particularly powerful. But of course, we can try and mitigate some of this concern by using random restart instead of plain hill climbing: this idea that rather than just hill climbing one time, we can hill climb multiple times and say, try hill climbing a whole bunch of times on the exact same map, and figure out what is the best one that we’ve been able to find. And so I’ve here implemented a function for random restart that restarts some maximum number of times. And what we’re going to do is repeat, that number of times, this process of running the hill climbing algorithm, figuring out the cost of getting from all the houses to the hospitals, and then figuring out whether this is better than we’ve done so far. So I can try this exact same idea, where instead of running hill climbing, I’ll go ahead and run random restart. And I’ll randomly restart maybe 20 times, for example. And now I’ll go ahead and remove all the images and then rerun the program. And now we started by finding an initial state. When we initially ran hill climbing, the best cost we were able to find was 56. Each of these iterations is a different iteration of the hill climbing algorithm. We’re running hill climbing not one time, but 20 times here, each time going until we find a local minimum, in this case. And each time, we look and see whether we did better than the best we’ve done so far. So we went from 56 to 46. This one was greater, so we ignored it. This one was 41, which was less, so we went ahead and kept that one. And for all of the remaining 16 times that we tried to run the hill climbing algorithm, we couldn’t do any better than that 41. Again, maybe there is a way to do better that we just didn’t find, but it looks like that 41 ended up being a pretty good solution to the problem. That was attempt number three, counting from zero. So we can take a look at that and open up number three.
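The random restart loop itself is tiny. A sketch, assuming a hill_climb function like the earlier one, adapted here to minimize a hypothetical problem.cost and to start from a fresh random state on each call:

    def random_restart(problem, maximum):
        # Run hill climbing `maximum` times; keep the lowest-cost result found.
        best = None
        for _ in range(maximum):
            candidate = hill_climb(problem)   # each run starts from a random state
            if best is None or problem.cost(candidate) < problem.cost(best):
                best = candidate
        return best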
And this was the state that happened to have a cost of 41, that after running the hill climbing algorithm on some particular random initial configuration of hospitals, this is what we found was the local minimum in terms of trying to minimize the cost. And it looks like we did pretty well. This hospital is pretty close to this region. This one is pretty close to these houses here. This hospital looks about as good as we can do for trying to capture those houses over on that side. And so these sorts of algorithms can be quite useful for trying to solve these problems. But the real problem with many of these different types of hill climbing, steepest-ascent, stochastic, first-choice, and so forth, is that they never make a move that makes our situation worse. They’re always going to take our current state, look at the neighbors, and consider whether we can do better than our current state, and move to one of those neighbors. Which of those neighbors we choose might vary among these various different types of algorithms, but we never go from a current position to a position that is worse than our current position. And ultimately, that’s what we’re going to need to do if we want to be able to find a global maximum or a global minimum. Because sometimes if we get stuck, we want to find some way of dislodging ourselves from our local maximum or local minimum in order to find the global maximum or the global minimum, or increase the probability that we do find it. And so the most popular technique for trying to approach the problem from that angle is a technique known as simulated annealing, simulated because it’s modeled after the real physical process of annealing, where you can think about this in terms of physics, a physical situation where you have some system of particles. And you might imagine that when you heat up a particular physical system, there’s a lot of energy there. Things are moving around quite randomly. But over time, as the system cools down, it eventually settles into some final position. And that’s going to be the general idea of simulated annealing. We’re going to simulate that process of some high-temperature system where things are moving around randomly quite frequently, but over time decreasing that temperature until we eventually settle at our ultimate solution. And the idea is going to be, if we have some state space landscape that looks like this and we begin at its initial state here, if we’re looking for a global maximum and we’re trying to maximize the value of the state, our traditional hill climbing algorithms would just take the state, look at the two neighboring ones, and always pick the one that is going to increase the value of the state. But if we want some chance of being able to find the global maximum, we can’t always make good moves. We have to sometimes make bad moves and allow ourselves to make a move in a direction that actually seems for now to make our situation worse, such that later we can find our way up to that global maximum in terms of trying to solve that problem. Of course, once we get up to this global maximum, once we’ve done a whole lot of the searching, then we probably don’t want to be moving to states that are worse than our current state. And so this is where this metaphor for annealing starts to come in, where we want to start out making more random moves and over time make fewer of those random moves, based on a particular temperature schedule. So the basic outline looks something like this.
Early on in simulated annealing, we have a higher temperature state. And what we mean by a higher temperature state is that we are more likely to accept neighbors that are worse than our current state. We might look at our neighbors. And if one of our neighbors is worse than the current state, especially if it’s not all that much worse, if it’s pretty close but just slightly worse, then we might be more likely to accept that and go ahead and move to that neighbor anyway. But later on, as we run simulated annealing, we’re going to decrease that temperature. And at a lower temperature, we’re going to be less likely to accept neighbors that are worse than our current state. Now, to formalize this and put a little bit of pseudocode to it, here is what that algorithm might look like. We have a function called simulated annealing that takes as input the problem we’re trying to solve and also potentially some maximum number of times we might want to run the simulated annealing process, how many different neighbors we’re going to try and look for. And that value is going to vary based on the problem you’re trying to solve. We’ll, again, start with some current state that will be equal to the initial state of the problem. But now we need to repeat this process over and over for max number of times, where we’re first going to calculate a temperature. And this temperature function takes the current time t, starting at 1 going all the way up to max, and then gives us some temperature that we can use in our computation, where the idea is that this temperature is going to be higher early on and lower later on. So there are a number of ways this temperature function could work. One of the simplest ways is just to say it is the proportion of time that we still have remaining. Out of max units of time, how much time do we have remaining? You start off with a lot of that time remaining. And as time goes on, the temperature is going to decrease, because you have less and less of that remaining time still available to you. So we calculate a temperature for the current time. And then we pick a random neighbor of the current state. No longer are we going to be picking the best neighbor that we possibly can or just one of the better neighbors that we can. We’re going to pick a random neighbor. It might be better. It might be worse. But we’re going to calculate delta E, E for energy in this case, which is just how much better the neighbor is than the current state. So if delta E is positive, that means the neighbor is better than our current state. If delta E is negative, that means the neighbor is worse than our current state. And so we can then have a condition that looks like this. If delta E is greater than 0, that means the neighbor state is better than our current state. And if ever that situation arises, we’ll just go ahead and update current to be that neighbor. Same as before, move where we are currently to be the neighbor, because the neighbor is better than our current state. We’ll go ahead and accept that.
But now the difference is that whereas before, we never, ever wanted to take a move that made our situation worse, now we sometimes want to make a move that is actually going to make our situation worse, because sometimes we’re going to need to dislodge ourselves from a local minimum or local maximum to increase the probability that we’re able to find the global minimum or the global maximum a little bit later. And so how do we do that? How do we decide to sometimes accept some state that might actually be worse? Well, we’re going to accept a worse state with some probability. And that probability needs to be based on a couple of factors. It needs to be based in part on the temperature, where if the temperature is higher, we’re more likely to move to a worse neighbor. And if the temperature is lower, we’re less likely to move to a worse neighbor. But it also, to some degree, should be based on delta E. If the neighbor is much worse than the current state, we probably want to be less likely to choose that than if the neighbor is just a little bit worse than the current state. So again, there are a couple of ways you could calculate this. But it turns out one of the most popular is just to calculate e to the power of delta E over T, written e^(ΔE/T), where e is just the mathematical constant, and delta E and T are the values we just computed. We calculate that value. And since delta E is negative for a worse neighbor, that’ll be some value between 0 and 1. And that is the probability with which we should just say, all right, let’s go ahead and move to that neighbor. And it turns out that if you do the math for this value, when the neighbor is not that much worse than the current state, it’s going to be more likely that we go ahead and move to that state. And likewise, when the temperature is lower, we’re going to be less likely to move to that neighboring state as well. So now this is the big picture for simulated annealing, this process of taking the problem and generating random neighbors. We’ll always move to a neighbor if it’s better than our current state. But even if the neighbor is worse than our current state, we’ll sometimes move there, depending on how much worse it is and also based on the temperature. And as a result, the hope, the goal of this whole process, is that as we begin to try and find our way to the global maximum or the global minimum, we can dislodge ourselves if we ever get stuck at a local maximum or local minimum, in order to eventually make our way to exploring the part of the state space that is going to be the best. And then as the temperature decreases, eventually we settle there without moving around too much from what we’ve found to be the globally best thing that we can do thus far. So at the very end, we just return whatever the current state happens to be. And that is the conclusion of this algorithm. We’ve been able to figure out what the solution is. And these types of algorithms have a lot of different applications. Any time you can take a problem and formulate it as something where you can explore a particular configuration and then ask, are any of the neighbors better than this current configuration, and have some way of measuring that, then there is an applicable case for these hill climbing, simulated annealing types of algorithms. So sometimes it can be for facility location type problems, like when you’re trying to plan a city and figure out where the hospitals should be. But there are definitely other applications as well.
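As a rough illustration (not the lecture’s own code), that pseudocode might be written in Python like this, framed here for a minimization problem such as the hospitals example; get_cost and random_neighbor are assumed helpers.

    import math
    import random

    def simulated_annealing(initial, maximum, get_cost, random_neighbor):
        """Sketch of simulated annealing for a minimization problem.
        The temperature is simply the proportion of time remaining."""
        current = initial
        for t in range(1, maximum + 1):
            temperature = (maximum - t) / maximum  # higher early on, lower later
            neighbor = random_neighbor(current)
            # delta_e > 0 means the neighbor is better (has lower cost).
            delta_e = get_cost(current) - get_cost(neighbor)
            if delta_e > 0:
                current = neighbor
            # Otherwise, accept the worse neighbor with probability e^(delta_e / T).
            elif temperature > 0 and random.random() < math.exp(delta_e / temperature):
                current = neighbor
        return current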
And one of the most famous problems in computer science is the traveling salesman problem. The traveling salesman problem generally is formulated like this. I have a whole bunch of cities here, indicated by these dots. And what I’d like to do is find some route that takes me through all of the cities and ends up back where I started, so some route that starts here, goes through all these cities, and ends up back where I originally started. And what I might like to do is minimize the total distance that I have to travel, or the total cost of taking this entire path. And you can imagine this is a problem that’s very applicable in situations like when delivery companies are trying to deliver things to a whole bunch of different houses. They want to figure out, how do I get from the warehouse to all these various different houses and back again, all using as little time and distance and energy as possible? So you might want to try to solve these sorts of problems. But it turns out that solving this particular kind of problem is very computationally difficult. It is a very computationally expensive task to be able to figure it out. This falls under the category of what are known as NP-complete problems, problems for which there is no known efficient way to find a solution. And so what we ultimately have to do is come up with some approximation, some way of trying to find a good solution, even if we’re not going to find the globally best solution that we possibly can, at least not in a feasible or tractable amount of time. And so what we could do is take the traveling salesman problem and try to formulate it using local search and ask a question like, all right, I can pick some state, some configuration, some route between all of these nodes. And I can measure the cost of that state, figure out what the distance is. And I might now want to try to minimize that cost as much as possible. And then the only question now is, what does it mean to have a neighbor of this state? What does it mean to take this particular route and have some neighboring route that is close to it but slightly different, such that it might have a different total distance? And there are a number of different definitions for what a neighbor of a traveling salesman configuration might look like. But one way is just to say, a neighbor is what happens if we pick two of these edges between nodes and effectively switch them. So for example, I might pick these two edges here, where this node goes here and this node goes there, and go ahead and switch them. And what that process will generally look like is removing both of these edges from the graph, taking this node, and connecting it to the node it wasn’t connected to, so connecting it up here instead. We’ll need to take these arrows that were originally going this way and reverse them, so move them going the other way, and then just fill in that last remaining blank, add an arrow that goes in that direction instead. So by taking two edges and just switching them, I have been able to consider one possible neighbor of this particular configuration. And it looks like this neighbor is actually better. It looks like this probably travels a shorter distance in order to get through all the cities than the current route did.
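This edge-swapping notion of a neighbor is commonly known as a 2-opt move. The lecture doesn’t show code for it, but a sketch of a neighbor function for a route represented as a list of cities might look like this:

    def two_opt_neighbors(route):
        """Generate all 2-opt neighbors of a traveling salesman route:
        remove two edges and reconnect by reversing the segment between
        them, which is exactly the switch described above."""
        neighbors = []
        n = len(route)
        for i in range(n - 1):
            for j in range(i + 1, n):
                # Reversing route[i..j] swaps the edge entering i and the edge leaving j.
                neighbors.append(route[:i] + route[i:j + 1][::-1] + route[j + 1:])
        return neighbors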
And so you could imagine implementing this idea inside of a hill climbing or simulated annealing algorithm, where we repeat this process to try and take a state of this traveling salesman problem, look at all the neighbors, and then move to the neighbors if they’re better, or maybe even move to the neighbors if they’re worse, until we eventually settle upon some best solution that we’ve been able to find. And it turns out that these types of approximation algorithms, even if they don’t always find the very best solution, can often do pretty well at finding solutions that are helpful too. So that then was a look at local search, a particular category of algorithms that can be used for solving a particular type of problem, where we don’t really care about the path to the solution. I didn’t care about the steps I took to decide where the hospitals should go. I just cared about the solution itself. I just care about where the hospitals should be, or what the route through the traveling salesman journey really ought to be. Another type of problem that might come up falls into the category known as linear programming. And linear programming often comes up in the context where we’re trying to optimize for some mathematical function. But oftentimes, linear programming will come up when we have real-numbered values, so it’s not just discrete fixed values that we might have, but any decimal values that we might want to be able to calculate. And so linear programming is a family of problems where we might have a situation that looks like this, where the goal of linear programming is to minimize a cost function. And you can invert the numbers and try to maximize it, but often we’ll frame it as trying to minimize a cost function that has some number of variables, x1, x2, x3, all the way up to xn, just some number of variables that are involved, things whose values I want to know. And this cost function might have coefficients in front of those variables. And this is what we would call a linear equation, where we just have all of these variables that might be multiplied by a coefficient and then added together. We’re not going to square anything or cube anything, because that’ll give us different types of equations. With linear programming, we’re just dealing with linear equations, in addition to linear constraints, where a constraint is going to look something like this: some linear combination of all of these variables is less than or equal to some bound b. And we might have a whole number of these different constraints that we might place onto our linear programming exercise. And likewise, just as we can have constraints saying this linear equation is less than or equal to some bound b, it might also be equal to something. If you want some combination of variables to be equal to a value, you can specify that. And we can also maybe specify that each variable has lower and upper bounds, that it needs to be a positive number, for example, or a number that is less than 50. And there are a number of other choices that we can make there for defining what the bounds of a variable are.
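Written compactly, the general form just described is something like this:

    minimize    c1*x1 + c2*x2 + ... + cn*xn
    subject to  a1*x1 + a2*x2 + ... + an*xn <= b   (one or more such constraints)
                plus any equality constraints, and bounds li <= xi <= ui on each variable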
But it turns out that if you can take a problem and formulate it in these terms, formulate the problem as, your goal is to minimize a cost function, and you’re minimizing that cost function subject to particular constraints, subject to equations that are of the form like this, of some sequence of variables being less than a bound or equal to some particular value, then there are a number of algorithms that already exist for solving these sorts of problems. So let’s go ahead and take a look at an example. Here’s an example of a problem that might come up in the world of linear programming. Often, this is going to come up when we’re trying to optimize for something, and we want to be able to do some calculations, and we have constraints on what we’re trying to optimize. And so it might be something like this. In the context of a factory, we have two machines, x1 and x2. x1 costs $50 an hour to run. x2 costs $80 an hour to run. And our goal, what we’re trying to do, our objective, is to minimize the total cost. So that’s what we’d like to do. But we need to do so subject to certain constraints. So there might be a labor constraint, that x1 requires 5 units of labor per hour, x2 requires 2 units of labor per hour, and we have a total of 20 units of labor that we have to spend. So this is a constraint. We have no more than 20 units of labor that we can spend, and we have to spend it across x1 and x2, each of which requires a different amount of labor. And we might also have a constraint like this, that tells us x1 is going to produce 10 units of output per hour, x2 is going to produce 12 units of output per hour, and the company needs 90 units of output. So we have some goal, something we need to achieve. We need to achieve 90 units of output, but there are some constraints, that x1 can only produce 10 units of output per hour, x2 produces 12 units of output per hour. These types of problems come up quite frequently, and you can start to notice patterns in them: problems where I am trying to optimize for some goal, minimizing cost, maximizing output, maximizing profits, or something like that, and there are constraints that are placed on that process. And so now we just need to formulate this problem in terms of linear equations. So let’s start with this first point. Two machines, x1 and x2; x1 costs $50 an hour, x2 costs $80 an hour. Here we can come up with a cost function that might look like this: 50 times x1 plus 80 times x2, where x1 is going to be a variable representing how many hours we run machine x1 for, and x2 is going to be a variable representing how many hours we run machine x2 for. And what we’re trying to minimize is this cost function, which is just how much it costs to run each of these machines per hour, summed up. This is an example of a linear equation, just some combination of these variables with coefficients placed in front of them. And I would like to minimize that total value. But I need to do so subject to these constraints. x1 requires 5 units of labor per hour, x2 requires 2, and we have a total of 20 units of labor to spend. And so that gives us a constraint of this form: 5 times x1 plus 2 times x2 is less than or equal to 20. 20 is the total number of units of labor we have to spend. And that’s spent across x1 and x2, each of which requires a different number of units of labor per hour, for example. And finally, we have this constraint here.
x1 produces 10 units of output per hour, x2 produces 12, and we need 90 units of output. And so this might look something like this: 10x1 plus 12x2, the amount of output per hour, needs to be at least 90. We can do better than 90, but it needs to be at least 90. And if you recall from my formulation before, I said that generally speaking in linear programming, we deal with equals constraints or less-than-or-equal-to constraints. So we have a greater-than-or-equal-to sign here. That’s not a problem. Whenever we have a greater-than-or-equal-to sign, we can just multiply the equation by negative 1, and that’ll flip it around to less than or equal to negative 90, for example, instead of greater than or equal to 90. And that’s going to be an equivalent expression that we can use to represent this problem. So now that we have this cost function and these constraints that it’s subject to, it turns out there are a number of algorithms that can be used in order to solve these types of problems. And these problems go a little bit more into geometry and linear algebra than we’re really going to get into. But the most popular of these types of algorithms are simplex, which was one of the first algorithms discovered for solving linear programs, and, later on, a class of interior-point algorithms that can be used to solve this type of problem as well. The key is not to understand exactly how these algorithms work, but to realize that these algorithms exist for efficiently finding solutions any time we have a problem of this particular form. And so we can take a look, for example, at the production directory here, where I have a file called production.py. Here I’m using scipy, which is a library for a lot of science-related functions within Python. And I can go ahead and just run this optimization function in order to run a linear program. linprog here is going to try and solve this linear program for me, where I provide to this function call all of the data about my linear program. So it needs to be in a particular format, which might be a little confusing at first. But this first argument to scipy.optimize.linprog is the cost function, which is in this case just an array or a list that has 50 and 80, because my original cost function was 50 times x1 plus 80 times x2. So I just tell Python, 50 and 80, those are the coefficients that I am now trying to optimize for. And then I provide all of the constraints. So the constraints, which I wrote up above in comments, are: constraint 1 is 5x1 plus 2x2 is less than or equal to 20, and constraint 2 is negative 10x1 plus negative 12x2 is less than or equal to negative 90. And scipy expects these constraints to be in a particular format. It first expects me to provide all of the coefficients for the upper-bound equations, ub just for upper bound, where the coefficients of the first equation are 5 and 2, because we have 5x1 and 2x2, and the coefficients for the second equation are negative 10 and negative 12, because I have negative 10x1 plus negative 12x2. And then here, we provide as a separate argument, just to keep things separate, what the actual bound is, the upper bound for each of these constraints. Well, for the first constraint, the upper bound is 20. That was constraint number 1. And then for constraint number 2, the upper bound is negative 90. So it’s a bit of a cryptic way of representing it. It’s not quite as simple as just writing the mathematical equations.
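Pulling that together, a minimal sketch consistent with the description of production.py (the actual file may differ in its details) would be:

    import scipy.optimize

    # Objective function: 50x1 + 80x2
    # Constraint 1: 5x1 + 2x2 <= 20
    # Constraint 2: -10x1 + -12x2 <= -90  (i.e., 10x1 + 12x2 >= 90)
    result = scipy.optimize.linprog(
        [50, 80],                   # coefficients of the cost function
        A_ub=[[5, 2], [-10, -12]],  # coefficients of the upper-bound constraints
        b_ub=[20, -90],             # the upper bounds themselves
    )

    if result.success:
        print(f"X1: {round(result.x[0], 2)} hours")
        print(f"X2: {round(result.x[1], 2)} hours")
    else:
        print("No solution")

Given the numbers above, this should report that x1 runs for 1.5 hours and x2 for 6.25 hours, matching the output described next.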
What really is being expected here are all of the coefficients and all of the numbers that are in these equations, by first providing the coefficients for the cost function, then providing all the coefficients for the inequality constraints, and then providing all of the upper bounds for those inequality constraints. And once all of that information is there, then we can run any of these interior-point algorithms or the simplex algorithm. Even if you don’t understand how it works, you can just run the function and figure out what the result should be. And here, I said if the result is a success, we were able to solve this problem, so go ahead and print out what the value of x1 and x2 should be. Otherwise, go ahead and print out no solution. And so if I run this program by running python production.py, it takes a second to calculate. But then we see here is what the optimal solution should be: x1 should run for 1.5 hours, and x2 should run for 6.25 hours. And we were able to do this by just formulating the problem as a linear equation that we were trying to optimize, some cost that we were trying to minimize, and then some constraints that were placed on that. And many, many problems fall into this category, problems that you can solve if you can just figure out how to use equations and constraints to represent that general idea. And that’s a theme that’s going to come up a couple of times today, where we want to be able to take some problem and reduce it down to some problem we know how to solve, in order to begin to find a solution, and to use existing methods to find a solution more effectively or more efficiently. And it turns out that these types of problems, where we have constraints, show up in other ways too. And there’s an entire class of problems more generally just known as constraint satisfaction problems. And we’re going to now take a look at how you might formulate a constraint satisfaction problem and how you might go about solving one. But the basic idea of a constraint satisfaction problem is that we have some number of variables that need to take on some values, and we need to figure out what values each of those variables should take on. But those variables are subject to particular constraints that are going to limit what values those variables can actually take on. So let’s take a look at a real-world example: exam scheduling. I have four students here, students 1, 2, 3, and 4. Each of them is taking some number of different classes. Classes here are going to be represented by letters. So student 1 is enrolled in courses A, B, and C. Student 2 is enrolled in courses B, D, and E, so on and so forth. And now say a university, for example, is trying to schedule exams for all of these courses, but there are only three exam slots, on Monday, Tuesday, and Wednesday, and we have to schedule an exam for each of these courses. The constraint we have to deal with in the scheduling is that we don’t want anyone to have to take two exams on the same day. We would like to try and minimize that, or eliminate it if at all possible. So how do we begin to represent this idea? How do we structure this in a way that a computer with an AI algorithm can begin to try and solve the problem? Well, let’s look in particular at these classes that students might take and represent each of the courses as some node inside of a graph.
And what we’ll do is we’ll create an edge between two nodes in this graph if there is a constraint between those two nodes. So what does this mean? Well, we can start with student 1, who’s enrolled in courses A, B, and C. What that means is that A and B can’t have an exam at the same time. A and C can’t have an exam at the same time. And B and C also can’t have an exam at the same time. And I can represent that in this graph by just drawing edges: one edge between A and B, one between B and C, and then one between C and A. And that encodes now the idea that between those nodes, there is a constraint. And in particular, the constraint happens to be that these two can’t be equal to each other, though there are other types of constraints that are possible, depending on the type of problem that you’re trying to solve. And then we can do the same thing for each of the other students. So for student 2, who’s enrolled in courses B, D, and E, well, that means B, D, and E, those all need to have edges that connect each other as well. Student 3 is enrolled in courses C, E, and F. So we’ll go ahead and take C, E, and F and connect those by drawing edges between them too. And then finally, student 4 is enrolled in courses E, F, and G. And we can represent that by drawing edges between E, F, and G, although E and F already had an edge between them. We don’t need another one, because this constraint is just encoding the idea that course E and course F cannot have an exam on the same day. So this then is what we might call the constraint graph, some graphical representation of all of my variables, so to speak, and the constraints between those variables, where in this particular case each of the constraints represents an inequality constraint: an edge between B and D means that whatever value the variable B takes on cannot be the value that the variable D takes on as well. So what then actually is a constraint satisfaction problem? Well, a constraint satisfaction problem is just some set of variables, x1 all the way through xn; some set of domains for each of those variables, so every variable needs to take on some values, and maybe every variable has the same domain, but maybe each variable has a slightly different domain; and then a set of constraints, which we’ll just call C, that is some constraints that are placed upon these variables, like x1 is not equal to x2. But there could be other forms too, like maybe x1 equals x2 plus 1, if these variables are taking on numerical values in their domain, for example. The types of constraints are going to vary based on the types of problems. And constraint satisfaction shows up all over the place as well, in any situation where we have variables that are subject to particular constraints. So one popular game is Sudoku, for example, this 9 by 9 grid where you need to fill in numbers in each of these cells, but you want to make sure there’s never a duplicate number in any row, or in any column, or in any grid of 3 by 3 cells. So what might this look like as a constraint satisfaction problem? Well, my variables are all of the empty squares in the puzzle, each represented here as just an (x, y) coordinate, for example, for all of the squares where I need to plug in a value, where I don’t know what value it should take on. The domain is just going to be all of the numbers from 1 through 9, any value that I could fill in to one of these cells. So that is going to be the domain for each of these variables.
And then the constraints are going to be of the form: this cell can’t be equal to that cell, or to that cell, and all of these need to be different, for example, and the same for all of the rows, and the columns, and the 3 by 3 squares as well. So those constraints are going to enforce what values are actually allowed. And we can formulate the same idea in the case of this exam scheduling problem, where the variables we have are the different courses, a up through g. The domain for each of these variables is going to be Monday, Tuesday, and Wednesday. Those are the possible values each of the variables can take on, which in this case just represent when the exam for that class is. And then the constraints are of this form: a is not equal to b, a is not equal to c, meaning a and b can’t have an exam on the same day, a and c can’t have an exam on the same day, or, more formally, these two variables cannot take on the same value within their domain. So that then is this formulation of a constraint satisfaction problem that we can begin to use to try and solve this problem. And constraints can come in a number of different forms. There are hard constraints, which are constraints that must be satisfied for a correct solution. So something like, in the Sudoku puzzle, you cannot have this cell and this cell that are in the same row take on the same value. That is a hard constraint. But problems can also have soft constraints, which are constraints that express some notion of preference: maybe a and b can’t have an exam on the same day, but maybe someone has a preference that a’s exam is earlier than b’s exam. It doesn’t need to be the case, but it’s some expression that says some solution is better than another solution. And in that case, you might formulate the problem as trying to optimize for maximizing people’s preferences. You want people’s preferences to be satisfied as much as possible. In this case, though, we’ll mostly just deal with hard constraints, constraints that must be met in order to have a correct solution to the problem. So we want to figure out some assignment of these variables to their particular values that is ultimately going to give us a solution to the problem, by allowing us to assign some day to each of the classes such that we don’t have any conflicts between classes. So it turns out that we can classify the constraints in a constraint satisfaction problem into a number of different categories. The first of those categories are perhaps the simplest types of constraints, which are known as unary constraints, where a unary constraint is a constraint that just involves a single variable. For example, a unary constraint might be something like, a does not equal Monday, meaning course A cannot have its exam on Monday. If for some reason the instructor for the course isn’t available on Monday, you might have a constraint in your problem that looks like this, something that just has a single variable a in it, and maybe says a is not equal to Monday, or a is equal to something, or, in the case of numbers, greater than or less than something. A constraint that just has one variable, we consider to be a unary constraint. And this is in contrast to something like a binary constraint, which is a constraint that involves two variables, for example. So this would be a constraint like the ones we were looking at before.
Something like a does not equal b is an example of a binary constraint, because it is a constraint that has two variables involved in it, a and b. And we represented that using some arc, or some edge, that connects variable a to variable b. And using this knowledge of, OK, what is a unary constraint, what is a binary constraint, there are different types of things we can say about a particular constraint satisfaction problem. And one thing we can say is we can try and make the problem node consistent. So what does node consistency mean? Node consistency means that we have all of the values in a variable’s domain satisfying that variable’s unary constraints. So for each of the variables inside of our constraint satisfaction problem, if all of the values satisfy the unary constraints for that particular variable, we can say that the entire problem is node consistent. Or we can even say that a particular variable is node consistent, if we just want to make one node consistent within itself. So what does that actually look like? Let’s now look at a simplified example, where instead of having a whole bunch of different classes, we just have two classes, a and b, each of which has an exam on either Monday or Tuesday or Wednesday. So this is the domain for the variable a, and this is the domain for the variable b. And now let’s imagine we have these constraints: a not equal to Monday, b not equal to Tuesday, b not equal to Monday, a not equal to b. So those are the constraints that we have on this particular problem. And what we can now try to do is enforce node consistency. And node consistency just means we make sure that all of the values for any variable’s domain satisfy its unary constraints. And so we could start by trying to make node a node consistent. Is it consistent? Does every value inside of a’s domain satisfy its unary constraints? Well, initially, we’ll see that Monday does not satisfy a’s unary constraints, because we have a constraint, a unary constraint here, that a is not equal to Monday. But Monday is still in a’s domain. And so this is something that is not node consistent, because we have Monday in the domain, but it is not a valid value for this particular node. And so how do we make this node consistent? Well, to make the node consistent, what we’ll do is we’ll just go ahead and remove Monday from a’s domain. Now a can only be on Tuesday or Wednesday, because we had this constraint that said a is not equal to Monday. And at this point now, a is node consistent. For each of the values that a can take on, Tuesday and Wednesday, there is no unary constraint that conflicts with that idea. There is no constraint that says that a can’t be Tuesday. There is no unary constraint that says that a cannot be on Wednesday. And so now we can turn our attention to b. b also has a domain, Monday, Tuesday, and Wednesday. And we can begin to see whether those values satisfy the unary constraints as well. Well, here is a unary constraint: b is not equal to Tuesday. And that does not appear to be satisfied by this domain of Monday, Tuesday, and Wednesday, because Tuesday, a possible value that the variable b could take on, is not consistent with this unary constraint that b is not equal to Tuesday. So to solve that problem, we’ll go ahead and remove Tuesday from b’s domain. Now b’s domain only contains Monday and Wednesday. But as it turns out, there’s yet another unary constraint that we placed on the variable b, which is here: b is not equal to Monday.
And that means that this value, Monday, inside of b’s domain, is not consistent with b’s unary constraints, because we have a constraint that says that b cannot be Monday. And so we can remove Monday from b’s domain. And now we’ve made it through all of the unary constraints. We’ve not yet considered this constraint, which is a binary constraint. But we’ve considered all of the unary constraints, all of the constraints that involve just a single variable. And we’ve made sure that every node is consistent with those unary constraints. So we can say that now we have enforced node consistency, that for each of these possible nodes, we can pick any of these values in the domain, and there won’t be a unary constraint that is violated as a result of it. So node consistency is fairly easy to enforce. We just take each node and make sure the values in the domain satisfy the unary constraints. Where things get a little bit more interesting is when we consider different types of consistency, something like arc consistency, for example. And arc consistency refers to when all of the values in a variable’s domain satisfy the variable’s binary constraints. So when we’re looking at trying to make variable a arc consistent, we’re no longer just considering the unary constraints that involve a. We’re trying to consider all of the binary constraints that involve a as well, so any edge that connects a to another variable inside of that constraint graph that we were taking a look at before. Put a little bit more formally (an arc really is just another word for an edge that connects two of these nodes inside of our constraint graph), we can define arc consistency a little more precisely like this. In order to make some variable x arc consistent with respect to some other variable y, we need to remove any element from x’s domain to make sure that every choice for x, every choice in x’s domain, has a possible choice for y. So put another way, if I have a variable x and I want to make x arc consistent, then I’m going to look at all of the possible values that x can take on and make sure that, for all of those possible values, there is still some choice that I can make for y, if there’s some arc between x and y, to make sure that y has a possible option that I can choose as well. So let’s look at an example of that, going back to this example from before. We enforced node consistency already by saying that a can only be on Tuesday or Wednesday, because we knew that a could not be on Monday. And we also said that b’s domain only consists of Wednesday, because we know that b does not equal Tuesday and also b does not equal Monday. So now let’s begin to consider arc consistency. Let’s try and make a arc consistent with b. And to make a arc consistent with respect to b means that for any choice we make in a’s domain, there is some choice we can make in b’s domain that is going to be consistent. And we can try that. For a, we can choose Tuesday as a possible value. If I choose Tuesday for a, is there a value for b that satisfies the binary constraint? Well, yes: b equals Wednesday would satisfy this constraint that a does not equal b, because Tuesday does not equal Wednesday. However, if we chose Wednesday for a, well, then there is no choice in b’s domain that satisfies this binary constraint. There is no way I can choose something for b that satisfies a does not equal b, because I know b must be Wednesday.
And so if ever I run into a situation like this, where I see that here is a possible value for a such that there is no choice of value for b that satisfies the binary constraint, well, then this is not arc consistent. And to make it arc consistent, I would need to take Wednesday and remove it from a’s domain, because Wednesday was not going to be a possible choice I could make for a. It wasn’t consistent with this binary constraint with b. There was no way I could choose Wednesday for a and still have an available solution by choosing something for b as well. So here now, I’ve been able to enforce arc consistency. And in doing so, I’ve actually solved this entire problem, that given these constraints, where a and b can have exams on either Monday or Tuesday or Wednesday, the only solution, as it would appear, is that a’s exam must be on Tuesday and b’s exam must be on Wednesday. And that is the only option available to me. So if we want to apply arc consistency to a larger graph, not just looking at one particular pair of variables, there are ways we can do that too. And we can begin to formalize what the pseudocode would look like for trying to write an algorithm that enforces arc consistency. And we’ll start by defining a function called revise. Revise is going to take as input a CSP, short for constraint satisfaction problem, and also two variables, x and y. And what revise is going to do is make x arc consistent with respect to y, meaning remove anything from x’s domain that doesn’t allow for a possible option for y. How does this work? Well, we’ll go ahead and first keep track of whether or not we’ve made a revision. Revise is ultimately going to return true or false. It’ll return true in the event that we did make a revision to x’s domain. It’ll return false if we didn’t make any change to x’s domain. And we’ll see in a moment why that’s going to be helpful. But we start by saying revised equals false. We haven’t made any changes. Then we’ll say, all right, let’s go ahead and loop over all of the possible values in x’s domain. So loop over x’s domain, for each little x in x’s domain. I want to make sure that for each of those choices, I have some available choice in y that satisfies the binary constraints that are defined inside of my CSP, inside of my constraint satisfaction problem. So if ever it’s the case that there is no value y in y’s domain that satisfies the constraint for x and y, well, if that’s the case, that means that this value little x shouldn’t be in x’s domain. So we’ll go ahead and delete little x from x’s domain. And I’ll set revised equal to true, because I did change x’s domain. I changed x’s domain by removing little x. And I removed little x because it wasn’t arc consistent. There was no way I could choose a value for y that would satisfy this x-y constraint. So in this case, we’ll go ahead and set revised equal to true. And we’ll do this again and again for every value in x’s domain. Sometimes it might be fine. In other cases, it might not allow for a possible choice for y, in which case we need to remove this value from x’s domain. And at the end, we just return revised to indicate whether or not we actually made a change. So this revise function, then, is effectively an implementation of what you saw me do graphically a moment ago. And it makes one variable, x, arc consistent with another variable, in this case, y.
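As a sketch, assuming a csp object that exposes domains[var] as a set of values, an arcs list of all constrained pairs, a neighbors(x) helper, and a constraint_satisfied(x, vx, y, vy) check (none of these names come from a particular library), the revise function just described, together with the AC3 loop that the next passage walks through, might look like this:

    from collections import deque

    def revise(csp, x, y):
        """Make x arc consistent with respect to y: remove any value in
        x's domain that leaves no possible choice for y."""
        revised = False
        for value_x in set(csp.domains[x]):
            if not any(csp.constraint_satisfied(x, value_x, y, value_y)
                       for value_y in csp.domains[y]):
                csp.domains[x].remove(value_x)
                revised = True
        return revised

    def ac3(csp):
        """Enforce arc consistency across the whole problem."""
        queue = deque(csp.arcs)  # all (x, y) pairs sharing a binary constraint
        while queue:
            x, y = queue.popleft()
            if revise(csp, x, y):
                # If x's domain is now empty, the problem is unsolvable.
                if not csp.domains[x]:
                    return False
                # x's other neighbors may no longer be arc consistent with x.
                for z in csp.neighbors(x):
                    if z != y:
                        queue.append((z, x))
        return True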
But generally speaking, when we want to enforce arc consistency, we’ll often want to enforce it not just for a single arc, but for the entire constraint satisfaction problem. And it turns out there’s an algorithm to do that as well. And that algorithm is known as AC3. AC3 takes a constraint satisfaction problem, and it enforces arc consistency across the entire problem. How does it do that? Well, it’s going to basically maintain a queue, basically just a line, of all of the arcs that it needs to make consistent. And over time, we might remove things from that queue as we make arcs consistent, and we might need to add things to that queue as well if there are more things we need to make arc consistent. So we’ll go ahead and start with a queue that contains all of the arcs in the constraint satisfaction problem, all of the edges that connect two nodes that have some sort of binary constraint between them. And now, as long as the queue is non-empty, there is work to be done. The queue holds all of the things that we need to make arc consistent. So as long as the queue is non-empty, there are still things we have to do. What do we have to do? Well, we’ll start by dequeuing from the queue, removing something from the queue. And strictly speaking, it doesn’t need to be a queue, but a queue is a traditional way of doing this. We’ll dequeue from the queue, and that’ll give us an arc, x and y, these two variables, where I would like to make x arc consistent with y. So how do we make x arc consistent with y? Well, we can go ahead and just use that revise function that we talked about a moment ago. We call the revise function, passing as input the constraint satisfaction problem and also these variables x and y, because I want to make x arc consistent with y. In other words, remove any values from x’s domain that don’t leave an available option for y. And recall, what does revise return? Well, it returns true if we actually made a change, if we removed something from x’s domain because there wasn’t an available option for y, for example. And it returns false if we didn’t make any change to x’s domain at all. And it turns out, if revise returns false, if we didn’t make any changes, well, then there’s not a whole lot more work to be done here for this arc. We can just move ahead to the next arc that’s in the queue. But if we did make a change, if we did reduce x’s domain by removing values from it, well, then what we might realize is that this creates potential problems later on. It might mean that some arc that was arc consistent with x might no longer be arc consistent with x, because while there used to be an option that we could choose for x, now there might not be, because now we might have removed something from x’s domain that was necessary for some other arc to be arc consistent. And so if ever we did revise x’s domain, we’re going to need to add some things to the queue, some additional arcs that we might want to check. How do we do that? Well, the first thing we want to check is to make sure that x’s domain is not empty. If x’s domain is empty, that means there are no available options for x at all, and that means that there’s no way you can solve the constraint satisfaction problem. If we’ve removed everything from x’s domain, we’ll go ahead and just return false here, to indicate there’s no way to solve the problem, because there’s nothing left in x’s domain.
But otherwise, if there are things left in x’s domain, but fewer things than before, well, then what we’ll do is loop over each variable z in all of x’s neighbors, except for y; y we already handled. But we’ll consider all of x’s other neighbors and recognize that the arc from each of those z’s to x might no longer be arc consistent, because while for each z there might have been a possible option we could choose for x to correspond with each of z’s possible values, now there might not be, because we removed some elements from x’s domain. And so what we’ll do here is go ahead and enqueue, adding something to the queue, this arc z-x, for all of those neighbors z. So we need to add back some arcs to the queue in order to continue to enforce arc consistency. At the very end, if we make it through all this process, then we can return true. But this now is AC3, this algorithm for enforcing arc consistency on a constraint satisfaction problem. And the big idea is really just to keep track of all of the arcs that we might need to make arc consistent, make each one arc consistent by calling the revise function, and, if we did revise it, recognize that there are some new arcs that might need to be added to the queue, in order to make sure that everything is still arc consistent even after we’ve removed some of the elements from a particular variable’s domain. So what then would happen if we tried to enforce arc consistency on a graph like this, on a graph where each of these variables has a domain of Monday, Tuesday, and Wednesday? Well, it turns out that while enforcing arc consistency can solve some types of problems, nothing actually changes here. For any particular arc, just considering two variables, there’s always a way for me, for any of the choices I make for one of them, to make a choice for the other one, because there are three options and I just need the two to be different from each other. So it’s actually quite easy to just take an arc and declare that it is arc consistent, because if I pick Monday for D, then I just pick something that isn’t Monday for B. With arc consistency, we only consider a binary constraint between two nodes, and we’re not really considering all of the rest of the nodes yet. So just using AC3, the enforcement of arc consistency, can sometimes have the effect of reducing domains, to make it easier to find solutions, but it will not always actually solve the problem. We might still need to somehow search to try and find a solution. And we can use classical, traditional search algorithms to try to do so. You’ll recall that a search problem generally consists of these parts: we have some initial state, some actions, a transition model that takes me from one state to another state, a goal test to tell me whether I have satisfied my objective correctly, and then some path cost function, because in the case of something like maze solving, I was trying to get to my goal as quickly as possible. So you could formulate a CSP, a constraint satisfaction problem, as one of these types of search problems. The initial state will just be an empty assignment, where an assignment is just a way for me to assign any particular variable to any particular value. An empty assignment is one where no variables are assigned to any values yet. Then the action I can take is adding some new variable-equals-value pair to that assignment, saying, for this assignment, let me add a value for this variable.
And the transition model just defines what happens when you take that action: you get a new assignment that has that variable equal to that value inside of it. The goal test is just checking to make sure all the variables have been assigned and making sure all the constraints have been satisfied. And the path cost function is sort of irrelevant. I don’t really care about what the path really is. I just care about finding some assignment that actually satisfies all of the constraints. So really, all the paths have the same cost. I don’t really care about the path to the goal. I just care about the solution itself, much as we’ve talked about before. The problem here, though, is that if we just implement this naive search algorithm, just by implementing something like breadth-first search or depth-first search, it is going to be very, very inefficient. And there are ways we can take advantage of efficiencies in the structure of a constraint satisfaction problem itself. One of the key ideas is that the order in which we assign variables doesn’t really matter: the assignment a equals 2 and then b equals 8 is identical to the assignment of b equals 8 and then a equals 2. Switching the order doesn’t really change anything about the fundamental nature of that assignment. And so there are some ways that we can try and revise this idea of a search algorithm to apply it specifically for a problem like a constraint satisfaction problem. And it turns out the search algorithm we’ll generally use when talking about constraint satisfaction problems is something known as backtracking search. And the big idea of backtracking search is that we’ll go ahead and make assignments from variables to values. And if ever we get stuck, if we arrive at a place where there is no way we can make any forward progress while still preserving the constraints that we need to enforce, we’ll go ahead and backtrack and try something else instead. So the very basic sketch of backtracking search looks like this. We have a function called backtrack that takes as input an assignment and a constraint satisfaction problem. So initially, we don’t have any assigned variables, so when we begin backtracking search, this assignment is just going to be the empty assignment with no variables inside of it. But we’ll see later that this is going to be a recursive function. So backtrack takes as input the assignment and the problem. If the assignment is complete, meaning all of the variables have been assigned, we just return that assignment. That, of course, won’t be true initially, because we start with an empty assignment. But over time, we might add things to that assignment. So if ever the assignment actually is complete, then we’re done, and we just go ahead and return that assignment. But otherwise, there is some work to be done. So what we’ll need to do is select an unassigned variable for this particular problem. So we need to take the problem, look at the variables that have already been assigned, and pick a variable that has not yet been assigned. And I’ll go ahead and take that variable. And then I need to consider all of the values in that variable’s domain. So we’ll go ahead and call this domain-values function, which we’ll talk a little more about later, that takes a variable and just gives me back an ordered list of all of the values in its domain. So I’ve taken some unassigned variable, and I’m going to loop over all of the possible values.
And the idea is, let me just try all of these values as possible values for the variable. So if the value is consistent with the assignment so far, meaning it doesn’t violate any of the constraints, well, then let’s go ahead and add variable equals value to the assignment, because it’s so far consistent. And now let’s recursively call backtrack to try and make the rest of the assignments also consistent. So I’ll go ahead and call backtrack on this new assignment that I’ve added the variable equals value to. And now I recursively call backtrack and see what the result is. And if the result isn’t a failure, well, then let me just return that result. And otherwise, what else could happen? Well, if it turns out the result was a failure, then that means this value was probably a bad choice for this particular variable, because when I assigned this variable equal to that value, eventually down the road I ran into a situation where I violated constraints. There was nothing more I could do. So now I’ll remove variable equals value from the assignment, effectively backtracking to say, all right, that value didn’t work. Let’s try another value instead. And then at the very end, if we were never able to return a complete assignment, we’ll just go ahead and return failure, because that means that none of the values worked for this particular variable. This now is the idea for backtracking search: take each of the variables, try values for them, and recursively try backtracking search to see if we can make progress. And if ever we run into a dead end, if we run into a situation where there is no possible value we can choose that satisfies the constraints, we return failure. And that propagates up, and eventually we make a different choice by going back and trying something else instead. So let’s put this algorithm into practice. Let’s actually try and use backtracking search to solve this problem now, where I need to figure out how to assign each of these courses to an exam slot on Monday or Tuesday or Wednesday in such a way that it satisfies these constraints, where each of these edges means those two classes cannot have an exam on the same day. I can start at any node. It doesn’t really matter which I start with, but in this case, I’ll just start with A. And I’ll ask the question, all right, let me loop over the values in the domain. And maybe in this case, I’ll just start with Monday and say, all right, let’s go ahead and assign A to Monday. We’ll just go in order: Monday, Tuesday, Wednesday. And now let’s consider node B. So I’ve made an assignment to A, so I recursively call backtrack with this new part of the assignment. And now I’m looking to pick another unassigned variable, like B. And I’ll say, all right, maybe I’ll start with Monday, because that’s the very first value in B’s domain. And I ask, all right, does Monday violate any constraints? And it turns out, yes, it does. It violates this constraint here between A and B, because A and B are now both on Monday, and that doesn’t work, because B can’t be on the same day as A. So that doesn’t work. So we might instead try Tuesday, the next value in B’s domain. And is that consistent with the assignment so far? Well, yeah: B on Tuesday and A on Monday, that is consistent so far, because they’re not on the same day. So that’s good. Now we can recursively call backtrack and try again. Pick another unassigned variable, something like D, and say, all right, let’s go through its possible values. Is Monday consistent with this assignment?
Well, yes, it is. B and D are on different days, Monday versus Tuesday. And A and B are also on different days, Monday versus Tuesday. So that's fine so far, too. We'll go ahead and try again. Maybe we'll go to this variable here, E. Can we make that consistent? Let's go through the possible values. We've recursively called backtrack. We might start with Monday and say, all right, that's not consistent, because D and E now have exams on the same day. So we might try Tuesday instead, going to the next one, and ask, is that consistent? Well, no, it's not, because B and E would have exams on the same day. And so we try, all right, is Wednesday consistent? And it turns out, yes, it is. Wednesday is consistent, because D and E now have exams on different days, and B and E now have exams on different days. All seems to be well so far. I recursively call backtrack, select another unassigned variable, maybe choose C this time, and say, all right, let's try the values that C could take on. Let's start with Monday. And it turns out that's not consistent, because now A and C both have exams on the same day. So I try Tuesday and say, that's not consistent either, because B and C now have exams on the same day. And then I say, all right, let's go ahead and try Wednesday. But that's not consistent either, because C and E would have exams on the same day too. So now we've gone through all the possible values for C, Monday, Tuesday, and Wednesday, and none of them are consistent. There is no way we can have a consistent assignment. Backtrack, in this case, will return a failure. And so then we'd say, all right, we have to backtrack back to here. Well, now for E, we've tried all of Monday, Tuesday, and Wednesday, and none of those work, because Wednesday, which seemed to work, turned out to be a failure. So that means there's no possible way we can assign E. So that's a failure too. We have to go back up to D, which means that Monday assignment to D must be wrong. We must try something else. So we can try, all right, what if instead of Monday, we try Tuesday? Tuesday, it turns out, is not consistent, because B and D would have an exam on the same day. But Wednesday, as it turns out, works. And now we can begin to make forward progress again. We go back to E and say, all right, which of these values works? Monday turns out to work by not violating any constraints. Then we go up to C now. Monday doesn't work, because it violates a constraint. It violates two, actually. Tuesday doesn't work, because it violates a constraint as well. But Wednesday does work. Then we can go to the next variable, F, and say, all right, does Monday work? Well, no. It violates a constraint. But Tuesday does work. And then finally, we can look at the last variable, G, recursively calling backtrack one more time. Monday is inconsistent; that violates a constraint. Tuesday also violates a constraint. But Wednesday doesn't violate a constraint. And so now at this point, we recursively call backtrack one last time. We now have a satisfactory assignment of all of the variables. And at this point, we can say that we are now done. We have now been able to successfully assign a value to each one of these variables in such a way that we're not violating any constraints. We're going to go ahead and have classes A and E have their exams on Monday. Classes B and F can have their exams on Tuesday. And classes C, D, and G can have their exams on Wednesday.
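Here is roughly that whole trial-and-error process written out in Python. This is a minimal sketch that anticipates the code discussed next; the edge list reconstructs the example graph as described above, and the helper names are my own.

```python
# Minimal backtracking search for the exam-scheduling example.
# The constraint list reconstructs the graph described above:
# each pair of classes connected by an edge cannot share a day.

VARIABLES = ["A", "B", "C", "D", "E", "F", "G"]
CONSTRAINTS = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("B", "E"),
               ("C", "E"), ("C", "F"), ("D", "E"), ("E", "F"), ("E", "G"),
               ("F", "G")]
DOMAIN = ["Monday", "Tuesday", "Wednesday"]


def consistent(assignment):
    """Check that no two constrained variables share a value."""
    for x, y in CONSTRAINTS:
        if x in assignment and y in assignment and assignment[x] == assignment[y]:
            return False
    return True


def select_unassigned_variable(assignment):
    """Pick any variable not yet assigned."""
    for variable in VARIABLES:
        if variable not in assignment:
            return variable
    return None


def backtrack(assignment):
    """Try to complete the assignment, backing up on dead ends."""
    if len(assignment) == len(VARIABLES):
        return assignment
    variable = select_unassigned_variable(assignment)
    for value in DOMAIN:
        new_assignment = dict(assignment)
        new_assignment[variable] = value
        if consistent(new_assignment):
            result = backtrack(new_assignment)
            if result is not None:
                return result
    return None  # no value worked for this variable: failure


print(backtrack(dict()))
```

Running this should print one complete assignment, found by exactly the kind of try-a-value, recurse, back-up process just traced.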
And there are no violated constraints that might come up there. So that then was a graphical look at how this might work. Let's now take a look at some code we could use to actually try and solve this problem as well. So here I'll go ahead and go into the scheduling directory, and we'll start by looking at schedule0.py. Here, I define a list of variables, A, B, C, D, E, F, G. Those are all the different classes. Then underneath that, I define my list of constraints. So constraint A and B: that is a constraint because they can't be on the same day. Likewise A and C, B and C, and so on and so forth, enforcing those exact same constraints. And here then is what the backtracking function might look like. First, if the assignment is complete, if I've made an assignment of every variable to a value, go ahead and just return that assignment. Then we'll select an unassigned variable from that assignment. Then for each of the possible values in the domain, Monday, Tuesday, Wednesday, let's go ahead and create a new assignment that assigns the variable to that value. I'll call this consistent function, which I'll show you in a moment, that just checks to make sure this new assignment is consistent. And if it is consistent, we'll go ahead and call backtrack to continue trying to run backtracking search. And as long as the result is not none, meaning it wasn't a failure, we can go ahead and return that result. But if we make it through all the values and nothing works, then it is a failure: there's no solution, and we go ahead and return none here. What do these functions do? Select unassigned variable is just going to choose a variable not yet assigned. So it's going to loop over all the variables, and if one is not already assigned, we'll go ahead and just return that variable. And what does the consistent function do? Well, the consistent function goes through all the constraints, and if we have a situation where we've assigned values to both of those variables, but they are the same, well, then that is a violation of the constraint, in which case we'll return false. But if nothing is inconsistent, then the assignment is consistent, and we'll return true. And then all the program does is call backtrack on an empty assignment, an empty dictionary that has no variables assigned and no values yet, save that result as the solution, and then print out that solution. So by running this now, I can run python schedule0.py. And what I get as a result of that is an assignment of all these variables to values. And it turns out we assign A to Monday, as we would expect, B to Tuesday, C to Wednesday, exactly the same type of thing we were talking about before, an assignment of each of these variables to values that doesn't violate any constraints. And I had to do a fair amount of work in order to implement this idea myself. I had to write the backtrack function that went through this process of recursively trying to do this backtracking search. But it turns out that constraint satisfaction problems are so popular that there exist many libraries that already implement this type of idea. Again, as before, the specific library is not as important as the fact that libraries do exist. This is just one example of a Python constraint library, where now, rather than having to do all the work from scratch, inside of schedule1.py I'm just taking advantage of a library that implements a lot of these ideas already. So here, I create a new problem, add variables to it with particular domains.
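That part of the file might look roughly like this. This is a sketch using the python-constraint package (installable as python-constraint); the Problem, addVariables, addConstraint, and getSolutions calls are that library's actual API, but the exact file contents here are my reconstruction of what's being described.

```python
# Sketch of the same scheduling problem using the python-constraint library.
from constraint import Problem

problem = Problem()

# Every class can have its exam on any of the three days.
problem.addVariables(
    ["A", "B", "C", "D", "E", "F", "G"],
    ["Monday", "Tuesday", "Wednesday"]
)

# One constraint per edge in the graph: the two classes must differ.
for pair in [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("B", "E"),
             ("C", "E"), ("C", "F"), ("D", "E"), ("E", "F"), ("E", "G"),
             ("F", "G")]:
    problem.addConstraint(lambda x, y: x != y, pair)

# Ask the library for every satisfying assignment.
for solution in problem.getSolutions():
    print(solution)
```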
I add a whole bunch of these individual constraints, where I call addConstraint and pass in a function describing what the constraint is. And the constraint is basically a function that takes two variables, x and y, and makes sure that x is not equal to y, enforcing the idea that these two classes cannot have exams on the same day. And then, for any constraint satisfaction problem, I can call getSolutions to get all the solutions to that problem. And then, for each of those solutions, print out what that solution happens to be. And if I run python schedule1.py, I now see that there are actually a number of different solutions that can be used to solve the problem. There are, in fact, six different solutions, assignments of variables to values that will give me a satisfactory answer to this constraint satisfaction problem. So this then was an implementation of a very basic backtracking search method, where really we just went through each of the variables, picked one that wasn't assigned, and tried the possible values the variable could take on. And then, if it worked, if it didn't violate any constraints, we kept trying other variables. And if ever we hit a dead end, we had to backtrack. But ultimately, we might be able to be a little bit more intelligent about how we do this in order to improve the efficiency of how we solve these sorts of problems. And one thing we might imagine trying to do is going back to this idea of inference, using the knowledge we have to draw conclusions, in order to make the rest of the problem-solving process a little bit easier. And let's now go back to where we got stuck in this problem the first time. When we were solving this constraint satisfaction problem, we dealt with B, and then we went on to D. And we went ahead and just assigned D to Monday, because that seemed to work with the assignment so far. It didn't violate any constraints. But it turned out that later on, that choice turned out to be a bad one, that that choice wasn't consistent with the rest of the values that we could take on here. And the question is, is there anything we could do to avoid getting into a situation like this, avoid trying to go down a path that's ultimately not going to lead anywhere, by taking advantage of knowledge that we have initially? And it turns out we do have that kind of knowledge. We can look at just the structure of this graph so far. And we can say that right now C's domain, for example, contains the values Monday, Tuesday, and Wednesday. And based on those values, we can say that this graph is not arc consistent. Recall that arc consistency is all about making sure that for every possible value for a particular node, there is some other value that we are able to choose. And as we can see here, Monday and Tuesday are not going to be possible values that we can choose for C. They're not going to be consistent with a node like B, for example, because B is equal to Tuesday, which means that C cannot be Tuesday. And because A is equal to Monday, C also cannot be Monday. So using that information, by making C arc consistent with A and B, we could remove Monday and Tuesday from C's domain and just leave C with Wednesday, for example. And if we continued to try and enforce arc consistency, we'd see there are some other conclusions we can draw as well. We see that B's only option is Tuesday and C's only option is Wednesday. And so if we want to make E arc consistent, well, E can't be Tuesday, because that wouldn't be arc consistent with B.
And E can't be Wednesday, because that wouldn't be arc consistent with C. So we can go ahead and take E and just set that equal to Monday, for example. And then we can begin to do this process again and again: in order to make D arc consistent with B and E, D would have to be Wednesday. That's the only possible option. And likewise, we can make the same judgments for F and G as well. And it turns out that without having to do any additional search, just by enforcing arc consistency, we were able to actually figure out what the assignment of all the variables should be, without needing to backtrack at all. And the way we did that is by interleaving this search process and the inference step, this step of trying to enforce arc consistency. And the algorithm to do this is often just called the maintaining arc consistency algorithm, which enforces arc consistency every time we make a new assignment of a value to a variable. So sometimes we can enforce arc consistency using that AC3 algorithm at the very beginning of the problem, before we even begin searching, in order to limit the domains of the variables and make it easier to search. But we can also take advantage of interleaving the enforcement of arc consistency with search, such that every time in the search process we make a new assignment, we go ahead and enforce arc consistency as well, to make sure that we're eliminating possible values from domains whenever possible. And how do we do this? Well, this is really equivalent to saying: every time we make a new assignment to a variable x, we'll go ahead and call the AC3 algorithm, this algorithm that enforces arc consistency on a constraint satisfaction problem. And we go ahead and call that, starting it with a queue, not of all of the arcs, which is what we did originally, but just of all of the arcs that we want to make arc consistent with x, this thing that we have just made an assignment to. So all arcs y x, where y is a neighbor of x, something that shares a constraint with x, for example. And by maintaining arc consistency in the backtracking search process, we can ultimately make our search process a little bit more efficient. And so this is the revised version of this backtrack function. It's the same as before; the changes here are highlighted in yellow. Every time we add a new variable equals value to our assignment, we'll go ahead and run this inference procedure, which might do a number of different things. But one thing it could do is call the maintaining arc consistency algorithm to make sure we're able to enforce arc consistency on the problem. And we might be able to draw new inferences as a result of that process, get new guarantees that this variable needs to be equal to that value, for example. That might happen one time; it might happen many times. And so long as those inferences are not a failure, as long as they don't lead to a situation where there is no possible way to make forward progress, well, then we can go ahead and add those inferences, those new pieces of knowledge about what variables should be assigned to what values, to the assignment, in order to more quickly make forward progress by taking advantage of information that I can just deduce, information I know based on the rest of the structure of the constraint satisfaction problem.
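Here's one way to sketch that interleaving in runnable form. Note one simplification relative to the description above: instead of explicitly adding inferred assignments and later removing them, this version gives each recursive call its own pruned copy of the domains, so backtracking undoes the inferences automatically. The helper names and edge list are my reconstructions, not the course's actual code.

```python
# Sketch of backtracking search interleaved with arc-consistency inference.
VARIABLES = ["A", "B", "C", "D", "E", "F", "G"]
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("B", "E"),
         ("C", "E"), ("C", "F"), ("D", "E"), ("E", "F"), ("E", "G"), ("F", "G")]
DAYS = ["Monday", "Tuesday", "Wednesday"]

NEIGHBORS = {v: set() for v in VARIABLES}
for a, b in EDGES:
    NEIGHBORS[a].add(b)
    NEIGHBORS[b].add(a)


def ac3(domains, queue):
    """Enforce arc consistency, pruning domains in place.
    Returns False if some domain is emptied (a dead end)."""
    while queue:
        x, y = queue.pop()
        # Remove values of x with no consistent partner left in y's domain.
        revised = {vx for vx in domains[x]
                   if not any(vx != vy for vy in domains[y])}
        if revised:
            domains[x] -= revised
            if not domains[x]:
                return False
            # x's domain changed, so re-check its other neighbors.
            queue.extend((z, x) for z in NEIGHBORS[x] if z != y)
    return True


def backtrack(assignment, domains):
    if len(assignment) == len(VARIABLES):
        return assignment
    variable = next(v for v in VARIABLES if v not in assignment)
    for value in list(domains[variable]):
        # Each branch works on its own copy, so failure needs no cleanup.
        new_domains = {v: set(d) for v, d in domains.items()}
        new_domains[variable] = {value}
        # Inference step: make every arc (y, variable) consistent.
        if ac3(new_domains, [(y, variable) for y in NEIGHBORS[variable]]):
            result = backtrack({**assignment, variable: value}, new_domains)
            if result is not None:
                return result
    return None


print(backtrack({}, {v: set(DAYS) for v in VARIABLES}))
```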
And the only other change I'll need to make now is that if it turns out this value doesn't work, well, then down here, I'll go ahead and need to remove not only variable equals value, but also any of those inferences that I made; remove those from the assignment as well. So here, then, we're often able to solve the problem by backtracking less than we might originally have needed to, just by taking advantage of the fact that every time we make a new assignment of one variable to one value, that might reduce the domains of other variables as well. And we can use that information to begin to more quickly draw conclusions, in order to try and solve the problem more efficiently as well. And it turns out there are other heuristics we can use to try and improve the efficiency of our search process as well. And it really boils down to a couple of these functions that I've talked about but haven't really explained how they're working. And one of them is this function here, select unassigned variable, where we're selecting some variable in the constraint satisfaction problem that has not yet been assigned. So far, I've sort of just been selecting variables randomly, just picking whichever unassigned variable comes next in order to decide, all right, this is the variable that we're going to assign next, and then going from there. But it turns out that by being a little bit intelligent, by following certain heuristics, we might be able to make the search process much more efficient, just by choosing very carefully which variable we should explore next. So some of those heuristics include minimum remaining values, or the MRV heuristic, which generally says that if I have a choice between which variable I should select, I should select the variable with the smallest domain, the variable that has the fewest number of remaining values left. The idea is that if there are only two remaining values left, well, I may as well prune one of them very quickly in order to get to the other, because one of those two has got to be the right value, if a solution does exist. Sometimes minimum remaining values might not give a conclusive result, if all the nodes have the same number of remaining values, for example. And in that case, another heuristic that can be helpful to look at is the degree heuristic. The degree of a node is the number of nodes that are attached to that node, the number of nodes that are constrained by that particular node. And if you ask which variable I should choose, a variable that has a high degree, that is connected to a lot of different things, or a variable with a low degree, that is not connected to a lot of different things, well, it can often make sense to choose the variable that has the highest degree, the one connected to the most other nodes, as the thing you would search first. Why is that the case? Well, it's because by choosing a variable with a high degree, that is immediately going to constrain the rest of the variables more, and it's more likely to be able to eliminate large sections of the state space that you don't need to search through at all. So what could this actually look like? Let's go back to this search problem here. In this particular case, I've made an assignment here, and I've made an assignment here. And the question is, what should I look at next? And according to the minimum remaining values heuristic, what I should choose is the variable that has the fewest remaining possible values.
And in this case, that's this node here, node C, which only has one value left in its domain, which in this case is Wednesday. And that's a very reasonable choice of a next assignment to make, because I know it's the only option: the only possible option for C is Wednesday, so I may as well make that assignment and then potentially explore the rest of the space after that. But meanwhile, at the very start of the problem, when I didn't have any knowledge of what nodes should have what values yet, I still had to pick which node should be the first one that I try and assign a value to. And I arbitrarily just chose the one at the top, node A, originally. But we can be more intelligent about that. We can look at this particular graph. All of the nodes have domains of the same size, domains of size 3, so minimum remaining values doesn't really help us there. But we might notice that node E has the highest degree. It is connected to the most things. And so perhaps it makes sense to begin our search, rather than starting at node A at the very top, with the node with the highest degree: start by searching from node E, because from there, that's going to much more easily allow us to enforce the constraints that are nearby, eliminating large portions of the search space that I might not need to search through. And in fact, by starting with E, we can immediately then assign other variables. And following that, we can actually assign the rest of the variables without needing to do any backtracking at all, even if I'm not using this inference procedure, just because starting with a node that has a high degree is going to very quickly restrict the possible values that other nodes can take on. So that then is how we can go about selecting an unassigned variable in a particular order. Rather than randomly picking a variable, if we're a little bit intelligent about how we choose it, we can make our search process much, much more efficient, by making sure we don't have to search through portions of the search space that ultimately aren't going to matter. The other function we haven't really talked about is this domain values function, the function that takes a variable and gives me back a sequence of all of the values inside of that variable's domain. The naive way to approach it is what we did before, which is just to go in order: Monday, then Tuesday, then Wednesday. But the problem is that going in that order might not be the most efficient order to search in; sometimes it might be more efficient to choose values that are likely to be solutions first, and then go to other values. Now, how do you assess whether a value is likelier to lead to a solution or less likely to lead to a solution? Well, one thing you can take a look at is how many constraints get added, how many things get removed from domains, as you make this new assignment of a variable to this particular value. And the heuristic we can use here is the least constraining value heuristic, which is the idea that we should return values in order based on the number of choices they rule out for the neighboring variables. And I want to start with the least constraining value, the value that rules out the fewest possible options.
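Put into code, those two selection functions might look like the sketch below; the reasoning behind that least constraining value ordering continues right after it. This reuses the domains dictionary and the NEIGHBORS mapping from the earlier sketch, and the function names are my own.

```python
def select_unassigned_variable(assignment, domains):
    """Minimum remaining values (MRV), breaking ties by highest degree."""
    unassigned = [v for v in domains if v not in assignment]
    return min(
        unassigned,
        # Smallest domain first; among ties, the most-connected node first.
        key=lambda v: (len(domains[v]), -len(NEIGHBORS[v]))
    )


def order_domain_values(variable, assignment, domains):
    """Least constraining value: prefer values that rule out the fewest
    options in unassigned neighbors' domains."""
    def ruled_out(value):
        return sum(value in domains[y]
                   for y in NEIGHBORS[variable] if y not in assignment)
    return sorted(domains[variable], key=ruled_out)
```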
And the idea there is that if all I care about doing is finding a solution, then if I start with a value that rules out a lot of other choices, I'm ruling out a lot of possibilities, which maybe is going to make it less likely that this particular choice leads to a solution. Whereas on the other hand, if I have a variable and I start by choosing a value that doesn't rule out very much, well, then I still have a lot of space where there might be a solution that I could ultimately find. And this might seem a little bit counterintuitive, a little bit at odds with what we were talking about before, where I said, when you're picking a variable, you should pick the variable that is going to have the fewest possible values remaining. But here, I want to pick the value for the variable that is the least constraining. The general idea, though, is that when I am picking a variable, I would like to prune large portions of the search space, by choosing a variable that is going to allow me to quickly eliminate possible options. Whereas here, within a particular variable, as I'm considering values that variable could take on, I would like to just find a solution. And so what I want to do is choose a value that still leaves open the possibility of finding a solution, and make that as likely as possible. By not ruling out many options, I leave open the possibility that I can still find a solution without needing to go back later and backtrack. So an example of that might be in this particular situation here. If I'm trying to choose a value for node C here, C is equal to either Tuesday or Wednesday. We know it can't be Monday, because that conflicts with this domain here, where we already know that A is Monday; so C must be Tuesday or Wednesday. And the question is, should I try Tuesday first, or should I try Wednesday first? And if I try Tuesday, what gets ruled out? Well, one option gets ruled out here, a second option gets ruled out here, and a third option gets ruled out here. So choosing Tuesday would rule out three possible options. And what about choosing Wednesday? Well, choosing Wednesday would rule out one option here, and it would rule out one option there. And so I have two choices. I can choose Tuesday, which rules out three options, or Wednesday, which rules out two options. And according to the least constraining value heuristic, what I should probably do is go ahead and choose Wednesday, the one that rules out the fewest number of possible options, leaving open as many chances as possible for me to eventually find the solution inside of the state space. And ultimately, if you continue this process, we will find the solution: an assignment of variables to values that allows us to give each of these classes an exam date that doesn't create a conflict for anyone who happens to be enrolled in two classes at the same time. So the big takeaway now with all of this is that there are a number of different ways we can formulate a problem. Today we saw that we can formulate a problem as a local search problem, a problem where we're looking at a current node and moving to a neighbor based on whether that neighbor is better or worse than the current node that we are looking at. We looked at formulating problems as linear programs, where just by putting things in terms of equations and constraints, we're able to solve problems a little bit more efficiently.
And we saw formulating a problem as a constraint satisfaction problem: creating this graph with an edge between any two variables that have some constraint between them, and using that information to be able to figure out what the solution should be. And so the takeaway of all of this is that if we have some problem in artificial intelligence that we would like to solve, whether that's trying to figure out where hospitals should be, or trying to solve the traveling salesman problem, trying to optimize production and costs and whatnot, or trying to figure out how to satisfy certain constraints, whether that's in a Sudoku puzzle or in trying to figure out how to schedule exams for a university, or any number of a wide variety of types of problems, if we can formulate that problem as one of these sorts of problems, then we can use these known algorithms: these algorithms for enforcing arc consistency and backtracking search, these hill climbing and simulated annealing algorithms, these simplex algorithms and interior point algorithms that can be used to solve linear programs. We can use those techniques to begin to solve a whole wide variety of problems, all in this world of optimization inside of artificial intelligence. This was an introduction to artificial intelligence with Python for today. We will see you next time.

All right. Welcome back, everyone, to an introduction to artificial intelligence with Python. Now, so far in this class, we've used AI to solve a number of different problems, giving AI instructions for how to search for a solution, or how to satisfy certain constraints, in order to find its way from some input point to some output point in order to solve some sort of problem. Today, we're going to turn to the world of learning, in particular the idea of machine learning, which generally refers to the idea where we are not going to give the computer explicit instructions for how to perform a task, but rather we are going to give the computer access to information in the form of data, or patterns that it can learn from, and let the computer try and figure out what those patterns are, try and understand that data, to be able to perform a task on its own. Now, machine learning comes in a number of different forms, and it's a very wide field. So today, we'll explore some of the foundational algorithms and ideas that are behind a lot of the different areas within machine learning. And one of the most popular is the idea of supervised machine learning, or just supervised learning. And supervised learning is a particular type of task. It refers to the task where we give the computer access to a data set, where that data set consists of input-output pairs. And what we would like the computer to do is we would like our AI to be able to figure out some function that maps inputs to outputs. So we have a whole bunch of data that generally consists of some kind of input, some evidence, some information that the computer will have access to. And we would like the computer, based on that input information, to predict what some output is going to be. And we'll give it some data that the computer can train its model on, to begin to understand how it is that this information works, how it is that the inputs and outputs relate to each other. But ultimately, we hope that our computer will be able to figure out some function that, given those inputs, is able to get those outputs.
There are a couple of different tasks within supervised learning. The one we'll focus on and start with is known as classification. And classification is the problem where, if I give you a whole bunch of inputs, you need to figure out some way to map those inputs into discrete categories, where we decide what those categories are, and it's the job of the computer to predict which of those categories an input belongs to. So that might be, for example: I give you information about a bank note, like a US dollar, and I'm asking you to predict for me, does it belong to the category of authentic bank notes, or does it belong to the category of counterfeit bank notes? You need to categorize the input, and we want to train the computer to figure out some function to be able to do that calculation. Another example might be the case of weather, something we've talked about a little bit so far in this class, where we would like to predict, on a given day, is it going to rain on that day? Is it going to be cloudy on that day? And before, we've seen how we could do this if we really gave the computer all the exact probabilities, like: if these are the conditions, what's the probability of rain? Oftentimes, though, we don't have access to that information. But what we do have access to is a whole bunch of data. So if we wanted to be able to predict something like, is it going to rain or is it not going to rain, we would give the computer historical information about days when it was raining and days when it was not raining, and ask the computer to look for patterns in that data. So what might that data look like? Well, we could structure that data in a table like this. This might be what our table looks like, where for any particular day, going back, we have information about that day's humidity and that day's air pressure. And then, importantly, we have a label, something where a human has said that on this particular day, it was raining or it was not raining. So you could fill in this table with a whole bunch of data. And what makes this what we would call a supervised learning exercise is that a human has gone in and labeled each of these data points, said that on this day, when these were the values for the humidity and pressure, that day was a rainy day, and this day was a not rainy day. And what we would like the computer to be able to do, then, is to figure out, given these inputs, given the humidity and the pressure, can the computer predict what label should be associated with that day? Does that day look more like it's going to be a day that rains, or does it look more like a day when it's not going to rain? Put a little bit more mathematically, you can think of this as a function that takes two inputs, the inputs being the data points that our computer will have access to, things like humidity and pressure. So we could write a function f that takes as input both humidity and pressure. And then the output is going to be what category we would ascribe to these particular input points, what label we would associate with that input. So we've seen a couple of example data points here, where given this value for humidity and this value for pressure, we predict, is it going to rain or is it not going to rain? And that's information that we just gathered from the world. We measured on various different days what the humidity and pressure were, and we observed whether or not we saw rain on that particular day. And this function f is what we would like to approximate.
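In code, that table of labeled input-output pairs might look like this; the numbers here are made up purely for illustration.

```python
# Hypothetical labeled data of the kind described: each row pairs the
# inputs (humidity, pressure) with a human-provided label.
dataset = [
    ((0.93, 0.80), "Rain"),
    ((0.79, 0.83), "Rain"),
    ((0.49, 0.99), "No Rain"),
    ((0.55, 0.98), "No Rain"),
]

# The goal of supervised learning here: approximate the unknown function
# f(humidity, pressure) -> label with a learned hypothesis h.
```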
Now, the computer, and we humans, don't really know exactly how this function f works. It's probably quite a complex function. So what we're going to do instead is attempt to estimate it. We would like to come up with a hypothesis function, h, which is going to try to approximate what f does. We want to come up with some function h that will also take the same inputs and will also produce an output, rain or no rain. And ideally, we'd like these two functions to agree as much as possible. So the goal, then, of the supervised learning classification task is going to be to figure out, what does that function h look like? How can we begin to estimate, given all of this information, all of this data, what category or what label should be assigned to a particular data point? So where could you begin doing this? Well, a reasonable thing to do, especially in this situation where I have two numerical values, is to try to plot this on a graph that has two axes, an x-axis and a y-axis. And in this case, we're just going to be using two numerical values as input. But these same types of ideas scale as you add more and more inputs as well. We'll be plotting things in two dimensions, but as we'll soon see, you could add more inputs and just imagine things in multiple dimensions. And while we humans have trouble conceptualizing anything really beyond three dimensions, at least visually, a computer has no problem with trying to imagine things in many, many more dimensions. For a computer, each dimension is just some separate number that it is keeping track of, so it wouldn't be unreasonable for a computer to think in 10 dimensions, or 100 dimensions, to be able to try to solve a problem. But for now, we've got two inputs. So we'll graph things along two axes: an x-axis, which will here represent humidity, and a y-axis, which here represents pressure. And what we might do is say, let's take all of the days that were raining and just try to plot them on this graph, and see where they fall. And here might be all of the rainy days, where each rainy day is one of these blue dots here that corresponds to a particular value for humidity and a particular value for pressure. And then I might do the same thing with the days that were not rainy. So take all the not rainy days, figure out what their values were for each of these two inputs, and go ahead and plot them on this graph as well. And I've here plotted them in red. So blue here stands for a rainy day; red here stands for a not rainy day. And this, then, is all of the input that my computer has access to. And what I would like the computer to be able to do is to train a model such that if I'm ever presented with a new input that doesn't have a label associated with it, something like this white dot here, I would like to predict, given those values for each of the two inputs, should we classify it as a blue dot, a rainy day, or should we classify it as a red dot, a not rainy day? And if you're just looking at this picture graphically, trying to say, all right, this white dot, does it look like it belongs to the blue category, or does it look like it belongs to the red category, I think most people would agree that it probably belongs to the blue category. And why is that? Well, it looks like it's close to other blue dots. And that's not a very formal notion, but it's a notion that we'll formalize in just a moment.
Because it seems to be close to this blue dot here, and nothing else is closer to it, we might say that it should be categorized as blue, that it should fall into the category of, I think that day is going to be a rainy day, based on that input. It might not be totally accurate, but it's a pretty good guess. And this type of algorithm is actually a very popular and common machine learning algorithm known as nearest neighbor classification. It's an algorithm for solving these classification-type problems. And nearest neighbor classification works like this: given an input, it will choose the class of the nearest data point to that input. By class, we here just mean category, like rain or no rain, counterfeit or not counterfeit. And we choose the category, or the class, based on the nearest data point. So given all that data we just looked at, is the nearest data point a blue point, or is it a red point? And depending on the answer to that question, we were able to make some sort of judgment. We were able to say something like, we think it's going to be blue, or we think it's going to be red. So likewise, we could apply this to other data points that we encounter as well. If suddenly this data point comes about, well, its nearest data point is red. So we would go ahead and classify this as a red point, not raining. Things get a little bit trickier, though, when you look at a point like this white point over here and you ask the same sort of question. Should it belong to the category of blue points, the rainy days? Or should it belong to the category of red points, the not rainy days? Now, nearest neighbor classification would say the way you solve this problem is to look at which point is nearest to that point. You look at this nearest point and say it's red. It's a not rainy day. And therefore, according to nearest neighbor classification, I would say that this unlabeled point should also be red; it should also be classified as a not rainy day. And your intuition might be that that's a reasonable judgment to make, that the closest thing is a not rainy day, so we may as well guess that it's a not rainy day. But it's probably also reasonable to look at the bigger picture of things, to say, yes, it is true that the nearest point to it was a red point, but it's surrounded by a whole bunch of other blue points. So looking at the bigger picture, there's potentially an argument to be made that this point should actually be blue. And with only this data, we actually don't know for sure. We are given some input, something we're trying to predict, and we don't necessarily know what the output is going to be. So in this case, which one is correct is difficult to say. But oftentimes, considering more than just a single neighbor, considering multiple neighbors, can sometimes give us a better result. And so there's a variant on the nearest neighbor classification algorithm that is known as the K nearest neighbor classification algorithm, where K is some parameter, some number that we choose, for how many neighbors we are going to look at. So one-nearest-neighbor classification is what we saw before: just pick the one nearest neighbor and use that category. But K nearest neighbor classification, where K might be 3, or 5, or 7, meaning look at the 3, or 5, or 7 closest neighbors, the closest data points to that point, works a little bit differently. With this algorithm, we'll give it an input and
choose the most common class out of the K nearest data points to that input. So if we look at the five nearest points, and three of them say it's raining, and two of them say it's not raining, we'll go with the three instead of the two, because each one effectively gets one vote towards what it believes the category ought to be. And ultimately, you choose the category that has the most votes as a consequence of that. So K nearest neighbor classification is a fairly straightforward one to understand intuitively. You just look at the neighbors and figure out what the answer might be. And it turns out this can work very, very well for solving a whole variety of different types of classification problems. But not every model is going to work in every situation. And so one of the things we'll take a look at today, especially in the context of supervised machine learning, is that there are a number of different approaches to machine learning, a number of different algorithms that we can apply, all solving the same type of problem, all solving some kind of classification problem where we want to take inputs and organize them into different categories. And no one algorithm is necessarily always going to be better than some other algorithm. They each have their trade-offs. And maybe, depending on the data, one type of algorithm is going to be better suited to trying to model that information than some other algorithm. And so this is what a lot of machine learning research ends up being about: when you're trying to apply machine learning techniques, you're often looking not just at one particular algorithm, but trying multiple different algorithms, trying to see what is going to give you the best results for trying to predict some function that maps inputs to outputs. So what, then, are the drawbacks of K nearest neighbor classification? Well, there are a couple. One might be that, in a naive approach at least, it could be fairly slow to have to go through and measure the distance between a point and every single one of these points that exist here. Now, there are ways of trying to get around that. There are data structures that can help make it quicker to find these neighbors. There are also techniques you can use to try and prune some of this data, remove some of the data points, so that you're only left with the relevant data points, just to make it a little bit easier. But ultimately, what we might like to do is come up with another way of trying to do this classification. One way of trying to do the classification was looking at the neighboring points. But another way might be to try to look at all of the data, and see if we can come up with some decision boundary, some boundary that will separate the rainy days from the not rainy days. And in the case of two dimensions, we can do that by drawing a line, for example. So what we might want to try to do is just find some line, find some separator, that divides the rainy days, the blue points over here, from the not rainy days, the red points over there. We're now trying a different approach, in contrast with the nearest neighbor approach, which just looked at local data around the input data point that we cared about. Now what we're doing is trying to use a technique known as linear regression to find some sort of line that will separate the two halves from each other. Now, sometimes it'll actually be possible to come up with some line that perfectly separates all the rainy days from the not rainy days.
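Before going on with that line-based idea, here's the K nearest neighbor approach from a moment ago written out: a minimal sketch using Euclidean distance over the same made-up (humidity, pressure) points as before.

```python
# K nearest neighbor classification over a tiny made-up dataset.
from collections import Counter
from math import dist

dataset = [
    ((0.93, 0.80), "Rain"),
    ((0.79, 0.83), "Rain"),
    ((0.49, 0.99), "No Rain"),
    ((0.55, 0.98), "No Rain"),
]


def knn_classify(point, k=3):
    # Sort the labeled points by distance and let the k closest vote.
    neighbors = sorted(dataset, key=lambda item: dist(point, item[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


print(knn_classify((0.85, 0.82)))  # "Rain" with this toy data
```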
Realistically, though, a data set that can be separated so perfectly is probably cleaner than many data sets will actually be. Oftentimes, data is messier. There are outliers. There's random noise that happens inside of a particular system. And what we'd like to do is still be able to figure out what a line might look like. So in practice, the data will not always be linearly separable, where linearly separable refers to a data set in which I could draw a line that separates the two halves of it perfectly. Instead, you might have a situation like this, where there are some rainy points that are on this side of the line and some not rainy points that are on that side of the line. And there may not be a line that perfectly separates one half of the inputs from the other half, that perfectly separates all the rainy days from the not rainy days. But we can still say that this line does a pretty good job. And we'll try to formalize a little bit later what we mean when we say something like, this line does a pretty good job of trying to make that prediction. But for now, let's just say we're looking for a line that does as good a job as we can at trying to separate one category of things from another category of things. So let's now try to formalize this a little bit more mathematically. We want to come up with some sort of function, some way we can define this line. And our inputs are things like humidity and pressure in this case. So for our inputs, we might say x1 is going to represent humidity, and x2 is going to represent pressure. These are inputs that we are going to provide to our machine learning algorithm. And given those inputs, we would like for our model to be able to predict some sort of output. And we are going to predict that using our hypothesis function, which we called h. Our hypothesis function is going to take as input x1 and x2, humidity and pressure in this case. And you can imagine that if we didn't just have two inputs, if we had three or four or five inputs or more, we could have this hypothesis function take all of those as input. And we'll see examples of that a little bit later as well. And now the question is, what does this hypothesis function do? Well, it really just needs to measure, is this data point on one side of the boundary, or is it on the other side of the boundary? And how do we formalize that boundary? Well, the boundary is generally going to be a linear combination of these input variables, at least in this particular case. So what we're trying to do when we say linear combination is take each of these inputs and multiply them by some number that we're going to have to figure out. We'll generally call that number a weight, for how important these variables should be in trying to determine the answer. So we'll weight each of these variables with some weight, and we might add a constant to it just to try and make the function a little bit different. And then we just need to compare the result: is it greater than 0, or is it less than 0, to say, does it belong on one side of the line or the other side of the line? So what that mathematical expression might look like is this. I would take each of my variables, x1 and x2, and multiply them by some weight. I don't yet know what that weight is, but it's going to be some number, weight 1 and weight 2. And maybe we just want to add some other weight 0 to it, because the function might require us to shift the entire value up or down by a certain amount. And then we just compare: if we do all this math, is it greater than or equal to 0?
If so, we might categorize that data point as a rainy day. And otherwise, we might say, no rain. So the key here, then, is that this expression is how we are going to calculate whether it's a rainy day or not. We're going to do a bunch of math where we take each of the variables, multiply them by a weight, maybe add an extra weight to it, and see if the result is greater than or equal to 0. And using the result of that expression, we're able to determine whether it's raining or not raining. This expression, in this case, is going to refer to just some line: if you were to plot it graphically, you would get a line. And what the line actually looks like depends upon these weights. x1 and x2 are the inputs, but these weights are really what determine the shape of that line, the slope of that line, and what that line actually looks like. So we then would like to figure out what these weights should be. We can choose whatever weights we want, but we want to choose weights in such a way that if you pass in a rainy day's humidity and pressure, then you end up with a result that is greater than or equal to 0. And we would like it such that if we passed into our hypothesis function a not rainy day's inputs, then the output that we get should be not raining. So before we get there, let's try and formalize this a little bit more mathematically, just to get a sense for how you'll often see this written if you ever go further into supervised machine learning and explore this idea. One thing is that, generally, for these categories, we'll sometimes just use the names of the categories, like rain and not rain. But often, mathematically, if we're trying to do comparisons between these things, it's easier just to deal in the world of numbers. So we could just say 1 and 0: 1 for raining, 0 for not raining. So we do all this math, and if the result is greater than or equal to 0, we'll go ahead and say our hypothesis function outputs 1, meaning raining. And otherwise, it outputs 0, meaning not raining. And oftentimes, this type of expression will instead be expressed using vector mathematics. And a vector, if you're not familiar with the term, just refers to a sequence of numerical values. You could represent that in Python using a list of numerical values, or a tuple with numerical values. And here, we have a couple of sequences of numerical values. One of our vectors, one of our sequences of numerical values, is all of these individual weights: w0, w1, and w2. So we could construct what we'll call a weight vector, and we'll see why this is useful in a moment, called w, generally represented using a boldface w, that is just a sequence of these three weights: weight 0, weight 1, and weight 2. And to be able to calculate, based on those weights, whether we think a day is raining or not raining, we're going to multiply each of those weights by one of our input variables. That w2, this weight, is going to be multiplied by input variable x2. w1 is going to be multiplied by input variable x1. And w0, well, it's not being multiplied by anything. But to make sure the vectors are the same length, and we'll see why that's useful in just a second, we'll just go ahead and say w0 is being multiplied by 1, because you can multiply something by 1 and end up getting the exact same number. So in addition to the weight vector w, we'll also have an input vector that we'll call x, which has three values: 1, again, because we're just multiplying w0 by 1 eventually, and then x1 and x2.
So here, then, we've represented two distinct vectors. There's a vector of weights that we need to somehow learn; the goal of our machine learning algorithm is to learn what this weight vector is supposed to be. We could choose any arbitrary set of numbers, and it would produce a function that tries to predict rain or not rain, but it probably wouldn't be very good. What we want to do is come up with a good choice of these weights, so that we're able to make accurate predictions. And then this input vector represents a particular input to the function, a data point for which we would like to estimate, is that day a rainy day, or is that day a not rainy day? And so that's going to vary just depending on what input is provided to our function, what it is that we are trying to estimate. And then, to do the calculation, we want to calculate this expression here. And it turns out that expression is what we would call the dot product of these two vectors. The dot product of two vectors just means taking each pair of corresponding terms in the vectors and multiplying them together: w0 multiplied by 1, w1 multiplied by x1, w2 multiplied by x2, and that's why these vectors need to be the same length. And then we just add all of the results together. So the dot product of w and x, our weight vector and our input vector, is just going to be w0 times 1, or just w0, plus w1 times x1, multiplying those two terms together, plus w2 times x2, multiplying those terms together. So we have our weight vector, which we need to figure out; we need our machine learning algorithm to figure out what the weights should be. And we have the input vector representing the data point that we're trying to predict a category for, predict a label for. And we're able to do that calculation by taking this dot product, which you'll often see represented in vector form. But if you haven't seen vectors before, you can think of it as identical to just this mathematical expression: doing the multiplication, adding the results together, and then seeing whether the result is greater than or equal to 0 or not. This dot product is identical to the expression we were calculating before, to see whether or not the answer is greater than or equal to 0. And so for that reason, you'll often see the hypothesis function written as something like this, a simpler representation, where the hypothesis takes as input some input vector x, some humidity and pressure for some day. And we want to predict an output like rain or no rain, or 1 or 0, if we choose to represent things numerically. And the way we do that is by taking the dot product of the weights and our input. If it's greater than or equal to 0, we'll go ahead and say the output is 1; otherwise, the output is going to be 0. And this hypothesis, we say, is parameterized by the weights. Depending on what weights we choose, we'll end up getting a different hypothesis. If we choose the weights randomly, we're probably not going to get a very good hypothesis function. We'll get a 1 or a 0, but it's probably not going to accurately reflect whether we think a day is going to be rainy or not rainy. But if we choose the weights right, we can often do a pretty good job of trying to estimate whether we think the output of the function should be a 1 or a 0. And so the question, then, is how to figure out what these weights should be, how to be able to tune those parameters. And there are a number of ways you can do that. One of the most common is known as the perceptron learning rule.
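Before getting to that rule, here's the hypothesis computation itself written out. This is a sketch, and the weight values in it are arbitrary placeholders rather than learned ones.

```python
# The hypothesis as described: a dot product of weights and inputs,
# thresholded at 0. The weights here are placeholder values; learning
# good ones is the subject of what follows.

def hypothesis(weights, x1, x2):
    # w . x = w0 * 1 + w1 * x1 + w2 * x2
    w0, w1, w2 = weights
    return 1 if w0 + w1 * x1 + w2 * x2 >= 0 else 0


weights = (-1.0, 2.0, -0.5)             # arbitrary placeholders
print(hypothesis(weights, 0.93, 0.80))  # 1 = rain, 0 = no rain
```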
And we'll see more of this later. But the idea of the perceptron learning rule, and we're not going to get too deep into the mathematics, we'll mostly just introduce it conceptually, is to say that given some data point that we would like to learn from, some data point that has an input x and an output y, where y is like 1 for rain or 0 for not rain, we're going to update the weights. Written out, the rule is that each weight wi gets updated to wi plus alpha times the quantity y minus hypothesis of x, all multiplied by xi; we'll come back to that alpha in a moment. The big picture idea is that we can start with random weights but then learn from the data: take the data points one at a time, and for each one of the data points, figure out, all right, what parameters do we need to change inside of the weights in order to better match that data point? And that is the value of having access to a lot of data in a supervised machine learning algorithm: you take each of the data points, maybe look at them multiple times, and constantly try and figure out whether you need to shift your weights, in order to create some weight vector that is able to correctly, or more accurately, estimate what the output should be, whether we think it's going to be raining or whether we think it's not going to be raining. So what does that weight update look like? Without going into too much of the mathematics, we're going to update each of the weights to be the result of the original weight plus some additional expression. And to understand this expression: y, well, y is what the actual output is. And hypothesis of x, the input, that's going to be what our estimate of the output was. And so I can read this as: what the actual value was, minus what our estimate was. And based on the difference between the actual value and what our estimate was, we might want to change our hypothesis, change the way that we do that estimation. If the actual value and the estimate were the same thing, meaning we were correctly able to predict what category this data point belonged to, well, then actual value minus estimate is just going to be 0, which means this whole term on the right-hand side goes to 0, and the weight doesn't change. Weight i, where i is like weight 1 or weight 2 or weight 0, just stays at weight i. And none of the weights change if we were able to correctly predict what category the input belonged to. But if our hypothesis didn't correctly predict what category the input belonged to, well, then maybe we need to make some changes, adjust the weights, so that we're better able to predict this kind of data point in the future. And what is the way we might do that? Well, if the actual value was bigger than the estimate, and for now we'll go ahead and assume that these x's are positive values, then that means we need to increase the weight, in order to make the output bigger, so that we're more likely to get to the right actual value. And so if the actual value is bigger than the estimate, then actual value minus estimate will be a positive number, and so you can imagine we're just adding some positive number to the weight, increasing it ever so slightly. And likewise, the inverse case is true: if the actual value was less than the estimate, say the actual value was 0 but we estimated 1, meaning it actually was not raining, but we predicted it was going to be raining.
Well, then we want to decrease the value of the weight, because in that case, we want to lower the total value of that dot product, in order to make it less likely that we would predict that it would actually be raining. So no need to get too deep into the mathematics of it, but the general idea is that every time we encounter some data point, we can adjust these weights accordingly, to try and make the weights better line up with the actual data that we have access to. And you can repeat this process with data point after data point, until eventually, hopefully, your algorithm converges to some set of weights that do a pretty good job of figuring out whether a day is going to be rainy or not. And just as a final point about this particular equation, this value alpha here is generally what we'll call the learning rate. It's just some parameter, some number we choose, for how quickly we're actually going to be updating these weight values. If alpha is bigger, then we're going to update these weight values by a lot; and if alpha is smaller, then we'll update the weight values by less. You can choose a value of alpha; depending on the problem, different values might suit the situation better or worse than others. So after all of that, after we've done this training process of taking all this data and, using this learning rule, looking at all the pieces of data and using each piece of data as an indication of whether the weights stay the same, whether we increase the weights or decrease the weights, and if so, by how much, what you end up with is effectively a threshold function. And this is what the threshold function looks like. On the x-axis here, we have the output of that function: the result of taking the dot product of the weights with the input. And on the y-axis, we have what the output is going to be: 0, which in this case represents not raining, and 1, which in this case represents raining. And the way that our hypothesis function works is it calculates this value, and if it's greater than 0, or greater than some threshold value, then we declare that it's a rainy day; otherwise, we declare that it's a not rainy day. And this, then, graphically, is what that function looks like: initially, when the value of this dot product is small, it's not raining, it's not raining, it's not raining. But as soon as it crosses that threshold, we suddenly say, OK, now it's raining, now it's raining, now it's raining. And the way to interpret this kind of representation is that anything on this side of the line is the category of data points where we say, yes, it's raining; anything that falls on the other side of the line is the data points where we would say, it's not raining. And again, we want to choose some value for the weights that results in a function that does a pretty good job of trying to do this estimation. But one tricky thing with this type of hard threshold is that it only leaves two possible outcomes. We plug in some data as input, and the output we get is raining or not raining, and there's no room for anywhere in between. And maybe that's what you want. Maybe all you want is, given some data point, to be able to classify it into one of two or more categories. But it might also be the case that you care about knowing how strong that prediction is, for example.
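Pulling those pieces together, here is a sketch of that training loop: a minimal illustration of the perceptron rule as described, with a made-up data set and an arbitrarily chosen learning rate.

```python
# Sketch of the perceptron learning rule:
#   w_i <- w_i + alpha * (y - h(x)) * x_i
# with a hard-threshold hypothesis. Data and alpha are made up.

data = [
    ((0.93, 0.80), 1),  # 1 = rain
    ((0.79, 0.83), 1),
    ((0.49, 0.99), 0),  # 0 = no rain
    ((0.55, 0.98), 0),
]

weights = [0.0, 0.0, 0.0]  # w0, w1, w2: start arbitrarily
alpha = 0.1                # learning rate


def h(x1, x2):
    return 1 if weights[0] + weights[1] * x1 + weights[2] * x2 >= 0 else 0


# Sweep the data repeatedly, nudging each weight by the error.
for _ in range(100):
    for (x1, x2), y in data:
        error = y - h(x1, x2)      # 0 if correct; +1 or -1 if wrong
        weights[0] += alpha * error * 1
        weights[1] += alpha * error * x1
        weights[2] += alpha * error * x2

print(weights, [h(x1, x2) for (x1, x2), _ in data])
```

Because this toy data happens to be linearly separable, the loop should settle on weights that classify every training point correctly; the hard-threshold limitation discussed next still applies.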
So if we go back to this instance here, where we have rainy days on this side of the line, not rainy days on that side of the line, let’s look now at these two white data points: this data point here that we would like to predict a label or a category for, and this data point over here that we would also like to predict a label or a category for. It seems likely that you could pretty confidently say that this data point, that should be a rainy day. It seems close to the other rainy days if we’re going by the nearest neighbor strategy. And it’s on this side of the line if we’re going by the strategy of just asking which side of the line it falls on, having figured out what those weights should be. And if we’re using that line strategy, just asking which side of this decision boundary the point falls on, well, we’d also say that this point here is a rainy day, because it falls on the side of the line that corresponds to rainy days. But it’s likely that even in this case, we would know that we don’t feel nearly as confident about this data point on the left as compared to this data point on the right. For this one on the right, we can feel very confident that yes, it’s a rainy day. This one, it’s pretty close to the line if we’re judging just by distance, and so you might be less sure. But our threshold function doesn’t allow for a notion of less sure or more sure about something. It’s what we would call a hard threshold: once you’ve crossed this line, then immediately we say, yes, this is going to be a rainy day, and anywhere before it, we’re going to say it’s not a rainy day. And that may not be helpful in a number of cases. One, this is not a particularly easy function to deal with: as you get deeper into the world of machine learning and are trying to do things like taking derivatives of these curves, this type of function makes things challenging. But the other challenge is that we don’t really have any notion of gradation between things. We don’t have a notion of, yes, this is a very strong belief that it’s going to be raining, as opposed to, it’s probably more likely than not that it’s going to be raining, but maybe not totally sure about that either. So what we can do by taking advantage of a technique known as logistic regression is instead of using this hard threshold type of function, we can use instead a logistic function, something we might call a soft threshold. And that’s going to transform this into looking something a little more like this, something that more nicely curves. And as a result, the possible output values are no longer just 0 and 1, 0 for not raining, 1 for raining. You can actually get any real numbered value between 0 and 1. If you’re way over on this side, then you get a value of 0: OK, it’s not going to be raining, and we’re pretty sure about that. And if you’re over on this side, you get a value of 1: yes, we’re very sure that it’s going to be raining. But in between, you could get some real numbered value, where a value like 0.7 might mean we think it’s going to rain. It’s more probable that it’s going to rain than not, based on the data, but we’re not as confident as some of the other data points might be. So one of the advantages of the soft threshold is that it allows us to have an output that could be some real number that potentially reflects some sort of probability, the likelihood that we think that this particular data point belongs to that particular category.
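As a minimal sketch of that idea, assuming the standard logistic (sigmoid) function is what replaces the hard cutoff (again, all names here are illustrative):

```python
import math

def soft_hypothesis(weights, inputs):
    """Soft threshold via the logistic function: instead of a hard 0-or-1
    answer, return a real number between 0 and 1 that can be read as how
    confident we are that it's a rainy day."""
    z = sum(w * x for w, x in zip(weights, [1] + inputs))  # the dot product
    return 1 / (1 + math.exp(-z))  # logistic (sigmoid) function

# Near the decision boundary -> a hedged value; far from it -> near 1.
print(soft_hypothesis([-1.5, 2.0, 0.5], [0.9, 0.2]))  # ~0.60: probably rain
print(soft_hypothesis([-1.5, 2.0, 0.5], [3.0, 1.0]))  # ~0.99: very sure
```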
And there are some other nice mathematical properties of that as well. So that then is two different approaches to trying to solve this type of classification problem. One is this nearest neighbor type of approach, where you just take a data point and look at the data points that are nearby to try and estimate what category we think it belongs to. And the other approach is the approach of saying, all right, let’s just try and use linear regression, figure out what these weights should be, adjust the weights in order to figure out what line or what decision boundary is going to best separate these two categories. It turns out that another popular approach, a very popular approach if you just have a data set and you want to start trying to do some learning on it, is what we call the support vector machine. And we’re not going to go too much into the mathematics of the support vector machine, but we’ll at least explore it graphically to see what it is that it looks like. And the idea or the motivation behind the support vector machine is the idea that there are actually a lot of different lines that we could draw, a lot of different decision boundaries that we could draw to separate two groups. So for example, say I had the red data points over here and the blue data points over here. One possible line I could draw is a line like this, that this line here would separate the red points from the blue points. And it does so perfectly. All the red points are on one side of the line. All the blue points are on the other side of the line. But this should probably make you a little bit nervous if you come up with a model and the model comes up with a line that looks like this. And the reason why is that you worry about how well it’s going to generalize to other data points that are not necessarily in the data set that we have access to. For example, if there was a point that fell like right here, on the right side of the line, well, then based on that, we might want to guess that it is, in fact, a red point, but it falls on the side of the line where we would instead estimate that it’s a blue point. And so based on that, this line is probably not a great choice just because it is so close to these various data points. We might instead prefer like a diagonal line that just goes diagonally through the data set like we’ve seen before. But there too, there’s a lot of diagonal lines that we could draw as well. For example, I could draw this diagonal line here, which also successfully separates all the red points from all of the blue points. From the perspective of something like just trying to figure out some setting of weights that allows us to predict the correct output, this line will predict the correct output for this particular set of data every single time, because the red points are on one side, the blue points are on the other. But yet again, you should probably be a little nervous, because this line is so close to these red points. Even though we’re able to correctly predict on the input data, if there was a point that fell somewhere in this general area, our algorithm, this model, would say that, yeah, we think it’s a blue point, when in actuality, it might belong to the red category instead, just because it looks like it’s close to the other red points. What we really want to be able to do, given this data, in order to generalize as best as possible, is to come up with a line like this that seems like the intuitive line to draw.
And the reason why it’s intuitive is because it seems to be as far apart as possible from the red data and the blue data. So that if we generalize a little bit and assume that maybe we have some points that are different from the input but still slightly further away, we can still say that something on this side is probably red, something on that side is probably blue, and we can make those judgments that way. And that is what support vector machines are designed to do. They’re designed to try and find what we call the maximum margin separator, where the maximum margin separator is just some boundary that maximizes the distance between the groups of points, rather than coming up with some boundary that’s very close to one set or the other, the way we might have before, when we wouldn’t have cared: as long as we’re categorizing the input well, that seemed to be all we needed to do. The support vector machine will try and find this maximum margin separator, some way of trying to maximize that particular distance. And it does so by finding what we call the support vectors, which are the vectors that are closest to the line, and trying to maximize the distance between the line and those particular points. And it works that way in two dimensions. It also works in higher dimensions, where we’re not looking for some line that separates the two sets of data points, but instead looking for what we generally call a hyperplane, some decision boundary, effectively, that separates one set of data from the other set of data. And this ability of support vector machines to work in higher dimensions actually has a number of other applications as well. But one is that it helpfully deals with cases where data may not be linearly separable. So we talked about linear separability before, this idea that you can take data and just draw a line or some linear combination of the inputs that allows us to perfectly separate the two sets from each other. There are some data sets that are not linearly separable, where you would not be able to find a good line at all that would do that kind of separation. Something like this, for example: imagine here are the red points, with the blue points around them. If you try to find a line that divides the red points from the blue points, it’s actually going to be difficult, if not impossible, to do. Any line you choose, well, if you draw a line here, then you ignore all of these blue points that should actually be blue and not red. Anywhere else you draw a line, there’s going to be a lot of error, a lot of mistakes, a lot of what we’ll soon call loss to that line that you draw, a lot of points that you’re going to categorize incorrectly. What we really want is to be able to find a better decision boundary that may not be just a straight line through this two-dimensional space. And what support vector machines can do is they can begin to operate in higher dimensions and be able to find some other decision boundary, like the circle in this case, that actually is able to separate one of these sets of data from the other set of data a lot better. So oftentimes in data sets where the data is not linearly separable, support vector machines, by working in higher dimensions, can actually figure out a way to solve that kind of problem effectively.
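As a minimal scikit-learn sketch of both situations (the data points here are made up purely for illustration):

```python
from sklearn.svm import SVC

# Two cleanly separated clusters: a linear-kernel SVM finds the
# maximum margin separator between them.
X = [[1, 1], [2, 1], [1, 2], [6, 5], [7, 6], [6, 7]]
y = [0, 0, 0, 1, 1, 1]  # 0 = red, 1 = blue
linear_svm = SVC(kernel="linear").fit(X, y)
print(linear_svm.predict([[2, 2], [6, 6]]))  # expect [0 1]

# The circle case: red points inside, blue points around them. No straight
# line works, but a kernel (here RBF) in effect works in higher dimensions
# and can find a boundary like the circle described above.
X2 = [[0, 0], [0.5, 0], [0, 0.5],
      [3, 0], [0, 3], [-3, 0], [0, -3], [2, 2]]
y2 = [0, 0, 0, 1, 1, 1, 1, 1]
rbf_svm = SVC(kernel="rbf").fit(X2, y2)
print(rbf_svm.predict([[0.2, 0.2], [2.5, -2.5]]))  # expect inside 0, outside 1
```

So that then, three different approaches to trying to solve these sorts of problems. We’ve seen support vector machines.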
We’ve seen trying to use linear regression and the perceptron learning rule to be able to figure out how to categorize inputs and outputs. We’ve seen the nearest neighbor approach. No one of them is necessarily better than any other. Again, it’s going to depend on the data set, the information you have access to. It’s going to depend on what the function looks like that you’re ultimately trying to predict. And this is where a lot of research and experimentation can be involved in trying to figure out how best to perform that kind of estimation. But classification is only one of the tasks that you might encounter in supervised machine learning. Because in classification, what we’re trying to predict is some discrete category. We’re trying to predict red or blue, rain or not rain, authentic or counterfeit. But sometimes what we want to predict is a real numbered value. And for that, we have a related problem, not classification, but instead known as regression. And regression is the supervised learning problem where we try and learn a function mapping inputs to outputs, same as before. But instead of the outputs being discrete categories, things like rain or not rain, in a regression problem, the output values are generally continuous values, some real number that we would like to predict. This happens all the time as well. You might imagine that a company might take this approach if it’s trying to figure out, for instance, what the effect of its advertising is. How do advertising dollars spent translate into sales for the company’s product, for example? And so they might like to try to predict some function that takes as input the amount of money spent on advertising. And here, we’re just going to use one input. But again, you could scale this up to many more inputs as well if you have a lot of different kinds of data you have access to. And the goal is to learn a function that says, given this amount of spending on advertising, we’re going to get this amount in sales. And you might have access to a whole bunch of data, like for every past month, here is how much we spent on advertising, and here is what sales were. And we would like to come up with some sort of hypothesis function that, again, given the amount spent on advertising, can predict, in this case, some real number, some estimate of how much sales we expect that company to do in this month or in this quarter or whatever unit of time we’re choosing to measure things in. And so again, to solve this type of problem, we could try using a linear regression type approach where we take this data and we just plot it. On the x-axis, we have advertising dollars spent. On the y-axis, we have sales. And we might just want to try and draw a line that does a pretty good job of trying to estimate this relationship between advertising and sales. And in this case, unlike before, we’re not trying to separate the data points into discrete categories. Instead, in this case, we’re just trying to find a line that approximates this relationship between advertising and sales, so that if we want to figure out what the estimated sales are for a particular advertising budget, you just look it up on this line: figure out, for this amount of advertising, we would have this amount of sales, and just try and make the estimate that way. And so you can try and come up with a line, again, figuring out how to modify the weights using various different techniques, to try and make it so that this line fits as well as possible.
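As a minimal sketch of that setup, assuming scikit-learn’s LinearRegression and some entirely made-up monthly figures:

```python
from sklearn.linear_model import LinearRegression

# Made-up data: advertising dollars spent each month -> sales that month.
advertising = [[1000], [2000], [3000], [4000], [5000]]
sales = [10500, 19800, 31000, 40200, 50100]

# Fit a line approximating the relationship between advertising and sales.
model = LinearRegression().fit(advertising, sales)

# Estimating sales for a budget is then just looking it up on the line.
print(model.predict([[3500]]))  # roughly 35,000 in sales for this toy data
```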
So with all of these approaches, then, to trying to solve machine learning style problems, the question becomes, how do we evaluate these approaches? How do we evaluate the various different hypotheses that we could come up with? Because each of these algorithms will give us some sort of hypothesis, some function that maps inputs to outputs, and we want to know, how well does that function work? And you can think of evaluating these hypotheses and trying to get a better hypothesis as kind of like an optimization problem. In an optimization problem, as you recall from before, we were either trying to maximize some objective function by trying to find a global maximum, or we were trying to minimize some cost function by trying to find some global minimum. And in the case of evaluating these hypotheses, one thing we might say is that this cost function, the thing we’re trying to minimize, we might be trying to minimize what we would call a loss function. And what a loss function is, is a function that is going to estimate for us how poorly our function performs. More formally, it’s like a loss of utility: whenever we predict something that is wrong, that is a loss of utility, and that’s going to add to the output of our loss function. And you could come up with any loss function that you want, just some mathematical way of estimating, given each of these data points, given what the actual output is, and given what our projected output is, our estimate, you could calculate some sort of numerical loss for it. But there are a couple of popular loss functions that are worth discussing, just so that you’ve seen them before. When it comes to discrete categories, things like rain or not rain, counterfeit or not counterfeit, one approach is the 0-1 loss function. And the way that works is, for each of the data points, our loss function takes as input what the actual output is, like whether it was actually raining or not raining, and takes our prediction into account. Did we predict, given this data point, that it was raining or not raining? And if the actual value equals the prediction, well, then the 0-1 loss function will just say the loss is 0. There was no loss of utility, because we were able to predict correctly. And otherwise, if the actual value was not the same thing as what we predicted, well, then in that case, our loss is 1. We lost something, lost some utility, because what we predicted as the output of the function was not what it actually was. And the goal, then, in a situation like this would be to come up with some hypothesis that minimizes the total empirical loss, the total amount that we’ve lost, if you add up, across all these data points, the loss between what the actual output is and what your hypothesis would have predicted. So in this case, for example, if we go back to classifying days as raining or not raining, and we came up with this decision boundary, how would we evaluate this decision boundary? How much better is it than drawing the line here or drawing the line there? Well, we could take each of the input data points, where each input data point has a label, whether it was raining or whether it was not raining, and we could compare it to the prediction, whether we predicted it would be raining or not raining, and assign it a numerical value as a result. So for example, these points over here, they were all rainy days, and we predicted they would be raining, because they fall on the bottom side of the line. So they have a loss of 0, nothing lost from those situations.
And likewise, the same is true for some of these points over here, where it was not raining and we predicted it would not be raining either. Where we do have loss are points like this point here and that point there, where we predicted that it would not be raining, but in actuality, it’s a blue point. It was raining. Or likewise here, we predicted that it would be raining, but in actuality, it’s a red point. It was not raining. And so as a result, we miscategorized these data points that we were trying to train on. And as a result, there is some loss here. One loss here, there, here, and there, for a total loss of 4, for example, in this case. And that might be how we would estimate, or how we would say, that this line is better than a line that goes somewhere else or a line that’s further down, because this line might minimize the loss. So there is no way to do better than just these four points of loss if you’re just drawing a straight line through our space. So the 0-1 loss function checks: did we get it right? Did we get it wrong? If we got it right, the loss is 0, nothing lost. If we got it wrong, then our loss function for that data point says 1. And we add up all of those losses across all of our data points to get some sort of empirical loss, how much we have lost across all of these original data points that our algorithm had access to. There are other forms of loss as well that work especially well when we deal with more real valued cases, cases like the mapping between advertising budget and the amount that we do in sales, for example. Because in that case, you care not just whether you get the number exactly right, but how close you were to the actual value. If the actual value is that you did like $2,800 in sales, and you predicted that you would do $2,900 in sales, maybe that’s pretty good. That’s much better than if you had predicted you’d do $1,000 in sales, for example. And so we would like our loss function to be able to take that into account as well, take into account not just whether the actual value and the expected value are exactly the same, but also how far apart they were. And so for that, one approach is what we call L1 loss. L1 loss doesn’t just look at whether actual and predicted are equal to each other; we take the absolute value of the actual value minus the predicted value. In other words, we just ask how far apart were the actual and predicted values, and we sum that up across all of the data points to be able to get what our answer ultimately is. So what might this actually look like for our data set? Well, if we go back to this representation where we had advertising along the x-axis, sales along the y-axis, our line was our prediction, our estimate for any given amount of advertising, what we predicted sales were going to be. And our L1 loss is just how far apart, vertically along the sales axis, our prediction was from each of the data points. So we could figure out exactly how far apart our prediction was from each of the data points, and figure out as a result of that what our loss is overall for this particular hypothesis, just by adding up all of these various different individual losses for each of these data points. And our goal then is to try and minimize that loss, to try and come up with some line that minimizes what the utility loss is, by judging how far away our estimated amount of sales is from the actual amount of sales.
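As a minimal sketch, these loss functions are easy to write out directly (the L2 loss is included here for completeness; it’s the one introduced next):

```python
def zero_one_loss(actuals, predictions):
    """0-1 loss: each wrong prediction costs 1, each right one costs 0."""
    return sum(0 if a == p else 1 for a, p in zip(actuals, predictions))

def l1_loss(actuals, predictions):
    """L1 loss: sum of |actual - predicted|, so closeness counts."""
    return sum(abs(a - p) for a, p in zip(actuals, predictions))

def l2_loss(actuals, predictions):
    """L2 loss: sum of (actual - predicted)^2, penalizing big misses
    much more harshly than small ones."""
    return sum((a - p) ** 2 for a, p in zip(actuals, predictions))

# The sales example from above: a $100 miss versus a $1,800 miss.
print(l1_loss([2800], [2900]))  # 100
print(l1_loss([2800], [1000]))  # 1800
print(l2_loss([2800], [2900]))  # 10,000
print(l2_loss([2800], [1000]))  # 3,240,000: the big miss dominates
```

And it turns out there are other loss functions as well. One that’s quite popular is the L2 loss.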
The L2 loss, instead of just using the absolute value of how far away the actual value is from the predicted value, uses the square of actual minus predicted. So how far apart are the actual and predicted value? It squares that value, effectively penalizing much more harshly anything that is a worse prediction. So you imagine, if you have two data points that you predict as being one value away from their actual value, as opposed to one data point that you predict as being two away from its actual value, the L2 loss function will more harshly penalize that one that is two away, because it’s going to square however much the difference is between the actual value and the predicted value. And depending on the situation, you might want to choose a loss function depending on what you care about minimizing. If you really care about minimizing the error on more outlier cases, then you might want to consider something like the L2 loss. But if you’ve got a lot of outliers and you don’t necessarily care about modeling them, then maybe an L1 loss function is preferable. There are trade-offs here that you need to decide on, based on a particular set of data. But what you do run the risk of with any of these loss functions, with anything that we’re trying to do, is a problem known as overfitting. And overfitting is a big problem that you can encounter in machine learning, which happens anytime a model fits too closely with a data set and, as a result, fails to generalize. We would like our model to be able to accurately predict the input-output pairs for the data that we have access to. But the reason we want to do so is because we want our model to generalize well to data that we haven’t seen before. I would like to take data from the past year of whether it was raining or not raining, and use that data to generalize towards the future: in the future, is it going to be raining or not raining? Or if I have a whole bunch of data on what counterfeit and not counterfeit US dollar bills looked like in the past when people encountered them, I’d like to train a computer to be able to, in the future, generalize to other dollar bills that I might see as well. And the problem with overfitting is that if you try and tie yourself too closely to the data set that you’re training your model on, you can end up not generalizing very well. So what does this look like? Well, we might imagine the rainy day and not rainy day example again from here, where the blue points indicate rainy days and the red points indicate not rainy days. And we decided that we felt pretty comfortable with drawing a line like this as the decision boundary between rainy days and not rainy days. So we can pretty comfortably say that points on this side are more likely to be rainy days, and points on that side are more likely to be not rainy days. But the loss, the empirical loss, isn’t zero in this particular case, because we didn’t categorize everything perfectly. There was this one outlier, this one day when it wasn’t raining, but yet our model still predicts that it is raining. But that doesn’t necessarily mean our model is bad. It just means the model isn’t 100% accurate. If you really wanted to try and find a hypothesis that resulted in minimizing the loss, you could come up with a different decision boundary. It wouldn’t be a line, but it would look something like this.
This decision boundary does separate all of the red points from all of the blue points, because the red points fall on this side of this decision boundary, and the blue points fall on the other side of the decision boundary. But this, we would probably argue, is not as good of a prediction. Even though it seems to be more accurate based on all of the available training data that we have for training this machine learning model, we might say that it’s probably not going to generalize well. If there were other data points like here and there, we might still want to consider those to be rainy days, because we think this was probably just an outlier. So if the only thing you care about is minimizing the loss on the data you have available to you, you run the risk of overfitting. And this can happen in the classification case. It can also happen in the regression case, that here we predicted what we thought was a pretty good line relating advertising to sales, trying to predict what sales were going to be for a given amount of advertising. But I could come up with a line that does a better job of predicting the training data, and it would be something that looks like this, just connecting all of the various different data points. And now there is no loss at all. Now I’ve perfectly predicted, given any advertising, what sales are. And for all the data available to me, it’s going to be accurate. But it’s probably not going to generalize very well. I have overfit my model on the training data that is available to me. And so in general, we want to avoid overfitting. We’d like strategies to make sure that we haven’t overfit our model to a particular data set. And there are a number of ways that you could try to do this. One way is by examining what it is that we’re optimizing for. In an optimization problem, all we do is we say, there is some cost, and I want to minimize that cost. And so far, we’ve defined that cost function, the cost of a hypothesis, just as being equal to the empirical loss of that hypothesis, like how far away the actual data points, the outputs, are from what I predicted them to be based on that particular hypothesis. And if all you’re trying to do is minimize cost, meaning minimizing the loss in this case, then the result is going to be that you might overfit: to minimize cost, you’re going to try and find a way to perfectly match all the input data, and that might happen as a result of overfitting on that particular input data. So in order to address this, you could add something to the cost function. What counts as cost will be not just loss, but also some measure of the complexity of the hypothesis, where the complexity of the hypothesis is something that you would need to define: how complicated does our line look? This is sort of an Occam’s razor-style approach, where we want to give preference to a simpler decision boundary, like a straight line, for example, some simpler curve, as opposed to something far more complex that might represent the training data better but might not generalize as well. We’ll generally say that a simpler solution is probably the better solution, and probably the one that is more likely to generalize well to other inputs. So we measure what the loss is, but we also measure the complexity. And that all gets taken into account when we consider the overall cost: yes, something might have less loss if it better predicts the training data, but if it’s much more complex, it still might not be the best option that we have.
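Written out, the cost function being described, with the weighting parameter λ that’s discussed next, is

$$\text{cost}(h) = \text{loss}(h) + \lambda \, \text{complexity}(h)$$

where $h$ is the hypothesis being evaluated.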
And we need to come up with some balance between loss and complexity. And for that reason, you’ll often see this represented as multiplying the complexity by some parameter that we have to choose, parameter lambda in this case, where we’re saying, if lambda is a greater value, then we really want to penalize more complex hypotheses, whereas if lambda is smaller, we’re only going to penalize more complex hypotheses a little bit. And it’s up to the machine learning programmer to decide where they want to set that value of lambda, for how much they want to penalize a more complex hypothesis that might fit the data a little better. And again, there’s no one right answer to a lot of these things. Depending on the data set, depending on the data you have available to you and the problem you’re trying to solve, your choice of these parameters may vary, and you may need to experiment a little bit to figure out what the right choice is ultimately going to be. This process, then, of considering not only loss but also some measure of the complexity is known as regularization. Regularization is the process of penalizing a hypothesis that is more complex in order to favor a simpler hypothesis that is more likely to generalize well, more likely to be able to apply to other situations dealing with other input points unlike the ones that we’ve seen before. So oftentimes, you’ll see us add some regularizing term to what we’re trying to minimize in order to avoid this problem of overfitting. Now, another way of making sure we don’t overfit is to run some experiments and to see whether or not we are able to generalize the model that we’ve created to other data sets as well. And it’s for that reason that oftentimes when you’re doing a machine learning experiment, when you’ve got some data and you want to try and come up with some function that predicts, given some input, what the output is going to be, you don’t necessarily want to do your training on all of the data you have available to you. Instead, you can employ a method known as holdout cross-validation, where we split up our data into a training set and a testing set. The training set is the set of data that we’re going to use to train our machine learning model. And the testing set is the set of data that we’re going to use in order to test to see how well our machine learning model actually performed. So the learning happens on the training set. We figure out what the parameters should be. We figure out what the right model is. And then we see, all right, now that we’ve trained the model, we’ll see how well it does at predicting things inside of the testing set, some set of data that we haven’t seen before. And the hope then is that we’re going to be able to predict the testing set pretty well if we’re able to generalize based on the training data that’s available to us. If we’ve overfit the training data, though, and we’re not able to generalize, well, then when we look at the testing set, it’s likely going to be the case that we’re not going to predict things in the testing set nearly as effectively. So this is one method of cross-validation, validating to make sure that the work we have done is actually going to generalize to other data sets as well. And there are other statistical techniques we can use as well.
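As a minimal sketch of holdout cross-validation with scikit-learn (the data here is an illustrative stand-in):

```python
from sklearn.linear_model import Perceptron
from sklearn.model_selection import train_test_split

# Toy stand-in data: four measurements per data point, with 0/1 labels.
X = [[0.1, 0.2, 0.3, 0.4], [0.9, 0.8, 0.7, 0.6],
     [0.2, 0.1, 0.4, 0.3], [0.8, 0.9, 0.6, 0.7],
     [0.15, 0.25, 0.35, 0.3], [0.85, 0.8, 0.65, 0.7]]
y = [0, 1, 0, 1, 0, 1]

# Hold out half the data for testing; learning happens only on the rest.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5)

model = Perceptron().fit(X_train, y_train)
print(model.score(X_test, y_test))  # accuracy on data the model never saw
```

The same module also provides KFold and cross_val_score for the k-fold variant described next.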
One of the downsides of this holdout cross-validation is that, if I just split it 50-50, training using 50% of the data and testing using the other 50% (or you could choose other percentages as well), there is a fair amount of data that I am now not using to train, and I might be able to get a better model as a result of using it, for example. So one approach is known as k-fold cross-validation. In k-fold cross-validation, rather than just divide things into two sets and run one experiment, we divide things into k different sets. So maybe I divide things up into 10 different sets and then run 10 different experiments. If I split up my data into 10 different sets of data, then what I’ll do is, each time, for each of my 10 experiments, I will hold out one of those sets of data, where I’ll say, let me train my model on these nine sets, and then test to see how well it predicts on set number 10. And then I pick another nine sets to train on and test it on the other one that I held out, where each time I train the model on everything minus the one set that I’m holding out, and then test to see how well our model performs on the set that I did hold out. And what you end up getting is 10 different results, 10 different answers for how accurately our model worked. And oftentimes, you could just take the average of those 10 to get an approximation for how well we think our model performs overall. But the key idea is separating the training data from the testing data, because you want to test your model on data that is different from what you trained the model on. Because in the training, you want to avoid overfitting. You want to be able to generalize. And the way you test whether you’re able to generalize is by looking at some data that you haven’t seen before and seeing how well you’re actually able to perform. And so if we want to actually implement any of these techniques inside of a programming language like Python, there are a number of ways we could do that. We could write this from scratch on our own, but there are libraries out there that allow us to take advantage of existing implementations of these algorithms, so that we can use the same types of algorithms in a lot of different situations. And so there’s a library, a very popular one, known as scikit-learn, which allows us in Python to be able to very quickly get set up with a lot of these different machine learning models. This library already implements algorithms for nearest neighbor classification, for perceptron learning, and for a bunch of other types of inference and supervised learning that we haven’t yet talked about. But using it, we can begin to try actually testing how these methods work and how accurately they perform. So let’s go ahead and take a look at one approach to trying to solve this type of problem. All right, so I’m first going to pull up banknotes.csv, which is a whole bunch of data provided by UC Irvine with information about various different banknotes: people took pictures of various different banknotes and measured various different properties of those banknotes. And in particular, some human categorized each of those banknotes as either a counterfeit banknote or as not counterfeit. And so what you’re looking at here is each row representing one banknote. This is formatted as a CSV spreadsheet, with comma-separated values separating each of these various different fields.
We have four different input values for each of these data points, just information, some measurement that was made on the banknote. And what those measurements exactly are isn’t as important as the fact that we do have access to this data. But more importantly, we have access, for each of these data points, to a label, where 0 indicates something like, this was not a counterfeit bill, meaning it was an authentic bill, and a data point labeled 1 means that it is a counterfeit bill, at least according to the human researcher who labeled this particular data. So we have a whole bunch of data representing a whole bunch of different data points, each of which has these various different measurements that were made on that particular bill, and each of which has an output value, 0 or 1: 0 meaning it was a genuine bill, 1 meaning it was a counterfeit bill. And what we would like to do is use supervised learning to begin to predict or model some sort of function that can take these four values as input and predict what the output would be. We want our learning algorithm to find some sort of pattern that is able to predict, based on these measurements, something that you could measure just by taking a photo of a bill, whether that bill is authentic or whether that bill is counterfeit. And so how can we do that? Well, I’m first going to open up banknote0.py and see how it is that we do this. I’m first importing a lot of things from scikit-learn, but importantly, I’m going to set my model equal to the perceptron model, which is one of those models that we talked about before. We’re just going to try and figure out some setting of weights that is able to divide our data into two different groups. Then I’m going to go ahead and read data in from my file, banknotes.csv. And basically, for every row, I’m going to separate that row into the first four values of that row, which is the evidence for that row, and then the label, where if the final column in that row is a 0, the label is authentic, and otherwise, it’s going to be counterfeit. So I’m effectively reading data in from the CSV file, dividing it into a whole bunch of rows where each row has some evidence, those four input values that are going to be inputs to my hypothesis function, and then the label, the output, whether it is authentic or counterfeit, that is the thing that I am then trying to predict. So the next step is that I would like to split up my data set into a training set and a testing set, some set of data that I would like to train my machine learning model on, and some set of data that I would like to use to test that model, to see how well it performed. So what I’ll do is I’ll go ahead and figure out the length of the data, how many data points I have. I’ll go ahead and take half of them and save that number in a variable called holdout. That is how many items I’m going to hold out from my data set to save for the testing phase. I’ll randomly shuffle the data so it’s in some random order. And then I’ll say my testing set will be all of the data up to the holdout. So I’ll take holdout many data items, and that will be my testing set. My training data will be everything else, the information that I’m going to train my model on. And then I’ll say I need to divide my training data into two different sets. I need to divide it into my x values, where x here represents the inputs.
So the x values, the x values that I’m going to train on, are, basically, for every row in my training set, the evidence for that row, those four values, where it’s basically a vector of four numbers that is going to be all of the input. And then I need the y values. What are the outputs that I want to learn from, the labels that belong to each of these various different input points? Well, that’s going to be the same thing for each row in the training data, but this time, I take that row and get what its label is, whether it is authentic or counterfeit. So I end up with one list of all of these vectors of my input data, and one list, which follows the same order, but is all of the labels that correspond with each of those vectors. And then to train my model, which in this case is just this perceptron model, I just call model.fit, pass in the training data, and what the labels for those training data are. And scikit-learn will take care of fitting the model, will do the entire algorithm for me. And then when it’s done, I can test to see how well that model performed. So I can say, let me get all of these input vectors for what I want to test on. So for each row in my testing data set, go ahead and get the evidence. And the y values, those are what the actual values were for each of the rows in the testing data set, what the actual label is. But then I’m going to generate some predictions. I’m going to use this model and try and predict, based on the testing vectors, what the output is. And my goal then is to now compare y testing with predictions. I want to see how well my predictions, based on the model, actually reflect the y values, the outputs that were actually labeled. Because I have this label data, I can assess how well the algorithm worked. And so now I can just compute how well we did. This zip function basically just lets me loop through two different lists, one by one at the same time. So for each actual value and for each predicted value, if the actual is the same thing as what I predicted, I’ll go ahead and increment the correct counter by one. Otherwise, I’ll increment my incorrect counter by one. And so at the end, I can print out, here are the results, here’s how many I got right, here’s how many I got wrong, and here was my overall accuracy, for example. So I can go ahead and run this. I can run python banknote0.py. And it’s going to train on half the data set and then test on half the data set. And here are the results for my perceptron model. In this case, it was able to correctly classify 679 bills as either authentic or counterfeit and incorrectly classified seven of them, for an overall accuracy of close to 99%. So on this particular data set, using this perceptron model, we were able to predict very well what the output was going to be. And we can try different models, too; scikit-learn makes it very easy just to swap out one model for another model. So instead of the perceptron model, I can use the support vector machine using the SVC, otherwise known as a support vector classifier, using a support vector machine to classify things into two different groups. And now see, all right, how well does this perform? And all right, this time, we were able to correctly predict 682 and incorrectly predicted four, for an accuracy of 99.4%. And we could even try the k-neighbors classifier as the model instead.
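Pulling the walkthrough together, here’s a minimal reconstruction of the kind of script being described; the actual banknote0.py isn’t reproduced here, so the variable names and the assumption of a header row in the CSV are illustrative guesses:

```python
import csv
import random

from sklearn.linear_model import Perceptron

# Read data in from banknotes.csv: four measurements, then a 0/1 label.
with open("banknotes.csv") as f:
    reader = csv.reader(f)
    next(reader)  # assumes a header row; drop this line if there isn't one
    data = [{"evidence": [float(cell) for cell in row[:4]],
             "label": "Authentic" if row[4] == "0" else "Counterfeit"}
            for row in reader]

# Hold out half the data for the testing phase; train on the rest.
holdout = int(0.5 * len(data))
random.shuffle(data)
testing, training = data[:holdout], data[holdout:]

# Fit the model on the training evidence and labels.
model = Perceptron()
X_training = [row["evidence"] for row in training]
y_training = [row["label"] for row in training]
model.fit(X_training, y_training)

# Make predictions on the testing set and compare them to the true labels.
X_testing = [row["evidence"] for row in testing]
y_testing = [row["label"] for row in testing]
predictions = model.predict(X_testing)
correct = sum(1 for actual, predicted in zip(y_testing, predictions)
              if actual == predicted)
incorrect = len(y_testing) - correct

print(f"Results for model {type(model).__name__}")
print(f"Correct: {correct}")
print(f"Incorrect: {incorrect}")
print(f"Accuracy: {100 * correct / len(y_testing):.2f}%")
```

Swapping Perceptron() for SVC() or KNeighborsClassifier(n_neighbors=1), with the corresponding imports, reproduces the model comparisons described here.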
The k-neighbors classifier takes a parameter, n_neighbors, for how many neighbors you want to look at. Let’s just look at one neighbor, the one nearest neighbor, and use that to predict, and go ahead and run this as well. And it looks like, based on the k-neighbors classifier, looking at just one neighbor, we were able to correctly classify 685 data points and incorrectly classified one. Maybe let’s try three neighbors instead of just using one neighbor, and do more of a k-nearest-neighbors approach, where I look at the three nearest neighbors and see how that performs. And that one, in this case, seems to have gotten 100% of the predictions correct, describing each bill as either an authentic banknote or a counterfeit banknote. And we could run these experiments multiple times: because I’m randomly reorganizing the data every time, we’re technically training these on slightly different data sets, and so you might want to run multiple experiments to really see how well they’re actually going to perform. But in short, they all perform very well. And while some of them perform slightly better than others here, that might not always be the case for every data set. But you can begin to test now, by very quickly putting together these machine learning models using scikit-learn, to be able to train on some training set and then test on some testing set as well. And this splitting up into training groups and testing groups happens so often that scikit-learn has functions built in for doing it. I did it all by hand just now. But if we take a look at banknote1.py, we take advantage of some other features that exist in scikit-learn, where we can really simplify a lot of our logic: there is a function built into scikit-learn called train_test_split, which will automatically split data into a training group and a testing group. I just have to say what proportion should be in the testing group, something like 0.5, half the data inside the testing group. Then I can fit the model on the training data, make the predictions on the testing data, and then just count up. And scikit-learn has some nice methods for just counting up how many times our testing data matched the predictions and how many times our testing data didn’t match the predictions. So very quickly, you can write programs with not all that many lines of code, maybe like 40 lines of code, to get through all of these predictions and then, as a result, see how well we’re able to do. So these types of libraries can allow us, without really knowing the implementation details of these algorithms, to use the algorithms in a very practical way to be able to solve these types of problems. So that then was supervised learning, this task of, given a whole set of data, some input-output pairs, we would like to learn some function that maps those inputs to those outputs. But it turns out there are other forms of learning as well. And another popular type of machine learning, especially nowadays, is known as reinforcement learning. And the idea of reinforcement learning is that, rather than just being given a whole data set at the beginning of input-output pairs, reinforcement learning is all about learning from experience. In reinforcement learning, our agent, whether it’s like a physical robot that’s trying to take actions in the world or just some virtual agent that is a program running somewhere, is going to be given a set of rewards or punishments in the form of numerical values, but you can think of them as reward or punishment.
And based on those, it learns what actions to take in the future. Our agent, our AI, will be put in some sort of environment. It will take some actions, and based on the actions that it takes, it learns something: it either gets a reward when it does something well or gets a punishment when it does something poorly, and it learns what to do or what not to do in the future based on those individual experiences. And so what this will often look like is that it will often start with some agent, some AI, which might, again, be a physical robot, if you’re imagining a physical robot moving around, but it can also just be a program. And our agent is situated in their environment, where the environment is where they’re going to take their actions, and it’s what’s going to give them rewards or punishments for the various actions that they take. So for example, the environment is going to start off by putting our agent inside of a state. Our agent has some state that, in a game, might be the state of the game that the agent is playing; in a world that the agent is exploring, it might be some position inside of a grid representing that world. But the agent is in some sort of state. And in that state, the agent needs to choose to take an action. The agent likely has multiple actions they can choose from, but they pick an action. So they take an action in a particular state. And as a result of that, the agent will generally get two things in response, as we model them. The agent gets a new state that they find themselves in. After being in this state, taking one action, they end up in some other state. And they’re also given some sort of numerical reward: positive meaning reward, meaning it was a good thing, negative generally meaning they did something bad and received some sort of punishment. And that is all the information the agent has. It’s told what state it’s in. It takes some sort of action. And based on that, it ends up in another state, and it ends up getting some particular reward. And it needs to learn, based on that information, what actions to begin to take in the future. And so you could imagine generalizing this to a lot of different situations. This is oftentimes how you train, for example, those robots you may have seen that are now able to walk around the way humans do. It would be quite difficult to program the robot in exactly the right way to get it to walk the way humans do. You could instead train it through reinforcement learning: give it some sort of numerical reward every time it does something good, like take steps forward, and punish it every time it does something bad, like fall over, and then let the AI just learn based on that sequence of rewards. Based on trying to take various different actions, you can begin to have the agent learn what to do in the future and what not to do. So in order to begin to formalize this, the first thing we need to do is formalize this notion of what we mean by states and actions and rewards, like what does this world look like? And oftentimes, we’ll formulate this world as what’s known as a Markov decision process, similar in spirit to Markov chains, which you might recall from before. But a Markov decision process is a model that we can use for decision making, for an agent trying to make decisions in its environment.
And it’s a model that allows us to represent the various different states that an agent can be in, the various different actions that they can take, and also what the reward is for taking one action as opposed to another action. So what then does it actually look like? Well, if you recall a Markov chain from before, a Markov chain looked a little something like this, where we had a whole bunch of these individual states, and each state immediately transitioned to another state based on some probability distribution. We saw this in the context of the weather before, where if it was sunny, we said with some probability, it’ll be sunny the next day, and with some other probability, it’ll be rainy, for example. But we could also imagine generalizing this. It’s not just sun and rain anymore. We just have these states, where one state leads to another state according to some probability distribution. But in this original model, there was no agent that had any control over this process. It was just entirely probability based, where with some probability, we moved to this next state, but maybe it’s going to be some other state with some other probability. What we’ll now have is the ability for the agent in this state to choose from a set of actions, where maybe instead of just one path forward, they have three different choices of actions that each lead down different paths. And even this is a bit of an oversimplification, because in each of these states, you might imagine more branching points where there are more decisions that can be taken as well. So we’ve extended the Markov chain to say that from a state, you now have available action choices, and each of those actions might be associated with its own probability distribution of going to various different states. Then in addition, we’ll add another extension, where any time you move from a state, taking an action, going into this other state, we can associate a reward with that outcome, saying either r is positive, meaning some positive reward, or r is negative, meaning there was some sort of punishment. And this then is what we’ll consider to be a Markov decision process. A Markov decision process has some set of states, states in the world that we can be in. We have some set of actions, such that, given a state, I can say, what are the actions that are available to me in that state, the actions that I can choose from? Then we have some transition model. The transition model before just said that, given my current state, what is the probability that I end up in that next state or this other state? The transition model now has effectively two things we’re conditioning on: we’re saying, given that I’m in this state and that I take this action, what’s the probability that I end up in this next state? Now, maybe we live in a very deterministic world in this Markov decision process, where, given a state and given an action, we know for sure what next state we’ll end up in. But maybe there’s some randomness in the world, where when you’re in a state and you take an action, you might not always end up in the exact same state. There might be some probabilities involved there as well. The Markov decision process can handle both of those possible cases. And then finally, we have a reward function, generally called r, that in this case says, what is the reward for being in this state, taking this action, and then getting to s prime, this next state? So I’m in this original state. I take this action. I get to this next state.
What is the reward for doing that process? And you can add up these rewards every time you take an action to get the total amount of rewards that an agent might get from interacting in a particular environment modeled using this Markov decision process. So what might this actually look like in practice? Well, let’s just create a little simulated world here where I have this agent that is just trying to navigate its way around. This agent is this yellow dot here, like a robot in the world, trying to navigate its way through this grid. And ultimately, it’s trying to find its way to the goal. And if it gets to the green goal, then it’s going to get some sort of reward. But then we might also have some red squares that are places where you get some sort of punishment, some bad place where we don’t want the agent to go. And if it ends up in the red square, then our agent is going to get some sort of punishment as a result of that. But the agent originally doesn’t know all of these details. It doesn’t know that these states are associated with punishments. But maybe it does know that this state is associated with a reward. Maybe it doesn’t. It just needs to interact with the environment to try and figure out what to do and what not to do. So the first thing the agent might do is, given no additional information, if it doesn’t know what the punishments are and doesn’t know where the rewards are, it just might try and take an action. And it takes an action and ends up realizing that it got some sort of punishment. And so what does it learn from that experience? Well, it might learn that when you’re in this state in the future, don’t take the action of moving to the right, that that is a bad action to take; in the future, if you ever find yourself back in this state, don’t take this action of going to the right, because that leads to punishment. That might be the intuition, at least. And so you could try doing other actions. You move up: all right, that didn’t lead to any immediate rewards. Maybe try something else. Then maybe try something else. And all right, now you found that you got another punishment. And so you learn something from that experience. So the next time you do this whole process, you know that if you ever end up in this square, you shouldn’t take the down action, because being in this state and taking that action ultimately leads to some sort of punishment, a negative reward, in other words. And this process repeats. You might imagine just letting our agent explore the world, learning over time what states tend to correspond with poor actions, until eventually, if it tries enough things randomly, it might find that when you get to this state, if you take the up action, you actually get a reward from that. And what it can learn from that is that if you’re in this state, you should take the up action, because that leads to a reward. And over time, you can also learn that if you’re in this state, you should take the left action, because that leads to the state that also lets you eventually get to the reward. So you begin to learn over time not only which actions are good in particular states, but also which actions are bad, such that once you know some sequence of good actions that leads you to some sort of reward, our agent can just follow those instructions, follow the experience that it has learned.
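As a compact recap of the formal pieces from above, before the learning algorithm itself: the Markov decision process consists of

$$\big\langle\, S,\; \text{ACTIONS}(s),\; P(s' \mid s, a),\; R(s, a, s') \,\big\rangle$$

that is, a set of states, the actions available in each state, a transition model giving the probability of reaching state $s'$ by taking action $a$ in state $s$, and a reward function for each such transition.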
We didn’t tell the agent what the goal was. We didn’t tell the agent where the punishments were. But the agent can begin to learn from this experience and learn to begin to perform these sorts of tasks better in the future. So let’s now try to formalize this idea, formalize the idea that we would like to be able to learn, in this state, taking this action, is that a good thing or a bad thing? There are lots of different models for reinforcement learning. We’re just going to look at one of them today. And the one that we’re going to look at is a method known as Q-learning. And what Q-learning is all about is learning a function, a function Q, that takes as inputs s and a, where s is a state and a is an action that you take in that state. And what this Q function is going to do is estimate the value: how much reward will I get from taking this action in this state? Originally, we don’t know what this Q function should be, but over time, based on experience, based on trying things out and seeing what the result is, I would like to try and learn what Q(s, a) is for any particular state and any particular action that I might take in that state. So what is the approach? Well, the approach originally is, we’ll start with Q(s, a) equal to 0 for all states s and for all actions a. Initially, before I’ve ever started anything, before I’ve had any experiences, I don’t know the value of taking any action in any given state, so I’m going to assume that the value is just 0 all across the board. But then, as I interact with the world, as I experience rewards or punishments, or maybe I go to a cell where I don’t get either a reward or a punishment, I want to somehow update my estimate of Q(s, a). I want to continually update my estimate of Q(s, a) based on the experiences and rewards and punishments that I’ve received, such that in the future, my knowledge of what actions are good in what states will be better. So when I take an action and receive some sort of reward, I want to estimate the new value of Q(s, a). And I estimate that based on a couple of different things. I estimate it based on the reward that I’m getting from taking this action and getting into the next state. But assuming the situation isn’t over, assuming there are still future actions that I might take as well, I also need to take into account the expected future rewards. If you imagine an agent interacting with the environment, sometimes you’ll take an action and get a reward, but then you can keep taking more actions and get more rewards. Both of these are relevant: the current reward I’m getting from this current step and also my future rewards. And it might be the case that I’ll want to take a step that doesn’t immediately lead to a reward, because later on down the line, I know it will lead to more rewards as well. So there’s a balancing act between current rewards that the agent experiences and future rewards that the agent experiences as well. And then we need to update Q(s, a). So we estimate the value of Q(s, a) based on the current reward and the expected future rewards. And then we need to update this Q function to take into account this new estimate. Now, as we go through this process, we’ll already have an estimate for what we think the value is. Now we have a new estimate, and then somehow we need to combine these two estimates together, and we’ll look at more formal ways that we can actually begin to do that.
So to actually show you what this formula looks like, here is the approach we’ll take with Q-learning. We’re going to, again, start with Q(s, a) being equal to 0 for all states and actions. And then every time we take an action a in state s and observe a reward r, we’re going to update our value, our estimate, for Q(s, a). And the idea is that we’re going to figure out what the new value estimate is minus what our existing value estimate is. And so we have some preconceived notion for what the value is for taking this action in this state. Maybe our expectation is we currently think the value is 10. But then we’re going to estimate what we now think it’s going to be. Maybe the new value estimate is something like 20. So there’s a delta of 10: our new value estimate is 10 points higher than what our current value estimate happens to be. And so we have a couple of options here. We need to decide how much we want to adjust our current expectation of what the value is of taking this action in this particular state. And what that difference is, how much we add or subtract from our existing notion of how much we expect the value to be, is dependent on this parameter alpha, also called a learning rate. And alpha represents, in effect, how much we value new information compared to how much we value old information. An alpha value of 1 means we really value new information: if we have a new estimate, then it doesn’t matter what our old estimate is; we’re only going to consider our new estimate, because we always just want to take into consideration our new information. So the way that works is that if you imagine alpha being 1, well, then we’re taking the old value of Q(s, a) and then adding 1 times the new value minus the old value. And that just leaves us with the new value. So when alpha is 1, all we take into consideration is what our new estimate happens to be. But over time, as we go through a lot of experiences, we already have some existing information. We might have tried taking this action nine times already. And now we just tried it a 10th time. And we don’t only want to consider this 10th experience; we also want to consider the fact that our prior nine experiences were meaningful, too. And that’s data we don’t necessarily want to lose. And so this alpha controls that decision, controls how important the new information is. 0 would mean ignore all the new information, just keep this Q value the same. 1 means replace the old information entirely with the new information. And somewhere in between keeps some sort of balance between these two values. We can put this equation a little bit more formally as well: Q(s, a) gets updated to Q(s, a) plus alpha times the new value estimate minus the old value estimate. The old value estimate is our old estimate for what the value is of taking this action in a particular state. That’s just Q(s, a). So we have it once here, and we’re going to add something to it. We’re going to add alpha times the new value estimate minus the old value estimate. But the old value estimate, we just look up by calling this Q function. And what then is the new value estimate? Based on this experience we have just taken, what is our new estimate for the value of taking this action in this particular state? Well, it’s going to be composed of two parts. It’s going to be composed of what reward I just got from taking this action in this state, and then what I can expect my future rewards to be from this point forward. So it’s going to be r, some reward I’m getting right now, plus whatever I estimate I’m going to get in the future.
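As a quick sketch of just that blending step, using the numbers from the example above (an old value estimate of 10 and a new value estimate of 20):

```python
def blend(old_estimate, new_estimate, alpha):
    """Move the old estimate toward the new one by a fraction alpha."""
    return old_estimate + alpha * (new_estimate - old_estimate)

print(blend(10, 20, 1.0))   # 20.0: alpha = 1 keeps only the new information
print(blend(10, 20, 0.0))   # 10.0: alpha = 0 ignores the new information
print(blend(10, 20, 0.5))   # 15.0: a balance between the two
```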
And how do I estimate what I’m going to get in the future? Well, it’s another call to this Q function: take the maximum across all possible actions I could take next and say, all right, of all of these possible actions I could take, which one is going to have the highest reward? And so this then looks a little bit complicated, but this is going to be our notion for how we’re going to perform this kind of update. I have some estimate, some old estimate, for what the value is of taking this action in this state. And I’m going to update it based on new information: I experience some reward, I predict what my future reward is going to be, and using that, I update what I estimate the reward will be for taking this action in this particular state. And there are other additions you might make to this algorithm as well. Sometimes you might not want to weight future rewards equally to current rewards. Maybe you want an agent that values reward now over reward later. And so sometimes you can even add another term in here, some other parameter, a discount factor, where you discount future rewards and say future rewards are not as valuable as rewards right now, that getting a reward in the current time step is better than waiting a year and getting rewards later. But that’s up to the programmer to decide what that parameter ought to be. The big-picture idea of this entire formula is to say that every time we experience some new reward, we take that into account. We update our estimate of how good this action is. And then in the future, we can make decisions based on that estimate. Once we have some good estimate for every state and for every action of what the value is of taking that action, then we can do something like implement a greedy decision-making policy: if I am in a state and I want to know what action I should take in that state, then I consider, for all of my possible actions, what is the value of Q(s, a)? What is my estimated value of taking that action in that state? And I will just pick the action that has the highest value after I evaluate that expression. At any given state that I’m in, I can just greedily say, across all my actions, this action gives me the highest expected value, and so I’ll go ahead and choose that action as the action that I take as well. But there is a downside to this kind of approach. And the downside comes up in a situation like this, where we know that there is some solution that gets me to the reward, and our agent has been able to figure that out. But it might not necessarily be the best way or the fastest way. If the agent is allowed to explore a little bit more, it might find that it can get the reward faster by taking some other route instead, by going through this particular path that is a faster way to get to that ultimate goal. And maybe we would like for the agent to be able to figure that out as well. But if the agent always takes the actions that it knows to be best, well, when it gets to this particular square, it doesn’t know that this is a good action, because it’s never really tried it. But it knows that going down eventually leads its way to this reward. So it might learn that it should just always take this route, and it’s never going to explore and go along that other route instead. So in reinforcement learning, there is this tension between exploration and exploitation.
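Putting those pieces together in Python, a sketch of the Q-learning update and the greedy policy might look like this. The dictionary-based Q table and the helper names are assumptions for illustration; the update itself is the formula just described, Q(s, a) ← Q(s, a) + alpha * ((r + gamma * max over a' of Q(s', a')) - Q(s, a)):

```python
from collections import defaultdict

Q = defaultdict(float)  # Q(s, a) starts at 0 for every state-action pair

def best_future_reward(state, actions):
    """max over all actions a' of Q(state, a'); 0 if there are no actions."""
    return max((Q[(state, a)] for a in actions), default=0)

def update(state, action, reward, next_state, next_actions,
           alpha=0.5, gamma=1.0):
    """One Q-learning update after taking `action` in `state`."""
    old_estimate = Q[(state, action)]
    new_estimate = reward + gamma * best_future_reward(next_state, next_actions)
    Q[(state, action)] = old_estimate + alpha * (new_estimate - old_estimate)

def greedy_action(state, actions):
    """The purely greedy policy: pick the action with the highest Q value."""
    return max(actions, key=lambda a: Q[(state, a)])
```

Setting gamma below 1 here is the discounting just mentioned: it makes rewards further in the future count for less than rewards right now.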
And exploitation generally refers to using knowledge that the AI already has. The AI already knows that this is a move that leads to reward, so it will go ahead and use that move. And exploration is all about exploring other actions that we may not have explored as thoroughly before, because maybe one of these actions, even if I don’t know anything about it, might lead to better rewards faster or to more rewards in the future. And so an agent that only ever exploits information and never explores might be able to get reward, but it might not maximize its rewards, because it doesn’t know what other possibilities are out there, possibilities that we can only discover by taking advantage of exploration. And so how can we try and address this? Well, one possible solution is known as the epsilon-greedy algorithm, where we set epsilon equal to how often we want to just make a random move, where occasionally we will just make a random move in order to say, let’s try to explore and see what happens. And then the logic of the algorithm will be: with probability 1 minus epsilon, choose the estimated best move. In the purely greedy case, we’d always choose the best move. But in epsilon-greedy, we’re most of the time going to choose the estimated best move, and sometimes, with probability epsilon, we’re going to choose a random move instead. So every time we’re faced with the ability to take an action, sometimes we’re going to choose the best move, and sometimes we’re just going to choose a random move. This type of algorithm can be quite powerful in a reinforcement learning context, by not always just choosing the best possible move right now, but sometimes, especially early on, allowing yourself to make random moves that let you explore various different possible states and actions more. And maybe over time, you might decrease your value of epsilon, more and more often choosing the best move once you’re more confident that you’ve explored what all of the possibilities actually are. So we can put this into practice. And one very common application of reinforcement learning is in game playing: if you want to teach an agent how to play a game, you just let the agent play the game a whole bunch. And then the reward signal happens at the end of the game. When the game is over, if our AI won the game, it gets a reward of, like, 1, for example. And if it lost the game, it gets a reward of negative 1. And from that, it begins to learn what actions are good and what actions are bad. You don’t have to tell the AI what’s good and what’s bad; the AI figures it out based on that reward. Winning the game is one signal, losing the game is another signal, and based on all of that, it begins to figure out what decisions it should actually make. So one very simple game, which you may have played before, is a game called Nim. And in the game of Nim, you’ve got a whole bunch of objects in a whole bunch of different piles, where here I’ve represented each pile as an individual row. So you’ve got one object in the first pile, three in the second pile, five in the third pile, seven in the fourth pile. And the game of Nim is a two-player game where players take turns removing objects from piles. And the rule is that on any given turn, you are allowed to remove as many objects as you want from any one of these piles, any one of these rows. You have to remove at least one object, but you remove as many as you want from exactly one of the piles. And whoever takes the last object loses.
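A sketch of that action-selection step, assuming a Q table like the one in the earlier sketch:

```python
import random
from collections import defaultdict

Q = defaultdict(float)  # assumed Q table, as in the earlier sketch

def epsilon_greedy_action(state, actions, epsilon=0.1):
    """With probability epsilon, explore with a random move;
    otherwise exploit the estimated best move."""
    if random.random() < epsilon:
        return random.choice(list(actions))           # explore
    return max(actions, key=lambda a: Q[(state, a)])  # exploit
```

Decreasing epsilon over time, for example by multiplying it by some factor just under 1 after each game, shifts the agent gradually from exploration toward exploitation.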
So player one might remove four from this pile here. Player two might remove four from this pile here. So now we’ve got four piles left: one, three, one, and three. Player one might remove the entirety of the second pile. Player two, if they’re being strategic, might remove two from the third pile. Now we’ve got three piles left, each with one object left. Player one might remove one from one pile. Player two removes one from the other pile. And now player one is left with choosing this one object from the last pile, at which point player one loses the game. So it’s a fairly simple game: piles of objects, on any turn you choose how many objects to remove from one pile, and whoever removes the last object loses. And this is the type of game you could encode into an AI fairly easily, because the states are really just four numbers: every state is just how many objects are in each of the four piles. And the actions are things like, which pile am I going to remove from, and how many objects am I going to remove from it? And the reward happens at the end: if you were the player that had to remove the last object, then you get some sort of punishment. But if you were not, and the other player had to remove the last object, well, then you get some sort of reward. So we can actually show a demonstration of this; I’ve implemented an AI to play the game of Nim. All right, so here, what we’re going to do is create an AI by training it on some number of games that it plays against itself, where the idea is the AI will learn from each of those experiences what to do in the future. And then I, the human, will play against the AI. So initially, we’ll say train zero times, meaning we’re not going to let the AI play any practice games against itself in order to learn from its experiences. We’re just going to see how well it plays. And it looks like there are four piles. I can choose how many I remove from any one of the piles. So maybe from pile three, I will remove five objects, for example. So now, the AI chose to take one item from pile zero. So I’m left with these piles now, for example. And so here, I could choose maybe to say, I would like to remove from pile two, I’ll remove all five of them, for example. And so the AI chose to take two away from pile one. Now I’m left with one pile that has one object, one pile that has two objects. So from pile three, I will remove two objects. And now I’ve left the AI with no choice but to take that last one. And so the game is over, and I was able to win. But I did so because the AI was really just playing randomly. It didn’t have any prior experience that it was using in order to make these sorts of judgments. Now let me let the AI train itself on 10,000 games. I’m going to let the AI play 10,000 games of Nim against itself. Every time it wins or loses, it’s going to learn from that experience and learn in the future what to do and what not to do. So here then, I’ll go ahead and run this again. And now you see the AI running through a whole bunch of training games, 10,000 training games against itself. And now it’s going to let me make these sorts of decisions. So now I’m going to play against the AI. Maybe I’ll remove one from pile three. And the AI took everything from pile three, so I’m left with three piles. I’ll go ahead and from pile two maybe remove three items. And the AI removes one item from pile zero. I’m left with two piles, each of which has two items in it. I’ll remove one from pile one, I guess.
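The encoding described here fits in a few lines. This is a hypothetical sketch of the representation, not the code from the demonstration:

```python
# Hypothetical Nim encoding: a state is a tuple of pile sizes,
# an action is a (pile index, number of objects to remove) pair.

INITIAL_STATE = (1, 3, 5, 7)

def available_actions(piles):
    """All (pile, count) pairs: remove 1..n objects from one nonempty pile."""
    return [(i, n) for i, pile in enumerate(piles)
            for n in range(1, pile + 1)]

def move(piles, action):
    """Apply an action, returning the new tuple of pile sizes."""
    i, n = action
    piles = list(piles)
    piles[i] -= n
    return tuple(piles)

def game_over(piles):
    """The game ends when every pile is empty; the last mover lost."""
    return all(pile == 0 for pile in piles)
```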
And the AI took two from pile two, leaving me with no choice but to take one away from pile one. So it seems like after playing 10,000 games of Nim against itself, the AI has learned something about what states and what actions tend to be good and has begun to learn some sort of pattern for how to predict what actions are going to be good and what actions are going to be bad in any given state. So reinforcement learning can be a very powerful technique for achieving these sorts of game-playing agents, agents that are able to play a game well just by learning from experience, whether that’s playing against other people or playing against itself and learning from those experiences as well. Now, Nim is a bit of an easy game to use reinforcement learning for, because there are so few states: the states are just the counts of how many objects are in each of these various different piles. You might imagine that it’s going to be harder if you think of a game like chess, or other games where there are many, many more states and many, many more actions that you can imagine taking, where it’s not going to be as easy to learn for every state and for every action what the value is going to be. So oftentimes in that case, we can’t necessarily learn exactly what the value is for every state and for every action, but we can approximate it. Much as we saw with minimax, where we could use a depth-limiting approach to stop calculating at a certain point, we can do a similar type of approximation, known as function approximation, in a reinforcement learning context, where instead of learning a value of Q for every state and every action, we just have some function that estimates what the value is for taking this action in this particular state, based on various different features of the state that the agent happens to be in, where you might have to choose what those features actually are. But you can begin to learn some patterns that generalize beyond one specific state and one specific action, learning whether certain features tend to be good things or bad things. Reinforcement learning can allow you, using a very similar mechanism, to generalize beyond one particular state and say, if this other state looks kind of like this state, then maybe the similar types of actions that worked in one state will also work in another state as well. And so this type of approach can be quite helpful as you begin to deal with reinforcement learning problems that exist in larger and larger state spaces, where it’s just not feasible to explore all of the possible states that could actually exist. Those, then, are two of the main categories of machine learning: supervised learning, where you have labeled input and output pairs, and reinforcement learning, where an agent learns from rewards or punishments that it receives. The third major category of machine learning that we’ll just touch on briefly is known as unsupervised learning. And unsupervised learning happens when we have data without any additional feedback, without labels. In the supervised learning case, all of our data had labels: we labeled the data point with whether that was a rainy day or a not rainy day, and using those labels, we were able to infer what the pattern was. Or we labeled data as a counterfeit banknote or not a counterfeit, and using those labels, we were able to draw inferences and patterns to figure out what a banknote looks like versus not.
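In the simplest (linear) case, function approximation might look like the following sketch, where the feature function and the weights are entirely hypothetical:

```python
def q_estimate(weights, features):
    """Approximate Q(s, a) as a weighted sum of features of the pair,
    instead of storing one value per state-action pair."""
    return sum(w * f for w, f in zip(weights, features))

# Hypothetical features of a (state, action) pair, e.g. progress toward
# the goal after the move, and whether the move enters a risky square.
print(q_estimate([0.8, -0.3], [0.5, 1.0]))  # ~0.1
```

Because similar states produce similar features, states the agent has never visited still get sensible value estimates.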
In unsupervised learning, we don’t have access to any of those labels. But we still would like to learn some of those patterns. And one of the tasks that you might want to perform in unsupervised learning is something like clustering, where clustering is just the task of, given some set of objects, organizing them into distinct clusters, groups of objects that are similar to one another. And there are lots of applications for clustering. It comes up in genetic research, where you might have a whole bunch of different genes and you want to cluster them into similar genes if you’re trying to analyze them across a population or across species. It comes up in image segmentation, if you want to take all the pixels of an image and cluster them into different parts of the image. It comes up a lot in market research, if you want to divide your consumers into different groups so you know which groups to target with certain types of product advertisements, for example. And there are a number of other contexts as well in which clustering can be very applicable. One technique for clustering is an algorithm known as k-means clustering. What k-means clustering is going to do is divide all of our data points into k different clusters. And it’s going to do so by repeating this process of assigning points to clusters and then moving around those clusters’ centers. We’re going to define a cluster by its center, the middle of the cluster, and then assign points to that cluster based on which center is closest to that point. And I’ll show you an example of that now. Here, for example, I have a whole bunch of unlabeled data, just various data points that are in some sort of graphical space. And I would like to group them into various different clusters. But I don’t know how to do that originally. Let’s say I want to assign, like, three clusters to this group. You have to choose how many clusters you want in k-means clustering, though you could try multiple values and see how well each performs. But I’ll start just by randomly picking some places to put the centers of those clusters. Maybe I have a blue cluster, a red cluster, and a green cluster. And I’m going to start with the centers of those clusters just being in these three locations here. And what k-means clustering tells us to do is, once I have the centers of the clusters, assign every point to a cluster based on which cluster center it is closest to. So we end up with something like this, where all of these points are closer to the blue cluster center than any other cluster center, all of these points here are closer to the green cluster center than any other cluster center, and then these two points plus these points over here are all closest to the red cluster center instead. So here then is one possible assignment of all these points to three different clusters. But it’s not great: it seems like in this red cluster these points are kind of far apart, and in this green cluster these points are kind of far apart. It might not be my ideal choice of how I would cluster these various different data points. But k-means clustering is an iterative process: after I’ve assigned all of the points to the cluster center that each is nearest to, there is a next step, which is that we are going to re-center the clusters, meaning take the cluster centers, these diamond shapes here, and move them to the middle, or the average, effectively, of all of the points that are in that cluster.
So we’ll take this blue center and go ahead and move it to the middle, or to the center, of all of the points that were assigned to the blue cluster, moving it slightly to the right in this case. And we’ll do the same thing for red. We’ll move the cluster center to the middle of all of these points, weighted by how many points there are. There are more points over here, so the red center ends up moving a little bit further that way. And likewise for the green center: there are many more points on this side of the green center, so the green center ends up being pulled a little bit further in this direction. So we re-center all of the clusters, and then we repeat the process. We go ahead and now reassign all of the points to the cluster center that they are now closest to. And now that we’ve moved around the cluster centers, these cluster assignments might change: this point originally was closer to the red cluster center, but now it’s actually closer to the blue cluster center. Same goes for this point as well. And these three points that were originally closer to the green cluster center are now closer to the red cluster center instead. So we can reassign which clusters each of these data points belongs to, and then repeat the process again, moving each of these cluster centers to the mean, the average, of all of the points that happen to be in that cluster, and repeat the process again, going ahead and assigning each of the points to the cluster that it is closest to. Once we reach a point where we’ve assigned all of the points to the cluster that they are nearest to and nothing has changed, we’ve reached a sort of equilibrium in this situation, where no points are changing their allegiance. And as a result, we can declare this algorithm is now over. And we now have some assignment of each of these points into three different clusters. And it looks like we did a pretty good job of trying to identify which points are more similar to one another than they are to points in other groups. So we have the green cluster down here, this blue cluster here, and then this red cluster over there as well. And we did so without any access to labels to tell us what these various different clusters were. We just used an algorithm in an unsupervised sense, without any of those labels, to figure out which points belonged to which categories. And again, there are lots of applications for this type of clustering technique. And there are many more algorithms in each of these various different fields within machine learning: supervised and reinforcement and unsupervised. But those are many of the big-picture foundational ideas that underlie a lot of these techniques, where these are the problems that we’re trying to solve. And we try and solve those problems using a number of different methods of taking data and learning patterns in that data, whether that’s trying to find neighboring data points that are similar, or trying to minimize some sort of loss function, or any number of other techniques that allow us to begin to solve these sorts of problems. That, then, was a look at some of the principles that are at the foundation of modern machine learning, this ability to take data and learn from that data so that the computer can perform a task, even if it hasn’t explicitly been given instructions in order to do so.
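Here is a compact sketch of that whole loop for 2D points, using squared Euclidean distance; unlike the walkthrough, the initial centers are just sampled from the data:

```python
import random

def kmeans(points, k, max_iterations=100):
    """Cluster 2D points into k groups by alternating assign / re-center."""
    centers = random.sample(points, k)
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest center's cluster.
        clusters = [[] for _ in range(k)]
        for x, y in points:
            nearest = min(range(k),
                          key=lambda i: (x - centers[i][0]) ** 2
                                        + (y - centers[i][1]) ** 2)
            clusters[nearest].append((x, y))
        # Re-centering step: move each center to the mean of its cluster.
        new_centers = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
            if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:  # equilibrium: no center moved
            break
        centers = new_centers
    return clusters, centers
```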
Next time, we’ll continue this conversation about machine learning, looking at other techniques we can use for solving these sorts of problems. We’ll see you then. All right, welcome back, everyone, to an introduction to artificial intelligence with Python. Now, last time, we took a look at machine learning, a set of techniques that computers can use in order to take a set of data and learn some patterns inside of that data, to learn how to perform a task even if we, the programmers, didn’t give the computer explicit instructions for how to perform that task. Today, we transition to one of the most popular techniques and tools within machine learning: that of neural networks. Neural networks were inspired as early as the 1940s by researchers who were thinking about how it is that humans learn, studying neuroscience and the human brain, and trying to see whether or not we could apply those same ideas to computers as well, modeling computer learning off of human learning. So how is the brain structured? Well, very simply put, the brain consists of a whole bunch of neurons, and those neurons are connected to one another and communicate with one another in some way. In particular, if you think about the structure of a biological neural network, something like this, there are a couple of key properties that scientists observed. One was that these neurons are connected to each other and receive electrical signals from one another, that one neuron can propagate electrical signals to another neuron. And another is that neurons process those input signals and then can be activated, that a neuron becomes activated at a certain point and then can propagate further signals on to other neurons. And so the question then became, could we take this biological idea of how it is that humans learn, with brains and with neurons, and apply that to a machine as well, in effect designing an artificial neural network, or ANN, which will be a mathematical model for learning that is inspired by these biological neural networks? And what artificial neural networks will allow us to do is, first, model some sort of mathematical function. Every neural network, which we’ll see more of later today, is really just some mathematical function that is mapping certain inputs to particular outputs based on the structure of the network, where how we place particular units inside of this neural network is going to determine how the network functions. And in particular, artificial neural networks are going to lend themselves to a way that we can learn what the network’s parameters should be. We’ll see more on that in just a moment. But in effect, we want a model such that it is easy for us to write some code that allows the network to figure out how to model the right mathematical function given a particular set of input data. So in order to create our artificial neural network, instead of using biological neurons, we’re just going to use what we’re going to call units, units inside of a neural network, which we can represent kind of like a node in a graph, which will here be represented just by a blue circle like this. And these artificial units, these artificial neurons, can be connected to one another. So here, for instance, we have two units that are connected by this edge inside of this graph, effectively.
And so what we’re going to do now is think of this idea as some sort of mapping from inputs to outputs. We have one unit that is connected to another unit, where we might think of this side as the input and that side as the output. And what we’re trying to do then is figure out how to solve a problem, how to model some sort of mathematical function. And this might take the form of something we saw last time, where we have certain inputs, like variables x1 and x2, and given those inputs, we want to perform some sort of task, a task like predicting whether or not it’s going to rain. And ideally, we’d like some way, given these inputs x1 and x2, which stand for some sort of variables to do with the weather, to be able to predict, in this case, a Boolean classification: is it going to rain, or is it not going to rain? And we did this last time by way of a mathematical function. We defined some function h, for our hypothesis function, that took as input x1 and x2, the two inputs that we cared about processing, in order to determine whether we thought it was going to rain or whether we thought it was not going to rain. The question then becomes, what does this hypothesis function do in order to make that determination? And we decided last time to use a linear combination of these input variables to determine what the output should be. So our hypothesis function was equal to something like this: weight 0 plus weight 1 times x1 plus weight 2 times x2. So what’s going on here is that x1 and x2 are input variables, the inputs to this hypothesis function, and each of those input variables is being multiplied by some weight, which is just some number. So x1 is being multiplied by weight 1, x2 is being multiplied by weight 2. And we have this additional weight, weight 0, that doesn’t get multiplied by an input variable at all; it just serves to either move the function’s value up or down. You can think of this as a weight that’s just multiplied by some dummy value, like the number 1: it’s multiplied by 1, and so it’s effectively just added on by itself. Or sometimes you’ll see in the literature that people call this weight 0 a bias, so that you can think of these variables as slightly different: we have weights that are multiplied by the inputs, and we separately add some bias to the result as well. You’ll hear both of those terminologies used when people talk about neural networks and machine learning. So in effect, what we’ve done here is that in order to define a hypothesis function, we just need to figure out what these weights should be, to determine what values to multiply by our inputs to get some sort of result. Of course, at the end of this, what we need to do is make some sort of classification, like rainy or not rainy, and to do that, we use some sort of function that defines a threshold. And so we saw, for instance, the step function, which is defined as 1 if the result of multiplying the weights by the inputs is at least 0, and otherwise 0. And you can think of this line down the middle as kind of like a dotted line. Effectively, the function stays at 0 all the way up to one point, and then it steps, or jumps, up to 1. So it’s 0 before it reaches some threshold, and then it’s 1 after it reaches that threshold.
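That hypothesis function is only a few lines of Python; a sketch:

```python
def step(value):
    """Step activation: 0 before the threshold of 0, 1 once it's reached."""
    return 1 if value >= 0 else 0

def hypothesis(x1, x2, w0, w1, w2):
    """h(x1, x2) = g(w0 + w1 * x1 + w2 * x2), with g the step function."""
    return step(w0 + w1 * x1 + w2 * x2)
```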
And so this was one way we could define what we’ll come to call an activation function, a function that determines when it is that this output becomes active, changing to a 1 instead of being a 0. But we also saw that if we didn’t just want a purely binary classification, if we didn’t want purely 1 or 0, but wanted to allow for some in-between, real-numbered values, we could use a different function. And there are a number of choices, but the one that we looked at was the logistic sigmoid function, which has sort of an S-shaped curve, where we could represent the output as a probability that may be somewhere in between: maybe the probability of rain is something like 0.5, and a little bit later, the probability of rain is 0.8. So rather than just have a binary classification of 0 or 1, we can allow for numbers that are in between as well. And it turns out there are many other different types of activation functions, where an activation function just takes the result of multiplying the weights by the inputs and adding that bias, and figures out what the actual output should be. Another popular one is the rectified linear unit, otherwise known as ReLU. And the way that works is that it just takes its input and returns the maximum of that input and 0. So if it’s positive, it remains unchanged; but if it’s negative, it levels out at 0. And there are other activation functions that we could choose as well. But in short, each of these activation functions you can just think of as a function that gets applied to the result of all of this computation: we take some function g and apply it to the result of all of that calculation. And this then is what we saw last time, the way of defining some hypothesis function that takes in inputs, calculates some linear combination of those inputs, and then passes it through some sort of activation function to get our output. And this actually turns out to be the model for the simplest of neural networks: we’re going to represent this mathematical idea graphically by using a structure like this. Here then is a neural network that has two inputs. We can think of this as x1 and this as x2. And then one output, which you can think of as classifying whether or not we think it’s going to rain or not rain, for example, in this particular instance. So how exactly does this model work? Well, each of these two inputs represents one of our input variables, x1 and x2. And notice that these inputs are connected to this output via these edges, which are going to be defined by their weights. So these edges each have a weight associated with them, weight 1 and weight 2. And then this output unit is going to calculate an output based on those inputs and based on those weights: it’s going to multiply all the inputs by their weights, add in this bias term, which you can think of as an extra w0 term that gets added into it, and then pass the result through an activation function. So this then is just a graphical way of representing the same idea we saw last time, just mathematically. And we’re going to call this a very simple neural network. And we’d like for this neural network to be able to learn how to calculate some function, that we want some function for the neural network to learn. And the neural network is going to learn: what should the values of w0, w1, and w2 be? What should the activation function be, in order to get the result that we would expect? So we can actually take a look at an example of this.
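The other activation functions mentioned are just different choices for g; as a sketch:

```python
import math

def sigmoid(value):
    """Logistic sigmoid: an S-shaped curve from 0 to 1, so the output
    can be read as a probability."""
    return 1 / (1 + math.exp(-value))

def relu(value):
    """Rectified linear unit: the maximum of the input and 0."""
    return max(0, value)
```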
What then is a very simple function that we might calculate? Well, if we recall back from when we were looking at propositional logic, one of the simplest functions we looked at was the OR function, which takes two inputs, x and y, and outputs 1, otherwise known as true, if either one of the inputs or both of them are 1, and outputs 0 if both of the inputs are 0, or false. So this then is the OR function, and this was the truth table for the OR function: as long as either of the inputs is 1, the output of the function is 1, and the only case where the output is 0 is where both of the inputs are 0. So the question is, how could we take this and train a neural network to be able to learn this particular function? What would those weights look like? Well, we could do something like this. Here’s our neural network, and I’ll propose that in order to calculate the OR function, we’re going to use a value of 1 for each of the weights, and we’ll use a bias of negative 1, and then we’ll just use this step function as our activation function. How then does this work? Well, if I wanted to calculate something like 0 OR 0, which we know to be 0, because false or false is false, then what are we going to do? Well, our output unit is going to calculate this input multiplied by the weight: 0 times 1, that’s 0. Same thing here: 0 times 1, that’s 0. And we’ll add to that the bias, minus 1. So that’ll give us a result of negative 1. If we plot that on our activation function, negative 1 is here: it’s before the threshold, which means the output is going to be 0; the output is only 1 once we reach the threshold. Since negative 1 is before the threshold, the output that this unit provides is going to be 0. And that’s what we would expect it to be, that 0 or 0 should be 0. What if instead we had 1 OR 0, where this is the number 1? Well, in this case, in order to calculate what the output is going to be, we again have to do this weighted sum: 1 times 1, that’s 1; 0 times 1, that’s 0. The sum of that so far is 1. Add negative 1 to that, and the result is 0. And if we plot 0 on the step function, 0 ends up being here: it’s just at the threshold. And so the output here is going to be 1, because the output of 1 or 0 is 1. So that’s what we would expect as well. And just for one more example, if I had 1 OR 1, what would the result be? Well, 1 times 1 is 1, and 1 times 1 is 1. The sum of those is 2. I add the bias term to that, and I get the number 1. 1 plotted on this graph is way over there; that’s well beyond the threshold. And so this output is going to be 1 as well. The output is always 0 or 1, depending on whether or not we’ve reached the threshold. And this neural network then models the OR function, a very simple function, definitely, but it still is able to model it correctly: if I give it the inputs, it will tell me what x1 OR x2 happens to be. And you could imagine trying to do this for other functions as well, a function like the AND function, for instance, that takes two inputs and calculates whether both x and y are true. So if x is 1 and y is 1, then the output of x AND y is 1, but in all of the other cases, the output is 0. How could we model that inside of a neural network as well? Well, it turns out we could do it in the same way, except instead of negative 1 as the bias, we can use negative 2 as the bias instead. What does that end up looking like? Well, if I had 1 AND 1, that should be 1, because true and true is equal to true. I take 1 times 1, that’s 1; 1 times 1 is 1. I get a total sum of 2 so far.
Now I add the bias of negative 2, and I get the value 0. And 0, when I plot it on the activation function, is right at that threshold, and so the output is going to be 1. But if I had any other input, for example, like 1 AND 0, well, the weighted sum of these, 1 plus 0, is going to be 1. Minus 2 is going to give us negative 1, and negative 1 has not reached that threshold, so the output is going to be 0. So those then are some very simple functions that we can model using a neural network that has two inputs and one output, where our goal is to be able to figure out what those weights should be in order to determine what the output should be. And you could imagine generalizing this to calculate more complex functions as well: maybe, given the humidity and the pressure, we want to calculate the probability that it’s going to rain, for example. Or we might want to do a regression-style problem, where we’re given some amount of advertising and maybe what month it is, and we want to predict what our expected sales are going to be for that particular month. So you could imagine these inputs and outputs being different as well. And it turns out that in some problems, we’re not just going to have two inputs. And the nice thing about these neural networks is that we can compose multiple units together, making our networks more complex just by adding more units into this particular neural network. So the network we’ve been looking at has two inputs and one output. But we could just as easily have three inputs, or even more, where we can arbitrarily decide however many inputs there are to our problem, all of them used to calculate some sort of output that we care about figuring out the value of. How then does the math work for figuring out that output? Well, it’s going to work in a very similar way. In the case of two inputs, we had two weights, indicated by these edges, and we multiplied the weights by the input values, adding this bias term. And we’ll do the same thing in the other cases as well: if I have three inputs, you can imagine multiplying each of these three inputs by each of these weights. If I had five inputs instead, we’re going to do the same thing. Here I’m saying sum up, from i equals 1 to 5, x_i multiplied by weight w_i: take each of the five input variables, multiply them by their corresponding weights, and then add the bias to that. So this would be a case where there are five inputs into this neural network, for example. But there could be arbitrarily many nodes that we want inside of this neural network, where each time we’re just going to sum up all of those input variables multiplied by their weights and then add the bias term at the very end. And so this allows us to represent problems that have even more inputs, just by growing the size of our neural network. Now, the next question we might ask is about how it is that we train these neural networks. In the case of the OR function and the AND function, they were simple enough functions that I could just tell you, like, here’s what the weights should be, and you could probably reason through it yourself what the weights should be in order to calculate the output that you want. But in general, with functions like predicting sales or predicting whether or not it’s going to rain, these are much trickier functions to figure out.
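We can check the OR and AND weights from this walkthrough, and the generalization to arbitrarily many inputs, with a short sketch (the helper names are assumptions):

```python
def step(value):
    """Step activation: 1 once the threshold of 0 is reached, else 0."""
    return 1 if value >= 0 else 0

def unit(inputs, weights, bias):
    """A single unit: weighted sum of arbitrarily many inputs, plus bias,
    passed through the step activation function."""
    return step(sum(x * w for x, w in zip(inputs, weights)) + bias)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2,
              unit([x1, x2], [1, 1], bias=-1),   # OR: weights 1, 1; bias -1
              unit([x1, x2], [1, 1], bias=-2))   # AND: same weights; bias -2
```

Running this prints OR outputs of 0, 1, 1, 1 and AND outputs of 0, 0, 0, 1, matching the two truth tables.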
We would like the computer to have some mechanism of calculating what it is that the weights should be, how to set the weights so that our neural network is able to accurately model the function that we care about trying to estimate. And it turns out that the strategy for doing this, inspired by the domain of calculus, is a technique called gradient descent. Gradient descent is an algorithm for minimizing loss when you’re training a neural network. And recall that loss refers to how bad our hypothesis function happens to be, that we can define certain loss functions. And we saw some examples of loss functions last time that just give us a number for any particular hypothesis, saying, how poorly does it model the data? How many examples does it get wrong? How much worse or less bad is it as compared to other hypothesis functions that we might define? And this loss function is just a mathematical function. And when you have a mathematical function, in calculus what you can do is calculate something known as the gradient, which you can think of as like a slope: it’s the direction the loss function is moving at any particular point. And what it’s going to tell us is in which direction we should be moving these weights in order to minimize the amount of loss. So generally speaking, we won’t get into the calculus of it, but the high-level idea for gradient descent is going to look something like this. If we want to train a neural network, we’ll go ahead and start just by choosing the weights randomly: just pick random weights for all of the weights in the neural network. And then we’ll use the input data that we have access to in order to train the network, in order to figure out what the weights should actually be. So we’ll repeat this process again and again. The first step is that we’re going to calculate the gradient based on all of the data points. So we’ll look at all the data and figure out what the gradient is at the place where we currently are, for the current setting of the weights, which means in which direction we should move the weights in order to minimize the total amount of loss, in order to make our solution better. And once we’ve calculated that gradient, which direction we should move in the loss function, well, then we can just update those weights according to the gradient, taking a small step, adjusting those weights, in order to try to make our solution a little bit better. And the size of the step that we take, that’s going to vary, and you can choose that when you’re training a particular neural network. But in short, the idea is going to be: take all the data points, figure out, based on those data points, in what direction the weights should move, and then move the weights one small step in that direction. And if you repeat that process over and over again, adjusting the weights a little bit at a time based on all the data points, eventually you should end up with a pretty good solution to trying to solve this sort of problem. At least, that’s what we would hope to happen. Now, if you look at this algorithm, a good question to ask, any time you’re analyzing an algorithm, is: what is going to be the expensive part of doing the calculation? What’s going to take a lot of work to try to figure out? What is going to be expensive to calculate?
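As a sketch of that loop, here is gradient descent for a deliberately tiny model, a line w0 + w1 * x fit with squared loss; the model, the loss function, and the learning rate are assumptions standing in for the neural network case:

```python
import random

def compute_gradient(weights, data):
    """Gradient of mean squared error for the model w0 + w1 * x,
    computed over every (x, y) pair in data."""
    w0, w1 = weights
    n = len(data)
    g0 = sum(2 * (w0 + w1 * x - y) for x, y in data) / n
    g1 = sum(2 * (w0 + w1 * x - y) * x for x, y in data) / n
    return g0, g1

def gradient_descent(data, learning_rate=0.01, epochs=1000):
    """Start with random weights; repeatedly compute the gradient on ALL
    data points and take one small step in the direction that lowers loss."""
    weights = (random.random(), random.random())
    for _ in range(epochs):
        g0, g1 = compute_gradient(weights, data)
        weights = (weights[0] - learning_rate * g0,
                   weights[1] - learning_rate * g1)
    return weights
```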
And in particular, in the case of gradient descent, the really expensive part is this “all data points” part right here: having to take all of the data points and use all of those data points to figure out what the gradient is at this particular setting of all of the weights. Because odds are, in a big machine learning problem where you’re trying to solve a problem with a lot of data, you have a lot of data points to calculate with, and figuring out the gradient based on all of those data points is going to be expensive. And you’ll have to do it many times: you’ll likely repeat this process again and again and again, going through all the data points, taking one small step over and over, as you try and figure out what the optimal setting of those weights happens to be. It turns out that we would ideally like to be able to train our neural networks faster, to be able to more quickly converge to some sort of solution that is going to be a good solution to the problem. So in that case, there are alternatives to standard gradient descent, which looks at all of the data points at once. We can employ a method like stochastic gradient descent, which will randomly just choose one data point at a time to calculate the gradient based on, instead of calculating it based on all of the data points. So the idea there is that we have some setting of the weights, we pick a data point, and based on that one data point, we figure out in which direction we should move all of the weights, and move the weights a small step in that direction. Then we take another data point and do that again, and repeat this process again and again, maybe looking at each of the data points multiple times, but each time only using one data point to calculate the gradient, to calculate which direction we should move in. Now, just using one data point instead of all of the data points probably gives us a less accurate estimate of what the gradient actually is. But on the plus side, it’s going to be much faster to calculate: we can much more quickly calculate what the gradient is based on one data point, instead of calculating it based on all of the data points and having to do all of that computational work again and again. So there are trade-offs here between looking at all of the data points and looking at just one data point. And it turns out that a middle ground that is also quite popular is a technique called mini-batch gradient descent, where the idea is that instead of looking at all of the data or just a single point, we divide our data set up into small batches, groups of data points, where you can decide how big a particular batch is. But in short, you’re just going to look at a small number of points at any given time, hopefully getting a more accurate estimate of the gradient while also not requiring all of the computational effort needed to look at every single one of the data points. So gradient descent, then, is this technique that we can use to train these neural networks, to figure out what the setting of all of these weights should be, if we want some way to get an accurate notion of how this function should work, some way of modeling how to transform the inputs into particular outputs. Now, so far, the networks that we’ve taken a look at have all been structured similarly to this: we have some number of inputs, maybe two or three or five or more.
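The stochastic and mini-batch variants only change which points feed the gradient computation. A sketch, reusing compute_gradient from the sketch above; batch_size=1 gives stochastic gradient descent, and batch_size=len(data) recovers the ordinary version:

```python
import random

def minibatch_gradient_descent(data, batch_size=32,
                               learning_rate=0.01, epochs=1000):
    """Each step estimates the gradient from a random batch of points,
    trading some accuracy in the estimate for much cheaper steps."""
    weights = (random.random(), random.random())
    for _ in range(epochs):
        batch = random.sample(data, min(batch_size, len(data)))
        g0, g1 = compute_gradient(weights, batch)  # the batch, not all data
        weights = (weights[0] - learning_rate * g0,
                   weights[1] - learning_rate * g1)
    return weights
```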
And then we have one output that is just predicting, like, rain or no rain, or just predicting one particular value. But often in machine learning problems, we don’t just care about one output; we might care about an output that has multiple different values associated with it. So in the same way that we could take a neural network and add units to the input layer, we can likewise add units to the output layer as well. Instead of just one output, you could imagine we have two outputs, or four outputs, for example, where in each case, as we add more inputs or more outputs, if we want to keep this network fully connected between these two layers, we just need to add more weights, so that now each of these input nodes has four weights, one associated with each of the four outputs. And that’s true for each of these various different input nodes. So as we add nodes, we add more weights, in order to make sure that each of the inputs can somehow be connected to each of the outputs, so that each output value can be calculated based on what the values of the inputs happen to be. So what might a case be where we want multiple different output values? Well, you might consider that in the case of weather prediction, for example, we might not just care whether it’s raining or not raining. There might be multiple different categories of weather that we would like to sort the weather into. With just a single output variable, we can do a binary classification, like rain or no rain, for instance, 1 or 0, but it doesn’t allow us to do much more than that. With multiple output variables, I might be able to use each one to predict something a little different. Maybe I want to categorize the weather into one of four different categories, something like: is it going to be raining or sunny or cloudy or snowy? And I now have four output variables that can be used to represent maybe the probability that it is rainy, as opposed to sunny, as opposed to cloudy, as opposed to snowy. How then would this neural network work? Well, we have some input variables that represent some data that we have collected about the weather. Each of those inputs gets multiplied by each of these various different weights. We have more multiplications to do, but these are fairly quick mathematical operations to perform. And then what we get, after passing them through some sort of activation function in the outputs, is some sort of number in each output, where that number you might imagine you could interpret as a probability, like a probability that it is one category as opposed to another category. So here we’re saying that based on the inputs, we think there is a 10% chance that it’s raining, a 60% chance that it’s sunny, a 20% chance that it’s cloudy, and a 10% chance that it’s snowy. And given that output, if these represent a probability distribution, well, then you could just pick whichever one has the highest value, in this case sunny, and say that most likely, we think that this categorization of inputs means that the output should be sunny, and that is what we would expect the weather to be in this particular instance. And so this allows us to do these sorts of multi-class classifications, where instead of just having a binary classification, 1 or 0, we can have as many different categories as we want, and we can have our neural network output probabilities over which categories are more likely than others.
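One common way to turn raw output values into a probability distribution is the softmax function; the lecture doesn’t name the normalization step, so treat this as one possible choice, with made-up raw outputs:

```python
import math

def softmax(values):
    """Exponentiate and normalize so the outputs sum to 1."""
    exps = [math.exp(v) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw network outputs for rain, sun, cloud, snow:
print(softmax([0.2, 2.0, 0.9, 0.2]))  # ~[0.10, 0.60, 0.20, 0.10]
```

Picking the index with the highest probability then gives the predicted category, sunny in this example.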
And using that data, we’re able to draw some sort of inference on what it is that we should do. So this was sort of the idea of supervised machine learning: I can give this neural network a whole bunch of data, a whole bunch of input data corresponding to some label, some output data, like we know that it was raining on this day, we know that it was sunny on that day. And using all of that data, the algorithm can use gradient descent to figure out what all of the weights should be, in order to create some sort of model that hopefully allows us a way to predict what we think the weather is going to be. But neural networks have a lot of other applications as well. You could imagine applying the same sort of idea to a reinforcement learning example as well, where you remember that in reinforcement learning, what we wanted to do is train some sort of agent to learn what action to take, depending on what state it currently happens to be in. So depending on the current state of the world, we wanted the agent to pick one of the actions available to it. And you might model that by having each of these input variables represent some information about the state, some data about what state our agent is currently in. And then the output, for example, could be each of the various different actions that our agent could take: action 1, 2, 3, and 4. And you might imagine that this network would work in the same way: based on these particular inputs, we go ahead and calculate values for each of these outputs, and those outputs could model which actions are better than other actions, and we could just choose, based on looking at those outputs, which action we should take. And so these neural networks are very broadly applicable, in that all they’re really doing is modeling some mathematical function. So anything that we can frame as a mathematical function, something like classifying inputs into various different categories, or figuring out, based on some input state, what action we should take, these are all mathematical functions that we could attempt to model by taking advantage of this neural network structure, and in particular, taking advantage of this technique, gradient descent, that we can use in order to figure out what the weights should be in order to do this sort of calculation. Now, how is it that you would go about training a neural network that has multiple outputs instead of just one? Well, with just a single output, we could see what the output value should be, and then update all of the weights that correspond to it. And when we have multiple outputs, at least in this particular case, we can really think of this as four separate neural networks: really, we just have one network here that has these three inputs and these three weights corresponding to this one output value. And the same thing is true for this output value: this output value effectively defines yet another neural network that has the same three inputs but a different set of weights that correspond to this output. And likewise, this output has its own set of weights as well, and the same thing is true for the fourth output, too. And so if you wanted to train a neural network that had four outputs instead of just one, in this case where the inputs are directly connected to the outputs, you could really think of this as just training four independent neural networks.
We know what the outputs for each of these four should be based on our input data, and using that data, we can begin to figure out what all of these individual weights should be. And maybe there’s an additional step at the end to turn these values into a probability distribution, such that we can interpret which one is better than another, or more likely than another, as a category, or something like that. So this then seems like it does a pretty good job of taking inputs and trying to predict what outputs should be. And we’ll see some real examples of this in just a moment as well. But it’s important then to think about what the limitations of this sort of approach are, of just taking some linear combination of inputs and passing it into some sort of activation function. And it turns out that when we do this in the case of binary classification, trying to predict whether something belongs to one category or another, we can only predict things that are linearly separable. Because we’re taking a linear combination of inputs and using that to define some decision boundary or threshold, what we get is a situation where, if we have this set of data, we can find a line that linearly separates the red points from the blue points. But a single unit that is making a binary classification, otherwise known as a perceptron, can’t deal with a situation like this, a type of situation we’ve seen before, where there is no straight line that goes through the data dividing the red points from the blue points. It’s a more complex decision boundary: the decision boundary somehow needs to capture the things inside of this circle, and there isn’t really a line that will allow us to deal with that. So this is the limitation of the perceptron, these units that just make binary decisions based on their inputs: a single perceptron is only capable of learning a linearly separable decision boundary. All it can do is define a line. And sure, it can give us probabilities based on how close to that decision boundary we are, but it can only really decide based on a linear decision boundary. And so this doesn’t seem like it’s going to generalize well to situations where real-world data is involved, because real-world data often isn’t linearly separable; it often isn’t the case that we can just draw a line through the data and divide it up into multiple groups. So what then is the solution to this? Well, what was proposed was the idea of a multilayer neural network. So far, all of the neural networks we’ve seen have had a set of inputs and a set of outputs, with the inputs connected directly to those outputs. But a multilayer neural network is an artificial neural network that still has an input layer and an output layer, but also has one or more hidden layers in between: other layers of artificial neurons, or units, that are going to calculate their own values as well. So instead of a neural network that looks like this, with three inputs and one output, you might imagine, in the middle here, injecting a hidden layer, something like this. This is a hidden layer that has four nodes. You could choose how many nodes or units end up going into the hidden layer, and you can have multiple hidden layers as well. And so now each of these inputs isn’t directly connected to the output; each of the inputs is connected to this hidden layer, and then all of the nodes in the hidden layer are connected to the one output.
And so this is just another step that we can take toward calculating more complex functions. Each of these hidden units will calculate its output value, otherwise known as its activation, based on a linear combination of all the inputs. And once we have values for all of these nodes, as opposed to this just being the output, we do the same thing again: calculate the output for this node by multiplying each of the values for these units by their weights as well. So, in effect, the way this works is that we start with inputs; they get multiplied by weights in order to calculate values for the hidden nodes; and those get multiplied by weights in order to figure out what the ultimate output is going to be. And the advantage of layering things like this is that it gives us the ability to model more complex functions: instead of just having a single decision boundary, a single line dividing the red points from the blue points, each of these hidden nodes can learn a different decision boundary, and we can combine those decision boundaries to figure out what the ultimate output is going to be. And as we begin to imagine more complex situations, you could imagine each of these nodes learning some useful property, some useful feature, of all of the inputs, and us somehow learning how to combine those features together in order to get the output that we actually want. Now, the natural question when we begin to look at this is to ask: how do we train a neural network that has hidden layers inside of it? And this initially turns out to be a bit of a tricky question, because in the input data we are given values for all of the inputs, and we’re given what the value of the output should be, what the category is, for example. But the input data doesn’t tell us what the values for all of these intermediate nodes should be. So we don’t know how far off each of these nodes actually is, because we’re only given data for the inputs and the outputs. The reason this is called the hidden layer is that the data made available to us doesn’t tell us what the values for all of these nodes should actually be. And so the strategy people came up with was to say that if you know what the error, or the loss, is on the output node, then, based on what these weights are, if one of these weights is higher than another, you can calculate an estimate for how much of that error was due to each particular part of the hidden layer, based on the values of these weights. In effect, based on the error from the output, I can back-propagate the error and figure out an estimate for what the error is for each of the nodes in the hidden layer as well. And there’s some more calculus here that we won’t get into the details of, but this idea, known as backpropagation, is an algorithm for training a neural network with multiple different hidden layers. And the pseudocode for it, if we want to run gradient descent with backpropagation, will again be: start with a random choice of weights, as we did before, and then repeat the training process again and again. But what we’re going to do each time now is calculate the error for the output layer first. We know the output and what it should be, and we know what we calculated, so we can figure out what the error there is.
But then we’re going to repeat for every layer, starting with the output layer, moving back into the hidden layer, then the hidden layer before that if there are multiple hidden layers, going back all the way to the very first hidden layer, assuming there are multiple: we’re going to propagate the error back one layer. Whatever the error was at the output, figure out what the error should be a layer before that, based on what the values of those weights are, and then update those weights. So, graphically, the way you might think about this is that we first start with the output. We know what the output should be, and we know what output we calculated, and based on that, we can figure out, all right, how do we need to update those weights, backpropagating the error to these nodes. And using that, we can figure out how we should update these weights. And you might imagine that if there are multiple layers, we could repeat this process again and again to begin to figure out how all of these weights should be updated. And this backpropagation algorithm is really the key algorithm that makes neural networks possible. It makes it possible to take these multi-level structures and train them: to figure out how we should go about updating the weights in order to create some function that minimizes the total amount of loss, some good setting of the weights that will take the inputs and translate them into the output that we expect.
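To make that training loop concrete, here is a minimal NumPy sketch of gradient descent with backpropagation for a single hidden layer. It is my own illustration of the pseudocode above, not the course's code; the data, layer sizes, and learning rate are all made up.

    import numpy as np

    def sigmoid(z):
        return 1 / (1 + np.exp(-z))

    # Made-up training data: 4 samples, 3 inputs each, with binary labels.
    X = np.array([[0., 0., 1.], [0., 1., 1.], [1., 0., 1.], [1., 1., 1.]])
    y = np.array([[0.], [1.], [1.], [0.]])

    rng = np.random.default_rng(0)
    W1 = rng.normal(size=(3, 4))   # start with a random choice of weights
    W2 = rng.normal(size=(4, 1))

    for step in range(10000):      # repeat the training process again and again
        # Forward pass: inputs -> hidden activations -> output.
        hidden = sigmoid(X @ W1)
        output = sigmoid(hidden @ W2)

        # Calculate the error for the output layer first.
        output_error = (output - y) * output * (1 - output)

        # Propagate the error back one layer, weighted by the output weights.
        hidden_error = (output_error @ W2.T) * hidden * (1 - hidden)

        # Update the weights by gradient descent (learning rate 0.5 is arbitrary).
        W2 -= 0.5 * hidden.T @ output_error
        W1 -= 0.5 * X.T @ hidden_error

    print(np.round(output.ravel(), 2))  # predictions move toward the labels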
And this works, as we said, not just for a single hidden layer: you can imagine multiple hidden layers, where for each hidden layer we just define however many nodes we want, and each of the nodes in one layer can be connected to the nodes in the next layer, defining more and more complex networks that are able to model more and more complex types of functions. And so this type of network is what we might call a deep neural network, part of a larger family of deep learning algorithms, if you’ve ever heard that term. And all deep learning is about is using multiple layers to model higher-level features inside of the input, to be able to figure out what the output should be. And so a deep neural network is just a neural network that has multiple of these hidden layers, where we start at the input, calculate values for this layer, then this layer, then this layer, and then ultimately get an output. And this allows us to model more and more sophisticated types of functions: each of these layers can calculate something a little bit different, and we can combine that information to figure out what the output should be. Of course, as with any situation in machine learning, as we begin to make our models more complex, to model more and more complex functions, the risk we run is something like overfitting. And we talked about overfitting last time, in the context of training our models to learn some sort of decision boundary, where overfitting happens when we fit too closely to the training data and, as a result, don’t generalize well to other situations. And one of the risks we run with a far more complex neural network that has many, many different nodes is that we might overfit based on the input data. We might grow over-reliant on certain nodes to calculate things just purely based on the training data, in a way that doesn’t allow us to generalize very well to the output. And there are a number of strategies for dealing with overfitting, but one of the most popular in the context of neural networks is a technique known as dropout. What dropout does, when we’re training the neural network, is temporarily remove units, temporarily remove these artificial neurons from our network, chosen at random. And the goal here is to prevent over-reliance on certain units. What generally happens in overfitting is that we begin to over-rely on certain units inside the neural network to tell us how to interpret the input data. What dropout will do is randomly remove some of these units in order to reduce the chance that we over-rely on certain units, making our neural network more robust, able to handle the situation even when we just drop out particular neurons entirely. So the way that might work is that we have a network like this, and as we’re training it, when we go about trying to update the weights the first time, we’ll just randomly pick some percentage of the nodes to drop out of the network. It’s as if those nodes aren’t there at all; it’s as if the weights associated with those nodes aren’t there at all. And we’ll train it this way. Then the next time we update the weights, we’ll pick a different set and go ahead and train that way, and then again randomly choose and train with other nodes that have been dropped out as well. And the goal of that is that after the training process, if you train by dropping out random nodes inside of this neural network, you hopefully end up with a network that’s a little bit more robust, one that doesn’t rely too heavily on any one particular node but learns how to approximate the function more generally.
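Here is one way to picture what dropout is doing during a single training pass, as a small NumPy sketch of my own; the activations, the drop rate, and the rescaling trick are all illustrative assumptions, and libraries like TensorFlow implement this step for you.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical activations for one hidden layer of eight units.
    hidden = np.array([0.2, 0.9, 0.5, 0.1, 0.7, 0.3, 0.8, 0.4])

    # During training, randomly keep each unit with probability 0.5.
    keep = rng.random(hidden.shape) > 0.5

    # Dropped units act as if they aren't there at all; the survivors are rescaled
    # (the usual "inverted dropout" trick) so the expected total stays the same.
    dropped = np.where(keep, hidden / 0.5, 0.0)
    print(dropped)  # a different random subset is dropped on each training pass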
So that, then, is a look at some of the techniques that we can use in order to implement a neural network, to get at this idea of taking input and passing it through these various different layers in order to produce some sort of output. And what we’d like to do now is take those ideas and put them into code. And to do that, there are a number of different machine learning libraries, neural network libraries, that we can use that give us access to someone’s implementation of backpropagation and all of these hidden layers. One of the most popular, developed by Google, is known as TensorFlow, a library that we can use for quickly creating neural networks, modeling them, and running them on some sample data to see what the output is going to be. And before we actually start writing code, we’ll go ahead and take a look at TensorFlow’s playground, which will be an opportunity for us just to play around with this idea of neural networks and their layers, to get a sense for what we can do by taking advantage of neural networks. So let’s go ahead and go into TensorFlow’s playground, which you can get to by visiting that URL from before. And what we’re going to do now is try to learn the decision boundary for this particular output: I want to learn to separate the orange points from the blue points, and I’d like to learn some setting of weights inside of a neural network that will be able to separate those from each other. The features we have access to, our input data, are the x value and the y value, the two values along each of the two axes. And what I’ll do now is set particular parameters, like which activation function I would like to use, and I’ll just go ahead and press play and see what happens. And what happens here is that you’ll see that just by using these two input features, the x value and the y value, with no hidden layers, just taking the input and figuring out what the decision boundary is, our neural network learns pretty quickly that in order to divide these two sets of points, we should just use this line. This line acts as a decision boundary that separates this group of points from that group of points, and it does it very well. You can see up here what the loss is: the training loss is 0, meaning we were able to perfectly separate these two sets of points from each other inside of our training data. So this was a fairly simple case of applying a neural network, because the data is very clean: it’s very nicely linearly separable, and we could just draw a line that separates all of those points from each other. Let’s now consider a more complex case. So I’ll go ahead and pause the simulation, and we’ll look at this data set here. This data set is a little bit more complex. In this data set, we still have blue and orange points that we’d like to separate from each other, but there’s no single line that we can draw that is going to be able to separate the blue from the orange, because the blue is located in these two quadrants and the orange is located here and here. It’s a more complex function to be able to learn. So let’s see what happens if we just try to predict, based on those inputs, the x and y coordinates, what the output should be. I’ll press Play, and what you’ll notice is that we’re not really able to draw much of a conclusion: we’re not able to very cleanly see how we should divide the orange points from the blue points, and you don’t see a very clean separation there. So it seems like we don’t have enough sophistication inside of our network to be able to model something that complex. We need a better model for this neural network, and I’ll do that by adding a hidden layer. So now I have a hidden layer that has two neurons inside of it: two inputs that go to two neurons inside of a hidden layer that then go to our output. And now I’ll press Play, and what you’ll notice here is that we’re able to do slightly better. We’re able to now say, all right, these points are definitely blue, these points are definitely orange; we’re still struggling a little bit with these points up here, though. And what we can do is look at each of these hidden neurons and see what exactly it is that they are doing. Each hidden neuron is learning its own decision boundary, and we can see what that boundary is. This first neuron is learning, all right, this line that seems to separate some of the blue points from the rest of the points. This other hidden neuron is learning another line that seems to be separating the orange points in the lower right from the rest of the points. So that’s why we’re able to figure out these two areas in the bottom region, but we’re still not able to perfectly classify all of the points. So let’s go ahead and add another neuron. Now we’ve got three neurons inside of our hidden layer; let’s see what we’re able to learn now. All right, well, now we seem to be doing a better job.
By learning three different decision boundaries, one with each of the three neurons inside of our hidden layer, we’re able to much better figure out how to separate these blue points from the orange points. And we can see what each of these hidden neurons is learning: each one is learning a slightly different decision boundary, and then we’re combining those decision boundaries together to figure out what the overall output should be. And then we can try it one more time by adding a fourth neuron there and trying to learn that. And it seems like now we can do even better at separating the blue points from the orange points. But we were only able to do this by adding a hidden layer, by adding some layer that is learning some other boundaries and combining those boundaries to determine the output. And the strength, the size and thickness, of these lines indicates how high these weights are, how important each of these inputs is for making this sort of calculation. And we can do maybe one more simulation. Let’s go ahead and try this on a data set that looks like this, and go ahead and get rid of the hidden layer. Here we’re trying to separate the blue points from the orange points, where all the blue points are located, again, effectively inside of a circle. So we’re not going to be able to learn a line. Notice, I press Play, and we’re really not able to draw any sort of classification at all, because there is no line that cleanly separates the blue points from the orange points. So let’s try to solve this by introducing a hidden layer. I’ll go ahead and press Play, and, all right, with two neurons in a hidden layer, we’re able to do a little better, because we’ve effectively learned two different decision boundaries: we learned this line here, and we learned this line on the right-hand side. And right now we’re just saying, all right, well, if it’s in between, we’ll call it blue, and if it’s outside, we’ll call it orange. So not great, but certainly better than before: we’re learning one decision boundary and another, and based on those, we can figure out what the output should be. But let’s now go ahead and add a third neuron and see what happens. I go ahead and train it, and now, using three different decision boundaries learned by each of these hidden neurons, we’re able to much more accurately model this distinction between blue points and orange points. Maybe with these three decision boundaries combined together, you can imagine figuring out what the output should be and how to make that sort of classification. And so the goal here is just to get a sense for how having more neurons in these hidden layers allows us to learn more structure in the data, allows us to figure out what the relevant and important decision boundaries are. And then, using this backpropagation algorithm, we’re able to figure out what the values of these weights should be in order to train this network to classify one category of points apart from another. And this is ultimately what we’re going to be trying to do whenever we’re training a neural network. So let’s go ahead and actually see an example of this. You’ll recall from last time that we had this banknotes file that included information about counterfeit banknotes as opposed to authentic banknotes, where I had four different values for each banknote and then a categorization of whether that banknote is considered to be authentic or a counterfeit note.
And what I wanted to do was, based on that input information, figure out some function that could calculate what category the banknote belonged to. And what I’ve written here in banknotes.py is a neural network that will learn just that: a network that learns, based on all of the input, whether or not we should categorize a banknote as authentic or as counterfeit. The first step is the same as what we saw from last time: I’m really just reading the data in and getting it into an appropriate format. And this is where more of writing Python code on your own comes in, in terms of manipulating this data, massaging the data into a format that will be understood by a machine learning library like scikit-learn or TensorFlow. So here I separate it into a training and a testing set. And now what I’m doing down below is creating a neural network. Here I’m using tf, which stands for TensorFlow; up above, I said import tensorflow as tf, tf being just an abbreviation that we’ll often use so we don’t need to write out TensorFlow every time we want to use anything inside of the library. I’m using tf.keras. Keras is an API, a set of functions that we can use in order to manipulate neural networks inside of TensorFlow, and it turns out that other machine learning libraries also use the Keras API. But here I’m saying, all right, go ahead and give me a model that is a sequential model, a sequential neural network, meaning one layer after another. And now I’m going to add to that model whatever layers I want inside of my neural network. So here I’m saying model.add: go ahead and add a dense layer. And when we say a dense layer, we mean a layer in which each of the nodes is going to be connected to each of the nodes from the previous layer, so we have a densely connected layer. This layer is going to have eight units inside of it: a hidden layer inside the neural network with eight different units, eight artificial neurons, each of which might learn something different. And I just sort of chose eight arbitrarily; you could choose a different number of hidden nodes inside the layer. And as we saw before, depending on the number of units inside of your hidden layer, more units means you can learn more complex functions, so maybe you can more accurately model the training data. But it comes at a cost: more units means more weights that you need to figure out how to update, so it might be more expensive to do that calculation. And you also run the risk of overfitting on the data: if you have too many units and you learn to just overfit on the training data, that’s not good either. So there is a balance, and there’s often a testing process, where you’ll train on some data and maybe validate how well you’re doing on a separate set of data, often called a validation set, to see which setting of parameters performs the best: how many layers should I have? How many units should be in each layer? Which of those performs the best on the validation set? So you can do some testing to figure out what these so-called hyperparameters should be equal to. Next, I specify what the input shape is, meaning, all right, what does my input look like? My input has four values, and so the input shape is just 4, because we have four inputs. And then I specify what the activation function is. And the activation function, again, we can choose: there are a number of different activation functions.
Here I’m using relu, which you might recall from earlier. And then I’ll add an output layer. So I have my hidden layer, and now I’m adding one more layer that will have just one unit, because all I want to do is predict something like counterfeit bill or authentic bill, so I just need a single unit. And the activation function I’m going to use here is the sigmoid activation function, which, again, was that S-shaped curve that just gives us a probability: what is the probability that this is a counterfeit bill, as opposed to an authentic bill? So that, then, is the structure of my neural network: a sequential neural network that has one hidden layer with eight units inside of it, and then one output layer with just a single unit inside of it. And I can choose how many units there are; I can choose the activation function. Then I’m going to compile this model. TensorFlow gives you a choice of how you would like to optimize the weights; there are various different algorithms for doing that. What type of loss function do I want to use? Again, many different options for doing that. And then, how do I want to evaluate my model? Well, I care about accuracy: I care about how many of my points I’m able to classify correctly versus not correctly, as counterfeit or not counterfeit, and I would like it to report to me how accurately my model is performing. Then, now that I’ve defined that model, I call model.fit to say go ahead and train the model: train it on all the training data, plus all of the training labels, so labels for each of those pieces of training data. And I’m saying run it for 20 epochs, meaning go ahead and go through each of these training points 20 times, effectively: go through the data 20 times and keep trying to update the weights. If I did it for more, I could train for even longer and maybe get a more accurate result. But then, after I fit it on all the data, I’ll go ahead and just test it: I’ll evaluate my model, using model.evaluate, built into TensorFlow, which is just going to tell me how well I perform on the testing data. So ultimately, this is just going to give me some numbers that tell me how well we did in this particular case. So now what I’m going to do is go into banknotes and run banknotes.py. And what’s going to happen now is that it’s going to read in all of that training data, generate a neural network with all of my inputs, my eight hidden units inside my layer, and then an output unit, and now it’s training. It’s training 20 times, and each time you can see how my accuracy is increasing on my training data. It starts off the very first time not very accurate, though better than random: something like 79% of the time, it’s able to accurately classify one bill from another. But as I keep training, notice that this accuracy value improves and improves and improves, until after I’ve trained through all the data points 20 times, it looks like my accuracy is above 99% on the training data. And here’s where I tested it on a whole bunch of testing data, and it looks like in this case I was also about 99.8% accurate. So just using that, I was able to generate a neural network that can detect counterfeit bills from authentic bills, based on this input data, 99.8% of the time, at least on this particular testing data. And I might want to test it with more data as well, just to be confident about that. But this is really the value of using a machine learning library like TensorFlow. And there are others available for Python and other languages as well.
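Putting those pieces together, a sketch of what the described banknotes.py might look like is below. The structure (a dense hidden layer of eight relu units, a sigmoid output, 20 epochs) is taken from the walkthrough above; the file name, the column layout, the optimizer, and the loss function are my assumptions, not confirmed by the lecture.

    import csv
    import numpy as np
    import tensorflow as tf
    from sklearn.model_selection import train_test_split

    # Read data in from file (assumed format: four values, then a 0/1 label).
    with open("banknotes.csv") as f:
        reader = csv.reader(f)
        next(reader)  # assumed header row
        evidence, labels = [], []
        for row in reader:
            evidence.append([float(cell) for cell in row[:4]])
            labels.append(int(row[4]))

    # Separate data into training and testing groups (split ratio assumed).
    X_train, X_test, y_train, y_test = train_test_split(
        np.array(evidence), np.array(labels), test_size=0.4
    )

    # Create a neural network: one hidden layer of 8 units, one output unit.
    model = tf.keras.models.Sequential()
    model.add(tf.keras.layers.Dense(8, input_shape=(4,), activation="relu"))
    model.add(tf.keras.layers.Dense(1, activation="sigmoid"))

    # Train the neural network (optimizer and loss are my assumptions).
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    model.fit(X_train, y_train, epochs=20)

    # Evaluate how well the model performs on the held-out testing data.
    model.evaluate(X_test, y_test, verbose=2)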
But all I have to do is define the structure of the network and define the data that I’m going to pass into the network, and then TensorFlow runs the backpropagation algorithm for learning what all of those weights should be, for figuring out how to train this neural network to figure out, as accurately as possible, what the output values should be. And so this, then, was a look at what neural networks can do just using these sequences of layer after layer after layer. And you can begin to imagine applying these to much more general problems. One big problem in computing, and in artificial intelligence more generally, is the problem of computer vision. Computer vision is all about computational methods for analyzing and understanding images: you might have pictures that you want the computer to figure out how to deal with, how to process, in order to produce some sort of useful result. You’ve seen this in the context of social media websites that can look at a photo containing a whole bunch of faces and figure out which is a picture of whom, labeling and tagging them with the appropriate people. This is becoming increasingly relevant as we begin to discuss self-driving cars: these cars now have cameras, and we would like for the computer to have some sort of algorithm that looks at the image and figures out what color the light is, what cars are around us and in what direction, for example. And so computer vision is all about taking an image and figuring out what sort of computation, what sort of calculation, we can do with that image. It’s also relevant in the context of something like handwriting recognition. What you’re looking at here is an example of the MNIST data set, a big data set of handwritten digits that we could use to try to figure out how to predict, given a photo of a digit that someone has drawn, whether it’s a 0, 1, 2, 3, 4, 5, 6, 7, 8, or 9, for example. So this sort of handwriting recognition is yet another task that we might want to apply computer vision tools towards. So how, then, can we use neural networks to solve a problem like this? Well, neural networks rely upon some sort of input, where that input is just numerical data: we have a whole bunch of units, where each one of them just represents some sort of number. And so in the context of something like handwriting recognition, or in the context of just an image, you might imagine that an image is really just a grid of pixels, a grid of dots where each dot has some sort of color. And in the context of something like handwriting recognition, you might imagine that if you just fill in each of these dots in a particular way, you can generate a 2 or an 8, for example, based on which dots happen to be shaded in and which dots are not. And we can represent each of these pixel values just using numbers. So for a particular pixel, 0 might represent entirely black. Depending on how you’re representing color, it’s often common to represent color values on a 0 to 255 range, so that you can represent a color using 8 bits for a particular value, like how much white is in the image. So 0 might represent all black, and 255 might represent an entirely white pixel.
And somewhere in between might represent some shade of gray, for example. But you might imagine not just having a single slider that determines how much white is in the image: if you had a color image, you might imagine three different numerical values, a red, green, and blue value, where the red value controls how much red is in the pixel, one value controls how much green is in the pixel, and one value controls how much blue is in the pixel as well. And depending on how you set these values of red, green, and blue, you can get a different color. And so any pixel can really be represented, in this case, by three numerical values: a red value, a green value, and a blue value. And if you take a whole bunch of these pixels and assemble them together inside of a grid of pixels, then you really just have a whole bunch of numerical values that you can use in order to perform some sort of prediction task. And so what you might imagine doing is using the same techniques we talked about before: just design a neural network with a lot of inputs, one input for each of the pixels, or three in the case of a color image, each connected into a deep neural network, for example. And this deep neural network might take all of the pixels inside of the image of the digit the person drew, and the output might be, say, 10 neurons that classify it as a 0, or a 1, or a 2, or a 3, or that just tell us in some way what that digit happens to be. Now, there are a couple of drawbacks to this approach. The first drawback is just the size of this input array: we have a whole bunch of inputs. If we have a big image that has a lot of different channels, we’re looking at a lot of inputs, and therefore a lot of weights that we have to calculate. And a second problem is the fact that by flattening everything into just this structure of all the pixels, we’ve lost access to a lot of the information about the structure of the image that’s relevant. Really, when a person looks at an image, they’re looking at particular features of the image: they’re looking at curves, they’re looking at shapes, they’re looking at what things they can identify in different regions of the image, and maybe they put those things together in order to get a better picture of what the overall image is about. And by just turning it into pixel values for each of the pixels, sure, you might be able to learn that structure, but it might be challenging to do so. It might be helpful to take advantage of the fact that you can use properties of the image itself, the fact that it’s structured in a particular way, to improve the way that we learn based on that image, too.
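Before getting into how to exploit that structure, here is a tiny sketch of my own, assuming the Pillow library and a hypothetical file name, showing what that numerical pixel representation actually looks like in code:

    import numpy as np
    from PIL import Image

    # Open an image (hypothetical file name) and convert it to RGB.
    image = Image.open("digit.png").convert("RGB")

    # As an array, the image is just a grid of numbers: height x width x 3 channels,
    # where each channel value ranges from 0 (none of that color) to 255 (full).
    pixels = np.array(image)
    print(pixels.shape)        # e.g. (height, width, 3), depending on the image
    print(pixels[0, 0])        # the red, green, and blue values of the top-left pixel

    # A grayscale version has a single 0-255 value per pixel instead.
    gray = np.array(image.convert("L"))
    print(gray.shape, gray[0, 0])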
So, in order to figure out how we can train our neural networks to better deal with images, we’ll introduce a couple of ideas, a couple of algorithms, that we can apply to take the image and extract some useful information out of it. And the first idea we’ll introduce is the notion of image convolution. What image convolution is all about is filtering an image, sort of extracting useful or relevant features out of the image. And the way we do that is by applying a particular filter that basically combines the value of every pixel with the values of all of its neighboring pixels, according to some sort of kernel matrix, which, as we’ll see in a moment, is going to allow us to weight these pixels in various different ways. And the goal of image convolution, then, is to extract some sort of interesting or useful features out of an image, to be able to take a pixel and, based on its neighboring pixels, maybe predict some sort of valuable information: taking a pixel and looking at its neighboring pixels, you might be able to predict whether or not there’s some sort of curve inside the image, or whether it’s forming the outline of a particular line or shape, for example. And that might be useful if you’re trying to use all of these various different features and combine them to say something meaningful about an image as a whole. So how, then, does image convolution work? Well, we start with a kernel matrix, and the kernel matrix looks something like this. The idea is that, given a pixel, which will be the middle pixel, we’re going to multiply each of the pixels in that region, the middle pixel and its neighbors, by these values, and sum up all the numbers together to get some sort of result. So I take this kernel, which you can think of as a filter that I’m going to apply to the image, and let’s say that I take this image: a 4 by 4 image. We’ll think of it as just a black-and-white image, where each cell is just a single pixel value, somewhere between 0 and 255, for example. So we have a whole bunch of individual pixel values like this, and what I’d like to do is apply this kernel, this filter, so to speak, to this image. And the way I’ll do that is, all right, the kernel is 3 by 3 (you can imagine a 5 by 5 kernel, or a larger kernel, too), and I’ll take it and first apply it to the first 3 by 3 section of the image. And what I’ll do is take each of these pixel values, multiply it by its corresponding value in the filter matrix, and add all of the results together. So here, for example, I’ll say 10 times 0, plus 20 times negative 1, plus 30 times 0, so on and so forth, doing all of this calculation. And at the end, if I take all these values, multiply them by their corresponding values in the kernel, and add the results together, then for this particular set of 9 pixels I get the value of 10, for example. And then what I’ll do is slide this 3 by 3 grid, effectively, over: I’ll slide the kernel by 1 to look at the next 3 by 3 section. Here I’m just sliding it over by 1 pixel, but you might imagine a different stride length; maybe I jump by multiple pixels at a time, if you really wanted to. You have different options here. But here, I’m just sliding over, looking at the next 3 by 3 section, and I’ll do the same math: 20 times 0, plus 30 times negative 1, plus 40 times 0, plus 20 times negative 1, so on and so forth, plus 30 times 5. And what I end up getting is the number 20. Then you can imagine shifting over to this one, doing the same thing, calculating the number 40, for example, and then doing the same thing here and calculating a value there as well. And so what we have now is what we’ll call a feature map: we have taken this kernel, applied it to each of these various different regions, and what we get is some representation of a filtered version of that image.
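Here is that sliding-window arithmetic written out as a short NumPy sketch of my own. The 4 by 4 pixel values and the kernel are my reconstruction from the numbers quoted above (a kernel with a 5 in the middle and negative 1s above, below, left, and right), so treat them as assumptions:

    import numpy as np

    # Reconstructed 4x4 image and 3x3 kernel from the worked example (assumed values).
    image = np.array([[10, 20, 30, 40],
                      [10, 20, 30, 40],
                      [20, 30, 40, 50],
                      [20, 30, 40, 50]])
    kernel = np.array([[ 0, -1,  0],
                       [-1,  5, -1],
                       [ 0, -1,  0]])

    # Slide the 3x3 kernel across the image with a stride of 1: multiply each pixel
    # in the window by its corresponding kernel value and sum the results.
    feature_map = np.zeros((2, 2), dtype=int)
    for i in range(2):
        for j in range(2):
            window = image[i:i+3, j:j+3]
            feature_map[i, j] = np.sum(window * kernel)

    print(feature_map)  # [[10 20] [40 50]] -- the 10, 20, and 40 match the values above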
And so, to give a more concrete example of why this kind of thing could be useful, let’s take this kernel matrix, for example, which is quite a famous one: it has an 8 in the middle, and then all of the neighboring pixels get a negative 1. And let’s imagine we wanted to apply that to a 3 by 3 part of an image that looks like this, where all the values are the same. They’re all 20, for instance. Well, in this case, if you do 20 times 8 and then subtract 20, subtract 20, subtract 20, once for each of the eight neighbors, the result is that you just get that expression, which comes out to be 0. You multiplied 20 by 8, but then you subtracted 20 eight times, according to that particular kernel, and the result of all that is just 0. So the takeaway here is that when a lot of the pixels are the same value, we end up getting a value close to 0. If, though, we had something like this, where 20 is along this first row, then 50 is in the second row, and 50 is in the third row, well, then, when you do the same kind of math, 20 times negative 1, 20 times negative 1, so on and so forth, I get a higher value, a value like 90 in this particular case. And so the more general idea here is that by applying this kernel, negative 1s with an 8 in the middle, what I get is that when this middle value is very different from the neighboring values, like 50 being greater than these 20s, you’ll end up with a value higher than 0. If this number is higher than its neighbors, you end up getting a bigger output; but if this value is the same as all of its neighbors, then you get a lower output, something like 0. And it turns out that this sort of filter can therefore be used for something like detecting edges in an image: if I want to detect the boundaries between various different objects inside of an image, I might use a filter like this, which is able to tell whether the value of a pixel is different from, whether it’s greater than, the values of the pixels that happen to surround it. And so we can use this for image filtering, and I’ll show you an example of that. I have here, in filter.py, a file that uses the Python Imaging Library, PIL, to do some image filtering. I go ahead and open an image, and then all I’m going to do is apply a kernel to that image. It’s going to be a 3 by 3 kernel, the same kind of kernel we saw before. And here is the kernel; this is just a list representation of the same matrix that I showed you a moment ago: the first row is negative 1, negative 1, negative 1; the second row is negative 1, 8, negative 1; and the third row is all negative 1s. And then, at the end, I’m going to go ahead and show the filtered image. So if, for example, I go into the convolution directory and open up an image like bridge.png, this is what an input image might look like: just an image of a bridge over a river. Now I’m going to go ahead and run this filter program on the bridge, and what I get is this image here. Just by taking the original image and applying that filter to each 3 by 3 grid, I’ve extracted all of the boundaries, all of the edges inside the image that separate one part of the image from another. So here I’ve got a representation of boundaries between particular parts of the image. And you might imagine that if a machine learning algorithm is trying to learn what an image is of, a filter like this could be pretty useful. Maybe the machine learning algorithm doesn’t care about all of the details of the image; it just cares about certain useful features, about particular shapes that are able to help it determine that, based on the image, this is going to be a bridge, for example. And so this idea of image convolution can allow us to apply filters to images that let us extract useful results out of those images: taking an image and extracting its edges, for example.
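A sketch of what the described filter.py might look like follows. The command-line handling is my assumption, but the kernel values are the ones read out above, and PIL's ImageFilter.Kernel performs exactly this kind of 3 by 3 convolution:

    import sys
    from PIL import Image, ImageFilter

    # Open the image named on the command line (assumed usage: python filter.py image.png).
    image = Image.open(sys.argv[1]).convert("RGB")

    # Apply the edge-detection kernel from above: negative 1s all around an 8.
    filtered = image.filter(ImageFilter.Kernel(
        size=(3, 3),
        kernel=[-1, -1, -1,
                -1,  8, -1,
                -1, -1, -1],
        scale=1  # don't rescale the summed result
    ))

    # Show the filtered image, with the edges extracted.
    filtered.show()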
And you might imagine many other filters that could be applied to an image that are able to extract particular values as well. A filter might have separate kernels for the red values, the green values, and the blue values that are all summed together at the end, such that you could have particular filters looking for, is there red in this part of the image? Is there green in other parts of the image? You can begin to assemble these relevant and useful filters that are able to do these calculations as well. So that, then, was the idea of image convolution: applying some sort of filter to an image to extract some useful features out of it. But all the while, these images are still pretty big: there are a lot of pixels involved in the image. And realistically speaking, if you’ve got a really big image, that poses a couple of problems. One, it means a lot of input going into the neural network. But two, it also means that we really have to care about what’s in each particular pixel, whereas realistically, if you’re looking at an image, you often don’t care whether something is in one particular pixel versus the pixel immediately to the right of it; they’re pretty close together. You really just care about whether there’s a particular feature in some region of the image, and maybe you don’t care about exactly which pixel it happens to be in. And so there’s a technique we can use known as pooling. Pooling means reducing the size of an input by sampling from regions inside of the input: we’re going to take a big image and turn it into a smaller image by using pooling. And, in particular, one of the most popular types of pooling is called max pooling, and what max pooling does is pool just by choosing the maximum value in a particular region. So, for example, let’s imagine I had this 4 by 4 image, but I wanted to reduce its dimensions: I wanted to make it a smaller image, so that I have fewer inputs to work with. Well, what I could do is apply a 2 by 2 max pool, where the idea would be that I’m going to first look at this 2 by 2 region and say, what is the maximum value in that region? Well, it’s the number 50, so we’ll go ahead and just use the number 50. Then we’ll look at this 2 by 2 region: what is the maximum value here? It’s 110, so that’s going to be my value. Likewise here, the maximum value looks like 20, so go ahead and put that there. Then, for this last region, the maximum value was 40, so we’ll go ahead and use that. And what I have now is a smaller representation of this same original image, which I obtained just by picking the maximum value from each of these regions. So again, the advantage here is that now I only have to deal with a 2 by 2 input instead of a 4 by 4, and you can imagine shrinking the size of an image even more. But in addition to that, I’m now able to make my analysis independent of whether a particular value was in this pixel or that pixel: I don’t care if the 50 was here or here; as long as it was generally in this region, I’ll still get access to that value. So it makes our algorithms a little bit more robust as well. So that, then, is pooling: taking the image and reducing its size a little bit by sampling from particular regions inside of the image.
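As a quick sketch of that 2 by 2 max pool (my own NumPy illustration; the input values are hypothetical, chosen so the pooled results match the 50, 110, 20, and 40 quoted above):

    import numpy as np

    # Hypothetical 4x4 image whose 2x2 regions have maxima 50, 110, 20, and 40.
    image = np.array([[10,  50, 100, 110],
                      [20,  30,  90,  80],
                      [10,  20,  40,  30],
                      [ 5,  15,  35,  40]])

    # 2x2 max pooling: take the maximum value from each non-overlapping 2x2 region.
    pooled = image.reshape(2, 2, 2, 2).max(axis=(1, 3))
    print(pooled)  # [[ 50 110] [ 20  40]]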
And now we can put all of these ideas together, pooling, image convolution, and neural networks, all into another type of neural network called a convolutional neural network, or CNN, which is a neural network that uses this convolution step, usually in the context of analyzing an image, for example. And so the way a convolutional neural network works is that we start with some sort of input image, some grid of pixels. But rather than immediately put that into the neural network layers that we’ve seen before, we’ll start by applying a convolution step, where the convolution step involves applying some number of different image filters to our original image in order to get what we call a feature map, the result of applying some filter to an image. And we could do this once, but in general we’ll do it multiple times, getting a whole bunch of different feature maps, each of which might extract some different relevant feature out of the image, some different important characteristic of the image that we might care about using in order to calculate what the result should be. And in the same way that we can train neural networks to learn the weights between particular units, we can also train them to learn what those filters should be, what the values of the filters should be, in order to get the most useful, most relevant information out of the original image, just by figuring out which setting of those filter values, the values inside of that kernel, results in minimizing the loss function, minimizing how poorly our hypothesis performs in figuring out the classification of a particular image, for example. So we first apply this convolution step and get a whole bunch of these various different feature maps. But these feature maps are quite large; there are a lot of pixel values here. And so a logical next step is a pooling step, where we reduce the size of these images by using max pooling, for example, extracting the maximum value from any particular region. There are other pooling methods that exist as well, depending on the situation: you could use something like average pooling, where instead of taking the maximum value from a region, you take the average value from the region, which has its uses as well. But, in effect, what pooling will do is take these feature maps and reduce their dimensions, so that we end up with smaller grids with fewer pixels. And this, then, is going to be easier for us to deal with: it’s going to mean fewer inputs that we have to worry about, and it’s also going to mean that we’re more resilient, more robust, against potential movements of particular values by just one pixel, when ultimately we really don’t care about those one-pixel differences that might arise in the original image. And now, after we’ve done this pooling step, we have a whole bunch of values that we can then flatten out and just put into a more traditional neural network. So we go ahead and flatten it, and we end up with a traditional neural network that has one input for each of the values in each of these resulting feature maps, after we’ve done the convolution and the pooling steps. And so this, then, is the general structure of a convolutional network: we begin with the image, apply convolution, apply pooling, flatten the results, and then put that into a more traditional neural network that might itself have hidden layers.
You can have deep convolutional networks that have hidden layers in between this flattened layer and the eventual output, able to calculate various different features of those values. And this, then, can help us use convolution and pooling, use our knowledge about the structure of an image, to get better results and to train our networks faster, in order to better capture particular parts of the image. And there’s no reason, necessarily, why you can only use these steps once; in fact, in practice, you’ll often use convolution and pooling multiple times, in multiple different steps. So what you might imagine doing is starting with an image, first applying convolution to get a whole bunch of maps, then applying pooling, then applying convolution again, because these maps are still pretty big: you can apply convolution to try to extract relevant features out of that result, then take those results and apply pooling to reduce their dimensions, and then take that and feed it into a neural network that maybe has fewer inputs. So here I have two different convolution and pooling steps: I do convolution and pooling once, and then I do convolution and pooling a second time, each time extracting useful features from the layer before it, each time using pooling to reduce the dimensions of what we’re ultimately looking at. And the goal of this sort of model is that, in each of these steps, you can begin to learn different types of features of the original image. Maybe in the first step you learn very low-level features, just looking for things like edges and curves and shapes, because, based on pixels and their neighboring values, you can figure out, all right, what are the edges, what are the curves, what are the various different shapes that might be present there. But then, once you have a mapping that just represents where the edges and curves and shapes happen to be, you can imagine applying the same sort of process again to begin to look for higher-level features: look for objects, maybe look for people’s eyes in facial recognition, for example, or maybe look for more complex shapes, like the curves on a particular number if you’re trying to recognize a digit in a handwriting recognition sort of scenario. And then, after all of that, now that you have these results that represent these higher-level features, you can pass them into a neural network, which is really just a deep neural network that looks like this, where you might imagine making a binary classification, or classifying into multiple categories, or performing various different tasks on this sort of model. So convolutional neural networks can be quite powerful and quite popular when it comes to trying to analyze images. We don’t strictly need them; we could have just used a vanilla neural network that operates with layer after layer, as we’ve seen before.
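As a minimal sketch of that stacked structure (mine, not the lecture's; the filter count in the second stage is an illustrative assumption), two rounds of convolution and pooling written as Keras layers might look like this:

    import tensorflow as tf

    # Two convolution + pooling stages before the traditional, flattened network.
    # The 28x28x1 input shape matches the MNIST images discussed below; the filter
    # counts (32, then 64) are illustrative assumptions.
    model = tf.keras.models.Sequential([
        tf.keras.layers.Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),
        tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
        tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),  # convolve the maps again
        tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),         # and pool them again
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax")
    ])
    model.summary()  # shows how each stage shrinks the feature maps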
But these convolutional neural networks can be quite helpful, in particular, because of the way they model the way a human might look at an image: instead of looking at every single pixel simultaneously and trying to process all of them at once, what convolution is really doing is looking at various different regions of the image and extracting relevant information and features out of those parts of the image, the same way that a human might have visual receptors looking at particular parts of what they see, combining them to figure out what meaning they can draw from all of those various different inputs. And so you might imagine applying this to a situation like handwriting recognition. So we’ll go ahead and see an example of that now, where I’ll go ahead and open up handwriting.py. Again, what we do here is first import TensorFlow. And TensorFlow, it turns out, has a few data sets built into the library that you can just immediately access, and one of the most famous data sets in machine learning is the MNIST data set, which is just a data set of a whole bunch of samples of people’s handwritten digits; I showed you a slide of that a little while ago. And what we can do is just immediately access that data set, which is built into the library, so that if I want to do something like train on a whole bunch of handwritten digits, I can just use the data set that is provided to me. Of course, if I had my own data set of handwritten images, I could apply the same idea: I’d first just need to take those images and turn them into arrays of pixels, because that’s the way these are going to be formatted. They’re going to be formatted as, effectively, arrays of individual pixels. Now, there’s a bit of reshaping I need to do, just turning the data into a format that I can put into my convolutional neural network. So this is doing things like taking all the values and dividing them by 255: if you remember, these color values tend to range from 0 to 255, so I can divide them by 255 just to put them into the 0 to 1 range, which might be a little bit easier to train on. And then I make various other modifications to the data, just to get it into a nice, usable format. But here’s the interesting and important part: here is where I create the convolutional neural network, the CNN. Here I’m saying, go ahead and use a sequential model. And whereas before I would use model.add to say add a layer, add a layer, add a layer, another way I can define it is just by passing, as input to this sequential neural network, a list of all of the layers that I want. And so here, the very first layer in my model is a convolution layer, where I’m first going to apply convolution to my image, using 32 different filters: my model is going to learn 32 different filters on the input image, where each filter is going to be a 3 by 3 kernel. So we saw those 3 by 3 kernels before, where we could multiply each value in a 3 by 3 grid by a value and add all the results together. Here, I’m going to learn 32 of these different 3 by 3 filters. I can, again, specify my activation function, and I specify what my input shape is. My input shape in the banknotes case was just 4: I had 4 inputs. My input shape here is going to be 28, 28, 1, because of how the MNIST data set organizes its data: each image is a 28 by 28 pixel grid.
So we’re going to have a 28 by 28 pixel grid, and each one of those images only has one channel value. These handwritten digits are just black and white, so there’s just a single color value representing how much black or how much white. You might imagine that in a color image, if you were doing this sort of thing, you’d have three different channels: a red, a green, and a blue channel, for example. But in the case of handwriting recognition, recognizing a digit, we’re just going to use a single value for shaded in or not shaded in; it might range in between, but it’s just a single color value. And that, then, is the very first layer of our neural network: a convolutional layer that will take the input and learn a whole bunch of different filters that we can apply to the input to extract meaningful features. The next step is going to be a max pooling layer, also built right into TensorFlow: a layer that is going to use a pool size of 2 by 2, meaning we’re going to look at 2 by 2 regions inside of the image and just extract the maximum value. Again, we’ve seen why this can be helpful: it’ll help to reduce the size of our input. And once we’ve done that, we’ll go ahead and flatten all of the units into a single layer that we can then pass into the rest of the neural network. And now, here’s the rest of the neural network. Here I’m saying, let’s add a hidden layer to my neural network with 128 units, so a whole bunch of hidden units inside of the hidden layer. And just to prevent overfitting, I can add dropout to that, to say, you know what, when you’re training, randomly drop out half of the nodes from this hidden layer, just to make sure we don’t become over-reliant on any particular node, so that we begin to really generalize and stop ourselves from overfitting. So TensorFlow allows us, just by adding a single line, to add dropout into our model as well, such that when it’s training, it will perform this dropout step in order to help make sure that we don’t overfit on this particular data. And then, finally, I add an output layer. The output layer is going to have 10 units, one for each category that I would like to classify digits into, so 0 through 9, 10 different categories. And the activation function I’m going to use here is called the softmax activation function. In short, what the softmax activation function is going to do is take the output and turn it into a probability distribution: ultimately, it’s going to tell me, what do we estimate the probability is that this is a 2, versus a 3, versus a 4, and so it will turn the output into that probability distribution for me. Next up, I’ll go ahead and compile my model and fit it on all of my training data, and then I can evaluate how well the neural network performs. And then I’ve added to my Python program that, if I’ve provided a command-line argument like the name of a file, I’m going to go ahead and save the model to a file. And this can be quite useful, too. Once you’ve done the training step, which could take some time, going through all the data and running backpropagation with gradient descent to say, all right, how should we adjust the weights of this particular model, you end up with calculated values for these weights, calculated values for these filters. You’d like to remember that information, so you can use it later. And so TensorFlow allows us to just save a model to a file, such that later, if we want to use the model we’ve learned, use the weights that we’ve learned, to make some sort of new prediction, we can just use the model that already exists. So what we’re doing here is, after we’ve done all the calculation, we go ahead and save the model to a file, such that we can use it a little bit later.
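Pulling the whole walkthrough together, a sketch of what the described handwriting.py might look like is below. The layer structure, the division by 255, the pass through the data 10 times, and the optional save step are from the narration; the exact preprocessing calls, optimizer, and loss function are my assumptions.

    import sys
    import tensorflow as tf

    # Load the MNIST handwriting data set, which is built into TensorFlow.
    mnist = tf.keras.datasets.mnist
    (x_train, y_train), (x_test, y_test) = mnist.load_data()

    # Reshape and rescale: divide the 0-255 pixel values into the 0-1 range, give
    # each image its single channel dimension, and one-hot encode the 10 digit labels.
    x_train = (x_train / 255.0).reshape(-1, 28, 28, 1)
    x_test = (x_test / 255.0).reshape(-1, 28, 28, 1)
    y_train = tf.keras.utils.to_categorical(y_train)
    y_test = tf.keras.utils.to_categorical(y_test)

    # Create a convolutional neural network, passing the list of layers directly.
    model = tf.keras.models.Sequential([
        tf.keras.layers.Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),
        tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dropout(0.5),                  # drop out half the hidden nodes
        tf.keras.layers.Dense(10, activation="softmax")
    ])

    # Train (optimizer and loss are my assumptions) and evaluate the network.
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
    model.fit(x_train, y_train, epochs=10)
    model.evaluate(x_test, y_test, verbose=2)

    # If a file name was given on the command line, save the trained model to it.
    if len(sys.argv) == 2:
        model.save(sys.argv[1])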
So, for example, if I go into digits, I’m going to run handwriting.py. I won’t save it this time; we’ll just run it and go ahead and see what happens. What will happen is that we need to go through the model in order to train on all of these samples of handwritten digits. The MNIST data set gives us thousands and thousands of sample handwritten digits, in the same format, that we can use in order to train. And so now what you’re seeing is this training process. And unlike the banknotes case, where there were far fewer data points and the data was very, very simple, here the data is more complex, and this training process takes time. And so this is another one of those cases where, when training neural networks, computational power is so important: oftentimes, you see people wanting to use sophisticated GPUs in order to more efficiently be able to do this sort of neural network training. It also speaks to the reason why more data can be helpful: the more sample data points you have, the better you can begin to do this training. So here we’re going through 60,000 different samples of handwritten digits, and I said we’re going to go through them 10 times: we’re going to go through the data set 10 times, training each time, hopefully improving upon our weights with every pass through the data set. And we can see, over here on the right, what the accuracy is each time we go ahead and run this model: the first time, it looks like we got an accuracy of about 92% of the digits correct based on this training set. We increased that to 96% or 97%, and every time we run this, we hopefully see the accuracy improve, as we continue to use gradient descent, that process of running the algorithm to minimize the loss we get, in order to more accurately predict what the output should be. And what this process is doing is learning not only the weights, but also the features to use, the kernel matrix to use, when performing that convolution step: because this is a convolutional neural network, where I’m first performing those convolutions and then doing the more traditional neural network structure, it’s going to learn all of those individual steps as well. And here we see that TensorFlow provides me with some very nice output, telling me how many seconds are left with each of these training runs, that allows me to see just how well we’re doing. So we’ll go ahead and see how this network performs. It looks like we’ve gone through the data set seven times, and we’re going through it an eighth time now. And at this point, the accuracy is pretty high: we saw we went from 92% up to 97%, and now it looks like 98%. At this point, it seems like things are starting to level out; there’s probably a limit to how accurate we can ultimately be without running the risk of overfitting. Of course, with enough nodes, you could just memorize the inputs and overfit on them, but we’d like to avoid doing that, and dropout will help us with this. But now we see we’re almost done finishing our training step: we’re at 55,000. All right, we finished training.
And now it’s going to test for us on 10,000 samples. It looks like on the testing set, we were 98.8% accurate, so we ended up doing pretty well, it seems, at predicting these handwritten digits. And so what we can do then is actually test it out. I’ve written a program called recognition.py using PyGame. If you pass it a model that’s been trained — and I pre-trained an example model using this input data — we can see whether we’ve been able to train this convolutional neural network to predict handwriting. So I can try just drawing a handwritten digit. I’ll draw the number 2, for example. So there’s my number 2. Again, this is messy. If you tried to imagine how you would write a program with just ifs and thens to do this sort of classification, it would be tricky to do so. But here I’ll press Classify, and all right, it seems I was able to correctly classify that what I drew was the number 2. I’ll reset it and try it again; we’ll draw an 8, for example. So here is an 8. Press Classify, and all right, it predicts that the digit I drew was an 8. And the key here is that this really begins to show the power of what the neural network is doing: somehow looking at various features of these different pixels, figuring out which features are relevant, and figuring out how to combine them to get a classification. This would be a difficult task to give the computer explicit instructions for — to use a whole bunch of ifs and thens to process all these pixel values and figure out what the handwritten digit is. Everyone’s going to draw their 8s a little bit differently; if I drew the 8 again, it would look a little bit different. And yet, ideally, we want to train a network that’s robust enough to begin to learn these patterns on its own. All I said was: here is the structure of the network, and here is the data on which to train it. The network’s learning algorithm then just tries to figure out the optimal set of weights and the optimal set of filters to use in order to accurately classify a digit into one category or another. It just goes to show the power of these sorts of convolutional neural networks. And so that was a look at how we can use convolutional neural networks to begin to solve problems in computer vision: the ability to take an image and begin to analyze it. This is the type of analysis you might imagine happening in self-driving cars, which are able to figure out what filters to apply to an image to understand what it is the computer is looking at, or the same type of idea that might be applied to facial recognition in social media, determining how to recognize faces in an image as well. You can imagine a neural network that, instead of classifying into one of 10 different digits, could instead classify whether this is person A or person B, trying to tell those people apart just based on convolution. And so now what we’ll take a look at is yet another type of neural network that can be quite popular for certain types of tasks. But to do so, we’ll try to generalize and think about our neural network a little bit more abstractly.
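Before moving on, here is a minimal sketch of the core step a front end like recognition.py has to perform once a trained model has been saved. The filename "model.h5" and the `pixels` array are placeholders; in the real program the pixels would come from the PyGame drawing surface:

```python
import numpy as np
import tensorflow as tf

# Reload the trained model saved earlier (placeholder filename).
model = tf.keras.models.load_model("model.h5")

# Stand-in for a user's drawing: a 28x28 grid of shading values in [0, 1].
pixels = np.zeros((28, 28))

# The model outputs a probability distribution over the 10 digit classes;
# the predicted digit is the class with the highest probability.
distribution = model.predict(pixels.reshape(1, 28, 28, 1))
print("Predicted digit:", int(distribution.argmax()))
```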
Here we have a sample deep neural network, where we have this input layer, a whole bunch of different hidden layers performing certain types of calculations, and then an output layer that generates some output we care about calculating. But we could imagine representing this a little more simply, like this: a more abstract representation of our neural network. We have some input, which might be a vector of a whole bunch of different values. That gets passed into a network that performs some sort of calculation or computation, and that network produces some output. That output might be a single value, or it might be a whole bunch of different values. But this is the general structure of the neural network we’ve seen: some input gets fed into the network, and using that input, the network calculates what the output should be. This sort of model is what we might call a feed-forward neural network. Feed-forward neural networks have connections in only one direction: they move from one layer to the next layer to the layer after that, such that the inputs pass through various hidden layers and then ultimately produce some sort of output. Feed-forward neural networks were very helpful for solving the types of classification problems we saw before: we have a whole bunch of input, and we want to learn what setting of weights will allow us to calculate the output effectively. But there are some limitations on feed-forward neural networks that we’ll see in a moment. In particular, the input needs to be of a fixed shape — a fixed number of neurons in the input layer — and there’s a fixed shape for the output, a fixed number of neurons in the output layer. A possible solution to this — and we’ll see examples of the types of problems we can solve with it in just a second — is that instead of just a feed-forward neural network, where there are connections in only one direction, from left to right effectively, we could also imagine a recurrent neural network, which generates output that gets fed back into itself as input for future runs of that network. Whereas in a traditional neural network, inputs get fed into the network and the network produces output, and the only thing that determines the output is the original input and the calculation we do inside the network, in a recurrent neural network you can imagine the output of the network feeding back into the network again as input, the next time you do the calculations inside the network. What this allows is for the network to maintain some sort of state — to store some information that can be used on future runs of the network. Previously, the network just defined some weights; we passed inputs through, and it generated outputs, but the network wasn’t saving any information from those inputs to remember for future runs. What a recurrent neural network lets us do is let the network store information that gets passed back in as input the next time we try to perform some sort of action. And this is particularly helpful when dealing with sequences of data.
So we’ll see a real-world example of this right now, actually. Microsoft has developed an AI known as CaptionBot. And what CaptionBot says is: I can understand the content of any photograph, and I’ll try to describe it as well as any human. I’ll analyze your photo, but I won’t store it or share it. So what Microsoft’s CaptionBot claims to do is take an image, figure out what’s in the image, and give us a caption to describe it. So let’s try it out. Here, for example, is an image of Harvard Square: some people walking in front of one of the buildings at Harvard Square. I’ll take the URL for that image, paste it into CaptionBot, and press Go. CaptionBot analyzes the image, and then it says: I think it’s a group of people walking in front of a building. Which seems amazing — the AI is able to look at this image and figure out what’s in it. And the important thing to recognize here is that this is no longer just a classification task. We saw being able to classify images with a convolutional neural network, where the job was to take the image and figure out: is it a 0 or a 1 or a 2, or is it this person’s face or that person’s face? What seems to be happening here is that the input is an image — and we know how to get networks to take images as input — but the output is text. It’s a sentence, a phrase like "a group of people walking in front of a building." And this would seem to pose a challenge for our more traditional feed-forward neural networks, because in traditional neural networks, we just have a fixed-size input and a fixed-size output: a certain number of neurons in the input to our neural network, a certain number of outputs, and some calculation that goes on in between. The number of values in the input and the number of values in the output are always fixed by the structure of the neural network. And that makes it difficult to imagine how a neural network could take an image like this and say it’s a group of people walking in front of a building, because the output is text — a sequence of words. Now, it might be possible for a neural network to output one word; one word you could represent as a vector of values, and you can imagine ways of doing that. Next time, we’ll talk a little bit more about AI as it relates to language and language processing. But a sequence of words is much more challenging, because depending on the image, the output might be a different number of words. We could have sequences of different lengths, and somehow we still want to be able to generate the appropriate output. And so the strategy here is to use a recurrent neural network, a neural network that can feed its own output back into itself as input for the next time. This allows us to do what we call a one-to-many relationship from inputs to outputs. Vanilla, more traditional neural networks are what we might consider one-to-one: you pass in one set of values as input, and you get one vector of values as output. But in this case, we want to pass in one value as input — the image — and get out a sequence, many values, as output, where each value is one of the words produced by this particular algorithm.
And so the way we might do this is to start by providing the input — the image — to our neural network. The neural network is going to generate output, but the output is not going to be the whole sequence of words, because we can’t represent a whole sequence of words using just a fixed set of neurons. Instead, the output is just going to be the first word: we train the network to output what the first word of the caption should be. And you could imagine that Microsoft has trained this by running a whole bunch of training samples through the AI — giving it a whole bunch of pictures and what the appropriate caption was — and having the AI learn from that. But now, because the network generates output that can be fed back into itself, you can imagine the output of the network being fed back into the same network. This here looks like a separate network, but it’s really the same network just getting different input: the network’s output gets fed back into itself, and it generates another output, which is the second word in the caption. And the network can keep going, generating output that gets fed back into itself to produce yet another word, and another, and so on. So recurrent neural networks allow us to represent this one-to-many structure: you provide one image as input, and the neural network can pass data into the next run of the network, again and again, such that you can run the network multiple times, each time generating a different output still based on that original input. And this is where recurrent neural networks become particularly useful: dealing with sequences of inputs or outputs. My output is a sequence of words, and since I can’t very easily represent outputting an entire sequence of words at once, I’ll instead output that sequence one word at a time, by allowing my network to pass information about what still needs to be said about the photo into the next run of the network. So you run the network multiple times — the same network with the same weights — just with different input each time: first, input from the image, and then input from the network itself, carrying information about what still needs to be added to this particular caption. So this, then, is a one-to-many relationship inside a recurrent neural network. But it turns out there are other models we can use, other ways to use recurrent neural networks to represent data that might come in other forms as well. We saw how we could use neural networks to analyze images in the context of convolutional neural networks, which take an image, figure out various properties of the image, and draw some sort of conclusion based on that. But you might imagine that somewhere like YouTube, they need to do a lot of learning based on video: they need to look through videos to detect copyright violations, or to identify what particular items appear inside a video, for example. And video, you might imagine, is much more difficult to feed as input to a neural network, because whereas with an image you can just treat each pixel as a different value, videos are sequences — sequences of images — and each sequence might be of a different length.
And so it might be challenging to represent that entire video as a single vector of values that you could pass into a neural network. Here, too, recurrent neural networks can be a valuable solution for this type of problem. Instead of passing a single input into our neural network, we could pass in the input one frame at a time: first take the first frame of the video and pass it into the network, maybe without having the network output anything at all yet. Then let it take in another input and pass that into the network — but this time, the network also gets information from the last time we provided an input. Then we pass in a third input, and a fourth, where each time the network gets the most recent input, like each frame of the video, plus information the network processed from all of the previous iterations. So on frame number four, it gets the input for frame number four plus information the network has calculated from the first three frames. Using all of that data combined, this recurrent neural network can begin to learn how to extract patterns from a sequence of data as well. And so you might imagine: if you want to classify a video into a number of different genres — an educational video, or a music video, or different types of videos — that’s a classification task, where you want to take as input each of the frames of the video and output something like which category it happens to belong to. And you can imagine doing this sort of many-to-one learning any time your input is a sequence. Input is a sequence in the context of video, but it could also be in the context of text: if someone has typed a message and you want to categorize it — say, taking a movie review and classifying it as a positive review or a negative review — that input is a sequence of words, and the output is a classification, positive or negative. There, too, a recurrent neural network might be helpful for analyzing sequences of words, and they’re quite popular when it comes to dealing with language. They could even be used for spoken language: spoken language is an audio waveform that can be segmented into distinct chunks, and each of those could be passed in as an input to a recurrent neural network, to classify someone’s voice, for instance. If you want to do voice recognition and say whether this is one person or another, here too you might want this many-to-one architecture for a recurrent neural network (a minimal sketch of such a network appears just after this paragraph). And then, as one final problem to look at in terms of what we can do with these sorts of networks, imagine what Google Translate is doing. It’s taking some text written in one language and converting it into text written in some other language, where now the input is a sequence of data — a sequence of words — and the output is a sequence of words as well. So here we effectively want a many-to-many relationship: our input is a sequence, and our output is a sequence as well. And it’s not quite going to work to just take each word in the input and translate it into a word in the output, because ultimately, different languages put their words in different orders, and maybe one language uses two words for something, whereas another language only uses one.
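Here is that many-to-one sketch in Keras: the input is a sequence of word indices (say, a movie review), and the output is a single classification. The vocabulary size and layer widths are illustrative assumptions:

```python
import tensorflow as tf

# Many-to-one recurrent network: a sequence in, one classification out.
# Vocabulary size (10,000) and layer widths (64) are assumed for illustration.
model = tf.keras.models.Sequential([
    # Map each word index in the sequence to a vector of values.
    tf.keras.layers.Embedding(input_dim=10000, output_dim=64),
    # The recurrent layer reads the sequence one step at a time, carrying
    # information forward from each step to the next.
    tf.keras.layers.SimpleRNN(64),
    # A single output unit: the probability that the review is positive.
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

# A toy sequence of word indices, just to show the model runs end to end.
example = tf.constant([[12, 47, 3051, 8]])
print(model(example))  # one value: the estimated probability of "positive"
```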
So we really want some way to take this input, encode it somehow, and use that encoding to generate what the output ultimately should be. And this has been one of the big advancements in automated translation technology: the ability to use neural networks to do this instead of older, more traditional methods, which has improved accuracy dramatically. The way you might imagine doing this is, again, using a recurrent neural network with multiple inputs and multiple outputs. We start by passing in all the input: one input — one word — goes into the network, then another, and we do this multiple times, once for each word in the input I’m trying to translate. Only after all of that is done does the network start to generate output: the first word of the translated sentence, then the next word, and so on, where each time the network passes information to itself — some state carried from one run of the network to the next — first assembling information about all the inputs, and then passing along information about which part of the output to generate next. There are a number of different types of these sorts of recurrent neural networks. One of the most popular is known as the long short-term memory neural network, otherwise known as the LSTM. But in general, these types of networks can be very, very powerful whenever we’re dealing with sequences, whether those are sequences of images or, especially, sequences of words when it comes to dealing with natural language. And so those were just some of the different types of neural networks that can be used to do all sorts of different computations. These are incredibly versatile tools that can be applied to a number of different domains. We only looked at a couple of the most popular types of neural networks: more traditional feed-forward neural networks, convolutional neural networks, and recurrent neural networks. But there are other types as well. There are adversarial networks, where networks compete with each other to try to generate new types of data, as well as other networks that can solve other tasks based on what they happen to be structured and adapted for. And these are very powerful tools in machine learning: from being able to easily learn from some set of input data, and therefore figure out how to calculate some function from inputs to outputs — whether it’s input to some sort of classification, like analyzing an image and getting a digit, or machine translation, where the input is in one language and the output is in another — these tools have a lot of applications for machine learning more generally. Next time, we’ll look at machine learning and AI in particular in the context of natural language. We talked a little bit about this today, but we’ll look at how our AI can begin to understand natural language and can begin to analyze and do useful tasks with regard to human language, which turns out to be a challenging and interesting task. So we’ll see you next time. And welcome back, everybody, to our final class in an introduction to artificial intelligence with Python. Now, so far in this class, we’ve been taking problems that we want to solve intelligently and framing them in ways that computers are going to be able to make sense of.
We’ve been taking problems and framing them as search problems, or constraint satisfaction problems, or optimization problems, for example. In essence, we have been trying to communicate about problems in ways that our computer is going to be able to understand. Today, the goal is going to be to get computers to understand the way you and I communicate naturally, via our own natural languages — languages like English. But natural language contains a lot of nuance and complexity that’s going to make it challenging for computers to understand. So we’ll need to explore some new tools and some new techniques to allow computers to make sense of natural language. So what is it exactly that we’re trying to get computers to do? Well, these tasks all fall under the general heading of natural language processing: getting computers to work with natural language. They include tasks like automatic summarization: given a long text, can we train the computer to come up with a shorter representation of it? Information extraction: getting the computer to pull relevant facts or details out of some text. Machine translation, like Google Translate: translating some text from one language into another language. Question answering: if you’ve ever asked a question to your phone or had a conversation with an AI chatbot, you provide some text to the computer, and the computer is able to understand that text and then generate some text in response. Text classification: we provide some text to the computer, and the computer assigns it a label — positive or negative, inbox or spam, for example. And there are several other kinds of tasks that all fall under this heading of natural language processing. But before we take a look at how the computer might try to solve these kinds of tasks, it might be useful for us to think about language in general. What are the kinds of challenges we might need to deal with as we start to think about language and getting a computer to understand it? So one part of language that we’ll need to consider is the syntax of language. Syntax is all about the structure of language. Language is composed of individual words, and those words are composed together into some kind of structured whole. And if our computer is going to be able to understand language, it’s going to need to understand something about that structure. So let’s take a couple of examples. Here, for instance, is a sentence: "Just before 9 o’clock, Sherlock Holmes stepped briskly into the room." That sentence is made up of words, and those words together form a structured whole. This is syntactically valid as a sentence. But we could take some of those same words, rearrange them, and come up with a sentence that is not syntactically valid. For example, "just before Sherlock Holmes 9 o’clock stepped briskly the room" is still composed of valid words, but they don’t form any kind of logical whole. This is not a syntactically well-formed sentence. Another interesting challenge is that some sentences will have multiple possible valid structures. Here’s a sentence, for example: "I saw the man on the mountain with a telescope." This is a valid sentence, but it actually has two different possible structures that lend themselves to two different interpretations and two different meanings. Maybe I, the one doing the seeing, am the one with the telescope. Or maybe the man on the mountain is the one with the telescope. And so natural language is ambiguous.
Sometimes the same sentence can be interpreted in multiple ways, and that’s something we’ll need to think about as well. And this lends itself to another problem within language that we’ll need to think about, which is semantics. While syntax is all about the structure of language, semantics is about the meaning of language. It’s not enough for a computer just to know that a sentence is well-structured if it doesn’t know what that sentence means. And so semantics is going to concern itself with the meaning of words and the meaning of sentences. If we go back to that same sentence as before — "just before 9 o’clock, Sherlock Holmes stepped briskly into the room" — I could come up with another sentence, say, "a few minutes before 9, Sherlock Holmes walked quickly into the room." Those are two different sentences, with some of the words the same and some of the words different, but the two sentences have essentially the same meaning. And so ideally, whatever model we build will be able to understand that these two sentences, while different, mean something very similar. Some syntactically well-formed sentences don’t mean anything at all. A famous example from linguist Noam Chomsky is the sentence "colorless green ideas sleep furiously." This is a syntactically, structurally well-formed sentence: we’ve got adjectives modifying a noun, ideas; we’ve got a verb and an adverb in the correct positions. But when taken as a whole, the sentence doesn’t really mean anything. And so if our computers are going to work with natural language and perform tasks in natural language processing, these are some concerns we’ll need to think about: we’ll need to be thinking about syntax, and we’ll need to be thinking about semantics. So how could we go about trying to teach a computer to understand the structure of natural language? Well, one approach we might take is to start by thinking about the rules of natural language. Our natural languages have rules. In English, for example, nouns tend to come before verbs, and nouns can be modified by adjectives. And so if only we could formalize those rules, then we could give those rules to a computer, and the computer would be able to make sense of them and understand them. So let’s try to do exactly that. We’re going to define a formal grammar, where a formal grammar is some system of rules for generating sentences in a language. This is going to be a rule-based approach to natural language processing: we give the computer some rules that we know about language, and have the computer use those rules to make sense of the structure of language. And there are a number of different types of formal grammars, each with slightly different use cases. But today, we’re going to focus specifically on one kind of grammar known as a context-free grammar. So how does a context-free grammar work? Well, here is a sentence that we might want a computer to generate: "she saw the city." We’re going to call each of these words a terminal symbol — terminal because once our computer has generated the word, there’s nothing else for it to generate; once it’s generated the whole sentence, the computer is done. We’re going to associate each of these terminal symbols with a non-terminal symbol that generates it. So here we’ve got N, which stands for noun, like "she" or "city." We’ve got V as a non-terminal symbol, which stands for verb. And then we have D, which stands for determiner.
A determiner is a word like "the" or "a" or "an" in English, for example. So each of these non-terminal symbols can generate the terminal symbols that we ultimately care about generating. But how do we know — or how does the computer know — which non-terminal symbols are associated with which terminal symbols? Well, for that, we need some kind of rule. Here are some of what we call rewriting rules, which have a non-terminal symbol on the left-hand side of an arrow; on the right side is what that non-terminal symbol can be replaced with. So here we’re saying the non-terminal symbol N — again, which stands for noun — can be replaced by any of these options, separated by vertical bars: N can be replaced by "she" or "city" or "car" or "Harry." D, for determiner, can be replaced by "the" or "a" or "an," and so forth. Each of these non-terminal symbols can be replaced by any of these words. We can also have non-terminal symbols that are replaced by other non-terminal symbols. Here is an interesting rule: NP → N | D N. So what does that mean? Well, NP stands for a noun phrase. Sometimes a noun phrase in a sentence is not just a single word; it can be multiple words. So here we’re saying a noun phrase could be just a noun, or it could be a determiner followed by a noun. We might have a noun phrase that’s just a noun, like "she" — that’s a noun phrase. Or we could have a noun phrase that’s multiple words: something like "the city" also acts as a noun phrase, but in this case it’s composed of two words, a determiner, "the," and a noun, "city." We can do the same for verb phrases. A verb phrase, or VP, might be just a verb, or it might be a verb followed by a noun phrase. So we could have a verb phrase that’s just a single word, like "walked," or a verb phrase that’s an entire phrase, something like "saw the city." A sentence, meanwhile, we might then define as a noun phrase followed by a verb phrase. And so this would allow us to generate a sentence like "she saw the city": an entire sentence made up of a noun phrase, which is just the word "she," and then a verb phrase, "saw the city" — "saw," which is a verb, and "the city," which is itself also a noun phrase. And so if we can give these rules to a computer, explaining what non-terminal symbols can be replaced by what other symbols, then a computer can take a sentence and begin to understand its structure. So let’s take a look at an example of how we might do that. To do so, we’re going to use a Python library called NLTK, or the Natural Language Toolkit, which we’ll see a couple of times today. It contains a lot of helpful features and functions that we can use for dealing with and processing natural language. So here we’ll take a look at how we can use NLTK to parse a context-free grammar. Let’s go ahead and open up cfg0.py — cfg standing for context-free grammar. What you’ll see in this file is that I first import NLTK, the Natural Language Toolkit. The first thing I do is define a context-free grammar, saying that a sentence is a noun phrase followed by a verb phrase, defining what a noun phrase is, defining what a verb phrase is, and then giving some examples of what I can do with these non-terminal symbols: D for determiner, N for noun, and V for verb. We’re going to use NLTK to parse that grammar. Then we’ll ask the user for some input in the form of a sentence and split it into words.
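A sketch of what a cfg0.py along these lines might look like — the grammar here matches the rules just described, though the exact file contents may differ slightly:

```python
import nltk

# Context-free grammar: rewriting rules, with a non-terminal symbol on the
# left of each arrow and its possible replacements, separated by vertical
# bars, on the right.
grammar = nltk.CFG.fromstring("""
    S -> NP VP

    NP -> N | D N
    VP -> V | V NP

    D -> "the" | "a" | "an"
    N -> "she" | "city" | "car" | "Harry"
    V -> "saw" | "walked"
""")

parser = nltk.ChartParser(grammar)

# Ask the user for a sentence, split it into words, and print each
# syntax tree the grammar allows for that sentence.
sentence = input("Sentence: ").split()
try:
    for tree in parser.parse(sentence):
        tree.pretty_print()
except ValueError:
    print("No parse tree possible.")
```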
And then we’ll use this context-free grammar parser to try to parse that sentence and print out the resulting syntax tree. So let’s take a look at an example. We’ll go into my cfg directory and run cfg0.py. Here I’m asked to type in a sentence. Let’s say I type in "she walked." When I do that, I see that "she walked" is a valid sentence, where "she" is a noun phrase, and "walked" is the corresponding verb phrase. I could try this with a more complex sentence, too — something like "she saw the city." And here we see that "she" is the noun phrase, and "saw the city" is the entire verb phrase that makes up this sentence. So that was a very simple grammar. Let’s take a look at a slightly more complex one. Here is cfg1.py, where a sentence is still a noun phrase followed by a verb phrase, but I’ve added some other possible non-terminal symbols, too: AP for adjective phrase and PP for prepositional phrase. And we’ve specified that we could have an adjective phrase before a noun phrase, or a prepositional phrase after a noun, for example. So there are lots of additional ways we might structure a sentence, and interpret and parse one of those resulting sentences. So let’s see that one in action. We’ll run cfg1.py with this new grammar, and we’ll try a sentence like "she saw the wide street." Here, Python’s NLTK is able to parse that sentence and identify that "she saw the wide street" has this particular structure: a sentence with a noun phrase and a verb phrase, where that verb phrase contains a noun phrase that itself contains an adjective. And so it’s able to get some sense of what the structure of this language actually is. Let’s try another example: "she saw the dog with the binoculars." And here, we get one possible syntax tree for "she saw the dog with the binoculars." But notice that this sentence is actually a little bit ambiguous in our own natural language. Who has the binoculars? Is it she who has the binoculars, or the dog? And NLTK is able to identify both possible structures for the sentence. In this case, "the dog with the binoculars" is an entire noun phrase — it’s all underneath this NP here — so it’s the dog that has the binoculars. But we also get an alternative parse tree, where "the dog" is just the noun phrase, and "with the binoculars" is a prepositional phrase modifying "saw": she saw the dog, and she used the binoculars in order to see the dog. So this allows us to get a sense of the structure of natural language. But it relies on us writing all of these rules, and it would take a lot of effort to write all of the rules for any possible sentence that someone might write or say in the English language. Language is complicated, and as a result, there are going to be some very complex rules. So what else might we try? We might try to take a statistical lens toward this problem of natural language processing. If we were able to give the computer a lot of existing data of sentences written in the English language, what could we try to learn from that data? Well, it might be difficult to interpret long pieces of text all at once, so instead, what we might want to do is break up that longer text into smaller pieces of information. In particular, we might try to create n-grams out of a longer sequence of text. An n-gram is just some contiguous sequence of n items from a sample of text.
It might be n characters in a row or n words in a row, for example. So let’s take a passage from Sherlock Holmes and look for all of the trigrams. A trigram is an n-gram where n is equal to 3, so in this case, we’re looking for sequences of three words in a row. The trigrams here would be phrases like "how often have" — that’s three words in a row. "Often have I" is another trigram. "Have I said," "I said to," "said to you," "to you that": these are all trigrams, sequences of three words that appear in sequence. And if we could give the computer a large corpus of text and have it pull out all of the trigrams, it could get a sense of which sequences of three words tend to appear next to each other in our own natural language and, as a result, get some sense of what the structure of the language actually is. So let’s take a look at an example of that. How can we use NLTK to get at information about n-grams? Here, we’re going to open up ngrams.py. This is a Python program that’s going to load a corpus of data — just some text files — into our computer’s memory. Then we’re going to use NLTK’s ngrams function, which goes through the corpus of text, pulling out all of the n-grams for a particular value of n. And then, using Python’s Counter class, we’ll figure out the most common n-grams in this entire corpus of text. We’re going to need a data set to do this, and I’ve prepared a data set of some of the stories of Sherlock Holmes: just a bunch of text files, with a lot of words to analyze. As a result, we’ll get a sense of which sequences of two or three words tend to be most common in natural language. So let’s give this a try. We’ll go into my ngrams directory and run ngrams.py. We’ll try an n value of 2 — looking for sequences of two words in a row — and we’ll use our corpus of stories from Sherlock Holmes. When we run this program, we get a list of the most common n-grams where n is equal to 2, otherwise known as bigrams. The most common one is "of the": a sequence of two words that appears quite frequently in natural language. Then "in the," and "it was." These are all common sequences of two words that appear in a row. Let’s instead try running ngrams with n equal to 3 — all of the trigrams — and see what we get. Now we see the most common trigrams are "it was a," "one of the," and "I think that": all sequences of three words that appear quite frequently. And we were able to do this essentially via a process known as tokenization. Tokenization is the process of splitting a sequence of characters into pieces. In this case, we’re splitting a long sequence of text into individual words and then looking at sequences of those words to get a sense of the structure of natural language. So once we’ve done the tokenization, once we’ve built up our corpus of n-grams, what can we do with that information? One thing we might try is building a Markov chain, which you might recall from when we talked about probability. Recall that a Markov chain is some sequence of values where we can predict one value based on the values that came before it. And as a result, if we know all of the common n-grams in the English language — which words tend to be associated with which other words in sequence — we can use that to predict what word might come next in a sequence of words.
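A sketch of the ngrams.py idea. Here `corpus` is a stand-in string, where the program described above reads in a whole directory of Sherlock Holmes text files:

```python
from collections import Counter

import nltk

# nltk.download("punkt") may be required once, for the tokenizer.

# Stand-in corpus; the real program loads a directory of text files.
corpus = ("How often have I said to you that when you have eliminated "
          "the impossible, whatever remains must be the truth?")
n = 3  # trigrams

# Tokenization: split the text into individual words.
tokens = nltk.word_tokenize(corpus.lower())

# Pull out all of the n-grams and count how often each one occurs.
counts = Counter(nltk.ngrams(tokens, n))

for ngram, count in counts.most_common(10):
    print(f"{count}: {ngram}")
```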
And so we could build a Markov chain for language in order to generate natural language that follows the same statistical patterns as some input data. So let’s take a look at that and build a Markov chain for natural language. As input, I’m going to use the works of William Shakespeare. Here I have a file, shakespeare.txt, which is just a bunch of the works of William Shakespeare — a long text file, so plenty of data to analyze. And here in generator.py, I’m using a third-party Python library to do this analysis. We’re going to read in the sample of text, train a Markov model based on that text, and then have the Markov chain generate some sentences: sentences that don’t appear in the original text, but that follow the same statistical patterns, generated based on the n-grams, trying to predict what word is likely to come next based on those patterns. So we’ll go into our markov directory and run this generator with the works of William Shakespeare as input. And what we get are five new sentences, where these sentences are not necessarily sentences from the original input text itself, but just follow the same statistical patterns: predicting what word is likely to come next based on the input data we’ve seen and the types of words that tend to appear in sequence there. And so we’re able to generate these sentences. Of course, so far there’s no guarantee that any of the generated sentences actually mean anything or make any sense; they just happen to follow the statistical patterns that our computer is already aware of — a sketch of what such a generator might look like follows this paragraph. We’ll return to this issue of how to generate text in a more accurate or more meaningful way a little bit later. So let’s now turn our attention to a slightly different problem: the problem of text classification. Text classification is the problem where we have some text and we want to put that text into some kind of category — to apply some sort of label to that text. And this kind of problem shows up in a wide variety of places. A common place might be your email inbox: you get an email, and you want your computer to be able to identify whether the email belongs in your inbox or whether it should be filtered out into spam. So we need to classify the text: is it a good email, or is it spam? Another common use case is sentiment analysis. We might want to know whether the sentiment of some text is positive or negative. This comes up in situations like product reviews, where we might have a bunch of reviews for a product on some website: "My grandson loved it! So much fun!" "Product broke after a few days." "One of the best games I’ve played in a long time." "Kind of cheap and flimsy, not worth it." Here are some example sentences that you might see on a product review website. And you and I could pretty easily look at this list of product reviews and decide which ones are positive and which ones are negative: the first one and the third one seem like positive sentiment messages, but the second one and the fourth one seem like negative sentiment messages. But how did we know that? And how could we train a computer to figure that out as well? Well, you might have zeroed in on particular key words, where those particular words tend to mean something positive or negative.
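The generator described above uses an unnamed third-party library; markovify is one Python library that works exactly this way, so this sketch, built on that assumption, shows the idea:

```python
import markovify  # third-party library: pip install markovify

# Read in a sample of text (a stand-in for the shakespeare.txt described
# above) and train a Markov model on it.
with open("shakespeare.txt") as f:
    text = f.read()

model = markovify.Text(text)

# Generate five novel sentences that follow the corpus's statistical
# patterns; make_sentence can return None if no sentence can be built.
for _ in range(5):
    print(model.make_sentence())
    print()
```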
So you might have identified that words like "loved" and "fun" and "best" tend to be associated with positive messages, and words like "broke" and "cheap" and "flimsy" tend to be associated with negative messages. So if only we could train a computer to learn which words tend to be associated with positive versus negative messages, then maybe we could train a computer to do this kind of sentiment analysis as well. So we’re going to try to do just that. We’re going to use a model known as the bag-of-words model, which is a model that represents text as just an unordered collection of words. For the purpose of this model, we’re not going to worry about the sequence and ordering of the words — which word came first, second, or third. We’re just going to treat the text as a collection of words in no particular order. And we’re losing information there, right? The order of words is important, and we’ll come back to that a little bit later. But for now, to simplify our model, it helps tremendously just to think about text as some unordered collection of words. And in particular, we’re going to use the bag-of-words model to build something known as a naive Bayes classifier. So what is a naive Bayes classifier? Well, it’s a tool that allows us to classify text based on Bayes’ rule, which, again, you might remember from when we talked about probability. Bayes’ rule says that the probability of B given A is equal to the probability of A given B, multiplied by the probability of B, divided by the probability of A. So how are we going to use this rule to analyze text? Well, what are we interested in? We’re interested in the probability that a message has a positive sentiment and the probability that a message has a negative sentiment, which, for simplicity, I’m going to represent here with emoji: a happy face and a frowny face for positive and negative sentiment. And so if I had a review — something like "my grandson loved it" — then what I’m interested in is not just the probability that a message has positive sentiment, but the conditional probability that a message has positive sentiment given that this is the message: "my grandson loved it." But how do I go about calculating this value — the probability that the message is positive, given that the review is this sequence of words? Well, here’s where the bag-of-words model comes in. Rather than treat this review as an ordered sequence of words, we’re just going to treat it as an unordered collection of words. We’re going to try to calculate the probability that the review is positive, given that all of these words — "my," "grandson," "loved," "it" — are in the review, in no particular order: just this unordered collection of words. And this is a conditional probability, which we can then apply Bayes’ rule to in order to make sense of it. According to Bayes’ rule, this conditional probability is equal to the probability that all four of these words are in the review, given that the review is positive, multiplied by the probability that the review is positive, divided by the probability that all of these words happen to be in the review. So this is the value we’re now going to try to calculate. Now, one thing you might notice is that the denominator — the probability that all of these words appear in the review — doesn’t actually depend on whether we’re looking at the positive sentiment case or the negative sentiment case. So we can actually get rid of the denominator; we don’t need to calculate it.
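In symbols, the move we just made looks like this:

```latex
P(\text{positive} \mid \text{"my", "grandson", "loved", "it"})
  = \frac{P(\text{"my", "grandson", "loved", "it"} \mid \text{positive})
          \cdot P(\text{positive})}
         {P(\text{"my", "grandson", "loved", "it"})}
  \propto P(\text{"my", "grandson", "loved", "it"} \mid \text{positive})
          \cdot P(\text{positive})
```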
We can just say that this probability is proportional to the numerator, and then at the end, we’re going to need to normalize the probability distribution to make sure that all of the values sum up to the value 1. So now, how do we calculate this value? Well, this is the probability of all of these words given positive, times the probability of positive. And that, by the definition of joint probability, is just one big joint probability: the probability that all of these things are the case — that it’s a positive review and that all four of these words are in the review. But still, it’s not entirely obvious how we calculate that value. And here is where we need to make one more assumption. This is where the naive part of naive Bayes comes in: we’re going to make the assumption that all of the words are independent of each other. By that, I mean that if the word "grandson" is in the review, that doesn’t change the probability that the word "loved" is in the review, or that the word "it" is in the review, for example. And in practice, this assumption might not be true: it’s almost certainly the case that the probabilities of words do depend on each other. But it’s going to simplify our analysis and still give us reasonably good results just to assume that the words are independent of each other and depend only on whether the review is positive or negative. You might, for example, expect the word "loved" to appear more often in a positive review than in a negative review. So what does that mean? Well, if we make this assumption, then we can say that this value — the probability we’re interested in — is not directly proportional to, but naively proportional to, this value: the probability that the review is positive, times the probability that "my" is in the review given that it’s positive, times the probability that "grandson" is in the review given that it’s positive, and so on for the other two words that happen to be in this review. And now this value, which looks a little more complex, is actually a value we can calculate pretty easily. So how are we going to estimate the probability that the review is positive? Well, if we have some training data — some example reviews where each one has already been labeled as positive or negative — then we can estimate the probability that a review is positive just by counting the number of positive samples and dividing by the total number of samples in our training data. And for the conditional probabilities — the probability of "loved," given that it’s positive — that’s going to be the number of positive samples with "loved" in them, divided by the total number of positive samples. So let’s take a look at an actual example to see how we could try to calculate these values. Here I’ve put together some sample data. The way to interpret it is that, based on the training data, 49% of the reviews are positive and 51% are negative. And then over here in this table, we have some conditional probabilities: if the review is positive, there is a 30% chance that "my" appears in it, and if the review is negative, there is a 20% chance that "my" appears in it. And based on our training data, among the positive reviews, 1% of them contain the word "grandson," while among the negative reviews, 2% contain it. So using this data, let’s try to calculate the value we’re interested in. And to do that, we’ll need to multiply all of these values together.
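Here is that multiply-and-normalize step as a quick calculation. The sample data above only quotes the conditional probabilities for "my" and "grandson"; the values used for "loved" and "it" below are assumptions for illustration, chosen so the normalized result matches the figures quoted in the next paragraph:

```python
# P(positive) times P(word | positive) for each of the four words.
# The 0.32 and 0.30 values for "loved" and "it" are assumed; the
# transcript only quotes the first two conditional probabilities.
p_positive = 0.49 * 0.30 * 0.01 * 0.32 * 0.30

# P(negative) times P(word | negative) for each word (0.08 and 0.40
# for "loved" and "it" are likewise assumptions).
p_negative = 0.51 * 0.20 * 0.02 * 0.08 * 0.40

# These two values don't sum to 1, so normalize them into a distribution.
total = p_positive + p_negative
print(f"positive: {p_positive / total:.2f}")  # 0.68
print(f"negative: {p_negative / total:.2f}")  # 0.32
```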
The probability of positive, and then all of these positive conditional probabilities — when we multiply those together, we get some value. Then we can do the same thing for the negative case: take the probability that it’s negative, multiply it by all of the negative conditional probabilities, and we get some other value. And now, these values don’t sum to 1; they’re not a probability distribution yet. But I can normalize them and get some values, and that tells me what we’re going to predict for "my grandson loved it": we think there’s a 68% chance — probability 0.68 — that it’s a positive sentiment review, and probability 0.32 that it’s a negative review. So what problems might we run into here? What could potentially go wrong when doing this kind of analysis to decide whether text has a positive or negative sentiment? Well, a couple of problems might arise. One problem might be: what if the word "grandson" never appears in any of the positive reviews? If that were the case, then when we try to calculate the probability that the review is positive, we’re going to multiply all these values together and just get 0 for the positive case, because ultimately we’re going to multiply by that 0 value. And so we’re going to say that we think there is no chance the review is positive, because it contains the word "grandson," and in our training data, we’ve never seen the word "grandson" appear in a positive sentiment message before. And that’s probably not the right analysis, because in cases of rare words, it may well happen that nowhere in our training data did we ever see the word appear in a message with positive sentiment. So what can we do to solve this problem? Well, one thing we’ll often do is some kind of additive smoothing, where we add some value alpha to each value in our distribution just to smooth out the data a little bit. And a common form of this is Laplace smoothing, where we add 1 to each value in our distribution: in essence, we pretend we’ve seen each value one more time than we actually have. So if we’ve never seen the word "grandson" in a positive review, we pretend we’ve seen it once; if we’ve seen it once, we pretend we’ve seen it twice — just to avoid the possibility that we might multiply by 0 and, as a result, get some results we don’t want in our analysis. So let’s see what this looks like in practice, and try doing some naive Bayes classification to classify text as either positive or negative. We’ll take a look at sentiment.py. What this is going to do is load some sample data into memory — some examples of positive reviews and negative reviews — and then train a naive Bayes classifier on all of this training data: all of the words we see in positive reviews and all of the words we see in negative reviews. Then we’re going to try to classify some input. And we’re going to do this based on a corpus of data. I have some example positive reviews — "it was great," "so much fun," for example — and then some negative reviews: "not worth it," "kind of cheap." So now let’s try to run this classifier and see how it classifies particular text as either positive or negative. We’ll run our sentiment analysis on this corpus, and we need to provide it with a review. So I’ll say something like, "I enjoyed it."
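While we wait for the verdict, here is a rough sketch of the kind of classifier a sentiment.py like this might implement, using NLTK’s built-in naive Bayes classifier. The tiny in-line corpus and the feature-extraction helper are stand-ins for the course’s actual files:

```python
import nltk

def extract_features(document, vocabulary):
    # Bag-of-words features: for each known word, is it in the document?
    words = set(document.lower().split())
    return {word: (word in words) for word in vocabulary}

# Stand-ins for the positive and negative review files described above.
positives = ["it was great", "so much fun"]
negatives = ["not worth it", "kind of cheap"]

# The vocabulary is every word seen in any training review.
vocabulary = set(w for doc in positives + negatives for w in doc.split())

# Label each training review and train the naive Bayes classifier.
training = (
    [(extract_features(doc, vocabulary), "Positive") for doc in positives] +
    [(extract_features(doc, vocabulary), "Negative") for doc in negatives]
)
classifier = nltk.NaiveBayesClassifier.train(training)

# Classify a new review and print the probability of each label.
result = classifier.prob_classify(extract_features("I enjoyed it", vocabulary))
for label in result.samples():
    print(label, f"{result.prob(label):.4f}")
```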
And we see that the classifier says there is about a 0.92 probability that this particular review is positive. Let’s try something negative: "kind of overpriced." And we see that there is now about a 0.96 probability that this particular review is negative. And so our naive Bayes classifier has learned what kinds of words tend to appear in positive reviews and what kinds of words tend to appear in negative reviews, and as a result, we’ve been able to design a classifier that can predict whether a particular review is positive or negative. So this definitely is a useful tool that we can use to make some predictions, but we had to make some assumptions to get there. So what if we want to build some more sophisticated models — use some tools from machine learning to take better advantage of language data, draw more accurate conclusions, and solve new kinds of tasks and problems? Well, we’ve seen a couple of times now that when we want to put some data in a form the computer is going to be able to make sense of, it can be helpful to turn that data into numbers, ultimately. And so what we might want to try is to come up with some word representation: some way to take a word and translate its meaning into numbers. Because, for example, if we wanted to use a neural network to process language — give our language to a neural network and have it make some predictions or perform some analysis — well, a neural network takes as its input, and produces as its output, a vector of values, a vector of numbers. So we might want to take our words and somehow convert them into some kind of numeric representation. How might we do that — take words and turn them into numbers? Let’s take a look at an example. Here’s a sentence: "he wrote a book." And let’s say I wanted to take each of those words and turn it into a vector of values. Here’s one way I might do that. We’ll say "he" is going to be a vector that has a 1 in the first position, and the rest of the values are 0. "Wrote" will have a 1 in the second position, with the rest of the values 0. "A" has a 1 in the third position, with the rest of the values 0. And "book" has a 1 in the fourth position, with the rest of the values 0. So each of these words now has a distinct vector representation. This is what we often call a one-hot representation: a representation of the meaning of a word as a vector with a single 1, where all of the rest of the values are 0. And so with this, we now have a numeric representation for every word, and we could pass those vector representations into a neural network or other models that require some kind of numeric data as input. But this one-hot representation actually has a couple of problems, and it’s not ideal for a few reasons. One reason: here, we’re just looking at four words, but if you imagine a vocabulary of thousands of words or more, these vectors are going to get quite long in order to have a distinct vector for every possible word in the vocabulary. And as a result, these longer vectors are going to be more difficult to deal with, more difficult to train, and so forth. So that might be a problem. Another problem is a little bit more subtle.
If we want to represent a word as a vector — and, in particular, the meaning of a word as a vector — then ideally it should be the case that words with similar meanings also have similar vector representations, so that they’re close to each other inside a vector space. But that’s not really going to be the case with these one-hot representations, because if we take some similar words — say, the word "wrote" and the word "authored," which mean similar things — they have entirely different vector representations. Likewise "book" and "novel": those two words mean somewhat similar things, but they have entirely different vector representations, because they each have a 1 in some different position. And so that’s not ideal either. So what we might be interested in instead is some kind of distributed representation. A distributed representation is a representation of the meaning of a word distributed across multiple values, instead of just being one-hot, with a 1 in one position. Here is what a distributed representation of words might look like: each word is associated with some vector of values, with the meaning distributed across multiple values, ideally in such a way that similar words have similar vector representations. But how are we going to come up with those values? Where do those values come from? How can we define the meaning of a word in this distributed sequence of numbers? Well, to do that, we’re going to draw inspiration from a quote by British linguist J.R. Firth, who said: you shall know a word by the company it keeps. In other words, we’re going to define the meaning of a word based on the words that appear around it — the context words around it. Take, for example, this context: "for ___ he ate." You might wonder what words could reasonably fill in that blank. Well, it might be words like "breakfast" or "lunch" or "dinner" — all of those could reasonably fill in that blank. And so what we’re going to say is: because the words "breakfast" and "lunch" and "dinner" appear in a similar context, they must have a similar meaning. And that’s something our computer could understand and try to learn. A computer could look at a big corpus of text, look at which words tend to appear in similar contexts to each other, and use that to identify which words have a similar meaning and should therefore appear close to each other inside a vector space. And so one common model for doing this is known as the word2vec model. It’s a model for generating word vectors — a vector representation for every word — by looking at data, at the contexts in which a word appears. The idea is going to be this: if you start out with all of the words in some random position in space and train on some training data, what the word2vec model will do is start to learn which words appear in similar contexts, and it will move these vectors around in such a way that, hopefully, words with similar meanings — breakfast, lunch, and dinner; book, memoir, novel — will end up near each other as vectors as well. So let’s now take a look at what word2vec might look like in practice, when implemented in code. What I have here inside of words.txt is a pre-trained model, where each of these words has some vector representation trained by word2vec: some sequence of values representing its meaning, hopefully in such a way that similar words are represented by similar vectors.
I also have this file vectors.py, which is going to load the words and form them into a dictionary. We also define some useful functions, like distance, to get the distance between two word vectors, and closest_words, to find which words are nearby in the sense of having close vectors. So let's give this a try. We'll go ahead and open a Python interpreter, and I'm going to import these vectors. We might say, all right, what is the vector representation of the word "book"? And we get this big, long vector that represents "book" as a sequence of values. This sequence of values by itself is not all that meaningful; it's meaningful in the context of comparing it to the vectors for other words. So we could use this distance function, which gets us the distance between two word vectors. We might ask, what is the distance between the vector representation of "book" and the vector representation of "novel"? And we see that it's 0.34. You can roughly interpret 0 as being really close together and 1 as being very far apart. Now, what is the distance between "book" and, let's say, "breakfast"? Well, "book" and "breakfast" are more different from each other than "book" and "novel" are, so I would hope the distance is larger. And in fact, it's approximately 0.64; those two words are further away from each other. What about the distance between "lunch" and "breakfast"? That's about 0.2. Those are even closer together; they have meanings that are closer to each other. Another interesting thing we might do is calculate the closest words. We might ask, what are the 10 closest words, according to word2vec, to the word "book"? What are the 10 closest vectors to the vector representation of "book"? When we perform that analysis, we get this list of words. The closest one is "book" itself, but we also have "books," plural, and then "essay," "memoir," "essays," "novella," "anthology," and so on. All of these words mean something similar to "book," according to word2vec at least, because they have similar vector representations. So it seems we've done a pretty good job of capturing this kind of vector representation of word meaning. One other interesting side effect of word2vec is that it's also able to capture something about the relationships between words. Let's take a look at an example. Here are two words, "man" and "king," each represented by word2vec as a vector. What might happen if I subtracted one from the other, calculating the value king minus man? Well, that will be the vector that takes us from "man" to "king," somehow representing the relationship between the vector for "man" and the vector for "king." That's what the value king minus man represents. So what would happen if I took the vector representation of the word "woman" and added that same value, king minus man, to it? What would be the closest word to the result? Well, we can try it. Let's go back to our Python interpreter and give this a try. I can ask, what is the closest word to the vector representation of "king," minus the representation of "man," plus the representation of "woman"? And we see that the closest word is the word "queen." We've somehow been able to capture the relationship between "king" and "man."
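The course's actual vectors.py isn't reproduced here, but a plausible sketch of helpers like the distance and closest_words functions used above, assuming a file format of one word per line followed by its values, might look like this:

```python
import math

def load_words(filename):
    """Parse lines of the form 'word v1 v2 ...' into a dictionary.
    (The exact format of words.txt is an assumption.)"""
    words = {}
    with open(filename) as f:
        for line in f:
            parts = line.split()
            words[parts[0]] = [float(x) for x in parts[1:]]
    return words

def distance(v1, v2):
    """Cosine distance: roughly 0 for very similar vectors, near 1 for unrelated ones."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    return 1 - dot / (norm1 * norm2)

def closest_words(words, vector, n=10):
    """Return the n words whose vectors are closest to the given vector."""
    return sorted(words, key=lambda w: distance(words[w], vector))[:n]
```

With helpers like these, the analogy becomes one line: find the closest word to [k - m + w for k, m, w in zip(words["king"], words["man"], words["woman"])].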
And then, when we apply it to the word "woman," we get as the result the word "queen." So word2vec has been able to capture not just the words and how they're similar to each other, but also something about the relationships between words and how those words are connected to each other. So now that we have this vector representation of words, what can we do with it? We can represent words as numbers, so we might try to pass those words as input to, say, a neural network. Neural networks, we've seen, are very powerful tools for identifying patterns and making predictions. Recall that you can think of a neural network as all of these units, but really what the neural network is doing is taking some input, passing it into the network, and producing some output. And by providing the neural network with training data, we're able to update the weights inside the network, so that the neural network can do a more accurate job of translating inputs into outputs. Now that we can represent words as numbers that could be that input or output, you could imagine passing a word in as input to a neural network and getting a word as output. So when might that be useful? One common use for neural networks is machine translation, when we want to translate text from one language into another, say from English into French, by passing English into the neural network and getting some French as output. You might imagine, for instance, that we could take the English word for "lamp," pass it into the neural network, and get the French word for "lamp" as output. But in practice, when we're translating text from one language to another, we're usually not interested in translating just a single word; we want to translate a sequence, say a sentence or a paragraph. Here, for example, is another passage, again taken from Sherlock Holmes, written in English. What I might want to do is take that entire passage, pass it into the neural network, and get as output a French translation of the same passage. But recall that a neural network's input and output need to be of some fixed size, and a sentence is not of fixed size; it's variable. You might have shorter sentences, and you might have longer sentences. So somehow we need to solve the problem of translating a sequence into another sequence by means of a neural network. And that's going to be true not only for machine translation, but also for other problems, like question answering. If I pass as input a question, something like "What is the capital of Massachusetts?", and feed that into the neural network, I would hope that what I get as output is a sentence like "The capital is Boston," again translating some sequence into some other sequence. And if you've ever had a conversation with an AI chatbot, or ever asked your phone a question, it needs to do something like this: understand the sequence of words that you, the human, provided as input, and then generate some sequence of words as output. So how can we do this? Well, one tool we can use is the recurrent neural network, which we took a look at last time, which is a way for us to provide a sequence of values to a neural network by running the neural network multiple times. And each time we run the neural network, we're going to keep track of some hidden state.
And that hidden state is going to be passed from one run of the neural network to the next, keeping track of all of the relevant information. So let's take a look at how we can apply that to something like this. In particular, we're going to look at an architecture known as the encoder-decoder architecture, where we're going to encode this question into some kind of hidden state, and then use a decoder to decode that hidden state into the output we're interested in. So what's that going to look like? We'll start with the first word, the word "what." That goes into our neural network, and it's going to produce some hidden state: some information about the word "what" that our neural network needs to keep track of. Then, when the second word comes along, we're going to feed it into that same encoder neural network, but it's going to get that hidden state as input as well. So we pass in the second word along with the hidden state, and that produces a new hidden state. Then, when we get to the third word, "the," that goes into the encoder. It also gets access to the hidden state, and it produces a new hidden state that gets passed into the next run, when we use the word "capital." And the same thing repeats for the other words that appear in the input. So "of Massachusetts": that produces one final piece of hidden state. Now, somehow, we need to signal the fact that we're done, that there's nothing left in the input. We typically do this by passing some kind of special token, say an end token, into the neural network. And now the decoding process starts. We generate the word "the." But in addition to generating the word "the," this decoder network is also going to generate some kind of hidden state. So what happens the next time? Well, to generate the next word, it might be helpful to know what the first word was, so we might pass the first word, "the," back into the decoder network. It gets this hidden state as input and generates the next word, "capital," which also comes with some hidden state. We'll repeat that, passing "capital" into the network to generate the third word, "is," and then one more time to get the fourth word, "Boston." At that point, we're done. But how do we know we're done? Usually, we'll do this one more time: pass "Boston" into the decoder network and get as output some end token, to indicate that that is the end of our output. So this is how we could use a recurrent neural network to take some input, encode it into some hidden state, and then use that hidden state to decode it into the output we're interested in. To visualize it in a slightly different way: we have some input sequence, just some sequence of words. That input sequence goes into the encoder, which in this case is a recurrent neural network generating these hidden states along the way, until we generate some final hidden state, at which point we start the decoding process. Again using a recurrent neural network, we generate the output sequence as well. So we've got the encoder, which is encoding the information about the input sequence into this hidden state, and the decoder, which takes that hidden state and uses it to generate the output sequence. But there are some problems. For many years, this was the state of the art.
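Before looking at those problems, here is a minimal numpy sketch of the encoder loop just described, with randomly initialized weights standing in for learned ones. The dimensions and the tanh update rule are illustrative choices, not the course's own model:

```python
import numpy as np

rng = np.random.default_rng(0)
EMBED, HIDDEN = 8, 16

W_in = rng.normal(size=(HIDDEN, EMBED))   # input-to-hidden weights
W_h = rng.normal(size=(HIDDEN, HIDDEN))   # hidden-to-hidden weights

def encode(word_vectors):
    """Run the encoder once per word, threading the hidden state through."""
    h = np.zeros(HIDDEN)
    for x in word_vectors:
        h = np.tanh(W_in @ x + W_h @ h)   # new state depends on word and old state
    return h

# Stand-in vectors for "what is the capital of Massachusetts"
sequence = [rng.normal(size=EMBED) for _ in range(6)]
final_state = encode(sequence)
print(final_state.shape)  # the entire input, packed into one vector
```

The decoder mirrors this loop, except that at each step it also emits an output word and stops once it emits the end token.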
The recurrent neural network, and variants on this approach, were some of the best ways we knew to perform tasks in natural language processing. But there are some problems that we might want to deal with, and that have been dealt with over the years, to improve upon this kind of model. One problem you might notice happens in the encoder stage. We've taken this input sequence, this sequence of words, and encoded it all into one final piece of hidden state, and that final piece of hidden state needs to contain all of the information from the input sequence that we need in order to generate the output sequence. While that's possible, it becomes increasingly difficult as the sequence gets larger and larger. For larger and larger input sequences, it's going to become more and more difficult to store all of the information we need about the input inside this single piece of hidden state. That's a lot of information to pack into just a single value. It might be useful for us, when generating output, to refer not just to this one value, but to all of the previous hidden values that have been generated by the encoder. That might be useful, but how could we do it? We've got a lot of different values, and we need to combine them somehow. You could imagine adding them together, or taking their average, for example. But doing that would assume that all of these pieces of hidden state are equally important, and that's not necessarily true. Some of these pieces of hidden state are going to be more important than others, depending on which word they most closely correspond to. This piece of hidden state very closely corresponds to the first word of the input sequence, this one to the second word, and so on, and some of those are going to be more important than others. To make matters more complicated, depending on which word of the output sequence we're generating, different input words might be more or less important. So what we really want is some way to decide for ourselves which of the input values are worth paying attention to, and at what point in time. This is the key idea behind a mechanism known as attention. Attention is all about letting us decide which values are important to pay attention to when generating, in this case, the next word in our sequence. Let's take a look at an example. Here's a sentence: "What is the capital of Massachusetts?" Same sentence as before. And let's imagine we were trying to answer that question by generating tokens of output. What would the output look like? Something like "The capital is..." And let's say we're now trying to generate this last word here. What is that last word? How is the computer going to figure it out? Well, it needs to decide which values to pay attention to. The attention mechanism allows us to calculate some attention scores, some value corresponding to each word, determining how relevant it is for us to pay attention to that word right now. In this case, when generating the fourth word of the output sequence, the most important words to pay attention to might be "capital" and "Massachusetts," for example; those words are going to be particularly relevant. And there are a number of different mechanisms that have been used to calculate these attention scores.
It could be something as simple as a dot product, to see how similar two vectors are, or we could train an entire neural network to calculate these attention scores. But the key idea is that during the training process for our neural network, we're going to learn how to calculate them. Our model is going to learn what is important to pay attention to in order to decide what the next word should be. So the result of calculating these attention scores is that we get some value for each input word, determining how important it is for us to pay attention to that particular word. And recall that each of these input words is also associated with one of these hidden state context vectors, capturing information about the sentence up to that point, but primarily focused on that word in particular. So what we can now do, if we have all of these vectors along with values representing how important each one is to pay attention to, is take a weighted average. We take all of these vectors, multiply them by their attention scores, and add them up to get some new vector value, which represents the context from the input, but specifically paying attention to the words we think are most important. Once we've done that, that context vector can be fed into our decoder in order to decide that the next word should be, in this case, "Boston." So attention is this very powerful tool that allows us, for any word we're trying to decode, to decide which words from the input we should pay attention to in order to determine what's important for generating the next word of the output. One of the first places this was really used was in the field of machine translation. Here's a diagram from the paper that introduced this idea, which was focused on translating English sentences into French sentences. We have an input English sentence along the top and, along the left side, the output French equivalent of that same sentence. What you see in all of these squares are the attention scores visualized, where a lighter square indicates a higher attention score. What you'll notice is that there's a strong correspondence between each French word and the equivalent English word: the French word for "agreement" is really paying attention to the English word "agreement" in order to decide what French word should be generated at that point in time. And sometimes you might pay attention to multiple words: if you look at the French word for "economic," that's primarily paying attention to the English word "economic," but it's also paying attention to the English word "European" in this case, too. So attention scores are very easy to visualize, to get a sense for what our machine learning model is really paying attention to: what information it is using to determine what's important and what's not, in order to determine what the ultimate output token should be. When we combine the attention mechanism with a recurrent neural network, we can get very powerful and useful results, where we're able to generate an output sequence by paying attention to the input sequence, too. But there are other problems with this approach of using a recurrent neural network as well. In particular, notice that every run of the neural network depends on the output of the previous step.
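Before turning to that problem, here is a minimal sketch of the dot-product attention just described, with random values standing in for real hidden states. Score each encoder state against the decoder's current state, turn the scores into weights with a softmax, and take the weighted average:

```python
import numpy as np

rng = np.random.default_rng(1)
HIDDEN, SEQ_LEN = 16, 6

encoder_states = rng.normal(size=(SEQ_LEN, HIDDEN))  # one per input word
decoder_state = rng.normal(size=HIDDEN)              # current decoding step

scores = encoder_states @ decoder_state              # dot-product attention scores
weights = np.exp(scores) / np.exp(scores).sum()      # softmax: weights sum to 1
context = weights @ encoder_states                   # attention-weighted average

print(weights.round(3))  # how much each input word is attended to
print(context.shape)     # the context vector fed to the decoder
```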
And that was important for getting a sense of the sequence of words and their ordering, but it means we can't run one unit of the neural network until after we've calculated the hidden state from the run before it, on the previous input token. And what that means is that it's very difficult to parallelize this process. As input sequences get longer and longer, we might want to use parallelism to speed up this process of training the neural network and making sense of all of this language data. But it's difficult, and slow, to do that with a recurrent neural network, because all of it needs to be performed in sequence. And that's become an increasing challenge as we've started to get larger and larger language models. The more language data we have available to train our machine learning models, the more accurate they can be, the better representation of language they can have, the better understanding they can have, and the better results we can see. And so we've seen this growth of large language models that use larger and larger data sets, but as a result, they take longer and longer to train. So this problem, that recurrent neural networks are not easy to parallelize, has become an increasing problem, and it was one of the main motivations for a different architecture for thinking about how to deal with natural language, known as the transformer architecture. This has been a significant milestone in the world of natural language processing, really increasing both how well we can perform these kinds of natural language processing tasks and how quickly we can train a machine learning model to produce effective results. There are a number of different types of transformers in terms of how they work, but what we're going to take a look at here is a basic architecture for how one might work with a transformer, to get a sense for what's involved and what we're doing. Let's start with the model we were looking at before, specifically the encoder part of our encoder-decoder architecture, where we used a recurrent neural network to take the input sequence and capture all of the information about the hidden state that we need to know about that input sequence. Right now, it all needs to happen in this linear progression. What the transformer is going to allow us to do is process each of the words independently, in a way that's easy to parallelize, rather than have each word wait for some other word. Each word is going to go through the same neural network and produce some kind of encoded representation of that particular input word, and all of this is going to happen in parallel. Now, it's happening for all of the words at once, but we're just going to focus on what's happening for one word, to make it clear; know that whatever you see happen for this one word is going to happen for all of the other input words, too. So what's going on here? We start with some input word. That input word goes into the neural network, and the output is, hopefully, some encoded representation of the input word: the information we need to know about the input word that's going to be relevant to us as we're generating the output. And because we're doing this for each word independently, it's easy to parallelize. We don't have to wait for the previous word before we run this word through the neural network.
But what did we lose in this process of trying to parallelize the whole thing? Well, we've lost all notion of word ordering. The order of words is important: the sentence "Sherlock Holmes gave the book to Watson" has a different meaning than "Watson gave the book to Sherlock Holmes." So we want to keep track of that information about word position. In the recurrent neural network, that happened for us automatically, because we ran each word one at a time through the neural network, got the hidden state, and passed it on to the next run of the neural network. But that's not the case here with the transformer, where each word is being processed independently of all the others. So what are we going to do to solve that problem? One thing we can do is add some kind of positional encoding to the input word. The positional encoding is some vector that represents the position of the word in the sentence: this is the first word, the second word, the third word, and so forth. We're going to add that to the input word, and the result is a vector that captures multiple pieces of information: the input word itself, as well as where in the sentence it appears. We can then pass the result of that addition, the input word plus the positional encoding, into the neural network. That way, the neural network knows the word and where it appears in the sentence and can use both of those pieces of information to determine how best to represent the meaning of that word in the encoded representation at the end. In addition to the positional encoding and this feed-forward neural network, we're also going to add one more component: a self-attention step. This is going to be attention where we're paying attention to the other input words, because the meaning or interpretation of an input word might vary depending on the other words in the input. So we're going to allow each word in the input to decide which other words in the input it should pay attention to in order to decide on its encoded representation. That allows us to get a better encoded representation for each word, because words are defined by their context: by the words around them and how they're used in that particular context. This kind of self-attention is so valuable, in fact, that oftentimes the transformer will use multiple self-attention layers at the same time, to allow the model to pay attention to multiple facets of the input simultaneously. We call this multi-headed attention, where each attention head can pay attention to something different; as a result, the network can learn to pay attention to many different parts of the input for this input word, all at the same time. And in the spirit of deep learning, these two steps, the multi-headed self-attention layer and the neural network layer, can themselves be repeated multiple times, too, in order to learn deeper patterns within the input text and ultimately get a better representation of language and more useful encoded representations of all of the input words. So this is the process a transformer might use to take an input word and get its encoded representation, and the key idea is to really rely on this attention step to get information that's useful for determining how to encode that word.
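The lecture doesn't specify which positional encoding is used, but one common choice is the sinusoidal scheme from the original transformer paper, sketched here. Each position gets a distinctive vector that is simply added to the word's embedding:

```python
import numpy as np

def positional_encoding(seq_len, dim):
    """Sinusoidal positional encodings: one row per position in the sentence."""
    positions = np.arange(seq_len)[:, None]            # 0, 1, 2, ...
    rates = 1 / 10000 ** (np.arange(0, dim, 2) / dim)  # per-dimension frequencies
    encoding = np.zeros((seq_len, dim))
    encoding[:, 0::2] = np.sin(positions * rates)      # even dimensions: sine
    encoding[:, 1::2] = np.cos(positions * rates)      # odd dimensions: cosine
    return encoding

embeddings = np.zeros((4, 8))  # stand-in embeddings for a 4-word sentence
inputs = embeddings + positional_encoding(4, 8)  # word meaning + word position
print(inputs.round(2))
```

A learned positional embedding, trained like any other weight, is another option some transformers use instead.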
And that process is going to repeat for all of the words in the input sequence. We take all of the input words, combine them with some kind of positional encoding, and feed those into these self-attention and feed-forward neural networks, in order to ultimately get these encoded representations of the words. That's the result of the encoder: we get all of these encoded representations, which will be useful to us when it comes time to decode all of this information into the output sequence we're interested in. Again, this might take place in the context of machine translation, where the output is the same sentence in a different language, or the output might be the answer to a question, in the case of an AI chatbot, for example. So now let's take a look at how that decoder is going to work. Ultimately, it's going to have a very similar structure. Any time we're trying to generate the next output word, we need to know what the previous output word is, as well as its positional encoding: where in the output sequence are we? And we're going to have the same steps: self-attention, because we might want an output word to be able to pay attention to other words in that same output, as well as a neural network, and that might itself repeat multiple times. But in this decoder, we're going to add one additional attention step: instead of self-attention, where the output word pays attention to other output words, this step is going to allow the output word to pay attention to the encoded representations. Recall that the encoder takes all of the input words and transforms them into these encoded representations, and it's going to be important for us to be able to decide which of those encoded representations we want to pay attention to when generating any particular token in the output sequence. That's what this additional attention step allows us to do. Every time we're generating a word of the output, we can pay attention to the other words in the output, because we might want to know what words we've generated previously, and we want to pay attention to some of them to decide which word comes next in the sequence. But we also care about paying attention to the input words, and we want the ability to decide which of these encoded representations of the input words are going to be relevant for generating the next step. So these two pieces combine together: we have the encoder, which takes all of the input words and produces these encoded representations, and we have the decoder, which takes the previous output word, pays attention to that encoded input, and generates the next output word. This is one possible architecture we could use for a transformer, with the key idea being these attention steps that allow words to pay attention to each other. During the training process, we can now much more easily parallelize this, because we don't have to wait for all of the words to happen in sequence, and we can learn how to perform these attention steps. The model is able to learn what is important to pay attention to in order to be more accurate at predicting the output word. And this has proved to be a tremendously effective model for conversational AI agents and for building machine translation systems.
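Both attention steps in the decoder can be expressed with the same function, differing only in what is attended to. This bare-bones sketch omits the learned query/key/value projections and the masking that a full transformer layer would include:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(queries, keys, values):
    """Scaled dot-product attention over a whole sequence at once."""
    scores = queries @ keys.T / np.sqrt(keys.shape[-1])
    return softmax(scores) @ values

rng = np.random.default_rng(2)
encoded_inputs = rng.normal(size=(5, 16))   # encoder output: one row per input word
outputs_so_far = rng.normal(size=(3, 16))   # output words generated so far

self_attn = attention(outputs_so_far, outputs_so_far, outputs_so_far)   # output attends to output
cross_attn = attention(outputs_so_far, encoded_inputs, encoded_inputs)  # output attends to input
print(self_attn.shape, cross_attn.shape)
```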
And there have been many variants proposed on this model, too. Some transformers only use an encoder; some only use a decoder; some use some other combination of these features. But the key ideas ultimately remain the same: this real focus on trying to pay attention to what is most important. And the world of natural language processing is fast growing and fast evolving. Year after year, we keep coming up with new models that allow us to do an even better job of performing these natural-language-related tasks, all in the service of solving this tricky problem, which is our own natural language. We've seen how the syntax and semantics of our language are ambiguous, introducing new challenges that we need to think about if we're going to design AI agents that can work with language effectively. So as we think about where we've been in this class, we've now looked at artificial intelligence in a wide variety of forms. We started by taking a look at search problems, where we looked at how an AI can search for solutions, play games, and find the optimal decision to make. We talked about knowledge: how an AI can represent information that it knows and use that information to generate new knowledge as well. Then we looked at what an AI can do when it's less certain, when it doesn't know things for sure and we have to represent things in terms of probability. We then took a look at optimization problems, where we saw how a lot of problems in AI can be boiled down to maximizing or minimizing some function, and we looked at strategies AI can use to do that kind of maximizing and minimizing. We then looked at the world of machine learning: learning from data in order to figure out patterns and identify how to perform a task by looking at the available training data. One of the most powerful tools there was the neural network, a sequence of units whose weights can be trained to very effectively go from input to output by learning these underlying patterns. And then today, we took a look at language itself: how we can train a computer to understand our natural language, to understand syntax and semantics, and to make sense of and generate natural language, which introduces a number of interesting problems, too. And we've really just scratched the surface of artificial intelligence. There is so much interesting research, and so many new techniques, algorithms, and ideas being introduced to try to solve these types of problems. So I hope you enjoyed this exploration into the world of artificial intelligence. A huge thanks to all of the course's teaching staff and production team for making the class possible. This was an introduction to artificial intelligence with Python.