This extensive tutorial details the creation of a complete coffee shop customer service chatbot. It begins with the core concepts of building such a bot, including prompt engineering, Retrieval Augmented Generation (RAG), and agent-based systems, before demonstrating how to implement them. The tutorial explores advanced techniques such as creating a Market Basket Analysis recommendation engine and deploying large language models (LLMs) without local GPUs using Runpod. It covers constructing a React Native mobile application, complete with user interface design based on Figma, Firebase integration, and the incorporation of the developed chatbot functionality for taking and processing customer orders.
Prompt Engineering & Recommendation Systems Study Guide
Quiz:
- What is prompt engineering? Prompt engineering is the art of crafting effective text prompts to elicit desired responses from large language models (LLMs). It involves designing input text that guides the model to generate specific, accurate, and relevant outputs.
- Explain the importance of structured output in prompt engineering. Structured output ensures that the LLM’s response adheres to a defined format (e.g., JSON), facilitating easy parsing and integration with other systems or databases. It enhances the usability of the generated content by making it predictable and machine-readable.
- Describe the “give the model time to think” approach (Chain of Thought) and its benefits. The “give the model time to think” approach, particularly Chain of Thought (CoT), encourages the LLM to reason through a problem step-by-step before providing a final answer. This method significantly improves accuracy by guiding the model through a logical thought process, leading to more reliable results.
- What is a vector embedding, and how is it used in retrieval-augmented generation (RAG) systems? A vector embedding is a numerical representation of text that captures its semantic meaning. In RAG systems, embeddings are used to compare the user’s query with a knowledge base, retrieving the most relevant information to augment the prompt and improve the quality of the LLM’s response.
- Explain the concept of confidence in Market Basket Analysis. In Market Basket Analysis, confidence measures the likelihood that a customer who purchased item A (antecedent) will also purchase item B (consequent). It helps determine the probability of a customer buying additional items based on what’s already in their cart.
- What is the significance of the lift metric in Market Basket Analysis? Lift indicates how much more likely two items are to be bought together than if they were bought randomly and independently. A lift value greater than one suggests a positive association, meaning the items are often purchased together.
- Briefly describe the Apriori algorithm. The Apriori algorithm is a Market Basket Analysis technique that identifies frequent itemsets in a transaction database. It operates bottom-up: it starts with single items that meet a minimum support threshold, then iteratively combines them into larger itemsets (for example, {latte} and {croissant} into {latte, croissant}), pruning any itemset that falls below the threshold.
- In the context of chatbots and prompt engineering, what is a “guard agent” and what role does it play? A guard agent is a component designed to filter user inputs to ensure they adhere to specific guidelines or policies. It analyzes user prompts and determines whether they are appropriate and safe to process, preventing harmful or irrelevant queries from reaching the core chatbot logic.
- What is the purpose of a classification agent in a chatbot architecture? A classification agent categorizes user inputs to determine the appropriate agent or module to handle the request. This ensures that each query is routed to the most relevant component, such as a details agent for specific information or an order-taking agent for purchase requests.
- What is an agent protocol and what is the advantage of using one in chatbot development? An agent protocol defines a standard interface for different agents within a chatbot system, ensuring they can interact seamlessly. Using a protocol allows for flexibility and scalability, making it easier to add, remove, or modify agents without disrupting the overall architecture.
Essay Questions:
- Discuss the three prompt engineering techniques mentioned in the source material.
- Explain the concept of Market Basket Analysis.
- Explain in detail how the “give the model time to think” technique works, including the Chain of Thought approach mentioned.
- Describe the role of the recommendation agent in detail.
- Explain how using a Docker system enables the portability of code.
Glossary of Key Terms:
- API (Application Programming Interface): A set of rules and specifications that software programs can follow to communicate with each other.
- JSON (JavaScript Object Notation): A lightweight data-interchange format that is easy for humans to read and write and easy for machines to parse and generate.
- LLM (Large Language Model): A type of artificial intelligence model that uses deep learning techniques to generate human-like text.
- Prompt Engineering: The process of crafting effective input prompts to elicit desired responses from large language models.
- RAG (Retrieval-Augmented Generation): An AI framework that combines a pre-trained language model with an information retrieval system to improve the accuracy and relevance of generated text.
- Vector Embedding: A numerical representation of text or data that captures its semantic meaning in a high-dimensional space.
- Market Basket Analysis: A data mining technique used to identify associations between items in a transaction database, often used for recommendation systems.
- Confidence (Market Basket Analysis): A metric indicating the likelihood that a customer who purchased item A will also purchase item B.
- Lift (Market Basket Analysis): A metric measuring how much more likely two items are to be bought together than if they were bought independently.
- Apriori Algorithm: A Market Basket Analysis algorithm that identifies frequent itemsets in a transaction database.
- Guard Agent: A component in a chatbot designed to filter user inputs and ensure they adhere to specific guidelines or policies.
- Classification Agent: A module in a chatbot that categorizes user inputs to determine the appropriate agent or module to handle the request.
- Agent Protocol: A standardized interface for different agents within a chatbot system, ensuring seamless interaction and scalability.
- Docker: A platform for developing, shipping, and running applications in containers, which are lightweight, portable, and self-sufficient.
- Expo: A framework for building cross-platform mobile apps with React Native.
- React Native: A JavaScript framework for building native mobile apps.
- CSS (Cascading Style Sheets): A style sheet language used for describing the presentation of a document written in a markup language like HTML.
- Tailwind CSS: A utility-first CSS framework for rapidly styling web applications.
- Firebase: A platform from Google for building and deploying mobile apps.
AI Coffee Shop App: A Development Tutorial
This briefing document summarizes the main themes and important ideas from the provided text, which appears to be a transcript of a practical coding tutorial on building an AI-powered coffee shop application using Python and React Native. The tutorial covers prompt engineering, recommendation engines, vector databases, conversational agents, and front-end development.
I. Prompt Engineering Techniques:
- Structured Output: The tutorial emphasizes guiding LLMs to provide structured outputs, particularly in JSON format, to facilitate easy parsing and integration with other systems.
- “You can write your output, and you can start describing it: the output should be in a structured JSON format… you are not allowed to write anything other than the JSON object… The country here is Germany and the capital is Berlin, and you can see that it’s a list with a dictionary inside it, a structured format, so I can extract it and put it in a database quite easily.”
- Structured Input: Structuring input by using titles, backticks, and clear delimiters helps the LLM process information more accurately, especially when dealing with multiple inputs.
- “I try to put some titles, and then I can also put some backticks to specify the input itself, to specify that this is the input I’m going to use… Now you have the input structured, and the LLM is less likely to forget about a country.”
- Chain of Thought (Giving Model Time to Think): Encouraging the LLM to reason through the problem step-by-step, rather than directly providing an answer, significantly improves accuracy.
- “The idea behind it is that models generate their answers word by word, so if they have to generate the final answer directly, they might get the wrong result… If you tell it to think, and to calculate the output step by step, it will have enough words behind it to be directed in the right direction.”
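As a concrete illustration of the structured-output technique, here is a minimal Python sketch. The system prompt wording is only an approximation of the tutorial's, and the model call is replaced by a hard-coded response string so the parsing step can be shown offline:

```python
import json

# Hypothetical system prompt enforcing structured JSON output, as described above.
system_prompt = (
    "You are a helpful assistant. Answer in a structured JSON format. "
    "You are not allowed to write anything other than the JSON object. "
    'Output a list of dictionaries: [{"country": ..., "capital": ...}]'
)

# In the real tutorial this string would come back from the LLM; here it is a
# stand-in response so the parsing step can be demonstrated without an API key.
llm_response = '[{"country": "Germany", "capital": "Berlin"}]'

# Because the output is structured, it parses directly into Python objects
# and could be inserted into a database.
records = json.loads(llm_response)
print(records[0]["capital"])  # Berlin
```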
II. Recommendation Engine:
- Market Basket Analysis: The tutorial implements a recommendation engine using Market Basket Analysis techniques, focusing on metrics like confidence, lift, and support.
- “Confidence is the measure of the likelihood that a customer who bought a certain item… will also buy another item or set of items, called the consequent… We can just sort by confidence in descending order, and the thing with the highest confidence is what we should recommend to users.”
- Apriori Algorithm: The Apriori algorithm is used to discover frequent itemsets and association rules from transaction data.
- “One of the Market Basket Analysis algorithms is called the Apriori algorithm, and it builds all those numbers together from the bottom up… It starts off with one item, then makes pairs like latte and croissant, then builds the itemsets again, so it builds them with a bottom-up approach.”
- Confidence, Lift, and Support:
- Confidence: Measures the likelihood of a customer buying item B if they bought item A.
- Lift: Indicates how much more likely two items are to be bought together than independently. Lift > 1 means a positive association.
- Support: Indicates the frequency of an item or itemset in the dataset. Items with low support are excluded.
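These three metrics can be computed directly from transaction data. The sketch below uses toy transactions; the item names are illustrative, not the tutorial's dataset:

```python
# Toy transactions; each set is one customer's basket.
transactions = [
    {"latte", "croissant"},
    {"latte", "croissant"},
    {"latte"},
    {"espresso", "croissant"},
    {"espresso"},
]
n = len(transactions)

def support(items):
    """Fraction of transactions containing every item in `items`."""
    return sum(items <= t for t in transactions) / n

# Confidence(latte -> croissant) = support({latte, croissant}) / support({latte})
confidence = support({"latte", "croissant"}) / support({"latte"})

# Lift = confidence / support({croissant}); > 1 means a positive association.
lift = confidence / support({"croissant"})

print(round(confidence, 3), round(lift, 3))  # 0.667 1.111
```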
III. Vector Database & Embeddings:
- Embeddings: The tutorial demonstrates using embeddings to represent text data numerically, allowing for semantic similarity searches using cosine similarity.
- “LLMs have the ability to embed text into numbers… If you subtracted car from motorcycle you would get one, and if you subtracted car from banana… you would get 44… We are going to get the closest one, so the smaller the number, the more similar those two concepts are.”
- Pinecone: Pinecone is used as a vector database for efficient storage and retrieval of embeddings.
- “Vector databases are good because when you search for an item you can search by the closest thing… The vector database itself does this for me.”
- Retrieval Augmented Generation (RAG): The tutorial implements a basic RAG system by retrieving relevant data from the vector database (product descriptions) and injecting it into the user’s prompt.
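The similarity search at the heart of this can be sketched with cosine similarity over toy vectors. Real embeddings come from an embedding model and have hundreds or thousands of dimensions; the three-dimensional vectors here are made up for illustration:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" standing in for real model output.
embeddings = {
    "car":        [0.9, 0.1, 0.0],
    "motorcycle": [0.8, 0.2, 0.1],
    "banana":     [0.0, 0.9, 0.4],
}

# Rank all words by similarity to the "car" vector, most similar first.
query = embeddings["car"]
ranked = sorted(embeddings, key=lambda w: cosine_similarity(query, embeddings[w]),
                reverse=True)
print(ranked)  # ['car', 'motorcycle', 'banana']
```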
IV. Conversational Agent Architecture:
- Modular Agent Design: The conversational agent is built with a modular architecture consisting of multiple specialized agents (Guard Agent, Classification Agent, Details Agent, Recommendation Agent, Order Taking Agent).
- Agent Protocol: A standardized protocol enables seamless integration and communication between different agents.
- “We don’t care about the agent… Whatever agent the classification chooses, I can just add the thread there and then get the response. I don’t care which agent is which; I just care that it has the get_response method.”
- Guard Agent: Filters inappropriate or out-of-scope user input.
- Classification Agent: Determines which agent is most suitable to handle the user’s request.
- Details Agent: Retrieves detailed information about products (e.g., price) from the vector database.
- Recommendation Agent: Provides product recommendations using both A priori analysis and popularity-based methods.
- Order Taking Agent: Guides the user through the order process, handles order details, and integrates with the recommendation agent.
- State Management: The Order Taking Agent manages the conversation state (e.g., current order, step number) to provide context-aware responses.
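A minimal sketch of the kind of state the Order Taking Agent might carry between turns; the field names and prices are illustrative, not the tutorial's exact schema:

```python
# Conversation state for the order-taking flow: which step we are on,
# plus the items accumulated so far.
def new_state():
    return {"step": 1, "order": []}

def add_item(state, name, price, quantity=1):
    """Record an item in the order and advance the conversation step."""
    state["order"].append({"item": name, "price": price, "quantity": quantity})
    state["step"] += 1
    return state

def order_total(state):
    return sum(i["price"] * i["quantity"] for i in state["order"])

state = new_state()
add_item(state, "Latte", 4.75)
add_item(state, "Croissant", 3.25, quantity=2)
print(order_total(state))  # 11.25
```

Persisting a small dictionary like this between turns is what lets the agent give context-aware responses such as "you already have two croissants in your order."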
V. Front-End Development (React Native):
- Expo: Expo is used to simplify React Native development.
- NativeWind: NativeWind simplifies styling within React Native using Tailwind-style utility classes.
- Firebase: Firebase is utilized for real-time database functionality (product data).
- Redux Toolkit: Redux Toolkit is used for state management (cart items).
- Navigation: Expo Router is used for navigation between different screens.
- Context API: React Context API is used to share state and functions across components (cart context).
- Asynchronous Data Fetching: useEffect and async/await are used to fetch data from Firebase.
VI. Deployment (RunPod & Docker):
- Docker: Docker is used to containerize the application for deployment, ensuring consistency and reproducibility across different environments.
- RunPod: RunPod is used to deploy the Docker container.
- API Endpoint: The application is deployed with an API endpoint to receive user requests and return responses.
VII. Key Ideas and Facts:
- Version Control: The tutorial stresses the importance of specifying exact versions of Python packages to ensure consistent results.
- “Those are the exact versions I’m using right now; it might work with other versions as well, but if you include the versions, you’re going to get the same results that I do.”
- Modular Code: The tutorial emphasizes breaking down the application into smaller, manageable modules (agents, components, functions).
- Importance of Clear Communication: The tutorial highlights the need for clear and concise prompts and instructions to guide LLMs effectively.
- Iterative Development: The tutorial demonstrates an iterative development process, where the application is built and tested incrementally.
In essence, the provided text showcases a comprehensive guide to building a modern, AI-driven application, blending backend logic with frontend design and emphasizing best practices in both development and deployment.
Prompt Engineering, RAG, and Recommendation Engines with Python
1. What are the key Python libraries used for prompt engineering and interacting with language models, and how are they installed?
The key Python libraries mentioned are:
- Pandas: For working with structured data like CSV files.
- python-dotenv: For easily reading environment variables from a .env file.
- OpenAI: For interacting with OpenAI language models.
- mlxtend: For machine learning tasks, specifically used here for Market Basket Analysis.
- Pinecone: For interacting with the Pinecone vector database.
These libraries can be installed using pip:
pip install pandas python-dotenv openai mlxtend==0.23.0 pinecone==5.3.1
Alternatively, you can create a requirements.txt file listing these libraries and their versions and then install them using:
pip install -r requirements.txt
2. What are the three prompt engineering techniques discussed, and how do they improve language model performance?
The three prompt engineering techniques discussed are:
- Structured Output: Instructing the model to output data in a structured format, such as JSON. This makes the model’s output easier to parse and use in downstream applications or databases.
- Structured Input: Organizing the user’s input into distinct sections using titles, backticks, or triple quotes. This helps the model to better understand the different parts of the input, such as instructions, variables, and requests.
- Chain of Thought: Giving the model time to “think” by prompting it to reason through the problem step-by-step before providing the final answer. This can significantly improve accuracy, especially for complex reasoning tasks.
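The three techniques can be sketched as OpenAI-style chat messages. The prompt wording below is an approximation of the tutorial's, and the model call itself is omitted so the prompt construction can be shown on its own:

```python
# Technique 1: structured output -- constrain the response format.
structured_output = "Return your answer ONLY as a JSON object, nothing else."

# Technique 3: chain of thought -- ask the model to reason before answering.
chain_of_thought = (
    "Think about your answer step by step, showing your reasoning, "
    "before writing the final result."
)

# Technique 2: structured input -- titles and backtick delimiters separate
# the instructions from the variable data.
structured_input = (
    "Instructions:\n"
    "Get the capitals of the countries below.\n"
    "```\n1. Germany\n2. France\n```"
)

messages = [
    {"role": "system", "content": structured_output + " " + chain_of_thought},
    {"role": "user", "content": structured_input},
]
print(messages[0]["role"], len(messages))  # system 2
```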
3. What is Retrieval Augmented Generation (RAG), and how does it work?
Retrieval Augmented Generation (RAG) is a technique for improving the relevance of language model responses by incorporating external knowledge. It involves the following steps:
- Embedding: Converting text data (e.g., documents, product descriptions) into numerical vector representations called embeddings.
- Vector Database: Storing these embeddings in a vector database like Pinecone.
- Retrieval: When a user makes a query, the query is also converted into an embedding. The vector database is then searched to find the embeddings that are most similar to the query embedding.
- Augmentation: The text data associated with the most similar embeddings is retrieved and added to the user’s prompt.
- Generation: The language model then uses the augmented prompt to generate a response.
This allows the model to provide more relevant and informative answers by drawing upon external knowledge.
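The retrieve-and-augment steps can be sketched end to end with a toy in-memory knowledge base standing in for Pinecone; the texts, prices, and two-dimensional embeddings are made up for illustration:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Toy knowledge base of (text, embedding) pairs. Real embeddings would come
# from an embedding model and live in a vector database such as Pinecone.
knowledge_base = [
    ("Cappuccino: espresso with steamed milk foam, $4.50", [0.9, 0.1]),
    ("Croissant: buttery French pastry, $3.25",            [0.1, 0.9]),
]

def retrieve(query_embedding, k=1):
    """Return the k most similar texts to the query embedding."""
    ranked = sorted(knowledge_base, key=lambda kb: cosine(query_embedding, kb[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]

# Pretend this vector is the embedding of "How much is a cappuccino?".
query_embedding = [0.8, 0.2]
context = retrieve(query_embedding)

# Augmentation: inject the retrieved text into the prompt before generation.
augmented_prompt = (
    "Answer using the context below.\n"
    f"Context: {context[0]}\n"
    "Question: How much is a cappuccino?"
)
print("Cappuccino" in augmented_prompt)  # True
```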
4. What is Market Basket Analysis, and how can it be used to create a recommendation engine?
Market Basket Analysis is a technique for identifying relationships between items that are frequently purchased together. In the context of a recommendation engine, it can be used to suggest items to customers based on what they have already placed in their cart or purchased in the past. Key concepts in Market Basket Analysis include:
- Antecedent: An item already present in the customer’s cart.
- Consequent: An item that is recommended based on the presence of the antecedent.
- Confidence: The probability that a customer who buys the antecedent will also buy the consequent.
- Lift: A measure of how much more likely two items are to be bought together than to be bought randomly and independently. A lift greater than 1 indicates a positive association.
- Support: The frequency with which an item or item set appears in the dataset.
Algorithms like Apriori can be used to identify frequent itemsets and generate association rules based on these metrics.
5. How does the Apriori algorithm work in the context of a recommendation engine?
The Apriori algorithm is used to discover frequent itemsets in a transaction database. It starts by identifying individual items that meet a minimum support threshold. Then, it iteratively combines these items to form larger itemsets, pruning any itemsets that do not meet the support threshold. This process continues until no more frequent itemsets can be found. The algorithm then uses these frequent itemsets to generate association rules, which can be used to make recommendations.
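For illustration, here is a teaching-sized version of that bottom-up loop in plain Python; the tutorial itself relies on a library implementation rather than hand-rolled code:

```python
def apriori(transactions, min_support):
    """Bottom-up frequent-itemset mining: start with single items, then join
    frequent k-itemsets into (k+1)-itemset candidates, pruning by support."""
    n = len(transactions)

    def sup(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: single items that meet the minimum support threshold.
    singles = {frozenset([i]) for t in transactions for i in t}
    current = {s for s in singles if sup(s) >= min_support}

    frequent, k = {}, 1
    while current:
        for s in current:
            frequent[s] = sup(s)
        # Join step: combine frequent k-itemsets into candidate (k+1)-itemsets,
        # then prune candidates below the support threshold.
        candidates = {a | b for a in current for b in current if len(a | b) == k + 1}
        current = {c for c in candidates if sup(c) >= min_support}
        k += 1
    return frequent

transactions = [{"latte", "croissant"}, {"latte", "croissant"}, {"latte"}, {"espresso"}]
freq = apriori(transactions, min_support=0.5)
print(freq[frozenset({"latte", "croissant"})])  # 0.5
```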
6. What are the different types of agents described, and what are their roles in the coffee shop application?
The different types of agents described are:
- Guard Agent: Responsible for filtering out inappropriate or irrelevant user inputs, ensuring that the conversation stays within the intended scope of the application (e.g., preventing the user from asking math questions).
- Classification Agent: Responsible for determining which agent should handle a given user input based on the content of the message (e.g., routing a question about prices to the Details Agent).
- Details Agent: Responsible for providing detailed information about menu items, such as prices or descriptions. It often utilizes a vector database to retrieve relevant information.
- Order Taking Agent: Responsible for taking customer orders, handling the conversation flow, and confirming the order details.
- Recommendations Agent: Responsible for suggesting additional items to customers based on their current order or past purchase history. It can use techniques like market basket analysis or popular recommendations.
7. What is the Agent Protocol used in this application, and why is it important?
The Agent Protocol defines a standard interface for all agents in the application. This interface typically includes a get_response function that takes the user’s messages as input and returns a response. By adhering to this protocol, the application can easily add or remove agents without modifying the core orchestration logic. This promotes modularity, maintainability, and extensibility, and provides a single entry point for calling any agent.
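In Python, such a protocol can be expressed with typing.Protocol. The concrete agents and the message shape below are assumptions for illustration; only the get_response interface comes from the source:

```python
from typing import Protocol

class Agent(Protocol):
    """Shared interface: every agent exposes get_response."""
    def get_response(self, messages: list) -> dict: ...

class DetailsAgent:
    def get_response(self, messages: list) -> dict:
        # A real agent would query the vector database; this is a stub.
        return {"role": "assistant", "content": "A latte costs $4.75."}

class RecommendationAgent:
    def get_response(self, messages: list) -> dict:
        return {"role": "assistant", "content": "A croissant goes well with that!"}

# The orchestrator relies only on the protocol, never on the concrete class,
# so agents can be added or swapped without touching this code.
def run(agent: Agent, messages: list) -> dict:
    return agent.get_response(messages)

reply = run(DetailsAgent(), [{"role": "user", "content": "How much is a latte?"}])
print(reply["content"])  # A latte costs $4.75.
```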
8. What are the main steps for deploying this application on RunPod, and what is the purpose of a Dockerfile in this process?
The main steps for deploying the application on RunPod are:
- Create a RunPod Account and Obtain API Key: This allows you to authenticate your requests to the RunPod API.
- Prepare the Application Code: Ensure that all the necessary files (Python scripts, models, data) are organized and accessible.
- Create a Dockerfile: A Dockerfile is a text file that contains instructions for building a Docker image. It specifies the base image, dependencies, and commands needed to run the application.
- Build the Docker Image: Use the docker build command to create a Docker image from the Dockerfile.
- Push the Docker Image to a Registry (e.g., Docker Hub): This allows RunPod to access the image.
- Create a RunPod Endpoint: Use the RunPod API to create a new endpoint, specifying the Docker image, resources (CPU, GPU, memory), and other configuration options.
- Test the Endpoint: Send requests to the endpoint to ensure that the application is running correctly.
The Dockerfile is crucial because it provides a consistent and reproducible way to package the application and its dependencies. This ensures that the application will run correctly on RunPod, regardless of the underlying infrastructure.
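A minimal Dockerfile along these lines might look as follows. The base image, file names, and entry script are assumptions for illustration, not the tutorial's exact setup:

```dockerfile
# Base image with Python preinstalled (version is an assumption).
FROM python:3.11-slim

WORKDIR /app

# Install pinned dependencies first so this layer is cached between builds.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application code (agents, models, data).
COPY . .

# RunPod workers start from a handler script; the file name here is assumed.
CMD ["python", "main.py"]
```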
Customer Service Chatbot for Coffee Shops
A customer service chatbot can handle various tasks to improve customer experience and drive sales. Here’s how it works:
- Order Management The chatbot can take orders and provide detailed information about menu items.
- Information Retrieval The chatbot answers questions about the coffee shop, such as its location, working hours, and menu items. It can also provide details about the ingredients of a specific item.
- Recommendation Engine The chatbot can suggest complementary products to users, improving the overall customer experience and driving sales. This is achieved through a Market Basket analysis recommendation engine, which identifies items often bought together and suggests them to customers.
- Irrelevant Conversation Blocking The chatbot is designed to filter out irrelevant conversations. A guard agent detects content not related to the coffee shop and prevents the chatbot from engaging in those topics.
- Personalized Recommendations The chatbot can provide tailored suggestions in a conversational manner, enhancing customer experience and potentially increasing sales.
- Modular Design The chatbot uses an agent-based system composed of distinct components or agents. Each agent handles a specific function, such as taking orders, providing information, filtering out irrelevant conversations, or recommending items. This modular approach allows for easy updates and improvements without affecting the entire system.
- Integration with Recommendation Engine The agent-based system can integrate with external systems like a recommendation engine, allowing agents to incorporate outputs from these resources into the conversation.
- Full-Stack Development The chatbot application includes a React Native application that connects to a Firebase database and Runpod endpoints. This setup allows for dynamic display of items, filtering by category, and real-time interaction with the chatbot.
- Availability The chatbot is available 24/7.
- Upselling The chatbot will try to upsell users based on current orders.
A chatbot can be trained with a dataset of coffee shop transactions to identify which items are popular with specific orders. This enables the chatbot to make informed recommendations and provide a seamless user experience. The use of open-source LLMs like Llama allows for full control over the chatbot, including retraining and customization for specific purposes.
Prompt Engineering Techniques for Enhanced Language Model Output
Prompt engineering techniques can enhance the output of language models, making them more accurate and structured. Here are some techniques that can be used to improve a chatbot’s responses:
- Structured Output You can format the chatbot’s output into a structured format, such as a JSON object, so that other systems can understand it and extract data from it.
- Input Structuring Structuring the input helps to separate it into different sections, such as titles and backticks, making the instructions clearer for the language model.
- Giving the Model Time to Think (Chain of Thought) This involves prompting the model to think step by step to increase the accuracy of its answers. The “Chain of Thought” technique can significantly increase accuracy by directing the language model to reason through the problem, calculating the output step by step. This method guides the language model toward the correct direction, enhancing the accuracy and structure of the output.
- Retrieval Augmented Generation (RAG) RAG helps the model output information that is not already in its memory. Relevant information is injected into the prompt so that the model can use it when generating its response. This is particularly useful when the chatbot needs to provide information about a specific coffee shop’s menu or details it was not initially trained on. The process uses embeddings to identify the most relevant data to inject into the prompt. An embedding is an array of numbers that represents a piece of text; converting text into embeddings makes it possible to perform mathematical operations that measure the similarity between different pieces of text.
- System Prompts System prompts define how the chatbot should behave, defining the overall behavior.
- Double Checking JSON Using an agent to double-check the JSON output can guarantee that the format is correct and make the code more robust. This involves having a specialized agent whose sole task is to validate and correct the JSON format, ensuring that the output is parsable and error-free.
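The double-checking idea can be approximated without a second model call: a plain validation function shows where the check sits in the pipeline. In the tutorial a dedicated fixing agent would be invoked where this sketch simply returns a fallback:

```python
import json

def ensure_json(llm_output: str, fallback: dict) -> dict:
    """Validate that an LLM's output parses as a JSON object; otherwise fall
    back to a safe default (the tutorial would call a JSON-fixing agent here)."""
    try:
        parsed = json.loads(llm_output)
        if isinstance(parsed, dict):
            return parsed
    except json.JSONDecodeError:
        pass  # Malformed output: this is where a repair step would run.
    return fallback

print(ensure_json('{"decision": "allowed"}', {"decision": "not allowed"}))
print(ensure_json('not json at all', {"decision": "not allowed"}))
```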
Recommendation Engine Training: Market Basket Analysis
A recommendation engine can be trained to provide suggestions to customers, improving their overall experience and potentially increasing sales. One type of recommendation engine is the Market Basket analysis recommendation engine.
Key aspects of recommendation engine training:
- Market Basket Analysis This statistical model identifies which items are most popular with specific orders.
- Association Rule Association refers to how likely two items are to be bought together.
- Support This refers to the popularity of a single item.
- Confidence This indicates the likelihood of buying item Y if item X is purchased.
- Lift Lift measures how much more likely two items are to be bought together than independently; a lift greater than 1 indicates a positive association, a lift of 1 indicates no association, and a lift less than 1 suggests a negative association.
- Apriori Algorithm This Market Basket Analysis algorithm builds association rules and calculates support, confidence, and lift from the bottom up, starting with single items and then combining them.
- Popularity Recommendation Engine This involves recommending the most popular items to customers who have not provided any specific order information. It can also recommend the most popular items per category.
To train a recommendation engine, a dataset with coffee shop transactions can be used. This dataset includes transaction numbers, items sold, customer information, and quantities.
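A popularity-based recommender over such a dataset reduces to counting item frequencies. The rows below are toy data in the same spirit as the tutorial's transactions (the real dataset also carries customer information and quantities):

```python
from collections import Counter

# Toy (transaction number, item) rows standing in for the real dataset.
rows = [
    (1, "Latte"), (1, "Croissant"),
    (2, "Latte"),
    (3, "Espresso"), (3, "Latte"),
    (4, "Croissant"),
]

def most_popular(rows, k=2):
    """Return the k most frequently sold items across all transactions."""
    counts = Counter(item for _, item in rows)
    return [item for item, _ in counts.most_common(k)]

print(most_popular(rows))  # ['Latte', 'Croissant']
```

Grouping the counter by a category column would give the per-category variant mentioned above.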
Agent-Based Chatbots: Architecture, Design, and Functionality
An agent-based chatbot is built using distinct components called agents, each designed to handle a specific function. This approach makes the chatbot more efficient, accurate, and easier to update. Agent-based systems are used in production environments across various industries.
Key aspects of agent-based chatbots:
- Modular Design Each agent is designed to handle a specific function, such as taking orders, providing information, filtering out irrelevant conversations, or recommending items. This modular approach allows for easy updates and improvements without affecting the entire system.
- Specialized Tasks Assigning specialized tasks to agents is key to producing higher accuracy results.
- Guard Agent A guard agent flags content that is not relevant to the coffee shop. If a user is asking irrelevant questions, the guard agent should respond with a default response, such as offering help with an order.
- Input Classifier An input classifier agent classifies user requests into different categories, such as order, recommendations, or details.
- Details Agent This agent answers questions about the coffee shop, menu items, or other details, using a vector database for information retrieval.
- Order Agent This agent outputs the order in a structured format, which can then be easily integrated into an app.
- Recommendation Agent The recommendation agent connects to a trained recommendation engine to provide relevant suggestions.
- Memory Agents have memory so that they can remember what steps they went through and what the next steps are.
- Orchestration An agent controller orchestrates the communication between agents: it first calls the guard agent, then the classification agent, and then routes the request to whichever agent the classification agent chooses.
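That orchestration order can be sketched with stub agents in place of the tutorial's LLM-backed ones; the keyword checks and canned replies are stand-ins for real model calls:

```python
# Guard agent: blocks off-topic input (keyword check instead of an LLM).
def guard_agent(message):
    blocked = ["math", "homework"]
    return "not allowed" if any(w in message.lower() for w in blocked) else "allowed"

# Classification agent: routes the message to one of the specialized agents.
def classification_agent(message):
    if "recommend" in message.lower():
        return "recommendation_agent"
    if "order" in message.lower():
        return "order_taking_agent"
    return "details_agent"

agents = {
    "details_agent": lambda m: "Here are the details you asked about.",
    "order_taking_agent": lambda m: "Sure, what would you like to order?",
    "recommendation_agent": lambda m: "You might like a croissant with that.",
}

def handle(message):
    """Controller: guard first, then classify, then dispatch."""
    if guard_agent(message) == "not allowed":
        return "Sorry, I can only help with your coffee shop order."
    return agents[classification_agent(message)](message)

print(handle("Can you recommend something?"))  # You might like a croissant with that.
print(handle("Solve this math problem"))       # Sorry, I can only help with your coffee shop order.
```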
React Native App with Firebase and Chatbot: Development Guide
A React Native application can be created to complete the customer service chatbot. It can connect seamlessly to both a Firebase database and Runpod endpoints.
Key features and steps in developing the React Native application:
- Home Screen The home screen can retrieve and display items dynamically from a Firebase database. Users can also filter items by category for easier navigation.
- Item Page An item page allows users to view more information about each product, pulling data directly from the database.
- Cart Screen A cart screen displays all selected items along with the total price.
- Chatbot Screen A dedicated chatbot screen enables users to interact with the chatbot directly within the application. The chatbot connects to Runpod endpoints.
- Navigation The application uses tabs for easy navigation between the home, orders, and chatbot screens.
- Styling NativeWind can be used to simplify CSS styling.
- Data Fetching Firebase can be used to fetch the product data.
- State Management React’s useState hook is used for managing local component state, such as loading states.
- Context API The Context API can be used to manage global states.
Steps to create the React Native application:
- Install Node.js Node.js is required to run JavaScript code.
- Install Expo Go Expo Go allows running the application on a smartphone. It is available on both Google Play and the Apple App Store.
- Create a New Application with Expo Expo is a library that helps write React Native code with helper packages.
- Install Dependencies Install necessary packages, such as those for routing.
- Start the Application Run the application using npx expo start. If running in WSL, use the --tunnel flag.
- Install NativeWind Install NativeWind to simplify CSS styling.
- Configure Tailwind CSS Configure Tailwind CSS by initializing it with npx tailwindcss init and updating the tailwind.config.js file.
- Install Firebase Firebase is used to fetch data. Install the necessary Firebase packages using npm.
- Expo Vector Icons Install Expo Vector icons using npm. These can then be used for things such as the tab icons.
The React Native application can be further enhanced with features such as:
- Cart Context To implement cart functionality, a cart context can be created to store and manage cart items.
- Toast Notifications The react-native-root-toast library can be used to display toast notifications when items are added to the cart.
- Details Page When clicking on an item, the application can direct users to a details page with additional information.
- Message List The application should display messages in a list. The messages can be rendered in a scroll view.
By following these steps, a full-stack React Native application can be created, enabling users to interact with the chatbot, view product details, manage their cart, and place orders.

By Amjad Izhar
Contact: amjad.izhar@gmail.com
https://amjadizhar.blog
