This webinar recording details various Excel data cleaning techniques. Bill Jelen, a Microsoft Excel expert, demonstrates several methods, including traditional Excel functions, Flash Fill, and pivot tables. He then introduces Power Query, a powerful data transformation tool within Excel, highlighting its efficiency and audit trail capabilities. Finally, he explores using Python within Excel for data cleaning and visualization, showcasing its potential and accessibility even for beginners. The presentation includes audience participation through polls and Q&A sessions. The overall aim is to equip viewers with improved data cleaning skills using both established and newer Excel features.
Excel Data Cleaning Study Guide
Quiz
Instructions: Answer each question in 2-3 sentences.
- What is the “Go To Special” feature in Excel, and how can it be used to fill blank cells in a dataset?
- How does Excel’s Flash Fill feature work, and in what scenarios is it most useful?
- Describe the problem of “pivoting” data, and why transforming the data into a “tall and narrow” format is beneficial.
- Explain how Power Query in Excel handles data sources, and describe the initial steps required to import data into the Power Query editor.
- What are the advantages of using the “fill down” function within Power Query compared to performing the same action in standard Excel?
- What does it mean to “unpivot” columns in Power Query, and what type of data transformation is this most helpful for?
- Explain the “split by delimiter” function in Power Query, and provide an example of how it can be used to clean data.
- Describe how Power Query can combine data from multiple files in a folder, and why this is a powerful data cleaning tool.
- What is a data frame in the context of Python, and how does it relate to a range in Excel when using Python within Excel?
- Describe how Python code in Excel can be used to transform data (e.g., case changes, missing data) and how results can be presented.
Quiz Answer Key
- The “Go To Special” feature in Excel allows you to select specific types of cells, such as blanks, formulas, or visible cells. When filling blank cells, it first selects all blank cells within a given range, allowing for subsequent actions, such as filling them with data from the cell above.
- Flash Fill analyzes data patterns and automatically fills in values based on examples provided by the user. It is beneficial when you need to extract specific information from a column, such as extracting a state from a full address column or combining first and last names.
- Pivoting data often involves reformatting data from a “wide” format, where multiple columns represent similar data points, to a “tall and narrow” format. This transformation makes it easier to analyze data and is typically required before creating a pivot table.
- Power Query can handle data from a wide range of sources, such as Excel files, CSV files, databases, and the web. The first step is usually to choose the source from the “Get Data” menu on the Data tab (for worksheet data, often a named range or table), and then to choose “Transform Data” to open the Power Query Editor.
- In Power Query, “Fill Down” fills each empty cell in the selected column with the first preceding non-empty value in that column, and it takes only a few clicks. In standard Excel, the same result requires several steps: using “Go To Special” to select the blank cells, typing an equal sign, pressing the up arrow, and pressing Ctrl+Enter.
- Unpivoting columns in Power Query transforms data from a “wide” format into a “tall and narrow” format by converting the selected columns into one column of attributes and one column of values. This is particularly useful when data is spread across multiple columns that represent the same kind of data (such as monthly sales).
- The “split by delimiter” function in Power Query divides a single column into multiple columns or multiple rows based on a specified delimiter. For example, a column whose entries are separated by semicolons can be split into rows, with one item per row. Power Query also tries to detect the likely delimiter, such as a space, and offers it as the default.
- Power Query can connect to a folder of files and combine data from all of the files into a single table, filtering the correct file types to read. It is particularly useful because it allows consistent formatting and data cleaning to be applied to many files at once, which automates the process.
- In the context of Python, a data frame is a structure that organizes data in rows and columns, similar to a range or table in Excel. In Python within Excel, a worksheet range is loaded into a data frame for cleaning and analysis, conventionally stored in a variable named “df”.
- Python code in Excel can clean data by changing case, filling in missing values, dropping rows or columns, splitting data into new columns, or correcting values, using functions built into Python and its libraries. Results can be returned to the cell either as a Python object or, more usefully, as Excel values, which appear as a new table within the worksheet.
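The split-into-rows behavior described in the answer key can be illustrated outside Power Query. The following is a minimal pandas sketch, not the presenter's code; the column names and sample values are hypothetical, and the pandas `str.split` and `explode` methods stand in for Power Query's "split into rows" option.

```python
import pandas as pd

# Hypothetical sample: one row per order, products separated by semicolons.
orders = pd.DataFrame({
    "order_id": [101, 102],
    "products": ["apple;banana;cherry", "date"],
})

# Split on the delimiter to get a list per row, then explode so that each
# list element lands on its own row (the other columns are repeated).
split_rows = (
    orders.assign(products=orders["products"].str.split(";"))
          .explode("products")
          .reset_index(drop=True)
)

print(split_rows)
```

Each order ID is repeated once per product, which is the "one data item per row" shape that a pivot table can summarize.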
Essay Questions
Instructions: Answer each question thoroughly in essay format.
- Compare and contrast the “old school” methods of data cleaning in Excel (e.g., Go To Special, Flash Fill) with the more recent techniques available in Power Query. Discuss the strengths and limitations of both approaches.
- Discuss the impact of Power Query on the process of cleaning and transforming data within Excel. How has it changed the workflow, and what are the key benefits over previous methods?
- Analyze the role and significance of “unpivoting” data in Power Query. In what real-world scenarios is this feature crucial for data analysis and reporting?
- Evaluate the integration of Python within Excel as a tool for data cleaning and transformation. How does it compare to both standard Excel features and Power Query?
- Describe the process of combining data from multiple files in a folder within Power Query. Explain why this feature is particularly useful for scenarios involving regular or frequently updated data.
Glossary of Key Terms
- Data Cleaning: The process of identifying and correcting errors, inconsistencies, and inaccuracies in datasets to improve the quality of the data.
- Go To Special: An Excel feature that allows users to select specific types of cells (e.g., blanks, formulas) within a worksheet.
- Flash Fill: An Excel feature that automatically fills data based on patterns identified in previous entries.
- Pivot Table: A tool in Excel used to summarize and analyze large datasets by aggregating and reorganizing data.
- Data Pivoting: Reshaping data between a “wide” format and a “tall and narrow” format in order to better summarize and analyze the data.
- Power Query: A data transformation and manipulation tool available in Excel (and other Microsoft products) for data extraction, cleaning, and loading.
- Power Query Editor: The user interface for Power Query, where data transformation steps can be defined and reviewed.
- M language: The language used in Power Query to apply transformations and steps.
- Fill Down (Power Query): A transformation step in Power Query that fills blank cells in a column with the value from the first preceding non-empty cell in that column.
- Unpivot Columns: A Power Query operation that converts multiple columns of similar data into two columns, one listing the attribute and another listing the value.
- Split by Delimiter: A Power Query function that divides text in a single column into multiple columns or rows based on a specified character or string.
- Python: A versatile programming language that can be integrated into Excel for advanced data analysis and manipulation.
- Data Frame: In Python, a data structure that organizes data in rows and columns, similar to an Excel table or range, used for cleaning and analysis.
- Title Case: In Python, a string format in which the first letter of each word is capitalized.
- CPE: Continuing Professional Education
- Named Range: In Excel, a name given to a range of cells, often useful when working with formulas and Power Query.
Excel Data Cleaning Hacks with Power Query and Python
Briefing Document: Excel Hacks for Easy Data Cleaning
Date: October 20, 2024
Presenter: Bill Jelen (MrExcel.com)
Moderator: Stephanie
Audience: Excel users seeking data cleaning techniques, especially in accounting and finance.
I. Overview
This webinar focuses on practical Excel data cleaning techniques, transitioning from traditional methods to more advanced tools like Power Query and the new Python integration within Excel. The session targets both experienced Excel users looking to improve their workflow and those unfamiliar with powerful data manipulation features. Bill Jelen, known as “Mr Excel,” brings his extensive knowledge and experience to demonstrate these hacks. The main goal of the webinar is to show how to clean data more efficiently and move data into a format that is suitable for further analysis, particularly using pivot tables.
II. Key Themes and Ideas
- Evolution of Data Cleaning in Excel: The webinar traces the evolution of data cleaning in Excel, starting with traditional methods (like Go To Special) and moving to more powerful, efficient tools like Flash Fill, Power Query and now Python.
- Traditional Methods: Jelen acknowledges that while old methods work, they are often tedious, time-consuming, and error-prone. He shows how even basic tasks like filling blanks can be optimized.
- The “Secret” of Power Query: The primary focus of the webinar is Power Query. Jelen emphasizes that most Excel users do not realize its power. The big idea is that the tools on the Data tab have been replaced with completely different tools that perform vastly different work yet look almost exactly the same as the old ones.
- Power Query as a Game Changer: The webinar stresses the transformative potential of Power Query. Key benefits highlighted include:
- Ease of Use: Jelen highlights how many tasks that are difficult to perform in standard Excel are much easier in Power Query.
- Step-by-Step Automation: The “Applied Steps” feature enables automation of data cleaning processes, making them repeatable, auditable, and error-resistant.
- Diverse Data Sources: Power Query can import data from various sources, not just other Excel files.
- Unpivot Function: Jelen demonstrates how the “Unpivot” function in Power Query is revolutionary for reshaping data from wide to long format, making it more analysis-ready.
- Auditable data cleaning: Jelen highlights how all of the data cleaning steps are recorded and can be provided to auditors.
- Python Integration: The webinar introduces Python as a powerful tool for advanced data cleaning tasks within Excel and demonstrates how easy it is to add Python formulas to a spreadsheet, leveraging the Python ecosystem without needing to install Python separately.
- Simplified Learning: Excel’s integration removes the complex installation process typically associated with Python and its libraries.
- Internet Code: Jelen indicates that most Python code can be found with a simple Google search or, in the beta version of Excel, generated with Copilot.
- New Data Visualizations: Jelen explains that the real power of Python comes not only from data cleaning but also from powerful data visualizations.
- Importance of Data Format: Jelen underscores the need to format data for analysis, noting that many real-world reports require significant cleaning before they can be used effectively. The focus is on converting data from a format that is easy to read by a human, to a format that is easy to analyze by a computer.
- Real-World Examples: The webinar uses practical examples, often based on scenarios that Jelen has personally encountered during his seminars. These examples help contextualize the data cleaning methods and are easy for the audience to relate to.
III. Specific Techniques and Examples
- “Go To Special” for Filling Blanks: Using Go To Special to select blank cells, then typing an equal sign, pressing the up arrow, and pressing Ctrl+Enter to quickly fill data from the cell above.
- Quote: “…inside of a go-to special, I’m going to choose all of the blank cells… And now from here, it’s just three clicks…equal sign, up arrow, and then I need to fill in that type of formula…hold down the control key and press enter.”
- Flash Fill for Extracting Data: Using Flash Fill (Ctrl+E) to automatically extract data like state abbreviations from a column of addresses after providing a few examples.
- Quote: “All we have to do is just give it a few examples, so I want Kansas from that one, Delaware from that one, and then from the next blank cell, I do Flash Fill right, and they get it.”
- TEXTAFTER Formula: Demonstrates how the TEXTAFTER function extracts the text to the right of a chosen delimiter, and how a negative instance number counts delimiters backwards from the end of the string.
- Quote: “I want everything after the last comma, so in quotes I put a comma…the awesome syntax here…is to say -1.”
- TEXTJOIN Formula: Shows how the TEXTJOIN function concatenates several strings with a specified delimiter and can ignore blanks.
- Quote: “Take all of these values and put a pipe…”
- Old-School Pivot Table Method: Using “Multiple Consolidation Ranges” from the old pivot table interface to unpivot columns, a technique that predates Power Query.
- Quote: “And we get a pivot table that looks just like the original data. I remember thinking to like like what are you doing this is completely insane…double click…and it takes that data that had been going across and it makes it go down the page.”
- Power Query: Fill Down: Using “Fill Down” in the transform menu of Power Query to fill in empty cells with the data from the cell above.
- Quote: “I just choose that cell and then under transform, they have something called fill down…three clicks and it’s done.”
- Power Query: Transpose: Demonstrates using “Transpose” to rotate a data set 90 degrees, so that rows become columns and columns become rows.
- Quote: “I’m going to transpose the whole data set.”
- Power Query: Unpivot Columns: Using “Unpivot Other Columns” to convert wide data into long data format.
- Quote: “Under transform, I’m going to take this data that currently is very short and very wide…I’m going to make it very narrow and very tall…and that is called unpivot the other columns”
- Power Query: Split Column: Using “Split Column” to split data by a delimiter, both into multiple columns and into multiple rows.
- Quote: “They have something called split into columns…then they have this thing called split into rows like what what is this split into rows…”
- Power Query: Combining Multiple Files from a Folder: Using Data > Get Data > From File > From Folder to pull the contents of multiple files into a single dataset in Power Query.
- Quote: “…there’s something here called advanced options… They have something called split into columns… then they this thing called split into rows…”
- Python for Data Cleaning: Using Python code in Excel to perform complex data cleaning, such as converting text case, replacing values, dropping rows, and splitting columns.
- Quote: “Every example you ever find in the internet, the First Data frame you created is called DF.”
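Two of the formula techniques listed above, TEXTAFTER with an instance number of -1 and TEXTJOIN with blanks ignored, can be mirrored in plain Python. This is only a rough sketch for comparison: the function names `text_after_last` and `text_join` are hypothetical, the sample address is invented, and trimming the leading space is a choice made here rather than something TEXTAFTER does on its own.

```python
def text_after_last(text, delimiter):
    """Return the text after the final delimiter, similar to TEXTAFTER(text, delimiter, -1)."""
    # rpartition splits at the LAST occurrence of the delimiter.
    _, found, tail = text.rpartition(delimiter)
    # Excel returns an error when the delimiter is missing; this sketch returns "" instead.
    return tail.strip() if found else ""


def text_join(delimiter, values):
    """Join values with a delimiter while skipping blanks, similar to TEXTJOIN(delimiter, TRUE, values)."""
    return delimiter.join(str(v) for v in values if v not in (None, ""))


# Hypothetical sample values.
print(text_after_last("123 Main St, Topeka, Kansas", ","))
print(text_join("|", ["A", "", "B", None, "C"]))
```

Splitting from the right is what makes the "everything after the last comma" case work regardless of how many commas appear earlier in the string.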
IV. Key Takeaways
- Embrace Power Query: Power Query is a critical tool for modern Excel users, especially when dealing with messy, real-world data. Its ability to automate repetitive tasks and handle diverse data sources makes it indispensable. It is also very easy to learn.
- Python as a Next Step: Python in Excel offers a new dimension to data cleaning and analysis. It is not necessary to know how to code to use Python in Excel, and it is much easier than running Python in a separate application.
- Automation and Auditability: By using either Power Query or Python, data cleaning processes can be automated and made auditable.
- Constant Learning: Jelen implies that Excel users need to continuously adapt and adopt new tools to remain productive in the ever-changing data landscape.
V. CPE Poll Questions (and Answers)
- Have you ever used Power Query in Excel? (A) I use it all the time and I love it (B) I have it but I don’t use it (C) My Excel does not have Power Query (D) I’ve never heard of Power Query
- After using unpivot your data will often be: (A) Tall and narrow with few columns (B) Wide with many columns (C) Difficult to summarize (D) Converted to text.
- Which of these file types cannot be combined using the From Folder trick? (A) CSV or text (B) Excel workbooks with a single sheet (C) PDF files (D) JPEG files.
- Which of these can be used to clean data? (A) Cobra (B) Anaconda (C) Python (D) Boa.
- The answers are, respectively: no single correct answer (A, B, C, and D are all acceptable for this poll), (A), (D), and (C).
VI. Action Items for Participants
- Explore Power Query in your own Excel environment.
- Practice the data cleaning techniques demonstrated in the webinar.
- Look into the basics of Python for future data analysis opportunities.
- Ask your IT department for help activating Python if you do not see it in your copy of Excel.
This briefing document is intended to provide a detailed summary of the webinar’s content, highlighting key themes and practical takeaways for the participants.
Data Cleaning & Transformation in Excel with Power Query and Python
Data Cleaning & Transformation with Excel: An FAQ
Here are some frequently asked questions based on the provided transcript.
- What is Power Query and how does it differ from traditional Excel data cleaning methods?
- Power Query, found under the “Get & Transform Data” section of the Data tab in Excel, is a powerful data transformation and cleaning tool. Unlike older Excel techniques, which often involve multiple steps and manual processes like Go To Special, Power Query offers a visual, step-by-step approach to cleaning data. It’s designed to be more efficient and repeatable, remembering the steps you’ve applied and allowing you to easily refresh your data. The core of Power Query is a formula language called M that automates data preparation. This makes it vastly different from the typical Excel formulas that operate cell by cell.
- What are some specific data cleaning tasks that Power Query handles exceptionally well compared to traditional Excel methods?
- Power Query shines in scenarios such as:
- Filling blanks in outline views: Power Query’s Fill Down feature is far simpler than using Go To Special and multiple keystrokes.
- Unpivoting data: Transforming wide data tables into tall and narrow ones, which is exceptionally difficult and cumbersome in standard Excel.
- Splitting delimited data: Power Query can automatically detect delimiters and offers flexible splitting options.
- Combining multiple files: It can combine data from multiple CSV or Excel files in a folder, a task that would take hours to perform manually. It is also better able to handle inconsistent data across multiple files.
- Robust audit trail: It automatically records all the data cleaning steps, allowing the user to understand how the final results were obtained. The steps can be modified or removed as needed.
- How does Flash Fill work, and when is it most useful?
- Flash Fill is a feature that automatically fills in data based on patterns it recognizes in your existing data. You provide a few examples of the desired outcome, and Flash Fill attempts to complete the rest. It’s particularly useful for extracting data from messy text strings, like taking a name and address column and creating first name, last name, street, city, state, and zip code columns automatically. It also works when it needs to combine information from multiple columns into one.
- What is Python integration in Excel, and how can it be used for data cleaning?
- Excel now supports executing Python code directly within cells. This enables more complex data manipulations and transformations using Python’s powerful libraries like pandas and NumPy. This is useful in situations where you need more flexible data manipulation than the standard Excel formulas or Power Query functions provide. You can write custom logic for cleaning, reshaping, and creating new columns of data.
- How does Power Query handle new data in a regularly updated file?
- Once you’ve set up the cleaning steps in Power Query for a particular file, you can save the workbook and use it as your “clean” version. When you download a new file, save it to the same location with the same name. Open the workbook with the Power Query connection, and refresh the data connection. Power Query automatically applies all the saved cleaning steps to the new data, ensuring that your data cleaning process is fully automated and repeatable.
- Is the Python integration complex or difficult for Excel users?
- No, the integration is designed to be very user-friendly. You don’t need to install Python or manage libraries, as Excel handles all of that behind the scenes. The interface includes a formula bar and an option to return the results of the Python code as Excel values rather than Python objects, so you can see the results quickly. Additionally, users can leverage AI-powered code generation through Excel’s Copilot to get Python code that performs specific tasks. You can get very powerful results by generating code with AI and then editing it to match your needs.
- What are the benefits of using Python for cleaning data within Excel?
- Python adds a new dimension of flexibility: you’re no longer limited by Excel’s formulas and functions. You can use robust data transformation, string manipulation, data formatting, and other advanced logic via pandas, NumPy, and other common Python libraries to handle many data preparation needs. You are essentially combining the strengths of both systems in a single application.
- Why would you choose Power Query over Python for data cleaning, or vice versa?
- Power Query is generally preferred for its ease of use and visual interface, making it suitable for most common data cleaning and transformation tasks. Power Query is also better for connecting to external data sources. Python is best suited to more advanced data cleaning, custom transformations, and data visualization using libraries like matplotlib. If you need complex logic or custom data manipulations, Python may be the better solution. Additionally, if you want to use AI-based code generation to accomplish tasks, Python provides the best starting point.
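As a point of comparison for the folder-combine task described in this FAQ, a rough pandas analogue might look like the sketch below. It assumes a standalone Python installation with pandas rather than Python in Excel, and the function name `combine_csv_folder`, the file names, and the added `source_file` column are all hypothetical choices made for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd


def combine_csv_folder(folder):
    """Read every .csv file in a folder and stack the rows into one data frame."""
    frames = []
    # Keep only one file type and sort the paths so the row order is stable.
    for path in sorted(Path(folder).glob("*.csv")):
        frame = pd.read_csv(path)
        # Record which file each row came from (a convenience of this sketch).
        frame["source_file"] = path.name
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)


# Demonstration with small hypothetical files written to a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    Path(folder, "jan.csv").write_text("region,sales\nEast,100\nWest,200\n")
    Path(folder, "feb.csv").write_text("region,sales\nEast,150\n")
    Path(folder, "notes.txt").write_text("ignored\n")
    combined = combine_csv_folder(folder)

print(combined)
```

Rerunning the function after a new file lands in the folder plays a similar role to clicking Refresh in Power Query, although Power Query records the steps for you while this script has to be maintained by hand.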
Data Cleaning with Excel, Power Query, and Python
Data cleaning is the process of modifying or removing data in a dataset that is incorrect, incomplete, improperly formatted, or duplicated [1-3]. Data cleaning is often a necessary precursor to data analysis and is an important skill for anyone working with data [1, 2].
The sources discuss several methods for cleaning data, both in Excel and using other tools:
- Excel data cleaning: The presenter discusses several “old school” Excel tricks for cleaning data, including using the “Go To Special” dialog box to fill in blank cells [3] and using the Flash Fill feature to extract data based on examples [4, 5].
- The “Go To Special” dialog box allows the selection of blank cells [3]. After selecting the blank cells, the user can type “=” and then the up arrow to reference the cell above, then press control + enter to copy the formula to all selected blank cells. This action fills the blank cells with the value from the cell directly above them [3].
- The Flash Fill feature can automatically fill in data based on a few examples. For example, in a column with addresses, Flash Fill can be used to extract the state code by giving a few examples [5].
- Power Query: Power Query is a data transformation and data preparation engine [2, 6]. Power Query is accessed through the “Get & Transform Data” section of the Data tab in Excel [2, 7]. Power Query has a number of features that make data cleaning easier [6, 8, 9]:
- Fill Down: Power Query’s “Fill Down” feature can be used to fill in blank cells in a column with the value from the cell above. This is an easier process than using “Go To Special” in Excel [6].
- Unpivot: The unpivot feature can transform a wide data set with many columns to a narrow, tall data set with fewer columns. This is useful for data that has multiple columns for the same type of information (such as months) that should be in one column [9, 10].
- Split Column: The split column feature can be used to split a column into multiple columns by a delimiter, or by number of characters [9, 11, 12]. It also has an advanced option to split data into rows, which allows for data in one column to be split into multiple rows [11]. The split column tool can also detect the appropriate delimiter, such as a space, rather than defaulting to tab, which is the default for Excel’s text to columns [9].
- Combine files: Power Query can combine multiple files in a folder, including CSV, text, and Excel files with a single sheet into one table [13, 14].
- Power Query records all the steps taken to clean data in the “Applied Steps” section, providing an audit trail that can be reviewed [10]. This also allows the user to repeat data cleaning steps on new data by clicking “Refresh” [15, 16].
- Python: Python can be used in Excel to clean and transform data [2, 17]. Python in Excel allows for complex data transformations that might be difficult to do in Excel or Power Query [18, 19].
- The presenter provides an example of code using a data frame object that converts a name column to title case, fills blank sales data with zeros, deletes rows without email addresses, splits the name column into first and last name columns, fixes spelling in the status column, and then returns the transformed data to an Excel sheet [18].
- The presenter also notes that all of this code came from the internet and that Excel’s Copilot feature wrote the code after the user gave it five questions about what they were trying to do [19].
The presenter notes that Power Query is not well-known by many Excel users and that many users are not aware of its capabilities [2, 20, 21]. The presenter also notes that the Python integration in Excel is very new [17].
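The Python cleaning example described above can be sketched roughly as follows. This is not the presenter's code: the column names and sample rows are hypothetical, and the data frame is built inline so the block runs anywhere, whereas in Python in Excel it would typically be loaded from a worksheet range with the `xl()` function.

```python
import pandas as pd

# Hypothetical sample data standing in for a worksheet range.
df = pd.DataFrame({
    "Name": ["alice smith", "BOB JONES", "carol white"],
    "Email": ["alice@example.com", None, "carol@example.com"],
    "Sales": [100.0, 250.0, None],
    "Status": ["complete", "pending", "completed"],
})

# 1. Convert names to title case.
df["Name"] = df["Name"].str.title()

# 2. Fill blank sales figures with zero.
df["Sales"] = df["Sales"].fillna(0)

# 3. Drop rows that have no email address.
df = df.dropna(subset=["Email"]).reset_index(drop=True)

# 4. Split the name into first and last name at the first space.
df[["First Name", "Last Name"]] = df["Name"].str.split(" ", n=1, expand=True)

# 5. Standardize the status spelling ("complete" becomes "completed").
df["Status"] = df["Status"].replace({"complete": "completed"})

print(df)
```

Each numbered comment corresponds to one of the transformations listed in the description, so individual steps can be removed or reordered much like entries in Power Query's Applied Steps.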
Excel Data Cleaning Hacks
The sources describe several Excel hacks for data cleaning, including both traditional Excel features and newer tools like Power Query and Python integration [1-27].
Here are some of the Excel data cleaning techniques discussed:
- Filling in Blanks with “Go To Special” [3]:
- This method is used to fill blank cells with the value from the cell directly above.
- Select the data range, then use Home > Find & Select > Go To Special > Blanks.
- Type = (equal sign) and press the up arrow to reference the cell above the first blank cell, then press Ctrl + Enter to fill all selected blank cells with the formula [3].
- After filling the blank cells, the data needs to be reselected, copied, and then pasted as values to convert them from formulas [4].
- Extracting Data with Flash Fill [5]:
- Flash Fill can automatically recognize patterns and fill in data based on a few examples [5].
- For example, to extract state codes from a column of addresses, type the desired state code in the column next to the first few addresses [5].
- Select the cell below the examples and press Ctrl + E to activate Flash Fill and populate the rest of the column [5].
- Flash Fill must be invoked deliberately; the automatic Flash Fill feature is often turned off by users [5, 6].
- Power Query for Data Transformation [2, 11]:
- Power Query is a powerful data transformation tool accessed via the Data tab in Excel [2, 10, 11].
- Fill Down can be used to fill blank cells with values from the cells above [11]. It is located under the Transform tab. Select the column and click Transform > Fill > Down [11].
- Unpivot transforms data from a wide format (with many columns) to a tall format (with fewer columns) [13]. This can be used to transform data where different categories of data are spread across multiple columns to a format where all the data is in a single column with an additional column to designate the category [6, 13]. To use it, select the columns that you do not want to unpivot, then under Transform click Unpivot Columns > Unpivot Other Columns [13].
- Split Column can split columns by a delimiter or by the number of characters [13, 18]. It can split data into multiple columns or multiple rows, which is a unique feature [13, 18]. Under the Transform tab, click Split Column. Power Query can automatically detect the most likely delimiter, such as a space, rather than defaulting to a tab [13].
- Combine Files allows for the combining of multiple files in a folder (CSV, text, Excel files with a single sheet) into a single table [20]. Go to Data > Get Data > From File > From Folder, select the folder, and then click “Transform Data.” Filter the file types to include only the desired file type, and then click the button to combine the files which looks like two arrows pointing down [20, 21].
- Power Query keeps a record of all cleaning steps in the “Applied Steps” section, providing an audit trail [14]. This allows the user to refresh the data after additional data is added and have the same cleaning steps automatically applied [15, 16].
- Python Integration for Data Cleaning [2, 24]:
- Python can be used directly within Excel to perform data cleaning tasks [2, 24].
- To start, insert a new sheet and type =PY( into a cell, which designates that cell as a Python cell [25].
- Use a data frame (typically named “df” in Python) to refer to the selected range of cells in the Excel sheet [25]. For example, df can be assigned the range Sheet1!A3:E153 [25]; in practice, Excel wraps the selected reference in its xl() function, along the lines of df = xl("Sheet1!A3:E153", headers=True).
- Python code can be written within the cell to perform various operations, such as converting text to title case, filling blanks with zeros, deleting rows, and splitting columns [25]. For example, one code example in the sources converts a name column to title case using the .str.title() method, fills blank sales data with zeros, deletes rows without email addresses, splits the name column into first and last name columns, and fixes the spelling of “complete” to “completed” in the status column [25].
- Python will return an object in a cell by default; to see the transformed data in the sheet, select Excel Value to the left of the formula bar [26].
- Press Ctrl + Enter to commit the Python code [26].
- Python libraries, like pandas, numpy, and matplotlib, are available automatically, without the need to install or refer to them in code [24].
These Excel hacks provide a range of options for cleaning data, from basic operations like filling blank cells to more complex transformations using Power Query and Python [3, 5, 11, 13, 24].
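For readers curious how the Fill Down and Unpivot Other Columns steps above translate to a data frame, here is a minimal pandas sketch. The report layout and column names are hypothetical, and `ffill` and `melt` are pandas stand-ins for the Power Query commands rather than what Power Query runs internally.

```python
import pandas as pd

# Hypothetical outline-style report: the region appears only on the first
# row of each group, and the months run across the columns.
report = pd.DataFrame({
    "Region": ["East", None, "West", None],
    "Product": ["Apples", "Pears", "Apples", "Pears"],
    "Jan": [10, 20, 30, 40],
    "Feb": [11, 21, 31, 41],
})

# Fill Down: copy the last non-blank value downward within the column.
report["Region"] = report["Region"].ffill()

# Unpivot Other Columns: keep Region and Product as identifiers and turn
# every remaining column into attribute/value pairs.
tall = report.melt(
    id_vars=["Region", "Product"],
    var_name="Month",
    value_name="Sales",
)

print(tall)
```

The result has one row per region, product, and month, which is the tall and narrow shape that pivot tables expect.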
Mastering Power Query in Excel
Power Query is a powerful data transformation and data preparation engine within Excel [1, 2]. It is accessed through the “Get & Transform Data” section of the Data tab in Excel [1, 2]. Many Excel users are unaware of Power Query and its capabilities [1, 3].
Here’s a detailed breakdown of Power Query’s features and functions discussed in the sources:
- Data Import:
- Power Query can import data from various sources, including Excel files, CSV files, JSON files, PDF files, and entire folders of files [4]. It can also connect to databases and other sources through an ODBC driver [4].
- When importing from Excel, Power Query can use either a sheet or a named range within the file [2, 4].
- Data Transformation: Power Query provides several tools to clean and transform data [1]:
- Fill Down: This feature fills blank cells in a column with the value from the cell above [2]. To use it, select the column and then under the Transform tab, select Fill > Down [2].
- Unpivot: The unpivot feature is used to transform wide data into a tall, narrow format [1, 5]. Select the columns you want to remain as identifier columns, then under Transform, select Unpivot Columns > Unpivot Other Columns [5]. This is useful when dealing with data where different categories are spread across multiple columns [5, 6].
- Split Column: This feature can split a column into multiple columns by a delimiter or by the number of characters [1, 5]. It also has an advanced option to split data into rows, which Excel’s text-to-columns feature does not offer [7]. The tool can detect the appropriate delimiter, such as a space, rather than defaulting to a tab [5]. Under the Transform tab, click Split Column [5].
- Merge Columns: This function combines multiple columns into one, with an option to include a separator, such as a space [8]. Under the Add Column tab, click Merge Columns [8].
- Transpose: This function transposes all of the data, converting rows into columns and columns into rows [8]. Under the Transform tab, click Transpose [8].
- Remove Columns: This feature allows for the removal of unneeded columns [8]. Select the column and then right-click and choose Remove [8].
- Filter: This feature allows for filtering data based on specific criteria, including the removal of null values and specific text entries [5, 8]. Click the dropdown arrow at the top of the column to access the filter menu [5].
- Use First Row as Headers: This feature designates the first row of the data as the column headers [8]. Under the Home tab, click Use First Row as Headers [8].
- Combine Files:
- Power Query can combine multiple files from a folder (e.g., CSV, text, Excel files with a single sheet) into a single table [1, 9].
- To combine files, go to Data > Get Data > From File > From Folder, select the folder, and then click Transform Data [9].
- Filter the file types to include only the desired file types, and then click the combine files button, which looks like two arrows pointing down [9, 10].
- Audit Trail and Refresh:
- Power Query records all the data cleaning steps in the “Applied Steps” section [11]. This provides an audit trail that can be reviewed.
- It also allows users to repeat data cleaning steps on new data by clicking “Refresh” [12, 13].
- Users can also set the query to refresh data automatically when the workbook is opened [13].
- Power Query Editor:
- When transforming data with Power Query, the user is taken to the Power Query Editor [2].
- The Power Query Editor is a separate window where data transformation steps are performed and recorded [2].
- The Power Query Editor was written by the SQL Server team and is also used in other Microsoft products such as Power BI and Power Automate [2].
- Advanced Editor:
- The Advanced Editor displays the steps of the query as editable text, written in Power Query’s M formula language [11].
- Advantages of Power Query:
- Power Query tools like Unpivot and Fill Down are easier to use than similar tools in Excel [11].
- Power Query’s Split Column tool offers more advanced features than Excel’s text-to-columns feature [5, 11].
- Power Query automatically detects delimiters like spaces when splitting columns, whereas Excel’s text-to-columns defaults to tabs [5, 11].
- Power Query can handle data that changes, such as the addition of new rows or columns [12].
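To make the Fill Down and Unpivot Other Columns transformations concrete, here is a plain-Python sketch of the logic each one applies. It is an illustration only, not Power Query code: the sample rows and the helper names fill_down and unpivot_other_columns are hypothetical, while the output column names Attribute and Value mirror the defaults Power Query uses after an unpivot.

```python
# Hypothetical rows: the region label appears only on the first row of each
# group, and monthly sales are spread across columns (wide format).
header = ["Region", "Product", "Jan", "Feb"]
rows = [
    ["East", "Apples", 10, 12],
    [None, "Pears", 7, 9],
    ["West", "Apples", 4, 6],
]


def fill_down(rows, col):
    """Replace blanks in one column with the last non-blank value above."""
    filled, last = [], None
    for row in rows:
        row = list(row)  # copy so the input rows are left untouched
        if row[col] is None:
            row[col] = last
        else:
            last = row[col]
        filled.append(row)
    return filled


def unpivot_other_columns(header, rows, id_cols):
    """Keep id_cols as identifiers and turn every other column into
    (Attribute, Value) pairs, one output row per pair."""
    id_idx = [header.index(name) for name in id_cols]
    other_idx = [i for i in range(len(header)) if i not in id_idx]
    out = []
    for row in rows:
        for i in other_idx:
            out.append([row[j] for j in id_idx] + [header[i], row[i]])
    return list(id_cols) + ["Attribute", "Value"], out


filled = fill_down(rows, header.index("Region"))
tall_header, tall = unpivot_other_columns(header, filled, ["Region", "Product"])

print(tall_header)
for row in tall:
    print(row)
```

Three wide rows with two month columns become six tall rows, one per region, product, and month combination, which is the shape pivot tables and most analysis tools expect.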
In summary, Power Query provides a robust set of tools that streamline data cleaning and transformation, making it a valuable asset for anyone working with data in Excel [1]. Its ability to automate data cleaning steps and work with multiple data sources makes it a powerful tool for data preparation [11, 12].
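The folder-combine workflow described above can be approximated in plain Python as a rough analogue. The sketch below filters a folder to CSV files, treats each file’s first row as headers, and stacks the rows into one table. The function name combine_csv_folder, the source_file column, and the sample files are all assumptions made for illustration.

```python
import csv
import tempfile
from pathlib import Path


def combine_csv_folder(folder):
    """Read every .csv file in a folder and stack the rows into one list of
    dicts, tagging each row with the file it came from."""
    combined = []
    for path in sorted(Path(folder).glob("*.csv")):  # filter by file type
        with path.open(newline="", encoding="utf-8") as handle:
            for record in csv.DictReader(handle):  # first row = headers
                record["source_file"] = path.name
                combined.append(record)
    return combined


# Build a throwaway folder with hypothetical monthly files for the demo.
with tempfile.TemporaryDirectory() as folder:
    Path(folder, "jan.csv").write_text(
        "Product,Sales\nApples,10\nPears,7\n", encoding="utf-8"
    )
    Path(folder, "feb.csv").write_text(
        "Product,Sales\nApples,12\n", encoding="utf-8"
    )
    Path(folder, "notes.txt").write_text("not a data file", encoding="utf-8")
    combined = combine_csv_folder(folder)

for record in combined:
    print(record)
```

Running the function again after a new CSV file lands in the folder picks up the new rows, which is the same idea as clicking Refresh on a Power Query folder query.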
Mastering Excel’s Flash Fill
Flash Fill is an Excel feature that automatically recognizes patterns and fills in data based on a few examples [1]. It is designed to make data entry and data transformation easier.
Here’s a detailed explanation of Flash Fill:
- How it works:
- Flash Fill analyzes the data entered and tries to identify patterns.
- It uses these identified patterns to fill in the remaining cells in the column automatically [1].
- Activation:
- Flash Fill can be activated by pressing Ctrl + E [1].
- It is located on the Data tab on the right-hand side [1].
- Turning it off:
- Flash Fill can be turned off in Excel options under File > Options > Advanced by unchecking the box that says “Automatically Flash Fill” [1, 2].
- Use cases:
- Extracting Data: Flash Fill is useful for extracting specific parts of data from a column, such as state codes from a full address. For example, when given an address in a single cell, Flash Fill can extract the state abbreviation into its own cell [1].
- Combining Data: Flash Fill can also be used to combine data from separate columns into a new column. For example, if you have a column of first names and a column of last names, Flash Fill can combine them into a full name column, based on a few examples [1].
- Examples:
- To extract state codes from a column of addresses, type the desired state code in the column next to the first few addresses. Select the cell below the examples and press Ctrl + E to activate Flash Fill and populate the rest of the column [1].
- Limitations:
- Flash Fill requires a column heading above the data [2].
- Automatic Flash Fill can start filling in cells without being asked, so the sources recommend turning the automatic option off and triggering Flash Fill manually when it is wanted [1].
In summary, Flash Fill is a convenient tool that can save time and effort when it comes to data entry and transformation in Excel, especially when you have patterned data to extract or combine [1].
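Flash Fill infers its rule from examples, so there is no formula to inspect. For comparison, the state-code extraction above could be written by hand as an explicit rule. The Python sketch below assumes every address ends in a two-letter state code followed by a five-digit ZIP code; the sample addresses and the extract_state helper are hypothetical.

```python
import re

# Hypothetical addresses; the "City, ST 12345" layout is an assumption.
addresses = [
    "123 Main St, Akron, OH 44308",
    "9 Elm Ave, Tampa, FL 33602",
    "77 Pine Rd, Boise, ID 83702",
]

# Two capital letters followed by a five-digit ZIP code at the end of the text.
STATE_BEFORE_ZIP = re.compile(r"\b([A-Z]{2})\s+\d{5}$")


def extract_state(address):
    """Return the two-letter state code, or None if the layout does not match."""
    match = STATE_BEFORE_ZIP.search(address.strip())
    return match.group(1) if match else None


states = [extract_state(address) for address in addresses]
print(states)
```

An explicit rule like this fails visibly (it returns None) when an address does not fit the layout, whereas Flash Fill results should be spot-checked because a wrongly inferred pattern is not flagged.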
Python Integration in Microsoft Excel
Python integration in Excel allows users to leverage the power of Python for data cleaning and analysis directly within Excel spreadsheets [1, 2]. This feature is relatively new and aims to bridge the gap between Excel’s ease of use and Python’s robust data processing capabilities [1, 2].
Here’s a detailed breakdown of the key aspects of Python integration in Excel:
- How to use Python in Excel:
- To use Python in Excel, you insert a new sheet and then type =PY() into a cell [3]. This designates the cell as a Python cell, which is indicated by a green py symbol to the left of the formula bar [3].
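- In a Python cell, you can write Python code. The sources use a data frame (usually named df) to refer to a selected range of cells in the Excel sheet [3]. For example, assigning the range Sheet1!A3:E153 to df creates a data frame named df that contains the data in that range [3]. In the cell itself, Excel writes the reference through its xl() function, along the lines of df = xl("Sheet1!A3:E153", headers=True).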
- Python libraries such as pandas, numpy, and matplotlib, which are commonly used in data analysis, are available automatically without the need for separate installation or references in code [4].
- Python code is entered in the cell, and to execute the code, you must press Ctrl + Enter [5]. A regular Enter key press will not commit the Python code [5].
- Data Cleaning with Python:
- Python can perform many data cleaning tasks, including text manipulation, filling blanks, deleting rows, and splitting columns [3].
- Text Manipulation: Python can easily convert text to different cases. For example, the string title method (str.title in pandas) converts text to title case [3].
- Filling Blanks: Python can fill blank cells with a specific value using the fillna function, for example to fill blank sales data with zeros [3].
- Deleting Rows: Python code can be used to delete rows that meet specific criteria, such as rows without email addresses [3].
- Splitting Columns: Python code can split columns, like splitting the name column into first and last name columns [3].
- Returning Values to Excel:
- By default, Python will return an object in the cell rather than the transformed data [5]. To view the data, you need to select Excel Value to the left of the formula bar [5]. This converts the Python object to an Excel value that can be displayed in the worksheet [5].
- After receiving the transformed data, the columns may need to be adjusted, and any unneeded columns may need to be hidden or removed [5].
- Python Code Sources:
- Python code for data cleaning tasks can often be found online [3].
- Additionally, Excel now has a feature called Copilot that can write Python code based on a user’s needs [5].
- Python Licensing:
- Python integration requires a specific license that may come at an additional cost [6].
- A basic version of Python in Excel is available without paying for a premium license [6].
- In the sources’ demonstration, the basic version worked just as well as the premium license would have [6].
- Advantages of Python Integration:
- Python is known for being more powerful than Excel in certain data processing tasks [2].
- Python allows for more complex operations on data than Excel’s built-in functions [4].
- Python’s integration in Excel eliminates the need for complicated installations and setup processes [4]. The necessary libraries are available automatically without having to be downloaded or called in the code [4].
- It lowers the learning curve for Excel users by allowing them to use Python within an environment they are already comfortable with [4].
- Python is open source, which enables integration with a variety of third-party and community-developed tools and visualizations [6].
- Python can also provide charts and visualizations that Excel does not have [5, 6].
In summary, Python integration in Excel provides a way for users to use both the ease of Excel and the power of Python for more advanced data cleaning and analysis tasks.
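As a rough picture of how a worksheet range becomes a data frame, the sketch below treats the first row of a range as column headers and the remaining rows as data. Inside Excel this conversion is handled by the built-in xl() function; range_to_dataframe and the sample values here are hypothetical stand-ins so the example can run outside Excel.

```python
import pandas as pd

# Stand-in for a worksheet range such as A3:E153; the first row holds headers.
# The values are hypothetical.
range_values = [
    ["Name", "Sales", "Status"],
    ["Jane Doe", 100, "completed"],
    ["John Smith", 250, "pending"],
]


def range_to_dataframe(values):
    """Treat the first row as column headers and the rest as data rows."""
    header, *data = values
    return pd.DataFrame(data, columns=header)


df = range_to_dataframe(range_values)

# A single summary value computed from the data frame.
total_sales = df["Sales"].sum()

print(df)
print(total_sales)
```

A whole data frame like df is the kind of result that shows up as a Python object until Excel Value is selected, while a single number such as total_sales fits in one cell either way.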

By Amjad Izhar
Contact: amjad.izhar@gmail.com
https://amjadizhar.blog
