Basic Data Preparation Challenges

Summary

Data preparation can be extremely time-consuming. But with the rich transformation capabilities available in the Power Query Editor, you can clean and prepare any dataset for analysis. This chapter introduced you to common data challenges and the ways Power Query provides to address them.

In Exercise 2-2 you learned how Power Query can extract codes and how to associate meaning with these codes, without the complex and difficult-to-maintain formulas presented in Exercise 2-1.

In Exercise 2-3, you were introduced to a very useful capability to extract meaning from columns, using examples. You saw that Column from Examples enables you to extract text, date, time, and numbers and offers a variety of transformations, based on the output examples you provide. You saw how this feature can be used to extract text between delimiters, apply a conditional column, and create buckets from ranges of numbers.

In Exercise 2-4, you learned how to parse hyperlinks from textual feed and apply different transformation techniques to achieve your goals. One of the most important lessons from this exercise is that you should audit your queries as often as possible and look for edge cases.

A common and highly time-consuming challenge involves handling inconsistent dates in your datasets. In Exercises 2-5 and 2-6, you learned how simple it is address inconsistent dates in Power Query.

Finally, in Exercises 2-7 and 2-8, you learned how to split a table into a fact table and a lookup table by using the Reference and Remove Duplicates options. You also learned how to avoid refresh errors by using Lowercase and Trim transformations, as well as how to split comma-separated values into rows to build group/members tables.

Even though you are still close to the beginning of this book, based on the exercises you have completed so far, you can probably reduce your data cleansing efforts by thousands of hours. In the next chapter, you will learn how to combine multiple tables and further reduce your time to insight.