Getting to know Python in Excel

Leveling up with Python objects

Now that we have covered a lot of building blocks, you are ready to be introduced to more Python concepts that will help you transform and analyze data in Excel and build effective dashboards. As promised earlier, this last section covers several key Python concepts. The concepts you have already encountered are summarized in this section, and new concepts are explored with Excel data.

You are familiar with displaying a Python cell as a Python object or an Excel value. Virtually everything in Python is considered to be an object. This section focuses on objects that are useful when working with Python in Excel. You can read this section all at once and follow along with the examples or dip into the section when you meet a particular object in the wild. The examples for this section are on a new sheet called More Python Syntax in the workbook SP Python in Excel Example.xlsm.

Revisiting variables

You already understand how important it is to use variable names to identify Python objects. And you have already learned the rules for variable names:

  • A variable name must begin with a letter or an underscore (_).

  • A variable name can contain only letters, numbers, and underscores. It cannot include spaces.

  • A variable cannot have the same name as a keyword. For example, assigning a value to True will result in an error.

  • While you can use Python function names as variable names, doing so is not recommended since it can be confusing.

Remember that variable names are case sensitive, so df_frame and Df_frame are different variables. You may change the assignment of any variable by using a new statement. This is something to watch out for in Python in Excel.

Let’s look at an example. Open the worksheet More Python Syntax in the workbook SP Python in Excel Example.xlsm. In cell F10, enter the following Python code:

df_variable_name = xl("'Range Data'!F10:J15", headers=True)

Figure 2-166 shows the results.

FIGURE 2.166

FIGURE 2.166 Using the variable df_variable_name to point to an Excel range.

Next, enter the following Python code, which references df_variable_name, in cell R10:

df_variable_name_ref=df_variable_name

Figure 2-167 shows the results.

FIGURE 2.167

FIGURE 2.167 Creating another Python cell that points to df_variable_name produces a copy of those results.

Now you will use the variable df_variable_name in cell L10. This time, enter the following Python code to reference a single cell:

df_variable_name=xl("'Range Data'!G11")

Figure 2-168 shows the impact on cell R10.

FIGURE 2.168

FIGURE 2.168 Reassigning the variable name to a single cell changes the Python cell that references the variable.

So, you can use the same variable name again without encountering errors, and you can also assign it to a different Python object. This is known as dynamic typing. It is a flexible feature but can be problematic. If you accidentally use the same variable name for a different object, you will reassign its target, and you will get no warnings. As you saw in this example, reassigning a variable name to a different object is not an obstacle. The variable d_variable_name went from referencing a DataFrame to referencing a string, as shown in Figure 2-169.

FIGURE 2.169

FIGURE 2.169 Reassigning the variable name to a single cell changes the Python object to a string.

Storing data in strings

Now let’s look at the properties of a Python string. You have been using Python strings throughout this chapter. In F18, enter the following Python code:

message="Hello World"

Figure 2-170 shows the data type card for the resulting Python object.

FIGURE 2.170

FIGURE 2.170 The Python code df_message = "Hello World" results in a Python string.

A string is textual data, and using strings in Python is easier than in many other languages (including VBA). In this example, you can see that you use double quotes (") to start and end a string. You can also use single quotes ('), but you must be consistent with your choice: If you start a string with a single quote, you must end it with a single quote.

You can concatenate strings by using the plus (+) symbol. To see this in action, enter the following Python code in cell F19.

message_plus="Hello "+'World'

Figure 2-171 shows the results.

FIGURE 2.171

FIGURE 2.171 It is easy to concatenate Python strings.

Note that this example uses double and single quotes. If you use single quotes, then double quotes can be part of the text and vice versa. To see how this works, enter the following Python code in cell F20:

message_plus_quotes="Hello "+'"World"'

As Figure 2-172 shows, the result has double quotes around World.

FIGURE 2.172

FIGURE 2.172 You can mix quote types to include quotes in a string.

Before we move on to using strings with Excel data, enter the following Python code in cell F21 to see how easy it is to repeat a string:

message_repeated_plus_quotes="Hello "*3+'"World"'

Figure 2-173 shows the results.

FIGURE 2.173

FIGURE 2.173 You can use * and an integer to repeat a string in the result.

A common use for strings is to describe the results of a calculation. To see how this works, enter the following Python code in cell F22:

df_Sales = sum(xl("TblExcelData[[#Data],[Sales]]")[0])

This code extracts the data from the Sales column of the Excel table TblExcelData, puts it into a DataFrame, and then sums column 0 of the DataFrame, which is the column that contains the sales data. Figure 2-174 shows the results.

FIGURE 2.174

FIGURE 2.174 An example of summing a column from a DataFrame

To make the results easier to understand, enter the following Python code in cell H22:

df_Sales_desc = f"The total sales is {df_Sales}"

Figure 2-175 shows the results.

FIGURE 2.175

FIGURE 2.175 An example of using an fstring to combine variables with text

This example shows the use of an fstring, a formatted string used to include variables in text. The variable is indicated by using curly braces ({}). You do not need to specify a data type for the variable df_Sales. Because it is a number, though, you can add the comma separator so it is displayed as 798,300. To see this in action, amend the code in cell H22 so it looks like this:

df_Sales_desc = f"The total sales is {df_Sales:,}"

Figure 2-176 shows the results.

FIGURE 2.176

FIGURE 2.176 If the variable is numeric, you can add the comma separator.

Storing numbers with number types

We have been looking at displaying numeric values in strings, so let’s move on to how numbers can be stored. To see how the DataFrame df_Sales is stored, you can use the data type card for that object. Using Python code, you can also use the type() function. To see how it works, enter the following Python code in cell J22:

type (df_Sales)

If you view the Excel value, you see that the class is int, for an integer (see Figure 2-177).

FIGURE 2.177

FIGURE 2.177 The type() function reveals how a number is stored.

The other number type you will work with in examples in this book is a float. To get an idea of how it works, enter the decimal number 56.78 in cell F23 and enter the following Python code in cell J23:

type(xl("F23"))

Figure 2-178 shows the results.

FIGURE 2.178

FIGURE 2.178 The type() function reveals how a decimal number is stored.

You may sometimes need to convert an integer to a float or vice versa. There are functions you can use to do this. To see how it works, enter the following Python code in cell H23:

int(xl("F23"))

As you can see from the data type card shown in Figure 2-179, Python in Excel converts the value into an integer.

FIGURE 2.179

FIGURE 2.179 The int() function converts a float into an integer by removing the value after the decimal point.

To convert an integer to a float, you use the float() function. To see it in action, enter the following Python code in cell H24:

float(xl("H23"))

The result may look misleading in the Excel cell, but as you can see from the data type card in Figure 2-180, Python in Excel has converted 56 into a float.

FIGURE 2.180

FIGURE 2.180 The float() function converts an integer into a float by adding a decimal point.

Simplifying with decimal numbers

The float and integer types will be sufficient for the examples in this book. However, there are a few more types that may help you with calculation problems. For example, the decimal type can be useful if you get inaccurate results using floats. In such a situation, you need to convert the float to a string and then convert the string to a decimal. To do this, you need the decimal library. To try it, enter the following Python code in cell H25:

import decimal
decimal.Decimal(str(xl("F23")))

The function str() converts a number to a string, and then the function Decimal() converts the string into a decimal. Figure 2-181 shows the results.

FIGURE 2.181

FIGURE 2.181 You can import the decimal library to convert floats to decimals.

The decimal type cannot be expressed as an Excel value. Figure 2-182 shows the data type card for cell H25.

FIGURE 2.182

FIGURE 2.182 The data type card shows the decimal value and the class decimal.Decimal.

Because decimals cannot be viewed as Excel values, this conversion would only be needed to correct rare float calculation issues.

Creating and using complex numbers

Some mathematical calculations use complex numbers. A complex number is expressed as a real part and an imaginary part. The unit of imaginary numbers is the square root of -1, known as i. The expression for a complex number is a + bi, where a is the real part and b is the imaginary part. This is the Python function to create a complex number:

complex(real, imaginary)

The parameters real and imaginary default to zero if they are not specified.

To see a complex Python type, enter the following Python code in cell H27:

complex(3,2)

Figure 2-183 shows the results.

FIGURE 2.183

FIGURE 2.183 A complex number can be created by using the complex() function.

As shown in Figure 2-184, the data type card for cell H27 shows the complex number you created.

FIGURE 2.184

FIGURE 2.184 The complex number is expressed as (3+2j).

As expected, a complex number cannot be viewed as an Excel value.

Using Booleans for True/False values

Now that you have learned about numeric objects, let’s move on to Booleans, which are used for True/False values. You used True and False earlier, with the Python function xl():

xl("source", headers = True/False)

Interpreting the Boolean type can be useful for testing whether data exists in an object. These are the Boolean operators:

  • and (or False and True)

  • or (or False or True)

  • not (or not True)

  • ==

  • !=

  • <

  • >

Every object has a Boolean property, and most objects evaluate to True. Some evaluate to False, 0, and other empty values, such as an empty string. You came across the Python object None when you used the print() command, as this command is not used in the Excel frontend. None is a Python constant that represents the absence of a value, and it evaluates to False.

To check whether an object is True or False, you can use the bool() function. To see how this works, enter the following code in cell H28:

bool(xl("H27"))

Figure 2-185 shows the result.

FIGURE 2.185

FIGURE 2.185 The Boolean type of the complex Python object is True.

Figure 2-186 shows the data type card for the cell H28.

FIGURE 2.186

FIGURE 2.186 The Boolean value is True, and the class and type of the result are both bool.

Extracting partial strings with indexing

Now we’re ready to look at how to extract partial strings. To do this, you need to understand more about Python indexing.

Python uses zero-based indexing, which means that you start counting at 0. We already looked at indexing in DataFrames, but it can also be used in many other Python objects. The simplest example is using indexing in a string. To use Python to get parts of a string in Excel, enter the following string as an Excel formula in cell F30:

"Sales for Department Admin in June 2024"

Then extract this data into a DataFrame in cell F31:

df_string_for_index = xl("F30")

Each character (including spaces) in this string can be accessed with an index. Python counts from 0 at either end. To extract a character using indexing, you use square brackets ([]) around the index of the character. To see how it works, enter the following Python code in cell H31:

df_string_for_index[10]

The result is D. The same result can be achieved by counting from the other end. To see this in action, enter the following code in cell H31:

df_string_for_index[-29]

It is more useful to extract words from a string, which is called slicing. Slicing has something in common with creating a list from a range, which we looked at earlier when learning Python syntax, in this example:

numbers_list=list(range(1,5))

You saw earlier that, with this method, the range goes up to the number just before the second number of the range. Slicing works the same way: The start interval is included, but the stop is not. To extract the word Department from df_string_for_index, replace the Python code in cell H31 with the following code:

df_string_for_index[10:20]

The second number is the index of the character after the last letter, not the length of the string to be extracted. You can also start from the other end, as you did for the single character. To try it, enter this code:

df_string_for_index[-29:-19]

You can also add a third parameter: step. The step parameter can be useful for other Python objects, such as lists. Here, it would take every third character from the beginning to the end:

df_string_for_index[0::3]

The result is Sef ptnAiiJe0. Collecting every third character is not likely to be useful, but how about every third item in an Excel data range? To see if that’s useful, enter the following range of data in cells F33:F45:

Company: A
Sales: 1000
Region: 1
Company: B
Sales: 200
Region: 2
Company: C
Sales: 4000
Region: 3
Company: D
Sales: 1500
Region: 4

Then enter the following Python code in cell H33:

xl("F33:F44")[0::3]

Figure 2-187 shows the results.

FIGURE 2.187

FIGURE 2.187 You can slice the data range to extract the company data.

The four company names are stored in a DataFrame with their row numbers. Note that this is not the same as selecting a row number. To prove this, you can replace the Python code in cell H33 with the following:

xl("F33:F44")[-12::3]

This gives the same result but does so by using reverse indexing.

Storing and using dates

As an Excel user, you are probably familiar with how Excel stores dates. A date is a serial number, counting from January 1, 1900. You can change this in the Excel Options dialog, but for the purposes of this book, you will use the standard setting. When you enter a date into a cell, Excel formats the date for you to look like a date, but it stores the value as a float. If you also specify the time, you can specify it to the nearest millisecond.

There is no option to enter a time zone. The default format for a date is based on the Windows Language & Region settings on your computer (see Figure 2-188). You can also change the cell format to show a different date format, but the stored float remains the same.

FIGURE 2.188

FIGURE 2.188 Excel bases its date formatting on the user’s Language & Region settings.

In Python, dates and datetimes are stored in datetime objects. Before we explore how to create a datetime object, you need to see how easy it is to convert an Excel datetime to a Python object in Python in Excel. Enter the date 23 January 2025 in cell H46. The example in Figure 2-189 shows the UK region, with the cell formatted to show the month name to avoid any confusion.

FIGURE 2.189

FIGURE 2.189 Cell formatting can be applied to change the way the date is displayed.

To see date formatting in action, enter the following Python code in cell H46:

df_date = xl("F46")

The Python object is displayed in an Excel cell and uses the default date formatting. Figure 2-190 shows the results for this example, although your results will show your default date formatting.

FIGURE 2.190

FIGURE 2.190 When a Python date is displayed as a Python object, the Excel default date format is used.

The data type card for cell H46 reveals the Python type and class (see Figure 2-191).

FIGURE 2.191

FIGURE 2.191 The data type card for a Python date shows the Python format of the date, which includes the time.

In this example, you have seen that extracting an Excel date into Python is easy. Now, let’s look at how to extract part of a date by using Python. Enter the following Python code in cell H47 to extract the day from df_date:

df_day=df_date.day

Enter the following Python code in cell H48 to extract the month from df_date:

df_month=df_date.month

Enter the following Python code in cell H49 to extract the year from df_date:

df_year=df_date.year

The result for each example is an integer, as shown in Figure 2-192.

FIGURE 2.192

FIGURE 2.192 Extracting the day, the month number, and the year from a datetime object.

To extract the month name, you must use a different function to convert the datetime to a string first. To do so, enter the following Python code in cell H50:

df_month_name=df_date.strftime("%B")

Figure 2-193 shows the result.

FIGURE 2.193

FIGURE 2.193 You can extract the month name by using the strftime() function.

You can use strftime() (which stands for “string from time”) to reformat how the date is displayed. To see how it works, enter the following Python code in cell H51:

df_reformat = df_date.strftime("%A  %d - %m - %Y")

This example demonstrates the main symbols for extracting date components. The other common components from the time segment are %H for hours and %M for minutes. As you can see, the case is important here too. Figure 2-194 shows the result of using df_reformat.

FIGURE 2.194

FIGURE 2.194 You can also use the strftime() function to control how the date is displayed.

Creating and using dictionaries

A Python dictionary is similar to a list in that it is a collection of Python objects. However, each object has a key. To create a simple dictionary that links products to account codes, enter the following Python code in cell F53:

dict_Products = {"Chains":14827,
                 "Socks":14786,
                 "Shorts":14928,
                 "Tights":14848}

Figure 2-195 shows the results. In this example, the product is the key, which links to an account code.

FIGURE 2.195

FIGURE 2.195 Creating a Python dictionary

Figure 2-196 shows the data type card for cell F53.

FIGURE 2.196

FIGURE 2.196 The data type card for a Python dictionary

This data type card contains different information from the cards you have seen so far. There is no Python class. The type dict is the title, and each pair is shown with the value underneath the key. The Excel value view for a dict object also shows a dictionary icon next to the word dict. You can click the dictionary icon, which looks like a stack of two sheets, to view a similar card, as shown in Figure 2-197.

FIGURE 2.197

FIGURE 2.197 The data type card for a Python dictionary can be accessed from the dict icon when in the Excel value view.

You can change or add a pair to a dictionary by using similar syntax. To see this in our dictionary example, enter the following Python code in cell F54:

dict_Products = dict_Products | {"Bells":14852}

The new value will be assigned to the key "Bells". If the key exists, it will be assigned the new value. If the key does not exist, a new pair will be added. Remember that Python is case sensitive, so it is important to take care when updating a key.

Check the results in cell F54 and then enter similar Python code with a new value for "Bells" in cell F55:

dict_Products = dict_Products | {"Bells":17852}

You can extract a value from the dictionary by using get(). Enter the following Python code in cell F56 to get the account code for the "Bells" key:

dict_Products.get("Bells")

You should see the value you gave to the key "Bells". You can also include an error trap if the key doesn’t exist. To do this, enter the following Python code in cell F57:

dict_Products.get("jackets","N/A")

Python in Excel returns the value "N/A" because "jackets" is not in the dictionary.

Figure 2-198 shows the results for this section.

FIGURE 2.198

FIGURE 2.198 An error trap can be specified if the key is not found.

Accident-proofing with tuples

If you want to create a list that cannot be changed, you need a Python object called a tuple, which is similar to a list. You can use tuples to avoid accidentally changing data. To create a tuple of the department regions, for example, enter the following Python code in cell F59:

tp_regions="NORTH","SOUTH","EAST","WEST"

The Excel Value view for this tuple looks like a list, as you can see in Figure 2-199.

FIGURE 2.199

FIGURE 2.199 The Excel Value view of a tuple looks like a list.

To see what happens if you try to change one of the elements of the tuple, enter the following Python code in cell H59:

tp_regions[0]="North"

In this case, you are attempting to change the first item in the tuple tp_regions from "NORTH" to "North". Figure 2-200 shows the results.

FIGURE 2.200

FIGURE 2.200 Trying to change the value of an item in a tuple is not allowed.

As you’ve seen, you are only prevented from changing values in an existing tuple. However, you can use the same variable tp_regions to point to a new tuple.

Finding unique list values with sets

A set is a list that cannot contain duplicate elements. A practical use of a set is to find the unique values in a list. To see how it works, enter the following Python code in cell F64:

set_Products=set(df_contig_range[2])

Figure 2-201 shows the results, using the Excel Value view.

FIGURE 2.201

FIGURE 2.201 A set can be used to find unique values in a list or column.

To create another set in cell H64, enter the following Python code in that cell:

set_challenge_Products=set(df_data_range[0])

Part of the challenge earlier in this chapter was to find the products in df_contig_range that are not in df_data_range. One way to find the products would be to create sets and use difference(). To do this, enter the following code in cell J64:

set_ProductsMatch = set_Products.difference(set_challenge_Products)

Figure 2-202 shows the results.

FIGURE 2.202

FIGURE 2.202 You can use the difference() function to find the missing values from one set.

There are two other common set operators:

  • Union: Finds the elements that exist in either set.

  • Intersection: Finds the elements that exist in both sets.

You can also perform operations with more than two sets. For example, if you have three sets—A, B, and C—you can use the following code to find elements in A that are not in B or C:

set_differences = A.difference(B,C)

In this section, you have just glimpsed what is possible with sets. Set theory is used extensively in mathematics and data analytics.

Using conditional expressions

Conditional expressions allow you to follow a path according to the answers to questions. In Python, the keywords for conditional expressions are if, elif, and else.

For example, to set a rate according to a value, you could enter this Python code:

value = 44
if value <=10:
    rate = 50
elif value <=30:
    rate = 40
elif value <=50:
    rate = 20
else:
    rate = 10
rate

When this Python code is executed in a Python cell, the result is 20.

If you wanted to run this code with different values, you would use a loop.

Counting with loops

To explore loops, let’s start with another simple example. Start with the set set_Products that you created earlier. You can count the entities in the set by using the following Python code:

count=0
for product in set_Products:
    count=count+1
count

When this Python code is executed, the result in a Python cell is 7.

You could stop counting when count is 3. There are two ways to do this. The first method is to use break to get out of the loop:

count=0
for product in set_Products:
    if count==3:
      break
    else:
         count=count+1
count

The second method is to use continue to keep going through the loop but skip the count+1:

count=0
for product in set_Products:
    if count==3:
        continue
    else:
        count=count+1
count

Finally, here is a simple example of a while loop that also returns the value 3:

count=0
while count<=2:
    count=count+1
count

The Python syntax in this section should be enough to get you started, but there is one more area to cover before we finish the chapter: You need to know how to write your own Python functions.

Creating your own Python functions

You have been using preloaded Python functions for the examples so far. You can also create your own functions.

A Python function is a block of code that is assigned a function name. The rules for naming functions are the same as those for naming variables. As with variables, assigning a new function to a name means that the name no longer refers to the previous function. Once a function has been created, you can call it with the Python code in any Python cell by using the function name. Remember the Python calculation order: Cells to the left of and above the cell defining the function occur before the function is defined, so they will not recognize it.

You will find functions useful when repeating the same code lines. When maintaining your Python code, calling a function is more efficient than making changes each time those lines occur. It also saves time and space when creating new code that can use the function. To create a function, you begin with the keyword def. For example, enter this Python code in cell F91:

def fn_hello():
    return "Hello, I am a function"
fn_hello()

Figure 2-203 shows the results.

FIGURE 2.203

FIGURE 2.203 Creating a basic function

This simple example shows you the basic syntax required to write your own function. You must use parentheses after the function name when defining it and calling it, even if there are no parameters. You must use a colon (:) after the def statement, and you must indent the return statement. As always with Python, everything is case sensitive.

Consider this example with parameters and conditions:

def fn_UKtoUS(uk_word):
    if uk_word.lower() =="colour":
        return "color"
    elif uk_word.lower() =="analyse":
        return "analyze"
    else:
        return "No, I'm not changing it"
fn_UKtoUS("Colour")

In this example, we use lower() to lower the case of uk_word. The result of running the function with the argument set to "Colour" is "color". The argument uk_word is required. To enter an optional argument, you must specify the default value, as in the following code:

def fn_UKtoUS(uk_word,skip_colour=True):
    if uk_word.lower() =="colour" and not skip_colour:
        return "color"
    elif uk_word.lower() =="analyse":
        return "analyze"
    else:
        return "No, I'm not changing it"
fn_UKtoUS("Colour")

Because we have not specified a value for skip_colour, the value is True. The function would return "No, I'm not changing it". In Python code, it is not necessary to declare arguments in the same order as the function definition. We can put them in any order, specifying them by name, as shown here:

fn_UKtoUS( skip_colour=False,uk_word="Colour")

This would return the value "color".