Getting to know Python in Excel

By Liam Bastick and Kathryn Newitt
2/25/2026

Contents

Accessing Excel entities

In this chapter, you have created Python objects by entering data directly into Python cells. Let’s move on to manipulating the other data in an Excel workbook. You may recall that pandas is one of the preloaded Python libraries. The purpose of the pandas library is to share code that assists with manipulating and analyzing data. In particular, the pandas DataFrame is vital to Python in Excel; it enables you to work with tabular data. The rows, columns, and column headings of DataFrames are ideal for storing and analyzing Excel data.

pandas attempts to infer the data type of data, which works well in simple cases, as you will see. In this section, you will see how to examine DataFrames by using the data type card and the Python Editor to become familiar with the properties associated with Excel entities stored in Python objects. In this section, you will also encounter the one-dimensional form of a DataFrame, known as a series.

Extracting Excel data with the Python `xl()` function

To extract Excel data into a pandas DataFrame, you use a specific Python function, xl(), which is unique to Python in Excel. It has the following syntax:

xl("source", headers = True/False)

In this syntax, source is the Excel object, which can be identified by a cell reference or an Excel identifier. The headers option is a Boolean.

You may be familiar with creating tables in Excel by using Insert Table. The headers parameter is similar to the choice of specifying whether column headings are present in a table. If headers is True, the first row of Excel data will be used as headers. The output is a DataFrame object, although it could be a series or a scalar (that is, a single value), depending on the source. In this section, you will create examples for different Excel objects and explore the results.

It is good practice to specify a variable to assign to the DataFrame object, and this is the syntax you will use to do so:

x=xl("source", headers = True/False)

where x is the variable you are using to identify a Python object.

Working with cells

The smallest Excel entity is a single cell. In this section, we’ll look at how to extract the data from an Excel cell into a Python object. Figure 2-70 shows the Range Data sheet with a dataset that has not been converted to a table.

FIGURE 2.70 The data in the Range Data sheet has not been converted into a table.

Enter the following Python code in cell O10 to select the data in cell F11:

df_single_cell = xl("F11")

Figure 2-71 shows this Python code in the formula bar and the results in the worksheet.

FIGURE 2-71 The Python object when Python reads a single Excel cell shows the value of that cell.

The data type card indicates that because you have extracted a single numeric value, it has been recognized as an integer (see Figure 2-72).

FIGURE 2.72 The Python object when Python reads a single numeric cell with no decimal point is an integer.

If you change the Python syntax to select the data from cell J11 instead, the resulting data type card will be for a float. The Excel formatting is ignored, and the data value is extracted as 0.75 rather than 75% (see Figure 2-73).

FIGURE 2.73 The Python object when Python reads a single numeric cell with a decimal point is a float.

The Python object for a single cell corresponds to the data found in the cell. A text cell is read as a Python string. For example, if you amend the Python code in O10 to extract the data from H11, the string will have the value Chains (see Figure 2-74).

FIGURE 2.74 The Python object when Python reads a single text cell is a string.

Finally, let’s see how Python recognizes dates for the single-cell example. Add the heading Dates to cell K10 and enter the date 01 January 2025 in cell K11. Then change the Python code in cell O10 to extract data from cell K11. Figure 2-75 shows the results.

FIGURE 2.75 The Python object when Python reads a single cell containing a date is a datetime object.

The examples for single-cell values have demonstrated how Python interprets data types from data values. The way that dates can be detected is a useful feature of Python in Excel. You will learn more about the datetime object later in this chapter. Next, we will look at examples where multiple cells are extracted.

Using contiguous ranges

A contiguous range is a range of cells directly next to each other. Using the data on the Range Data sheet as a reference, F11:J12 would be contiguous, but F11:J11, H12 would not.

Enter the following Python code into cell O12 to extract data from the range F10:J18:

df_contig_range = xl("F10:J18",headers = True)

Figure 2-76 shows the Python code and the result.

FIGURE 2.76 The Python object when Python reads a contiguous range is a DataFrame.

Notice that when you select the range, the second parameter, headers=True, is included in the generated syntax. The data type card in Figure 2-77 shows an 8x5 DataFrame with the headers from the data. The row number is also included in the card, but not in the five columns of the 8x5 DataFrame. The card has been expanded by dragging the bottom-right corner.

FIGURE 2.77 The data type card for the contiguous range F10:J18 shows an 8x5 DataFrame.

The data types have been determined in much the same way as they were in the single-cell example. Year and Sales are right-justified numeric columns, Category and Product are left-justified text, and the decimal point in the Rating column has been preserved, indicating that it is a float.

The Excel value of cell O12 spills from cell O12 (see Figure 2-78).

FIGURE 2.78 The Excel value of the 8x5 DataFrame spills from cell O12.

Amend the Python code in cell O12 so that the second parameter, headers=True, is not entered:

df_contig_range = xl("F10:J18")

In the Excel Value view, the results look the same (see Figure 2-79).

FIGURE 2.79 The Excel value spilling from cell O12 remains the same when the headers parameter is not defined.

Change to the Python Object view. The data type card shows the header detail in the first row, and the DataFrame is now 9x5 (see Figure 2-80).

FIGURE 2.80 The data type card for cell O12 shows a 9x5 DataFrame where the first row contains the header values.

You can conclude that if you define the value of the headers parameter as True, the first row of data will be treated as headers in the DataFrame.

Now, amend the Python code in cell O12 to remove everything to the right of the equal sign and then select another contiguous range by selecting the data F11:J18 from the sheet (rather than by typing in the range). Figure 2-81 shows the selection and the Python code that is generated.

FIGURE 2.81 The data selected contains no header values, but headers=True is generated in the Python code.

This example shows that the default behavior when you select Excel data to be extracted to a DataFrame is that the headers parameter value defaults to True. Therefore, the DataFrame containing the extracted data has the dimensions 7x5, with the first line of data in the headings (see Figure 2-82). This is something to watch out for when selecting data to be extracted to Python objects.

FIGURE 2.82 The data selected contains no header values, but because headers=True, the first line of data is in the headings.

Now remove the second parameter so the formula looks like this:

df_contig_range = xl("F11:J18")

The data type card now shows an 8x5 DataFrame, and the header information contains numbers for the columns (see Figure 2-83). By default, if the headers parameter is not specified, the value of this parameter is False.

FIGURE 2.83 The data selected contains no header values, and the headers parameter is not defined, so the first line of data is in the first DataFrame row.

These examples help you understand the default behavior when extracting Excel data into DataFrames. By default, if you select the data rather than enter it as a formula, the first line of the data appears in the headings. If you enter the data as a formula without specifying the headers parameter, all the data appears in the DataFrame rows. It’s important to look out for the headers parameter as your Python code gets more complex.

Referencing DataFrames

In the example we’ve been working with, the variable df_contig_range allows you to use the DataFrame in other Python code. You can enter the following code in cell O21 to access the data that has been extracted to the Python object identified by df_contig_range:

df_point_at_contig_range = df_contig_range

Figure 2-84 shows the result, using the Excel Value view for cells O12 and O21.

FIGURE 2.84 Cell O21 accesses the variable df_contig_range, which is defined in cell O12.

In Python in Excel, it is also possible to point to the data in other cells, including cells containing Python objects. You must be careful with this method, though. The results can vary depending on how the original Python cell is displayed. To see an example of why it is important to take care when referencing Python objects using the Excel cell reference, enter the following Python code, where the original Python object is in cell O12, and you are referencing that object in cell O21:

df_point_at_contig_range = xl("O12")

Figure 2-85 shows the results: The original Python object is displayed in cell O12, and the Excel value is displayed in cell O21.

FIGURE 2.85 Cell O21 displays the spilled range, which is the same as the Excel value of the Python object in cell O12.

However, if you change cell O12 to display the Excel value instead, the result changes, as shown in Figure 2-86. Now only one value appears in cell O21.

FIGURE 2.86 Cell O21 selects only the first value of the spilled range when extracting data from O12.

With dynamic arrays, you can add a pound sign (#) after the cell reference to select all the spilled data (see Figure 2-87). (You will learn more about the properties of dynamic arrays later in this chapter.)

FIGURE 2.87 Using O12# to select the spilled array

If you use cell references to identify Python cells, you must ensure you always add the pound sign after the cell reference in case the Excel value of a Python cell is displayed as a dynamic array. This works with any cell value other than null, including a Python object (see Figure 2-88).

FIGURE 2.88 Using O12# also works with a Python object

The other consideration when using Excel cells as references for Python objects is that Excel referencing is relative. If you copy cell O21 to cell U21 and give the Python object a new variable name, the Excel cell referencing moves, too, as shown in Figure 2-89.

FIGURE 2.89 Copying a Python cell that uses an unanchored cell reference to identify another Python object can cause errors.

Because the Excel cell reference is not anchored, it now points at cell U12 instead of O12. U12 is empty, so it is an invalid target for the dynamic array reference. If you amend the formula in cell U21 to remove the pound sign, the result would be None, which is the equivalent of a null value in Excel. The #REF! error results from the dynamic array referencing an empty cell.

Another problem, the #SPILL! error, crops up when you reference the cell that contains the Python object rather than using the variable name. (We will look at this error in detail later in this chapter.) If the target cell is displayed in Excel Value view and has a #SPILL! error, the Python cell that references that cell also triggers a #SPILL! error.

To summarize, when creating and referencing DataFrames, you should assign and use variable names unless you have an unusual task that requires relative referencing of Excel data.

Carrying out Python calculations

You already know that Python code can refer to data that has occurred above or to the left (or on a previous sheet). This impacts the way that calculations in Python cells are refreshed. The default behavior for Python cells that use Excel data is that Python formulas recalculate automatically in row-major order when a value used in a Python formula is changed. This means across row 1 from column A to column XFD and then row 2 and so on, from the first worksheet to the last. If this behavior becomes a problem, it may be possible to control the way the Python cells are recalculated from the Calculation Options settings in the Formulas tab (see Figure 2-90).

FIGURE 2.90 The default calculation option is Automatic.

The default option for Calculation Options is Automatic, which means Python cells that use Excel data and Excel formulas will be recalculated if a value that the formula refers to changes. You can change the Calculation Options to Partial or Manual.

Selecting the partial calculation option

The Partial calculation option changes how formulas are calculated for Excel entities (excluding Data Tables and automatic PivotTables) (see Figure 2-91). At the time of writing, this impacts Python objects that use Excel data if you have the Python in Excel add-on. If you do not have the add-on, you will be prevented from manually recalculating cells if you select Partial or Manual. (You will learn more about this later in the chapter.)

FIGURE 2-91 The Partial calculation option excludes some Excel entities.

If you have the Python in Excel add-on and Partial is selected, there are four ways to manually recalculate the Python cells that use Excel data:

Click Calculate Now on the Formulas tab (or press F9).
Click Calculate Sheet on the Formulas tab (or press Shift+F9).
Update the Python formula in the cell.
Click the Reset option in the Python section of the Formulas tab.

Figure 2-92 shows the Calculate Now option.

FIGURE 2.92 The Calculate Now option recalculates the entire workbook, including Python cells that use Excel.

You may have noticed that in this section, we have referred to Python cells that use Excel data. This is because the Calculate Now and Calculate Sheet options do not recalculate Python cells that do not reference Excel data. Before we look at the options to refresh Python calculations in the Python section of the Formulas tab, let’s look at what happens to Python cells that do not reference Excel data.

In cell O30, enter the following Python code:

import random
No_Excel = random.randint(1,10)

This Python code loads the Python library random and then generates a random number from 1 to 10. Figure 2-93 shows one result, which will probably not be the same as yours.

FIGURE 2.93 Using Python to generate a random number

Now select cell F9 and notice what happens to the value in cell O30: It stays the same.

In the Python section of the Formulas tab, click Reset (see Figure 2-94).

FIGURE 2.94 The Reset functionality in the Python section of the Formulas tab resets global Python variables and triggers calculation if the calculation type is Automatic.

Note that the Reset dropdown list has two items: Reset and Reset Runtime (see Figure 2-95). Reset is the default, and Reset Runtime is similar, except that it does not reset the global Python variables.

FIGURE 2.95 Reset Runtime triggers the recalculation of Python cells.

When you click Reset or Reset Runtime, the value in cell O30 is recalculated (see Figure 2-96).

FIGURE 2.96 Select Reset to recalculate cell O30 and generate a different random number.

In summary, you can use the Reset options in the Python section of the Formulas tab to recalculate all Python cells. The Calculate Now and Calculate Sheet options only recalculate Python cells that use Excel data in the workbook or sheet, respectively. The Partial calculation setting works with Python cells only if you have the Python in Excel add-on. The same is true for the Manual calculation setting.

Calculating manually

The Manual calculation option in the Calculation Options dropdown menu applies to all formulas in the workbook. If this option is selected, Excel formulas can be recalculated by using Calculate Now or Calculate Sheet. Python formulas that use Excel references are also recalculated. Python cells not using Excel data must be recalculated by using the Reset options in the Python section of the Formulas tab. If you set the Calculation option to Manual when the Python in Excel add-on is not installed, you get errors when updating the Python cells. (You’ll learn more about this later in the chapter.)

In this section, you have created Python cells that extract Excel data from contiguous ranges into DataFrames. You have also seen how to reference other DataFrames and how to recalculate Python cells. This experience will be helpful as you move on to extracting data from other Excel entities, beginning with named ranges.

Identifying cells with named ranges

Much like the variables you use to reference Python objects, a named range is a name that can be used in Excel formulas to identify a cell or a group of cells.

Rules related to creating a name for a cell or a range of cells apply when naming tables too:

Names are not case sensitive.
A name must begin with a letter, an underscore (_), or a backslash (\).
A name cannot be longer than 255 characters.
A name cannot contain spaces, and most punctuation is not allowed.
r and c are not valid names because they are reserved for row and column.
A name cannot be the same as a default cell name.

The default name for a cell is the column letter followed by the row number (for example, A1). A named range allows you to enter a meaningful name that will make formulas easier to follow.

In the same Excel workbook you have been using (SP Python in Excel Example.xlsm), open the sheet Named Range. Select the data in cells F10:I15. In the name box, enter the name NamedRange1 (see Figure 2-97). You must press Enter after typing the named range in the Name Box to assign it to the cells.

FIGURE 2.97 Defining a named range

You can also define a named range in the Name Manager on the Formulas tab (see Figure 2-98). The Name Manager also shows any Excel table names, which you will use in the next section.

FIGURE 2.98 Named ranges can be created, amended, and deleted in the Name Manager.

Open the Name Manager, and the names in the workbook are shown, including NamedRange1 (see Figure 2-99).

FIGURE 2.99 The named range NamedRange1 appears in the Name Manager.

Python in Excel recognizes named ranges. To see how it works, enter the following Python code in cell O10:

df_named_range=xl("NamedRange1", headers=True)

The headers parameter behaves in the same way it does for cell ranges. Figure 2-100 shows the results of entering the code: The original data is shown on the sheet, and the data in the named range NamedRange1 spills from cell O10.

FIGURE 2.100 NamedRange1 is extracted to a DataFrame, and the Excel value is displayed.

Understanding the limitations of using named ranges with Python in Excel

The Python function xl() allows only one Excel entity to be defined. You may be tempted to get around this by including more than one range in a named range (that is, by creating a noncontiguous range where multiple sets of data are not right next to each other). Don’t do it. The next example demonstrates what happens if you try to extract a noncontiguous named range into a DataFrame.

In this example, you will attempt to extract the headings and some data that is not directly under the headings, which would create a subset of the data and retain the headings. Select cells F10:J10 and F12:J14 to create a named range and give it the name NamedRangeNonContig. Enter the following Python code in cell O17:

df_named_range_nc=xl("NamedRangeNonContig", headers=True)

Figure 2-101 shows the results of trying to use a named range to enter more than one contiguous range: Cell O17 contains the error code #BUSY!, and an error message is displayed.

FIGURE 2.101 Trying to extract a named range that references more than one contiguous range prompts an error.

The error message is “Excel ran out of resources while attempting to calculate one or more formulas. As a result, these formulas cannot be evaluated.” Do not try to extract data by using a named range that references more than one contiguous range. The results have varied as Python in Excel has been developed, but they have never been useful! In this example, if you wanted to extract the data from the noncontiguous range, you could extract the data in cells F10:J14 and then remove data from the DataFrame. (You will learn how to transform the data in DataFrames in the next chapter.)

Delete cell O17 to remove the error. Let’s move on to Excel tables.

Storing data in Excel tables

The examples you have seen so far with Excel data have used cell references or named ranges to access the data. In Python in Excel, you can also select all the data in an Excel table.

One major advantage of using Excel tables to store data rather than using ranges is that they are dynamic—so if you add or remove rows or columns, the table will reference the current data.

Open the sheet Table Data in the current workbook. The data is currently in a range that extends from F10 to J85. Enter the following Excel formula in cell L10:

=MAX(F11:F85)

This formula simply finds the maximum Year column value, which is 2025 (see Figure 2-102).

FIGURE 2.102 Determining the latest year in the data

Now add a row to the bottom of the data, using the following values:

Year=2026
Category=Tools
Product=Spanner
Sales=1000
Rate=50%

The value in cell L10 is still 2025, and you may get a warning notice that the formula omits adjacent cells, as shown in Figure 2-103.

FIGURE 2.103 The extra row is not included in the MAX calculation, and a warning appears.

Now, you will create a table for the original data. Delete the bottom row and the contents of cell L10. Click anywhere in the data and press Ctrl+A to select the whole range. On the Insert tab, choose Insert Table (see Figure 2-104).

FIGURE 2.104 The Insert Table option allows you to organize and analyze related data.

The tooltip tells you that creating a table is useful for organizing and analyzing related data, and a table makes it easier to sort, filter, and format data. Because you will analyze data in this example, tables will be particularly useful.

When you select the Table option, you are prompted to confirm the range and whether headers exist in the data, as shown in Figure 2-105.

FIGURE 2.105 The Create Table dialog allows you to change the range and confirm the existence of headers.

Accept the defaults and click OK. The format of the data changes, and the contextual tab Table Design appears (see Figure 2-106). Your table may have different colors if your default table format is different from ours. (If you are reading a printed copy of the book, you will see only black-and-white text and figures.)

FIGURE 2.106 The Table Design tab is available when a table is inserted.

A filter button is added to each column by default. You can toggle this option by using the Filter Button checkbox in the Table Style Options section of the Table Design tab. There is a Table Name box, where the table in this example is currently named Table2. Your table name may include a different number if you have already tried creating tables in your workbook. No matter what your table is currently named, rename it TblExcelData so that the name is more helpful. Make sure you enter the name by pressing Enter after typing it in.

Now you can find the latest year again. When you enter the MAX formula in cell L10 again, select all the data in the Year column (including the column heading), and the formula bar shows the table syntax for the column:

=MAX(TblExcelData[[#All],[Year]])

As shown in Figure 2-107, the result in cell L10 is 2025.

FIGURE 2.107 Selecting the whole column causes table references to be inserted into the formula.

Now add a row at the bottom of the table, using the following values:

Year=2026
Category=Tools
Product=Spanner
Sales=1000
Rate=50%

The result is updated to 2026, as shown in Figure 2-108.

FIGURE 2.108 Adding rows to a table dynamically changes the result of the formula.

This is a simple example, but it demonstrates a powerful property of tables: The syntax for a table will also be used by Python. In this example, the data in the Year column is referenced by TblExcelData[[#All],[[Year]], which begins with the table name and then has square brackets ([]) to indicate that column data is being referenced. [#All ] means that headers and data are included, and [Year] is the column you are interested in for this example.

You can also use these table references:

[#Data] is the data in the table, excluding the headings.
[#Headers] is just the headings.
[#Totals] selects the total row if Total Row is toggled on in the Table Design tab; otherwise, it returns null.

Referencing Excel tables

Let’s move on to creating a Python cell to read the data. Continuing with the same worksheet you’ve been using, in cell O10, enter the following Python code:

df_table=xl("TblExcelData[#All]", headers=True)

Figure 2-109 shows the Excel value of cell O10.

FIGURE 2.109 Using Python to extract an Excel table into a DataFrame

Change the view to Python Object view for cell O10 and look at the data type card (see Figure 2-110).

FIGURE 2.110 The Python object for an Excel table with 5 columns and 76 rows is a 76x5 DataFrame.

When dealing with very large datasets, you cannot view all the rows on the data type card. The alternatives are to view the Excel value or to use the Python Editor. (You will review the examples from this chapter using the Python Editor in Chapter 4.)

To complete this section on Excel tables, remove the final row you added at the bottom of the table and then check the data type card to see the results (see Figure 2-111).

FIGURE 2.111 Remove the final row to update the Python code automatically.

Referencing dynamic arrays

The Excel Value view for a DataFrame is a dynamic array, so it is no surprise that Python in Excel can extract dynamic arrays. A dynamic array function returns a result that can be more than one value. The result can spill to the right and below the cell that contains the function. If the result of the Python code in a Python cell in Excel is more than one value, then the Excel value is displayed as a dynamic array. (You encountered this concept in the “Referencing DataFrames” section earlier in this chapter.) You can use Python in Excel to reference dynamic arrays created by other Excel functions.

Let’s start by entering a dynamic array formula. Open the sheet Dynamic Array and then use the Excel function RANDARRAY() in cell F10:

=RANDARRAY(5,6,10,300,TRUE)

This function generates an array with five rows and six columns. The minimum value is 10, and the maximum is 300. The final parameter indicates that the array should contain integers (whole numbers).

In cell P10, enter the following Python code:

df_dynamic_array= xl("F10#")

When you view the Excel value of P10, you should find that the array spilling from P10 has the same values as the array spilling from F10 (see Figure 2-112).

FIGURE 2.112 The Python cell displays the same dynamic array values as the referenced dynamic array.

Accessing data in Power Query queries

The data for the examples in this chapter has been in an Excel workbook. To work with external data, Python in Excel can access data in Power Query queries. This section is particularly important because Python in Excel needs to use the Power Query data connectors to access external data. In case you are unfamiliar with Power Query, we will walk you through how to extract the data we will use for the example.

Power Query is also known as Get & Transform. You can find it in the Data tab, in the section Get & Transform Data (see Figure 2-113). Power Query is an integrated extract, transform, and load (ETL) engine that allows you to access data from a wide range of sources.

FIGURE 2.113 The Get & Transform Data section of the Data tab

In this example, you will use the Power Query connectors to extract data from external sources for use in Python in Excel. You do not need to load the data into the Excel workbook to do this. The Get Data dropdown menu shows the available source groups. In Figure 2-114, the From File dropdown menu has been opened to show the sources for that group.

FIGURE 2.114 The Get Data dropdown menu in the Get & Transform Data section on the Data tab

At the time of writing, a new alternative to the Get Data dropdown menu was in preview: the Get Data (Power Query) dialog (see Figure 2-115). This new dialog allows you to view all the connectors in one place rather than opening multiple dropdown menus.

FIGURE 2.115 The Get Data (Power Query) dialog shows all the sources that can be accessed.

One of the workbooks you downloaded as part of the downloadable resources for this book is the Financial Sample.xlsx Excel workbook. This is a sample data workbook available from Microsoft that you can also access at https://learn.microsoft.com/en-us/power-bi/create-reports/sample-financial-download. If you haven’t already downloaded it, do so now and note its location. You will browse to this location for the next example.

In the Get & Transform Data section on the Data tab, use the Get Data dropdown menu to access the From File group and then select the From Excel Workbook option (see Figure 2-116).

FIGURE 2.116 The From Excel Workbook connector is in the From File group.

When prompted to browse to the workbook’s location, select the workbook in the browser and choose Import. The Navigator dialog will appear (see Figure 2-117).

FIGURE 2.117 The Navigator dialog for an Excel workbook

When you extract data by using Power Query, the Navigator dialog varies depending on the type of data you are accessing. In a workbook, you can access sheets and tables from the Navigator dialog. Select the table financials to preview the data, as shown in Figure 2-118.

FIGURE 2.118 A preview of the data in the financials table

Once you have selected the table, you can see a preview of the data, and you can choose to transform the data. Do not choose to load it, or you will write all the data into the workbook in a new sheet called financials. (If you do this accidentally, delete the financials sheet.)

Click the Transform Data button to display the Power Query Editor (see Figure 2-119).

FIGURE 2.119 The Power Query Editor

A snapshot has been taken of the financials table in the Financial Sample workbook. The menu options can be used to transform the data. Every transformation creates a new step, and all steps are recorded in the Applied Steps section. Some steps can be performed automatically, which is why there are three steps shown. Every query has a Source step: For this example, the Source step identifies the workbook. The Navigation step selects the table. The Changed Type step detects the data type of each column based on algorithms performed on a sample of the data.

For this example, you will not perform further transformations. Instead, you will load the data into a Power Query query not loaded to the workbook. To do this, select the Close & Load dropdown menu on the Home tab and select Close & Load To (see Figure 2-120).

FIGURE 2.120 On the Close & Load dropdown menu, choose the Close & Load To option to specify where to load the results.

The Import Data dialog appears, with defaults selected (see Figure 2-121). It allows you to control the output so the data can be accessed from the query without being written to a new worksheet.

FIGURE 2.121 The Import Data dialog allows you to specify how the data is stored in the workbook.

In the Import Data dialog, select Only Create Connection. Leave the option Add This Data to the Data Model unchecked because Python in Excel cannot access data in the Data Model. Click OK, and the Queries & Connections pane opens (see Figure 2-122).

FIGURE 2.122 The financials query appears in the Queries & Connections pane as a connection-only query.

Now that you have created a connection to the external data, you can use a Python cell to access the data. Open the sheet called Power Query. In cell F10, enter the following Python code:

df_power_query = xl("financials")

Figure 2-123 shows the results in the Excel Value view.

FIGURE 2.123 The financials query has been extracted to a DataFrame, and the Excel Value view displays the data from the external workbook.

The query is a snapshot of the table in the external workbook. If you hover over the financials query in the Queries & Connections pane, you can then click the Refresh icon to refresh the query (see Figure 2-124). If the calculation mode is automatic, Excel automatically refreshes the Python cell.

FIGURE 2.124 Clicking the icon refreshes the query.

To complete this example, change cell F10 to the Python Object view and look at the data type card (see Figure 2-125).

FIGURE 2.125 The DataFrame for the Power Query query is a 700x16 DataFrame.

You have now seen all the Excel entities that Python in Excel references.

NOTE

Python in Excel cannot reference some parts of Excel. You cannot use a Python cell to reference the following Excel entities:

Formulas: You cannot use Excel formulas in Python code.
Charts: Python code cannot reference charts. If you try to reference a cell with a sparkline, the result is None.
PivotTables and PivotCharts: Using xl() to access a PivotTable or PivotChart name fails with a #PYTHON! error because these names are not recognized by Python in Excel.
Macros and VBA code: You cannot reference macros or VBA code in a Python formula.

Referencing elements of DataFrames

Now, let’s look at how to reference parts of the DataFrames you have created. As you discovered earlier, you can reference other Python DataFrames by using either the variable name associated with a DataFrame name or the cell location. The best practice is to use the variable name to avoid errors with unanchored cell referencing and to make formulas easier to understand.

It is also possible to reference part of a DataFrame. To see how this works, on the sheet Table Data, enter the following Python code in cell V10:

df_table_column = df_table['Category']

This code selects the column Category from the DataFrame df_table. Figure 2-126 shows the Excel Value view for cell V10.

FIGURE 2.126 You can reference a column on an existing DataFrame.

Change cell V10 to the Python Object view and look at the data type card (see Figure 2-127).

FIGURE 2.127 The DataFrame for a single column is a series.

Before you can reference rows, you must identify the row(s) you want to extract. One way to do this is to specify the column values. In cell Y10, enter the following Python code to extract all the rows from 2025:

df_table_row=df_table.loc[(df_table["Year"]==2025)]

In this code, you locate the data where the column Year of df_table has the numeric value 2025. The double equal sign (==) is the Python code required to indicate that the column value should equal the value specified. Figure 2-128 shows the results in the Excel Value view.

FIGURE 2.128 Extracting rows from a DataFrame based on column values

Note that the row number column, also known as the index column, is also shown for the Excel Value view for this DataFrame. The Excel Value view does not usually show this. We will look more closely at the use of this index in a moment, but first let’s expand this example to show rows for 2025 where the category is Components. Change the Python code in cell Y10 to the following:

df_table_row=df_table.loc[(df_table["Year"]==2025)&(df_table["Category"]=="Components")]

Figure 2-129 shows the results.

FIGURE 2.129 Extracting rows from a DataFrame based on multiple column values

Change the view to Python Object view and look at the data type card for cell Y10 (see Figure 2-130). This DataFrame is a reduced version of the DataFrame df_table, and the row numbers shown here are the row numbers from df_table.

FIGURE 2.130 The DataFrame for the selected rows

You can also reference a row in a DataFrame by using the row number column that appears on the data type card. To see this in action, in cell Y20, enter the following Python code:

df_table_row_index = df_table.iloc[0]

This extracts the first row of the DataFrame df_table, as shown in Figure 2-131.

FIGURE 2.131 Using the row index to extract a row returns the results in a series displayed in columns in the Excel Value view.

Change to the Python Object view for cell Y20 and look at the data type card (see Figure 2-132).

FIGURE 2.132 Using the row index to extract a row returns the results as a 5x1 series, not a 1x5 series.

It is important to understand that extracting the row in this way will return a column of data with the headings as row identifiers; it will not return a row of data.

The final example in this section uses the iloc() function to identify a cell. Enter the following Python code in cell AC20:

df_table_cell_index = df_table.iloc[0,2]

This code finds the value in the first row and the third column, as shown in Figure 2-133. (We will consider the iloc() function in more detail when discussing the pandas library in the next chapter.)

FIGURE 2.133 Using the row index and the column index to extract a value

Change cell AC20 to the Python Object view and look at the data type card (see Figure 2-134).

FIGURE 2.134 Using the row index and the column index results in a Python string.

If you expected the result to be a Python string, then you are getting the hang of this!

Solving an Excel entities challenge

In this section, you’ll have a chance to use some of the code from the Excel entity examples along with some new Python code to solve a challenge. In the same workbook you have been using (SP Python in Excel Example.xlsm), open the Challenge sheet (see Figure 2-135).

FIGURE 2.135 The Challenge sheet

If you use Power Query regularly, you know that when you extract data from the current workbook, it is converted into a table. This can be a problem for users who prefer to keep their data as a range. Python comes to the rescue for these users: It can extract the data into a DataFrame without impacting the data format. To see how this works, start by creating a DataFrame for the data range in cell M14:

df_data_range=xl("F10:F33")

Leave cell M14 as a Python object and look at the data type card (see Figure 2-136).

FIGURE 2.136 Start by creating a 24x1 DataFrame for the data range.

You have already given the DataFrame for TblExcelData the name df_table. Now, you will join the DataFrames to see what values are not in both of them. This process is called merging. The type of merge, or join, you need to perform here is called a left anti join (see Figure 2-137).

FIGURE 2.137 A left anti join takes everything in DataFrame A that does not exist in DataFrame B.

You need to take two steps in Python. First, you create an outer join, which is everything in each of the DataFrames, and then you exclude the data in the second DataFrame. To do this, in cell M15, enter the following Python code:

df_outer=pd.merge(df_table,df_data_range,how='outer',left_on='Product',right_on=0,indicator=True)

Let’s break this up. You are using the pandas merge functionality to join DataFrame df_table (the left DataFrame) to DataFrame df_data_range (the right DataFrame). You have defined the join type (how) to be an outer join. You need to specify what columns to use to join the DataFrames. In df_table, the column is called Product, and in the data type card for df_data_range, the column header is the number 0. Finally, because indicator is True, a column is created to provide information about the merge. (You will need this for the next step.) The result, as shown in Figure 2-138, is a DataFrame.

FIGURE 2.138 The outer join produces a 75x7 DataFrame.

Next, you keep the rows where the _merge column has the value left_only. This means the product is in df_table but not in df_data_range. In cell M16, enter the following Python code:

df_anti_join = df_outer[df_outer['_merge']=='left_only'].drop(columns=['_merge'])

Figure 2-139 shows the results.

FIGURE 2.139 The result when selecting the rows where the merge is left_only is a 3x6 DataFrame.

The final column contains the value nan (which sometimes appears as NaN and stands for “not a number”) because the data is missing or undefined. It cannot be displayed in the Excel Value view (see Figure 2-140).

FIGURE 2.140 The NaN values cannot be viewed in the Excel Value view.

To complete this example, you need to remove the final column. To do so, you can use the drop command that was part of the Python code for cell M16. Enter the following Python code in cell M18:

dp_solution= df_anti_join.drop(0,axis='columns')

axis is required to tell Python whether to drop columns or rows. For this example, you could substitute 1 for 'columns' because it means the same thing to Python, but using 'columns' makes the purpose of the code clearer.

You can now view the results of cell M17 by using the Excel Value view (see Figure 2-141).

FIGURE 2.141 The Excel Value view for the solution displays three rows.

The solution is dynamic. To test this, delete the value Chains from cell F10. The Python results update, and the rows for the product Chains are shown in the solution (see Figure 2-142).

FIGURE 2.142 Deleting the product Chains from the list causes the results to update.

This example demonstrates how you can use Python to solve a practical problem. You may have encountered some errors while experimenting with the Python code you entered. In the next section, you’ll learn about possible errors and how to deal with them.

Save to your account