Merging Multiple Excel Files Using Python in a Few Steps

In practical work, we may encounter a scenario where we need to merge data from many Excel files into a single Excel file. Today, let’s see how to achieve this functionality using Python.

  • Generate the Excel files to be merged

To implement this Excel merging functionality, we need a batch of Excel files. Let’s see how we can generate a batch of Excel files using code.

from openpyxl import Workbook
def genExcelFile(dst_file, sheet_name): # Data to be generated in the Excel file
    data_list = [['Name','Age','Sex'],
                 ['Regina',34,'female'],
                 ['Bob',35,'male']]
    workbook = Workbook()
    xlsheet = workbook.active
    # Get the default worksheet
    xlsheet.title = sheet_name
    # Set the worksheet name
    for i,row in enumerate(data_list,start=1):
        for j,item in enumerate(row,start=1):
            xlsheet.cell(row=i, column=j, value=item)
    # Save the file
    workbook.save(dst_file)
for i in range(10):
    genExcelFile(f"./execlfiles/{i+1}.xlsx", f"result{i+1}")

After executing the above code, you can find 10 Excel files in the execlfiles directory. Among them, enumerate is a built-in function in Python that allows you to get both the index and value while iterating over an iterable (such as a list, tuple, string, etc.). The parameter start indicates the starting index value.

  • Iterate through the Excel files

How can we get the file paths of the 10 Excel files we just generated?

from pathlib import Path, PurePath
def getExeclFiles(src_path):
    p = Path(src_path)
    files = [x for x in p.iterdir() if PurePath(x).match('*.xlsx')]
    return files
files = getExeclFiles("./execlfiles")
for file in files:
    print(file)

The above code sequentially retrieves all files in the path pointed to by the src_path variable. To avoid too many file types in this directory, an if statement is used for conditional judgment, extracting only files that end with .xlsx. By using the print function, you can see the paths of all generated files.

  • Merge the data from the Excel files

First, let’s modify the function that generates the Excel files to pass the Excel data as a parameter.

def genExcelFile(dst_file, sheet_name, data_list):
    workbook = Workbook()
    xlsheet = workbook.active  # Get the default worksheet
    xlsheet.title = sheet_name  # Set the worksheet name
    for i,row in enumerate(data_list,start=1):
        for j,item in enumerate(row,start=1):
            xlsheet.cell(row=i, column=j, value=item)   # Save the file
    workbook.save(dst_file)

Then execute the following code to merge the Excel files:

import openpyxl
content = []
for file in files:
    wb = openpyxl.load_workbook(file)  # Replace with your file path
    ws = wb.active
    content = []
    for row in ws.iter_rows(values_only=True):
        content.append(list(row))
    print(content)
genExcelFile("./execlfiles/result.xlsx", "Merged Results", content)

After executing, you can see a result.xlsx file in the excelfiles directory, which contains all the data from the initially generated 10 files.

Leave a Comment