In practical work, we may encounter a scenario where we need to merge data from many Excel files into a single Excel file. Today, let’s see how to achieve this functionality using Python.
-
Generate the Excel files to be merged
To implement this Excel merging functionality, we need a batch of Excel files. Let’s see how we can generate a batch of Excel files using code.
from openpyxl import Workbook
def genExcelFile(dst_file, sheet_name): # Data to be generated in the Excel file
data_list = [['Name','Age','Sex'],
['Regina',34,'female'],
['Bob',35,'male']]
workbook = Workbook()
xlsheet = workbook.active
# Get the default worksheet
xlsheet.title = sheet_name
# Set the worksheet name
for i,row in enumerate(data_list,start=1):
for j,item in enumerate(row,start=1):
xlsheet.cell(row=i, column=j, value=item)
# Save the file
workbook.save(dst_file)
for i in range(10):
genExcelFile(f"./execlfiles/{i+1}.xlsx", f"result{i+1}")
After executing the above code, you can find 10 Excel files in the execlfiles directory. Among them, enumerate is a built-in function in Python that allows you to get both the index and value while iterating over an iterable (such as a list, tuple, string, etc.). The parameter start indicates the starting index value.
-
Iterate through the Excel files
How can we get the file paths of the 10 Excel files we just generated?
from pathlib import Path, PurePath
def getExeclFiles(src_path):
p = Path(src_path)
files = [x for x in p.iterdir() if PurePath(x).match('*.xlsx')]
return files
files = getExeclFiles("./execlfiles")
for file in files:
print(file)
The above code sequentially retrieves all files in the path pointed to by the src_path variable. To avoid too many file types in this directory, an if statement is used for conditional judgment, extracting only files that end with .xlsx. By using the print function, you can see the paths of all generated files.
-
Merge the data from the Excel files
First, let’s modify the function that generates the Excel files to pass the Excel data as a parameter.
def genExcelFile(dst_file, sheet_name, data_list):
workbook = Workbook()
xlsheet = workbook.active # Get the default worksheet
xlsheet.title = sheet_name # Set the worksheet name
for i,row in enumerate(data_list,start=1):
for j,item in enumerate(row,start=1):
xlsheet.cell(row=i, column=j, value=item) # Save the file
workbook.save(dst_file)
Then execute the following code to merge the Excel files:
import openpyxl
content = []
for file in files:
wb = openpyxl.load_workbook(file) # Replace with your file path
ws = wb.active
content = []
for row in ws.iter_rows(values_only=True):
content.append(list(row))
print(content)
genExcelFile("./execlfiles/result.xlsx", "Merged Results", content)
After executing, you can see a result.xlsx file in the excelfiles directory, which contains all the data from the initially generated 10 files.