Kirin Linux | The Ultimate Showdown: find -exec vs xargs for Efficient Shell Batch Processing

Note: This article is an original work by Liu Feng from Anyatech. Please respect intellectual property rights. When sharing, please indicate the source. No unauthorized copying, adaptation, or reproduction is allowed.

Introduction

In Linux system administration and daily development, we often face a common requirement: to find a batch of files that meet certain criteria and then perform the same operation on them, such as deleting, renaming, modifying permissions, or processing content.

The find command is the king of file searching, but how can we efficiently perform operations on the search results?

At this point, the two main characters, find -exec and xargs, take the stage.

This article will delve into the internal mechanisms, performance differences, security considerations, and best use cases for these two methods, along with a real case of cleaning up a large number of Oracle log files, helping you make the wisest choice in Shell batch processing.

Introducing the Main Characters — Understanding find, -exec, and xargs

find Command

Its responsibility is very specific — to recursively search for files and directories in a specified directory structure based on various conditions (name, size, modification time, etc.) and output the result list.

-exec Operation

This is an action built into the find command. It allows you to execute an external command for each result found directly within the find command.

xargs Command

This is a standalone command whose name is usually read as “extended arguments.” Its core function is to read data from standard input (stdin), usually a list of filenames, and use that data as arguments to construct and execute a specified command.

It typically works in conjunction with find through a pipe (|).

find -exec — Direct and Reliable

Built-in Executor

find -exec has two main execution modes, determined by the command-ending delimiters: \; and +.

This subtle difference is key to understanding its performance.

Mode One: Inefficient but Flexible \;

Syntax Example:

find . -name "*.tmp" -exec rm {} \;

How It Works:

a. For each matching file found (e.g., a.tmp), find starts a new rm process to execute rm a.tmp.

b. When the next file is found (e.g., b.tmp), it starts another new rm process to execute rm b.tmp.

c. This continues in a loop…

Interpretation

{} is a placeholder for the pathname of the file that find is currently processing.

\; is the command-ending marker. The backslash keeps the shell from treating the semicolon as its own command separator.

Performance Bottleneck

If there are 10,000 files, find will create and destroy 10,000 rm processes.

The creation and destruction of processes have system overhead, which becomes very slow when dealing with a large number of files.
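The one-process-per-file behavior can be observed directly. The following is a minimal sketch, assuming mktemp -d is available; the scratch directory and the variable names are illustrative only. A tiny sh command stands in for rm so that every launch leaves one line of output that can be counted.

```shell
# Hypothetical scratch directory so nothing real is touched.
demo_dir=$(mktemp -d)
touch "$demo_dir/a.tmp" "$demo_dir/b.tmp" "$demo_dir/c.tmp"

# Each run of the inner sh prints exactly one line, so the line count
# equals the number of processes find launched.
launches=$(find "$demo_dir" -name "*.tmp" \
    -exec sh -c 'echo "launched for: $1"' sh {} \; | wc -l | tr -d ' ')
echo "processes started in one-per-file mode: $launches"

rm -rf "$demo_dir"
```

With three matching files, three separate processes are started; the count grows one-for-one with the number of matches.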

Mode Two: Efficient +

Syntax Example:

find . -name "*.tmp" -exec rm {} +

How It Works:

a. find collects as many filenames as possible.

b. It then passes these filenames to a single rm command, constructing a long command like rm a.tmp b.tmp c.tmp …

c. It only starts as few rm processes as possible.

Interpretation

+ tells find to “pack” the arguments.

Performance Advantage

This greatly reduces the overhead of process creation, achieving performance comparable to xargs.

Security

Since find passes each filename to the command as a separate argument, with no shell or text parsing in between, it correctly handles filenames containing spaces, newlines, and other special characters. The same holds for the \; mode.
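To see both the batching and the safe argument passing at once, here is a small sketch under the same assumptions as before (mktemp -d available, illustrative names). The inner sh simply reports how many arguments it received in a single launch.

```shell
demo_dir=$(mktemp -d)
# One plain name, one with a space, one with a single quote.
touch "$demo_dir/a.tmp" "$demo_dir/b c.tmp" "$demo_dir/d'e.tmp"

# With +, find hands the whole batch to one process; $# is the
# number of filenames that process received.
report=$(find "$demo_dir" -name "*.tmp" \
    -exec sh -c 'echo "one launch, $# arguments"' sh {} +)
echo "$report"

rm -rf "$demo_dir"
```

The awkward names arrive as three intact arguments in a single launch, which is the behavior the section above describes.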

xargs — Powerful and Flexible Pipeline Artist

How xargs Works

xargs receives the output of find through a pipe, which is a classic Unix philosophy combination.

Syntax Example:

find . -name "*.tmp" | xargs rm

How It Works:

a. The find command prints the list of filenames it finds to standard output (stdout) and sends it to xargs through a pipe (|).

b. xargs reads these filenames from standard input (stdin).

c. xargs also “packs” these filenames, constructing a command like rm a.tmp b.tmp c.tmp … and executes it.

Performance Advantage

Similar to find -exec … +, xargs also achieves high performance through batch processing of parameters.
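The packing step can be made visible by putting echo in front of the command, so xargs prints the command lines it would build instead of running them. This is only a sketch; the filenames are made up and fed in with printf rather than find.

```shell
# With no options, xargs appends as many items as fit to one command line.
batched=$(printf '%s\n' a.tmp b.tmp c.tmp d.tmp e.tmp | xargs echo rm)
echo "$batched"

# -n 2 caps each command at two arguments, so three commands are built.
capped=$(printf '%s\n' a.tmp b.tmp c.tmp d.tmp e.tmp | xargs -n 2 echo rm)
echo "$capped"
```

The first call prints a single rm line with all five names; the second prints three shorter lines, which shows that batch size is something xargs lets you control.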

Disadvantages of xargs and Solutions

xargs has a critical flaw in its default behavior: it splits its input on blanks (spaces and tabs) and newlines, and it also gives quotes and backslashes special meaning.

If a filename itself contains spaces (e.g., my report.tmp), xargs will incorrectly parse it as two separate parameters my and report.tmp, leading to command execution failure or unexpected results.

Trap Example: Space Trap

1) Prepare the environment:

mkdir -p /tmp/xargs_test && cd /tmp/xargs_test
touch "a dangerous file.txt"

2) Incorrect use of xargs:

find . -type f -name "*.txt" | xargs rm

3) Catastrophic Result

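The rm command reports errors, and the file is not deleted! Worse, if files named a, dangerous, or file.txt happened to exist in the current directory, rm would delete those instead.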

rm: cannot remove './a': No such file or directory
rm: cannot remove 'dangerous': No such file or directory
rm: cannot remove 'file.txt': No such file or directory

4) Cause Analysis

xargs sees three words separated by spaces: ./a, dangerous, and file.txt, so it incorrectly executes rm ./a dangerous file.txt.
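The splitting can be inspected without risking any files. In this sketch, printf stands in for rm and wraps every argument it receives in brackets; the input line is the path exactly as find would print it.

```shell
# One line of input, as find would print it for the file above.
tokens=$(printf '%s\n' './a dangerous file.txt' | xargs printf '[%s]\n')
echo "$tokens"
# [./a]
# [dangerous]
# [file.txt]
```

One filename went in and three arguments came out, which is why rm complained about three paths that do not exist.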

Solution

The Golden Pair -print0 and -0

This is the best practice for using xargs and the safest way.

Syntax Example:

find . -type f -name "*.txt" -print0 | xargs -0 rm

ls
# (no output: the file has been deleted)

How It Works:

a. find … -print0

find terminates each filename with a null character (NUL, \0) instead of a newline.

Since filenames themselves cannot contain null characters, this is an absolutely safe delimiter.

b. xargs -0

xargs is told that the input stream is null-separated, and it should parse parameters accordingly.

This combination ensures that even if filenames contain spaces, quotes, newlines, or any special characters, they will be processed correctly.
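A quick way to convince yourself is to try the pair on deliberately hostile names. The sketch below assumes a find and xargs that support -print0 and -0 (GNU and BSD versions do) and uses an illustrative scratch directory.

```shell
demo_dir=$(mktemp -d)
# Three awkward names: spaces, a single quote, and an embedded newline.
touch "$demo_dir/a dangerous file.txt" "$demo_dir/it's.txt" "$demo_dir/two
lines.txt"

# Count the arguments the command receives: one per file, however odd the name.
received=$(find "$demo_dir" -type f -name "*.txt" -print0 | xargs -0 sh -c 'echo "$#"' sh)
echo "arguments received: $received"

# The same pipeline with rm removes all three files.
find "$demo_dir" -type f -name "*.txt" -print0 | xargs -0 rm
remaining=$(find "$demo_dir" -type f | wc -l | tr -d ' ')
echo "files remaining: $remaining"

rm -rf "$demo_dir"
```

Three files produce exactly three arguments, and nothing is left behind after the rm run.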

In-Depth Feature Comparison

Method One: find -exec … \; (Individual Execution Mode)

Performance: Extremely Low

This is its biggest drawback.

Starting a new process for each found file incurs unacceptable overhead when the number of files is large.

Security: Extremely High

Since the find command natively handles filenames, it can perfectly and unambiguously process any filenames containing spaces, newlines, quotes, and other special characters.

Syntax: Simple and Intuitive, Integrated with find

Flexibility: Very Flexible

The placeholder {} can be placed anywhere in the executed command.

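For example, to copy files and add a suffix: find . -name "*.txt" -exec cp {} {}.bak \;. Note that GNU find substitutes {} even when it is embedded in a larger argument such as {}.bak, while POSIX only guarantees substitution when {} is an argument by itself.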

Parallel Processing: Not Supported.

Method Two: find -exec … + (Batch Execution Mode)

Performance: High.

It packs a large number of filenames into a single command’s parameter list, starting very few processes, achieving performance comparable to xargs, and is very efficient.

Security: Extremely High.

Like the \; mode, it inherits the advantage of find’s native handling of filenames, safely processing all special filenames.

Syntax: Simple, just replace \; with +.

Flexibility: Limited.

Since {} represents a long list of filenames, it must appear at the end of the executed command, immediately before the +. You cannot place it in the middle like in the \; mode.
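One common way around this restriction is to let a small sh -c wrapper receive the batch and place it wherever the real command needs it. The following is a sketch, not the only approach; DEST is a hypothetical environment variable used here to hand the target directory to the wrapper.

```shell
src_dir=$(mktemp -d)
dest_dir=$(mktemp -d)
touch "$src_dir/one.txt" "$src_dir/two words.txt"

# find appends the batch after the placeholder "sh"; inside the wrapper,
# "$@" expands to that batch, so the destination can come last.
DEST="$dest_dir" find "$src_dir" -name "*.txt" \
    -exec sh -c 'cp "$@" "$DEST"' sh {} +

copied=$(find "$dest_dir" -type f | wc -l | tr -d ' ')
echo "files copied: $copied"

rm -rf "$src_dir" "$dest_dir"
```

The cost is one extra sh process per batch, which is negligible next to the savings from batching.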

Parallel Processing: Not Supported.

Method Three: find … -print0 | xargs -0 … (Pipeline and xargs Mode)

Performance: High.

Similar to -exec +, it achieves high performance through batch processing of parameters.

Security: Conditional Safety.

It is absolutely safe only when using -print0 and -0, this “golden pair.”

If you forget it, xargs falls back to splitting on whitespace and interpreting quotes, so filenames containing such characters are handled incorrectly, which can lead to serious issues.

Syntax: Relatively Complex.

Requires understanding how pipes, -print0, and -0 work together.

Flexibility: High.

Although arguments are appended at the end by default, you can define a placeholder with the -I option to achieve flexibility similar to the \; mode. Keep in mind that -I makes xargs run the command once per input item, so the batching advantage is lost.

For example: find … -print0 | xargs -0 -I % cp % %.bak.
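Here is the -I form applied to a concrete, throwaway directory. It is a sketch with illustrative names; % is just the placeholder string chosen in the example above.

```shell
demo_dir=$(mktemp -d)
touch "$demo_dir/one.txt" "$demo_dir/two words.txt"

# -I % makes xargs run cp once per input item, substituting % each time.
find "$demo_dir" -name "*.txt" -print0 | xargs -0 -I % cp % %.bak

backups=$(find "$demo_dir" -name "*.bak" | wc -l | tr -d ' ')
echo "backups created: $backups"

rm -rf "$demo_dir"
```

Each file gets its own cp process here, so this form suits cases where placement matters more than launch count.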

Parallel Processing: Supported!

This is a unique “killer feature” of xargs.

By using the -P option (e.g., -P 4), an extension found in GNU and BSD xargs, you can easily achieve parallel processing, significantly reducing total time when performing CPU-intensive tasks (like image compression or code compilation) on multi-core CPUs.
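The effect of -P is easy to see with a command that does nothing but wait. In this sketch, four one-second sleep jobs stand in for real work; -n 1 gives each job a single argument so that xargs has four separate commands to schedule.

```shell
start=$(date +%s)
# Four one-second jobs, one argument each (-n 1), up to four at a time (-P 4).
printf '%s\n' 1 1 1 1 | xargs -n 1 -P 4 sleep
end=$(date +%s)
elapsed=$((end - start))
echo "elapsed seconds with -P 4: $elapsed"
```

Run one after another, the jobs would need about four seconds; run in parallel they finish in roughly the time of a single job. Without -n (or another option that limits batch size), a short input may end up in a single batch, leaving -P nothing to parallelize.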

Simulation Experiment and Performance Comparison

To intuitively feel the performance differences, let’s conduct a simple experiment.

We will create 10,000 empty files and then use the three methods to perform a simple echo operation.

We will use the time command to measure the execution time of each method.

Step 1: Prepare the Experiment Environment

# Create a directory for the experiment
mkdir test_dir && cd test_dir

# Create 10,000 empty files (the {1..10000} brace expansion requires bash or zsh)
touch file_{1..10000}.tmp

Step 2: Performance Testing

Test 1: find -exec … \; (Individual Execution)

time find . -name "*.tmp" -exec echo {} \; > /dev/null

Expected Behavior

You will feel that this command executes relatively slowly.

It will start 10,000 echo processes.

Typical Output (Time Part):

real    0m18.652s
user    0m2.540s
sys     0m13.863s

Test 2: find -exec … + (Batch Execution)

time find . -name "*.tmp" -exec echo {} + > /dev/null

Expected Behavior

This command will complete almost instantly. It may start only 1-2 echo processes.

Typical Output:

real    0m0.017s
user    0m0.006s
sys     0m0.010s

Test 3: find … | xargs … (Pipeline Batch Execution)

time find . -name "*.tmp" -print0 | xargs -0 echo > /dev/null

Expected Behavior

This will also complete almost instantly, with performance comparable to -exec … +.

Typical Output:

real    0m0.117s
user    0m0.006s
sys     0m0.010s

Experiment Conclusion

In this experiment, judging by the real times above, the batch processing modes were roughly 160 times (xargs) to 1,100 times (+) faster than the individual processing mode (\;).

Because the per-file cost of the \; mode is roughly constant, the absolute gap grows in proportion to the number of files.

This clearly demonstrates the significant performance advantage of avoiding the creation of new processes for each file.
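The size of the gap can be checked with a line of arithmetic on the real times listed in the three tests. The sketch below only divides those published numbers; it does not rerun the benchmark, and results on another machine will differ.

```shell
# Timings copied from the "real" lines of the three tests, in seconds.
slow=18.652; plus=0.017; xargs_time=0.117

speedup_plus=$(awk -v s="$slow" -v f="$plus" 'BEGIN { printf "%.0f", s / f }')
speedup_xargs=$(awk -v s="$slow" -v f="$xargs_time" 'BEGIN { printf "%.0f", s / f }')
echo "-exec + speedup: about ${speedup_plus}x"
echo "xargs speedup:   about ${speedup_xargs}x"
```

For these particular measurements the ratios come out near 1097 and 159, so both batch modes are faster by two to three orders of magnitude.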

Practical Scenario — Cleaning Up a Large Number of Oracle Log Files

In the daily maintenance of Oracle databases, a large number of trace (.trc) and audit (.aud) files are generated under the diagnostic directory (ADR Home), which can accumulate to hundreds of thousands or even millions over time, taking up a lot of disk space and file inodes. This is a perfect real-world scenario to test our batch processing skills.

Our task is: to safely and efficiently delete all .trc files older than 30 days.

Assuming the Oracle diagnostic directory is located at /u01/app/oracle/diag.

Option One: Extremely Inefficient Method

(Never use this on a production environment for a large number of files)

# Extremely poor performance, only serves as a negative example
find /u01/app/oracle/diag -name "*.trc" -mtime +30 -exec rm {} \;

Analysis

If there are 500,000 files to delete, this will start 500,000 rm processes.

This will not only consume a lot of time (potentially hours) but also put unnecessary load on the system. The rm processes run one after another rather than all at once, but the sustained process creation still competes with the database for CPU time.

Option Two: Efficient, Safe, and Concise Method (Recommended)

# High performance, safe, and simple syntax
find /u01/app/oracle/diag -name "*.trc" -mtime +30 -exec rm -f {} +

Analysis

This is an excellent choice.

It perfectly combines find’s safe filename handling capability with high-performance batch execution.

find will pack a large number of filenames and pass them to a few rm -f processes at once.

Execution speed is fast, resource consumption is low, and the syntax is clear and straightforward.

For most DBAs and system administrators, this is the preferred way to accomplish this task.
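Before running a deletion like this against a real diagnostic directory, it is worth previewing what the predicates select. The sketch below rehearses the idea in a throwaway directory: diag_dir is a hypothetical stand-in for the real path, and backdating files with touch -d and a relative date assumes GNU coreutils.

```shell
# Hypothetical stand-in for the diagnostic directory, so nothing real is touched.
diag_dir=$(mktemp -d)
touch "$diag_dir/recent.trc" "$diag_dir/old_1.trc" "$diag_dir/old_2.trc"
# Backdate two files; "touch -d" with a relative date assumes GNU coreutils.
touch -d "40 days ago" "$diag_dir/old_1.trc" "$diag_dir/old_2.trc"

# Dry run: list what would be deleted and review it first.
candidates=$(find "$diag_dir" -type f -name "*.trc" -mtime +30 -print | wc -l | tr -d ' ')
echo "files that would be deleted: $candidates"

# Real run: same predicates, with the action swapped to batched rm.
find "$diag_dir" -type f -name "*.trc" -mtime +30 -exec rm -f {} +
survivors=$(find "$diag_dir" -type f -name "*.trc" | wc -l | tr -d ' ')
echo "files left: $survivors"

rm -rf "$diag_dir"
```

Only the two backdated files are selected and removed; the recent trace file survives. Adding -type f, as done here, also keeps directories whose names happen to end in .trc out of the match.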

Option Three: Using xargs for Parallel Deletion

(Expert Choice)

# Utilize multi-core CPU for parallel deletion, further improving efficiency
find /u01/app/oracle/diag -name "*.trc" -mtime +30 -print0 | xargs -0 -P 4 rm -f

Analysis

This option has performance comparable to Option Two, but it introduces xargs’ ace feature: Parallel Processing.

-P 4 tells xargs to start up to 4 rm -f processes simultaneously to delete files in parallel.

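On multi-core CPUs and high-speed storage (like SSDs or NVMe), this may shorten cleanup time by better utilizing system resources. Because deletion is dominated by filesystem metadata updates rather than CPU work, the actual gain depends on the filesystem and should be measured rather than assumed.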

When the number of files reaches millions, the advantages of parallel processing become even more apparent.

Final Verdict — When to Use Which?

When simplicity is desired and performance is not critical: find -exec … \;

Scenario

When you need to run a script or command that accepts only one file per invocation. (A shell function cannot be called by -exec directly; it has to be wrapped in a script or an sh -c command.)

Example

find . -type f -exec my_script.sh {} \;

For a safe, simple, and high-performance “best default choice”:

find -exec … +

Scenario

For the vast majority of daily batch file operations (deleting, moving, changing permissions), such as our case of cleaning Oracle logs.

Advantages

It has high performance, natively supports special filenames, and has the simplest syntax.

This is the choice of many modern Linux users.

For extreme performance and advanced features, the “expert mode”:

find … -print0 | xargs -0 …

Scenario

1) When parallel processing is needed to maximize the use of multi-core CPUs (e.g., compressing and converting a large number of images, or parallel deletion of massive files).

2) When the output of find needs to be further processed by other commands before being handed to xargs.

Advantages

Most powerful functionality, especially the -P parallel option, which can significantly shorten total time in CPU or I/O intensive tasks.

Golden Rule

When using xargs, never forget -print0 and -0, unless you are 100% sure your filenames do not contain any special characters.

Conclusion

find -exec and xargs are not mutually exclusive enemies, but rather allies with their own strengths.

-exec is a reliable aide built into find, while xargs is a capable independent specialist adept at collaborative operations.

For daily tasks, find -exec … + provides a perfect balance of performance, safety, and simplicity.

When you need to squeeze out system performance for large-scale parallel processing, find … -print0 | xargs -0 … is your sharpest weapon.

By truly mastering the differences and combinations of these two, you will possess the powerful ability to handle massive file processing in the Shell world.


Author Introduction

Hello everyone, I am Liu Feng, founder of Anyatech & Senior Database Technology Instructor, focusing on PostgreSQL, domestic database operation and migration, database performance optimization, and other areas.

As an officially authorized instructor of the PG China branch and a PostgreSQL ACE certified expert, I have been actively involved in frontline project practice for over 10 years, deeply participating in database performance tuning and migration projects across various industries such as telecommunications, finance, and government.

