Fixing Special Characters in Linux Scripts

The two scripts used in this article are listed below and can be copied directly from the page:

train_script.sh

#!/bin/bash

train=(
"    ____          " 
"  _|____|____     "
" |  _________ |   "
"   |  _  _  |     "
"   |_| |_| |_|    "
)

cols=$(tput cols)
train_width=0
for line in "${train[@]}"; do
    (( ${#line} > train_width )) && train_width=${#line}
done

clear
tput civis

trap 'tput cnorm; exit' SIGINT

position=0
direction=1

while true; do
    clear
    position=$((position + direction))

    if (( position >= cols - train_width )); then
        direction=-1
    elif (( position <= 0 )); then
        direction=1
    fi

    for line in "${train[@]}"; do
        printf "%${position}s%s\n" "" "$line"
    done

    sleep 0.1
done

tab.sh

#!/bin/bash
generate_random() {
	local min=$1
	local max=$2
	echo $(( RANDOM % (max - min + 1) + min ))
}
MIN=1
MAX=100
COUNT=1
while getopts "m:M:c:" opt; do
	case $opt in
		m) MIN=$OPTARG ;; 
		M) MAX=$OPTARG ;; 
		c) COUNT=$OPTARG ;; 
		*) echo "USE: $0 [-m min] [-M max] [-c count]"; exit 1 ;;
	esac
done
for ((i=0; i<COUNT; i++)); do
	generate_random $MIN $MAX
done

In day-to-day operations, we often copy Bash, Python, or Expect scripts from public accounts or websites and run them on Linux servers. Sometimes they fail with errors such as <span>command not found</span> or <span>syntax error</span>. The usual cause is that special characters were converted during copy and paste, for example:

  • Inconsistent line endings between Windows and Linux
  • Invisible characters (such as full-width spaces, BOM headers, carriage returns, etc.)
  • HTML entities (such as <span>&amp;nbsp;</span>, which arrives as a non-breaking space instead of a normal space)
  • Formatting errors (for example, tabs turning into multiple spaces)

This article will introduce several methods to help you quickly fix these issues.
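
Before going through the individual tools, a quick scan can show which of these problems a file actually has. The following is only a minimal sketch and not part of the original workflow: the function name <span>scan_script</span> and its output format are made up for illustration, and it assumes a <span>grep</span> that supports <span>-c</span> and <span>-F</span>.

```shell
# scan_script FILE (hypothetical helper): count the lines of FILE that contain
# characters which commonly break pasted scripts.
scan_script() {
    file=$1
    cr=$(printf '\r')              # carriage return, byte 0d
    tab=$(printf '\t')             # tab, byte 09
    nbsp=$(printf '\302\240')      # UTF-8 bytes c2 a0 (U+00A0, non-breaking space)
    bom=$(printf '\357\273\277')   # UTF-8 byte order mark, bytes ef bb bf

    # grep -c prints the number of matching lines, -F treats the pattern as a
    # fixed string, and LC_ALL=C makes the comparison byte-wise.
    printf 'cr=%s tab=%s nbsp=%s bom=%s\n' \
        "$(LC_ALL=C grep -cF "$cr" "$file")" \
        "$(LC_ALL=C grep -cF "$tab" "$file")" \
        "$(LC_ALL=C grep -cF "$nbsp" "$file")" \
        "$(LC_ALL=C grep -cF "$bom" "$file")"
}
```

Calling <span>scan_script train_script.sh</span> prints one line of counts; any non-zero value points to the section below that deals with that character.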

1. Use <span>cat -A</span> to check file format

In the Linux terminal, you can use the <span>cat -A</span> command to check for special characters. The table below lists the special characters that can be viewed:

Special Character | How <span>cat -A</span> Displays It
Tab | <span>^I</span> (caret followed by an uppercase i)
Line feed | <span>$</span> at the end of the line; this is normal and must be preserved during processing
Carriage return | <span>^M</span> (usually found in Windows/DOS files)
Abnormal spaces | <span>M-BM-</span> followed by a space; usually a non-breaking space copied from a web page, or another non-ASCII byte sequence

cat -A train_script.sh

The output for the script with abnormal spaces is as follows:

[root@test ~]# cat -A train_script.sh
#!/bin/bash$
$
train=($
" M-BM-  M-BM- ____ M-BM-  M-BM-  M-BM-  M-BM-  M-BM- "M-BM- $
" M-BM- _|____|____ M-BM-  M-BM-  "$ 
" | M-BM- _________ | M-BM-  "$ 
" M-BM-  | M-BM- _ M-BM- _ M-BM- | M-BM-  M-BM-  "$ 
" M-BM-  |_| |_| |_| M-BM-  M-BM- "$ 
)$
$
cols=$(tput cols)$
train_width=0$
for line in "${train[@]}"; do$
M-BM-  M-BM-  (( ${#line} > train_width )) && train_width=${#line}$
done$
$
clear$
tput civis$
$
trap 'tput cnorm; exit' SIGINT$
$
position=0$
direction=1$
$
while true; do$
M-BM-  M-BM-  clear$
M-BM-  M-BM-  position=$((position + direction))$
$
M-BM-  M-BM-  if (( position >= cols - train_width )); then$
M-BM-  M-BM-  M-BM-  M-BM-  direction=-1$
M-BM-  M-BM-  elif (( position <= 0 )); then$
M-BM-  M-BM-  M-BM-  M-BM-  direction=1$
M-BM-  M-BM-  fi$
$
M-BM-  M-BM-  for line in "${train[@]}"; do$
M-BM-  M-BM-  M-BM-  M-BM-  printf "%${position}s%s\n" "" "$line"$
M-BM-  M-BM-  done$
$
M-BM-  M-BM-  sleep 0.1$
done$

The output for the script with tabs is as follows:

[root@test ~]# cat -A tab.sh 
#!/bin/bash$
generate_random() {$
^Ilocal min=$1$
^Ilocal max=$2$
^Iecho $(( RANDOM % (max - min + 1) + min ))$
} $
MIN=1$
MAX=100$
COUNT=1$
while getopts "m:M:c:" opt; do$
^Icase $opt in$
^I^Im) MIN=$OPTARG ;;$
^I^IM) MAX=$OPTARG ;;$
^I^Ic) COUNT=$OPTARG ;;$
^I^I*) echo "USE: $0 [-m min] [-M max] [-c count]"; exit 1 ;;$
^Iesac$
done$
for ((i=0; i<COUNT; i++)); do$
^Igenerate_random $MIN $MAX$
done$

Command Explanation

  • <span>-A</span> is equivalent to <span>-vET</span>, that is, the following three options combined:
    • <span>-v</span> (show non-printing characters): displays control characters in ^ notation (for example, ^M for a carriage return) and bytes above 127 in M- notation (for example, M-BM- for the bytes c2 a0). Line feeds and tabs are not affected by this option.
    • <span>-E</span> (show line ends): adds a <span>$</span> symbol at the end of each line, marking where the line feed is.
    • <span>-T</span> (show tabs): displays tabs as <span>^I</span> instead of actual tabs.
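
To see the notation in isolation, you can feed a few known bytes to <span>cat -A</span>. This is a small sketch that assumes GNU <span>cat</span> (the <span>-A</span> option is not available in every implementation); octal escapes are used so that any POSIX <span>printf</span> produces the same bytes.

```shell
# Each line sends a known byte sequence through cat -A.
printf 'a\tb\n'       | cat -A   # tab             -> a^Ib$
printf 'a\r\n'        | cat -A   # carriage return -> a^M$
printf 'a\302\240b\n' | cat -A   # bytes c2 a0     -> aM-BM- b$
```

Note the space after <span>M-BM-</span> in the last result: it is part of how the byte a0 is rendered, not a separate character in the input.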

2. Fixing Illegal Characters M-BM-

Use the <span>od</span> command to view the file's bytes in hexadecimal:

od -t x1 -c train_script.sh

Command Explanation:

  • <span>od</span> (octal dump): a tool for viewing the raw bytes of a file. It displays the content in octal by default, and the format can be changed with options.
  • <span>-t</span>: specifies the output format.
  • <span>x1</span>: displays hexadecimal values, one byte at a time.
  • <span>-c</span>: also displays each byte as a character. Characters with a standard escape (such as line feeds and tabs) are displayed as <span>\n</span>, <span>\t</span>, and so on; other non-printable bytes are displayed as three-digit octal values.
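
A short, self-contained way to connect the hexadecimal values with the characters is to dump a string whose bytes you already know. The sketch below is an illustration only; it adds <span>-An</span>, which suppresses the offset column so that only the byte values are printed.

```shell
# One normal space, one non-breaking space (c2 a0), an underscore, a line feed.
# The dump should list the bytes 20 c2 a0 5f 0a.
printf ' \302\240_\n' | od -An -t x1
```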

Example Output:

0000000  23  21  2f  62  69  6e  2f  62  61  73  68  0a  0a  74  72  61
          #   !   /   b   i   n   /   b   a   s   h  \n  \n   t   r   a
0000020  69  6e  3d  28  0a  22  20  c2  a0  20  c2  a0  5f  5f  5f  5f
          i   n   =   (  \n   "     302 240     302 240   _   _   _   _
0000040  20  c2  a0  20  c2  a0  20  c2  a0  20  c2  a0  20  c2  a0  22
            302 240     302 240     302 240     302 240     302 240   "
0000060  c2  a0  0a  22  20  c2  a0  5f  7c  5f  5f  5f  5f  7c  5f  5f
        302 240  \n   "     302 240   _   |   _   _   _   _   |   _   _
0000100  5f  5f  20  c2  a0  20  c2  a0  20  22  0a  22  20  7c  20  c2
          _   _     302 240     302 240       "  \n   "       |     302
0000120  a0  5f  5f  5f  5f  5f  5f  5f  5f  5f  20  7c  20  c2  a0  20
        240   _   _   _   _   _   _   _   _   _       |     302 240    
0000140  22  0a  22  20  c2  a0  20  7c  20  c2  a0  5f  20  c2  a0  5f
          "  \n   "     302 240       |     302 240   _     302 240   _
0000160  20  c2  a0  7c  20  c2  a0  20  c2  a0  20  22  0a  22  20  c2
            302 240   |     302 240     302 240       "  \n   "     302
0000200  a0  20  7c  5f  7c  20  7c  5f  7c  20  7c  5f  7c  20  c2  a0
        240       |   _   |       |   _   |       |   _   |     302 240
0000220  20  c2  a0  22  0a  29  0a  0a  63  6f  6c  73  3d  24  28  74
............ (remainder omitted)

From the output, we can see the byte pair <span>c2 a0</span> (<span>302 240</span> in octal), which is what <span>cat -A</span> displays as <span>M-BM-</span>. Everything before it decodes normally (see the second line of the dump): <span>#!/bin/bash</span>, a line feed <span>\n</span>, a second line feed, <span>train=(</span>, another line feed, and the opening quote. The bytes that follow are the ones that showed up garbled.

The bytes <span>c2 a0</span> (<span>M-BM-</span>) are the UTF-8 encoding of <span>U+00A0</span>, the non-breaking space, so we can use <span>sed</span> to replace them with normal spaces:

sed -i 's/\xC2\xA0/ /g' train_script.sh

Command Explanation

  • <span>-i</span> modifies the file in place (to keep the original file, remove <span>-i</span> and redirect the output to a new file with <span>></span>).
  • <span>\xC2\xA0</span> is the UTF-8 byte sequence of <span>U+00A0</span> written as hexadecimal escapes; you can verify it by running <span>echo -e "aa\xC2\xA0aa"</span> in a Linux terminal.
  • <span>/ /g</span> replaces every occurrence on each line with a normal space.
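
If you would rather keep a copy of the original while still editing in place, the same replacement can be wrapped as follows. This is a sketch under assumptions: the function name <span>fix_nbsp</span> and the <span>.bak</span> suffix are arbitrary choices for this example, and both <span>-i.bak</span> and <span>\xHH</span> escapes rely on GNU <span>sed</span>.

```shell
# fix_nbsp FILE (hypothetical helper): replace every UTF-8 non-breaking space
# (bytes c2 a0) in FILE with a normal space, keeping the original as FILE.bak.
fix_nbsp() {
    # LC_ALL=C makes sed work on raw bytes, so the two-byte sequence is
    # matched the same way regardless of the current locale.
    LC_ALL=C sed -i.bak 's/\xC2\xA0/ /g' "$1"
}
```

After <span>fix_nbsp train_script.sh</span>, the untouched original is available as <span>train_script.sh.bak</span> in case the result needs to be compared or rolled back.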

The script can now run normally.

3. Fixing Illegal Characters <span>^I</span>

As with the previous script, start by dumping the bytes of the file:

od -t x1 -c tab.sh

Example Output:

0000000  23  21  2f  62  69  6e  2f  62  61  73  68  0a  67  65  6e  65
          #   !   /   b   i   n   /   b   a   s   h  \n   g   e   n   e
0000020  72  61  74  65  5f  72  61  6e  64  6f  6d  28  29  20  7b  0a
          r   a   t   e   _   r   a   n   d   o   m   (   )       {  \n
0000040  09  6c  6f  63  61  6c  20  6d  69  6e  3d  24  31  0a  09  6c
         \t   l   o   c   a   l       m   i   n   =   $   1  \n  \t   l
0000060  6f  63  61  6c  20  6d  61  78  3d  24  32  0a  09  65  63  68
          o   c   a   l       m   a   x   =   $   2  \n  \t   e   c   h
0000100  6f  20  24  28  28  20  52  41  4e  44  4f  4d  20  25  20  28
          o       $   (   (       R   A   N   D   O   M       %       (
0000120  6d  61  78  20  2d  20  6d  69  6e  20  2b  20  31  29  20  2b
          m   a   x       -       m   i   n       +       1   )       +
0000140  20  6d  69  6e  20  29  29  0a  7d  0a  4d  49  4e  3d  31  0a
              m   i   n       )   )  \n   }  \n   M   I   N   =   1  \n
0000160  4d  41  58  3d  31  30  30  0a  43  4f  55  4e  54  3d  31  0a
          M   A   X   =   1   0   0  \n   C   O   U   N   T   =   1  \n
......... (remainder omitted)

From the output, we can see that the character shown as <span>^I</span> corresponds to <span>\t</span> (byte 09), which is the tab character, so we can fix it with one of the following commands:

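# A tab is commonly displayed as four spaces, so replacing each tab with four spaces keeps the script's original layout
# Prints the result to stdout; add -i to modify the file in place
sed 's/\x09/    /g' tab.sh
# or
sed -i 's/\t/    /g' tab.sh
# or write the converted copy to a new file
expand -t 4 tab.sh > new_tab.sh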

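Why does the character row show <span>\t</span> for byte 09 instead of a three-digit octal value, as it did for <span>c2 a0</span>? Because 09 is the tab character, which has a standard backslash escape. <span>od -c</span> prints such characters as escapes and falls back to octal only for bytes that have neither a printable form nor an escape.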
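
The two display styles of <span>od -c</span> can be compared side by side on a known input. This is a small sketch assuming GNU <span>od</span>; <span>LC_ALL=C</span> is set so that bytes above 127 are never treated as printable.

```shell
# A tab followed by the two bytes of a non-breaking space and a line feed.
# The tab and the line feed appear as the escapes \t and \n,
# while c2 a0 appears as the octal values 302 240.
printf '\t\302\240\n' | LC_ALL=C od -An -c
```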

Command Explanation

  • <span>expand</span> is a command-line tool that converts the tabs in a file to spaces. It pads each tab with as many spaces as are needed to reach the next tab stop, so aligned text stays aligned.
  • <span>-t</span> sets the distance between tab stops; here there is a stop every four columns, so a tab at the start of a line becomes four spaces.
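
For tabs at the start of a line, as in tab.sh, <span>expand -t 4</span> and the <span>sed</span> replacement give the same result. They differ only when a tab follows other text on the line, which the following sketch illustrates (<span>\t</span> in the <span>sed</span> pattern assumes GNU <span>sed</span>):

```shell
# The same input converted two ways.
printf 'ab\tc\n' | expand -t 4         # pads to the next tab stop -> "ab  c"
printf 'ab\tc\n' | sed 's/\t/    /g'   # always four spaces        -> "ab    c"
```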

The script can now run normally:

bash new_tab.sh -m 1 -M 5000 -c 10

4. Other Methods

Another option is to let <span>cat</span> write the visible notation into a new file and then replace that notation with <span>sed</span>.

Generate a new file

cat -vT train_script.sh > new_train_script.sh

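[!NOTE]

Question to consider: why not use <span>cat -A</span> here? (Hint: <span>-A</span> also turns on <span>-E</span>, which would write a literal <span>$</span> at the end of every line of the new file.)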

The content is as follows:

[root@test ~]# cat new_train_script.sh
#!/bin/bash

train=(
" M-BM-  M-BM- ____ M-BM-  M-BM-  M-BM-  M-BM-  M-BM- "M-BM- 
" M-BM- _|____|____ M-BM-  M-BM-  "
" | M-BM- _________ | M-BM-  "
" M-BM-  | M-BM- _ M-BM- _ M-BM- | M-BM-  M-BM-  "
" M-BM-  |_| |_| |_| M-BM-  M-BM- "
)

cols=$(tput cols)
train_width=0
for line in "${train[@]}"; do
M-BM-  M-BM-  (( ${#line} > train_width )) && train_width=${#line}
done

clear

tput civis

trap 'tput cnorm; exit' SIGINT

position=0
direction=1

while true; do
M-BM-  M-BM-  clear
M-BM-  M-BM-  position=$((position + direction))

M-BM-  M-BM-  if (( position >= cols - train_width )); then
M-BM-  M-BM-  M-BM-  M-BM-  direction=-1
M-BM-  M-BM-  elif (( position <= 0 )); then
M-BM-  M-BM-  M-BM-  M-BM-  direction=1
M-BM-  M-BM-  fi

M-BM-  M-BM-  for line in "${train[@]}"; do
M-BM-  M-BM-  M-BM-  M-BM-  printf "%${position}s%s\n" "" "$line"
M-BM-  M-BM-  done

M-BM-  M-BM-  sleep 0.1
done

Perform replacement

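# cat -v writes the bytes c2 a0 as "M-BM- " (including the trailing space); g replaces every occurrence on a line
sed -i 's/M-BM- / /g' new_train_script.sh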

The script can now run normally.

5. Other

Check for <span>^I</span> (tab) characters in vim:

vim script.sh
:set list

Enter the <span>:set list</span> command to check if the file has <span>^I</span> symbols. If so, you can delete them manually or use:

:%s/\t/    /g
# Here, \t is used because ^I represents the tab character

Content copied from Windows may carry Windows line endings, for example:

[root@test]# cat -A space.sh
#!/bin/bash^M$
^M$
train=(^M$
"    ____          " ^M$
"  _|____|____     "^M$
" |  _________ |   "^M$
"   |  _  _  |     "^M$
"   |_| |_| |_|    "^M$
)^M$
^M$
cols=$(tput cols)^M$
train_width=0^M$
for line in "${train[@]}"; do^M$
    (( ${#line} > train_width )) && train_width=${#line}^M$
done^M$
^M$
clear^M$
tput civis^M$
^M$
trap 'tput cnorm; exit' SIGINT^M$
^M$
position=0^M$
direction=1^M$
^M$
while true; do^M$
    clear^M$
    position=$((position + direction))^M$
^M$
    if (( position >= cols - train_width )); then^M$
        direction=-1^M$
    elif (( position <= 0 )); then^M$
        direction=1^M$
    fi^M$
^M$
    for line in "${train[@]}"; do^M$
        printf "%${position}s%s\n" "" "$line"^M$
    done^M$
^M$
    sleep 0.1^M$
done^M$

The output shows that each line ends with <span>^M$</span> rather than just <span>$</span>.

Common line endings are of two types:

  • ^M$ → Windows line endings (CRLF, i.e., \r\n)
  • $ → Unix line endings (LF, i.e., \n).

You can test with the following commands

printf "Hello\r\nWorld\r\n" > win.txt  # Write CRLF line endings
echo -e "Hello\nWorld" > unix.txt   # Write LF line endings

# View output
cat -A win.txt   
cat -A unix.txt  

[!NOTE]

Question to consider: why does one command use <span>printf</span> and the other <span>echo</span>?

You can also check the line-ending format in other ways:

vi win.txt
:set ff?
# or
[root@test ~]# file win.txt 
win.txt: ASCII text, with CRLF line terminators

Fix Windows line endings

# Let vi reformat
vim script.sh
:set ff=unix
:wq

# Alternatively, you can inspect the bytes as shown earlier and replace them with sed, but the vim method is simpler.

# Two other commands, dos2unix and unix2dos, can also do this; both need to be installed first
# Windows --> Unix:
dos2unix filename
# Unix --> Windows:
unix2dos filename
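
For completeness, here is what the <span>sed</span> route mentioned above might look like on a machine where dos2unix is not installed. This is only a sketch: the function name <span>strip_cr</span> is made up for illustration, and both <span>-i</span> and <span>\r</span> in the pattern assume GNU <span>sed</span>.

```shell
# strip_cr FILE (hypothetical helper): remove the carriage return at the end
# of every line, turning CRLF line endings into LF, in place.
strip_cr() {
    # $ anchors the match to the end of the line, so a carriage return in the
    # middle of a line is left alone.
    sed -i 's/\r$//' "$1"
}
```

After running <span>strip_cr script.sh</span>, <span>cat -A script.sh</span> should show lines ending in <span>$</span> instead of <span>^M$</span>.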

I hope this tutorial on fixing special characters helps you! 🚀🚀
