Basic Linux for Bioinformatics (Part 2)

tar

PART 01

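Function Description: Pack files into an archive (backup) file, or restore files from one

Syntax: tar -[cxzjv]f <archive> [file...]

Parameters: -f <archive> names the archive file to create or read; it is required, and the archive name must directly follow it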

-c Create a backup file

-x Extract files from the backup file

-z Use gzip/gunzip to compress/decompress files

-j Use bzip2/bunzip2 to compress/decompress files

-v Show the command execution process

Example: tar -cf newfile.tar file1 file2 packages file1 and file2 into newfile.tar

tar -xf newfile.tar extracts files from newfile.tar

tar -czvf newfile.tar.gz file1 file2 packages file1 and file2 and uses gzip to compress the files into newfile.tar.gz

tar -xzvf newfile.tar.gz decompresses newfile.tar.gz and extracts the files inside
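The create and extract steps above can be sketched as one round trip. This is a minimal sketch that assumes GNU tar and gzip are installed; the file names (sample1.fa, sample2.fa, samples.tar.gz) are made up for illustration, and -t (list contents) and -C (extract into a directory) are tar options not covered in the list above.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two small hypothetical FASTA files to archive.
printf '>seq1\nACGT\n' > sample1.fa
printf '>seq2\nTTGA\n' > sample2.fa

# -c create, -z gzip, -f archive name (the name directly follows -f).
tar -czf samples.tar.gz sample1.fa sample2.fa

# -t lists the archive contents without extracting anything.
listing=$(tar -tzf samples.tar.gz)
echo "$listing"

# Extract into a separate directory with -C so the originals are not overwritten.
mkdir restored
tar -xzf samples.tar.gz -C restored
restored_seq=$(cat restored/sample1.fa)
echo "$restored_seq"
```

Listing an archive with -t before extracting is a cheap way to check what it will write and where.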

zip

PART 02

Function Description: Compress and generate files ending with “.zip”

Syntax: zip [-1..9][-r] <newfile.zip> <sourcefile/dir>

Parameters: -r recursively includes all files and subdirectories of a directory

-1..9 compression level; the larger the number, the smaller the output file and the slower the compression (default 6)

Example: zip newfile.zip file1 file2 compresses file1 and file2 into newfile.zip

Decompression method: use unzip to decompress, unzip newfile.zip

gzip

PART 03

Function Description: Compress and generate files ending with “.gz”

Syntax: gzip [-1..9][-r] <file/dirname>

Parameters: -r compresses each file inside the directory individually; it does not pack the directory itself into a single archive

-1..9 compression level; the larger the number, the smaller the output file and the slower the compression (default 6)

Example: gzip file compresses file into file.gz and deletes the source file

Decompression method: use gunzip to decompress, gunzip file.gz
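The behavior described above, where gzip replaces the source file, can be sketched as follows. This is a minimal sketch with a made-up one-read FASTQ file (reads.fastq); gunzip -c, which writes the decompressed data to standard output and keeps the .gz file, is an option not listed above.

```shell
# Work in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A hypothetical FASTQ file holding a single read (4 lines).
printf '@read1\nACGT\n+\nIIII\n' > reads.fastq

# gzip replaces reads.fastq with reads.fastq.gz (-9 = strongest compression).
gzip -9 reads.fastq
after_gzip=$(ls)

# gunzip -c decompresses to standard output and leaves reads.fastq.gz in place,
# so a compressed file can be inspected or piped without unpacking it on disk.
line_count=$(gunzip -c reads.fastq.gz | wc -l)

# gunzip without -c restores reads.fastq and removes reads.fastq.gz.
gunzip reads.fastq.gz
after_gunzip=$(ls)

echo "$after_gzip / $line_count lines / $after_gunzip"
```

Piping from gunzip -c is common with large sequencing files, since it avoids keeping an uncompressed copy on disk.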

bzip2

PART 04

Function Description: Compress and generate files ending with “.bz2”

Syntax: bzip2 <file>

Example: bzip2 file compresses file into file.bz2 and deletes the source file

Decompression method: use bunzip2 to decompress, bunzip2 file.bz2

sort

PART 05

Function Description: Sort the contents of a text file

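Syntax: sort [-ngr] [-k <num>] <filename>

Parameters: -n/-g sorts the file numerically from smallest to largest; -g also understands scientific notation such as 1e-5 (the default is a text sort in the current locale's character order, which is ASCII order when LC_ALL=C)

-k <num> sorts by the key that starts at the num-th column and runs to the end of the line; use -k <num>,<num> to restrict the key to that column alone (the default key is the whole line)

-r reverse sorting

Example: sort file sorts the file by whole lines in text order from smallest to largest and outputs the result.

sort -n -k 3 file sorts the file by the numerical size of the 3rd column from smallest to largest.

sort -k1,1nr -k2,2nr file sorts the file in reverse numerical order, prioritizing the first column, then the second column. A single key such as -k1,2 spans both columns as one string, so with -n only the number at its start would be compared.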
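The difference between text order, numeric order, and multi-column keys can be sketched on a small table. This is a minimal sketch that assumes GNU sort; the file hits.txt and its columns (gene, chromosome number, e-value) are made up for illustration, and LC_ALL=C is set only to make the text order and number parsing predictable.

```shell
# Work in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical table: gene name, chromosome number, e-value.
printf 'geneA 2 1e-5\ngeneB 10 3e-20\ngeneC 2 0.01\n' > hits.txt

# Text sort on column 2: "10" sorts before "2" because '1' < '2' as characters.
text_first=$(LC_ALL=C sort -k2,2 hits.txt | head -n 1)

# Numeric sort on column 2 (-k2,2n), ties broken by column 3 as a general
# number (-k3,3g), which reads 1e-5 as 0.00001.
sorted=$(LC_ALL=C sort -k2,2n -k3,3g hits.txt)

echo "$text_first"
echo "$sorted"
```

Writing each key as -k N,N with its own n, g, or r suffix keeps every column's comparison rule explicit.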

uniq

PART 06

Function Description: Merge adjacent identical lines in a file

Syntax: uniq [-cd] <file> [outfile]

Parameters: -c prefixes each output line with the number of times it was repeated

-d only displays lines that have duplicates, one copy of each

Example: uniq -c file merges adjacent identical lines and counts the number of repetitions for each line, outputting to the screen. Because only adjacent lines are compared, the file is usually sorted first.

uniq -d file outfile writes one copy of each line that is repeated on adjacent lines to outfile
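Because uniq only compares adjacent lines, it is usually combined with sort. The following is a minimal sketch on a made-up list of chromosome labels in which no two identical lines are adjacent.

```shell
# Hypothetical input: one label per line, repeats are never adjacent.
labels=$(printf 'chr1\nchr2\nchr1\nchr3\nchr2\nchr1\n')

# Without sorting, uniq finds no adjacent duplicates and keeps all 6 lines.
unsorted_count=$(printf '%s\n' "$labels" | uniq | wc -l)

# Sorting first groups identical lines, so uniq -c can count them;
# the final sort orders the result by count, largest first.
counts=$(printf '%s\n' "$labels" | sort | uniq -c | sort -k1,1nr)

# uniq -d keeps one copy of each line that occurs more than once.
duplicated=$(printf '%s\n' "$labels" | sort | uniq -d)

echo "$counts"
echo "$duplicated"
```

The sort | uniq -c | sort -k1,1nr pipeline is a common way to get a frequency table of any column after extracting it.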

wc

PART 07

Function Description: Count the number of lines, words, and bytes in a file

Syntax: wc [-cwl] <file>

Parameters: -c counts only the number of bytes

-w counts only the number of words

-l counts only the number of lines

Example: wc file displays the number of lines, words, and bytes in the file, in that order

wc -l file displays the number of lines in the file
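The three counters can be sketched on a small file. This is a minimal sketch with a made-up FASTA file (seqs.fa); reading from standard input with < is used here so that wc prints only the number, without the file name.

```shell
# Create a small hypothetical FASTA file in a throwaway directory.
workdir=$(mktemp -d)
fasta="$workdir/seqs.fa"
printf '>seq1\nACGT\n>seq2\nTTGACC\n' > "$fasta"

# With input redirection, wc has no file name to print, only the count.
lines=$(wc -l < "$fasta")   # 4 lines
words=$(wc -w < "$fasta")   # 4 whitespace-separated words
bytes=$(wc -c < "$fasta")   # 24 bytes, newlines included

echo "lines=$lines words=$words bytes=$bytes"
```

The byte count includes the newline at the end of each line, which is why it is larger than the number of visible characters.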

grep

PART 08

Function Description: Find lines in a file that match a condition

Syntax: grep [-v] <pattern> <file>

Parameters: -v inverts the match, displaying lines in the file that do not match the pattern (the pattern is a regular expression; a plain string matches itself)

Example: grep world file finds lines in the file that contain “world”

grep -v world file finds lines in the file that do not contain “world”
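A typical use in sequence work is separating FASTA header lines from sequence lines. This is a minimal sketch with a made-up file (seqs.fa); -c, which counts matching lines instead of printing them, is a grep option not covered above.

```shell
# Create a small hypothetical FASTA file in a throwaway directory.
workdir=$(mktemp -d)
fasta="$workdir/seqs.fa"
printf '>seq1 human\nACGT\n>seq2 mouse\nTTGACC\n' > "$fasta"

# Header lines start with ">". The pattern is quoted so that the shell
# does not treat ">" as output redirection.
headers=$(grep '^>' "$fasta")

# -v inverts the match, leaving only the sequence lines.
sequences=$(grep -v '^>' "$fasta")

# -c prints the number of matching lines, here the number of records.
n_records=$(grep -c '^>' "$fasta")

echo "$headers"
echo "$sequences"
echo "$n_records"
```

Forgetting the quotes around '^>' is a common mistake: an unquoted > would make the shell overwrite the file named after it.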

awk

PART 09

Function Description: Perform operations on specific columns of specific lines

Syntax: awk [-F <sep>] '(condition){action}' <filename>

Parameters: -F <sep> specifies the column delimiter, which can be any character; the default is whitespace

Example: awk -F ":" '{print $1}' file splits each line by ":" and prints the first column

awk '($1 > 100){print $0}' file outputs the entire line for rows where the first column is greater than 100

awk '($1 > 100){print $1 "\t" $2}' file outputs the first and second columns, separated by a tab ("\t"), for rows where the first column is greater than 100.

awk '($3~/world/){ x+= $1} END{print x}' file sums the first column over rows where the third column matches "world", and outputs the total x after all lines have been processed
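The filter and sum examples can be sketched on a small tab-separated table. This is a minimal sketch; the file counts.tsv and its columns (read count, sample, label) are made up for illustration.

```shell
# Create a hypothetical tab-separated table in a throwaway directory.
workdir=$(mktemp -d)
table="$workdir/counts.tsv"
printf '150\tsampleA\tworld\n80\tsampleB\tworld\n200\tsampleC\tother\n' > "$table"

# Rows whose first column exceeds 100: print columns 1 and 2, tab-separated.
big=$(awk -F '\t' '($1 > 100){print $1 "\t" $2}' "$table")

# Sum column 1 over rows whose third column matches "world" (150 + 80).
total=$(awk -F '\t' '($3 ~ /world/){x += $1} END{print x}' "$table")

echo "$big"
echo "$total"
```

The awk program is wrapped in straight single quotes so that the shell passes $1, $2, and $3 to awk unchanged instead of expanding them itself.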

sed

PART 10

Function Description: Text processing and editing of files

Syntax: sed [-i] 'command' <filename>

Parameters: -i modifies the original file directly (by default the result is output to the screen and the original file remains unchanged)

Example: sed -i 's/test/new_word/' file replaces the first occurrence of the string "test" on each line of the file with "new_word"; append g (s/test/new_word/g) to replace every occurrence

sed -i '/pattern/ s/test/new_word/' file performs the replacement only on lines in the file that match pattern

sed -i '/^$/ d' file deletes empty lines from the file
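Before editing a file in place with -i, the same commands can be previewed without it, since the result then goes to the screen. This is a minimal sketch on a made-up file (notes.txt); -e, which lets several commands be given in one call, is a sed option not covered above.

```shell
# Create a small hypothetical text file in a throwaway directory.
workdir=$(mktemp -d)
notes="$workdir/notes.txt"
printf 'test one test\n\nkeep test\n' > "$notes"

# Without the g flag, only the first "test" on each line is replaced.
first_only=$(sed 's/test/new_word/' "$notes")

# With the g flag, every occurrence on each line is replaced.
all=$(sed 's/test/new_word/g' "$notes")

# Two commands in one call: replace only on lines matching "keep",
# then delete empty lines.
combined=$(sed -e '/keep/ s/test/new_word/' -e '/^$/ d' "$notes")

echo "$combined"
```

Once the preview looks right, adding -i applies the same commands to the file itself.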

md5sum

PART 11

Function Description: Compute MD5 checksums to verify that a file was transferred or stored intact

Syntax: md5sum [-c] <filename>

Parameters: -c reads the checksums recorded in the given file and checks them against the files it lists

Example: md5sum file1 generates the md5 value for file1.

md5sum file1 > newfile generates the md5 value for file1 and redirects it to newfile.

md5sum -c newfile recomputes the md5 value of each file listed in newfile and reports whether it matches the recorded value.
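The generate-then-check workflow can be sketched end to end. This is a minimal sketch that assumes GNU md5sum; the checksum file name (file1.md5) is made up, and the "damaged transfer" is simulated by overwriting the file.

```shell
# Work in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'ACGT\n' > file1

# Record the checksum, as a sender would before a transfer.
md5sum file1 > file1.md5

# md5sum -c exits with status 0 only if every listed file matches.
if md5sum -c file1.md5 > /dev/null 2>&1; then before=intact; else before=corrupted; fi

# Simulate a damaged transfer by changing one base, then check again.
printf 'ACGA\n' > file1
if md5sum -c file1.md5 > /dev/null 2>&1; then after=intact; else after=corrupted; fi

echo "before: $before, after: $after"
```

Testing the exit status rather than parsing the printed message makes the check easy to use inside scripts.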

chmod

PART 12

Function Description: Set file or directory permissions

Syntax: chmod [-R] <mode> <file/dirname>

Parameters: -R sets permissions for the directory and all files within it

Detailed Description: The permissions are readable (r), writable (w), and executable (x), and they are set separately for the owner (u), group members (g), and others (o); a stands for all three. Mode can be in symbolic or numeric form.

Symbolic mode: [ugoa] [+-=] [rwx]

Numeric mode: the owner, group members, and others each get one octal digit, which is the sum of readable (4), writable (2), and executable (1). Written as bits, 111 (7) means readable, writable, and executable, and 000 (0) means no permissions; 5 (101) means readable and executable, but not writable.

Example: chmod u+x,g=rx,o-rwx file adds executable permission for the owner of the file, sets group member permissions to readable and executable, and removes rwx permissions for others.

chmod -R 750 dirname sets permissions for the dirname directory and all files within it to be readable, writable, and executable for the user, readable and executable for group members, and no permissions for others.
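The relation between symbolic and numeric modes can be sketched by starting from one and arriving at the other. This is a minimal sketch; the file name run.sh is made up, and the first ten characters of ls -l output are used to read back the permission string.

```shell
# Create an empty hypothetical script in a throwaway directory.
workdir=$(mktemp -d)
script="$workdir/run.sh"
touch "$script"

# Numeric form: 6 = 4+2 (rw-), 4 = 4 (r--), 0 = (---).
chmod 640 "$script"
before=$(ls -l "$script" | cut -c1-10)

# Symbolic form: add x for the owner, set group to exactly rx,
# remove everything from others. The result equals mode 750.
chmod u+x,g=rx,o-rwx "$script"
after=$(ls -l "$script" | cut -c1-10)

echo "$before -> $after"
```

Note that the clauses of a symbolic mode are separated by commas with no spaces; a space would make chmod treat the next clause as a file name.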

find

PART 13

Function Description: Find files

Syntax: find [path] [expression]

Detailed Description: path searches under the specified path

expression search pattern, commonly used include

-name <filename> search by filename (wildcards allowed)

-perm <mode> search by file permissions

-user <user name> search by file owner

-group <group name> search by file group

-mtime <+n/-n> search by file modification time, -n means modified within n days, +n means modified more than n days ago

-type <l/d/f> search by file type, l: symbolic link, f: regular file, d: directory

Example: find ./ -name file searches for files named “file” in the current directory and its subdirectories

find ./ -name '*a' -type d finds directories whose names end with "a".
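Combining -name and -type can be sketched on a small directory tree. This is a minimal sketch; the directory and file names (data, results/beta, reads.fastq, summary.txt) are made up for illustration.

```shell
# Build a small hypothetical project tree in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p data results/beta
touch data/reads.fastq results/beta/summary.txt

# Directories whose names end with "a". The wildcard is quoted so that
# find, not the shell, expands it.
dirs_ending_a=$(find . -type d -name '*a' | sort)

# Regular files with the .fastq extension, at any depth.
fastq_files=$(find . -type f -name '*.fastq')

echo "$dirs_ending_a"
echo "$fastq_files"
```

Without the quotes, the shell would expand *a against the current directory before find runs, which gives wrong results or an error when several names match.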

du

PART 14

Function Description: Display the size of directories or files

Syntax: du [-ash] [--max-depth=<n>] <file/dirname>

Parameters: -a displays the size of individual files in the directory

-s displays only the total

-h displays in units of “K”, “M”, “G”

--max-depth=<n> displays entries within n levels of directories only

Example: du -sh ./ displays the size of the current directory

du -ah --max-depth=1 dir displays the sizes of the files and subdirectories directly inside dir, without listing the contents of deeper levels
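The effect of --max-depth can be sketched on a nested directory. This is a minimal sketch that assumes GNU du; the directory names (project, raw, run1) are made up, and only the reported paths are shown because the sizes depend on the file system.

```shell
# Build a hypothetical nested directory in a throwaway location.
workdir=$(mktemp -d)
mkdir -p "$workdir/project/raw/run1"
printf 'ACGT\n' > "$workdir/project/raw/run1/reads.txt"

# -s prints a single total line for the whole directory.
total_lines=$(du -sh "$workdir/project" | wc -l)

# --max-depth=1 reports project and its direct subdirectories only;
# raw/run1 is counted in the sizes but not listed on its own line.
# du separates size and path with a tab, so cut -f2 keeps the path.
paths=$(du --max-depth=1 "$workdir/project" | cut -f2)

echo "$paths"
```

Subdirectories are printed before their parent, so the last line is always the total for the directory given on the command line.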
