Comprehensive Python Key Knowledge Summary


This is a summary of key Python knowledge compiled by developer @TwentyOne on SegmentFault. It covers a lot of ground, so it is fairly long; it is the result of the author's long-term accumulation and patching together of notes.



Py2 VS Py3

  • print has become a function; in Python 2 it was a statement (keyword)

  • No more unicode objects, default str is unicode

  • In python3, division returns a float

  • No long type anymore

  • xrange does not exist, range replaces xrange

  • Functions and variable names can be defined in Chinese

  • Advanced unpacking and * unpacking

  • Parameters after * in a function signature are keyword-only and must be passed as name=value

  • raise from

  • iteritems removed and changed to items()

  • yield from links to sub-generators

  • asyncio, async/await natively support asynchronous programming

  • New additions: enum, mock, ipaddress, concurrent.futures, asyncio, urllib, selectors

    • Different enumeration classes cannot be compared

    • Only equality comparison is allowed within the same enumeration class

    • Usage of enumeration class (numbering starts from 1 by default)

    • To avoid duplicate values (aliases) in an enumeration class, apply the @unique decorator to the class (see the sketch after the code block below)

# Notes on enumerations
from enum import Enum

class COLOR(Enum):
    YELLOW = 1
    # YELLOW = 2  # would raise an error: duplicate member name
    GREEN = 1     # no error: GREEN becomes an alias of YELLOW
    BLACK = 3
    RED = 4

print(COLOR.GREEN)  # COLOR.YELLOW: the alias still prints as YELLOW

for i in COLOR:     # iterating over COLOR skips aliases, so GREEN is not included
    print(i)
# COLOR.YELLOW
# COLOR.BLACK
# COLOR.RED

# How to traverse aliases as well:
for i in COLOR.__members__.items():
    print(i)
# ('YELLOW', <COLOR.YELLOW: 1>)
# ('GREEN', <COLOR.YELLOW: 1>)
# ('BLACK', <COLOR.BLACK: 3>)
# ('RED', <COLOR.RED: 4>)

for i in COLOR.__members__:
    print(i)
# YELLOW
# GREEN
# BLACK
# RED

# Enumeration conversion
# It is usually better to store the enum's numeric value in the database rather than the label string,
# and use the enumeration class in code
a = 1
print(COLOR(a))  # output: COLOR.YELLOW
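
The @unique decorator mentioned above rejects duplicate values (aliases) when the class is created; a minimal sketch (the class and member names here are just illustrative):

# @unique forbids aliases in an enumeration
from enum import Enum, unique

@unique
class WEEK(Enum):
    MON = 1
    TUE = 2
    # TUE2 = 2  # uncommenting this line raises ValueError: duplicate values found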

py2/3 Conversion Tools

  • six module: a module compatible with python2 and python3

  • 2to3 tool: changes code syntax version

  • __future__: use features from the next version
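
    For example, under Python 2 a __future__ import opts in to Python 3 behavior ahead of time (a small sketch):

    # In a Python 2 module, enable Python 3 semantics
    from __future__ import print_function, division, unicode_literals
    print(3 / 2)   # 1.5 instead of 1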

Common Libraries

  • Collections that must be known

    https://segmentfault.com/a/1190000017385799

  • Python sorting operations and heapq module

    https://segmentfault.com/a/1190000017383322

  • itertools module super practical methods

    https://segmentfault.com/a/1190000017416590

Less Common but Very Important Libraries

  • dis (code bytecode analysis)

  • inspect (generator state)

  • cProfile (performance analysis)

  • bisect (maintain ordered list)

  • fnmatch

    • fnmatch.fnmatch(filename, "*.txt")  # case-insensitive on Windows

    • fnmatch's case sensitivity depends on the operating system

    • fnmatchcase is always case sensitive

  • timeit (code execution time)

    def isLen(strString):
        # should still use ternary expression, faster
        return True if len(strString)>6 else False

    def isLen1(strString):
        # note the position of false and true here
        return [False,True][len(strString)>6]
    import timeit
    print(timeit.timeit('isLen1("5fsdfsdfsaf")',setup="from __main__ import isLen1"))

    print(timeit.timeit('isLen("5fsdfsdfsaf")',setup="from __main__ import isLen"))
  • contextlib

    • @contextlib.contextmanager makes generator functions into context managers
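
    A minimal sketch of such a generator-based context manager (the file name is just an illustration):

    import contextlib

    @contextlib.contextmanager
    def open_file(name):
        f = open(name, "w")   # runs on __enter__
        try:
            yield f           # the yielded value is what "as" binds to
        finally:
            f.close()         # runs on __exit__

    # with open_file("demo.txt") as f:
    #     f.write("hello")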

  • types (contains all types of type objects defined by the standard interpreter, can decorate generator functions as asynchronous)

    import types
    types.coroutine # equivalent to implementing __await__
  • html (implement HTML escaping)

    import html
    html.escape("<h1>I'm Jim</h1>")   # output: '&lt;h1&gt;I&#x27;m Jim&lt;/h1&gt;'
    html.unescape('&lt;h1&gt;I&#x27;m Jim&lt;/h1&gt;')   # output: "<h1>I'm Jim</h1>"
  • mock (solving test dependencies)

  • concurrent (creating process pools and thread pools)

from concurrent.futures import ThreadPoolExecutor, as_completed, wait

pool = ThreadPoolExecutor()
task = pool.submit(func, *args)  # submit does not block and immediately returns a Future
task.done()        # check whether the task has completed
task.result()      # blocking call, returns the task's return value
task.cancel()      # cancel a task that has not started yet; returns True if cancelled successfully
task.add_done_callback(fn)  # register a callback that is called with the finished Future
task.running()     # whether the task is currently executing; task is a Future object

for data in pool.map(func, parameter_list):  # yields results in the order the parameters were submitted
    print(data)    # result of a completed task

as_completed(task_list)  # yields futures as soon as they complete, regardless of submission order

wait(task_list, return_when=condition)  # block the calling thread until the condition (e.g. FIRST_COMPLETED, FIRST_EXCEPTION, ALL_COMPLETED) is met
  • selectors (wraps select/epoll; used for IO-multiplexing programming)

  • asyncio

future = asyncio.ensure_future(coroutine)   # equivalent to: future = loop.create_task(coroutine)
future.add_done_callback(fn)                # add a callback that runs when the future completes
loop.run_until_complete(future)
future.result()                             # read the return value

asyncio.wait() accepts an iterable of coroutine objects.
asyncio.gather(*iterable, *iterable) produces the same results, but gather can be cancelled as a group via gather_object.cancel().

A thread has only one event loop.

After loop.stop() you must call loop.run_forever() again, otherwise an error will occur.
loop.run_forever() can also drive non-coroutine callbacks.
Finally, call loop.close() (typically in a finally block).

asyncio.Task.all_tasks() returns all tasks; iterate over them and call task.cancel() to cancel.

Partial functions: functools.partial(func, args) wraps a function under another name; the pre-bound arguments are placed before the function's remaining parameters. This is handy for passing callbacks with arguments to call_soon/call_later.

loop.call_soon(callback, args)
loop.call_soon_threadsafe()   # thread-safe variant
loop.call_later(delay, callback, args)
Within the same block of code, call_soon callbacks run first, then call_later callbacks run in ascending order of their delay.

If blocking code must be run:
use loop.run_in_executor(executor, func, args) to wrap it in a thread pool, put the returned futures into a task list, and run them with wait(task_list).

Implementing an HTTP request with asyncio:
reader, writer = await asyncio.open_connection(host, port)
writer.write(request_bytes)   # send the request
async for line in reader:
    data = line.decode("utf-8")
    result.append(data)
# result now holds the HTML

as_completed(tasks) yields each task as soon as it completes and returns an iterable object.

Coroutine lock:
async with Lock():
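
A minimal runnable sketch tying the pieces above together (coroutine, tasks, gather):

import asyncio

async def fetch(n):
    await asyncio.sleep(0.1)   # stands in for real IO
    return n * n

async def main():
    tasks = [asyncio.ensure_future(fetch(i)) for i in range(3)]
    results = await asyncio.gather(*tasks)   # gather can be cancelled as a whole
    print(results)   # [0, 1, 4]

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
loop.close()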


Python Advanced

  • Inter-process communication:

    • Manager (built-in many data structures, can achieve memory sharing between multiple processes)

from multiprocessing import Manager,Process
def add_data(p_dict, key, value):
    p_dict[key] = value

if __name__ == "__main__":
    progress_dict = Manager().dict()

    first_progress = Process(target=add_data, args=(progress_dict, "bobby1", 22))
    second_progress = Process(target=add_data, args=(progress_dict, "bobby2", 23))

    first_progress.start()
    second_progress.start()
    first_progress.join()
    second_progress.join()

    print(progress_dict)
    • Pipe (suitable for two processes)

from multiprocessing import Pipe,Process
#pipe performance is higher than queue
def producer(pipe):
    pipe.send("bobby")

def consumer(pipe):
    print(pipe.recv())

if __name__ == "__main__":
    receive_pipe, send_pipe = Pipe()
    # a Pipe can only be used between two processes
    my_producer = Process(target=producer, args=(send_pipe,))
    my_consumer = Process(target=consumer, args=(receive_pipe,))

    my_producer.start()
    my_consumer.start()
    my_producer.join()
    my_consumer.join()
    • Queue (cannot be used in process pools, inter-process communication requires Manager().Queue())

import time
from multiprocessing import Queue, Process
def producer(queue):
    queue.put("a")
    time.sleep(2)

def consumer(queue):
    time.sleep(2)
    data = queue.get()
    print(data)

if __name__ == "__main__":
    queue = Queue(10)
    my_producer = Process(target=producer, args=(queue,))
    my_consumer = Process(target=consumer, args=(queue,))
    my_producer.start()
    my_consumer.start()
    my_producer.join()
    my_consumer.join()
    • Process pool

import time
from multiprocessing import Manager, Pool

def producer(queue):
    queue.put("a")
    time.sleep(2)

def consumer(queue):
    time.sleep(2)
    data = queue.get()
    print(data)

if __name__ == "__main__":
    queue = Manager().Queue(10)
    pool = Pool(2)

    pool.apply_async(producer, args=(queue,))
    pool.apply_async(consumer, args=(queue,))

    pool.close()
    pool.join()
  • Common methods of sys module

    • argv command line parameter list, the first is the path of the program itself

    • path returns the module search path

    • modules.keys() returns the list of all imported modules

    • exit(0) exit the program

  • a in s or b in s or c in s shorthand

    • prefer any(); note that all() returns True for an empty iterable

    # Method one
    True in [i in s for i in [a,b,c]]
    # Method two
    any(i in s for i in [a,b,c])
    # Method three
    list(filter(lambda x:x in s,[a,b,c]))
  • set collection application

    • {1,2}.issubset({1,2,3})# check if it is a subset

    • {1,2,3}.issuperset({1,2})

    • {}.isdisjoint({})# check if the intersection of two sets is empty, if it is empty then True

  • Chinese matching in code

    • [\u4E00-\u9FA5] matches the basic range of Chinese characters (一 to 龥)

  • Check the system’s default encoding format

    import sys
    sys.getdefaultencoding()    # sys.setdefaultencoding() sets the default encoding (Python 2 only)
  • getattr VS getattribute

class A(dict):
    def __getattr__(self, value):       # called only when normal attribute lookup fails
        return 2
    def __getattribute__(self, item):   # intercepts every attribute access
        return item
  • Class variables are not stored in instance __dict__, only exist in class’s __dict__

  • globals/locals (can indirectly operate code)

    • globals contains all variable properties and values in the current module

    • locals contains all variable properties and values in the current environment

  • Python variable name resolution mechanism (LEGB)

    • Local scope (Local)

    • Enclosing scope of any enclosing functions (Enclosing)

    • Global/module scope (Global)

    • Built-in scope (Built-in)

  • Implement grouping from 1-100 every three

    print([[x for x in range(1,101)][i:i+3] for i in range(0,100,3)])
  • What is a metaclass?

    • It is the class that creates classes. To use one, pass metaclass=YourMeta when defining a class; a metaclass must inherit from type rather than object, because type is the metaclass of all classes

type.__bases__  #(<class 'object'>,)
object.__bases__    #()
type(object)    #<class 'type'>
    class Yuan(type):
        def __new__(cls,name,base,attr,*args,**kwargs):
            return type(name,base,attr,*args,**kwargs)
    class MyClass(metaclass=Yuan):
        pass
  • What is duck typing (i.e.: polymorphism)?

    • Python does not check parameter types when a function is called; as long as the object supports the operations the function performs, it can be used (if it walks like a duck and quacks like a duck, it is a duck)

  • Deep copy and shallow copy

    • A deep copy duplicates nested objects as well; a shallow copy creates a new outer object but shares references to the inner objects (only their reference counts increase)

    • copy module implements deep copy
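
    A small sketch showing the difference:

    import copy
    a = [1, [2, 3]]
    b = copy.copy(a)        # shallow copy: the inner list is shared
    c = copy.deepcopy(a)    # deep copy: the inner list is duplicated
    a[1].append(4)
    print(b[1])             # [2, 3, 4]  (follows a)
    print(c[1])             # [2, 3]     (independent)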

  • Unit testing

    • Generally, the test class inherits from the TestCase in the unittest module

    • pytest module quick testing (methods starting with test_/test files starting with test_/test classes starting with Test and cannot have init method)

    • coverage statistics testing coverage


    import unittest

    class MyTest(unittest.TestCase):
        def setUp(self):  # runs before every test case
            print('This test method is about to run')

        def tearDown(self):  # runs after every test case
            print('This test method has finished')

        @classmethod
        def setUpClass(cls):  # must be a classmethod; runs once before all tests
            print('Start testing')

        @classmethod
        def tearDownClass(cls):  # must be a classmethod; runs once after all tests have run
            print('End testing')

        def test_a_run(self):
            self.assertEqual(1, 1)  # test case
  • The GIL is released periodically (based on a count of executed bytecode instructions or a time slice, depending on the CPython version), and it is released proactively when a thread performs IO

  • What is monkey patch?

    • Replacing attributes or functions at runtime, e.g. swapping blocking calls for non-blocking ones (see the gevent monkey.patch_all example further below)

  • What is introspection?

    • The ability to determine the type of an object at runtime, id, type, isinstance

  • Is Python pass-by-value or pass-by-reference?

    • Neither; Python passes object references ("call by sharing"). Also note that default parameter values are evaluated only once, at function definition time

  • Difference between else and finally in try-except-else-finally

    • else executes when no exception occurs, finally executes regardless of whether an exception occurs

    • except can catch multiple exceptions at once, but generally for different exceptions, we capture them separately

  • GIL global interpreter lock

    • Only one thread can execute bytecode at a time; this is a characteristic of the CPython interpreter, and other interpreters (e.g. Jython, IronPython) do not necessarily have it

    • CPU-intensive: multi-process + process pool

    • IO-intensive: multi-threading/coroutines

  • What is Cython

    • A superset of Python and a compiler that translates Python-like code into C extensions

  • Generators and iterators

    • Iterable objects only need to implement the __iter__ method

      • Objects that implement __next__ and __iter__ methods are iterators

    • Using generator expressions or generator functions with yield (a generator is a special kind of iterator)

  • What is a coroutine

    • yield

    • async-await

      • A more lightweight way of multitasking than threads

      • Implementation method

  • Dict underlying structure

    • In order to support fast lookup, a hash table is used as the underlying structure

    • The average time complexity for hash table lookup is O(1)

    • The CPython interpreter resolves hash collisions with open addressing (probing)

  • Hash expansion and hash collision resolution solutions

    • Chaining (separate chaining)

    • Open addressing (probing): the method Python uses

      • Expansion is done by allocating a larger table and copying the entries over to the new space

      • Collision resolution: probe other slots until a free one is found

    # gevent monkey patch (example for the "monkey patch" question above)
    from gevent import monkey
    monkey.patch_all()  # replaces blocking standard-library calls with non-blocking (cooperative) versions; specific modules can also be patched individually
  • Check if it is a generator or coroutine

    co_flags = func.__code__.co_flags

    # Check if it is a coroutine (CO_COROUTINE | CO_ITERABLE_COROUTINE)
    if co_flags & 0x180:
        return func

    # Check if it is a generator (CO_GENERATOR)
    if co_flags & 0x20:
        return func
  • Fibonacci problems and variations

# A frog can jump up 1 step or 2 steps at a time. How many ways can the frog jump up n steps?
# How many ways can n 2*1 small rectangles cover a 2*n big rectangle without overlapping?
# Method one:
fib = lambda n: n if n <= 2 else fib(n - 1) + fib(n - 2)
# Method two:
def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return b

# A frog can jump up 1 step or 2 steps... it can also jump up n steps. How many ways can the frog jump up n steps?
fib = lambda n: n if n < 2 else 2 * fib(n - 1)
  • Get the environment variables set on the computer

    import os
    os.getenv(env_name,None)# get the environment variable if it does not exist return None
  • Garbage collection mechanism

    • Reference counting

    • Mark and sweep

    • Generational collection

    # Check the threshold for generational collection
    import gc
    gc.get_threshold()  #output:(700, 10, 10)
  • True and False are equal to 1 and 0 and can be used directly in arithmetic; float('inf') represents infinity

  • C10M/C10K

    • C10M: 8-core CPU, 64G memory, maintaining 10 million concurrent connections on a 10gbps network

    • C10K: 1GHz CPU, 2G memory, maintaining 10 thousand clients providing FTP service under 1gbps network environment

  • Difference between yield from and yield:

    • yield from follows an iterable object, while yield has no restrictions

    • GeneratorExit is raised inside a generator when it is closed (e.g. when close() is called or it is garbage-collected)
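
    A tiny sketch of yield from delegating to a sub-generator:

    def inner():
        yield 1
        yield 2
        return "done"   # becomes the value of the yield from expression

    def outer():
        result = yield from inner()   # also transparently forwards send()/throw()
        print(result)                 # done

    print(list(outer()))   # prints done, then [1, 2]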

  • Several uses of single underscore

    • When defining a variable, it indicates a private variable

    • When unpacking, it indicates discarding useless data

    • In interactive mode, it represents the result of the last code execution

    • Can be used as a digit separator in numeric literals (111_222_333)

  • Using break will not execute else

  • Decimal to binary conversion

def convert_bin(num):
    if num == 0:
        return "0"
    digits = []
    while num:
        num, rem = divmod(num, 2)
        digits.append(str(rem))
    return "".join(reversed(digits))

convert_bin(10)   # '1010'
  • list1 = ['A', 'B', 'C', 'D']: how can I create new lists named after the elements, i.e. A=[], B=[], C=[], D=[]?

    list1 = ['A', 'B', 'C', 'D']

    # Method one
    for i in list1:
        globals()[i] = []   # can be used to implement python version reflection

    # Method two
    for i in list1:
        exec(f'{i} = []')   # exec executes string statements
  • memoryview and bytearray (not commonly used, just recorded)

    # bytearray is mutable, bytes is immutable; memoryview slices do not create new objects
    a = b'aaaaaa'
    ma = memoryview(a)
    ma.readonly      # True: read-only memoryview over bytes
    mb = ma[:2]      # slicing does not copy the underlying data

    a = bytearray(b'aaaaaa')
    ma = memoryview(a)
    ma.readonly      # False: writable memoryview over a bytearray
    mb = ma[:2]      # still no copy of the underlying data
    mb[:2] = b'bb'   # modifying mb also modifies ma (and a)
  • Ellipsis type

# The literal ... in code is the Ellipsis object (... is Ellipsis evaluates to True)
# A related curiosity: a list that contains itself is printed with [...] as a recursion marker in its repr
L = [1, 2, 3]
L.append(L)
print(L)    # output: [1, 2, 3, [...]]
  • Lazy evaluation

    class lazy(object):
        def __init__(self, func):
            self.func = func

        def __get__(self, instance, cls):
            val = self.func(instance)    # equivalent to executing area(c), c is the Circle object below
            setattr(instance, self.func.__name__, val)
            return val

    class Circle(object):
        def __init__(self, radius):
            self.radius = radius

        @lazy
        def area(self):
            print('evaluate')
            return 3.14 * self.radius ** 2
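
    Usage of the lazy descriptor above: the first access computes the value and caches it on the instance, so 'evaluate' is printed only once:

    c = Circle(4)
    print(c.area)   # prints 'evaluate', then 50.24
    print(c.area)   # now read from the instance __dict__, no 'evaluate'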
  • Traverse files, pass in a folder, print out all file paths inside (recursively)

import os

all_files = []
def getAllFiles(directory_path):
    for sChild in os.listdir(directory_path):
        sChildPath = os.path.join(directory_path, sChild)
        if os.path.isdir(sChildPath):
            getAllFiles(sChildPath)
        else:
            all_files.append(sChildPath)
    return all_files
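
An equivalent sketch using os.walk, which handles the recursion for you:

import os

def get_all_files(directory_path):
    result = []
    for root, dirs, files in os.walk(directory_path):
        for name in files:
            result.append(os.path.join(root, name))
    return result
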
  • When storing files, handling file names

# secure_filename converts a string into a safe file name
from werkzeug.utils import secure_filename
secure_filename("My cool movie.mov")                   # output: 'My_cool_movie.mov'
secure_filename("../../../etc/passwd")                 # output: 'etc_passwd'
secure_filename(u'i contain cool \xfcml\xe4uts.txt')   # output: 'i_contain_cool_umlauts.txt'
  • Date formatting

from datetime import datetime

datetime.now().strftime("%Y-%m-%d")

import time
# strftime needs a struct_time (e.g. from time.localtime()); a raw timestamp from time.time() cannot be formatted directly
time.strftime("%Y-%m-%d",time.localtime())
  • A strange corner case of += on tuples

# t[1] += [4, 5] raises TypeError (tuples do not support item assignment), yet the list inside
# the tuple is modified anyway: the in-place extend succeeds first, and only the subsequent
# assignment back into the tuple fails (the id of t[1] never changes)

t = (1, [2, 3])
t[1] += [4, 5]
# Using t[1].append(...) or t[1].extend(...) instead raises no error and works as expected
  • __missing__ you should know

class Mydict(dict):
    def __missing__(self, key):  # called when d[key] is used and the key is not present
        return key
  • + and +=

# + cannot concatenate a list with a tuple, while += can (it calls __iadd__, which works like extend() and accepts any iterable); + always creates a new object
# Immutable objects have no __iadd__, so += falls back to __add__; that is why tuple += tuple works but produces a new tuple
  • How to turn each element of an iterable into all keys of a dictionary?

dict.fromkeys(['jim','han'],21) # output:{'jim': 21, 'han': 21}
  • Wireshark packet capture software

Network Knowledge

  • What is HTTPS?

    • HTTP over TLS/SSL; the server needs a CA-issued certificate, traffic is encrypted, the default port is 443; for the same site, HTTPS tends to rank higher in SEO

  • Common response status codes

    204 No Content // request successfully processed, no entity body returned, generally used to indicate successful deletion
    206 Partial Content // Get range request has been successfully processed
    303 See Other // redirect; the resource should be fetched from the new URI with GET
    304 Not Modified // the client's cached copy is still valid (response to a conditional request)
    307 Temporary Redirect // temporary redirect, Post will not change to Get
    401 Unauthorized // authentication failed
    403 Forbidden // resource request denied
    400 // request parameter error
    201 // add or change successfully
    503 // server maintenance or overload
  • Idempotence and safety of HTTP request methods (GET/HEAD are safe and idempotent; PUT/DELETE are idempotent but not safe; POST is neither)

  • WSGI

    # environ: a dict object containing all HTTP request information
    # start_response: a function that sends HTTP responses
    def application(environ, start_response):
        start_response('200 OK', [('Content-Type', 'text/html')])
        return [b'<h1>Hello, web!</h1>']   # a WSGI app must return an iterable of bytes
  • RPC

  • CDN

  • SSL (Secure Sockets Layer) and its successor TLS (Transport Layer Security) are security protocols that provide confidentiality and data integrity for network communication.

  • SSH (Secure Shell) is a security protocol defined by the IETF's Network Working Group. It is built on the application layer and is currently one of the more reliable protocols for remote login sessions and other network services; using SSH effectively prevents information leakage during remote administration. SSH was originally a program on UNIX systems and quickly spread to other platforms. SSH clients are available for virtually every platform; almost all UNIX variants, including HP-UX, Linux, AIX, Solaris, Digital UNIX, and Irix, can run SSH.


  • TCP/IP

    • TCP: connection-oriented/reliable/byte stream based

    • UDP: connectionless/unreliable/message-oriented

    • Three-way handshake and four-way handshake

      • Three-way handshake (SYN/SYN+ACK/ACK)

      • Four-way handshake (FIN/ACK/FIN/ACK)

    • Why is it three-way handshake when connecting, but four-way handshake when closing?

      • Because when the Server side receives the Client side’s SYN connection request packet, it can directly send the SYN+ACK packet. The ACK packet is used for acknowledgment, and the SYN packet is used for synchronization. However, when closing the connection, when the Server side receives the FIN packet, it may not immediately close the SOCKET, so it can only reply with an ACK packet to tell the Client side, “I received the FIN packet you sent.” Only after the Server side has sent all its packets can it send the FIN packet, so they cannot be sent together. Hence, four steps are needed to close.

    • Why does the TIME_WAIT state need to go through 2MSL (maximum segment lifetime) to return to the CLOSE state?

      • Although theoretically, after all four packets are sent, we can directly enter the CLOSE state, but we must assume that the network is unreliable, and the last ACK may be lost. Therefore, the TIME_WAIT state is used to resend the possibly lost ACK packet.

  • XSS/CSRF

    • HttpOnly prevents JS scripts from accessing and manipulating Cookies, effectively preventing XSS

MySQL

  • Index improvement process

    • Linear structure -> binary search -> hash -> binary search tree -> balanced binary tree -> multi-way search tree -> multi-way balanced search tree (B-Tree)

  • MySQL interview summary basic version

    https://segmentfault.com/a/1190000018371218

  • MySQL interview summary advanced version

    https://segmentfault.com/a/1190000018380324

  • Deep dive into MySQL

    http://ningning.today/2017/02/13/database/深入浅出mysql/

  • When emptying the entire table, InnoDB deletes row by row, while MyISAM deletes and rebuilds the table

  • text/blob data types cannot have default values, and there is no case conversion during queries

  • When does an index become invalid?

    • like fuzzy queries starting with %

    • Implicit type conversion occurs

    • The leftmost prefix principle is not satisfied

      • For multi-column indexes, if the first part is not used, the index will not be used

    • Invalid scenarios:

      • Try to avoid using != or <> operators in the where clause, otherwise the engine will abandon the use of the index and perform a full table scan

      • Try to avoid using or to connect conditions in the where clause, otherwise it will cause the engine to abandon the use of the index and perform a full table scan, even if some conditions have indexes, this is also why it is advisable to use or less

      • If the column type is a string, it must be enclosed in quotes in the condition, otherwise the index will not be used

      • Try to avoid performing function operations on fields in the where clause, as this will cause the engine to abandon the use of the index and perform a full table scan

For example:
select id from t where substring(name,1,3) = 'abc'   -- ids whose name starts with 'abc'
should be changed to:
select id from t where name like 'abc%'
For example:
select id from t where datediff(day, createdate, '2005-11-30') = 0   -- rows created on '2005-11-30'
should be changed to:
select id from t where createdate >= '2005-11-30' and createdate < '2005-12-01'
      • Do not perform functions, arithmetic operations, or other expression operations on the left side of “=” in the where clause, otherwise the system may not be able to correctly use the index

      • Try to avoid performing expression operations on fields in the where clause, as this will cause the engine to abandon the use of the index and perform a full table scan

For example:
select id from t where num/2 = 100 
Should change to:
select id from t where num = 100*2;
      • Indexes are not suitable for columns with few distinct values (lots of duplicate data), such as enum or set columns (an enum column can store NULL and has a default value; a set column is similar to enum but can hold at most 64 members)

      • If MySQL estimates that a full table scan will be faster than using an index, then do not use the index

  • What is a clustered index

    • In a B+Tree, the leaf nodes save data or pointers

    • MyISAM indexes and data are separated, using non-clustered

    • InnoDB data files are index files, primary key indexes are clustered indexes

Redis Command Summary

  • Why is it so fast?

    • Based on memory, written in C language

    • Uses multiplexed I/O model, non-blocking IO

    • Uses single thread to reduce thread switching

      • Since Redis is based on memory operations, the CPU is not the bottleneck for Redis, the bottleneck is most likely the size of the machine’s memory or network bandwidth. Since single-threading is easy to implement, and the CPU will not become a bottleneck, it is reasonable to adopt a single-threaded solution (after all, using multi-threading would cause many troubles!).

    • Simple data structure

    • Built its own VM mechanism, reducing the time for calling system functions

  • Advantages

    • High performance – Redis can read at a speed of 110,000 times/s, and write at a speed of 81,000 times/s

    • Rich data types

    • Atomic – All operations of Redis are atomic, and Redis also supports the atomic execution of several operations combined

    • Rich features – Redis also supports publish/subscribe, notifications, key expiration, and other features

  • What is a Redis transaction?

    • A mechanism that packages multiple requests and executes multiple commands once and in order

    • Transaction functions are realized through commands such as multi, exec, watch

    • Python redis-py pipeline=conn.pipeline(transaction=True)
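
    A small redis-py sketch of a transaction via a pipeline (connection settings and key names are placeholders):

    import redis

    conn = redis.Redis(host="localhost", port=6379)   # placeholder connection settings
    pipe = conn.pipeline(transaction=True)            # wraps the queued commands in MULTI/EXEC
    pipe.set("counter", 0)
    pipe.incr("counter")
    results = pipe.execute()                          # all queued commands run as one transaction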

  • Persistence methods

    • RDB (snapshot)

      • save (synchronous, can guarantee data consistency)

      • bgsave (asynchronous, when shutdown, if no AOF, it defaults to use)

    • AOF (append log)

  • How to implement a queue

    • lpush (enqueue at one end)

    • rpop (dequeue at the other end; brpop for a blocking pop)

  • Common data types (Bitmaps, Hyperloglogs, range queries, etc. not commonly used)

    • String: counter

      • integer or sds (Simple Dynamic String)

    • List: user’s followers, fans list

      • ziplist (continuous memory block, each entry node head saves the length information of the front and back nodes to achieve bidirectional linked list function) or double linked list

    • Hash:

    • Set: user’s followers

      • intset or hashtable

    • Zset (sorted set): real-time information leaderboard

      • skiplist (skip list)

  • Difference from Memcached

    • Memcached can only store string keys

    • To treat a value as a list, Memcached users can only append data to the end of an existing string; deleting elements then means hiding them behind a blacklist, which makes reading, updating, and deleting individual elements cumbersome

    • Both Redis and Memcached store data in memory and are in-memory databases. However, Memcached can also be used to cache other things, such as images, videos, etc.

    • Virtual memory – Redis can swap some values that have not been used for a long time to disk when physical memory is exhausted

    • Data storage safety – If Memcached crashes, the data is lost; Redis can periodically save to disk (persistence)

    • Different application scenarios: Redis is used as a NoSQL database, and can also be used as a message queue, data stack, and data cache; Memcached is suitable for caching SQL statements, data sets, user temporary data, delayed query data, and sessions, etc.

  • Implement distributed locks with Redis

    • Use setnx to implement locking, and can also add expiration time through expire

    • The value of the lock can be a random uuid or a specific name

    • When releasing the lock, check if it is that lock through uuid, if so, execute delete to release the lock
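
    A minimal redis-py sketch of the idea (the key name and timeout are illustrative; production code usually releases the lock atomically with a Lua script):

    import uuid
    import redis

    conn = redis.Redis()
    lock_key = "lock:order"          # illustrative key name
    token = str(uuid.uuid4())

    # SET key value NX EX timeout == setnx + expire as one atomic command
    acquired = conn.set(lock_key, token, nx=True, ex=10)
    if acquired:
        try:
            pass                     # do the protected work here
        finally:
            if conn.get(lock_key) == token.encode():   # only release our own lock
                conn.delete(lock_key)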

  • Common problems

    • Cache avalanche

      • A large number of cached keys expire within a short time window, so a flood of requests hits the database directly

    • Cache penetration

      • When requesting data, querying the cache does not exist, and it does not exist in the database either

    • Cache warming

      • Initialize the project, adding some commonly used data to the cache

    • Cache update

      • Data expires, update cache data

    • Cache degradation

      • When the access volume surges, the service has problems (such as slow response time or no response) or non-core services affect the performance of core processes, it is still necessary to ensure that the service is available, even if it is a degraded service. The system can automatically degrade based on some key data, or can configure a switch for manual degradation

  • Consistent Hashing algorithm

    • Ensure data consistency when using clusters

  • Based on Redis to implement a distributed lock, requiring a timeout parameter

    • setnx

  • Virtual memory

  • Memory jitter


Linux

  • Five I/O models of Unix

    • Blocking IO

    • Non-blocking IO

    • Multiplexing IO (Python implements IO multiplexing with the selectors module; see the sketch after this list)

      • select

        • Fine when concurrency is not high but the connections are very active

      • poll

        • Not much improvement over select

      • epoll

        • Suitable when there are many connections but only a few of them are active

    • Signal-driven IO

    • Asynchronous IO (implemented asynchronously by Gevent/Asyncio)
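
    A compact sketch of IO multiplexing with the selectors module (the address and port are illustrative):

    import selectors
    import socket

    sel = selectors.DefaultSelector()   # picks epoll/kqueue/select, whatever the OS offers

    def accept(sock):
        conn, _ = sock.accept()
        conn.setblocking(False)
        sel.register(conn, selectors.EVENT_READ, read)

    def read(conn):
        data = conn.recv(1024)
        if data:
            conn.sendall(data)          # echo the data back
        else:
            sel.unregister(conn)
            conn.close()

    server = socket.socket()
    server.bind(("127.0.0.1", 8000))
    server.listen()
    server.setblocking(False)
    sel.register(server, selectors.EVENT_READ, accept)

    # while True:
    #     for key, _ in sel.select():
    #         key.data(key.fileobj)     # dispatch to accept() / read()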

  • Command manual better than man

    • tldr: a manual with command examples

  • Difference between kill -9 and -15

    • -15 (SIGTERM): asks the process to terminate; the process may stop immediately, stop after releasing its resources, or ignore the signal and keep running

    • -9 (SIGKILL): because of that uncertainty, -9 is used when the process must be killed immediately

  • Paging mechanism (a memory allocation management scheme that separates logical addresses and physical addresses):

    • Operating system efficiently manages memory to reduce fragmentation

    • Logical addresses of the program are divided into fixed-size pages

    • Physical addresses are divided into frames of the same size

    • Corresponding logical addresses and physical addresses through page tables

  • Segmentation mechanism

    • To meet some logical requirements of the code

    • Data sharing/data protection/dynamic linking

    • Each segment is continuously allocated memory, and segments are discretely allocated

  • How to check CPU memory usage?

    • top

    • free Check available memory, troubleshoot memory leaks

Design Patterns

Singleton Pattern

    # Method one: decorator
    def Single(cls):
        instances = {}
        def get_instance(*args, **kwargs):
            if cls not in instances:
                instances[cls] = cls(*args, **kwargs)
            return instances[cls]
        return get_instance
    @Single
    class B:
        pass

    # Method two: module-level instance
    class Single:
        def __init__(self):
            print("Singleton pattern implementation method two...")

    single = Single()
    del Single  # the class is deleted; from now on, only the existing `single` instance can be used

    # Method three (most commonly used): override __new__
    class Single:
        def __new__(cls, *args, **kwargs):
            if not hasattr(cls, '_instance'):
                cls._instance = super().__new__(cls)   # object.__new__ takes no extra arguments
            return cls._instance
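
    Quick check for method three:

    a = Single()
    b = Single()
    print(a is b)   # True: both names refer to the same instance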

Factory Pattern

    class Dog:
        def __init__(self):
            print("Wang Wang Wang")
    class Cat:
        def __init__(self):
            print("Miao Miao Miao")


    def fac(animal):
        if animal.lower() == "dog":
            return Dog()
        if animal.lower() == "cat":
            return Cat()
        print("Sorry, must be: dog, cat")

Builder Pattern

    class Computer:
        def __init__(self, serial_number):
            self.serial_number = serial_number
            self.memory = None
            self.hdd = None
            self.gpu = None
        def __str__(self):
            info = (f'Memory: {self.memory}GB',
                    f'Hard Disk: {self.hdd}GB',
                    f'Graphics Card: {self.gpu}')
            return ', '.join(info)

    class ComputerBuilder:
        def __init__(self):
            self.computer = Computer('Jim1996')
        def configure_memory(self, amount):
            self.computer.memory = amount
            return self  # returning self enables chained calls
        def configure_hdd(self, amount):
            self.computer.hdd = amount
            return self
        def configure_gpu(self, gpu_model):
            self.computer.gpu = gpu_model
            return self

    class HardwareEngineer:
        def __init__(self):
            self.builder = None
        def construct_computer(self, memory, hdd, gpu):
            self.builder = ComputerBuilder()
            self.builder.configure_memory(memory).configure_hdd(hdd).configure_gpu(gpu)
        @property
        def computer(self):
            return self.builder.computer
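
    Usage of the builder above (the values are illustrative):

    engineer = HardwareEngineer()
    engineer.construct_computer(memory=16, hdd=512, gpu='GTX 1080')
    print(engineer.computer)   # Memory: 16GB, Hard Disk: 512GB, Graphics Card: GTX 1080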

Data Structures and Algorithms

Python implementation of various data structures

Quick Sort

    def quick_sort(_list):
        if len(_list) < 2:
            return _list
        pivot_index = 0
        pivot = _list[pivot_index]
        left_list = [i for i in _list[pivot_index + 1:] if i < pivot]
        right_list = [i for i in _list[pivot_index + 1:] if i >= pivot]
        return quick_sort(left_list) + [pivot] + quick_sort(right_list)

Selection Sort

    def select_sort(seq):
        n = len(seq)
        for i in range(n - 1):
            min_idx = i
            for j in range(i + 1, n):
                if seq[j] < seq[min_idx]:
                    min_idx = j
            if min_idx != i:
                seq[i], seq[min_idx] = seq[min_idx], seq[i]

Insertion Sort

    def insertion_sort(_list):
        n = len(_list)
        for i in range(1, n):
            value = _list[i]
            pos = i
            while pos > 0 and value < _list[pos - 1]:
                _list[pos] = _list[pos - 1]
                pos -= 1
            _list[pos] = value
        return _list

Merge Sort

    def merge_sorted_list(_list1, _list2):   # merge two sorted lists
        len_a, len_b = len(_list1), len(_list2)
        a = b = 0
        sort = []
        while len_a > a and len_b > b:
            if _list1[a] > _list2[b]:
                sort.append(_list2[b])
                b += 1
            else:
                sort.append(_list1[a])
                a += 1
        if len_a > a:
            sort.extend(_list1[a:])   # extend, not append, so the remainder is flattened in
        if len_b > b:
            sort.extend(_list2[b:])
        return sort

    def merge_sort(_list):
        if len(_list) < 2:
            return _list
        else:
            mid = len(_list) // 2
            left = merge_sort(_list[:mid])
            right = merge_sort(_list[mid:])
            return merge_sorted_list(left, right)

Heap Sort heapq module

    from heapq import nsmallest
    def heap_sort(_list):
        return nsmallest(len(_list),_list)

Stack

    from collections import deque
    class Stack:
        def __init__(self):
            self.s = deque()
        def peek(self):
            p = self.pop()
            self.push(p)
            return p
        def push(self, el):
            self.s.append(el)
        def pop(self):
            return self.s.pop()   # pop from the underlying deque (self.pop() would recurse forever)

Queue

    from collections import deque
    class Queue:
        def __init__(self):
            self.s = deque()
        def push(self, el):
            self.s.append(el)
        def pop(self):
            return self.s.popleft()   # FIFO: pop from the left end of the underlying deque

Binary Search

    def binary_search(_list, num):
        if len(_list) < 1:
            return False
        mid = len(_list) // 2
        if num > _list[mid]:
            return binary_search(_list[mid + 1:], num)
        elif num < _list[mid]:
            return binary_search(_list[:mid], num)
        else:
            return _list.index(num)   # note: index within the current (sliced) sub-list
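
    For an already sorted list, the standard-library bisect module (mentioned earlier) gives the same answer without hand-written recursion:

    import bisect

    def bisect_search(sorted_list, num):
        i = bisect.bisect_left(sorted_list, num)
        return i if i < len(sorted_list) and sorted_list[i] == num else -1

    print(bisect_search([1, 3, 5, 7, 9], 7))   # 3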


Interview Strengthening Questions:

About database optimization and design

https://segmentfault.com/a/1190000018426586

  • How to implement a queue using two stacks

  • Reverse linked list

  • Merge two sorted linked lists

  • Delete linked list nodes

  • Reverse binary tree

  • Design a short-URL service (base-62 encoding)

  • Design a flash sale system (feed stream)?

    https://www.jianshu.com/p/ea0259d109f9

  • Why is it better to use auto-incrementing integers for primary keys in MySQL databases? Can UUID be used? Why?

    • If the data writing order of the InnoDB table can be consistent with the order of the leaf nodes of the B+ tree index, the storage and access efficiency will be the highest. To improve storage and query performance, it is recommended to use auto-incrementing IDs as primary keys.

    • For the primary index of InnoDB, the data will be sorted according to the primary key. Due to the unordered nature of UUID, InnoDB will generate a huge IO pressure, making it unsuitable to use UUID as a physical primary key. It can be used as a logical primary key, while the physical primary key still uses auto-incrementing ID. For global uniqueness, UUID should be used to index and associate other tables or as foreign keys.

  • If we generate auto-incrementing IDs for databases in a distributed system, how do we do it?

    • Use Redis

  • Based on Redis, implement a distributed lock requiring a timeout parameter

    • setnx

    • setnx + expire

  • What if a single Redis node goes down? Are there other industry solutions to implement distributed lock codes?

    • Use consistent hashing algorithm

Cache Algorithms

  • LRU (least-recently-used): replaces the least recently used object

  • LFU (Least frequently used): least frequently used, if a piece of data has been used very few times in a recent period, the likelihood of it being used in the future is also very low
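
A compact LRU sketch using collections.OrderedDict (the capacity value is illustrative):

    from collections import OrderedDict

    class LRUCache:
        def __init__(self, capacity=128):
            self.capacity = capacity
            self.data = OrderedDict()

        def get(self, key):
            if key not in self.data:
                return None
            self.data.move_to_end(key)          # mark as most recently used
            return self.data[key]

        def put(self, key, value):
            if key in self.data:
                self.data.move_to_end(key)
            self.data[key] = value
            if len(self.data) > self.capacity:
                self.data.popitem(last=False)   # evict the least recently used entry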

Server Performance Optimization Directions

  • Using data structures and algorithms

  • Database

    • Index optimization

    • Eliminate slow queries

      • Enable the slow query log (slow_query_log / slow_query_log_file) and review it

      • Use explain to troubleshoot index issues

      • Adjust the data model or the indexes accordingly

    • Batch operations to reduce IO operations

    • Use NoSQL: for example, Redis

  • Network IO

    • Batch operations

    • Pipeline

  • Cache

    • Redis

  • Asynchronous

    • Asynchronous operations implemented by Asyncio

    • Use Celery to reduce IO blocking

  • Concurrency

    • Multi-threading

    • Gevent
