This is a summary of key Python knowledge compiled by developer @TwentyOne on SegmentFault. It covers a lot of ground, so it runs long; it is the result of the author's long-term accumulation of notes.

Py2 VS Py3
-
print is a function in Python 3; in Python 2 it was a statement
-
No more unicode objects, default str is unicode
-
In python3, division returns a float
-
No long type anymore
-
xrange does not exist, range replaces xrange
-
Functions and variable names can be defined in Chinese
-
Advanced unpacking and * unpacking
-
Parameters after * are keyword-only and must be passed as name=value (see the short sketch at the end of this section)
-
raise from
-
iteritems removed and changed to items()
-
yield from links to sub-generators
-
asyncio, async/await natively support asynchronous programming
-
New/updated standard-library modules: enum, mock, ipaddress, concurrent.futures, asyncio, urllib, selectors
-
Different enumeration classes cannot be compared
-
Only equality comparison is allowed within the same enumeration class
-
Usage of enumeration class (numbering starts from 1 by default)
-
To avoid the occurrence of the same enumeration value in the enumeration class, the @unique decorator can be used on the enumeration class
# Notes on enumerations
from enum import Enum
class COLOR(Enum):
    YELLOW = 1
    # YELLOW = 2  # would raise an error: a member name cannot be defined twice
    GREEN = 1     # no error: same value, so GREEN becomes an alias of YELLOW
    BLACK = 3
    RED = 4
print(COLOR.GREEN)  # COLOR.YELLOW -- the alias still prints as YELLOW
for i in COLOR:     # iterating over COLOR does not include the alias GREEN
    print(i)
# COLOR.YELLOW
# COLOR.BLACK
# COLOR.RED
# How to traverse aliases as well
for i in COLOR.__members__.items():
    print(i)
# output: ('YELLOW', <COLOR.YELLOW: 1>)
# ('GREEN', <COLOR.YELLOW: 1>)
# ('BLACK', <COLOR.BLACK: 3>)
# ('RED', <COLOR.RED: 4>)
for i in COLOR.__members__:
    print(i)
# output: YELLOW
# GREEN
# BLACK
# RED
# Enum conversion
# Prefer storing the enum's numeric value in the database rather than the name string, and use the enum class in code
a = 1
print(COLOR(a))  # output: COLOR.YELLOW
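A short sketch of several Py3-only items from the list above (advanced unpacking, keyword-only arguments, yield from); all names here are illustrative.
# advanced unpacking
first, *rest, last = [1, 2, 3, 4, 5]  # first=1, rest=[2, 3, 4], last=5
# keyword-only arguments: parameters after * must be passed as name=value
def connect(host, *, port=8080, timeout=3):
    return (host, port, timeout)
connect("localhost", port=9000)  # connect("localhost", 9000) would raise TypeError
# yield from delegates to a sub-generator
def sub():
    yield 1
    yield 2
def outer():
    yield 0
    yield from sub()
print(list(outer()))  # [0, 1, 2]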
py2/3 Conversion Tools
-
six module: a module compatible with python2 and python3
-
2to3 tool: changes code syntax version
-
__future__: use features from the next version
Common Libraries
-
Collections that must be known
https://segmentfault.com/a/1190000017385799
-
Python sorting operations and heapq module
https://segmentfault.com/a/1190000017383322
-
itertools module super practical methods
https://segmentfault.com/a/1190000017416590
Less Common but Very Important Libraries
-
dis (code bytecode analysis)
-
inspect (generator state)
-
cProfile (performance analysis)
-
bisect (maintain ordered list)
-
fnmatch
-
fnmatch(string, "*.txt")  # case-insensitive on Windows
-
fnmatch's case sensitivity depends on the operating system
-
fnmatchcase is always case sensitive
-
timeit (code execution time)
def isLen(strString):
    # the ternary-expression version; it is the faster one
    return True if len(strString) > 6 else False
def isLen1(strString):
    # note the positions of False and True here
    return [False, True][len(strString) > 6]
import timeit
print(timeit.timeit('isLen1("5fsdfsdfsaf")', setup="from __main__ import isLen1"))
print(timeit.timeit('isLen("5fsdfsdfsaf")', setup="from __main__ import isLen"))
-
contextlib
-
@contextlib.contextmanager makes generator functions into context managers
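A minimal sketch of the point above (the file name is illustrative): the code before the yield runs on entering the with-block, the code after it runs on exit, like __enter__/__exit__.
import contextlib
@contextlib.contextmanager
def open_file(path):
    f = open(path, "w")
    try:
        yield f       # the body of the with-block runs here
    finally:
        f.close()     # always runs on exit
with open_file("demo.txt") as f:  # "demo.txt" is just an example name
    f.write("hello")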
-
types (contains all types of type objects defined by the standard interpreter, can decorate generator functions as asynchronous)
import types
types.coroutine # equivalent to implementing __await__
-
html (implement HTML escaping)
import html
html.escape("<h1>I'm Jim</h1>") # output:'&lt;h1&gt;I&#x27;m Jim&lt;/h1&gt;'
html.unescape('&lt;h1&gt;I&#x27;m Jim&lt;/h1&gt;') # <h1>I'm Jim</h1>
-
mock (solving test dependencies)
-
concurrent (creating process pools and thread pools)
from concurrent.futures import ThreadPoolExecutor, as_completed, wait
pool = ThreadPoolExecutor()
task = pool.submit(function_name, (parameters))  # submit does not block; it returns a Future immediately
task.done()      # whether the task has finished
task.result()    # blocking call; returns the task's return value
task.cancel()    # cancel a task that has not started yet; returns True if cancelled successfully
task.add_done_callback()  # register a callback to run when the task finishes
task.running()   # whether the task is currently executing; task is a Future object
for data in pool.map(function, parameter_list):  # yields the results of completed tasks, in the order the parameters were given
    print(data)  # the result of each completed task
as_completed(task_list)  # yields tasks as they complete, one as soon as it finishes
wait(task_list, return_when=condition)  # block the main thread until the condition is met (four conditions are available)
-
selectors (wraps select; used for IO-multiplexed programming)
-
asyncio
future = asyncio.ensure_future(coroutine)  # equivalent to future = loop.create_task(coroutine)
future.add_done_callback()  # add a callback to run after completion
loop.run_until_complete(future)
future.result()  # get the return value
asyncio.wait() accepts an iterable of coroutine objects
asyncio.gather(*iterable_object, *iterable_object) gives the same results, but gather can be cancelled in bulk via the gather object's cancel()
There is only one loop per thread
After loop.stop() you must call loop.run_forever() again, otherwise an error occurs
loop.run_forever() can also run non-coroutine callbacks
Finally, call loop.close() in a finally block
asyncio.Task.all_tasks() gets all tasks; iterate over them and call task.cancel() to cancel them
partial(function, parameters) wraps a function under another name; the bound parameters go before the wrapped function's remaining parameters
loop.call_soon(function, parameters)
call_soon_threadsafe() is the thread-safe version
loop.call_later(time, function, parameters)
Within the same block of code, call_soon callbacks run first, then the call_later callbacks run in ascending order of their delays
If you must run blocking code,
use loop.run_in_executor(executor, function, parameters) to wrap it in a thread pool, put the returned futures into a task list, then run them with wait(task_list)
Implementing an HTTP request via asyncio:
reader, writer = await asyncio.open_connection(host, port)
writer.write()  # send the request
async for data in reader:
    data = data.decode("utf-8")
    html_list.append(data)
# afterwards html_list contains the HTML
as_completed(tasks) yields each task as soon as it completes; it returns an iterable
Coroutine lock:
async with Lock():
    ...
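A minimal runnable sketch tying the notes above together; it uses the explicit loop API to match the notes (since Python 3.7, asyncio.run() is the usual entry point), and the coroutine name is illustrative.
import asyncio
async def fetch(n):
    await asyncio.sleep(0.1)
    return n * 2
loop = asyncio.get_event_loop()
task = loop.create_task(fetch(21))  # same as asyncio.ensure_future(fetch(21))
task.add_done_callback(lambda t: print("done:", t.result()))
loop.run_until_complete(task)
print(task.result())  # 42
loop.close()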
Python Advanced
-
Inter-process communication:
-
Manager (built-in many data structures, can achieve memory sharing between multiple processes)
from multiprocessing import Manager, Process
def add_data(p_dict, key, value):
    p_dict[key] = value
if __name__ == "__main__":
    progress_dict = Manager().dict()
    first_progress = Process(target=add_data, args=(progress_dict, "bobby1", 22))
    second_progress = Process(target=add_data, args=(progress_dict, "bobby2", 23))
    first_progress.start()
    second_progress.start()
    first_progress.join()
    second_progress.join()
    print(progress_dict)
-
Pipe (suitable for two processes)
from multiprocessing import Pipe, Process
# Pipe has higher performance than Queue
def producer(pipe):
    pipe.send("bobby")
def consumer(pipe):
    print(pipe.recv())
if __name__ == "__main__":
    # a Pipe can only be used between two processes
    receive_pipe, send_pipe = Pipe()
    my_producer = Process(target=producer, args=(send_pipe,))
    my_consumer = Process(target=consumer, args=(receive_pipe,))
    my_producer.start()
    my_consumer.start()
    my_producer.join()
    my_consumer.join()
-
Queue (cannot be used in process pools, inter-process communication requires Manager().Queue())
import time
from multiprocessing import Queue, Process
def producer(queue):
    queue.put("a")
    time.sleep(2)
def consumer(queue):
    time.sleep(2)
    data = queue.get()
    print(data)
if __name__ == "__main__":
    queue = Queue(10)
    my_producer = Process(target=producer, args=(queue,))
    my_consumer = Process(target=consumer, args=(queue,))
    my_producer.start()
    my_consumer.start()
    my_producer.join()
    my_consumer.join()
-
Process pool
import time
from multiprocessing import Manager, Pool
def producer(queue):
    queue.put("a")
    time.sleep(2)
def consumer(queue):
    time.sleep(2)
    data = queue.get()
    print(data)
if __name__ == "__main__":
    queue = Manager().Queue(10)
    pool = Pool(2)
    pool.apply_async(producer, args=(queue,))
    pool.apply_async(consumer, args=(queue,))
    pool.close()
    pool.join()
-
Common methods of sys module
-
argv command line parameter list, the first is the path of the program itself
-
path returns the module search path
-
modules.keys() returns the list of all imported modules
-
exit(0) exit the program
-
a in s or b in s or c in s shorthand
-
prefer any(); note that all() returns True for an empty iterable, while any() returns False for it
# Method one
True in [i in s for i in [a,b,c]]
# Method two
any(i in s for i in [a,b,c])
# Method three
list(filter(lambda x:x in s,[a,b,c]))
-
set collection application
-
{1,2}.issubset({1,2,3})# check if it is a subset
-
{1,2,3}.issuperset({1,2})
-
{1,2}.isdisjoint({3,4})# True if the two sets have no elements in common
-
Chinese matching in code
-
[\u4E00-\u9FA5] matches the range of common Chinese characters (一 through 龥)
-
Check the system’s default encoding format
import sys
sys.getdefaultencoding() # sys.setdefaultencoding() sets it (Python 2 only; removed in Python 3)
-
getattr VS getattribute
class A(dict):
    def __getattr__(self, value):       # called when an attribute that does not exist is accessed
        return 2
    def __getattribute__(self, item):   # intercepts all attribute access
        return item
-
Class variables are not stored in instance __dict__, only exist in class’s __dict__
-
globals/locals (can indirectly operate code)
-
globals contains all variable properties and values in the current module
-
locals contains all variable properties and values in the current environment
-
Python variable name resolution mechanism (LEGB)
-
Local scope (Local)
-
The current scope embedded local scope (Enclosing locals)
-
Global/module scope (Global)
-
Built-in scope (Built-in)
-
Implement grouping from 1-100 every three
print([[x for x in range(1,101)][i:i+3] for i in range(0,100,3)])
-
What is a metaclass?
-
It is a class that creates classes. When defining a class, set metaclass=YourMetaclass to use one; the metaclass must inherit from type rather than object, because type is the metaclass of all classes.
type.__bases__    # (<class 'object'>,)
object.__bases__  # ()
type(object)      # <class 'type'>
class Yuan(type):
    def __new__(cls, name, bases, attrs, *args, **kwargs):
        return type(name, bases, attrs, *args, **kwargs)
class MyClass(metaclass=Yuan):
    pass
-
What is duck typing (a form of polymorphism)?
-
Python does not check argument types when a function is called; as long as the argument supports the operations the function performs, it can be used
-
Deep copy and shallow copy
-
A deep copy copies content recursively; a shallow copy copies only the outer object, so nested objects are shared references (their reference counts increase)
-
copy module implements deep copy
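A small sketch of the difference (names are illustrative):
import copy
src = {"nums": [1, 2, 3]}
shallow = copy.copy(src)    # new outer dict, but the inner list is shared
deep = copy.deepcopy(src)   # the inner list is duplicated too
src["nums"].append(4)
print(shallow["nums"])  # [1, 2, 3, 4]
print(deep["nums"])     # [1, 2, 3]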
-
Unit testing
-
Generally, the test class inherits from the TestCase in the unittest module
-
pytest module quick testing (methods starting with test_/test files starting with test_/test classes starting with Test and cannot have init method)
-
coverage statistics testing coverage
import unittest
class MyTest(unittest.TestCase):
    def setUp(self):            # runs before each test case
        print('test case starting')
    def tearDown(self):         # runs after each test case
        print('test case finished')
    @classmethod
    def setUpClass(cls):        # must be a @classmethod; runs once before all tests
        print('testing starts')
    @classmethod
    def tearDownClass(cls):     # must be a @classmethod; runs once after all tests have run
        print('testing ends')
    def test_a_run(self):
        self.assertEqual(1, 1)  # a test case
-
GIL releases based on the number of executed bytecode lines and time slices, GIL actively releases when encountering IO operations
-
What is monkey patch?
-
Replacing attributes or methods at runtime, e.g. swapping blocking calls for non-blocking ones (see the gevent example further down)
-
What is introspection?
-
The ability to determine the type of an object at runtime, id, type, isinstance
-
Is Python pass-by-value or pass-by-reference?
-
Neither; Python uses call by sharing (object references are passed). Default parameter values are evaluated only once, when the function is defined
-
Difference between else and finally in try-except-else-finally
-
else executes when no exception occurs, finally executes regardless of whether an exception occurs
-
except can catch multiple exceptions at once, but generally for different exceptions, we capture them separately
-
GIL global interpreter lock
-
Only one thread can execute Python bytecode at a time; this is a characteristic of the CPython interpreter, and some other interpreters (e.g. Jython) do not have it
-
CPU-intensive: multi-process + process pool
-
IO-intensive: multi-threading/coroutines
-
What is Cython
-
A superset of Python that is compiled to C code (used to speed Python up)
-
Generators and iterators
-
Iterable objects only need to implement the __iter__ method
-
Objects that implement __next__ and __iter__ methods are iterators
-
Using generator expressions or generator functions with yield (a generator is a special kind of iterator)
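A short sketch of the three bullets above (class and function names are illustrative):
class Countdown:
    def __init__(self, start):
        self.start = start
    def __iter__(self):                 # iterable: only needs __iter__, which returns an iterator
        return CountdownIterator(self.start)
class CountdownIterator:
    def __init__(self, current):
        self.current = current
    def __iter__(self):
        return self
    def __next__(self):                 # iterator: implements __next__ as well
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1
def countdown_gen(start):               # generator function: a special kind of iterator
    while start > 0:
        yield start
        start -= 1
print(list(Countdown(3)))      # [3, 2, 1]
print(list(countdown_gen(3)))  # [3, 2, 1]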
-
What is a coroutine
-
A more lightweight way of multitasking than threads
-
Implementation methods:
-
yield
-
async-await
-
Dict underlying structure
-
In order to support fast lookup, a hash table is used as the underlying structure
-
The average time complexity for hash table lookup is O(1)
-
The CPython interpreter uses open addressing (probing) to resolve hash collisions
-
Hash expansion and hash collision resolution solutions
-
Chaining method
-
Open addressing (probing): the method Python uses
-
Expansion is done by allocating a new, larger table and copying the entries into it
-
gevent monkey patch example (for the monkey patch question above):
from gevent import monkey
monkey.patch_all() # patches all blocking standard-library calls in the code; specific modules can also be patched individually
-
Check if it is a generator or coroutine
co_flags = func.__code__.co_flags
# Check whether it is a coroutine (CO_COROUTINE | CO_ITERABLE_COROUTINE)
if co_flags & 0x180:
    return func
# Check whether it is a generator (CO_GENERATOR)
if co_flags & 0x20:
    return func
-
Fibonacci problems and variations
# A frog can jump up 1 step or 2 steps at a time. How many ways can it jump up n steps?
# Equivalent problem: how many ways can n 2*1 rectangles cover a 2*n rectangle without overlapping?
# Method one:
fib = lambda n: n if n <= 2 else fib(n - 1) + fib(n - 2)
# Method two:
def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return b
# If the frog can also jump 3, 4, ... up to n steps at a time, how many ways are there for n steps?
fib = lambda n: n if n < 2 else 2 * fib(n - 1)
-
Get the environment variables set on the computer
import os
os.getenv(env_name, None)  # get the environment variable; returns None if it does not exist
-
Garbage collection mechanism
-
Reference counting
-
Mark and sweep
-
Generational collection
# Check the threshold for generational collection
import gc
gc.get_threshold() #output:(700, 10, 10)
-
True and False are equivalent to 1 and 0 and can be used directly in arithmetic; float('inf') represents infinity
-
C10M/C10K
-
C10M: 8-core CPU, 64G memory, maintaining 10 million concurrent connections on a 10gbps network
-
C10K: 1GHz CPU, 2G memory, maintaining 10 thousand clients providing FTP service under 1gbps network environment
-
Difference between yield from and yield:
-
yield from follows an iterable object, while yield has no restrictions
-
GeneratorExit triggers when the generator stops
-
Several uses of single underscore
-
When defining a variable, it indicates a private variable
-
When unpacking, it indicates discarding useless data
-
In interactive mode, it represents the result of the last code execution
-
Can be used as a digit separator in numeric literals (111_222_333)
-
If a loop exits via break, its else clause is not executed (see the tiny example below)
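for n in [1, 3, 5]:
    if n % 2 == 0:
        print("found an even number")
        break
else:
    print("no even number found")  # runs because the loop never hit break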
-
Decimal to binary conversion
def convert_bin(num):
    if num == 0:
        return str(num)
    re = []
    while num:
        num, rem = divmod(num, 2)
        re.append(str(rem))
    return "".join(reversed(re))
convert_bin(10)  # '1010'
-
list1 = ['A', 'B', 'C', 'D']; how can I get new lists named after the elements, i.e. A=[], B=[], C=[], D=[]?
list1 = ['A', 'B', 'C', 'D']
# Method one
for i in list1:
    globals()[i] = []  # this trick can also be used to implement a Python version of reflection
# Method two
for i in list1:
    exec(f'{i} = []')  # exec executes string statements
-
memoryview and bytearray (not commonly used, just recorded)
# bytearray is mutable, bytes is immutable; slicing a memoryview creates no new object
a = b'aaaaaa'
ma = memoryview(a)
ma.readonly  # True: a read-only memoryview over bytes
mb = ma[:2]  # does not create a new bytes object
a = bytearray(b'aaaaaa')
ma = memoryview(a)
ma.readonly  # False: a writable memoryview over bytearray
mb = ma[:2]  # does not create a new bytearray
mb[:2] = b'bb'  # changing mb also changes ma (and the underlying bytearray)
-
Ellipsis type
# The literal ... in code is the Ellipsis object
L = [1, 2, 3]
L.append(L)
print(L)  # output: [1, 2, 3, [...]] -- note this [...] is how repr marks a recursive list, not the Ellipsis object
-
Lazy evaluation
class lazy(object):
    def __init__(self, func):
        self.func = func
    def __get__(self, instance, cls):
        val = self.func(instance)  # equivalent to calling area(c), where c is the Circle instance below
        setattr(instance, self.func.__name__, val)
        return val
class Circle(object):
    def __init__(self, radius):
        self.radius = radius
    @lazy
    def area(self):
        print('evaluate')
        return 3.14 * self.radius ** 2
-
Traverse files, pass in a folder, print out all file paths inside (recursively)
import os
all_files = []
def getAllFiles(directory_path):
    for sChild in os.listdir(directory_path):
        sChildPath = os.path.join(directory_path, sChild)
        if os.path.isdir(sChildPath):
            getAllFiles(sChildPath)
        else:
            all_files.append(sChildPath)
    return all_files
-
When storing files, handling file names
# secure_filename turns a string into a safe file name
from werkzeug.utils import secure_filename
secure_filename("My cool movie.mov")   # output: My_cool_movie.mov
secure_filename("../../../etc/passwd") # output: etc_passwd
secure_filename(u'i contain cool \xfcml\xe4uts.txt')  # output: i_contain_cool_umlauts.txt
-
Date formatting
from datetime import datetime
datetime.now().strftime("%Y-%m-%d")
import time
# strftime needs a struct_time (e.g. from time.localtime()); a raw time.time() timestamp cannot be formatted directly
time.strftime("%Y-%m-%d", time.localtime())
-
Strange issues with tuple using +=
# Raises TypeError, yet the tuple's content still changes, because the list object at t[1] keeps the same id
t = (1, [2, 3])
t[1] += [4, 5]
# Using t[1].append(...) or t[1].extend(...) does not raise and succeeds
-
__missing__ you should know
class Mydict(dict):
    def __missing__(self, key):  # called when a key accessed via d[key] does not exist; its return value is used as the result
        return key
-
+ and +=
# + cannot concatenate a list with a tuple, while += can (+= goes through __iadd__, whose internal implementation works like extend() and accepts any iterable); + always creates a new object
# Immutable objects have no __iadd__ method, so += on them falls back to __add__; that is why a tuple can still use += with another tuple (a new tuple is created)
-
How to turn each element of an iterable into all keys of a dictionary?
dict.fromkeys(['jim','han'],21) # output:{'jim': 21, 'han': 21}
-
Wireshark packet capture software
Network Knowledge
-
What is HTTPS?
-
Secure HTTP: HTTP over TLS. It requires a CA-issued certificate, encrypts the data, and uses port 443; for the same site, the HTTPS version ranks higher in SEO
-
Common response status codes
204 No Content // request processed successfully, no body returned; often used after a successful DELETE
206 Partial Content // a range (partial GET) request was processed successfully
303 See Other // redirect; the client should fetch the new location with GET
304 Not Modified // the cached resource is still valid; use the local copy
307 Temporary Redirect // temporary redirect; the method is preserved (POST does not become GET)
401 Unauthorized // authentication failed
403 Forbidden // access to the resource is refused
400 Bad Request // request parameter error
201 Created // resource created or changed successfully
503 Service Unavailable // server maintenance or overload
-
Idempotence and safety of HTTP request methods
-
WSGI
# environ: a dict containing all the HTTP request information
# start_response: a function used to begin sending the HTTP response
def application(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/html')])
    return [b'<h1>Hello, web!</h1>']  # a WSGI app returns an iterable of bytes
-
RPC
-
CDN
-
SSL (Secure Sockets Layer) and its successor TLS (Transport Layer Security) are security protocols that provide confidentiality and data integrity for network communication.
-
SSH (Secure Shell) is a security protocol defined by the IETF Network Working Group. It is built on the application layer and is currently one of the more reliable protocols for remote login sessions and other network services; using it effectively prevents information leakage during remote management. SSH started as a program on UNIX systems and quickly spread to other platforms; when used correctly it can compensate for weaknesses in the network. SSH clients exist for nearly all UNIX platforms, including HP-UX, Linux, AIX, Solaris, Digital UNIX, and Irix.
-
TCP/IP
-
TCP: connection-oriented/reliable/byte stream based
-
UDP: connectionless/unreliable/message-oriented
-
Three-way handshake and four-way handshake
-
Three-way handshake (SYN/SYN+ACK/ACK)
-
Four-way handshake (FIN/ACK/FIN/ACK)
-
Why is it three-way handshake when connecting, but four-way handshake when closing?
-
Because when the Server side receives the Client side’s SYN connection request packet, it can directly send the SYN+ACK packet. The ACK packet is used for acknowledgment, and the SYN packet is used for synchronization. However, when closing the connection, when the Server side receives the FIN packet, it may not immediately close the SOCKET, so it can only reply with an ACK packet to tell the Client side, “I received the FIN packet you sent.” Only after the Server side has sent all its packets can it send the FIN packet, so they cannot be sent together. Hence, four steps are needed to close.
-
Why does the TIME_WAIT state need to go through 2MSL (maximum segment lifetime) to return to the CLOSE state?
-
Although theoretically, after all four packets are sent, we can directly enter the CLOSE state, but we must assume that the network is unreliable, and the last ACK may be lost. Therefore, the TIME_WAIT state is used to resend the possibly lost ACK packet.
-
XSS/CSRF
-
HttpOnly prevents JS scripts from accessing and manipulating Cookies, effectively preventing XSS
MySQL
-
Index improvement process
-
Linear structure -> binary search -> hash -> binary search tree -> balanced binary tree -> multi-way search tree -> multi-way balanced search tree (B-Tree)
-
MySQL interview summary basic version
https://segmentfault.com/a/1190000018371218
-
MySQL interview summary advanced version
https://segmentfault.com/a/1190000018380324
-
Deep dive into MySQL
http://ningning.today/2017/02/13/database/深入浅出mysql/
-
When emptying the entire table, InnoDB deletes row by row, while MyISAM deletes and rebuilds the table
-
text/blob data types cannot have default values, and there is no case conversion during queries
-
When does an index become invalid?
-
like fuzzy queries starting with %
-
Implicit type conversion occurs
-
The leftmost prefix principle is not satisfied
-
For multi-column indexes, if the first part is not used, the index will not be used
-
Invalid scenarios:
-
Try to avoid using != or <> operators in the where clause, otherwise the engine will abandon the use of the index and perform a full table scan
-
Try to avoid using or to connect conditions in the where clause, otherwise it will cause the engine to abandon the use of the index and perform a full table scan, even if some conditions have indexes, this is also why it is advisable to use or less
-
If the column type is a string, it must be enclosed in quotes in the condition, otherwise the index will not be used
-
Try to avoid performing function operations on fields in the where clause, as this will cause the engine to abandon the use of the index and perform a full table scan
For example:
select id from t where substring(name,1,3) = 'abc'   -- ids whose name starts with abc
should be changed to:
select id from t where name like 'abc%'
For example:
select id from t where datediff(day, createdate, '2005-11-30') = 0   -- rows whose createdate is '2005-11-30'
should be changed to a range condition, e.g.:
select id from t where createdate >= '2005-11-30' and createdate < '2005-12-01'
-
Do not perform functions, arithmetic operations, or other expression operations on the left side of “=” in the where clause, otherwise the system may not be able to correctly use the index
-
Try to avoid performing expression operations on fields in the where clause, as this will cause the engine to abandon the use of the index and perform a full table scan
For example:
select id from t where num/2 = 100
Should change to:
select id from t where num = 100*2;
-
Indexes are not suitable for columns with few distinct key values (lots of duplicate data), e.g. enum and set columns (an enum column may include NULL and its default value trims trailing spaces; a set column is similar to enum but can hold at most 64 members)
-
If MySQL estimates that a full table scan will be faster than using an index, then do not use the index
-
What is a clustered index
-
In a B+Tree, the leaf nodes save data or pointers
-
MyISAM indexes and data are separated, using non-clustered
-
InnoDB data files are index files, primary key indexes are clustered indexes
Redis Command Summary
-
Why is it so fast?
-
Based on memory, written in C language
-
Uses multiplexed I/O model, non-blocking IO
-
Uses single thread to reduce thread switching
-
Since Redis is based on memory operations, the CPU is not the bottleneck for Redis, the bottleneck is most likely the size of the machine’s memory or network bandwidth. Since single-threading is easy to implement, and the CPU will not become a bottleneck, it is reasonable to adopt a single-threaded solution (after all, using multi-threading would cause many troubles!).
-
Simple data structure
-
Built its own VM mechanism, reducing the time for calling system functions
-
Advantages
-
High performance – Redis can read at a speed of 110,000 times/s, and write at a speed of 81,000 times/s
-
Rich data types
-
Atomic – All operations of Redis are atomic, and Redis also supports the atomic execution of several operations combined
-
Rich features – Redis also supports publish/subscribe, notifications, key expiration, and other features
-
What is a Redis transaction?
-
A mechanism that packages multiple requests and executes multiple commands once and in order
-
Transaction functions are realized through commands such as multi, exec, watch
-
In Python with redis-py: pipeline = conn.pipeline(transaction=True)
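A minimal redis-py sketch of the above (assumes the redis package and a server on localhost:6379; the key name is illustrative):
import redis
conn = redis.StrictRedis(host="localhost", port=6379)
pipe = conn.pipeline(transaction=True)  # wraps the queued commands in MULTI/EXEC
pipe.set("counter", 0)
pipe.incr("counter")
pipe.incr("counter")
print(pipe.execute())  # [True, 1, 2]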
-
Persistence methods
-
RDB (snapshot)
-
save (synchronous, can guarantee data consistency)
-
bgsave (asynchronous; on shutdown, if AOF is not enabled, an RDB save is performed by default)
-
AOF (append log)
-
How to implement a queue
-
lpush (push onto one end of the list)
-
rpop (pop from the other end)
-
Common data types (Bitmaps, Hyperloglogs, range queries, etc. not commonly used)
-
String: counter
-
integer or sds (Simple Dynamic String)
-
List: user’s followers, fans list
-
ziplist (continuous memory block, each entry node head saves the length information of the front and back nodes to achieve bidirectional linked list function) or double linked list
-
Hash:
-
Set: user’s followers
-
intset or hashtable
-
Zset (sorted set): real-time information leaderboard
-
skiplist (skip list)
-
Difference from Memcached
-
Memcached can only store string keys
-
Memcached users can only append data to the end of existing strings and use this string as a list. However, when deleting these elements, Memcached uses a blacklist to hide elements in the list, thus avoiding operations such as reading, updating, and deleting elements
-
Both Redis and Memcached store data in memory and are in-memory databases. However, Memcached can also be used to cache other things, such as images, videos, etc.
-
Virtual memory – Redis can swap some values that have not been used for a long time to disk when physical memory is exhausted
-
Data storage safety – If Memcached crashes, the data is lost; Redis can periodically save to disk (persistence)
-
Different application scenarios: Redis is used as a NoSQL database, and can also be used as a message queue, data stack, and data cache; Memcached is suitable for caching SQL statements, data sets, user temporary data, delayed query data, and sessions, etc.
-
Implement distributed locks with Redis
-
Use setnx to implement locking, and can also add expiration time through expire
-
The value of the lock can be a random uuid or a specific name
-
When releasing the lock, check if it is that lock through uuid, if so, execute delete to release the lock
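A rough sketch of this recipe (assumes redis-py and a local server; the key prefix and timeout are illustrative; a production version would do the check-and-delete atomically in a Lua script):
import uuid
import redis
conn = redis.StrictRedis(host="localhost", port=6379)
def acquire_lock(name, timeout=10):
    token = str(uuid.uuid4())
    # SET with nx=True, ex=timeout does setnx + expire in one atomic command
    if conn.set("lock:" + name, token, nx=True, ex=timeout):
        return token
    return None
def release_lock(name, token):
    key = "lock:" + name
    if conn.get(key) == token.encode():  # only delete the lock we own
        conn.delete(key)
        return True
    return False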
-
Common problems
-
Cache avalanche
-
A large amount of cached data expires within a short window, so a flood of requests hits the database directly
-
Cache penetration
-
Requests ask for data that is neither in the cache nor in the database, so every such request falls through to the database
-
Cache warming
-
Initialize the project, adding some commonly used data to the cache
-
Cache update
-
Data expires, update cache data
-
Cache degradation
-
When the access volume surges, the service has problems (such as slow response time or no response) or non-core services affect the performance of core processes, it is still necessary to ensure that the service is available, even if it is a degraded service. The system can automatically degrade based on some key data, or can configure a switch for manual degradation
-
Consistent Hashing algorithm
-
Ensure data consistency when using clusters
-
Based on Redis to implement a distributed lock, requiring a timeout parameter
-
setnx
-
Virtual memory
-
Memory jitter
Linux
-
Five I/O models of Unix
-
Blocking IO
-
Non-blocking IO
-
Multiplexing IO (in Python, implemented with the selectors module; see the sketch after this list)
-
select
-
Suited to cases where concurrency is not high but the connections are very active
-
poll
-
Not much improvement over select
-
epoll
-
Suitable for situations with many connections but few active connections
-
Signal-driven IO
-
Asynchronous IO (implemented asynchronously by Gevent/Asyncio)
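A minimal echo-server sketch using the standard-library selectors module (the address and port are illustrative); DefaultSelector picks epoll/kqueue/select for the platform:
import selectors
import socket
sel = selectors.DefaultSelector()
def accept(server):
    conn, _ = server.accept()
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ, read)
def read(conn):
    data = conn.recv(1024)
    if data:
        conn.sendall(data)       # echo the data back
    else:
        sel.unregister(conn)
        conn.close()
server = socket.socket()
server.bind(("127.0.0.1", 8888))
server.listen()
server.setblocking(False)
sel.register(server, selectors.EVENT_READ, accept)
while True:
    for key, _ in sel.select():
        key.data(key.fileobj)    # call the handler registered for this socket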
-
Command manual better than man
-
tldr: a manual with command examples
-
Difference between kill -9 and -15
-
-15 (SIGTERM): the program may stop immediately, stop after releasing its resources, or ignore the signal and keep running
-
-9 (SIGKILL): because of -15's uncertainty, -9 is used to kill the process immediately
-
Paging mechanism (a memory allocation management scheme that separates logical addresses and physical addresses):
-
Operating system efficiently manages memory to reduce fragmentation
-
Logical addresses of the program are divided into fixed-size pages
-
Physical addresses are divided into frames of the same size
-
Corresponding logical addresses and physical addresses through page tables
-
Segmentation mechanism
-
To meet some logical requirements of the code
-
Data sharing/data protection/dynamic linking
-
Each segment is continuously allocated memory, and segments are discretely allocated
-
How to check CPU memory usage?
-
top
-
free Check available memory, troubleshoot memory leaks
Design Patterns
Singleton Pattern
# Method one
def Single(cls, *args, **kwargs):
    instances = {}
    def get_instance(*args, **kwargs):
        if cls not in instances:
            instances[cls] = cls(*args, **kwargs)
        return instances[cls]
    return get_instance
@Single
class B:
    pass
# Method two
class Single:
    def __init__(self):
        print("Singleton pattern implementation method two...")
single = Single()
del Single  # from now on, always use the existing `single` instance
# Method three (most commonly used method)
class Single:
    def __new__(cls, *args, **kwargs):
        if not hasattr(cls, '_instance'):
            cls._instance = super().__new__(cls)  # object.__new__ takes no extra arguments
        return cls._instance
Factory Pattern
class Dog:
    def __init__(self):
        print("Wang Wang Wang")
class Cat:
    def __init__(self):
        print("Miao Miao Miao")
def fac(animal):
    if animal.lower() == "dog":
        return Dog()
    if animal.lower() == "cat":
        return Cat()
    print("Sorry, must be: dog, cat")
Builder Pattern
class Computer:
    def __init__(self, serial_number):
        self.serial_number = serial_number
        self.memory = None
        self.hdd = None
        self.gpu = None
    def __str__(self):
        info = (f'Memory:{self.memory}GB',
                f'Hard Disk:{self.hdd}GB',
                f'Graphics Card:{self.gpu}')
        return ' '.join(info)
class ComputerBuilder:
    def __init__(self):
        self.computer = Computer('Jim1996')
    def configure_memory(self, amount):
        self.computer.memory = amount
        return self  # return self to allow chained calls
    def configure_hdd(self, amount):
        self.computer.hdd = amount
        return self
    def configure_gpu(self, gpu_model):
        self.computer.gpu = gpu_model
        return self
class HardwareEngineer:
    def __init__(self):
        self.builder = None
    def construct_computer(self, memory, hdd, gpu):
        self.builder = ComputerBuilder()
        self.builder.configure_memory(memory).configure_hdd(hdd).configure_gpu(gpu)
    @property
    def computer(self):
        return self.builder.computer
Data Structures and Algorithms
Built-in Data Structures and Algorithms
Python implementation of various data structures
Quick Sort
def quick_sort(_list):
    if len(_list) < 2:
        return _list
    pivot = _list[0]
    left_list = [i for i in _list[1:] if i < pivot]
    right_list = [i for i in _list[1:] if i >= pivot]
    return quick_sort(left_list) + [pivot] + quick_sort(right_list)
Selection Sort
def select_sort(seq):
    n = len(seq)
    for i in range(n - 1):
        min_idx = i
        for j in range(i + 1, n):
            if seq[j] < seq[min_idx]:
                min_idx = j
        if min_idx != i:
            seq[i], seq[min_idx] = seq[min_idx], seq[i]
Insertion Sort
def insertion_sort(_list):
    n = len(_list)
    for i in range(1, n):
        value = _list[i]
        pos = i
        while pos > 0 and value < _list[pos - 1]:
            _list[pos] = _list[pos - 1]
            pos -= 1
        _list[pos] = value
    return _list
Merge Sort
def merge_sorted_list(_list1, _list2):  # merge two already-sorted lists
    len_a, len_b = len(_list1), len(_list2)
    a = b = 0
    sort = []
    while len_a > a and len_b > b:
        if _list1[a] > _list2[b]:
            sort.append(_list2[b])
            b += 1
        else:
            sort.append(_list1[a])
            a += 1
    if len_a > a:
        sort.extend(_list1[a:])
    if len_b > b:
        sort.extend(_list2[b:])
    return sort
def merge_sort(_list):
    if len(_list) < 2:
        return _list
    else:
        mid = len(_list) // 2
        left = merge_sort(_list[:mid])
        right = merge_sort(_list[mid:])
        return merge_sorted_list(left, right)
Heap Sort heapq module
from heapq import nsmallest
def heap_sort(_list):
    return nsmallest(len(_list), _list)
Stack
from collections import deque
class Stack:
    def __init__(self):
        self.s = deque()
    def peek(self):
        p = self.pop()
        self.push(p)
        return p
    def push(self, el):
        self.s.append(el)
    def pop(self):
        return self.s.pop()
Queue
from collections import deque
class Queue:
    def __init__(self):
        self.s = deque()
    def push(self, el):
        self.s.append(el)
    def pop(self):
        return self.s.popleft()
Binary Search
def binary_search(_list, num):
    if len(_list) < 1:
        return False
    mid = len(_list) // 2
    if num > _list[mid]:
        return binary_search(_list[mid + 1:], num)
    elif num < _list[mid]:
        return binary_search(_list[:mid], num)
    else:
        return True  # found (indices of the slices are relative, so report presence)
Interview Strengthening Questions:
About database optimization and design
https://segmentfault.com/a/1190000018426586
-
How to implement a queue using two stacks
-
Reverse linked list
-
Merge two sorted linked lists
-
Delete linked list nodes
-
Reverse binary tree
-
Design a short URL service? 62-base implementation
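A rough base62 sketch for this question (the alphabet order and function names are illustrative): a database auto-increment id maps to a short code and back.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
def encode62(num):
    if num == 0:
        return ALPHABET[0]
    chars = []
    while num:
        num, rem = divmod(num, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))
def decode62(s):
    num = 0
    for ch in s:
        num = num * 62 + ALPHABET.index(ch)
    return num
print(encode62(1234567))            # '5ban'
print(decode62(encode62(1234567)))  # 1234567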
-
Design a flash sale system (feed stream)?
https://www.jianshu.com/p/ea0259d109f9
-
Why is it better to use auto-incrementing integers for primary keys in MySQL databases? Can UUID be used? Why?
-
If the data writing order of the InnoDB table can be consistent with the order of the leaf nodes of the B+ tree index, the storage and access efficiency will be the highest. To improve storage and query performance, it is recommended to use auto-incrementing IDs as primary keys.
-
For the primary index of InnoDB, the data will be sorted according to the primary key. Due to the unordered nature of UUID, InnoDB will generate a huge IO pressure, making it unsuitable to use UUID as a physical primary key. It can be used as a logical primary key, while the physical primary key still uses auto-incrementing ID. For global uniqueness, UUID should be used to index and associate other tables or as foreign keys.
-
If we generate auto-incrementing IDs for databases in a distributed system, how do we do it?
-
Use Redis
-
Based on Redis, implement a distributed lock requiring a timeout parameter
-
setnx
-
setnx + expire
-
What if a single Redis node goes down? Are there other industry solutions to implement distributed lock codes?
-
Use consistent hashing algorithm
Cache Algorithms
-
LRU (least-recently-used): replaces the least recently used object
-
LFU (Least frequently used): least frequently used, if a piece of data has been used very few times in a recent period, the likelihood of it being used in the future is also very low
Server Performance Optimization Directions
-
Using data structures and algorithms
-
Database
-
Index optimization
-
Eliminate slow queries
-
Enable slow_query_log and inspect the slow query log file (slow_query_log_file)
-
Use explain to troubleshoot index issues
-
Adjust data modification indexes
-
Batch operations to reduce IO operations
-
Use NoSQL: for example, Redis
-
Network IO
-
Batch operations
-
Pipeline
-
Cache
-
Redis
-
Asynchronous
-
Asynchronous operations implemented by Asyncio
-
Use Celery to reduce IO blocking
-
Concurrency
-
Multi-threading
-
Gevent