This summary covers a lot of ground, so it runs a bit long; it is material I have been piecing together for quite a while.
Py2 VS Py3
-
print became a function; in Python 2 it was a statement
-
No more unicode objects; default str is unicode
-
Division in Python 3 returns a float
-
No long type
-
xrange does not exist; range replaces xrange
-
Function and variable names may use non-ASCII characters, e.g., Chinese
-
Advanced unpacking and * unpacking
-
Keyword-only arguments: parameters declared after * must be passed as name=value
-
raise from
-
iteritems removed and replaced with items()
-
yield from connects child generators
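A minimal sketch illustrating several of the items above (extended unpacking, keyword-only arguments, raise ... from, yield from); all names here are illustrative:
# extended (advanced) unpacking: * collects the rest
first, *middle, last = [1, 2, 3, 4, 5]   # first=1, middle=[2, 3, 4], last=5
# keyword-only arguments: parameters after * must be passed by name
def connect(host, *, timeout=10):
    return f"{host}:{timeout}"
connect("localhost", timeout=5)          # connect("localhost", 5) would raise TypeError
# raise ... from: chain the original exception as __cause__
def parse(value):
    try:
        return int(value)
    except ValueError as e:
        raise RuntimeError("parsing failed") from e
# yield from: delegate to a sub-generator
def sub():
    yield 1
    yield 2
def gen():
    yield from sub()
print(list(gen()))                       # [1, 2]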
-
Native coroutine support for asynchronous programming with asyncio, async/await
-
New modules in Python 3: enum, unittest.mock, ipaddress, concurrent.futures, asyncio, selectors (and urllib was reorganized)
-
Different enum classes cannot be compared
-
Only equality comparison is allowed within the same enum class
-
Usage of enum class (default numbering starts from 1)
-
To avoid duplicate enum values, use @unique to decorate the enum class
# Notes on enumerations
from enum import Enum
class COLOR(Enum):
    YELLOW = 1
    # YELLOW = 2  # would raise an error: the same name cannot be reused
    GREEN = 1     # no error; GREEN becomes an alias for YELLOW
    BLACK = 3
    RED = 4
print(COLOR.GREEN)  # COLOR.YELLOW -- an alias resolves to the first member with that value
for i in COLOR:     # iterating over COLOR skips aliases, so GREEN is not listed
    print(i)
# COLOR.YELLOW
# COLOR.BLACK
# COLOR.RED
# How to iterate including aliases
for i in COLOR.__members__.items():
    print(i)
# output:('YELLOW', <COLOR.YELLOW: 1>)
# ('GREEN', <COLOR.YELLOW: 1>)
# ('BLACK', <COLOR.BLACK: 3>)
# ('RED', <COLOR.RED: 4>)
for i in COLOR.__members__:
    print(i)
# output:YELLOW
# GREEN
# BLACK
# RED
# Enum conversion
# It is better to store the numeric value of the enum in the database rather than the name string,
# and use the enum class in code
a = 1
print(COLOR(a))  # output:COLOR.YELLOW
Py2/3 Conversion Tools
-
six module: A module compatible with Python 2 and Python 3
-
2to3 tool: Changes code syntax version
-
__future__: Use features from the next version
Common Libraries
-
Must-know collections
https://segmentfault.com/a/1190000017385799
-
Python sorting operations and heapq module
https://segmentfault.com/a/1190000017383322
-
itertools module super useful methods
https://segmentfault.com/a/1190000017416590
Less Common but Important Libraries
-
dis (code bytecode analysis)
-
inspect (generator state)
-
cProfile (performance analysis)
-
bisect (maintaining ordered lists)
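A minimal bisect sketch (values are illustrative) showing how to keep a list sorted on insert:
import bisect
scores = []
for value in [33, 99, 77, 70, 89, 90, 100]:
    bisect.insort(scores, value)       # insert while keeping the list sorted
print(scores)                          # [33, 70, 77, 89, 90, 99, 100]
print(bisect.bisect_left(scores, 89))  # 3 -- leftmost index where 89 would be inserted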
-
fnmatch
-
fnmatch.fnmatch(filename, "*.txt")  # case-insensitive on Windows
-
fnmatch's case sensitivity follows the operating system's convention
-
fnmatchcase is case sensitive
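A short sketch of the two variants (the file names are illustrative; the fnmatch result depends on the OS case rule):
from fnmatch import fnmatch, fnmatchcase, filter as fnfilter
print(fnmatch("Report.TXT", "*.txt"))      # True on Windows, False on POSIX systems
print(fnmatchcase("Report.TXT", "*.txt"))  # False everywhere: exact-case match
print(fnfilter(["a.txt", "b.py", "c.TXT"], "*.txt"))  # names matching the pattern (OS-dependent)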
-
timeit (code execution time)
def isLen(strString):
    # the ternary expression is recommended; it is also faster
    return True if len(strString) > 6 else False
def isLen1(strString):
    # note the order here: index 0 is False, index 1 is True
    return [False, True][len(strString) > 6]
import timeit
print(timeit.timeit('isLen1("5fsdfsdfsaf")', setup="from __main__ import isLen1"))
print(timeit.timeit('isLen("5fsdfsdfsaf")', setup="from __main__ import isLen"))
-
contextlib
-
@contextlib.contextmanager makes generator functions into context managers
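A minimal sketch of a generator-based context manager (the file name is just illustrative):
import contextlib
@contextlib.contextmanager
def open_for_read(path):
    f = open(path, "r")   # code before yield runs on __enter__
    try:
        yield f           # the yielded value is bound by "as"
    finally:
        f.close()         # code after yield runs on __exit__
# usage:
# with open_for_read("demo.txt") as f:
#     print(f.read())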
-
types (contains all type objects defined by the standard interpreter, can mark generator functions as asynchronous)
import types
types.coroutine  # decorator that marks a generator-based function as a coroutine (roughly equivalent to implementing __await__)
-
html (implements HTML escaping)
import html
html.escape("<h1>I'm Jim</h1>") # output:'&lt;h1&gt;I&#x27;m Jim&lt;/h1&gt;'
html.unescape('&lt;h1&gt;I&#x27;m Jim&lt;/h1&gt;') # <h1>I'm Jim</h1>
-
mock (solving test dependencies)
-
concurrent (creating process pools and thread pools)
from concurrent.futures import ThreadPoolExecutor, as_completed
pool = ThreadPoolExecutor()
task = pool.submit(function_name, *args)  # does not block; returns a Future immediately
task.done()       # has the task finished?
task.result()     # blocking; get the task's return value
task.cancel()     # cancel a task that has not started yet; returns True if the cancellation succeeded
task.add_done_callback(callback)  # callback invoked when the task finishes
task.running()    # is it currently running? task is a Future object
for data in pool.map(function, parameter_list):  # results of completed tasks, in the order of the parameters
    print(data)   # the return value of each completed task
as_completed(task_list)  # yields tasks as they complete
-
selectors (wraps select/poll/epoll; used for I/O-multiplexing programming)
-
asyncio
future = asyncio.ensure_future(coroutine)  # equivalent to future = loop.create_task(coroutine)
future.add_done_callback(callback)  # add a callback to run when the task completes
loop.run_until_complete(future)
future.result()  # get the return value
asyncio.wait() accepts an iterable of coroutine objects
asyncio.gather(*iterable, *iterable) gives the same results, but gather supports cancelling in bulk via the gather object's cancel()
There can only be one event loop per thread
loop.stop() must be used together with loop.run_forever(), otherwise an error is raised
loop.run_forever() can also run non-coroutine callbacks
Finally, call loop.close() inside a finally block
asyncio.Task.all_tasks() gets all tasks; iterate over them and call task.cancel() to cancel each
functools.partial(function, args) wraps a function under a new callable; the bound arguments are placed before the ones supplied at call time
loop.call_soon(function, args)
call_soon_threadsafe() is the thread-safe variant
loop.call_later(delay, function, args)
Within the same code block, call_soon callbacks run first; multiple call_later callbacks then run in ascending order of their delays
If blocking code must be run:
use loop.run_in_executor(executor, function, args) to wrap it in a thread, put the returned futures into a task list, and run them with wait(task_list)
Implementing an HTTP request with asyncio:
reader, writer = await asyncio.open_connection(host, port)
writer.write() sends the request
async for data in reader:
    data = data.decode("utf-8")
    result_list.append(data)
the list then contains the HTML
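A minimal runnable sketch of the above; the host and path are just placeholders:
import asyncio
async def fetch(host, path="/"):
    reader, writer = await asyncio.open_connection(host, 80)
    request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
    writer.write(request.encode("utf-8"))   # writer.write() sends the request bytes
    chunks = []
    async for line in reader:               # the StreamReader can be iterated line by line
        chunks.append(line.decode("utf-8", errors="ignore"))
    writer.close()
    return "".join(chunks)                  # response headers + HTML body
# loop = asyncio.get_event_loop()
# html_text = loop.run_until_complete(fetch("example.com"))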
Python Advanced
-
Inter-process communication:
-
Manager (built-in many data structures, can achieve memory sharing between processes)
from multiprocessing import Manager, Process
def add_data(p_dict, key, value):
    p_dict[key] = value
if __name__ == "__main__":
    progress_dict = Manager().dict()
    first_progress = Process(target=add_data, args=(progress_dict, "bobby1", 22))
    second_progress = Process(target=add_data, args=(progress_dict, "bobby2", 23))
    first_progress.start()
    second_progress.start()
    first_progress.join()
    second_progress.join()
    print(progress_dict)
-
Pipe (suitable for two processes)
from multiprocessing import Pipe, Process
# Pipe performs better than Queue
def producer(pipe):
    pipe.send("bobby")
def consumer(pipe):
    print(pipe.recv())
if __name__ == "__main__":
    receive_pipe, send_pipe = Pipe()
    # Pipe can only be used between two processes
    my_producer = Process(target=producer, args=(send_pipe,))
    my_consumer = Process(target=consumer, args=(receive_pipe,))
    my_producer.start()
    my_consumer.start()
    my_producer.join()
    my_consumer.join()
-
Queue (cannot be used for process pools, inter-process communication in process pools requires Manager().Queue())
import time
from multiprocessing import Queue, Process
def producer(queue):
    queue.put("a")
    time.sleep(2)
def consumer(queue):
    time.sleep(2)
    data = queue.get()
    print(data)
if __name__ == "__main__":
    queue = Queue(10)
    my_producer = Process(target=producer, args=(queue,))
    my_consumer = Process(target=consumer, args=(queue,))
    my_producer.start()
    my_consumer.start()
    my_producer.join()
    my_consumer.join()
-
Process Pool
import time
from multiprocessing import Manager, Pool
def producer(queue):
    queue.put("a")
    time.sleep(2)
def consumer(queue):
    time.sleep(2)
    data = queue.get()
    print(data)
if __name__ == "__main__":
    queue = Manager().Queue(10)  # a plain Queue cannot be passed to pool workers
    pool = Pool(2)
    pool.apply_async(producer, args=(queue,))
    pool.apply_async(consumer, args=(queue,))
    pool.close()
    pool.join()
-
Common Methods in sys Module
-
argv command line parameter list, the first is the path of the program itself
-
path returns the module search path
-
modules.keys() returns the list of all imported modules
-
exit(0) exits the program
-
Concise way to check if a, b, or c is in s
-
Using any; note that all() returns True when the iterable is empty, while any() returns False
# Method one
True in [i in s for i in [a,b,c]]
# Method two
any(i in s for i in [a,b,c])
# Method three
list(filter(lambda x:x in s,[a,b,c]))
-
Set operations
-
{1,2}.issubset({1,2,3})# Check if it is a subset
-
{1,2,3}.issuperset({1,2})
-
{}.isdisjoint({})# Check if the intersection of two sets is empty; if empty, returns True
-
Matching Chinese characters in code
-
[\u4E00-\u9FA5] matches the common range of Chinese characters (一 to 龥)
-
Check system default encoding format
import sys
sys.getdefaultencoding()  # sys.setdefaultencoding() sets the default encoding (Python 2 only)
-
getattr VS getattribute
class A(dict):
    def __getattr__(self, value):  # called only when normal attribute lookup fails
        return 2
    def __getattribute__(self, item):  # intercepts every attribute access
        return item
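A quick check of the difference (a sketch; with __getattribute__ defined as above, every lookup on A is intercepted):
class B(dict):
    def __getattr__(self, name):
        return 2             # only reached when the attribute does not exist
b = B()
print(b.missing)             # 2 -- normal lookup fails, so __getattr__ is used
print(A().anything)          # 'anything' -- __getattribute__ intercepts every lookup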
-
Class variables will not be stored in the instance __dict__, only exist in the class’s __dict__
-
globals/locals (can manipulate code indirectly)
-
globals stores all variable properties and values in the current module
-
locals stores all variable properties and values in the current environment
-
Python variable name resolution mechanism (LEGB)
-
Local scope (Local)
-
The current scope nested local scope (Enclosing locals)
-
Global/module scope (Global)
-
Built-in scope (Built-in)
-
Implement grouping of 1-100 every three
print([[x for x in range(1,101)][i:i+3] for i in range(0,100,3)])
-
What is a metaclass?
-
It is the class that creates classes. When creating a class, just set metaclass=metaclass, and the metaclass must inherit from type rather than object, because type is the metaclass
type.__bases__    # (<class 'object'>,)
object.__bases__  # ()
type(object)      # <class 'type'>
class Yuan(type):
    def __new__(cls, name, base, attr, *args, **kwargs):
        return type(name, base, attr, *args, **kwargs)
class MyClass(metaclass=Yuan):
    pass
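A quick sketch checking the result of the metaclass above:
obj = MyClass()
print(type(MyClass))             # <class 'type'> -- Yuan.__new__ delegated directly to type()
print(isinstance(obj, MyClass))  # True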
-
What is duck typing (i.e., polymorphism)?
-
Python does not default to judging parameter types during the use of passed parameters; as long as the parameters meet the execution conditions, they can be executed
-
Deep copy and shallow copy
-
A deep copy copies the content recursively; a shallow copy copies only the outer object, and its elements still reference the same objects (plain assignment only copies the reference and increases the reference count)
-
The copy module provides copy.copy (shallow copy) and copy.deepcopy (deep copy)
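A minimal sketch of the difference:
import copy
original = [1, [2, 3]]
shallow = copy.copy(original)
deep = copy.deepcopy(original)
original[1].append(4)
print(shallow)  # [1, [2, 3, 4]] -- the inner list is shared with the original
print(deep)     # [1, [2, 3]]    -- the inner list was copied recursively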
-
Unit testing
-
Generally, test classes inherit from unittest’s TestCase module
-
pytest module quick testing (methods start with test_/test files start with test_/test classes start with Test and cannot have init methods)
-
coverage statistics test coverage
import unittest
class MyTest(unittest.TestCase):
    def setUp(self):  # runs before every test case
        print('This test method is starting')
    def tearDown(self):  # runs after every test case
        print('This test method has finished')
    @classmethod
    def setUpClass(cls):  # must be decorated with @classmethod; runs once before all tests
        print('Testing starts')
    @classmethod
    def tearDownClass(cls):  # must be decorated with @classmethod; runs once after all tests
        print('Testing ends')
    def test_a_run(self):
        self.assertEqual(1, 1)  # test case
-
The GIL is released based on the number of bytecode instructions executed and on time slices; it is also released proactively when an I/O operation is encountered
-
What is monkey patch?
-
Monkey patch, replaces blocking syntax with non-blocking methods during runtime
-
What is introspection?
-
The ability to determine the type of an object at runtime, id, type, isinstance
-
Is Python pass-by-value or pass-by-reference?
-
Neither; Python passes arguments by sharing (the reference to the object is passed), and default parameter values are evaluated only once, at function definition time
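A short sketch of both points (call by sharing, and default values evaluated once):
def append_item(item, bucket=[]):  # the default list is created once, at definition time
    bucket.append(item)
    return bucket
print(append_item(1))  # [1]
print(append_item(2))  # [1, 2] -- the same default list is reused across calls
def rebind(lst):
    lst = [0]          # rebinding the local name does not affect the caller's list
def mutate(lst):
    lst.append(0)      # mutating the shared object is visible to the caller
data = [1]
rebind(data); mutate(data)
print(data)            # [1, 0]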
-
Difference between else and finally in try-except-else-finally
-
else executes when no exception occurs; finally executes regardless of whether an exception occurs
-
except can catch multiple exceptions at once, but generally for different exceptions, we capture and handle them separately
-
GIL Global Interpreter Lock
-
Only one thread can execute bytecode at a time; this is a CPython implementation detail and does not necessarily exist in other interpreters (e.g., Jython)
-
CPU-bound: multi-process + process pool
-
IO-bound: multi-thread/coroutine
-
What is Cython?
-
A tool to convert Python into C code
-
Generators and iterators
-
Iterable objects only need to implement the __iter__ method
-
Objects implementing __next__ and __iter__ methods are iterators
-
Generators are created with generator expressions or with functions containing yield (a generator is a special kind of iterator)
-
What is a coroutine?
-
yield
-
async-await
-
A lighter-weight multitasking method than threads
-
Implementation method
-
Underlying structure of dict
-
To support fast lookup, it uses a hash table as the underlying structure
-
The average time complexity of hash table lookup is O(1)
-
The CPython interpreter uses open addressing (probing) to resolve hash collisions
-
Hash expansion and hash collision solutions
-
Chaining method
-
Open addressing (probing): this is what Python uses
-
Expansion is done by allocating a new, larger table and copying the existing entries into it
-
Example of monkey patching with gevent (see the monkey patch question above):
from gevent import monkey
monkey.patch_all()  # replaces blocking calls throughout the code; specific modules can also be patched selectively
-
Check if it is a generator or coroutine
co_flags = func.__code__.co_flags
# Check if it is a coroutine
if co_flags & 0x180:
    return func
# Check if it is a generator
if co_flags & 0x20:
    return func
-
Fibonacci problem and its variations
# A frog can jump up one or two steps at a time. How many ways can it jump up n steps?
# How many ways can n 2*1 rectangles cover a 2*n rectangle without overlapping?
# Method one:
fib = lambda n: n if n <= 2 else fib(n - 1) + fib(n - 2)
# Method two:
def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return b
# A frog can jump up one or two steps at a time... it can also jump up n steps. How many ways can it jump up n steps?
fib = lambda n: n if n < 2 else 2 * fib(n - 1)
-
Get the environment variables set on the computer
import os
os.getenv(env_name,None)# Get environment variable, if it does not exist return None
-
Garbage collection mechanism
-
Reference counting
-
Mark and sweep
-
Generational garbage collection
# Check the trigger of generational garbage collection
import gc
gc.get_threshold() #output:(700, 10, 10)
-
True and False are equivalent to 1 and 0 in code and can be calculated directly with numbers; inf represents infinity
-
C10M/C10K
-
C10M: 8-core CPU, 64GB memory, maintaining 10 million concurrent connections on a 10gbps network
-
C10K: 1GHz CPU, 2GB memory, maintaining 10,000 clients providing FTP service on a 1gbps network
-
Difference between yield from and yield:
-
yield from follows an iterable object, while yield has no restrictions
-
GeneratorExit is triggered when the generator stops
-
Several uses of a single underscore
-
When defining a variable, indicates a private variable
-
When unpacking, indicates discarding useless data
-
In interactive mode, represents the result of the last executed code
-
Can be used as a digit separator in numeric literals (111_222_333)
-
In a for/while loop with an else clause, the else block does not execute if the loop exits via break
-
Decimal to binary conversion
def conver_bin(num):
    if num == 0:
        return num
    re = []
    while num:
        num, rem = divmod(num, 2)
        re.append(str(rem))
    return "".join(reversed(re))
conver_bin(10)  # '1010'
-
list1 = ['A', 'B', 'C', 'D']: how can I create new lists named after the elements of the list, i.e. A=[], B=[], C=[], D=[]?
list1 = ['A', 'B', 'C', 'D']
# Method one
for i in list1:
    globals()[i] = []  # can also be used to implement reflection in Python
# Method two
for i in list1:
    exec(f'{i} = []')  # exec executes string statements
-
memoryview and bytearray (not commonly used, just recorded)
# bytearray is mutable, bytes is immutable; slicing a memoryview does not create a new object
a = b'aaaaaa'
ma = memoryview(a)
ma.readonly  # True: a memoryview over bytes is read-only
mb = ma[:2]  # does not create a new bytes object
a = bytearray(b'aaaaaa')
ma = memoryview(a)
ma.readonly  # False: a memoryview over bytearray is writable
mb = ma[:2]  # does not create a new bytearray
mb[:2] = b'bb'  # writing to mb writes through to ma (and to a)
-
Ellipsis type
# A literal ... in code is the Ellipsis object (print(...) shows Ellipsis)
# Note: the [...] below is how a self-referencing list is printed, not an Ellipsis
L = [1, 2, 3]
L.append(L)
print(L)  # output:[1, 2, 3, [...]]
-
Lazy evaluation
class lazy(object):
    def __init__(self, func):
        self.func = func
    def __get__(self, instance, cls):
        val = self.func(instance)  # equivalent to calling area(c), where c is the Circle instance below
        setattr(instance, self.func.__name__, val)  # cache the result as an instance attribute
        return val
class Circle(object):
    def __init__(self, radius):
        self.radius = radius
    @lazy
    def area(self):
        print('evaluate')
        return 3.14 * self.radius ** 2
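Usage sketch showing that the value is computed only once:
c = Circle(4)
print(c.area)  # prints 'evaluate', then 50.24 -- the descriptor runs and caches the value
print(c.area)  # 50.24 -- now an ordinary instance attribute, no recomputation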
-
Traverse files, pass in a folder, and print out all file paths (recursively)
all_files = []
def getAllFiles(directory_path):
    import os
    for sChild in os.listdir(directory_path):
        sChildPath = os.path.join(directory_path, sChild)
        if os.path.isdir(sChildPath):
            getAllFiles(sChildPath)
        else:
            all_files.append(sChildPath)
    return all_files
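An equivalent sketch using os.walk, which handles the recursion for you:
import os
def get_all_files(directory_path):
    paths = []
    for root, _dirs, files in os.walk(directory_path):
        for name in files:
            paths.append(os.path.join(root, name))
    return paths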
-
File name handling when storing files
# secure_filename converts a string into a safe file name
from werkzeug.utils import secure_filename  # in older versions: from werkzeug import secure_filename
secure_filename("My cool movie.mov")  # output:'My_cool_movie.mov'
secure_filename("../../../etc/passwd")  # output:'etc_passwd'
secure_filename('i contain cool ümläuts.txt')  # output:'i_contain_cool_umlauts.txt'
-
Date formatting
from datetime import datetime
datetime.now().strftime("%Y-%m-%d")
import time
# time.strftime needs a struct_time such as time.localtime(); a raw timestamp (time.time()) cannot be formatted directly
time.strftime("%Y-%m-%d", time.localtime())
-
Strange issue with tuple using +=
# Raises TypeError, yet the tuple's content still changes, because t[1] keeps referring to the same list (its id is unchanged)
t = (1, [2, 3])
t[1] += [4, 5]
# Using t[1].append(...) or t[1].extend(...) raises no error and succeeds
-
__missing__ you should know
class Mydict(dict):
    def __missing__(self, key):  # called when d[key] does not find the key; its return value is used as the result
        return key
-
+ and +=
# + cannot concatenate a list with a tuple, but += can (it calls __iadd__, which behaves like extend() and accepts any iterable); + always creates a new object
# Immutable objects have no __iadd__, so += falls back to __add__ and creates a new object; that is why tuple += tuple still works
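A short sketch of the difference:
lst = [1, 2]
lst += (3, 4)         # fine: __iadd__ extends the list in place with any iterable
# lst = lst + (3, 4)  # TypeError: can only concatenate list (not "tuple") to list
t = (1, 2)
t += (3, 4)           # tuples have no __iadd__, so this is t = t + (3, 4): a new tuple
print(lst, t)         # [1, 2, 3, 4] (1, 2, 3, 4)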
-
How to turn each element of an iterable into all keys of a dictionary?
dict.fromkeys(['jim','han'],21) # output:{'jim': 21, 'han': 21}
-
Wireshark packet capture software
Networking Knowledge
-
What is HTTPS?
-
HTTP over an encrypted connection; HTTPS requires a CA-issued certificate, encrypts the data, uses port 443, and is more secure; the same website served over HTTPS also tends to rank higher in SEO
-
Common response status codes
204 No Content // Request successfully processed, no entity body returned, generally used to indicate successful deletion
206 Partial Content // GET range request has been successfully processed
303 See Other // Temporary redirect, expect to use GET to redirect to obtain
304 Not Modified // Request cached resources
307 Temporary Redirect // Temporary redirect, POST will not change to GET
401 Unauthorized // Authentication failed
403 Forbidden // Resource request denied
400 // Request parameter error
201 // Add or change successfully
503 // Server maintenance or overload
-
Idempotency and security of HTTP request methods
-
WSGI
# environ: a dict containing all HTTP request information
# start_response: a callable used to send the HTTP response status and headers
def application(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/html')])
    return [b'<h1>Hello, web!</h1>']  # the body must be an iterable of bytes
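A minimal way to serve the application above with the standard library (the port is arbitrary):
from wsgiref.simple_server import make_server
# with make_server('', 8000, application) as httpd:
#     httpd.serve_forever()  # then visit http://localhost:8000/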
-
RPC
-
CDN
-
SSL (Secure Sockets Layer), and its successor Transport Layer Security (TLS), is a security protocol that provides security and data integrity for network communications.
-
SSH (Secure Shell) is a security protocol established on the application layer; it is currently a reliable protocol designed for remote login sessions and other network services. SSH can effectively prevent information leakage during remote management processes. SSH was initially a program on UNIX systems, and later quickly expanded to other operating platforms. When used correctly, SSH can compensate for vulnerabilities in the network. SSH clients are suitable for various platforms. Almost all UNIX platforms—including HP-UX, Linux, AIX, Solaris, Digital UNIX, Irix—can run SSH.
-
TCP/IP
-
TCP: connection-oriented/reliable/byte stream-based
-
UDP: connectionless/unreliable/message-oriented
-
Three-way handshake and four-way handshake
-
Three-way handshake (SYN/SYN+ACK/ACK)
-
Four-way handshake (FIN/ACK/FIN/ACK)
-
Why is there a three-way handshake during connection but a four-way handshake during disconnection?
-
Because when the server receives the client’s SYN connection request packet, it can directly send the SYN+ACK packet. The ACK packet is used to respond, and the SYN packet is used to synchronize. However, when closing the connection, when the server receives the FIN packet, it may not immediately close the SOCKET, so it can only first reply with an ACK packet to tell the client, “I received your FIN packet.” Only after the server has finished sending all its packets can it send the FIN packet, so they cannot be sent together. Thus, four steps are needed for the handshake.
-
Why does the TIME_WAIT state need to pass through 2MSL (maximum segment lifetime) to return to the CLOSE state?
-
Although theoretically, after the four packets have been sent, we can directly enter the CLOSE state, we must assume that the network is unreliable, and the last ACK may be lost. Therefore, the TIME_WAIT state is used to resend potentially lost ACK packets.
-
XSS/CSRF
-
HttpOnly prohibits JS scripts from accessing and manipulating cookies, effectively preventing XSS
MySQL
-
Index improvement process
-
Linear structure -> binary search -> hash -> binary search tree -> balanced binary tree -> multi-way search tree -> multi-way balanced search tree (B-Tree)
-
MySQL interview summary basics
https://segmentfault.com/a/1190000018371218
-
MySQL interview summary advanced
https://segmentfault.com/a/1190000018380324
-
In-depth MySQL
http://ningning.today/2017/02/13/database/深入浅出mysql/
-
When emptying an entire table, InnoDB deletes one row at a time, while MyISAM deletes and rebuilds the table
-
text/blob data types cannot have default values; there is no case conversion during queries
-
When does an index become invalid?
-
LIKE queries starting with %
-
Implicit type conversion occurs
-
Does not satisfy the leftmost-prefix principle
-
For multi-column indexes, if the first part is not used, the index will not be used
-
Invalidation scenarios:
-
Avoid using != or <> operators in the where clause, otherwise the engine will give up using the index and perform a full table scan
-
Avoid using or to connect conditions in the where clause, otherwise it will cause the engine to give up using the index and perform a full table scan, even if some conditions have indexes, this is also why or should be used sparingly
-
If the column type is a string, be sure to quote the data in the condition, otherwise the index will not be used
-
Avoid performing function operations on fields in the where clause, which will cause the engine to give up using the index and perform a full table scan
For example:
select id from t where substring(name,1,3) = 'abc'  -- rows whose name starts with 'abc'
should be changed to:
select id from t where name like 'abc%'
For example:
select id from t where datediff(day, createdate, '2005-11-30') = 0  -- rows whose createdate is '2005-11-30'
should be changed to a range condition, for example:
select id from t where createdate >= '2005-11-30' and createdate < '2005-12-01'
-
Avoid performing functions, arithmetic operations, or other expression operations on the left side of “=” in the where clause, otherwise the system may not be able to use the index correctly
-
Avoid performing expression operations on fields in the where clause, which will cause the engine to give up using the index and perform a full table scan
For example:
select id from t where num/2 = 100
Should be changed to:
select id from t where num = 100*2;
-
Not suitable for columns with fewer key values (columns with more duplicate data), for example: set enum columns are not suitable (enum type can add null, and the default values will automatically filter out spaces; set is similar to enum but can only add 64 values)
-
If MySQL estimates that using a full table scan is faster than using an index, it will not use the index
-
What is a clustered index?
-
B+Tree leaf nodes store data or pointers
-
MyISAM index and data are separated, using non-clustered
-
InnoDB data file is the index file, the primary key index is the clustered index
Redis Command Summary
-
Why is it so fast?
-
Based on memory, written in C
-
Uses multiplexing I/O model, non-blocking I/O
-
Uses single-threading to reduce thread switching
-
Since Redis operates based on memory, CPU is not the bottleneck for Redis; the bottleneck is most likely the size of the machine’s memory or network bandwidth. Since single-threading is easy to implement, and CPU will not become a bottleneck, it is reasonable to adopt a single-threaded solution (after all, using multi-threading can be troublesome!).
-
Simple data structures
-
Built its own VM mechanism to reduce the time spent calling system functions
-
Advantages
-
High performance – Redis can read at a speed of 110,000 times/s, write at a speed of 81,000 times/s
-
Rich data types
-
Atomic – All operations of Redis are atomic, and Redis also supports atomic execution of several operations combined
-
Rich features – Redis also supports publish/subscribe, notifications, key expiration, and other features
-
What is a Redis transaction?
-
A mechanism to package multiple requests and execute multiple commands at once and in order
-
Transaction functions are implemented through multi, exec, watch, etc.
-
Python redis-py pipeline=conn.pipeline(transaction=True)
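A minimal redis-py sketch of a transactional pipeline (key names are illustrative):
import redis
conn = redis.Redis()
pipe = conn.pipeline(transaction=True)  # queued commands are wrapped in MULTI/EXEC
pipe.set("counter", 0)
pipe.incr("counter")
pipe.incr("counter")
print(pipe.execute())  # e.g. [True, 1, 2] -- results of all queued commands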
-
Persistence methods
-
RDB (snapshot)
-
save (synchronous, can guarantee data consistency)
-
bgsave (asynchronous; used by default on shutdown when AOF is not enabled)
-
AOF (append-only file)
-
How to implement a queue
-
lpush (push onto one end of a list)
-
rpop (pop from the other end of the list)
-
Common data types (Bitmaps, Hyperloglogs, range queries, etc. are not commonly used)
-
String: Counter
-
Integer or sds (Simple Dynamic String)
-
List: User’s followers, fans list
-
ziplist (continuous memory blocks, each entry node header saves information on the lengths of previous and next nodes to implement a doubly linked list function) or double linked list
-
Hash:
-
Set: User’s followers
-
intset or hashtable
-
Zset: Real-time information leaderboard
-
skiplist (skip list)
-
Difference from Memcached
-
Memcached can only store string keys
-
Memcached users can only append data to the end of an existing string using APPEND and treat this string as a list. However, when deleting these elements, Memcached uses a blacklist method to hide elements in the list, thus avoiding operations of reading, updating, and deleting elements
-
Both Redis and Memcached store data in memory and are in-memory databases. However, Memcached can also be used to cache other things, such as images, videos, etc.
-
Virtual memory – Redis can swap some rarely used values to disk when physical memory runs out
-
Data storage safety – Data is lost when Memcached crashes; Redis can periodically save to disk (persistence)
-
Different application scenarios: Redis can be used as a NoSQL database as well as for message queues, data stacks, and data caching; Memcached is suitable for caching SQL statements, data sets, user temporary data, delayed query data, and sessions, etc.
-
Redis implements distributed locks
-
Use setnx to implement locking, can also add expiration time through expire
-
The lock’s value can be a random UUID or a specific name
-
When releasing the lock, determine if it is that lock by the UUID, and if so, execute delete to release the lock
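A minimal sketch with redis-py (the key name and timeout are illustrative; a production lock would make the check-and-delete on release atomic, e.g. with a Lua script):
import uuid
import redis
conn = redis.Redis()
def acquire_lock(name, timeout=10):
    identifier = str(uuid.uuid4())
    # SET key value NX EX timeout == setnx + expire in one atomic command
    if conn.set(f"lock:{name}", identifier, nx=True, ex=timeout):
        return identifier
    return None
def release_lock(name, identifier):
    key = f"lock:{name}"
    if conn.get(key) == identifier.encode():  # only the holder may release
        conn.delete(key)
        return True
    return False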
-
Common issues
-
Cache avalanche
-
Cache data expires in a short time, a large number of requests access the database
-
Cache penetration
-
When requesting to access data, it does not exist in the cache or the database
-
Cache warming
-
Initialize the project, add some commonly used data to the cache
-
Cache update
-
Data expires, update cached data
-
Cache degradation
-
When the access volume surges, services experience problems (such as slow response times or no response) or non-core services affect the performance of core processes, it is still necessary to ensure that the service remains available, even if it means degraded service. The system can automatically degrade based on some key data or configure a switch for manual degradation
-
Consistent Hashing Algorithm
-
Ensure data consistency when using clusters
-
Implement a distributed lock based on Redis, requiring a timeout parameter
-
setnx
-
Virtual memory
-
Memory jitter
Linux
-
Five I/O models in Unix
-
Blocking I/O
-
Non-blocking I/O
-
Multiplexing I/O (the selectors module implements I/O multiplexing in Python)
-
select
-
Suitable when concurrency is not high but the connections are very active
-
poll
-
Not much improvement over select
-
epoll
-
Suitable for cases with many connections but few active links
-
Signal-driven I/O
-
Asynchronous I/O (Gevent/Asyncio implements asynchrony)
-
Command manual better than man
-
tldr: A manual with command examples
-
Difference between kill -9 and -15
-
-15 (SIGTERM): asks the program to terminate; it may stop immediately, stop after releasing its resources, or ignore the signal and keep running
-
-9 (SIGKILL): because of the uncertainty of -15, -9 is used to kill the process immediately
-
Paging mechanism (logical address and physical address separation memory allocation management scheme):
-
The operating system manages memory efficiently to reduce fragmentation
-
Logical addresses of programs are divided into fixed-size pages
-
Physical addresses are divided into frames of the same size
-
Corresponding logical addresses and physical addresses through page tables
-
Segmentation mechanism
-
To meet some logical requirements of code
-
Data sharing/data protection/dynamic linking
-
Each segment is allocated contiguous memory, while segments are allocated discretely
-
How to check CPU memory usage?
-
top
-
free to check available memory, troubleshoot memory leak issues
Design Patterns
Singleton Pattern
# Method one: decorator
def Single(cls, *args, **kwargs):
    instances = {}
    def get_instance(*args, **kwargs):
        if cls not in instances:
            instances[cls] = cls(*args, **kwargs)
        return instances[cls]
    return get_instance
@Single
class B:
    pass
# Method two: module-level instance
class Single:
    def __init__(self):
        print("Singleton pattern implementation method two...")
single = Single()
del Single  # afterwards, always use the `single` instance
# Method three (most commonly used): override __new__
class Single:
    def __new__(cls, *args, **kwargs):
        if not hasattr(cls, '_instance'):
            cls._instance = super().__new__(cls)  # object.__new__ takes no extra arguments
        return cls._instance
Factory Pattern
class Dog:
    def __init__(self):
        print("Wang Wang Wang")
class Cat:
    def __init__(self):
        print("Miao Miao Miao")
def fac(animal):
    if animal.lower() == "dog":
        return Dog()
    if animal.lower() == "cat":
        return Cat()
    print("Sorry, must be: dog, cat")
Builder Pattern
class Computer:
    def __init__(self, serial_number):
        self.serial_number = serial_number
        self.memory = None
        self.hdd = None
        self.gpu = None
    def __str__(self):
        info = (f'Memory: {self.memory}GB',
                f'Hard Disk: {self.hdd}GB',
                f'Graphics Card: {self.gpu}')
        return '\n'.join(info)
class ComputerBuilder:
    def __init__(self):
        self.computer = Computer('Jim1996')
    def configure_memory(self, amount):
        self.computer.memory = amount
        return self  # return self to allow chained calls
    def configure_hdd(self, amount):
        self.computer.hdd = amount
        return self
    def configure_gpu(self, gpu_model):
        self.computer.gpu = gpu_model
        return self
class HardwareEngineer:
    def __init__(self):
        self.builder = None
    def construct_computer(self, memory, hdd, gpu):
        self.builder = ComputerBuilder()
        self.builder.configure_memory(memory).configure_hdd(hdd).configure_gpu(gpu)
    @property
    def computer(self):
        return self.builder.computer
Built-in Data Structures and Algorithms in Python
Python Implementation of Various Data Structures
Quick Sort
def quick_sort(_list):
    if len(_list) < 2:
        return _list
    pivot = _list[0]
    left_list = [i for i in _list[1:] if i < pivot]
    right_list = [i for i in _list[1:] if i >= pivot]
    return quick_sort(left_list) + [pivot] + quick_sort(right_list)
Selection Sort
def select_sort(seq):
    n = len(seq)
    for i in range(n - 1):
        min_idx = i
        for j in range(i + 1, n):
            if seq[j] < seq[min_idx]:
                min_idx = j
        if min_idx != i:
            seq[i], seq[min_idx] = seq[min_idx], seq[i]
Insertion Sort
def insertion_sort(_list):
    n = len(_list)
    for i in range(1, n):
        value = _list[i]
        pos = i
        while pos > 0 and value < _list[pos - 1]:
            _list[pos] = _list[pos - 1]
            pos -= 1
        _list[pos] = value
    return _list
Merge Sort
def merge_sorted_list(_list1, _list2):  # merge two sorted lists
    len_a, len_b = len(_list1), len(_list2)
    a = b = 0
    sort = []
    while len_a > a and len_b > b:
        if _list1[a] > _list2[b]:
            sort.append(_list2[b])
            b += 1
        else:
            sort.append(_list1[a])
            a += 1
    if len_a > a:
        sort.extend(_list1[a:])  # extend, not append, so the remaining elements stay flat
    if len_b > b:
        sort.extend(_list2[b:])
    return sort
def merge_sort(_list):
    if len(_list) < 2:
        return _list
    mid = len(_list) // 2
    left = merge_sort(_list[:mid])
    right = merge_sort(_list[mid:])
    return merge_sorted_list(left, right)
Heap Sort using heapq module
from heapq import nsmallest
def heap_sort(_list):
    return nsmallest(len(_list), _list)
Stack
from collections import deque
class Stack:
    def __init__(self):
        self.s = deque()
    def peek(self):
        p = self.pop()
        self.push(p)
        return p
    def push(self, el):
        self.s.append(el)
    def pop(self):
        return self.s.pop()  # pop from the underlying deque, not from self
Queue
from collections import deque
class Queue:
    def __init__(self):
        self.s = deque()
    def push(self, el):
        self.s.append(el)
    def pop(self):
        return self.s.popleft()  # pop from the head of the underlying deque
Binary Search
def binary_search(_list, num):
    if len(_list) < 1:
        return False
    mid = len(_list) // 2
    if num > _list[mid]:
        return binary_search(_list[mid + 1:], num)  # search the right half
    elif num < _list[mid]:
        return binary_search(_list[:mid], num)      # search the left half
    else:
        return True  # found (indices shift after slicing, so return a flag rather than an index)
Interview Strengthening Questions:
About database optimization and design
https://segmentfault.com/a/1190000018426586
-
How to implement a queue using two stacks
-
Reverse a linked list
-
Merge two sorted linked lists
-
Delete a node in a linked list
-
Reverse a binary tree
-
Design a short URL service? 62-base implementation
-
Design a flash sale system (feed stream)?
https://www.jianshu.com/p/ea0259d109f9
-
Why is it better to use auto-incrementing integers as primary keys in MySQL databases? Can UUID be used? Why?
-
If the write order of InnoDB table data can be consistent with the order of the leaf nodes of the B+ tree index, the storage and retrieval efficiency will be the highest. To optimize storage and query performance, auto-incrementing IDs should be used as primary keys.
-
For the primary index of InnoDB, data will be sorted according to the primary key. Due to the unordered nature of UUID, InnoDB will generate huge IO pressure, so it is not suitable to use UUID as a physical primary key; it can be used as a logical primary key, while the physical primary key still uses auto-incrementing IDs. To ensure global uniqueness, UUID should be used to index and associate other tables or as foreign keys.
-
If we are in a distributed system, how do we generate auto-incrementing IDs in databases?
-
Use Redis
-
Implement a distributed lock based on Redis, requiring a timeout parameter
-
setnx
-
setnx + expire
-
If a single Redis node crashes, how to handle it? Are there other industry solutions for implementing distributed locks?
-
Use a consistent hashing algorithm
Cache Algorithms
-
LRU (least-recently-used): Replace the least recently used objects
-
LFU (Least frequently used): If a piece of data has been used very few times in a recent period, it is unlikely to be used in the future
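A minimal LRU cache sketch using OrderedDict (the capacity and method names are illustrative):
from collections import OrderedDict
class LRUCache:
    def __init__(self, capacity=128):
        self.capacity = capacity
        self.data = OrderedDict()
    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)         # mark as most recently used
        return self.data[key]
    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used entry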
Server Performance Optimization Directions
-
Use data structures and algorithms
-
Database
-
Index optimization
-
Eliminate slow queries
-
Enable the slow query log (slow_query_log, slow_query_log_file) to record slow queries
-
Use explain to troubleshoot index issues
-
Adjust data modification indexes
-
Batch operations to reduce IO operations
-
Use NoSQL: e.g., Redis
-
Network IO
-
Batch operations
-
pipeline
-
Cache
-
Redis
-
Asynchronous
-
Asyncio implements asynchronous operations
-
Use Celery to reduce IO blocking
-
Concurrency
-
Multithreading
-
Gevent