This is a summary of key Python knowledge compiled by developer @TwentyOne on SegmentFault. It covers a lot of ground, so it runs long; it is the result of the author's long-term accumulation of notes.

Py2 VS Py3
-
print is a function in Python 3; in Python 2 it was a statement
-
No more unicode objects, default str is unicode
-
In python3, division returns a float
-
No long type anymore
-
xrange does not exist, range replaces xrange
-
Functions and variable names can be defined in Chinese
-
Advanced unpacking and * unpacking
-
Parameters after * are keyword-only and must be passed as name=value (see the short sketch at the end of this section)
-
raise from
-
iteritems removed and changed to items()
-
yield from links to sub-generators
-
asyncio, async/await natively support asynchronous programming
-
New/updated standard-library modules: enum, mock, ipaddress, concurrent.futures, asyncio, urllib, selectors
-
Different enumeration classes cannot be compared
-
Only equality comparison is allowed within the same enumeration class
-
Usage of enumeration class (numbering starts from 1 by default)
-
To avoid the occurrence of the same enumeration value in the enumeration class, the @unique decorator can be used on the enumeration class
# Notes on enumerations
from enum import Enum
class COLOR(Enum):
    YELLOW = 1
    # YELLOW = 2  # would raise an error: a member name cannot be defined twice
    GREEN = 1     # no error: same value, so GREEN becomes an alias of YELLOW
    BLACK = 3
    RED = 4
print(COLOR.GREEN)  # COLOR.YELLOW -- the alias still prints as YELLOW
for i in COLOR:     # iterating over COLOR does not include the alias GREEN
    print(i)
# COLOR.YELLOW
# COLOR.BLACK
# COLOR.RED
# How to traverse aliases as well
for i in COLOR.__members__.items():
    print(i)
# output: ('YELLOW', <COLOR.YELLOW: 1>)
# ('GREEN', <COLOR.YELLOW: 1>)
# ('BLACK', <COLOR.BLACK: 3>)
# ('RED', <COLOR.RED: 4>)
for i in COLOR.__members__:
    print(i)
# output: YELLOW
# GREEN
# BLACK
# RED
# Enum conversion
# Prefer storing the enum's numeric value in the database rather than the name string, and use the enum class in code
a = 1
print(COLOR(a))  # output: COLOR.YELLOW
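A short sketch of several Py3-only items from the list above (advanced unpacking, keyword-only arguments, yield from); all names here are illustrative.
# advanced unpacking
first, *rest, last = [1, 2, 3, 4, 5]  # first=1, rest=[2, 3, 4], last=5
# keyword-only arguments: parameters after * must be passed as name=value
def connect(host, *, port=8080, timeout=3):
    return (host, port, timeout)
connect("localhost", port=9000)  # connect("localhost", 9000) would raise TypeError
# yield from delegates to a sub-generator
def sub():
    yield 1
    yield 2
def outer():
    yield 0
    yield from sub()
print(list(outer()))  # [0, 1, 2]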
py2/3 Conversion Tools
-
six module: a module compatible with python2 and python3
-
2to3 tool: changes code syntax version
-
__future__: use features from the next version
Common Libraries
-
Collections that must be known
https://segmentfault.com/a/1190000017385799
-
Python sorting operations and heapq module
https://segmentfault.com/a/1190000017383322
-
itertools module super practical methods
https://segmentfault.com/a/1190000017416590
Less Common but Very Important Libraries
-
dis (code bytecode analysis)
-
inspect (generator state)
-
cProfile (performance analysis)
-
bisect (maintain ordered list)
-
fnmatch
-
fnmatch(string, "*.txt")  # case-insensitive on Windows
-
fnmatch's case sensitivity depends on the operating system
-
fnmatchcase is always case sensitive
-
timeit (code execution time)
def isLen(strString):
    # the ternary-expression version; it is the faster one
    return True if len(strString) > 6 else False
def isLen1(strString):
    # note the positions of False and True here
    return [False, True][len(strString) > 6]
import timeit
print(timeit.timeit('isLen1("5fsdfsdfsaf")', setup="from __main__ import isLen1"))
print(timeit.timeit('isLen("5fsdfsdfsaf")', setup="from __main__ import isLen"))
-
contextlib
-
@contextlib.contextmanager makes generator functions into context managers
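A minimal sketch of the point above (the file name is illustrative): the code before the yield runs on entering the with-block, the code after it runs on exit, like __enter__/__exit__.
import contextlib
@contextlib.contextmanager
def open_file(path):
    f = open(path, "w")
    try:
        yield f       # the body of the with-block runs here
    finally:
        f.close()     # always runs on exit
with open_file("demo.txt") as f:  # "demo.txt" is just an example name
    f.write("hello")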
-
types (contains all types of type objects defined by the standard interpreter, can decorate generator functions as asynchronous)
import types
types.coroutine # equivalent to implementing __await__
-
html (implement HTML escaping)
import html
html.escape("<h1>I'm Jim</h1>") # output:'&lt;h1&gt;I&#x27;m Jim&lt;/h1&gt;'
html.unescape('&lt;h1&gt;I&#x27;m Jim&lt;/h1&gt;') # <h1>I'm Jim</h1>
-
mock (solving test dependencies)
-
concurrent (creating process pools and thread pools)
from concurrent.futures import ThreadPoolExecutor, as_completed, wait
pool = ThreadPoolExecutor()
task = pool.submit(function_name, (parameters))  # submit does not block; it returns a Future immediately
task.done()      # whether the task has finished
task.result()    # blocking call; returns the task's return value
task.cancel()    # cancel a task that has not started yet; returns True if cancelled successfully
task.add_done_callback()  # register a callback to run when the task finishes
task.running()   # whether the task is currently executing; task is a Future object
for data in pool.map(function, parameter_list):  # yields the results of completed tasks, in the order the parameters were given
    print(data)  # the result of each completed task
as_completed(task_list)  # yields tasks as they complete, one as soon as it finishes
wait(task_list, return_when=condition)  # block the main thread until the condition is met (four conditions are available)
-
selectors (wraps select; used for IO-multiplexed programming)
-
asyncio
future = asyncio.ensure_future(coroutine)  # equivalent to future = loop.create_task(coroutine)
future.add_done_callback()  # add a callback to run after completion
loop.run_until_complete(future)
future.result()  # get the return value
asyncio.wait() accepts an iterable of coroutine objects
asyncio.gather(*iterable_object, *iterable_object) gives the same results, but gather can be cancelled in bulk via the gather object's cancel()
There is only one loop per thread
After loop.stop() you must call loop.run_forever() again, otherwise an error occurs
loop.run_forever() can also run non-coroutine callbacks
Finally, call loop.close() in a finally block
asyncio.Task.all_tasks() gets all tasks; iterate over them and call task.cancel() to cancel them
partial(function, parameters) wraps a function under another name; the bound parameters go before the wrapped function's remaining parameters
loop.call_soon(function, parameters)
call_soon_threadsafe() is the thread-safe version
loop.call_later(time, function, parameters)
Within the same block of code, call_soon callbacks run first, then the call_later callbacks run in ascending order of their delays
If you must run blocking code,
use loop.run_in_executor(executor, function, parameters) to wrap it in a thread pool, put the returned futures into a task list, then run them with wait(task_list)
Implementing an HTTP request via asyncio:
reader, writer = await asyncio.open_connection(host, port)
writer.write()  # send the request
async for data in reader:
    data = data.decode("utf-8")
    html_list.append(data)
# afterwards html_list contains the HTML
as_completed(tasks) yields each task as soon as it completes; it returns an iterable
Coroutine lock:
async with Lock():
    ...
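A minimal runnable sketch tying the notes above together; it uses the explicit loop API to match the notes (since Python 3.7, asyncio.run() is the usual entry point), and the coroutine name is illustrative.
import asyncio
async def fetch(n):
    await asyncio.sleep(0.1)
    return n * 2
loop = asyncio.get_event_loop()
task = loop.create_task(fetch(21))  # same as asyncio.ensure_future(fetch(21))
task.add_done_callback(lambda t: print("done:", t.result()))
loop.run_until_complete(task)
print(task.result())  # 42
loop.close()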
Python Advanced
-
Inter-process communication:
-
Manager (built-in many data structures, can achieve memory sharing between multiple processes)
from multiprocessing import Manager, Process
def add_data(p_dict, key, value):
    p_dict[key] = value
if __name__ == "__main__":
    progress_dict = Manager().dict()
    first_progress = Process(target=add_data, args=(progress_dict, "bobby1", 22))
    second_progress = Process(target=add_data, args=(progress_dict, "bobby2", 23))
    first_progress.start()
    second_progress.start()
    first_progress.join()
    second_progress.join()
    print(progress_dict)
-
Pipe (suitable for two processes)
from multiprocessing import Pipe, Process
# Pipe has higher performance than Queue
def producer(pipe):
    pipe.send("bobby")
def consumer(pipe):
    print(pipe.recv())
if __name__ == "__main__":
    # a Pipe can only be used between two processes
    receive_pipe, send_pipe = Pipe()
    my_producer = Process(target=producer, args=(send_pipe,))
    my_consumer = Process(target=consumer, args=(receive_pipe,))
    my_producer.start()
    my_consumer.start()
    my_producer.join()
    my_consumer.join()
-
Queue (cannot be used in process pools, inter-process communication requires Manager().Queue())
import time
from multiprocessing import Queue, Process
def producer(queue):
    queue.put("a")
    time.sleep(2)
def consumer(queue):
    time.sleep(2)
    data = queue.get()
    print(data)
if __name__ == "__main__":
    queue = Queue(10)
    my_producer = Process(target=producer, args=(queue,))
    my_consumer = Process(target=consumer, args=(queue,))
    my_producer.start()
    my_consumer.start()
    my_producer.join()
    my_consumer.join()
-
Process pool
import time
from multiprocessing import Manager, Pool
def producer(queue):
    queue.put("a")
    time.sleep(2)
def consumer(queue):
    time.sleep(2)
    data = queue.get()
    print(data)
if __name__ == "__main__":
    queue = Manager().Queue(10)
    pool = Pool(2)
    pool.apply_async(producer, args=(queue,))
    pool.apply_async(consumer, args=(queue,))
    pool.close()
    pool.join()
-
Common methods of sys module
-
argv command line parameter list, the first is the path of the program itself
-
path returns the module search path
-
modules.keys() returns the list of all imported modules
-
exit(0) exit the program
-
a in s or b in s or c in s shorthand
-
prefer any(); note that all() returns True for an empty iterable, while any() returns False for it
# Method one
True in [i in s for i in [a,b,c]]
# Method two
any(i in s for i in [a,b,c])
# Method three
list(filter(lambda x:x in s,[a,b,c]))
-
set collection application
-
{1,2}.issubset({1,2,3})# check if it is a subset
-
{1,2,3}.issuperset({1,2})
-
{1,2}.isdisjoint({3,4})# True if the two sets have no elements in common
-
Chinese matching in code
-
[\u4E00-\u9FA5] matches the range of common Chinese characters (一 through 龥)
-
Check the system’s default encoding format
import sys
sys.getdefaultencoding() # sys.setdefaultencoding() sets it (Python 2 only; removed in Python 3)
-
getattr VS getattribute
class A(dict):
    def __getattr__(self, value):       # called when an attribute that does not exist is accessed
        return 2
    def __getattribute__(self, item):   # intercepts all attribute access
        return item
-
Class variables are not stored in instance __dict__, only exist in class’s __dict__
-
globals/locals (can indirectly operate code)
-
globals contains all variable properties and values in the current module
-
locals contains all variable properties and values in the current environment
-
Python variable name resolution mechanism (LEGB)
-
Local scope (Local)
-
The current scope embedded local scope (Enclosing locals)
-
Global/module scope (Global)
-
Built-in scope (Built-in)
-
Implement grouping from 1-100 every three
print([[x for x in range(1,101)][i:i+3] for i in range(0,100,3)])
-
What is a metaclass?
-
It is a class that creates classes. When defining a class, set metaclass=YourMetaclass to use one; the metaclass must inherit from type rather than object, because type is the metaclass of all classes.
type.__bases__    # (<class 'object'>,)
object.__bases__  # ()
type(object)      # <class 'type'>
class Yuan(type):
    def __new__(cls, name, bases, attrs, *args, **kwargs):
        return type(name, bases, attrs, *args, **kwargs)
class MyClass(metaclass=Yuan):
    pass
-
What is duck typing (a form of polymorphism)?
-
Python does not check argument types when a function is called; as long as the argument supports the operations the function performs, it can be used
-
Deep copy and shallow copy
-
A deep copy copies content recursively; a shallow copy copies only the outer object, so nested objects are shared references (their reference counts increase)
-
copy module implements deep copy
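A small sketch of the difference (names are illustrative):
import copy
src = {"nums": [1, 2, 3]}
shallow = copy.copy(src)    # new outer dict, but the inner list is shared
deep = copy.deepcopy(src)   # the inner list is duplicated too
src["nums"].append(4)
print(shallow["nums"])  # [1, 2, 3, 4]
print(deep["nums"])     # [1, 2, 3]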
-
Unit testing
-
Generally, the test class inherits from the TestCase in the unittest module
-
pytest module quick testing (methods starting with test_/test files starting with test_/test classes starting with Test and cannot have init method)
-
coverage statistics testing coverage
import unittest
class MyTest(unittest.TestCase):
    def setUp(self):            # runs before each test case
        print('test case starting')
    def tearDown(self):         # runs after each test case
        print('test case finished')
    @classmethod
    def setUpClass(cls):        # must be a @classmethod; runs once before all tests
        print('testing starts')
    @classmethod
    def tearDownClass(cls):     # must be a @classmethod; runs once after all tests have run
        print('testing ends')
    def test_a_run(self):
        self.assertEqual(1, 1)  # a test case
-
GIL releases based on the number of executed bytecode lines and time slices, GIL actively releases when encountering IO operations
-
What is monkey patch?
-
Replacing attributes or methods at runtime, e.g. swapping blocking calls for non-blocking ones (see the gevent example further down)
-
What is introspection?
-
The ability to determine the type of an object at runtime, id, type, isinstance
-
Is Python pass-by-value or pass-by-reference?
-
Neither; Python uses call by sharing (object references are passed). Default parameter values are evaluated only once, when the function is defined
-
Difference between else and finally in try-except-else-finally
-
else executes when no exception occurs, finally executes regardless of whether an exception occurs
-
except can catch multiple exceptions at once, but generally for different exceptions, we capture them separately
-
GIL global interpreter lock
-
Only one thread can execute Python bytecode at a time; this is a characteristic of the CPython interpreter, and some other interpreters (e.g. Jython) do not have it
-
CPU-intensive: multi-process + process pool
-
IO-intensive: multi-threading/coroutines
-
What is Cython
-
A superset of Python that is compiled to C code (used to speed Python up)
-
Generators and iterators
-
Iterable objects only need to implement the __iter__ method
-
Objects that implement __next__ and __iter__ methods are iterators
-
Using generator expressions or generator functions with yield (a generator is a special kind of iterator)
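A short sketch of the three bullets above (class and function names are illustrative):
class Countdown:
    def __init__(self, start):
        self.start = start
    def __iter__(self):                 # iterable: only needs __iter__, which returns an iterator
        return CountdownIterator(self.start)
class CountdownIterator:
    def __init__(self, current):
        self.current = current
    def __iter__(self):
        return self
    def __next__(self):                 # iterator: implements __next__ as well
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1
def countdown_gen(start):               # generator function: a special kind of iterator
    while start > 0:
        yield start
        start -= 1
print(list(Countdown(3)))      # [3, 2, 1]
print(list(countdown_gen(3)))  # [3, 2, 1]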
-
What is a coroutine
-
A more lightweight way of multitasking than threads
-
Implementation methods:
-
yield
-
async-await
-
Dict underlying structure
-
In order to support fast lookup, a hash table is used as the underlying structure
-
The average time complexity for hash table lookup is O(1)
-
The CPython interpreter uses open addressing (probing) to resolve hash collisions
-
Hash expansion and hash collision resolution solutions
-
Chaining method
-
Open addressing (probing): the method Python uses
-
Expansion is done by allocating a new, larger table and copying the entries into it
-
gevent monkey patch example (for the monkey patch question above):
from gevent import monkey
monkey.patch_all() # patches all blocking standard-library calls in the code; specific modules can also be patched individually
-
Check if it is a generator or coroutine
co_flags = func.__code__.co_flags
# Check whether it is a coroutine (CO_COROUTINE | CO_ITERABLE_COROUTINE)
if co_flags & 0x180:
    return func
# Check whether it is a generator (CO_GENERATOR)
if co_flags & 0x20:
    return func
-
Fibonacci problems and variations
# A frog can jump up 1 step or 2 steps at a time. How many ways can it jump up n steps?
# Equivalent problem: how many ways can n 2*1 rectangles cover a 2*n rectangle without overlapping?
# Method one:
fib = lambda n: n if n <= 2 else fib(n - 1) + fib(n - 2)
# Method two:
def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return b
# If the frog can also jump 3, 4, ... up to n steps at a time, how many ways are there for n steps?
fib = lambda n: n if n < 2 else 2 * fib(n - 1)
-
Get the environment variables set on the computer
import os
os.getenv(env_name, None)  # get the environment variable; returns None if it does not exist
-
Garbage collection mechanism
-
Reference counting
-
Mark and sweep
-
Generational collection
# Check the threshold for generational collection
import gc
gc.get_threshold() #output:(700, 10, 10)
-
True and False are equivalent to 1 and 0 and can be used directly in arithmetic; float('inf') represents infinity
-
C10M/C10K
-
C10M: 8-core CPU, 64G memory, maintaining 10 million concurrent connections on a 10gbps network
-
C10K: 1GHz CPU, 2G memory, maintaining 10 thousand clients providing FTP service under 1gbps network environment
-
Difference between yield from and yield:
-
yield from follows an iterable object, while yield has no restrictions
-
GeneratorExit triggers when the generator stops
-
Several uses of single underscore
-
When defining a variable, it indicates a private variable
-
When unpacking, it indicates discarding useless data
-
In interactive mode, it represents the result of the last code execution
-
Can be used as a digit separator in numeric literals (111_222_333)
-
If a loop exits via break, its else clause is not executed (see the tiny example below)
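for n in [1, 3, 5]:
    if n % 2 == 0:
        print("found an even number")
        break
else:
    print("no even number found")  # runs because the loop never hit break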
-
Decimal to binary conversion
def convert_bin(num):
    if num == 0:
        return str(num)
    re = []
    while num:
        num, rem = divmod(num, 2)
        re.append(str(rem))
    return "".join(reversed(re))
convert_bin(10)  # '1010'
-
list1 = ['A', 'B', 'C', 'D']; how can I get new lists named after the elements, i.e. A=[], B=[], C=[], D=[]?
list1 = ['A', 'B', 'C', 'D']
# Method one
for i in list1:
    globals()[i] = []  # this trick can also be used to implement a Python version of reflection
# Method two
for i in list1:
    exec(f'{i} = []')  # exec executes string statements
-
memoryview and bytearray (not commonly used, just recorded)
# bytearray is mutable, bytes is immutable; slicing a memoryview creates no new object
a = b'aaaaaa'
ma = memoryview(a)
ma.readonly  # True: a read-only memoryview over bytes
mb = ma[:2]  # does not create a new bytes object
a = bytearray(b'aaaaaa')
ma = memoryview(a)
ma.readonly  # False: a writable memoryview over bytearray
mb = ma[:2]  # does not create a new bytearray
mb[:2] = b'bb'  # changing mb also changes ma (and the underlying bytearray)
-
Ellipsis type
# The literal ... in code is the Ellipsis object
L = [1, 2, 3]
L.append(L)
print(L)  # output: [1, 2, 3, [...]] -- note this [...] is how repr marks a recursive list, not the Ellipsis object
-
Lazy evaluation
class lazy(object):
    def __init__(self, func):
        self.func = func
    def __get__(self, instance, cls):
        val = self.func(instance)  # equivalent to calling area(c), where c is the Circle instance below
        setattr(instance, self.func.__name__, val)
        return val
class Circle(object):
    def __init__(self, radius):
        self.radius = radius
    @lazy
    def area(self):
        print('evaluate')
        return 3.14 * self.radius ** 2
-
Traverse files, pass in a folder, print out all file paths inside (recursively)
import os
all_files = []
def getAllFiles(directory_path):
    for sChild in os.listdir(directory_path):
        sChildPath = os.path.join(directory_path, sChild)
        if os.path.isdir(sChildPath):
            getAllFiles(sChildPath)
        else:
            all_files.append(sChildPath)
    return all_files
-
When storing files, handling file names
# secure_filename turns a string into a safe file name
from werkzeug.utils import secure_filename
secure_filename("My cool movie.mov")   # output: My_cool_movie.mov
secure_filename("../../../etc/passwd") # output: etc_passwd
secure_filename(u'i contain cool \xfcml\xe4uts.txt')  # output: i_contain_cool_umlauts.txt
-
Date formatting
from datetime import datetime
datetime.now().strftime("%Y-%m-%d")
import time
# strftime needs a struct_time (e.g. from time.localtime()); a raw time.time() timestamp cannot be formatted directly
time.strftime("%Y-%m-%d", time.localtime())
-
Strange issues with tuple using +=
# Raises TypeError, yet the tuple's content still changes, because the list object at t[1] keeps the same id
t = (1, [2, 3])
t[1] += [4, 5]
# Using t[1].append(...) or t[1].extend(...) does not raise and succeeds
-
__missing__ you should know
class Mydict(dict):
    def __missing__(self, key):  # called when a key accessed via d[key] does not exist; its return value is used as the result
        return key
-
+ and +=
# + cannot concatenate a list with a tuple, while += can (+= goes through __iadd__, whose internal implementation works like extend() and accepts any iterable); + always creates a new object
# Immutable objects have no __iadd__ method, so += on them falls back to __add__; that is why a tuple can still use += with another tuple (a new tuple is created)
-
How to turn each element of an iterable into all keys of a dictionary?
dict.fromkeys(['jim','han'],21) # output:{'jim': 21, 'han': 21}
-
Wireshark packet capture software
Network Knowledge
-
What is HTTPS?
-
Secure HTTP: HTTP over TLS. It requires a CA-issued certificate, encrypts the data, and uses port 443; for the same site, the HTTPS version ranks higher in SEO
-
Common response status codes
204 No Content // request processed successfully, no body returned; often used after a successful DELETE
206 Partial Content // a range (partial GET) request was processed successfully
303 See Other // redirect; the client should fetch the new location with GET
304 Not Modified // the cached resource is still valid; use the local copy
307 Temporary Redirect // temporary redirect; the method is preserved (POST does not become GET)
401 Unauthorized // authentication failed
403 Forbidden // access to the resource is refused
400 Bad Request // request parameter error
201 Created // resource created or changed successfully
503 Service Unavailable // server maintenance or overload
-
Idempotence and safety of HTTP request methods
-
WSGI
# environ: a dict containing all the HTTP request information
# start_response: a function used to begin sending the HTTP response
def application(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/html')])
    return [b'<h1>Hello, web!</h1>']  # a WSGI app returns an iterable of bytes
-
RPC
-
CDN
-
SSL (Secure Sockets Layer) and its successor TLS (Transport Layer Security) are security protocols that provide confidentiality and data integrity for network communication.
-
SSH (Secure Shell) is a security protocol defined by the IETF Network Working Group. It is built on the application layer and is currently one of the more reliable protocols for remote login sessions and other network services; using it effectively prevents information leakage during remote management. SSH started as a program on UNIX systems and quickly spread to other platforms; when used correctly it can compensate for weaknesses in the network. SSH clients exist for nearly all UNIX platforms, including HP-UX, Linux, AIX, Solaris, Digital UNIX, and Irix.
-
TCP/IP
-
TCP: connection-oriented/reliable/byte stream based
-
UDP: connectionless/unreliable/message-oriented
-
Three-way handshake and four-way handshake
-
Three-way handshake (SYN/SYN+ACK/ACK)
-
Four-way handshake (FIN/ACK/FIN/ACK)
-
Why is it three-way handshake when connecting, but four-way handshake when closing?
-
Because when the Server side receives the Client side’s SYN connection request packet, it can directly send the SYN+ACK packet. The ACK packet is used for acknowledgment, and the SYN packet is used for synchronization. However, when closing the connection, when the Server side receives the FIN packet, it may not immediately close the SOCKET, so it can only reply with an ACK packet to tell the Client side, “I received the FIN packet you sent.” Only after the Server side has sent all its packets can it send the FIN packet, so they cannot be sent together. Hence, four steps are needed to close.
-
Why does the TIME_WAIT state need to go through 2MSL (maximum segment lifetime) to return to the CLOSE state?
-
Although theoretically, after all four packets are sent, we can directly enter the CLOSE state, but we must assume that the network is unreliable, and the last ACK may be lost. Therefore, the TIME_WAIT state is used to resend the possibly lost ACK packet.
-
XSS/CSRF
-
HttpOnly prevents JS scripts from accessing and manipulating Cookies, effectively preventing XSS
MySQL
-
Index improvement process
-
Linear structure -> binary search -> hash -> binary search tree -> balanced binary tree -> multi-way search tree -> multi-way balanced search tree (B-Tree)
-
MySQL interview summary basic version
https://segmentfault.com/a/1190000018371218
-
MySQL interview summary advanced version
https://segmentfault.com/a/1190000018380324
-
Deep dive into MySQL
http://ningning.today/2017/02/13/database/深入浅出mysql/
-
When emptying the entire table, InnoDB deletes row by row, while MyISAM deletes and rebuilds the table
-
text/blob data types cannot have default values, and there is no case conversion during queries
-
When does an index become invalid?
-
like fuzzy queries starting with %
-
Implicit type conversion occurs
-
The leftmost prefix principle is not satisfied
-
For multi-column indexes, if the first part is not used, the index will not be used
-
Invalid scenarios:
-
Try to avoid using != or <> operators in the where clause, otherwise the engine will abandon the use of the index and perform a full table scan
-
Try to avoid using or to connect conditions in the where clause, otherwise it will cause the engine to abandon the use of the index and perform a full table scan, even if some conditions have indexes, this is also why it is advisable to use or less
-
If the column type is a string, it must be enclosed in quotes in the condition, otherwise the index will not be used
-
Try to avoid performing function operations on fields in the where clause, as this will cause the engine to abandon the use of the index and perform a full table scan
For example:
select id from t where substring(name,1,3) = 'abc'   -- ids whose name starts with abc
should be changed to:
select id from t where name like 'abc%'
For example:
select id from t where datediff(day, createdate, '2005-11-30') = 0   -- rows whose createdate is '2005-11-30'
should be changed to a range condition, e.g.:
select id from t where createdate >= '2005-11-30' and createdate < '2005-12-01'
-
Do not perform functions, arithmetic operations, or other expression operations on the left side of “=” in the where clause, otherwise the system may not be able to correctly use the index
-
Try to avoid performing expression operations on fields in the where clause, as this will cause the engine to abandon the use of the index and perform a full table scan
For example:
select id from t where num/2 = 100
Should change to:
select id from t where num = 100*2;
-
Indexes are not suitable for columns with few distinct key values (lots of duplicate data), e.g. enum and set columns (an enum column may include NULL and its default value trims trailing spaces; a set column is similar to enum but can hold at most 64 members)
-
If MySQL estimates that a full table scan will be faster than using an index, then do not use the index
-
What is a clustered index
-
In a B+Tree, the leaf nodes save data or pointers
-
MyISAM indexes and data are separated, using non-clustered
-
InnoDB data files are index files, primary key indexes are clustered indexes
Redis Command Summary
-
Why is it so fast?
-
Based on memory, written in C language
-
Uses multiplexed I/O model, non-blocking IO
-
Uses single thread to reduce thread switching
-
Since Redis is based on memory operations, the CPU is not the bottleneck for Redis, the bottleneck is most likely the size of the machine’s memory or network bandwidth. Since single-threading is easy to implement, and the CPU will not become a bottleneck, it is reasonable to adopt a single-threaded solution (after all, using multi-threading would cause many troubles!).
-
Simple data structure
-
Built its own VM mechanism, reducing the time for calling system functions
-
Advantages
-
High performance – Redis can read at a speed of 110,000 times/s, and write at a speed of 81,000 times/s
-
Rich data types
-
Atomic – All operations of Redis are atomic, and Redis also supports the atomic execution of several operations combined
-
Rich features – Redis also supports publish/subscribe, notifications, key expiration, and other features
-
What is a Redis transaction?
-
A mechanism that packages multiple requests and executes multiple commands once and in order
-
Transaction functions are realized through commands such as multi, exec, watch
-
In Python with redis-py: pipeline = conn.pipeline(transaction=True)
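A minimal redis-py sketch of the above (assumes the redis package and a server on localhost:6379; the key name is illustrative):
import redis
conn = redis.StrictRedis(host="localhost", port=6379)
pipe = conn.pipeline(transaction=True)  # wraps the queued commands in MULTI/EXEC
pipe.set("counter", 0)
pipe.incr("counter")
pipe.incr("counter")
print(pipe.execute())  # [True, 1, 2]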
-
Persistence methods
-
RDB (snapshot)
-
save (synchronous, can guarantee data consistency)
-
bgsave (asynchronous; on shutdown, if AOF is not enabled, an RDB save is performed by default)
-
AOF (append log)
-
How to implement a queue
-
lpush (push onto one end of the list)
-
rpop (pop from the other end)
-
Common data types (Bitmaps, Hyperloglogs, range queries, etc. not commonly used)
-
String: counter
-
integer or sds (Simple Dynamic String)
-
List: user’s followers, fans list
-
ziplist (continuous memory block, each entry node head saves the length information of the front and back nodes to achieve bidirectional linked list function) or double linked list
-
Hash:
-
Set: user’s followers
-
intset or hashtable
-
Zset (sorted set): real-time information leaderboard
-
skiplist (skip list)
-
Difference from Memcached
-
Memcached can only store string keys
-
Memcached users can only append data to the end of existing strings and use this string as a list. However, when deleting these elements, Memcached uses a blacklist to hide elements in the list, thus avoiding operations such as reading, updating, and deleting elements
-
Both Redis and Memcached store data in memory and are in-memory databases. However, Memcached can also be used to cache other things, such as images, videos, etc.
-
Virtual memory – Redis can swap some values that have not been used for a long time to disk when physical memory is exhausted
-
Data storage safety – If Memcached crashes, the data is lost; Redis can periodically save to disk (persistence)
-
Different application scenarios: Redis is used as a NoSQL database, and can also be used as a message queue, data stack, and data cache; Memcached is suitable for caching SQL statements, data sets, user temporary data, delayed query data, and sessions, etc.
-
Implement distributed locks with Redis
-
Use setnx to implement locking, and can also add expiration time through expire
-
The value of the lock can be a random uuid or a specific name
-
When releasing the lock, check if it is that lock through uuid, if so, execute delete to release the lock
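A rough sketch of this recipe (assumes redis-py and a local server; the key prefix and timeout are illustrative; a production version would do the check-and-delete atomically in a Lua script):
import uuid
import redis
conn = redis.StrictRedis(host="localhost", port=6379)
def acquire_lock(name, timeout=10):
    token = str(uuid.uuid4())
    # SET with nx=True, ex=timeout does setnx + expire in one atomic command
    if conn.set("lock:" + name, token, nx=True, ex=timeout):
        return token
    return None
def release_lock(name, token):
    key = "lock:" + name
    if conn.get(key) == token.encode():  # only delete the lock we own
        conn.delete(key)
        return True
    return False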
-
Common problems
-
Cache avalanche
-
A large amount of cached data expires within a short window, so a flood of requests hits the database directly
-
Cache penetration
-
Requests ask for data that is neither in the cache nor in the database, so every such request falls through to the database
-
Cache warming
-
Initialize the project, adding some commonly used data to the cache
-
Cache update
-
Data expires, update cache data
-
Cache degradation
-
When the access volume surges, the service has problems (such as slow response time or no response) or non-core services affect the performance of core processes, it is still necessary to ensure that the service is available, even if it is a degraded service. The system can automatically degrade based on some key data, or can configure a switch for manual degradation
-
Consistent Hashing algorithm
-
Ensure data consistency when using clusters
-
Based on Redis to implement a distributed lock, requiring a timeout parameter
-
setnx
-
Virtual memory
-
Memory jitter
Linux
-
Five I/O models of Unix
-
Blocking IO
-
Non-blocking IO
-
Multiplexing IO (in Python, implemented with the selectors module; see the sketch after this list)
-
select
-
Suited to cases where concurrency is not high but the connections are very active
-
poll
-
Not much improvement over select
-
epoll
-
Suitable for situations with many connections but few active connections
-
Signal-driven IO
-
Asynchronous IO (implemented asynchronously by Gevent/Asyncio)
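A minimal echo-server sketch using the standard-library selectors module (the address and port are illustrative); DefaultSelector picks epoll/kqueue/select for the platform:
import selectors
import socket
sel = selectors.DefaultSelector()
def accept(server):
    conn, _ = server.accept()
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ, read)
def read(conn):
    data = conn.recv(1024)
    if data:
        conn.sendall(data)       # echo the data back
    else:
        sel.unregister(conn)
        conn.close()
server = socket.socket()
server.bind(("127.0.0.1", 8888))
server.listen()
server.setblocking(False)
sel.register(server, selectors.EVENT_READ, accept)
while True:
    for key, _ in sel.select():
        key.data(key.fileobj)    # call the handler registered for this socket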
-
Command manual better than man
-
tldr: a manual with command examples
-
Difference between kill -9 and -15
-
-15 (SIGTERM): the program may stop immediately, stop after releasing its resources, or ignore the signal and keep running
-
-9 (SIGKILL): because of -15's uncertainty, -9 is used to kill the process immediately
-
Paging mechanism (a memory allocation management scheme that separates logical addresses and physical addresses):
-
Operating system efficiently manages memory to reduce fragmentation
-
Logical addresses of the program are divided into fixed-size pages
-
Physical addresses are divided into frames of the same size
-
Corresponding logical addresses and physical addresses through page tables
-
Segmentation mechanism
-
To meet some logical requirements of the code
-
Data sharing/data protection/dynamic linking
-
Each segment is continuously allocated memory, and segments are discretely allocated
-
How to check CPU memory usage?
-
top
-
free Check available memory, troubleshoot memory leaks
Design Patterns
Singleton Pattern
# Method one
def Single(cls, *args, **kwargs):
    instances = {}
    def get_instance(*args, **kwargs):
        if cls not in instances:
            instances[cls] = cls(*args, **kwargs)
        return instances[cls]
    return get_instance
@Single
class B:
    pass
# Method two
class Single:
    def __init__(self):
        print("Singleton pattern implementation method two...")
single = Single()
del Single  # from now on, always use the existing `single` instance
# Method three (most commonly used method)
class Single:
    def __new__(cls, *args, **kwargs):
        if not hasattr(cls, '_instance'):
            cls._instance = super().__new__(cls)  # object.__new__ takes no extra arguments
        return cls._instance
Factory Pattern
class Dog:
    def __init__(self):
        print("Wang Wang Wang")
class Cat:
    def __init__(self):
        print("Miao Miao Miao")
def fac(animal):
    if animal.lower() == "dog":
        return Dog()
    if animal.lower() == "cat":
        return Cat()
    print("Sorry, must be: dog, cat")
Builder Pattern
class Computer:
    def __init__(self, serial_number):
        self.serial_number = serial_number
        self.memory = None
        self.hdd = None
        self.gpu = None
    def __str__(self):
        info = (f'Memory:{self.memory}GB',
                f'Hard Disk:{self.hdd}GB',
                f'Graphics Card:{self.gpu}')
        return ' '.join(info)
class ComputerBuilder:
    def __init__(self):
        self.computer = Computer('Jim1996')
    def configure_memory(self, amount):
        self.computer.memory = amount
        return self  # return self to allow chained calls
    def configure_hdd(self, amount):
        self.computer.hdd = amount
        return self
    def configure_gpu(self, gpu_model):
        self.computer.gpu = gpu_model
        return self
class HardwareEngineer:
    def __init__(self):
        self.builder = None
    def construct_computer(self, memory, hdd, gpu):
        self.builder = ComputerBuilder()
        self.builder.configure_memory(memory).configure_hdd(hdd).configure_gpu(gpu)
    @property
    def computer(self):
        return self.builder.computer
Data Structures and Algorithms
Built-in Data Structures and Algorithms
Python implementation of various data structures
Quick Sort
def quick_sort(_list):
    if len(_list) < 2:
        return _list
    pivot = _list[0]
    left_list = [i for i in _list[1:] if i < pivot]
    right_list = [i for i in _list[1:] if i >= pivot]
    return quick_sort(left_list) + [pivot] + quick_sort(right_list)
Selection Sort
def select_sort(seq):
    n = len(seq)
    for i in range(n - 1):
        min_idx = i
        for j in range(i + 1, n):
            if seq[j] < seq[min_idx]:
                min_idx = j
        if min_idx != i:
            seq[i], seq[min_idx] = seq[min_idx], seq[i]
Insertion Sort
def insertion_sort(_list):
    n = len(_list)
    for i in range(1, n):
        value = _list[i]
        pos = i
        while pos > 0 and value < _list[pos - 1]:
            _list[pos] = _list[pos - 1]
            pos -= 1
        _list[pos] = value
    return _list
Merge Sort
def merge_sorted_list(_list1, _list2):  # merge two already-sorted lists
    len_a, len_b = len(_list1), len(_list2)
    a = b = 0
    sort = []
    while len_a > a and len_b > b:
        if _list1[a] > _list2[b]:
            sort.append(_list2[b])
            b += 1
        else:
            sort.append(_list1[a])
            a += 1
    if len_a > a:
        sort.extend(_list1[a:])
    if len_b > b:
        sort.extend(_list2[b:])
    return sort
def merge_sort(_list):
    if len(_list) < 2:
        return _list
    else:
        mid = len(_list) // 2
        left = merge_sort(_list[:mid])
        right = merge_sort(_list[mid:])
        return merge_sorted_list(left, right)
Heap Sort heapq module
from heapq import nsmallest
def heap_sort(_list):
    return nsmallest(len(_list), _list)
Stack
from collections import deque
class Stack:
    def __init__(self):
        self.s = deque()
    def peek(self):
        p = self.pop()
        self.push(p)
        return p
    def push(self, el):
        self.s.append(el)
    def pop(self):
        return self.s.pop()
Queue
from collections import deque
class Queue:
    def __init__(self):
        self.s = deque()
    def push(self, el):
        self.s.append(el)
    def pop(self):
        return self.s.popleft()
Binary Search
def binary_search(_list, num):
    if len(_list) < 1:
        return False
    mid = len(_list) // 2
    if num > _list[mid]:
        return binary_search(_list[mid + 1:], num)
    elif num < _list[mid]:
        return binary_search(_list[:mid], num)
    else:
        return True  # found (indices of the slices are relative, so report presence)
Interview Strengthening Questions:
About database optimization and design
https://segmentfault.com/a/1190000018426586
-
How to implement a queue using two stacks
-
Reverse linked list
-
Merge two sorted linked lists
-
Delete linked list nodes
-
Reverse binary tree
-
Design a short URL service? 62-base implementation
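A rough base62 sketch for this question (the alphabet order and function names are illustrative): a database auto-increment id maps to a short code and back.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
def encode62(num):
    if num == 0:
        return ALPHABET[0]
    chars = []
    while num:
        num, rem = divmod(num, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))
def decode62(s):
    num = 0
    for ch in s:
        num = num * 62 + ALPHABET.index(ch)
    return num
print(encode62(1234567))            # '5ban'
print(decode62(encode62(1234567)))  # 1234567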
-
Design a flash sale system (feed stream)?
https://www.jianshu.com/p/ea0259d109f9
-
Why is it better to use auto-incrementing integers for primary keys in MySQL databases? Can UUID be used? Why?
-
If the data writing order of the InnoDB table can be consistent with the order of the leaf nodes of the B+ tree index, the storage and access efficiency will be the highest. To improve storage and query performance, it is recommended to use auto-incrementing IDs as primary keys.
-
For the primary index of InnoDB, the data will be sorted according to the primary key. Due to the unordered nature of UUID, InnoDB will generate a huge IO pressure, making it unsuitable to use UUID as a physical primary key. It can be used as a logical primary key, while the physical primary key still uses auto-incrementing ID. For global uniqueness, UUID should be used to index and associate other tables or as foreign keys.
-
If we generate auto-incrementing IDs for databases in a distributed system, how do we do it?
-
Use Redis
-
Based on Redis, implement a distributed lock requiring a timeout parameter
-
setnx
-
setnx + expire
-
What if a single Redis node goes down? Are there other industry solutions to implement distributed lock codes?
-
Use consistent hashing algorithm
Cache Algorithms
-
LRU (least-recently-used): replaces the least recently used object
-
LFU (Least frequently used): least frequently used, if a piece of data has been used very few times in a recent period, the likelihood of it being used in the future is also very low
Server Performance Optimization Directions
-
Using data structures and algorithms
-
Database
-
Index optimization
-
Eliminate slow queries
-
Enable slow_query_log and inspect the slow query log file (slow_query_log_file)
-
Use explain to troubleshoot index issues
-
Adjust data modification indexes
-
Batch operations to reduce IO operations
-
Use NoSQL: for example, Redis
-
Network IO
-
Batch operations
-
Pipeline
-
Cache
-
Redis
-
Asynchronous
-
Asynchronous operations implemented by Asyncio
-
Use Celery to reduce IO blocking
-
Concurrency
-
Multi-threading
-
Gevent