This summary covers a lot of ground, so it runs a bit long; it is material I have been piecing together for quite a while.
Py2 VS Py3
-
print became a function; in Python 2 it was a statement
-
No more unicode objects; default str is unicode
-
Division in Python 3 returns a float
-
No long type
-
xrange does not exist; range replaces xrange
-
Function and variable names may use non-ASCII characters, e.g., Chinese
-
Advanced unpacking and * unpacking
-
Keyword-only arguments: parameters declared after * must be passed as name=value
-
raise from
-
iteritems removed and replaced with items()
-
yield from connects child generators
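A minimal sketch illustrating several of the items above (extended unpacking, keyword-only arguments, raise ... from, yield from); all names here are illustrative:
# extended (advanced) unpacking: * collects the rest
first, *middle, last = [1, 2, 3, 4, 5]   # first=1, middle=[2, 3, 4], last=5
# keyword-only arguments: parameters after * must be passed by name
def connect(host, *, timeout=10):
    return f"{host}:{timeout}"
connect("localhost", timeout=5)          # connect("localhost", 5) would raise TypeError
# raise ... from: chain the original exception as __cause__
def parse(value):
    try:
        return int(value)
    except ValueError as e:
        raise RuntimeError("parsing failed") from e
# yield from: delegate to a sub-generator
def sub():
    yield 1
    yield 2
def gen():
    yield from sub()
print(list(gen()))                       # [1, 2]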
-
Native coroutine support for asynchronous programming with asyncio, async/await
-
New modules in Python 3: enum, unittest.mock, ipaddress, concurrent.futures, asyncio, selectors (and urllib was reorganized)
-
Different enum classes cannot be compared
-
Only equality comparison is allowed within the same enum class
-
Usage of enum class (default numbering starts from 1)
-
To avoid duplicate enum values, use @unique to decorate the enum class
# Notes on enumerations
from enum import Enum
class COLOR(Enum):
    YELLOW = 1
    # YELLOW = 2  # would raise an error: the same name cannot be reused
    GREEN = 1     # no error; GREEN becomes an alias for YELLOW
    BLACK = 3
    RED = 4
print(COLOR.GREEN)  # COLOR.YELLOW -- an alias resolves to the first member with that value
for i in COLOR:     # iterating over COLOR skips aliases, so GREEN is not listed
    print(i)
# COLOR.YELLOW
# COLOR.BLACK
# COLOR.RED
# How to iterate including aliases
for i in COLOR.__members__.items():
    print(i)
# output:('YELLOW', <COLOR.YELLOW: 1>)
# ('GREEN', <COLOR.YELLOW: 1>)
# ('BLACK', <COLOR.BLACK: 3>)
# ('RED', <COLOR.RED: 4>)
for i in COLOR.__members__:
    print(i)
# output:YELLOW
# GREEN
# BLACK
# RED
# Enum conversion
# It is better to store the numeric value of the enum in the database rather than the name string,
# and use the enum class in code
a = 1
print(COLOR(a))  # output:COLOR.YELLOW
Py2/3 Conversion Tools
-
six module: A module compatible with Python 2 and Python 3
-
2to3 tool: Changes code syntax version
-
__future__: Use features from the next version
Common Libraries
-
Must-know collections
https://segmentfault.com/a/1190000017385799
-
Python sorting operations and heapq module
https://segmentfault.com/a/1190000017383322
-
itertools module super useful methods
https://segmentfault.com/a/1190000017416590
Less Common but Important Libraries
-
dis (code bytecode analysis)
-
inspect (generator state)
-
cProfile (performance analysis)
-
bisect (maintaining ordered lists)
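A minimal bisect sketch (values are illustrative) showing how to keep a list sorted on insert:
import bisect
scores = []
for value in [33, 99, 77, 70, 89, 90, 100]:
    bisect.insort(scores, value)       # insert while keeping the list sorted
print(scores)                          # [33, 70, 77, 89, 90, 99, 100]
print(bisect.bisect_left(scores, 89))  # 3 -- leftmost index where 89 would be inserted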
-
fnmatch
-
fnmatch.fnmatch(filename, "*.txt")  # case-insensitive on Windows
-
fnmatch's case sensitivity follows the operating system's convention
-
fnmatchcase is case sensitive
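A short sketch of the two variants (the file names are illustrative; the fnmatch result depends on the OS case rule):
from fnmatch import fnmatch, fnmatchcase, filter as fnfilter
print(fnmatch("Report.TXT", "*.txt"))      # True on Windows, False on POSIX systems
print(fnmatchcase("Report.TXT", "*.txt"))  # False everywhere: exact-case match
print(fnfilter(["a.txt", "b.py", "c.TXT"], "*.txt"))  # names matching the pattern (OS-dependent)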
-
timeit (code execution time)
def isLen(strString):
    # the ternary expression is recommended; it is also faster
    return True if len(strString) > 6 else False
def isLen1(strString):
    # note the order here: index 0 is False, index 1 is True
    return [False, True][len(strString) > 6]
import timeit
print(timeit.timeit('isLen1("5fsdfsdfsaf")', setup="from __main__ import isLen1"))
print(timeit.timeit('isLen("5fsdfsdfsaf")', setup="from __main__ import isLen"))
-
contextlib
-
@contextlib.contextmanager makes generator functions into context managers
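A minimal sketch of a generator-based context manager (the file name is just illustrative):
import contextlib
@contextlib.contextmanager
def open_for_read(path):
    f = open(path, "r")   # code before yield runs on __enter__
    try:
        yield f           # the yielded value is bound by "as"
    finally:
        f.close()         # code after yield runs on __exit__
# usage:
# with open_for_read("demo.txt") as f:
#     print(f.read())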
-
types (contains all type objects defined by the standard interpreter, can mark generator functions as asynchronous)
import types
types.coroutine  # decorator that marks a generator-based function as a coroutine (roughly equivalent to implementing __await__)
-
html (implements HTML escaping)
import html
html.escape("<h1>I'm Jim</h1>") # output:'&lt;h1&gt;I&#x27;m Jim&lt;/h1&gt;'
html.unescape('&lt;h1&gt;I&#x27;m Jim&lt;/h1&gt;') # <h1>I'm Jim</h1>
-
mock (solving test dependencies)
-
concurrent (creating process pools and thread pools)
from concurrent.futures import ThreadPoolExecutor, as_completed
pool = ThreadPoolExecutor()
task = pool.submit(function_name, *args)  # does not block; returns a Future immediately
task.done()       # has the task finished?
task.result()     # blocking; get the task's return value
task.cancel()     # cancel a task that has not started yet; returns True if the cancellation succeeded
task.add_done_callback(callback)  # callback invoked when the task finishes
task.running()    # is it currently running? task is a Future object
for data in pool.map(function, parameter_list):  # results of completed tasks, in the order of the parameters
    print(data)   # the return value of each completed task
as_completed(task_list)  # yields tasks as they complete
-
selectors (wraps select/poll/epoll; used for I/O-multiplexing programming)
-
asyncio
future = asyncio.ensure_future(coroutine)  # equivalent to future = loop.create_task(coroutine)
future.add_done_callback(callback)  # add a callback to run when the task completes
loop.run_until_complete(future)
future.result()  # get the return value
asyncio.wait() accepts an iterable of coroutine objects
asyncio.gather(*iterable, *iterable) gives the same results, but gather supports cancelling in bulk via the gather object's cancel()
There can only be one event loop per thread
loop.stop() must be used together with loop.run_forever(), otherwise an error is raised
loop.run_forever() can also run non-coroutine callbacks
Finally, call loop.close() inside a finally block
asyncio.Task.all_tasks() gets all tasks; iterate over them and call task.cancel() to cancel each
functools.partial(function, args) wraps a function under a new callable; the bound arguments are placed before the ones supplied at call time
loop.call_soon(function, args)
call_soon_threadsafe() is the thread-safe variant
loop.call_later(delay, function, args)
Within the same code block, call_soon callbacks run first; multiple call_later callbacks then run in ascending order of their delays
If blocking code must be run:
use loop.run_in_executor(executor, function, args) to wrap it in a thread, put the returned futures into a task list, and run them with wait(task_list)
Implementing an HTTP request with asyncio:
reader, writer = await asyncio.open_connection(host, port)
writer.write() sends the request
async for data in reader:
    data = data.decode("utf-8")
    result_list.append(data)
the list then contains the HTML
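A minimal runnable sketch of the above; the host and path are just placeholders:
import asyncio
async def fetch(host, path="/"):
    reader, writer = await asyncio.open_connection(host, 80)
    request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
    writer.write(request.encode("utf-8"))   # writer.write() sends the request bytes
    chunks = []
    async for line in reader:               # the StreamReader can be iterated line by line
        chunks.append(line.decode("utf-8", errors="ignore"))
    writer.close()
    return "".join(chunks)                  # response headers + HTML body
# loop = asyncio.get_event_loop()
# html_text = loop.run_until_complete(fetch("example.com"))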
Python Advanced
-
Inter-process communication:
-
Manager (built-in many data structures, can achieve memory sharing between processes)
from multiprocessing import Manager, Process
def add_data(p_dict, key, value):
    p_dict[key] = value
if __name__ == "__main__":
    progress_dict = Manager().dict()
    first_progress = Process(target=add_data, args=(progress_dict, "bobby1", 22))
    second_progress = Process(target=add_data, args=(progress_dict, "bobby2", 23))
    first_progress.start()
    second_progress.start()
    first_progress.join()
    second_progress.join()
    print(progress_dict)
-
Pipe (suitable for two processes)
from multiprocessing import Pipe, Process
# Pipe performs better than Queue
def producer(pipe):
    pipe.send("bobby")
def consumer(pipe):
    print(pipe.recv())
if __name__ == "__main__":
    receive_pipe, send_pipe = Pipe()
    # Pipe can only be used between two processes
    my_producer = Process(target=producer, args=(send_pipe,))
    my_consumer = Process(target=consumer, args=(receive_pipe,))
    my_producer.start()
    my_consumer.start()
    my_producer.join()
    my_consumer.join()
-
Queue (cannot be used for process pools, inter-process communication in process pools requires Manager().Queue())
import time
from multiprocessing import Queue, Process
def producer(queue):
    queue.put("a")
    time.sleep(2)
def consumer(queue):
    time.sleep(2)
    data = queue.get()
    print(data)
if __name__ == "__main__":
    queue = Queue(10)
    my_producer = Process(target=producer, args=(queue,))
    my_consumer = Process(target=consumer, args=(queue,))
    my_producer.start()
    my_consumer.start()
    my_producer.join()
    my_consumer.join()
-
Process Pool
import time
from multiprocessing import Manager, Pool
def producer(queue):
    queue.put("a")
    time.sleep(2)
def consumer(queue):
    time.sleep(2)
    data = queue.get()
    print(data)
if __name__ == "__main__":
    queue = Manager().Queue(10)  # a plain Queue cannot be passed to pool workers
    pool = Pool(2)
    pool.apply_async(producer, args=(queue,))
    pool.apply_async(consumer, args=(queue,))
    pool.close()
    pool.join()
-
Common Methods in sys Module
-
argv command line parameter list, the first is the path of the program itself
-
path returns the module search path
-
modules.keys() returns the list of all imported modules
-
exit(0) exits the program
-
Concise way to check if a, b, or c is in s
-
Using any; note that all() returns True when the iterable is empty, while any() returns False
# Method one
True in [i in s for i in [a,b,c]]
# Method two
any(i in s for i in [a,b,c])
# Method three
list(filter(lambda x:x in s,[a,b,c]))
-
Set operations
-
{1,2}.issubset({1,2,3})# Check if it is a subset
-
{1,2,3}.issuperset({1,2})
-
{}.isdisjoint({})# Check if the intersection of two sets is empty; if empty, returns True
-
Matching Chinese characters in code
-
[\u4E00-\u9FA5] matches the common range of Chinese characters (一 to 龥)
-
Check system default encoding format
import sys
sys.getdefaultencoding()  # sys.setdefaultencoding() sets the default encoding (Python 2 only)
-
getattr VS getattribute
class A(dict):
    def __getattr__(self, value):  # called only when normal attribute lookup fails
        return 2
    def __getattribute__(self, item):  # intercepts every attribute access
        return item
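A quick check of the difference (a sketch; with __getattribute__ defined as above, every lookup on A is intercepted):
class B(dict):
    def __getattr__(self, name):
        return 2             # only reached when the attribute does not exist
b = B()
print(b.missing)             # 2 -- normal lookup fails, so __getattr__ is used
print(A().anything)          # 'anything' -- __getattribute__ intercepts every lookup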
-
Class variables will not be stored in the instance __dict__, only exist in the class’s __dict__
-
globals/locals (can manipulate code indirectly)
-
globals stores all variable properties and values in the current module
-
locals stores all variable properties and values in the current environment
-
Python variable name resolution mechanism (LEGB)
-
Local scope (Local)
-
The current scope nested local scope (Enclosing locals)
-
Global/module scope (Global)
-
Built-in scope (Built-in)
-
Implement grouping of 1-100 every three
print([[x for x in range(1,101)][i:i+3] for i in range(0,100,3)])
-
What is a metaclass?
-
It is the class that creates classes. When creating a class, just set metaclass=metaclass, and the metaclass must inherit from type rather than object, because type is the metaclass
type.__bases__    # (<class 'object'>,)
object.__bases__  # ()
type(object)      # <class 'type'>
class Yuan(type):
    def __new__(cls, name, base, attr, *args, **kwargs):
        return type(name, base, attr, *args, **kwargs)
class MyClass(metaclass=Yuan):
    pass
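A quick sketch checking the result of the metaclass above:
obj = MyClass()
print(type(MyClass))             # <class 'type'> -- Yuan.__new__ delegated directly to type()
print(isinstance(obj, MyClass))  # True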
-
What is duck typing (i.e., polymorphism)?
-
Python does not default to judging parameter types during the use of passed parameters; as long as the parameters meet the execution conditions, they can be executed
-
Deep copy and shallow copy
-
A deep copy copies the content recursively; a shallow copy copies only the outer object, and its elements still reference the same objects (plain assignment only copies the reference and increases the reference count)
-
The copy module provides copy.copy (shallow copy) and copy.deepcopy (deep copy)
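A minimal sketch of the difference:
import copy
original = [1, [2, 3]]
shallow = copy.copy(original)
deep = copy.deepcopy(original)
original[1].append(4)
print(shallow)  # [1, [2, 3, 4]] -- the inner list is shared with the original
print(deep)     # [1, [2, 3]]    -- the inner list was copied recursively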
-
Unit testing
-
Generally, test classes inherit from unittest’s TestCase module
-
pytest module quick testing (methods start with test_/test files start with test_/test classes start with Test and cannot have init methods)
-
coverage statistics test coverage
import unittest
class MyTest(unittest.TestCase):
    def setUp(self):  # runs before every test case
        print('This test method is starting')
    def tearDown(self):  # runs after every test case
        print('This test method has finished')
    @classmethod
    def setUpClass(cls):  # must be decorated with @classmethod; runs once before all tests
        print('Testing starts')
    @classmethod
    def tearDownClass(cls):  # must be decorated with @classmethod; runs once after all tests
        print('Testing ends')
    def test_a_run(self):
        self.assertEqual(1, 1)  # test case
-
The GIL is released based on the number of bytecode instructions executed and on time slices; it is also released proactively when an I/O operation is encountered
-
What is monkey patch?
-
Monkey patch, replaces blocking syntax with non-blocking methods during runtime
-
What is introspection?
-
The ability to determine the type of an object at runtime, id, type, isinstance
-
Is Python pass-by-value or pass-by-reference?
-
Neither; Python passes arguments by sharing (the reference to the object is passed), and default parameter values are evaluated only once, at function definition time
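A short sketch of both points (call by sharing, and default values evaluated once):
def append_item(item, bucket=[]):  # the default list is created once, at definition time
    bucket.append(item)
    return bucket
print(append_item(1))  # [1]
print(append_item(2))  # [1, 2] -- the same default list is reused across calls
def rebind(lst):
    lst = [0]          # rebinding the local name does not affect the caller's list
def mutate(lst):
    lst.append(0)      # mutating the shared object is visible to the caller
data = [1]
rebind(data); mutate(data)
print(data)            # [1, 0]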
-
Difference between else and finally in try-except-else-finally
-
else executes when no exception occurs; finally executes regardless of whether an exception occurs
-
except can catch multiple exceptions at once, but generally for different exceptions, we capture and handle them separately
-
GIL Global Interpreter Lock
-
Only one thread can execute bytecode at a time; this is a CPython implementation detail and does not necessarily exist in other interpreters (e.g., Jython)
-
CPU-bound: multi-process + process pool
-
IO-bound: multi-thread/coroutine
-
What is Cython?
-
A tool to convert Python into C code
-
Generators and iterators
-
Iterable objects only need to implement the __iter__ method
-
Objects implementing __next__ and __iter__ methods are iterators
-
Generators are created with generator expressions or with functions containing yield (a generator is a special kind of iterator)
-
What is a coroutine?
-
yield
-
async-await
-
A lighter-weight multitasking method than threads
-
Implementation method
-
Underlying structure of dict
-
To support fast lookup, it uses a hash table as the underlying structure
-
The average time complexity of hash table lookup is O(1)
-
The CPython interpreter uses open addressing (probing) to resolve hash collisions
-
Hash expansion and hash collision solutions
-
Chaining method
-
Open addressing (probing): this is what Python uses
-
Expansion is done by allocating a new, larger table and copying the existing entries into it
-
Example of monkey patching with gevent (see the monkey patch question above):
from gevent import monkey
monkey.patch_all()  # replaces blocking calls throughout the code; specific modules can also be patched selectively
-
Check if it is a generator or coroutine
co_flags = func.__code__.co_flags
# Check if it is a coroutine
if co_flags & 0x180:
    return func
# Check if it is a generator
if co_flags & 0x20:
    return func
-
Fibonacci problem and its variations
# A frog can jump up one or two steps at a time. How many ways can it jump up n steps?
# How many ways can n 2*1 rectangles cover a 2*n rectangle without overlapping?
# Method one:
fib = lambda n: n if n <= 2 else fib(n - 1) + fib(n - 2)
# Method two:
def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return b
# A frog can jump up one or two steps at a time... it can also jump up n steps. How many ways can it jump up n steps?
fib = lambda n: n if n < 2 else 2 * fib(n - 1)
-
Get the environment variables set on the computer
import os
os.getenv(env_name,None)# Get environment variable, if it does not exist return None
-
Garbage collection mechanism
-
Reference counting
-
Mark and sweep
-
Generational garbage collection
# Check the trigger of generational garbage collection
import gc
gc.get_threshold() #output:(700, 10, 10)
-
True and False are equivalent to 1 and 0 in code and can be calculated directly with numbers; inf represents infinity
-
C10M/C10K
-
C10M: 8-core CPU, 64GB memory, maintaining 10 million concurrent connections on a 10gbps network
-
C10K: 1GHz CPU, 2GB memory, maintaining 10,000 clients providing FTP service on a 1gbps network
-
Difference between yield from and yield:
-
yield from follows an iterable object, while yield has no restrictions
-
GeneratorExit is triggered when the generator stops
-
Several uses of a single underscore
-
When defining a variable, indicates a private variable
-
When unpacking, indicates discarding useless data
-
In interactive mode, represents the result of the last executed code
-
Can be used as a digit separator in numeric literals (111_222_333)
-
In a for/while loop with an else clause, the else block does not execute if the loop exits via break
-
Decimal to binary conversion
def conver_bin(num):
    if num == 0:
        return num
    re = []
    while num:
        num, rem = divmod(num, 2)
        re.append(str(rem))
    return "".join(reversed(re))
conver_bin(10)  # '1010'
-
list1 = ['A', 'B', 'C', 'D']: how can I create new lists named after the elements of the list, i.e. A=[], B=[], C=[], D=[]?
list1 = ['A', 'B', 'C', 'D']
# Method one
for i in list1:
    globals()[i] = []  # can also be used to implement reflection in Python
# Method two
for i in list1:
    exec(f'{i} = []')  # exec executes string statements
-
memoryview and bytearray (not commonly used, just recorded)
# bytearray is mutable, bytes is immutable; slicing a memoryview does not create a new object
a = b'aaaaaa'
ma = memoryview(a)
ma.readonly  # True: a memoryview over bytes is read-only
mb = ma[:2]  # does not create a new bytes object
a = bytearray(b'aaaaaa')
ma = memoryview(a)
ma.readonly  # False: a memoryview over bytearray is writable
mb = ma[:2]  # does not create a new bytearray
mb[:2] = b'bb'  # writing to mb writes through to ma (and to a)
-
Ellipsis type
# A literal ... in code is the Ellipsis object (print(...) shows Ellipsis)
# Note: the [...] below is how a self-referencing list is printed, not an Ellipsis
L = [1, 2, 3]
L.append(L)
print(L)  # output:[1, 2, 3, [...]]
-
Lazy evaluation
class lazy(object):
    def __init__(self, func):
        self.func = func
    def __get__(self, instance, cls):
        val = self.func(instance)  # equivalent to calling area(c), where c is the Circle instance below
        setattr(instance, self.func.__name__, val)  # cache the result as an instance attribute
        return val
class Circle(object):
    def __init__(self, radius):
        self.radius = radius
    @lazy
    def area(self):
        print('evaluate')
        return 3.14 * self.radius ** 2
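Usage sketch showing that the value is computed only once:
c = Circle(4)
print(c.area)  # prints 'evaluate', then 50.24 -- the descriptor runs and caches the value
print(c.area)  # 50.24 -- now an ordinary instance attribute, no recomputation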
-
Traverse files, pass in a folder, and print out all file paths (recursively)
all_files = []
def getAllFiles(directory_path):
    import os
    for sChild in os.listdir(directory_path):
        sChildPath = os.path.join(directory_path, sChild)
        if os.path.isdir(sChildPath):
            getAllFiles(sChildPath)
        else:
            all_files.append(sChildPath)
    return all_files
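An equivalent sketch using os.walk, which handles the recursion for you:
import os
def get_all_files(directory_path):
    paths = []
    for root, _dirs, files in os.walk(directory_path):
        for name in files:
            paths.append(os.path.join(root, name))
    return paths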
-
File name handling when storing files
# secure_filename converts a string into a safe file name
from werkzeug.utils import secure_filename  # in older versions: from werkzeug import secure_filename
secure_filename("My cool movie.mov")  # output:'My_cool_movie.mov'
secure_filename("../../../etc/passwd")  # output:'etc_passwd'
secure_filename('i contain cool ümläuts.txt')  # output:'i_contain_cool_umlauts.txt'
-
Date formatting
from datetime import datetime
datetime.now().strftime("%Y-%m-%d")
import time
# time.strftime needs a struct_time such as time.localtime(); a raw timestamp (time.time()) cannot be formatted directly
time.strftime("%Y-%m-%d", time.localtime())
-
Strange issue with tuple using +=
# Raises TypeError, yet the tuple's content still changes, because t[1] keeps referring to the same list (its id is unchanged)
t = (1, [2, 3])
t[1] += [4, 5]
# Using t[1].append(...) or t[1].extend(...) raises no error and succeeds
-
__missing__ you should know
class Mydict(dict):
    def __missing__(self, key):  # called when d[key] does not find the key; its return value is used as the result
        return key
-
+ and +=
# + cannot concatenate a list with a tuple, but += can (it calls __iadd__, which behaves like extend() and accepts any iterable); + always creates a new object
# Immutable objects have no __iadd__, so += falls back to __add__ and creates a new object; that is why tuple += tuple still works
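A short sketch of the difference:
lst = [1, 2]
lst += (3, 4)         # fine: __iadd__ extends the list in place with any iterable
# lst = lst + (3, 4)  # TypeError: can only concatenate list (not "tuple") to list
t = (1, 2)
t += (3, 4)           # tuples have no __iadd__, so this is t = t + (3, 4): a new tuple
print(lst, t)         # [1, 2, 3, 4] (1, 2, 3, 4)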
-
How to turn each element of an iterable into all keys of a dictionary?
dict.fromkeys(['jim','han'],21) # output:{'jim': 21, 'han': 21}
-
Wireshark packet capture software
Networking Knowledge
-
What is HTTPS?
-
HTTP over an encrypted connection; HTTPS requires a CA-issued certificate, encrypts the data, uses port 443, and is more secure; the same website served over HTTPS also tends to rank higher in SEO
-
Common response status codes
204 No Content // Request successfully processed, no entity body returned, generally used to indicate successful deletion
206 Partial Content // GET range request has been successfully processed
303 See Other // Temporary redirect, expect to use GET to redirect to obtain
304 Not Modified // Request cached resources
307 Temporary Redirect // Temporary redirect, POST will not change to GET
401 Unauthorized // Authentication failed
403 Forbidden // Resource request denied
400 // Request parameter error
201 // Add or change successfully
503 // Server maintenance or overload
-
Idempotency and security of HTTP request methods
-
WSGI
# environ: a dict containing all HTTP request information
# start_response: a callable used to send the HTTP response status and headers
def application(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/html')])
    return [b'<h1>Hello, web!</h1>']  # the body must be an iterable of bytes
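A minimal way to serve the application above with the standard library (the port is arbitrary):
from wsgiref.simple_server import make_server
# with make_server('', 8000, application) as httpd:
#     httpd.serve_forever()  # then visit http://localhost:8000/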
-
RPC
-
CDN
-
SSL (Secure Sockets Layer), and its successor Transport Layer Security (TLS), is a security protocol that provides security and data integrity for network communications.
-
SSH (Secure Shell) is a security protocol established on the application layer; it is currently a reliable protocol designed for remote login sessions and other network services. SSH can effectively prevent information leakage during remote management processes. SSH was initially a program on UNIX systems, and later quickly expanded to other operating platforms. When used correctly, SSH can compensate for vulnerabilities in the network. SSH clients are suitable for various platforms. Almost all UNIX platforms—including HP-UX, Linux, AIX, Solaris, Digital UNIX, Irix—can run SSH.
-
TCP/IP
-
TCP: connection-oriented/reliable/byte stream-based
-
UDP: connectionless/unreliable/message-oriented
-
Three-way handshake and four-way handshake
-
Three-way handshake (SYN/SYN+ACK/ACK)
-
Four-way handshake (FIN/ACK/FIN/ACK)
-
Why is there a three-way handshake during connection but a four-way handshake during disconnection?
-
Because when the server receives the client’s SYN connection request packet, it can directly send the SYN+ACK packet. The ACK packet is used to respond, and the SYN packet is used to synchronize. However, when closing the connection, when the server receives the FIN packet, it may not immediately close the SOCKET, so it can only first reply with an ACK packet to tell the client, “I received your FIN packet.” Only after the server has finished sending all its packets can it send the FIN packet, so they cannot be sent together. Thus, four steps are needed for the handshake.
-
Why does the TIME_WAIT state need to pass through 2MSL (maximum segment lifetime) to return to the CLOSE state?
-
Although theoretically, after the four packets have been sent, we can directly enter the CLOSE state, we must assume that the network is unreliable, and the last ACK may be lost. Therefore, the TIME_WAIT state is used to resend potentially lost ACK packets.
-
XSS/CSRF
-
HttpOnly prohibits JS scripts from accessing and manipulating cookies, effectively preventing XSS
MySQL
-
Index improvement process
-
Linear structure -> binary search -> hash -> binary search tree -> balanced binary tree -> multi-way search tree -> multi-way balanced search tree (B-Tree)
-
MySQL interview summary basics
https://segmentfault.com/a/1190000018371218
-
MySQL interview summary advanced
https://segmentfault.com/a/1190000018380324
-
In-depth MySQL
http://ningning.today/2017/02/13/database/深入浅出mysql/
-
When emptying an entire table, InnoDB deletes one row at a time, while MyISAM deletes and rebuilds the table
-
text/blob data types cannot have default values; there is no case conversion during queries
-
When does an index become invalid?
-
LIKE queries starting with %
-
Implicit type conversion occurs
-
Does not satisfy the leftmost-prefix principle
-
For multi-column indexes, if the first part is not used, the index will not be used
-
Invalidation scenarios:
-
Avoid using != or <> operators in the where clause, otherwise the engine will give up using the index and perform a full table scan
-
Avoid using or to connect conditions in the where clause, otherwise it will cause the engine to give up using the index and perform a full table scan, even if some conditions have indexes, this is also why or should be used sparingly
-
If the column type is a string, be sure to quote the data in the condition, otherwise the index will not be used
-
Avoid performing function operations on fields in the where clause, which will cause the engine to give up using the index and perform a full table scan
For example:
select id from t where substring(name,1,3) = 'abc'  -- rows whose name starts with 'abc'
should be changed to:
select id from t where name like 'abc%'
For example:
select id from t where datediff(day, createdate, '2005-11-30') = 0  -- rows whose createdate is '2005-11-30'
should be changed to a range condition, for example:
select id from t where createdate >= '2005-11-30' and createdate < '2005-12-01'
-
Avoid performing functions, arithmetic operations, or other expression operations on the left side of “=” in the where clause, otherwise the system may not be able to use the index correctly
-
Avoid performing expression operations on fields in the where clause, which will cause the engine to give up using the index and perform a full table scan
For example:
select id from t where num/2 = 100
Should be changed to:
select id from t where num = 100*2;
-
Not suitable for columns with fewer key values (columns with more duplicate data), for example: set enum columns are not suitable (enum type can add null, and the default values will automatically filter out spaces; set is similar to enum but can only add 64 values)
-
If MySQL estimates that using a full table scan is faster than using an index, it will not use the index
-
What is a clustered index?
-
B+Tree leaf nodes store data or pointers
-
MyISAM index and data are separated, using non-clustered
-
InnoDB data file is the index file, the primary key index is the clustered index
Redis Command Summary
-
Why is it so fast?
-
Based on memory, written in C
-
Uses multiplexing I/O model, non-blocking I/O
-
Uses single-threading to reduce thread switching
-
Since Redis operates based on memory, CPU is not the bottleneck for Redis; the bottleneck is most likely the size of the machine’s memory or network bandwidth. Since single-threading is easy to implement, and CPU will not become a bottleneck, it is reasonable to adopt a single-threaded solution (after all, using multi-threading can be troublesome!).
-
Simple data structures
-
Built its own VM mechanism to reduce the time spent calling system functions
-
Advantages
-
High performance – Redis can read at a speed of 110,000 times/s, write at a speed of 81,000 times/s
-
Rich data types
-
Atomic – All operations of Redis are atomic, and Redis also supports atomic execution of several operations combined
-
Rich features – Redis also supports publish/subscribe, notifications, key expiration, and other features
-
What is a Redis transaction?
-
A mechanism to package multiple requests and execute multiple commands at once and in order
-
Transaction functions are implemented through multi, exec, watch, etc.
-
Python redis-py pipeline=conn.pipeline(transaction=True)
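A minimal redis-py sketch of a transactional pipeline (key names are illustrative):
import redis
conn = redis.Redis()
pipe = conn.pipeline(transaction=True)  # queued commands are wrapped in MULTI/EXEC
pipe.set("counter", 0)
pipe.incr("counter")
pipe.incr("counter")
print(pipe.execute())  # e.g. [True, 1, 2] -- results of all queued commands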
-
Persistence methods
-
RDB (snapshot)
-
save (synchronous, can guarantee data consistency)
-
bgsave (asynchronous; used by default on shutdown when AOF is not enabled)
-
AOF (append-only file)
-
How to implement a queue
-
lpush (push onto one end of a list)
-
rpop (pop from the other end of the list)
-
Common data types (Bitmaps, Hyperloglogs, range queries, etc. are not commonly used)
-
String: Counter
-
Integer or sds (Simple Dynamic String)
-
List: User’s followers, fans list
-
ziplist (continuous memory blocks, each entry node header saves information on the lengths of previous and next nodes to implement a doubly linked list function) or double linked list
-
Hash:
-
Set: User’s followers
-
intset or hashtable
-
Zset: Real-time information leaderboard
-
skiplist (skip list)
-
Difference from Memcached
-
Memcached can only store string keys
-
Memcached users can only append data to the end of an existing string using APPEND and treat this string as a list. However, when deleting these elements, Memcached uses a blacklist method to hide elements in the list, thus avoiding operations of reading, updating, and deleting elements
-
Both Redis and Memcached store data in memory and are in-memory databases. However, Memcached can also be used to cache other things, such as images, videos, etc.
-
Virtual memory – Redis can swap some rarely used values to disk when physical memory runs out
-
Data storage safety – Data is lost when Memcached crashes; Redis can periodically save to disk (persistence)
-
Different application scenarios: Redis can be used as a NoSQL database as well as for message queues, data stacks, and data caching; Memcached is suitable for caching SQL statements, data sets, user temporary data, delayed query data, and sessions, etc.
-
Redis implements distributed locks
-
Use setnx to implement locking, can also add expiration time through expire
-
The lock’s value can be a random UUID or a specific name
-
When releasing the lock, determine if it is that lock by the UUID, and if so, execute delete to release the lock
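A minimal sketch with redis-py (the key name and timeout are illustrative; a production lock would make the check-and-delete on release atomic, e.g. with a Lua script):
import uuid
import redis
conn = redis.Redis()
def acquire_lock(name, timeout=10):
    identifier = str(uuid.uuid4())
    # SET key value NX EX timeout == setnx + expire in one atomic command
    if conn.set(f"lock:{name}", identifier, nx=True, ex=timeout):
        return identifier
    return None
def release_lock(name, identifier):
    key = f"lock:{name}"
    if conn.get(key) == identifier.encode():  # only the holder may release
        conn.delete(key)
        return True
    return False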
-
Common issues
-
Cache avalanche
-
Cache data expires in a short time, a large number of requests access the database
-
Cache penetration
-
When requesting to access data, it does not exist in the cache or the database
-
Cache warming
-
Initialize the project, add some commonly used data to the cache
-
Cache update
-
Data expires, update cached data
-
Cache degradation
-
When the access volume surges, services experience problems (such as slow response times or no response) or non-core services affect the performance of core processes, it is still necessary to ensure that the service remains available, even if it means degraded service. The system can automatically degrade based on some key data or configure a switch for manual degradation
-
Consistent Hashing Algorithm
-
Ensure data consistency when using clusters
-
Implement a distributed lock based on Redis, requiring a timeout parameter
-
setnx
-
Virtual memory
-
Memory jitter
Linux
-
Five I/O models in Unix
-
Blocking I/O
-
Non-blocking I/O
-
Multiplexing I/O (the selectors module implements I/O multiplexing in Python)
-
select
-
Suitable when concurrency is not high but the connections are very active
-
poll
-
Not much improvement over select
-
epoll
-
Suitable for cases with many connections but few active links
-
Signal-driven I/O
-
Asynchronous I/O (Gevent/Asyncio implements asynchrony)
-
Command manual better than man
-
tldr: A manual with command examples
-
Difference between kill -9 and -15
-
-15 (SIGTERM): asks the program to terminate; it may stop immediately, stop after releasing its resources, or ignore the signal and keep running
-
-9 (SIGKILL): because of the uncertainty of -15, -9 is used to kill the process immediately
-
Paging mechanism (logical address and physical address separation memory allocation management scheme):
-
The operating system manages memory efficiently to reduce fragmentation
-
Logical addresses of programs are divided into fixed-size pages
-
Physical addresses are divided into frames of the same size
-
Corresponding logical addresses and physical addresses through page tables
-
Segmentation mechanism
-
To meet some logical requirements of code
-
Data sharing/data protection/dynamic linking
-
Each segment is allocated contiguous memory, while segments are allocated discretely
-
How to check CPU memory usage?
-
top
-
free to check available memory, troubleshoot memory leak issues
Design Patterns
Singleton Pattern
# Method one: decorator
def Single(cls, *args, **kwargs):
    instances = {}
    def get_instance(*args, **kwargs):
        if cls not in instances:
            instances[cls] = cls(*args, **kwargs)
        return instances[cls]
    return get_instance
@Single
class B:
    pass
# Method two: module-level instance
class Single:
    def __init__(self):
        print("Singleton pattern implementation method two...")
single = Single()
del Single  # afterwards, always use the `single` instance
# Method three (most commonly used): override __new__
class Single:
    def __new__(cls, *args, **kwargs):
        if not hasattr(cls, '_instance'):
            cls._instance = super().__new__(cls)  # object.__new__ takes no extra arguments
        return cls._instance
Factory Pattern
class Dog:
    def __init__(self):
        print("Wang Wang Wang")
class Cat:
    def __init__(self):
        print("Miao Miao Miao")
def fac(animal):
    if animal.lower() == "dog":
        return Dog()
    if animal.lower() == "cat":
        return Cat()
    print("Sorry, must be: dog, cat")
Builder Pattern
class Computer:
    def __init__(self, serial_number):
        self.serial_number = serial_number
        self.memory = None
        self.hdd = None
        self.gpu = None
    def __str__(self):
        info = (f'Memory: {self.memory}GB',
                f'Hard Disk: {self.hdd}GB',
                f'Graphics Card: {self.gpu}')
        return '\n'.join(info)
class ComputerBuilder:
    def __init__(self):
        self.computer = Computer('Jim1996')
    def configure_memory(self, amount):
        self.computer.memory = amount
        return self  # return self to allow chained calls
    def configure_hdd(self, amount):
        self.computer.hdd = amount
        return self
    def configure_gpu(self, gpu_model):
        self.computer.gpu = gpu_model
        return self
class HardwareEngineer:
    def __init__(self):
        self.builder = None
    def construct_computer(self, memory, hdd, gpu):
        self.builder = ComputerBuilder()
        self.builder.configure_memory(memory).configure_hdd(hdd).configure_gpu(gpu)
    @property
    def computer(self):
        return self.builder.computer
Built-in Data Structures and Algorithms in Python
Python Implementation of Various Data Structures
Quick Sort
def quick_sort(_list):
    if len(_list) < 2:
        return _list
    pivot = _list[0]
    left_list = [i for i in _list[1:] if i < pivot]
    right_list = [i for i in _list[1:] if i >= pivot]
    return quick_sort(left_list) + [pivot] + quick_sort(right_list)
Selection Sort
def select_sort(seq):
    n = len(seq)
    for i in range(n - 1):
        min_idx = i
        for j in range(i + 1, n):
            if seq[j] < seq[min_idx]:
                min_idx = j
        if min_idx != i:
            seq[i], seq[min_idx] = seq[min_idx], seq[i]
Insertion Sort
def insertion_sort(_list):
    n = len(_list)
    for i in range(1, n):
        value = _list[i]
        pos = i
        while pos > 0 and value < _list[pos - 1]:
            _list[pos] = _list[pos - 1]
            pos -= 1
        _list[pos] = value
    return _list
Merge Sort
def merge_sorted_list(_list1, _list2):  # merge two sorted lists
    len_a, len_b = len(_list1), len(_list2)
    a = b = 0
    sort = []
    while len_a > a and len_b > b:
        if _list1[a] > _list2[b]:
            sort.append(_list2[b])
            b += 1
        else:
            sort.append(_list1[a])
            a += 1
    if len_a > a:
        sort.extend(_list1[a:])  # extend, not append, so the remaining elements stay flat
    if len_b > b:
        sort.extend(_list2[b:])
    return sort
def merge_sort(_list):
    if len(_list) < 2:
        return _list
    mid = len(_list) // 2
    left = merge_sort(_list[:mid])
    right = merge_sort(_list[mid:])
    return merge_sorted_list(left, right)
Heap Sort using heapq module
from heapq import nsmallest
def heap_sort(_list):
    return nsmallest(len(_list), _list)
Stack
from collections import deque
class Stack:
    def __init__(self):
        self.s = deque()
    def peek(self):
        p = self.pop()
        self.push(p)
        return p
    def push(self, el):
        self.s.append(el)
    def pop(self):
        return self.s.pop()  # pop from the underlying deque, not from self
Queue
from collections import deque
class Queue:
    def __init__(self):
        self.s = deque()
    def push(self, el):
        self.s.append(el)
    def pop(self):
        return self.s.popleft()  # pop from the head of the underlying deque
Binary Search
def binary_search(_list, num):
    if len(_list) < 1:
        return False
    mid = len(_list) // 2
    if num > _list[mid]:
        return binary_search(_list[mid + 1:], num)  # search the right half
    elif num < _list[mid]:
        return binary_search(_list[:mid], num)      # search the left half
    else:
        return True  # found (indices shift after slicing, so return a flag rather than an index)
Interview Strengthening Questions:
About database optimization and design
https://segmentfault.com/a/1190000018426586
-
How to implement a queue using two stacks
-
Reverse a linked list
-
Merge two sorted linked lists
-
Delete a node in a linked list
-
Reverse a binary tree
-
Design a short URL service? 62-base implementation
-
Design a flash sale system (feed stream)?
https://www.jianshu.com/p/ea0259d109f9
-
Why is it better to use auto-incrementing integers as primary keys in MySQL databases? Can UUID be used? Why?
-
If the write order of InnoDB table data can be consistent with the order of the leaf nodes of the B+ tree index, the storage and retrieval efficiency will be the highest. To optimize storage and query performance, auto-incrementing IDs should be used as primary keys.
-
For the primary index of InnoDB, data will be sorted according to the primary key. Due to the unordered nature of UUID, InnoDB will generate huge IO pressure, so it is not suitable to use UUID as a physical primary key; it can be used as a logical primary key, while the physical primary key still uses auto-incrementing IDs. To ensure global uniqueness, UUID should be used to index and associate other tables or as foreign keys.
-
If we are in a distributed system, how do we generate auto-incrementing IDs in databases?
-
Use Redis
-
Implement a distributed lock based on Redis, requiring a timeout parameter
-
setnx
-
setnx + expire
-
If a single Redis node crashes, how to handle it? Are there other industry solutions for implementing distributed locks?
-
Use a consistent hashing algorithm
Cache Algorithms
-
LRU (least-recently-used): Replace the least recently used objects
-
LFU (Least frequently used): If a piece of data has been used very few times in a recent period, it is unlikely to be used in the future
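A minimal LRU cache sketch using OrderedDict (the capacity and method names are illustrative):
from collections import OrderedDict
class LRUCache:
    def __init__(self, capacity=128):
        self.capacity = capacity
        self.data = OrderedDict()
    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)         # mark as most recently used
        return self.data[key]
    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used entry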
Server Performance Optimization Directions
-
Use data structures and algorithms
-
Database
-
Index optimization
-
Eliminate slow queries
-
Enable the slow query log (slow_query_log, slow_query_log_file) to record slow queries
-
Use explain to troubleshoot index issues
-
Adjust data modification indexes
-
Batch operations to reduce IO operations
-
Use NoSQL: e.g., Redis
-
Network IO
-
Batch operations
-
pipeline
-
Cache
-
Redis
-
Asynchronous
-
Asyncio implements asynchronous operations
-
Use Celery to reduce IO blocking
-
Concurrency
-
Multithreading
-
Gevent