Weekly insights into the audio and video technology field.
News submissions: [email protected].
TCSVT 2022 | Deep Video Compression Based on In-loop Multi-frame Prediction
This paper proposes an in-loop multi-frame prediction module for an end-to-end deep video compression framework. The module efficiently predicts the current frame from multiple reference frames without consuming any additional bitrate.
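Why multi-reference prediction can be free in bitrate: the decoder already holds the decoded reference frames, so any predictor that depends only on them can be rerun identically on both sides, and only the residual needs to be coded. The toy sketch below illustrates that property under stated assumptions. It replaces the paper's learned in-loop module with a fixed linear extrapolation over two references, and the function names and weights are hypothetical.

```python
from typing import List

Frame = List[float]  # toy representation: one frame as a flat list of pixel values


def predict_from_references(refs: List[Frame], weights: List[float]) -> Frame:
    """Combine decoded reference frames with fixed weights (a hypothetical stand-in
    for a learned multi-frame predictor). Encoder and decoder share the weights,
    so no side information has to be transmitted."""
    if not refs or len(refs) != len(weights):
        raise ValueError("need one weight per reference frame")
    return [sum(w * ref[i] for w, ref in zip(weights, refs)) for i in range(len(refs[0]))]


def encode(current: Frame, refs: List[Frame], weights: List[float]) -> Frame:
    # Only the prediction residual would be entropy-coded (coding step omitted).
    pred = predict_from_references(refs, weights)
    return [c - p for c, p in zip(current, pred)]


def decode(residual: Frame, refs: List[Frame], weights: List[float]) -> Frame:
    # The decoder rebuilds exactly the same prediction from the same decoded references.
    pred = predict_from_references(refs, weights)
    return [r + p for r, p in zip(residual, pred)]


# Two decoded references (t-2, t-1); weights [-1, 2] extrapolate linear motion to t.
refs = [[10.0, 20.0, 30.0], [12.0, 22.0, 32.0]]
weights = [-1.0, 2.0]
current = [14.0, 25.0, 33.0]

residual = encode(current, refs, weights)
print(residual)                         # [0.0, 1.0, -1.0]
print(decode(residual, refs, weights))  # [14.0, 25.0, 33.0]
```

The residual is much smaller than the raw pixel values, which is what lets an entropy coder spend fewer bits; a learned predictor aims to shrink it further.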
Gathering New Momentum in Audio and Video to Explore New Blue Oceans for the Industry
Drawing on video industry trends and pain points, as well as Kuaishou’s own explorations and evolution, the author shares ideas on technological transformation and breakthroughs in search of new growth points for the industry.
NVIDIA Optical Flow SDK Accelerates Motion Processing for Vulkan
https://developer.nvidia.com/blog/accelerated-motion-processing-brought-to-vulkan-with-optical-flow-sdk/
What is Speech Recognition?
This article introduces the basic concepts, working principles, and application scenarios of speech recognition technology. The author also mentions some open-source and commercial speech recognition solutions, such as Google Cloud Speech-to-Text and Twilio Autopilot.
https://www.twilio.com/blog/what-is-speech-recognition
Why Can We Tell How Far Away a Sound Is?
Together has released the RedPajama project, which aims to create a leading, fully open-source model. The project has completed its first step: reproducing the LLaMA training dataset of over 1.2 trillion tokens.
https://www.together.xyz/blog/redpajama
Exclusive Share from an ICLR 2023 Outstanding Paper Award Winner: A Universal Few-shot Learner for Any Dense Prediction Task
When computer vision models learn to “generalize”
In a recent episode of Microsoft Research’s AI Frontiers podcast series, Peter Lee had an in-depth conversation with Ashley Llorens, Vice President of Microsoft Research, in which Lee shared his views on the potential and challenges of large models in healthcare, as well as Microsoft’s research plans for future computing in the era of large models.
Industry insiders estimate: “No more than 1,000 people in the country are capable of R&D in this technology; conservatively, only two or three hundred.” Yet by a rough count, there are already dozens of large model projects on the market. The talent war is heating up.
Liang Jianzhang: How AI Affects the Economy and Various Industries
The question for the future is not what AI can do, but what humans will choose to have AI do.
A reference for every parent concerned about the changes of the times and their children’s growth.
Innovation in foundation models is the core driving force behind progress in computer vision.
Faculty and students at Renmin University of China have surveyed the latest progress and main technical approaches of large language models and written a review of the field that cites or introduces more than 420 related papers, intended as a technical reference for researchers and engineers.
One week after the launch of the Qianwen large model, DingTalk confirmed that it is integrating Qianwen. DingTalk is currently testing large-model integration scenarios, which will go live once the relevant security assessments are completed.
https://ai.googleblog.com/2023/04/beyond-automatic-differentiation.html
DeepSpeed User Guide (Brief Version)
AI Research Knowledge Group
https://zl49so8lbq.feishu.cn/wiki/wikcnLrLDTYCm2uxYKqzCVnCr1c
No need to worry about buying ChatGPT Plus anymore.
Google Assembles “Magi” Project Team to Launch a New AI-Driven Search Engine
The new search engine will provide users with a more personalized experience than Google’s existing search services and will attempt to anticipate user needs. Currently, Google has assembled a team of designers, engineers, and executives responsible for building this brand-new search engine.
AI with “Consciousness”: How to Enable Large Language Models to Have Self-Awareness?
To better explore the relationship between consciousness and artificial intelligence, Professor Zhang Jiang reviews topics including human consciousness research, theories and models of consciousness, self-reference and conscious machines, and self-simulating conscious machines.
OpenAI’s CEO Says the Era of Giant AI Models Is Over
https://www.wired.com/story/openai-ceo-sam-altman-the-age-of-giant-ai-models-is-already-over/
A Gradient Perspective on LoRA: Introduction, Analysis, Speculation, and Generalization
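As background for the LoRA item: LoRA freezes a pretrained weight W0 and learns a low-rank update, W = W0 + BA. By the chain rule, if G = dL/dW is the gradient the full weight would receive, then dL/dB = G Aᵀ and dL/dA = Bᵀ G, so both LoRA gradients are derived from the full-weight gradient G. The sketch below checks these two identities numerically with finite differences on a toy squared-error loss. The matrices, loss, and helper names are illustrative assumptions, and it does not reproduce the article's own analysis.

```python
from typing import Callable, List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def add(a: Matrix, b: Matrix, scale: float = 1.0) -> Matrix:
    """Elementwise a + scale * b."""
    return [[x + scale * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def loss(w: Matrix, x: Matrix, y: Matrix) -> float:
    """Toy loss L = 0.5 * ||W x - y||^2 with column vectors x and y."""
    err = add(matmul(w, x), y, scale=-1.0)
    return 0.5 * sum(v * v for row in err for v in row)


def full_weight_grad(w: Matrix, x: Matrix, y: Matrix) -> Matrix:
    """G = dL/dW = (W x - y) x^T for the toy loss above."""
    return matmul(add(matmul(w, x), y, scale=-1.0), transpose(x))


def numeric_grad(f: Callable[[Matrix], float], m: Matrix, eps: float = 1e-6) -> Matrix:
    """Central finite differences of scalar f with respect to every entry of m."""
    out = []
    for i in range(len(m)):
        row = []
        for j in range(len(m[0])):
            plus = [r[:] for r in m]
            minus = [r[:] for r in m]
            plus[i][j] += eps
            minus[i][j] -= eps
            row.append((f(plus) - f(minus)) / (2 * eps))
        out.append(row)
    return out


# Frozen W0 (3x2) plus a rank-1 LoRA update B (3x1) @ A (1x2).
W0 = [[0.5, -1.0], [2.0, 0.0], [1.0, 1.0]]
B = [[0.1], [0.2], [-0.3]]
A = [[1.0, -2.0]]
x = [[1.0], [3.0]]
y = [[0.0], [1.0], [2.0]]

G = full_weight_grad(add(W0, matmul(B, A)), x, y)
grad_B = matmul(G, transpose(A))  # dL/dB = G A^T
grad_A = matmul(transpose(B), G)  # dL/dA = B^T G

num_B = numeric_grad(lambda b: loss(add(W0, matmul(b, A)), x, y), B)
num_A = numeric_grad(lambda a: loss(add(W0, matmul(B, a)), x, y), A)

max_err = max(abs(p - q)
              for ana, num in ((grad_B, num_B), (grad_A, num_A))
              for ra, rn in zip(ana, num) for p, q in zip(ra, rn))
print(f"max |analytic - numeric| = {max_err:.2e}")
```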
DINOv2: Learning Robust Visual Features Without Supervision
https://github.com/facebookresearch/dinov2
What is Emergence?
MIT Experts Discuss Generative AI: Approach the Potential of These Models with Humility and Keep Learning
How Can AIGC Be Used for Recommendation? A New Paper from USTC: “Generative Recommendation: Towards a New Paradigm for Next-Generation Recommendation Systems”
This paper proposes GeneRec, a new generative recommendation paradigm that serves users’ personalized information needs by combining content generation with instruction guidance. The authors also stress the importance of various fidelity checks to ensure that generated content is trustworthy.
Unveiling the Harsh Truth Behind the Buzz of Auto-GPT!
Is Auto-GPT a groundbreaking project or an overhyped AI experiment? This article uncovers the truth behind the buzz and reveals the production limitations of Auto-GPT that make it unsuitable for practical applications.
Adobe Firefly Also Begins Supporting Video
Adobe brings generative AI into video editing, allowing algorithms to assist users in generating desired video effects.
https://research.nvidia.com/labs/toronto-ai/VideoLDM/
Microsoft Open-Sources a “Foolproof” Training Tool for ChatGPT-Style Models, Greatly Reducing Costs and Speeding Up Training by 15x
DeepSpeed Chat is built on Microsoft’s DeepSpeed deep learning optimization library. It covers training, inference, and other stages of the RLHF (Reinforcement Learning from Human Feedback) pipeline, running more than 15 times faster than existing systems while significantly reducing costs.
Amazon EC2 Inf2 Instances Are Now Generally Available, Offering Low-Cost, High-Performance Generative AI Inference
https://aws.amazon.com/cn/blogs/aws/amazon-ec2-inf2-instances-for-low-cost-high-performance-generative-ai-inference-are-now-generally-available/
Intel’s Core i5 Is One of the Most Cost-Effective CPUs Right Now, but Which One Is Right for You?
https://arstechnica.com/gadgets/2023/04/intels-core-i5-is-the-best-bargain-in-cpus-right-now-but-which-should-you-get/
The World’s First 3nm Chip Officially Released
According to Marvell, the industry’s first silicon building blocks at this node include 112G XSR SerDes (serializer/deserializer), Long Reach SerDes, PCIe Gen 6 / CXL 3.0 SerDes, and a 240 Tbps parallel chip-to-chip interconnect.
Amazon CEO Says AWS Employees Now Spend “Most of Their Time” Helping Customers Optimize Their Cloud Costs
https://www.theregister.com/2023/04/17/amazon_annual_shareholder_letter_aws/
PAG 4.2 Officially Released: New 3D Layers and Video Replacement, Significantly Improved UI Playback Performance
Image Classification Using Flux.jl
BP-EVD: A Real-Time Video Denoising Method
This article presents a deep learning-based video denoising method that makes clever use of temporal data to achieve high-quality, real-time video denoising.
How to Systematically Learn Machine Vision Technology?
This article summarizes core knowledge of machine vision; readers who want to learn the field may want to bookmark it.
Building Real-Time Interaction (RTI) Capabilities for Metaverse Scenarios
At LiveVideoStack 2022 Beijing, experts from ZEGO introduced the underlying technical capabilities they have built for metaverse scenarios.
How Edison Helps Us Build a Faster, More Powerful Dropbox on the Web
https://dropbox.tech/frontend/edison-webserver-a-faster-more-powerful-dropbox-on-the-web
A Detailed Look at the NAB 2023 Show Floor
https://www.sportsvideo.org/2023/04/19/sportstechbuzz-at-nab-2023-wednesdays-latest-from-vegas/
2023 Spring Volcano Engine “FORCE·Original Power” Conference
On April 18, the 2023 Spring Volcano Engine “FORCE·Original Power” Conference was held in Shanghai, showcasing the latest explorations, applications, and practices of Volcano Engine in cloud technology, cloud services, and cloud scenarios, presenting a strategic blueprint for innovative development.
BlikVM’s Open Source KVM-over-IP Solution
It lets you remotely control and manage other computers over the network using devices built on Raspberry Pi CM4 or Allwinner H616 processors. BlikVM is driven by a PCIe card based on the Raspberry Pi HAT design, which transmits video signals and USB input/output over the network.
https://www.cnx-software.com/2023/04/18/blikvm-open-source-kvm-over-ip-raspberry-pi-cm4-raspberry-pi-hat-pcie-board-allwinner-h616/
CNCF: Fuzzing Open Source Projects for Security and Reliability
https://www.cncf.io/blog/2023/04/18/cncf-fuzzing-open-source-projects-for-security-and-reliability/
The State of Video Codecs 2023
https://www.streamingmedia.com/Articles/Editorial/Featured-Articles/The-State-of-Video-Codecs-2023-158116.aspx
CVPR 2019 | Practical Full Resolution Learned Lossless Image Compression
This paper proposes L3C, the first practical learned lossless image compression system, and shows that it outperforms the popular hand-engineered codecs PNG, WebP, and JPEG2000.
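Learned lossless codecs such as L3C pair a probability model with an entropy coder: an entropy coder such as arithmetic coding spends close to -log2 p(x) bits on a symbol that the model assigns probability p(x), so a sharper model directly means smaller files, and results are reported in bits per subpixel (bpsp). The sketch below computes that ideal code length for two toy models. The probability tables are made up for illustration and are not L3C's model.

```python
import math
from typing import Dict, List

Distribution = Dict[int, float]  # maps an 8-bit subpixel value to its model probability


def ideal_code_length_bits(symbols: List[int], dists: List[Distribution]) -> float:
    """Sum of -log2 p(symbol): the size an ideal entropy coder approaches."""
    return sum(-math.log2(d[s]) for s, d in zip(symbols, dists))


def bits_per_subpixel(symbols: List[int], dists: List[Distribution]) -> float:
    return ideal_code_length_bits(symbols, dists) / len(symbols)


def sharp(true_value: int) -> Distribution:
    """Hypothetical stand-in for a model that predicts a value well from context:
    probability 0.5 on the true value, the remaining 0.5 spread over the other 255."""
    d = {v: 0.5 / 255 for v in range(256)}
    d[true_value] = 0.5
    return d


subpixels = [52, 55, 61, 59]

# Model 1: uniform over 256 values, i.e. no knowledge about the image.
uniform = [{v: 1 / 256 for v in range(256)} for _ in subpixels]
# Model 2: the idealized "sharp" model above.
confident = [sharp(s) for s in subpixels]

print(bits_per_subpixel(subpixels, uniform))    # 8.0
print(bits_per_subpixel(subpixels, confident))  # 1.0
```

In a real codec the model can only condition on already-decoded context, which is where learned architectures earn their gains over hand-engineered ones.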
Non-linear Vector Transform Coding – Exploring a New Coding Framework
The paper proposes a VQ codebook initialization strategy that addresses the difficulty of jointly optimizing multi-level VQ.
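The blurb does not spell out the paper's initialization strategy. For context, in multi-level (residual) VQ each stage quantizes the residual left by the previous stages, and the reconstruction is the sum of the selected codewords. A common baseline, not necessarily the paper's proposal, initializes the codebooks greedily: fit stage 1 with k-means on the data, then fit each later stage on the residuals of the stages before it, before any joint fine-tuning. A minimal pure-Python sketch of that baseline, with hypothetical function names and toy 1-D data:

```python
from typing import List, Sequence, Tuple

Vec = Tuple[float, ...]


def sqdist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(v: Vec, codebook: List[Vec]) -> int:
    """Index of the codeword closest to v (squared Euclidean distance)."""
    return min(range(len(codebook)), key=lambda i: sqdist(v, codebook[i]))


def kmeans(data: List[Vec], k: int, iters: int = 20) -> List[Vec]:
    """Lloyd's k-means with deterministic farthest-point seeding."""
    centers = [data[0]]
    while len(centers) < k:
        centers.append(max(data, key=lambda v: min(sqdist(v, c) for c in centers)))
    for _ in range(iters):
        buckets: List[List[Vec]] = [[] for _ in range(k)]
        for v in data:
            buckets[nearest(v, centers)].append(v)
        # Empty clusters keep their previous center.
        centers = [tuple(sum(col) / len(b) for col in zip(*b)) if b else centers[j]
                   for j, b in enumerate(buckets)]
    return centers


def init_residual_vq(data: List[Vec], num_stages: int, k: int) -> List[List[Vec]]:
    """Greedy stage-wise init: each codebook is fit to the residuals of earlier stages."""
    residuals = list(data)
    codebooks = []
    for _ in range(num_stages):
        cb = kmeans(residuals, k)
        codebooks.append(cb)
        residuals = [tuple(x - y for x, y in zip(r, cb[nearest(r, cb)])) for r in residuals]
    return codebooks


def quantize(v: Vec, codebooks: List[List[Vec]]) -> Vec:
    """Greedy multi-stage encoding; the reconstruction is the sum of chosen codewords."""
    recon: Vec = tuple(0.0 for _ in v)
    for cb in codebooks:
        residual = tuple(x - y for x, y in zip(v, recon))
        recon = tuple(x + y for x, y in zip(recon, cb[nearest(residual, cb)]))
    return recon


data: List[Vec] = [(0.0,), (1.0,), (10.0,), (11.0,)]
codebooks = init_residual_vq(data, num_stages=2, k=2)
print([quantize(v, codebooks[:1]) for v in data])  # [(0.5,), (0.5,), (10.5,), (10.5,)]
print([quantize(v, codebooks) for v in data])      # [(0.0,), (1.0,), (10.0,), (11.0,)]
```

With one stage the coarse codebook can only approximate each point; the second stage, fit to the leftover residuals, refines the reconstruction. Greedy initialization like this gives joint optimization a sensible starting point but does not by itself make the stages cooperate.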
NVIDIA Quietly Monopolizes Computing Power: The New Empire Behind AI
Scaling up computing power, together with its technology development and strategic positioning, explains NVIDIA’s success.
How Japan Uses AI to Solve Elderly Travel Problems
Tokyo Haneda Airport has launched self-driving wheelchairs for elderly and mobility-impaired passengers, achieving automated driving from the security check to the boarding gate.
Event Recommendations

【Open Class】Open XCDN Live Streaming Solution: Design and Practice
At 19:00 on April 25, Ke Yugang, Technical Architect of Video Cloud at Baidu Intelligent Cloud, will introduce a live streaming solution built on the HTTP/3 protocol. He will explain in detail how a unified protocol coordinates cloud, edge, and device resources at every tier, how an open architecture enables interoperability among services from multiple vendors, and how complex edge resources can be used efficiently to deliver fast video loading and stable playback.
Time: April 25, 2023, 19:00
Registration: Scan the QR code in the image or click 【Read Original】 to register and watch the live broadcast!