Weekly Insights into Audio and Video Technology

Weekly insights into the audio and video technology field.

News submissions: [email protected].

TCSVT 2022 | Deep Video Compression Based on In-loop Multi-frame Prediction

This article proposes an in-loop multi-frame prediction module based on an end-to-end deep video compression framework, achieving efficient prediction of the current frame based on multiple reference frames without additional bitrate consumption.

Gathering New Energies in Audio and Video to Explore New Blue Oceans in the Industry

The author shares ideas on technological transformations and breakthroughs based on video industry trends and pain points, combined with Kuaishou’s own explorations and evolution, seeking new growth points in the industry.

NVIDIA Optical Flow SDK Accelerates Motion Processing for Vulkan

NVOFA is a dedicated hardware unit on new NVIDIA GPUs for high-performance computation of optical flow between two images. The NVIDIA Optical Flow SDK exposes a developer API that allows users to leverage the power of NVOFA hardware in applications.

https://developer.nvidia.com/blog/accelerated-motion-processing-brought-to-vulkan-with-optical-flow-sdk/

What is Speech Recognition?

This article introduces the basic concepts, working principles, and application scenarios of speech recognition technology. The author also mentions some open-source and commercial speech recognition solutions, such as Google Cloud Speech-to-Text and Twilio Autopilot.

https://www.twilio.com/blog/what-is-speech-recognition

Why Can We Judge the Distance of Sound?

This article discusses the importance of binaural hearing in distance perception and details four key parameters that affect it: sound pressure level, the ratio of direct to reverberant sound energy, frequency spectrum, and binaural differences.
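
Of the four cues, sound pressure level is the easiest to put into numbers. The sketch below is not taken from the article. It assumes an idealized point source in a free field (no reflections), where level drops by about 6 dB for every doubling of distance. The function name `spl_at_distance` and the 70 dB reference value are hypothetical.

```python
import math


def spl_at_distance(spl_ref_db: float, ref_distance_m: float, distance_m: float) -> float:
    """Estimate the level at distance_m from a level measured at ref_distance_m.

    Assumption: inverse-square law only (free field, point source).
    Real rooms add reverberation, which is a separate distance cue.
    """
    if ref_distance_m <= 0 or distance_m <= 0:
        raise ValueError("distances must be positive")
    return spl_ref_db - 20.0 * math.log10(distance_m / ref_distance_m)


if __name__ == "__main__":
    # Hypothetical source measured at 70 dB SPL from 1 m away.
    for d in (1.0, 2.0, 4.0, 8.0):
        print(f"{d:.0f} m: {spl_at_distance(70.0, 1.0, d):.1f} dB SPL")
```

Level alone is ambiguous, since a quiet nearby source and a loud distant one can produce the same value, which is why the other three cues matter.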

Wang Bo Discusses Acoustics | Subjective Evaluation Methods for Audio – MUSHRA

This article explores the technical challenges and HBK’s solutions from the aspects of subjective attributes of audio perception, evaluation methods, and objective parameter measurements.

RedPajama Model Released, Trillions of Data and Open Source

Together has released the RedPajama project, which aims to create leading, fully open-source models. The project has completed its first step: reproducing the LLaMA training dataset, with over 1.2 trillion tokens.

https://www.together.xyz/blog/redpajama

Exclusive Share from ICLR 2023 Outstanding Paper Award Winner: A General Few-shot Learner Adapted for Any Dense Prediction Task

When computer vision models learn to “generalize”

Conversation with Peter Lee: Opportunities and Challenges of Large Models in Healthcare

Recently, in the latest AI frontier podcast series at Microsoft Research, Peter Lee had an in-depth conversation with Ashley Llorens, Vice President of Microsoft Research, expressing his views on the potential and challenges of large models in healthcare, as well as Microsoft’s research plans for future computing under the trend of large models.

Monthly Salaries of 100k: The Frenzied Talent Hunt for Large Models

Industry insiders predict: “There should be no more than 1,000 people in the country capable of conducting related technology R&D, conservatively speaking only two or three hundred.” However, roughly calculated, there are already dozens of large model projects on the market. The talent war is heating up.

Liang Jianzhang: How AI Affects the Economy and Various Industries

The question for the future is not what AI can do, but what humans will choose to have AI do.

How to Talk to Kids About ChatGPT: A Complete Parent’s Guide in the Age of AI

A reference for every parent concerned about the changes of the times and their children’s growth.

A New Approach to Visual Neural Network Architecture Design Towards “Large” and “Unified”

Innovations in foundational models are the core driving force of visual development.

Overview of Large Language Models

Faculty and students from Renmin University of China have surveyed the latest progress and main technical approaches in large language models. Their review article cites or introduces over 420 related papers and is intended as a technical reference for researchers and engineers.

DingTalk Integrates the Qianwen Large Model, Promising Full Intelligence in the Future

One week after the launch of the Qianwen large model, DingTalk confirmed that it is integrating Qianwen. DingTalk is currently testing scenarios that integrate large models and will go live once the relevant security assessments are completed.

Solving Various Problems Encountered in Deep Learning—Automatic Differentiation Methods—JAX (Just Another XLA)

Compared with widely used automatic differentiation tools, JAX offers greater flexibility and scalability and can run on multiple platforms, including CPU, GPU, and TPU. Another advantage is that JAX is used from Python and provides NumPy- and SciPy-style APIs.

https://ai.googleblog.com/2023/04/beyond-automatic-differentiation.html
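
To make the term "automatic differentiation" concrete, here is a minimal sketch of forward-mode differentiation with dual numbers in plain Python. It illustrates the general idea only. It is not JAX's implementation or API, and the names `Dual`, `sin`, `derivative`, and `f` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A number paired with its derivative with respect to the input."""
    value: float
    deriv: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def _lift(x):
    """Treat plain numbers as constants (derivative 0)."""
    return x if isinstance(x, Dual) else Dual(float(x))


def sin(x: Dual) -> Dual:
    # Chain rule: sin(u)' = cos(u) * u'
    return Dual(math.sin(x.value), math.cos(x.value) * x.deriv)


def derivative(fn, x: float) -> float:
    """Evaluate fn at x with a seed derivative of 1 and read off d(fn)/dx."""
    return fn(Dual(x, 1.0)).deriv


def f(x):
    return x * x * 3 + sin(x)  # f(x) = 3x^2 + sin(x)


if __name__ == "__main__":
    print(derivative(f, 2.0))  # analytically: 6*2 + cos(2)
```

The derivative is exact up to floating-point rounding because each operation propagates its own derivative rule, rather than approximating with finite differences.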

DeepSpeed User Guide (Brief Version)

This article aims to briefly introduce the core concepts of using DeepSpeed for large-scale model training and the most basic usage methods.

http://e.betheme.net/article/show-1318637.aspx?action=onClick

AI Research Knowledge Group

A collection of mainstream AI tools, including ChatGPT, Midjourney, and AI painting and video tools.

https://zl49so8lbq.feishu.cn/wiki/wikcnLrLDTYCm2uxYKqzCVnCr1c

The World’s Largest Open Source Alternative to ChatGPT Has Arrived, Supporting 35 Languages

No need to worry about buying ChatGPT Plus anymore.

Google Forms “Magi” Project Team to Launch New AI-Driven Search Engine

The new search engine will provide users with a more personalized experience than Google’s existing search services and will attempt to anticipate user needs. Currently, Google has assembled a team of designers, engineers, and executives responsible for building this brand-new search engine.

AI with “Consciousness”: How to Enable Large Language Models to Have Self-Awareness?

To better explore the relationship between consciousness and artificial intelligence, Professor Zhang Jiang has sorted out topics such as human consciousness research, consciousness theory and modeling, self-reference and consciousness machines, and self-simulating consciousness machines.

OpenAI’s CEO States that the Era of Giant AI Models is Over

He believes that as large-scale pre-trained models consume a lot of computing resources and energy and have issues related to data privacy and environmental sustainability, future AI technology development will shift towards smaller, more interpretable, and more environmentally friendly models.

https://www.wired.com/story/openai-ceo-sam-altman-the-age-of-giant-ai-models-is-already-over/

A Gradient Perspective on LoRA: Introduction, Analysis, Speculation, and Extension

DINOv2: Learning Robust Visual Features Without Supervision

https://github.com/facebookresearch/dinov2

What is Emergence?

MIT Experts Discuss Generative AI, Should Approach Model Potentials with Humility and Continue Learning

How Is AIGC Used for Recommendation? Latest Paper from USTC: “Generative Recommendation: Towards a New Paradigm for Next-Generation Recommendation Systems”

This paper proposes a new generative recommendation system paradigm GeneRec, which serves users’ personalized information needs by combining content generation and instruction guidance. Additionally, the author emphasizes the importance of various fidelity checks to ensure the credibility of generated content.

Unveiling the Harsh Truth Behind the Buzz of Auto-GPT!

Is Auto-GPT a groundbreaking project or an overhyped AI experiment? This article uncovers the truth behind the buzz and reveals the production limitations of Auto-GPT that make it unsuitable for practical applications.

Adobe Firefly Also Begins Supporting Video

Adobe brings generative AI into video editing, allowing algorithms to assist users in generating desired video effects.

NVIDIA Releases Text-to-Video Model Video LDM

https://research.nvidia.com/labs/toronto-ai/VideoLDM/

Microsoft Open Sources “Foolproof” ChatGPT Model Training Tool, Greatly Reducing Costs and Speeding Up by 15 Times

DeepSpeed Chat is built on Microsoft’s DeepSpeed deep learning optimization library. It provides training, inference, and other functions, and speeds up RLHF (Reinforcement Learning from Human Feedback) training by over 15 times while significantly reducing costs.

Amazon EC2 Inf2 Has Officially Launched, Providing Low-Cost, High-Performance Generative AI Inference Services

This article details the characteristics and advantages of Inf2 instances and offers guidance on using them for generative AI inference.

https://aws.amazon.com/cn/blogs/aws/amazon-ec2-inf2-instances-for-low-cost-high-performance-generative-ai-inference-are-now-generally-available/

Intel’s Core i5 Processor is One of the Most Cost-Effective CPUs Currently, but Which One is More Suitable for You?

The author mentions that the Core i5 processor strikes a good balance between price and performance, meeting the needs of most users. However, different models of Core i5 processors have different specifications and characteristics, such as core count, clock frequency, cache size, etc., and choices should be made based on one’s usage needs and budget.

https://arstechnica.com/gadgets/2023/04/intels-core-i5-is-the-best-bargain-in-cpus-right-now-but-which-should-you-get/

The World’s First 3nm Chip Officially Released

According to Marvell, the industry’s first silicon building blocks at this node include 112G XSR SerDes (serializer/deserializer), Long Reach SerDes, PCIe Gen 6 / CXL 3.0 SerDes, and a 240 Tbps parallel chip-to-chip interconnect.

Amazon CEO States AWS Employees Now Spend “Most of Their Time” Optimizing Customer Cloud

Amazon CEO Andy Jassy stated that AWS is building a more secure, reliable, efficient, and environmentally friendly cloud computing infrastructure while also expanding new products and services to meet customer needs.

https://www.theregister.com/2023/04/17/amazon_annual_shareholder_letter_aws/

PAG 4.2 Version Officially Released: New 3D Layer and Video Replacement Capabilities, Significantly Optimized UI Playback Performance

PAG 4.2 adds support for the highly requested 3D layers and optimizes UI and list scenes in which multiple PAG animations play simultaneously. It also packages capabilities for vertical use cases, such as video post-editing and material encryption, to meet specific user needs.

Image Classification Using Flux.jl

National Standards for AI Model Technology Officially Released, Global Standard System Layout Basically Formed

BP-EVD: A Real-Time Video Denoising Method

This article presents a deep learning-based video denoising method that cleverly arranges the utilization of data in the time domain to achieve high-quality real-time video denoising.

How to Systematically Learn Machine Vision Technology?

This article summarizes key knowledge about machine vision and is worth bookmarking for anyone who wants to learn the field.

Real-Time Interactive RTI Technology Capability Construction in the Metaverse Scenario

LiveVideoStack 2022 Beijing Station invited experts from ZEGO to introduce the underlying technology capabilities built in the metaverse scenario.

How Edison Helps Us Build a Faster, More Powerful Dropbox on the Web

Dropbox has rewritten its core web service stack for the next decade: decommissioning the technical debt accumulated over the past 13 years and migrating high-traffic surfaces to a future-proofed platform to accommodate the company’s multi-product evolution.

https://dropbox.tech/frontend/edison-webserver-a-faster-more-powerful-dropbox-on-the-web

NAB Exhibition Zone Detailed Explanation

This article introduces the booths and new technologies at NAB; interested readers can take a look.

https://www.sportsvideo.org/2023/04/19/sportstechbuzz-at-nab-2023-wednesdays-latest-from-vegas/

2023 Spring Volcano Engine “FORCE·Original Power” Conference

On April 18, the 2023 Spring Volcano Engine “FORCE·Original Power” Conference was held in Shanghai, showcasing the latest explorations, applications, and practices of Volcano Engine in cloud technology, cloud services, and cloud scenarios, presenting a strategic blueprint for innovative development.

BlikVM’s Open Source KVM-over-IP Solution

It allows you to remotely control and manage other computers over the network using devices with Raspberry Pi CM4 or Allwinner H616 processors. BlikVM is driven by a PCIe card designed based on the Raspberry Pi HAT, which provides the capability to transmit video signals and USB input/output over the network.

https://www.cnx-software.com/2023/04/18/blikvm-open-source-kvm-over-ip-raspberry-pi-cm4-raspberry-pi-hat-pcie-board-allwinner-h616/

CNCF: Fuzzing Open Source Projects for Security and Reliability

This article introduces the CNCF fuzzing effort, its results, and its two goals: 1. Expand existing setups to include more fuzzers and integrate more projects into OSS-Fuzz; 2. Improve the sustainability of fuzz testing work by increasing maintainer participation and education.

https://www.cncf.io/blog/2023/04/18/cncf-fuzzing-open-source-projects-for-security-and-reliability/

The State of Video Codecs 2023

Although HEVC is an efficient codec, its licensing fees and patent restrictions are making AV1 an increasingly popular choice.

https://www.streamingmedia.com/Articles/Editorial/Featured-Articles/The-State-of-Video-Codecs-2023-158116.aspx

CVPR 2019 | Practical Full-Resolution Learned Lossless Image Compression

This article proposes L3C, the first practical learned lossless image compression system, and shows that it outperforms the popular hand-engineered codecs PNG, WebP, and JPEG 2000.

Non-linear Vector Transform Coding – Exploring a New Coding Framework

This article proposes a VQ (vector quantization) codebook initialization strategy to solve the joint optimization problem in multi-level VQ.

NVIDIA Quietly Monopolizes Computing Power: The New Empire Behind AI

The expansion of computing power and the development and layout of technology are the reasons for NVIDIA’s success.

How Japan Uses AI to Solve Elderly Travel Problems

Tokyo Haneda Airport has launched self-driving wheelchairs for elderly and mobility-impaired passengers, achieving automated driving from the security check to the boarding gate.

Event Recommendations

LiveVideoStackCon 2023 Shanghai Station: Call for Speakers

LiveVideoStackCon is a stage for everyone. If you are a key member of your team or company, have years of hands-on experience in a specific field or technology, and enjoy technical exchange, you are welcome to apply to speak at LiveVideoStackCon. Please submit your talk proposal by email: [email protected].

https://sh2023.livevideostack.cn/

[Open Course] Design and Practice of an Open XCDN Live Streaming Solution

On April 25 at 19:00, Ke Yugang, Video Cloud Technical Architect at Baidu Intelligent Cloud, will introduce a live streaming solution based on the HTTP/3 protocol. He will analyze in detail how to use a unified protocol to coordinate cloud, edge, and device resources at all levels, how an open architecture enables interoperability between services from multiple vendors, and how to use complex edge resources efficiently for fast loading and stable video playback.

Time: April 25, 2023, 19:00

Registration: Scan the QR code in the image or click [Read Original] to register and watch the live broadcast!
