Automatically Fetch Kugou Music with Python: Build Your Personal Music Library

In the digital music era, we often wish to have a personal music library to save our favorite songs. Manually downloading music is not only time-consuming but also prone to omissions. Today, I will share how to build a tool using Python to automatically fetch Kugou music, allowing you to easily establish your own music library.

Clear Objective: Efficient Acquisition and Convenient Management

My goal is to create a tool that can automatically fetch songs from the Kugou music website and save them as local files. This way, I can enjoy music anytime and anywhere without worrying about network connectivity issues. At the same time, I hope this tool can support batch downloads of multiple songs and albums for easy management.

Tool Selection: From Simple to Powerful

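To achieve this goal, I chose the requests library in Python to send HTTP requests and the BeautifulSoup library to parse HTML content. Both are easy to use and work well for pages whose content is present in the HTML the server returns. Content that a page loads later with JavaScript does not appear in that HTML and cannot be extracted this way.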

import requests
from bs4 import BeautifulSoup

Step 1: Send Request to Fetch Web Page Content

I first need to send an HTTP request to fetch the HTML content of the Kugou music website. Taking the example of searching for a singer’s songs, I constructed the request URL and sent the request:

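# Note: everything after "#" is a URL fragment. requests does not send it,
# so the server only sees https://www.kugou.com/song/
search_url = "https://www.kugou.com/song/#searchKeyWord=周杰伦"
headers = {
    # A browser-like User-Agent; some sites reject the default one sent by requests
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'
}

# Set a timeout so a stalled connection cannot hang the script
response = requests.get(search_url, headers=headers, timeout=10)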

Check the response status code to ensure the request was successful:

if response.status_code == 200:
    html_content = response.text
else:
    print(f"Request failed, status code: {response.status_code}")
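
The request URL above puts the search keyword after `#`, so it is worth checking which parts of a URL actually reach the server. The sketch below uses only the standard library to split the URL. The `build_search_url` helper, the `https://example.com/search` address, and the `keyword` and `page` parameter names are assumptions made for illustration, not a documented Kugou interface.

```python
from urllib.parse import urlsplit, urlencode

search_url = "https://www.kugou.com/song/#searchKeyWord=周杰伦"
parts = urlsplit(search_url)

# Only scheme, host, path and query are used to build the HTTP request.
# The fragment stays on the client side.
print(parts.path)      # /song/
print(parts.query)     # empty string
print(parts.fragment)  # searchKeyWord=周杰伦


def build_search_url(base, keyword, page=1):
    """Hypothetical helper: put the keyword in a query string instead.

    urlencode percent-encodes non-ASCII text as UTF-8 so it is safe in a URL.
    """
    return f"{base}?{urlencode({'keyword': keyword, 'page': page})}"


example_url = build_search_url("https://example.com/search", "周杰伦", page=2)
print(example_url)
```

If a site does offer a query-string search, parameters built this way are sent to the server, unlike anything placed after `#`.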

Step 2: Parse HTML and Extract Song Information

After obtaining the HTML content, I used BeautifulSoup to parse the page and find the section containing the song list. By examining the webpage source code, I found the song list inside a div tag with the class song-list. Class names like this are tied to the page layout and may change whenever the site is redesigned:

soup = BeautifulSoup(html_content, 'html.parser')
song_list = soup.find('div', class_='song-list')

Extract the name, singer, and play link of each song:

songs = []
# soup.find returns None when no matching div exists, so fall back to an empty list
items = song_list.find_all('div', class_='item') if song_list else []
for item in items:
    title = item.find('span', class_='song-name').get_text(strip=True)
    singer = item.find('span', class_='singer').get_text(strip=True)
    play_link = item.find('a')['href']
    songs.append({'title': title, 'singer': singer, 'play_link': play_link})

Step 3: Get Play Page and Extract Download Link

With the play link, I need to further obtain the actual download link for the song. Typically, the play page contains the source address of the audio:

def get_download_link(play_link):
    """Return the audio source URL found on a play page, or None."""
    play_response = requests.get(play_link, headers=headers, timeout=10)
    if play_response.status_code == 200:
        play_soup = BeautifulSoup(play_response.text, 'html.parser')
        # This only works when the <audio> tag is present in the returned HTML
        audio_tag = play_soup.find('audio')
        if audio_tag and 'src' in audio_tag.attrs:
            return audio_tag['src']
    return None

for song in songs:
    song['download_link'] = get_download_link(song['play_link'])

Step 4: Download Music and Save as Local Files

After obtaining the download link, I save the music content as local files:

import os

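def download_music(download_link, save_path):
    """Stream the audio at download_link into save_path; return True on success."""
    # stream=True avoids loading the whole file into memory at once
    response = requests.get(download_link, headers=headers, stream=True, timeout=30)
    if response.status_code == 200:
        with open(save_path, 'wb') as f:
            for chunk in response.iter_content(chunk_size=8192):
                if chunk:  # skip empty chunks
                    f.write(chunk)
        return True
    return False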

music_folder = "my_music"
os.makedirs(music_folder, exist_ok=True)

for song in songs:
    if song['download_link']:
        file_name = f"{song['title']} - {song['singer']}.mp3"
        save_path = os.path.join(music_folder, file_name)
        if download_music(song['download_link'], save_path):
            print(f"Download successful: {file_name}")
        else:
            print(f"Download failed: {file_name}")
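
Song titles and singer names can contain characters such as `/`, `:` or `?` that are not valid in file names on some systems, which would make `open` fail. A minimal sketch of one way to clean them up follows. The `safe_file_name` helper and its replacement rule are my own assumptions, not part of the original tool.

```python
import re

# Characters that Windows rejects in file names. "/" is also the path
# separator on Linux and macOS.
_ILLEGAL = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_file_name(title, singer, ext=".mp3"):
    """Build "<title> - <singer><ext>" with unsafe characters replaced by "_"."""
    raw = f"{title.strip()} - {singer.strip()}"
    # Trailing dots and spaces are also trimmed, since Windows drops them.
    cleaned = _ILLEGAL.sub("_", raw).rstrip(". ")
    return (cleaned or "untitled") + ext


print(safe_file_name("AC/DC: Live?", " Various "))  # AC_DC_ Live_ - Various.mp3
print(safe_file_name("晴天", "周杰伦"))              # 晴天 - 周杰伦.mp3
```

The result can be passed to `os.path.join(music_folder, ...)` in place of the plain f-string.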

Step 5: Handle Pagination to Fetch All Songs

The search results on Kugou music are usually displayed in pages, so I need to handle pagination logic to fetch more songs:

def get_all_songs(artist, max_pages=5):
    all_songs = []
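    for page in range(1, max_pages + 1):
        # Caution: the text after "#" is a URL fragment and is not sent to the
        # server, so the server cannot see the keyword or the page number
        page_url = f"https://www.kugou.com/song/#searchKeyWord={artist}&page={page}"
        response = requests.get(page_url, headers=headers, timeout=10)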
        if response.status_code == 200:
            soup = BeautifulSoup(response.text, 'html.parser')
            song_list = soup.find('div', class_='song-list')
            if not song_list:
                break  # No more songs, exit loop
            for item in song_list.find_all('div', class_='item'):
                title = item.find('span', class_='song-name').get_text()
                singer = item.find('span', class_='singer').get_text()
                play_link = item.find('a')['href']
                all_songs.append({'title': title, 'singer': singer, 'play_link': play_link})
        else:
            print(f"Request for page {page} failed")
            break
    return all_songs

artist_songs = get_all_songs("周杰伦")
for song in artist_songs:
    song['download_link'] = get_download_link(song['play_link'])
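
The pagination logic can be separated from the network code, which makes it easy to test and lets it guard against a site returning the same page more than once. The sketch below assumes a hypothetical `fetch_page` callable that returns a list of song dicts for a page number, or an empty list once the results run out. `collect_pages` and `fake_site` are illustrative names.

```python
def collect_pages(fetch_page, max_pages=5):
    """Call fetch_page(1), fetch_page(2), ... and merge the results.

    fetch_page is a hypothetical callable: page number in, list of song
    dicts out (an empty list means there are no more pages).
    """
    collected = []
    seen_links = set()
    for page in range(1, max_pages + 1):
        items = fetch_page(page)
        if not items:
            break  # an empty page means we ran past the last one
        for item in items:
            # Skip songs already collected from an earlier page.
            if item["play_link"] not in seen_links:
                seen_links.add(item["play_link"])
                collected.append(item)
    return collected


# Stand-in data source so the sketch runs without network access.
fake_site = {
    1: [{"title": "A", "play_link": "/a"}, {"title": "B", "play_link": "/b"}],
    2: [{"title": "B", "play_link": "/b"}, {"title": "C", "play_link": "/c"}],
}
result = collect_pages(lambda page: fake_site.get(page, []))
print([song["title"] for song in result])  # ['A', 'B', 'C']
```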

Step 6: Batch Download and Manage Music

To support batch downloads, I created a configuration file listing the artists or albums to download:

# config.py
artists = ["周杰伦", "林俊杰", "Taylor Swift"]
albums = ["Album 1", "Album 2"]

The main program reads the configuration file and downloads each song in turn:

from config import artists, albums

def download_all_music():
    for artist in artists:
        songs = get_all_songs(artist)
        for song in songs:
            # get_all_songs only collects play links, so resolve the audio URL here
            download_link = get_download_link(song['play_link'])
            if download_link:
                file_name = f"{song['title']} - {song['singer']}.mp3"
                save_path = os.path.join(music_folder, file_name)
                if download_music(download_link, save_path):
                    print(f"Download successful: {file_name}")
                else:
                    print(f"Download failed: {file_name}")

download_all_music()
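
When a batch job is run repeatedly, it helps to skip songs that are already in the library. The sketch below shows one way to do that. The `pending_downloads` helper is an assumption of mine, and it treats an empty file as a failed earlier download.

```python
import os
import tempfile


def pending_downloads(songs, folder):
    """Return the songs whose target file is missing or empty in folder."""
    pending = []
    for song in songs:
        path = os.path.join(folder, f"{song['title']} - {song['singer']}.mp3")
        if not os.path.exists(path) or os.path.getsize(path) == 0:
            pending.append(song)
    return pending


# Demonstration in a temporary folder, so no real library is touched.
with tempfile.TemporaryDirectory() as folder:
    library = [{"title": "A", "singer": "X"}, {"title": "B", "singer": "Y"}]
    with open(os.path.join(folder, "A - X.mp3"), "wb") as f:
        f.write(b"audio bytes")
    todo = pending_downloads(library, folder)
    print([song["title"] for song in todo])  # ['B']
```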

Optimization and Expansion: From Basic to Advanced

After implementing the basic functionality, I began optimizing and expanding the tool. To improve download speed, I added multi-threading support:

import concurrent.futures

def download_song(song):
    # get_all_songs only collects play links, so resolve the audio URL first
    download_link = get_download_link(song['play_link'])
    if download_link:
        file_name = f"{song['title']} - {song['singer']}.mp3"
        save_path = os.path.join(music_folder, file_name)
        if download_music(download_link, save_path):
            return f"Download successful: {file_name}"
        else:
            return f"Download failed: {file_name}"
    return None

def download_all_music_multithread():
    all_songs = []
    for artist in artists:
        all_songs.extend(get_all_songs(artist))
    
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        results = list(executor.map(download_song, all_songs))
        for result in results:
            if result:
                print(result)

download_all_music_multithread()
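
With `executor.map`, an exception raised in any worker is re-raised while the results are being read, which stops the remaining results from being reported. A variation using `submit` and `as_completed` records each outcome separately. In this sketch `run_downloads`, `download_one` and `fake_download` are hypothetical names, and the fake function stands in for the real download so the example runs offline.

```python
import concurrent.futures


def run_downloads(songs, download_one, max_workers=5):
    """Run download_one(song) in a thread pool and collect every outcome.

    download_one is a hypothetical callable that returns True or False,
    or raises an exception on error.
    """
    succeeded, failed = [], []
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        future_to_song = {pool.submit(download_one, s): s for s in songs}
        for future in concurrent.futures.as_completed(future_to_song):
            song = future_to_song[future]
            try:
                ok = future.result()
            except Exception as exc:
                # One failing song must not hide the results of the others.
                failed.append((song["title"], repr(exc)))
            else:
                if ok:
                    succeeded.append(song["title"])
                else:
                    failed.append((song["title"], "returned False"))
    # Completion order is not deterministic, so sort for stable output.
    return sorted(succeeded), sorted(failed)


def fake_download(song):
    if song["title"] == "bad":
        raise ConnectionError("simulated network error")
    return song["title"] != "missing"


done, errors = run_downloads(
    [{"title": "ok"}, {"title": "bad"}, {"title": "missing"}], fake_download
)
print(done)
print(errors)
```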

To handle potential network issues, I added a retry mechanism:

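from tenacity import retry, stop_after_attempt, wait_fixed

# By default tenacity retries only when the wrapped function raises, so a
# False result is turned into an exception; after the third failed attempt
# tenacity raises RetryError
@retry(stop=stop_after_attempt(3), wait=wait_fixed(2))
def download_music_with_retry(download_link, save_path):
    if not download_music(download_link, save_path):
        raise IOError(f"Download failed: {download_link}")
    return True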
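
For readers who would rather not add a dependency, the same idea can be sketched with the standard library alone. This is an illustration, not how tenacity works internally. `retry_call` and `flaky_download` are hypothetical names, and the doubling delay is my own choice.

```python
import time


def retry_call(func, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call func() until it returns a truthy value or attempts run out.

    Exceptions and falsy results both count as failures. The wait doubles
    after each failed attempt: base_delay, 2 * base_delay, and so on.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            if func():
                return True
            last_error = None
        except Exception as exc:
            last_error = exc
        if attempt < attempts:
            sleep(base_delay * 2 ** (attempt - 1))
    if last_error is not None:
        raise last_error  # the final attempt raised, so surface that error
    return False


# A stand-in download that fails twice, then succeeds.
calls = {"n": 0}


def flaky_download():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated timeout")
    return True


waits = []  # record the delays instead of actually sleeping
print(retry_call(flaky_download, sleep=waits.append))  # True
print(calls["n"], waits)  # 3 [0.5, 1.0]
```

Passing `sleep` as a parameter keeps the example fast and makes the delays easy to inspect. In real use the default `time.sleep` applies.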

Conclusion

By building this tool to automatically fetch music, I not only solved the cumbersome problem of manually downloading music but also learned how to efficiently handle web scraping tasks with Python. This tool allows me to easily establish my own music library and enjoy music anytime, anywhere.

As I realized during development, “Automation is not just a technology, but a way of life.” This statement made me aware that the true value of programming lies in how it changes our lives. Whether it’s fetching music or solving other daily problems, Python can become our most powerful tool.