Practical Python: Batch Retrieve Street Information from Gaode Map API and Write Back to Excel

In our daily data processing work, we often need to batch retrieve the street information based on the registered addresses of companies, events, or stores, such as “Zhangjiang Town, Pudong New District” or “Longhua Street, Xuhui District”. Manual queries are obviously inefficient, but with Python + Gaode Map API, we can easily automate batch queries and write the results into an Excel file.

This article will fully demonstrate a practical script that reads addresses from Excel β†’ calls Gaode API β†’ retrieves streets β†’ writes back to Excel, and explain the implementation details and optimization ideas.

1. Function Overview

The functionality of this script can be summarized in four steps:

  1. 1. Read address data from the Excel file;
  2. 2. Call Gaode Map’s geocoding (geocode) and reverse geocoding (regeo) interfaces to obtain street names;
  3. 3. Automatically write the query results back to a new column in Excel;
  4. 4. Retry and log for addresses that fail to query, ensuring data completeness as much as possible.

2. Project Dependencies and Preparation

Before starting, please ensure the following dependencies are installed:

pip install pandas openpyxl requests

And apply for an API Key on the Gaode Open Platform, the application address is:πŸ‘‰ https://lbs.amap.com/api/webservice/guide/create-project

After obtaining the key, fill it in the configuration section at the beginning of the script:

key = "Your_Gaode_API_KEY"

3. Core Logic Explanation

1. Reading Excel Files and Column Processing

The script uses <span>pandas</span> and <span>openpyxl</span> to read the Excel file:

df = pd.read_excel(input_file)
if 'ζ³¨ε†Œεœ°ε€' not in df.columns:
    df['ζ³¨ε†Œεœ°ε€'] = df.iloc[:,16]
addresses = df['ζ³¨ε†Œεœ°ε€'].tolist()

This code first reads the entire Excel file and checks if the “registered address” column exists; if not, it automatically takes the 17th column (index 16) as the address column to ensure compatibility with different formats of tables.

Then, the script uses <span>openpyxl</span> to open the same file to preserve cell styles and prepare to write a new “street” column:

wb = load_workbook(input_file)
ws = wb.active
ws.insert_cols(target_col)
ws.cell(row=header_row_index, column=target_col, value="著道")

This allows for data reading while maintaining the original format of the table, making it easier for downstream personnel to view directly.

2. Calling Gaode API to Retrieve Street Information

The core query function is as follows:

def get_street_from_amap(address, retries=max_retries):
    if not isinstance(address, str) or not address.strip():
        return ""
    for attempt in range(1, retries+1):
        try:
            geo_resp = requests.get(
                "https://restapi.amap.com/v3/geocode/geo",
                params={"key": key, "address": address, "city": "上桷"},
                timeout=15
            ).json()

            if not geo_resp.get("geocodes"):
                continue

            location = geo_resp["geocodes"][0]["location"]

            regeo_resp = requests.get(
                "https://restapi.amap.com/v3/geocode/regeo",
                params={"key": key, "location": location, "extensions": "base", "radius":500},
                timeout=15
            ).json()

            if regeo_resp.get("regeocode"):
                township = regeo_resp["regeocode"]["addressComponent"].get("township","") or ""
                return township
        except Exception as e:
            print(f"[Attempt {attempt}/{retries}] Address query failed: {address}, Error: {e}")
        time.sleep(sleep_time + random.random()*0.5)
    return None

This logic is divided into two steps:

  1. 1. Forward Geocoding (geocode): Obtain latitude and longitude based on the address string;
  2. 2. Reverse Geocoding (regeo): Retrieve the street name (township) based on latitude and longitude.

It also includes an exception retry mechanism and random delay, to prevent frequent requests from triggering Gaode API rate limiting.

3. Batch Query and Cache Optimization

The query process is implemented through a loop:

cache = {}
failed_addresses = []

for row_idx, addr in enumerate(addresses, start=header_row_index+1):
    if not isinstance(addr,str) or not addr.strip():
        ws.cell(row=row_idx, column=target_col, value="")
        continue
    if addr in cache:
        township = cache[addr]
    else:
        township = get_street_from_amap(addr)
        if township is None:
            failed_addresses.append((row_idx, addr))
            township = ""
        cache[addr] = township
        time.sleep(sleep_time + random.random()*0.5)
    ws.cell(row=row_idx, column=target_col, value=township)

There are several optimization points here:

  • β€’ Cache Mechanism: If the same address appears multiple times, only request once;
  • β€’ Delay Strategy: <span>sleep_time + random.random()*0.5</span><span>, to avoid being flagged by the API;</span>
  • β€’ Real-time Progress Output: Print progress every 50 rows.

4. Failure Retry and Error Logging

For addresses that fail on the first query, the script will automatically initiate a second round of re-query:

if failed_addresses:
    print(f"A total of {len(failed_addresses)} addresses failed on the first query, starting automatic re-query...")
    still_failed = []
    for row_idx, addr in failed_addresses:
        township = get_street_from_amap(addr)
        if township is None:
            still_failed.append((row_idx, addr))
            township = ""
        cache[addr] = township
        ws.cell(row=row_idx, column=target_col, value=township)
        time.sleep(sleep_time + random.random()*0.5)
    failed_addresses = still_failed

Addresses that still fail to query will be written into a separate Excel file:

if failed_addresses:
    df_fail = pd.DataFrame([addr for _, addr in failed_addresses], columns=["εœ°ε€"])
    df_fail.to_excel(failed_file, index=False)

This makes it convenient for manual secondary processing, such as manually adjusting address formats or supplementing missing information.

4. Running Results

After executing the script, the console will display output similar to:

Processed 50 rows, most recent address: 123 Zhangjiang Road, Pudong New District, Shanghai β†’ Zhangjiang Town
Processed 100 rows, most recent address: 56 Chuansha Road, Pudong New District, Shanghai β†’ Chuansha New Town
A total of 5 addresses failed on the first query, starting automatic re-query...
Completed, saved: Event List - Shanghai Pudong - with Streets.xlsx
Addresses that still failed have been saved to Failed Address Query.xlsx

The final output file will have a new column “Street”, while preserving the original format:

Registered Address Street
123 Zhangjiang Road, Pudong New District, Shanghai Zhangjiang Town
56 Chuansha Road, Pudong New District, Shanghai Chuansha New Town

5. Practical Suggestions and Expansion Directions

  1. 1. Batch Query Speed Control
  • β€’ Gaode API has request frequency limits for a single IP, it is recommended to control the number of requests per second.
  • β€’ If the data volume is large, consider a multi-threaded + rate-limited queue model.
  • 2. Address Cleaning Preprocessing
    • β€’ You can first perform regex cleaning on the addresses to remove excess punctuation, parentheses, spaces, etc., to improve hit rates.
  • 3. Multi-City Adaptation
    • β€’ The current city is fixed as “Shanghai”, which can be configured through parameters for nationwide adaptation.
  • 4. Error Log Recording
    • β€’ It is recommended to output more logs during the re-query phase, such as return status codes and error types, for easier debugging.
  • 5. API Alternative Solutions
    • β€’ If the data volume is huge, you can use Gaode Map’s batch geocoding interface (supports up to 10 entries at a time) to further improve efficiency.

    6. Conclusion

    This article demonstrates how to use Python + Gaode Map API to achieve automated processing of “batch addresses β†’ street affiliation” through a practical case. The entire process covers data reading, interface calling, exception retrying, and result writing back, making it both a practical tool script and a reflection of Python’s powerful capabilities in data automation.

    Core Highlights:

    Module Function
    pandas + openpyxl Efficient reading and writing of Excel
    requests Call Gaode API for geographic parsing
    Cache and Retry Mechanism Improve query stability and speed
    Automatically Generate Failure Files Facilitate manual supplementation and quality control

    If you regularly need to process a large number of enterprise, store, or event addresses, this script can save you a lot of time.

    Leave a Comment