Practical Use of Arcpy: Easily Handle shp/mdb/gdb Data Reading and Attribute Filtering

Hello, GIS enthusiasts! Are you still struggling to manually open each shp, mdb, and gdb file to filter data? Today, we will introduce arcpy, the “GIS automation wizard,” to teach you how to read different format data and filter attributes with just a few lines of code, allowing you to say goodbye to repetitive clicks and achieve “run the code, get the data!”

1. Preparation: Setting Up the “Workspace” for Arcpy

Before we start, we need to clarify the environment and basic concepts; otherwise, it’s easy to get stuck.

1. Environment Requirements

  • ArcGIS Installation: arcpy is the Python package that ships with ArcGIS, so ArcGIS must be installed first. ArcMap 10.x bundles Python 2.7 and ArcGIS Pro bundles Python 3. Pro is recommended for new work, but note that Pro cannot open personal geodatabases (.mdb), so the mdb section below needs ArcMap.
  • Python Environment: Use the Python that comes with ArcGIS (for ArcGIS Pro it lives under bin\Python in the installation directory) so that you do not hit “arcpy not found” errors with the system’s Python.
  • Workspace Setup: This is crucial! The workspace is arcpy’s “base camp”: list functions and bare dataset names are resolved against this path, so you do not have to repeat full paths.
import arcpy
# Use a raw string (r"") so backslashes in the path are not read as escape sequences; replace with your data path!
arcpy.env.workspace = r"C:\GIS Data\My Dataset"
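
Before going further, it can help to confirm that the interpreter you are running actually ships arcpy. Here is a minimal sketch that only uses the standard library; the helper name arcpy_available is made up for this article and is not part of arcpy.

```python
import sys


def arcpy_available():
    """Return True if the current interpreter can import arcpy."""
    try:
        import arcpy  # noqa: F401  (we only care whether the import works)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    # sys.executable is the python.exe that is running this script
    print("Interpreter: {}".format(sys.executable))
    print("arcpy importable: {}".format(arcpy_available()))
```

If the second line prints False, you are most likely running the system Python rather than the one installed with ArcGIS.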

2. Core Concepts

  • Feature Class: The spatial data carrier in shp, mdb/gdb (for example, “city.shp” or “village distribution”), which contains spatial information (point/line/polygon) and attribute tables.
  • Attribute Filtering (Where Clause): Using SQL syntax to select the records in the attribute table that meet certain conditions (for example, population > 5000 or type = 'park').
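
One detail worth seeing early: how a field name is delimited in a where clause depends on the data source. Shapefiles and file geodatabases take double quotes, personal geodatabases (mdb) take square brackets. The sketch below illustrates that rule in plain Python; delimit_field and numeric_clause are hypothetical helpers written for this article. In real scripts, arcpy.AddFieldDelimiters(datasource, field) does this job based on the actual workspace.

```python
def delimit_field(field, source):
    """Wrap a field name in the delimiters its data source expects.

    source: "shp" or "gdb" -> double quotes; "mdb" -> square brackets.
    """
    if source in ("shp", "gdb"):
        return '"{}"'.format(field)
    if source == "mdb":
        return "[{}]".format(field)
    raise ValueError("unknown source: {}".format(source))


def numeric_clause(field, operator, value, source):
    """Build a where clause that compares a numeric field with a number."""
    return "{} {} {}".format(delimit_field(field, source), operator, value)


print(numeric_clause("population", ">", 5000, "shp"))  # "population" > 5000
print(numeric_clause("population", ">", 5000, "mdb"))  # [population] > 5000
```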

2. Step-by-Step: Reading and Filtering Different Format Data

We will sort by “commonality” and first tackle shp, then mdb and gdb, with each section including “code + comments + practical tips”.

1. Shp Files: The Most Common “Lightweight” Data

Shp is the most common format in GIS. A shapefile is really a set of sibling files (.shp, .shx, .dbf and a few others), and it is the most straightforward format to read.

Reading + Filtering Steps

  1. Set the workspace (pointing to the folder where the shp is located);
  2. Use ListFeatureClasses() to find the target shp;
  3. Write the filtering conditions (Where Clause);
  4. Use SearchCursor to read the filtered data.

Practical Code (Filtering cities with “population > 1000”)

import arcpy
# 1. Set the workspace (folder where the shp is located)
arcpy.env.workspace = r"C:\GIS Data\Basic Data\shp folder"
# 2. Get all shp files in the folder (ListFeatureClasses() returns every feature class by default)
shplist = arcpy.ListFeatureClasses()
# 3. Iterate through the list to find the target file ("城市人口.shp", the city population layer)
for shp in shplist:
    if shp == "城市人口.shp":
        print(f"Found target shp: {shp}, starting filtering...")
        # 4. Define the filtering condition for a numeric field.
        #    Field names take double quotes; single quotes would turn
        #    '人口数' into a string literal instead of a field reference.
        whereclause = '"人口数" > 1000'
        # 5. Use SearchCursor to read data (list only the fields you need to avoid wasting resources)
        # Syntax: arcpy.da.SearchCursor(feature class, fields to read, where clause)
        with arcpy.da.SearchCursor(shp, ["城市名", "人口数"], whereclause) as cursor:
            for row in cursor:
                # row[0] is "城市名" (city name), row[1] is "人口数" (population)
                print(f"City: {row[0]}, Population: {row[1]}")
        print(f"Filtering of {shp} completed!\n")
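
Indexing rows by position (row[0], row[1]) gets hard to read once you request more than a couple of fields. One possible convenience, sketched here, is to pair each row with its field names. The helper rows_as_dicts is an illustration written for this article, and the sample rows are stand-ins, not real data.

```python
def rows_as_dicts(rows, fields):
    """Yield one {field_name: value} dict per row tuple."""
    for row in rows:
        yield dict(zip(fields, row))


# Stand-in rows shaped like the tuples a SearchCursor yields (made-up values)
sample_rows = [("A", 1200), ("B", 3500)]
for record in rows_as_dicts(sample_rows, ["城市名", "人口数"]):
    print("City: {}, Population: {}".format(record["城市名"], record["人口数"]))
```

With arcpy you would pass the cursor itself, for example rows_as_dicts(cursor, cursor.fields), since an arcpy.da.SearchCursor exposes its field names through the fields property.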

Mini Exercise

Try filtering the shp for features with “area > 5000 square meters” (Hint: the area field is numeric, so write the condition as area > 5000).

2. Mdb Files: The “Data Warehouse” in Access Database

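Mdb here means a personal geodatabase, which is stored as a Microsoft Access database. Its feature classes usually sit inside feature datasets, so reading takes one extra step: “find the dataset”. Keep in mind that personal geodatabases are an ArcMap format; ArcGIS Pro cannot open them.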

Reading + Filtering Steps

  1. Point the workspace to the .mdb file (not a folder!);
  2. Use ListDatasets() to find datasets (in an mdb, feature classes are usually stored under feature datasets);
  3. Find the target feature class within the dataset;
  4. Write filtering conditions and read.

Practical Code (Filtering villages with “type = central village”)

# -*- coding: utf-8 -*-
# Personal geodatabases (.mdb) open in ArcMap only, so this script targets
# ArcMap's Python 2.7: u"" literals and str.format() instead of f-strings.
import os
import arcpy

# 1. Set the workspace directly to the mdb file
mdbpath = r"C:\GIS Data\Regional Data.mdb"
arcpy.env.workspace = mdbpath
targetname = u"村庄分布"  # the "village distribution" feature class
# 4. Filtering condition: mdb field names take [ ], string values must use single quotes!
whereclause = u"[村庄类型] = '中心村'"


def filtervillages(fcpath):
    # 5. Read data (fields: village name, village type)
    with arcpy.da.SearchCursor(fcpath, [u"村庄名", u"村庄类型"], whereclause) as cursor:
        for row in cursor:
            print(u"Village: {}, Type: {}".format(row[0], row[1]))


# 2. Get all datasets in the mdb
datasets = arcpy.ListDatasets()
if datasets:  # If there are datasets (most cases)
    for dataset in datasets:
        print(u"Processing dataset: {}".format(dataset))
        # 3. Find feature classes in the current dataset
        for fc in arcpy.ListFeatureClasses(feature_dataset=dataset):
            if fc == targetname:
                print(u"Found target feature class: {}, starting filtering...".format(fc))
                filtervillages(os.path.join(mdbpath, dataset, fc))
else:  # In rare cases: feature classes directly in the mdb root (no datasets)
    for fc in arcpy.ListFeatureClasses():
        if fc == targetname:
            filtervillages(os.path.join(mdbpath, fc))
print(u"\nmdb data filtering completed!")

Pitfall Points

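  • Mdb Version: ArcGIS works poorly with the .accdb format introduced in Access 2007, so keep the data in .mdb format. Also remember that only ArcMap reads personal geodatabases; ArcGIS Pro does not;
  • Chinese Path: Python 3 strings (ArcGIS Pro) handle Chinese characters in paths with no extra processing. Under ArcMap’s Python 2.7, declare the file encoding and use u"" literals.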
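
If you do not know in advance which dataset holds a feature class, an alternative to nesting ListDatasets() and ListFeatureClasses() is arcpy.da.Walk, which yields (dirpath, dirnames, filenames) tuples much like os.walk. The search logic can be sketched in plain Python and fed a stand-in list; find_feature_class and the sample names are hypothetical.

```python
import os


def find_feature_class(walk, name):
    """Return the full path of the first feature class called `name`.

    `walk` is any iterable of (dirpath, dirnames, filenames) tuples,
    which is the shape that arcpy.da.Walk yields.
    """
    for dirpath, _dirnames, filenames in walk:
        if name in filenames:
            return os.path.join(dirpath, name)
    return None


# Stand-in for a walk over an mdb with one feature dataset (made-up names)
fake_walk = [
    ("data.mdb", ["towns"], []),
    (os.path.join("data.mdb", "towns"), [], ["villages", "roads"]),
]
print(find_feature_class(fake_walk, "villages"))
```

With arcpy available, the first argument would be something like arcpy.da.Walk(r"C:\GIS Data\Regional Data.mdb", datatype="FeatureClass").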

3. Gdb Files: The “Professional Warehouse” for GIS Users

Gdb (file geodatabase) is the format recommended by Esri, supporting large data volumes and multiple feature classes, and is now commonly used in projects.

Reading + Filtering Steps

  1. Point the workspace to the .gdb folder (note: a .gdb is a folder, not a single file!);
  2. Use ListFeatureClasses() directly to find feature classes at the root of the gdb (feature classes inside a feature dataset need the feature_dataset argument; an enterprise gdb needs a connection first);
  3. Key: The filtering syntax for date fields is different from numeric/string fields. In a file gdb a date literal is written with the date keyword: date 'YYYY-MM-DD'.

Practical Code (Filtering communities with “completion date > 2000”)

import arcpy
# 1. Point the workspace to the gdb folder
arcpy.env.workspace = r"C:\GIS Data\Basic Data.gdb"
# 2. Target feature class ("社区建设", the community construction layer)
targetfc = "社区建设"
# First check that the feature class exists, to catch path errors early
if arcpy.Exists(targetfc):
    print(f"Found {targetfc}, starting filtering completion date...")
    # 3. Filtering condition: in a file gdb (and in shp) a date literal is
    #    written date 'YYYY-MM-DD'. The #...# wrapper is Access syntax and
    #    only applies to personal geodatabases (mdb).
    whereclause = "建成时间 > date '2000-01-01'"
    # 4. Read data (fields: community name, completion date)
    with arcpy.da.SearchCursor(targetfc, ["社区名", "建成时间"], whereclause) as cursor:
        for row in cursor:
            # Date fields come back as datetime objects; convert to string for easy viewing
            buildtime = str(row[1])
            print(f"Community: {row[0]}, Completion Date: {buildtime}")
else:
    print(f"Feature class {targetfc} does not exist! Check the workspace path...")
print("\ngdb data filtering completed!")
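
Date literals are the one place where the where-clause syntax really depends on the storage format. The sketch below follows Esri's SQL reference as I understand it (date 'YYYY-MM-DD' for shp and file gdb, #MM-DD-YYYY# for personal geodatabases); date_clause is a hypothetical helper, so check the result against your own data before relying on it.

```python
import datetime


def date_clause(field, operator, day, source):
    """Build a where clause that compares a date field with `day`.

    source "shp" or "gdb":  "field" > date 'YYYY-MM-DD'
    source "mdb":           [field] > #MM-DD-YYYY#
    """
    if source in ("shp", "gdb"):
        return "\"{}\" {} date '{}'".format(field, operator, day.strftime("%Y-%m-%d"))
    if source == "mdb":
        return "[{}] {} #{}#".format(field, operator, day.strftime("%m-%d-%Y"))
    raise ValueError("unknown source: {}".format(source))


start = datetime.date(2000, 1, 1)
print(date_clause("built", ">", start, "gdb"))  # "built" > date '2000-01-01'
print(date_clause("built", ">", start, "mdb"))  # [built] > #01-01-2000#
```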

Advanced Tips

If you only need to read certain fields, specify them in the second parameter of SearchCursor (for example, ["社区名", "建成时间"]), which can significantly improve efficiency (especially with large data volumes).

3. Pitfall Guide: Avoid the “Landmines” of Attribute Filtering

Many beginners get stuck not because of code errors, but because the filtering conditions are written incorrectly! Here, we have compiled “incorrect vs correct writing” to help you avoid many detours.

Field Type | Incorrect or Discouraged Writing | Correct Writing | Reason Analysis
String | 村庄类型 = "中心村" | 村庄类型 = '中心村' | String values take single quotes; double quotes mark field names, so "中心村" is read as a field that does not exist
Date | 建成时间 > 2000-01-01 | 建成时间 > date '2000-01-01' (shp, file gdb) or [建成时间] > #2000-01-01# (mdb) | A bare 2000-01-01 is evaluated as arithmetic (2000 minus 1 minus 1); the literal form depends on the data source
Numeric | 人口数 > '1000' | 人口数 > 1000 | Quoting the value turns it into a string, so it can no longer be compared as a number
Logical Conditions | population > 5000 and type = 'park' | population > 5000 AND type = 'park' | Not an error: SQL keywords are case-insensitive, but uppercase is the convention and easier to read
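
The string and numeric rules in the table can be captured in a small formatter, so that quoting is decided by the Python type of the value rather than typed by hand. This is a sketch for Python 3; sql_literal is a hypothetical helper. Doubling an embedded single quote is the standard SQL escape.

```python
def sql_literal(value):
    """Format a Python value as a literal for a where clause.

    Numbers are written bare; strings get single quotes, with any
    embedded single quote doubled.
    """
    if isinstance(value, bool):
        raise TypeError("booleans are not handled by this sketch")
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, str):
        return "'{}'".format(value.replace("'", "''"))
    raise TypeError("unsupported type: {}".format(type(value).__name__))


print("population > {}".format(sql_literal(1000)))  # population > 1000
print("type = {}".format(sql_literal("park")))      # type = 'park'
print("name = {}".format(sql_literal("O'Brien")))   # name = 'O''Brien'
```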

4. Comprehensive Practice: Filtering Three Formats of Data at Once

After learning about individual formats, let’s do a “big integration”: filter elements with “population > 5000” from shp, mdb, and gdb, and merge them into a new gdb.

Requirements

  • Filter cities from shp (city population.shp) with population > 5000;
  • Filter villages from mdb (regional data.mdb/ towns/ village distribution) with population > 5000;
  • Filter communities from gdb (basic data.gdb/ community construction) with population > 5000;
  • Merge results into a new gdb (filtered results.gdb/ high population areas).

Complete Code

# -*- coding: utf-8 -*-
# The mdb source means this script has to run in ArcMap (Python 2.7),
# hence u"" literals and str.format() rather than f-strings.
import os
import arcpy

# 1. Define all paths
basedir = r"C:\GIS Data"
shppath = os.path.join(basedir, "Basic Data", "shp folder", u"城市人口.shp")
mdbpath = os.path.join(basedir, "Regional Data.mdb")
gdbpath = os.path.join(basedir, "Basic Data.gdb")
outputgdb = os.path.join(basedir, "Filtered Results.gdb")  # Output new gdb
outputfc = u"高人口区域"  # Output feature class name (high population areas)
target = os.path.join(outputgdb, outputfc)

# 2. Create new gdb (if it does not exist)
if not arcpy.Exists(outputgdb):
    arcpy.CreateFileGDB_management(basedir, "Filtered Results.gdb")
    print(u"Created new gdb: {}".format(outputgdb))

# 3. Filter shp and export to new gdb (as initial feature class)
arcpy.FeatureClassToFeatureClass_conversion(
    in_features=shppath,
    out_path=outputgdb,
    out_name=outputfc,
    where_clause=u'"人口数" > 5000'  # double quotes around the field name
)
print("shp filtering export completed...")

# 4. Filter mdb and append to new feature class.
# Append has no where_clause parameter, so filter through a feature layer first.
mdbfc = os.path.join(mdbpath, u"城镇", u"村庄分布")
arcpy.MakeFeatureLayer_management(mdbfc, "mdb_filtered", u"[人口数] > 5000")
arcpy.Append_management(
    inputs="mdb_filtered",
    target=target,
    schema_type="TEST"  # Fails unless the field structure matches the target
)
print("mdb filtering append completed...")

# 5. Filter gdb and append to new feature class
gdbfc = os.path.join(gdbpath, u"社区建设")
arcpy.MakeFeatureLayer_management(gdbfc, "gdb_filtered", u"人口数 > 5000")
arcpy.Append_management(inputs="gdb_filtered", target=target, schema_type="TEST")
print("gdb filtering append completed...")

# 6. Completion prompt
print(u"\nAll operations completed! Results are in: {}".format(target))

Running Instructions

  • Ensure all three data sources have the same “人口数” (population) field and, because schema_type is TEST, the same overall field structure; otherwise appending will fail;
  • If field names differ, you can first use arcpy.AlterField_management() to rename the field before appending (this tool works on geodatabase feature classes, not on shapefiles).
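
Since a schema mismatch only shows up when the append fails, it can be worth comparing field names up front. A minimal sketch of that comparison in plain Python follows; missing_fields is a hypothetical helper, and the case-insensitive match is an assumption about how your field names should be compared.

```python
def missing_fields(target_fields, source_fields):
    """Return the target field names that the source lacks (case-insensitive)."""
    have = {name.lower() for name in source_fields}
    return [name for name in target_fields if name.lower() not in have]


# Made-up field lists standing in for two feature classes
print(missing_fields(["Name", "Population"], ["name", "area"]))  # ['Population']
```

With arcpy, the name lists would come from [f.name for f in arcpy.ListFields(fc)] for each feature class.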

5. Common Issues: Don’t Panic When Encountering Errors!

Q1: Importing arcpy causes “ModuleNotFoundError”?

  • Reason: Used the system’s Python, which could not find the arcpy module.
  • Solution: Open the Python window inside ArcGIS (in ArcMap it is under “Geoprocessing → Python”; ArcGIS Pro has its own Python window), or use the Python interpreter that comes with ArcGIS (for example, ArcGIS Pro’s bin\Python\envs\arcgispro-py3\python.exe).

Q2: No data after filtering, and no errors?

  • Reason 1: Filtering conditions written incorrectly (for example, the field name is “population”, but you wrote “population count”);
  • Reason 2: There are no records in the data that meet the conditions (for example, all city populations are < 1000);
  • Solution: First manually check the attribute table in ArcGIS to confirm field names and data ranges, then adjust the filtering conditions.
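
For Reason 1, a quick check in code can complement opening the attribute table: compare the field name you typed with the names that actually exist and suggest near matches. This sketch uses the standard library's difflib; suggest_field is a hypothetical helper and the field list is made up.

```python
import difflib


def suggest_field(wanted, available, cutoff=0.6):
    """Return field names from `available` that closely resemble `wanted`."""
    return difflib.get_close_matches(wanted, available, n=3, cutoff=cutoff)


fields = ["name", "population", "area"]  # stand-in for a real field list
print(suggest_field("popluation", fields))  # ['population']
```

In a real script the available names could be read with [f.name for f in arcpy.ListFields(fc)].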

Q3: Errors caused by Chinese paths?

  • Reason: ArcMap 10.x has poor compatibility with Chinese paths, while ArcGIS Pro has fixed this.
  • Solution: ArcMap users should try to use English paths; ArcGIS Pro users can directly use Chinese paths (Python 3 supports this).

6. Conclusion: The First Step from “Manual Worker” to “Code Master”

Today, we used arcpy to handle reading and attribute filtering for three core GIS data formats: shp, mdb, and gdb. From basic steps to pitfall guides, and then to comprehensive practice, the core logic is always “set workspace → find feature class → write filtering conditions → read data.”

The capabilities of arcpy go far beyond this! In the future, you can also combine it with spatial analysis (such as buffers and overlay analysis) and batch processing (such as batch format conversion) to cut out a great deal of repetitive work. Hurry up and try it with your own data, and if you encounter problems, come back to this article; I believe you will soon become an “arcpy expert”.
