On the RK3588 platform, after a certain stage, if the hardware cursor is enabled, it is likely to disappear in some scenarios, and cannot be restored without a system reboot. Based on this phenomenon, it can be determined that the root cause lies within the kernel. Below, we will discuss this issue and how to resolve and avoid the cursor disappearance problem.
1. Problem Description
On the RK3588 platform, customers frequently report that the cursor occasionally disappears, which may be related to plugging and unplugging the display, waking from sleep, or turning the display on and off. Once the mouse disappears, there is no way to restore it normally; the only solution is to reboot the system.
2. Diagnosis
According to the problem description
"Only a reboot can restore it," indicating that the issue is in the kernel.
"Plugging and unplugging the display, waking from sleep, turning the display on and off" indicates that it is related to the VOP of the RK platform.The VOP driver for the RK platform is mainly located in the file <span>drivers/gpu/drm/rockchip/rockchip_drm_vop2.c</span>, so debugging this file is necessary.
3. Specific Cause
Pay attention to this commit:
Author: Andy Yan <[email protected]>
Date: Fri Feb 25 20:31:08 2022 +0800
drm/rockchip: vop2: A workaround for PD_CLUSTER0 off
 The internal PD of VOP2 on rk3588 takes effect immediately
for power up and takes effect by vsync for power down.
And the PD_CLUSTER0 is a parent PD of PD_CLUSTER1/2/3,
 we may have this use case:
 Cluster0 is attached to VP0 for HDMI output,
 Cluster1 is attached to VP1 for MIPI DSI,
When we enable Cluster1 on VP1, we should enable PD_CLUSTER0 as
 it is the parent PD, even though HDMI is unplugged, VP1 is disabled,
 the PD of Cluster0 should keep power on.
When the system goes to suspend:
 (1) Power down PD of Cluster1 before VP1 standby (the power down is
take effect by vsync)
 (2) Power down PD of Cluster0
 But we have a problem at step (2), Cluster0 is attached to VP0. The bus VP0
is in standby mode, as it is never used or HDMI is unplugged. So there is
 no vsync, the power down will never take effect.
 According to the IC designer: We must power down all internal PD of VOP
 before we power down the global PD_VOP.
 So we get this workaround:
 We found a VP is in standby mode when we want to power down a PD attached to it,
 we release the VP from standby mode, then it will run a default timing and generate vsync.
 Then we can power down the PD by this vsync. After all this is done, we standby the VP at last.
 Signed-off-by: Andy Yan <[email protected]>
 Change-Id: Ib9be8628f07d783c6bc3b7678c5eebfc63aabe1cIt can be seen that RK previously implemented a workaround to solve a certain issue.
Next, look at the following function:
static void vop2_power_domain_get(struct vop2_power_domain *pd)
{
if (pd->parent)
vop2_power_domain_get(pd->parent);
spin_lock(&pd->lock);
if (pd->ref_count == 0) {
if (pd->vop2->data->delayed_pd)
cancel_delayed_work(&pd->power_off_work);
vop2_power_domain_on(pd);
}
pd->ref_count++;
spin_unlock(&pd->lock);
}
static void vop2_power_domain_put(struct vop2_power_domain *pd)
{
spin_lock(&pd->lock);
/*
* For a nested power domain (PD_Cluster0 is the parent of PD_Cluster1/2/3)
* the parent power domain must be enabled before the child power domain
* is on.
*
* So we may meet this condition: Cluster0 is not on an activated VP,
* but PD_Cluster0 must be enabled as one of the child PD_CLUSTER1/2/3 is enabled.
* When all child PD is disabled, we want to disable the parent
* PD (PD_CLUSTER0), but as module CLUSTER0 is not attached to an activated VP,
* the turn off operation (which takes effect by vsync) will never take effect.
* So we will see a "wait pd0 off timeout" log when we turn on PD_CLUSTER0 next time.
*
* So we have a check here
*/
if (--pd->ref_count == 0 && vop2_power_domain_can_off_by_vsync(pd)) {
if (pd->vop2->data->delayed_pd)
schedule_delayed_work(&pd->power_off_work, msecs_to_jiffies(2500));
else
vop2_power_domain_off(pd);
}
spin_unlock(&pd->lock);
if (pd->parent)
vop2_power_domain_put(pd->parent);
}
Note this comment:
+ /*
+ * @lock: protect power up/down procedure.
+ * power on takes effect immediately,
+ * power down takes effect by vsync.
+ * we must check power_domain_status register
+ * to make sure the power domain is down before
+ * sending a power on request.
+ *
+ */It can be seen that RK needs to ensure that the power on of the window must occur after the window has been powered off.
Thus, pay attention to this reference count variable:
pd->ref_countThis means that RK uses a reference counting method to track whether the VOP window has been properly powered off. If the reference count is not zero, it indicates that the window is still open, and therefore it will not actively attempt to reopen the window.
Thus, the commit <span>RK drm/rockchip: vop2: A workaround for PD_CLUSTER0 off</span> has issues.
Note the following comment and commit:
 According to IC designer: We must power down all internal PD of VOP
 before we power down the global PD_VOP.The code is as follows:
if (vp) {
 ret = clk_prepare_enable(vp->dclk);
if (ret < 0)
 DRM_DEV_ERROR(vop2->dev, "failed to enable dclk for video port%d - %d\n",
 vp->id, ret);
 crtc = &vp->rockchip_crtc.crtc;
 VOP_MODULE_SET(vop2, vp, standby, 0);
 vop2_power_domain_off(pd);
 vop2_cfg_done(crtc); 
 vop2_wait_power_domain_off(pd);
reinit_completion(&vp->dsp_hold_completion);
 vop2_dsp_hold_valid_irq_enable(crtc);
 VOP_MODULE_SET(vop2, vp, standby, 1);
 ret = wait_for_completion_timeout(&vp->dsp_hold_completion, msecs_to_jiffies(50));
if (!ret)
 DRM_DEV_INFO(vop2->dev, "wait for vp%d dsp_hold timeout\n", vp->id);
vop2_dsp_hold_valid_irq_disable(crtc);
 clk_disable_unprepare(vp->dclk);
 }
There is a loophole here, which is that this workaround patch will actively close the VOP, but does not reset the reference count. Therefore, the window is abnormally closed due to this workaround commit, making it difficult to reopen.
The solution to the problem is as follows:
diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_vop2.c b/drivers/gpu/drm/rockchip/rockchip_drm_vop2.c 
index 996afa881289..e648a23bcc9f 100644
--- a/drivers/gpu/drm/rockchip/rockchip_drm_vop2.c
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_vop2.c
@@ -3454,6 +3454,7 @@ static void vop2_power_domain_off_by_disabled_vp(struct vop2_power_domain *pd) crtc = &vp->rockchip_crtc.crtc;
 VOP_MODULE_SET(vop2, vp, standby, 0);
 vop2_power_domain_off(pd);
+ pd->ref_count = 0;
 vop2_cfg_done(crtc);
vop2_wait_power_domain_off(pd);
4. Mitigation Methods
Now that we know the cursor disappearance is related to a workaround from RK, a careful look at this patch reveals the following:
if (pd->data->id == VOP2_PD_CLUSTER0 || pd->data->id == VOP2_PD_CLUSTER1 ||
 pd->data->id == VOP2_PD_CLUSTER2 || pd->data->id == VOP2_PD_CLUSTER3) {
 phys_id = ffs(pd->data->module_id_mask) - 1;
 win = vop2_find_win_by_phys_id(vop2, phys_id);
 vp_id = ffs(win->vp_mask) - 1;
 vp = &vop2->vps[vp_id];
 } else {
 DRM_DEV_ERROR(vop2->dev, "unexpected power on pd%d\n", ffs(pd->data->id) - 1);
 }
This means that the workaround only takes effect when the plane is cluster 0/1/2/3. Therefore, for the kernel, we can set the cursor layer to not use the cluster layer, as follows:
&vp0 {
 rockchip,plane-mask = <(1 << ROCKCHIP_VOP2_CLUSTER0 | 1 << ROCKCHIP_VOP2_ESMART0)>;
 rockchip,primary-plane = <ROCKCHIP_VOP2_ESMART0>;
 cursor-win-id = <ROCKCHIP_VOP2_ESMART1>;
};Here, using esmart1 as an example, any non-cluster layer can be used.
5. Current Phenomenon
If the reference counting method is used to solve the problem, there is still a probability that the mouse layer will be closed, but the next time <span>drm_atomic_check_only</span> is called, it will still open normally. The phenomenon is as follows:
The cursor may disappear (in cases of plugging and unplugging the display, waking from sleep), but moving the mouse will cause the cursor to display normally.
Considering this situation, the impact is minimal and does not count as a bug, so no further modifications are necessary.
If the mitigation method is used, this impact does not exist.
6. Further Modifications
If the current phenomenon still needs to improve the experience, further customization can be made, which is to recognize if it is the cursor layer and if the cursor is bound to the CRTC, then set it to visible instead of invisible. However, I personally think this is not very necessary.
@@ -4131,6 +4132,8 @@ static int vop2_plane_atomic_check(struct drm_plane *plane, struct drm_plane_sta
 plane->name, state->src_x >> 16, state->src_y >> 16, state->src_w >> 16,
 state->src_h >> 16, state->crtc_x, state->crtc_y, state->crtc_w,
 state->crtc_h);
+ if(plane == crtc->cursor)
+ state->visible = true;
 return 0;
 }
This ensures that the mouse window will not be abnormally closed due to other special circumstances.
7. Conclusion
This issue is a new problem introduced by RK to solve other issues. It is related to reference counting, and the kernel uses reference counting in many places. However, it is important to note that reference counting must be considered thoroughly; if someone modifies the issue without paying attention to the use of reference counting, the entire reference counting logic will fail, leading to problems.
For more articles, please follow the public account: Kirin Embedded.