The Local Bus, also known as the CPU bus, is commonly divided into the Motorola CPU bus and the Intel CPU bus according to how the high and low address lines are ordered. The venerable MCS-51 microcontroller is a typical representative of the Intel CPU bus, while the widely used PowerPC follows the Motorola CPU bus architecture, which evolved from the 60X bus (the 60X bus supports four selectable width modes: 64, 32, 16, and 8 bits). This article describes the points to watch when a PowerPC CPU reads and writes FPGA internal registers or RAM over the Local Bus and when it handles interrupts.
In the HINOC1.0 project, the FPGA prototype was connected to an Intel XScale PXA270 (ARM CPU) over the Local Bus, using the same timing the CPU uses for external asynchronous SRAM. The HINOC chip can therefore be treated as a peripheral of the ARM CPU that exposes configurable registers or a RAM access space. The bus consists of address, data, chip select, and read/write signals; the timing diagram below shows a simple read. "Asynchronous" here means that the CPU interface bus and the FPGA interface do not share a clock: the signals the CPU sends to the FPGA do not include a clock signal. The FPGA must sample the address, data, and control signals from the CPU with its own internal clock and may use them only after synchronization. In the read timing, after the CPU drives the read address and control signals, it must wait for the time tAA before valid data can be read from the data bus. The length of tAA can be adjusted through registers in the PXA270.
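The relation between tAA and the FPGA-side sampling can be made concrete with a little arithmetic. The C sketch below is only an illustration: it assumes the FPGA needs a fixed number of its own clock cycles to synchronize, decode, and drive read data, and it adds one cycle for the unknown phase between the CPU strobe and the FPGA clock. The function names and any numbers fed to them are hypothetical and do not come from the PXA270 documentation.

```c
#include <stdint.h>

/* Smallest tAA (in ns) the CPU should be configured for, assuming the FPGA
 * needs `fpga_cycles` of its own clock to synchronize, decode and drive the
 * read data. One extra cycle covers the unknown phase between the CPU's
 * asynchronous strobe and the next FPGA clock edge. */
uint32_t min_taa_ns(uint32_t fpga_clk_period_ns, uint32_t fpga_cycles)
{
    return fpga_clk_period_ns * (fpga_cycles + 1u);
}

/* Number of CPU bus clock cycles needed to cover `taa_ns`, rounded up.
 * `bus_clk_period_ns` must be non-zero. */
uint32_t wait_cycles(uint32_t taa_ns, uint32_t bus_clk_period_ns)
{
    return (taa_ns + bus_clk_period_ns - 1u) / bus_clk_period_ns;
}
```

For example, with a 10 ns FPGA clock and three FPGA cycles of latency, this estimate asks for a tAA of at least 40 ns; the real requirement depends on the actual design and should be confirmed in simulation.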
Seen from the FPGA side, the timing diagram above makes it obvious how slow these CPU accesses are: a single read or write of an FPGA internal register often takes dozens of clock cycles. Fortunately, improved SoC buses such as AXI offer burst modes, similar to DMA, for continuous read and write operations.
When setting up the simulation environment, the CPU behavior shown in the diagram above can be reduced to two simple tasks, read and write (as shown in the following diagram). In the FPGA-side design, the signals sent by the CPU are first synchronized through two clock cycles; the address is then decoded and the result is passed to the downstream modules.
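The two-cycle synchronization described above can be pictured with a small host-side behavioral model. This is a sketch in C, not the FPGA code itself; the type and function names are made up for illustration. Each call stands for one FPGA clock edge, and the value returned is the one the downstream logic would be allowed to use.

```c
#include <stdint.h>

/* Two-stage synchronizer for one asynchronous CPU signal. */
typedef struct {
    uint8_t stage1; /* first flop: samples the raw asynchronous input */
    uint8_t stage2; /* second flop: value the FPGA logic may use      */
} sync2_t;

/* Advance the model by one FPGA clock edge and return the synchronized
 * value. Both flops update on the same edge, so stage2 takes the value
 * stage1 held before this edge. */
uint8_t sync2_tick(sync2_t *s, uint8_t async_in)
{
    s->stage2 = s->stage1;
    s->stage1 = (uint8_t)(async_in & 1u);
    return s->stage2;
}
```

In this model a change on the input appears at the output on the second clock edge after it was applied, which is the latency the address decoder has to account for.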
This type of bus is now used less often. However, in many specialized fields such as aerospace, some domestic CPUs still use similar bus timing because it is simple, reliable, and stable.
1. Debugging Environment Introduction
Hardware development board: AX7103; CPU development board: p2020; operating system: VxWorks.
Figure 1 below shows the hardware connection between the CPU, running VxWorks, and the FPGA over the Local Bus interface.
Figure 1 CPU and FPGA Hardware Connection
On the hardware side, the development board has no direct connection to the 50 Local Bus pins of the p2020. Because the driver side still has to develop and test the Local Bus driver, a debugging environment had to be built. Using the 68 pins of the AX7103 development board, the key Local Bus pins (37 pins = 16 address + 16 data + 3 chip select (csn) + 1 read enable (oen) + 1 write enable (wen)) and the interrupt signal were assigned to EX_IO1 and EX_IO2. These were then wired to connectors J5 and J4 according to the p2020 schematic; the resulting setup is shown in Figure 2 below (a bit ugly, but it is only for prototype function verification). In the figure, J5 is the p2020 connector carrying the Local Bus signals; J4 is the p2020 connector carrying the hardwired interrupt and GND; EX_IO1 carries the Local Bus address, chip select, and read/write enable pins; EX_IO2 carries the data and interrupt pins.
Note: the I/O voltage standards of the two development boards must match; otherwise, they cannot be connected directly with Dupont wires. The two boards should also share a common ground.
Figure 2 Actual Debugging Environment of Hardware Board and CPU Board
2. Overview
This public account has previously introduced interfaces used for data exchange between an FPGA and a CPU, such as PCIe. In TSN or TTE systems, PCIe is often used at end nodes, while the Local Bus is commonly used between the CPU and the FPGA inside switches. This article covers only the debugging of the Local Bus. On the hardware side, the Local Bus is simpler to develop than PCI or PCIe: it only requires mapping CPU memory addresses to hardware register and RAM addresses and handling the timing of the read/write enable and chip select signals. For this work, a Local Bus test project was built on the hardware side to debug the register read/write and interrupt functions of the Local Bus driver.
2.1. Data Read/Write Operations
For the register read/write function, a useful introduction to the Local Bus can be found at this link: https://wenku.baidu.com/view/aeca83593b3567ec102d8a80.html?from=search. Because the Local Bus is simple, there is little theory to cover here; please refer to the link for it. The timing diagrams of the key signals for read and write operations are shown in Figures 3 and 4 below. The same Local Bus interface may have different address and data widths on different processors, and some signals may differ as well. For example, on the BM3803 processor the address and data lines are not multiplexed and the data width is 32 bits (double-word operation); on the p2020 processor the address and data lines are multiplexed (the LALE signal latches the high 11 bits of the address, and if only 16 address bits are used, LALE can be left unconnected) and the data width is 16 bits (word operation).
2.2. Interrupt Functionality
For interrupts, we use traditional hardwired interrupts. According to the p2020 schematic, irq3 is already occupied, which leaves six hardwired interrupts (irq0/1/2/4/5/6) available for peripherals, all located at connector J4. The p2020 datasheet lists 64 internal interrupt sources and 12 external ones; we use external interrupt irq1 for debugging.
3. Debugging Process
3.1. Data Read/Write Operations
At the beginning of debugging, the p2020 schematic shows that CS0 and CS1 are assigned to the internal nand_flash and nor_flash, respectively, and that three chip select outputs (CS2, CS3, and CS4) are brought out, which means up to eight Local Bus peripherals can be connected. These peripherals (including *_flash) share the data lines, address lines, and read/write enable signals. The first step was to find out which chip select the driver uses, which meant reading the driver code as well. The driver code showed that only CS4 was used, while CS2 was masked. Capturing cpu_csn[2:0] on the hardware side showed that cpu_csn[0] also goes low. After the driver was modified to keep CS2, every read/write operation from the driver produced "cpu_csn[2:0]==3'b110". The CPU board pulls CS2 low when it reads or writes the address range 0xf100_0000~0xf100_ffff (128KB), and these accesses are mapped to register/RAM reads and writes.
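The address window and chip select check described above can be written down as a few helper functions. This C sketch is only illustrative: the helper names are hypothetical, and the window bounds assume the range runs from 0xf100_0000 to 0xf100_ffff, which is consistent with the register address 0xf100_0118 used in the write test.

```c
#include <stdbool.h>
#include <stdint.h>

#define LBUS_BASE 0xF1000000u /* assumed first byte of the FPGA window */
#define LBUS_LAST 0xF100FFFFu /* assumed last byte of the FPGA window  */
#define CSN_FPGA  0x6u        /* cpu_csn[2:0] == 3'b110                */

/* True if a CPU address falls inside the window mapped to the FPGA. */
bool lbus_in_window(uint32_t cpu_addr)
{
    return cpu_addr >= LBUS_BASE && cpu_addr <= LBUS_LAST;
}

/* Byte offset of a CPU address within the window.
 * The caller is expected to check lbus_in_window() first. */
uint32_t lbus_offset(uint32_t cpu_addr)
{
    return cpu_addr - LBUS_BASE;
}

/* True if the captured chip select pattern is the one seen on FPGA accesses. */
bool fpga_selected(uint8_t cpu_csn)
{
    return (cpu_csn & 0x7u) == CSN_FPGA;
}
```

With these assumptions, the address 0xf100_0118 maps to offset 0x118 inside the FPGA register space.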
Testing the write function: after the CPU starts, the driver writes 16'h5555 to memory address 0xf100_0118, as shown in Figure 5. The write address is driven first, then the chip select is pulled low and, almost simultaneously, the write enable is pulled low, so that the data is written from the CPU to the FPGA.
Figure 5 Board-Level Write Operation
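On the driver side, a write like the one in this test usually comes down to a 16-bit store through a volatile pointer. The sketch below is a minimal illustration and not the actual driver: the function names are hypothetical, and it assumes the FPGA window is already mapped so that `base` points at its first 16-bit word.

```c
#include <stdint.h>

/* Write one 16-bit register at byte offset `off` from the window base.
 * `off` is expected to be even, because the bus is 16 bits wide. */
void lbus_write16(volatile uint16_t *base, uint32_t off, uint16_t value)
{
    base[off / 2u] = value;
}

/* Read one 16-bit register at byte offset `off` from the window base. */
uint16_t lbus_read16(volatile uint16_t *base, uint32_t off)
{
    return base[off / 2u];
}
```

On the target, `base` would be the mapped window address and the test write would be `lbus_write16(base, 0x118, 0x5555)`. On a host PC the same functions can be exercised against an ordinary array standing in for the register space.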
Compared with the simple write operation, the read operation ran into problems. When the read enable lines of the AX7103 and the p2020 were also connected (the address, data, and chip select signals had been connected first), the CPU could no longer start. When that signal was instead connected to an unused EX_IO, the CPU started normally, which suggested that the read enable signal was interfering with CPU startup. However, cpu_oen and cpu_wen have the same attributes: both are inputs to the FPGA and are never driven back toward the CPU, so they should not be able to cause a startup failure. The p2020 datasheet shows that the LGPL2 signal on the p2020 has two functions:
1. Local Bus read enable cpu_oen;
2. Power-on configuration of the e500 core PLL clock ratio, together with the LBCTL and LALE signals.
It was suspected that the level on cpu_oen from the FPGA side disturbed the CPU's LGPL2 during the short time after the system started. To test this, the read enable was changed to an inout signal that stays in high impedance for 10 seconds after the CPU starts, as isolation; by then the p2020 bootrom would have loaded and be ready to boot. The test showed, however, that the CPU still could not start normally.
All Dupont wires were then removed except the read enable line, and the CPU started normally, which showed that the read enable level on the FPGA side did not affect CPU startup. To find the signal actually responsible, the wires were reconnected one group at a time. With the read enable, write enable, chip select, and address lines connected, the CPU board booted normally and the FPGA captured correct read/write timing (although no read/write data was visible). Once the data lines were connected, the CPU could not start, so the data lines were the cause. As a first modification, the FPGA side was changed to apply a simple delay: the data lines are held in high impedance for 20 seconds (during which the CPU can still perform Local Bus writes), and read data is driven only after boot has completed. This clumsy approach did resolve the CPU startup issue, but it is a temporary workaround and not reliable.
/********** First Modification ********/
// rst_done only becomes 1 after 20 seconds; before that, cpu_data stays in high impedance
assign cpu_data = ((cpu_oen == 1'b0) && (rst_done == 1'b1)) ? cpu_rdata : 16'bz;
assign cpu_wdata = cpu_data;
/*******************************/
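A 20-second hold-off such as rst_done is normally produced by a free-running counter, and its size is easy to estimate. The C sketch below only does that arithmetic; the helper names are hypothetical and the clock frequency is a parameter, since the clock actually used in the design is not stated here.

```c
#include <stdint.h>

/* Number of clock cycles that make up `seconds` at `clk_hz`. */
uint64_t delay_cycles(uint32_t seconds, uint32_t clk_hz)
{
    return (uint64_t)seconds * clk_hz;
}

/* Number of bits needed to represent `max_count` (0 needs 0 bits). */
unsigned counter_bits(uint64_t max_count)
{
    unsigned bits = 0;
    while (max_count != 0u) {
        bits++;
        max_count >>= 1;
    }
    return bits;
}
```

Assuming a 50 MHz clock, for instance, 20 seconds is 1,000,000,000 cycles, which needs a 30-bit counter.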
Inspiration struck. Adding the condition "(rst_done==1'b1)" to the original FPGA statement "assign cpu_data = (cpu_oen==1'b0) ? cpu_rdata : 16'bz;" allowed the CPU to start, which shows that during early startup a low cpu_oen made the FPGA drive cpu_data and disturb the CPU's data lines. As mentioned earlier, the p2020 has internal nand_flash and nor_flash. When the p2020 starts, it reads from these flashes, which pulls cpu_oen low. At that moment the FPGA drives its default read value 16'h0 onto the shared data lines, so the useful values that should have come from *_flash are overwritten by 16'h0. To verify this hypothesis, I made a second modification to the data signal code.
/********** Second Modification ********/
// The FPGA will only output cpu_rdata after the CPU selects CS_4 for the local bus, otherwise it remains high impedance.
assign cpu_data = ((cpu_oen == 1'b0) && (cpu_csn[2] == 1'b0)) ? cpu_rdata : 16'bz;
assign cpu_wdata = cpu_data;
/*******************************/
After this modification, the CPU started normally. At the beginning of startup the p2020 reads *_flash, and although cpu_oen is pulled low, the chip select asserted at that time is not the one assigned to the FPGA. The FPGA's cpu_data therefore stays in high impedance until its own chip select is asserted and the read enable is low, and only then is the read data driven out. The final, correct timing for the board-level read operation is shown in Figure 6 below.
Figure 6 Board-Level Read Operation
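The difference between driving the bus on any read strobe and driving it only when the FPGA is selected can be checked with a small truth-table model. This is a host-side sketch in C, not the FPGA code; the function names are hypothetical, and which bit of cpu_csn belongs to the FPGA is passed in as a parameter because it depends on the board wiring.

```c
#include <stdbool.h>
#include <stdint.h>

/* Original behaviour: drive the data bus on any read strobe (cpu_oen low). */
bool drives_bus_ungated(uint8_t cpu_oen)
{
    return cpu_oen == 0u;
}

/* Gated behaviour: also require the FPGA's own chip select to be low.
 * `cs_bit` selects which bit of cpu_csn[2:0] is the FPGA chip select. */
bool drives_bus_gated(uint8_t cpu_oen, uint8_t cpu_csn, unsigned cs_bit)
{
    bool cs_low = ((cpu_csn >> cs_bit) & 1u) == 0u;
    return cpu_oen == 0u && cs_low;
}
```

During a flash read at boot, cpu_oen is low while the FPGA chip select stays high: the ungated version would drive the shared data lines and collide with the flash, while the gated version stays in high impedance.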
3.2. Hardwired Interrupt
The p2020 processor's interrupts are divided into external and internal interrupts; we use external interrupt 1. In VxWorks, the processor's external interrupt vectors are numbered from 0, so external interrupt 1 is assigned vector 1. First, in the BSP, bind the interrupt vector to the device and register it in the hcfDevice device list; then, in the driver, initialize and enable the interrupt so that signals on the external interrupt 1 pin are received.
Note that Figure 7 below shows the description of the irq signal in the p2020 datasheet: it is high when an interrupt is asserted and low when no interrupt is generated. Board-level testing, however, found that irq is active low: a low level generates the hardwired interrupt.
Figure 7 p2020 Interrupt irq Description
When the driver receives an interrupt, it must reset the hardwired interrupt. Following the PCIe INTa interrupt handling described in an earlier article (which measured the minimum interval at which VxWorks can respond to PCIe interrupts), we define register 0x110 as the interrupt register. Writing 1 and then 0 to bit [15] of 0x110 returns the hardwired interrupt signal to high level. The interrupt timing is shown in Figure 8 below.
Figure 8 Interrupt Timing
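The clear sequence for the interrupt register can be sketched as two register writes. The C code below is an illustration under stated assumptions, not the actual driver: the callback type and function names are hypothetical, the other bits of register 0x110 are assumed to be unused so whole-word writes are acceptable, and the recording stub exists only so the sequence can be checked on a host PC.

```c
#include <stdint.h>

#define IRQ_REG_OFFSET 0x110u      /* interrupt register named in the text */
#define IRQ_CLEAR_BIT  (1u << 15)  /* bit [15] of that register            */

/* Hypothetical bus-write callback: byte offset in the FPGA window, value. */
typedef void (*lbus_write16_fn)(uint32_t offset, uint16_t value);

/* Pulse bit [15] of the interrupt register: write 1, then write 0. */
void fpga_irq_clear(lbus_write16_fn write16)
{
    write16(IRQ_REG_OFFSET, (uint16_t)IRQ_CLEAR_BIT);
    write16(IRQ_REG_OFFSET, 0u);
}

/* Recording stub standing in for the real bus (host-side checks only). */
uint32_t trace_offset[4];
uint16_t trace_value[4];
unsigned trace_len;

void trace_write16(uint32_t offset, uint16_t value)
{
    if (trace_len < 4u) {
        trace_offset[trace_len] = offset;
        trace_value[trace_len] = value;
        trace_len++;
    }
}
```

In a real interrupt service routine, `fpga_irq_clear` would be called with a function that performs the actual Local Bus write, so that the FPGA releases the irq line before the handler returns.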
—— by Liu Wenfeng & Zhang Xinghao version 2.0 2018-12-6