Indexing memory for UART transmission uses > 100% of SLICEs on Tang Nano


I am trying to build a simple command-line parser for UART reception, based on the example from the Tang Nano 9K repo here; here is my modified version. It uses a memory to hold the received values, and that part works. Once I have received 5 characters, I would like to send them back to the host by looping through the memory items and sending each one once tx_data_ready is set.

module uart_test(
    input                        clk,
    input                        rst_n,
    input                        uart_rx,
    output                       uart_tx,
    output reg [5:0]             led
);

parameter                        CLK_FRE  = 27;         //MHz
parameter                        UART_FRE = 115200;     //bps
localparam                       IDLE =  0;
localparam                       SEND =  1;   
localparam                       WAIT =  2;   
localparam                       PROCESS_RX =  3;


parameter                        MSG_SIZE = 256;    

reg[7:0]                         tx_data;
reg[7:0]                         tx_str;
reg                              tx_data_valid;
wire                             tx_data_ready;
reg[7:0]                         tx_cnt;
wire[7:0]                        rx_data;
wire                             rx_data_valid;
wire                             rx_data_ready;
reg[31:0]                        wait_cnt;
reg[3:0]                         state;

reg [7:0]                        message [MSG_SIZE - 1:0];
reg [7:0]                        rx_index;   
integer                          i;        
localparam                      DATA_NUM2   = 3;    // just for testing

assign rx_data_ready = 1'b1;//always can receive data,

always@(posedge clk or negedge rst_n)
begin
    if(rst_n == 1'b0)
    begin
        led <= ~6'b100000;
        wait_cnt <= 32'd0;
        tx_data <= 8'd0;
        state <= IDLE;
        tx_cnt <= 8'd0;
        tx_data_valid <= 1'b0;
        rx_index <= 8'd0;
        for (i = 0; i < 256; i = i+1) begin
            message[i] <= 0;
        end
    end
    else
    case(state)
        IDLE:
        begin
            if(rx_data_valid == 1'b1)
            begin
                message[rx_index] <= rx_data;   // send uart received data
                if(rx_index >= 8'd4)
                begin
                    rx_index <= 8'd0;
                    tx_cnt <= 8'd0;
                    state <= PROCESS_RX;
                end
                else begin
                    led <= ~rx_data[5:0];
                    rx_index <= rx_index + 8'd1;
                end
            end
        end
        PROCESS_RX:
        begin
            tx_data <= tx_str;
            tx_data_valid <= 1'b1;
            state <= WAIT;
        end
        WAIT:
        begin
            if(tx_data_valid == 1'b1 && tx_data_ready == 1'b1) begin
                tx_data_valid <= 1'b0;
                tx_cnt <= tx_cnt + 8'd1; //Send data counter
                if(tx_cnt <= (DATA_NUM2 - 1)) begin
                    state <= PROCESS_RX;
                    led <= ~6'b000110;
                end
                else begin
                    led <= ~6'b000100;
                    state <= IDLE;
                end
            end
        end
        default:
            state <= IDLE;
    endcase
end


always@(tx_cnt)
    tx_str <= message[tx_cnt];

uart_rx#
(
    .CLK_FRE(CLK_FRE),
    .BAUD_RATE(UART_FRE)
) uart_rx_inst
(
    .clk                        (clk                      ),
    .rst_n                      (rst_n                    ),
    .rx_data                    (rx_data                  ),
    .rx_data_valid              (rx_data_valid            ),
    .rx_data_ready              (rx_data_ready            ),
    .rx_pin                     (uart_rx                  )
);

uart_tx#
(
    .CLK_FRE(CLK_FRE),
    .BAUD_RATE(UART_FRE)
) uart_tx_inst
(
    .clk                        (clk                      ),
    .rst_n                      (rst_n                    ),
    .tx_data                    (tx_data                  ),
    .tx_data_valid              (tx_data_valid            ),
    .tx_data_ready              (tx_data_ready            ),
    .tx_pin                     (uart_tx                  )
);
endmodule

The design fails to synthesize because it runs out of SLICEs.


If I comment out the line that increments the index variable used to address the memory, i.e. tx_cnt <= tx_cnt + 8'd1;, then it builds OK.

I have tried different things without success. A Verilog expert can probably see the problem quickly, but I am not getting it.

What am I doing wrong?

Best answer, by Mikef

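The problem is that the synthesis tool is trying to build a 256-location, 8-bit-wide memory out of logic fabric (slices/registers). That is 2048 flip-flops plus the write decoding and a 256-to-1 read multiplexer for each data bit, and the physical FPGA does not have the resources to implement the design. This is most likely also why the build succeeds when the tx_cnt increment is commented out: tx_cnt then stays at 0, only message[0] is ever read, and the tool can optimize most of the array away.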

Here are two solutions:

  1. Reduce the memory size. 256 elements is a lot when the need is 5.
    Changing the size to 16 reduces the utilization burden and still leaves margin over the 5 that are needed.

    Change the parameter to parameter MSG_SIZE = 16;

    and change the for loop bound to match: for (i = 0; i < 16; i = i+1) begin (or use i < MSG_SIZE so the two cannot drift apart).

  2. Code the memory in a style that infers a BRAM primitive (rather than logic fabric/slices/registers). The contents of an FPGA BRAM cannot be cleared by a reset signal (where the primitive has a reset input, it typically affects only the output register), so it is not possible to reset or initialize the array from a reset in RTL. Most applications don't need such a reset, and there does not seem to be a reason this one does. Just make sure the design writes a location before it reads it.

    If the tools detect a coding style that attempts to reset the array (like the posted code), they will build the memory from logic fabric/slices/registers. Removing the code that resets the memory allows the synthesis tool to infer a BRAM and use 0 slices for it. For this solution, keep the memory size at 256: if the memory is smaller than some threshold, the tool may decide not to spend a BRAM on what it considers a very small memory.

    Another coding style issue is the read side of the memory. It was written as an asynchronous read in an always block with an incomplete sensitivity list, which behaves like a transparent latch. BRAMs have clocked read ports, so this also prevents BRAM inference. Latches are generally not recommended in RTL design. Here is a small rewrite that moves the memory into its own synchronous process to help the tools infer a BRAM.

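    Please run simulations to make sure it behaves as desired. Because the read is now registered, tx_str is valid one clock after tx_cnt changes, so the state machine may need a read enable or an extra wait state if the data does not come out of the memory at the right time.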


module uart_test(
    input                        clk,
    input                        rst_n,
    input                        uart_rx,
    output                       uart_tx,
    output reg [5:0]             led
);

parameter                        CLK_FRE  = 27;         //MHz
parameter                        UART_FRE = 115200;     //bps
localparam                       IDLE =  0;
localparam                       SEND =  1;   
localparam                       WAIT =  2;   
localparam                       PROCESS_RX =  3;


parameter                        MSG_SIZE = 256;    

reg[7:0]                         tx_data;
reg[7:0]                         tx_str;
reg                              tx_data_valid;
wire                             tx_data_ready;
reg[7:0]                         tx_cnt;
wire[7:0]                        rx_data;
wire                             rx_data_valid;
wire                             rx_data_ready;
reg[31:0]                        wait_cnt;
reg[3:0]                         state;

reg                              message_wr_en;
reg [7:0]                        message [MSG_SIZE - 1:0];
reg [7:0]                        rx_index;   
integer                          i;

localparam                      DATA_NUM2   = 3;    // just for testing

assign rx_data_ready = 1'b1;//always can receive data,

always@(posedge clk or negedge rst_n)
begin
    if(rst_n == 1'b0)
    begin
        led <= ~6'b100000;
        wait_cnt <= 32'd0;
        tx_data <= 8'd0;
        state <= IDLE;
        tx_cnt <= 8'd0;
        tx_data_valid <= 1'b0;
        rx_index <= 8'd0;
        message_wr_en <= 1'b0;
        //for (i = 0; i < 256; i = i+1) begin
        //    message[i] <= 0;
        //end
    end
    else
    case(state)
        IDLE:
        begin
            if(rx_data_valid == 1'b1)
            begin
                message_wr_en <= 1'b1;
                // message[rx_index] <= rx_data;   // send uart received data
                if(rx_index >= 8'd4)
                begin
                    rx_index <= 8'd0;
                    tx_cnt <= 8'd0;
                    state <= PROCESS_RX;
                end
                else begin
                    led <= ~rx_data[5:0];
                    rx_index <= rx_index + 8'd1;
                end
            end
            else
              message_wr_en <= 1'b0;
          
        end
        PROCESS_RX:
        begin
            tx_data <= tx_str;
            tx_data_valid <= 1'b1;
            state <= WAIT;
        end
        WAIT:
        begin
            if(tx_data_valid == 1'b1 && tx_data_ready == 1'b1) begin
                tx_data_valid <= 1'b0;
                tx_cnt <= tx_cnt + 8'd1; //Send data counter
                if(tx_cnt <= (DATA_NUM2 - 1)) begin
                    state <= PROCESS_RX;
                    led <= ~6'b000110;
                end
                else begin
                    led <= ~6'b000100;
                    state <= IDLE;
                end
            end
        end
        default:
            state <= IDLE;
    endcase
end

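  // **************************************************************
  // Model BRAM
  // **************************************************************
  // Changed the read from an asynchronous process to a clocked one;
  // latches are not good 99.9999% of the time.
  always@(posedge clk) begin
    // Write in the same cycle the state machine accepts the byte, so that
    // rx_index and rx_data still belong to this byte. (The registered
    // message_wr_en arrives one clock late, after rx_index has already
    // advanced, so it is not used as the write enable here.)
    if(state == IDLE && rx_data_valid == 1'b1)
      message[rx_index] <= rx_data;   // store uart received data

    // Registered read: tx_str is valid one clock after tx_cnt changes.
    // This memory output might need an enable or a wait state from the
    // state machine if the data comes out at the wrong time.
    tx_str <= message[tx_cnt];
  end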
  
uart_rx#
(
    .CLK_FRE(CLK_FRE),
    .BAUD_RATE(UART_FRE)
) uart_rx_inst
(
    .clk                        (clk                      ),
    .rst_n                      (rst_n                    ),
    .rx_data                    (rx_data                  ),
    .rx_data_valid              (rx_data_valid            ),
    .rx_data_ready              (rx_data_ready            ),
    .rx_pin                     (uart_rx                  )
);

uart_tx#
(
    .CLK_FRE(CLK_FRE),
    .BAUD_RATE(UART_FRE)
) uart_tx_inst
(
    .clk                        (clk                      ),
    .rst_n                      (rst_n                    ),
    .tx_data                    (tx_data                  ),
    .tx_data_valid              (tx_data_valid            ),
    .tx_data_ready              (tx_data_ready            ),
    .tx_pin                     (uart_tx                  )
);
endmodule
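
The BRAM-friendly style described above can also be isolated in a small module of its own, which makes the one-clock read latency easy to check in simulation. The following is only a minimal sketch: the module name msg_ram, its port names, and the testbench are hypothetical and not part of the original design, and whether a particular tool maps it to block RAM depends on the tool and its settings, so check the resource report after synthesis.

```verilog
// Hypothetical helper (assumption, not from the original design): a simple
// dual-port memory with no reset on the array and a clocked read port.
module msg_ram #(
    parameter ADDR_WIDTH = 8,               // 2**8 = 256 locations
    parameter DATA_WIDTH = 8
)(
    input                       clk,
    input                       wr_en,
    input      [ADDR_WIDTH-1:0] wr_addr,
    input      [DATA_WIDTH-1:0] wr_data,
    input      [ADDR_WIDTH-1:0] rd_addr,
    output reg [DATA_WIDTH-1:0] rd_data
);
    reg [DATA_WIDTH-1:0] mem [0:(1<<ADDR_WIDTH)-1];

    always @(posedge clk) begin
        if (wr_en)
            mem[wr_addr] <= wr_data;
        // Registered read: rd_data is valid one clock after rd_addr is applied.
        rd_data <= mem[rd_addr];
    end
endmodule

// Self-checking testbench: writes five bytes, then reads them back.
module msg_ram_tb;
    reg        clk     = 1'b0;
    reg        wr_en   = 1'b0;
    reg  [7:0] wr_addr = 8'd0;
    reg  [7:0] wr_data = 8'd0;
    reg  [7:0] rd_addr = 8'd0;
    wire [7:0] rd_data;
    integer    k;
    integer    errors  = 0;

    msg_ram dut (
        .clk(clk), .wr_en(wr_en), .wr_addr(wr_addr), .wr_data(wr_data),
        .rd_addr(rd_addr), .rd_data(rd_data)
    );

    always #5 clk = ~clk;

    initial begin
        // Drive inputs on the falling edge so they are stable at the rising edge.
        for (k = 0; k < 5; k = k + 1) begin
            @(negedge clk);
            wr_en = 1'b1; wr_addr = k; wr_data = 8'h41 + k;
        end
        @(negedge clk);
        wr_en = 1'b0;

        for (k = 0; k < 5; k = k + 1) begin
            rd_addr = k;        // apply the address
            @(negedge clk);     // one rising edge later the data is available
            if (rd_data !== 8'h41 + k) begin
                errors = errors + 1;
                $display("Mismatch at address %0d: got %h", k, rd_data);
            end
        end

        if (errors == 0) $display("PASS");
        else             $display("FAIL: %0d errors", errors);
        $finish;
    end
endmodule
```

In the uart_test state machine, the same latency means that after tx_cnt changes, one clock has to pass before tx_str holds the new byte; an extra state between WAIT and PROCESS_RX would be one way to allow for that.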