TITLE: Packet Processing—An Introduction to State Machine Concepts
TEXT_MARKDOWN:
With nothing to do during the summer holidays, I dug out an HC-05 Bluetooth module that had been sitting idle for a long time, just right for playing around. A quick look showed that it is essentially a wireless serial port bridge. Once a Bluetooth debugging app on the phone connects to it, communication behaves just like a wired serial port, so there isn't much extra to learn.
That reminded me of a gap I had left in my earlier studies: packing and unpacking data packets. This was the perfect opportunity to fill it.
I am using the LiChuang SkyStar STM32F407VGT6. I connected the HC-05's TXD and RXD to the corresponding pins of the microcontroller's USART3. The HC-05's logic level is 3.3V, and the module itself is powered from 5V. The serial port solution uses DMA + Idle Interrupt + Ring Buffer.
When the EN/KEY pin is at a low level or floating, the HC-05 is in its default mode (Data Mode). Once the phone connects to the HC-05 through a Bluetooth debugging app and sends data, the HC-05 automatically forwards that data to the microcontroller.
When the EN/KEY pin is pulled to a high level, the HC-05 module will enter AT Command Mode. This mode allows users to send specific "AT commands" to the module via the serial port to configure various parameters of the module.
A well-designed data packet usually contains the following parts:
| Component | Alias/Colloquial Name | Function and Role |
|---|---|---|
| 1. Header | Sync Header, Frame Header | A fixed, unique sequence of bytes used to identify the start of a data packet. |
| 2. Address/ID | Source/Destination Address | Used on multi-device communication buses (like RS485) to specify who the packet is for or where it came from. It can be omitted in simple point-to-point communication. |
| 3. Command/Function Code | Function Code, CMD | The core of the packet, telling the receiver "what to do", e.g., "Set LED", "Read Temperature", "Return Heartbeat". |
| 4. Data Length | LEN | Explicitly points out the length of the Payload. This is key to implementing variable-length packets; the receiver uses this length to determine how many bytes of data to read. |
| 5. Payload | Data | The part of the packet that actually carries the changing information. For example, the status value to set an LED, the sensor reading to send, etc. If a command does not need extra data (like "Query"), this part can be empty. |
| 6. Checksum | CRC, Checksum | A "fingerprint" calculated by a specific algorithm (cumulative sum, CRC, etc.) on the key contents of the packet. The receiver calculates it once using the same algorithm; if the result matches the received checksum, it indicates the data is likely fine. |
| 7. Footer | Terminator, End Mark | A fixed byte sequence used to identify the end of a data packet. In protocols with a "Data Length" field, the footer is not mandatory, but sometimes serves as an extra layer of verification. In some text protocols, the newline character \r\n acts as the footer. |
For practice, I designed a simple scenario: communicating via phone with the Bluetooth module to control 6 LEDs on the development board.
The custom data packet format is as follows:
| Field | Bytes | Value | Description |
|---|---|---|---|
| Header | 2 | 0xAA 0x55 (Fixed) | Identifies the start of a data frame; the receiver uses this as a sign for data synchronization. |
| Command | 1 | 0x00 ~ 0xFF | Defines the function and intent of the packet. |
| Length | 1 | 0x00 ~ 0xFF | Indicates the byte length of the immediately following Data field. If the data field is empty, this value is 0. |
| Data | N (Determined by Length) | Any Value | The payload of the packet, i.e., the information that actually needs to be transmitted. |
| Checksum | 1 | 0x00 ~ 0xFF | The cumulative sum of all bytes from the Command to the last byte of the Data field, with the result truncated to the lowest 8 bits. |
A. Set LED Status (CMD: 0x01)

- Data: 1 byte, where 1 represents on and 0 represents off (e.g. 0b00000101 = 0x05).
- Example packet: AA 55 01 01 05 07
- Checksum: 0x01 + 0x01 + 0x05 = 0x07

B. Query LED Status (CMD: 0x02)

- Example packet: AA 55 02 00 02
- Checksum: 0x02 + 0x00 = 0x02

C. Respond LED Status (CMD: 0x82)

- Purpose: in reply to the 0x02 query command, report the current LED status.
- Command code: (Request Code + 0x80). This is a common protocol design pattern.
- Data: same format as the 0x01 command, representing the actual status of the current LEDs.
- Example packet: AA 55 82 01 05 88
- Checksum: 0x82 + 0x01 + 0x05 = 0x88

Based on the design above, our data packet format is determined as follows:
| Header 1 (0xAA) | Header 2 (0x55) | Command (1 Byte) | Length (1 Byte) | Data (N Bytes) | Checksum (1 Byte) |
|---|---|---|---|---|---|
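To make the frame layout concrete, here is a minimal packing sketch in C. The names `packet_checksum` and `build_packet` are my own, chosen for illustration; they are not from the original project. The checksum follows the rule in the table: the sum of Command, Length, and every Data byte, truncated to 8 bits.

```c
#include <stddef.h>
#include <stdint.h>

#define PKT_HEADER_1 0xAA
#define PKT_HEADER_2 0x55

/* Sum of CMD, LEN and every DATA byte, truncated to the lowest 8 bits. */
static uint8_t packet_checksum(uint8_t cmd, uint8_t len, const uint8_t *data)
{
    uint8_t sum = (uint8_t)(cmd + len);
    for (uint8_t i = 0; i < len; i++) {
        sum = (uint8_t)(sum + data[i]);
    }
    return sum;
}

/* Writes one complete frame into out.
 * Returns the number of bytes written, or 0 if out is too small.
 * data may be NULL when len is 0. */
static size_t build_packet(uint8_t cmd, const uint8_t *data, uint8_t len,
                           uint8_t *out, size_t out_size)
{
    size_t total = (size_t)len + 5u; /* 2 header + cmd + len + data + checksum */
    if (out_size < total) {
        return 0;
    }
    out[0] = PKT_HEADER_1;
    out[1] = PKT_HEADER_2;
    out[2] = cmd;
    out[3] = len;
    for (uint8_t i = 0; i < len; i++) {
        out[4 + i] = data[i];
    }
    out[4 + len] = packet_checksum(cmd, len, data);
    return total;
}
```

Calling `build_packet(0x01, data, 1, out, sizeof out)` with `data[0] = 0x05` produces the same six bytes as example A: AA 55 01 01 05 07.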
Following intuition, I might write "spaghetti code" like this:
[Code listing not included in the source.]
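The original listing is not available, so the following is only a hypothetical reconstruction of what such nested parsing code might look like. The function name `parse_naive` and its parameters are my own assumptions. It expects a whole packet to sit at the start of one buffer and tracks its position with `i++`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "spaghetti" parser: handles exactly one packet that must
 * start at buf[0] and be completely present in the buffer.
 * data_out must have room for 255 bytes.
 * Returns 1 on a valid packet, 0 on any failure. */
static int parse_naive(const uint8_t *buf, size_t n,
                       uint8_t *cmd_out, uint8_t *data_out, uint8_t *len_out)
{
    size_t i = 0;
    if (i < n && buf[i++] == 0xAA) {
        if (i < n && buf[i++] == 0x55) {
            if (i < n) {
                uint8_t cmd = buf[i++];
                if (i < n) {
                    uint8_t len = buf[i++];
                    if (i + len < n) { /* data and checksum must both be present */
                        uint8_t sum = (uint8_t)(cmd + len);
                        for (uint8_t k = 0; k < len; k++) {
                            data_out[k] = buf[i];
                            sum = (uint8_t)(sum + buf[i++]);
                        }
                        if (buf[i++] == sum) {
                            *cmd_out = cmd;
                            *len_out = len;
                            return 1;
                        }
                    }
                }
            }
        }
    }
    return 0; /* also reached when a packet is split across two reads */
}
```

Note the weakness: if the serial driver delivers the packet in two chunks, this function simply fails, because everything it "knows" about its progress lives in `i` and the nesting depth, and both are lost when the function returns.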
The execution flow of this kind of code is linear and procedural, deciding the next step through a series of conditional checks. Its downsides are obvious: deep nesting, tangled logic, a high risk of errors, and great difficulty in extending it.
Clearly, this procedural approach of stubbornly following a single path to the end is inadequate for streaming data whose handling depends on context.
So what is the root cause of this chaos?
The fundamental reason is that our parsing logic depends not only on the currently read byte but even more so on a hidden "context", which is "which part of the packet has been read so far".
In the code above, this context is "remembered" only implicitly, through an i++ counter and the nesting level of the code, and the result is clearly far from ideal.
Since there are different "contexts" in the parsing process, why not switch to a different mode of thinking? We can explicitly define these contexts and turn them into clear "states".
For example:
- State waiting for Header 1
- State waiting for Header 2
- State waiting for Command Byte

If the program can switch between these explicit "states", the code structure will become much clearer.
This is precisely the core concept to be introduced next: the Finite State Machine (FSM).
A state machine is a powerful design pattern. Simply put, the core of the state machine concept is to break down a complex logical process into a finite number of mutually exclusive "states", and clearly define under what conditions (events) to switch from one state to another. Through it, we can refactor the mess of if-else above into code with a clear structure that is easy to maintain. It is naturally suitable for handling scenarios like packet parsing where behavior depends on historical steps.
Use an enumeration type (enum) to define all possible states. The initial state is STATE_WAIT_HEADER_1, waiting for Header 1 (0xAA).
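A possible definition, using the state names that appear in this article, might look like the sketch below. The type name `parser_state_t` is an assumption of mine.

```c
/* One enumerator per parsing context; the order follows the frame layout. */
typedef enum {
    STATE_WAIT_HEADER_1, /* waiting for 0xAA */
    STATE_WAIT_HEADER_2, /* waiting for 0x55 */
    STATE_WAIT_CMD,      /* waiting for the command byte */
    STATE_WAIT_LEN,      /* waiting for the length byte */
    STATE_WAIT_DATA,     /* receiving the payload */
    STATE_WAIT_CHECKSUM  /* waiting for the checksum byte */
} parser_state_t;

/* The parser starts out waiting for the first header byte. */
static parser_state_t current_state = STATE_WAIT_HEADER_1;
```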
Define protocol command constants.
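For example, the three commands from the protocol table could be written as follows. The macro names are hypothetical; only the numeric values come from the protocol described above.

```c
/* Frame header bytes */
#define PKT_HEADER_1      0xAA
#define PKT_HEADER_2      0x55

/* Command codes */
#define CMD_SET_LED       0x01
#define CMD_QUERY_LED     0x02

/* A response reuses the request code with 0x80 added. */
#define CMD_RESPONSE_FLAG 0x80
#define CMD_RESP_LED      (CMD_QUERY_LED + CMD_RESPONSE_FLAG) /* 0x82 */
```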
The core of the state machine lies in the parse_byte(uint8_t byte) function. Its working mode is byte-by-byte driven.
By the time it is called, the serial port driver has already stored the received data in a ring buffer; the task loop then takes bytes out of that buffer and feeds them to the parser one at a time.
It processes only one byte at a time, and then decides what to do next based on the current_state (current state).
The switch-case structure is an excellent way to implement a state machine; each case is an independent state processing logic.
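To illustrate the byte-by-byte feeding, here is a rough sketch of a task loop draining a ring buffer into the parser. The ring buffer type, `rb_push`, `rb_pop`, and the buffer size are all assumptions of mine and not the project's actual driver. `parse_byte` is reduced to a placeholder that only counts bytes, so the block can stand on its own.

```c
#include <stdbool.h>
#include <stdint.h>

#define RB_SIZE 64u /* assumed capacity; one slot is always left empty */

typedef struct {
    uint8_t buf[RB_SIZE];
    volatile uint32_t head; /* written by the receive side */
    volatile uint32_t tail; /* written by the task loop */
} ring_buffer_t;

/* Receive side: returns false when the buffer is full. */
static bool rb_push(ring_buffer_t *rb, uint8_t b)
{
    uint32_t next = (rb->head + 1u) % RB_SIZE;
    if (next == rb->tail) {
        return false;
    }
    rb->buf[rb->head] = b;
    rb->head = next;
    return true;
}

/* Task side: returns false when the buffer is empty. */
static bool rb_pop(ring_buffer_t *rb, uint8_t *b)
{
    if (rb->head == rb->tail) {
        return false;
    }
    *b = rb->buf[rb->tail];
    rb->tail = (rb->tail + 1u) % RB_SIZE;
    return true;
}

static ring_buffer_t bt_rx;
static uint32_t bytes_parsed;

/* Placeholder for the real state machine: only counts bytes here. */
static void parse_byte(uint8_t byte)
{
    (void)byte;
    bytes_parsed++;
}

/* Called repeatedly from the main loop: drain everything received so far. */
void BT_Task(void)
{
    uint8_t byte;
    while (rb_pop(&bt_rx, &byte)) {
        parse_byte(byte);
    }
}
```

Because the parser keeps its own state between calls, it does not matter whether a packet arrives in one DMA transfer or is spread across several.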
Initial State (STATE_WAIT_HEADER_1):

- The state machine waits for the first header byte, 0xAA, of the protocol.
- Once 0xAA is received, it completes the first task and switches the state to STATE_WAIT_HEADER_2.

Waiting for Second Header (STATE_WAIT_HEADER_2):

- The state machine expects the second header byte, 0x55.
- If 0x55 is successfully received, the header has matched, and the state switches to STATE_WAIT_CMD, preparing to receive the command.
- If the byte is not 0x55, it indicates an erroneous packet (perhaps the data just happened to contain 0xAA), and the state machine immediately resets to STATE_WAIT_HEADER_1.

Receiving Command and Length (STATE_WAIT_CMD, STATE_WAIT_LEN):

- These two states store the received byte in the corresponding variable (packet_cmd, packet_len), and then unconditionally switch to the next state.
- If packet_len is 0, the state machine skips the data reception phase and goes directly to the checksum state STATE_WAIT_CHECKSUM.

Receiving Data (STATE_WAIT_DATA):

- This state receives packet_len bytes of data.
- The data_index variable acts as a counter here. Every time a byte is received, data_index increments by one.
- When data_index equals packet_len, the data part is complete, and the state switches to STATE_WAIT_CHECKSUM.

Checksum Verification (STATE_WAIT_CHECKSUM):

- The checksum is computed over packet_cmd, packet_len, and the data in packet_buffer.
- The computed checksum is compared with the last received byte, byte.
- If they match, the handle_packet() function is called to process this complete and correct packet.
- Whether or not the check passes, the state resets to STATE_WAIT_HEADER_1, waiting for the arrival of the next packet.

When the state machine successfully receives and verifies a complete data packet (completed in the STATE_WAIT_CHECKSUM state), it calls the handle_packet() function to execute the actual business logic.
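Putting the state descriptions together, a minimal sketch of `parse_byte` could look like the following. The variable and state names follow the text; `MAX_DATA_LEN`, the `packets_handled` counter, and the stub body of `handle_packet` are assumptions added so the block compiles by itself. The real business logic (driving the LEDs) would go into `handle_packet`.

```c
#include <stdint.h>

#define MAX_DATA_LEN 255 /* assumed: fits any value of the 1-byte length field */

typedef enum {
    STATE_WAIT_HEADER_1,
    STATE_WAIT_HEADER_2,
    STATE_WAIT_CMD,
    STATE_WAIT_LEN,
    STATE_WAIT_DATA,
    STATE_WAIT_CHECKSUM
} parser_state_t;

static parser_state_t current_state = STATE_WAIT_HEADER_1;
static uint8_t packet_cmd;
static uint8_t packet_len;
static uint8_t data_index;
static uint8_t packet_buffer[MAX_DATA_LEN];
static int packets_handled; /* stand-in for real business logic */

/* Stub: a real version would switch on packet_cmd and drive the LEDs. */
static void handle_packet(void)
{
    packets_handled++;
}

void parse_byte(uint8_t byte)
{
    switch (current_state) {
    case STATE_WAIT_HEADER_1:
        if (byte == 0xAA) {
            current_state = STATE_WAIT_HEADER_2;
        }
        break;
    case STATE_WAIT_HEADER_2:
        current_state = (byte == 0x55) ? STATE_WAIT_CMD : STATE_WAIT_HEADER_1;
        break;
    case STATE_WAIT_CMD:
        packet_cmd = byte;
        current_state = STATE_WAIT_LEN;
        break;
    case STATE_WAIT_LEN:
        packet_len = byte;
        data_index = 0;
        /* An empty payload skips straight to the checksum. */
        current_state = (packet_len == 0) ? STATE_WAIT_CHECKSUM : STATE_WAIT_DATA;
        break;
    case STATE_WAIT_DATA:
        packet_buffer[data_index++] = byte;
        if (data_index == packet_len) {
            current_state = STATE_WAIT_CHECKSUM;
        }
        break;
    case STATE_WAIT_CHECKSUM: {
        uint8_t checksum = (uint8_t)(packet_cmd + packet_len);
        for (uint8_t i = 0; i < packet_len; i++) {
            checksum = (uint8_t)(checksum + packet_buffer[i]);
        }
        if (checksum == byte) {
            handle_packet();
        }
        /* Pass or fail, start looking for the next packet. */
        current_state = STATE_WAIT_HEADER_1;
        break;
    }
    }
}
```

Feeding the bytes AA 55 01 01 05 07 one at a time walks through every state once and ends with a single call to `handle_packet()`. The same bytes with a wrong final byte are silently dropped and the parser returns to its initial state.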
For more complex scenarios with higher security requirements, the following improvements can be made:
Add a Timeout Mechanism
If only half a packet is received (e.g., only the header and command were sent), the state machine will stay stuck in an intermediate state (like STATE_WAIT_LEN) forever and cannot reset automatically.
The fix is a timeout timer. Record the current system time (e.g., HAL_GetTick()) when entering STATE_WAIT_HEADER_2, which marks the start of a packet. In the loop of BT_Task, check the difference between the current time and the recorded time. If the parser is still in the middle of a packet and the difference exceeds a preset threshold, forcibly reset parser_state to STATE_WAIT_HEADER_1.
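A rough sketch of that check is shown below. To keep it independent of the HAL, the current tick is passed in as a parameter; on the STM32 you would pass `HAL_GetTick()`. The helper names and the 100 ms threshold are assumptions for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

#define PARSER_TIMEOUT_MS 100u /* assumed threshold; tune for your link */

typedef enum {
    STATE_WAIT_HEADER_1,
    STATE_WAIT_HEADER_2,
    STATE_WAIT_CMD,
    STATE_WAIT_LEN,
    STATE_WAIT_DATA,
    STATE_WAIT_CHECKSUM
} parser_state_t;

static parser_state_t parser_state = STATE_WAIT_HEADER_1;
static uint32_t packet_start_ms;

/* Call when 0xAA is accepted: remember when this packet began. */
static void parser_mark_packet_start(uint32_t now_ms)
{
    packet_start_ms = now_ms;
    parser_state = STATE_WAIT_HEADER_2;
}

/* Call from the task loop. Returns true if a stalled packet was discarded. */
static bool parser_check_timeout(uint32_t now_ms)
{
    if (parser_state == STATE_WAIT_HEADER_1) {
        return false; /* idle: nothing can be stuck */
    }
    /* Unsigned subtraction stays correct when the tick counter wraps. */
    if ((uint32_t)(now_ms - packet_start_ms) > PARSER_TIMEOUT_MS) {
        parser_state = STATE_WAIT_HEADER_1;
        return true;
    }
    return false;
}
```

Computing the elapsed time as an unsigned difference, rather than comparing absolute timestamps, keeps the check valid when the 32-bit millisecond counter rolls over.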
Data Length Verification
If an erroneous packet declares more data than the processing buffer can hold, the parser will write past the end of the buffer. This overflow corrupts other data and leads to program errors.
After receiving packet_len in the STATE_WAIT_LEN state, immediately check if (packet_len > BT_PROCESS_BUFFER_SIZE). If the length is wrong, reset the state machine.
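As a sketch, the length state could be handled by a small function like the one below. `BT_PROCESS_BUFFER_SIZE` is the name used above, but the value 32 and the function name `on_length_byte` are assumptions of mine.

```c
#include <stdint.h>

#define BT_PROCESS_BUFFER_SIZE 32u /* assumed size of the payload buffer */

typedef enum {
    STATE_WAIT_HEADER_1,
    STATE_WAIT_HEADER_2,
    STATE_WAIT_CMD,
    STATE_WAIT_LEN,
    STATE_WAIT_DATA,
    STATE_WAIT_CHECKSUM
} parser_state_t;

static parser_state_t parser_state = STATE_WAIT_LEN;
static uint8_t packet_len;
static uint8_t data_index;

/* Body of the STATE_WAIT_LEN case, with the bounds check added. */
static void on_length_byte(uint8_t byte)
{
    packet_len = byte;
    data_index = 0;
    if (packet_len > BT_PROCESS_BUFFER_SIZE) {
        /* Payload would not fit: drop the packet before storing anything. */
        parser_state = STATE_WAIT_HEADER_1;
    } else if (packet_len == 0) {
        parser_state = STATE_WAIT_CHECKSUM;
    } else {
        parser_state = STATE_WAIT_DATA;
    }
}
```

Rejecting the packet at this point means no payload byte is ever written for an oversized length, so the buffer cannot overflow regardless of what follows on the wire.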
This project finally gave me the chance to close the gap in "data packet unpacking/packing" that I had long wanted to learn but had never delved into.
This study was an eye-opener for me. The "spaghetti" if-else nested writing style becomes logically chaotic and hard to maintain when dealing with streaming data.
The implementation of the state machine not only makes the code easy to understand and extend but also naturally achieves the decoupling of parsing logic and business logic.