Packet Processing—An Introduction to State Machine Concepts
Preface
With nothing to do over the summer holidays, I dug out an HC-05 Bluetooth module that had been sitting idle for a long time. A quick look showed that it is essentially a wireless serial bridge: once a Bluetooth debugging app on the phone connects to it, communication behaves just like a wired serial port, so there is little extra to learn.
That reminded me of a gap I had left in my earlier studies, packing and unpacking data packets, which I had never filled. This was the perfect opportunity to learn it.
Bluetooth Module
I am using the LiChuang SkyStar board (STM32F407VGT6). The HC-05's TXD and RXD are connected to the corresponding pins of the microcontroller's USART3. The module uses 3.3 V logic levels and is powered from 5 V. The serial driver uses DMA + idle interrupt + ring buffer.
When the EN/KEY pin is at a low level or floating, the HC-05 is in default mode (Data Mode). After the phone connects to the HC-05 using a Bluetooth debugging app to send data, the HC-05 automatically forwards the data to the microcontroller.
When the EN/KEY pin is pulled to a high level, the HC-05 module will enter AT Command Mode. This mode allows users to send specific "AT commands" to the module via the serial port to configure various parameters of the module.
Custom Data Packet
A well-designed data packet usually contains the following parts:
| Component | Alias/Colloquial Name | Function and Role |
|---|---|---|
| 1. Header | Sync Header, Frame Header | A fixed, unique sequence of bytes used to identify the start of a data packet. |
| 2. Address/ID | Source/Destination Address | Used on multi-device communication buses (like RS485) to specify who the packet is for or where it came from. It can be omitted in simple point-to-point communication. |
| 3. Command/Function Code | Function Code, CMD | The core of the packet, telling the receiver "what to do", e.g., "Set LED", "Read Temperature", "Return Heartbeat". |
| 4. Data Length | LEN | Explicitly points out the length of the Payload. This is key to implementing variable-length packets; the receiver uses this length to determine how many bytes of data to read. |
| 5. Payload | Data | The part of the packet that actually carries the changing information. For example, the status value to set an LED, the sensor reading to send, etc. If a command does not need extra data (like "Query"), this part can be empty. |
| 6. Checksum | CRC, Checksum | A "fingerprint" calculated by a specific algorithm (cumulative sum, CRC, etc.) on the key contents of the packet. The receiver calculates it once using the same algorithm; if the result matches the received checksum, it indicates the data is likely fine. |
| 7. Footer | Terminator, End Mark | A fixed byte sequence used to identify the end of a data packet. In protocols with a "Data Length" field, the footer is not mandatory, but sometimes serves as an extra layer of verification. In some text protocols, the newline character \r\n acts as the footer. |
For practice, I designed a simple scenario: communicating via phone with the Bluetooth module to control 6 LEDs on the development board.
The custom data packet format is as follows:
| Field | Bytes | Value | Description |
|---|---|---|---|
| Header | 2 | 0xAA 0x55 (Fixed) | Identifies the start of a data frame; the receiver uses this as a sign for data synchronization. |
| Command | 1 | 0x00 ~ 0xFF | Defines the function and intent of the packet. |
| Length | 1 | 0x00 ~ 0xFF | Indicates the byte length of the immediately following Data field. If the data field is empty, this value is 0. |
| Data | N (Determined by Length) | Any Value | The payload of the packet, i.e., the information that actually needs to be transmitted. |
| Checksum | 1 | 0x00 ~ 0xFF | The cumulative sum of all bytes from the Command to the last byte of the Data field, with the result truncated to the lowest 8 bits. |
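The frame layout and checksum rule in the table can be sketched in C as follows. This is a minimal illustration, not code from the original project: the names `packet_checksum`, `packet_build`, `PKT_HEADER_1`, and `PKT_HEADER_2` are invented here, and only the byte values and the summing rule come from the table.

```c
#include <stddef.h>
#include <stdint.h>

#define PKT_HEADER_1 0xAAu  /* first header byte, from the format table  */
#define PKT_HEADER_2 0x55u  /* second header byte, from the format table */

/* Sum Command + Length + every Data byte, keeping only the low 8 bits. */
uint8_t packet_checksum(uint8_t cmd, uint8_t len, const uint8_t *data)
{
    uint8_t sum = (uint8_t)(cmd + len);
    for (size_t i = 0; i < len; i++) {
        sum = (uint8_t)(sum + data[i]);
    }
    return sum;
}

/* Write one complete frame into out.
 * Returns the frame size in bytes, or 0 if out is too small.
 * data may be NULL when len is 0. */
size_t packet_build(uint8_t cmd, const uint8_t *data, uint8_t len,
                    uint8_t *out, size_t out_size)
{
    size_t total = 2u + 1u + 1u + (size_t)len + 1u; /* header, cmd, len, data, checksum */
    if (out_size < total) {
        return 0;
    }
    out[0] = PKT_HEADER_1;
    out[1] = PKT_HEADER_2;
    out[2] = cmd;
    out[3] = len;
    for (size_t i = 0; i < len; i++) {
        out[4 + i] = data[i];
    }
    out[4 + len] = packet_checksum(cmd, len, data);
    return total;
}
```

Because the sum is kept in a `uint8_t`, truncation to the lowest 8 bits happens on every addition, which matches the rule in the Checksum row.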
A. Set LED Status (CMD: 0x01)
- Direction: Phone/Host -> Development Board
- Function: Control the on/off status of 6 LEDs on the development board.
- Data Field (Data):
  - Length: 1 byte.
  - Content: An 8-bit byte, where the lower 6 bits (Bit 0 ~ Bit 5) correspond to the status of the 6 LEDs respectively. `1` represents on, `0` represents off.
- Example: Turn on LED1 and LED3 (`0b00000101` = `0x05`).
  - Complete Packet: `AA 55 01 01 05 07`
  - Checksum: `0x01 + 0x01 + 0x05 = 0x07`
B. Query LED Status (CMD: 0x02)
- Direction: Phone/Host -> Development Board
- Function: Request the development board to return the status of all current LEDs.
- Data Field (Data):
  - Length: 0 bytes. The data field is empty.
- Example: Send a query command.
  - Complete Packet: `AA 55 02 00 02`
  - Checksum: `0x02 + 0x00 = 0x02`
C. Respond LED Status (CMD: 0x82)
- Direction: Development Board -> Phone/Host
- Function: As a reply to the `0x02` query command, report the current LED status.
  - Design Note: The response command code usually sets the highest bit to 1 based on the request command code (i.e., `Request Code + 0x80`). This is a common protocol design pattern.
- Data Field (Data):
  - Length: 1 byte.
  - Content: Exactly the same data format as the `0x01` command, representing the actual status of the current LEDs.
- Example: Reply that LED1 and LED3 are currently on.
  - Complete Packet: `AA 55 82 01 05 88`
  - Checksum: `0x82 + 0x01 + 0x05 = 0x88`
Packet Processing
Why use a State Machine?
Based on the design above, our data packet format is determined as follows:
| Header 1 | Header 2 | Command | Length | Data | Checksum |
|---|---|---|---|---|---|
| 0xAA | 0x55 | 1 Byte | 1 Byte | N Bytes | 1 Byte |
Following intuition, I might write "spaghetti code" like this:
(The original code listing is not reproduced here.)
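As a stand-in, here is a hypothetical sketch of what such nested parsing might look like. It is not the original listing: the function name `parse_frame_nested` and its parameters are invented for illustration, and it assumes the whole frame is already sitting in one buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* One-shot nested parser (illustrative only).
 * data must have room for 255 bytes. Returns 1 on a valid frame, else 0.
 * It only works if buf holds one entire frame starting at index 0; a frame
 * split across two reads, or preceded by stray bytes, is simply rejected. */
int parse_frame_nested(const uint8_t *buf, size_t n,
                       uint8_t *cmd, uint8_t *data, uint8_t *len)
{
    size_t i = 0; /* the only record of "where we are" in the packet */
    if (n >= 5) {
        if (buf[i++] == 0xAA) {
            if (buf[i++] == 0x55) {
                *cmd = buf[i++];
                *len = buf[i++];
                if (i + *len + 1 <= n) {
                    uint8_t sum = (uint8_t)(*cmd + *len);
                    for (uint8_t k = 0; k < *len; k++) {
                        data[k] = buf[i++];
                        sum = (uint8_t)(sum + data[k]);
                    }
                    if (buf[i++] == sum) {
                        return 1;
                    }
                }
            }
        }
    }
    return 0;
}
```

Even in this small form the nesting is five levels deep, and there is no way to resume when the bytes of a frame arrive in separate chunks.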
Code like this runs linearly and procedurally, deciding each next step through a chain of conditional checks. Its downsides are obvious: deep nesting, tangled logic, a high risk of errors, and great difficulty in extending it.
Clearly, this one-track procedural approach is inadequate for streaming data whose meaning depends on context.
So what is the root cause of this mess?
The fundamental reason is that the parsing logic depends not only on the byte currently being read but, even more, on a hidden "context": which part of the packet has been read so far.
The code above tries to "remember" this context implicitly, through an i++ index and the nesting depth of the code, and it does so poorly.
Since there are different "contexts" in the parsing process, why not switch to a different mode of thinking? We can explicitly define these contexts and turn them into clear "states".
For example:
- State waiting for Header 1
- State waiting for Header 2
- State waiting for Command Byte
- ...and so on.
If the program can switch between these explicit "states", the code structure will become much clearer.
This is exactly the concept introduced next: the finite state machine (FSM).
A state machine is a powerful design pattern. Simply put, its core idea is to break a complex logical process into a finite number of mutually exclusive "states" and to define clearly under which conditions (events) the program switches from one state to another. With it, the tangle of if-else above can be refactored into clearly structured, maintainable code. It is a natural fit for scenarios like packet parsing, where behavior depends on earlier steps.
State Machine Implementation
1. Define States/Commands
Use an enum to define all possible states. The initial state is STATE_WAIT_HEADER_1, which waits for Header 1 (0xAA).
Define the protocol command codes as constants.
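A minimal sketch of these definitions might look like the following. The `STATE_WAIT_*` names follow the walkthrough in the next section, and the numeric command values come from the protocol tables; the type name `parser_state_t` and the `CMD_*` macro names are assumptions made for this example.

```c
#include <stdint.h>

/* One enumerator per stage of the frame. */
typedef enum {
    STATE_WAIT_HEADER_1 = 0, /* waiting for 0xAA */
    STATE_WAIT_HEADER_2,     /* waiting for 0x55 */
    STATE_WAIT_CMD,          /* waiting for the command byte */
    STATE_WAIT_LEN,          /* waiting for the length byte */
    STATE_WAIT_DATA,         /* collecting payload bytes */
    STATE_WAIT_CHECKSUM      /* waiting for the checksum byte */
} parser_state_t;

/* Command codes from the protocol description (macro names are hypothetical). */
#define CMD_SET_LED       0x01u /* host -> board: set the 6 LEDs */
#define CMD_QUERY_LED     0x02u /* host -> board: ask for LED status */
#define CMD_RESP_LED      0x82u /* board -> host: reply to the query */
#define CMD_RESPONSE_FLAG 0x80u /* highest bit marks a response */

/* The parser starts out waiting for the first header byte. */
parser_state_t current_state = STATE_WAIT_HEADER_1;
```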
2. State Machine Core
The core of the state machine is the parse_byte(uint8_t byte) function, which is driven one byte at a time.
The serial driver has already stored the received data in a ring buffer, and the bytes are fed from there to the parser one by one.
The function handles a single byte per call and decides what to do next based on current_state (the current state).
A switch-case structure is a natural way to implement a state machine: each case holds the handling logic for one state.
Initial State (STATE_WAIT_HEADER_1):
- After the state machine starts, its only goal is to wait for the first packet header `0xAA` of the protocol.
- Once `0xAA` is received, it completes the first task and switches the state to `STATE_WAIT_HEADER_2`.
Waiting for Second Header (STATE_WAIT_HEADER_2):
- At this point, it expects to receive the second packet header `0x55`.
- If `0x55` is successfully received, it means the header matches successfully, and the state switches to `STATE_WAIT_CMD`, preparing to receive the command.
- If the received byte is not `0x55`, it indicates an erroneous packet (perhaps the data just happened to contain `0xAA`), and it will immediately reset the state machine back to `STATE_WAIT_HEADER_1`.
Receiving Command and Length (STATE_WAIT_CMD, STATE_WAIT_LEN):
- Store the byte into the corresponding variable (`packet_cmd`, `packet_len`), and then unconditionally switch to the next state.
- If `packet_len` is 0, it will directly skip the data reception phase and enter the checksum state `STATE_WAIT_CHECKSUM`.
Receiving Data (STATE_WAIT_DATA):
- This is a "looping" state. It will stay in this state until it has received `packet_len` bytes of data.
- The `data_index` variable acts as a counter here. Every time a byte is received, `data_index` increments by one.
- When `data_index` equals `packet_len`, it indicates the data part reception is complete, and the state switches to `STATE_WAIT_CHECKSUM`.
Checksum Verification (STATE_WAIT_CHECKSUM):
- Recalculate the checksum based on the received `packet_cmd`, `packet_len`, and the data in `packet_buffer`.
- Compare the calculated `checksum` with the last received byte `byte`.
- If they are equal, Checksum Passed! Call the `handle_packet()` function to process this complete and correct packet.
- If they are not equal, it indicates an error occurred during data transmission.
- At this point, the lifecycle of a data packet ends. The state machine resets back to `STATE_WAIT_HEADER_1`, waiting for the arrival of the next packet.
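The state-by-state walkthrough above can be sketched as a single `switch`. This is an illustration under stated assumptions rather than the original implementation: the variable and state names follow the text, but the `handle_packet` signature, the stub body with its `packets_handled` counter, and the 255-byte buffer size are choices made for this example.

```c
#include <stdint.h>

/* 255 covers every value a 1-byte length can take, so the payload
 * can never overflow the buffer in this sketch. (Assumed size.) */
#define BT_PROCESS_BUFFER_SIZE 255u

typedef enum {
    STATE_WAIT_HEADER_1 = 0,
    STATE_WAIT_HEADER_2,
    STATE_WAIT_CMD,
    STATE_WAIT_LEN,
    STATE_WAIT_DATA,
    STATE_WAIT_CHECKSUM
} parser_state_t;

static parser_state_t current_state = STATE_WAIT_HEADER_1;
static uint8_t packet_cmd;
static uint8_t packet_len;
static uint8_t data_index;
static uint8_t packet_buffer[BT_PROCESS_BUFFER_SIZE];
static unsigned packets_handled; /* stub bookkeeping, for illustration */

/* Stub standing in for the real business logic. */
static void handle_packet(uint8_t cmd, const uint8_t *data, uint8_t len)
{
    (void)cmd; (void)data; (void)len;
    packets_handled++;
}

/* Feed one received byte into the parser. */
void parse_byte(uint8_t byte)
{
    switch (current_state) {
    case STATE_WAIT_HEADER_1:
        if (byte == 0xAA) {
            current_state = STATE_WAIT_HEADER_2;
        }
        break;
    case STATE_WAIT_HEADER_2:
        /* Anything other than 0x55 resets the parser. */
        current_state = (byte == 0x55) ? STATE_WAIT_CMD : STATE_WAIT_HEADER_1;
        break;
    case STATE_WAIT_CMD:
        packet_cmd = byte;
        current_state = STATE_WAIT_LEN;
        break;
    case STATE_WAIT_LEN:
        packet_len = byte;
        data_index = 0;
        /* An empty payload skips straight to the checksum. */
        current_state = (packet_len == 0) ? STATE_WAIT_CHECKSUM : STATE_WAIT_DATA;
        break;
    case STATE_WAIT_DATA:
        packet_buffer[data_index++] = byte;
        if (data_index == packet_len) {
            current_state = STATE_WAIT_CHECKSUM;
        }
        break;
    case STATE_WAIT_CHECKSUM: {
        uint8_t checksum = (uint8_t)(packet_cmd + packet_len);
        for (uint8_t i = 0; i < packet_len; i++) {
            checksum = (uint8_t)(checksum + packet_buffer[i]);
        }
        if (checksum == byte) {
            handle_packet(packet_cmd, packet_buffer, packet_len);
        }
        /* Pass or fail, this packet is finished. */
        current_state = STATE_WAIT_HEADER_1;
        break;
    }
    default:
        current_state = STATE_WAIT_HEADER_1;
        break;
    }
}
```

Feeding the bytes `AA 55 01 01 05 07` one at a time ends with a single call to `handle_packet`, and the parser returns to `STATE_WAIT_HEADER_1` whether or not the checksum matched.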
3. Data Processing
When the state machine has received and verified a complete data packet (in the STATE_WAIT_CHECKSUM state), it calls handle_packet() to execute the actual business logic.
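For the LED protocol defined earlier, the business logic could be sketched like this. Everything except the command values and frame bytes is hypothetical: `led_apply` and `bt_send` are placeholders for board-specific GPIO and UART code, and the `handle_packet` signature is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

#define CMD_SET_LED   0x01u
#define CMD_QUERY_LED 0x02u
#define CMD_RESP_LED  0x82u
#define LED_MASK      0x3Fu /* only bits 0..5 map to LEDs */

static uint8_t led_state;   /* last LED pattern applied */
static uint8_t tx_frame[8]; /* last frame "sent" (placeholder) */
static size_t  tx_len;

/* Placeholder for the real UART transmit routine. */
static void bt_send(const uint8_t *frame, size_t n)
{
    if (n > sizeof tx_frame) {
        n = sizeof tx_frame;
    }
    for (size_t i = 0; i < n; i++) {
        tx_frame[i] = frame[i];
    }
    tx_len = n;
}

/* Placeholder for the real GPIO writes. */
static void led_apply(uint8_t mask)
{
    led_state = (uint8_t)(mask & LED_MASK);
}

/* Called by the parser once a frame has passed its checksum. */
void handle_packet(uint8_t cmd, const uint8_t *data, uint8_t len)
{
    switch (cmd) {
    case CMD_SET_LED:
        if (len == 1) {
            led_apply(data[0]);
        }
        break;
    case CMD_QUERY_LED:
        if (len == 0) {
            /* Reply frame: header, 0x82, length 1, LED state, checksum. */
            uint8_t reply[6] = {0xAA, 0x55, CMD_RESP_LED, 0x01, led_state, 0};
            reply[5] = (uint8_t)(reply[2] + reply[3] + reply[4]);
            bt_send(reply, sizeof reply);
        }
        break;
    default:
        /* Unknown commands are ignored. */
        break;
    }
}
```

The parser never needs to know what a command means, and this function never needs to know how the bytes arrived.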
Improvements
For more complex scenarios with higher security requirements, the following improvements can be made:
Add a Timeout Mechanism
If only half a packet is received (e.g., only the header and command were sent), the state machine will stay stuck in an intermediate state (like `STATE_WAIT_LEN`) forever and cannot reset automatically. Add a timeout timer. Record the current system time (e.g., `HAL_GetTick()`) when entering `STATE_WAIT_HEADER_2`. In the loop of `BT_Task`, check the difference between the current time and the recorded time. If it exceeds a preset threshold, force reset `parser_state` to `STATE_WAIT_HEADER_1`.
Data Length Verification
If a string of excessively long data is sent incorrectly, exceeding the buffer causes an overflow, leading to data loss and program errors. After receiving `packet_len` in the `STATE_WAIT_LEN` state, immediately check `if (packet_len > BT_PROCESS_BUFFER_SIZE)`. If the length is wrong, reset the state machine.
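Both guards can be sketched in a few lines. The helper names (`parser_mark_start`, `parser_check_timeout`, `parser_on_length`), the 100 ms threshold, and the 64-byte buffer size are assumptions for illustration. The time is passed in as a parameter so the sketch runs off-target; on the board the caller would pass something like `HAL_GetTick()`.

```c
#include <stdint.h>

#define BT_PROCESS_BUFFER_SIZE 64u  /* assumed payload buffer size */
#define PACKET_TIMEOUT_MS      100u /* assumed threshold */

typedef enum {
    STATE_WAIT_HEADER_1 = 0,
    STATE_WAIT_HEADER_2,
    STATE_WAIT_CMD,
    STATE_WAIT_LEN,
    STATE_WAIT_DATA,
    STATE_WAIT_CHECKSUM
} parser_state_t;

parser_state_t parser_state = STATE_WAIT_HEADER_1;
uint32_t packet_start_ms; /* tick count when the current packet began */

/* Call when 0xAA is seen and the parser enters STATE_WAIT_HEADER_2. */
void parser_mark_start(uint32_t now_ms)
{
    packet_start_ms = now_ms;
    parser_state = STATE_WAIT_HEADER_2;
}

/* Call periodically, e.g. from the BT_Task loop. */
void parser_check_timeout(uint32_t now_ms)
{
    /* Unsigned subtraction stays correct across tick-counter wraparound. */
    if (parser_state != STATE_WAIT_HEADER_1 &&
        (uint32_t)(now_ms - packet_start_ms) > PACKET_TIMEOUT_MS) {
        parser_state = STATE_WAIT_HEADER_1;
    }
}

/* Length handling for STATE_WAIT_LEN, with the overflow guard. */
void parser_on_length(uint8_t packet_len)
{
    if (packet_len > BT_PROCESS_BUFFER_SIZE) {
        parser_state = STATE_WAIT_HEADER_1; /* would overflow: drop the packet */
    } else if (packet_len == 0) {
        parser_state = STATE_WAIT_CHECKSUM;
    } else {
        parser_state = STATE_WAIT_DATA;
    }
}
```

Rejecting the length before any payload byte is stored means the buffer index can never run past `BT_PROCESS_BUFFER_SIZE`.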
Summary
This project finally let me fill the gap around packing and unpacking data packets, something I had long wanted to learn but never dug into.
It was an eye-opener. Nested "spaghetti" if-else code becomes logically tangled and hard to maintain when dealing with streaming data.
A state machine implementation makes the code easier to understand and extend, and it naturally decouples the parsing logic from the business logic.