Parsing Protocol Standards to Parse Standard Protocols

Stephen McQuistin presented our paper on Parsing Protocol Standards to Parse Standard Protocols at the ACM/IRTF Applied Networking Research Workshop 2020 last week. In this paper we consider the problem of how to parse Internet protocol standards documents to generate a typed representation of protocol data units, and from that to generate parsing and serialisation code to ease implementation of the protocol. The goal is to make it easier to implement, test, and validate network protocols, and to help improve security by removing the need to write protocol parsing code by hand.

ANRW 2020 presentation slides

There are two parts to our work. Our key contribution is to develop the Network Protocol Representation. This is a typed protocol data description format that allows us to model protocol data units, and the process by which they are parsed and serialised, independent of both the surface syntax used in the protocol specification document and the implementation language. We also develop augmented packet diagrams to provide a familiar syntax that is easy to use and machine readable, and that can be used to describe protocol data within IETF standards documents.
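To give a flavour of what a typed description buys you, here is a minimal sketch in Python. It is not the Network Protocol Representation from the paper: the BitField type, the TOY_HEADER layout, and the parse and serialise helpers are all invented for this illustration. The point is only that one declarative description can drive both directions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BitField:
    """A fixed-width unsigned field (hypothetical type, for illustration)."""
    name: str
    width: int  # width in bits


# Hypothetical 4-byte header, invented for this example.
TOY_HEADER = (
    BitField("version", 4),
    BitField("flags", 4),
    BitField("length", 8),
    BitField("stream_id", 16),
)


def _total_bits(fields) -> int:
    total = sum(f.width for f in fields)
    if total % 8 != 0:
        raise ValueError("description is not a whole number of bytes")
    return total


def parse(fields, data: bytes) -> dict:
    """Read the described fields, in network byte order, from the start of data."""
    total = _total_bits(fields)
    if len(data) < total // 8:
        raise ValueError("truncated input")
    value = int.from_bytes(data[: total // 8], "big")
    result = {}
    remaining = total
    for f in fields:
        remaining -= f.width
        result[f.name] = (value >> remaining) & ((1 << f.width) - 1)
    return result


def serialise(fields, values: dict) -> bytes:
    """Pack the named values back into bytes using the same description."""
    total = _total_bits(fields)
    value = 0
    for f in fields:
        v = values[f.name]
        if not 0 <= v < (1 << f.width):
            raise ValueError(f"{f.name} does not fit in {f.width} bits")
        value = (value << f.width) | v
    return value.to_bytes(total // 8, "big")
```

Because the parser and serialiser are both derived from TOY_HEADER, they cannot drift apart: serialise(TOY_HEADER, parse(TOY_HEADER, raw)) returns the original four bytes.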

The format of protocol data can be surprisingly complex. Many protocols use data formats where the presence and format of later fields depends, in complex ways, on earlier data. Other formats require knowledge of information sent out-of-band to parse incoming packets, perhaps based on data sent in a previous packet in the flow, or distributed via a separate signalling protocol. Still other formats require transformations of the data as part of the parsing or serialisation process. For example, a transformation function might be called to decrypt part of a frame, to decompress or reference prior content, or to reformat or decode a field with an unusual layout.
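As a small illustration of the first kind of complexity, the sketch below parses a made-up packet format in which a flag bit in the first byte decides whether a later field is present at all. The format, the HAS_EXTENSION flag, and the parse_packet function are hypothetical and chosen only to show the dependency; real protocols are usually considerably messier.

```python
import struct

# Hypothetical flag bit: when set, a 2-byte extension field follows the flags byte.
HAS_EXTENSION = 0x01


def parse_packet(data: bytes) -> dict:
    """Parse: flags (1 byte), optional extension (2 bytes), length (1 byte), payload."""
    if len(data) < 1:
        raise ValueError("missing flags byte")
    flags = data[0]
    offset = 1

    # Whether this field exists depends on data parsed earlier.
    extension = None
    if flags & HAS_EXTENSION:
        if len(data) < offset + 2:
            raise ValueError("truncated extension field")
        (extension,) = struct.unpack_from("!H", data, offset)
        offset += 2

    if len(data) < offset + 1:
        raise ValueError("missing length byte")
    length = data[offset]
    offset += 1

    # The size of the payload also depends on an earlier field.
    payload = data[offset : offset + length]
    if len(payload) != length:
        raise ValueError("truncated payload")

    return {"flags": flags, "extension": extension, "payload": payload}
```

Even in this toy case the offset of the length byte is not fixed, so a flat table of field offsets cannot describe the format. The description has to express the condition itself.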

The QUIC transport protocol provides a good example of this, requiring transforms to decrypt data based on a key negotiated during the initial handshake, and to parse and serialise its variable length integer encoding. Other examples include message compression in DNS responses, and decoding the "unfortunate" format of the STUN Message Type Field. The Network Protocol Representation we define is, to the best of our knowledge, the first protocol data description format that can represent all these aspects of modern protocol data formats in a coherent, type-safe manner that allows for generation of parsing and serialisation code, and validation of protocol data formats.
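The variable length integer transform is small enough to show in full. The sketch below follows the encoding described in RFC 9000, Section 16, where the two most significant bits of the first byte give the total length (1, 2, 4, or 8 bytes) and the remaining bits hold the value in network byte order. The function names decode_varint and encode_varint are our own for this example, and the code is hand-written rather than output from our tooling.

```python
from typing import Tuple


def decode_varint(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode one QUIC variable length integer; return (value, bytes consumed)."""
    if offset >= len(data):
        raise ValueError("no data to decode")
    first = data[offset]
    length = 1 << (first >> 6)  # top two bits: 00, 01, 10, 11 -> 1, 2, 4, 8 bytes
    if len(data) < offset + length:
        raise ValueError("truncated variable length integer")
    value = first & 0x3F  # drop the two length bits
    for byte in data[offset + 1 : offset + length]:
        value = (value << 8) | byte
    return value, length


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer using the shortest available form."""
    if value < 0:
        raise ValueError("value must be non-negative")
    if value < 1 << 6:
        return bytes([value])
    if value < 1 << 14:
        return (value | 0x4000).to_bytes(2, "big")
    if value < 1 << 30:
        return (value | 0x80000000).to_bytes(4, "big")
    if value < 1 << 62:
        return (value | 0xC000000000000000).to_bytes(8, "big")
    raise ValueError("value does not fit in 62 bits")
```

Note that decoding is not simply the inverse of encoding: the two-byte sequence 0x40 0x25 and the single byte 0x25 both decode to 37, so a description of the format has to capture the transform in each direction.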

While the Network Protocol Representation provides a powerful tool for describing protocol data and implementing a parser generator, the protocol specification document needs some surface syntax that can be used to describe protocol data. Protocol specifications use a range of different formats for this, including ABNF, ASN.1, YANG, the TLS presentation language, etc., and these are defined with varying degrees of formalism and ease of use. Perhaps unsurprisingly, engineers developing protocol standards have tended to favour more informal, easy-to-use techniques for describing protocol data over the more rigorous, and correspondingly more complex, formal approaches with machine readable syntax.

The second contribution of our paper is to sketch a variant on the informal packet header diagrams that are commonly used to describe protocol data, and to demonstrate that only trivial changes to the format result in a human- and machine-readable format that is familiar, easy to write, and can be parsed into the Network Protocol Representation. A companion draft describes this augmented packet diagram format in more detail.

Together, these contributions allow us to process Internet standards documents, extract protocol data descriptions, and transform them into a rich internal representation. This can be used to sanity check the format, to help ensure correctness of the standards, and to detect possible errors in the specification. Moreover, we can use it to generate implementations of parsing and serialisation code for the specified protocol format directly from the specification. Parsing code is a frequent source of bugs and security vulnerabilities, and we hope that this approach will, over time, help to improve the robustness of protocol implementations.

The slides are available, or you can watch a video of the presentation. A recording of the full ACM/IRTF Applied Networking Research Workshop session in which our paper was presented is also available on the IETF YouTube channel.

Read more: Parsing Protocol Standards to Parse Standard Protocols.