A while ago, I tried to figure out how to write a Python plugin for protoc, Google's official compiler for protobuf (aka Protocol Buffers). Protocol Buffers is "Google's data interchange format" and allows you to serialize structured data. My overall aim was to parse a set of Protocol Buffers definition (.proto) files and store the parsing results in JSON format (see protoc-gen-bq-schema for a real-life application of this idea).
The official protobuf library supports automatic code generation for a limited number of use cases and languages. Luckily, protoc can easily be extended to support custom parsing via plugins!
From Google's docs:
A plugin is just a program which reads a CodeGeneratorRequest protocol buffer from standard input and then writes a CodeGeneratorResponse protocol buffer to standard output. These message types are defined in plugin.proto. We recommend that all third-party code generators be written as plugins, as this allows all generators to provide a consistent interface and share a single parser implementation.
I did some research but couldn't find much information. Most plugins out there seem to be written in Go, and there is no official documentation or tutorial for a Python plugin. Hence, I want to provide a beginner-friendly introduction to working with protoc and to writing a simple, custom plugin in Python. The plugin we'll write in this example parses Protocol Buffers files and writes high-level information about them, such as import statements or options, into JSON files.
Note: All setup instructions and the complete example code can be found on GitHub.
You'll need Python 3 and protoc installed on your machine (I'm using a Mac).
brew install protobuf
Validate your installation with:
protoc --version
The output should be libprotoc 3.14.0, or similar.
Furthermore, make sure to pip install protobuf==3.14.0 (versions should match) in your Python environment.
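The version check above can also be scripted. The following is a minimal sketch, assuming that protoc --version prints a line of the form libprotoc X.Y.Z as shown above. The helper names parse_protoc_version and versions_match are hypothetical, and the strict equality check is an assumption that fits the 3.14.0 setup used in this post.

```python
import re
from typing import Optional


def parse_protoc_version(output: str) -> Optional[str]:
    """Extract "X.Y.Z" from a line such as "libprotoc 3.14.0"."""
    match = re.search(r"libprotoc\s+(\d+(?:\.\d+)*)", output)
    return match.group(1) if match else None


def versions_match(protoc_output: str, python_package_version: str) -> bool:
    """Return True if the compiler and the Python package report the same version."""
    # Assumption: both sides use the same version string, as with 3.14.0 here.
    return parse_protoc_version(protoc_output) == python_package_version
```

In a real check, the first argument could come from subprocess.run(["protoc", "--version"], capture_output=True, text=True).stdout and the second from importlib.metadata.version("protobuf").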
First, create a file plugin.py. The most basic thing a plugin can do is read a CodeGeneratorRequest from stdin and write an empty CodeGeneratorResponse to stdout.
This can be accomplished as follows:
#!/usr/bin/env python

import sys

from google.protobuf.compiler import plugin_pb2 as plugin


def process(
    request: plugin.CodeGeneratorRequest, response: plugin.CodeGeneratorResponse
) -> None:
    pass


def main() -> None:
    # Load the request from stdin
    request = plugin.CodeGeneratorRequest.FromString(sys.stdin.buffer.read())

    # Create a response
    response = plugin.CodeGeneratorResponse()

    process(request, response)

    # Serialize response and write to stdout
    sys.stdout.buffer.write(response.SerializeToString())


if __name__ == "__main__":
    main()
Now make the script executable with chmod +x plugin.py. Finally, we can try it out!
Create a test Protocol Buffer file example.proto and invoke the compiler by running:
protoc example.proto --plugin=protoc-gen-custom-plugin=./plugin.py --custom-plugin_out=.
Confusing? Custom plugin names always start with protoc-gen-. Note that the term custom-plugin is both the last portion of the plugin name and the first part of the out argument (the path where output files are written). This term could be anything, as long as you follow the naming convention. Also note that in this example I specified the path to our plugin with =./plugin.py; alternatively, you can make sure your plugin is a program called protoc-gen-custom-plugin (for example) that is available on your PATH.
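The naming convention described above can be captured in a few lines of Python. This is only an illustrative sketch: build_protoc_command is a hypothetical helper (not part of protobuf), and it simply assembles the same flags used in the command above.

```python
import shlex
from typing import List


def build_protoc_command(
    plugin_name: str, plugin_path: str, proto_file: str, out_dir: str = "."
) -> List[str]:
    """Build the protoc argument list for a plugin called protoc-gen-<plugin_name>."""
    # The executable name must start with "protoc-gen-".
    executable_name = f"protoc-gen-{plugin_name}"
    return [
        "protoc",
        proto_file,
        # Tell protoc where the plugin executable lives.
        f"--plugin={executable_name}={plugin_path}",
        # The same plugin name prefixes the "_out" flag.
        f"--{plugin_name}_out={out_dir}",
    ]


command = build_protoc_command("custom-plugin", "./plugin.py", "example.proto")
print(shlex.join(command))
```

The resulting list can be passed to subprocess.run(command, check=True) to invoke the compiler from a script. If the plugin is already on your PATH under its protoc-gen- name, the --plugin entry can be dropped.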
Executing the full command will not result in any output, since we are not writing anything into the response yet. In part 2, we'll look into generating some useful output and magically writing it into JSON files.
Stay tuned and Merry Christmas! 🎄
I hope you found this helpful — for any feedback, comments or questions, please reach out.
~ manzan
P.S.: Part 2 is now online!