lyft/protoc-gen-star
GitHub: lyft/protoc-gen-star
Stars: 670 | Forks: 76
## Features
### Documentation
### Roadmap
- [x] Interface-based and fully-linked dependency graph with access to raw descriptors
- [x] Built-in context-aware debugging capabilities
- [x] Exhaustive, near 100% unit test coverage
- [x] End-to-end testable via overrideable IO & Interface based API
- [x] [`Visitor`][visitor] pattern and helpers for efficiently walking the dependency graph
- [x] [`BuildContext`][context] to facilitate complex generation
- [x] Parsed, typed command-line [`Parameters`][params] access
- [x] Extensible `ModuleBase` for quickly creating `Modules` and facilitating code generation
- [x] Configurable post-processing (eg, gofmt) of generated files
- [x] Support processing proto files from multiple packages
- [x] Load comments (via SourceCodeInfo) from proto files into gathered AST for easy access
- [x] Language-specific helper subpackages for handling common, nuanced generation tasks
- [ ] Load plugins/modules at runtime using Go shared libraries
### Examples
[`protoc-gen-example`][pge], can be found in the `testdata` directory. It includes two `Module` implementations using a variety of the features available. It's `protoc` execution is included in the `testdata/generated` [Makefile][make] target. Examples are also accessible via the documentation by running `make docs`.
## How It Works
### The `protoc` Flow
Because the process is somewhat confusing, this section will cover the entire flow of how proto files are converted to generated code, using a hypothetical PG* plugin: `protoc-gen-myplugin`. A typical execution looks like this:
protoc \
-I . \
--myplugin_out="foo=bar:../generated" \
./pkg/*.proto
`protoc`, the PB compiler, is configured using a set of flags (documented under `protoc -h`) and handed a set of files as arguments. In this case, the `I` flag can be specified multiple times and is the lookup path it uses for imported dependencies in a proto file. By default, the official descriptor protos are already included.
`myplugin_out` tells `protoc` to use the `protoc-gen-myplugin` protoc-plugin. These plugins are automatically resolved from the system's `PATH` environment variable, or can be explicitly specified with another flag. The official protoc-plugins (eg, `protoc-gen-python`) are already registered with `protoc`. The flag's value is specific to the particular plugin, with the exception of the `:../generated` suffix. This suffix indicates the root directory in which `protoc` will place the generated files from that package (relative to the current working directory). This generated output directory is _not_ propagated to `protoc-gen-myplugin`, however, so it needs to be duplicated in the left-hand side of the flag. PG* supports this via an `output_path` parameter.
`protoc-gen-myplugin` starts up, receiving the request payload, which it unmarshals. There are two phases to a PG*-based protoc-plugin. First, PG* unmarshals the `CodeGeneratorRequest` received from `protoc`, and creates a fully connected abstract syntax tree (AST) of each file and all its contained entities. Any parameters specified for this plugin are also parsed for later consumption.
When this step is complete, PG* then executes any registered `Modules`, handing it the constructed AST. `Modules` can be written to generate artifacts (eg, files) or just performing some form of validation over the provided graph without any other side effects. `Modules` provide the great flexibility in terms of operating against the PBs.
Once all `Modules` are run, PG* writes any custom artifacts to the file system or serializes generator-specific ones into a `CodeGeneratorResponse` and sends the data to its `stdout`. `protoc` receives this payload, unmarshals it, and persists any requested files to disk after all its plugins have returned. This whole flow looks something like this:
foo.proto → protoc → CodeGeneratorRequest → protoc-gen-myplugin → CodeGeneratorResponse → protoc → foo.pb.go
The PG* library hides away nearly all of this complexity required to implement a protoc-plugin!
### Modules
PG* `Modules` are handed a complete AST for those files that are targeted for generation as well as all dependencies. A `Module` can then add files to the protoc `CodeGeneratorResponse` or write files directly to disk as `Artifacts`.
PG* provides a `ModuleBase` struct to simplify developing modules. Out of the box, it satisfies the interface for a `Module`, only requiring the creation of `Name` and `Execute` methods. `ModuleBase` is best used as an anonyomous embedded field of a wrapping `Module` implementation. A minimal module would look like the following:
// ReportModule creates a report of all the target messages generated by the
// protoc run, writing the file into the /tmp directory.
type reportModule struct {
*pgs.ModuleBase
}
// New configures the module with an instance of ModuleBase
func New() pgs.Module { return &reportModule{&pgs.ModuleBase{}} }
// Name is the identifier used to identify the module. This value is
// automatically attached to the BuildContext associated with the ModuleBase.
func (m *reportModule) Name() string { return "reporter" }
// Execute is passed the target files as well as its dependencies in the pkgs
// map. The implementation should return a slice of Artifacts that represent
// the files to be generated. In this case, "/tmp/report.txt" will be created
// outside of the normal protoc flow.
func (m *reportModule) Execute(targets map[string]pgs.File, pkgs map[string]pgs.Package) []pgs.Artifact {
buf := &bytes.Buffer{}
for _, f := range targets {
m.Push(f.Name().String()).Debug("reporting")
fmt.Fprintf(buf, "--- %v ---", f.Name())
for i, msg := range f.AllMessages() {
fmt.Fprintf(buf, "%03d. %v\n", i, msg.Name())
}
m.Pop()
}
m.OverwriteCustomFile(
"/tmp/report.txt",
buf.String(),
0644,
)
return m.Artifacts()
}
`ModuleBase` exposes a PG* [`BuildContext`][context] instance, already prefixed with the module's name. Calling `Push` and `Pop` allows adding further information to error and debugging messages. Above, each file from the target package is pushed onto the context before logging the "reporting" debug message.
The base also provides helper methods for adding or overwriting both protoc-generated and custom files. The above execute method creates a custom file at `/tmp/report.txt` specifying that it should overwrite an existing file with that name. If it instead called `AddCustomFile` and the file existed, no file would have been generated (though a debug message would be logged out). Similar methods exist for adding generator files, appends, and injections. Likewise, methods such as `AddCustomTemplateFile` allows for `Templates` to be rendered instead.
After all modules have been executed, the returned `Artifacts` are either placed into the `CodeGenerationResponse` payload for protoc or written out to the file system. For testing purposes, the file system has been abstracted such that a custom one (such as an in-memory FS) can be provided to the PG* generator with the `FileSystem` `InitOption`.
#### Post Processing
`Artifacts` generated by `Modules` sometimes require some mutations prior to writing to disk or sending in the response to protoc. This could range from running `gofmt` against Go source or adding copyright headers to all generated source files. To simplify this task in PG*, a `PostProcessor` can be utilized. A minimal looking `PostProcessor` implementation might look like this:
// New returns a PostProcessor that adds a copyright comment to the top
// of all generated files.
func New(owner string) pgs.PostProcessor { return copyrightPostProcessor{owner} }
type copyrightPostProcessor struct {
owner string
}
// Match returns true only for Custom and Generated files (including templates).
func (cpp copyrightPostProcessor) Match(a pgs.Artifact) bool {
switch a := a.(type) {
case pgs.GeneratorFile, pgs.GeneratorTemplateFile,
pgs.CustomFile, pgs.CustomTemplateFile:
return true
default:
return false
}
}
// Process attaches the copyright header to the top of the input bytes
func (cpp copyrightPostProcessor) Process(in []byte) (out []byte, err error) {
cmt := fmt.Sprintf("// Copyright © %d %s. All rights reserved\n",
time.Now().Year(),
cpp.owner)
return append([]byte(cmt), in...), nil
}
The `copyrightPostProcessor` struct satisfies the `PostProcessor` interface by implementing the `Match` and `Process` methods. After PG* recieves all `Artifacts`, each is handed in turn to each registered processor's `Match` method. In the above case, we return `true` if the file is a part of the targeted Artifact types. If `true` is returned, `Process` is immediately called with the rendered contents of the file. This method mutates the input, returning the modified value to out or an error if something goes wrong. Above, the notice is prepended to the input.
PostProcessors are registered with PG* similar to `Modules`:
g := pgs.Init(pgs.IncludeGo())
g.RegisterModule(some.NewModule())
g.RegisterPostProcessor(copyright.New("PG* Authors"))
## Protocol Buffer AST
While `protoc` ensures that all the dependencies required to generate a proto file are loaded in as descriptors, it's up to the protoc-plugins to recognize the relationships between them. To get around this, PG* uses constructs an abstract syntax tree (AST) of all the `Entities` loaded into the plugin. This AST is provided to every `Module` to facilitate code generation.
### Hierarchy
The hierarchy generated by the PG* `gatherer` is fully linked, starting at a top-level `Package` down to each individual `Field` of a `Message`. The AST can be represented with the following digraph:

标签:EVTX分析