[PoC] Proposal of RBS generation method using Rack architecture

RBS generation problem

RBS is an easy language to generate code for: its syntax is simple and its files do not depend on each other. In addition, a large number of type definitions are still needed, so code generation for RBS is highly important.

Various approaches to generating RBS have been tried, including generation from JSON files and static or dynamic analysis of Ruby code. Each approach has its own trade-offs, and the following problems remain unsolved:

Large number of undefined gems
Determination of generics
Dynamic method generation by eval-type methods
Dynamic module include/extend/prepend using Module.new or included
Various extension requests

Proposed Methodology

I propose combining small classes through a plugin-like mechanism modeled on Rack middleware. This mechanism is referred to below as RBSG (RBS Generator).

Like a Rack application, you write a small amount of code and run it.

Generator example:

loader = -> (_env) {
  # code loading
  require 'foo'
  class Bar
    include Foo
  end
}
RBSG::Builder.new do
  use RBSG::Logger
  use RBSG::CreateFileByName, # output
    base_dir: "sig/out",
    header: "# !!! GENERATED CODE !!!"
  use RBSG::ObjectSpaceDiff # class definition
  use RBSG::IncludeExtendPrepend # imported modules
  use RBSG::Result
  run loader
end.call({})

outputs:

sig/out/foo.rbs

# !!! GENERATED CODE !!!

module Foo
end

sig/out/bar.rbs

# !!! GENERATED CODE !!!

class Bar
  include Foo
end

Middleware example:

module RBSG
  class ObjectSpaceDiff
    def initialize(loader, if: nil)
      @loader = loader
      @if = binding.local_variable_get(:if)
    end

    def call(env)
      modules_before = ObjectSpace.each_object(Module).to_a

      result = @loader.call(env)

      modules_after = ObjectSpace.each_object(Module).to_a
      (modules_after - modules_before).each do |mod|
        next unless @if.nil? || @if.call(mod)
        result[mod.to_s] # touching the key creates an entry via the Hash default value
      end

      result
    end
  end
end

Pros

Middleware can be stacked per function and easily added or removed.
Each middleware is a small, simple class, so it can be reused in a variety of situations.
Because code loading is an independent phase, middleware can run code both before and after loading. It is also easy to wrap the loading in a block.
The Rack architecture is familiar to most Rubyists, so the learning cost should be low.
Unnecessary output can be filtered out by middleware.
File output can also be written as middleware, so any output format can be supported.

Cons

The developer must write the loader and the middleware stack, much like writing an application.
Middleware that uses TracePoint will not work as intended if the code is loaded at the wrong time, for example before the trace is enabled.
High extensibility and flexibility come at the cost of being harder to understand.
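The TracePoint timing issue above can be illustrated with a minimal sketch. `TraceDefinitions`, `LoadedTooEarly`, `LoadedInTime`, and the `inner` lambda (a stand-in for `RBSG::Result`) are hypothetical names invented for this example; they are not part of RBSG.

```ruby
# Hypothetical middleware: records every class/module opened with the
# `class` or `module` keyword while the loader runs.
class TraceDefinitions
  def initialize(loader)
    @loader = loader
  end

  def call(env)
    seen = []
    trace = TracePoint.new(:class) { |tp| seen << tp.self.name }
    # The trace is active only inside this block.
    result = trace.enable { @loader.call(env) }
    seen.compact.each { |name| result[name] } # touch the key to register it
    result
  end
end

class LoadedTooEarly; end # defined before the trace is enabled

loader = ->(_env) { class LoadedInTime; end }
# Stand-in for RBSG::Result: runs the loader and returns an empty result Hash.
inner = ->(env) { loader.call(env); Hash.new { |hash, name| hash[name] = [] } }

result = TraceDefinitions.new(inner).call({})
puts result.keys.inspect
```

`LoadedTooEarly` was defined before the trace was enabled, so this middleware never sees it. Anything that must be traced has to be loaded inside the loader.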

Figure

sequenceDiagram
Middleware1 ->> Middleware2: .call
Middleware2 ->> Result: .call
Result ->>+ loader: .call
loader ->>- Result: no result
Result ->> Middleware2: result
Middleware2 ->> Middleware1: result
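The call flow in the figure can be reproduced with a small builder modeled on `Rack::Builder`. This is only a sketch under assumptions: `MiniBuilder`, `MiniResult`, and `Trace` are hypothetical names, and the actual `RBSG::Builder` may be implemented differently.

```ruby
# Hypothetical stand-in for RBSG::Builder, modeled on Rack::Builder.
class MiniBuilder
  def initialize(&block)
    @stack = [] # [middleware class, args, kwargs] in declaration order
    @loader = nil
    instance_eval(&block) if block
  end

  def use(middleware, *args, **kwargs)
    @stack << [middleware, args, kwargs]
  end

  def run(loader)
    @loader = loader
  end

  # Wrap the loader from the inside out so the first `use` ends up outermost.
  def call(env)
    app = @stack.reverse.inject(@loader) do |inner, (klass, args, kwargs)|
      klass.new(inner, *args, **kwargs)
    end
    app.call(env)
  end
end

# Innermost middleware: runs the loader, ignores its return value, and
# creates the result Hash that travels back up the stack.
class MiniResult
  def initialize(loader)
    @loader = loader
  end

  def call(env)
    @loader.call(env)
    Hash.new { |hash, name| hash[name] = [] }
  end
end

# Records when control enters and leaves a middleware.
class Trace
  def initialize(loader, label:, log:)
    @loader = loader
    @label = label
    @log = log
  end

  def call(env)
    @log << "#{@label} in"
    result = @loader.call(env)
    @log << "#{@label} out"
    result
  end
end

log = []
app = MiniBuilder.new do
  use Trace, label: "Middleware1", log: log
  use Trace, label: "Middleware2", log: log
  use MiniResult
  run ->(_env) { log << "loader" }
end
result = app.call({})
puts log
```

The log shows control passing down through Middleware1 and Middleware2 to the loader, then back up in reverse order, which is the same nesting as in the sequence diagram.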

Middleware examples

Class/module definition using ObjectSpace
Static and dynamic addition of include/extend/prepend modules
Output data filtering
Debugging display of output
File output per class/module
Constant definition and type guessing
Logger configuration
Rails extensions to support class_attribute and mattr_accessor
Automatic support for method delegation

Use case

Because the mechanism is simple yet powerful, the proposed method should be applicable to a variety of use cases.

gem_rbs_collection

When generating definitions for ActiveRecord

loader = -> (_env) {
  # code loading
  require 'active_record'
  ActiveRecord.eager_load!
}
RBSG::Builder.new do
  use RBSG::Logger
  use RBSG::CreateFileByName, # output
    base_dir: "sig/out",
    header: "# !!! GENERATED CODE !!!"
  use RBSG::Clean, if: -> (name, bodies) {
    if RBSG.rbs_defined?(name, library: "stdlib")
      bodies.empty? # skip empty definition
    else
      !(name.start_with?("ActiveRecord")) # skip out of scope
    end
  }
  use RBSG::ObjectSpaceDiff # class definition
  use RBSG::IncludeExtendPrepend # imported modules
  use RBSG::Rails::ClassAttribute # extension for Rails
  use RBSG::Result
  run loader
end.call({})

Rails-specific methods can also be supported by writing dedicated extension middleware and adding it to the stack. Furthermore, by switching the branch of the code being loaded, definitions can easily be generated for each version.

Rails Application

env = {}
loader = -> (_env) {
  Rails.application.eager_load!
}
RBSG::Builder.new do
  use RBSG::Logger
  use RBSG::CreateFileByName,
    base_dir: Rails.root.join("sig/out"),
    header: "# !!! GENERATED CODE !!!"
  use RBSG::Clean, if: -> (name, _bodies) {
    RBSG.rbs_defined?(name, collection: true) # skip existing definitions
  }
  use RBSG::ObjectSpaceDiff
  use RBSG::IncludeExtendPrepend
  use RBSG::Rails::ClassAttribute
  use CustomGenerator::Rolify # user customized
  use RBSG::Result
  run loader
end.call(env)

As in the gem_rbs_collection example, the same middleware can be reused for application code by changing only the loader. In addition, users can write and try out their own extensions, and it is easy to extract them into gems afterwards.

Common specifications

Loader

The loader only needs to load the code; it does not need to care about its return value.

Middleware

Write it as a class that has a #call method, like Rack middleware.

Example of simple middleware

class SampleMiddleware
  def initialize(loader)
    @loader = loader
  end
  
  def call(env)
    @loader.call(env)
  end
end

The interface is limited, but what a middleware does is not: it can generate code, filter output, change the output format, display debug information, read documentation, configure the Logger, and so on.
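As one illustration of "filter output", here is a minimal sketch of a filtering middleware. `RejectIf` is a hypothetical name, and the `if:` convention (return true to drop an entry) is an assumption based on how `RBSG::Clean` is used in the examples above.

```ruby
# Hypothetical filter middleware: drops every entry for which `if:` returns true.
class RejectIf
  def initialize(loader, if: nil)
    @loader = loader
    # `if` is a reserved word, so the keyword argument is read via the binding.
    @if = binding.local_variable_get(:if)
  end

  def call(env)
    result = @loader.call(env)
    return result if @if.nil?

    result.delete_if { |name, bodies| @if.call(name, bodies) }
  end
end

# Stand-in for the rest of the stack: returns a fixed result Hash.
sample = { "String" => [], "Foo" => ["def bar: () -> void"], "Baz" => [] }
inner = ->(_env) { sample.dup }

app = RejectIf.new(inner, if: ->(_name, bodies) { bodies.empty? })
filtered = app.call({})
puts filtered.inspect
```

Only `Foo` survives, because the other two entries have empty bodies.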

Result

The return value is basically a Hash whose keys are class/module names and whose values are the contents (bodies) of each class/module. The RBS is built up by adding output code to this result. Because the result can hold multiple classes/modules, it can be used for libraries that extend core classes, such as active_support, and of course for Rails applications.
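A minimal sketch of such a result, assuming each body is an array of RBS lines and that the Hash creates empty entries on first access. The `render` helper is hypothetical and only shows how an entry could be turned into RBS text.

```ruby
module Foo; end

class Bar
  include Foo
end

# Assumed shape of the result: class/module name => array of RBS body lines.
result = Hash.new { |hash, name| hash[name] = [] }

result["Foo"]                  # touching a key registers an empty definition
result["Bar"] << "include Foo" # middleware appends body lines

# Hypothetical helper: render one entry as RBS source.
def render(name, bodies)
  keyword = Object.const_get(name).is_a?(Class) ? "class" : "module"
  lines = ["#{keyword} #{name}"]
  bodies.each { |body| lines << "  #{body}" }
  lines << "end"
  lines.join("\n") + "\n"
end

result.each { |name, bodies| puts render(name, bodies) }
```

Rendering the two entries produces the same `module Foo` and `class Bar` definitions as the generator example at the top, without the header.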

Output

Output is produced from the result. Output can also be middleware, so it can handle a variety of output requirements, for example "write one file per class/module name" or "write everything to standard output".
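The "one file per class/module name" case could be sketched as follows. `WriteFilePerName` is a hypothetical name loosely modeled on `RBSG::CreateFileByName`, and for brevity it assumes the result maps each name to RBS source text that has already been rendered.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical output middleware: writes one .rbs file per class/module name.
class WriteFilePerName
  def initialize(loader, base_dir:, header: nil)
    @loader = loader
    @base_dir = base_dir
    @header = header
  end

  def call(env)
    result = @loader.call(env)
    result.each do |name, source|
      path = File.join(@base_dir, "#{underscore(name)}.rbs")
      FileUtils.mkdir_p(File.dirname(path))
      File.write(path, [@header, source].compact.join("\n\n"))
    end
    result
  end

  private

  # "ActiveRecord::Base" -> "active_record/base"
  def underscore(name)
    name.gsub("::", "/").gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
  end
end

written = nil
Dir.mktmpdir do |dir|
  # Stand-in for the rest of the stack: returns already rendered RBS text.
  inner = ->(_env) { { "Foo" => "module Foo\nend\n" } }
  app = WriteFilePerName.new(inner, base_dir: dir, header: "# !!! GENERATED CODE !!!")
  app.call({})
  written = File.read(File.join(dir, "foo.rbs"))
end
puts written
```

Because the writer returns the result unchanged, it can sit anywhere above the middleware that produce the content, and a writer for standard output could replace it without touching the rest of the stack.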

I'm reading the orthoses code, and the middleware approach looks like it works well 👍

I imagine the following roadmap for the next steps.

Make a prototype
- It's done as orthoses and this gist
Move orthoses code to ruby/rbs to provide RBS generator library by rbs gem
- rbs gem should have the core structure and basic middlewares.
- Advanced middlewares, such as for Rails, should be outside of ruby/rbs. I have not yet considered where the best place to put them is.
Replace all rbs prototypes with orthoses
- rbs prototype's implementations should be just a set of middleware.
- Users can use rbs prototype as a shorthand for the generator.
- If they want to customize rbs prototype, they can copy rbs prototype's implementation and inject middlewares.
Replace RBS Rails with the new generator

What do you think? If anything in the roadmap doesn't match your plans, please tell me 🙏

By the way, I have not read the whole implementation of orthoses yet; I've only read the core architecture and concluded that it can replace rbs prototype.
The middleware approach is good for a customizable generator, but I am not yet sure whether the concrete implementation is the best way. So I'd like to review the implementation in some way. I can review it in a PR on ruby/rbs or in a video meeting. Which way (or something else) would you like?

ksss/rbsg.md