Abstract flow control model

Currently, the LES protocol uses a token bucket for throttling as its flow control mechanism. Basically, the mechanism works as follows:

  • Each LES client is assigned a capacity, which represents the client's allowance to use the LES server's resources

    • A free client is assigned a minimal capacity
    • A paid client can be assigned a higher capacity
  • Each request is metered; the cost mostly depends on the serving time

  • The token bucket is counted in units, not in requests, e.g. 100,000 units

  • The token bucket is recharged in units, not in requests, e.g. 1 unit per nanosecond

    For each request, the corresponding number of units is deducted, consistent with the serving time, e.g. 100,000 units for 0.1 ms of request serving.

    In fact, this is not entirely accurate: in addition to the request serving time, LES also takes the impact of network bandwidth into account.

  • The token bucket is recharged at a speed proportional to the capacity, so the higher the assigned capacity, the more (or heavier) requests the client can send

  • Mirror token bucket

    Not only does the server maintain token buckets for all connected clients, the client itself also maintains a mirror token bucket for the connection.

    In the handshake, the server exchanges some flow control parameters with the LES client, listed below:

    • Maximal request cost table: MRC
    • Minimal recharge speed: MRR
    • Token bucket burst (BufferLimit)

    With this information, the client can use its locally maintained token buckets for request distribution. Therefore, in theory, requests can be distributed across the different connected servers, and each request is processed with the highest priority and the best possible latency.

    Notably, these parameters are a promise from the server to the client: based on the MRC and MRR, the client can always send requests, no matter whether the server is overloaded or not.
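
    The following is a minimal sketch of this bucket accounting, written to illustrate the model rather than the actual les/flowcontrol code; the type and method names (ServerParams, ClientNode, Accept, RequestProcessed) and the refund of the unused portion of the maximal cost are illustrative assumptions.

        package flowsketch

        import "time"

        // ServerParams are the flow control parameters announced in the handshake.
        type ServerParams struct {
        	BufferLimit uint64 // token bucket burst (BL)
        	MinRecharge uint64 // minimal recharge speed in units per nanosecond (MRR)
        }

        // ClientNode tracks the token bucket for one connected client; the client
        // keeps an identical "mirror" of this structure locally.
        type ClientNode struct {
        	params     ServerParams
        	bufValue   uint64    // remaining units in the bucket
        	lastUpdate time.Time // time of the last recharge
        }

        // recharge refills the bucket at MinRecharge units per nanosecond,
        // capped at BufferLimit.
        func (c *ClientNode) recharge(now time.Time) {
        	dt := uint64(now.Sub(c.lastUpdate).Nanoseconds())
        	c.bufValue += c.params.MinRecharge * dt
        	if c.bufValue > c.params.BufferLimit {
        		c.bufValue = c.params.BufferLimit
        	}
        	c.lastUpdate = now
        }

        // Accept checks whether a request with the given maximal cost (taken from
        // the MRC table) fits into the bucket and, if so, charges it tentatively.
        func (c *ClientNode) Accept(maxCost uint64, now time.Time) bool {
        	c.recharge(now)
        	if c.bufValue < maxCost {
        		return false // allowance exceeded, the request should be throttled
        	}
        	c.bufValue -= maxCost
        	return true
        }

        // RequestProcessed refunds the difference between the charged maximal cost
        // and the real cost once the actual serving time is known.
        func (c *ClientNode) RequestProcessed(maxCost, realCost uint64, now time.Time) {
        	c.recharge(now)
        	if realCost < maxCost {
        		c.bufValue += maxCost - realCost
        		if c.bufValue > c.params.BufferLimit {
        			c.bufValue = c.params.BufferLimit
        		}
        	}
        }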

    Additional tricks

    • Extra bonus for token bucket recharge

      If the server is idle (no heavy request load, no block processing), we can use a higher recharge speed (in other words, we give the client a higher capacity).
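
      As a tiny, purely illustrative sketch (the helper name and the bonus factor are made up for this note), the bonus could be applied when computing the effective recharge speed:

          // effectiveRecharge is a hypothetical helper, not part of les/flowcontrol:
          // when the server is idle, recharge the client buckets faster, i.e. grant
          // a temporarily higher capacity.
          func effectiveRecharge(minRecharge uint64, serverIdle bool) uint64 {
          	if serverIdle {
          		// no heavy request load, no block processing: apply an extra bonus;
          		// the factor 4 is an arbitrary value chosen for this sketch
          		return minRecharge * 4
          	}
          	return minRecharge
          }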

    Drawbacks

    However, this model has the following drawbacks:

    • Currently the request cost metering is bound to the serving time; however, we could abstract it out with different metering policies:

      • Metering by serving time
      • Metering by request count
    • It's very hard to make the request cost table, which is exchanged between server and client, correct

      • Different machines will have totally different average request serving times
        • Slow machine
        • Fast machine
    • Different statuses of the Geth node will lead to totally different request serving times

      • Geth node under high pressure (an eth node is syncing from us, LES clients are sending requests to us)
      • Geth node is idle

    So, in order to match the promised maximum cost table which the server sends to the client, we need an additional adjustment factor, the Global Factor, to adjust the real cost via the proportion between the Real Serving Time and the Estimated Serving Time (a rough sketch of such a factor is given at the end of this section).

  • It's unnecessary to throttle requests at the unit level

    From the benchmarking results we can see that the average serving time of the different request types is basically 0.45 ms to 1 ms; there is no big difference.

    ONE EXCEPTION is txpool-related requests, because the txpool has a lot of lock contention.

    // average request cost estimates based on serving time
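    // each entry is roughly {base cost, cost per returned item} of estimated serving time in nanoseconds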
    	reqAvgTimeCost = protocol.RequestCostTable{
    		protocol.GetBlockHeadersMsg:     {150000, 30000},
    		protocol.GetBlockBodiesMsg:      {0, 700000},
    		protocol.GetReceiptsMsg:         {0, 1000000},
    		protocol.GetCodeMsg:             {0, 450000},
    		protocol.GetProofsV2Msg:         {0, 600000},
    		protocol.GetHelperTrieProofsMsg: {0, 1000000},
    		protocol.SendTxV2Msg:            {0, 450000},
    		protocol.GetTxStatusMsg:         {0, 250000},
    	}
    • It's incorrect to derive the initial BufferLimit and MinRecharge based on the initial status of the server. Everything changes fast.

    What if the request processing is slower than what was promised to the LES client?

    • The real cost will be adjusted towards the estimation level, so it stays close to the estimated value
    • The capacity management module will undertake the task of overload control

    What if the request processing is faster than what was promised to the LES client?

    • The real cost will be adjusted towards the estimation level, so it stays close to the estimated value
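
    A rough sketch of how such a Global Factor adjustment could look is given below; the type name, the smoothing constant and the naive exponential update are assumptions made for this note, the real cost tracker keeps proper running statistics instead.

        package costsketch

        // globalFactor is an illustrative stand-in for the adjustment factor
        // discussed above.
        type globalFactor struct {
        	value float64 // multiplier applied to the measured serving time
        }

        // adjust converts a measured serving time into a cost that stays close to
        // the estimate promised to the client, then nudges the factor towards the
        // observed Estimated/Real proportion (realServingTime is assumed > 0).
        func (g *globalFactor) adjust(realServingTime, estimatedServingTime float64) float64 {
        	cost := realServingTime * g.value
        	const alpha = 0.05 // smoothing weight, chosen arbitrarily for this sketch
        	g.value = g.value*(1-alpha) + (estimatedServingTime/realServingTime)*alpha
        	return cost
        }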

    Proposals

    Here are a few proposals for the flow control model:

    • In the current flow control implementation, les/flowcontrol, some capacity management code is mixed in,

      like raising or reducing the total capacity whenever a client freeze happens. We need to move this logic into the capacity management module.

    • Abstract the request metering policies and support request count metering

      • I'm not sure which one is the best policy, but I can say that raw count metering will simplify things a lot

        • We can get rid of the Global Factor adjustment
        • We can get rid of the request serving time estimation, which may be inaccurate or outdated
        • We can calculate the minimal capacity much more easily
      • I guess it won't be so bad if we use the request count policy

        • The average serving time for the different kinds of requests is similar (at least the same order of magnitude)

        • Heavier requests can be assigned a higher weight (a small sketch of such a weight table is given at the end of this section):

          • GetBlockBodiesMsg, GetReceiptsMsg, GetBlockHeadersMsg

            These requests are trivial to serve; the weight can be assigned as 1.

          • GetCodeMsg, GetProofsV2Msg, GetHelperTrieProofsMsg

            These requests require state access, which is heavier than chain access; the weight can be assigned as 2.

          • SendTxV2Msg, GetTxStatusMsg

            These requests require txpool access, which may be extremely slow if the txpool is busy. Perhaps we need to improve the txpool instead of assigning a weirdly high weight here?

      • The philosophy of simplification

        The original implementation is complicated but full-scale: we have the notion of a unit and meter the request cost at the unit level, so we can account for serving cost, bandwidth cost and connection time accurately. But at the same time this introduces a lot of complexity.

        The simplified version uses a simple weight strategy; it is general and may not be accurate, but it is simple. The fact is that sometimes, even when we use unit metering, the status of the system itself fluctuates heavily, so unit metering may end up no more effective than this simple approach.

        For other considerations, such as bandwidth limitation, we can use a separate mirror token bucket, where each request is metered based on the incoming request size and the outgoing response size.

      • It's easier for business logic

        With the help of the incentive module, the server can now start to sell its resources to clients. However, at the moment the server has to sell tokens that correspond to resource units. That makes the whole thing hard for a server operator to understand, and it is not easy to estimate or to build business logic on.

        Request count metering, however, is quite straightforward. The server operator can set a rule like:

        • 100 requests per second: 1 cent for 1 hour of connection
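
    To make the weight-based, request-count metering proposal above concrete, here is a small sketch of a possible weight table and bucket deduction; the message code constants stand in for the LES protocol constants, and the txpool weight is only a placeholder.

        package metersketch

        // Illustrative message codes standing in for the LES protocol constants.
        const (
        	GetBlockHeadersMsg = iota
        	GetBlockBodiesMsg
        	GetReceiptsMsg
        	GetCodeMsg
        	GetProofsV2Msg
        	GetHelperTrieProofsMsg
        	SendTxV2Msg
        	GetTxStatusMsg
        )

        // requestWeights follows the grouping suggested above: chain access = 1,
        // state access = 2; the txpool weight is a placeholder pending a decision
        // on whether the txpool itself should be improved instead.
        var requestWeights = map[int]uint64{
        	GetBlockHeadersMsg:     1,
        	GetBlockBodiesMsg:      1,
        	GetReceiptsMsg:         1,
        	GetCodeMsg:             2,
        	GetProofsV2Msg:         2,
        	GetHelperTrieProofsMsg: 2,
        	SendTxV2Msg:            2,
        	GetTxStatusMsg:         2,
        }

        // weightedCost is the bucket deduction for a batch request under the
        // request-count policy: per-type weight times the number of requested items.
        func weightedCost(msgCode int, items uint64) uint64 {
        	return requestWeights[msgCode] * items
        }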