Summary: In multi-threaded applications that run several inferences on the same model in parallel (e.g. a TTS system handling multiple requests), it can be useful to share the model's parameters among the runtime instances. Sharing improves the system's cache utilization: multiple cores can read the same set of weights instead of evicting each other's identical copies from a shared cache.

Because the underlying `NDArray` instances in `data_entry_` are reference-counted, this is a simple modification of the `GraphRuntime::LoadParams` logic so that it copies parameters from an existing GraphRuntime instance instead.

This is a little ugly, in that we need both the pre-existing GraphRuntime instance and the 'serialized' params (since we need to know the set of names to copy). Avoiding this would mean imposing additional assumptions (i.e. storing the set of param names in GraphRuntime, and enforcing that shared param names are identical to the parameters set in the preceding `LoadParams` call), so it seems unavoidable.

Test Plan: Unit test added.
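The sharing idea described in the summary can be illustrated with a minimal, self-contained sketch. It is not the GraphRuntime code: `Tensor`, `MiniRuntime`, and the signatures of its `LoadParams`, `ShareParams`, and `Get` methods are hypothetical, and `std::shared_ptr` stands in for the reference-counted `NDArray` handle. The list of names passed to `ShareParams` plays the role of the names that the summary says must be read from the serialized params.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a reference-counted tensor handle.
using Tensor = std::shared_ptr<std::vector<float>>;

// Hypothetical, heavily simplified runtime holding named parameters.
class MiniRuntime {
 public:
  // Deep-copies each named parameter into storage owned by this instance.
  void LoadParams(const std::unordered_map<std::string, std::vector<float>>& params) {
    for (const auto& kv : params) {
      entries_[kv.first] = std::make_shared<std::vector<float>>(kv.second);
    }
  }

  // Aliases the named parameters of `other` instead of copying their data.
  // The caller supplies the names because this class does not record which
  // entries are parameters.
  void ShareParams(const MiniRuntime& other, const std::vector<std::string>& names) {
    for (const auto& name : names) {
      auto it = other.entries_.find(name);
      if (it == other.entries_.end()) {
        throw std::invalid_argument("parameter not found in source runtime: " + name);
      }
      assert(it->second != nullptr);
      entries_[name] = it->second;  // copies the handle; the data is shared
    }
  }

  // Returns the handle for `name`; throws std::out_of_range if absent.
  const Tensor& Get(const std::string& name) const { return entries_.at(name); }

 private:
  std::unordered_map<std::string, Tensor> entries_;
};
```

After `b.ShareParams(a, names)`, both instances hold handles to the same buffers, so the weights exist once in memory no matter how many instances serve requests. The buffers stay alive until the last instance holding them is destroyed.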
Name |
---|
api |
arithmetic |
autotvm |
codegen |
common |
contrib |
lang |
op |
pass |
relay |
runtime |
schedule |
README.md |