Nadathur, Sundar
2018-05-16 17:01:44 UTC
Hi,
  The Cyborg quota spec [1] proposes to implement a quota (maximum
usage) for accelerators on a per-project basis, to prevent one project
(tenant) from over-using some resources and starving other tenants.
There are separate resource classes (RCs) for different accelerator
types (GPUs, FPGAs, etc.), so we can enforce quotas per RC.
The current proposal [2] is to track usage in the Cyborg agent/driver.
I am not sure that scheme will work, as I have indicated in the
comments on [1]. Here is another possible approach.
* The operator configures oslo.limit in Keystone per project and per
  resource class (GPU, FPGA, ...); a sketch of how Cyborg could consume
  those limits follows this list.
  o Until unified limits land in Keystone, Cyborg may define its own
    quota table, as defined in [1].
* Cyborg implements a table to track per-project usage, as defined
  in [1].
* Cyborg provides a filter for the Nova scheduler, which checks whether
  the project making the request has exceeded its quota; a sketch of
  such a filter follows the summary below.
  o If so, the filter removes all candidates, thus failing the request.
  o If not, it updates the per-project usage in its own DB. Since this
    is an out-of-tree filter, at least to start with, it should be OK
    to update the DB directly without making REST API calls.
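To illustrate the first bullet: once the limits live in Keystone,
Cyborg could read and enforce them through oslo.limit. Below is a
minimal sketch assuming oslo.limit's Enforcer interface (which is
still being designed as of this writing); cyborg_db.get_usage is a
hypothetical helper over the usage table defined in [1].

    from oslo_limit import limit

    import cyborg_db  # hypothetical wrapper over Cyborg's usage table


    def usage_callback(project_id, resource_names):
        # Report the project's current usage for each resource class.
        return {rc: cyborg_db.get_usage(project_id, rc)
                for rc in resource_names}


    enforcer = limit.Enforcer(usage_callback)
    # Raises an over-limit exception if granting one more CUSTOM_FPGA
    # would push the project past its limit registered in Keystone.
    enforcer.enforce('some-project-id', {'CUSTOM_FPGA': 1})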
In other words, resource usage tracking and enforcement are done as
part of request scheduling, rather than at the compute node.
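To make the filter idea concrete, here is a minimal sketch assuming
Nova's BaseHostFilter plugin interface. The get_requested_accelerators()
helper and the cyborg_db module are hypothetical placeholders for the
quota and usage tables defined in [1], not existing APIs.

    from nova.scheduler import filters

    import cyborg_db  # hypothetical wrapper over Cyborg's quota/usage tables


    def get_requested_accelerators(spec_obj):
        """Hypothetical: map the request to accelerator resource
        classes and counts, e.g. {'CUSTOM_FPGA': 1}."""
        raise NotImplementedError


    class AcceleratorQuotaFilter(filters.BaseHostFilter):
        """Fail the request at scheduling time if the requesting
        project is over its accelerator quota."""

        # The check is per-project, not per-host, so running once per
        # request is enough.
        run_filter_once_per_request = True

        def host_passes(self, host_state, spec_obj):
            project_id = spec_obj.project_id
            requested = get_requested_accelerators(spec_obj)
            for rc, count in requested.items():
                usage = cyborg_db.get_usage(project_id, rc)
                quota = cyborg_db.get_quota(project_id, rc)
                if usage + count > quota:
                    # Returning False for every host removes all
                    # candidates, which fails the request.
                    return False
            # Out-of-tree to start with, so a direct DB update (no REST
            # round trip) is acceptable, per the proposal.
            for rc, count in requested.items():
                cyborg_db.add_usage(project_id, rc, count)
            return True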
If there are better ways, or ways to avoid a filter altogether, please
let me know.
[1] https://review.openstack.org/#/c/560285/
[2] https://review.openstack.org/#/c/564968/
Thanks.
Regards,
Sundar