Matt Riedemann
2018-04-23 19:48:38 UTC
We seem to be at a bit of an impasse in this spec amendment [1] so I
want to try and summarize the alternative solutions as I see them.
The overall goal of the blueprint is to allow defining traits via image
properties, like flavor extra specs. Those image-defined traits are used
to filter hosts during scheduling of the instance. During server create,
that filtering happens during the normal "GET /allocation_candidates"
call to placement.
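
As a rough sketch of that create-time flow: the snippet below collects required traits from a properties dict and builds the query string for the placement call. It assumes image properties would use the same "trait:<NAME>=required" key format as flavor extra specs. The function names and the resource amounts are made up for illustration and are not nova code.

```python
from urllib.parse import urlencode


def required_traits(properties):
    """Return the sorted trait names marked as required in a properties dict.

    Assumes the flavor extra spec style key format: trait:<NAME>=required.
    """
    prefix = "trait:"
    return sorted(
        key[len(prefix):]
        for key, value in properties.items()
        if key.startswith(prefix) and value == "required"
    )


def allocation_candidates_query(resources, traits):
    """Build the path and query string for GET /allocation_candidates."""
    params = {
        "resources": ",".join(
            f"{rc}:{amount}" for rc, amount in sorted(resources.items())
        )
    }
    if traits:
        params["required"] = ",".join(traits)
    # Keep ":" and "," literal so the query stays readable.
    return "/allocation_candidates?" + urlencode(params, safe=":,")


# Hypothetical image properties and flavor resources.
image_props = {"trait:HW_CPU_X86_AVX2": "required", "hw_disk_bus": "virtio"}
flavor_resources = {"VCPU": 2, "MEMORY_MB": 2048}

print(allocation_candidates_query(flavor_resources, required_traits(image_props)))
```

Properties that are not trait keys (such as hw_disk_bus here) are ignored by the helper, so only the trait names end up in "required".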
The problem is rebuild with a new image that specifies new required
traits. A rebuild is not a move operation, but we still run through
the scheduler filters to make sure the new image (if one is specified)
is valid for the host on which the instance is currently running.
We don't currently call "GET /allocation_candidates" during rebuild
because that could inadvertently filter out the host we know we need
[2]. Also, since flavors don't change for rebuild, we haven't had a need
for getting allocation candidates during rebuild since we're not
allocating new resources (pretend bug 1763766 [3] does not exist for now).
Now that we know the problem, here are some of the solutions that have
been discussed in the spec amendment, again, only for rebuild with a new
image that has new traits:
1. Fail in the API saying you can't rebuild with a new image with new
required traits.
Pros:
- Simple way to keep the new image off a host that doesn't support it.
- Similar solution to volume-backed rebuild with a new image.
Cons:
- Confusing user experience, since users might be able to rebuild with
some new images but not others, with no clear explanation of the
difference.
2. Have the ImagePropertiesFilter call "GET
/resource_providers/{rp_uuid}/traits" and compare the compute node root
provider traits against the new image's required traits.
Pros:
- Avoids having to call "GET /allocation_candidates" during rebuild.
- Simple way to compare the required image traits against the compute
node provider traits.
Cons:
- Does not account for nested providers, so the scheduler could reject
the image because a required trait is missing from the root provider
even though it is actually provided by a nested provider in the tree.
This is somewhat related to bug 1763766.
3. Slight variation on #2 except build a set of all traits from all
providers in the same tree.
Pros:
- Handles the nested provider traits issue from #2.
Cons:
- Duplicates filtering in ImagePropertiesFilter that could otherwise
happen in "GET /allocation_candidates".
4. Add a microversion to "GET /allocation_candidates" that makes two
changes:
a) Add an "in_tree" filter like in "GET /resource_providers". This would
be needed to limit the scope of what gets returned since we know we only
want to check against one specific host (the current host for the instance).
b) Make "resources" optional since on a rebuild we don't want to
allocate new resources (again, notwithstanding bug 1763766).
Pros:
- We can call "GET /allocation_candidates?in_tree=<current node rp
UUID>&required=<new image required traits>" and if nothing is returned,
we know the new image's required traits don't work with the current node.
- The filtering is baked into "GET /allocation_candidates" and not
client-side in ImagePropertiesFilter.
Cons:
- Requires changes to the "GET /allocation_candidates" API, which is
more complicated and more up-front work. I don't have a good idea of
how hard this would be to add, though we already have the same
"in_tree" logic in "GET /resource_providers".
- Potentially slows down the completion of the overall blueprint.
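
To make the difference between options 2, 3 and 4 concrete, here is a minimal sketch. The provider names, the trait values and all three function names are hypothetical. The "in_tree" parameter and the optional "resources" on "GET /allocation_candidates" are only the proposal from option 4, not something placement supports today.

```python
from urllib.parse import urlencode


def root_only_check(root_traits, image_required):
    """Option 2: compare only the compute node root provider's traits."""
    return set(image_required) <= set(root_traits)


def whole_tree_check(traits_by_provider, image_required):
    """Option 3: compare against the union of traits across the tree."""
    tree_traits = set()
    for traits in traits_by_provider.values():
        tree_traits.update(traits)
    return set(image_required) <= tree_traits


def proposed_candidates_url(node_rp_uuid, image_required):
    """Option 4: the proposed query (in_tree, and no resources parameter)."""
    params = {
        "in_tree": node_rp_uuid,
        "required": ",".join(sorted(image_required)),
    }
    return "/allocation_candidates?" + urlencode(params, safe=",")


# Hypothetical tree: one required trait lives on a nested provider.
tree = {
    "compute-node-root": {"HW_CPU_X86_AVX2"},
    "nested-gpu": {"CUSTOM_GPU"},
}
image_required = {"HW_CPU_X86_AVX2", "CUSTOM_GPU"}

print(root_only_check(tree["compute-node-root"], image_required))  # False
print(whole_tree_check(tree, image_required))  # True
print(proposed_candidates_url("compute-node-root", image_required))
```

With this tree, the option 2 check rejects the image because CUSTOM_GPU is only on the nested provider, while the option 3 check accepts it. With option 4 the comparison would happen inside placement, and the caller would only need to look at whether anything was returned.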
===========
Personally, I don't like option 1 since it adds technical debt which
we'll eventually need to solve anyway (think about [4]). I have similar
feelings about #2. #3 might be a short-term solution until #4 is done,
but I think the best long-term solution to this problem is #4.
[1] https://review.openstack.org/#/c/560718/
[2] https://review.openstack.org/#/c/546357/
[3] https://bugs.launchpad.net/nova/+bug/1763766
[4] https://review.openstack.org/#/c/532407/
--
Thanks,
Matt
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-***@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mail