Discussion:
[openstack-dev] [nova] Device tagging: rebuild config drive upon instance reboot to refresh metadata on it
Artom Lifshitz
2017-02-17 19:28:18 UTC
Permalink
Early on in the inception of device role tagging, it was decided that
it's acceptable that the device metadata on the config drive lags
behind the metadata API, as long as it eventually catches up, for
example when the instance is rebooted and we get a chance to
regenerate the config drive.

So far this hasn't really been a problem because devices could only be
tagged at instance boot time, and the tags never changed. So the
config drive was pretty much always up to date.

In Pike the tagged device attachment series of patches [1] will
hopefully merge, and we'll be in a situation where device tags can
change during instance uptime, which makes it that much more important
to regenerate the config drive whenever we get a chance.

However, when the config drive is first generated, some of the
information stored in there is only available at instance boot time
and is not persisted anywhere, as far as I can tell. Specifically, the
injected_files and admin_pass parameters [2] are passed from the API
and are not stored anywhere.

This creates a problem when we want to regenerate the config drive,
because the information that we're supposed to put in it is no longer
available to us.

We could start persisting this information in instance_extra, for
example, and pulling it up when the config drive is regenerated. We
could even conceivably hack something to read the metadata files from
the "old" config drive before refreshing them with new information.
However, is that really worth it? I feel like saying "the config drive
is static, deal with it - if you want up-to-date metadata, use the
API" is an equally, if not more, valid option.

Thoughts? I know y'all are flying out to the PTG, so I'm unlikely to
get responses, but I've at least put my thoughts into writing, and
will be able to refer to them later on :)

[1] https://review.openstack.org/#/q/status:open+topic:bp/virt-device-tagged-attach-detach
[2] https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L2667-L2672

--
Artom Lifshitz
Michael Still
2017-02-17 22:41:03 UTC
Permalink
We have had this discussion several times in the past for other reasons.
The reality is that some people will never deploy the metadata API, so I
feel like we need a better solution than what we have now.

However, I would consider it probably unsafe for the hypervisor to read the
current config drive to get values, and persisting things like the instance
root password in the Nova DB sounds like a bad idea too.

Michael




On Feb 18, 2017 6:29 AM, "Artom Lifshitz" <***@redhat.com> wrote:

Early on in the inception of device role tagging, it was decided that
it's acceptable that the device metadata on the config drive lags
behind the metadata API, as long as it eventually catches up, for
example when the instance is rebooted and we get a chance to
regenerate the config drive.

So far this hasn't really been a problem because devices could only be
tagged at instance boot time, and the tags never changed. So the
config drive was pretty much always up to date.

In Pike the tagged device attachment series of patches [1] will
hopefully merge, and we'll be in a situation where device tags can
change during instance uptime, which makes it that much more important
to regenerate the config drive whenever we get a chance.

However, when the config drive is first generated, some of the
information stored in there is only available at instance boot time
and is not persisted anywhere, as far as I can tell. Specifically, the
injected_files and admin_pass parameters [2] are passed from the API
and are not stored anywhere.

This creates a problem when we want to regenerate the config drive,
because the information that we're supposed to put in it is no longer
available to us.

We could start persisting this information in instance_extra, for
example, and pulling it up when the config drive is regenerated. We
could even conceivably hack something to read the metadata files from
the "old" config drive before refreshing them with new information.
However, is that really worth it? I feel like saying "the config drive
is static, deal with it - if you want up-to-date metadata, use the
API" is an equally, if not more, valid option.

Thoughts? I know y'all are flying out to the PTG, so I'm unlikely to
get responses, but I've at least put my thoughts into writing, and
will be able to refer to them later on :)

[1] https://review.openstack.org/#/q/status:open+topic:bp/virt-device-tagged-attach-detach
[2] https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L2667-L2672

--
Artom Lifshitz
Clint Byrum
2017-02-18 00:05:11 UTC
Permalink
Post by Michael Still
We have had this discussion several times in the past for other reasons.
The reality is that some people will never deploy the metadata API, so I
feel like we need a better solution than what we have now.
However, I would consider it probably unsafe for the hypervisor to read the
current config drive to get values, and persisting things like the instance
root password in the Nova DB sounds like a bad idea too.
Agreed. What if we simply have a second config drive that is for "things
that change" and only rebuild that one on reboot?
Jay Pipes
2017-02-20 15:02:33 UTC
Permalink
Post by Clint Byrum
Post by Michael Still
We have had this discussion several times in the past for other reasons.
The reality is that some people will never deploy the metadata API, so I
feel like we need a better solution than what we have now.
However, I would consider it probably unsafe for the hypervisor to read the
current config drive to get values, and persisting things like the instance
root password in the Nova DB sounds like a bad idea too.
Agreed. What if we simply have a second config drive that is for "things
that change" and only rebuild that one on reboot?
Or not.

Why are we trying to reinvent configuration management systems in Nova?

-jay
Artom Lifshitz
2017-02-18 13:11:10 UTC
Permalink
We have had this discussion several times in the past for other reasons. The
reality is that some people will never deploy the metadata API, so I feel
like we need a better solution than what we have now.
Aha, that's definitely a good reason to continue making the config
drive a first-class citizen.
However, I would consider it probably unsafe for the hypervisor to read the
current config drive to get values
Yeah, I was using the word "hack" very generously ;)
and persisting things like the instance
root password in the Nova DB sounds like a bad idea too.
I hadn't even thought of the security implication. That's a very good
point, there's no way to persist admin_pass securely. We'll have to
read it at some point, so no amount of encryption will change
anything. We can argue that since we already store admin_pass on the
config drive, storing it in the database as well is OK (it's probably
immediately changed anyways), but there's a difference between having
it in a file on a single compute node, and in the database accessible
by the entire deployment.
Agreed. What if we simply have a second config drive that is for "things
that change" and only rebuild that one on reboot?
We've already set the precedent that there's a single config drive
with the device tagging metadata on it, I don't think we can go back
on that promise.


So while we shouldn't read from the config drive to get current values
in order to afterwards monolithically regenerate a new one, we could
try just writing to the files we want changed. I'm thinking of a
system where code that needs to change information on the config drive
would have a way of telling it "here are the new values for
device_metadata", and whenever we next get a chance, for example when
the instance is rebooted, those values are saved on the config drive.
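
As a very rough sketch of the interface I have in mind (all names
invented; a real version would persist the pending values somewhere like
instance_extra rather than keep them in memory):

    # Illustrative "deferred config drive update" interface: callers queue
    # new values, and the next time the drive is rebuilt (e.g. at reboot)
    # the queued values replace what was generated at boot.
    _pending = {}  # instance uuid -> {metadata key -> new value}

    def queue_config_drive_update(instance, key, value):
        # e.g. key='device_metadata', value=<the new device metadata list>
        _pending.setdefault(instance.uuid, {})[key] = value

    def apply_pending_updates(instance, instance_md):
        # Called from wherever the config drive gets regenerated;
        # instance_md is the metadata object about to be written out.
        for key, value in _pending.pop(instance.uuid, {}).items():
            setattr(instance_md, key, value)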
Dean Troyer
2017-02-18 14:21:29 UTC
Permalink
Post by Artom Lifshitz
I hadn't even thought of the security implication. That's a very good
point, there's no way to persist admin_pass securely. We'll have to
read it at some point, so no amount of encryption will change
anything. We can argue that since we already store admin_pass on the
config drive, storing it in the database as well is OK (it's probably
immediately changed anyways), but there's a difference between having
it in a file on a single compute node, and in the database accessible
by the entire deployment.
Honestly I don't think a policy of "admin password is only available
on first boot (initial config drive)" is a bad one. Any workflow that
does not change that password immediately is broken security-wise, and
I see no reason for us to enable that sort of workflow to get worse by
extending the availability of that initial password.

We do not break existing users by ignoring the admin password on
subsequent config drive images. Of course I can always be
misunderestimating the "innovation" of users making use of config
drive in ways that none of us have imagined.

dt
--
Dean Troyer
***@gmail.com
Clint Byrum
2017-02-18 16:12:18 UTC
Permalink
Post by Dean Troyer
Post by Artom Lifshitz
I hadn't even thought of the security implication. That's a very good
point, there's no way to persist admin_pass securely. We'll have to
read it at some point, so no amount of encryption will change
anything. We can argue that since we already store admin_pass on the
config drive, storing it in the database as well is OK (it's probably
immediately changed anyways), but there's a difference between having
it in a file on a single compute node, and in the database accessible
by the entire deployment.
Honestly I don't think a policy of "admin password is only available
on first boot (initial config drive)" is a bad one. Any workflow that
does not change that password immediately is broken security-wise, and
I see no reason for us to enable that sort of workflow to get worse by
extending the availability of that initial password.
I'm certain there are people who do not change it immediately, and
likely assert it at every boot because that's just the way it has always
worked.
Post by Dean Troyer
We do not break existing users by ignoring the admin password on
subsequent config drive images. Of course I can always be
misunderestimating the "innovation" of users making use of config
drive in ways that none of us have imagined.
Your API now is that it's in the config drive every time the box
boots. You break users by changing that.

Let's follow the lead of the Linux kernel and not break userspace.
Clint Byrum
2017-02-18 16:23:24 UTC
Permalink
Post by Artom Lifshitz
Post by Clint Byrum
Agreed. What if we simply have a second config drive that is for "things
that change" and only rebuild that one on reboot?
We've already set the precedent that there's a single config drive
with the device tagging metadata on it, I don't think we can go back
on that promise.
So while we shouldn't read from the config drive to get current values
in order to afterwards monolithically regenerate a new one, we could
try just writing to the files we want changed. I'm thinking of a
system where code that needs to change information on the config drive
would have a way of telling it "here are the new values for
device_metadata", and whenever we next get a chance, for example when
the instance is rebooted, those values are saved on the config drive.
Up until now, there's also been a precedent that config drive won't
change. So you'd need to define a new config drive API version that is
allowed to change (people may be relying on them not changing).

But I believe Michael is not saying "it's unsafe to read the json
files" but rather "it's unsafe to read the whole config drive". It's
an ISO filesystem, so you can't write to it. You have to read the whole
contents back into a directory and regenerate it. I'm guessing Michael
is concerned that there is some danger in doing this, though I can't
imagine what it is.
Dean Troyer
2017-02-18 17:36:54 UTC
Permalink
Post by Clint Byrum
But I believe Michael is not saying "it's unsafe to read the json
files" but rather "it's unsafe to read the whole config drive". It's
an ISO filesystem, so you can't write to it. You have to read the whole
contents back into a directory and regenerate it. I'm guessing Michael
is concerned that there is some danger in doing this, though I can't
imagine what it is.
Nova can be configured for config drive to be a VFAT filesystem, which
cannot be trusted. Unfortunately this is (was??) required for
libvirt live migration to work so is likely to not be an edge case in
deployments.

The safest read-back approach would be to generate both ISO9660 and
VFAT (if configured) and only read back from the ISO version. But
yuck, two config drive images...still better than passwords in the
database.

dt
--
Dean Troyer
***@gmail.com
Artom Lifshitz
2017-02-18 18:54:11 UTC
Permalink
A few good points were made:

* the config drive could be VFAT, in which case we can't trust what's
on it because the guest has write access
* if the config drive is ISO9660, we can't selectively write to it, we
need to regenerate the whole thing - but in this case it's actually
safe to read from (right?)
* the point about the precedent being set that the config drive
doesn't change... I'm not sure I 100% agree. There's definitely a
precedent that information on the config drive will remain present for
the entire instance lifetime (so the admin_pass won't disappear after
a reboot, even if using that "feature" in a workflow seems ludicrous),
but we've made no promises that the information itself will remain
constant. For example, nothing says the device metadata must remain
unchanged after a reboot.

Based on that here's what I propose:

If the config drive is VFAT, we can just update the information on it
that we need to update. In the device metadata case, we write a new
JSON file, overwriting the old one.

If the config drive is ISO9660, we can safely read from it to fill in
what information isn't persisted anywhere else, then update it with
the new stuff we want to change. Then write out the new image.
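
For the ISO9660 case the flow would be something like the sketch below.
Shelling out to mount/genisoimage like this is just to illustrate the
idea (real code would go through rootwrap/privsep and reuse Nova's
existing config drive builder), and the exact key the device metadata
lives under is from memory:

    # Sketch: extract the existing ISO config drive, overlay the new device
    # metadata, and write out a fresh image.
    import json
    import os
    import shutil
    import subprocess
    import tempfile

    def regenerate_iso_config_drive(old_iso, new_iso, new_device_metadata):
        mnt = tempfile.mkdtemp()
        work = tempfile.mkdtemp()
        tree = os.path.join(work, 'drive')
        try:
            subprocess.check_call(['mount', '-o', 'loop,ro', old_iso, mnt])
            try:
                shutil.copytree(mnt, tree)
            finally:
                subprocess.check_call(['umount', mnt])
            md = os.path.join(tree, 'openstack', 'latest', 'meta_data.json')
            os.chmod(md, 0o644)  # files copied off the ISO are read-only
            with open(md) as f:
                meta = json.load(f)
            meta['devices'] = new_device_metadata
            with open(md, 'w') as f:
                json.dump(meta, f)
            subprocess.check_call(['genisoimage', '-o', new_iso, '-R', '-J',
                                   '-V', 'config-2', tree])
        finally:
            shutil.rmtree(work)
            os.rmdir(mnt)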
Post by Dean Troyer
Post by Clint Byrum
But I believe Michael is not saying "it's unsafe to read the json
files" but rather "it's unsafe to read the whole config drive". It's
an ISO filesystem, so you can't write to it. You have to read the whole
contents back into a directory and regenerate it. I'm guessing Michael
is concerned that there is some danger in doing this, though I can't
imagine what it is.
Nova can be configured for config drive to be a VFAT filesystem, which
can not be trusted. Unfortunately this is (was??) required for
libvirt live migration to work so is likely to not be an edge case in
deployments.
The safest read-back approach would be to generate both ISO9660 and
VFAT (if configured) and only read back from the ISO version. But
yuck, two config drive images...still better than passwords in the
database.
dt
--
Dean Troyer
--
--
Artom Lifshitz
Daniel P. Berrange
2017-02-20 13:41:47 UTC
Permalink
Post by Artom Lifshitz
* the config drive could be VFAT, in which case we can't trust what's
on it because the guest has write access
* if the config drive is ISO9660, we can't selectively write to it, we
need to regenerate the whole thing - but in this case it's actually
safe to read from (right?)
* the point about the precedent being set that the config drive
doesn't change... I'm not sure I 100% agree. There's definitely a
precedent that information on the config drive will remain present for
the entire instance lifetime (so the admin_pass won't disappear after
a reboot, even if using that "feature" in a workflow seems ludicrous),
but we've made no promises that the information itself will remain
constant. For example, nothing says the device metadata must remain
unchanged after a reboot.
If the config drive is vfat, we can just update the information on it
that we need to update. In the device metadata case, we write a new
JSON file, overwriting the old one.
If the config drive is ISO9660, we can safely read from it to fill in
what information isn't persisted anywhere else, then update it with
the new stuff we want to change. Then write out the new image.
Neither of these really copes with dynamically updating the role device
metadata for a *running* guest during a disk/NIC hotplug, for example.
You can't have the host re-write the FS data that's in use by a running
guest.

For the CDROM based config drive, you would have to eject the virtual
media and insert new media.
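
With libvirt that's roughly the following (a sketch only - the target
dev/bus and the file path are made up, and you'd obviously have to build
the new image first):

    # Sketch: point the config drive cdrom of a running guest at a freshly
    # generated image. libvirt treats a <source> change on an existing
    # cdrom target as a media change (eject + insert).
    import libvirt

    NEW_MEDIA_XML = """
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/var/lib/nova/instances/INSTANCE_UUID/disk.config.new'/>
      <target dev='hdd' bus='ide'/>
      <readonly/>
    </disk>
    """

    def swap_config_drive_media(domain_name):
        conn = libvirt.open('qemu:///system')
        try:
            dom = conn.lookupByName(domain_name)
            dom.updateDeviceFlags(NEW_MEDIA_XML,
                                  libvirt.VIR_DOMAIN_AFFECT_LIVE)
        finally:
            conn.close()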

IMHO, I'd just declare config drive readonly no matter what and anything
which requires dynamic data must use a different mechanism. Trying to
make config drive at all dynamic just opens a can of worms.

Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|
Steve Gordon
2017-02-18 19:12:13 UTC
Permalink
----- Original Message -----
Sent: Saturday, February 18, 2017 8:11:10 AM
Subject: Re: [openstack-dev] [nova] Device tagging: rebuild config drive upon instance reboot to refresh metadata on it
We have had this discussion several times in the past for other reasons. The
reality is that some people will never deploy the metadata API, so I feel
like we need a better solution than what we have now.
Aha, that's definitely a good reason to continue making the config
drive a first-class citizen.
The other reason is that the metadata API as it stands isn't an option for folks trying to do IPv6-only IIRC.

-Steve
Michael Still
2017-02-19 23:40:19 UTC
Permalink
Config drive over read-only NFS anyone?

Michael
Post by Steve Gordon
----- Original Message -----
To: "OpenStack Development Mailing List (not for usage questions)" <
Sent: Saturday, February 18, 2017 8:11:10 AM
Subject: Re: [openstack-dev] [nova] Device tagging: rebuild config drive upon instance reboot to refresh metadata on it
Post by Michael Still
We have had this discussion several times in the past for other reasons. The
reality is that some people will never deploy the metadata API, so I feel
like we need a better solution than what we have now.
Aha, that's definitely a good reason to continue making the config
drive a first-class citizen.
The other reason is that the metadata API as it stands isn't an option for
folks trying to do IPV6-only IIRC.
-Steve
--
Rackspace Australia
Artom Lifshitz
2017-02-20 13:23:09 UTC
Permalink
Config drive over read-only NFS anyone?


A shared filesystem so that both Nova and the guest can do IO on it at the
same time is indeed the proper way to solve this. But I'm afraid of the
ramifications in terms of live migrations and all other operations we can
do on VMs...


Michael
Post by Steve Gordon
----- Original Message -----
To: "OpenStack Development Mailing List (not for usage questions)" <
Sent: Saturday, February 18, 2017 8:11:10 AM
Subject: Re: [openstack-dev] [nova] Device tagging: rebuild config drive upon instance reboot to refresh metadata on it
Post by Michael Still
We have had this discussion several times in the past for other reasons. The
reality is that some people will never deploy the metadata API, so I feel
like we need a better solution than what we have now.
Aha, that's definitely a good reason to continue making the config
drive a first-class citizen.
The other reason is that the metadata API as it stands isn't an option for
folks trying to do IPV6-only IIRC.
-Steve
--
Rackspace Australia
Clint Byrum
2017-02-20 14:42:50 UTC
Permalink
Post by Michael Still
Config drive over read-only NFS anyone?
A shared filesystem so that both Nova and the guest can do IO on it at the
same time is indeed the proper way to solve this. But I'm afraid of the
ramifications in terms of live migrations and all other operations we can
do on VMs...
What makes anyone think this will perform better than the metadata
service?

If we can hand people an address that is NFS-capable, we can hand them
an HTTP(S) URL that doesn't have performance problems.
Post by Michael Still
Michael
Post by Steve Gordon
----- Original Message -----
To: "OpenStack Development Mailing List (not for usage questions)" <
Sent: Saturday, February 18, 2017 8:11:10 AM
Subject: Re: [openstack-dev] [nova] Device tagging: rebuild config drive upon instance reboot to refresh metadata on it
Post by Michael Still
We have had this discussion several times in the past for other reasons. The
reality is that some people will never deploy the metadata API, so I feel
like we need a better solution than what we have now.
Aha, that's definitely a good reason to continue making the config
drive a first-class citizen.
The other reason is that the metadata API as it stands isn't an option for
folks trying to do IPV6-only IIRC.
-Steve
Daniel P. Berrange
2017-02-20 13:38:31 UTC
Permalink
Post by Artom Lifshitz
We have had this discussion several times in the past for other reasons. The
reality is that some people will never deploy the metadata API, so I feel
like we need a better solution than what we have now.
Aha, that's definitely a good reason to continue making the config
drive a first-class citizen.
FYI, there are a variety of other options available in QEMU for exposing
metadata from the host to the guest that may be a better option than either
config drive or network metadata service, that we should consider.

- NVDIMM - this is an arbitrary block of data mapped into the guest OS
memory space. As the name suggests, from a physical hardware POV this
is non-volatile RAM, but in the virt space we have much more flexibility.
It is possible to back an NVDIMM in the guest with a plain file in the
host, or with volatile ram in the host.

In the guest, the NVDIMM can be mapped as a block device, and from there
mounted as a filesystem. Now this isn't actually more useful than config
drive really, since guest filesystem drivers get upset if the host changes
the filesystem config behind its back. So this wouldn't magically make it
possible to dynamically update role device metadata at hotplug time.

Rather than mounting as a filesystem, you can also use NVDIMM directly
as a raw memory block, in which case it can contain whatever data format
you want - not merely a filesystem. With the right design, you could come
up with a format that lets you store the role device metadata in an NVDIMM
and be able to update its contents on the fly for the guest during hotplug.

- virtio-vsock - think of this as UNIX domain sockets between the host and
guest. This is to deal with the valid use case of people wanting to use
a network protocol, but not wanting a real NIC exposed to the guest/host
for security concerns. As such I think it'd be useful to run the metadata
service over virtio-vsock as an option. It'd likely address at least some
people's security concerns wrt metadata service. It would also fix the
ability to use the metadata service in IPv6-only environments, as we would
not be using IP at all :-)


Both of these are pretty new features only recently added to qemu/libvirt
so it's not going to immediately obsolete the config drive / IPv4 metadata
service, but they're things to consider IMHO. It would be valid to say
the config drive role device tagging metadata will always be readonly,
and if you want dynamic data you must use the metadata service over IPv4
or virtio-vsock.
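
To make the vsock option concrete, the guest side could be as simple as
the sketch below. The port number and the idea of just speaking HTTP over
the socket are assumptions, and AF_VSOCK needs a recent kernel and Python:

    # Sketch of a guest-side metadata fetch over virtio-vsock. CID 2 is the
    # well-known host address; the port and request framing are invented.
    import socket

    VMADDR_CID_HOST = 2   # the hypervisor host
    METADATA_PORT = 80    # made up for this example

    def fetch_metadata(path='/openstack/latest/meta_data.json'):
        s = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)
        try:
            s.connect((VMADDR_CID_HOST, METADATA_PORT))
            s.sendall(('GET %s HTTP/1.0\r\n\r\n' % path).encode())
            chunks = []
            while True:
                data = s.recv(4096)
                if not data:
                    break
                chunks.append(data)
            return b''.join(chunks)
        finally:
            s.close()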

Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|
Jeremy Stanley
2017-02-20 14:24:12 UTC
Permalink
On 2017-02-20 13:38:31 +0000 (+0000), Daniel P. Berrange wrote:
[...]
Post by Daniel P. Berrange
Rather than mounting as a filesystem, you can also use NVDIMM directly
as a raw memory block, in which case it can contain whatever data format
you want - not merely a filesystem. With the right design, you could come
up with a format that let you store the role device metadata in a NVDIMM
and be able to update its contents on the fly for the guest during hotplug.
[...]

Maybe it's just me, but this begs for a (likely fairly trivial?)
kernel module exposing that data under /sys or /proc (at least for
*nix guests).
--
Jeremy Stanley
Daniel P. Berrange
2017-02-20 15:46:43 UTC
Permalink
Post by Jeremy Stanley
[...]
Post by Daniel P. Berrange
Rather than mounting as a filesystem, you can also use NVDIMM directly
as a raw memory block, in which case it can contain whatever data format
you want - not merely a filesystem. With the right design, you could come
up with a format that let you store the role device metadata in a NVDIMM
and be able to update its contents on the fly for the guest during hotplug.
[...]
Maybe it's just me, but this begs for a (likely fairly trivial?)
kernel module exposing that data under /sys or /proc (at least for
*nix guests).
The data is exposed either as a block device or as a character device
in Linux - which one depends on how the NVDIMM is configured. Once
you've opened the right device you can simply mmap() the FD and read the
data. So exposing it as a file under sysfs doesn't really buy you
anything better.
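
i.e. from the guest, something along these lines would do. The device
node and the "length prefix + JSON" layout are whatever format we'd end
up defining - nothing like this exists today:

    # Sketch: read role device metadata out of a guest-visible NVDIMM by
    # mapping a fixed-size window of the device. Assumes a 4-byte little
    # endian length prefix followed by JSON - a purely hypothetical layout.
    import json
    import mmap
    import os
    import struct

    def read_nvdimm_metadata(dev='/dev/pmem0', window=1024 * 1024):
        fd = os.open(dev, os.O_RDONLY)
        try:
            buf = mmap.mmap(fd, window, prot=mmap.PROT_READ)
            try:
                (length,) = struct.unpack_from('<I', buf, 0)
                return json.loads(buf[4:4 + length].decode('utf-8'))
            finally:
                buf.close()
        finally:
            os.close(fd)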

Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|
Jeremy Stanley
2017-02-20 16:50:44 UTC
Permalink
Post by Daniel P. Berrange
The data is exposed either as a block device or as a character device
in Linux - which one depends on how the NVDIMM is configured. Once
opening the right device you can simply mmap() the FD and read the
data. So exposing it as a file under sysfs doesn't really buy you
anything better.
Oh! Fair enough, if you can already access it as a character device
then I agree that solves the use cases I was considering.
--
Jeremy Stanley
Tim Bell
2017-02-20 17:07:53 UTC
Permalink
Is there cloud-init support for this mode or do we still need to mount as a config drive?

Tim
Post by Daniel P. Berrange
The data is exposed either as a block device or as a character device
in Linux - which one depends on how the NVDIMM is configured. Once
opening the right device you can simply mmap() the FD and read the
data. So exposing it as a file under sysfs doesn't really buy you
anything better.
Oh! Fair enough, if you can already access it as a character device
then I agree that solves the use cases I was considering.
--
Jeremy Stanley

Daniel P. Berrange
2017-02-20 17:22:50 UTC
Permalink
Post by Tim Bell
Is there cloud-init support for this mode or do we still need to mount as a config drive?
I don't think it particularly makes sense to expose the config drive
via NVDIMM - it wouldn't solve any of the problems that config drive
has today and it'd be less portable wrt guest OS.

Rather I was suggesting we should consider NVDIMM as a transport for
the role device tagging metadata standalone, as that could provide us
a way to live-update the metadata on the fly, which is impractical /
impossible when the metadata is hidden inside the config drive.

But before doing that though, I think it'd be worth understanding whether
metadata-over-vsock support would be acceptable to people who refuse
to deploy metadata-over-TCPIP today.

Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|
Artom Lifshitz
2017-02-20 17:46:09 UTC
Permalink
Post by Daniel P. Berrange
But before doing that though, I think it'd be worth understanding whether
metadata-over-vsock support would be acceptable to people who refuse
to deploy metadata-over-TCPIP today.
Sure, although I'm still concerned that it'll effectively make tagged
hotplug libvirt-only.
Artom Lifshitz
2017-02-20 17:54:10 UTC
Permalink
Post by Artom Lifshitz
Post by Daniel P. Berrange
But before doing that though, I think it'd be worth understanding whether
metadata-over-vsock support would be acceptable to people who refuse
to deploy metadata-over-TCPIP today.
Sure, although I'm still concerned that it'll effectively make tagged
hotplug libvirt-only.
Upon rethink, that's not strictly true; there's still the existing
metadata service that works across all hypervisor drivers. I know
we're far from feature parity across all virt drivers, but would
metadata-over-vsock be acceptable? That's not even lack of feature
parity, that's a specific feature being exposed in a different (and
arguably worse) way depending on the virt driver.
Daniel P. Berrange
2017-02-20 17:57:47 UTC
Permalink
Post by Artom Lifshitz
Post by Daniel P. Berrange
But before doing that though, I think it'd be worth understanding whether
metadata-over-vsock support would be acceptable to people who refuse
to deploy metadata-over-TCPIP today.
Sure, although I'm still concerned that it'll effectively make tagged
hotplug libvirt-only.
Well there's still the option of accessing the metadata server the
traditional way over IP which is fully portable. If some deployments
choose to opt out of this facility I don't necessarily think we need
to continue to invent further mechanisms. At some point you have to
say what's there is good enough and if people choose to trade off
features against some other criteria so be it.

Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|
Artom Lifshitz
2017-02-20 18:23:59 UTC
Permalink
Post by Daniel P. Berrange
But before doing that though, I think it'd be worth understanding whether
metadata-over-vsock support would be acceptable to people who refuse
to deploy metadata-over-TCPIP today.
I wrote a thing [1], let's see what happens.

[1] http://lists.openstack.org/pipermail/openstack-operators/2017-February/012724.html
Joshua Harlow
2017-02-27 17:20:58 UTC
Permalink
Not afaik, first time I've heard about this type of device/data.
Post by Tim Bell
Is there cloud-init support for this mode or do we still need to mount as a config drive?
Tim
Post by Daniel P. Berrange
The data is exposed either as a block device or as a character device
in Linux - which one depends on how the NVDIMM is configured. Once
opening the right device you can simply mmap() the FD and read the
data. So exposing it as a file under sysfs doesn't really buy you
anything better.
Oh! Fair enough, if you can already access it as a character device
then I agree that solves the use cases I was considering.
--
Jeremy Stanley
Artom Lifshitz
2017-02-27 15:30:33 UTC
Permalink
Post by Daniel P. Berrange
- virtio-vsock - think of this as UNIX domain sockets between the host and
guest. This is to deal with the valid use case of people wanting to use
a network protocol, but not wanting a real NIC exposed to the guest/host
for security concerns. As such I think it'd be useful to run the metadata
service over virtio-vsock as an option. It'd likely address at least some
people's security concerns wrt metadata service. It would also fix the
ability to use the metadata service in IPv6-only environments, as we would
not be using IP at all :-)
Is this currently exposed by libvirt? I had a look at [1] and couldn't
find any mention of 'vsock' or anything that resembles what you've
described.

[1] https://libvirt.org/formatdomain.html
Daniel P. Berrange
2017-02-27 15:33:26 UTC
Permalink
Post by Artom Lifshitz
Post by Daniel P. Berrange
- virtio-vsock - think of this as UNIX domain sockets between the host and
guest. This is to deal with the valid use case of people wanting to use
a network protocol, but not wanting a real NIC exposed to the guest/host
for security concerns. As such I think it'd be useful to run the metadata
service over virtio-vsock as an option. It'd likely address at least some
people's security concerns wrt metadata service. It would also fix the
ability to use the metadata service in IPv6-only environments, as we would
not be using IP at all :-)
Is this currently exposed by libvirt? I had a look at [1] and couldn't
find any mention of 'vsock' or anything that resembles what you've
described.
Not yet. The basic QEMU feature merged in 2.8.0, but we're still wiring
up various bits of userspace - e.g. selinux-policy, libvirt, NFS server,
and so on - to understand vsock.

Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|
Jay Pipes
2017-02-20 15:00:06 UTC
Permalink
Post by Artom Lifshitz
Early on in the inception of device role tagging, it was decided that
it's acceptable that the device metadata on the config drive lags
behind the metadata API, as long as it eventually catches up, for
example when the instance is rebooted and we get a chance to
regenerate the config drive.
So far this hasn't really been a problem because devices could only be
tagged at instance boot time, and the tags never changed. So the
config drive was pretty much always up to date.
In Pike the tagged device attachment series of patches [1] will
hopefully merge, and we'll be in a situation where device tags can
change during instance uptime, which makes it that much more important
to regenerate the config drive whenever we get a chance.
However, when the config drive is first generated, some of the
information stored in there is only available at instance boot time
and is not persisted anywhere, as far as I can tell. Specifically, the
injected_files and admin_pass parameters [2] are passed from the API
and are not stored anywhere.
This creates a problem when we want to regenerate the config drive,
because the information that we're supposed to put in it is no longer
available to us.
We could start persisting this information in instance_extra, for
example, and pulling it up when the config drive is regenerated. We
could even conceivably hack something to read the metadata files from
the "old" config drive before refreshing them with new information.
However, is that really worth it? I feel like saying "the config drive
is static, deal with it - if you want up-to-date metadata, use the
API" is an equally, if not more, valid option.
Yeah, config drive should, IMHO, be static, readonly. If you want to
change device tags or other configuration data after boot, use a
configuration management system or something like etcd watches. I don't
think Nova should be responsible for this.

Best,
-jay
Clint Byrum
2017-02-20 15:35:14 UTC
Permalink
Post by Jay Pipes
Post by Artom Lifshitz
Early on in the inception of device role tagging, it was decided that
it's acceptable that the device metadata on the config drive lags
behind the metadata API, as long as it eventually catches up, for
example when the instance is rebooted and we get a chance to
regenerate the config drive.
So far this hasn't really been a problem because devices could only be
tagged at instance boot time, and the tags never changed. So the
config drive was pretty much always up to date.
In Pike the tagged device attachment series of patches [1] will
hopefully merge, and we'll be in a situation where device tags can
change during instance uptime, which makes it that much more important
to regenerate the config drive whenever we get a chance.
However, when the config drive is first generated, some of the
information stored in there is only available at instance boot time
and is not persisted anywhere, as far as I can tell. Specifically, the
injected_files and admin_pass parameters [2] are passed from the API
and are not stored anywhere.
This creates a problem when we want to regenerate the config drive,
because the information that we're supposed to put in it is no longer
available to us.
We could start persisting this information in instance_extra, for
example, and pulling it up when the config drive is regenerated. We
could even conceivably hack something to read the metadata files from
the "old" config drive before refreshing them with new information.
However, is that really worth it? I feel like saying "the config drive
is static, deal with it - if you want up-to-date metadata, use the
API" is an equally, if not more, valid option.
Yeah, config drive should, IMHO, be static, readonly. If you want to
change device tags or other configuration data after boot, use a
configuration management system or something like etcd watches. I don't
think Nova should be responsible for this.
I tend to agree with you, and I personally wouldn't write apps that need
this. However, in the interest of understanding the desire to change this,
I think the scenario is this:

1) Servers are booted with {n_tagged_devices} and come up, actions happen
using an automated thing that reads device tags and reacts accordingly.

2) A new device is added to the general configuration.

3) New servers configure themselves with the new devices automatically. But
existing servers do not have those device tags in their config drive. In
order to configure these, one would now have to write a fair amount of
orchestration to duplicate what already exists for new servers.

While I'm a big fan of the cattle approach (just delete those old
servers!) I don't think OpenStack is constrained enough to say that
this is always going to be efficient. And writing two paths for server
configuration feels like repeating yourself.

I don't have a perfect answer to this, but I don't think "just don't
do that" is sufficient as a response. We allowed the tags in config
drive. We have to deal with the unintended consequences of that decision.
Artom Lifshitz
2017-02-20 15:46:09 UTC
Permalink
I don't think we're trying to re-invent configuration management in
Nova. We have this problem where we want to communicate to the guest,
from the host, a bunch of dynamic metadata that can change throughout
the guest's lifetime. We currently have two possible avenues for this
already in place, and both have problems:

1. The metadata service isn't universally deployed by operators for
security and other reasons.
2. The config drive was never designed for dynamic metadata.

So far in this thread we've mostly been discussing ways to shoehorn a
solution into the config drive avenue, but that's going to be ugly no
matter what because it was never designed for what we're trying to do
in the first place.

Some folks are saying that we should admit that the config drive is only for
static information and metadata that is known at boot time, and work
on a third way to communicate dynamic metadata to the guest. I can get
behind that 100%. I like the virtio-vsock option, but that's only
supported by libvirt IIUC. We've got device tagging support in hyper-v
as well, and xenapi hopefully on the way soon [1], so we need
something a bit more universal. How about fixing up the metadata
service to be more deployable, both in terms of security, and IPv6
support?

[1] https://review.openstack.org/#/c/333781/
Post by Clint Byrum
Post by Jay Pipes
Post by Artom Lifshitz
Early on in the inception of device role tagging, it was decided that
it's acceptable that the device metadata on the config drive lags
behind the metadata API, as long as it eventually catches up, for
example when the instance is rebooted and we get a chance to
regenerate the config drive.
So far this hasn't really been a problem because devices could only be
tagged at instance boot time, and the tags never changed. So the
config drive was pretty much always up to date.
In Pike the tagged device attachment series of patches [1] will
hopefully merge, and we'll be in a situation where device tags can
change during instance uptime, which makes it that much more important
to regenerate the config drive whenever we get a chance.
However, when the config drive is first generated, some of the
information stored in there is only available at instance boot time
and is not persisted anywhere, as far as I can tell. Specifically, the
injected_files and admin_pass parameters [2] are passed from the API
and are not stored anywhere.
This creates a problem when we want to regenerate the config drive,
because the information that we're supposed to put in it is no longer
available to us.
We could start persisting this information in instance_extra, for
example, and pulling it up when the config drive is regenerated. We
could even conceivably hack something to read the metadata files from
the "old" config drive before refreshing them with new information.
However, is that really worth it? I feel like saying "the config drive
is static, deal with it - if you want up-to-date metadata, use the
API" is an equally, if not more, valid option.
Yeah, config drive should, IMHO, be static, readonly. If you want to
change device tags or other configuration data after boot, use a
configuration management system or something like etcd watches. I don't
think Nova should be responsible for this.
I tend to agree with you, and I personally wouldn't write apps that need
this. However, in the interest of understanding the desire to change this,
1) Servers are booted with {n_tagged_devices} and come up, actions happen
using automated thing that reads device tags and reacts accordingly.
2) A new device is added to the general configuration.
3) New servers configure themselves with the new devices automatically. But
existing servers do not have those device tags in their config drive. In
order to configure these, one would now have to write a fair amount of
orchestration to duplicate what already exists for new servers.
While I'm a big fan of the cattle approach (just delete those old
servers!) I don't think OpenStack is constrained enough to say that
this is always going to be efficient. And writing two paths for server
configuration feels like repeating yourself.
I don't have a perfect answer to this, but I don't think "just don't
do that" is sufficient as a response. We allowed the tags in config
drive. We have to deal with the unintended consequences of that decision.
--
--
Artom Lifshitz
Daniel P. Berrange
2017-02-20 15:59:59 UTC
Permalink
Post by Artom Lifshitz
I don't think we're trying to re-invent configuration management in
Nova. We have this problem where we want to communicate to the guest,
from the host, a bunch of dynamic metadata that can change throughout
the guest's lifetime. We currently have two possible avenues for this
1. The metadata service isn't universally deployed by operators for
security and other reasons.
2. The config drive was never designed for dynamic metadata.
So far in this thread we've mostly been discussing ways to shoehorn a
solution into the config drive avenue, but that's going to be ugly no
matter what because it was never designed for what we're trying to do
in the first place.
Some folks are saying that we admit that the config drive is only for
static information and metadata that is known at boot time, and work
on a third way to communicate dynamic metadata to the guest. I can get
behind that 100%. I like the virtio-vsock option, but that's only
supported by libvirt IIUC. We've got device tagging support in hyper-v
as well, and xenapi hopefully on the way soon [1], so we need
something a bit more universal. How about fixing up the metadata
service to be more deployable, both in terms of security, and IPv6
support?
FYI, virtio-vsock is not actually libvirt specific. The VSOCK sockets
transport was in fact invented by VMware and first merged into Linux
in 2013 as a VMware guest driver.

A mapping of the VSOCK protocol over virtio was later defined to enable
VSOCK to be used with QEMU, KVM and Xen, all of which support virtio.
The intention was explicitly that applications consuming VSOCK in the
guest would be portable between KVM & VMware.

That said I don't think it is available via XenAPI, and doubt Hyper-V
will support it any time soon, but it is nonetheless a portable
standard if HVs decide they want such a feature.

Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|