Matt Young
2018-05-13 20:09:46 UTC
Re: resolving network latency issue on the promotion server in
tripleo-infra tenant, that's great news!
Re: retrospective on this class of issue, I'll reach out directly early
this week to get something on the calendar for our two teams. We clearly
need to brainstorm/hash out together how we can reduce the turbulence
moving forward.
In addition, as a result of working these issues over the past few days
we've identified a few pieces of low hanging (tooling) fruit that are ripe
for improvements that will speed diagnosis / debug in the future.
We'll capture these as RFEs and get them into our backlog.
Matt
2. Shortly after #1 was resolved CentOS released 7.5 which comes
directly into the upstream repos untested and ungated. Additionally the
associated qcow2 image and container-base images were not updated at the
same time as the yum repos. https://bugs.launchpad.net/tripleo/+bug/1770355
Why do we have this situation every time the OS is upgraded to a major
version? Can't we test the image before actually using it? We could have
experimental jobs testing latest image and pin gate images to a specific
one?
Like we could configure infra to deploy centos 7.4 in our gate and 7.5 in
experimental, so we can take our time to fix eventual problems and make the
switch when we're ready, instead of dealing with fires (that usually come
all together).
It would be great to make a retrospective on this thing between tripleo
ci & infra folks, and see how we can improve things.
I agree,
In coordination with the infra team, we need to be able to pin / lock
content for production check and gate jobs while also having the ability to
stage new content, e.g. centos 7.5, with experimental or periodic jobs.
In this particular case the ci team did check the tripleo deployment w/
centos 7.5 updates; however, we did not stage or test what impact the centos
minor update would have on the upstream job workflow.
The key issue is that the base centos image used upstream cannot be
pinned by the ci team. If, say, we could pin that image, the ci team could pin
the centos repos used in ci and run staging jobs on the latest centos
content.
I'm glad that you also see the need for some amount of coordination here;
I've been in contact with a few folks to initiate the conversation.
On an unrelated note, Sagi and I just fixed the network latency issue on
our promotion server; it was related to DNS. Automatic promotions should
be back online.
Thanks all.
--
Emilien Macchi
____________________________________________________________
______________
OpenStack Development Mailing List (not for usage questions)
unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev