Discussion:
[openstack-dev] Do the OpenStack CI VMs start from a clean state each time?
linghucongsong
2018-05-04 08:13:41 UTC
Hi all!

Recently we met a strange problem in our CI. See this link: https://review.openstack.org/#/c/532097/

We can pass the CI the first time, but when the gate job starts, it always fails the second time.

We have rebased several times; it always passes the CI the first time and fails the second time.

This has not happened before, which makes me wonder: do we really start the CI on fresh VMs each time?
Sean McGinnis
2018-05-04 09:37:46 UTC
Post by linghucongsong
Hi all!
Recently we met a strange problem in our CI. See this link: https://review.openstack.org/#/c/532097/
We can pass the CI the first time, but when the gate job starts, it always fails the second time.
We have rebased several times; it always passes the CI the first time and fails the second time.
This has not happened before, which makes me wonder: do we really start the CI on fresh VMs each time?
A new VM is spun up for each test run, so I don't believe this is an issue with
stale artifacts on the host. I would guess this is more likely some sort of
race condition, and you just happen to be hitting it 50% of the time.

You can probably keep rechecking the patch until you get lucky, but it would be
dangerous to do that without knowing the source of the failure: you will most
likely just keep running into it in future patches.

The failure looks very odd, with the test expecting a status to be returned but
instead getting a ServerDetail object:

http://logs.openstack.org/97/532097/16/gate/legacy-tricircle-dsvm-multiregion/ad547d5/job-output.txt.gz#_2018-05-04_03_57_05_399493
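To make the shape of that failure concrete, here is a minimal, purely hypothetical sketch (none of these names come from the tricircle test code): a helper that is expected to return a bare status string instead hands back the whole server-detail object, so any comparison against 'ACTIVE' can never match.

class ServerDetail(object):
    """Stand-in for the server detail object the API client returns."""
    def __init__(self, server_id, status):
        self.id = server_id
        self.status = status

class FakeClient(object):
    def show_server(self, server_id):
        return ServerDetail(server_id, 'ACTIVE')

def get_server_status(client, server_id):
    # Bug sketch: returns the whole ServerDetail instead of its .status field.
    return client.show_server(server_id)

def check_active(client, server_id):
    result = get_server_status(client, server_id)
    # This compares ServerDetail == 'ACTIVE', which is never true, so the
    # check fails even though the server itself may be fine.
    if result != 'ACTIVE':
        raise AssertionError('expected a status string, got %r' % (result,))

try:
    check_active(FakeClient(), 'fake-server-id')
except AssertionError as exc:
    print('Reproduced the mismatch: %s' % exc)

If the real test has this kind of mismatch, the fix would be to compare against result.status (or have the helper return only the status).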


Clark Boylan
2018-05-04 15:52:29 UTC
Post by Sean McGinnis
Post by linghucongsong
Hi all!
Recently we met a strange problem in our CI. See this link: https://review.openstack.org/#/c/532097/
We can pass the CI the first time, but when the gate job starts, it always fails the second time.
We have rebased several times; it always passes the CI the first time and fails the second time.
This has not happened before, which makes me wonder: do we really start the CI on fresh VMs each time?
A new VM is spun up for each test run, so I don't believe this is an issue with
stale artifacts on the host. I would guess this is more likely some sort of
race condition, and you just happen to be hitting it 50% of the time.
Additionally, you can check the job logs to see that while these two jobs did run against the same cloud provider, they did so in different regions, on hosts with completely different IP addresses. The inventory files [0][1] are where I would start if you suspect oddness of this sort. Reading them, I don't see anything to indicate the nodes were reused.

[0] http://logs.openstack.org/97/532097/16/check/legacy-tricircle-dsvm-multiregion/c9b3d29/zuul-info/inventory.yaml
[1] http://logs.openstack.org/97/532097/16/gate/legacy-tricircle-dsvm-multiregion/ad547d5/zuul-info/inventory.yaml
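If you want to do that comparison mechanically rather than by eye, a small script along these lines can pull the host name, IP, provider and region out of each inventory file. The key layout used here (all.hosts.*, ansible_host, nodepool.provider/region) is an assumption about what zuul-info/inventory.yaml currently contains, so adjust it if the files differ:

import sys
import yaml  # PyYAML

def summarize(path):
    # Print one line per node recorded in a zuul-info/inventory.yaml file.
    with open(path) as f:
        inventory = yaml.safe_load(f)
    for name, host in inventory['all']['hosts'].items():
        nodepool = host.get('nodepool', {})
        print('%s: ip=%s provider=%s region=%s' % (
            name, host.get('ansible_host'),
            nodepool.get('provider'), nodepool.get('region')))

if __name__ == '__main__':
    # Usage: python compare_inventory.py check-inventory.yaml gate-inventory.yaml
    for path in sys.argv[1:]:
        print('--- %s' % path)
        summarize(path)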

Clark
Vega Cai
2018-05-06 10:20:47 UTC
To test whether it's our new patch that causes the problem, I submitted a
dummy patch[1] to trigger the CI, and the CI failed again. Checking the
nova-scheduler log, it's very strange that the scheduling starts with 0 hosts
at the beginning.

(Full log: http://logs.openstack.org/59/566459/1/check/legacy-tricircle-dsvm-multiregion/f2e2510/logs/screen-n-sch.txt.gz)

May 06 09:40:34.358585 ubuntu-xenial-inap-mtl01-0003885152 nova-scheduler[21962]: DEBUG oslo_service.periodic_task [None req-008ee30a-47a1-40a2-bf64-cb0f1719806e None None] Running periodic task SchedulerManager._run_periodic_tasks {{(pid=23795) run_periodic_tasks /usr/local/lib/python2.7/dist-packages/oslo_service/periodic_task.py:215}}
May 06 09:41:23.968029 ubuntu-xenial-inap-mtl01-0003885152 nova-scheduler[21962]: DEBUG nova.scheduler.manager [None req-c67986fa-2e3b-45b7-96dd-196704945b95 admin admin] *Starting to schedule for instances: [u'8b227e85-8959-4e07-be3d-1bc094c115c1']* {{(pid=23795) select_destinations /opt/stack/new/nova/nova/scheduler/manager.py:118}}
May 06 09:41:23.969293 ubuntu-xenial-inap-mtl01-0003885152 nova-scheduler[21962]: DEBUG oslo_concurrency.lockutils [None req-c67986fa-2e3b-45b7-96dd-196704945b95 admin admin] Lock "placement_client" acquired by "nova.scheduler.client.report._create_client" :: waited 0.000s {{(pid=23795) inner /usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:273}}
May 06 09:41:23.975304 ubuntu-xenial-inap-mtl01-0003885152 nova-scheduler[21962]: DEBUG oslo_concurrency.lockutils [None req-c67986fa-2e3b-45b7-96dd-196704945b95 admin admin] Lock "placement_client" released by "nova.scheduler.client.report._create_client" :: held 0.006s {{(pid=23795) inner /usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:285}}
May 06 09:41:24.276470 ubuntu-xenial-inap-mtl01-0003885152 nova-scheduler[21962]: DEBUG oslo_concurrency.lockutils [None req-c67986fa-2e3b-45b7-96dd-196704945b95 admin admin] Lock "6e118c71-9008-4694-8aee-faa607944c5f" acquired by "nova.context.get_or_set_cached_cell_and_set_connections" :: waited 0.000s {{(pid=23795) inner /usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:273}}
May 06 09:41:24.279331 ubuntu-xenial-inap-mtl01-0003885152 nova-scheduler[21962]: DEBUG oslo_concurrency.lockutils [None req-c67986fa-2e3b-45b7-96dd-196704945b95 admin admin] Lock "6e118c71-9008-4694-8aee-faa607944c5f" released by "nova.context.get_or_set_cached_cell_and_set_connections" :: held 0.003s {{(pid=23795) inner /usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:285}}
May 06 09:41:24.302854 ubuntu-xenial-inap-mtl01-0003885152 nova-scheduler[21962]: DEBUG oslo_db.sqlalchemy.engines [None req-c67986fa-2e3b-45b7-96dd-196704945b95 admin admin] MySQL server mode set to STRICT_TRANS_TABLES,STRICT_ALL_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,TRADITIONAL,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION {{(pid=23795) _check_effective_sql_mode /usr/local/lib/python2.7/dist-packages/oslo_db/sqlalchemy/engines.py:308}}
May 06 09:41:24.321713 ubuntu-xenial-inap-mtl01-0003885152 nova-scheduler[21962]: DEBUG nova.filters [None req-c67986fa-2e3b-45b7-96dd-196704945b95 admin admin] *Starting with 0 host(s)* {{(pid=23795) get_filtered_objects /opt/stack/new/nova/nova/filters.py:70}}
May 06 09:41:24.322136 ubuntu-xenial-inap-mtl01-0003885152 nova-scheduler[21962]: INFO nova.filters [None req-c67986fa-2e3b-45b7-96dd-196704945b95 admin admin] Filter RetryFilter returned 0 hosts
May 06 09:41:24.322614 ubuntu-xenial-inap-mtl01-0003885152 nova-scheduler[21962]: DEBUG nova.filters [None req-c67986fa-2e3b-45b7-96dd-196704945b95 admin admin] Filtering removed all hosts for the request with instance ID '8b227e85-8959-4e07-be3d-1bc094c115c1'. Filter results: [('RetryFilter', None)] {{(pid=23795) get_filtered_objects /opt/stack/new/nova/nova/filters.py:129}}
May 06 09:41:24.323029 ubuntu-xenial-inap-mtl01-0003885152 nova-scheduler[21962]: INFO nova.filters [None req-c67986fa-2e3b-45b7-96dd-196704945b95 admin admin] Filtering removed all hosts for the request with instance ID '8b227e85-8959-4e07-be3d-1bc094c115c1'. Filter results: ['RetryFilter: (start: 0, end: 0)']
May 06 09:41:24.323419 ubuntu-xenial-inap-mtl01-0003885152 nova-scheduler[21962]: DEBUG nova.scheduler.filter_scheduler [None req-c67986fa-2e3b-45b7-96dd-196704945b95 admin admin] Filtered [] {{(pid=23795) _get_sorted_hosts /opt/stack/new/nova/nova/scheduler/filter_scheduler.py:404}}
May 06 09:41:24.323861 ubuntu-xenial-inap-mtl01-0003885152 nova-scheduler[21962]: DEBUG nova.scheduler.filter_scheduler [None req-c67986fa-2e3b-45b7-96dd-196704945b95 admin admin] There are 0 hosts available but 1 instances requested to build. {{(pid=23795) _ensure_sufficient_hosts /opt/stack/new/nova/nova/scheduler/filter_scheduler.py:278}}
May 06 09:41:26.358317 ubuntu-xenial-inap-mtl01-0003885152 nova-scheduler[21962]: DEBUG oslo_service.periodic_task [None req-008ee30a-47a1-40a2-bf64-cb0f1719806e None None] Running periodic task SchedulerManager._run_periodic_tasks {{(pid=23794) run_periodic_tasks /usr/local/lib/python2.7/dist-packages/oslo_service/periodic_task.py:215}}


I copied the log between two periodic-task records above to show one whole
scheduling process.
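
One thing worth noting when reading those lines: "Starting with 0 host(s)" is logged before any filter runs, so the host list handed to the filter handler was already empty and RetryFilter did not actually remove anything. A simplified sketch of that logging flow (not nova's real code, just an illustration of how the "Starting with N host(s)" / "Filter X returned N hosts" lines come about):

class RetryFilter(object):
    """Toy stand-in for the real RetryFilter: drop hosts that already failed."""
    def filter_all(self, hosts, spec):
        return [h for h in hosts if h not in spec.get('retry_hosts', [])]

def get_filtered_hosts(hosts, spec, filters):
    print('Starting with %d host(s)' % len(hosts))  # 0 here => empty input list
    for f in filters:
        hosts = list(f.filter_all(hosts, spec))
        print('Filter %s returned %d hosts' % (type(f).__name__, len(hosts)))
        if not hosts:
            break
    return hosts

# With an empty input list the output mirrors the log above: 0 in, 0 out.
get_filtered_hosts([], {'retry_hosts': []}, [RetryFilter()])

So the question is probably why the scheduler had no candidate compute hosts at all for that request in the second run (for example, compute services or placement resource providers not registered yet), rather than anything a filter did.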

Zhiyuan
Post by Sean McGinnis
Post by linghucongsong
Hi all!
Recently we met a strange problem in our CI. See this link: https://review.openstack.org/#/c/532097/
We can pass the CI the first time, but when the gate job starts, it always fails the second time.
We have rebased several times; it always passes the CI the first time and fails the second time.
This has not happened before, which makes me wonder: do we really start the CI on fresh VMs each time?
Post by Sean McGinnis
A new VM is spun up for each test run, so I don't believe this is an issue with
stale artifacts on the host. I would guess this is more likely some sort of
race condition, and you just happen to be hitting it 50% of the time.
Additionally, you can check the job logs to see that while these two jobs did
run against the same cloud provider, they did so in different regions, on
hosts with completely different IP addresses. The inventory files [0][1]
are where I would start if you suspect oddness of this sort. Reading them, I
don't see anything to indicate the nodes were reused.
[0]
http://logs.openstack.org/97/532097/16/check/legacy-tricircle-dsvm-multiregion/c9b3d29/zuul-info/inventory.yaml
[1]
http://logs.openstack.org/97/532097/16/gate/legacy-tricircle-dsvm-multiregion/ad547d5/zuul-info/inventory.yaml
Clark
--
BR
Zhiyuan