Discussion:
[openstack-dev] [octavia] Sometimes amphoras are not re-created if they are not reached for more than heartbeat_timeout
m***@orange.com
2018-04-25 11:07:31 UTC
Hello,

I am testing Octavia Queens and I see that the failover behavior is very different from the one in Ocata (the version we currently run in production).
One example of such behavior:

I create 4 load balancers and, after the creation succeeds, I shut off all 8 amphoras. Sometimes, even though the health-manager can no longer reach the amphoras, they are not deleted and re-created. The logs look like the ones shown below, even long after the heartbeat timeout has passed. Sometimes the amphoras are deleted and re-created. Sometimes they are only partially re-created, and some of them remain shut off.
heartbeat_timeout is set to 60 seconds.



[octavia-health-manager-3662231220-nxnt3] 2018-04-25 10:57:26.244 11 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [req-339b54a7-ab0c-422a-832f-a444cd710497 - a5f15235c0714365b98a50a11ec956e7 - - -] Could not connect to instance. Retrying.: ConnectionError: HTTPSConnectionPool(host='192.168.0.15', port=9443): Max retries exceeded with url: /0.5/listeners/285ad342-5582-423e-b654-1f0b50d91fb2/certificates/octaviasrv2.orange.com.pem (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f559862c710>: Failed to establish a new connection: [Errno 113] No route to host',))
[octavia-health-manager-3662231220-3lssd] 2018-04-25 10:57:26.464 13 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [req-a63b795a-4b4f-4b90-a201-a4c9f49ac68b - a5f15235c0714365b98a50a11ec956e7 - - -] Could not connect to instance. Retrying.: ConnectionError: HTTPSConnectionPool(host='192.168.0.14', port=9443): Max retries exceeded with url: /0.5/listeners/a45bdef3-e7da-4a18-9f1f-53d5651efe0f/1615c1ec-249e-4fa8-9d73-2397e281712c/haproxy (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f8a0de95e10>: Failed to establish a new connection: [Errno 113] No route to host',))
[octavia-health-manager-3662231220-nxnt3] 2018-04-25 10:57:27.772 11 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [req-10febb10-85ea-4082-9df7-daa48894b004 - a5f15235c0714365b98a50a11ec956e7 - - -] Could not connect to instance. Retrying.: ConnectionError: HTTPSConnectionPool(host='192.168.0.19', port=9443): Max retries exceeded with url: /0.5/listeners/96ce5862-d944-46cb-8809-e1e328268a66/fc5b7940-3527-4e9b-b93f-1da3957a5b71/haproxy (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f5598491c90>: Failed to establish a new connection: [Errno 113] No route to host',))
[octavia-health-manager-3662231220-nxnt3] 2018-04-25 10:57:34.252 11 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [req-339b54a7-ab0c-422a-832f-a444cd710497 - a5f15235c0714365b98a50a11ec956e7 - - -] Could not connect to instance. Retrying.: ConnectionError: HTTPSConnectionPool(host='192.168.0.15', port=9443): Max retries exceeded with url: /0.5/listeners/285ad342-5582-423e-b654-1f0b50d91fb2/certificates/octaviasrv2.orange.com.pem (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f5598520790>: Failed to establish a new connection: [Errno 113] No route to host',))
[octavia-health-manager-3662231220-3lssd] 2018-04-25 10:57:34.476 13 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [req-a63b795a-4b4f-4b90-a201-a4c9f49ac68b - a5f15235c0714365b98a50a11ec956e7 - - -] Could not connect to instance. Retrying.: ConnectionError: HTTPSConnectionPool(host='192.168.0.14', port=9443): Max retries exceeded with url: /0.5/listeners/a45bdef3-e7da-4a18-9f1f-53d5651efe0f/1615c1ec-249e-4fa8-9d73-2397e281712c/haproxy (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f8a0de953d0>: Failed to establish a new connection: [Errno 113] No route to host',))
[octavia-health-manager-3662231220-nxnt3] 2018-04-25 10:57:35.780 11 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [req-10febb10-85ea-4082-9df7-daa48894b004 - a5f15235c0714365b98a50a11ec956e7 - - -] Could not connect to instance. Retrying.: ConnectionError: HTTPSConnectionPool(host='192.168.0.19', port=9443): Max retries exceeded with url: /0.5/listeners/96ce5862-d944-46cb-8809-e1e328268a66/fc5b7940-3527-4e9b-b93f-1da3957a5b71/haproxy (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f55984e2050>: Failed to establish a new connection: [Errno 113] No route to host',))
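To see at a glance which amphora addresses are being retried and over what time span, warnings like the ones above can be summarized per host. The following is only a minimal sketch: the regular expression is an assumption based on the format of the lines in this message, and the sample lines are shortened copies of three of them (cut after "Max retries exceeded"). The names LINE_RE, summarize and SAMPLE are hypothetical and not part of Octavia.

```python
import re

# Assumed line layout, taken from the warnings quoted in this message:
# [pod] timestamp pid WARNING module [request context] Could not connect ... host='IP', port=N
LINE_RE = re.compile(
    r"^\[(?P<pod>[^\]]+)\]\s+"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\s+.*?"
    r"Could not connect to instance\..*?"
    r"host='(?P<host>[^']+)', port=(?P<port>\d+)"
)


def summarize(lines):
    """Return {host: (retry_count, first_timestamp, last_timestamp)}.

    Lines are assumed to be in chronological order, as in a log file.
    Lines that do not match the expected layout are skipped.
    """
    summary = {}
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue
        host, ts = match.group("host"), match.group("ts")
        count, first, _ = summary.get(host, (0, ts, ts))
        summary[host] = (count + 1, first, ts)
    return summary


# Shortened copies of three warnings from this message.
SAMPLE = [
    "[octavia-health-manager-3662231220-nxnt3] 2018-04-25 10:57:26.244 11 WARNING "
    "octavia.amphorae.drivers.haproxy.rest_api_driver "
    "[req-339b54a7-ab0c-422a-832f-a444cd710497 - a5f15235c0714365b98a50a11ec956e7 - - -] "
    "Could not connect to instance. Retrying.: ConnectionError: "
    "HTTPSConnectionPool(host='192.168.0.15', port=9443): Max retries exceeded",
    "[octavia-health-manager-3662231220-3lssd] 2018-04-25 10:57:26.464 13 WARNING "
    "octavia.amphorae.drivers.haproxy.rest_api_driver "
    "[req-a63b795a-4b4f-4b90-a201-a4c9f49ac68b - a5f15235c0714365b98a50a11ec956e7 - - -] "
    "Could not connect to instance. Retrying.: ConnectionError: "
    "HTTPSConnectionPool(host='192.168.0.14', port=9443): Max retries exceeded",
    "[octavia-health-manager-3662231220-nxnt3] 2018-04-25 10:57:34.252 11 WARNING "
    "octavia.amphorae.drivers.haproxy.rest_api_driver "
    "[req-339b54a7-ab0c-422a-832f-a444cd710497 - a5f15235c0714365b98a50a11ec956e7 - - -] "
    "Could not connect to instance. Retrying.: ConnectionError: "
    "HTTPSConnectionPool(host='192.168.0.15', port=9443): Max retries exceeded",
]

if __name__ == "__main__":
    for host, (count, first, last) in sorted(summarize(SAMPLE).items()):
        print(f"{host}: {count} retries between {first} and {last}")
```

Feeding the full health-manager log through summarize() should make it easier to tell whether the retries target the shut-off amphoras or their replacements.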

Thank you,
Mihaela Balas

_________________________________________________________________________________________________________________________

This message and its attachments may contain confidential or privileged information that may be protected by law;
they should not be distributed, used or copied without authorisation.
If you have received this email in error, please notify the sender and delete this message and its attachments.
As emails may be altered, Orange is not liable for messages that have been modified, changed or falsified.
Thank you.
Michael Johnson
2018-04-27 17:24:27 UTC
Hi Mihaela,

I am sorry to hear you are having trouble with the Queens release of
Octavia. It is true that a lot of work has gone into the failover
capability, specifically working around a Python threading issue and
making it more resistant to certain Neutron failure situations
(missing ports, etc.).

I know of one open bug against the failover flows,
https://storyboard.openstack.org/#!/story/2001481, "failover breaks in
Active/Standby mode if both amphroae are down".

Unfortunately, the log snippet above does not give me enough
information to diagnose the problem. From the
snippet it looks like the failovers were initiated, but the
controllers are unable to reach the amphora-agent on the replacement
amphora. The controllers will continue those retry attempts, but will
eventually fail the amphora into ERROR if they do not succeed.

One thought: if you created your amphora image in the last two
weeks, you may have built it from the master branch of
octavia, which had a bug that impacted active/standby images. The bug
was introduced while working around the new pip 10 issues, and it has
since been fixed: https://review.openstack.org/#/c/564371/

If neither of these situations matches your environment, please open a
story (https://storyboard.openstack.org/#!/dashboard/stories) for us
and include the health manager logs from the point you delete the
amphora up until it starts these connection attempts. We will dig
through those logs to see what the issue might be.

Michael (johnsom)
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
m***@orange.com
2018-05-03 08:51:50 UTC
Hi Michael,

I built a new amphora image with the latest patches and reproduced two different bugs that I see in my environment. One of them is similar to the one initially described in this thread. I opened two stories as you advised:

https://storyboard.openstack.org/#!/story/2001960
https://storyboard.openstack.org/#!/story/2001955

Meanwhile, can you recommend values for the following parameters (perhaps in relation to the number of workers, cores, computes, etc.)?

[health_manager]
failover_threads
status_update_threads

[haproxy_amphora]
build_rate_limit
build_active_retries

[controller_worker]
workers
amp_active_retries
amp_active_wait_sec

[task_flow]
max_workers
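Before tuning these, it may help to check which of them are actually set in the deployed configuration file. The sketch below reads the sections and options listed above with Python's standard configparser. It is only an illustration: the values in SAMPLE_CONF are placeholders and not recommendations, the names TUNABLES, read_tunables and SAMPLE_CONF are hypothetical, and configparser is used here as a stand-in for however Octavia itself loads its configuration.

```python
import configparser

# Section and option names are the ones listed in this message.
TUNABLES = {
    "health_manager": ["failover_threads", "status_update_threads"],
    "haproxy_amphora": ["build_rate_limit", "build_active_retries"],
    "controller_worker": ["workers", "amp_active_retries", "amp_active_wait_sec"],
    "task_flow": ["max_workers"],
}


def read_tunables(conf_text):
    """Return {(section, option): value or None} for the options above.

    None means the option is not set in the given text, so the service
    would fall back to its built-in default.
    """
    # strict=False tolerates repeated sections or options in the file.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    values = {}
    for section, options in TUNABLES.items():
        for option in options:
            values[(section, option)] = parser.get(section, option, fallback=None)
    return values


# Placeholder values for illustration only; these are NOT recommendations.
SAMPLE_CONF = """
[health_manager]
failover_threads = 10

[controller_worker]
workers = 2
amp_active_wait_sec = 10
"""

if __name__ == "__main__":
    for (section, option), value in read_tunables(SAMPLE_CONF).items():
        shown = value if value is not None else "(not set, default applies)"
        print(f"[{section}] {option} = {shown}")
```

To inspect a real deployment, the file contents can be passed in instead of SAMPLE_CONF, for example read_tunables(open(path).read()), where path points at the octavia.conf in use.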

Thank you for your help,
Mihaela Balas

Michael Johnson
2018-05-04 21:27:53 UTC