When I started using Ansible, runs became slow as the amount of configuration grew. I tracked down the cause by isolating it step by step, so here are my notes on what to check. (CentOS 7.3, ansible 22.214.171.124)
The cause turned out to be waiting on DNS timeouts. Even if you specify localhost or an IP address instead of an FQDN, a PTR record (reverse lookup) query is still sent.
- Isolation by tcpdump
- Difference specified by Ansible hosts file
- Eliminate an operation delay of Ansible
- Conclusion - Ansible slow/delayed reason and settings to be confirmed
Isolation by tcpdump
ansible-playbook seems to stall for a fixed amount of time on every run. When something is slow by a consistent amount, it is usually waiting for a timeout, and that timeout is often DNS. Check the behavior with tcpdump.
First, check the interface information.
$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
...
2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
...
Capture only DNS packets (port 53) on the interface you confirmed. The -nn option disables the conversion of addresses and well-known port numbers into names. Without it, port 53 is displayed as domain, which is harder to read. (The following is the sequence captured when resolution works normally.)
$ sudo tcpdump -i ens192 -nn port 53
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens192, link-type EN10MB (Ethernet), capture size 65535 bytes
07:30:16.609964 IP 192.168.1.75.40927 > 192.168.1.77.53: 19540+ A? h-cent-mng01.designet.local. (45)
07:30:16.610591 IP 192.168.1.77.53 > 192.168.1.75.40927: 19540* 1/0/0 A 192.168.1.75 (61)
07:30:16.610660 IP 192.168.1.75.40927 > 192.168.1.77.53: 1216+ AAAA? h-cent-mng01.designet.local. (45)
07:30:16.611196 IP 192.168.1.77.53 > 192.168.1.75.40927: 1216* 0/1/0 (106)
07:30:16.822439 IP 192.168.1.75.43660 > 192.168.1.77.53: 18414+ PTR? 75.1.168.192.in-addr.arpa. (43)
07:30:16.822750 IP 192.168.1.77.53 > 192.168.1.75.43660: 18414* 1/0/0 PTR h-cent-mng01.designet.local. (84)
From the results, you can see that DNS queries are sent for each of the A, AAAA, and PTR records.
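To reproduce these three lookups by hand and see which one is slow, something like the following sketch may help. It assumes dig (from bind-utils on CentOS) is installed; the function names ptr_name, ask, and lookup_times are made up for this example. The PTR query name is simply the IPv4 octets reversed with in-addr.arpa appended.

```shell
#!/bin/sh
# Sketch only: replay the A, AAAA and PTR lookups with a short timeout.
# Assumes dig is installed; function names are illustrative.

# Build the PTR query name for an IPv4 address by reversing its octets.
ptr_name() {
    echo "$1" | awk -F. '{ print $4 "." $3 "." $2 "." $1 ".in-addr.arpa" }'
}

# Send one query (name, type) and keep only the status, the elapsed
# time, and any timeout message from dig's output.
ask() {
    echo "== $2 $1"
    dig +time=2 +tries=1 "$1" "$2" | grep -E 'status:|Query time:|timed out'
}

# usage: lookup_times <hostname> <ipv4-address>
lookup_times() {
    ask "$1" A
    ask "$1" AAAA
    ask "$(ptr_name "$2")" PTR
}
```

For the host in the capture, this would be called as lookup_times h-cent-mng01.designet.local 192.168.1.75. A query that prints a timeout message instead of a status line is the likely source of the delay.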
Difference specified by Ansible hosts file
When thinking about DNS problems, it is natural to assume they only matter when hosts are specified by FQDN. However, Ansible tries to resolve each record even when the target servers are specified as localhost or by IP address. As a result, the delay tends to show up in exactly the case of "it is not registered in DNS yet, so I will specify it by IP address for now ...".
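To see which entries in a hosts file are plain IP addresses (and therefore depend on a PTR answer), a rough sketch like this can list them. It assumes a simple INI-style inventory where the first field of each host line is the host itself; the function names inventory_hosts and classify_hosts are hypothetical, and more complex inventories (host ranges, YAML, aliases with ansible_host) are not handled.

```shell
#!/bin/sh
# Sketch only: list the host entries of a simple INI-style inventory
# and mark each one as an IPv4 address or a name.

# Print the first field of every host line, skipping comments, blank
# lines, group headers, and the bodies of [group:vars] and
# [group:children] sections.
inventory_hosts() {
    awk '
        /^[ \t]*(#|;|$)/ { next }
        /^\[/ { skip = ($0 ~ /:(vars|children)\]/); next }
        !skip { print $1 }
    ' "$1"
}

# Succeeds when the argument looks like a dotted-quad IPv4 address.
is_ipv4() {
    echo "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'
}

# usage: classify_hosts <inventory-file>
classify_hosts() {
    inventory_hosts "$1" | while read -r h; do
        if is_ipv4 "$h"; then
            echo "ip $h"
        else
            echo "name $h"
        fi
    done
}
```

Entries reported as ip are the ones to check for a missing PTR record.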
Eliminate an operation delay of Ansible
The fix is simple. Take one of the following measures.
- Register A, AAAA, PTR records in the DNS server
- Have the DNS server return a non-existent domain (NXDOMAIN) response, so that the client does not wait for a timeout
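After applying either measure, it can be useful to confirm that the server now answers instead of leaving the client waiting. The sketch below reads dig output on standard input and reduces it to a single word; the function name dns_status is made up for this example, and it relies on dig's usual "status:" header line and on a timeout message containing "timed out".

```shell
#!/bin/sh
# Sketch only: summarize dig output as a response status or TIMEOUT.
# Reads dig's output on stdin.
dns_status() {
    awk '
        /timed out/ { print "TIMEOUT"; found = 1; exit }
        /status:/ {
            sub(/.*status: /, "")   # drop everything before the status value
            sub(/,.*/, "")          # drop everything after it
            print
            found = 1
            exit
        }
        END { if (!found) print "UNKNOWN" }
    '
}
```

For example, dig +time=2 +tries=1 -x 192.168.1.75 | dns_status should print NOERROR when the PTR record is registered and NXDOMAIN when the server answers that it does not exist. Both come back quickly; only TIMEOUT indicates the slow case.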
Conclusion - Ansible slow/delayed reason and settings to be confirmed
If Ansible is slow, suspect a DNS timeout first. Use tcpdump to find out whether DNS is the cause, and if it is, either register the necessary records on the DNS server or configure the DNS server to return NXDOMAIN.
There are other possible factors, such as a Yum proxy, but checking for abnormal communication with tcpdump is an effective way to investigate those as well.