cat log.ctdb on 192.168.224.222:
2010/03/29 13:02:42.867632 [ 3718]: Starting CTDBD as pid : 3718
2010/03/29 13:02:43.252555 [ 3718]: Freeze priority 1
2010/03/29 13:02:43.253644 [ 3718]: Freeze priority 2
2010/03/29 13:02:43.254795 [ 3718]: Freeze priority 3
2010/03/29 13:03:13.303992 [ 3718]: Freeze priority 1
2010/03/29 13:03:13.305381 [ 3718]: Freeze priority 2
2010/03/29 13:03:13.306351 [ 3718]: Freeze priority 3
2010/03/29 13:03:14.468369 [ 3718]: Event script timed out : startrecovery count : 0 pid : 3733
2010/03/29 13:03:14.468472 [ 3718]: server/eventscript.c:508 Sending SIGTERM to child pid:3733
2010/03/29 13:03:14.468540 [ 3718]: server/ctdb_recover.c:997 startrecovery event script failed (status -62)
2010/03/29 13:03:14.473278 [ 3733]: Timed out running script '/etc/ctdb/events.d/01.reclock startrecovery ' after 30.0 seconds pid :3733
2010/03/29 13:03:43.354880 [ 3718]: Freeze priority 1
2010/03/29 13:03:43.357643 [ 3718]: Freeze priority 2
2010/03/29 13:03:43.358702 [ 3718]: Freeze priority 3
2010/03/29 13:03:43.465965 [ 3718]: Event script timed out : startrecovery count : 0 pid : 3749
2010/03/29 13:03:43.466060 [ 3718]: server/eventscript.c:508 Sending SIGTERM to child pid:3749
2010/03/29 13:03:43.466133 [ 3718]: server/ctdb_recover.c:997 startrecovery event script failed (status -62)
2010/03/29 13:03:43.472256 [ 3749]: Timed out running script '/etc/ctdb/events.d/01.reclock startrecovery ' after 29.0 seconds pid :3749
2010/03/29 13:03:43.477552 [ 3733]: Logged timedout eventscript : { pstree -p; cat /proc/locks; ls -li /var/ctdb/ /var/ctdb/persistent; } >/tmp/ctdb.event.20100329130314.3733
2010/03/29 13:04:13.465880 [ 3718]: Event script timed out : startrecovery count : 0 pid : 3764
2010/03/29 13:04:13.466015 [ 3718]: server/eventscript.c:508 Sending SIGTERM to child pid:3764
2010/03/29 13:04:13.466044 [ 3718]: server/ctdb_recover.c:997 startrecovery event script failed (status -62)
2010/03/29 13:04:13.474976 [ 3764]: Timed out running script '/etc/ctdb/events.d/01.reclock startrecovery ' after 30.0 seconds pid :3764
2010/03/29 13:04:13.475853 [ 3718]: Freeze priority 1
2010/03/29 13:04:13.480937 [ 3749]: Logged timedout eventscript : { pstree -p; cat /proc/locks; ls -li /var/ctdb/ /var/ctdb/persistent; } >/tmp/ctdb.event.20100329130343.3749
2010/03/29 13:04:13.484329 [ 3718]: Freeze priority 2
2010/03/29 13:04:13.498799 [ 3718]: Freeze priority 3
2010/03/29 13:04:13.510519 [ 3764]: Logged timedout eventscript : { pstree -p; cat /proc/locks; ls -li /var/ctdb/ /var/ctdb/persistent; } >/tmp/ctdb.event.20100329130413.3764
2010/03/29 13:04:43.547187 [ 3718]: Banning this node for 300 seconds
2010/03/29 13:04:43.553129 [ 3718]: Freeze priority 1
2010/03/29 13:04:43.554051 [ 3718]: Freeze priority 2
2010/03/29 13:04:43.555016 [ 3718]: Freeze priority 3
2010/03/29 13:04:44.466442 [ 3718]: Event script timed out : startrecovery count : 0 pid : 3778
2010/03/29 13:04:44.466621 [ 3718]: server/eventscript.c:508 Sending SIGTERM to child pid:3778
2010/03/29 13:04:44.466794 [ 3718]: server/ctdb_recover.c:997 startrecovery event script failed (status -62)
2010/03/29 13:04:44.472729 [ 3778]: Timed out running script '/etc/ctdb/events.d/01.reclock startrecovery ' after 30.0 seconds pid :3778
2010/03/29 13:05:13.596194 [ 3718]: Freeze priority 1
2010/03/29 13:05:13.596693 [ 3718]: Freeze priority 2
2010/03/29 13:05:13.597521 [ 3718]: Freeze priority 3
2010/03/29 13:05:14.466552 [ 3718]: Event script timed out : startrecovery count : 0 pid : 3790
2010/03/29 13:05:14.466604 [ 3718]: server/eventscript.c:508 Sending SIGTERM to child pid:3790
2010/03/29 13:05:14.466627 [ 3718]: server/ctdb_recover.c:997 startrecovery event script failed (status -62)
2010/03/29 13:05:14.471644 [ 3790]: Timed out running script '/etc/ctdb/events.d/01.reclock startrecovery ' after 30.0 seconds pid :3790
2010/03/29 13:05:14.477317 [ 3778]: Logged timedout eventscript : { pstree -p; cat /proc/locks; ls -li /var/ctdb/ /var/ctdb/persistent; } >/tmp/ctdb.event.20100329130444.3778
2010/03/29 13:05:14.508099 [ 3790]: Logged timedout eventscript : { pstree -p; cat /proc/locks; ls -li /var/ctdb/ /var/ctdb/persistent; } >/tmp/ctdb.event.20100329130514.3790
2010/03/29 13:05:18.515791 [ 3718]: Freeze priority 1
2010/03/29 13:05:18.516322 [ 3718]: Freeze priority 2
2010/03/29 13:05:18.516761 [ 3718]: Freeze priority 3
... repeated a trillion times ...
2010/03/29 13:09:43.548576 [ 3718]: Banning timedout
2010/03/29 13:09:45.782484 [ 3718]: Freeze priority 1
2010/03/29 13:09:45.783514 [ 3718]: Freeze priority 2
2010/03/29 13:09:45.784539 [ 3718]: Freeze priority 3
2010/03/29 13:10:15.841153 [ 3718]: Freeze priority 1
2010/03/29 13:10:15.842405 [ 3718]: Freeze priority 2
2010/03/29 13:10:15.843701 [ 3718]: Freeze priority 3
2010/03/29 13:10:17.463098 [ 3718]: Event script timed out : startrecovery count : 0 pid : 3831
2010/03/29 13:10:17.463194 [ 3718]: server/eventscript.c:508 Sending SIGTERM to child pid:3831
2010/03/29 13:10:17.463310 [ 3718]: server/ctdb_recover.c:997 startrecovery event script failed (status -62)
2010/03/29 13:10:17.468496 [ 3831]: Timed out running script '/etc/ctdb/events.d/01.reclock startrecovery ' after 30.0 seconds pid :3831
2010/03/29 13:10:45.923063 [ 3718]: Freeze priority 1
2010/03/29 13:10:45.926392 [ 3718]: Freeze priority 2
2010/03/29 13:10:45.927442 [ 3718]: Freeze priority 3
2010/03/29 13:10:46.460682 [ 3718]: Event script timed out : startrecovery count : 0 pid : 3844
2010/03/29 13:10:46.460836 [ 3718]: server/eventscript.c:508 Sending SIGTERM to child pid:3844
2010/03/29 13:10:46.460935 [ 3718]: server/ctdb_recover.c:997 startrecovery event script failed (status -62)
2010/03/29 13:10:46.466722 [ 3844]: Timed out running script '/etc/ctdb/events.d/01.reclock startrecovery ' after 29.0 seconds pid :3844
2010/03/29 13:10:46.472477 [ 3831]: Logged timedout eventscript : { pstree -p; cat /proc/locks; ls -li /var/ctdb/ /var/ctdb/persistent; } >/tmp/ctdb.event.20100329131017.3831
2010/03/29 13:11:15.979213 [ 3718]: Freeze priority 1
2010/03/29 13:11:15.981135 [ 3718]: Freeze priority 2
2010/03/29 13:11:15.981984 [ 3718]: Freeze priority 3
2010/03/29 13:11:16.460710 [ 3718]: Event script timed out : startrecovery count : 0 pid : 3857
2010/03/29 13:11:16.460854 [ 3718]: server/eventscript.c:508 Sending SIGTERM to child pid:3857
2010/03/29 13:11:16.461015 [ 3718]: server/ctdb_recover.c:997 startrecovery event script failed (status -62)
2010/03/29 13:11:16.467334 [ 3857]: Timed out running script '/etc/ctdb/events.d/01.reclock startrecovery ' after 30.0 seconds pid :3857
2010/03/29 13:11:16.473104 [ 3844]: Logged timedout eventscript : { pstree -p; cat /proc/locks; ls -li /var/ctdb/ /var/ctdb/persistent; } >/tmp/ctdb.event.20100329131046.3844
2010/03/29 13:11:46.025220 [ 3718]: Banning this node for 300 seconds
2010/03/29 13:11:46.030948 [ 3718]: Freeze priority 1
2010/03/29 13:11:46.031958 [ 3718]: Freeze priority 2
2010/03/29 13:11:46.033029 [ 3718]: Freeze priority 3
2010/03/29 13:11:46.461049 [ 3718]: Event script timed out : startrecovery count : 0 pid : 3870
2010/03/29 13:11:46.461157 [ 3718]: server/eventscript.c:508 Sending SIGTERM to child pid:3870
2010/03/29 13:11:46.461253 [ 3718]: server/ctdb_recover.c:997 startrecovery event script failed (status -62)
2010/03/29 13:11:46.468274 [ 3870]: Timed out running script '/etc/ctdb/events.d/01.reclock startrecovery ' after 30.0 seconds pid :3870
2010/03/29 13:11:46.475603 [ 3857]: Logged timedout eventscript : { pstree -p; cat /proc/locks; ls -li /var/ctdb/ /var/ctdb/persistent; } >/tmp/ctdb.event.20100329131116.3857
2010/03/29 13:12:16.075108 [ 3718]: Freeze priority 1
2010/03/29 13:12:16.075578 [ 3718]: Freeze priority 2
2010/03/29 13:12:16.075964 [ 3718]: Freeze priority 3
2010/03/29 13:12:16.460923 [ 3718]: Event script timed out : startrecovery count : 0 pid : 3883
2010/03/29 13:12:16.461094 [ 3718]: server/eventscript.c:508 Sending SIGTERM to child pid:3883
2010/03/29 13:12:16.461136 [ 3718]: server/ctdb_recover.c:997 startrecovery event script failed (status -62)
2010/03/29 13:12:16.466258 [ 3883]: Timed out running script '/etc/ctdb/events.d/01.reclock startrecovery ' after 30.0 seconds pid :3883
2010/03/29 13:12:16.471621 [ 3870]: Logged timedout eventscript : { pstree -p; cat /proc/locks; ls -li /var/ctdb/ /var/ctdb/persistent; } >/tmp/ctdb.event.20100329131146.3870
2010/03/29 13:12:16.499255 [ 3883]: Logged timedout eventscript : { pstree -p; cat /proc/locks; ls -li /var/ctdb/ /var/ctdb/persistent; } >/tmp/ctdb.event.20100329131216.3883
2010/03/29 13:12:23.550917 [ 3718]: Freeze priority 1
2010/03/29 13:12:23.551444 [ 3718]: Freeze priority 2
2010/03/29 13:12:23.551885 [ 3718]: Freeze priority 3
... repeated a trillion times ...
2010/03/29 13:16:46.027045 [ 3718]: Banning timedout
2010/03/29 13:16:49.238066 [ 3718]: Freeze priority 1
2010/03/29 13:16:49.239297 [ 3718]: Freeze priority 2
2010/03/29 13:16:49.240433 [ 3718]: Freeze priority 3
2010/03/29 13:17:19.299505 [ 3718]: Freeze priority 1
2010/03/29 13:17:19.300733 [ 3718]: Freeze priority 2
2010/03/29 13:17:19.301805 [ 3718]: Freeze priority 3
2010/03/29 13:17:20.469640 [ 3718]: Event script timed out : startrecovery count : 0 pid : 3926
2010/03/29 13:17:20.469739 [ 3718]: server/eventscript.c:508 Sending SIGTERM to child pid:3926
2010/03/29 13:17:20.469760 [ 3718]: server/ctdb_recover.c:997 startrecovery event script failed (status -62)
2010/03/29 13:17:20.474647 [ 3926]: Timed out running script '/etc/ctdb/events.d/01.reclock startrecovery ' after 30.0 seconds pid :3926
2010/03/29 13:17:49.356863 [ 3718]: Freeze priority 1
2010/03/29 13:17:49.358422 [ 3718]: Freeze priority 2
2010/03/29 13:17:49.359390 [ 3718]: Freeze priority 3
2010/03/29 13:17:49.463192 [ 3718]: Event script timed out : startrecovery count : 0 pid : 3942
2010/03/29 13:17:49.463304 [ 3718]: server/eventscript.c:508 Sending SIGTERM to child pid:3942
2010/03/29 13:17:49.463492 [ 3718]: server/ctdb_recover.c:997 startrecovery event script failed (status -62)
2010/03/29 13:17:49.469847 [ 3942]: Timed out running script '/etc/ctdb/events.d/01.reclock startrecovery ' after 29.0 seconds pid :3942
2010/03/29 13:17:49.475588 [ 3926]: Logged timedout eventscript : { pstree -p; cat /proc/locks; ls -li /var/ctdb/ /var/ctdb/persistent; } >/tmp/ctdb.event.20100329131720.3926
2010/03/29 13:18:19.461981 [ 3718]: Event script timed out : startrecovery count : 0 pid : 3953
2010/03/29 13:18:19.462143 [ 3718]: server/eventscript.c:508 Sending SIGTERM to child pid:3953
2010/03/29 13:18:19.462183 [ 3718]: server/ctdb_recover.c:997 startrecovery event script failed (status -62)
2010/03/29 13:18:19.467383 [ 3953]: Timed out running script '/etc/ctdb/events.d/01.reclock startrecovery ' after 30.0 seconds pid :3953
2010/03/29 13:18:19.526714 [ 3718]: Freeze priority 1
2010/03/29 13:18:19.533024 [ 3718]: Freeze priority 2
2010/03/29 13:18:19.533922 [ 3942]: Logged timedout eventscript : { pstree -p; cat /proc/locks; ls -li /var/ctdb/ /var/ctdb/persistent; } >/tmp/ctdb.event.20100329131749.3942
2010/03/29 13:18:19.534909 [ 3718]: Freeze priority 3
2010/03/29 13:18:19.535322 [ 3953]: Logged timedout eventscript : { pstree -p; cat /proc/locks; ls -li /var/ctdb/ /var/ctdb/persistent; } >/tmp/ctdb.event.20100329131819.3953
2010/03/29 13:18:49.581196 [ 3718]: Banning this node for 300 seconds
2010/03/29 13:18:49.586456 [ 3718]: Freeze priority 1
2010/03/29 13:18:49.587541 [ 3718]: Freeze priority 2
2010/03/29 13:18:49.588503 [ 3718]: Freeze priority 3
2010/03/29 13:18:51.458746 [ 3718]: Event script timed out : startrecovery count : 0 pid : 3969
2010/03/29 13:18:51.458869 [ 3718]: server/eventscript.c:508 Sending SIGTERM to child pid:3969
2010/03/29 13:18:51.458896 [ 3718]: server/ctdb_recover.c:997 startrecovery event script failed (status -62)
2010/03/29 13:18:51.464921 [ 3969]: Timed out running script '/etc/ctdb/events.d/01.reclock startrecovery ' after 30.0 seconds pid :3969
2010/03/29 13:19:19.604956 [ 3718]: Freeze priority 1
2010/03/29 13:19:19.605460 [ 3718]: Freeze priority 2
2010/03/29 13:19:19.605752 [ 3718]: Freeze priority 3
2010/03/29 13:19:20.459208 [ 3718]: Event script timed out : startrecovery count : 0 pid : 3981
2010/03/29 13:19:20.459265 [ 3718]: server/eventscript.c:508 Sending SIGTERM to child pid:3981
2010/03/29 13:19:20.459291 [ 3718]: server/ctdb_recover.c:997 startrecovery event script failed (status -62)
2010/03/29 13:19:20.464984 [ 3981]: Timed out running script '/etc/ctdb/events.d/01.reclock startrecovery ' after 29.0 seconds pid :3981
2010/03/29 13:19:20.470774 [ 3969]: Logged timedout eventscript : { pstree -p; cat /proc/locks; ls -li /var/ctdb/ /var/ctdb/persistent; } >/tmp/ctdb.event.20100329131851.3969
2010/03/29 13:19:20.503980 [ 3981]: Logged timedout eventscript : { pstree -p; cat /proc/locks; ls -li /var/ctdb/ /var/ctdb/persistent; } >/tmp/ctdb.event.20100329131920.3981
2010/03/29 13:19:24.509575 [ 3718]: Freeze priority 1
2010/03/29 13:19:24.509918 [ 3718]: Freeze priority 2
2010/03/29 13:19:24.510218 [ 3718]: Freeze priority 3
... repeated a trillion times ...
2010/03/29 13:23:49.583719 [ 3718]: Banning timedout
2010/03/29 13:23:52.977761 [ 3722]: Taking out recovery lock from recovery daemon
2010/03/29 13:23:52.977996 [ 3722]: Take the recovery lock
2010/03/29 13:23:52.983145 [ 3722]: Recovery lock taken successfully
2010/03/29 13:23:52.983347 [ 3722]: Recovery lock taken successfully by recovery daemon
2010/03/29 13:23:52.993309 [ 3718]: Freeze priority 1
2010/03/29 13:23:52.994341 [ 3718]: Freeze priority 2
2010/03/29 13:23:52.994927 [ 3718]: Freeze priority 3
2010/03/29 13:24:13.136669 [ 3722]: client/ctdb_client.c:771 control timed out. reqid:3339 opcode:70 dstnode:0
2010/03/29 13:24:13.136804 [ 3722]: client/ctdb_client.c:882 ctdb_control_recv failed
2010/03/29 13:24:13.136820 [ 3722]: Async operation failed with state 3, opcode:70
2010/03/29 13:24:13.136832 [ 3722]: server/ctdb_recoverd.c:178 Node 0 failed the startrecovery event. Setting it as recovery fail culprit
2010/03/29 13:24:13.136850 [ 3722]: client/ctdb_client.c:771 control timed out. reqid:68876 opcode:70 dstnode:1
2010/03/29 13:24:13.136861 [ 3722]: client/ctdb_client.c:882 ctdb_control_recv failed
2010/03/29 13:24:13.136871 [ 3722]: Async operation failed with state 3, opcode:70
2010/03/29 13:24:13.136880 [ 3722]: server/ctdb_recoverd.c:178 Node 1 failed the startrecovery event. Setting it as recovery fail culprit
2010/03/29 13:24:13.136918 [ 3722]: Async wait failed - fail_count=2
2010/03/29 13:24:13.136930 [ 3722]: server/ctdb_recoverd.c:202 Unable to run the 'startrecovery' event. Recovery failed.
2010/03/29 13:24:13.136940 [ 3722]: server/ctdb_recoverd.c:1372 Unable to run the 'startrecovery' event on cluster
2010/03/29 13:24:22.997458 [ 3718]: Event script timed out : startrecovery count : 0 pid : 4024
2010/03/29 13:24:22.997621 [ 3718]: server/eventscript.c:508 Sending SIGTERM to child pid:4024
2010/03/29 13:24:22.997656 [ 3718]: server/ctdb_recover.c:997 startrecovery event script failed (status -62)
2010/03/29 13:24:23.001886 [ 4024]: Timed out running script '/etc/ctdb/events.d/01.reclock startrecovery ' after 30.0 seconds pid :4024
2010/03/29 13:24:23.038842 [ 4024]: Logged timedout eventscript : { pstree -p; cat /proc/locks; ls -li /var/ctdb/ /var/ctdb/persistent; } >/tmp/ctdb.event.20100329132423.4024
2010/03/29 13:24:23.040399 [ 3722]: client/ctdb_client.c:718 reqid 68876 not found
2010/03/29 13:24:23.040429 [ 3722]: Dropped orphaned reply control with reqid:3339
2010/03/29 13:24:23.059332 [ 3722]: Taking out recovery lock from recovery daemon
2010/03/29 13:24:23.059379 [ 3722]: Take the recovery lock
2010/03/29 13:24:23.063956 [ 3722]: Recovery lock taken successfully
2010/03/29 13:24:23.064460 [ 3722]: Recovery lock taken successfully by recovery daemon
2010/03/29 13:24:23.076450 [ 3718]: Freeze priority 1
2010/03/29 13:24:23.077724 [ 3718]: Freeze priority 2
2010/03/29 13:24:23.078095 [ 3718]: Freeze priority 3
2010/03/29 13:24:43.133967 [ 3722]: client/ctdb_client.c:771 control timed out. reqid:3370 opcode:70 dstnode:0
2010/03/29 13:24:43.134110 [ 3722]: client/ctdb_client.c:882 ctdb_control_recv failed
2010/03/29 13:24:43.134126 [ 3722]: Async operation failed with state 3, opcode:70
2010/03/29 13:24:43.134137 [ 3722]: server/ctdb_recoverd.c:178 Node 0 failed the startrecovery event. Setting it as recovery fail culprit
2010/03/29 13:24:43.134154 [ 3722]: client/ctdb_client.c:771 control timed out. reqid:68907 opcode:70 dstnode:1
2010/03/29 13:24:43.134165 [ 3722]: client/ctdb_client.c:882 ctdb_control_recv failed
2010/03/29 13:24:43.134175 [ 3722]: Async operation failed with state 3, opcode:70
2010/03/29 13:24:43.134184 [ 3722]: server/ctdb_recoverd.c:178 Node 1 failed the startrecovery event. Setting it as recovery fail culprit
2010/03/29 13:24:43.134196 [ 3722]: Async wait failed - fail_count=2
2010/03/29 13:24:43.134221 [ 3722]: server/ctdb_recoverd.c:202 Unable to run the 'startrecovery' event. Recovery failed.
2010/03/29 13:24:43.134236 [ 3722]: server/ctdb_recoverd.c:1372 Unable to run the 'startrecovery' event on cluster
2010/03/29 13:24:53.080108 [ 3718]: Event script timed out : startrecovery count : 0 pid : 4037
2010/03/29 13:24:53.080252 [ 3718]: server/eventscript.c:508 Sending SIGTERM to child pid:4037
2010/03/29 13:24:53.080277 [ 3718]: server/ctdb_recover.c:997 startrecovery event script failed (status -62)
2010/03/29 13:24:53.086024 [ 4037]: Timed out running script '/etc/ctdb/events.d/01.reclock startrecovery ' after 30.0 seconds pid :4037
2010/03/29 13:24:53.119816 [ 4037]: Logged timedout eventscript : { pstree -p; cat /proc/locks; ls -li /var/ctdb/ /var/ctdb/persistent; } >/tmp/ctdb.event.20100329132453.4037
2010/03/29 13:24:53.123049 [ 3722]: client/ctdb_client.c:718 reqid 68907 not found
2010/03/29 13:24:53.123095 [ 3722]: Dropped orphaned reply control with reqid:3370
2010/03/29 13:24:53.126833 [ 3722]: Taking out recovery lock from recovery daemon
2010/03/29 13:24:53.126877 [ 3722]: Take the recovery lock
2010/03/29 13:24:53.130454 [ 3722]: Recovery lock taken successfully
2010/03/29 13:24:53.130640 [ 3722]: Recovery lock taken successfully by recovery daemon
2010/03/29 13:24:53.132786 [ 3718]: Freeze priority 1
2010/03/29 13:24:53.133973 [ 3718]: Freeze priority 2
2010/03/29 13:24:53.134986 [ 3718]: Freeze priority 3
2010/03/29 13:25:13.143946 [ 3722]: client/ctdb_client.c:771 control timed out. reqid:3401 opcode:70 dstnode:0
2010/03/29 13:25:13.153347 [ 3722]: client/ctdb_client.c:882 ctdb_control_recv failed
2010/03/29 13:25:13.153728 [ 3722]: Async operation failed with state 3, opcode:70
2010/03/29 13:25:13.153750 [ 3722]: server/ctdb_recoverd.c:178 Node 0 failed the startrecovery event. Setting it as recovery fail culprit
2010/03/29 13:25:13.153778 [ 3722]: client/ctdb_client.c:771 control timed out. reqid:68938 opcode:70 dstnode:1
2010/03/29 13:25:13.153794 [ 3722]: client/ctdb_client.c:882 ctdb_control_recv failed
2010/03/29 13:25:13.153804 [ 3722]: Async operation failed with state 3, opcode:70
2010/03/29 13:25:13.153814 [ 3722]: server/ctdb_recoverd.c:178 Node 1 failed the startrecovery event. Setting it as recovery fail culprit
2010/03/29 13:25:13.153826 [ 3722]: Async wait failed - fail_count=2
2010/03/29 13:25:13.153836 [ 3722]: server/ctdb_recoverd.c:202 Unable to run the 'startrecovery' event. Recovery failed.
2010/03/29 13:25:13.153862 [ 3722]: server/ctdb_recoverd.c:1372 Unable to run the 'startrecovery' event on cluster
2010/03/29 13:25:23.183651 [ 3718]: Event script timed out : startrecovery count : 0 pid : 4050
2010/03/29 13:25:23.183783 [ 3718]: server/eventscript.c:508 Sending SIGTERM to child pid:4050
2010/03/29 13:25:23.183832 [ 3718]: server/ctdb_recover.c:997 startrecovery event script failed (status -62)
2010/03/29 13:25:23.188079 [ 4050]: Timed out running script '/etc/ctdb/events.d/01.reclock startrecovery ' after 30.0 seconds pid :4050
2010/03/29 13:25:23.221547 [ 4050]: Logged timedout eventscript : { pstree -p; cat /proc/locks; ls -li /var/ctdb/ /var/ctdb/persistent; } >/tmp/ctdb.event.20100329132523.4050
2010/03/29 13:25:23.223187 [ 3722]: client/ctdb_client.c:718 reqid 68938 not found
2010/03/29 13:25:23.223221 [ 3722]: Dropped orphaned reply control with reqid:3401
2010/03/29 13:25:23.229499 [ 3722]: Taking out recovery lock from recovery daemon
2010/03/29 13:25:23.229530 [ 3722]: Take the recovery lock
2010/03/29 13:25:23.232011 [ 3722]: Recovery lock taken successfully
2010/03/29 13:25:23.232303 [ 3722]: Recovery lock taken successfully by recovery daemon
2010/03/29 13:25:23.234616 [ 3718]: Freeze priority 1
2010/03/29 13:25:23.235832 [ 3718]: Freeze priority 2
2010/03/29 13:25:23.236783 [ 3718]: Freeze priority 3
2010/03/29 13:25:44.135016 [ 3722]: client/ctdb_client.c:771 control timed out. reqid:3432 opcode:70 dstnode:0
2010/03/29 13:25:44.138116 [ 3722]: client/ctdb_client.c:882 ctdb_control_recv failed
2010/03/29 13:25:44.138167 [ 3722]: Async operation failed with state 3, opcode:70
2010/03/29 13:25:44.138193 [ 3722]: server/ctdb_recoverd.c:178 Node 0 failed the startrecovery event. Setting it as recovery fail culprit
2010/03/29 13:25:44.138219 [ 3722]: client/ctdb_client.c:771 control timed out. reqid:68969 opcode:70 dstnode:1
2010/03/29 13:25:44.138238 [ 3722]: client/ctdb_client.c:882 ctdb_control_recv failed
2010/03/29 13:25:44.138249 [ 3722]: Async operation failed with state 3, opcode:70
2010/03/29 13:25:44.138258 [ 3722]: server/ctdb_recoverd.c:178 Node 1 failed the startrecovery event. Setting it as recovery fail culprit
2010/03/29 13:25:44.138270 [ 3722]: Async wait failed - fail_count=2
2010/03/29 13:25:44.138284 [ 3722]: server/ctdb_recoverd.c:202 Unable to run the 'startrecovery' event. Recovery failed.
2010/03/29 13:25:44.138297 [ 3722]: server/ctdb_recoverd.c:1372 Unable to run the 'startrecovery' event on cluster
2010/03/29 13:25:53.239529 [ 3718]: Event script timed out : startrecovery count : 0 pid : 4063
2010/03/29 13:25:53.240526 [ 3718]: server/eventscript.c:508 Sending SIGTERM to child pid:4063
2010/03/29 13:25:53.240590 [ 3718]: server/ctdb_recover.c:997 startrecovery event script failed (status -62)
2010/03/29 13:25:53.262679 [ 4063]: Timed out running script '/etc/ctdb/events.d/01.reclock startrecovery ' after 30.0 seconds pid :4063
2010/03/29 13:25:53.317619 [ 4063]: Logged timedout eventscript : { pstree -p; cat /proc/locks; ls -li /var/ctdb/ /var/ctdb/persistent; } >/tmp/ctdb.event.20100329132553.4063
2010/03/29 13:25:53.320639 [ 3722]: client/ctdb_client.c:718 reqid 68969 not found
2010/03/29 13:25:53.320671 [ 3722]: Dropped orphaned reply control with reqid:3432
2010/03/29 13:25:53.322731 [ 3718]: Banning this node for 300 seconds
2010/03/29 13:25:53.322808 [ 3722]: Taking out recovery lock from recovery daemon
2010/03/29 13:25:53.322831 [ 3722]: Take the recovery lock
2010/03/29 13:25:53.328911 [ 3722]: Recovery lock taken successfully
2010/03/29 13:25:53.329681 [ 3722]: Recovery lock taken successfully by recovery daemon
2010/03/29 13:25:53.332234 [ 3718]: Freeze priority 1
2010/03/29 13:25:53.333128 [ 3718]: Freeze priority 2
2010/03/29 13:25:53.334341 [ 3718]: Freeze priority 3
2010/03/29 13:26:03.455809 [ 3718]: Freeze priority 1
2010/03/29 13:26:04.455793 [ 3718]: Freeze priority 2
2010/03/29 13:26:05.455865 [ 3718]: Freeze priority 3
2010/03/29 13:26:14.145095 [ 3722]: client/ctdb_client.c:771 control timed out. reqid:3465 opcode:70 dstnode:0
2010/03/29 13:26:14.145295 [ 3722]: client/ctdb_client.c:882 ctdb_control_recv failed
2010/03/29 13:26:14.145311 [ 3722]: Async operation failed with state 3, opcode:70
2010/03/29 13:26:14.145321 [ 3722]: server/ctdb_recoverd.c:178 Node 0 failed the startrecovery event. Setting it as recovery fail culprit
2010/03/29 13:26:14.145336 [ 3722]: client/ctdb_client.c:771 control timed out. reqid:69002 opcode:70 dstnode:1
2010/03/29 13:26:14.145346 [ 3722]: client/ctdb_client.c:882 ctdb_control_recv failed
2010/03/29 13:26:14.145355 [ 3722]: Async operation failed with state 3, opcode:70
2010/03/29 13:26:14.145364 [ 3722]: server/ctdb_recoverd.c:178 Node 1 failed the startrecovery event. Setting it as recovery fail culprit
2010/03/29 13:26:16.135244 [ 3722]: Async wait failed - fail_count=2
2010/03/29 13:26:16.135309 [ 3722]: server/ctdb_recoverd.c:202 Unable to run the 'startrecovery' event. Recovery failed.
2010/03/29 13:26:16.135322 [ 3722]: server/ctdb_recoverd.c:1372 Unable to run the 'startrecovery' event on cluster
2010/03/29 13:26:22.456041 [ 3718]: Freeze priority 1
2010/03/29 13:26:23.337748 [ 3718]: Event script timed out : startrecovery count : 0 pid : 4076
2010/03/29 13:26:23.337810 [ 3718]: server/eventscript.c:508 Sending SIGTERM to child pid:4076
2010/03/29 13:26:23.337863 [ 3718]: server/ctdb_recover.c:997 startrecovery event script failed (status -62)
2010/03/29 13:26:23.341870 [ 4076]: Timed out running script '/etc/ctdb/events.d/01.reclock startrecovery ' after 30.0 seconds pid :4076
2010/03/29 13:26:23.375245 [ 4076]: Logged timedout eventscript : { pstree -p; cat /proc/locks; ls -li /var/ctdb/ /var/ctdb/persistent; } >/tmp/ctdb.event.20100329132623.4076
2010/03/29 13:26:23.377374 [ 3722]: Dropped orphaned reply control with reqid:3465
2010/03/29 13:26:23.377408 [ 3722]: client/ctdb_client.c:718 reqid 69002 not found
2010/03/29 13:26:23.383810 [ 3718]: Freeze priority 2
2010/03/29 13:26:23.384169 [ 3718]: Freeze priority 3
2010/03/29 13:26:30.417436 [ 3718]: Freeze priority 1
... repeated a trillion times ...
2010/03/29 13:27:32.188713 [ 4160]: Timed out running script '/etc/ctdb/events.d/01.reclock shutdown ' after 1.5 seconds pid :4160
2010/03/29 13:27:32.190825 [ 4160]: Timed out running script '/etc/ctdb/events.d/01.reclock shutdown ' after 1.5 seconds pid :4160
2010/03/29 13:27:32.248602 [ 4160]: Logged timedout eventscript : { pstree -p; cat /proc/locks; ls -li /var/ctdb/ /var/ctdb/persistent; } >/tmp/ctdb.event.20100329132732.4160
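
With this much repetition it can help to tally which event script and event keep timing out. The following is only a rough Python sketch, not a CTDB tool: the function name summarize_timeouts and the regular expression are my own, written against the "Timed out running script" lines as they appear in the log above, and may need adjusting for other CTDB versions.

```python
import re
from collections import Counter

# Hypothetical pattern, modelled on lines such as:
#   ... [ 3733]: Timed out running script
#   '/etc/ctdb/events.d/01.reclock startrecovery ' after 30.0 seconds pid :3733
# \s+ is used between tokens so that wrapped log lines still match.
TIMEOUT_RE = re.compile(
    r"Timed out running script\s+'(?P<script>\S+)\s+(?P<event>\S+)\s*'"
    r"\s+after\s+(?P<secs>[\d.]+)\s+seconds"
)


def summarize_timeouts(text):
    """Count timed-out event scripts per (script path, event name) pair."""
    counts = Counter()
    for match in TIMEOUT_RE.finditer(text):
        counts[(match.group("script"), match.group("event"))] += 1
    return counts
```

Calling summarize_timeouts(open("log.ctdb").read()).most_common() would then list the (script, event) pairs by how often they timed out, assuming the log has been saved under that name.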