cat log.ctdb on 192.168.224.221:
2010/03/29 13:02:38.191668 [ 4068]: Starting CTDBD as pid : 4068
2010/03/29 13:02:39.204880 [ 4068]: Freeze priority 1
2010/03/29 13:02:39.205016 [ 4068]: Freeze priority 2
2010/03/29 13:02:39.205110 [ 4068]: Freeze priority 3
2010/03/29 13:02:43.241245 [ 4072]: Taking out recovery lock from recovery daemon
2010/03/29 13:02:43.241428 [ 4072]: Take the recovery lock
2010/03/29 13:02:43.243224 [ 4072]: Recovery lock taken successfully
2010/03/29 13:02:43.243591 [ 4072]: Recovery lock taken successfully by recovery daemon
2010/03/29 13:02:43.245257 [ 4068]: Freeze priority 1
2010/03/29 13:02:43.246451 [ 4068]: Freeze priority 2
2010/03/29 13:02:43.247685 [ 4068]: Freeze priority 3
2010/03/29 13:03:03.405623 [ 4072]: client/ctdb_client.c:771 control timed out. reqid:56 opcode:70 dstnode:0
2010/03/29 13:03:03.405707 [ 4072]: client/ctdb_client.c:882 ctdb_control_recv failed
2010/03/29 13:03:03.405722 [ 4072]: Async operation failed with state 3, opcode:70
2010/03/29 13:03:03.405734 [ 4072]: server/ctdb_recoverd.c:178 Node 0 failed the startrecovery event. Setting it as recovery fail culprit
2010/03/29 13:03:03.405751 [ 4072]: client/ctdb_client.c:771 control timed out. reqid:65593 opcode:70 dstnode:1
2010/03/29 13:03:03.405763 [ 4072]: client/ctdb_client.c:882 ctdb_control_recv failed
2010/03/29 13:03:03.405773 [ 4072]: Async operation failed with state 3, opcode:70
2010/03/29 13:03:03.405783 [ 4072]: server/ctdb_recoverd.c:178 Node 1 failed the startrecovery event. Setting it as recovery fail culprit
2010/03/29 13:03:03.405796 [ 4072]: Async wait failed - fail_count=2
2010/03/29 13:03:03.405807 [ 4072]: server/ctdb_recoverd.c:202 Unable to run the 'startrecovery' event. Recovery failed.
2010/03/29 13:03:03.405817 [ 4072]: server/ctdb_recoverd.c:1372 Unable to run the 'startrecovery' event on cluster
2010/03/29 13:03:13.251469 [ 4068]: Event script timed out : startrecovery count : 0 pid : 4089
2010/03/29 13:03:13.251592 [ 4068]: server/eventscript.c:508 Sending SIGTERM to child pid:4089
2010/03/29 13:03:13.251615 [ 4068]: server/ctdb_recover.c:997 startrecovery event script failed (status -62)
2010/03/29 13:03:13.255105 [ 4089]: Timed out running script '/etc/ctdb/events.d/01.reclock startrecovery ' after 30.0 seconds pid :4089
2010/03/29 13:03:13.286627 [ 4089]: Logged timedout eventscript : { pstree -p; cat /proc/locks; ls -li /var/ctdb/ /var/ctdb/persistent; } >/tmp/ctdb.event.20100329130313.4089
2010/03/29 13:03:13.288128 [ 4072]: Dropped orphaned reply control with reqid:56
2010/03/29 13:03:13.290556 [ 4072]: Taking out recovery lock from recovery daemon
2010/03/29 13:03:13.290660 [ 4072]: Take the recovery lock
2010/03/29 13:03:13.293747 [ 4072]: Recovery lock taken successfully
2010/03/29 13:03:13.294370 [ 4072]: Recovery lock taken successfully by recovery daemon
2010/03/29 13:03:13.296576 [ 4068]: Freeze priority 1
2010/03/29 13:03:13.297871 [ 4068]: Freeze priority 2
2010/03/29 13:03:13.298974 [ 4068]: Freeze priority 3
2010/03/29 13:03:15.405748 [ 4072]: Dropped orphaned reply control with reqid:65593
2010/03/29 13:03:33.412597 [ 4072]: client/ctdb_client.c:771 control timed out. reqid:87 opcode:70 dstnode:0
2010/03/29 13:03:33.412757 [ 4072]: client/ctdb_client.c:882 ctdb_control_recv failed
2010/03/29 13:03:33.412772 [ 4072]: Async operation failed with state 3, opcode:70
2010/03/29 13:03:33.412787 [ 4072]: server/ctdb_recoverd.c:178 Node 0 failed the startrecovery event. Setting it as recovery fail culprit
2010/03/29 13:03:33.412807 [ 4072]: client/ctdb_client.c:771 control timed out. reqid:65624 opcode:70 dstnode:1
2010/03/29 13:03:33.412819 [ 4072]: client/ctdb_client.c:882 ctdb_control_recv failed
2010/03/29 13:03:33.412830 [ 4072]: Async operation failed with state 3, opcode:70
2010/03/29 13:03:33.412839 [ 4072]: server/ctdb_recoverd.c:178 Node 1 failed the startrecovery event. Setting it as recovery fail culprit
2010/03/29 13:03:33.412852 [ 4072]: Async wait failed - fail_count=2
2010/03/29 13:03:33.412862 [ 4072]: server/ctdb_recoverd.c:202 Unable to run the 'startrecovery' event. Recovery failed.
2010/03/29 13:03:33.412873 [ 4072]: server/ctdb_recoverd.c:1372 Unable to run the 'startrecovery' event on cluster
2010/03/29 13:03:43.301538 [ 4068]: Event script timed out : startrecovery count : 0 pid : 4106
2010/03/29 13:03:43.301682 [ 4068]: server/eventscript.c:508 Sending SIGTERM to child pid:4106
2010/03/29 13:03:43.301722 [ 4068]: server/ctdb_recover.c:997 startrecovery event script failed (status -62)
2010/03/29 13:03:43.305339 [ 4106]: Timed out running script '/etc/ctdb/events.d/01.reclock startrecovery ' after 30.0 seconds pid :4106
2010/03/29 13:03:43.337079 [ 4106]: Logged timedout eventscript : { pstree -p; cat /proc/locks; ls -li /var/ctdb/ /var/ctdb/persistent; } >/tmp/ctdb.event.20100329130343.4106
2010/03/29 13:03:43.339670 [ 4072]: Dropped orphaned reply control with reqid:87
2010/03/29 13:03:43.342006 [ 4072]: Taking out recovery lock from recovery daemon
2010/03/29 13:03:43.342042 [ 4072]: Take the recovery lock
2010/03/29 13:03:43.344683 [ 4072]: Recovery lock taken successfully
2010/03/29 13:03:43.345302 [ 4072]: Recovery lock taken successfully by recovery daemon
2010/03/29 13:03:43.347109 [ 4068]: Freeze priority 1
2010/03/29 13:03:43.348782 [ 4068]: Freeze priority 2
2010/03/29 13:03:43.351026 [ 4068]: Freeze priority 3
2010/03/29 13:03:44.406965 [ 4072]: Dropped orphaned reply control with reqid:65624
2010/03/29 13:04:03.395103 [ 4072]: client/ctdb_client.c:771 control timed out. reqid:118 opcode:70 dstnode:0
2010/03/29 13:04:03.395264 [ 4072]: client/ctdb_client.c:882 ctdb_control_recv failed
2010/03/29 13:04:03.395280 [ 4072]: Async operation failed with state 3, opcode:70
2010/03/29 13:04:03.395299 [ 4072]: server/ctdb_recoverd.c:178 Node 0 failed the startrecovery event. Setting it as recovery fail culprit
2010/03/29 13:04:03.395323 [ 4072]: client/ctdb_client.c:771 control timed out. reqid:65655 opcode:70 dstnode:1
2010/03/29 13:04:03.395335 [ 4072]: client/ctdb_client.c:882 ctdb_control_recv failed
2010/03/29 13:04:03.395345 [ 4072]: Async operation failed with state 3, opcode:70
2010/03/29 13:04:03.395355 [ 4072]: server/ctdb_recoverd.c:178 Node 1 failed the startrecovery event. Setting it as recovery fail culprit
2010/03/29 13:04:03.395368 [ 4072]: Async wait failed - fail_count=2
2010/03/29 13:04:03.395383 [ 4072]: server/ctdb_recoverd.c:202 Unable to run the 'startrecovery' event. Recovery failed.
2010/03/29 13:04:03.395396 [ 4072]: server/ctdb_recoverd.c:1372 Unable to run the 'startrecovery' event on cluster
2010/03/29 13:04:13.355993 [ 4068]: Event script timed out : startrecovery count : 0 pid : 4122
2010/03/29 13:04:13.356136 [ 4068]: server/eventscript.c:508 Sending SIGTERM to child pid:4122
2010/03/29 13:04:13.356160 [ 4068]: server/ctdb_recover.c:997 startrecovery event script failed (status -62)
2010/03/29 13:04:13.359687 [ 4122]: Timed out running script '/etc/ctdb/events.d/01.reclock startrecovery ' after 30.0 seconds pid :4122
2010/03/29 13:04:13.393713 [ 4122]: Logged timedout eventscript : { pstree -p; cat /proc/locks; ls -li /var/ctdb/ /var/ctdb/persistent; } >/tmp/ctdb.event.20100329130413.4122
2010/03/29 13:04:13.395814 [ 4072]: Dropped orphaned reply control with reqid:118
2010/03/29 13:04:13.398145 [ 4072]: Taking out recovery lock from recovery daemon
2010/03/29 13:04:13.398186 [ 4072]: Take the recovery lock
2010/03/29 13:04:13.457013 [ 4072]: Recovery lock taken successfully
2010/03/29 13:04:13.457673 [ 4072]: Recovery lock taken successfully by recovery daemon
2010/03/29 13:04:13.459520 [ 4072]: client/ctdb_client.c:718 reqid 65655 not found
2010/03/29 13:04:13.459786 [ 4068]: Freeze priority 1
2010/03/29 13:04:13.468732 [ 4068]: Freeze priority 2
2010/03/29 13:04:13.477207 [ 4068]: Freeze priority 3
2010/03/29 13:04:34.407459 [ 4072]: client/ctdb_client.c:771 control timed out. reqid:149 opcode:70 dstnode:0
2010/03/29 13:04:34.407578 [ 4072]: client/ctdb_client.c:882 ctdb_control_recv failed
2010/03/29 13:04:34.407594 [ 4072]: Async operation failed with state 3, opcode:70
2010/03/29 13:04:34.407605 [ 4072]: server/ctdb_recoverd.c:178 Node 0 failed the startrecovery event. Setting it as recovery fail culprit
2010/03/29 13:04:34.407653 [ 4072]: client/ctdb_client.c:771 control timed out. reqid:65686 opcode:70 dstnode:1
2010/03/29 13:04:34.407667 [ 4072]: client/ctdb_client.c:882 ctdb_control_recv failed
2010/03/29 13:04:34.407677 [ 4072]: Async operation failed with state 3, opcode:70
2010/03/29 13:04:34.407687 [ 4072]: server/ctdb_recoverd.c:178 Node 1 failed the startrecovery event. Setting it as recovery fail culprit
2010/03/29 13:04:34.407699 [ 4072]: Async wait failed - fail_count=2
2010/03/29 13:04:34.407709 [ 4072]: server/ctdb_recoverd.c:202 Unable to run the 'startrecovery' event. Recovery failed.
2010/03/29 13:04:34.407720 [ 4072]: server/ctdb_recoverd.c:1372 Unable to run the 'startrecovery' event on cluster
2010/03/29 13:04:43.499419 [ 4068]: Event script timed out : startrecovery count : 0 pid : 4137
2010/03/29 13:04:43.499565 [ 4068]: server/eventscript.c:508 Sending SIGTERM to child pid:4137
2010/03/29 13:04:43.499593 [ 4068]: server/ctdb_recover.c:997 startrecovery event script failed (status -62)
2010/03/29 13:04:43.503510 [ 4137]: Timed out running script '/etc/ctdb/events.d/01.reclock startrecovery ' after 30.0 seconds pid :4137
2010/03/29 13:04:43.536404 [ 4137]: Logged timedout eventscript : { pstree -p; cat /proc/locks; ls -li /var/ctdb/ /var/ctdb/persistent; } >/tmp/ctdb.event.20100329130443.4137
2010/03/29 13:04:43.537587 [ 4072]: Dropped orphaned reply control with reqid:149
2010/03/29 13:04:43.539184 [ 4068]: Banning this node for 300 seconds
2010/03/29 13:04:43.539768 [ 4072]: Taking out recovery lock from recovery daemon
2010/03/29 13:04:43.539786 [ 4072]: Take the recovery lock
2010/03/29 13:04:43.542203 [ 4072]: Recovery lock taken successfully
2010/03/29 13:04:43.542787 [ 4072]: Recovery lock taken successfully by recovery daemon
2010/03/29 13:04:43.544891 [ 4068]: Freeze priority 1
2010/03/29 13:04:43.545748 [ 4068]: Freeze priority 2
2010/03/29 13:04:43.546805 [ 4068]: Freeze priority 3
2010/03/29 13:04:45.403611 [ 4072]: Dropped orphaned reply control with reqid:65686
2010/03/29 13:04:51.177378 [ 4068]: Freeze priority 1
2010/03/29 13:04:52.177340 [ 4068]: Freeze priority 2
2010/03/29 13:04:53.177349 [ 4068]: Freeze priority 3
2010/03/29 13:05:04.404000 [ 4072]: client/ctdb_client.c:771 control timed out. reqid:182 opcode:70 dstnode:0
2010/03/29 13:05:04.404071 [ 4072]: client/ctdb_client.c:882 ctdb_control_recv failed
2010/03/29 13:05:04.404084 [ 4072]: Async operation failed with state 3, opcode:70
2010/03/29 13:05:04.404095 [ 4072]: server/ctdb_recoverd.c:178 Node 0 failed the startrecovery event. Setting it as recovery fail culprit
2010/03/29 13:05:04.404112 [ 4072]: client/ctdb_client.c:771 control timed out. reqid:65719 opcode:70 dstnode:1
2010/03/29 13:05:04.404123 [ 4072]: client/ctdb_client.c:882 ctdb_control_recv failed
2010/03/29 13:05:04.404133 [ 4072]: Async operation failed with state 3, opcode:70
2010/03/29 13:05:04.404142 [ 4072]: server/ctdb_recoverd.c:178 Node 1 failed the startrecovery event. Setting it as recovery fail culprit
2010/03/29 13:05:04.404155 [ 4072]: Async wait failed - fail_count=2
2010/03/29 13:05:04.404165 [ 4072]: server/ctdb_recoverd.c:202 Unable to run the 'startrecovery' event. Recovery failed.
2010/03/29 13:05:04.404176 [ 4072]: server/ctdb_recoverd.c:1372 Unable to run the 'startrecovery' event on cluster
2010/03/29 13:05:10.178025 [ 4068]: Freeze priority 1
2010/03/29 13:05:11.177985 [ 4068]: Freeze priority 2
2010/03/29 13:05:12.178079 [ 4068]: Freeze priority 3
2010/03/29 13:05:13.549876 [ 4068]: Event script timed out : startrecovery count : 0 pid : 4150
2010/03/29 13:05:13.549928 [ 4068]: server/eventscript.c:508 Sending SIGTERM to child pid:4150
2010/03/29 13:05:13.549947 [ 4068]: server/ctdb_recover.c:997 startrecovery event script failed (status -62)
2010/03/29 13:05:13.553916 [ 4150]: Timed out running script '/etc/ctdb/events.d/01.reclock startrecovery ' after 30.0 seconds pid :4150
2010/03/29 13:05:13.586760 [ 4150]: Logged timedout eventscript : { pstree -p; cat /proc/locks; ls -li /var/ctdb/ /var/ctdb/persistent; } >/tmp/ctdb.event.20100329130513.4150
2010/03/29 13:05:13.587930 [ 4072]: Dropped orphaned reply control with reqid:182
2010/03/29 13:05:14.459191 [ 4072]: client/ctdb_client.c:718 reqid 65719 not found
2010/03/29 13:05:21.531947 [ 4068]: Freeze priority 1
2010/03/29 13:05:21.535199 [ 4068]: Freeze priority 2
2010/03/29 13:05:21.535907 [ 4068]: Freeze priority 3
... repeated a trillion times ...
2010/03/29 13:09:43.541414 [ 4068]: Banning timedout
2010/03/29 13:09:45.767566 [ 4072]: Taking out recovery lock from recovery daemon
2010/03/29 13:09:45.767691 [ 4072]: Take the recovery lock
2010/03/29 13:09:45.769760 [ 4072]: Recovery lock taken successfully
2010/03/29 13:09:45.769929 [ 4072]: Recovery lock taken successfully by recovery daemon
2010/03/29 13:09:45.777248 [ 4068]: Freeze priority 1
2010/03/29 13:09:45.778275 [ 4068]: Freeze priority 2
2010/03/29 13:09:45.779334 [ 4068]: Freeze priority 3
2010/03/29 13:10:06.406493 [ 4072]: client/ctdb_client.c:771 control timed out. reqid:1298 opcode:70 dstnode:0
2010/03/29 13:10:06.406576 [ 4072]: client/ctdb_client.c:882 ctdb_control_recv failed
2010/03/29 13:10:06.406593 [ 4072]: Async operation failed with state 3, opcode:70
2010/03/29 13:10:06.406604 [ 4072]: server/ctdb_recoverd.c:178 Node 0 failed the startrecovery event. Setting it as recovery fail culprit
2010/03/29 13:10:06.406622 [ 4072]: client/ctdb_client.c:771 control timed out. reqid:66835 opcode:70 dstnode:1
2010/03/29 13:10:06.406634 [ 4072]: client/ctdb_client.c:882 ctdb_control_recv failed
2010/03/29 13:10:06.406643 [ 4072]: Async operation failed with state 3, opcode:70
2010/03/29 13:10:06.406686 [ 4072]: server/ctdb_recoverd.c:178 Node 1 failed the startrecovery event. Setting it as recovery fail culprit
2010/03/29 13:10:06.406701 [ 4072]: Async wait failed - fail_count=2
2010/03/29 13:10:06.406712 [ 4072]: server/ctdb_recoverd.c:202 Unable to run the 'startrecovery' event. Recovery failed.
2010/03/29 13:10:06.406722 [ 4072]: server/ctdb_recoverd.c:1372 Unable to run the 'startrecovery' event on cluster
2010/03/29 13:10:15.784993 [ 4068]: Event script timed out : startrecovery count : 0 pid : 4190
2010/03/29 13:10:15.785166 [ 4068]: server/eventscript.c:508 Sending SIGTERM to child pid:4190
2010/03/29 13:10:15.785194 [ 4068]: server/ctdb_recover.c:997 startrecovery event script failed (status -62)
2010/03/29 13:10:15.789274 [ 4190]: Timed out running script '/etc/ctdb/events.d/01.reclock startrecovery ' after 30.0 seconds pid :4190
2010/03/29 13:10:15.827736 [ 4190]: Logged timedout eventscript : { pstree -p; cat /proc/locks; ls -li /var/ctdb/ /var/ctdb/persistent; } >/tmp/ctdb.event.20100329131015.4190
2010/03/29 13:10:15.829889 [ 4072]: Dropped orphaned reply control with reqid:1298
2010/03/29 13:10:15.831340 [ 4072]: Taking out recovery lock from recovery daemon
2010/03/29 13:10:15.831359 [ 4072]: Take the recovery lock
2010/03/29 13:10:15.834004 [ 4072]: Recovery lock taken successfully
2010/03/29 13:10:15.834596 [ 4072]: Recovery lock taken successfully by recovery daemon
2010/03/29 13:10:15.836090 [ 4068]: Freeze priority 1
2010/03/29 13:10:15.837552 [ 4068]: Freeze priority 2
2010/03/29 13:10:15.838849 [ 4068]: Freeze priority 3
2010/03/29 13:10:18.406113 [ 4072]: Dropped orphaned reply control with reqid:66835
2010/03/29 13:10:36.406438 [ 4072]: client/ctdb_client.c:771 control timed out. reqid:1329 opcode:70 dstnode:0
2010/03/29 13:10:36.406540 [ 4072]: client/ctdb_client.c:882 ctdb_control_recv failed
2010/03/29 13:10:36.406555 [ 4072]: Async operation failed with state 3, opcode:70
2010/03/29 13:10:36.406572 [ 4072]: server/ctdb_recoverd.c:178 Node 0 failed the startrecovery event. Setting it as recovery fail culprit
2010/03/29 13:10:36.406593 [ 4072]: client/ctdb_client.c:771 control timed out. reqid:66866 opcode:70 dstnode:1
2010/03/29 13:10:36.406605 [ 4072]: client/ctdb_client.c:882 ctdb_control_recv failed
2010/03/29 13:10:36.406614 [ 4072]: Async operation failed with state 3, opcode:70
2010/03/29 13:10:36.406624 [ 4072]: server/ctdb_recoverd.c:178 Node 1 failed the startrecovery event. Setting it as recovery fail culprit
2010/03/29 13:10:36.406665 [ 4072]: Async wait failed - fail_count=2
2010/03/29 13:10:36.406681 [ 4072]: server/ctdb_recoverd.c:202 Unable to run the 'startrecovery' event. Recovery failed.
2010/03/29 13:10:36.406723 [ 4072]: server/ctdb_recoverd.c:1372 Unable to run the 'startrecovery' event on cluster
2010/03/29 13:10:45.841599 [ 4068]: Event script timed out : startrecovery count : 0 pid : 4203
2010/03/29 13:10:45.841755 [ 4068]: server/eventscript.c:508 Sending SIGTERM to child pid:4203
2010/03/29 13:10:45.841778 [ 4068]: server/ctdb_recover.c:997 startrecovery event script failed (status -62)
2010/03/29 13:10:45.845600 [ 4203]: Timed out running script '/etc/ctdb/events.d/01.reclock startrecovery ' after 30.0 seconds pid :4203
2010/03/29 13:10:45.898163 [ 4203]: Logged timedout eventscript : { pstree -p; cat /proc/locks; ls -li /var/ctdb/ /var/ctdb/persistent; } >/tmp/ctdb.event.20100329131045.4203
2010/03/29 13:10:45.899889 [ 4072]: Dropped orphaned reply control with reqid:1329
2010/03/29 13:10:45.901577 [ 4072]: Taking out recovery lock from recovery daemon
2010/03/29 13:10:45.901602 [ 4072]: Take the recovery lock
2010/03/29 13:10:45.904040 [ 4072]: Recovery lock taken successfully
2010/03/29 13:10:45.904622 [ 4072]: Recovery lock taken successfully by recovery daemon
2010/03/29 13:10:45.917889 [ 4068]: Freeze priority 1
2010/03/29 13:10:45.921511 [ 4068]: Freeze priority 2
2010/03/29 13:10:45.922413 [ 4068]: Freeze priority 3
2010/03/29 13:10:47.406762 [ 4072]: Dropped orphaned reply control with reqid:66866
2010/03/29 13:11:06.407194 [ 4072]: client/ctdb_client.c:771 control timed out. reqid:1360 opcode:70 dstnode:0
2010/03/29 13:11:06.407327 [ 4072]: client/ctdb_client.c:882 ctdb_control_recv failed
2010/03/29 13:11:06.407344 [ 4072]: Async operation failed with state 3, opcode:70
2010/03/29 13:11:06.407358 [ 4072]: server/ctdb_recoverd.c:178 Node 0 failed the startrecovery event. Setting it as recovery fail culprit
2010/03/29 13:11:06.407375 [ 4072]: client/ctdb_client.c:771 control timed out. reqid:66897 opcode:70 dstnode:1
2010/03/29 13:11:06.407386 [ 4072]: client/ctdb_client.c:882 ctdb_control_recv failed
2010/03/29 13:11:06.407397 [ 4072]: Async operation failed with state 3, opcode:70
2010/03/29 13:11:06.407407 [ 4072]: server/ctdb_recoverd.c:178 Node 1 failed the startrecovery event. Setting it as recovery fail culprit
2010/03/29 13:11:06.407419 [ 4072]: Async wait failed - fail_count=2
2010/03/29 13:11:06.407430 [ 4072]: server/ctdb_recoverd.c:202 Unable to run the 'startrecovery' event. Recovery failed.
2010/03/29 13:11:06.407440 [ 4072]: server/ctdb_recoverd.c:1372 Unable to run the 'startrecovery' event on cluster
2010/03/29 13:11:15.926629 [ 4068]: Event script timed out : startrecovery count : 0 pid : 4217
2010/03/29 13:11:15.926843 [ 4068]: server/eventscript.c:508 Sending SIGTERM to child pid:4217
2010/03/29 13:11:15.926875 [ 4068]: server/ctdb_recover.c:997 startrecovery event script failed (status -62)
2010/03/29 13:11:15.930802 [ 4217]: Timed out running script '/etc/ctdb/events.d/01.reclock startrecovery ' after 30.0 seconds pid :4217
2010/03/29 13:11:15.964261 [ 4217]: Logged timedout eventscript : { pstree -p; cat /proc/locks; ls -li /var/ctdb/ /var/ctdb/persistent; } >/tmp/ctdb.event.20100329131115.4217
2010/03/29 13:11:15.966026 [ 4072]: Dropped orphaned reply control with reqid:1360
2010/03/29 13:11:15.967511 [ 4072]: Taking out recovery lock from recovery daemon
2010/03/29 13:11:15.967561 [ 4072]: Take the recovery lock
2010/03/29 13:11:15.971621 [ 4072]: Recovery lock taken successfully
2010/03/29 13:11:15.972213 [ 4072]: Recovery lock taken successfully by recovery daemon
2010/03/29 13:11:15.974382 [ 4068]: Freeze priority 1
2010/03/29 13:11:15.975956 [ 4068]: Freeze priority 2
2010/03/29 13:11:15.977374 [ 4068]: Freeze priority 3
2010/03/29 13:11:17.407199 [ 4072]: Dropped orphaned reply control with reqid:66897
2010/03/29 13:11:36.395888 [ 4072]: client/ctdb_client.c:771 control timed out. reqid:1391 opcode:70 dstnode:0
2010/03/29 13:11:36.396046 [ 4072]: client/ctdb_client.c:882 ctdb_control_recv failed
2010/03/29 13:11:36.396063 [ 4072]: Async operation failed with state 3, opcode:70
2010/03/29 13:11:36.396075 [ 4072]: server/ctdb_recoverd.c:178 Node 0 failed the startrecovery event. Setting it as recovery fail culprit
2010/03/29 13:11:36.396091 [ 4072]: client/ctdb_client.c:771 control timed out. reqid:66928 opcode:70 dstnode:1
2010/03/29 13:11:36.396102 [ 4072]: client/ctdb_client.c:882 ctdb_control_recv failed
2010/03/29 13:11:36.396112 [ 4072]: Async operation failed with state 3, opcode:70
2010/03/29 13:11:36.396121 [ 4072]: server/ctdb_recoverd.c:178 Node 1 failed the startrecovery event. Setting it as recovery fail culprit
2010/03/29 13:11:36.396134 [ 4072]: Async wait failed - fail_count=2
2010/03/29 13:11:36.396144 [ 4072]: server/ctdb_recoverd.c:202 Unable to run the 'startrecovery' event. Recovery failed.
2010/03/29 13:11:36.396154 [ 4072]: server/ctdb_recoverd.c:1372 Unable to run the 'startrecovery' event on cluster
2010/03/29 13:11:45.980287 [ 4068]: Event script timed out : startrecovery count : 0 pid : 4230
2010/03/29 13:11:45.980444 [ 4068]: server/eventscript.c:508 Sending SIGTERM to child pid:4230
2010/03/29 13:11:45.980473 [ 4068]: server/ctdb_recover.c:997 startrecovery event script failed (status -62)
2010/03/29 13:11:45.984634 [ 4230]: Timed out running script '/etc/ctdb/events.d/01.reclock startrecovery ' after 30.0 seconds pid :4230
2010/03/29 13:11:46.017530 [ 4230]: Logged timedout eventscript : { pstree -p; cat /proc/locks; ls -li /var/ctdb/ /var/ctdb/persistent; } >/tmp/ctdb.event.20100329131145.4230
2010/03/29 13:11:46.019194 [ 4072]: Dropped orphaned reply control with reqid:1391
2010/03/29 13:11:46.020813 [ 4068]: Banning this node for 300 seconds
2010/03/29 13:11:46.021347 [ 4072]: Taking out recovery lock from recovery daemon
2010/03/29 13:11:46.021366 [ 4072]: Take the recovery lock
2010/03/29 13:11:46.023910 [ 4072]: Recovery lock taken successfully
2010/03/29 13:11:46.024553 [ 4072]: Recovery lock taken successfully by recovery daemon
2010/03/29 13:11:46.026092 [ 4068]: Freeze priority 1
2010/03/29 13:11:46.027286 [ 4068]: Freeze priority 2
2010/03/29 13:11:46.028310 [ 4068]: Freeze priority 3
2010/03/29 13:11:47.403852 [ 4072]: Dropped orphaned reply control with reqid:66928
2010/03/29 13:11:55.180817 [ 4068]: Freeze priority 1
2010/03/29 13:11:56.181889 [ 4068]: Freeze priority 2
2010/03/29 13:11:57.181754 [ 4068]: Freeze priority 3
2010/03/29 13:12:06.404353 [ 4072]: client/ctdb_client.c:771 control timed out. reqid:1424 opcode:70 dstnode:0
2010/03/29 13:12:06.404575 [ 4072]: client/ctdb_client.c:882 ctdb_control_recv failed
2010/03/29 13:12:06.404589 [ 4072]: Async operation failed with state 3, opcode:70
2010/03/29 13:12:06.404601 [ 4072]: server/ctdb_recoverd.c:178 Node 0 failed the startrecovery event. Setting it as recovery fail culprit
2010/03/29 13:12:06.404615 [ 4072]: client/ctdb_client.c:771 control timed out. reqid:66961 opcode:70 dstnode:1
2010/03/29 13:12:06.404626 [ 4072]: client/ctdb_client.c:882 ctdb_control_recv failed
2010/03/29 13:12:06.404636 [ 4072]: Async operation failed with state 3, opcode:70
2010/03/29 13:12:06.404646 [ 4072]: server/ctdb_recoverd.c:178 Node 1 failed the startrecovery event. Setting it as recovery fail culprit
2010/03/29 13:12:07.404284 [ 4072]: Async wait failed - fail_count=2
2010/03/29 13:12:07.404336 [ 4072]: server/ctdb_recoverd.c:202 Unable to run the 'startrecovery' event. Recovery failed.
2010/03/29 13:12:07.404351 [ 4072]: server/ctdb_recoverd.c:1372 Unable to run the 'startrecovery' event on cluster
2010/03/29 13:12:14.179726 [ 4068]: Freeze priority 1
2010/03/29 13:12:15.182130 [ 4068]: Freeze priority 2
2010/03/29 13:12:16.031421 [ 4068]: Event script timed out : startrecovery count : 0 pid : 4243
2010/03/29 13:12:16.031569 [ 4068]: server/eventscript.c:508 Sending SIGTERM to child pid:4243
2010/03/29 13:12:16.031615 [ 4068]: server/ctdb_recover.c:997 startrecovery event script failed (status -62)
2010/03/29 13:12:16.035529 [ 4243]: Timed out running script '/etc/ctdb/events.d/01.reclock startrecovery ' after 30.0 seconds pid :4243
2010/03/29 13:12:16.068546 [ 4243]: Logged timedout eventscript : { pstree -p; cat /proc/locks; ls -li /var/ctdb/ /var/ctdb/persistent; } >/tmp/ctdb.event.20100329131216.4243
2010/03/29 13:12:16.070264 [ 4072]: Dropped orphaned reply control with reqid:1424
2010/03/29 13:12:16.457302 [ 4072]: client/ctdb_client.c:718 reqid 66961 not found
2010/03/29 13:12:16.496875 [ 4068]: Freeze priority 3
2010/03/29 13:12:20.503948 [ 4068]: Freeze priority 1
2010/03/29 13:12:20.504355 [ 4068]: Freeze priority 2
... repeated a trillion times ...
2010/03/29 13:16:46.031944 [ 4068]: Banning timedout
2010/03/29 13:16:49.227965 [ 4072]: Taking out recovery lock from recovery daemon
2010/03/29 13:16:49.228086 [ 4072]: Take the recovery lock
2010/03/29 13:16:49.230146 [ 4072]: Recovery lock taken successfully
2010/03/29 13:16:49.230453 [ 4072]: Recovery lock taken successfully by recovery daemon
2010/03/29 13:16:49.232319 [ 4068]: Freeze priority 1
2010/03/29 13:16:49.233720 [ 4068]: Freeze priority 2
2010/03/29 13:16:49.234703 [ 4068]: Freeze priority 3
2010/03/29 13:17:09.405855 [ 4072]: client/ctdb_client.c:771 control timed out. reqid:2346 opcode:70 dstnode:0
2010/03/29 13:17:09.405994 [ 4072]: client/ctdb_client.c:882 ctdb_control_recv failed
2010/03/29 13:17:09.406011 [ 4072]: Async operation failed with state 3, opcode:70
2010/03/29 13:17:09.406022 [ 4072]: server/ctdb_recoverd.c:178 Node 0 failed the startrecovery event. Setting it as recovery fail culprit
2010/03/29 13:17:09.406039 [ 4072]: client/ctdb_client.c:771 control timed out. reqid:67883 opcode:70 dstnode:1
2010/03/29 13:17:09.406051 [ 4072]: client/ctdb_client.c:882 ctdb_control_recv failed
2010/03/29 13:17:09.406060 [ 4072]: Async operation failed with state 3, opcode:70
2010/03/29 13:17:09.406070 [ 4072]: server/ctdb_recoverd.c:178 Node 1 failed the startrecovery event. Setting it as recovery fail culprit
2010/03/29 13:17:09.406124 [ 4072]: Async wait failed - fail_count=2
2010/03/29 13:17:09.406137 [ 4072]: server/ctdb_recoverd.c:202 Unable to run the 'startrecovery' event. Recovery failed.
2010/03/29 13:17:09.406147 [ 4072]: server/ctdb_recoverd.c:1372 Unable to run the 'startrecovery' event on cluster
2010/03/29 13:17:19.238485 [ 4068]: Event script timed out : startrecovery count : 0 pid : 4285
2010/03/29 13:17:19.238622 [ 4068]: server/eventscript.c:508 Sending SIGTERM to child pid:4285
2010/03/29 13:17:19.238648 [ 4068]: server/ctdb_recover.c:997 startrecovery event script failed (status -62)
2010/03/29 13:17:19.242481 [ 4285]: Timed out running script '/etc/ctdb/events.d/01.reclock startrecovery ' after 30.0 seconds pid :4285
2010/03/29 13:17:19.276291 [ 4285]: Logged timedout eventscript : { pstree -p; cat /proc/locks; ls -li /var/ctdb/ /var/ctdb/persistent; } >/tmp/ctdb.event.20100329131719.4285
2010/03/29 13:17:19.277725 [ 4072]: Dropped orphaned reply control with reqid:2346
2010/03/29 13:17:19.279326 [ 4072]: Taking out recovery lock from recovery daemon
2010/03/29 13:17:19.279351 [ 4072]: Take the recovery lock
2010/03/29 13:17:19.281940 [ 4072]: Recovery lock taken successfully
2010/03/29 13:17:19.282370 [ 4072]: Recovery lock taken successfully by recovery daemon
2010/03/29 13:17:19.294234 [ 4068]: Freeze priority 1
2010/03/29 13:17:19.295657 [ 4068]: Freeze priority 2
2010/03/29 13:17:19.296822 [ 4068]: Freeze priority 3
2010/03/29 13:17:21.406882 [ 4072]: Dropped orphaned reply control with reqid:67883
2010/03/29 13:17:39.407638 [ 4072]: client/ctdb_client.c:771 control timed out. reqid:2377 opcode:70 dstnode:0
2010/03/29 13:17:39.407762 [ 4072]: client/ctdb_client.c:882 ctdb_control_recv failed
2010/03/29 13:17:39.407778 [ 4072]: Async operation failed with state 3, opcode:70
2010/03/29 13:17:39.407794 [ 4072]: server/ctdb_recoverd.c:178 Node 0 failed the startrecovery event. Setting it as recovery fail culprit
2010/03/29 13:17:39.407816 [ 4072]: client/ctdb_client.c:771 control timed out. reqid:67914 opcode:70 dstnode:1
2010/03/29 13:17:39.407828 [ 4072]: client/ctdb_client.c:882 ctdb_control_recv failed
2010/03/29 13:17:39.407838 [ 4072]: Async operation failed with state 3, opcode:70
2010/03/29 13:17:39.407848 [ 4072]: server/ctdb_recoverd.c:178 Node 1 failed the startrecovery event. Setting it as recovery fail culprit
2010/03/29 13:17:39.407860 [ 4072]: Async wait failed - fail_count=2
2010/03/29 13:17:39.407871 [ 4072]: server/ctdb_recoverd.c:202 Unable to run the 'startrecovery' event. Recovery failed.
2010/03/29 13:17:39.407882 [ 4072]: server/ctdb_recoverd.c:1372 Unable to run the 'startrecovery' event on cluster
2010/03/29 13:17:49.300931 [ 4068]: Event script timed out : startrecovery count : 0 pid : 4302
2010/03/29 13:17:49.301306 [ 4068]: server/eventscript.c:508 Sending SIGTERM to child pid:4302
2010/03/29 13:17:49.302280 [ 4068]: server/ctdb_recover.c:997 startrecovery event script failed (status -62)
2010/03/29 13:17:49.307562 [ 4302]: Timed out running script '/etc/ctdb/events.d/01.reclock startrecovery ' after 30.0 seconds pid :4302
2010/03/29 13:17:49.345069 [ 4302]: Logged timedout eventscript : { pstree -p; cat /proc/locks; ls -li /var/ctdb/ /var/ctdb/persistent; } >/tmp/ctdb.event.20100329131749.4302
2010/03/29 13:17:49.346456 [ 4072]: Dropped orphaned reply control with reqid:2377
2010/03/29 13:17:49.347964 [ 4072]: Taking out recovery lock from recovery daemon
2010/03/29 13:17:49.347985 [ 4072]: Take the recovery lock
2010/03/29 13:17:49.350187 [ 4072]: Recovery lock taken successfully
2010/03/29 13:17:49.350654 [ 4072]: Recovery lock taken successfully by recovery daemon
2010/03/29 13:17:49.352459 [ 4068]: Freeze priority 1
2010/03/29 13:17:49.353906 [ 4068]: Freeze priority 2
2010/03/29 13:17:49.354907 [ 4068]: Freeze priority 3
2010/03/29 13:17:50.404053 [ 4072]: Dropped orphaned reply control with reqid:67914
2010/03/29 13:18:09.406597 [ 4072]: client/ctdb_client.c:771 control timed out. reqid:2408 opcode:70 dstnode:0
2010/03/29 13:18:09.406706 [ 4072]: client/ctdb_client.c:882 ctdb_control_recv failed
2010/03/29 13:18:09.406748 [ 4072]: Async operation failed with state 3, opcode:70
2010/03/29 13:18:09.406761 [ 4072]: server/ctdb_recoverd.c:178 Node 0 failed the startrecovery event. Setting it as recovery fail culprit
2010/03/29 13:18:09.406778 [ 4072]: client/ctdb_client.c:771 control timed out. reqid:67945 opcode:70 dstnode:1
2010/03/29 13:18:09.406790 [ 4072]: client/ctdb_client.c:882 ctdb_control_recv failed
2010/03/29 13:18:09.406800 [ 4072]: Async operation failed with state 3, opcode:70
2010/03/29 13:18:09.406809 [ 4072]: server/ctdb_recoverd.c:178 Node 1 failed the startrecovery event. Setting it as recovery fail culprit
2010/03/29 13:18:09.406823 [ 4072]: Async wait failed - fail_count=2
2010/03/29 13:18:09.406833 [ 4072]: server/ctdb_recoverd.c:202 Unable to run the 'startrecovery' event. Recovery failed.
2010/03/29 13:18:09.406844 [ 4072]: server/ctdb_recoverd.c:1372 Unable to run the 'startrecovery' event on cluster
2010/03/29 13:18:19.358080 [ 4068]: Event script timed out : startrecovery count : 0 pid : 4315
2010/03/29 13:18:19.358304 [ 4068]: server/eventscript.c:508 Sending SIGTERM to child pid:4315
2010/03/29 13:18:19.358344 [ 4068]: server/ctdb_recover.c:997 startrecovery event script failed (status -62)
2010/03/29 13:18:19.362608 [ 4315]: Timed out running script '/etc/ctdb/events.d/01.reclock startrecovery ' after 30.0 seconds pid :4315
2010/03/29 13:18:19.395404 [ 4072]: Dropped orphaned reply control with reqid:2408
2010/03/29 13:18:19.458323 [ 4315]: Logged timedout eventscript : { pstree -p; cat /proc/locks; ls -li /var/ctdb/ /var/ctdb/persistent; } >/tmp/ctdb.event.20100329131819.4315
2010/03/29 13:18:19.459796 [ 4072]: client/ctdb_client.c:718 reqid 67945 not found
2010/03/29 13:18:19.500019 [ 4072]: Taking out recovery lock from recovery daemon
2010/03/29 13:18:19.500071 [ 4072]: Take the recovery lock
2010/03/29 13:18:19.502355 [ 4072]: Recovery lock taken successfully
2010/03/29 13:18:19.502502 [ 4072]: Recovery lock taken successfully by recovery daemon
2010/03/29 13:18:19.516028 [ 4068]: Freeze priority 1
2010/03/29 13:18:19.523465 [ 4068]: Freeze priority 2
2010/03/29 13:18:19.529760 [ 4068]: Freeze priority 3
2010/03/29 13:18:40.407183 [ 4072]: client/ctdb_client.c:771 control timed out. reqid:2439 opcode:70 dstnode:0
2010/03/29 13:18:40.407388 [ 4072]: client/ctdb_client.c:882 ctdb_control_recv failed
2010/03/29 13:18:40.407405 [ 4072]: Async operation failed with state 3, opcode:70
2010/03/29 13:18:40.407417 [ 4072]: server/ctdb_recoverd.c:178 Node 0 failed the startrecovery event. Setting it as recovery fail culprit
2010/03/29 13:18:40.407437 [ 4072]: client/ctdb_client.c:771 control timed out. reqid:67976 opcode:70 dstnode:1
2010/03/29 13:18:40.407449 [ 4072]: client/ctdb_client.c:882 ctdb_control_recv failed
2010/03/29 13:18:40.407459 [ 4072]: Async operation failed with state 3, opcode:70
2010/03/29 13:18:40.407469 [ 4072]: server/ctdb_recoverd.c:178 Node 1 failed the startrecovery event. Setting it as recovery fail culprit
2010/03/29 13:18:40.407482 [ 4072]: Async wait failed - fail_count=2
2010/03/29 13:18:40.407493 [ 4072]: server/ctdb_recoverd.c:202 Unable to run the 'startrecovery' event. Recovery failed.
2010/03/29 13:18:40.407503 [ 4072]: server/ctdb_recoverd.c:1372 Unable to run the 'startrecovery' event on cluster 2010/03/29 13:18:49.536144 [ 4068]: Event script timed out : startrecovery count : 0 pid : 4328 2010/03/29 13:18:49.536310 [ 4068]: server/eventscript.c:508 Sending SIGTERM to child pid:4328 2010/03/29 13:18:49.536353 [ 4068]: server/ctdb_recover.c:997 startrecovery event script failed (status -62) 2010/03/29 13:18:49.540471 [ 4328]: Timed out running script '/etc/ctdb/events.d/01.reclock startrecovery ' after 30.0 seconds pid :4328 2010/03/29 13:18:49.574999 [ 4328]: Logged timedout eventscript : { pstree -p; cat /proc/locks; ls -li /var/ctdb/ /var/ctdb/persistent; } >/tmp/ctdb.event.20100329131849.4328 2010/03/29 13:18:49.576372 [ 4072]: Dropped orphaned reply control with reqid:2439 2010/03/29 13:18:49.577864 [ 4068]: Banning this node for 300 seconds 2010/03/29 13:18:49.578420 [ 4072]: Taking out recovery lock from recovery daemon 2010/03/29 13:18:49.578461 [ 4072]: Take the recovery lock 2010/03/29 13:18:49.580985 [ 4072]: Recovery lock taken successfully 2010/03/29 13:18:49.581569 [ 4072]: Recovery lock taken successfully by recovery daemon 2010/03/29 13:18:49.583159 [ 4068]: Freeze priority 1 2010/03/29 13:18:49.583923 [ 4068]: Freeze priority 2 2010/03/29 13:18:49.584964 [ 4068]: Freeze priority 3 2010/03/29 13:18:52.403474 [ 4072]: Dropped orphaned reply control with reqid:67976 2010/03/29 13:18:58.181464 [ 4068]: Freeze priority 1 2010/03/29 13:18:59.181320 [ 4068]: Freeze priority 2 2010/03/29 13:19:00.181373 [ 4068]: Freeze priority 3 2010/03/29 13:19:10.396117 [ 4072]: client/ctdb_client.c:771 control timed out. reqid:2472 opcode:70 dstnode:0 2010/03/29 13:19:10.396183 [ 4072]: client/ctdb_client.c:882 ctdb_control_recv failed 2010/03/29 13:19:10.396197 [ 4072]: Async operation failed with state 3, opcode:70 2010/03/29 13:19:10.396208 [ 4072]: server/ctdb_recoverd.c:178 Node 0 failed the startrecovery event. 
Setting it as recovery fail culprit 2010/03/29 13:19:10.396223 [ 4072]: client/ctdb_client.c:771 control timed out. reqid:68009 opcode:70 dstnode:1 2010/03/29 13:19:10.396234 [ 4072]: client/ctdb_client.c:882 ctdb_control_recv failed 2010/03/29 13:19:10.396244 [ 4072]: Async operation failed with state 3, opcode:70 2010/03/29 13:19:10.396253 [ 4072]: server/ctdb_recoverd.c:178 Node 1 failed the startrecovery event. Setting it as recovery fail culprit 2010/03/29 13:19:10.396266 [ 4072]: Async wait failed - fail_count=2 2010/03/29 13:19:10.396276 [ 4072]: server/ctdb_recoverd.c:202 Unable to run the 'startrecovery' event. Recovery failed. 2010/03/29 13:19:10.396287 [ 4072]: server/ctdb_recoverd.c:1372 Unable to run the 'startrecovery' event on cluster 2010/03/29 13:19:17.181550 [ 4068]: Freeze priority 1 2010/03/29 13:19:18.181889 [ 4068]: Freeze priority 2 2010/03/29 13:19:19.181927 [ 4068]: Freeze priority 3 2010/03/29 13:19:19.588486 [ 4068]: Event script timed out : startrecovery count : 0 pid : 4341 2010/03/29 13:19:19.588650 [ 4068]: server/eventscript.c:508 Sending SIGTERM to child pid:4341 2010/03/29 13:19:19.588696 [ 4068]: server/ctdb_recover.c:997 startrecovery event script failed (status -62) 2010/03/29 13:19:19.592916 [ 4341]: Timed out running script '/etc/ctdb/events.d/01.reclock startrecovery ' after 30.0 seconds pid :4341 2010/03/29 13:19:19.601901 [ 4072]: Dropped orphaned reply control with reqid:2472 2010/03/29 13:19:19.648327 [ 4341]: Logged timedout eventscript : { pstree -p; cat /proc/locks; ls -li /var/ctdb/ /var/ctdb/persistent; } >/tmp/ctdb.event.20100329131919.4341 2010/03/29 13:19:20.459066 [ 4072]: client/ctdb_client.c:718 reqid 68009 not found 2010/03/29 13:19:27.525801 [ 4068]: Freeze priority 1 2010/03/29 13:19:27.528888 [ 4068]: Freeze priority 2 2010/03/29 13:19:27.529510 [ 4068]: Freeze priority 3 ... repeated a trillion times ... 
2010/03/29 13:23:49.582304 [ 4068]: Banning timedout
2010/03/29 13:23:52.992313 [ 4068]: Freeze priority 1
2010/03/29 13:23:52.992993 [ 4068]: Freeze priority 2
2010/03/29 13:23:52.993591 [ 4068]: Freeze priority 3
2010/03/29 13:24:23.017062 [ 4068]: Event script timed out : startrecovery count : 0 pid : 4384
2010/03/29 13:24:23.017247 [ 4068]: server/eventscript.c:508 Sending SIGTERM to child pid:4384
2010/03/29 13:24:23.017277 [ 4068]: server/ctdb_recover.c:997 startrecovery event script failed (status -62)
2010/03/29 13:24:23.021698 [ 4384]: Timed out running script '/etc/ctdb/events.d/01.reclock startrecovery ' after 30.0 seconds pid :4384
2010/03/29 13:24:23.074976 [ 4384]: Logged timedout eventscript : { pstree -p; cat /proc/locks; ls -li /var/ctdb/ /var/ctdb/persistent; } >/tmp/ctdb.event.20100329132423.4384
2010/03/29 13:24:23.075789 [ 4068]: Freeze priority 1
2010/03/29 13:24:23.076161 [ 4068]: Freeze priority 2
2010/03/29 13:24:23.076521 [ 4068]: Freeze priority 3
2010/03/29 13:24:53.078187 [ 4068]: Event script timed out : startrecovery count : 0 pid : 4396
2010/03/29 13:24:53.078332 [ 4068]: server/eventscript.c:508 Sending SIGTERM to child pid:4396
2010/03/29 13:24:53.078357 [ 4068]: server/ctdb_recover.c:997 startrecovery event script failed (status -62)
2010/03/29 13:24:53.083262 [ 4396]: Timed out running script '/etc/ctdb/events.d/01.reclock startrecovery ' after 30.0 seconds pid :4396
2010/03/29 13:24:53.117516 [ 4396]: Logged timedout eventscript : { pstree -p; cat /proc/locks; ls -li /var/ctdb/ /var/ctdb/persistent; } >/tmp/ctdb.event.20100329132453.4396
2010/03/29 13:24:53.131690 [ 4068]: Freeze priority 1
2010/03/29 13:24:53.132917 [ 4068]: Freeze priority 2
2010/03/29 13:24:53.133964 [ 4068]: Freeze priority 3
2010/03/29 13:25:23.144022 [ 4068]: Event script timed out : startrecovery count : 0 pid : 4409
2010/03/29 13:25:23.144178 [ 4068]: server/eventscript.c:508 Sending SIGTERM to child pid:4409
2010/03/29 13:25:23.144202 [ 4068]: server/ctdb_recover.c:997 startrecovery event script failed (status -62)
2010/03/29 13:25:23.148637 [ 4409]: Timed out running script '/etc/ctdb/events.d/01.reclock startrecovery ' after 30.0 seconds pid :4409
2010/03/29 13:25:23.188240 [ 4409]: Logged timedout eventscript : { pstree -p; cat /proc/locks; ls -li /var/ctdb/ /var/ctdb/persistent; } >/tmp/ctdb.event.20100329132523.4409
2010/03/29 13:25:23.233172 [ 4068]: Freeze priority 1
2010/03/29 13:25:23.234187 [ 4068]: Freeze priority 2
2010/03/29 13:25:23.235142 [ 4068]: Freeze priority 3
2010/03/29 13:25:53.238287 [ 4068]: Event script timed out : startrecovery count : 0 pid : 4422
2010/03/29 13:25:53.238507 [ 4068]: server/eventscript.c:508 Sending SIGTERM to child pid:4422
2010/03/29 13:25:53.238583 [ 4068]: server/ctdb_recover.c:997 startrecovery event script failed (status -62)
2010/03/29 13:25:53.245123 [ 4422]: Timed out running script '/etc/ctdb/events.d/01.reclock startrecovery ' after 30.0 seconds pid :4422
2010/03/29 13:25:53.284057 [ 4422]: Logged timedout eventscript : { pstree -p; cat /proc/locks; ls -li /var/ctdb/ /var/ctdb/persistent; } >/tmp/ctdb.event.20100329132553.4422
2010/03/29 13:25:53.321100 [ 4068]: Banning this node for 300 seconds
2010/03/29 13:25:53.330818 [ 4068]: Freeze priority 1
2010/03/29 13:25:53.331684 [ 4068]: Freeze priority 2
2010/03/29 13:25:53.332903 [ 4068]: Freeze priority 3
2010/03/29 13:26:23.335360 [ 4068]: Event script timed out : startrecovery count : 0 pid : 4435
2010/03/29 13:26:23.335544 [ 4068]: server/eventscript.c:508 Sending SIGTERM to child pid:4435
2010/03/29 13:26:23.335609 [ 4068]: server/ctdb_recover.c:997 startrecovery event script failed (status -62)
2010/03/29 13:26:23.340020 [ 4435]: Timed out running script '/etc/ctdb/events.d/01.reclock startrecovery ' after 30.0 seconds pid :4435
2010/03/29 13:26:23.378342 [ 4068]: Freeze priority 1
2010/03/29 13:26:23.379117 [ 4435]: Logged timedout eventscript : { pstree -p; cat /proc/locks; ls -li /var/ctdb/ /var/ctdb/persistent; } >/tmp/ctdb.event.20100329132623.4435
2010/03/29 13:26:23.379856 [ 4068]: Freeze priority 2
2010/03/29 13:26:23.380306 [ 4068]: Freeze priority 3
2010/03/29 13:26:27.397589 [ 4068]: Freeze priority 1
... repeated a trillion times ...
2010/03/29 13:27:26.458919 [ 4533]: Timed out running script '/etc/ctdb/events.d/01.reclock shutdown ' after 1.2 seconds pid :4533
2010/03/29 13:27:26.461119 [ 4533]: Timed out running script '/etc/ctdb/events.d/01.reclock shutdown ' after 1.2 seconds pid :4533
2010/03/29 13:27:26.538304 [ 4533]: Logged timedout eventscript : { pstree -p; cat /proc/locks; ls -li /var/ctdb/ /var/ctdb/persistent; } >/tmp/ctdb.event.20100329132726.4533