… | |
… | |
1538 | |
1539 | So when you encounter spurious, unexplained daemon exits, make sure you |
1540 | ignore SIGPIPE (and maybe make sure you log the exit status of your daemon |
1541 | somewhere, as that would have given you a big clue). |
1542 | |
1543 | =head3 The special problem of accept()ing when you can't |
1544 | |
1545 | Many implementations of the POSIX C<accept> function (for example, |
1546 | found in post-2004 Linux) have the peculiar behaviour of not removing a |
1547 | connection from the pending queue in all error cases. |
1548 | |
1549 | For example, larger servers often run out of file descriptors (because |
1550 | of resource limits), causing C<accept> to fail with C<EMFILE> or C<ENFILE> but not |
1551 | rejecting the connection, leading to libev signalling readiness on |
1552 | the next iteration again (the connection still exists after all), and |
1553 | typically causing the program to loop at 100% CPU usage. |
1554 | |
1555 | Unfortunately, the set of errors that cause this issue differs between |
1556 | operating systems, there is usually little the app can do to remedy the |
1557 | situation, and no thread-safe method of removing the connection to |
1558 | cope with overload is known (to me). |
1559 | |
1560 | One of the easiest ways to handle this situation is to just ignore it |
1561 | - when the program encounters an overload, it will just loop until the |
1562 | situation is over. While this is a form of busy waiting, no OS offers an |
1563 | event-based way to handle this situation, so it's the best one can do. |
1564 | |
1565 | A better way to handle the situation is to log any errors other than |
1566 | C<EAGAIN> and C<EWOULDBLOCK>, making sure not to flood the log with such |
1567 | messages, and continue as usual, which at least gives the user an idea of |
1568 | what could be wrong ("raise the ulimit!"). For extra points one could stop |
1569 | the C<ev_io> watcher on the listening fd "for a while", which reduces CPU |
1570 | usage. |
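The logging part of this approach might look roughly like the following sketch. All names here (C<LOG_INTERVAL>, C<accept_error_is_reportable>, C<log_allowed>, C<report_accept_error>) and the ten-second interval are assumptions made for the illustration, not part of libev.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical policy: at most one log line per LOG_INTERVAL seconds. */
#define LOG_INTERVAL 10

/* Returns 1 if the errno value from accept() is worth reporting,
   0 if it only means "no connection pending right now". */
static int
accept_error_is_reportable (int err)
{
  return err != EAGAIN && err != EWOULDBLOCK;
}

/* Returns 1 and records now in *last_logged if a message may be
   logged, 0 if one was logged less than LOG_INTERVAL seconds ago.
   *last_logged == 0 means nothing has been logged yet. */
static int
log_allowed (time_t now, time_t *last_logged)
{
  if (*last_logged && now - *last_logged < LOG_INTERVAL)
    return 0;

  *last_logged = now;
  return 1;
}

/* Call after accept() returned -1, passing the saved errno.
   Returns 1 if a message was written to stderr, 0 otherwise. */
static int
report_accept_error (int err, time_t now, time_t *last_logged)
{
  if (!accept_error_is_reportable (err) || !log_allowed (now, last_logged))
    return 0;

  fprintf (stderr, "accept: %s (raise the ulimit?)\n", strerror (err));
  return 1;
}
```

In a libev program the same callback could also call C<ev_io_stop> on the listening watcher and restart it from an C<ev_timer> after a short delay; that part is left out here so the sketch depends only on libc.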
1571 | |
1572 | If your program is single-threaded, then you could also keep a dummy file |
1573 | descriptor for overload situations (e.g. by opening F</dev/null>), and |
1574 | when you run into C<ENFILE> or C<EMFILE>, close it, run C<accept>, |
1575 | close that fd, and create a new dummy fd. This will gracefully refuse |
1576 | clients under typical overload conditions. |
1577 | |
1578 | The last way to handle it is to simply log the error and C<exit>, as |
1579 | is often done with C<malloc> failures, but this results in an easy |
1580 | opportunity for a DoS attack. |
1581 | |
1582 | =head3 Watcher-Specific Functions |
1583 | |
1584 | =over 4 |
1585 | |