[Bug 282217] databases/couchdb3: Crashes after a few minutes

From: <bugzilla-noreply_at_freebsd.org>
Date: Sun, 20 Oct 2024 01:24:27 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=282217

            Bug ID: 282217
           Summary: databases/couchdb3: Crashes after a few minutes
           Product: Ports & Packages
           Version: Latest
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Many People
          Priority: ---
         Component: Individual Port(s)
          Assignee: dch@freebsd.org
          Reporter: erik@nordstroem.no
             Flags: maintainer-feedback?(dch@freebsd.org)

I am running CouchDB 3.3.3, installed from the latest package available in the
quarterly package repository.

Package version couchdb3-3.3.3_1

Host OS: FreeBSD 14.1-RELEASE-p5

I created a jail, with a config that looks like this:

    db-4 {
      host.hostname = "db-4";

      vnet;
      vnet.interface = "e12b_db_4";
      vnet.interface += "e13b_db_4";

      exec.poststop += "ifconfig e12a_db_4 destroy";
      exec.poststop += "ifconfig e13a_gh_4 destroy";
      exec.poststop += "ifconfig epair12 create";
      exec.poststop += "ifconfig epair13 create";

      sysvsem = "new";
      sysvmsg = "new";
      sysvshm = "new";
    }

The jail has the same version of FreeBSD as the host.

I installed CouchDB 3.3.3 with `pkg install couchdb3`.

I edited the `/usr/local/etc/couchdb3/local.ini` config file according to the
package message, and set it to listen on the IPv6 wildcard (unspecified)
address `::`.

I enabled the couchdb3 service and started it.

I sent the HTTP PUT requests as instructed:

    curl -X PUT 'http://admin:notmyactualpasswordobviously@[::]:5984/_users'

    curl -X PUT 'http://admin:notmyactualpasswordobviously@[::]:5984/_replicator'

    curl -X PUT 'http://admin:notmyactualpasswordobviously@[::]:5984/_global_changes'

The first PUT command succeeded. The second and third ones returned error
responses similar to:

    {"error":"unknown_error","reason":"badarg","ref":1926143316}

After a few minutes, CouchDB stops running: `ps wwwaux | grep couchdb3` no
longer shows any matching process.

I enabled logging to a file at level `info`.

The first few lines of the CouchDB log file look like the following:

    [info] 2024-10-20T00:46:44.941085Z couchdb@127.0.0.1 <0.254.0> -------- Preflight check: Checking For Monsters
    [info] 2024-10-20T00:46:44.941137Z couchdb@127.0.0.1 <0.254.0> -------- Preflight check: Asserting Admin Account
    [info] 2024-10-20T00:46:44.943129Z couchdb@127.0.0.1 <0.254.0> -------- Apache CouchDB 3.3.3 is starting.
    [info] 2024-10-20T00:46:44.943167Z couchdb@127.0.0.1 <0.255.0> -------- Starting couch_sup
    [info] 2024-10-20T00:46:44.974439Z couchdb@127.0.0.1 <0.254.0> -------- Apache CouchDB has started. Time to relax.
    [notice] 2024-10-20T00:46:44.978854Z couchdb@127.0.0.1 <0.329.0> -------- rexi_server : started servers
    [notice] 2024-10-20T00:46:44.979659Z couchdb@127.0.0.1 <0.333.0> -------- rexi_buffer : started servers
    [error] 2024-10-20T00:46:45.011115Z couchdb@127.0.0.1 emulator -------- Error in process <0.353.0> on node 'couchdb@127.0.0.1' with exit value:
    {badarg,[{erlang,list_to_integer,[[48,48,48,13,48,13,1,8],16],[{error_info,#{module => erl_erts_errors}}]},{mem3_util,'-build_shards_by_node/2-fun-0-',5,[{file,"src/mem3_util.erl"},{line,228}]},{lists,map,2,[{file,"lists.erl"},{line,1315}]},{lists,flatmap_1,2,[{file,"lists.erl"},{line,1335}]},{mem3_shards,fold_fun,2,[{file,"src/mem3_shards.erl"},{line,332}]},{couch_bt_engine,drop_reductions,4,[{file,"src/couch_bt_engine.erl"},{line,1161}]},{couch_bt_engine,skip_deleted,4,[{file,"src/couch_bt_engine.erl"},{line,1151}]},{couch_btree,stream_kv_node2,8,[{file,"src/couch_btree.erl"},{line,1167}]}]}

    [error] 2024-10-20T00:46:45.011204Z couchdb@127.0.0.1 emulator -------- Error in process <0.353.0> on node 'couchdb@127.0.0.1' with exit value:
    {badarg,[{erlang,list_to_integer,[[48,48,48,13,48,13,1,8],16],[{error_info,#{module => erl_erts_errors}}]},{mem3_util,'-build_shards_by_node/2-fun-0-',5,[{file,"src/mem3_util.erl"},{line,228}]},{lists,map,2,[{file,"lists.erl"},{line,1315}]},{lists,flatmap_1,2,[{file,"lists.erl"},{line,1335}]},{mem3_shards,fold_fun,2,[{file,"src/mem3_shards.erl"},{line,332}]},{couch_bt_engine,drop_reductions,4,[{file,"src/couch_bt_engine.erl"},{line,1161}]},{couch_bt_engine,skip_deleted,4,[{file,"src/couch_bt_engine.erl"},{line,1151}]},{couch_btree,stream_kv_node2,8,[{file,"src/couch_btree.erl"},{line,1167}]}]}

    [error] 2024-10-20T00:46:45.011323Z couchdb@127.0.0.1 emulator -------- Error in process <0.363.0> on node 'couchdb@127.0.0.1' with exit value:
    {badarg,[{erlang,list_to_integer,[[48,48,48,13,48,13,1,8],16],[{error_info,#{module => erl_erts_errors}}]},{mem3_util,'-build_shards_by_node/2-fun-0-',5,[{file,"src/mem3_util.erl"},{line,228}]},{lists,map,2,[{file,"lists.erl"},{line,1315}]},{lists,flatmap_1,2,[{file,"lists.erl"},{line,1335}]},{mem3_shards,fold_fun,2,[{file,"src/mem3_shards.erl"},{line,332}]},{couch_bt_engine,drop_reductions,4,[{file,"src/couch_bt_engine.erl"},{line,1161}]},{couch_bt_engine,skip_deleted,4,[{file,"src/couch_bt_engine.erl"},{line,1151}]},{couch_btree,stream_kv_node2,8,[{file,"src/couch_btree.erl"},{line,1167}]}]}

    [error] 2024-10-20T00:46:45.011405Z couchdb@127.0.0.1 emulator -------- Error in process <0.363.0> on node 'couchdb@127.0.0.1' with exit value:
    {badarg,[{erlang,list_to_integer,[[48,48,48,13,48,13,1,8],16],[{error_info,#{module => erl_erts_errors}}]},{mem3_util,'-build_shards_by_node/2-fun-0-',5,[{file,"src/mem3_util.erl"},{line,228}]},{lists,map,2,[{file,"lists.erl"},{line,1315}]},{lists,flatmap_1,2,[{file,"lists.erl"},{line,1335}]},{mem3_shards,fold_fun,2,[{file,"src/mem3_shards.erl"},{line,332}]},{couch_bt_engine,drop_reductions,4,[{file,"src/couch_bt_engine.erl"},{line,1161}]},{couch_bt_engine,skip_deleted,4,[{file,"src/couch_bt_engine.erl"},{line,1151}]},{couch_btree,stream_kv_node2,8,[{file,"src/couch_btree.erl"},{line,1167}]}]}

The log file quickly fills with error messages and grows to gigabytes in size
within just a few minutes.

One of the error messages that keeps showing up says "custodian shard checker
died":

    [notice] 2024-10-20T01:19:03.987934Z couchdb@127.0.0.1 <0.2139.0> -------- custodian shard checker died
    {badarg,[{erlang,list_to_integer,[[48,48,48,13,48,13,1,8],16],[{error_info,#{module => erl_erts_errors}}]},{mem3_util,'-build_shards_by_node/2-fun-0-',5,[{file,"src/mem3_util.erl"},{line,228}]},{lists,map,2,[{file,"lists.erl"},{line,1315}]},{lists,flatmap_1,2,[{file,"lists.erl"},{line,1335}]},{custodian_util,fold_dbs1,2,[{file,"src/custodian_util.erl"},{line,83}]},{couch_bt_engine,drop_reductions,4,[{file,"src/couch_bt_engine.erl"},{line,1161}]},{couch_bt_engine,skip_deleted,4,[{file,"src/couch_bt_engine.erl"},{line,1151}]},{couch_btree,stream_kv_node2,8,[{file,"src/couch_btree.erl"},{line,1167}]}]}

Apart from that one, the remaining errors look similar to the ones at the
beginning of the log.
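
Every trace shows the same failing call: `list_to_integer/2` with the character
list `[48,48,48,13,48,13,1,8]` and base 16. As a possible starting point for
diagnosis, here is a minimal Python sketch (not part of CouchDB; the helper
name `describe` is made up for illustration) that decodes that list and marks
which characters are not hexadecimal digits. It only inspects the argument
and makes no claim about where the value came from.

```python
# Character codes copied from the list_to_integer/2 call in the traces.
raw = [48, 48, 48, 13, 48, 13, 1, 8]

def describe(code):
    """Return the character itself if printable, else a \\xNN escape."""
    ch = chr(code)
    return ch if ch.isprintable() else f"\\x{code:02x}"

# Human-readable view of each character in the argument.
decoded = [describe(c) for c in raw]
print(decoded)

# Positions (0-based) and codes of characters that are not hex digits;
# any such character makes a base-16 integer parse fail.
HEX_DIGITS = set("0123456789abcdefABCDEF")
invalid = [(i, c) for i, c in enumerate(raw) if chr(c) not in HEX_DIGITS]
print(invalid)
```

The list decodes to three `0` characters, a carriage return, another `0`,
another carriage return, and the control characters 0x01 and 0x08, so it is
not a valid hexadecimal string. That would be consistent with the `badarg`
result.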

I would appreciate some help here, as I cannot tell from these logs what is
actually going wrong.
