Mongod spawn error (WiredTiger library panic)

I've started getting this error. It's different from past spawn errors.

mongod.stdout:

2018-01-30T21:41:28.317-0800 I CONTROL  [initandlisten] MongoDB starting : pid=5670 port=38001 dbpath=/mnt/data/cryosparc/cryosparc/cryosparc/run/db 64-bit host=ishtar
2018-01-30T21:41:28.317-0800 I CONTROL  [initandlisten] db version v3.2.9
2018-01-30T21:41:28.317-0800 I CONTROL  [initandlisten] git version: 22ec9e93b40c85fc7cae7d56e7d6a02fd811088c
2018-01-30T21:41:28.317-0800 I CONTROL  [initandlisten] allocator: tcmalloc
2018-01-30T21:41:28.317-0800 I CONTROL  [initandlisten] modules: none
2018-01-30T21:41:28.317-0800 I CONTROL  [initandlisten] build environment:
2018-01-30T21:41:28.317-0800 I CONTROL  [initandlisten]     distarch: x86_64
2018-01-30T21:41:28.317-0800 I CONTROL  [initandlisten]     target_arch: x86_64
2018-01-30T21:41:28.317-0800 I CONTROL  [initandlisten] options: { net: { port: 38001 }, replication: { oplogSizeMB: 64, replSet: "meteor" }, storage: { dbPath: "/mnt/data/cryosparc/cryosparc/cryosparc/run/db", journal: { enabled: false } } }
2018-01-30T21:41:28.354-0800 I -        [initandlisten] Detected data files in /mnt/data/cryosparc/cryosparc/cryosparc/run/db created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.
2018-01-30T21:41:28.354-0800 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=55G,session_max=20000,eviction=(threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),,log=(enabled=false),
2018-01-30T21:41:28.362-0800 E STORAGE  [initandlisten] WiredTiger (22) [1517377288:362556][5670:0x7fa8032accc0], file:WiredTiger.wt, connection: live.avail: merge range 16384-24576 overlaps with existing range 20480-24576: Invalid argument
2018-01-30T21:41:28.362-0800 E STORAGE  [initandlisten] WiredTiger (-31804) [1517377288:362597][5670:0x7fa8032accc0], file:WiredTiger.wt, connection: the process must exit and restart: WT_PANIC: WiredTiger library panic
2018-01-30T21:41:28.362-0800 I -        [initandlisten] Fatal Assertion 28558
2018-01-30T21:41:28.362-0800 I -        [initandlisten]

***aborting after fassert() failure

@apunjani @spunjani We had about 6,000 experiments in this database, so I’d like to recover it. Can you give me any input, or let me know what other logs to send?

I’m going to have to spin up a fresh instance pretty soon because the lab is clamoring to resume processing, but then I’ll have to merge them or set the old one to read-only somehow.

Hi @DanielAsarnow, sorry for the delay.

Any chance you have a backup of the db?

Is there more to that log file before the error? Can you actually send me the whole thing? Did the disk get full at any point?
How large is the run/db folder? If it’s not too large, can you send it to me? (I can provide scp credentials via email.)
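
To check both of those, something like the following should work (just a sketch, using the dbpath from the log above):

    # free space on the volume holding the database
    df -h /mnt/data
    # total size of the db folder
    du -sh /mnt/data/cryosparc/cryosparc/cryosparc/run/db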

Ali

A mongodb repair might work, but I’m not sure. This seems to be the most closely related issue: https://jira.mongodb.org/browse/SERVER-16210

To do the database repair:

From inside the cryosparc install dir:

  1. cryosparc stop
  2. killall mongod (just in case)
  3. make a copy of the db:
    cp -r run/db run/db_backup
  4. eval $(cryosparc env)
  5. Attempt to repair:
    mongod --dbpath run/db --repair

If this is successful, you should have a working db, though some files that got corrupted may be missing. Hopefully only the last few jobs would be affected.
Let me know what happens.
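
If the repair completes, a quick sanity check before restarting cryosparc might look like this (a sketch; the port and dbpath are taken from the log above):

    # start mongod by hand against the repaired db
    mongod --dbpath run/db --port 38001 --fork --logpath /tmp/mongod_check.log
    # confirm the data is readable by listing databases
    mongo --port 38001 --eval "db.adminCommand('listDatabases')"
    # shut it down again so cryosparc can take over
    mongod --dbpath run/db --shutdown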

We’re working on this separately; when we get it figured out, I’ll edit this post with the conclusions.

We are having the same problem and cannot access any of our data! How did you solve it? Thanks so much.

Best,
Xing

Restore your database from an automatic backup. You turned those on, didn’t you? :wink:

The only other method is to compile custom versions of the mongo/WiredTiger toolchain to recover that one broken index. But there’s no way to know which .wt file is involved, so it would be trial and error even if you managed to get the custom toolchain working. We decided to cut our losses and zero out the db, and then be careful to run at least weekly backups (using the automatic backup feature). If anything goes wrong now, we can just rewind a few days.
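
If for some reason you can’t use the built-in automatic backups, even a weekly mongodump from cron is far better than nothing. A minimal sketch (the port matches the log above; the backup path is an assumption):

    # dump the live cryosparc database to a dated folder
    mongodump --port 38001 --out /mnt/data/backups/cryosparc-$(date +%F)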

Good luck!

Thanks, Daniel,

Unfortunately, we were using all the default settings and didn’t turn on the auto-backup. :sob:

Best,
Xing