Mongod spawn error (WiredTiger library panic)

I've started getting this error. It's different from past spawn errors.

mongod.stdout:

2018-01-30T21:41:28.317-0800 I CONTROL  [initandlisten] MongoDB starting : pid=5670 port=38001 dbpath=/mnt/data/cryosparc/cryosparc/cryosparc/run/db 64-bit host=ishtar
2018-01-30T21:41:28.317-0800 I CONTROL  [initandlisten] db version v3.2.9
2018-01-30T21:41:28.317-0800 I CONTROL  [initandlisten] git version: 22ec9e93b40c85fc7cae7d56e7d6a02fd811088c
2018-01-30T21:41:28.317-0800 I CONTROL  [initandlisten] allocator: tcmalloc
2018-01-30T21:41:28.317-0800 I CONTROL  [initandlisten] modules: none
2018-01-30T21:41:28.317-0800 I CONTROL  [initandlisten] build environment:
2018-01-30T21:41:28.317-0800 I CONTROL  [initandlisten]     distarch: x86_64
2018-01-30T21:41:28.317-0800 I CONTROL  [initandlisten]     target_arch: x86_64
2018-01-30T21:41:28.317-0800 I CONTROL  [initandlisten] options: { net: { port: 38001 }, replication: { oplogSizeMB: 64, replSet: "meteor" }, storage: { dbPath: "/mnt/data/cryosparc/cryosparc/cryosparc/run/db", journal: { enabled: false } } }
2018-01-30T21:41:28.354-0800 I -        [initandlisten] Detected data files in /mnt/data/cryosparc/cryosparc/cryosparc/run/db created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.
2018-01-30T21:41:28.354-0800 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=55G,session_max=20000,eviction=(threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),,log=(enabled=false),
2018-01-30T21:41:28.362-0800 E STORAGE  [initandlisten] WiredTiger (22) [1517377288:362556][5670:0x7fa8032accc0], file:WiredTiger.wt, connection: live.avail: merge range 16384-24576 overlaps with existing range 20480-24576: Invalid argument
2018-01-30T21:41:28.362-0800 E STORAGE  [initandlisten] WiredTiger (-31804) [1517377288:362597][5670:0x7fa8032accc0], file:WiredTiger.wt, connection: the process must exit and restart: WT_PANIC: WiredTiger library panic
2018-01-30T21:41:28.362-0800 I -        [initandlisten] Fatal Assertion 28558
2018-01-30T21:41:28.362-0800 I -        [initandlisten]

***aborting after fassert() failure

@apunjani @spunjani We had about 6,000 experiments in this database, so I’d like to recover it. Can you give me any input, or let me know what other logs to send?

I’m going to have to spin up a fresh instance pretty soon because the lab is clamoring to resume processing, but then I’ll have to merge them or set the old one to read-only somehow.

Hi @DanielAsarnow, sorry for the delay.

Any chance you have a backup of the db?

Is there more to that log file before the error? Can you actually send me the whole thing? Did the disk get full at any point?
How large is the run/db folder? If it’s not too large, can you send it to me? (I can provide scp credentials via email.)
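
To check both of those, something like the following should work (just a sketch, using the dbpath from the log above):

    # free space on the volume holding the database
    df -h /mnt/data
    # total size of the db folder
    du -sh /mnt/data/cryosparc/cryosparc/cryosparc/run/db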

Ali

A mongodb repair might work, but I’m not sure. This seems to be the most closely related issue: https://jira.mongodb.org/browse/SERVER-16210

To do the database repair:

From inside the cryosparc install dir:

  1. cryosparc stop
  2. killall mongod (just in case)
  3. make a copy of the db:
    cp -r run/db run/db_backup
  4. eval $(cryosparc env)
  5. Attempt to repair:
    mongod --dbpath run/db --repair

If this is successful, you should have a working db, though some files that got corrupted may be missing. Hopefully only the last few jobs would be affected.
Let me know what happens.
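
If the repair completes, a quick sanity check before restarting cryosparc might look like this (a sketch; the port and dbpath are taken from the log above):

    # start mongod by hand against the repaired db
    mongod --dbpath run/db --port 38001 --fork --logpath /tmp/mongod_check.log
    # confirm the data is readable by listing databases
    mongo --port 38001 --eval "db.adminCommand('listDatabases')"
    # shut it down again so cryosparc can take over
    mongod --dbpath run/db --shutdown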

We’re working on this separately; when we get it figured out, I’ll edit this post with the conclusions.

We are having the same problem and cannot access any of our data! How did you solve it? Thanks so much.

Best,
Xing

Restore your database from an automatic backup. You turned those on, didn’t you? :wink:

The only other method is to compile custom versions of the mongo/WiredTiger toolchain to recover that one broken index. But there’s no way to know which .wt file is involved, so it would be trial and error even if you managed to get the custom toolchain working. We decided to cut our losses and zero out the db, and then be careful to run at least weekly backups (using the automatic backup feature). If anything goes wrong now, we can just rewind a few days.
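
If for some reason you can’t use the built-in automatic backups, even a weekly mongodump from cron is far better than nothing. A minimal sketch (the port matches the log above; the backup path is an assumption):

    # dump the live cryosparc database to a dated folder
    mongodump --port 38001 --out /mnt/data/backups/cryosparc-$(date +%F)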

Good luck!

Thanks, Daniel,

Unfortunately, we were using all the default settings and didn’t turn on the auto-backup. :sob:

Best,
Xing