Last weekend (July 3rd, 2011), Washington had a tremendous thunderstorm that led to a long power outage. A number of experimental and development virtual machines in our local server farm were affected, but the hardest hit was an instance of Zimbra (an excellent email and collaboration environment) that we use to support some public outreach programs (such as the NGP).
Zimbra uses OpenLDAP, which in turn is built on Berkeley DB (BDB, now maintained by Oracle). BDB is open, simple, fast, and powerful, but it really dislikes power failures. BDB keeps transaction logs (files with names like "log.0000000001") that guarantee the validity of database transactions, and deleting them inadvertently will render the database unusable. So what do you do if your Zimbra instance doesn't come up, and /var/log/zimbra.log shows that the LDAP server is to blame (with "bdb()" errors)?
First, know that you are not alone: BDB recovery is so common that Berkeley DB ships a dedicated "db_recover" command.
As root, run:
/etc/init.d/zimbra stop
cd /opt/zimbra/data/ldap/hdb/db/
/opt/zimbra/bdb/bin/db_recover
/opt/zimbra/libexec/zmfixperms --extended
/etc/init.d/zimbra start
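The stop/recover/fix/start sequence is easy to get out of order in the middle of an outage, so here is a sketch of it as a shell function with a dry-run switch. The function names, the `DRY_RUN` and `ZIMBRA_LDAP_DB` variables, and the assumption that db_recover sits in /opt/zimbra/bdb/bin next to db_dump and db_load are my own choices, not anything Zimbra provides.

```shell
#!/bin/sh
# Hypothetical wrapper around the manual recovery steps (run as root).
# Set DRY_RUN=1 to print each command instead of executing it.
# Assumption: db_recover lives in /opt/zimbra/bdb/bin next to db_dump/db_load.

ZIMBRA_LDAP_DB=${ZIMBRA_LDAP_DB:-/opt/zimbra/data/ldap/hdb/db}
DB_RECOVER=${DB_RECOVER:-/opt/zimbra/bdb/bin/db_recover}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

recover_ldap_bdb() {
    run /etc/init.d/zimbra stop || return 1
    # Unless told otherwise, db_recover uses the current directory as its environment home
    run cd "$ZIMBRA_LDAP_DB" || return 1
    run "$DB_RECOVER" || return 1
    # Recovery ran as root, so hand the files back to the zimbra user
    run /opt/zimbra/libexec/zmfixperms --extended || return 1
    run /etc/init.d/zimbra start
}
```

Source it as root and run `DRY_RUN=1 recover_ldap_bdb` to preview the five commands, then `recover_ldap_bdb` for real. It stops at the first failing step, deliberately leaving Zimbra down rather than starting it on a half-recovered database.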
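If that doesn't work, db_recover also has a "catastrophic" mode, selected with the -c flag. Run it the same way, from the db directory with Zimbra stopped:
/opt/zimbra/bdb/bin/db_recover -c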
However, if "catastrophic" mode returns errors like:
db_recover: Commonly caused by moving a database from one database environment
db_recover: to another without clearing the database LSNs, or by removing all of
db_recover: the log files from a database environment
then, to get the logs consistent again, you may need to dump and reload specific databases (typically id2entry.bdb, which is very active). Run the following from /opt/zimbra/data/ldap/hdb/db/ with Zimbra stopped:
/opt/zimbra/bdb/bin/db_dump id2entry.bdb > foo.txt
# Move (not copy) the original aside so db_load builds a fresh file
mv id2entry.bdb id2entry.bdb.bak
/opt/zimbra/bdb/bin/db_load -f foo.txt id2entry.bdb
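The dump-and-reload commands can be folded into a function that refuses to proceed on an empty dump and puts the original back if the reload fails. This is a sketch: `reload_bdb_file`, `BDB_BIN`, and the `.dump.txt` suffix are hypothetical names, and it relies only on db_dump's default text output and db_load's -f (read input from a file) option.

```shell
#!/bin/sh
# Hypothetical helper that rebuilds one BDB file from a dump of itself.
# Assumptions: run as the zimbra user from the LDAP db directory with slapd
# stopped; BDB_BIN points at Zimbra's bundled Berkeley DB utilities.

BDB_BIN=${BDB_BIN:-/opt/zimbra/bdb/bin}

reload_bdb_file() {
    db=$1
    [ -f "$db" ] || { echo "no such file: $db" >&2; return 1; }
    dump="$db.dump.txt"

    # 1. Dump every record to a portable text file.
    "$BDB_BIN/db_dump" "$db" > "$dump" || return 1
    # An empty dump means something went wrong; the original is still untouched.
    [ -s "$dump" ] || { echo "empty dump for $db" >&2; return 1; }

    # 2. Move (not copy) the original aside so db_load creates a new file.
    mv "$db" "$db.bak" || return 1

    # 3. Load the dump into the fresh file; restore the original on failure.
    if ! "$BDB_BIN/db_load" -f "$dump" "$db"; then
        rm -f "$db"
        mv "$db.bak" "$db"
        return 1
    fi
}
```

For example, `reload_bdb_file id2entry.bdb` run from /opt/zimbra/data/ldap/hdb/db/ leaves the old file behind as id2entry.bdb.bak.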
With all of this, I was able to get LDAP (slapd) up and running. Oddly enough, after a couple of minutes slapd would PANIC (its own word in zimbra.log) and shut down with a BDB inconsistency. Fortunately, because it stayed up for a couple of minutes at a time, I used that window to perform a full LDAP dump and restore. This is also a great way to take periodic LDAP backups and speed up recovery in the future (so if you are running the community version of Zimbra, add the backup command to a cron job!). To create a live dump of LDAP while it is running, run:
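su - zimbra
/opt/zimbra/openldap/sbin/slapcat -F /opt/zimbra/data/ldap/config -b "" -l /opt/zimbra/backup/ldap.bak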
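To follow the cron suggestion without slowly filling the disk, the dump can be wrapped in a script that writes dated files and prunes old ones. A sketch, assuming the slapcat invocation mirrors the slapadd restore command below (same -F configuration directory and -b "" suffix); the `LDAP_BACKUP_DIR`, `KEEP`, and `SLAPCAT` variables, the `ldap-<timestamp>.ldif` naming, and the default of seven copies are hypothetical choices.

```shell
#!/bin/sh
# Hypothetical cron wrapper for periodic LDAP dumps (run as the zimbra user).
# Assumption: slapcat takes the same -F config dir and -b "" suffix as the
# slapadd restore command; file names and retention count are arbitrary.

SLAPCAT=${SLAPCAT:-/opt/zimbra/openldap/sbin/slapcat}
LDAP_BACKUP_DIR=${LDAP_BACKUP_DIR:-/opt/zimbra/backup}
KEEP=${KEEP:-7}   # how many dated dumps to retain

backup_ldap() {
    mkdir -p "$LDAP_BACKUP_DIR" || return 1
    out="$LDAP_BACKUP_DIR/ldap-$(date +%Y%m%d%H%M%S).ldif"
    # slapd can stay up: slapcat is documented as safe with the hdb backend
    "$SLAPCAT" -F /opt/zimbra/data/ldap/config -b "" -l "$out" || return 1
    prune_backups
}

# Timestamped names sort chronologically: keep the newest $KEEP, delete the rest.
prune_backups() {
    ls "$LDAP_BACKUP_DIR"/ldap-*.ldif 2>/dev/null | sort -r |
        tail -n +"$((KEEP + 1))" |
        while IFS= read -r old; do rm -f "$old"; done
}
```

Saved as a script whose last line calls `backup_ldap`, it could be scheduled from the zimbra user's crontab with an entry such as `30 2 * * * /opt/zimbra/bin/ldap-backup.sh` (path and time made up). To restore from one of these files, point slapadd's -l option at it instead of ldap.bak.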
Then, to restore (or fully reconstruct) the LDAP database:
su - zimbra
# Stop Zimbra
zmcontrol stop
# Make the DB directories
mv /opt/zimbra/data/ldap/hdb /opt/zimbra/data/ldap/hdb.old
mkdir -p /opt/zimbra/data/ldap/hdb/db
cp /opt/zimbra/data/ldap/hdb.old/db/DB_CONFIG \
    /opt/zimbra/data/ldap/hdb/db/DB_CONFIG
# Restore the backup you made to create a nice, clean BDB set
/opt/zimbra/openldap/sbin/slapadd -q -b "" -F /opt/zimbra/data/ldap/config \
    -cv -l /opt/zimbra/backup/ldap.bak
# Start Zimbra
zmcontrol start
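Because the restore starts by moving the live hdb directory aside, it is worth checking that the dump is usable before touching anything. A minimal sketch, assuming the dump is plain LDIF (as slapcat produces) at the ldap.bak path used above; `check_ldif` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical pre-flight check before moving hdb aside and running slapadd:
# make sure the LDIF dump exists and actually contains entries.

check_ldif() {
    ldif=${1:-/opt/zimbra/backup/ldap.bak}
    if [ ! -s "$ldif" ]; then
        echo "missing or empty: $ldif" >&2
        return 1
    fi
    # Every LDIF entry starts with a "dn:" line (or "dn::" when base64-encoded)
    entries=$(grep -c '^dn:' "$ldif")
    if [ "$entries" -eq 0 ]; then
        echo "no entries found in $ldif" >&2
        return 1
    fi
    echo "$entries"
}
```

If it prints an entry count, go ahead with the restore; if it fails, take a fresh dump while slapd is still up.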
Once I had performed this backup and restore, LDAP finally behaved correctly.
I hope this saves you time (and stress)!