I had a problem with a testing server earlier today where the Kerberos database had become corrupt. For every user on my Open Directory Master the Kerberos password was flagged as incorrect, and changing it from Workgroup Manager had no effect. Changing it from the command line was not an option, as this relies on knowing the user’s original password, which was corrupt.
The Kerberos service can be restarted by grepping the output of the ‘ps aux’ command for the Kerberos process (usually named ‘krb5kdc’) and then issuing a kill command against its PID. The service will then restart automatically.
ps aux | grep krb
kill kerberos_PID
This improved matters slightly, as it allowed newly created users to use Kerberos with their correct password. Existing users were still rejected.
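The lookup-and-kill step above can be sketched as a small shell helper. This is only an illustration: the function name pids_for is made up for this example, and it assumes the command name sits in the 11th column of the ‘ps aux’ output, as it does in the usual BSD-style layout. Matching on the exact command name also avoids picking up the grep (or awk) process itself.

```shell
#!/bin/sh
# pids_for NAME: read `ps aux` output on stdin and print the PID (column 2)
# of every process whose command (column 11) is NAME, with or without a
# leading directory. pids_for is a hypothetical helper, not a system tool.
pids_for() {
    awk -v name="$1" 'NR > 1 {
        n = split($11, parts, "/")      # strip any leading path
        if (parts[n] == name) print $2
    }'
}

# Usage (as root): list the PIDs first, then kill them once they look right.
#   ps aux | pids_for krb5kdc
#   ps aux | pids_for krb5kdc | xargs kill
```

Printing the PIDs before piping them into kill gives you a chance to confirm that only the Kerberos process was matched.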
Re-building the kerberos database is done with the following command:
slapconfig -kerberize -f diradmin
This needs to be run as root, either directly or via sudo. The -f flag forces the current setup to be overwritten.
Before running it, I would recommend taking a full backup of your users and groups, as well as an archive of your Open Directory server from Server Admin. Stopping any services that rely on Kerberos would also be a good idea.
Re-building the kerberos database from scratch.
If neither of the above options worked then it is possible to rebuild your Kerberos database from scratch, nuking your old database. This would also be necessary if you are changing the Kerberos domain; however, don’t forget that if you do this you will also have to change the search path in all your LDAP and Password Server databases.
To completely rebuild kerberos.
1) Stop the OD Service.
2) Log into a shell as root and run the following command:
sso_util remove -k -a diradmin -p your_diradmin_password -r your_kerberos_realm
3) Remove the following files and directories from your system:
/var/db/krb5kdc/
/Library/Preferences/edu.mit.Kerberos
/etc/krb5.keytab
4) Run the following set of commands as root:
dscl 127.0.0.1
cd /LDAPv3/127.0.0.1/Config/
auth diradmin (and enter your diradmin password)
delete KerberosKDC
delete KerberosClient
quit
5) Find any Kerberos (krb5kdc) and kadmind processes and kill them (as shown above).
6) Re-build your kerberos database:
slapconfig -kerberize diradmin
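If you would rather keep a copy of the old files from step 3 than delete them outright, something along these lines may help. Note that this is a variation on the step, not what the procedure above prescribes: the set_aside function and the backup location are hypothetical, and the sketch simply moves each path that exists into a holding directory so that it is no longer in place when the database is rebuilt.

```shell
#!/bin/sh
# set_aside DEST PATH...: move each existing PATH into DEST and skip any
# that are missing. set_aside is a hypothetical helper for this example.
set_aside() {
    dest=$1
    shift
    mkdir -p "$dest" || return 1
    for path in "$@"; do
        # -e is false for dangling symlinks, so test -L as well
        if [ -e "$path" ] || [ -L "$path" ]; then
            mv "$path" "$dest/" || return 1
        else
            echo "skipping $path (not present)" >&2
        fi
    done
}

# Usage (as root); the destination directory is only an example:
#   set_aside "/var/root/kerberos-old-$(date +%Y%m%d)" \
#       /var/db/krb5kdc \
#       /Library/Preferences/edu.mit.Kerberos \
#       /etc/krb5.keytab
```

Once the rebuilt database is known to be working, the holding directory can be deleted.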
Testing Kerberos:
You can check whether users’ passwords are now being accepted with the ‘kpasswd’ command, which asks for the current password before setting a new one. The ‘kinit’ command can also be used to test creating a Kerberos ticket.
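A ticket test along those lines might look like the sketch below. The check_ticket wrapper is invented for this example; it only chains the standard kinit, klist and kdestroy commands, and it assumes the default realm on the machine is the one you have just rebuilt.

```shell
#!/bin/sh
# check_ticket USER: request a ticket for USER, list it, then discard it.
# check_ticket is a hypothetical wrapper around the standard Kerberos tools.
check_ticket() {
    user=$1
    # Bail out early if any of the tools is missing from PATH
    for tool in kinit klist kdestroy; do
        command -v "$tool" >/dev/null 2>&1 || {
            echo "$tool not found in PATH" >&2
            return 127
        }
    done
    kinit "$user" || return 1   # prompts for the user's password
    klist                       # shows the ticket that was just issued
    kdestroy                    # removes the test ticket again
}

# Usage:
#   check_ticket some_user
```

If kinit rejects a password that you know to be correct, the database is most likely still in a bad state.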
How do you stop the “OD Service”? There is no Stop button in Server Admin.
Hi Clint McIntosh,
The best way to stop the OD Service is to issue the following command as root (from shell):
sudo launchctl unload /System/Library/LaunchDaemons/org.openldap.slapd.plist
It can then be restarted in a similar manner with:
sudo launchctl load /System/Library/LaunchDaemons/org.openldap.slapd.plist
If all else fails search for the process ID using “ps aux” and kill it.
Hope that helps. – Thanks for your comment.
- Oliver
Hi,
First of all, thanks for this nice manual.
I guess I’ve got a very tricky configuration.
I did it step by step.
After upgrading from 10.5 I noticed that Kerberos was not running, so I tried your steps, apart from stopping the OD Service.
But after entering slapconfig -kerberize diradmin
I get this error: kdcsetup command failed with exit code 255
and have no clue what to do.
After this the “Kerberize” Button appeared in Server-Admin. But it looks like it does not accept any of my authentication. Neither sysadmin nor diradmin.
Hi Jon,
Glad you found the post useful.
This sounds like a strange one though! First off, when you started the stages above, were there definitely no Kerberos services running? You can check this easily using the ‘ps aux’ command. Everything needs to be killed before running through the steps in the manual above.
Assuming there was nothing running I would try the following:
1) Do you see the diradmin and sysadmin users in workgroup manager? If so try just changing the password for these users from there, as it may just sort out any small corruption in the database.
2) The next thing I would check would be the DNS and reverse DNS records. It might be an idea to enter the associated hosts into the server’s local hosts file (/etc/hosts). All of the directory services rely heavily on DNS records being correct.
3) Lastly, you can add a ‘-f’ flag to the slapconfig command if you wish to force a server to be kerberized, but I’m not sure that would help much in this situation.
Failing all of the above, could you post any relevant-looking bits from the slapconfig.log and I’ll have another go!
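For point 2, a forward and reverse lookup can be compared with a short script such as the one below. It is only a sketch: check_dns is a made-up name, it relies on the output wording of the standard ‘host’ tool, it looks at the first IPv4 address only, and the hostname in the usage line is a placeholder for your server’s fully qualified name.

```shell
#!/bin/sh
# check_dns NAME: resolve NAME to an IPv4 address, resolve that address
# back to a name, and report whether the two agree.
# check_dns is a hypothetical helper built on the standard `host` tool.
check_dns() {
    name=$(printf '%s' "$1" | tr 'A-Z' 'a-z')
    # `host` prints "NAME has address A.B.C.D" for each A record
    ip=$(host "$name" | awk '/ has address / { print $4; exit }')
    if [ -z "$ip" ]; then
        echo "no forward (A) record for $name" >&2
        return 1
    fi
    # Reverse lookups print "... domain name pointer NAME." (trailing dot)
    back=$(host "$ip" | awk '/ domain name pointer / {
        sub(/\.$/, "", $NF); print tolower($NF); exit }')
    if [ "$back" = "$name" ]; then
        echo "ok: $name -> $ip -> $back"
    else
        echo "mismatch: $name -> $ip -> ${back:-no PTR record}" >&2
        return 1
    fi
}

# Usage (placeholder hostname):
#   check_dns server.example.com
```

A mismatch, or a missing PTR record, is worth fixing before trying to kerberize again.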
Thanks for your comment.
- Oliver
Hi Oliver,
thank you very much for your quick and extensive answer.
I checked and went through the steps of your manual again from the beginning.
In doing so I found that there are some more issues on my system which I didn’t really pay attention to on my first run.
1. The command ‘dscl 127.0.0.1’ does not work. I have to use ‘dscl localhost’ instead.
2. Same issue with ‘cd /LDAPv3/127.0.0.1/Config/’. Here I had to replace 127.0.0.1 with OS-X-Server, which is my server’s name.
I guess this must be one reason why ‘slapconfig -kerberize diradmin’ does not work, because I see in the logs that the commands it runs use the same values as yours.
But I don’t have any idea how and where to change this.
Maybe it would be better if I do a clean install and import the users. But then I don’t know how to import them with passwords, because they are not included in Workgroup Manager’s export. The import/export of archives feature in Server Admin does not work with my database, maybe because of the same failures.
I thank you very much for your response! Hope you have an idea.
Hi Jon,
Nice, haven’t seen that one before! There’s clearly something going wrong with the mapping of the localhost address there, and it would definitely explain why the slapconfig command is failing. Can you stick a copy of your /etc/hosts file up?
If you still have an active copy of the database somewhere other than on this problematic server, you can take a full backup of the LDAP server, including the password database, using the ‘slapconfig -backupdb’ command.
Keep your findings coming!
- Oliver
Jon,
Also, can you run ‘ifconfig’ from shell and put the results up? I wonder if your Loopback interface has not come up…
- Oliver
Hi Oliver,
Yep, this really looks pretty weird.
When I looked into /etc/hosts I found two very strange entries at the end, which I would normally delete…
Here comes the information you wanted:
/etc/hosts
##
# Host Database
#
# localhost is used to configure the loopback interface
# when the system is booting. Do not change this entry.
##
127.0.0.1 localhost
255.255.255.255 broadcasthost
::1 localhost
fe80::1%lo0 localhost
lo0: flags=8049 mtu 16384
inet6 ::1 prefixlen 128
inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
inet 127.0.0.1 netmask 0xff000000
gif0: flags=8010 mtu 1280
stf0: flags=0 mtu 1280
en0: flags=8863 mtu 1500
ether 00:1f:5b:31:90:9c
media: autoselect
status: inactive
en1: flags=8863 mtu 1500
ether 00:1f:5b:31:90:9d
inet6 fe80::21f:5bff:fe31:909d%en1 prefixlen 64 scopeid 0x5
inet 192.168.0.1 netmask 0xffff0000 broadcast 192.168.255.255
media: autoselect (100baseTX )
status: active
fw0: flags=8822 mtu 4078
lladdr 00:1f:f3:ff:fe:0c:53:de
media: autoselect
status: inactive
ppp0: flags=8051 mtu 1444
inet 192.168.0.1 --> 192.168.3.102 netmask 0xffffff00
Jon
Oh sorry, I forgot to mark the beginning of the ifconfig output.
It starts at ‘lo0: flags=8049 mtu 16384’
Jon
Hi Jon,
The odd-looking lines at the end of the hosts file are there for IPv6 support, so they definitely need to stay in! The loopback interface is clearly coming up OK, so I guess that might narrow it down to the routing table. Could you run “netstat -nr” as root and put the results up? It should look basically like this (sorry for the slightly messed up formatting!):
Destination Gateway Flags Refs Use Netif Expire
127 127.0.0.1 UCS 0 0 lo0
127.0.0.1 127.0.0.1 UH 3 123755 lo0
169.254.137.72 127.0.0.1 UHS 0 0 lo0
192.168.1.73 127.0.0.1 UHS 0 0 lo0
::1 ::1 UH lo0
fd08:a43e:9507:e927:d69a:20ff:fe00:37e2 link#1 UHL lo0
fe80::%lo0/64 fe80::1%lo0 Uc lo0
fe80::1%lo0 link#1 UHL lo0
fe80::d69a:20ff:fe72:b63e%en1 d4:9a:20:72:b6:3e UHL lo0
fe80::21c:42ff:fe00:8%en2 0:1c:42:0:0:8 UHL lo0
ff01::/32 ::1 Um lo0
ff02::/32 ::1 UmC lo0
Keep us updated!
- Oliver
P.S. I remembered that I have seen this once before. It turned out that the machine did not have enough memory to bring the lo interface up at start-up. I assume that you have plenty of memory in your box?
Hi Oliver,
thanks for your help and your time.
Unfortunately I had to solve this problem quickly.
So I did it over the weekend with ‘brute force’. I did a reinstall and built a new OD database. Now everything is fine except that the users lost their passwords. But that was bearable.
I’m really happy about your help. Sorry but I had to hurry.
btw: yes you were right. That machine had plenty of memory.
Greetings Jon
Hi, Oliver.
Nice how-to regarding Kerberos.
Where can I find similar material regarding the Password Server on a 10.6 server?
When the RSA private key is corrupt in the Password Server, no replicas can be made successfully. I haven’t found a way to rebuild it without wiping the passwords.
Do you have a clue on that?
Thank you for your advice. It worked for us. But you wrote a command in the wrong order:
slapconfig -kerberize diradmin -f
correct:
slapconfig -kerberize -f diradmin
Thx again!
Good spot, thanks! I’ve corrected it now.