Linux NSS (libnss) and nss_ldap problems and possible solutions.

To integrate a Linux system with a centralized user directory (like Microsoft Active Directory), the usual solution is to configure Kerberos for authentication (password/credential checking) and LDAP for authorization and access control. The "standardized" way to implement this is to use libpam_krb5, libnss_ldap (by PADL Software) and nscd (from libc).

Kerberos integration works pretty well and I do not have too many issues with it, but I cannot say the same of libnss_ldap and nscd.

In this post I will explain the annoying problems that you can run into when using libnss_ldap and nscd, and propose some solutions and configurations that will make them work properly. I also recommend reading a previous post about the problems and solutions of connecting a Unix server to Active Directory (Spanish post).

Read this article if you are experiencing problems with nscd+libnss_ldap (quoting http://www.nico.schottelius.org/blog/nscd-bugs/):

How does all this work?

NSS (Name Service Switch) is the subsystem (made up of modules, libraries and daemons) that Linux uses to resolve different kinds of names: users, groups, hosts... The base implementation comes with the standard C library (libc), but it can be extended using modules. Here you have a very good description of the library.
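As a rough illustration of what "resolving names" means in practice: the standard Python modules `pwd`, `grp` and `socket` call the libc functions that are routed through NSS, so the sketch below exercises whichever sources the system is configured with. The helper names (`lookup_user`, `lookup_group`, `lookup_host`) are mine and only exist for this example.

```python
import grp
import pwd
import socket


def lookup_user(name):
    """Resolve a user name through NSS; return None if no source knows it."""
    try:
        return pwd.getpwnam(name)
    except KeyError:
        return None


def lookup_group(name):
    """Resolve a group name through NSS; return None if no source knows it."""
    try:
        return grp.getgrnam(name)
    except KeyError:
        return None


def lookup_host(name):
    """Resolve a host name through NSS; return None on failure."""
    try:
        return socket.gethostbyname(name)
    except OSError:
        return None


if __name__ == "__main__":
    for user in ("root", "no-such-user-example"):
        entry = lookup_user(user)
        print(user, "->", entry.pw_uid if entry else "not found")
    print("localhost ->", lookup_host("localhost"))
```

The calling code never says where the answer comes from (local files, LDAP, DNS...). That decision is taken inside libc, based on the NSS configuration.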

Basically it follows this scheme:

In the image you can see how libnss works. It is a shared library that is loaded dynamically in the process space. libnss loads different modules according to the configuration in /etc/nsswitch.conf. libnss and each module are shared libraries, so their code is shared between processes, but their internal data and state are stored in each process's private memory. They consume the process's resources and run in the process's threads.
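To make the role of /etc/nsswitch.conf more concrete, here is a minimal sketch that parses text in that format and shows which shared object glibc would load for each source (a source `foo` is loaded from `libnss_foo.so.2`). The functions `parse_nsswitch` and `module_for` and the sample configuration are illustrative assumptions, not a recommended setup, and the parser ignores details that the real one handles.

```python
def parse_nsswitch(text):
    """Map each database (passwd, group, hosts...) to its ordered source list."""
    databases = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if not line or ":" not in line:
            continue
        database, _, rest = line.partition(":")
        sources = []
        in_action = False
        for token in rest.split():
            # Skip action items such as [NOTFOUND=return], which may span tokens.
            if token.startswith("["):
                in_action = True
            if not in_action:
                sources.append(token)
            if token.endswith("]"):
                in_action = False
        databases[database.strip()] = sources
    return databases


def module_for(source):
    """glibc loads the source `foo` from the shared object libnss_foo.so.2."""
    return "libnss_{}.so.2".format(source)


SAMPLE = """\
# illustrative sample, not a recommended configuration
passwd: files ldap
group:  files ldap
hosts:  files dns
"""

if __name__ == "__main__":
    for database, sources in parse_nsswitch(SAMPLE).items():
        print(database, [module_for(s) for s in sources])
```

With a line like `passwd: files ldap`, every process that resolves a user may end up loading both the files module and the LDAP module into its own address space.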

The submodule is in charge of executing all the logic needed to resolve the different names. For instance, libnss_ldap reads the libnss_ldap.conf file, connects to the server, queries it, parses the response and returns the values. It is also responsible for monitoring the remote servers and timing out queries. Since each process loads the library, each process does all of this on its own.
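The consequence of "each process does this" can be shown with a toy model. This is only a simulation under my own assumptions: `FakeDirectoryServer` and `InProcessModule` are hypothetical classes, no real LDAP connection is made, and the "processes" are plain Python objects.

```python
class FakeDirectoryServer:
    """Stand-in for an LDAP server that only counts the connections it receives."""

    def __init__(self):
        self.connections = 0

    def connect(self):
        self.connections += 1
        return self


class InProcessModule:
    """Models an NSS module loaded into one process: private state, own connection."""

    def __init__(self, server):
        self.server = server
        self.connection = None  # lives in this process only

    def lookup(self, name):
        if self.connection is None:
            self.connection = self.server.connect()
        return "entry-for-" + name


def simulate(process_count):
    """Return how many connections the server sees when every process does a lookup."""
    server = FakeDirectoryServer()
    processes = [InProcessModule(server) for _ in range(process_count)]
    for module in processes:
        module.lookup("alice")
        module.lookup("bob")  # reuses the connection within the same process
    return server.connections
```

In this model the number of connections grows with the number of processes, because nothing is shared between them. Each one also has to repeat the configuration parsing, the timeouts and the error handling.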

Of course, if you have nscd (name service cache daemon) running (and the process can access it), the process will query nscd instead of calling the submodule. But if nscd is not available, because it has failed, has not been started or for any other reason, the process will load all the NSS modules...
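A quick way to guess which of the two routes a process would take is to look for the nscd socket. The sketch below assumes the socket lives at /var/run/nscd/socket, which is common but may differ on your system. It only checks that a UNIX socket exists at that path, not that the daemon behind it is actually answering.

```python
import os
import stat

# Assumed default path of the nscd socket; check your distribution.
NSCD_SOCKET = "/var/run/nscd/socket"


def nscd_socket_present(path=NSCD_SOCKET):
    """Return True if a UNIX socket exists at the given path."""
    try:
        return stat.S_ISSOCK(os.stat(path).st_mode)
    except OSError:
        return False


def lookup_path(path=NSCD_SOCKET):
    """Describe which route a lookup would probably take."""
    if nscd_socket_present(path):
        return "nscd"
    return "in-process modules"


if __name__ == "__main__":
    print("Lookups would probably go through:", lookup_path())
```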

This behaviour is not really a big deal for simple modules, like local files. But in the case of libnss_ldap it can be a big problem. Let's see why...

Firstly, as commented here by Arthur de Jong, author of nss-pam-ldapd, this original implementation has several problems (I copy his words):

On the other hand, being a shared library has several disadvantages. I quote the comments in the source code of BusyBox's unscd.c. The author talks about nscd, but the statements are true for all processes:

Also, the NSS API, as it is defined, is not designed to be used with networked directory services. As nsscache's author says (see also his libnss explanation):

In conclusion I can say about this default design:

So, we can say that we are facing a design problem and buggy implementations.

First solution: Upgrade to the latest software versions and tune the software

The very first solution to minimize the problems is to minimize the bugs:

I will soon describe, in another post, our working configuration and some tips about it. We are using SUSE 11 SP2 and Microsoft Active Directory, and right now (19th August 2010) we have a "quite stable" configuration.

Second solution: Avoid the problem, replicate the database locally

You can replicate all the needed users locally: copying them manually, using a script or using nsscache.

The idea of nsscache is to asynchronously populate a local database using a Python batch tool, and to store the result in a small local DB that is queried by a small, simple and light NSS module.
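The pattern can be sketched in a few lines. This is not nsscache's actual code: `fetch_remote_users` is a hypothetical placeholder for the directory query, and the cache is a plain file in passwd(5) format chosen for simplicity. What it shows is the split between a batch job that talks to the network and a lookup that only reads local data.

```python
import os
import tempfile


def fetch_remote_users():
    """Hypothetical placeholder for the directory query done by the batch job."""
    return [("alice", 10001, 10001, "/home/alice", "/bin/bash"),
            ("bob", 10002, 10002, "/home/bob", "/bin/sh")]


def sync_cache(cache_path, users):
    """Write users in passwd(5) format, replacing the old file atomically."""
    directory = os.path.dirname(os.path.abspath(cache_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as tmp:
            for name, uid, gid, home, shell in users:
                tmp.write("{}:x:{}:{}::{}:{}\n".format(name, uid, gid, home, shell))
        os.replace(tmp_path, cache_path)  # readers see old or new file, never half
    except BaseException:
        os.unlink(tmp_path)
        raise


def lookup_cached_user(cache_path, name):
    """Local-only lookup: no network access, just a scan of the cache file."""
    with open(cache_path) as cache:
        for line in cache:
            fields = line.rstrip("\n").split(":")
            if fields[0] == name:
                return {"name": fields[0], "uid": int(fields[2]),
                        "gid": int(fields[3]), "home": fields[5], "shell": fields[6]}
    return None
```

If the directory server is down, the batch job fails, but lookups keep working against the last good copy of the cache. The price is that the data can be stale until the next successful run.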

Actually, I really recommend that you read its wiki. It has a very good explanation of the problem, of how libnss works and of its solution:

Our problem with nsscache is that it is not designed to be used with Active Directory. It does not support:

If I have time, I will try to add those features to the code.

Third solution: Use alternative refactored solutions

The third solution is to minimize the problem by also attacking its design. You can use the solutions proposed by:

Mixing these two solutions will give you a quite stable solution:

The global working scheme would be as described in this image:

Fourth solution: Mix them all

I think that the best solution would be to clone some important users (services, administrators) locally and to use unscd, nss-pam-ldapd and a tuned configuration. I hope to implement this option someday and post about it here.

… this is another random thinking from keymon (http://keymon.wordpress.com)