Frequently Asked Questions

Miscellaneous

Troubleshooting

Workaround


Miscellaneous

What is the CSV file host-principal-keytab-list.csv or host-principal-keytab-list-v212.csv?

The CSV file host-principal-keytab-list.csv or host-principal-keytab-list-v212.csv lists all Kerberos principals and keytabs required to enable Kerberos security for a Hadoop cluster. The CSV file is downloadable from Apache Ambari's Add security wizard.

The CSV file host-principal-keytab-list.csv is based on the format used in Apache Ambari v1.6.1 while the CSV file host-principal-keytab-list-v212.csv is based on the newer format used in Apache Ambari v2.1.2. Both formats are supported by the sample script kerberos_security_setup.pl.

The format of the CSV file host-principal-keytab-list.csv is as follows:

<Host in FQDN>,<Component>,<Principal>,<Keytab File>,<Keytab Folder>,<Keytab File User>,<Keytab File Group>,<Keytab File Permission>

An example is shown here:

host1.example.com,Ambari Smoke Test User,ambari-qa@EXAMPLE.COM,smokeuser.headless.keytab,/etc/security/keytabs,ambari-qa,hadoop,440

The format of the CSV file host-principal-keytab-list-v212.csv is as follows:

<host>,<description>,<principal name>,<principal type>,<local username>,<keytab file path>,<keytab file owner>,<keytab file owner access>,<keytab file group>,<keytab file group access>,<keytab file mode>,<keytab file installed>

An example is shown here:

cluster1n1.example.com,/smokeuser,ambari-qa-cluster1@EXAMPLE.COM,USER,ambari-qa,/etc/security/keytabs/smokeuser.headless.keytab,ambari-qa,r,hadoop,r,440,unknown

For more information, please refer to this Apache Ambari page:
Creating Service Principals and Keytab Files for Hadoop

How does the sample script kerberos_security_setup.pl use the CSV file?

The sample script kerberos_security_setup.pl will create (and deploy) Kerberos principals and keytabs listed in the CSV file.
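For illustration, creating and then deploying everything listed in the CSV file downloaded from Ambari might look like this (the file name is the one described above; adjust it to match your download):

perl kerberos_security_setup.pl --input host-principal-keytab-list.csv --create
perl kerberos_security_setup.pl --input host-principal-keytab-list.csv --deploy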

If some cluster nodes do not need any Kerberos principal or keytab, the sample script kerberos_security_setup.pl can also accept the CSV file in this format:

host1.example.com,,,,,,,
host2.example.com,,,,,,,

After new Hadoop services or cluster nodes are added, you might need to figure out the delta changes manually, then supply them in CSV files to the sample script, similar to this:

perl kerberos_security_setup.pl --input delta_create.csv --create
perl kerberos_security_setup.pl --input delta_deploy.csv --deploy
perl kerberos_security_setup.pl --input delta_undeploy.csv --undeploy
perl kerberos_security_setup.pl --input delta_delete.csv --delete

For example, when a new node is added to the cluster, we can simply deploy shared/headless Kerberos keytabs to the new node. On the other hand, when new Hadoop services/roles (e.g. DataNode) are added to the new node, we need to create per-host keytabs on the master node before deploying them to the new node.
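As an illustration only, a delta_create.csv entry for a DataNode added on host1.example.com could follow the v1.6.1 format described above. The principal name, keytab file name, owner, group, and permission below are hypothetical; take the actual values from the CSV file generated by Ambari for your cluster:

host1.example.com,DataNode,dn/host1.example.com@EXAMPLE.COM,dn.service.keytab,/etc/security/keytabs,hdfs,hadoop,400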

What does the sample script kerberos_security_setup.pl do?

The sample script does the following with the --create command:

The sample script does the following with the --deploy command:

The sample script does the following with the --undeploy command:

The sample script does the following with the --delete command:

The sample script does the following with the --remove-spn command:

Remarks:

What is the Perl requirement of the sample script kerberos_security_setup.pl?

The requirement is the same as that of the Centrify DirectControl Agent. As of Centrify DirectControl Agent 5.2.2, Perl 5.8 or later is required.

How should I run the sample script kerberos_security_setup.pl with dzdo?

The sample script kerberos_security_setup.pl requires root privilege to run for several reasons. For instance, the sample script needs to distribute Kerberos keytab files across the cluster with the desired ownership and permissions. Some organizations might have policies that restrict privilege escalation. In such cases, the privilege granted can be regulated using sudo or dzdo.

To run the sample script kerberos_security_setup.pl with dzdo:
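As a minimal sketch, assuming a dzdo role has already been granted that permits running the Perl interpreter (or the script itself) as root, the invocation is the same as before with dzdo prepended (CSV file name and command shown for illustration):

dzdo perl kerberos_security_setup.pl --input host-principal-keytab-list.csv --create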

How should I run the sample script kerberos_security_setup.pl again after error?

The sample script kerberos_security_setup.pl provides the following options to help recover from the last error:

Another way is to clean up everything by running the sample script with the --delete command, then run it again with the --create command.
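For instance, a clean-up-and-retry sequence using the same CSV file supplied on the failed run would look like this:

perl kerberos_security_setup.pl --input host-principal-keytab-list.csv --delete
perl kerberos_security_setup.pl --input host-principal-keytab-list.csv --create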

Note that the sample script is not sophisticated enough to recover from all kinds of errors. Therefore, please make sure to read the Pre-requisites section in the README, review the script commands carefully using the --dry-run option, and test thoroughly in a lab environment before deploying to a production environment.

As a precaution, it is often a good idea to run the sample script using a divide-and-conquer approach. That is, instead of creating and distributing Kerberos keytabs to hundreds or thousands of cluster nodes at once, we can break the task down into sub-procedures, e.g.:

What if I do not want to run the sample script kerberos_security_setup.pl on my Hadoop environment?

If you are using configuration management tools (e.g. Chef or Puppet), you might prefer to write your own recipes/manifests instead of running the sample script kerberos_security_setup.pl. In this case, run the sample script with the --dry-run option to obtain the commands, then implement your own recipes/manifests.
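For example, assuming --dry-run can be combined with the command options shown earlier, the following prints the commands for the create phase without executing them, ready to be translated into recipes/manifests (CSV file name shown for illustration):

perl kerberos_security_setup.pl --input host-principal-keytab-list.csv --create --dry-run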


Troubleshooting

The sample script kerberos_security_setup.pl has been running for a long time without progress. What should I do?

If no new log messages appear, the sample script is most likely blocked by an interactive prompt from one of the commands it runs, for instance SSH, SCP, or adkeytab. The sample script should be terminated in this case.

Therefore, please make sure to read the Pre-requisites section in the README, review the script commands carefully using the --dry-run option, and test thoroughly in a lab environment before deploying to a production environment.

How do I debug the sample script kerberos_security_setup.pl?


Workaround

Got "Error: One or more of the following SPNs already associated with other account in the forest" when joining domain. What should I do?

This error appears when joining the domain (using adjoin) and the joining computer account in Active Directory cannot be associated with, say, the HTTP or NFS SPN. This is normal in a Hadoop environment: the HTTP SPN is often already associated with the per-host SPNEGO service account (HTTP/HOST@REALM), and the NFS SPN may be associated with the per-host NFS gateway service account (NFS/HOST@REALM).

To resolve this issue, modify the Centrify DirectControl Agent configuration file by setting the parameter adclient.krb5.service.principals to "ftp cifs nfs" (if only "http" is in conflict), or to "ftp cifs" (if both "http" and "nfs" are in conflict).
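As a sketch, assuming the agent configuration file is at its usual location /etc/centrifydc/centrifydc.conf and only "http" is in conflict, the setting would look like this:

adclient.krb5.service.principals: ftp cifs nfs

The file should be edited before running adjoin so that the join picks up the new setting.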

The sample script kerberos_security_setup.pl does this for every host in the cluster when the --remove-spn command is run (note: the corresponding parameters in hadoop.conf have to be set, e.g. hadoop.adclient.krb5.service.principal.http.remove: true and/or hadoop.adclient.krb5.service.principal.nfs.remove: true).
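For example, to have the script remove both the HTTP and NFS SPN conflicts, hadoop.conf would contain the two parameters exactly as named above:

hadoop.adclient.krb5.service.principal.http.remove: true
hadoop.adclient.krb5.service.principal.nfs.remove: true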

The sample script kerberos_security_setup.pl does this with the --create command as well.

[Cloudera] The sample script kerberos_security_setup.pl fails to run on my Cloudera cluster. The error is adkeytab complaining about insufficient permissions to create or change an account. But the Kerberos principal mentioned is not from the default Kerberos credential cache (e.g. /tmp/krb5cc_0). What should I do?

The problem is due to adkeytab loading the root-owned Kerberos credential cache /tmp/krb5cc_cm_agent instead of the default /tmp/krb5cc_0. (REF#: 76873)

This problem only occurs when the TGT in the Kerberos credential cache /tmp/krb5cc_cm_agent is valid and has not expired. Other Centrify CLIs (e.g. adjoin/adleave) exhibit the same behavior.

To work around this, either set the environment variable KRB5CCNAME or remove /tmp/krb5cc_cm_agent as follows:
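A minimal sketch of the two approaches (the default cache path /tmp/krb5cc_0 is the one mentioned above). Either point the session at the default credential cache before running the sample script:

export KRB5CCNAME=/tmp/krb5cc_0

or remove the Cloudera Manager agent cache (see the note below about stopping cloudera-scm-agent first):

rm -f /tmp/krb5cc_cm_agent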

From our observation, /tmp/krb5cc_cm_agent is not required by cloudera-scm-agent once the service has started, and the Kerberos credential cache will expire without renewal. However, please confirm with Cloudera support before proceeding, or stop cloudera-scm-agent before removing the cache.

[MapR] MapR PAM module fails to generate MapR ticket when logging in using Centrify-enabled OpenSSH. What is the problem?

MapR provides a PAM Authenticator module that generates MapR tickets [1] during login. Customers can add the PAM module (libmapr_pam.so) to PAM configuration files.

The problem is that the PAM module fails to run when logging in via Centrify-enabled OpenSSH, because a dynamic library (libgcc_s.so.1) cannot be loaded properly. (REF#: 76441)

To work around the issue on CentOS 6.4 x64 (other Linux distributions should work similarly):

Please note that this workaround is provided for use as-is, outside of the normal development and quality assurance review cycles. Centrify does not intend to address any issues or enhance any features provided by the workaround in future releases. You should test the workaround in a lab environment as thoroughly as possible before deploying to a production environment.

Moreover, Centrify provides a flexible way to execute commands during user login. MapR tickets should be generated automatically by executing the maprlogin utility at login. Therefore, Centrify suggests not using this MapR PAM module. You can refer to the "Specify commands to run" Group Policy for more detail.

References:

  1. MapR Tickets and the PAM Authenticator