Miscellaneous
The CSV file host-principal-keytab-list.csv or host-principal-keytab-list-v212.csv lists all Kerberos principals and keytabs required to enable Kerberos security for a Hadoop cluster. The CSV file can be downloaded from the Add security wizard in Apache Ambari.
The CSV file host-principal-keytab-list.csv follows the format used in Apache Ambari v1.6.1, while the CSV file host-principal-keytab-list-v212.csv follows the newer format used in Apache Ambari v2.1.2. Both formats are supported by the sample script kerberos_security_setup.pl.
The format of the CSV file host-principal-keytab-list.csv is as follows:
<Host in FQDN>,<Component>,<Principal>,<Keytab File>,<Keytab Folder>,<Keytab File User>,<Keytab File Group>,<Keytab File Permission>
An example is shown here:
host1.example.com,Ambari Smoke Test User,ambari-qa@EXAMPLE.COM,smokeuser.headless.keytab,/etc/security/keytabs,ambari-qa,hadoop,440
The format of the CSV file host-principal-keytab-list-v212.csv is as follows:
<host>,<description>,<principal name>,<principal type>,<local username>,<keytab file path>,<keytab file owner>,<keytab file owner access>,<keytab file group>,<keytab file group access>,<keytab file mode>,<keytab file installed>
An example is shown here:
cluster1n1.example.com,/smokeuser,ambari-qa-cluster1@EXAMPLE.COM,USER,ambari-qa,/etc/security/keytabs/smokeuser.headless.keytab,ambari-qa,r,hadoop,r,440,unknown
For more information, please refer to this Apache Ambari page:
Creating Service Principals and Keytab Files for Hadoop
The sample script kerberos_security_setup.pl will create (and deploy) Kerberos principals and keytabs listed in the CSV file.
If no Kerberos principals or keytabs are needed for some cluster nodes, the sample script kerberos_security_setup.pl also accepts CSV rows in this format:
host1.example.com,,,,,,,
host2.example.com,,,,,,,
After new Hadoop services or cluster nodes are added, you might need to work out the delta changes manually. Then supply the delta changes in CSV files to the sample script, similar to this:
perl kerberos_security_setup.pl --input delta_create.csv --create
perl kerberos_security_setup.pl --input delta_deploy.csv --deploy
perl kerberos_security_setup.pl --input delta_undeploy.csv --undeploy
perl kerberos_security_setup.pl --input delta_delete.csv --delete
For example, when a new node is added to the cluster, we can simply deploy shared/headless Kerberos keytabs to the new node. On the other hand, when new Hadoop services/roles (e.g. DataNode) are added to the new node, we will need to create per-host keytabs on the master node before deploying to the new node.
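For instance, a minimal delta_deploy.csv row for deploying the shared/headless smoke-test keytab to the new node might look like this, following the v1.6.1 format shown above (the host name host3.example.com is illustrative; the remaining fields repeat the earlier example):
host3.example.com,Ambari Smoke Test User,ambari-qa@EXAMPLE.COM,smokeuser.headless.keytab,/etc/security/keytabs,ambari-qa,hadoop,440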
The sample script does the following with the --create command: create the Kerberos principals and keytabs listed in the CSV file (on the master node).
The sample script does the following with the --deploy command: distribute the keytab files to cluster nodes with the desired ownership and permissions.
The sample script does the following with the --undeploy command: remove the deployed keytab files from cluster nodes.
The sample script does the following with the --delete command: delete the Kerberos principals (accounts) from Active Directory.
The sample script does the following with the --remove-spn command: remove conflicting SPNs (e.g. HTTP, NFS) from the joined computer account and update adclient.krb5.service.principals in centrifydc.conf on every cluster node.
Remarks:
--delete command is run.
The Perl requirement is the same as for Centrify DirectControl Agent: as of Centrify DirectControl Agent 5.2.2, Perl 5.8 or later is required.
The sample script kerberos_security_setup.pl requires root privilege to run for several reasons. For instance, the sample script needs to distribute Kerberos keytab files across the cluster with the desired ownership and permissions. Some organizations might have policies that restrict privilege escalation. In such cases, we can regulate the privilege granted using sudo or dzdo.
To run the sample script kerberos_security_setup.pl with dzdo:
Configure hadoop.conf to use dzdo when running commands via SSH on cluster nodes:
hadoop.secure.shell.privilege.enable: true
hadoop.secure.shell.privilege: dzdo
Configure hadoop.conf to specify the unix user account to run commands via SSH on cluster nodes. For example, the AD user with unix name "aduser" is used:
hadoop.secure.shell.user.enable: true
hadoop.secure.shell.user: aduser
Configure hadoop.conf to specify the unix user account to copy files (e.g. Kerberos keytabs) via SCP to cluster nodes. For example, the local root user is used:
hadoop.secure.copy.user.enable: true
hadoop.secure.copy.user: root
In the Centrify Zone that the cluster nodes are joined to, set up a profile for the AD user with unix name "aduser". Allow the AD user to log in to all cluster nodes.
As you will see in the following procedures, the AD user will run the sample script with root privilege (using dzdo) on a master node. The sample script will then connect to the cluster nodes as the AD user via SSH, and execute commands with root privilege (using dzdo).
Set up roles and command rights for the AD user. A command right should be granted to run the sample script. Command rights are also required to run commands via SSH on cluster nodes. An example is shown here:
Command rights in glob expression:
<centrifydc-install-path>/samples/hadoop/kerberos_security_setup.pl
<centrifydc-system-command-path>/bin/adinfo
test -d /etc/security/keytabs
rm /etc/security/keytabs/*
perl -e sysopen(undef, "/var/centrify/tmp/centrifydc.conf.lock", 194, 0600) || die;
cp --force /etc/centrifydc/centrifydc.conf /var/centrify/tmp/centrifydc.conf.tmp
echo adclient.krb5.service.principals: ftp cifs nfs
mv --force /var/centrify/tmp/centrifydc.conf.tmp /etc/centrifydc/centrifydc.conf
rm /var/centrify/tmp/centrifydc.conf.lock
tee --append /var/centrify/tmp/centrifydc.conf.tmp
<centrifydc-system-command-path>/sbin/adkeytab --delspn -P http -m
<centrifydc-system-command-path>/sbin/adkeytab --new
<centrifydc-system-command-path>/sbin/adkeytab --delete
Command rights in regular expression:
mkdir -p /etc/security/keytabs/?.*
chown .*:.* /etc/security/keytabs/?.*
chmod [0-7]+ /etc/security/keytabs/?.*
rmdir /etc/security/keytabs/?.*
(Remark: the command rights settings above are granular and rigorous; this level of restriction might be unnecessary in your scenario.)
Please always verify the command rights required by the sample script using the --dry-run option, since options like --force will modify the commands.
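For example, a dry run of the --create step shows the commands that would be run, without making changes (the file name and dzdo invocation follow the examples later in this section):
dzdo ./kerberos_security_setup.pl --input host-principal-keytab-list.csv --create --dry-run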
(Optional) Manually expire the cache in Centrify DirectControl Agent using adflush --expire to load the new roles and command rights immediately. This should be done on all cluster nodes. Otherwise, wait until the cache expires. Use dzinfo to check whether the new settings are loaded.
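For example, on each cluster node (the user name aduser is illustrative; dzinfo with a user name is assumed to show that user's roles and command rights):
dzdo adflush --expire
dzinfo aduser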
Centrify DirectControl Agent will set the environment variable KRB5CCNAME for AD users. This might confuse Kerberos utilities (e.g. klist, kinit) when running as another user (e.g. root) using sudo or dzdo. Therefore, either override KRB5CCNAME in the command rights or set KRB5CCNAME before running the sample script (see below for an example).
A valid TGT of a Kerberos principal is required to create (and delete) accounts on Active Directory automatically. The Kerberos principal should have the administrative privilege to create (and delete) accounts on Active Directory. The sample script will check for a valid TGT before executing commands. Otherwise, commands and utilities run by the sample script (e.g. adkeytab) might prompt for a password.
Since the sample script requires root privilege, we should get the valid TGT as root, e.g.:
dzdo kinit administrator@EXAMPLE.COM -c /tmp/krb5cc_0
Remember to remove the TGT once the sample script has finished, e.g.:
dzdo kdestroy -c /tmp/krb5cc_0
Single sign-on (SSO) is possible using Centrify kerberized OpenSSH. First get a valid TGT of the AD user as root using non-default Kerberos ccache file, e.g.:
dzdo kinit aduser@EXAMPLE.COM -c /tmp/krb5cc_hadoop_ssh
Then configure hadoop.conf to specify KRB5CCNAME with the non-default Kerberos ccache file. After that, SSH will use the non-default Kerberos ccache file to log in to cluster nodes seamlessly (Remark: SCP can also be configured):
hadoop.secure.shell.krb5ccname.enable: true
hadoop.secure.shell.krb5ccname: /tmp/krb5cc_hadoop_ssh
Remember to remove the TGT once the sample script has finished, e.g.:
dzdo kdestroy -c /tmp/krb5cc_hadoop_ssh
On the master node, log in as the AD user. Then run the sample script as root using dzdo, e.g.:
dzdo KRB5CCNAME=/tmp/krb5cc_0 ./kerberos_security_setup.pl --input host-principal-keytab-list.csv --create
dzdo KRB5CCNAME=/tmp/krb5cc_0 ./kerberos_security_setup.pl --input host-principal-keytab-list.csv --deploy
The sample script kerberos_security_setup.pl provides the following options to help recover from the last error:
The --force option makes commands proceed even if an error is found in certain situations.
The --ignore-error option ignores all errors so that all commands proceed. Please always examine the log messages afterwards.
Another way is to clean up everything by running the sample script with the --delete command, then run again with the --create command.
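For example, to clean up and start over (the CSV file name follows the earlier examples):
perl kerberos_security_setup.pl --input host-principal-keytab-list.csv --delete
perl kerberos_security_setup.pl --input host-principal-keytab-list.csv --create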
Note that the sample script is not sophisticated enough to recover from all kinds of errors. Therefore, please make sure to read the Pre-requisites section in the README, review the script commands carefully using the --dry-run option, and test thoroughly in a lab environment before deploying to a production environment.
As a precaution, it is often a good idea to run the sample script using a divide-and-conquer approach. That is, instead of creating and distributing Kerberos keytabs to hundreds or thousands of cluster nodes at once, we can break the task down into sub-procedures, e.g. by running the script against smaller batches of nodes, as sketched below.
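A minimal sketch of the batched approach (the per-batch CSV file names are illustrative; each file contains only the rows for that batch of nodes, and the remaining batches are handled the same way):
perl kerberos_security_setup.pl --input batch1-nodes.csv --create
perl kerberos_security_setup.pl --input batch1-nodes.csv --deploy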
If you are using configuration management tools (e.g. Chef or Puppet), you might prefer to write your own recipes/manifests instead of running the sample script kerberos_security_setup.pl. In this case, run the sample script with the --dry-run option to get the commands, then implement your own recipes/manifests.
If no new log messages are found, the sample script is likely being held by an interactive prompt while running commands, for instance when SSH, SCP, or adkeytab is run. The sample script should be terminated in this case.
Therefore, please make sure to read the Pre-requisites section in the README, review the script commands carefully using the --dry-run option, and test thoroughly in a lab environment before deploying to a production environment.
Use the --verbose option to show more messages.
Use the addebug CLI provided by Centrify DirectControl Agent.
This error appears when joining the domain (using adjoin) but the joined computer account on Active Directory cannot be associated with, say, the HTTP or NFS SPN. This is normal in a Hadoop environment, since the HTTP SPN is often associated with the per-host SPNEGO service account (HTTP/HOST@REALM), and the NFS SPN may be associated with the per-host NFS gateway service account (NFS/HOST@REALM).
To resolve this issue, modify the Centrify DirectControl Agent configuration file by setting the parameter adclient.krb5.service.principals to "ftp cifs nfs" (if only "http" is in conflict), or to "ftp cifs" (if both "http" and "nfs" are in conflict).
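For example, in /etc/centrifydc/centrifydc.conf (this is the same setting the sample script applies, as described next):
adclient.krb5.service.principals: ftp cifs nfs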
The sample script kerberos_security_setup.pl does this for every host in the cluster when the --remove-spn command is run (Note: the specified parameters in hadoop.conf have to be set, e.g. hadoop.adclient.krb5.service.principal.http.remove: true and/or hadoop.adclient.krb5.service.principal.nfs.remove: true).
The sample script kerberos_security_setup.pl with the --create command does that too.
The problem is due to adkeytab loading root-owned Kerberos credential cache /tmp/krb5cc_cm_agent instead of the default /tmp/krb5cc_0. (REF#: 76873)
This problem only occurs when TGT in Kerberos credential cache /tmp/krb5cc_cm_agent is valid and not expired. Other Centrify CLIs (e.g. adjoin/adleave) also have this behavior.
To work around, either set the environment variable KRB5CCNAME or remove /tmp/krb5cc_cm_agent (see the sketch after the next paragraph).
From our observation, /tmp/krb5cc_cm_agent is not required by cloudera-scm-agent once the service has been started, and the Kerberos credential cache will expire without renewal. However, please confirm with Cloudera support before proceeding, or stop cloudera-scm-agent before removal.
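For example, to remove the credential cache (a sketch; as noted above, confirm with Cloudera support first or stop cloudera-scm-agent before removal):
dzdo rm /tmp/krb5cc_cm_agent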
MapR provides a PAM Authenticator module that generates MapR tickets [1] during login. Customers can add the PAM module (libmapr_pam.so) to PAM configuration files.
The problem is that the PAM module fails to run when logging in via Centrify-enabled OpenSSH, because a dynamic library (libgcc_s.so.1) cannot be loaded properly. (REF#: 76441)
To work around on CentOS 6.4 x64 (other Linux distributions should work too):
Modify /etc/init.d/centrify-sshd. In function start(), change
$SSHD $OPTIONS || failure
to
LD_PRELOAD=/lib64/libgcc_s.so.1 $SSHD $OPTIONS || failure
Restart the centrify-sshd service.
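For example, on CentOS 6 (using the SysV init script modified above):
service centrify-sshd restart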
Please note that this workaround is provided for use as-is, outside of the normal development and quality assurance review cycles. Centrify does not intend to address any issues or enhance any features provided by the workaround in future releases. You should test the workaround in a lab environment as thoroughly as possible before deploying to a production environment.
Moreover, Centrify provides a flexible way to execute commands during user login. MapR tickets should be able to be generated automatically by executing the maprlogin utility. Therefore, Centrify suggests not using this MapR PAM module. You can refer to the "Specify commands to run" Group Policy for more detail.
References: