Resources

The group's servers are currently divided into two clusters:

  • node00-node05 make up the Node compute series, the Node group for short. Its capacity is 12 CPUs × 6 nodes = 72 CPUs, up to 144 processes at full load (≤24 processes per node); memory: 128 GB/node × 6 nodes.
  • core01 and core02 make up the Intel Core compute series, the Core group for short. Its capacity is 24 CPUs × 2 machines = 48 CPUs, up to 96 processes at full load (≤48 processes per node); memory: 512 GB/machine × 2 machines.
    The two groups offer similar total compute, but the latter is better suited to highly parallel simulation tasks with modest memory overhead, while the Node group is better for large-scale batch data processing, such as generating retrieval data products, which needs a large memory budget. In practice we therefore decided to assign the machines to different user groups by task type: two separate clusters sharing a single set of disk storage.

The operating systems on all eight machines have now been upgraded, as follows:

Hosts OS Memory Subnet IP
node00 Ubuntu 22.04.3 LTS 256 GB 192.168.1.100*
node01 Ubuntu 22.04.3 LTS 256 GB 192.168.1.101
node02 Ubuntu 22.04.2 LTS 256 GB 192.168.1.102
node03 Ubuntu 22.04.3 LTS 256 GB 192.168.1.103
node04 Ubuntu 22.04.3 LTS 256 GB 192.168.1.104
node05 Ubuntu 22.04.5 LTS 256 GB 192.168.1.105*
core01 Ubuntu 22.04.3 LTS 512 GB 192.168.1.210*
core02 Ubuntu 22.04.3 LTS 512 GB 192.168.1.211*
* public network connected

The two disk-array servers are as follows:

Hosts OS Mount point Subnet IP
dataserver1 Ubuntu 20.04.1 LTS /data00 131T 192.168.1.201*
dataserver2 Ubuntu 20.04.1 LTS /data04 328T 192.168.1.106*

All of the nodes above mount the two volumes exported by the disk arrays via nfs-kernel-server:

plaintext
$ df -h 
...
dataserver1:/data00 131T 121T 3.7T 98% /data00
dataserver2:/data04 328T 277T 51T 85% /data04

The items above (hostname aliases and network settings) were already configured in a previous post.
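
For reference, resolving names such as core01 or dataserver1 relies on the alias entries from that earlier post; on every node they are assumed to look roughly like this (addresses taken from the tables above):

bash
# /etc/hosts (excerpt; assumed from the earlier network-setup post)
192.168.1.100  node00
# ... node01-node04 follow the same pattern ...
192.168.1.105  node05
192.168.1.210  core01
192.168.1.211  core02
192.168.1.201  dataserver1
192.168.1.106  dataserver2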

Plan

Architecturally, we plan to use node05 as the master node of the Node cluster (previously node01, which has been crashing frequently lately); all node machines will obtain shared user information from it. core01 will be the master of the Core cluster, keeping user information in sync across core01 and core02.
The two clusters can still communicate with each other, and as long as a user's UID and GID are identical on both, the user can access their files on the disk arrays from either cluster. We therefore use NIS + NFS to distribute the disks and the user information to all slave nodes.
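
A quick way to confirm that identity really is consistent across the two clusters is to compare the numeric IDs each one reports (rli7 from this setup is used as the example; the ssh targets are only illustrative):

bash
## the uid/gid in the two outputs must be identical for cross-cluster file access to work
$ ssh node05 id rli7
$ ssh core01 id rli7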

The NIS service

NIS (Network Information Service) is a service for distributing system configuration data, such as user names and host names, between computers on a network.
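
Once the server and clients below are configured, any member node can query the shared maps directly; a few handy checks (standard yp-tools/glibc commands, using the ncl user from this setup) are:

bash
$ ypwhich               ## which NIS server this host is bound to
$ ypcat passwd          ## dump the passwd map served over NIS
$ getent passwd ncl     ## NSS lookup that falls through files -> nis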

NIS server setup

Log in to core01 and install NIS:

bash
root@core01:~# apt install nis

Set core01 as the master, point ypserver at it, and set the NIS domain name:

bash
root@core01:~# vim /etc/default/nis 
# /etc/default/nis
NISSERVER=master
NISMASTER=core01

root@core01:~# vim /etc/yp.conf ## set the NIS server IP or alias
# /etc/yp.conf
ypserver core01

root@core01:~# ypdomainname core01 # set the domain name
root@core01:~# ypdomainname ## check it
core01

root@core01:~# vim /etc/defaultdomain ## create this file
# /etc/defaultdomain
core01 ## add the domain name

root@core01:~# vim /etc/nsswitch.conf
# /etc/nsswitch.conf
passwd: files systemd sss nis
group: files systemd sss nis
shadow: files sss nis
gshadow: files nis

Start the services

bash
root@core01:~# systemctl restart ypserv ypbind yppasswdd ## start the services
root@core01:/etc# systemctl enable ypserv ypbind yppasswdd ## enable them at boot
Synchronizing state of ypserv.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install enable ypserv
Synchronizing state of ypbind.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install enable ypbind
Synchronizing state of yppasswdd.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install enable yppasswdd
Created symlink /etc/systemd/system/multi-user.target.wants/ypserv.service → /lib/systemd/system/ypserv.service.
Created symlink /etc/systemd/system/multi-user.target.wants/ypbind.service → /lib/systemd/system/ypbind.service.
Created symlink /etc/systemd/system/multi-user.target.wants/yppasswdd.service → /lib/systemd/system/yppasswdd.service.

Initialize the server:

bash
root@core01:/etc# /usr/lib/yp/ypinit -m

At this point, we have to construct a list of the hosts which will run NIS
servers. core01 is in the list of NIS server hosts. Please continue to add
the names for the other hosts, one per line. When you are done with the
list, type a <control D>.
next host to add: core01
next host to add: <control D>
The current list of NIS servers looks like this:

core01

Is this correct? [y/n: y] y
We need a few minutes to build the databases...
Building /var/yp/core01/ypservers...
Running /var/yp/Makefile...
gmake[1]: Entering directory '/var/yp/core01'
Updating passwd.byname...
Updating passwd.byuid...
Updating group.byname...
Updating group.bygid...
Updating hosts.byname...
Updating hosts.byaddr...
Updating rpc.byname...
Updating rpc.bynumber...
Updating services.byname...
Updating services.byservicename...
Updating netid.byname...
Updating protocols.bynumber...
Updating protocols.byname...
Updating netgroup...
Updating netgroup.byhost...
Updating netgroup.byuser...
Updating shadow.byname...
gmake[1]: Leaving directory '/var/yp/core01'

core01 has been set up as a NIS master server.

Now you can run ypinit -s core01 on all slave server

The output notes that you can run ypinit -s core01 on each client to establish communication.

Test:

bash
root@core01:/etc# yptest
Test 1: domainname
Configured domainname is "core01"

Test 2: ypbind
Use Protocol V1: Used NIS server: 192.168.1.210
Use Protocol V2: Used NIS server: 192.168.1.210
Use Protocol V3:
ypbind_nconf:
nc_netid: udp
nc_semantics: 1
nc_flag: 1
nc_protofmly: 'inet'
nc_proto: 'udp'
nc_device: '-'
nc_nlookups: 0
ypbind_svcaddr: 192.168.1.210:1010
ypbind_servername: core01
ypbind_hi_vers: 2
ypbind_lo_vers: 2

Test 3: yp_match
nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin

Test 4: yp_first
ncl ncl:x:1001:1001::/home/ncl:/bin/bash

Test 5: yp_next
rli7 rli7:x:1000:1000:rli7:/home/rli7:/bin/bash
nobody nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin

Test 6: yp_master
core01

Test 7: yp_order
1700325739

Test 8: yp_maplist
netgroup.byuser
group.bygid
shadow.byname
services.byname
rpc.byname
netid.byname
passwd.byname
hosts.byname
rpc.bynumber
protocols.byname
protocols.bynumber
group.byname
netgroup.byhost
passwd.byuid
hosts.byaddr
netgroup
services.byservicename
ypservers

Test 9: yp_all
ncl ncl:x:1001:1001::/home/ncl:/bin/bash
rli7 rli7:x:1000:1000:rli7:/home/rli7:/bin/bash
nobody nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin
All tests passed

The server passes all tests, and three user entries have been loaded into the database. Now we can set up the clients.

NIS client

Here we use the core02 machine as a client (slave):

bash
root@core02:~# sudo apt install nis
root@core02:~# vim /etc/yp.conf
# /etc/yp.conf
ypserver core01
root@core02:~# vim /etc/default/nis
# /etc/default/nis
# Are we a NIS server and if so what kind (values: false, slave, master)?
NISSERVER=false
# Are we a NIS client?
NISCLIENT=true
NISMASTER=core01

root@core02:~# domainname core01
root@core02:~# domainname
core01
root@core02:~# vim /etc/defaultdomain
core01

Start the services

bash
root@core02:~# systemctl restart ypserv ypbind yppasswdd
root@core02:~# systemctl enable ypserv ypbind yppasswdd
Synchronizing state of ypserv.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install enable ypserv
Synchronizing state of ypbind.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install enable ypbind
Synchronizing state of yppasswdd.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install enable yppasswdd
Created symlink /etc/systemd/system/multi-user.target.wants/ypserv.service → /lib/systemd/system/ypserv.service.
Created symlink /etc/systemd/system/multi-user.target.wants/ypbind.service → /lib/systemd/system/ypbind.service.
Created symlink /etc/systemd/system/multi-user.target.wants/yppasswdd.service → /lib/systemd/system/yppasswdd.service.

Establish communication:

bash
root@core02:~# /usr/lib/yp/ypinit -s core01
root@core02:~# yptest
Test 1: domainname
Configured domainname is "core01"

Test 2: ypbind
Use Protocol V1: Used NIS server: 192.168.1.210
Use Protocol V2: Used NIS server: 192.168.1.210
Use Protocol V3:
ypbind_nconf:
nc_netid: udp
nc_semantics: 1
nc_flag: 1
nc_protofmly: 'inet'
nc_proto: 'udp'
nc_device: '-'
nc_nlookups: 0
ypbind_svcaddr: 192.168.1.210:1010
ypbind_servername: core01
ypbind_hi_vers: 2
ypbind_lo_vers: 2

Test 3: yp_match
nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin

Test 4: yp_first
ncl ncl:x:1001:1001::/home/ncl:/bin/bash

Test 5: yp_next
rli7 rli7:x:1000:1000:rli7:/home/rli7:/bin/bash
nobody nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin

Test 6: yp_master
core01

Test 7: yp_order
1700331328

Test 8: yp_maplist
netgroup.byuser
group.bygid
shadow.byname
services.byname
rpc.byname
netid.byname
passwd.byname
hosts.byname
rpc.bynumber
protocols.byname
protocols.bynumber
group.byname
netgroup.byhost
passwd.byuid
hosts.byaddr
netgroup
services.byservicename
ypservers

Test 9: yp_all
ncl ncl:x:1001:1001::/home/ncl:/bin/bash
rli7 rli7:x:1000:1000:rli7:/home/rli7:/bin/bash
nobody nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin
All tests passed

As shown above, users such as ncl from core01 have been synchronized over. Switch to one of them:

bash
root@core02:~# su ncl
ncl@core02:/root$ ls
ls: cannot open directory '.': Permission denied
ncl@core02:/root$ cd
bash: cd: /home/ncl: No such file or directory

The home directory /home/ncl cannot be found: although the user account has been synchronized, the file system has not been shared, so core02 has no home directory for this user under /home.
We now need to share the home directories: core01:/home -> core02:/home

Sharing the /home directory

Just as with the disk-array mounts earlier, we first export /home on core01:

bash
## install nfs-kernel-server
root@core01:/etc# apt install nfs-kernel-server
root@core01:/etc# vim /etc/exports
# /etc/exports: the access control list for filesystems which may be exported

## export /home to the 192.168.1.0/24 subnet
/home 192.168.1.0/24(rw,no_root_squash,async,no_subtree_check)
## re-export the share list
root@core01:/etc# exportfs -ra

On core02, mount core01:/home in place of its own local /home:

bash
## install the NFS utilities
root@core02:~# apt install nfs-kernel-server

## add the mount to fstab
root@core02:/home# vim /etc/fstab
# /etc/fstab: static file system information.
...
## comment out the local /home
# /home was on /dev/sda3 during curtin installation
#/dev/disk/by-uuid/6662f49f-df4d-43cc-8838-0767d131ec26 /home ext4 defaults 0 1

## mount core01:/home
core01:/home /home nfs defaults 0 0

## mount everything in fstab
root@core02:/home# mount -a
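
One caveat with this plain entry: if core01 is unreachable at boot, core02 can hang waiting for the mount. A slightly more defensive variant, as a sketch (these are standard NFS/systemd mount options, not part of the original setup):

bash
## tolerate core01 being down at boot and mark the mount as network-dependent
core01:/home /home nfs defaults,_netdev,nofail 0 0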

Check the mounts:

plaintext
root@core02:/home# df -h
Filesystem Size Used Avail Use% Mounted on
tmpfs 51G 4.4M 51G 1% /run
/dev/sda3 313G 6.3G 290G 3% /
tmpfs 252G 0 252G 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
/dev/sda1 1.1G 6.1M 1.1G 1% /boot/efi
tmpfs 51G 80K 51G 1% /run/user/131
tmpfs 51G 64K 51G 1% /run/user/1000
dataserver1:/data00 131T 121T 3.7T 98% /data00
dataserver2:/data04 328T 277T 51T 85% /data04
core01:/home 313G 15G 282G 5% /home

Switching to the ncl user again, the home directory /home/ncl is now reachable:

bash
root@core02:~# su ncl
ncl@core02:/root$ cd
ncl@core02:~$ ls
bin include lib ncl_ncarg-6.4.0-Debian7.11_64bit_nodap_gnu472.tar.gz

Updating user information

Create a new user jiheng (uid=1516, gid=1516) on core01:

bash
root@core01:/etc# useradd jiheng -u 1516 -g 1516 -m -c "Jiheng Hu"
root@core01:/etc# passwd jiheng
New password:
Retype new password:
passwd: password updated successfully
root@core01:/etc# ll /home
drwxr-x--- 15 hjh hjh 4096 Aug 30 12:33 hjh/
drwxr-x--- 2 jiheng jiheng 4096 Nov 18 19:06 jiheng/
drwxr-x--- 15 rli7 rli7 4096 Nov 18 19:04 rli7/
drwxr-x--- 5 ncl ncl 4096 Nov 18 19:02 ncl/

At this point core02 has no user information for jiheng:

bash
rli7@core02:/home$ su jiheng
su: user jiheng does not exist or the user entry does not contain all the required fields

rli7@core02:/home$ ll
drwxr-x--- 15 516 516 4096 Aug 30 12:33 hjh/
drwxr-x--- 2 1516 1516 4096 Nov 18 19:06 jiheng/
drwxr-x--- 15 rli7 rli7 4096 Nov 18 19:04 rli7/
drwxr-x--- 5 ncl ncl 4096 Nov 18 19:02 ncl/

We now need to rebuild the NIS maps on core01:

bash
root@core01:/etc# make -C /var/yp
make: Entering directory '/var/yp'
gmake[1]: Entering directory '/var/yp/core01'
Updating passwd.byname...
Updating passwd.byuid...
Updating netid.byname...
Updating shadow.byname...
gmake[1]: Leaving directory '/var/yp/core01'
make: Leaving directory '/var/yp'

yptest shows that jiheng has entered the database:

bash
root@core01:/etc# yptest
Test 9: yp_all
ncl ncl:x:1001:1001::/home/ncl:/bin/bash
jiheng jiheng:x:1516:516:Jiheng Hu:/home/jiheng:/bin/bash
rli7 rli7:x:1000:1000:rli7:/home/rli7:/bin/bash
nobody nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin
All tests passed

Now the new user jiheng can log in on core02:

bash
rli7@core02:/home$ su jiheng
Password:
groups: cannot find name for group ID 516
jiheng@core02:/home$ ll
drwxr-x--- 15 516 516 4096 Aug 30 12:33 hjh/
drwxr-x--- 2 jiheng jiheng 4096 Nov 18 19:06 jiheng/
drwxr-x--- 15 rli7 rli7 4096 Nov 18 19:04 rli7/
drwxr-x--- 5 ncl ncl 4096 Nov 18 19:02 ncl/

Any change to user information, including adding or deleting users and changing passwords, needs to be pushed out with make -C /var/yp.
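
In other words, the routine for every account change on core01 looks like this (the user name and uid are placeholders):

bash
root@core01:~# useradd -m -u 1517 someuser    ## hypothetical new user
root@core01:~# passwd someuser
root@core01:~# make -C /var/yp                ## rebuild and push the NIS maps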

Issues

  1. 1 tests failed
    On some machines, yptest prints a warning during Test 3 (yp_match), which makes the final yp_all summary report "1 tests failed". The error can be ignored: the warning only means that no record for nobody was found in the passwd.byname map, which happens on machines where the nobody user has UID 99 instead of 65534 (check with id nobody), so it is left out of the map. The error does not appear in this example because nobody can be found.

  2. Some users are not synchronized
    Users with uid < 1000 are not synchronized. For example, the real users on core01 are:

    plaintext
    rli7:x:1000:1000:rli7:/home/rli7:/bin/bash
    hjh:x:516:516:Jiheng Hu:/home/hjh:/bin/bash
    ncl:x:1001:1001::/home/ncl:/bin/bash

    But neither core01 nor core02 can obtain the NIS entry for the user hjh (uid=516):

    plaintext
    Test 9: yp_all
    ncl ncl:x:1001:1001::/home/ncl:/bin/bash
    rli7 rli7:x:1000:1000:rli7:/home/rli7:/bin/bash
    nobody nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin
    All tests passed

    On the client it cannot be switched to with ncl@core02:~$ su hjh, and files under the shared directories are not shown as belonging to hjh but only to the numeric uid 516, because the user does not exist there:

    bash
    ncl@core02:~$ ll /home
    total 20
    drwxr-xr-x 5 root root 4096 Aug 29 12:59 ./
    drwxr-xr-x 21 root root 4096 Nov 18 18:32 ../
    drwxr-x--- 15 516 516 4096 Aug 30 12:33 hjh/
    drwxr-x--- 15 rli7 rli7 4096 Nov 18 15:25 rli7/
    drwxr-x--- 5 ncl ncl 4096 Aug 29 13:15 ncl/

Likely cause
Ubuntu by default treats UIDs below 1000 as system accounts, and useradd assigns UIDs of 1000 and above unless one is specified explicitly. NIS deliberately does not synchronize system accounts, since doing so could corrupt the clients' own system users, and only exports regular user accounts, so it is best to follow this convention.
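
The cutoff itself is set by the MINUID/MINGID variables in the NIS Makefile; the values below are the Debian/Ubuntu defaults as far as I know, so check your own copy:

bash
root@core01:~# grep -E '^MIN[UG]ID' /var/yp/Makefile
MINUID=1000
MINGID=1000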

Workaround: usermod -u 1516 hjh
groupmod -g 1516 hjh
The inconvenience is that all files on disk belonging to that user then have to be re-owned with chown.
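
A sketch of that ownership fix, assuming the old uid/gid was 516 and using the shared paths from this setup (double-check the paths before running):

bash
## re-own the home directory and anything on the shared arrays still owned by uid/gid 516
root@core01:~# chown -R hjh:hjh /home/hjh
root@core01:~# find /data00 /data04 -xdev \( -uid 516 -o -gid 516 \) -exec chown -h hjh:hjh {} +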

About LDAP

ChatGPT suggested another option, LDAP, a lightweight directory service that can replace the rather dated Yellow Pages (NIS) service. I am recording it here to consider for future maintenance:

LDAP Overview

LDAP, or Lightweight Directory Access Protocol, is an open and cross-platform protocol for accessing and maintaining distributed directory information services over a network. LDAP directories are often used for centralized storage of information like user accounts, group memberships, and network configurations.

LDAP directories consist of entries organized in a hierarchical tree structure. Each entry represents an object, and each object has attributes with values. The entries can represent users, groups, devices, and other types of entities.

Setting up LDAP on Ubuntu Cluster

1. Install LDAP Server

On each node in your cluster, install the LDAP server. For OpenLDAP, you can use the following:

bash
sudo apt update
sudo apt install slapd ldap-utils

2. Configure LDAP

Configure the LDAP server, paying attention to the organization name, domain name, and other parameters; a minimal sketch of this step follows.
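
On Ubuntu, one common way to do this (an assumption here, not something from the original notes) is to re-run the slapd package configuration dialog:

bash
# re-run the slapd configuration dialog to set the DNS domain, organization, and admin password
sudo dpkg-reconfigure slapd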

3. Cluster Considerations

If your cluster nodes need to share the LDAP data, you might need to set up replication or synchronization between LDAP servers on different nodes. OpenLDAP supports replication to keep data consistent across multiple servers. Consult the OpenLDAP documentation for details on setting up replication.

4. Client Configuration

On each node or client machine in your cluster, install LDAP client utilities:

bash
sudo apt install ldap-utils

Configure the client to connect to the LDAP server(s) in your cluster. Edit /etc/ldap/ldap.conf and /etc/nsswitch.conf to specify LDAP as a source for user information; a minimal sketch follows.
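
A sketch of those two files with placeholder values (the base DN, URI, and the libnss-ldapd NSS module are assumptions, not part of the original notes):

bash
# /etc/ldap/ldap.conf -- placeholder base DN and server URI
BASE   dc=example,dc=com
URI    ldap://core01

# /etc/nsswitch.conf -- add ldap alongside the existing sources
# (requires an NSS LDAP module such as the libnss-ldapd package)
passwd: files systemd ldap
group:  files systemd ldap
shadow: files ldap
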
5. Test Configuration

Use tools like ldapsearch to test the LDAP configuration on each node and ensure that LDAP clients can query the directory.
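
For example, an anonymous query against a placeholder base DN (adjust the URI and base DN to your deployment):

bash
# list the POSIX account entries the directory will serve
ldapsearch -x -H ldap://core01 -b "dc=example,dc=com" "(objectClass=posixAccount)"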

Please note that the specifics of setting up LDAP in a cluster may depend on the cluster type (e.g., Kubernetes, Hadoop, etc.) and your specific requirements. Always refer to the documentation of the LDAP server software you are using and any cluster management tools you have in place. Additionally, consider security aspects, such as encryption (SSL/TLS), and access control policies for your LDAP deployment.