Management instructions

Setup

Setup environment to access the usdf-panda k8s cluster.

  • Get the tokens to authenticate usdf-panda cluster https://k8s.slac.stanford.edu/usdf-panda. Login to it to get instructions to setup environment for usdf-panda.

  • Follow the instructions to setup the environments in a USDF bash environment, such as SLAC rubin-dev.

General instructions

Show services and pods:

# kubectl get all -n panda
NAME                                      READY   STATUS    RESTARTS   AGE
pod/bigmon-dev-main-0                     1/1     Running   0          6d21h
pod/bigmon-dev-main-1                     1/1     Running   0          6d21h
pod/idds-dev-rest-0                       1/1     Running   0          4d2h
pod/idds-dev-rest-1                       1/1     Running   0          4d2h
pod/msgsvc-dev-activemq-7c47bb7d7-97nd6   1/1     Running   0          20d
pod/panda-dev-jedi-0                      1/1     Running   0          4h19m
pod/panda-dev-jedi-1                      1/1     Running   0          4h20m
pod/panda-dev-server-0                    1/1     Running   0          4h19m
pod/panda-dev-server-1                    1/1     Running   0          4h20m

NAME                          TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
service/bigmon-dev-main       ClusterIP   10.98.243.181    <none>        443/TCP             6d21h
service/idds-dev-rest         ClusterIP   10.108.12.12     <none>        8443/TCP,8080/TCP   4d2h
service/msgsvc-dev-activemq   ClusterIP   10.103.208.12    <none>        61613/TCP           110d
service/panda-dev-jedi        ClusterIP   10.104.113.232   <none>        80/TCP,443/TCP      4d17h
service/panda-dev-server      ClusterIP   10.96.29.71      <none>        80/TCP,443/TCP      4d17h

NAME                                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/msgsvc-dev-activemq   1/1     1            1           110d

NAME                                            DESIRED   CURRENT   READY   AGE
replicaset.apps/msgsvc-dev-activemq-7c47bb7d7   1         1         1       110d

NAME                                READY   AGE
statefulset.apps/bigmon-dev-main    2/2     6d21h
statefulset.apps/idds-dev-rest      2/2     4d2h
statefulset.apps/panda-dev-jedi     2/2     4d17h
statefulset.apps/panda-dev-server   2/2     4d17h

Show secrets:

# kubectl get secrets -n panda

Show ingress:

# kubectl get ingress -n panda

NAME               CLASS    HOSTS                                      ADDRESS   PORTS     AGE
bigmon-dev-main    <none>   rubin-panda-bigmon-dev.slac.stanford.edu             80, 443   6d21h
idds-dev-rest      <none>   rubin-panda-idds-dev.slac.stanford.edu               80, 443   4d2h
panda-dev-server   <none>   rubin-panda-server-dev.slac.stanford.edu             80, 443   4d17h

Describe pod details:

kubectl describe <k8s service type: pod, service, ingress, secret and so on> <podname> -n panda
Eg: kubectl describe pod bigmon-dev-main-0 -n panda
Eg: kubectl describe statefulset bigmon-dev-main -n panda

Eg: kubectl describe statefulset bigmon-dev-main -n panda

Login to pods:

kubectl exec -n panda -it <podname> -- bash
Eg: kubectl exec -n panda -it bigmon-dev-main-0 -- bash

Pod logs:

After logging into a pod, normally the pod logs can be found under /var/log/panda, /var/log/idds.
To understand what the pod is running, you can ‘ps -ef’ or ‘ps -ef|grep sh’ to list all processes. The first process will tell you what the pod is doing.

Harvester debug:

kubectl exec -n panda -it harvester-dev-0 -- bash
# Check condor
source  /data/condor/condor/condor.sh
condor_q

# Check harvester logs, for example:
grep 'workers status' /var/log/panda/panda-submitter.loga

# check whether httpd is not running (http is used for exporting harvester jobs' logs)
ps -ef|grep http|grep -v grep
# if no http processes, start it with this command
runuser -u atlpan -g zp -- /sbin/httpd

# check whether harvester service is running
ps -ef|grep uwsgi|grep -v grep
# stop/start harvester services
runuser -u atlpan -g zp -- /opt/harvester/etc/rc.d/init.d/panda_harvester-uwsgi stop
runuser -u atlpan -g zp -- /opt/harvester/etc/rc.d/init.d/panda_harvester-uwsgi start

Panda server debug:

kubectl exec -n panda -it panda-dev-server-0 -- bash
# start/stop panda server (it has http web services and master.py daemons)
runuser -u atlpan -g zp -- /etc/rc.d/init.d/httpd-pandasrv stop
runuser -u atlpan -g zp -- /etc/rc.d/init.d/httpd-pandasrv start

# Http server logs
tail -f /var/log/panda/panda_server_error_log
ls /var/log/panda/panda-JobDispatcher.log
ls /var/log/panda/panda-UserIF.log

# Daemon logs
tail -f /var/log/panda/panda_daemon_stdout.log
ls /var/log/panda/*.log
ls /var/log/panda/panda-DBProxy.log

Panda jedi debug:

kubectl exec -n panda -it panda-dev-jedi-0 -- bash
# start/stop panda jedi
runuser -u atlpan -g zp -- /etc/rc.d/init.d/panda-jedi stop
runuser -u atlpan -g zp -- /etc/rc.d/init.d/panda-jedi start

# Logs
ls /var/log/panda/

iDDS debug:

kubectl exec -n panda -it idds-dev-rest-0 -- bash
# status/start/stop services
supervisorctl status
supervisorctl start <servicename|all>
supervisorctl stop <servicename|all>

# Check logs
ll /var/log/idds/

Other commands:

# iDDS
curl -sS -iv -k --request GET https://usdf-panda-idds.slac.stanford.edu:8443/idds/ping

# PanDA monitor
curl -sS -iv -k --request GET https://usdf-panda-bigmon.slac.stanford.edu:8443/idds/wfprogress

# openssl to verify the certificate
openssl s_client -verify_return_error -connect https://usdf-panda-bigmon.slac.stanford.edu:8443