check CP

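To verify and debug the control plane of a cluster provisioned with the Cluster API Provider for Tinkerbell (CAPT), work through the steps below. The control plane is represented by a KubeadmControlPlane resource, which is reconciled by the Cluster API kubeadm control plane provider; CAPT supplies the bare-metal infrastructure behind it. Unless a step says otherwise, run the commands against the management cluster.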

1. Verify the Control Plane Resource Creation

The first step is to ensure that the KubeadmControlPlane resource, which manages the control plane, has been created correctly and is in the desired state.

Check KubeadmControlPlane Resource Status:

Use kubectl to check the status of the KubeadmControlPlane resource:

kubectl get kubeadmcontrolplanes -A
  • Expected Output: You should see the KubeadmControlPlane resources listed with their namespaces. The REPLICAS column shows the current number of control plane Machines and the READY column shows how many of them are ready; for a healthy control plane the two values match the desired replica count.
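The READY versus REPLICAS comparison can also be scripted. The following is a minimal sketch, assuming the v1beta1 field names .status.readyReplicas and .spec.replicas; kcp_is_ready is a hypothetical helper name, not part of kubectl or Cluster API.

```shell
# Hypothetical helper: succeeds when the ready count equals a non-zero
# desired count. Takes one "ready/desired" argument such as "3/3".
kcp_is_ready() {
  kcp_ready="${1%%/*}"     # text before the first slash
  kcp_desired="${1##*/}"   # text after the last slash
  # If a field is unset, jsonpath prints an empty string; treat that as 0.
  [ "${kcp_ready:-0}" -eq "${kcp_desired:-0}" ] && [ "${kcp_desired:-0}" -gt 0 ]
}

# Against a live management cluster (<...> are placeholders):
#   pair=$(kubectl get kubeadmcontrolplane <control-plane-name> -n <namespace> \
#     -o jsonpath='{.status.readyReplicas}/{.spec.replicas}')
#   kcp_is_ready "$pair" && echo "control plane ready"
```

The helper only compares numbers, so it can be reused in a polling loop or a CI check without parsing table output.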

Describe the KubeadmControlPlane Resource:

For more details, you can describe the KubeadmControlPlane resource:

kubectl describe kubeadmcontrolplane <control-plane-name> -n <namespace>
  • Expected Output: This command provides detailed information about the KubeadmControlPlane resource, including its current status, conditions, events, and any errors or warnings.

2. Check Associated Machines

The KubeadmControlPlane resource manages a set of Machine resources that represent the control plane nodes. Ensure that these Machines are being created and managed correctly.

Check Machines Managed by KubeadmControlPlane:

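kubectl get machines -n <namespace> -l cluster.x-k8s.io/cluster-name=<cluster-name>,cluster.x-k8s.io/control-plane
  • Expected Output: This command lists the control plane Machines of the cluster. The cluster.x-k8s.io/control-plane label is matched by presence, because its value is not the name of the KubeadmControlPlane. Machines should reach the Provisioned phase and then Running once a matching node exists; a Machine that stays in Pending or Provisioning, or that shows Failed, needs a closer look.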
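To pick out only the Machines that need attention, the phase column can be filtered. This is a sketch under the assumption that the phase is exposed at .status.phase; unhealthy_machines is a hypothetical helper name.

```shell
# Hypothetical helper: reads "name phase" pairs on stdin and prints every
# Machine whose phase is neither Running nor Provisioned.
unhealthy_machines() {
  awk '$2 != "Running" && $2 != "Provisioned" { print $1 " (" $2 ")" }'
}

# Against a live management cluster (<...> are placeholders):
#   kubectl get machines -n <namespace> -l cluster.x-k8s.io/control-plane \
#     --no-headers -o custom-columns=NAME:.metadata.name,PHASE:.status.phase \
#     | unhealthy_machines
```

An empty result means every listed Machine is in one of the two expected phases.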

Describe the Machines:

For more detailed debugging, describe one of the Machines:

kubectl describe machine <machine-name> -n <namespace>
  • Expected Output: Look for conditions, events, or error messages that might indicate issues with the Machines. The status should reflect that the Machines are in a healthy state and have been provisioned correctly.

3. Verify Control Plane Node Registration

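Ensure that the control plane nodes have registered as nodes in the workload cluster. Nodes belong to the workload cluster rather than the management cluster, so fetch the workload kubeconfig first:

clusterctl get kubeconfig <cluster-name> -n <namespace> > <cluster-name>.kubeconfig

Check Node Status:

kubectl --kubeconfig <cluster-name>.kubeconfig get nodes
  • Expected Output: The nodes corresponding to the Machines in your KubeadmControlPlane should be listed with a status of Ready. Nodes stay NotReady until a CNI plugin has been installed in the workload cluster.

Describe Nodes:

If a node is not Ready, describe it for more information. A node that is missing entirely has nothing to describe; go back to its Machine in step 2 instead.

kubectl --kubeconfig <cluster-name>.kubeconfig describe node <node-name>
  • Expected Output: This output shows why the node might not be Ready, such as issues with the kubelet, a missing CNI plugin, network configuration, or connectivity to the control plane.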
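The node check can be narrowed to the nodes that are not Ready. A minimal sketch, assuming the default kubectl table layout (NAME, STATUS, ...) and a workload kubeconfig saved to a file; not_ready_nodes and the file name are hypothetical.

```shell
# Hypothetical helper: reads `kubectl get nodes --no-headers` output on stdin
# and prints the names of nodes whose STATUS does not start with "Ready".
# "Ready,SchedulingDisabled" still counts as Ready; "NotReady" does not.
not_ready_nodes() {
  awk '$2 !~ /^Ready/ { print $1 }'
}

# Against a live workload cluster (<...> are placeholders):
#   clusterctl get kubeconfig <cluster-name> -n <namespace> > workload.kubeconfig
#   kubectl --kubeconfig workload.kubeconfig get nodes --no-headers \
#     -l node-role.kubernetes.io/control-plane | not_ready_nodes
```

Each name printed is a candidate for kubectl describe node.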

4. Check Associated Infrastructure Resources

The KubeadmControlPlane interacts with infrastructure-specific resources like TinkerbellMachineTemplate to define the configuration of the control plane nodes.

Check TinkerbellMachineTemplate:

kubectl get tinkerbellmachinetemplates -A
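  • Expected Output: The TinkerbellMachineTemplate referenced by your KubeadmControlPlane should be listed. A template is a static specification and does not report a Ready state; readiness is reported by the TinkerbellMachine resources created from it, which you can list with kubectl get tinkerbellmachines -n <namespace>.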

Describe the TinkerbellMachineTemplate:

kubectl describe tinkerbellmachinetemplate <template-name> -n <namespace>
  • Expected Output: This command provides detailed information about the template, ensuring that the correct settings (e.g., hardware profile, OS image) are applied to the Machines created by the KubeadmControlPlane.

5. Check for Control Plane Initialization

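Ensure that the control plane has been initialized. In the KubeadmControlPlane resource this shows up as the status field initialized set to true, together with conditions such as Available and Ready. The ControlPlaneInitialized condition is reported on the owning Cluster resource rather than on the KubeadmControlPlane. Exact condition names vary between Cluster API versions.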

Check Initialization Conditions:

kubectl describe kubeadmcontrolplane <control-plane-name> -n <namespace>
  • Expected Output: In the Status section, Initialized should be true, and conditions such as Available and Ready should have the status True, indicating that the first control plane node has been initialized and the API server is reachable.
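Reading conditions out of a long describe output is error-prone, so it can help to print only those that are not True. A sketch, assuming conditions are stored under .status.conditions with type and status fields; false_conditions is a hypothetical helper name.

```shell
# Hypothetical helper: reads "Type=Status" lines on stdin and prints the
# type of every condition whose status is not "True".
false_conditions() {
  awk -F= '$2 != "True" { print $1 }'
}

# Against a live management cluster (<...> are placeholders):
#   kubectl get kubeadmcontrolplane <control-plane-name> -n <namespace> \
#     -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}' \
#     | false_conditions
```

No output means every reported condition is True; any name printed points at the area to investigate next.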

6. Check Events in the Namespace

Kubernetes events can provide insights into what might be going wrong with the control plane or its associated components.

List Events in the Namespace:

kubectl get events -n <namespace> --sort-by='.metadata.creationTimestamp'
  • Expected Output: Review any warning or error events that might indicate problems with the control plane, such as failed node initialization, issues with etcd, or errors from the Tinkerbell infrastructure.
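In a busy namespace the event list is long, so summarizing the warnings by reason gives a quicker overview. A sketch, assuming the default kubectl column order (LAST SEEN, TYPE, REASON, OBJECT, MESSAGE); warning_reasons is a hypothetical helper name.

```shell
# Hypothetical helper: reads `kubectl get events --no-headers` output on
# stdin and prints "count reason" for Warning events, most frequent first.
warning_reasons() {
  awk '$2 == "Warning" { count[$3]++ }
       END { for (r in count) print count[r], r }' | sort -rn
}

# Against a live management cluster (<namespace> is a placeholder):
#   kubectl get events -n <namespace> --no-headers | warning_reasons
# To let the API server do the filtering instead:
#   kubectl get events -n <namespace> --field-selector type=Warning
```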

7. Verify Control Plane Scaling

The KubeadmControlPlane resource can manage scaling of the control plane. Ensure that the number of control plane replicas is being maintained as expected.

Check Replica Status:

kubectl get kubeadmcontrolplanes -A
  • Expected Output: The REPLICAS, READY, and UPDATED columns should show that the desired number of control plane nodes is being maintained.

Scale the Control Plane:

You can test scaling by adjusting the number of replicas:

kubectl scale kubeadmcontrolplane <control-plane-name> --replicas=<desired-number> -n <namespace>
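  • Expected Output: The KubeadmControlPlane should create or delete control plane Machines one at a time until the desired replica count is reached. Verify that the number of control plane nodes matches the new count. With stacked (local) etcd the replica count must be odd; an even value is rejected by the admission webhook.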
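Scaling on bare metal can take a while, so polling is more practical than re-running the command by hand. The following is a generic retry sketch; wait_until and kcp_has_three are hypothetical helper names, and the attempt count and delay are arbitrary.

```shell
# Hypothetical helper: wait_until <attempts> <delay-seconds> <command...>
# Runs the command until it succeeds or the attempts are used up.
wait_until() {
  wu_attempts="$1"
  wu_delay="$2"
  shift 2
  wu_i=0
  while [ "$wu_i" -lt "$wu_attempts" ]; do
    if "$@"; then
      return 0
    fi
    wu_i=$((wu_i + 1))
    sleep "$wu_delay"
  done
  return 1
}

# Against a live management cluster (<...> are placeholders):
#   kcp_has_three() {
#     [ "$(kubectl get kubeadmcontrolplane <control-plane-name> -n <namespace> \
#       -o jsonpath='{.status.readyReplicas}')" = "3" ]
#   }
#   wait_until 60 30 kcp_has_three || echo "timed out waiting for 3 ready replicas"
```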

8. Check Logs for CAPT Controller

If the control plane Machines are not being provisioned, the CAPT controller logs show how the Tinkerbell side of the reconciliation is going. Errors in reconciling the KubeadmControlPlane itself are logged by the kubeadm control plane controller, which by default runs in the capi-kubeadm-control-plane-system namespace.

Check Logs for CAPT Controller:

kubectl logs -n capt-system <pod-name>

Replace <pod-name> with the name of the CAPT controller pod, which you can find with kubectl get pods -n capt-system.

  • Expected Output: The logs should detail any errors the controller encountered while reconciling the infrastructure for the control plane Machines, including its interactions with Tinkerbell or the Kubernetes API.
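Controller logs are verbose, so a first pass that keeps only suspicious lines is often enough to find the failing reconcile. This is a rough sketch: the keyword list is an assumption, not a documented log format, and log_problems is a hypothetical helper name.

```shell
# Hypothetical helper: keeps only log lines that mention an error, a failure,
# or a panic (case-insensitive). The keywords are a heuristic.
log_problems() {
  grep -iE 'error|fail|panic'
}

# Against a live management cluster:
#   kubectl logs -n capt-system deployment/capt-controller-manager --since=1h \
#     | log_problems
#   kubectl logs -n capi-kubeadm-control-plane-system \
#     deployment/capi-kubeadm-control-plane-controller-manager --since=1h \
#     | log_problems
```

Addressing the Deployment instead of a pod avoids having to look up the pod name first.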

9. Advanced Debugging with Increased Verbosity

If you are still having trouble identifying the issue, you can increase the verbosity of the CAPT controller to gather more detailed logs:

  1. Edit the Deployment:
kubectl edit deployment capt-controller-manager -n capt-system
  2. Add Verbosity Flag:

Add --v=5 or --v=10 to the container's command or args list, wherever the existing flags are set, to enable more detailed logging. Saving the change rolls out a new pod, so the pod name changes.

  3. Check Logs Again:
kubectl logs -n capt-system <pod-name>
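The same change can be made non-interactively with a JSON patch. A sketch under two assumptions that should be checked against your Deployment first: the manager is the first container (index 0) and it already has an args list. verbosity_patch is a hypothetical helper name.

```shell
# Hypothetical helper: prints a JSON patch that appends --v=<level> to the
# args of the first container. Index 0 and an existing args list are assumed.
verbosity_patch() {
  printf '[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--v=%s"}]\n' "$1"
}

# Against a live management cluster:
#   kubectl patch deployment capt-controller-manager -n capt-system \
#     --type=json -p "$(verbosity_patch 5)"
#   kubectl rollout status deployment/capt-controller-manager -n capt-system
#   kubectl logs -n capt-system deployment/capt-controller-manager
```

If a --v flag is already present, edit that entry instead of appending a second one.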

10. Interact with Tinkerbell via Tink CLI (Optional)

If you have direct access to the Tink CLI, you can interact with Tinkerbell resources directly to verify that the control plane nodes are being provisioned as expected:

tink hardware list
tink workflow list
  • Expected Output: The hardware entries and workflows for the control plane nodes should be listed, and the workflow state should show whether the provisioning tasks have succeeded.
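On Tinkerbell installations that use the Kubernetes backend, hardware and workflows are custom resources and can be inspected with kubectl instead of the Tink CLI. The sketch below assumes the workflow state is exposed at .status.state and that a finished workflow has a state containing SUCCESS; both should be checked against your Tinkerbell version. unfinished_workflows is a hypothetical helper name.

```shell
# Hypothetical helper: reads "name state" pairs on stdin and prints every
# workflow whose state does not contain SUCCESS.
unfinished_workflows() {
  awk '$2 !~ /SUCCESS/ { print $1 " (" $2 ")" }'
}

# Against a live cluster running Tinkerbell:
#   kubectl get hardware -A
#   kubectl get workflows -A --no-headers \
#     -o custom-columns=NAME:.metadata.name,STATE:.status.state \
#     | unfinished_workflows
```

A workflow that stays in a pending or running state usually means the machine has not network-booted or an action is stuck, which is where the hardware entry and the node's console become the next things to check.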

Conclusion

By following these steps, you can systematically verify and debug the KubeadmControlPlane resource and its interactions with the other components involved in a CAPT deployment. Working through them in order helps confirm that the control plane is provisioned, managed, and scaled correctly on bare-metal infrastructure, and narrows down where a failure originates.