Monitor your application health with distributed checks
In modern distributed systems, health checks play a significant role in the maintenance of complex deployments. Health checks provide a way to monitor and verify the status of your service fleet and can be used to automatically react to failures or excessive load. Consul uses distributed health checks to maintain an updated catalog of all services and nodes registered within the datacenter and to periodically verify services' behavior.
The Consul DNS service provides a standard interface to query the Consul catalog. By using Consul FQDNs instead of IP addresses for service configuration, you can simplify application configuration. A further simplification provided by Consul is load balancing across multiple instances of the same service. This is completely transparent to the downstream service instances and does not require extra configuration. The Consul DNS service prevents unhealthy service instances from being resolved, so your applications do not have to change configuration when a service instance fails.
Consul DNS service is based on the service definition files that the different Consul agents provide and uses a distributed health check system to ensure reliability of DNS results.
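For example, a downstream application can resolve a service by name with a standard DNS query. A minimal sketch, assuming a Consul agent answering DNS queries on the default port `8600` on localhost and a datacenter named `dc1`:
$ dig @127.0.0.1 -p 8600 hashicups-api.service.dc1.consul +short
Only the addresses of healthy instances are returned, so downstream applications never need to know how many instances exist or which ones are currently failing.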
In this tutorial you will learn how to write a service definition to register your service in Consul and how to create health checks to make sure your service is available only when fully functional.
Specifically, you will:
- Review the ACL permissions required to register a Consul service
- Generate a token for a Consul service using the `service-identity` policy
- Create a service configuration for the `hashicups-api` service
- Reload the Consul configuration to include the newly configured service
- Verify service registration
- Verify the load-balancing feature provided by Consul DNS
Tutorial scenario
This tutorial uses HashiCups, a demo coffee shop application made up of several microservices running on VMs.
At the beginning of the tutorial, you have a fully deployed Consul datacenter with an instance of the HashiCups application, composed of four services, NGINX, Frontend, API, and Database, deployed and registered in the Consul catalog.
The Consul datacenter also contains an extra node that you want to use to host a second instance of the API service for HashiCups.
By the end of this tutorial, you will have configured the API service to be part of the Consul catalog.
This tutorial focuses specifically on the service and health check definitions needed to register a service in the Consul catalog.
Once the service is registered in Consul, it will automatically appear in Consul DNS results and Consul will automatically load balance traffic across the different instances of the service.
All operations in this tutorial, after the scenario deployment, are performed from an additional node, a bastion host, deployed as part of the scenario. The tutorial provides instructions on how to connect to the bastion host at the end of the deployment.
Prerequisites
This tutorial assumes you are already familiar with Consul service discovery and its core functionalities. If you are new to Consul, refer to the Consul Getting Started tutorials collection.
If you want to follow along with this tutorial and you do not already have the required infrastructure in place, the following steps guide you through the process to deploy a demo application and a configured Consul datacenter on AWS automatically using Terraform.
To create a Consul deployment on AWS using Terraform, you need the following:
Clone GitHub repository
Clone the GitHub repository containing the configuration files and resources.
$ git clone https://github.com/hashicorp-education/learn-consul-health-checks-vms
Enter the directory that contains the configuration files for this tutorial.
$ cd learn-consul-health-checks-vms/self-managed/infrastructure/aws
Create infrastructure
With these Terraform configuration files, you are ready to deploy your infrastructure.
Issue the `terraform init` command from your working directory to download the necessary providers and initialize the backend.
$ terraform init
Initializing the backend...
Initializing provider plugins...
...
Terraform has been successfully initialized!
...
Then, deploy the resources. Confirm the run by entering `yes`.
$ terraform apply -var-file=../../ops/conf/monitor_application_health_with_distributed_checks.tfvars
## ...
Do you want to perform these actions?
Terraform will perform the actions described above.
Only 'yes' will be accepted to approve.
Enter a value: yes
## ...
Apply complete! Resources: 49 added, 0 changed, 0 destroyed.
Tip
The Terraform deployment could take up to 15 minutes to complete. Feel free to explore the next sections of this tutorial while waiting for the environment to complete initialization or learn more about the Raft protocol in a fun and interactive way.
After the deployment is complete, Terraform returns a list of outputs you can use to interact with the newly created environment.
Outputs:
connection_string = "ssh -i certs/id_rsa.pem admin@`terraform output -raw ip_bastion`"
ip_bastion = "<redacted-output>"
remote_ops = "export BASTION_HOST=<redacted-output>"
retry_join = "provider=aws tag_key=ConsulJoinTag tag_value=auto-join-hcoc"
ui_consul = "https://<redacted-output>:8443"
ui_grafana = "http://<redacted-output>:3000/d/hashicups/hashicups"
ui_hashicups = "http://<redacted-output>"
The Terraform outputs provide useful information, including the bastion host IP address. The following is a brief description of the Terraform outputs:
- The `ip_bastion` output provides the IP address of the bastion host you use to run the rest of the commands in this tutorial.
- The `remote_ops` output lists the bastion host IP, which you can use to access the bastion host.
- The `retry_join` output lists Consul's `retry_join` configuration parameter, used to generate the Consul server and client configuration.
- The `ui_consul` output lists the Consul UI address.
- The `ui_grafana` output lists the Grafana UI address.
- The `ui_hashicups` output lists the HashiCups UI address. You can open this address in a web browser to verify the HashiCups demo application is running properly.
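You can print any single output value with the `terraform output` command. For example, to retrieve the bastion host IP address:
$ terraform output -raw ip_bastion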
List AWS instances
The scenario deploys seven virtual machines.
$ terraform state list
## ...
aws_instance.api[0]
aws_instance.api[1]
aws_instance.bastion
aws_instance.consul_server[0]
aws_instance.database[0]
aws_instance.frontend[0]
aws_instance.nginx[0]
## ...
After deployment, six virtual machines, `consul_server[0]`, `database[0]`, `frontend[0]`, `api[0]`, `api[1]`, and `nginx[0]`, are configured in a Consul datacenter with service discovery. Node `api[1]` does not have a service registered but is already a member of the Consul datacenter.
The remaining node, `bastion`, is used to perform the tutorial steps.
Log in to the bastion host VM
Log in to the bastion host using `ssh`.
$ ssh -i certs/id_rsa.pem admin@`terraform output -raw ip_bastion`
#...
admin@bastion:~$
Configure CLI to interact with Consul
Configure your bastion host to communicate with your Consul environment using the two dynamically generated environment variable files.
$ source "/home/admin/assets/scenario/env-scenario.env" && \
source "/home/admin/assets/scenario/env-consul.env"
That will produce no output.
After loading the needed variables, verify you can connect to your Consul datacenter.
$ consul members
That will produce an output similar to the following.
Node Address Status Type Build Protocol DC Partition Segment
consul-server-0 172.22.0.8:8301 alive server 1.17.1 2 dc1 default <all>
hashicups-api-0 172.22.0.2:8301 alive client 1.17.1 2 dc1 default <default>
hashicups-api-1 172.22.0.3:8301 alive client 1.17.1 2 dc1 default <default>
hashicups-db-0 172.22.0.4:8301 alive client 1.17.1 2 dc1 default <default>
hashicups-frontend-0 172.22.0.5:8301 alive client 1.17.1 2 dc1 default <default>
hashicups-nginx-0 172.22.0.6:8301 alive client 1.17.1 2 dc1 default <default>
Use `consul catalog services` to query the Consul catalog for the services present in the datacenter.
$ consul catalog services -tags
consul
hashicups-api inst_0
hashicups-db inst_0
hashicups-frontend inst_0
hashicups-nginx inst_0
Every service of the HashiCups application is tagged with the `inst_0` tag to indicate it is the first instance of the service.
Create ACL token for service registration
When ACLs are enabled, service registration requires a token with write permission over the service name you want to register. In this case, the service name is `hashicups-api`.
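For reference, granting this permission manually would require a policy with a rule like the following. A minimal sketch in Consul ACL rule syntax:
service "hashicups-api" {
  policy = "write"
}
You do not need to create this policy yourself; the service identity you use below generates it for you.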
Service identities let you quickly construct policies for services, rather than manually creating identical policies for each service instance and its companion sidecar proxy. Generate a new token for `hashicups-api-1` using a `service-identity`.
$ consul acl token create \
  -description="SVC HashiCups API token" \
  -format=json \
  -service-identity="hashicups-api" | tee /home/admin/assets/scenario/conf/secrets/acl-token-svc-hashicups-api-1.json
That will produce an output similar to the following.
{
"CreateIndex": 97,
"ModifyIndex": 97,
"AccessorID": "cdf3909b-16cf-a5ea-bcb8-3b3b19fe0bd0",
"SecretID": "925cee23-8b47-5ef2-2f20-1fb40cdcbf0d",
"Description": "SVC HashiCups API token",
"ServiceIdentities": [
{
"ServiceName": "hashicups-api"
}
],
"Local": false,
"CreateTime": "2024-01-17T13:43:09.166273172Z",
"Hash": "RtpgRlE69uEOWjeuaBUG2ezUM37J4OxeUhFaaRJaEac="
}
Set your newly generated token as the `CONSUL_AGENT_TOKEN` environment variable. You will use this variable later in the tutorial to generate the service definition file.
$ export CONSUL_AGENT_TOKEN=`cat /home/admin/assets/scenario/conf/secrets/acl-token-svc-hashicups-api-1.json | jq -r ".SecretID"`
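Optionally, verify the token was created correctly by reading it back using its `AccessorID`:
$ consul acl token read -id=`cat /home/admin/assets/scenario/conf/secrets/acl-token-svc-hashicups-api-1.json | jq -r ".AccessorID"`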
Review service configuration for Consul services
Consul allows multiple ways to register a service in the service catalog:
- Creating a service definition file in the agent's configuration directory
- Using the `consul services register` CLI command
- Calling the `/agent/service/register` HTTP API endpoint
Each of these methods persists the registration in the Consul data folder.
To make sure your service registration survives data folder corruption, in this tutorial you will configure the service by placing a service definition file inside the agent's configuration directory.
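For comparison, a definition file like the one you create below could also be registered with the CLI. A sketch, assuming the file is already present on the node:
$ consul services register /etc/consul.d/svc-hashicups-api.hcl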
Each service configuration is composed of two parts: the service definition and the health check definitions.
Service definition
The service definition requires the following parameters to be defined:
- `name` - the service name to register. Multiple instances of the same service share the same service name.
- `id` - the service ID. Multiple instances of the same service require an unambiguous ID to be registered.
- `tags` - [optional] tags to assign to the service instance. Useful for blue-green deployments, canary deployments, or to identify services inside the Consul datacenter.
- `port` - the port your service is exposing.
- `token` - a token to be used during service registration. This is the token you created in the previous section.
Below is an example configuration for the service definition.
service {
name = "hashicups-api"
id = "hashicups-api-1"
tags = [ "inst_1" ]
port = 8081
token = "925cee23-8b47-5ef2-2f20-1fb40cdcbf0d"
}
Read more on service definitions at the Services configuration reference.
Check definitions
Consul provides distributed monitoring for your services when health checks for the service are configured. Health check configurations are nested in the service block. They can be defined using the following parameters:
- `id` - unique string value that specifies an ID for the check.
- `name` - required string value that specifies the name of the check.
- `service_id` - specifies the ID of a service instance to associate with the check.
- `interval` - specifies how frequently to run the check.
- `timeout` - specifies how long unsuccessful requests take to end with a timeout.
The other parameter required to define the check is the type. Consul supports multiple check types, but for this tutorial you will use the TCP and HTTP check types.
A TCP check establishes connections to the specified IPs or hosts. If the check successfully establishes a connection, the service status is reported as `success`. If the IP or host does not accept the connection, the service status is reported as `critical`.
The following is an example of a TCP check for the `hashicups-api` service, listening on port `8081`:
{
id = "check-hashicups-api.public",
name = "hashicups-api.public status check",
service_id = "hashicups-api-1",
tcp = "localhost:8081",
interval = "5s",
timeout = "5s"
}
HTTP checks send an HTTP request to the URL specified in the definition and report the service health based on the HTTP response code: a 2xx code is reported as passing, a 429 as warning, and any other code as critical.
The following is an example of an HTTP check for the `hashicups-api` service, which exposes a `health` endpoint to test service status:
{
id = "check-hashicups-api.public.http",
name = "hashicups-api.public HTTP status check",
service_id = "hashicups-api-1",
http = "http://localhost:8081/health",
interval = "5s",
timeout = "5s"
}
Create service configuration for HashiCups API service
The HashiCups API service is composed of three different services, `public-api`, `product-api`, and `payments`, listening respectively on ports `8081`, `9090`, and `8080`. The only service that requires external access is `public-api` on port `8081`.
Create the service configuration file.
$ tee /home/admin/assets/scenario/conf/hashicups-api-1/svc-hashicups-api.hcl > /dev/null << EOF
## -----------------------------
## svc-hashicups-api.hcl
## -----------------------------
service {
# Name of the service as it will appear in Consul catalog
name = "hashicups-api"
# Unambiguous ID, to distinguish the service instance from other instances of the same service
id = "hashicups-api-1"
# Tags for the service instance. A service instance can be reached using the
# <tag>.<service>.service.<datacenter>.<domain> FQDN
tags = [ "inst_1" ]
# Port number for the service
port = 8081
# ACL token to present when registering the service
token = "${CONSUL_AGENT_TOKEN}"
checks = [
{
id = "check-hashicups-api.public.http",
name = "hashicups-api.public HTTP status check",
service_id = "hashicups-api-1",
http = "http://localhost:8081/health",
interval = "5s",
timeout = "5s"
},
{
id = "check-hashicups-api.public",
name = "hashicups-api.public status check",
service_id = "hashicups-api-1",
tcp = "localhost:8081",
interval = "5s",
timeout = "5s"
},
{
id = "check-hashicups-api.product",
name = "hashicups-api.product status check",
service_id = "hashicups-api-1",
tcp = "localhost:9090",
interval = "5s",
timeout = "5s"
},
{
id = "check-hashicups-api.payments",
name = "hashicups-api.payments status check",
service_id = "hashicups-api-1",
tcp = "localhost:8080",
interval = "5s",
timeout = "5s"
}]
}
EOF
Tip
To simplify tests, all services already present in the Consul catalog have a tag assigned, `inst_0`. The new instance uses a different tag, `inst_1`.
Note
The service definition file can be adapted for Consul service mesh by adding a `connect` section in the file. Learn more about this in Securely connect your services with Consul service mesh.
Copy the configuration file to the `hashicups-api-1` node.
$ scp -r -i /home/admin/certs/id_rsa /home/admin/assets/scenario/conf/hashicups-api-1/svc-hashicups-api.hcl admin@hashicups-api-1:/etc/consul.d/svc-hashicups-api.hcl
svc-hashicups-api.hcl
Log in to `hashicups-api-1` from the bastion host.
$ ssh -i certs/id_rsa hashicups-api-1
#..
admin@hashicups-api-1:~
Verify the VM contains the files required for Consul configuration.
$ cat /etc/consul.d/svc-hashicups-api.hcl
That will produce an output similar to the following.
## -----------------------------
## svc-hashicups-api.hcl
## -----------------------------
service {
# Name of the service as it will appear in Consul catalog
name = "hashicups-api"
# Unambiguous ID, to distinguish the service instance from other instances of the same service
id = "hashicups-api-1"
# Tags for the service instance. A service instance can be reached using the
# <tag>.<service>.service.<datacenter>.<domain> FQDN
tags = [ "inst_1" ]
# Port number for the service
port = 8081
# ACL token to present when registering the service
token = "925cee23-8b47-5ef2-2f20-1fb40cdcbf0d"
checks = [
{
id = "check-hashicups-api.public.http",
name = "hashicups-api.public HTTP status check",
service_id = "hashicups-api-1",
http = "http://localhost:8081/health",
interval = "5s",
timeout = "5s"
},
{
id = "check-hashicups-api.public",
name = "hashicups-api.public status check",
service_id = "hashicups-api-1",
tcp = "localhost:8081",
interval = "5s",
timeout = "5s"
},
{
id = "check-hashicups-api.product",
name = "hashicups-api.product status check",
service_id = "hashicups-api-1",
tcp = "localhost:9090",
interval = "5s",
timeout = "5s"
},
{
id = "check-hashicups-api.payments",
name = "hashicups-api.payments status check",
service_id = "hashicups-api-1",
tcp = "localhost:8080",
interval = "5s",
timeout = "5s"
}]
}
Reload Consul to apply the service configuration.
$ consul reload
Configuration reload triggered
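The same reload can also be triggered through the agent HTTP API. A sketch, assuming the agent exposes its HTTP API on `localhost:8500` and `$AGENT_TOKEN` is a hypothetical token with agent write permission:
$ curl -X PUT --header "X-Consul-Token: $AGENT_TOKEN" http://localhost:8500/v1/agent/reload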
Start the `hashicups-api` service.
$ ~/start_service.sh
The output of the command will show some Docker output while the Docker images that compose the services are being pulled on the node.
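Optionally, before leaving the node, verify that the public API responds on its health endpoint, the same endpoint the HTTP check will probe:
$ curl -s localhost:8081/health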
To continue with the tutorial, exit the ssh session to return to the bastion host.
$ exit
logout
Connection to hashicups-api-1 closed.
admin@bastion:~$
Verify service registration
The service will now start appearing in Consul DNS results when a query is performed. Use the `inst_1` tag in the results to recognize the new instance, or in the query to filter the DNS results. The Consul DNS service can be queried using any of the interfaces Consul provides.
The API and the UI can also be used to retrieve information about the health checks configured for the service.
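For example, you can retrieve the state of all checks registered for the service through the health API. A sketch from the bastion host, assuming the `CONSUL_HTTP_ADDR` and `CONSUL_HTTP_TOKEN` variables were set by the environment files you sourced earlier:
$ curl -sk --header "X-Consul-Token: $CONSUL_HTTP_TOKEN" $CONSUL_HTTP_ADDR/v1/health/checks/hashicups-api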
Use the `consul catalog` command to query the Consul catalog and show all available services.
$ consul catalog services -tags
consul
hashicups-api inst_0,inst_1
hashicups-db inst_0
hashicups-frontend inst_0
hashicups-nginx inst_0
Notice that the `hashicups-api` service presents two different tags, `inst_0` and `inst_1`, indicating that both instances are registered within Consul.
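You can also use the tag directly in a DNS query to resolve only the new instance, following the `<tag>.<service>.service.<datacenter>.<domain>` format:
$ dig @consul-server-0 -p 8600 inst_1.hashicups-api.service.dc1.consul +short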
Verify Consul load balancing functionality
When multiple instances of a service are defined, Consul DNS will automatically provide basic round-robin load balancing capabilities.
To test it, you will make 100 requests to the Consul DNS service asking to resolve the `hashicups-api.service.dc1.consul` name, and count the different results.
$ for i in `seq 1 100` ; do dig @consul-server-0 -p 8600 hashicups-api.service.dc1.consul +short | head -1; done | sort | uniq -c
That will produce an output similar to the following.
54 172.22.0.2
46 172.22.0.3
Notice that the requests were balanced across the two different instances of the `hashicups-api` service.
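Consul DNS can also return the port each instance listens on through SRV records, which is useful when different instances expose the service on different ports. For example:
$ dig @consul-server-0 -p 8600 hashicups-api.service.dc1.consul SRV +short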
Destroy the infrastructure
Now that the tutorial is complete, clean up the infrastructure you created.
From the `./self-managed/infrastructure/aws` folder of the repository, use `terraform` to destroy the infrastructure.
$ terraform destroy --auto-approve
Next steps
In this tutorial you learned how to register a service in a Consul datacenter by writing a service definition with associated health checks. You created a token with the necessary permissions to register the service, generated a definition for the `hashicups-api` service including health checks for all the different components of the service, and registered the service in the Consul catalog by reloading the Consul agent on the `hashicups-api-1` node. You then used all the Consul interfaces, CLI, API, DNS, and UI, to verify the service registered correctly, and learned how to identify the different instances of the service using the `inst_1` tag. Finally, you tested Consul DNS load-balancing capabilities by performing multiple queries for the same service and verifying that the number of requests was roughly even across the two instances.
For more information about the topics covered in this tutorial, refer to the following resources:
To learn more about other features provided by the Consul DNS service, refer to the Consul DNS documentation.