Background
One of the main difficulties when developing an application is “imagining the form” the application will take in the Production environment. The components/services that developers use to build the application in their local environment may differ greatly from those used in the environments of the deployment pipeline.
Admittedly, technologies like Vagrant and Docker aim to make it possible to replicate the Production environment along the whole pipeline and bridge the gap between the different environments. In the same vein, the DevOps approach promises to tackle this issue by automating the build of the environments so that all of them can be replicated easily and automatically. However, there are many real-life situations where the gap between Development and Production is still wide.
To illustrate this point, let’s consider a concrete application and my experience building it.
The application
The diagram below represents a microservice developed using Scala and Akka. It is a simplified representation of the application.
The application exposes a REST endpoint to its clients. In order to fulfil the clients’ requests, the application needs to talk to a 3rd party via a SOAP endpoint and retrieve some data from a Redis repository to enrich the SOAP response.
Additionally, we want to send the logs of the application to Papertrail and collect metrics and send them to Datadog.
Developer’s environment
The first approach when configuring the local environment is ignoring all external dependencies. After all, the unit tests are meant to be run independently. This way, when writing unit tests, those dependencies can be mocked.
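As a minimal sketch of what mocking those dependencies can look like, the external services can sit behind small traits so that unit tests swap in hand-written stubs. All the names here (EnrichmentStore, SoapClient, QuoteService and the stubs) are hypothetical and only illustrate the idea. The real application would most likely return Futures rather than plain values.

```scala
// Hypothetical abstractions over the two external dependencies.
trait EnrichmentStore {
  def lookup(key: String): Option[String] // backed by Redis in a real environment
}

trait SoapClient {
  def fetch(requestId: String): String // backed by the 3rd party SOAP endpoint
}

// The logic under test depends only on the traits, never on Redis or SOAP directly.
final class QuoteService(soap: SoapClient, store: EnrichmentStore) {
  def handle(requestId: String): String = {
    val base = soap.fetch(requestId)
    val extra = store.lookup(requestId).getOrElse("n/a")
    s"$base [$extra]"
  }
}

// Hand-written stubs used only by the unit tests.
final class StubSoapClient(response: String) extends SoapClient {
  def fetch(requestId: String): String = response
}

final class InMemoryStore(data: Map[String, String]) extends EnrichmentStore {
  def lookup(key: String): Option[String] = data.get(key)
}
```

A unit test would then build the service with `new QuoteService(new StubSoapClient("soap-ok"), new InMemoryStore(Map("42" -> "gold")))` and assert on the result of `handle`, without any network access.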
But sooner or later, the application will need to run end to end, putting all the pieces together, including external dependencies. Personally, I prefer to do this as soon as possible in my local environment. Having an early end-to-end version of the application is a good guide for the development process, even if that early version does nothing but glue all the different pieces together.
For Redis, I choose a Docker container installed locally and therefore running on the same host as the application. Obviously, this is unlikely to be the case in the Production environment.
To stand in for the 3rd party service, I have two options: using the test environment provided by the 3rd party or creating my own representation of the 3rd party with tools like Wiremock or https://getsandbox.com/. In reality, I use a combination of all these. Firstly, I use the 3rd party test environment to explore their API and the possible combinations of requests/responses. Then I can use the proxy mode of Wiremock to record those requests and responses. Finally, I can copy those combinations to getsandbox and create even more combinations by making manual changes in the original requests/responses. Personally, I like to work with Wiremock but using services like getsandbox makes it easier for non-technical people to collaborate in the creation of test cases.
Now, how to go about Papertrail and Datadog? They are provided as SaaS and you will need your organisation to give you access to them (assuming that your boss is ok with you and the rest of the developers writing to these services from your local environments). And that is not all: you also need to figure out how to interact with those services.
For instance, in order to send data to Datadog, it is possible to use either Datadog’s HTTP API or DogStatsD. Depending on which one you use, you will need to write your code in a different way, choosing the appropriate client. However, it is up to DevOps to provide DogStatsD as part of the infrastructure. Unfortunately, in many organisations DevOps is a separate department/team and communication is not always easy. In my case, DevOps agreed on using DogStatsD, so I ended up installing the Datadog Agent, which includes the DogStatsD metrics aggregation server, in my local environment. As an alternative to DogStatsD for my local environment, I have sometimes used a Docker container like https://github.com/hopsoft/docker-graphite-statsd that contains a StatsD server plus Graphite as a front-end dashboard to render the metrics. If you decide to do that as well, remember that DogStatsD includes some extensions for special Datadog features that do not exist in StatsD.
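To make the difference between the two protocols concrete, here is a minimal sketch that builds the text datagrams by hand. It assumes a counter metric and the conventional UDP port 8125. The object and method names are hypothetical, and a real application would normally use a DogStatsD client library instead. The tag suffix (`|#key:value`) is one of the DogStatsD extensions, so a plain StatsD server may not understand it.

```scala
import java.net.{DatagramPacket, DatagramSocket, InetAddress}
import java.nio.charset.StandardCharsets

// Hypothetical helper, for illustration only.
object StatsdLines {
  // Plain StatsD counter datagram: <name>:<value>|c
  def counter(name: String, value: Long): String = s"$name:$value|c"

  // DogStatsD extension: tags are appended as |#key:value,key:value
  def taggedCounter(name: String, value: Long, tags: Seq[(String, String)]): String = {
    val base = counter(name, value)
    if (tags.isEmpty) base
    else base + "|#" + tags.map { case (k, v) => s"$k:$v" }.mkString(",")
  }

  // Fire-and-forget UDP send; host and port defaults are assumptions for a local agent.
  def send(line: String, host: String = "localhost", port: Int = 8125): Unit = {
    val socket = new DatagramSocket()
    try {
      val bytes = line.getBytes(StandardCharsets.UTF_8)
      socket.send(new DatagramPacket(bytes, bytes.length, InetAddress.getByName(host), port))
    } finally socket.close()
  }
}
```

For example, `StatsdLines.taggedCounter("requests.count", 1, Seq("env" -> "local"))` produces a line that is only fully meaningful to DogStatsD, which is worth keeping in mind when the local stand-in is a plain StatsD container.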
As for Papertrail, there is a similar problem. Should we just write to a log file and have DevOps set up a process to pipe the content of the log file to Papertrail? Or should we write directly to Papertrail from the application using a specific appender like logback-syslog4j (assuming that we are allowed to write to Papertrail from our local environment)? In the present example, DevOps agreed to automate sending the content of the log file to Papertrail.
In both cases, Datadog and Papertrail, it is advisable to send some data to these services as early as possible in the development cycle to make sure that the data sent to them suits our needs. Otherwise, we can run into surprises further down the line, like not being able to apply some specific filters to analyse the data or not being able to group related data or events. In fact, creating searches, dashboards, graphs, etc. in these tools should be done in parallel with the corresponding development tasks so that we can get a fair idea of how our data is represented in them (at the end of the day, this is the only way to test this kind of functionality).
After all these considerations, the diagram of the application in my local environment is
Moving through the pipeline
Now that the developers’ local environment is set up, the next step is to move the application through the deployment pipeline. Ideally, the deployment pipeline should be ready in the early stages of development so that the developers know what to expect in each environment. For instance, if the application is to be deployed on AWS, the Redis database in the diagram will be replaced by ElastiCache (another web service within the AWS ecosystem). Thus, the queries to ElastiCache will need to travel over the wire, as opposed to the situation in the local environment, where Redis is on the same machine as the application. As a result, a valid question is whether it makes sense to modify the application so that it loads the Redis data into an in-memory data structure at start-up, ensuring that the data is always read from memory in all the environments. This is just an example to illustrate the type of decisions that a developer needs to make based on their knowledge of the Production environment.
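The initial-loading idea can be sketched as follows, assuming the enrichment data is small enough to fit in memory and does not change while the application is running. ReferenceDataSource and PreloadedReferenceData are hypothetical names. The real source would wrap a Redis client call, which is deliberately left out here.

```scala
// Hypothetical abstraction over wherever the data lives (local Redis, ElastiCache, ...).
trait ReferenceDataSource {
  def loadAll(): Map[String, String]
}

// Reads everything once at construction; later lookups never touch the network.
final class PreloadedReferenceData(source: ReferenceDataSource) {
  private val snapshot: Map[String, String] = source.loadAll()

  def lookup(key: String): Option[String] = snapshot.get(key)
}
```

The trade-off is that the snapshot goes stale if the data changes after start-up, so this only fits data that is effectively static or an application that can be restarted (or the snapshot reloaded) when the data is updated.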