Since Titan’s launch in 2018, the product has grown quickly and gained traction with small businesses across the globe. The teams involved in the creation of Titan (engineering, product development, marketing, design, and business development) have contributed immensely to its growth. One team that deserves a shout-out is our DevOps Engineering team, which works to ensure that Titan’s services are always up and running with high reliability.
Team DevOps @Titan
Beyond its normal development operations work, the DevOps team at Titan has significantly reduced operational toil and contributed to the reliability of Titan as a service provider.
As it often goes, with the growth of the product and the team comes the growth of technical challenges. Some of the challenges the team faces are common across organizations, but others take serious problem solving and teamwork. Unlike many operations teams, we focus not only on making the infrastructure resilient but also on actively contributing to production-facing solutions that handle large-scale traffic. Let’s look at some of the work undertaken by the team.
Work highlights from the team
All of the work below is specific to the DevOps team. The tools, services, and automation mentioned are all developed and maintained in-house by Titan’s DevOps team.
Programming/Scripting/Automation
- The team spends 40-60% of their time on programming. We have automated most repetitive tasks and we take pride in writing our own services.
- For the frontend, we use React, jQuery, Bootstrap, and similar libraries.
- As a backend framework, we use Flask.
- Our critical software services are written in Golang, while scripting is done in Python and Lua.
- We have written custom milters, policy services, rate limiters, and other internal extensions and services.
- We have migrated TBs of data using an internal tool, “IMF” (Internal Migration Framework).
- We value speed and efficiency—as a result, when our infrastructure started to grow, one of our team members wrote a small tool called aws_shortcuts, which quickly became a crucial part of our day-to-day operations. (Stay tuned for more info on this in the near future!)
- The team has also modified a few open-source projects for internal use, with correct license attributions.
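To give a feel for what a rate limiter of the kind mentioned above does, here is a minimal token-bucket sketch in Python. It is an illustration only, not Titan's implementation: the class names `TokenBucket` and `SenderRateLimiter`, the per-sender keying, and the parameters are all assumptions made for the example.

```python
import time
from typing import Dict, Optional


class TokenBucket:
    """Allow bursts of up to `capacity` events, refilled at `refill_per_sec`."""

    def __init__(self, capacity: float, refill_per_sec: float,
                 now: Optional[float] = None) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = capacity  # start with a full bucket
        self.updated = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        """Consume one token if available; return whether the event may proceed."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        # Refill proportionally to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class SenderRateLimiter:
    """Hypothetical wrapper: one independent bucket per sender address."""

    def __init__(self, capacity: float, refill_per_sec: float) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, sender: str, now: Optional[float] = None) -> bool:
        bucket = self.buckets.get(sender)
        if bucket is None:
            bucket = TokenBucket(self.capacity, self.refill_per_sec, now)
            self.buckets[sender] = bucket
        return bucket.allow(now)
```

With `SenderRateLimiter(capacity=2, refill_per_sec=1.0)`, a sender can send two messages immediately, is refused a third, and is allowed one more after a second has passed. The `now` argument exists only so the behaviour can be exercised deterministically; in normal use it is left out and the monotonic clock is used.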
Deployments and Migrations
- Naah! At Titan, DevOps doesn’t do deployments.
- Our deployment pipelines are set up so that our developers can deploy 24×7 without help from DevOps. The same goes for database migrations.
- We have enough to say about this topic to create another blog post—so we will! Stay tuned.
Dashboards
- Be it an analytics dashboard, managing access to servers and applications, or finding important information, we wrote all of the dashboards. A few examples are:
- A dashboard for support teams, so they can find a needle in the haystack and quickly resolve issues.
- Abuse management for the email infrastructure.
- Need to give a developer access to 50 servers? Our automation has it covered without the team’s involvement.
Knack for customer experience
- We prioritise customer experience, and have therefore built services like Mailtracker, which helps us to:
- Track each incoming/outgoing mail flowing through our distributed infrastructure.
- Quickly troubleshoot and resolve customer issues.
- Plan capacity better and fight abuse.
Abuse management
- Fighting abuse, spam in particular, is one of the most complicated problems in the email industry.
- Our DevOps team has built a lot of solutions and automation around this issue. For example:
- We automatically detect, analyze, and block abusers based on their usage patterns.
- We built a system to block phishing and reduce bounces.
- We created dashboards that give the abuse team better visibility into spam complaints and customer usage patterns, to aid decision making.
- We built tools for managing Titan’s outgoing IPs.
- We proactively detect spam accounts before they cause any damage.
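As a rough illustration of detection based on usage patterns, the sketch below flags accounts whose sending volume or bounce ratio in a time window exceeds a threshold. The signals, the thresholds, and the names `UsageWindow` and `suspicious_accounts` are assumptions chosen for the example; the post does not describe which signals Titan's systems actually use.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class UsageWindow:
    """Hypothetical per-account counters for one time window."""
    account: str
    sent: int     # messages sent in the window
    bounced: int  # how many of those bounced


def suspicious_accounts(
    windows: Iterable[UsageWindow],
    max_sent: int = 500,             # assumed volume ceiling per window
    max_bounce_ratio: float = 0.2,   # assumed acceptable bounce ratio
    min_sent_for_ratio: int = 20,    # ignore the ratio for tiny samples
) -> List[str]:
    """Return accounts that exceed the volume or bounce-ratio threshold."""
    flagged: List[str] = []
    for w in windows:
        too_many = w.sent > max_sent
        bounce_ratio = w.bounced / w.sent if w.sent else 0.0
        too_bouncy = w.sent >= min_sent_for_ratio and bounce_ratio > max_bounce_ratio
        if too_many or too_bouncy:
            flagged.append(w.account)
    return flagged
```

The `min_sent_for_ratio` guard is a design choice worth noting: without it, an account that sends two messages and has one bounce would be flagged at a 50% bounce ratio, which says little about abuse. A real system would likely combine many more signals and feed a review or blocking step rather than act on a single rule.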
Monitoring and On-call management
Monitoring and on-call management are some of the most important aspects for any operations team, and we have this covered as well.
- We use Prometheus for monitoring, Consul for service discovery, VictoriaMetrics as a time-series database, and Grafana and Kibana for visualizations.
- We also use ELK and Loki as our logging solutions where appropriate.
- We have written an in-house tool called “Sentinel”, which notifies the relevant developer or DevOps engineer, based on role, via SMS or call. (This is similar to what we know as PagerDuty today.)
- We have also written an on-call calendar (a tool used to manage on-call schedules and rotations).
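The idea of role-based notification driven by an on-call rotation can be sketched in a few lines of Python. This is not how Sentinel or the on-call calendar are implemented; the weekly rotation, the role names, and the functions `on_call` and `who_to_notify` are assumptions made for illustration.

```python
from datetime import date
from typing import Dict, List


def on_call(rotation: List[str], start: date, day: date) -> str:
    """Return who is on call on `day`, assuming a weekly rotation from `start`."""
    if not rotation:
        raise ValueError("rotation must contain at least one person")
    if day < start:
        raise ValueError("day is before the rotation start")
    weeks_elapsed = (day - start).days // 7
    return rotation[weeks_elapsed % len(rotation)]


def who_to_notify(role: str, rotations: Dict[str, List[str]],
                  start: date, day: date) -> str:
    """Pick the on-call person for the role that owns the alert."""
    if role not in rotations:
        raise KeyError(f"no rotation defined for role {role!r}")
    return on_call(rotations[role], start, day)
```

For example, with `rotations = {"dev": ["asha", "ben"], "devops": ["chen", "dee", "eli"]}` and a rotation starting on Monday 2024-01-01, an alert for the `devops` role on 2024-01-08 would go to `dee`. Delivery over SMS or a phone call would sit behind this lookup and is left out of the sketch.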
Containers
- We hear you… and like everyone else, we leverage containerised applications and environments (ECS, EKS) for our microservices.
- We are actively moving our production services to Kubernetes.
Did we miss anything? Of course, if we were to mention everything the team contributes, you’d be reading this blog post all day. So check back as we continue to share more about Titan’s DevOps team and the processes we’ve created to ensure efficiency, accuracy, high availability and reliability.
By the way—Titan is always looking for exceptional talent to join our DevOps team. Maybe you’re the hire we’ve been looking for :). Interested in talking about this with us more? Apply here.