gopay.sh: The super-app for our engineers
Contributors: Giovanni Sakti, Giri Kuncoro, Imre Nagi, Ritesh Kumar
In 2020 alone, GoPay grew at a tremendous pace. Among other products, GoPayLater expanded threefold and GoInvestasi ballooned sevenfold! Such growth offers great opportunities for problem-solving and innovation, while at the same time pushing our engineers to put on their best problem-solving hats and collaborative spirit every single day. This post is a behind-the-scenes story of solving one such large-scale engineering problem through systematic problem solving and platform development.
Towards the end of 2020, the GoPay engineering group encountered some roadblocks during the planned service mesh adoption. The group had previously decided to start adopting a service mesh so that we could gain better control of our service-to-service communication and improve its observability. We were early adopters of Envoy proxies and had been using them ubiquitously across our architecture by building an in-house control plane. As our services grew, we wanted to move to a more mature solution with community support behind it.
For this reason, we decided to migrate to a service mesh setup with Istio as the control plane, while still using Envoy as sidecars in the data plane. The infrastructure engineering team came up with migration steps that could be executed by product engineers as the service owners.
The problem: product engineers needed to complete 15 steps to migrate each of their services, making adoption slow and difficult. Instead of building impactful features for end users, developers spent a lot of time on these thankless tasks.
Returning to the drawing board
After almost a year of running the migration campaign, only a fraction of our services had been migrated to the service mesh setup. So, to speed things up, the team returned to the drawing board to brainstorm a more effective solution.
The key ingredient of the proposed migration tooling is the ability to store and maintain detailed service information, including the source code repository details, destination servers and countries, and capacity information, among others.
The initial idea during this brainstorm was to build a new tool to help with the migration process. However, the infrastructure engineering team had already released a lot of tools, each with a different interface and maintained by a different squad within the team. Releasing a new tool would mean having yet another interface to learn, adopt and maintain. So we scrapped this idea.
After several back-and-forth discussions, the team identified a unique opportunity that had not been tried before: building a centralized platform as the only interface product engineers would need to interact with all the functionalities provided by the infrastructure team. The idea survived several rounds of internal scrutiny.
A task force was then formed. Its objective was to speak with as many potential stakeholders as possible and gather their pain points and requirements for this new platform. The key findings from these discussions were:
- Custom requirements not yet served anywhere else, such as the ability to handle multiple cloud providers and on-premise data centers
- The need for integration with our existing build and deployment systems
At the same time, the task force also benchmarked existing open-source tools in this problem space, including Backstage and Clutch, both of which had been released only recently at the time. During the benchmarking we encountered some hiccups, such as:
- only minimal integrations being available
- persistent bugs
- incomplete documentation.
We fully understand that the “open-source developer portal” was a new concept just starting to gain traction back then; most of the available tools were only a few months old.
After many conversations, as well as a re-examination of our infrastructure and system architecture, we identified several custom requirements not yet provided anywhere else. For example, we needed the platform to be able to handle multiple cloud providers and on-premise data centers, and to integrate with our existing build and deployment systems. Building this functionality on top of the existing tools would mean writing custom integrations, which in turn would mean spending yet more time and effort learning those tools’ codebases on top of writing the add-ons themselves.
Therefore, after much deliberation, the team decided to write a new platform instead, dubbed gopay.sh.
Gopay.sh: the super-app for engineers
At its heart, gopay.sh provides a detailed service catalog and ownership information that serves as the single source of truth about services in our ecosystem. All other functionalities provided by the Infrastructure Engineering team, old and new, are delivered as add-ons. This means gopay.sh needs to be heavily extensible from the get-go.
In order to make it easy for anyone to build and integrate add-ons with gopay.sh, we decided to adopt an open standard: the Open Service Broker (OSB) specification. OSB is a set of API standards that can be implemented by a “service broker”, the provider of a service, which can then be utilized by a “platform” to access and maintain those services. In our case, the “service broker” is the add-on and the “platform” is gopay.sh. This means that anyone, including people outside the organization, can build gopay.sh-compatible add-ons, and users can access them without having to know the details behind them.
Beyond the service catalog and add-on integration, gopay.sh has another core functionality built around our need to deploy to the service mesh and control the deployment of a service. Gopay.sh provides tight integration with our existing deployment system, which uses a combination of GitLab CI and ArgoCD to run workloads on Kubernetes clusters in various environments. After the user supplies the appropriate information into a service entry via a user-friendly web interface, gopay.sh generates a manifest that the user can add to the pipeline configuration. Afterwards, users choose an appropriate add-on to expose their service, and voilà … that’s it!
With gopay.sh, a process that typically took multiple days to complete can now be finished in 30 minutes or less. In addition, we removed or pre-populated every option that could be deduced automatically, reducing the amount of irrelevant context product engineers have to learn.
The first code commit of gopay.sh happened around December 2020, and by the end of March 2021 we had successfully onboarded more than 30% of our services to gopay.sh, with participation from more than 50% of the teams. We have since built and/or integrated even more add-ons into the system, and presented our journey at KubeCon NA 2021.
Arriving at this point did not come without challenges. First, clients need to be updated whenever a service moves to the service mesh. This is something we haven’t been able to automate so far, and it is one of the main reasons we still cannot accelerate the adoption process even further. Second, we had to overcome the initial inertia that comes with any major technology adoption among the stakeholders. And lastly, there are very specific technical issues we found during the Istio adoption, such as dealing with the transparent retry behaviour and various idle timeout issues, but that’s a story for another time. Yes, this is a hint for future blog posts 🙂 - make sure you visit this blog often!
Despite all the challenges, we are in a better place today. With gopay.sh, we have reduced the number of interfaces that product engineers need to interact with, be it for creating new databases, accessing monitoring dashboards, managing the SLIs/SLOs of a service, adjusting a service’s capacity, tuning its outgoing traffic behaviour, or other functionalities that used to live in different interfaces.
Today, more than 70% of GoPay engineers have been using the platform regularly and are happy that they can focus on delivering product features again, without getting interrupted by frequent migration tasks and infrastructure configuration details.
Looking back to the beginning of our service mesh migration journey, taking a step back and abstracting away the migration infrastructure was a good call, and it gave birth to our gopay.sh developer platform. This approach significantly sped up our migration progress and brought our long-running goal of adopting a service mesh into reality. Starting from the service mesh migration, gopay.sh has gone on to solve many other developer productivity use cases and made the overall GoPay engineering organization more productive.
But we won’t stop here - we have a bias for continuing to innovate! We now envision a state where someone who has just joined the GoPay engineering team can learn everything needed to become an effective team member by interacting only with gopay.sh. Yes, this also includes relevant documentation, accessible from the same interface, which can also benefit more parties, including product managers and business owners. But before all of this can happen, we still have some homework: improving and maintaining the uptime of each functionality at the highest level, and making the end-to-end user experience more seamless. The next challenge is on, and we’d love some fresh ideas!
Sounds fun? Interesting? Want to be involved? Then join our Community of Makers! Together with over 250 engineers in the region, we can build beautiful products that make a lasting impact on Indonesia's digital economy and financial inclusion.
To explore opportunities with GoTo Financial, check out this link.