Site Reliability Engineer

Application Deadline: 20/03/2025 - 30/04/2025


Tiki.vn is one of the most trusted e-commerce platforms in Vietnam. Our information system is composed of various components: from the web storefront that handles millions of product views daily, to back-office systems that help automate warehouse management, order processing, and integration with third-party logistics providers, ensuring that orders are processed and delivered to customers with a short lead time and at optimal cost, to analytics and CRM solutions that generate near real-time reports, allowing us to make the right decision at the right time.

As a member of the SysOps team, you will look after our system architecture, deploy services, monitor issues, grant access to systems, and perform maintenance to ensure all Tiki products run healthily 24/7 and provide a superior user experience to our customers.


What you'll need to succeed:

  • More than 2 years of experience operating *nix OSes on servers (e.g., CentOS, Rocky, Ubuntu, Debian) and using Linux as the main OS on a personal laptop/PC.

  • Have at least 1 year of experience in a relevant position.

  • Have at least 1 year of experience with infrastructure automation and configuration management (Ansible, Terraform, etc.).

  • Have at least 1 year of experience with the Kubernetes Ecosystem.

  • Experience working with cloud platforms (GCP, AWS) is advantageous.

  • Experience with Distributed Systems.

  • Understanding of Internet Protocols and Networks.

  • Knowledge of Monitoring/Logging: Graylog, Loki, Prometheus.

  • Knowledge of Web Servers & Load Balancers: Nginx, HAProxy, LVS.

  • Proactive working attitude, open-minded, and result-oriented.

  • Fluent in Git and familiar with GitHub and GitLab.

  • Available for off-hour support.

  • Ability to work under high pressure.

What you'll contribute:

  • 24/7 system support.

  • Set up, manage, and maintain various Tiki.vn and warehouse systems.

  • Monitor technical performance of applications, and participate in performance and cost optimization, capacity planning, and bottleneck troubleshooting.

  • Work with developers and assist them in configuring and debugging systems.

  • Calculate system capacity and scale for high-traffic ticketing events.

  • Automate user creation and system permissions.

  • Automate the deployment of system services and CI/CD pipelines.

  • Set up alerts, monitor, and handle related incidents.

  • Audit infrastructure for potential vulnerabilities and weaknesses, and make the necessary changes to mitigate risk.

  • Provide routine operation documentation.


