Site Reliability Engineering (SRE)

Application Deadline: 13/08/2024 - 30/09/2024

Tiki.vn is one of the most trusted e-commerce platforms in Vietnam. Our information system is composed of various components: from the front-end web store that handles millions of product views daily, to back-office systems that help automate warehouse management, order processing, and integration with third-party logistics providers so that orders are processed and delivered to customers with a short lead time and at optimal cost, to analytics and CRM solutions that generate near real-time reports, allowing us to make the right decision at the right time. As a member of the SysOps team, you will look after our system architecture, service deployment, issue monitoring, system access management, and maintenance to ensure all TIKI products run healthily 24/7 and provide an excellent user experience to our customers.


What you’d contribute:

● Monitor the technical performance of applications and participate in performance and cost optimization, capacity planning, and bottleneck troubleshooting.

● Work with developers and assist them in configuring and debugging systems.

● Calculate system capacity and scale for high-traffic ticketing events.

● Automate user creation and system permission management.

● Automate the deployment of system services and CI/CD pipelines.

● Set up alerts and monitoring, and handle related incidents.

● Audit infrastructure for potential vulnerabilities and weaknesses, and make the necessary changes to mitigate risk.

● Provide routine operation documentation.


What you’d need to succeed in the role:

● At least 3 years of experience operating *nix OSes on servers (e.g., CentOS, Rocky, Ubuntu, Debian) and using Linux as the main OS on a personal laptop/PC.

● Experience with infrastructure automation and configuration management (Ansible, Terraform, etc.).

● Experience with the Kubernetes ecosystem and cloud platforms (GCP, AWS).

● Experience with distributed systems.

● Understanding of Internet protocols and networks.

● Knowledge of monitoring and logging tools: Graylog, Loki, Prometheus.

● Knowledge of web servers and load balancers: Nginx, HAProxy, LVS.

● Knowledge of SQL/NoSQL databases and search engines: MySQL, PostgreSQL, MongoDB, Redis, Elasticsearch.

● Knowledge of security: firewalls, IDS, and IPS.

● Experience managing high-traffic websites.

● Proactive working attitude; open-minded and results-oriented.

● Proficient with Git and familiar with GitHub and GitLab.


What we love to offer:

● Hybrid working

● Attractive package + immediate healthcare insurance

● Full salary during the probation period

● Social insurance contributions paid on full salary

● Coaching from experienced and inspirational leaders and managers

● Exposure to applied autonomous robot and AI technologies

● Unlimited access to knowledge via a learning library and team knowledge-sharing
