Senior Cloud Engineer - SRE
Infrastructure, IT & Security
Bucharest | Czech Republic | Remote
These roles are open to full and partial remote work from France, Romania, and the Czech Republic.
You have most likely used Algolia in the last week without even knowing about it. What about joining the team and enabling more developers to build great search experiences with little worry about the reliability of their search engine?
Site Reliability Engineers (SRE) at Algolia are both software and systems engineers who ensure we can reliably serve over 4 billion queries every day and over 1 trillion queries a year, for users all around the world, despite data centers being on fire and undersea cables being cut. Since we operate many services at Algolia, including our Search API, DocSearch and Analytics, you’ll keep learning new things every day and sharing what you have learned.
The platform we develop uses both cloud and bare-metal systems spanning over 80 data centers in 17 different regions, serving hundreds of millions of users from every corner of the globe. Because search is a critical component of many applications, the SRE team maintains a high level of expertise in system failures in order to prevent them and provide reliable service to our customers.
As a Site Reliability Engineer you’ll actively work with software engineers in application teams to improve the reliability, predictability and performance of our applications and services. While part of the application team, you’ll work closely with the SRE community of engineers at Algolia and share the knowledge and needs of your application team.
No two problems are the same because all the systems evolve all the time. We expect you to be a curious problem solver who isn’t afraid to think outside of the box and use the knowledge of system interactions in your favor. When you’re ready, you’ll also take ownership of complete projects and execute them.
The team is composed of engineers with different backgrounds and experience in both industry and academia, both senior and junior. The diversity works in our favor and you should increase it by bringing your experience, your knowledge and your point of view. Thinking differently is a plus, not a minus. We’re transparent with each other and with other teams about both our successes and our failures. This way we learn, we accept our weaknesses and continuously strive to improve both personally and professionally.
This is a full-time opportunity open to full and partial remote work from France, Romania, and the Czech Republic.
YOUR ROLE WILL CONSIST OF:
- Being a team player
- Working with other teams to identify, troubleshoot, and resolve high impact issues
- Evaluating performance of current and future systems, both software and hardware
- Participating in design of new systems
- Developing and maintaining the automation tools used for all systems
- Participating in on-call rotation to ensure fast response to production issues
- Ensuring that infrastructure best practices are followed
YOU MIGHT BE A FIT IF YOU HAVE:
- A collaborative approach to problem solving
- Willingness to make independent decisions and take ownership of them
- 4+ years of software engineering experience
- Knowledge of Shell scripting and at least one scripting language (Python, Ruby, etc.)
- Willingness to learn Go (golang)
- Understanding of Linux systems: I/O, process scheduling, filesystems
- Understanding of computer networks: TCP/IP, DNS, load-balancing
- Proficiency in spoken and written English
- Rigor about code quality, automated testing and other engineering best practices
NICE TO HAVE:
- Knowledge of low-level principles of computers and network components
- Performance profiling of applications both in development and production
- Knowledge of Public Cloud platforms (AWS, GCP, Azure)
- Knowledge of Go (golang)
- Knowledge of automated integration tests
- Knowledge of chaos engineering
- Ability to use a configuration management tool like Ansible, Puppet or Chef