Site Reliability Engineer

Tokyo, Japan, Infostellar [SRE01]

Field(s) of expertise
Aerospace Engineering, Information Technology
Job type

About this job

Infostellar is looking to build a new normal for the ground-services segment of the space industry, and our Site Reliability Engineers (SREs) are going to be a big part of that. Our SREs focus on building scalable platforms and tools that keep Infostellar stable, flexible, and above all, reliable. Ideal candidates are passionate about automating both preemptive and reactive mechanisms and processes.
We’re specifically looking for a senior hire, meaning you will be expected to have intimate knowledge of Kubernetes, infrastructure-as-code tools like Terraform, and technologies such as Google Cloud Platform. Like any startup, we value individuals who are self-motivated and professional, who can write excellent, error-free code with minimal supervision, and who are comfortable providing technical leadership or mentorship to more junior engineers.
If you’re driven by interesting, challenging problems that require sound software engineering solutions, and have always dreamt of working in the space industry, then do reach out. We’re building something special, and we’d love for you to be a part of it.

Development environment
● Our servers are written using Java (v.13+) built on Armeria and gRPC
● Please have a look at our open source repositories: https://github.com/infostellarinc/
● We do infrastructure management using Terraform and orchestrate our containers using Kubernetes
● All of our services are in Google Cloud
● We use Gradle to automate our builds
● Source code management (SCM) is Git + GitHub
● We usually develop on Linux / macOS using IntelliJ or Visual Studio Code
● Our servers all run on Linux.
● The StarPass (edge server placed at the ground station) core components are written in
Java and Golang.
● StarPass uses Docker for running applications.


Responsibilities
● Maintain cloud infrastructure
● Maintain build and deployment architecture
● Automate related processes, such as infrastructure creation and patching, monitoring,
alerting, and automating responses to certain incidents
● Monitor distributed systems
● Provide on-call support where necessary
● Manage incidents and facilitate postmortems
● Track outages and incidents
● Create Service Level Indicators and Objectives, ensuring that infrastructure and
architecture are able to meet them


Qualifications
● BS degree in Computer Science, Mechanical Engineering, Physics, Math, or a similar
technical field of study or equivalent practical experience
● Advanced experience with configuration management systems, preferably Terraform
● Advanced experience managing container-based workloads, preferably using Kubernetes
● Advanced experience in Linux-server administration
● Experience with Google Cloud Platform or, less ideally, AWS or Azure
● Software development experience in one or more general-purpose programming
languages such as Java, C/C++, C#, Objective-C, Python, or Go
● Strong track record of problem solving under pressure
● Self-driven with an analytical mind and a bias for action
● Working-level proficiency in spoken and written English
● Experience in systems engineering