Site Reliability Engineering Specialist

Montreal, Canada, Sapienza Consulting [1792]

Field(s) of expertise
Software Engineering
Job type
Permanent
Education
Bachelor
Deadline
30/12/2021

About this job

Sapienza Consulting, a tpgroup company, is working with many exciting new ventures around the world involved in innovative New Space technologies related to Earth observation.

SRE bridges the gap between development and operations, focusing on Digital Infrastructure availability and reliability, while DevOps focuses on continuity and speed of product development and delivery.

As a member of the Digital Infrastructure (DI) team, you will be in charge of the infrastructure used to plan, research, develop, test, deploy and operate new data products at GHGSat. You will implement and improve observability and alerting for on-premise and cloud services, and secure and improve the workflows of 120 employees. Your long-term goal is to make the Production and Staging environments of our data products reliable, fast, cost-effective and secure.

Responsibilities

  • For the first few months, act as DevOps within a Development team: spearhead the Staging environment in Kubernetes, automate deployments, scan code and containers for security issues, automate the storage of artifacts, and ensure deployments are reproducible on other systems
  • Then move to SRE activities: evaluate and design evolutions of the global infrastructure (IaC, deployments, automation, redundancy)
  • Provision, configure, deploy, support, update and monitor web services, both on-premise and in the Cloud: Kubernetes, container registry, VM registry, Python registry, object storage, security scanning tools, etc.
  • Optimize productivity of more than 80 developers, analysts and operators
  • Provision, support, troubleshoot and audit Identity and Access Management for GHGSat employees, customers and partners
  • Evaluate, schedule and execute system maintenance
  • Regularly audit internal cybersecurity (employee access, application tokens, etc.)
  • Grow team knowledge through online training and Lunch & Learns
  • Assist other teams with their technical needs and choices
  • Collaborate proactively, persevere, and ask for help as you work toward autonomy

Profile

  • You love Linux and the FOSS technology stack (Kubernetes, nginx, Docker, Grafana, Terraform, Ansible)
  • You are qualified on at least one IAM service (Azure AD, AWS IAM, Google Cloud)
  • You embrace automation and observability
  • Knowledge of cybersecurity practices (phishing, threat and intrusion detection, SSL certificates, malware protection)
  • Knowledge of deployment frameworks and orchestrators (ArgoCD, Mesos, Rancher, etc.)
  • Knowledge of one distributed system (Apache Spark, Python Dask, AMQP, gRPC) would be nice
  • Strong will to learn new technologies
  • Bilingual French and English

Qualifications

  • Bachelor’s degree in information technology, computer science, software engineering or equivalent background
  • 5-10 years of experience as an SRE, DevOps engineer or System Administrator
  • Ability to obtain Canadian government security clearance (Canadian Controlled Goods Program)