Platform/Site Reliability Engineer
Position Title: Platform Engineer/Site Reliability Engineer
Location: Remote
Roles & Responsibilities
As a Platform Engineer at Unizin, your primary responsibility is to ensure the reliability, scalability, security, and performance of our infrastructure and applications hosted on Google Cloud Platform (GCP) and Amazon Web Services (AWS). You will leverage tools such as Kubernetes, ArgoCD, GitLab CI/CD, Python, Terraform, Pulumi, and Ansible to achieve these goals.
Key Responsibilities
Infrastructure and Configuration Management
- Implement and maintain infrastructure as code (IaC).
- Manage multiple Kubernetes clusters, ensuring high availability, scalability, and security.
- Automate deployment, scaling, and management of containerized applications.
Monitoring and Alerting
- Design and implement monitoring and reporting solutions.
- Set up alerts and response procedures to ensure rapid response to incidents and outages.
- Monitor and report on infrastructure costs and usage, and recommend solutions for savings and optimization.
Continuous Integration and Deployment
- Develop and maintain CI/CD pipelines.
- Automate testing, builds, and deployments to achieve a consistent and reliable delivery process.
Performance and Reliability
- Conduct performance testing and capacity planning to ensure systems can scale with demand.
- Optimize system performance and resource utilization across our products and platform.
Incident Response and Post-Mortems
- Participate in incident response activities, ensuring timely resolution of issues.
- Conduct thorough post-mortem analyses to identify root causes and prevent recurrence.
Security
- Identify and deploy cybersecurity measures by continuously performing vulnerability assessments and risk management.
- Administer and enforce time-bound access controls for cloud infrastructure, in line with least-privilege and just-in-time (JIT) access principles.
- Oversee remote endpoint management of workstations using centralized tooling, including patch deployment, configuration enforcement, and compliance with security standards.
- Collaborate with the broader engineering team to implement and maintain best practices for security controls.
Cloud Platform SME
- Act as a subject matter expert on cloud platform (GCP/AWS) services, technologies, automation, and security.
- Provide guidance and recommendations on infrastructure design, implementation, and optimization.
- Maintain a strong understanding of containerization, microservices architecture, and cloud-native technologies.
- Stay updated with industry trends and best practices related to cloud platforms and DevOps methodologies.
Documentation and Knowledge Sharing
- Create and maintain comprehensive documentation covering systems, configurations, recurring issues, procedures, and knowledge-transfer material.
- Share knowledge and best practices with the broader engineering team.
- Mentor and guide members of the broader engineering team on infrastructure and CI/CD processes.
Automation and Tooling
- Identify opportunities for automation to streamline operations and improve efficiency. Encourage and build automated processes wherever possible.
- Develop and contribute to scripts and tools using Python, Bash, or other scripting languages as needed.
On-Call Responsibilities
- Participate in a 24/7 on-call rotation with other Platform Engineering team members.
- Respond promptly to alerts and incidents during your on-call shift.
- Coordinate with other team members to resolve critical issues and minimize downtime.
- Document incidents, actions taken, and follow-up tasks for review during business hours.
Qualifications
- At least 4 years of Platform Engineering / SRE / DevOps experience.
- Extensive experience with Linux-based infrastructure and systems administration.
- Strong understanding and practical experience with containerization, microservices architecture, and cloud-native technologies.
- Solid experience with a cloud platform, preferably GCP.
- Expertise in automated deployments and CI/CD (GitLab CI/CD, Tekton, Jenkins, etc.).
- Proficiency in scripting and automation using Python or Bash.
- Experience working with a SQL database such as PostgreSQL or MySQL.
- Familiarity with infrastructure as code (IaC) principles and tools (Ansible, Terraform, Pulumi, etc.).
- Proven track record of working on production software projects that scale efficiently.
- Expertise in application scaling methodologies, including horizontal and vertical scaling strategies.
- Experience with monitoring and logging tools, preferably Stackdriver.
- Excellent problem-solving skills and ability to troubleshoot complex issues under pressure.
- Strong communication skills and ability to collaborate effectively across teams and with end users.
- Demonstrated ability to articulate and represent technical viewpoints effectively, coupled with active listening skills to understand diverse perspectives.
Skills that will set you apart
- Apache Airflow, Kafka, Beam
- Elasticsearch
- Learning Management System (LMS) such as Canvas by Instructure
- Shibboleth
- Helm / Kustomize
- GitFlow / GitOps
- Akuity platform products (Argo, Kargo, etc.)
- Vault or comparable secret management tooling
- Britive
- NinjaOne
- Certifications in Kubernetes, GCP, or related technologies
Conclusion
As a Platform Engineer, your role is crucial in maintaining the stability, scalability, and reliability of our cloud systems. By leveraging your expertise in Kubernetes, CI/CD, IaC, and scripting, and by serving as a cloud platform SME, you will contribute significantly to advancing our DevOps practices and ensuring seamless delivery of high-quality services to our engineering teams and our members.
Furthermore, your ability to articulate a strong technical perspective, coupled with your openness to constructive dialogue and collaboration, will drive innovative solutions and foster a culture of continuous improvement within our engineering teams.
Additional Information
Must be comfortable working with a computer daily for several hours. Must be able to communicate detailed work verbally and in writing. Must be able to context-switch between concurrent tasks and handle interruptions as they arise.
Unizin is proud to be an equal opportunity workplace and is committed to equal employment opportunities regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, disability, or gender identity. Accommodations will be provided to any candidate with special needs who requests them.
At this time, we are unable to sponsor applicants for work visas. Candidates must be authorized to work in the United States without current or future sponsorship.
If you are a resident of a state with designated pay transparency requirements and this role is available remotely, you may be eligible to receive additional information about the compensation and benefits for this role, which we will provide upon request. Please send an email to [email protected].