Production Operations Engineer


We shaped the earliest forms of ad tech, and we’re looking for the technical expertise to help shape its future. Our customers have unique problems that can only be solved at internet scale, and that’s where the technical skills of our team make a real difference.

Our exchange handles over 500 billion requests every day (for comparison, Google serves an estimated 9 billion searches a day), all running in our own global data centers. Every member of our technology team has an enormous amount of autonomy in building and managing our systems to support and enable our growing scale. Through the transparency of our technology, dedication to innovation and integrity, and long-standing customer relationships, we lead through change.

What’s it like to work at Index?

We have more than 550 Indexers around the globe dedicated to building a safe and transparent marketplace that provides a trusted experience for consumers.

Index is an exciting and fast-paced place to work. We’re built on our values of change, support, learning and teaching, trust, and intention. We pride ourselves on our independence and openness, not only in our technology, but in our teams, too. Our diverse and inclusive culture celebrates how we can leverage our unique differences to help drive Index forward.

Our culture of success is truly supportive and collaborative. In working together across our teams, we’re continually investing in the people and technology to solve the industry’s most complex problems. As we extend the promise of ad tech to every channel, we’re looking for talented engineers to help move Index, and the industry, forward.

Are you ready to join the programmatic evolution?

Index Exchange funds the open web. Content and journalism across the internet are funded through advertising, and we are the engine that helps to make that happen transparently, safely and efficiently. Handling hundreds of billions of auctions per day within milliseconds requires an intense understanding of the exchange and the ecosystem that we live in.

Our business is growing significantly every year and is poised to grow even faster. Our people and our platforms are the foundation and enabler of that growth. We are significantly expanding our technology teams and are looking for technologists with a passion for high-performance software development and a drive to deliver software products and platforms that enable and empower industries at a global scale.

About the Team:

The global Production Operations team is integral to ensuring the operational stability and reliability of our worldwide 24x7 on-premises and cloud environments. As the first line of defense, this team owns operations engineering. Collaborating closely with IT, SRE, Network, and Data engineering teams, and with key stakeholders across business, product, and software engineering teams, we play a crucial role in maintaining systems health, responding to incidents, and optimizing the performance, efficiency, and stability of complex global systems.

As a Production Operations Engineer, you'll be at the heart of maintaining and improving our global infrastructure. You'll work with a passionate team of engineers who take pride in building reliable, scalable systems that power our business. This role offers an exciting opportunity to work with enterprise-grade data center infrastructure while developing your skills in automation and system optimization.

You'll be part of a global team that maintains 24x7 coverage of our systems. This means participating in an on-call rotation and occasionally working outside regular hours when urgent issues arise. We provide training and support to ensure you're well-prepared for these responsibilities.

What We’re Looking For:

Your Technical Foundation:

We're looking for someone who has built a solid foundation in systems engineering. You should be comfortable with Linux systems administration and have a strong command of bash and Python for automation. Experience with infrastructure automation tools like Ansible and GitLab CI/CD is key, as these are central to how we operate.

Essential Technical Skills:

Strong proficiency in Linux systems administration (especially CentOS/Rocky Linux) and bash scripting

Python programming skills for automation and tooling development

Experience with infrastructure automation tools (e.g., Ansible, GitLab CI/CD)

Hands-on experience with bare-metal server lifecycle management

Understanding of networking fundamentals and troubleshooting

Experience with observability and monitoring platforms (e.g., ELK Stack, Prometheus)

Working knowledge of big data ecosystems (e.g., Hadoop/HDFS)

We also value experience with:

The Go programming language

Virtualization platforms

Kubernetes and container orchestration

Infrastructure-as-code practices

Advanced observability platform implementation and integration

Deep understanding of big data tools and architectures

Metrics collection and visualization tools

Here's what you'll be doing:

Every day brings new challenges in our dynamic environment. You might find yourself managing and optimizing our bare-metal infrastructure across global data centers. You'll work with enterprise-grade hardware, handling everything from firmware updates to performance tuning. When systems need attention, you'll coordinate with remote hands and team members to quickly resolve issues.

You'll also take ownership of automation initiatives that improve our operational efficiency. Whether it's crafting a new Ansible playbook or optimizing an existing deployment pipeline, you'll have opportunities to make our systems work smarter, not harder.

Key Responsibilities:

Monitor and maintain system health across our global on-premises infrastructure

Manage bare-metal server lifecycle, including firmware updates and break-fix procedures

Participate in incident response and alert triage

Implement and maintain automation frameworks

Contribute to system documentation and team knowledge sharing

Here's what you need:

The ideal candidate brings 5-7 years of experience in DevOps, Systems Administration, Site Reliability Engineering (SRE), or similar roles. During this time, you should have developed significant hands-on experience with enterprise infrastructure management and automation.

We're particularly looking for someone who has:

Infrastructure Management: Built or maintained private-cloud infrastructure running CentOS/Rocky Linux, working with a mix of bare-metal servers and virtualization technologies. Experience with server lifecycle management in distributed data centers is crucial: you'll be handling everything from break-fix scenarios to firmware updates on enterprise-grade hardware like Dell and Supermicro systems.

Automation & Orchestration: Developed and maintained automation frameworks for deployment and maintenance pipelines. You should be comfortable using tools like Ansible and GitLab CI/CD to push out code, manage configurations, and build new infrastructure systems. Experience with message queuing systems and workflow automation is valuable.

System Integration: While we primarily operate on-premises, familiarity with public cloud environments (AWS, GCP, Azure) and how they can integrate with on-premises infrastructure is beneficial. Understanding how to bridge these environments effectively demonstrates the kind of systems thinking we value.

Note: We recognize that everyone's path is different. If you've spent meaningful time working with similar technologies or in comparable environments, we'd like to hear about your experience.

Your Approach:

Technical skills are important, but equally valuable is your approach to problem-solving and teamwork. The characteristics that will make you successful in this role go beyond just technical expertise.

Communication: Clear and effective communication within and across teams is essential. While we place a huge premium on technical skill, we value just as much your ability to work with other people.

Curiosity: Things can (and will) break for different reasons; your curiosity will help drive you to identify and fix the things that go wrong.

Alertness: We can never predict when things will go wrong, so it is your job to be vigilant and prepared to respond when they do; you must be ready to reach out, ask questions, and sound the alarm when necessary.

Analytical Thinking: Monitor and analyze activity, and collaborate with other departments to maintain our technical defenses.

Reliability: Prioritize the reliability of our systems, ensuring our exchange customers can trust in our services 24x7. Adhere to operational procedures, best practices, and security protocols.

Continuous Improvement: Embrace a culture of continuous learning and innovation, always seeking ways to enhance our operational efficiency.

Customer-Centricity: Commit to providing the best possible experience for our customers, both internal and external.

Accountability: Take ownership of your responsibilities and hold yourself accountable for the quality of your work.

Why You’ll Love Working Here:

Comprehensive health, dental, and vision plans at no cost to you
Time off and flexible work schedules
Retirement plan with a 5% company match
Stock options and equity packages
Generous parental leave
Monthly wellness stipend plus fitness discounts and quarterly wellness group activities
Community engagement opportunities and donation-matching program
Annual virtual company retreats and regular community-led team events
One day off per year to volunteer
A workplace that supports a diverse, equitable, and inclusive environment

Equal employment opportunity

At Index Exchange, we believe that successful products are built by teams just as diverse as the audience who uses them. As such, we are committed to equal employment opportunities. We celebrate diversity of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity or expression, or veteran status. Additionally, we realize that diversity is deeper than any status or classification—diversity is the human experience. For those who show grit, passion, and humility—Index will welcome you.

Accessibility for applicants with disabilities

Index Exchange is committed to working with and providing access and reasonable accommodations to applicants with disabilities. Please let us know if you’d like to request a reasonable accommodation.

Index Everywhere, Index Anywhere

Our corporate headquarters are in Toronto, with major offices in New York, Montreal, Kitchener, London, San Francisco, and many other global cities. As a major global advertising exchange, we are committed to operating as a tightly-knit global team and embracing and empowering talent wherever our colleagues may be.
