Matthew Liste Discusses the Evolving Landscape of Systems Engineering and Platform Development

Matthew Liste, a seasoned infrastructure engineering expert with over 30 years of experience, recently shared his insights on the intricate world of systems engineering and platform development during an appearance on the Architects Podcast. Liste currently oversees American Express’s data center, resiliency, and multi-cloud strategies. His responsibilities also extend to areas such as digital workspaces, as well as operational oversight of site reliability, application support, and mission control across all business lines. His career includes significant tenures at major financial institutions such as J.P. Morgan Chase, where he managed platform infrastructure including databases, middleware, and identity management, and Goldman Sachs, as well as roles at Throughpoint and Schlumberger.
The Genesis of a Systems Engineer: From Tinkerer to Architect
Liste traces his professional path back to a childhood fascination with technology, ignited by an early encounter with a mainframe computer. "I was always a tinkerer, I guess," Liste shared, recalling a pivotal moment in Norway as a child. "We were living in Norway, and my parents, they’re not Norwegian. We moved there when I was a kid, six years old, and they made a friend who ran the mainframe for the University of Oslo. One day we went over there. I was eight or nine at the time, and he pulled me aside and said, ‘I want to show you something.’ He put me in front of a terminal, put his phone in it, and I played chess. He basically had me play chess against the mainframe. And that was when I said, ‘That magic is something I want to be involved with.’" This early exposure to the power and potential of computing systems laid the groundwork for his future career.
He further elaborated on the formative influence of his father, a carpenter, who instilled in him a builder’s mindset. While his father worked with his hands, Liste found his passion in the digital realm, combining interests in electronics and low-level software. This hands-on approach, characterized by experimentation and a drive to turn ideas into tangible outcomes, became a cornerstone of his engineering philosophy.
Systems Engineering as an Apprenticeship: Learning Through Experience
Liste emphasizes that systems engineering, much like any other craft, is learned through an apprenticeship model. "System engineering is an apprenticeship no different than any other craft," he stated. "And you get good at a craft by learning from others, from making mistakes and gradually understanding what great looks like. But it takes experience, it takes apprenticing, it takes being willing to take risk and learn from the mistakes and work through it."
He recounted a formative experience during a summer job at Schlumberger in his youth. Tasked with soldering complex cables, Liste, then 16, meticulously followed instructions. After two weeks of dedicated effort, he presented his first completed cable, proud of his work. His supervisor, however, inspected the soldering, deemed it "shoddy," and cut the cable in half, instructing him to redo it with greater attention to detail. This seemingly harsh lesson, Liste explained, was invaluable. "These simple things like, ‘Are you really soldering the joints? Do they stick or are they just going to stay around for a couple of days and then break?’ That apprenticing and those lessons are really at least how my career’s been is, through iterating my way through making small mistakes on continuous spaces, hopefully not too big. Taking a lesson from them, also apprenticing, learning from others, and gradually building up this, I hate to say it, but a bit of a gut of what is right and what is wrong, intuition." This emphasis on iterative learning and the development of intuition through practical application is a recurring theme in his discussion.
The Impact of AI on the Future of Apprenticeship and Systems Engineering
The conversation then turned to the growing influence of Artificial Intelligence (AI) on the field. Liste expressed concern about the potential impact of AI on entry-level roles and the traditional apprenticeship model. "I think that’s the most profound… I mean, at least to me right now is the most profound," he remarked. "I was lucky I could apprentice. I could learn to do stupid little things to begin with that gradually became more and more complex things over time."
He posed a critical question: "If you no longer do the stupid stuff because AI is doing that for you, how do you ever learn to do the more complex stuff? Because you have to learn over time." Podcast host Michael Stiefel echoed this concern, noting the acceleration of software development and the "move fast and break things" ethos often prevalent in Silicon Valley. If AI automates foundational coding tasks, it is unclear how emerging engineers will acquire the fundamental skills and intuition needed for more complex system design and problem-solving.
Liste acknowledged the potential for AI to handle repetitive or "boring" tasks, drawing a parallel to how compilers have abstracted away the need for developers to write assembly code. However, he cautioned that over-reliance on such abstractions can disconnect engineers from the underlying mechanics of systems. "The problem comes where the abstraction breaks," Stiefel interjected, illustrating the point with an anecdote about a performance problem caused by a compiler placing an instruction across a page boundary, something that is nearly impossible to diagnose without understanding how the code interacts with the underlying hardware. This highlights the enduring need for engineers who understand how systems actually function, even as higher-level abstractions become more sophisticated.
Building Resilient Platforms: Stability, Security, and Scalability
Liste’s work at American Express involves building platforms that are foundational for other developers. He described these platforms using the "three S’s": stability, security, and scalability, which are non-negotiable in production environments, especially within the financial services sector. "Those three are non-negotiable at all times. They always need to hold true," he emphasized.
He likened a complex system to an organism, where the failure of one part can impact the entire entity. This systemic thinking requires an understanding of not only one’s own component but also its upstream and downstream dependencies. In the unforgiving environment of financial services, mistakes can have significant repercussions. "Meaning you get it wrong, it’s very obvious because you will blow up and there’s very little tolerance for the wrong kind of mistake," he explained. This necessitates a meticulous approach to risk management, where every change is carefully weighed against potential consequences.
Managing Risk and Learning from Systemic Failures
The discussion touched upon the inherent risks in operating complex systems, particularly in high-stakes industries like finance. Liste described his approach as a continuous process of risk management, balancing innovation with the need for robust, reliable platforms. Barry Boehm’s spiral model of software development, which makes risk assessment an explicit step in every iteration, came up as a framework for this kind of continuous risk evaluation, although Liste noted he wasn’t familiar with the model itself.
He elaborated on how risk tolerance varies across different business functions. Highly profitable or critical "golden goose" areas demand a more conservative approach, while other, more competitive sectors might embrace higher levels of risk. Site Reliability Engineering (SRE) plays a crucial role in this, with concepts like error budgets guiding decisions about how much failure is acceptable. By measuring customer journeys, organizations can quantify the impact of system failures and adjust their risk appetite accordingly.
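To make the error-budget idea concrete, here is a minimal Python sketch; it is not taken from the podcast or from American Express practice. It assumes a request-based service-level objective (SLO) for a single customer journey, such as a hypothetical card-authorization journey, plus a simple policy, common in SRE literature, of pausing risky changes once the budget is spent. The function name and the policy flag are illustrative assumptions.

```python
def error_budget_report(slo_target: float, total_events: int, failed_events: int) -> dict:
    """Summarize the error budget for one customer journey over one SLO window.

    slo_target: fraction of events that must succeed, e.g. 0.999 for 99.9%.
    total_events / failed_events: counts observed in the window.
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")

    # The budget is the number of failures the SLO tolerates in this window.
    allowed_failures = (1 - slo_target) * total_events
    remaining = allowed_failures - failed_events
    consumed = failed_events / allowed_failures if allowed_failures else float("inf")

    return {
        "allowed_failures": allowed_failures,
        "remaining": remaining,
        "budget_consumed": consumed,  # 1.0 means the budget is fully spent
        # Hypothetical policy: once the budget is exhausted, freeze risky changes.
        "freeze_risky_changes": remaining <= 0,
    }


if __name__ == "__main__":
    report = error_budget_report(slo_target=0.999, total_events=1_000_000, failed_events=250)
    print(report)
```

With a 99.9% target over one million requests, about 1,000 failures are allowed, so 250 observed failures consume roughly a quarter of the budget. A more conservative "golden goose" journey would simply get a tighter target and therefore a smaller budget.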
The Crucial Feedback Loop: Connecting System Performance to Architecture
A significant challenge in platform engineering, as highlighted by Stiefel, is establishing an effective feedback loop between operational insights (from SREs) and architectural decisions. Liste acknowledged the difficulty in achieving a "perfect feedback loop" but stressed the importance of focusing on customer outcomes. "If you think about ultimately why are we building software, we’re building software to support certain business outcomes, which support our customers," he stated.
By centering conversations around customer impact, organizations can prioritize efforts and direct resources more effectively. A failure that goes unnoticed by customers, for instance, is less critical than one that causes widespread disruption. This customer-centric perspective helps to "tighten that feedback" and ensure that architectural improvements are aligned with business objectives.
Navigating Complexity and the Edge of Chaos
Liste described the inherent complexity of modern systems, where components operate on the "edge of chaos." Using the example of a credit card transaction, he illustrated the intricate network of systems and parties involved in even a seemingly simple process. To manage this complexity, especially in critical customer journeys like payment processing, a high degree of attention is paid to testing, chaos engineering, and scenario planning.
Anticipating scale is another critical aspect, as successful products inevitably lead to increased demand. Liste warned that scaling issues are often the root cause of complex system failures, where a system that functions adequately under normal load can buckle under increased pressure. "It is usually scaling issues that have broken complex systems because something that was working fine over time was getting closer and closer to some threshold," he observed.
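One simple way to watch a system "getting closer and closer to some threshold" is to fit a trend to a capacity metric and project when it will cross a limit. The sketch below assumes evenly spaced samples (for example, weekly peak CPU utilization in percent) and fits an ordinary least-squares line. Real saturation is often nonlinear, so treat this as an illustration of the idea rather than a forecasting method Liste described; the function name is hypothetical.

```python
from typing import Optional, Sequence


def periods_until_threshold(samples: Sequence[float], threshold: float) -> Optional[float]:
    """Estimate how many periods after the last sample a metric reaches `threshold`.

    Fits a least-squares line to evenly spaced samples (x = 0, 1, 2, ...).
    Returns 0.0 if the latest sample is already at or above the threshold,
    and None if the fitted trend is flat or falling.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    if samples[-1] >= threshold:
        return 0.0

    xs = range(n)
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    if slope <= 0:
        return None  # not trending toward the threshold

    intercept = mean_y - slope * mean_x
    crossing_x = (threshold - intercept) / slope
    return crossing_x - (n - 1)  # distance from the most recent sample


if __name__ == "__main__":
    weekly_peak_cpu = [50, 55, 60, 65, 70]  # hypothetical data
    print(periods_until_threshold(weekly_peak_cpu, threshold=90))
```

For samples 50, 55, 60, 65, 70 and a threshold of 90, the fitted slope is 5 per period, so the metric is projected to cross the threshold four periods after the last sample.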
The Trade-off Between Technical Perfection and Customer Experience
The conversation delved into the perpetual balancing act between achieving technical perfection and delivering a seamless customer experience. Liste argued that these two goals are often at odds. For instance, an overly stringent approach to fraud prevention could lead to denying legitimate customer transactions, thereby degrading the customer experience. This necessitates making calculated trade-offs, using heuristics and predictive models to manage risk while minimizing customer friction.
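The fraud example can be illustrated with a decision threshold on a risk score. The sketch below assumes a hypothetical model that scores each transaction between 0 and 1 and a small made-up sample with hindsight labels. It counts legitimate transactions that would be declined and fraudulent ones that would slip through at a given threshold; it does not describe any real fraud system.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Txn:
    risk_score: float  # hypothetical model output in [0, 1]
    is_fraud: bool     # ground-truth label, known only in hindsight


def tradeoff(txns: Iterable[Txn], decline_at: float) -> Tuple[int, int]:
    """Return (false_declines, missed_fraud) when every transaction with
    risk_score >= decline_at is declined."""
    false_declines = missed_fraud = 0
    for t in txns:
        declined = t.risk_score >= decline_at
        if declined and not t.is_fraud:
            false_declines += 1  # a legitimate customer was turned away
        elif not declined and t.is_fraud:
            missed_fraud += 1    # fraud was approved
    return false_declines, missed_fraud


# Made-up sample for illustration only.
sample = [
    Txn(0.05, False), Txn(0.30, False), Txn(0.55, False),
    Txn(0.60, True), Txn(0.85, True), Txn(0.92, False),
]

if __name__ == "__main__":
    for threshold in (0.5, 0.9):
        print(threshold, tradeoff(sample, threshold))
```

On this sample, declining at 0.5 stops all fraud but turns away two legitimate customers, while declining at 0.9 turns away only one legitimate customer but lets two fraudulent transactions through. Choosing the threshold is exactly the kind of calculated trade-off described above.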
He also broadened the concept of resiliency beyond purely technical aspects to include process and people. The ability to call a human representative during a system outage, for example, represents a form of process resiliency that can mitigate the impact of technical failures. The pursuit of extreme levels of uptime (e.g., six nines) can be prohibitively expensive, with diminishing returns for each additional nine. Understanding the acceptable level of risk and cost for a given system is therefore crucial. People accept this kind of trade-off implicitly in everyday life, for example with the reliability of the power grid, yet often expect software to be reliable in absolute terms.
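The diminishing returns of extra nines follow from simple arithmetic: each additional nine cuts the allowed downtime by a factor of ten. The sketch below assumes a 365-day year and computes the downtime budget for a given number of nines; the function and constant names are illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a 365-day year


def allowed_downtime_minutes(nines: int, period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Downtime permitted per period at an availability of `nines` nines.

    nines=3 means 99.9% availability, so 0.1% of the period may be down.
    """
    if nines < 1:
        raise ValueError("nines must be at least 1")
    unavailability = 10.0 ** -nines
    return period_minutes * unavailability


if __name__ == "__main__":
    for n in range(2, 7):
        minutes = allowed_downtime_minutes(n)
        print(f"{n} nines: {minutes:.2f} minutes/year ({minutes * 60:.1f} seconds)")
```

Three nines (99.9%) leaves about 8.76 hours of downtime per year, while six nines leaves about 31.5 seconds, which is why each step up costs so much more.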
Serving Developers as Customers and the Art of Platform Evolution
Liste highlighted that platforms have a secondary, yet critical, customer: the developers who build on top of them. The challenge lies in balancing the immediate needs of developers with the long-term vision and maintenance requirements of the platform. He described this as an art of knowing "when to be too early and when to be too late" in adopting new technologies.
He used Kubernetes as an example, noting his team’s experience with container platforms predating its widespread adoption. The decision to invest in emerging technologies requires conviction and an understanding of their maturity. "What adds the most value to the most developers in the least amount of time that cost me the least to maintain?" he posed as a guiding question.
The commitment to a platform is long-term, akin to adopting a pet. Once developers become reliant on a platform, migrating away can be a significant undertaking. This underscores the importance of careful selection and a strong conviction in the chosen path, acknowledging that resources are finite and every decision to build one thing means not building another.
The Critical Role of Culture in Platform Engineering
The discussion underscored the profound impact of organizational culture on the success of platform engineering. Liste emphasized that fostering a culture of trust, empowerment, and continuous learning is paramount. "The most important job for me at this stage in my career, I lead a big team, is setting the culture because great culture builds great teams, and great teams build great products," he stated.
He advocates for empowering teams to make autonomous decisions within defined "guardrails," accepting that mistakes are inevitable and providing opportunities for growth. This aligns with the apprenticeship model, where smaller, manageable mistakes are learned from, with experienced personnel guiding the process to prevent catastrophic failures.
Conway’s Law, which posits that organizations design systems that mirror their communication structures, also came up implicitly. A dysfunctional organization with siloed teams will likely produce a fragmented and difficult-to-use platform. To counteract this, Liste has implemented a "Developer Zero" team, composed of individuals who consume platforms as any external developer would, providing candid feedback and identifying issues before they impact the broader development community.
The Evolving Landscape with Agentic AI
Looking ahead, Liste addressed the transformative potential of agentic AI. He likens the relationship between engineers and AI agents to that of senior developers overseeing junior developers, with fundamental responsibility for system integrity remaining with the humans providing oversight. "I’m still accountable to make sure this stuff works. And if it’s built by humans or built by agents, it still needs to function," he asserted.
The primary change AI introduces is speed. Agentic systems, with their ability to process vast amounts of data rapidly, promise to accelerate issue detection and resolution. However, this also means the potential for mistakes to be made and propagated at an unprecedented pace. "Assume you have agents writing code and assume that they will spawn mistakes, you also need agents observing the systems that also can go just as fast," Liste advised, framing the dynamic as an ongoing "arms race" between generative and observational AI.
The complexity of data observability also increases, requiring platforms to be scaled to feed APIs and systems at speeds far exceeding human comprehension. This necessitates a continuous focus on scaling, security, and stability – the core tenets of platform engineering – in an environment of rapidly accelerating operations.
The Architect’s Questionnaire: Reflections on Systems Engineering
In a more personal segment, Liste responded to a series of questions designed to elicit deeper insights into his professional philosophy.
- Favorite Part of Systems Engineering: "I love building stuff and seeing it being used, that it manifests itself into something in production. And I always use the saying, running code wins. If it’s running in production, that’s when it’s real to me. And I love that satisfaction."
- Least Favorite Part: He expressed a dislike for the artificial separation between enterprise architecture and building, viewing it as a continuum. He prefers to be seen as a builder who understands design, rather than solely an architect detached from implementation.
- Spiritually or Emotionally Satisfying Aspects: The ability to see abstract thoughts materialize into tangible, functional systems is deeply motivating. He finds profound satisfaction in this creative process, especially when he has a hand in shaping the vision.
- What Turns Him Off: The lack of appreciation for the unseen effort involved in system engineering can be frustrating. "Your job is to be invisible. And if you do a really good job, no one ever knows you exist," he explained. This often leads to a trivialization of complexity and unrealistic expectations from stakeholders who do not fund the necessary robustness and resiliency.
- Favorite Technologies/Moments: He cited the profound experiences of witnessing foundational technological advancements firsthand, such as playing chess on a mainframe or logging into the early ARPANET. These moments, where theoretical possibilities become reality, are a significant source of joy in his career.
- What He Loves About Systems and Engineering: The team aspect and the complex interplay of multiple components, teams, and technologies to achieve a cohesive system outcome is what he finds most engaging.
- What He Hates About Systems Engineering: Beyond funding issues, he dislikes unrealistic expectations regarding speed and ease of completion, which often stem from a misunderstanding of the inherent complexity and the necessity of trade-offs.
- Alternative Profession: He expressed a lingering interest in becoming a building architect, appreciating the blend of art and science, though his lack of drawing skills would have been a barrier in the past. He also envisions spending more time mentoring and teaching.
- Future of His Role: Liste sees himself continuing in systems engineering for as long as he is working, driven by his passion for its complexity, teamwork, and cultural dynamics.
- Ideal Project Completion Feedback: He desires to hear that expectations were met, the delivered product is sustainable and maintainable, and that it was built "in the right way," not just delivered quickly.
The Enduring Value of Platforms
In concluding remarks, Stiefel underscored the fundamental importance of platforms, stating, "Without platforms, we wouldn’t have software." Liste echoed this sentiment, expressing gratitude for the opportunity to discuss his journey and the critical role of platform engineers, architects, and designers. He emphasized the interconnectedness of platforms, where each layer builds upon another, a testament to the collaborative nature of modern technology development. This ongoing evolution and the constant pressure to innovate while maintaining stability, security, and scalability make systems engineering a perpetually challenging and exciting field.




