
Oxide and Friends

Shipping the first Oxide rack: Your questions answered!

Tue Jul 04 2023
Oxide, On-premises IT, Hardware integration, Power optimization, Customer support, Infrastructure management, Upgrades, Independent cloud, Efficiency, Challenges, Building a company

Description

This episode covers the launch of Oxide, a product that gives on-premises IT access to cloud computing. Topics include the benefits of on-premises infrastructure; the challenges of integrating hardware and software; efficiency and power optimization; testing and installing the Oxide rack; improving server management and the customer experience; transparency, trust, customer support, and time to value; efficient interfaces and ownership of infrastructure; Oxide as an independent cloud and its future potential; scalability, software support, live migration, upgrades, and cooling; power budgets and operating system choices, including challenges with Linux and open source licensing; product features, the economic model, and open source; and building the company, including navigating the banking crisis with support from investors and fans.

Insights

Oxide's product is a hypervisor host that allows hosting VMs and switching networks

The launch of Oxide marks the availability of their first commercial system, which is a hypervisor host that enables hosting VMs and switching networks.

On-premises IT infrastructure will remain big even as public cloud grows

Despite the growth of public cloud, on-premises IT infrastructure will continue to be significant due to reasons such as regulatory compliance, security, risk management, latency, and economics.

Hardware and software integration is essential for delivering infrastructure

Building hardware has been technically challenging but necessary for delivering the final product. The team realized that integrating hardware and software was necessary for truly elastic infrastructure.

Efficiency gains can be achieved by eliminating redundant AC power supplies and using a DC bus bar system

Rack-scale design improves power efficiency by using larger fans that move more air at lower speeds. Efficiency gains can also be achieved by eliminating redundant AC power supplies and using a DC bus bar system.

Oxide aims to deliver an integrated hardware and software system like Apple and Sun did in the past

Oxide's goal is to provide an integrated hardware and software system similar to what Apple and Sun did in the past. This integrated approach is crucial for delivering a better customer experience.

Customer support often blames the customer instead of investigating the issue

Many organizations tend to blame the customer for their problems instead of thoroughly investigating the issue. This victim-blaming approach leads to a terrible customer experience.

Oxide provides a solution to help solve customer support problems and deliver a better experience

Because Oxide builds the whole system, it aims to take responsibility for any problem that arises with any component or part of that system, rather than sending customers to another vendor. The goal is a better overall support experience.

The Oxide rack is standalone and does not rely on any cloud services except for fetching updates

The Oxide rack is designed to be standalone and does not rely on any cloud services except for fetching updates. This ensures that customers have full control over their infrastructure.

Oxide is an independent cloud that runs on its own, unlike AWS Outposts

AWS Outposts extends an AWS Region into a customer's site and depends on a connection back to that Region. Oxide, by contrast, is an independent cloud running entirely on the customer's own infrastructure.

The ability to upgrade is viewed as the most important feature of the product

One of the most important features of Oxide's product is the ability to upgrade. This allows customers to easily adapt and scale their infrastructure as their needs evolve.

Overcoming challenges and building the company has been a collective effort

Building Oxide has been a challenging journey, but it has been made possible through the collective effort of everyone involved in the company. Overcoming crises together has reduced anxiety over time.

Chapters

  1. Oxide Launch and Product Overview
  2. Benefits of On-Premises IT Infrastructure
  3. Challenges in Building Hardware and Software Integration
  4. Importance of Hardware and Software Integration
  5. Efficiency and Power Optimization in On-Premises Infrastructure
  6. Testing and Installation of Oxide Rack
  7. Improving Server Management and Customer Experience
  8. Transparency, Trust, and Customer Support
  9. Improving Customer Support and Time to Value
  10. Efficient Interfaces and Ownership of Infrastructure
  11. Oxide as an Independent Cloud and Future Potential
  12. Scalability, Software Support, and Future Plans
  13. Live Migration, Upgrades, and Cooling Solutions
  14. Power Budget and Operating System Considerations
  15. Challenges with Linux and Open Source Licensing
  16. Product Features, Economic Model, and Open Source
  17. Overcoming Challenges and Building the Company
  18. Navigating Banking Crisis and Support from Investors
  19. Support from Fans and Future Plans
Summary

Oxide Launch and Product Overview

00:00 - 07:02

  • Oxide launched on Friday and shipped its first commercial system
  • A package from nuts.com arrived at the office as a gift from a supporter
  • The team is grateful for the support they have received
  • The hardware is complete, but there is still work to be done on the software
  • Some naysayers were proven wrong when Oxide shipped its product
  • The Hacker News thread had both supportive comments and trollish ones
  • The team feels proud of their accomplishment and appreciates everyone's contributions
  • Oxide's product is a hypervisor host that allows hosting VMs and switching networks

Benefits of On-Premises IT Infrastructure

06:38 - 14:07

  • Oxide is a product that gives on-premises IT access to cloud computing
  • Cloud hyperscalers have different hardware and software than on-prem IT
  • There are good reasons to run on-prem, such as regulatory compliance, security, risk management, latency, and economics
  • Once you reach a certain size, it makes sense to own compute instead of renting all the time
  • The cost of owning infrastructure includes operational costs and developer onboarding
  • 95% of IT infrastructure was running outside of public cloud in 2021
  • On-premises IT infrastructure will remain big even as public cloud grows
  • Having local compute is important for regulatory compliance and reducing latency
  • The freedom to choose between renting or owning infrastructure should be available
  • The business value of Oxide may not be immediately apparent, but it offers benefits such as faster access for developers and lower overhead for operators
  • Customers may not notice or care about Oxide's unique hardware, but off-the-shelf setups wouldn't provide the same capabilities

Challenges in Building Hardware and Software Integration

13:42 - 19:34

  • The podcast discusses skepticism towards a new business model in the hardware industry
  • The speaker finds it amusing that critics claim Oxide has no customers when customers are evidently buying the product
  • The team has worked hard on the project, and accusations of side quests are unfounded
  • Building the hardware has been technically challenging but essential for delivering the final product
  • The podcast mentions AWS building its own switch as an example of similar efforts in the industry
  • The team tried to use commodity hardware but hit its limitations, and realized that integrating hardware and software was necessary for truly elastic infrastructure

Importance of Hardware and Software Integration

19:12 - 26:27

  • Hardware and software are both necessary to deliver infrastructure
  • There is no way to split the minimum viable product, it must be the full solution
  • Removing the BMC (baseboard management controller) found in every reference design was challenging, but it allowed deep control of the system
  • The decision-making process for fan speed in some systems is flawed, resulting in wasted power
  • The ability to monitor power draw on fans is a natural capability for this system
  • Up to 20% of the power in data centers is spent on fans and cooling, which is inefficient
  • Companies should question why they are building bespoke private clouds and consider focusing on more efficient on-prem solutions

Efficiency and Power Optimization in On-Premises Infrastructure

26:02 - 33:03

  • Moving workloads to the public cloud makes organizations realize the need for greater efficiency on-premises
  • Building a bespoke on-prem infrastructure becomes harder to justify
  • With multiple hardware and software vendors, problems are hard to resolve because no single vendor takes ownership
  • Rack-scale design improves power efficiency by using larger fans that move more air at lower speeds
  • Efficiency gains are achieved by eliminating redundant AC power supplies and using a DC bus bar system
  • Lab infrastructure will be made available for testing without having to purchase a full rack
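Why larger, slower fans save power can be sketched with the idealized fan affinity laws. These are general fluid-dynamics scaling relations, not figures from the episode, and the derivation assumes geometrically similar fans moving the same volume of air through the same system. Real-world savings also depend on fan efficiency and static pressure, so treat the numbers as an upper-bound illustration only.

```latex
% Idealized fan affinity laws for geometrically similar fans
% (assumption: D = fan diameter, N = rotational speed,
%  \rho = air density, Q = volumetric airflow, P = shaft power)
Q \propto N D^{3}, \qquad P \propto \rho \, N^{3} D^{5}

% Hold the airflow Q fixed and solve for speed: N \propto Q D^{-3}
P \propto \rho \, \left(Q D^{-3}\right)^{3} D^{5} = \rho \, Q^{3} D^{-4}

% Illustration: doubling the fan diameter at equal airflow
\frac{P_{2D}}{P_{D}} = 2^{-4} = \frac{1}{16}
```

The same relations show why fan speed control matters: for a fixed fan, P \propto N^{3}, so running it at half speed ideally draws one eighth of the power.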

Testing and Installation of Oxide Rack

32:36 - 39:55

  • Oxide plans to let people test-drive and get familiar with the system without having to buy a full rack
  • Next week's episode will focus on operational questions about shipping, insurance, logistics, and installation of the Oxide rack
  • The operations team at Oxide has been working hard for three and a half years to ship the rack
  • There will be an in-depth conversation about the engineering of the crate and the intuitive unboxing experience
  • Oxide aims to deliver an integrated hardware and software system like Apple and Sun did in the past
  • Integrated hardware and software is necessary to provide a better customer experience
  • The installation experience of Oxide's rack is described as unbelievably gorgeous and delightful

Improving Server Management and Customer Experience

39:26 - 47:04

  • The new text-based ASCII interface defies expectations and provides a gorgeous experience
  • Technicians and engineers will have a hard time describing the experience to their peers who weren't there
  • The current state of the art in server management involves outdated methods like IPMI and manual configuration screens
  • Google and AWS value operators' time, but many companies lack access to similar tools
  • The market for on-premises solutions will remain large, but conservative customers may be hesitant to adopt unknown vendors and hardware
  • Choosing KVM or ESXi as the hypervisor would have presented different challenges
  • VMware is not popular with its own customers due to frustrations with the company's practices
  • Customers prioritize finding partners they can trust for long-term investments in technology

Transparency, Trust, and Customer Support

46:38 - 53:29

  • Transparency and trust are important for customers in adopting new technologies
  • AMD's superior microprocessor product has won people over in the last three and a half years
  • Customers care about the interface their development teams will operate against
  • Current state of the art requires calling multiple companies when issues arise, leading to frustration
  • Oxide's core differentiator is understanding how the entire system works and taking responsibility for any problem within it
  • Delivering Oxide's value means being able to support any problem that may arise with a component or part of the system
  • Unsupported configurations often lead to blame shifting instead of addressing the actual issue

Improving Customer Support and Time to Value

53:01 - 59:50

  • Customer support often blames the customer for their problems instead of investigating the issue
  • Excuses like using a different version of Ubuntu or not having all the patches are common but do not address the actual problem
  • This victim-blaming approach leads to a terrible customer experience
  • It is important for organizations to provide better support and understand the pain their customers are facing
  • Oxide provides a solution to help solve these problems and deliver a better customer experience
  • The time to value for on-prem solutions can be up to 90 days, but Oxide aims to reduce that to within an hour or two
  • With Oxide, customers can quickly set up infrastructure resources and deploy software in minutes
  • People may find it hard to believe such short timescales if they are used to longer wait times

Efficient Interfaces and Ownership of Infrastructure

59:27 - 1:06:52

  • People have a hard time believing that certain tasks can be completed in a short amount of time because their norm is much longer
  • The team has created fast, efficient interfaces: the web console, the CLI, and the API
  • Oxide emphasizes that the computer belongs to the customer when purchasing their distributed system
  • Oxide provides operator interfaces for managing and maintaining the infrastructure
  • Support contracts are available with flexible timing for updates
  • The Oxide rack is standalone and does not rely on any cloud services except for fetching updates

Oxide as an Independent Cloud and Future Potential

1:06:29 - 1:13:50

  • Oxide is not like AWS Outposts; it is an independent cloud running on its own
  • Large cloud SaaS companies are interested in extending their infrastructure beyond the big three cloud providers
  • There is a need for operational efficiency in classic on-prem enterprises and Cloud SaaS companies
  • Oxide does not currently offer smaller racks for home labs, but they see the potential for small to medium-sized businesses in the future
  • The architecture of Oxide's rack can scale down to a certain point

Scalability, Software Support, and Future Plans

1:13:24 - 1:20:41

  • The rack's 15 kW power draw may not suit all use cases, but the design can be scaled down from a power perspective
  • The software architecture is designed to support different form factors and edge deployment use cases
  • The company is open to feedback and specific use cases from the market
  • Support for AI workloads and GPUs is not currently available due to compatibility issues with proprietary software
  • AMD GPUs are being closely monitored as a potential solution
  • The current product focus is on compute, storage, and networking services
  • Firecracker was considered as a hypervisor alternative but didn't meet the requirements for running arbitrary guests
  • Live migration capability is important for workload flexibility and maximizing resource utilization

Live Migration, Upgrades, and Cooling Solutions

1:20:15 - 1:28:15

  • Live migration is important for maximizing the utilization of a data center or rack
  • The ability to live migrate workloads allows for better lifecycle management
  • Firecracker was not a great option due to limitations in live migration
  • Genoa CPUs and next-gen sockets are being considered for future products
  • The rack design allows for modularity and easy upgrades without ripping out the entire rack
  • The rack has ample networking: 6.4 terabits per second facing the rack, plus additional ports facing the upstream network
  • Oxide is collaborating with Motivair on augmented cooling solutions, but the current architecture does not use liquid cooling
  • Power budget in the data center is a challenge, especially in enterprise DCs compared to hyperscale DCs

Power Budget and Operating System Considerations

1:27:45 - 1:35:01

  • Data centers typically have power budgets per rack, with hyperscalers reaching up to 35 kW or 40 kW
  • The challenge in adopting alternate cooling methods is managing the power budget in the data center
  • Using DTrace and other tools allows for better control and understanding of production systems
  • Linux's bpftrace and eBPF do not solve the specific problems that need to be addressed
  • Choosing Linux as an operating system also means signing up for a specific distribution, which can be burdensome on a team
  • SmartOS values upstream participation and avoiding divergence from the community
  • Small communities like SmartOS have their own challenges but also benefit from shared values and less concern over larger community issues
  • SmartOS has a different model for booting the system, requiring significant surgery

Challenges with Linux and Open Source Licensing

1:34:31 - 1:41:19

  • The Linux community has shown little interest in important Linux work done by Oracle in the past
  • Rust was used in the illumos kernel before Linux, a move that would have been harder with Linux
  • Linus Torvalds made negative comments about ZFS while it was under discussion
  • Finding experienced DTrace users in the Linux community is challenging
  • A former colleague said they missed DTrace and struggled with eBPF
  • The chosen operating system allowed for faster progress and leveraged the team's familiarity
  • Open source licensing was discussed, emphasizing trust issues with Oracle and patent enforcement concerns
  • Oracle's license for DTrace explicitly covers patents related to it

Product Features, Economic Model, and Open Source

1:40:53 - 1:48:18

  • The ability to upgrade is viewed as the most important feature of the product
  • The economic model is open and intended to empower customers
  • The web console is currently closed but there are plans to open it up
  • Software repos are usually open sourced immediately, but sometimes kept closed during development
  • Opening up software allows for easier searchability and increased developer velocity
  • Some networking repos are not open due to concerns around IP from hardware vendors
  • Oxide will soon be building on its own hardware, now that it has recently been able to assemble full racks
  • The hardest part of the journey was when they received funding and realized the magnitude of the task ahead

Overcoming Challenges and Building the Company

1:47:52 - 1:54:33

  • Feeling paralyzed by the amount of work and technical risk at Oxide
  • First employee Robert joining and feeling uncertain about how to build the company
  • Receiving support from mom who reminded them of past successes
  • Comparing Oxide to Fishworks, but without a safety net
  • Realizing that success is not assured in a startup
  • Overcoming the fear and focusing on building the team and moving forward
  • The challenge of starting with a blank sheet of paper and feeling overwhelmed
  • Finding stability as the company grows and more employees join
  • Assuring new employees that the overwhelming feeling will pass
  • Enduring crises together has reduced anxiety over time

Support from Fans and Future Plans

1:54:03 - 2:01:13

  • Credit for the initial raise and for bringing in the first employees goes to the podcast hosts
  • They asked people to fill out a Google form for stickers before the pandemic, and received 500 envelopes full of stickers
  • The support from fans has meant a lot to the hosts
  • Next week's episode will feature the operations team, including Eric Anderson who has insight into manufacturing details
  • CJ will walk them through the engineering of the crate
  • The hosts express their gratitude to everyone in chat