What Rex Computing is doing is a pretty big deal. This start-up is rethinking processor architecture to reduce power consumption and total area, with the ultimate goal of making High Performance Computing (HPC) systems a ubiquitous reality. Even more challenging, it is stepping into the playground of tech giants like Intel.
I had the chance to meet Thomas Sohmers at EmTech Asia 2017 in Singapore. To be honest, I didn’t plan to interview him nor did I do any research on what his start-up was doing before the event.
All that changed after his 15-minute talk. I chased him down at the end of the first day and we arranged the interview for the next day. What caught my attention? A simple and logical value proposition that I couldn’t believe had been overlooked for so long.
“We are redesigning processor architecture with a methodology in mind: if we were alive 30-40 years ago, what would we have done differently with the information and tools we have now?”
A simple, clear and game-changing way of thinking.
Meeting Thomas Sohmers and Paul Sebexen, co-founders of Rex Computing
Thomas Sohmers is the CEO and co-founder of Rex Computing. Unlike most other start-up CEOs, who are in their late twenties, thirties or even forties, he is now 21 years old. He founded Rex Computing in 2013, when he was 17, and has been featured on Forbes’ “30 Under 30” list.
He dropped out of high school and started working at the MIT Institute for Soldier Nanotechnologies when he was 14 years old. He spent three years there as an end user of HPC systems, and later ended up designing and building them at the laboratory.
Thomas met Paul, the other co-founder of Rex, thanks to Peter Thiel’s “20 Under 20” Fellowship, a sweet motivation of $100,000 for young students who want to build new things instead of sitting in a classroom.
Paul is the CTO of Rex Computing. An avid programmer ever since he was a child, he studied Computer Science at the Georgia Institute of Technology. Paul was granted the Thiel Fellowship, during which he founded a synthetic biology startup and worked there for 18 months before joining Thomas to start REX.
A quick view of supercomputing and exascale machines
In this interview we go through a few complex terms, so for people not familiar with this field (like me) I think it’s useful to cover a couple of concepts before getting more in depth.
What is a supercomputer?
This is as intuitive as it sounds. A supercomputer is simply a much faster computer than the one you probably have at home. The speed of these supercomputers is measured in FLOPS (Floating-point Operations Per Second) or, to make it simpler, in calculations per second.
The fastest supercomputer is located in China and has reached 93 PFLOPS (1 PetaFLOP = 10^15 calculations per second, or 1 quadrillion; that is as many calculations in one second as there are seconds in 32,000,000 years).
To get a better sense of how fast the fastest supercomputer is, think about an ordinary computer. The fastest nowadays reach around the tens of GFLOPS (1 GigaFLOP = 10^9 calculations per second, 1 billion), which is still about a million times slower than the fastest supercomputer.
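To put these orders of magnitude side by side, here is a quick back-of-the-envelope check in Python, using the figures quoted above (the 100 GFLOPS desktop estimate is my own generous assumption):

```python
# Back-of-the-envelope comparison of the figures quoted above.
FASTEST_SUPERCOMPUTER = 93e15   # 93 PFLOPS, in calculations per second
ORDINARY_PC = 100e9             # ~100 GFLOPS, a generous desktop estimate

# How many times faster is the supercomputer than a desktop?
ratio = FASTEST_SUPERCOMPUTER / ORDINARY_PC
print(f"{ratio:,.0f}x faster")  # roughly a million

# One PetaFLOP-second of calculations, expressed as years' worth of seconds:
SECONDS_PER_YEAR = 365 * 24 * 3600
years = 1e15 / SECONDS_PER_YEAR
print(f"~{years:,.0f} years of seconds in 10^15")  # ~32 million
```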
Why do we need supercomputers?
Although not common in our daily lives, they have played a central role in the main breakthroughs and developments of recent history. They are used in fields as diverse as quantum mechanics, weather forecasting, oil and gas exploration, molecular modelling and physical simulations of any kind (airplane aerodynamics, nuclear fusion or structural calculations).
The next step: Exascale computing
The next big milestone in supercomputing will be exascale computing: computers of more than 1 ExaFLOP (1 ExaFLOP = 10^18 calculations per second, 1 quintillion), or about 11 times faster than the fastest supercomputer we have today.
Reaching exascale computing would mean computers approaching the processing power of the human brain at the neural level. That sounds intriguing and scary at the same time.
What is stopping us from getting there?
Energy efficiency and heat management. Energy efficiency is measured in “FLOPS per Watt”, the number of calculations we can do with one unit of electrical power (a Watt).
These super-machines consume large amounts of electrical power, almost all of which is converted into heat that requires cooling. Since copper wires transfer energy into the computer with higher power densities than refrigerants can remove, the cooling system tends to be a limiting factor.
Moreover, running a supercomputer is expensive. Tianhe-1A, at 2.57 PetaFLOPS, draws 4.04 megawatts. At 10 cents per kWh, the electricity bill comes to about $3.5 million a year. And we are aiming to build computers almost 400 times faster than this one.
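The $3.5 million figure is easy to verify from the power draw and electricity price quoted above:

```python
# Rough annual electricity bill for Tianhe-1A, from the figures above.
power_mw = 4.04          # megawatts drawn by the machine
price_per_kwh = 0.10     # dollars per kilowatt-hour
hours_per_year = 24 * 365

# megawatts -> kilowatts, times hours in a year, times price per kWh
annual_bill = power_mw * 1000 * hours_per_year * price_per_kwh
print(f"${annual_bill:,.0f} per year")  # ≈ $3.5 million
```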
In this video, Thomas talks about supercomputing, the challenges of the industry and how Rex Computing is trying to address them.
After this slightly technical introduction (I hope I didn’t lose anyone along the way), it’s time to learn more about Rex Computing’s roots.
How did a 17-year-old kid decide to challenge a few tech giants?
“It all started when I was working with HPC (High Performance Computing) systems at the MIT Institute for Soldier Nanotechnologies. I realised the problem of energy efficiency affecting these large-scale systems.
I was just trying to understand how computers had been designed through history to reach the point where we are today. Factors like Moore’s law led to design decisions that weren’t the best ones.
Over the years, features were added to make programmers’ lives easier, with the trade-off of higher energy consumption. Nowadays, with the software tools we have developed, the hardware can be highly simplified and made more energy efficient, given that today it takes more energy to move data through the circuitry of a processor than to compute with it.
When I was at MIT, everyone was talking about exascale machines as the next big milestone. The first estimate was to have the first machine by 2015, then 2018, then 2020, 2022, and now it looks like it will be around 2025. The real barrier to getting there is energy efficiency.
The power limit for supercomputers set by the Department of Energy, which manages the main American machines, is 20 MW (the output of a small nuclear reactor). Given this limit, building an exascale machine means hitting the magic number of 50 GigaFLOPS per Watt in energy efficiency. When I started in 2010, efficiency was at 1 GigaFLOP per Watt, so a 50x improvement was still needed to get there.
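The 50 GigaFLOPS per Watt target follows directly from dividing an ExaFLOP by the 20 MW cap:

```python
# Why a 20 MW power cap implies a 50 GFLOPS/W target for exascale.
EXAFLOP = 1e18        # target speed: calculations per second
POWER_CAP_W = 20e6    # 20 MW cap set for the main US machines

target_flops_per_watt = EXAFLOP / POWER_CAP_W
print(target_flops_per_watt / 1e9, "GFLOPS per Watt")  # 50.0
```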
All these experiences got me more and more interested in computer architecture. Eventually, I moved to the San Francisco Bay Area and started Rex Computing to address the problems that HPC systems are facing.”
The journey to the first chip!
In his talk, Thomas mentioned the difficulties they encountered finding funding, even in Silicon Valley! I wanted to know more about this.
“To be honest, once in Silicon Valley, we thought the money would knock on our door, but then we figured out that most start-ups there are focused on software, not hardware. That made the process more difficult than we expected, and investors thought we had a crazy idea when we pitched them.”
I’m sure it doesn’t feel good when someone tells you that you are crazy, so I asked Thomas what went through his head in those moments.
“The thing is that it was understandable from the investors’ perspective so I didn’t take it personally. With all the information they had, especially if they were looking at previous start-ups that failed to deliver or at what Intel says (to make a chip costs around $100 million), it was reasonable to not invest in us and say we were crazy.
However, I knew examples of chips done with $5 million. With that in mind and considering Paul’s and my skills in the field, we thought we would be able to do something cheaper.
There is this huge barrier in the hardware industry: investors simply write off the chance to invest in hardware start-ups because they have false information. Part of the problem is that in many of the failures between 2005 and 2011, discounting those due to the financial market crash, the founders had previously worked for Intel or another big hardware company.
They launch a start-up and, what surprises me, they suddenly hire a group of 50 people as a first step. They keep expanding headcount very fast because that’s what they are used to doing in a big company. I think the problem with some of these start-ups is that they didn’t have the concept of a lean startup, and instead thought that spending a lot of money on a big office and hiring a bunch of people would do the job.”
The situation wasn’t ideal, so I had to ask: did you think about quitting at some point?
“Yes. Many times. The moments when your account is getting dangerously close to zero and you don’t feel positive about getting funding soon made me think: OK, if we don’t get funding in one month, we have to figure something else out. But I had been thinking that for several months before the funding came. Maybe it was stubbornness or naivety, and probably not the smartest decision since we didn’t have a back-up plan… but fortunately it ended up working.”
What happened in the end?
“It finally took us 10 months to get the funds. Six months into the project we ran out of money, so we survived the next four months doing contracting work. Luckily, we received $2 million in funding before reaching a dramatic moment.
Funding aside, we needed a year and a half to actually put the ideas together and have the fundamental architecture designed (in our heads), from February 2014 to September 2015.
Then it was a few more months designing the prototype, and in July 2016 we produced our first chip, having spent around $1.25 million of the $2 million we had received by that point.
For full production, it would cost us around $2-3 million to get the first 10,000 chips. After that, each additional wafer costs just two or three thousand dollars; making the masks is the biggest cost.
How are chips made? (Link to video)
We have only done prototypes so far, but even if we go to full mass production it wouldn’t cost us more than $5 million, pretty far from the $100 million Intel says it costs to develop a chip. We proved the experts wrong.”
The product: The NEO chip
According to Rex’s website, the main problem in efficiency now is that existing processor architectures were designed at a time when the energy needed to move data was roughly equal to the energy required to do useful computation with that data.
Today, moving 64 bits of information from memory takes over 40 times more energy than the actual double precision floating point operation being performed with that data.
So to make it simpler, we spend way more energy moving data than computing it. What’s the solution then?
As Thomas stated for MIT Technology Review:
“Rex’s chips use less power because they don’t have a block of circuitry that is standard on chips from Intel and other companies. Such circuitry is a wasteful remnant from an earlier age. Those circuits manage the movement of data between memory stores, or caches, built into a chip and the processor core that actually works on the data. They were introduced decades ago to make life easier for programmers, but they have grown large and wasteful.
Our chips use software to manage their memory. Instead of moving data around between fixed caches on the chip, the processors can quickly throw data into a scratchpad and leave the scheduling of that movement to the compiler. That makes it possible to remove the circuitry in charge of moving data, yielding chips that have the same computational power but are smaller and need less power.
The result is that our chip can deliver a minimum of 64 GigaFLOPS per Watt, beating the magic number of 50 GigaFLOPS per Watt.”
It might look like a perfect solution, but there is a downside to Rex Computing’s chips: existing software designed for other platforms has to be modified to work on the NEO chip. Although they are designing the NEO TOOLCHAIN to make this easier, there is still a lot of work to do.
“The first customers are going to need to work closely with us and have their own development teams. That’s the trade-off. Our plan is to first target companies constrained by energy efficiency, buying time while we create ways to make it easier for less urgently motivated companies to switch later.”
The vision of the hardware industry and Rex Computing’s footprint
“Even if we are not successful, at least we have shown that it’s possible to do things in another way.
I strongly think that there are huge opportunities for other start-ups in the hardware industry.
If giants like Intel keep spending $500 million to launch a new product and don’t have the ability to assemble small teams to work on a specific project the way a start-up does, there will be plenty of room for many start-ups.
There are several hardware components people can go after in terms of design: not changing the transistor or trying to do something crazy in actual production, just new designs that do things better than what exists now.”
A piece of advice to finish
“To all entrepreneurs in general… I would say: don’t dismiss an idea just because experienced people say it’s impossible. The vast majority of people we talked to said our idea wouldn’t work and wasn’t technically solid.
‘Work flat out and go for it’ maybe isn’t the best advice, because plenty of people have ideas that are legitimately crazy. I’m coming at it more from the fact that the approach we took was disregarded just because it isn’t the way things have been done for so long, and people dismissed it on that basis alone.
I think, regardless of the industry, if people’s only defence is to say ‘we have always done it this way’, then one, that’s not a valid defence, and two, there is probably a better way to do it. If you have that better way, there is a large opportunity there.”
With this great piece of advice we finished the interview. After concluding it, I’m still amazed by how such a simple approach might lead to such a big impact: let’s do what was done 30-40 years ago with the tools and knowledge we have now. Awesome.
Thanks so much, Thomas and Rex Computing, for this mind-blowing talk, and the best of luck in the future!
That’s all for now! Next week we go back to Laos to keep learning about its peculiar start-up ecosystem. Sign up below if you don’t wanna miss it 😉