
OpenStack Podcast #20: Joe Arnold

Aug, 26, 2024 Hi-network.com
From days of setting up internet in his dorm and almost being kicked out of college for it, Joe Arnold's tech roots are strong, and his passion is contagious. In the latest OpenStack Podcast, join the founder and CEO of SwiftStack as he discusses:

  • OpenStack Swift: What is it?
  • How the Enterprise Storage market is changing
  • How SwiftStack adds value on top of OpenStack
  • Standalone OpenStack Swift for particular use cases and industries
  • New O'Reilly Book, "OpenStack Swift", and how you can get a copy

You can follow the podcast and see the past and future guest schedule at @openstackpod and follow Joe Arnold at @joearnold.

A full transcript of the interview follows.

Jeff Dickey:                All right. We're on.

Niki Acosta:               (Singing) We were talking about playing music to start off the podcast because we used to do that before we got lazy, but we're here and we didn't have music. That's okay because we have an amazing guest with us today. My name is Niki Acosta with Cisco.

Jeff Dickey:                I'm Jeff Dickey with Redapt.

Niki Acosta:               Mr. Arnold, one of the nicest guys in OpenStack for sure without a doubt. Please introduce yourself.

Joe Arnold:                Too nice Niki. Hi, my name is Joe Arnold and I'm one of the co-founders of SwiftStack and what we do is we make OpenStack Swift which, for those who don't know, is an object storage component that is part of OpenStack. That's who I am.

Niki Acosta:               We're going to get into your background story but first we were marveling at the amazing view you have behind you today. Tell us about your view.

Joe Arnold:                Well, we've been crammed together. It's so fun starting a startup because every step along the way and particularly in the early days, you change offices all the time. We went from my basement to a shared work station. When we had a co-working space, it was one desk and we managed to recruit John Dickinson from Rackspace. He and his wife walk in and they look at this co-working space and it's dingy, it's dark and five dudes all stranded around one desk. He's like, "Oh my God. What did I get myself into?" We've upgraded through the years and now we're in a great space in downtown San Francisco. It's beautiful. There's plenty of empty desks for future hires that we're bringing onto the team but it's neat to be at a growth inflection.

Jeff Dickey:                That's awesome.

Niki Acosta:               We'll definitely get into that later for sure. Jeff, you can do the honors of asking the first question here. The first real question.

Jeff Dickey:                All right. Well, so the real question is, are people going to be stopping by your basement in 10 years to take pictures of where SwiftStack started?

Joe Arnold:                Yeah, I don't know. We're hard at work and office is one of those necessary evils and we've been the master of subletting a sublet. I don't know.

Jeff Dickey:                I'm so happy you guys are growing. That's awesome. We like to start off with just who you are and how you got into technology so take us through that journey from young Joe getting into technology and how you got into OpenStack from there.

Joe Arnold:                I've always been pretty geeky. It depends on how far back you want to go. I think when I was a freshman and even in high school, I knew okay, here's the degree I want to get. Here's the University I want to go to. Get a computer science major. Get some business training. When I got to University, that was around the time when web development was happening and I really got into it. The web was just being turned on. I'd gotten myself into trouble a couple of times.

One time, I don't know if you remember, but rewind in your head to when you had to use a dial-up modem. That was back when you did file sharing and there was really no high-speed internet. Well, even in the dorms in my University, they didn't have wired internet. I couldn't believe it and so I'm like, "I'm going to fix this." Wired up the whole dorms in the night when no one was looking. One day, they discovered it and they kicked me out of the dorms. I got put on disciplinary probation but it subsequently landed me a job running some of the labs and IT infrastructure in the computer science department which is pretty cool.

I worked there for a while and then I just got really obsessed about web and web infrastructure and web servers and things like that but I needed to have a project. Our university didn't register their .com address. They registered the .edu version of the address. So I registered it. Put up a website. Had fun with it. They didn't like that. They sued me and tried to kick me out of college but I made a deal with them and they let me stay in the University.

I don't know. I've been going back and forth of trying to build stuff, experiment with stuff through my whole career starting even back then.

Jeff Dickey:                That's good. That's awesome.

Niki Acosta:               We've done this twenty times and that is by far the best one we've ever heard.

Joe Arnold:                When I donated the name back to the University, I did it in such a way that I got a plaque on the donor board in the computer science department. It's HP, Cisco, and some rich people, rich person and then my name as a college kid donating to the University for the stupid domain name.

Jeff Dickey:                That's awesome.

Niki Acosta:               Awesome.

Jeff Dickey:                Yeah. It sounds like you're very unique, just from hearing about that background. There's both a technical and a business side to you. Negotiating and building?

Joe Arnold:                Yeah, you learn a lot about trademark law when you're forced to. From there you do what a lot of folks did around 2000. It was, hey, let's build a web development shop and start working on that. I quickly realized that you got bored of those projects and I knew I had to go to Silicon Valley so I moved down there. Did a number of companies, one of which became a part of Aruba, which is WiFi, and we were doing network management. That's where I met a few of my co-founders for SwiftStack. Then I got an opportunity to work at Yahoo. Spent some time at Bangalore in India. Had my first daughter there and then went to a company called Engine Yard. This is really where I learned a lot about open source and how to build a business around it.

For those who don't know, what we did at Engine Yard was we worked on Ruby on Rails. We worked on Ruby itself. This was around 2006, 2008. 2009, stuff started getting started so think the introduction of the iPhone. Suddenly there's this big push for how do we build and launch these web applications? We basically took it from a, we can help you deploy and manage and operate and scale your Ruby on Rails application. How do you do that? How do you be an open source company, contribute to all these projects while still running a business around it?

We started doing managed hosting. That was one way to do it. Then Amazon, with the web services side, they were just getting started. Around 2008 is the time frame when they did that investment. They said, "Hey guys, come up to Seattle." They put us in a conference room and they basically started parading a bunch of the engineers who were building Amazon and like, "Hey, how can you use this? We're just about to launch EC2." What we did was we built a product like a web service that allowed you to manage your EC2 environment and then the layer that we put over it was, here's how to get a Ruby on Rails application running on that Amazon infrastructure. There wasn't a word for it at the time but it emerged to be called a platform as a service. Heroku was our chief competitor around that time and we just got a whole heck of a lot of experience running and learning how to use cloud computing infrastructure and build and deploy applications on it. That was a really cool experience to have. I was running engineering for them when I was there.

Niki Acosta:               Then OpenStack, how did you make that jump to OpenStack? Whose brilliant idea was it? Was it the kid who almost got kicked out of college twice?

Joe Arnold:                Well, Randy Bias had a lot to do with this. He always falls into the mix here. I got an awesome opportunity to be able to work with Randy Bias. That was right around the time when OpenStack was getting launched and built. I didn't get a chance to go to the very first Austin Summit but I've been to all the other ones since. What Randy was able to do was go, okay, let's get this infrastructure up and running and let's get our hands dirty around this and work with some folks who are deploying it. We went to Korea Telecom and Internap and I was stuck in this, here's this OpenStack project. Go make it work.

What I gravitated towards really early was OpenStack Swift. I just loved the technology. I read everything I could about it. Dove into the source code. I just really got enamored with it. A lot of that had to do with I thought I saw something there that wasn't so obvious to folks where everyone else who first jumped into the OpenStack compute environment really started to take on just Nova and compute and networking and block storage and all those aspects. But the experience I had way before this was, whoa, we had to deploy applications in Amazon. These are big applications. This was Groupon, Seeking Alpha, New Relic, and some of them could go on Amazon, some of them couldn't go on Amazon. We had to understand what the different tools you had in those environments were. One of them was object storage. When we went to go to deploy customers in that environment, you couldn't just say, "Well, here's your distributed file system to use to store documents or profile photos."

What we did was we changed how they store that data by using Amazon S3 and object storage in order to get their applications to work. When Swift just came out, that was a direct response to S3 from Rackspace. Rewind to when OpenStack was launched, it was Nova and Swift. Swift came from Rackspace. That was already a production object storage system that was up and running and serving customers and so there was a semblance, a kernel, if you will, of something that was already hardened to a certain extent in a production environment. That also attracted me.

Once we got it up and running in these environments, now I know okay, applications can be built around this. You can scale really well. Then we started doing operational testing around it. Like literally going out and pulling power plugs out of servers. Niki, I heard you talking about this on the last podcast. That was like a huge proof point for me just because being in an operational seat dealing with storage when there's outages is such a hard thing to deal with, and the fact that the system had the kernel to be able to survive those types of major operational events was pretty cool. That's really what got me sucked into Swift.

After working with Randy for a little bit of time, then I did the leap and started SwiftStack.

Niki Acosta:               For people who don't know what SwiftStack is or maybe for people who've heard of it but haven't really taken the dive, tell us what SwiftStack is and does? Is it a managed offer? Is it all open source?

Joe Arnold:                Yeah, all right. Here's what we do. Number one: We're an object storage company and that means that for those who are building an application or have a lot of data to manage, object storage is a great way to store that because it can scale. It's easy to manage and to operate. Those are some of the key tenets. Low cost because you can run it on standard or volume scale hardware. What we do is we work on OpenStack Swift which is an engine, would probably be the best way to describe this. It's open source. Then what we've done is we've built out deployment. We built out management, automation and we do that via this thing we call a controller. The SwiftStack controller. That is what manages and operates that environment. We'll do things like we go into customer environments.

We can talk about a couple here. Maybe it might be a good way to bridge in this like with Jeff, with Redapt, we're working with Ancestry.com. I think it's good to explain what object storage is. With Ancestry, they're storing and they're serving all sorts of documents and images and scanned records of one type or another. What that means is that there's a lot of simultaneous connections going into that storage environment and they're serving this content directly out to web pages, mobile devices. Object storage is perfect for that. What we do at SwiftStack is we turn that standard hardware into a software defined storage platform that allows them to manage and scale that environment.

Now, there's no OpenStack in the rest of Ancestry yet but they can use the storage system independently of the rest of OpenStack. A distinction here to make is when you go and you want to just consume object storage, consume Swift, you don't necessarily need to plug that in to the rest of a compute environment that's up and running. Take for example, Time Warner Cable which is... they're all in OpenStack. They're doing compute, they're doing networking, they're all in. We're a component which supports that with backup images and archives and snapshots, things like that. But we're also a storage target for the next generation of things like Video on Demand or the time-shifted cloud DVR products that they're building out. Usually we're brought in to solve the storage problem and sometimes, OpenStack is involved in terms of the compute side but sometimes, it's just, hey we have a huge amount of data that we need to store. We have a large volume of users that we need to get up, get storing and serving data from. That's where we get pulled in.

From a product perspective, what we've done is we've taken it away from being support or services. When you go to deploy an open source project, and we experienced this in the early days even back at Engine Yard with Ruby on Rails. It really takes a lot of expertise to set it up, deploy it, scale it. In OpenStack, in infrastructure, it's no different. It takes a lot of specialized knowledge to know how to get the systems up and running, how to integrate with the hardware that it needs to run on, how to manage that, how to tune that and then even once you do all that, you have to still plug in to your existing environment. You have to go, all right, how am I going to monitor this thing? Am I going to write my own SNMP traps when drives fail? Am I going to integrate this into my Active directory and LDAP environment? How am I going to do upgrades?

You have to hit point by point all of these little things that don't necessarily appear at the surface when you first get started with it, which is great. We love people pulling the open source code and getting their hands dirty. We have a book that teaches people how to get it up and running, just the open source bits. It's not something that we discourage but what we found is that when people get serious and they're running it in production, it's something that they want to turn to a company that specializes in that. Has the tools and the software already in place so that they can just plug it in and integrate it and then they can go. That's where a lot of the draw for us comes from.

Niki Acosta:               We partnered with you. Being at Metacloud, now Cisco OpenStack Private Cloud, we're partners with you. We rely on you guys to come in and help us with folks who are really serious about needing object storage, which there are many people who do. One of the questions we have on the Metacloud side, which I'm sure you do too, is "If your public cloud already does this, if Rackspace Cloud Files already does this, if Amazon S3 already does this, then why would I turn to do this in a private environment?"

Joe Arnold:                Great question. There's a couple of things. Cost is a huge reason why people will pull back out of the public cloud or reconsider using the public cloud because when you begin to operate at very large scales, and by very large, I don't mean horrendous scale, I mean a few racks worth of equipment. A few hundred terabytes, a petabyte...which in our world isn't a tremendous amount of storage...it can be really expensive to run on the public cloud. S3 in particular. It's the one that we model it out against. We have spreadsheets, TCO calculations that we help people do because often times they're being challenged from up above. They have a CTO or CIO who's saying, "Well, let's investigate what the public cloud has to offer?" That's a totally valid thing to go out and do. When you actually go pencil the cost out, it can be a lot less expensive to bring the storage and even some of the compute environment on premises. That's one.
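The kind of penciled-out comparison described here can be sketched in a few lines of Python. This is a minimal sketch, not SwiftStack's actual TCO model: the function names, the cost categories, and every number are made-up placeholders chosen only to show the shape of the calculation.

```python
def monthly_cloud_cost(stored_tb, egress_tb, price_per_gb_month, egress_price_per_gb):
    """Monthly public-cloud bill: storage plus data transfer out."""
    gb_per_tb = 1000
    return (stored_tb * gb_per_tb * price_per_gb_month
            + egress_tb * gb_per_tb * egress_price_per_gb)


def monthly_on_prem_cost(hardware_cost, amortization_months, monthly_operations):
    """Monthly on-premises cost: amortized hardware plus power, space, and staff."""
    return hardware_cost / amortization_months + monthly_operations


# All figures below are hypothetical placeholders, not real prices or quotes.
cloud = monthly_cloud_cost(stored_tb=1000, egress_tb=100,
                           price_per_gb_month=0.02, egress_price_per_gb=0.05)
on_prem = monthly_on_prem_cost(hardware_cost=360_000, amortization_months=36,
                               monthly_operations=8_000)
print(f"cloud: ${cloud:,.0f}/month, on-prem: ${on_prem:,.0f}/month")
```

Which side comes out cheaper depends entirely on the inputs, which is why the comparison has to be run per workload rather than assumed.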

The second thing is, sometimes there's some data workflows that really just can't go over public internet or if you did, it would be exorbitantly expensive. Can you imagine you're shooting 4K video which has this crazy bit depth? There's so much data going into the workflow. There's more cameras. If you try to put that over a wire and put that into a public cloud provider, the costs are just game over. The time it takes to upload it is just crazy. You need to develop these workflows on premises just so you can get the speed of the workflow down. We're seeing this across the board from video production to research and new equipment that's just producing data...just a tremendous amount of data out of some of the scientific equipment to even some of the security footage, and storing and retaining and archiving that. That's category number two why people do on premises.
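The upload-time point is simple arithmetic. As a rough sketch, assuming a hypothetical 50 TB shoot, a 1 Gbps uplink, and 80% sustained link efficiency (none of these figures come from the interview):

```python
def upload_hours(data_terabytes, link_gbps, efficiency=0.8):
    """Hours needed to push data over a link at a given sustained efficiency."""
    bits = data_terabytes * 1e12 * 8          # decimal terabytes to bits
    usable_bits_per_second = link_gbps * 1e9 * efficiency
    return bits / usable_bits_per_second / 3600


# Hypothetical shoot: 50 TB of footage over a 1 Gbps uplink.
hours = upload_hours(data_terabytes=50, link_gbps=1)
print(f"{hours:.0f} hours, or about {hours / 24:.1f} days")
```

Under those assumptions the transfer takes the better part of a week, before any cloud-side ingest or transfer fees are counted.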

Number three, there's security and compliance reasons and very, very valid reasons why they don't want to put data in public cloud. Regulated industries, things like health care and finance. Those industries, they still want to be on the cutting edge of building an application. They still want to maintain agility for the developers as they're building out these applications. They have to provide all these tools. They have to give them instances on demand. They have to give them object storage. They have to give them these tools so they can deploy, launch, integrate, build, scale, just as fast as folks in other industries... but they're highly regulated. They have to do everything on premises and they have to be building out these internal clouds for them to use so that they can get to market.

Niki Acosta:               One of the early use cases that came to mind when I was back at Rackspace that I think I was asking John, your CTO, to help me with was a bank that was basically moving to an online banking system and so they were, obviously they scanned your checks or whatever, but they needed to store these checks for 10 years or something. They didn't want to do it in the public cloud. You're not going to put someone's copies of checks in a public cloud. The trust wasn't there and to some extent may still not be there, putting that stuff outside of your four walls. I see you guys have done a lot in the media space which... there's a lot obviously happening with mobile and media but how do you guys play well in that space?

Joe Arnold:                There's really two worlds that need to be solved or two really big pain points. First, it's just the amount of data that's being ingested. You have just a tremendous volume of data that's being created and that needs to be stored. The original copies need to be stored and while it's fine to produce something, put it on a tape and stick it on a shelf, what's really neat is what's happening up on the content distribution side. If you think about when you go to produce and then deploy ... Sorry, I'm using programmer words for this but when they go to deploy the product that they have, it's in a certain language, certain frame rate, built for certain devices. Well, let's say a year later they want to go back. We're going to re-translate this into Japanese or another language. What they can do is they can just pull the original source files out into their work flow editors, make those changes and then re-cut the thing and then re-distribute it. What object storage allows them to do is instead of just keeping a linear version of that whole production they've made, they can actually keep the project in individual files and then just re-hydrate that back into their environment. It allows them to do more, get more value out of that content that they've already produced. That's one.

Then on the backend is around distribution. This is actually a place where object storage really shines because it can be a content delivery machine. Even if it's feeding a content delivery network, it's really good because it's speaking HTTP already. What that means is that you can serve that content out. Let's say it's a long tail content like Ancestry.com, long tail content. Time Warner Cable, you're going to have some popular shows. The popular shows you can feed into whatever the content delivery mechanisms are. You can put caching in there to speed the access but that long tail stuff can be served out of the system directly and because it speaks native web protocols, it just makes building applications around that much easier.

Niki Acosta:               You're making object storage sound so fun. You're just so passionate about this. It's really awesome.

Joe Arnold:                Yeah, I know. We have a workshop around ... Pinterest? Do you know that application? That web application.

Niki Acosta:               It's my demographic. I'm the target demographic for Pinterest.

Joe Arnold:                What we built was something entirely written in Swift and we called it Swinterest as a programming exercise to teach people how to use object storage. It has different users and you can upload photos and you can rewrite, tag them and things like that because there's things like metadata associated with objects. That can be fed into a search index. You can have different users. All of this can be entirely written in Swift because you can store this data, you can store the metadata, you have multiple users in it but it's a fun exercise to show people the power of how these objects ...
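To make the metadata point concrete: in the Swift HTTP API, custom per-object metadata travels as `X-Object-Meta-*` headers on the request that stores or updates the object. The following is a minimal sketch, not the Swinterest code. The endpoint, token, container, and tag names are hypothetical, and the request is only built, never sent.

```python
from urllib.parse import quote
from urllib.request import Request


def build_photo_upload(storage_url, token, container, name, data, tags):
    """Build (but do not send) a PUT request that stores a photo with metadata."""
    url = f"{storage_url}/{quote(container)}/{quote(name)}"
    headers = {"X-Auth-Token": token, "Content-Type": "image/jpeg"}
    # Swift keeps custom per-object metadata sent as X-Object-Meta-* headers.
    for key, value in tags.items():
        headers[f"X-Object-Meta-{key}"] = value
    return Request(url, data=data, headers=headers, method="PUT")


request = build_photo_upload(
    storage_url="https://swift.example.com/v1/AUTH_demo",  # hypothetical endpoint
    token="placeholder-token",                             # hypothetical token
    container="photos",
    name="cats/tabby.jpg",
    data=b"...jpeg bytes...",
    tags={"Owner": "niki", "Tags": "cats,pets"},
)
print(request.get_method(), request.full_url)
```

A separate indexer could read those headers back and feed them into a search index, which is the pattern described above.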

Jeff Dickey:                We've talked about the different use cases and some of the stuff. You guys are involved in some pretty large scale projects around this object storage. What are some lessons that you've learned from deploying SwiftStack at scale? What are some of the things you've learned and obstacles you've overcome?

Joe Arnold:                Hardware matters and what's important is the flexibility of that hardware. Often times people think that you can take something like a storage system and just put any old hardware in it. That's true in a technical sense. John Dickinson, the project technical lead for Swift, he has a blog post up that he did on his personal website where he got Swift running on a Raspberry Pi. That's awesome. You can totally go out and do that. What we found is for people to have a good solid experience, particularly in the enterprise, people want a piece of hardware to consume. They want the software to be installed very easily on that environment and then they want to be up and running and have the ability to support that over time. Hardware's important because you want to know the use case, so if I'm going to go in an archive workload where I want price-per-gigabyte as the overwhelming factor, then okay great, here's how to set this up. Or if people are setting up a situation where they want to serve lots of content, a high-throughput environment, then that configuration's going to be a little bit different. That's one.

Networking is the second thing that is important. You have to understand the data flows because we're not deploying a single rack storage environment. It's multi-rack. Almost every customer is multi-data center for either serving data or remote office or for disaster recovery and so having the understanding of what can go over the WAN link between those environments, the capacity of that WAN link, configuring which networks are used for data transfer versus serving data. That would be the second thing I would make sure that if you're taking on a Swift project that you make sure you spend some time and understand how that's going to be laid out.

Jeff Dickey:                It's interesting you bring up a disaster recovery scenario. I'm not obviously the foremost expert on Swift but does that sort of ability to have DR, where does it happen? Does it happen at the hardware level? Does it happen at the OpenStack level? Does it happen at the application level? Can it happen at all levels?

Joe Arnold:                We tend not to do it at the hardware level. We're in discussions with some more traditional storage vendors. There's a moment. There's like, "Whoa, you guys spend all your time thinking about how to deal with hardware failures. That's all your exceptions. That's what your head space is at." We spend all of our time thinking about how to prevent failures from happening. It's a little bit of a different mindset. What you do is you rely on either replicas or erasure coded parity bits being distributed across a lot of different places and then you leverage that so when there's failure on a piece of equipment or you can't route to a certain location, then you're still okay because you can serve or store data in these places where you can have access to.
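The idea of spreading replicas across failure domains so that a read survives the loss of one of them can be illustrated with a toy placement function. This is a sketch under stated assumptions: it uses a simple rendezvous-style hash, not Swift's actual ring, and the zone names are hypothetical.

```python
import hashlib


def place_replicas(object_name, zones, replicas=3):
    """Pick `replicas` distinct zones for an object, deterministically by hash."""
    ranked = sorted(
        zones,
        key=lambda zone: hashlib.sha256(f"{zone}/{object_name}".encode()).hexdigest(),
    )
    return ranked[:replicas]


def readable_from(placement, failed_zones):
    """Return the zones that can still serve the object after some zones fail."""
    return [zone for zone in placement if zone not in failed_zones]


zones = ["rack-a", "rack-b", "rack-c", "rack-d", "rack-e"]  # hypothetical names
placement = place_replicas("photos/tabby.jpg", zones)
survivors = readable_from(placement, failed_zones={placement[0]})
print(placement, "->", survivors)
```

Because each replica lands in a different zone, pulling the plug on any single zone still leaves two copies to serve from.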

That's just a different mindset on how you build out a storage environment. You think about all those corner cases, not just about a single rack but how does it affect multiple data centers, multiple racks of equipment. For disaster recovery... maybe disaster recovery isn't the right word for it, because what people are really doing is they're putting their infrastructure in two different data centers and using them both. Most of the time, the reason why they want to do this, it starts out with disaster recovery as a reason. What that often turns into is, hey, how can we provide a better experience for our users whether they're internal users or they have an application they've built. What we can do then is we can send users who are on East Coast versus the West Coast to two different data centers and they can get a better experience and better response times. You can service them much more quickly. Yes, when one of those data centers does go dark, then we can just route everything to one of the other data centers and then the application can pick up where it left off.
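The routing behavior described here, nearest data center first and the other one when it goes dark, reduces to a preference list plus a health check. A minimal sketch, with hypothetical region and data center names and no claim about how SwiftStack implements it:

```python
def pick_data_center(user_region, preferences, healthy):
    """Return the first healthy data center in the user's preference order."""
    for data_center in preferences[user_region]:
        if data_center in healthy:
            return data_center
    raise RuntimeError("no healthy data center available")


# Hypothetical regions and data center names.
preferences = {
    "east": ["nyc", "sfo"],
    "west": ["sfo", "nyc"],
}

print(pick_data_center("east", preferences, healthy={"nyc", "sfo"}))  # nyc
print(pick_data_center("east", preferences, healthy={"sfo"}))         # sfo
```

The point made next in the interview is that this decision lives in the infrastructure layer, so application code never has to carry the failover logic itself.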

There's another thing too about the difference between object storage and how you build applications. That infrastructure layer is dealing with the failures. It's not like you have to build into your application, oh now, I need to go over to this data center. You don't have to put that burden on the developers. The infrastructure team can take that on when they're using object storage using Swift. Then that infrastructure can deal with it and the application doesn't necessarily need to. It does have to know how to use object storage instead of a file but the trade-offs are just much better for application developers.

Niki Acosta:               I think you're onto something there for sure. A lot of what we're hearing especially in regards to platform as a service is at the end of the day, people are just trying to make things easier for developers and I'm sure you see when you have this discussion especially in the enterprise, you probably see all the light bulbs going off, like, "Whoa, we can do all that stuff? We can use two data centers at the same time?"

Joe Arnold:                It's funny too because the operators, they love it. They just get excited about the direction and the road map and the future of the next generation of things that are coming out. Then they go and stand it up and they have to go, all right, what can I do with it? There's usually two phases. One is just taking on operator workloads. Things like back-ups and archives and snapshots. I can't tell you how many petabytes of just database back-ups we're storing or virtual machine back-ups that we're storing. It's a ton. The reason is because the operators can go, all right, I'm buying into this and they get it up and running. They get it deployed and then they just start unloading all of that type of storage into this environment. That's the first step. The next step is, okay, let's evangelize this technology. Let's start getting it worked into the workflow of what's already there.

That's actually the tricky bit because you have developers who can come on and they love it but you might not have existing applications that support the object API. They bought into this direction. They want everything object in the future but they have this gap in between where not all the applications speak object. One of the things that we've built as part of the product that we license is a file system gateway. What the gateway does is it mediates files to objects

