Samuel explains the difference between concurrency and parallelism, the dangers of writing multithreaded code, how languages like Node, Go, and Erlang safely handle parallelism, and his efforts to improve the Ruby concurrency ecosystem.
Samuel is a member of the Ruby core team. He's working on making it safer and easier to write concurrent applications in Ruby.
- Asynchronous Ruby
- Fibers Are the Right Solution
- Early Hints and HTTP/2 Push with Falcon
- 2019 Ruby Association Grant
- Source with comments on why the Global VM Lock exists
- 0:51 - What's concurrency? What's parallelism?
- 5:49 - Piping commands between Unix processes provides easy parallelism
- 6:58 - Some types of applications abstract out threads/processes
- 9:27 - Many Ruby gems have thread safety issues
- 10:44 - The CRuby Global VM Lock hides thread safety issues
- 11:24 - The problems with threads and shared mutable state
- 13:58 - Examples of mutexes causing problems in Ruby gems
- 19:09 - What a deadlock is and how it causes problems
- 19:51 - Running separate processes to get parallelism and using an external database to communicate
- 21:01 - Lightweight process model used by Go and Erlang vs threads in Ruby
- 23:50 - Why async was created
- 24:38 - What is Celluloid? (Actor based concurrency for Ruby)
- 26:29 - Problems with shared global state in Celluloid
- 27:12 - Lifecycle management problems (getting and cleaning up objects)
- 28:19 - Maintaining Celluloid IO, issues with the library
- 29:43 - What's async?
- 32:00 - What's an event loop?
- 35:20 - How tasks execute in an event loop
- 37:29 - How IO tasks are scheduled with IO.select
- 39:41 - The importance of predictable and sequential code
- 41:48 - Comparing async library to async/await
- 45:23 - What Node got right with its worker model
- 47:10 - How async/await works
- 48:35 - Fibers as an alternative to callbacks
- 51:10 - How async uses fibers, minimizes need to change code
- 56:19 - Libraries don't have to know they're using async
- 64:55 - Reasons for the CRuby Global VM Lock
- 67:13 - Guilds as a solution
- 69:14 - Sharing state across threads
- 71:33 - Limitations of Ruby GC at 10-20K connections
- 72:00 - Sharing state across processes
- 73:12 - Handling CPU bound tasks with threads and processes
- 77:42 - Which dependencies are messing with state? (Check memory allocations, sockets closed)
- 85:00 - Async in production
- 87:17 - Wrap up
Samuel Thank you. Yeah, it's really great to be here.
Jeremy So the first thing I want to kind of break down for our audience: a lot of these topics are related to performance, but let's get the ground terms defined first.
So to start what is concurrency?
Samuel So I guess there are lots of different ways of defining this terminology, but it is helpful to have a shared understanding which is consistent. Otherwise, you can say one thing and someone understands something completely different. So I like to use an umbrella term, asynchronicity, which means sort of "without time".
And under the umbrella we have concurrency and parallelism. They are similar terms, and you can argue that they mean the same thing, I guess, but I tend to think of concurrency as interleaving work. So if you have a job and you execute that, and you have another job which you execute, and you interleave those units of work on the same processor, then that's a form of concurrency, because you're essentially sharing the one processor between multiple tasks.
If you look at parallelism, it's when you have two tasks which are running simultaneously, like on two different processors. So those are two different concepts: one is sharing a single unit of hardware between multiple tasks, and one is running multiple tasks on multiple independent processors.
Jeremy And so essentially, when you have parallelism, you're saying that you have concurrency, because concurrency is working on multiple tasks at the same time, but not necessarily executing at the same time. Is that correct?
Samuel I like to be a bit more specific with the separation of those two terms just because I think it's helpful when looking at actual programming models.
So for me, I like to separate them out. I like to think of parallelism as strictly when you have multiple hardware units and multiple jobs running on those hardware units independently, whereas concurrency is strictly running multiple jobs on a single hardware unit. Now, of course, you can sort of mix and match them. For example, any job that can run concurrently could also be considered to be running in parallel, if you consider one processor to be parallel, but it's kind of a misnomer. You think, okay, well, parallelism means at least a minimum of two things. You can't have a line that is parallel without some other line for it to be parallel to.
It's a relationship between two things, I guess, in that case. So yeah, I like to think of those things strictly as being different, just because I think it's easier to talk about the different models. If you decide to blur those lines, then it's more difficult to talk about concurrency and parallelism and how they impact models of programming and models of computation.
Jeremy And so basically you prefer to keep them completely separate. So let's say you have a single-core CPU. It can only be doing one thing at a time, so at most that could be concurrency. But as soon as you introduce, say, a dual-core CPU, you would say that my program or my tasks are executing in parallel, and not really use the term concurrency.
Samuel Yeah, I mean ultimately when you have two cores then you can run tasks on those cores in a concurrent fashion, but you can also run two jobs completely independently and they run in parallel. So I guess the value of it is that when you talk about parallelism you start needing to think about synchronization primitives across multiple cores, whereas with concurrency you are limited to the issues that arise only when interleaving jobs so the kind of the nature of the synchronization primitives needed is quite different.
If you think about normal computer programs that run in sequence, you're essentially interleaving one statement after another. I mean, of course it's a little bit confusing to think about it like that. But in that sense the dependencies just follow naturally from the sequential processing.
If you tried to run every statement, like every line of a computer program, on a different CPU core, you would need to synchronize each individual line. Thinking about that, if you look at how concurrency and parallelism affect the models of computation, I think that it's more useful to think about concurrency as just interleaving jobs on a single piece of hardware, versus parallelism, which introduces a whole level of additional complexity with regards to synchronization.
So it's like a juggler. You have a juggler and he's juggling 10 balls: that's concurrency. And you have 10 jugglers each juggling one ball: that's parallelism. You can take any real-world situation where one person is doing multiple things, and if you multiply it by the number of people, it's parallelism. There's no reason why you can't have 10 people juggling ten balls each, and then you have a mixed model. But I think it's useful to have terms that explicitly refer to one person juggling 10 balls and 10 people juggling one ball. To me, it's useful to have that separation. Otherwise we just have this huge ambiguity when we're talking about it.
Jeremy Got it. So when let's say someone is first learning about how to improve the performance of their application. It sounds like you would suggest they first start with concurrency because then they don't have to worry about the synchronization and that sort of thing.
Samuel That is quite an interesting question, in the sense that if you've got a program and you want to improve the performance, it depends on the program.
Some programs naturally lend themselves to being broken into separate pieces. For example, when you run commands in a Unix system and you pipe the output from one command to the input of another command, those commands are interfacing only through the pipe that communicates from one process to the next, so those two processes can run completely in parallel, and you get that essentially for free. You don't need to think about it.
Whereas when someone thinks about a program, they don't necessarily know how to scale it up. You don't want to impose on them all the overhead of saying, well, you could have multiple threads, and then you need to load your configuration and then split your work into multiple separate pieces, make a thread for each piece of work, and then at the end combine it all back together, like a map-reduce approach. So I think ultimately parallelism can sometimes be super easy for people to use and sometimes it becomes super complicated. It just depends on the kind of problem you're trying to solve.
And in the same vein, I guess concurrency is the same kind of thing, because it really depends on the kind of problem you're trying to solve. In the case of async, for example, it is feasible to take a program which is largely sequential and improve the scalability of it when you are dealing with a certain kind of use case, like a web server which is processing independent requests. If you have a web server that's processing independent requests, those requests can run independently of each other.
So whether you use threads or parallelism or concurrency to improve the performance is largely hidden from the user, so that they again don't need to know so much about which approach is being used to improve the performance of their code. But sometimes you get to the point where you have a program where these issues do become intertwined with the logic of the code. In that situation you do need to be aware of how parallelism is affecting your code, whether you're doing locking correctly, or, if you're using a more concurrent style of approach, how the event-driven system works and whether you're using callbacks or async/await or some other approach.
So like, you know, it really depends at some level you can write programs and you can utilize hardware resources more efficiently without necessarily any cognitive overhead in terms of how you're writing your code. And in other situations your code is going to be intimately aware of like how it's using those resources and that's probably the more tricky situation irrespective of whether you use parallelism or concurrency.
Jeremy So I guess in a lot of cases the developer can rely on components that have already been built. For example, you wrote the async library, and people who are using a framework like Rails are built on top of a web server like Puma, which already uses threads. So in a lot of cases the developer doesn't necessarily have to worry about how the concurrency or the parallelism is being accomplished, because that part has been packaged up so that the developer doesn't work with it directly.
Samuel Absolutely. So in my mind, when I'm looking at how to solve these issues, I'm thinking: what kind of concerns should the developer have in their mind when they're writing this code? Do they want to think about how the code is being scaled up, or do they just want to get on with writing their code in the most logical way possible? And I think that is the key difference between parallelism and concurrency with respect to programming models, because ultimately parallelism introduces a lot of potential issues that people may not even be aware of when they're writing code. And that is going to become more of a problem with Ruby as we embrace things like JRuby and TruffleRuby, where we have actual threads.
One of the things I've been looking at recently: I grabbed a copy of the RubyGems database and I looked at the top ten thousand gems by download count. Then I analyzed the source code and looked at which gems have the keywords mutex, thread or synchronize in the code somewhere. Out of that list, I went through them and looked at the various usages of mutex, thread and synchronize primitives.
It was not a pretty picture. Of the ten or so gems that I looked at, I found probably about half of them have threading issues which are relatively trivial to encounter in typical situations. So I guess the concern there is, with systems like Puma where you have multiple threads serving requests, are they safe?
CRuby, which is the common interpreter used by most people these days, has something called the global VM lock, and that prevents multiple lines of Ruby executing simultaneously, in parallel. So you're restricted to a form of concurrency with CRuby, even though it appears to have multiple threads.
Because of that, developers who write code can have threading issues but not realize it, or it's not apparent that there are issues, because the code appears to work. But then you run that code on JRuby or TruffleRuby, or basically any implementation of Ruby that has real threads, and the whole thing just falls to pieces.
So that is the problem, really: parallelism introduces too many concerns without good enough isolation, and by isolation I mean avoiding shared global state. It's just very, very tricky for developers to build systems that are actually robust and reliable.
Jeremy And this shared global state you're referring to: you were talking about mutexes and things like that. Those are locks that attempt to prevent different threads from touching the same global state at the same time?
Samuel Yes. So I guess what I can do is briefly explain what a mutex is for and the kind of situations I saw it being used in, in the code that I looked at. In a codebase where you have, say, shared mutable state, what I mean by that is you might have an operation, and the operation is expensive.
And so you want to cache the results of the operation so that subsequent parts of your program that might want to do the same operation don't have to recompute it. An example might be looking at a remote web service, pulling down some data and then caching it. So what you would do is have a hash table and compute a hash key.
It could be the URL of the request plus maybe the parameters that you're posting, and the value would be the result of that request. For example, it may be that the request takes 30 seconds or something, who knows, it's a slow request. So you build this cache and then you deploy that using Puma, and now you have maybe eight threads sharing that global cache.
So the problem is when two threads interleave their operations. What that means is you have two threads making the same request. They both check to see if the key is in the cache. It's not, so they both make the same request, and then they come back and they both try to write the results into the cache.
That operation is not thread safe, and so it can cause the cache to become irreparably damaged. Internally, the data structure in the computer can become damaged, and that could cause the program to crash. The idea is that, for those threads, the operation should be mutually exclusive. Mutex is just short for mutually exclusive, and what that means is that in your operation, where you're fetching the remote resource and putting it into the cache, you put a mutex around it.
So only one thread can enter that part of the program at a time. That prevents these kinds of collisions from occurring in your program and causing unexpected side effects or crashes or whatever other disasters can happen. So ultimately, with shared global state, you put a mutex around it, and what I saw in the code that I analyzed was mostly a very poor implementation of this kind of shared global state.
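A minimal sketch of the kind of cache described above, assuming a hypothetical `ResponseCache` class and a placeholder URL (neither comes from a real gem). The check and the write happen inside one `Mutex#synchronize` block, so two threads cannot both miss the cache and both write to it.

```ruby
# Hypothetical cache for illustration only.
class ResponseCache
  def initialize
    @entries = {}
    @lock = Mutex.new
  end

  # Returns the cached value for key, computing it with the given
  # block only if it is missing. The lookup and the write happen
  # under the same lock, so they cannot interleave across threads.
  def fetch(key)
    @lock.synchronize do
      @entries.fetch(key) { @entries[key] = yield }
    end
  end
end

cache = ResponseCache.new
calls = 0

threads = 8.times.map do
  Thread.new do
    cache.fetch("https://example.com/report?page=1") do
      calls += 1        # only ever runs while the lock is held
      sleep 0.01        # stand-in for the slow remote request
      "response body"
    end
  end
end

results = threads.map(&:value)
```

The trade-off in this sketch is that the lock is held for the whole slow operation, so every other caller waits, including callers asking for a different key.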
There are other reasons why it's bad, and we can talk about them in a bit. But I guess I'll give a specific example. One of the ones I was looking at recently was Nokogiri, which is a very popular gem for parsing XML. When you do a CSS query on a document, that gem maps it into an XPath selector.
And then the XPath selector is cached. Your CSS selector string is the hash key and the XPath is the value that's computed, because that process is a little bit slow: it has to parse the CSS, turn it into some AST and then turn it into an XPath. So it's a little bit slow and they want it to be fast.
They have this operation, a block that says "without cache". Inside that they basically switch caching off, then they do the operation, and then they switch caching back on. When caching is switched off, the code path that goes to the mutex is disabled. The problem is if two threads call that without-cache function and they interleave the switching off and switching on. The first thread switches it off, then the second thread comes and goes, "oh, it's switched off, so I'm fine."
And then the first thread switches it back on, and the second thread goes, "well, it was switched off when I started, so I'm going to switch it off now that I'm done." So you start off with the cache enabled, these two threads interact, and then the cache is just permanently disabled.
And so you just end up with situations, you know variations on these kind of race conditions and I have to say like it's a little bit shocking to see how many gems have these kinds of issues and I guess they're relatively trivial to fix but the reality is they're out there and they're out in production code right now.
So that's the problem.
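The interleaving described above can be reproduced deterministically. This is a sketch under stated assumptions: `SelectorCache` is a hypothetical stand-in, not Nokogiri's code, and the two queues exist only to force the unlucky ordering that would otherwise happen by chance.

```ruby
# Hypothetical class: saves and restores a shared flag without any lock.
class SelectorCache
  attr_reader :enabled

  def initialize
    @enabled = true
  end

  def without_cache
    previous = @enabled   # remember the state seen on entry
    @enabled = false
    yield
  ensure
    @enabled = previous   # restore whatever was seen on entry
  end
end

cache = SelectorCache.new
first_inside = Queue.new
second_inside = Queue.new

first = Thread.new do
  cache.without_cache do
    first_inside << true   # let the second thread start
    second_inside.pop      # wait until it has read the flag
  end                      # restores true
end

second = Thread.new do
  first_inside.pop
  cache.without_cache do   # sees false on entry
    second_inside << true
    first.join             # first thread finishes and restores true
  end                      # restores false: cache is now stuck off
end

[first, second].each(&:join)
final_state = cache.enabled
```

After both threads finish, `final_state` is `false` even though the cache started out enabled, which is the "permanently disabled" outcome from the conversation.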
Jeremy And is the reason why this hasn't been an issue so far specifically the global VM lock you mentioned before? Even though these aren't thread safe, because only one thing can execute at a time, we're not getting a segfault when these two things are trying to access the same thing.
Samuel I've only tested about five different gems, and of those five gems I was not able to get a segfault yet, although I did get some pretty unusual results in JRuby. Coming back to your question: yes, the main reason why a lot of these issues aren't so obvious is because CRuby does impose a form of concurrency, or mutual exclusion, on Ruby code.
That said, there are definitely situations where I've seen it break in CRuby as well. I was recently looking at another one.
So I was looking at Faraday. They have a mutex, which is good, and they use it to lock around setting up their connection structures, because Faraday is supposed to be thread safe. You can use it on any thread.
Jeremy And Faraday is an HTTP client.
Samuel That's correct. Yes, I should have mentioned that. What they have is an instance variable called @middleware_mutex, and they write @middleware_mutex ||= begin require 'monitor'; Monitor.new end. Monitor is a form of re-entrant mutex.
It means that if you lock it once, you can lock it again in the same thread and it won't block. And because they lazily initialize the mutex, if you hit that function with, say, 10 threads, you either have one thread which creates the mutex and successfully lazily initializes the instance variable, or, if you're really unlucky, all 10 threads lazily initialize their own mutexes and you have ten separate mutexes. It's pretty bad.
So that sort of code flow can occur on CRuby as well as JRuby and TruffleRuby. You're more likely to see it on TruffleRuby and JRuby, just because they don't have the GVL, so they don't have any implicit form of mutual exclusion on Ruby code. So I guess ultimately, if you're using Faraday and you're expecting stuff like this to work correctly, that one-in-a-million time that your app crashes and you don't know why, maybe it's there. Who knows? These issues are just super insidious. They cause problems that are very, very difficult to diagnose. So parallelism, to try and put a summary on that, is just something which introduces so many issues, and we're really just scratching the surface with regards to the kinds of problems.
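A hedged sketch of the lazy-initialization problem and one common way around it. `LazyClient` and `EagerClient` are hypothetical classes written for illustration, not Faraday's code. The lazy version can hand different threads different locks. The eager version creates the `Monitor` in the constructor, before any other thread can see the object.

```ruby
require 'monitor'

# Racy: two threads can both observe @lock as nil and each create
# their own Monitor, so they end up "protecting" nothing.
class LazyClient
  def lock
    @lock ||= Monitor.new
  end
end

# Safer: the lock exists before the object is shared between threads.
class EagerClient
  attr_reader :connections

  def initialize
    @lock = Monitor.new
    @connections = 0
  end

  def connect
    @lock.synchronize do
      # Monitor is re-entrant, so locking again on the same thread is fine.
      @lock.synchronize { @connections += 1 }
    end
  end
end

client = EagerClient.new
threads = 10.times.map { Thread.new { 100.times { client.connect } } }
threads.each(&:join)
total = client.connections
```

Every increment happens under the one shared lock, so `total` ends up at 1000.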
Jeremy And so in all these cases the problem kind of stemmed from needing to use a mutex, and in a lot of these cases I'm guessing that there is some kind of shared collection.
Samuel That's correct. Shared global state is basically the key that holds together all of these problems. So yeah, I think ultimately what it comes down to is, if you have a gem or a library that has shared global state, you need to be incredibly careful with how that works in a multi-threaded environment, because the chance of it being wrong is probably a lot higher than the chance of it being correct.
Just based on my experience.
Jeremy Because it's so easy to miss something or make a mistake.
Samuel You just have to miss one thing and then something disastrous can happen. It doesn't take a lot, and I think that's the problem with parallelism. Ultimately any kind of shared mutable state can potentially introduce these issues. And actually it's not even just that: the interaction between different parts of your code is very difficult to reason about. There was one fragment of code in the AWS gem where I was trying to figure out if I could get it to deadlock. Code that takes a lock, something like mutex.synchronize, and then yields back to user code can deadlock. If you don't have a re-entrant mutex, so you're not using a monitor, then what can happen is you lock your mutex and then you call the user code, and the user code essentially tries to go back into the code that has the mutex and tries to lock it again.
And as soon as you do that, you have instant deadlock. The program will simply stop working. Fortunately, I think in Ruby it will just crash, it won't just hang. But there are lots of different ways parallelism can create problems, and I think if you look at how people are using Ruby in the real world, you start to realize that things like Unicorn, where you have a single process, are not such a bad idea. Sometimes using multiple threads in Ruby is probably more hassle than it's worth.
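A small sketch of the re-entrancy problem just described, assuming a hypothetical `Registry` class (not code from the AWS gem). It holds its lock while yielding to user code. With a plain `Mutex`, calling back into the registry from that block makes Ruby raise `ThreadError` rather than hang. With a `Monitor`, the same call succeeds because the lock is re-entrant.

```ruby
require 'monitor'

# Hypothetical collection that yields to user code while locked.
class Registry
  def initialize(lock)
    @lock = lock
    @items = []
  end

  def add(item)
    @lock.synchronize { @items << item }
  end

  def size
    @lock.synchronize { @items.size }
  end

  # User code runs inside the critical section.
  def each
    @lock.synchronize { @items.each { |item| yield item } }
  end
end

plain = Registry.new(Mutex.new)
plain.add(:a)

outcome = nil
begin
  # The block re-enters the registry and tries to take the same Mutex.
  plain.each { |_item| plain.size }
rescue ThreadError => error
  outcome = error.class
end

reentrant = Registry.new(Monitor.new)
reentrant.add(:a)
sizes = []
reentrant.each { |_item| sizes << reentrant.size }
```

Here `outcome` ends up as `ThreadError` for the plain mutex, while the monitor-backed registry completes and `sizes` is `[1]`.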
Jeremy And in the case of Unicorn they're spawning multiple processes, right? And these processes don't have any kind of shared state. Is that why they don't run into the same problem?
Samuel Yes. So I guess with Unicorn, the way that web server works, and feel free to correct me if I'm wrong, is that it loads your application and then it forks a number of child processes, and those processes essentially just handle one request at a time. The value of that, I guess, is that you don't have any parallelism issues. There's no chance of two threads accessing the same shared state, because every process has its own copy of memory.
Jeremy And so in the case that you need to share some kind of state amongst your different processes, it would have to be through a separate application, like a database or something like Redis, that sort of thing?
Samuel Redis, a database, even a Unix pipe would be okay. So I guess Go is a language which tries to solve this problem by forcing, or by encouraging, people to use channels. A channel is a sort of communication link between two goroutines where you can send objects and information between them, but it's not shared mutable state, because basically when the object goes in one end of the channel, it gets serialized and comes out the other end, and so the two processes can't clobber each other's data. Erlang uses the same approach, actually: Erlang has a way of communicating between lightweight processes in a similar fashion, to avoid any kind of issue with shared global state. And I think if you look at where Ruby's going, maybe guilds, or whatever they end up being called, it's the same kind of model. It's trying to avoid having any chance of shared mutable state, just because experience shows us it's impossible to deal with in general.
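As a rough illustration of "every process has its own copy of memory" and of communicating over a pipe instead of sharing state, here is a sketch using only `fork`, `IO.pipe` and JSON from Ruby's standard library. The variable names and the message shape are made up for the example, and `fork` is assumed to be available (it is not on Windows).

```ruby
require 'json'

counter = 0
reader, writer = IO.pipe

pid = fork do
  # Child process: it works on its own copy of `counter`.
  reader.close
  counter += 1
  result = { "child_counter" => counter, "sum" => (1..100).sum }
  writer.puts(JSON.generate(result))  # serialize the message into the pipe
  writer.close
end

# Parent process: it only sees what the child chose to send.
writer.close
message = JSON.parse(reader.gets)
reader.close
Process.wait(pid)

parent_counter = counter
```

The child's increment never reaches the parent: `message["child_counter"]` is 1 while `parent_counter` is still 0. The only thing that crosses the process boundary is the serialized message.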
Jeremy So basically they're sort of sidestepping the problem of sharing mutable state by sending these messages. I guess they're not operating system processes, they're more like lightweight runtime processes, right?
Samuel That's correct.
Jeremy So then all these processes would have to I guess it would have to be very chatty, you know, they would all have to kind of send messages to one another to duplicate the.
Samuel I mean, there are definitely overheads with it. If you have a specific kind of problem, and you can isolate the problem and the runtime of the problem to a specific set of circumstances, then using threads can be a great solution and can scale very well. But for general-purpose code, where you have people who aren't necessarily aware of the issues, you don't necessarily want to expose them to all that pain and frustration.
So yeah, ultimately I think with the lightweight process model you get most of the benefits of scalability with very few of the pain points. And that's not to say you can't achieve that otherwise. Obviously with Go and Erlang you have specific semantics and syntax in the language to support that. But using a message bus or Redis or a database for communicating between processes is a totally fine way to do it. It might not be as efficient as something which is built directly into a language, where the language and the runtime are designed around that form of concurrency and parallelism, but it's not a big deal. You can do that.
Jeremy So I guess in the case of Go and Erlang they have a runtime that's very much built around this concept of lightweight processes and message passing, and with something like Ruby, where that's not built into the runtime, you're kind of saying, well, maybe instead of having the runtime manage processes you would just spawn operating system processes and talk between them with a message bus or something else like that.
Samuel Yeah, absolutely. Well, I think when you look at how Ruby is positioned right now, obviously threads are the predominant model for achieving scalability. Whether that is a good idea or not, I think we can show that it's not actually that great. It might be okay in practice if you can avoid the pitfalls. But what my analysis showed me is that a lot of these gems that people are using, that are super popular, potentially have thread safety issues. And I've only looked at 10 of them. What about the other 10,000? It's quite concerning. So I think a different approach is needed for Ruby, and I think that's what brought me to async, when I thought about how we solve these problems. The reason why I built async was because I got frustrated with Celluloid. Don't get me wrong, I really like the ideas behind Celluloid, but Celluloid was very, very difficult to test because it depended on shared global state, and if you had one spec fail, it wouldn't necessarily clean up the shared global state that it left behind. You'd have actors that were still sitting in the global namespace.
Jeremy Could you kind of briefly explain what Celluloid is?
Samuel Yes. So Celluloid was a very popular framework for Ruby: actor-based concurrency, and I guess even some form of parallelism as well, because you could run actors on separate threads. The way that it works is you have objects, and those objects have methods, and when you call a method on an object, it's not synchronous, it's asynchronous. What that means is, when you go object.do_something, what you're actually doing is making a little message. You're packaging up all the arguments into that message and you're putting it into the object's mailbox. And then that remote object is basically polling its mailbox, saying "did a message come in, did a message come in, did a message come in?" When it gets a message, it will do the work and then, if required, post a response back to the calling object. So it runs in a way which allows those objects to operate in parallel. If you have a lot of objects, they can work together to solve problems and they can run in parallel. They run independently, and if one object crashes you can restart it, so there are some robustness guarantees as well. It was quite popular.
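The mailbox idea can be sketched with nothing more than a thread and a `Queue`. This is not Celluloid's API. `TinyActor`, its `async` method and the `Calculator` target are hypothetical names chosen for the example, and the sketch leaves out supervision and restarts entirely.

```ruby
# Hypothetical minimal actor: one thread draining one mailbox.
class TinyActor
  def initialize(target)
    @target = target
    @mailbox = Queue.new
    @thread = Thread.new { run }
  end

  # Package the call as a message and return a queue the caller
  # can pop to wait for the reply.
  def async(method_name, *arguments)
    reply = Queue.new
    @mailbox << [method_name, arguments, reply]
    reply
  end

  def stop
    @mailbox << :stop
    @thread.join
  end

  private

  # Wait for messages one at a time; only this thread touches @target.
  def run
    loop do
      message = @mailbox.pop
      break if message == :stop

      method_name, arguments, reply = message
      reply << @target.public_send(method_name, *arguments)
    end
  end
end

class Calculator
  def add(a, b)
    a + b
  end
end

actor = TinyActor.new(Calculator.new)
reply = actor.async(:add, 2, 3)   # returns immediately
sum = reply.pop                   # blocks until the actor has answered
actor.stop
```

Because only the actor's own thread ever calls methods on the wrapped object, the object itself needs no locking. The queues are the only shared structures.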
Jeremy So on the surface, it kind of sounds a bit like Go channels or Erlang processes, just brought into Ruby. Does that sound correct?
Samuel To a certain extent. I guess the detail of how these systems fit together makes them different. Erlang with its lightweight processes, Go with its goroutines and Celluloid with its actors are all similar. They're all trying to solve the same kind of classic problems, but the way they go about doing it is quite different, and the semantics is where they differ quite significantly, in terms of how you handle robustness issues, how you handle failures, how you handle state and state transitions. All those kinds of things can be quite different between those systems.
Jeremy And in the case of Celluloid, you were saying there were problems with shared global state. How does that come up in the context of Celluloid?
Samuel So in Celluloid, when you create an actor and you instantiate it, it becomes part of the Ruby process. It becomes global state and you can communicate with it. When you're running specs, you want to set up a set of parameters for your spec, for example load a configuration file or prepare your object in a certain state, and then you do something and test that the result was what you wanted. Ideally, that spec is isolated, so that if you run the spec and it fails, it won't impact some other subsequent spec. In Celluloid those actors would sit in the global namespace, so if you were trying to test them you had to do a lot of scaffolding to spin them up and tear them down at the end of the spec. There was no implicit state management or lifecycle management in them.
And a big thing for me, that I think trips up a lot of users, is lifecycle management. What I mean by lifecycle management is creating an object, using an object and getting rid of an object. Surprisingly enough, a lot of code doesn't do that very well. An example would be code that connects a socket to a remote system, communicates with it, doesn't close it, and then expects the garbage collector to close it when it goes out of scope. In my mind a lot of that stuff should be more explicit, because I think it avoids a lot of potential issues and potential bugs. So lifecycle management is super important, and I think Celluloid really lacked a good model for lifecycle management.
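A brief sketch of what explicit lifecycle management can look like for the socket example: create, use, then close in an `ensure` block instead of leaving cleanup to the garbage collector. The `exchange` helper is hypothetical, and `UNIXSocket.pair` is used only so the example needs no network (it assumes a Unix-like system).

```ruby
require 'socket'

# Hypothetical helper: sends a message over a connected socket pair
# and guarantees both ends are closed, even if an error is raised.
def exchange(message)
  client, server = UNIXSocket.pair
  begin
    client.puts(message)
    reply = server.gets.chomp
  ensure
    client.close
    server.close
  end
  [reply, client.closed?, server.closed?]
end

reply, client_closed, server_closed = exchange("hello")
```

Both sockets are closed by the time `exchange` returns, regardless of whether the body succeeded.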
Jeremy And that kind of brings us to how you created a new library called async, which I believe is centered around concurrency. So could you go into what the differences are in async? How does async work, how does it compare, and that sort of thing.
Samuel So at a certain point I was maintaining Celluloid IO, which is an event-driven I/O reactor that could operate inside a Celluloid actor. I was maintaining that because I was building a tool called RubyDNS. RubyDNS is a Ruby client and server DNS implementation, and you can use it for doing all sorts of crazy things. One of the examples is a DNS server that hooks up to Wikipedia, so if you query the DNS server for a keyword, it'll return the first paragraph from Wikipedia. I was just interested in how you scale that up: how do you build something in Ruby that is scalable, with straightforward, logical code? We almost managed to get to a 1.0 release, a major release of Celluloid IO that would support that, and I had RubyDNS all lined up to go. But there were specs in RubyDNS that would just randomly crash for no obvious reason, basically because of issues in Celluloid or Celluloid IO, and I could not fix those issues. At some point, after about six months of trying to make that work, I was like, this is crazy. We're never going to fix these issues. We're going around in circles. So at that point I was like, okay, I'm sick of not being in control of the thing which underpins RubyDNS. I'm just going to build something. It can't be that hard. That was when async was born. Async was the product of frustration. What async does is take what I think were the best parts of Celluloid, plus a model for lifecycle that makes sense in the scope of Ruby, and turn it into something which I could run RubyDNS on top of, run my specs without having them crash, and have it just be super reliable, basically. Async essentially is just a reactor which lets you do things like non-blocking I/O and timers, and then on top of that is everything else.
That's things like sockets and networking and so on. Then on top of that is Ruby DNS, which is essentially just doing network I/O. Actually, now that I think of it, the motivation was that Wikipedia DNS server. I can't remember whether it was with EventMachine or Celluloid, but I wanted to have a DNS server that could receive a request for something like cats.wikipedia and then do a web request to Wikipedia's API to get back the first paragraph. It turned out there was no way to combine that web request with the DNS server, because for the web request I think I was using something like REST Client, which just assumed a multi-threaded environment. So that request would block. My Ruby DNS server, running on either EventMachine or Celluloid, was non-blocking and event-driven, so as soon as one request came in that did that web request, the whole thing would lock up for the duration of the web request. No other requests could be processed at the same time. I thought, this is crazy. We can't take these two components and make them work together. You'd have to go from event-driven back to multiple threads, and you can't spawn one thread per DNS query. It would just never scale. So I thought, there's got to be a better way to solve this problem and combine all these pieces, and ultimately that's what led to async: how do you build something which lets you do this? And now Ruby DNS can do it. You can have a DNS server where the queries coming in are event-driven, and if you do a web request, that is also part of the event loop, so it won't block other requests.
Jeremy Yeah, so it sounds like one of the big frustrations came out of the fact that you were working with something that used an event loop, and then for other parts of the work you were using a library that expected threads, and the two just don't work well together. Could you talk a little more about what an event loop is, how it relates to non-blocking I/O, and when you would use it instead of threads?
Samuel That's a really good question. What an event loop comes down to is literally just a loop, and in that loop you can do a couple of different things. It really depends on what your goals are. If we focus on the simplest kind of event loop, which would just be timers, you have a piece of your own code that you want to run, and you want to say: after three seconds, do this. So your loop is going to be saying, I have a list of things the user wants to do, and one of those things is waiting for three seconds and then doing some stuff. Essentially your event loop would say: okay, I have a list of timers, and the one that's going to expire the soonest is this one for three seconds, so I'm going to sleep for three seconds, and after three seconds I'm going to go back to the user's code. That can be through a callback or some other approach. We can talk later about how fibers work; for now we'll just assume we're using callbacks. So that is really the simplest event loop: you look at some list of timeouts, you choose the one that times out the soonest, and then you sleep for that duration. After that duration has expired, you resume the user's code. If the user wants to do something more advanced, for example networking, then you need to incorporate elements of what kinds of things can I read from and what kinds of things can I write to. A typical networking operation will be reading some data, doing some processing, and writing a result back out. Network latency is normally massive, on the order of milliseconds. So you would have a socket which is connected to something, or listening for a connection to come in, and then you would try to read from that socket. Now, in the case of multiple threads, that operation is a blocking operation.
What that means is the thread goes to sleep until there's data available to be read on the socket. While that thread is sleeping it's not using a CPU, so other threads can run. The main point is that you can do the same thing in a non-blocking fashion in the reactor: when you try to read from a socket, rather than sleeping in the operating system waiting for data to be available, you go to the reactor and say, hey, I want to register my interest in this socket; tell me when it's readable and call me once it is. So basically the user schedules that I/O into the reactor. The reactor does every other operation that it can do until the operating system comes back and says, hey, there's data available for you, and the reactor will then resume that code, in this case via a callback. The value there, I guess, is ultimately that the overhead is potentially a lot less. So the event loop is really just looping over and over again. In the case of async there are timers, jobs, and I/O, and those three things are interleaved in a way which minimizes latency. The user code is essentially just waiting until the operation can continue, and when it can continue it will resume. In the meantime, if you're just waiting, then other tasks will run, and if no tasks can run, it's literally just sleeping in the operating system waiting for something to happen.
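A minimal sketch of the timer-only event loop described above, using callbacks. The class name TimerLoop and its after method are made up for illustration; this is not how the async gem is implemented, only the "pick the soonest timer, sleep, call back" idea in plain Ruby.

```ruby
# Toy timer-only event loop (hypothetical names, standard library only).
class TimerLoop
  Timer = Struct.new(:deadline, :callback)

  def initialize
    @timers = []
  end

  # Register a callback to run `seconds` from now.
  def after(seconds, &callback)
    @timers << Timer.new(now + seconds, callback)
  end

  # Until no timers remain: take the timer that expires soonest,
  # sleep until its deadline, then hand control back to the user's code.
  def run
    until @timers.empty?
      @timers.sort_by!(&:deadline)
      timer = @timers.shift
      delay = timer.deadline - now
      sleep(delay) if delay > 0
      timer.callback.call
    end
  end

  private

  def now
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end
end

order = []
event_loop = TimerLoop.new
event_loop.after(0.03) { order << :third }
event_loop.after(0.01) { order << :first }
event_loop.after(0.02) { order << :second }
event_loop.run
```

The callbacks run in deadline order regardless of the order they were registered in, and while the loop sleeps the process uses no CPU.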
Jeremy And when you're making these operating system calls, for example for I/O, you were saying you have some kind of callback. Let's say you want to read a file, and the operating system is going to let you know when the file has been read or when it has a portion ready for you. Are you able to execute your Ruby code at the same time, while the operating system is doing that kind of work?
Samuel Yeah, so essentially in async, and in run loops in general, you will schedule multiple tasks, and those tasks run concurrently. While one task might be waiting for an I/O operation, another task could be waiting on a timer, and another task could be executing a loop, parsing some data or something. Now, although they run concurrently, those tasks never run in parallel. It's not a form of parallelism. They're always scheduled one after the other, based on the availability of data or a timer or something else in the event loop. Essentially those tasks are user-driven, and in the case of a web server, for example, every request would be its own task. So while those requests never run in parallel, they are running concurrently. A typical example would be receiving a request where the user is posting an image, say a two-megabyte image. It could take a few seconds to upload that data. While that one task is just waiting for the data to come in, another task can be executing and processing some part of a different request for a different user. So those tasks are the core abstraction by which units of work are executed independently, and the event loop essentially multiplexes between them.
Jeremy And is it your async code that has a scheduler built in that's deciding when each task should run, or how does it determine that?
Samuel So, the event loop, like I said, is literally a loop which is going around in a circle. The core parts, again, are the timers and the list of outstanding jobs, which are resumed just by nature of being outstanding, and finally the I/O, the operation that waits for an event to occur. There are various ways you can implement that. The most typical one is IO.select. IO.select takes four arguments, and the most important ones are the first two and the last one. The first is a list of sockets for which you want to know if data is available to read. The second is a list of sockets for which you want to know if they can be written to, meaning the buffer isn't full. Obviously the operating system has a buffer, that data is going out across the network, and if you put too much data in there you can't put in any more. The final argument is the timeout, which is how long it will wait for any of those events to occur, like something becoming readable or something becoming writable. So IO.select is kind of the most basic thing. What you do is slot that into the event loop and say: I have a list of tasks which are waiting for I/O to become readable, I have a list of tasks which are waiting for I/O to become writable, and I have a list of tasks which are waiting on timers, and the shortest timer is, I don't know, a hundred milliseconds. So you take all those readable IOs and put them in the first argument, you take all the writable IOs and put them in the second argument, you take the timeout and put it in the last argument, and what that operation does is like a sleep.
But if any events occur on those sockets, it will wake up immediately, and you can resume the user's code from that point. So that just sits in the event loop, and the event loop spins around and around and executes that over and over again. If there are no timeouts, then you just sleep forever until some network event occurs.
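A small sketch of single turns of such a loop with IO.select, using a pipe as a stand-in for a network socket. The variable names and timeout values are chosen for illustration only.

```ruby
# IO.select(read_ios, write_ios, error_ios, timeout) with a pipe instead of a socket.
reader, writer = IO.pipe

readable_ios = [reader] # IOs that tasks are waiting to read from
writable_ios = [writer] # IOs that tasks are waiting to write to
timeout = 0.1           # the shortest pending timer, in seconds

# Nothing has been written yet, so only the write end is reported as ready.
ready_read, ready_write, _errors = IO.select(readable_ios, writable_ios, nil, timeout)
first_turn = { readable: ready_read, writable: ready_write }

writer.syswrite("hello")

# The read end now has data, so select returns right away instead of sleeping.
ready_read, _ready_write, _errors = IO.select(readable_ios, [], nil, timeout)
second_turn_data = ready_read.first.read_nonblock(5)

# With nothing ready, select sleeps for the whole timeout and returns nil.
idle_turn = IO.select(readable_ios, [], nil, 0.05)
```

A real loop would repeat this call on every turn, rebuilding the two lists from the tasks that are currently waiting and passing the time until the next timer as the timeout.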
Jeremy In a way, it's almost like the operating system call, the IO.select call, is, I don't know if you would call that scheduling, but it's sort of deciding when your function should wake up and start receiving data.
Samuel Yeah, I mean, ultimately the operating system probably has a certain amount of variability with regard to network data, when that comes in, and how it wakes things up. One of the things I have thought about a lot with async is how to make it as predictable as possible. I think ultimately the way to look at it is that predictability in code is good. If you write a program, you expect that it executes one instruction after another, one line of code after another. That's kind of a normal expectation, and even if things are running out of order, it's nice to have some cognitive model of how those things fit together. So say you write your program and you have a task that is making a child task. In async there is one guarantee which I've found quite useful: when a parent spawns a child task, that child task will run until the first blocking operation occurs, and then it will go back to the parent. Because of that you can make certain assumptions about how the code behaves. When you're debugging code, it really helps if you have a cognitive model for how things fit together, and I've tried to avoid getting too far away from sequential code, because I think that is when you start making things really complicated. So async tries to keep things as simple as possible, as sequential as possible, and at these points where you have non-determinism, we ultimately try to minimize the chances of that making the code complicated to understand.
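The ordering guarantee can be illustrated with Ruby's built-in Fiber standing in for a child task and Fiber.yield standing in for the first blocking operation. This is only an analogy under those assumptions; it does not use the async gem itself.

```ruby
# A Fiber plays the child task; Fiber.yield plays the first blocking operation.
log = []

child = Fiber.new do
  log << "child: starts immediately"
  log << "child: about to block"
  Fiber.yield # control returns to the parent here
  log << "child: resumed later"
end

log << "parent: spawning child"
child.resume # the child runs now, up to its first Fiber.yield
log << "parent: continues after child blocks"
child.resume # normally the event loop would do this once the operation is ready
log << "parent: done"
```

Reading the log top to bottom gives a fully predictable order: everything the child does before it first blocks happens before the parent's next line.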
Jeremy Yeah, it sort of reminds me of when you run a debugger against multi-threaded code and you're trying to step line by line, and it's jumping from thread to thread, and it's really hard to know: okay, where am I going next?
Samuel Absolutely. There are certain elements of async which are deterministic, and one thing I've been thinking about recently is that there are some elements of async which are non-deterministic. Because you get the sense of, oh, I understand how this is working and how it fits together, sometimes you have situations where that non-determinism creeps in and causes some kind of issue. But I think sometimes it's unavoidable. Sometimes you need to introduce non-determinism; it's just part of the program. I was thinking about comparing async versus async/await. This is very confusing: async uses fibers, which I can explain in a moment, and there's another pattern called async/await, and they're kind of opposite sides of the same coin. With async/await you use keywords, and those keywords indicate operations which may introduce non-determinism into your program. If you try to read from a socket, you may end up executing some other code while that data is coming into the system, and then you'll be resumed back at that point. So if you have global mutable state, from line to line you can't necessarily make assumptions about that state unless you put some kind of semaphore or mutex around it. In async/await you also can't make those assumptions, but you have explicit points at which non-determinism is introduced, and those points are clearly shown by the use of those keywords. The opposite situation, with async, is that you don't have those keywords. To me that's valuable, because you don't overload the language with a lot of extra syntax, and you don't have to explain that extra syntax to users.
And in fact, what's really interesting is that you can retrofit existing code. I've played around with retrofitting existing code bases with async, and a lot of the time it just works. A lot of the time you can take an existing code base, stick it into an async reactor, inject the appropriate concurrency primitives, and it will just work, which is really amazing to me. Obviously you can't do the same with async/await, because you have the extra syntax and extra keywords you need to inject into your code. So going back to the main point, it's about not making the code too complicated and trying to keep things as sequential as possible. But there's definitely this nature of non-determinism, and you can't avoid that with something which naturally involves event loops and an event-driven approach, callbacks or whatever approach you want to take, I guess.
Jeremy Right, but you can at least minimize the surface that you have no control over, right?
Samuel Absolutely. I think that's what I've tried to do with async, and given the nature of that discussion about deterministic versus non-deterministic execution, at every point I'm thinking: okay, how do we actually make this easier for the user and try to avoid running into some common pitfalls? But I have to admit, sometimes you do still have issues. It's non-trivial. Any kind of concurrency, any kind of asynchronicity, is non-trivial, and that's why I think things like Puma and Falcon, which try to isolate the user from that complexity, are the way to go. But again, coming back to that discussion, there are too many situations where threads introduce too many problems. I think the de facto model is sequential code running on a single thread, or a single process. That is the de facto concurrency model that people understand, and anything more complicated than that is just a disaster waiting to happen. So I'm very aware of that, and I'm trying to figure out if there are ways to make that simpler. Ruby is Ruby, and we appreciate Ruby for being an awesome language for a lot of different reasons, and I think Ruby has a certain kind of semantic simplicity that I've tried to embrace in async.
Jeremy Yeah, let's go into that, so that we get an understanding of where you're coming from.
Samuel So async, like I said, is the opposite side of the coin to a system like async/await. Async/await is just a kind of syntactic sugar over callbacks. When you write async/await inside a function, what happens is that when the function is compiled, by the interpreter into some kind of bytecode, it's actually transformed into a state machine. Well, this is the most common way it's done. Your function gets an extra argument, and that argument is where to go to when you come back into the function. So essentially async/await is a transform of your function into a callback-style approach. If you look at callbacks, they are essentially a way of dealing with events that occur in a system. But the problem with callbacks is that they lose all the context in which the sequential flow of those events occurred. So if you're trying to do some complicated process, you end up having to build a ginormous state machine which takes the callbacks and feeds those events into the state machine, and the state machine produces some meaningful output. That is very, very error-prone; the chance of it causing problems is very high. Async/await automatically generates that state machine for you from your sequential code. You write your sequential code, you put in these points where you can have non-blocking operations, and then that gets transformed into this giant state machine which gets run by the event loop. The alternative to that is to use something like a fiber. A fiber is interesting. The way I've seen it described is this: we know what a routine is, like a function or method, and the general term for that is routine. A routine is something which has a call operation, which lets you go to the top of the routine or method or function and start executing it, and at some point, or if you just run off the end, you return back to the caller.
So you've got a routine and these two operations, call and return. Then you have what's called a coroutine. A coroutine has the call and return operations, but it's a superset of a routine: it also has resume and yield. What resume and yield do, semantically, is this: you call a function, it executes some of its code, and when it gets to a certain point it can call yield. That yield goes back to the caller, but it doesn't lose the state of the function; all the local values stay the same. When you call resume on that method, it goes back to the point where it last called yield, and all the state is as it was. Then it continues executing until it gets to return, and then it exits, and that's the end of that method invocation. My understanding is that the reason this was first put together is that, back in the days when they used paper tape, they wanted to build a more efficient compiler. They had a preprocessor, a compiler, an assembler, all these processes, and they basically had to take the tape from one machine to the next. The code they wrote looked like: here is my input, process all of the input, and here is the output. That was naturally quite a slow process. So the logic became: okay, let's build a better abstraction. I think at first what they wanted to build was a compiler compiler. It would take all the bits, the preprocessor, compiler, assembler, and so on, and merge them all together into some unholy combination that would be able to feed one output from the preprocessor into the compiler, then one token from there into the assembler, and then one machine word or instruction to the CPU. So it was an attempt to combine those together, but it just turned out to be impossible.
Because who compiles the compiler compiler? I don't know. So what they did was invent this abstraction, and isn't it always the way: you find the right abstraction and then everything becomes amazing. They took the idea of a routine, where the input was this paper tape and the output was a new paper tape, and they said, okay, what we want is for the input to incrementally produce output, and that became the yield operation. So the preprocessor would read the source code and yield tokens, and then you would have another function that you would feed one token at a time using resume, and it would yield out machine code or something. Basically you'd have one routine that would resume the preprocessor, which would yield a token; you would resume the compiler with that token, and it would yield some piece of assembly; and you would resume the assembler with that output, and it would come out with machine code. So you could combine all these things together without them being explicitly aware of how it was all fitting together, what was being done with their inputs and outputs, and so on. A fiber essentially captures that state of the function execution in its own stack. You take the semantic model of a coroutine and you attach a stack to it. When you enter a function that is a fiber, it allocates a stack for its own behavior, all its locals, and anything else it calls, and when you call the yield operation, it switches the stack back to the stack it came from. So basically it's a stack-swapping operation.
All that's happening is that call is literally the call instruction on your CPU, yield becomes like a return but you swap the stack at the same time, resume is like a call but you swap the stack, and on return you deallocate the stack because you're done with it. So fibers are almost identical to threads, but the context switching is controlled by the user of the fiber rather than by the operating system, which transparently decides, oh, your time is up, I'm going to do something else now. The benefit of that is composability and determinism. You know how the program is going to behave, because you explicitly have these points where you schedule the operations to suspend and resume. Fibers are used by async to manage those individual tasks; every task in async is backed by a fiber. So your code inside a task executes sequentially and runs like you'd expect, and those tasks that are running in a concurrent fashion are actually fibers. When you perform an I/O operation that would block, you actually end up calling Fiber.yield, and Fiber.yield goes back to the reactor. When the reactor says your operation is ready to continue, the event loop calls fiber.resume, and it goes back into your task where you left off and keeps on executing. So it's another way of avoiding the ginormous state machine that you have with callbacks, but I think it's a bit more predictable than async/await, and you avoid all the syntax overhead. They're really just two sides of the same coin, but I think they have different trade-offs, and I prefer the trade-offs of fibers.
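The resume/yield pipeline in the paper-tape story can be sketched with Ruby's built-in Fiber. The token and instruction formats below are invented for illustration; the point is only that each stage produces one item at a time and neither stage knows about the other.

```ruby
# Stage 1: a "preprocessor" coroutine that yields one token per resume.
preprocessor = Fiber.new do
  "push 1 push 2 add".split.each { |token| Fiber.yield(token) }
  nil # returning (rather than yielding) ends the coroutine
end

# Stage 2: a "compiler" coroutine that is resumed with one token
# and yields one made-up instruction for it.
compiler = Fiber.new do |token|
  loop do
    instruction = token.match?(/\A\d+\z/) ? "LOAD #{token}" : token.upcase
    # Hand the instruction to whoever resumed us, then wait for the next token.
    token = Fiber.yield(instruction)
  end
end

# The driver wires the stages together, one item at a time.
output = []
while (token = preprocessor.resume)
  output << compiler.resume(token)
end
# output is ["PUSH", "LOAD 1", "PUSH", "LOAD 2", "ADD"]
```

Both fibers keep their local variables between calls, which is the "state stays as it was" property of a coroutine; no intermediate list of all tokens is ever built.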
Jeremy Yeah, it's sort of like you have these functions that have more information about themselves and can maintain their own state, and like you were saying, that allows you to write simpler library code to switch between all these different functions that are running and resume them. Whereas in the case of the async/await keywords, the library code that needs to be written to jump between all of these traditional functions has to be a lot more complicated and possibly a lot less predictable.
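To show how little library code is needed to switch between fibers, here is a toy reactor in plain Ruby. ToyReactor and its methods are hypothetical names, and it handles only reads; the async gem is far more complete. A task that would block registers its IO and calls Fiber.yield, and the loop resumes it when IO.select reports the IO readable.

```ruby
require "fiber" # for Fiber.current on older Rubies; harmless on newer ones

class ToyReactor
  def initialize
    @waiting = {} # io => fiber waiting for that io to become readable
    @ready = []   # fibers that can run right away
  end

  # Create a task; it starts on the next turn of the loop.
  def task(&block)
    @ready << Fiber.new(&block)
  end

  # Called inside a task: suspend until `io` is readable, then read from it.
  def read(io, maxlen = 1024)
    @waiting[io] = Fiber.current
    Fiber.yield # back to the reactor loop
    io.read_nonblock(maxlen)
  end

  def run
    until @ready.empty? && @waiting.empty?
      @ready.shift.resume until @ready.empty?
      next if @waiting.empty?

      # Sleep in the operating system until some registered IO is readable.
      readable, _writable, _errors = IO.select(@waiting.keys)
      readable.each { |io| @ready << @waiting.delete(io) }
    end
  end
end

reactor = ToyReactor.new
events = []
slow_r, slow_w = IO.pipe
fast_r, fast_w = IO.pipe

reactor.task do
  events << "A waits"
  events << "A got #{reactor.read(slow_r)}"
end
reactor.task do
  events << "B waits"
  events << "B got #{reactor.read(fast_r)}"
  slow_w.syswrite("slow") # B's write is what unblocks A
end
fast_w.syswrite("fast")
reactor.run
```

Each task body reads as ordinary sequential code, with no callbacks or extra keywords, yet task B finishes while task A is still waiting.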
Samuel Yeah, it's a really good point. You don't have to look very far to see the effort that goes on in the Rust ecosystem or the Node.js ecosystem, adapting all the libraries to support promises and async/await, and all the adapter patterns they use to turn promises into async/await and vice versa. I ran into this recently when I was using GitHub Actions. In Ruby you have the system method, which executes something using the shell, and it turns out there's no built-in Node.js method that does that in an async way; you have to use a callback or something. So you end up wrapping it in a promise, and 30 lines of code later you have this thing which you're not even sure works correctly. That surprised me a lot. So I think the great thing about using fibers is that the code does not need to be modified in order to run in a concurrent fashion. Like I said, you take existing code, you put it into an async task, and you inject the right primitives. One example I'll give you that I've tested myself is net/http, the standard library code from Ruby. I've made some wrapper classes, and if you inject those wrapper classes into the module, if you literally go Net::HTTP::TCPSocket equals an async wrapper TCP socket, then net/http becomes completely asynchronous, which I think is amazing, because it doesn't require any additional keywords or changes to net/http. It just works. To me the value of that is massive, because you can take any code base, in theory, use it in async, and have it become asynchronous. The same applies to SQL and ActiveRecord, which are some other areas I've been working on. ActiveRecord is a little bit tricky, because ActiveRecord makes a lot of assumptions about threads.
So when I've tried to make that work I've had to use monkey patching, which is unfortunate. But I've got examples where I've done benchmarks between Puma and Falcon, and all I've done is use an asynchronous Postgres connection, and the scalability with Falcon and async Postgres is just crazy compared to Puma. Of course, you don't expect as big an improvement in the real world, but it's not uncommon to have database queries that take a few hundred milliseconds, and there's no reason why you shouldn't be servicing other requests in the meantime. So you can definitely improve scalability through that approach. The other one that was really fascinating to me was redis-rb. I made a pull request to redis-rb, because redis-rb supports this driver abstraction where you can swap in something which provides the core communication with the Redis server. I wrote one that used Async IO, and it was not only the shortest driver out of all of them, but with very minor issues the whole test suite just passed. It was amazing. It all just worked. Now, I don't know if that would work in the real world, because async redis also makes some assumptions about multi-threading. But it was exciting to me. Redis-rb has a lot of specs, so the fact that you could take something as big as that, write this sort of, I don't know, 50-line wrapper to make the I/O asynchronous, and then everything basically worked, blew me away. So I think of these situations where you have legacy code. You don't want to rewrite all of it, but you want to improve the scalability.
I think async can be the perfect fit, or at least it looks like it can do that. The reality now is to get companies and people on board with the whole thing and say, let's do it. Let's actually take some legacy code and make it scalable just through this transparent non-blocking I/O, and potentially other things as well.
Jeremy And you were saying this is possible because the method signatures don't change and the objects that are returned don't change, so you can write an async version of a library, and as long as the API is identical, whether the code is running inside a reactor or not, it'll work just as well.
Samuel That's correct. That's actually a slightly different point from the one I was just making, but it's equally valuable. One of the things I thought about when I was building async was that users shouldn't need to know if the code they're invoking is event-driven. Why should the user be forced to set up the environment? They just want to call a function and have it do something. They shouldn't have to go, oh, this function could be async, so I'd better make sure I'm running inside an async reactor or something like this. So I explored these options. When you write, in Ruby, Async do and your code, what that does is this: if you are in a reactor already, it makes a task and runs that task concurrently with any of the other tasks in the system. But if you're at the top level, if there's no reactor currently running, and you run Async do and your code, it will create the reactor for you and run your code inside a task. So inside your library code, if you want to use async, you can hide it, and the user will never know about it. To me that is another element of the whole approach to life cycle and managing the expectations of a user. Let me give you a really simple, concrete example of that. In Ruby DNS, when you start the Ruby DNS server, it uses that approach, so there's an async block at the top-level entry point. If you want to control the life cycle of the DNS server as a consumer of that library, you create your own event loop: you go Async do and then Ruby DNS server start or whatever. That returns a task, which you can wait on or stop, whatever you want to do. You have complete control over its life cycle. But if you don't, if you just call start on the Ruby DNS server, it will spin up a reactor, and that means it will block until the server has finished.
It lets you get the best of both worlds in terms of semantics. Essentially, if you want to control it, then you can, and if you don't want to control it, then you just ignore it. Again, it comes back to making the experience for the user as simple as possible. They shouldn't need to know how the code is working, or what concurrency model it's using, or what parallelism model it's using. They shouldn't need to worry about that. From the point of view of the consumer, it's a method: you call it, and you do something with the results if you need to.
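The "create the reactor only if one isn't already running" behavior can be sketched in plain Ruby. ToyLoop and toy_async are hypothetical stand-ins written for this illustration, not the async gem's API or implementation, and Fiber.yield stands in for waiting on I/O.

```ruby
require "fiber" # for Fiber#alive? on older Rubies; harmless on newer ones

class ToyLoop
  class << self
    attr_accessor :current # the loop that is running right now, if any
  end

  def initialize
    @suspended = []
  end

  # Start a task immediately; it runs until it finishes or calls Fiber.yield.
  def spawn(&block)
    fiber = Fiber.new(&block)
    fiber.resume
    @suspended << fiber if fiber.alive?
    fiber
  end

  # Keep resuming suspended tasks until every task has finished.
  def run
    until @suspended.empty?
      fiber = @suspended.shift
      fiber.resume
      @suspended << fiber if fiber.alive?
    end
  end
end

def toy_async(&block)
  if (running = ToyLoop.current)
    running.spawn(&block)         # inside a loop: just add a task
  else
    ToyLoop.current = ToyLoop.new # top level: create the loop ourselves
    begin
      task = ToyLoop.current.spawn(&block)
      ToyLoop.current.run         # block until all tasks have finished
      task
    ensure
      ToyLoop.current = nil
    end
  end
end

log = []

# Hypothetical library entry point that hides its use of the loop.
start_server = lambda do
  toy_async do
    log << "server: started"
    Fiber.yield # stand-in for waiting on I/O
    log << "server: finished"
  end
end

# 1. A caller who ignores concurrency: the call blocks until the server is done.
start_server.call
log << "caller: after blocking call"

# 2. A caller who wants control: run the same code inside their own loop.
toy_async do
  start_server.call # returns as soon as the server first waits
  log << "caller: doing other work"
end
```

The library function is identical in both cases; only the caller's context decides whether the call blocks or runs alongside other work.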
Jeremy Yeah, I think that part is pretty exciting, because it makes it a real possibility that as people update libraries or create new libraries, they can make use of async and take advantage of the reactor. And like you said, the people who are adding that library to their Gemfile may have no idea that it's using any of this, and they can continue to use it in their existing apps. Maybe later they find out a little more about what async is and how it works, and they decide to add it in, and they still don't really have to change a whole lot in their code.
Samuel Yeah, that's a really good summary. And there's an interesting point I just thought about. In my case, we have a legacy system set up using Passenger as a web server, and it is running per process, and we have some cases where we need concurrency within a single request. Even though we're using Passenger, we can use async inside Passenger. Obviously the event loop is still going to block the request, but within that async block we can do asynchronous behavior, interacting with Redis and upstream APIs, and at the end of that block everything comes back and the request continues. Nothing is leaked outside of that block. There's no global state or anything like that. So in that sense it's a really powerful set of abstractions. What I've been thinking about for the past couple of years is: what is the right semantic model? I don't really care about the code that much, although naming is a big deal. What is really important to me is the semantic model that we have. The right semantic model for how people build scalable systems is absolutely critical, and right now Ruby just gives so many mixed messages. There are so many different ways to do things, and as a developer, there are so many ways that can go horribly wrong. My research is really just showing that. It's all very well saying, oh, thread safety, it's an issue, but having these tangible examples and looking through the code myself really made me think, oh, well, this is pretty bad.
Jeremy Yeah, this is more about the library ecosystem I guess.
Samuel Yeah, I mean, at the end of the day, when you're building a library... One of the other ones I looked at is the money gem, which has, I think, 22 million downloads, and that gem has thread safety issues. In fact a pretty critical one, although I don't think anyone is affected by it, otherwise they probably would have figured it out by now. There is this shared global state holder. It's called the bank store, and it stores exchange rates. If you access that store from multiple threads... In my test, just because I was trying to hammer it, I had a hundred threads adding a thousand exchange rates each. At the end of that test you would expect a hundred times a thousand, right? But at the end of my test I had 18 exchange rates in the store, and it's because the threads were just clobbering each other. What was even worse was that not only were the threads clobbering each other, but within one thread you could write an exchange rate to the hash table and then read it back and it wasn't there. It was just gone. So I think people are trying to make these gems, and it's not really a criticism, because I have no entitlement regarding people making code and giving it away for free. It's awesome that people are doing it. But look, we have a landscape and a semantic model that just promotes disaster, basically. I've had people talk to me saying things like, we only deploy multi-process because we are not confident that Ruby is capable of dealing with multiple threads. When I heard that, my heart went out to the JRuby and TruffleRuby teams. They've worked so hard to make threads a reality in Ruby, and now we're in this really crazy situation where we have all this code whose problems are just magnified a thousand times.
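The lost-update problem described here can be sketched with a small store class. This is only an illustration under stated assumptions: `RateStore`, `add_rate_unsafe`, `add_rate`, and `hammer` are hypothetical names, and the code is not taken from the money gem. The unsafe method shows one way updates can be clobbered (copy, modify, reassign with no lock), while the safe method holds a `Mutex` for the whole read-modify-write.

```ruby
# Hypothetical stand-in for a shared rate store; NOT the money gem's code.
class RateStore
  def initialize
    @rates = {}
    @mutex = Mutex.new
  end

  # Unsafe pattern: copy, modify, reassign without a lock. If two threads
  # interleave between the copy and the reassignment, one update is lost.
  def add_rate_unsafe(from, to, rate)
    rates = @rates.dup
    rates["#{from}_TO_#{to}"] = rate
    @rates = rates
  end

  # Safe pattern: the whole read-modify-write happens while holding the mutex.
  def add_rate(from, to, rate)
    @mutex.synchronize do
      @rates["#{from}_TO_#{to}"] = rate
    end
  end

  def size
    @mutex.synchronize { @rates.size }
  end
end

# Mirrors the test described above: many threads each adding many rates.
def hammer(store, threads: 100, rates_per_thread: 1000)
  workers = threads.times.map do |t|
    Thread.new do
      rates_per_thread.times { |i| store.add_rate("T#{t}", "C#{i}", 1.0) }
    end
  end
  workers.each(&:join)
  store.size
end
```

With the locked `add_rate`, `hammer(RateStore.new)` should report every one of the hundred times a thousand distinct rates. Swapping in `add_rate_unsafe` may lose entries, and how many are lost depends on thread scheduling.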
So the only solution, I think, and this is not really causation, it's more like correlation: the reason why I made async was because I see it as being the only solution to this kind of complexity. Single-threaded asynchronous I/O and timers and event-driven behavior is ultimately the only way we're going to isolate these kinds of problems and build up code that can actually work correctly. And I think it's going to be pretty painful, to be honest. I don't think it's a trivial issue to address. Independently these bugs can be found and solved, but trying to move the whole ecosystem forward is really the key, and it's going to be the difficult element. Maybe guilds are the way to do it. But the thing with guilds, assuming they're something like Erlang lightweight processes or goroutines or whatever, is that you then have to check everyone's code and somehow get it to work in that context. So async is kind of a bridge between those two things, because async works today in Ruby. You can take basically any Ruby code, throw in async, and parts of your code will scale better depending on how much effort you spend on it. But if you take something like a guild, you're potentially removing Thread, removing Mutex. Do those things operate inside guilds? I don't know. So all the code that depends on mutexes and threads could become broken, for example. And then you look at things like how message passing needs to work, all those changes to the semantic model, which are really going to affect existing code. So it's going to be tricky. It's going to be a complicated thing to come to terms with, I think, as a community and as an ecosystem of libraries. But it's possible to solve those problems, and I think async is my stake in the ground saying, yeah, this is what we can achieve right now with what we've got.
Jeremy Yeah, and I think what's kind of interesting about these issues with threading in all these different libraries is that when a lot of people talk about Ruby's performance, they talk about the global VM lock. They say, oh, I can't run things at the same time because of this lock. But I'm wondering if that lock is actually what is protecting a lot of these libraries, that it's what's allowing a lot of them to run without segfaulting.
Samuel That is a really good question. What I can say is that the process of understanding the GVL is almost spiritually transformative. What I mean by that is, when I was a teenager working through my first programs and mucking around with C and Python and other languages, you hear about this thing called the GVL and you think, why would they put in the GVL? It seems like such a stupid idea to lock around all that stuff. Why can't I make real threads? What is the logic of it? And you have this irritation around it, at least I did. I was just irritated. It seems to be such an impure thing to have in an interpreter. Why would you leave all that performance on the table? And then you slowly come to terms with why it's there. I think the pinnacle of it for me was when I read the comments. I think it might have been thread.c or thread.h; I'll look it up and tell you afterwards. But there's a comment at the top of one of those files that says, here are the five possibilities, and enumerates them in very straightforward language, at least from a software engineering point of view. What are the options? We have a GVL or we don't have a GVL, we have fine-grained locking versus not. And you suddenly realize that defining classes, defining methods, method caches, all these things in the Ruby VM are actually not thread safe at all, and to make them thread safe would basically be impossible. So when you look at that, guilds, or the idea of lightweight processes as a separate semantic form of asynchronicity within Ruby, start to become much more appealing, because then you start to model it more explicitly around the life cycle of the user's code. So, okay, I have a bunch of gems I want to load.
So I want to load all that code into memory, and then I want to make one guild per processor in my system. Each of those guilds has access to this shared region of memory, which essentially becomes immutable, and those eight guilds operate independently but in the same process address space, so you can do things like sharing immutable objects between them. And then you can do things like have independent garbage collection. So if you wanted to have, say, thirty-two guilds because of garbage collection overheads, you could do that. So you start to build a more complex semantic model. In Erlang, for example, each Erlang process has its own garbage collector. So guilds try to solve those problems. My original opinion was, I said to Koichi, why don't we just make threads work without the GVL? JRuby has done it, TruffleRuby has done it. Why can't we do it? And I think he thought I was a bit crazy. So I think the only solution is the one that they're proposing, just because it's pragmatic. As a software engineer and as a computer scientist, I don't believe anyone who encounters that and seriously thinks about it isn't at least slightly disappointed that that is the solution. But I have to say, going through that whole process, from when I was a kid just coming to terms with programming and hearing that Python had its own kind of GVL and thinking, why would they leave all that performance lying on the table? But then over time, with every step you take, you realize how nuanced and how complex the whole thing is, and then you read that comment at the top of thread.c or whatever and you think, yeah, okay, no, this is probably the best option out of all the possibilities.
Jeremy Right, it's kind of like a lot of things, where we see something and we think it's dumb, and then you dive a little bit deeper and realize there are very good reasons why people make the decisions they do. Life is more complex than we sometimes think it is. One other thing I'd like to ask about async: at the beginning of the conversation you were talking a lot about how shared mutable state is a problem. In the context of the async library, if someone has state that they want to share between the different reactors or the different tasks, how should they approach that?
Samuel There is another gem called async-container. What async-container does is abstract parallelism between JRuby, TruffleRuby, and CRuby. What I mean by that is that in CRuby, if you have eight CPUs, you want eight processes running. In JRuby, if you have eight CPUs, you want eight threads, and the same for TruffleRuby. So what async-container does is you give it a block of code, and it will basically run it as many times as makes sense on your current hardware. The way Falcon utilizes that, for example, is that Falcon has this block where it loads your application, creates a server, and then starts processing requests, and it runs that inside an async container. So on CRuby it uses eight child processes and on JRuby it uses eight threads. To come back to your question, how do you communicate between those systems? In the Falcon examples there is a chat example, and the simplest way you can do that is to tell Falcon to run with one process, well, one container, so you basically have only one reactor. You just need to pay a little bit of attention, have a semaphore or something so you manage the state if you need to, and essentially just have a list of users, a hash table of user to socket or WebSocket or whatever, and then you can write messages just like Node.js. If you've ever written a chat system in Node.js, it's exactly the same, and you can run Falcon just like Node.js using one process. And you can probably handle, I don't know, I recently heard someone did a million WebSockets in async and Falcon on one process. So it can scale pretty far. But once you get past about, I would say, 10 or 20 thousand connections, it starts to become the point where the Ruby garbage collector causes issues, and Aaron Patterson has been working on the compacting GC.
He was pretty excited to try to understand that problem and see if he could figure out a solution. It's quite a complicated problem because of the way fibers work and the stack scanning we need to do. But anyway, back to your question: how do you get these things to communicate? Because of the way async-container works, and the way that I recommend people use async-container for their server systems so that you get the maximum amount of parallelism out of your hardware, you need to use something like Redis, a database, or even just a shared Unix pipe to communicate between them. That is the simplest way to achieve scalable cross-reactor communication, because if you use Redis, for example, then you can run Falcon on a cluster of machines and all those reactors will be able to communicate with each other. You won't have to do anything; it'll just work out of the box. If you use a database then of course you get essentially the same thing. If you use something like a Unix socket or IPC, then you're limited to the current system. So there are a number of solutions in there, the simplest one being to run it like Node.js with one process, and that's sufficient for some use cases. If you need more scalability, then use something else to do the communication. I have played around with implementing a message bus in pure Ruby that fits into async, but I think it's just better at this point to use Redis; it's more mature. It was more just for fun, to see if it was possible.
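The single-process chat approach described above, a hash table of user to socket living in one reactor, might look roughly like the following. This is a minimal sketch and not the Falcon chat example itself: `ChatRoom`, `join`, `leave`, and `broadcast` are hypothetical names, and `StringIO` objects stand in for real sockets or WebSockets.

```ruby
require 'stringio'

# Hypothetical sketch: one process, one event loop, one table of connections.
class ChatRoom
  def initialize
    @connections = {} # user name => IO-like object (socket, WebSocket, ...)
  end

  def join(user, io)
    @connections[user] = io
  end

  def leave(user)
    @connections.delete(user)
  end

  # Write a message to every connected user except the sender.
  def broadcast(from, text)
    @connections.each do |user, io|
      next if user == from
      io.puts("#{from}: #{text}")
    end
  end
end

room = ChatRoom.new
alice_io = StringIO.new
bob_io = StringIO.new
room.join("alice", alice_io)
room.join("bob", bob_io)
room.broadcast("alice", "hello")
# bob_io.string now holds "alice: hello\n"; alice_io.string is still empty.
```

Because all of the state lives in one process, this only works while everything runs in a single reactor. With several processes or machines, the table would have to be replaced by something external such as Redis, a database, or a pipe, as described above.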
Jeremy And the async-container concept would also apply when you have something very CPU-bound that you want to run in another process as well?
Samuel Absolutely. So async-container serves two main purposes. The first purpose is to say, here is a block of code, run it in one reactor per CPU core. That's one main semantic mode of operation. The other mode of operation is, spin up this task, this Ruby code, and run it as efficiently as possible. On CRuby it will fork and make a child process, and on JRuby it will make a thread. That allows you to do things like job processing, or if you just want to spin up a whole bunch of isolated tasks, you can do the same thing. And in the first situation, where you're spinning up a number of tasks, async-container recently got support for gracefully reloading those tasks. That's used inside Falcon to do things like graceful restarts, so you can reload your application without dropping connections and so on. So async-container serves as the foundation that bridges the gap between CRuby, JRuby, and TruffleRuby in a way that users don't have to worry about. Async-container is like a vehicle for getting parallelism out of your implementation: it creates async reactors, one per CPU or process, and then within those you get concurrency. So it layers it all up, and you don't have to be concerned about how that layering happens. You just write your code and you say, make me containers, maybe eight of them, maybe 30 of them, and it will just work.
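The idea of running one block of code as child processes where `fork` is available and as threads otherwise can be sketched using only the standard library. This is an assumption-laden illustration and not the async-container API: `run_workers` is a hypothetical helper, and it uses a shared pipe (one of the communication options mentioned earlier) to collect a one-line result from each worker.

```ruby
require 'etc'

# Hypothetical helper, NOT the async-container API. Runs the block `count`
# times in parallel and returns each worker's one-line result as a string.
# Results are assumed to be small, so each write fits in the pipe buffer.
def run_workers(count = Etc.nprocessors, &block)
  reader, writer = IO.pipe

  if Process.respond_to?(:fork)
    # Where fork exists (e.g. CRuby on Unix): one child process per worker.
    pids = count.times.map do |index|
      fork do
        reader.close
        writer.puts(block.call(index))
        writer.close
        exit!(0) # leave immediately, skipping handlers inherited from the parent
      end
    end
    writer.close
    results = reader.read.lines.map(&:chomp) # EOF once every child has closed
    pids.each { |pid| Process.wait(pid) }
  else
    # Where fork is unavailable (e.g. JRuby): one thread per worker.
    threads = count.times.map do |index|
      Thread.new { writer.puts(block.call(index)) }
    end
    threads.each(&:join)
    writer.close
    results = reader.read.lines.map(&:chomp)
  end

  results
ensure
  reader.close if reader && !reader.closed?
end
```

For example, `run_workers(4) { |index| index * index }` returns the four squares as strings, in whatever order the workers happened to finish. The caller's block is the same either way, which is the point being made about not having to care how the layering happens.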
Jeremy That's really exciting, because it sounds like you've almost built out a whole ecosystem in terms of tiers. If you're building an application, you start with the async library and the async-io library, working in a single process without having to create threads and things like that. If you have a need for actual parallelism, then you bring in async-container. And like you were saying, as far as shared state, you can make use of things external to Ruby that get you basically the same result, in terms of using an external database or pipes, things like that.
Samuel I mean, plenty of people use things like memcache for PHP systems, right? It's super common because PHP has a one-shot process model. When you get a request it basically spins up an interpreter and then just discards it, which is awesome, as you don't need a super complicated GC. It's pretty clever. So in the same sense, you know, dropping things in memcache. We don't have an async adapter for memcache yet, but it's definitely on the cards. We're just polishing up the asynchronous Redis implementation, and it's actually being used by a number of people now, which is pretty cool. But ultimately, when you look at shared mutable state, sometimes doing it in process is just a bad idea anyway. When you look at it from a semantic point of view, if you have code which depends on shared mutable state, it's very, very hard to reason about how it behaves in any given situation, because it depends on so many factors. So the whole async ecosystem is built around isolated classes which are layered together or composed together to get the behavior you want, but there's no hidden internal state which is shared between those instances. You will never run into a situation where... well, never say never, right? But in theory it's designed around those kinds of concepts: you make a class, you set it up, you do something with it, and you finish with it. The life cycle is a really big thing, and making code predictable to users is, to me personally, really important. I want people to be able to reason about the code, and if they can't, I think something is horribly wrong.
Jeremy Yeah, I mean, that's kind of a conversation that I think a lot of language communities are having now in terms of, you know, what are your dependencies and do you understand what's happening in them? And I think in a lot of cases it's almost impossible to really understand all the things that you're including in your program.
Samuel So one really exciting thing that came out of a conversation just a few days ago was this idea around trying to understand exactly what you just talked about: what are your hidden dependencies? To give you an idea of what we currently do with async, there's a gem called async-rspec, and that gem implements a whole bunch of convenient RSpec contexts. One is for detecting socket leaks: if you forgot to close a socket, it will give you a warning and fail the spec. For memory, you can expect a block of code to not allocate any strings, or to allocate one string of size 3 megabytes; you can put constraints on memory allocations, and Falcon has a whole ton of them so we don't have any memory usage regressions. And another one that recently came out of this conversation we were having was the idea of saying, this block of code should not mutate any global state. This was just dreaming; we don't know if it's possible to implement or not. Of course, I can go and start working on the CRuby hooks that are necessary, if that makes sense. But the idea was, imagine if you could just put in your specs: by the time this spec is finished, all the memory that Ruby is looking at has not changed; there is no obvious side effect of running this code. And the idea would be that you could take an existing system and evaluate where the issues are. Okay, I ran this code, let's take the money gem for example, and I add an exchange rate, and hold on a moment, this shared global has been modified. Did you realize that was happening? Maybe that's obvious, but maybe it's not obvious when that gem is layered ten layers deep in some other person's library. You just don't know, right?
So the idea is, could you take something like your own code and then check, how pure is this? Is it something which is modifying or accessing global state, or is it something which is isolated to its own state? I think you'd have to have certain exceptions, because sometimes you do have local cases, so you'd be able to say something like, expect this block of code to not modify anything except for this one thing. Anyway, it's kind of pie-in-the-sky dreaming, but it would be really interesting to see whether it's possible in Ruby. We would just need to add some hooks, some TracePoint hooks for when you modify a variable, a global variable or instance variable, something like that. If you could track those changes, that to me seems like a really incredible tool for trying to understand code and understand where things could possibly go wrong. There's tons of code out there that detects race conditions in multi-threaded code. I don't know if you saw the recent article, but Google implemented a race detector in the Linux kernel and they literally found hundreds of race conditions. What hope is there for the rest of us, you know, if the Linux kernel developers can't get all of those conditions correct? In some ways you just shouldn't need to care about this stuff.
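A much cruder version of the "this block should not mutate global state" check can be written today by snapshotting specific objects before and after a block. This is only a sketch of the idea under stated assumptions, not the TracePoint-based tooling imagined above and not part of async-rspec: `mutated_during` and the `Exchange` module are hypothetical, and the helper only sees objects it is explicitly told to watch.

```ruby
# Hypothetical helper: reports which of the watched objects changed while the
# block ran, by comparing Marshal snapshots taken before and after.
def mutated_during(watched)
  before = watched.transform_values { |object| Marshal.dump(object) }
  yield
  watched.select { |name, object| Marshal.dump(object) != before[name] }.keys
end

# A stand-in for a library with hidden shared state.
module Exchange
  RATES = {} # shared, mutable, process-wide

  # Looks harmless to the caller, but writes to the shared table.
  def self.add_rate(pair, rate)
    RATES[pair] = rate
  end

  # Pure: depends only on its arguments.
  def self.convert(amount, rate)
    amount * rate
  end
end

changed = mutated_during(rates: Exchange::RATES) { Exchange.add_rate("USD_EUR", 0.9) }
# changed == [:rates]

unchanged = mutated_during(rates: Exchange::RATES) { Exchange.convert(10, 0.9) }
# unchanged == []
```

The obvious limitation is that the caller must already know which objects to watch, which is exactly the knowledge that is missing when the state is buried ten layers deep. Detecting arbitrary mutation would need interpreter support along the lines discussed above.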
Jeremy I guess the thing is, any help that the computer can give us, as developers we can always use, right? That's one of the things that's really been pushed with languages like Rust and Elm: having the computer help us more in terms of discovering mistakes. And I know with Ruby we could definitely use that, given its dynamic nature, but it's hard, right?
Samuel Yeah. In C++ we have the UB sanitizer, UBSan, and ASan, the address sanitizer, and they are just amazing tools for looking at how code behaves, for example if you have multi-threaded code. I've implemented fibers for C++ as well. Essentially Ruby was the prototype, and C++ was the real deal, for a commercial contract. It was really fascinating because the address sanitizer was super helpful in pointing out issues with handling memory, use after free and other potential issues, and I just think we lack sufficient tooling for Ruby in those areas. We're starting to get it. It's definitely an area, you know, where me looking through code manually and trying to figure out if there are race conditions is kind of... What we need to do is scan through the top 10,000 gems and ask which of these gems have race conditions, and then figure it out. It's just not feasible for one person to go through all this stuff and deal with it, but the potential risk is massive.
Jeremy Yeah, I would imagine with C or C++ it's probably just a matter of time investment, right? We have so much critical infrastructure that's built on top of those languages that there are probably companies funding that type of tooling.
Samuel I think yes and no. My understanding is that the address sanitizer and the undefined behavior sanitizer came from Google. So obviously they invested heavily in C++, and obviously with the race condition detector they invested heavily in Linux. I don't think they do it just out of the goodness of their heart; it just makes sense for them to do it, which is very nice, and they give it to everyone, so that's awesome. But I guess with Ruby we're at a point where concurrency and scalability are potentially becoming a bigger issue, so having appropriate tooling to help understand, detect, and manage issues would make sense. I think one of the big fears that people might have of async would be, what kind of issues might I have that I didn't realize I had? It's kind of tricky. When you look at async you think, okay, it's single-threaded and it's isolated, so what are the potential issues you could have with it? The potential issues are more to do with people who have assumed code will run strictly sequentially, but those issues also exist with threads and are actually worse. So right now one of my goals... I'm giving a presentation at RubyWorld Conference in Japan in a couple of weeks, and my goal is to put it out there: yes, you should be fearful of concurrency and parallelism, and that includes async to a certain extent, but you should be more fearful of threads and systems that use threads, because the potential for issues, just objectively looking at code, is much worse. And I think that's the problem we're faced with right now.
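The kind of issue that comes from assuming code runs strictly sequentially can be shown with plain fibers, with no threads and no async gem involved. This is a minimal sketch under stated assumptions: the method names are hypothetical, and `Fiber.yield` stands in for any point where a task would wait on I/O and let another task run.

```ruby
# Two cooperative tasks each try to increment a counter once, but pause
# (as if waiting on I/O) between reading the value and writing it back.
def lost_update_demo
  counter = 0

  tasks = 2.times.map do
    Fiber.new do
      current = counter      # read shared state
      Fiber.yield            # stand-in for an I/O wait; other tasks run here
      counter = current + 1  # write based on a now stale read
    end
  end

  tasks.each(&:resume) # both tasks read 0, then pause
  tasks.each(&:resume) # both tasks write 1
  counter
end

# Same work, but the read and the write happen with no pause in between.
def safe_update_demo
  counter = 0

  tasks = 2.times.map do
    Fiber.new do
      Fiber.yield   # do the waiting first
      counter += 1  # read-modify-write without yielding
    end
  end

  tasks.each(&:resume)
  tasks.each(&:resume)
  counter
end
```

`lost_update_demo` returns 1 and `safe_update_demo` returns 2. The difference from threads is that the interleaving here can only happen at the explicit yield point, so the problematic window is visible in the code. With preemptive threads the same interleaving can occur between any two operations.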
Jeremy Yeah, well, it sounds like you've put a lot of thought into this and you've really started building an ecosystem for async. So I'm really excited to see, hopefully, people run with that, and that we get more of a focus on concurrency in Ruby and maybe not think about threads so much.
Samuel Yeah, I mean, I'm super excited that people are starting to use it. And it's not just casual, "I just tried it out"; people are actually using it in production systems and giving good feedback, like, okay, it worked, or we had this issue. We had one person who tried it, was super excited, and deployed Falcon, and Falcon is zero point something, so it's not really stable. It's still at kind of an unstable point. We had a funny joke; we said it was the longest issue in GitHub's history. He was trying to replace Puma with Falcon for this middleware proxy serving hundreds of, maybe even gigabytes of data per minute or something, and Falcon had some issues with leaking sockets based on their use case, and we fixed some of them. What was fascinating is that there are two ways of looking at it. You could go, oh, Falcon's not as good as Puma, or Falcon's not as good as X, or async is not as good as threads, or whatever. Or you can say, there is one person who has been toiling away for a couple of years who's passionate about these issues, and the fact that Falcon performs like 99% as well as Puma is just incredible, and in some situations it can perform way better. So that's the exciting thing. If people can get on board with it, get excited about it, and try it out, that is just the most awesome thing ever. I'm really excited by people who invest effort in it, and I have had some of the most amazing experiences over the past year with people who have been excited by it, yourself included. So thank you so much for that.
Jeremy Thanks for putting in the work and for agreeing to chat today, because you've put a lot of thought into the API and into the documentation. I actually remember in the past looking at, say, Celluloid and trying to get to grips with what it was doing and how to use it, and I found it a lot more challenging. I found that async was a lot easier to jump into: just take a look at some of the examples and really see the case for why this could be something really great for Ruby. So thanks so much for doing all of that.
Samuel You're totally welcome. Thank you. Thank you as well for trying it out.
Jeremy So I guess to wrap up, for anyone who's interested in checking out async or Falcon or the work you're doing, where should they head?
Samuel The best place to go right now is probably the GitHub page for async. I'm sure we can post a link to that.
Jeremy Sure. Yeah, we'll get that in the notes.
Samuel And you know, we don't really have a forum. We do use Gitter for chat, but what I would suggest is, if you have questions, sometimes those questions are actually actionable documentation changes. So please feel free to submit issues even for just basic questions, because if you have that question then clearly the documentation needs to be improved. So I welcome interactions through GitHub issues.
Jeremy Nice, and I know you've given a couple of conference talks, so we'll be sure to get those in the notes. And you've got one coming up, you said, in just a couple of weeks, right?
Samuel That's correct. Yeah, that should be exciting. I'm supposed to submit the slides today, but I'm still working on them. Is that normal? Is everyone in that situation? I'm not sure.
Jeremy I've heard different things. There are some people who are very, let's say, very organized. Yeah, very organized, and then
Samuel some people who are still writing the slides on the day before the presentation
Jeremy They're like, I need to go to a quiet place so I can just finish up, and it's like, all right. But yeah, that's really cool. I think these were all this year, right? You're going to have three this year?
Samuel Yeah, it's been crazy. Things are picking up steam, things are happening. It's super exciting. Matz is interested in it. That's awesome. I'm always excited about all this stuff. It's just incredible and it blows my mind.
Jeremy Yeah, that's awesome. Not too long ago I watched your last video, where you had the 1 million connections and you had Matz press the button to get that last one. So that's awesome.
Samuel Yeah, it was awesome. Hanging out with Matz and the rest of the Ruby team is totally awesome; they're such an awesome bunch of people. So yeah, I'm super honored to be invited and involved.
Jeremy Cool. I just wanted to thank you again for agreeing to come on, and I look forward to seeing where the future of async goes and the future of Ruby in general.
Samuel Thank you so much.