Back to chats Igalia's Eric Meyer and Brian Kardell chat with Chromium Layout engineer Ian Kilpatrick about performance

Transcription

  • Eric Meyer: Hello everyone. I'm Eric Meyer. I'm a developer advocate at Igalia.
  • Brian Kardell: I am Brian Kardell. I am also a developer advocate at Igalia.
  • Eric Meyer: And yeah, we're your hosts for Igalia Chats, and we've actually been doing a number of chats recently about browser engines and that sort of thing. The most recent one before this episode was about Ladybird, which is a novel browser engine. And the one before that was about Servo, which is a novel browser engine that Igalia works on. After we did those, an idea that I've had kicking around in my head for a while about first-person scrollers turned into a blog post where I was basically saying browser engines have these incredible rendering frame-rate performance constraints that I don't think we really think about, and that it would be really awesome to talk to people who have experience with that sort of thing. And that's what we're going to do today by talking to Ian Kilpatrick. Ian, thanks for being here.
  • Ian Kilpatrick: Thank you very much, Eric. Hi, my name's Ian Kilpatrick. I work on the Blink rendering engine for Google. I should say I'm here in my personal capacity. In a prior life, I was a web developer and then made the transition to become a browser engineer. I didn't know C++ or anything like that when I did that, but now I'm probably six or seven years into working on rendering engines.
  • Eric Meyer: Wow. So you actually jumped into the other side of things without having prior programming experience?
  • Ian Kilpatrick: Yeah. Well, I had a lot of Java experience, JavaScript.
  • Eric Meyer: Oh, okay.
  • Ian Kilpatrick: I had some embedded C, so C and C++, and JavaScript, but as any C++ developer will tell you, C++ has a lifetime's worth of quirks and weirdness to wrap your head around. So it was a little bit in the deep end, but a programming language is just a tool. You can make that work. But I sort of wanted to work on browsers because, as a web developer, I was fed up with bugs and shortfalls that I felt on the platform. And there are actually quite a lot of browser engineers who went down that path.
  • Eric Meyer: Wow, that's really interesting, because it always seemed to me the only way you could possibly get to work on a browser would be, like, you have to know C++ and you have to have experience with coding visual rendering or whatever.
  • Ian Kilpatrick: Yeah, it's sort of interesting. The nice thing was that when I did this transition, I was allowed a lot of time to ramp up on a new team, and my experience with front-end development really, really helped. There's obviously a lot that I didn't know, and I was also fortunate that there were a lot of people who could mentor me on the team, which helped a lot. But there's a surprising number of browser engineers who have gone through that path. It's not an easy one, obviously, but it is one that's possible. You are at an advantage coming from a front-end background, because you understand the problem space and the problems people are running into much more easily than someone who's come from, say, a C++ or database background or something like that. There's pros and cons. Yeah.
  • Eric Meyer: Okay. That makes sense. So you said you've been there six or seven years, did I get that right?
  • Ian Kilpatrick: Six or seven years, yeah. So primarily being focused on layout, and that's where I spend the vast majority of my time. Quite a few of your listeners might know this, but very, very quickly: your rendering pipeline basically encompasses taking the HTML, script, and CSS and producing pixels on the screen. The rough phases in that are: we have style, which calculates, say, background-color: red in your CSS, for example. We take all of the rules and we decide, 'Oh, this element should have a background color of red.' So that's the style recalc. We then have layout, which is what I spend the majority of my time in. So we're taking those styles and then working out the geometries for all the things in the page. We work out the position, the width, and the height of all the different elements, and we work out how the text should fall on the page. We then hand that off to paint. So paint goes, 'Great, you've given me all of this geometry information. I'm now actually going to create a whole bunch of paint commands.' So: create this rectangle at this position with this background color, this rectangle here to represent this border for something else. And then that's handed over to, it gets murky, this gets near the limits of my knowledge, but to compositing and rasterization. So rasterization is taking those paint commands and actually producing pixels in a buffer that can be displayed on the screen. So it's a very, very elaborate pipeline. I primarily spend a lot of my time in layout. So think Flexbox, grid, table layout, block layout; that's where I spend the majority of my time.
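As a very rough sketch, the style → layout → paint phases Ian describes could be modeled like this. All the names and signatures here are illustrative, nothing like Blink's actual internals:

```javascript
// Toy model of the rendering pipeline: style -> layout -> paint.
// Names and data shapes are made up for illustration.

// Style recalc: resolve matching rules into a computed style per element.
function recalcStyle(element, rules) {
  const computed = {};
  for (const rule of rules) {
    if (rule.selector === element.tag) Object.assign(computed, rule.declarations);
  }
  return computed;
}

// Layout: turn computed styles into geometry (a border-box rectangle).
function layout(element, style, containerWidth) {
  return {
    x: 0,
    y: 0,
    width: style.width ?? containerWidth, // block-level: fill the container
    height: style.height ?? 0,
  };
}

// Paint: turn geometry + style into an ordered list of paint commands.
function paint(rect, style) {
  const commands = [];
  if (style.background) {
    commands.push({ op: 'fillRect', rect, color: style.background });
  }
  return commands;
}

const rules = [{ selector: 'div', declarations: { background: 'red', height: 50 } }];
const el = { tag: 'div' };
const style = recalcStyle(el, rules);
const rect = layout(el, style, 800);
const cmds = paint(rect, style);
```

The real phases are vastly more involved, but the shape is the same: each stage consumes the previous stage's output, and the final paint commands are what compositing and rasterization turn into pixels.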
  • Eric Meyer: Right. So all of those CSS commands in effect. You're working in the area where you have to figure out, based on these commands, where should these rectangles be and how wide and how high and what's in the background and what's in the foreground. Are those basically separate rectangles? I'm just curious. Is the background, like the box of the element, one draw command, and then the text, a different draw command that the compositor then puts together?
  • Ian Kilpatrick: Yeah. So the in-joke is that how you paint CSS is based on an appendix, not even what we think of as proper spec text. It's an appendix of one of the CSS specifications, but there's a whole bunch of different phases in the paint phase. So roughly speaking you'll say, 'Paint your backgrounds, and then paint all of your foregrounds, and then paint these types of elements that have a positive z-index,' and stuff like that. And so there's a whole set of tree walks you need to do to get the right layering of everything. Yeah, it's complex. That would be a whole 20-minute detour if we wanted to go into it. But from a layout point of view, we just produce a few basic rectangles. Layout works in a border-box coordinate system, so we produce one rectangle that represents the border box of the element. There may be multiple if you've been fragmented, which is a complex topic. We produce a few other rectangles, like how much overflow you have if you're a scrollable area. And then we say, 'Hey, this is where the text is and here's how you should render the text,' and a few other things like that. Yeah.
  • Brian Kardell: There's so much complexity in here.
  • Ian Kilpatrick: There really is. Yeah.
  • Brian Kardell: And if you were to write a browser the way that we originally wrote browsers, like single-threaded, and you did just the most naive implementation of this, where every time anything changed in the DOM, you just tried to recompute the whole world...
  • Ian Kilpatrick: Things would be incredibly slow.
  • Brian Kardell: Yeah. I don't think that a page would ever finish loading. As long as the universe lasts, it won't ever finish loading. There are so many things that we do to keep the design of things performant, or able to be made performant. And then even on the things that aren't specifically about that, there are constant efforts to make it faster, make it more performant, because those are advantages, right?
  • Ian Kilpatrick: Yes.
  • Brian Kardell: I mean, I think that was one of the things that was really nice that came out of the split of WebKit and Blink. You wanted to make V8, and that launched this sort of JavaScript engine performance war, and the outcomes were really positive. I mean, it's just so much faster.
  • Ian Kilpatrick: I mean, all engines have really great JavaScript engines now. They all have multi-tiered JITs. And that competition is fundamentally good. It improves everyone. I think one of the things when Blink forked from WebKit is that we said, 'Yeah, we want to introduce more diversity into rendering engines.' I think it took a while, because we did inherit all the same behaviors as WebKit, but I think we're at the place now where fundamentally Blink and WebKit are two completely different beasts, which is a good thing for the web. Going back to your earlier point: obviously, I wasn't around when browsers were being written in the late '90s, but I believe the early browsers more or less worked how you just said, in that when you resized the browser window, we basically recomputed layout from scratch for the whole thing, and that blew away the entire state. You can roughly simulate what that would feel like today if you grab the HTML element, place display: none on it, and then set display back to its initial value. So display: block or whatever. That will likely more or less wipe away the whole style, all of the layout, all of the paint, and then force us to regenerate it, so you can get a sense of how slow that would be for any incremental step. We do care about that initial bootstrapping performance, because fundamentally, that's what page loads are. If you just visit a page, we have to do everything from scratch anyway. But you're right, we spend a lot of time making sure that incremental changes to the DOM, to CSS, or whatever are fast, and we can do them in sub-millisecond-type time.
  • Brian Kardell: Yeah, I remember, maybe in the mid-2000s, reading an article that somebody wrote on the inner workings of WebKit and the introduction of bloom filters and how selector matching works and why it's important to do that. I was at the time writing a little library that did try to do the naive thing and match, like jQuery would match, with a tree walk for everything. If everything takes a tree walk, wow, you're in big trouble.
  • Ian Kilpatrick: Yeah. So that's a little bit outside of my wheelhouse. I've got a working understanding of it: you don't want to have to recalc style for the entire tree. You want to try and find the minimal set that you need to invalidate on, and that's broadly speaking what those bloom filters are used for. But the rendering pipeline has these caches and these tricks. The rendering pipeline, to make everything fast, fundamentally is an elaborate set of caches. At the style stage, we have caches to go, 'Hey, we don't need to recalculate style for this element. We know that it's not affected by this dynamic change you've done to your CSS or HTML.' It's got a whole bunch of other caches as well, to make sure that we can create styles quickly and stuff like that. In layout, we broadly speaking have two classes of caches: we have a cache for what we call the min and max content sizes, and a layout cache. Paint has a whole series of caches as well. So if we know that an element hasn't changed fundamentally, layout-wise, like its geometry hasn't changed and its style hasn't changed, we can reuse all the paint commands we've generated. Then the later stages have caches as well. So to make anything here fast, to make sure that we can do these dynamic changes in sub-millisecond-type time, the whole system is invoking a cache pretty much at every single level, and potentially multiple different caches. That's the only trick that we've got up our sleeve to make any of this remotely fast. Otherwise, we'd be behaving like '90s-era things, and we sometimes do need to fall back to that, which would be recomputing the whole world the whole time. But the rendering pipeline is an elaborate set of caches; that's all it fundamentally is.
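The "caches at every level" idea can be sketched as a memoized layout function keyed by the inputs that could change its result. The key here (an element id, a style version number, and the available width) is an invented simplification; real engines key on far more state:

```javascript
// Sketch of a layout cache: reuse a layout result whenever neither the
// element's style nor its constraints have changed. Illustrative only.
function makeCachedLayout(computeLayout) {
  const cache = new Map();
  let hits = 0, misses = 0;
  return {
    layout(node, availableWidth) {
      const key = `${node.id}:${node.styleVersion}:${availableWidth}`;
      if (cache.has(key)) { hits++; return cache.get(key); }
      misses++;
      const result = computeLayout(node, availableWidth);
      cache.set(key, result);
      return result;
    },
    stats: () => ({ hits, misses }),
  };
}

const engine = makeCachedLayout((node, w) => ({ width: w, height: node.lines * 16 }));
const node = { id: 1, styleVersion: 1, lines: 3 };
engine.layout(node, 800); // miss: computed from scratch
engine.layout(node, 800); // hit: nothing changed, reuse the geometry
node.styleVersion = 2;    // a dynamic style change invalidates the key
engine.layout(node, 800); // miss again
```

The interesting engineering is not the lookup but the invalidation: deciding precisely which dynamic changes are allowed to reuse which entries.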
  • Eric Meyer: Right. And you're trying to get all of this done in sub-millisecond time because what is the rendering speed, or sorry, what is the frame rate that browsers basically insist on?
  • Ian Kilpatrick: Yeah. I mean, this sort of gets into what we're optimizing for. It's really, really difficult, because fundamentally we have a very, very broad spectrum of devices that we care about. So you take your high-end desktop machines: you might have multiple CPUs, multiple cores, huge amounts of RAM, and relatively big screens as well, so your buffers for the actual graphics that you need to draw need to be quite large too. And also a good internet connection. And then all the way down at the low end, you've got these Android devices, which might have two cores, a gigabyte of RAM, an anemic L1 and L2 cache, a pretty bad network. But a lot of these low-end devices also have relatively high-density screens, so they need a lot of GPU RAM to actually hold all the pixels in memory. So basically, our target is to be as fast as possible, but we can't target, 'Hey, we always try to be at 60fps,' because it fundamentally depends on what device and in what scenario, and there are trade-offs that we have to make all the time. One thing that might be good to talk about is that performance is actually quite multifaceted, in the sense that we think about performance as how fast something can go. The traditional thing we think about, to bring up a car analogy, is like, 'Oh, my car can do 0 to 60 in four and a half seconds and this other car can do 0 to 60 in four seconds, so car B is obviously better.' That's what I call raw throughput performance: how much work can you get done in some amount of time? But there's a whole bunch of different things that you can measure there. So what's your raw throughput performance? How fast can you do layout? How fast can you do style recalc? And the artificial benchmark here is MotionMark, for example, and that is really, really good at just measuring this raw throughput performance.
That raw throughput performance is complex, because it doesn't necessarily correspond well to what web developers care about. If you are a web developer working on a website, then depending on what you're building, you might care about raw throughput performance, if you're animating something on the screen and whatnot. But a lot of the time, you might just care about: my user has tapped on a button; how quickly can I refresh the DOM and produce the frame so it doesn't feel chunky, for example? And then there's a class of performance problems which are performance cliffs, and this is what I find web developers care about most.
  • Eric Meyer: What's a good example?
  • Ian Kilpatrick: A good example of this is: I use some feature and my site has gone from performing layout in, say, 50 milliseconds to 500 milliseconds, which happens. And that is what I find most web developers get most angry about, for lack of a better word, and rightfully so, because they've tried to build something, and they don't really care if some action has taken 20 milliseconds versus 30 milliseconds, even though the 20 milliseconds is 30-odd percent quicker. But you really do care if that 30 milliseconds has gone to 300 or 600 milliseconds. That's broken what you can achieve. So it's roughly those two buckets: performance cliffs, which are performance bugs, and then raw throughput performance. But when we talk about performance, we often only focus on throughput performance, when performance cliffs are arguably equally as important.
  • Brian Kardell: Can I lob a grenade up in the air for you?
  • Ian Kilpatrick: Sure. Go for it. Yep.
  • Brian Kardell: Okay. So I know you and I have actually talked about this, so I want to give you an opportunity to talk about it here. If you open the HTML Living Standard, single-page edition, it is, I don't even know, I mean, it's in the tens of megabytes at least of just basically HTML source code. It doesn't really contain images or anything. The CSS is remarkably simple. Everything is geared toward loading as fast as it can. In Firefox, everything feels done and interactive in about two seconds, but in Chrome, not so much.
  • Ian Kilpatrick: Yeah.
  • Brian Kardell: So why? Because Chrome is so fast, it's so full of these tricks. Like, why? This is one of those things that is sort of mind-blowing to think: how could that be?
  • Ian Kilpatrick: Yeah. There's a fundamental trade-off that the engines have made here. We did have a very severe bug here, which I'll get into, but for me at least, and this might be different for you, Brian: Chrome will push pixels to the page faster than Firefox, but Firefox will be completely done quicker. The reason for that is that Chrome will basically go, 'I've consumed X amount of bytes of HTML source. I'm going to pause there and force a frame,' basically, to push something to the screen so that the user knows, 'Oh, hey, I'm on the HTML standard's site,' for example. But by doing that, we've created more work for ourselves later, because when we add more HTML, we've got to then trigger a style recalc, a relayout, repaint everything. So we've got more work to do later, if that makes sense. We may have pushed pixels to the screen faster, but we've increased our work further down the line. Whereas Firefox, I believe, will basically wait until it's received the whole thing and then go, 'Right, I'm just going to do this in one lump.' So the trade-off there is that Firefox may not push pixels to the screen as quickly, but it feels interactive quicker, if that makes sense. We did have a really, really severe bug in Chrome where the HTML parser would yield constantly. I think through the whole HTML spec, it would yield a hundred times. And so rendering the HTML spec in Chrome would take 20 seconds to be completely done, and Firefox would be on the order of four or five seconds. That was a bad performance bug on our behalf. We've since fixed that, and I think we yield two or three, maybe four times now. So I think we're in the realm of, again, it depends on your class of hardware, but I think we're roughly within 10 or 20% now. So it's complicated. There's a trade-off.
  • Eric Meyer: Yeah. You said that the HTML parser kept yielding; what exactly does it mean to have the parser yield, and what is it yielding to?
  • Ian Kilpatrick: Yeah. So it's yielding to the rendering engine. So pretend that you are receiving bytes of the HTML page from the network and we've only received 20%, and, this is conceptually how it works, just pretend for a moment that the network hasn't delivered any more bytes. We don't want to wait for that network connection to finish giving us the rest of the bytes, because that could take seconds, ages. So instead, what we'll do is go, 'Okay, we've got something. We've got some content. Let's yield, give it to the rendering engine to actually try and render this page.' It'll be incomplete HTML, but we can go, 'Hey, we've built up some DOM. We can recalc style, we can do some layout, we can paint this.' And then later on, there might be more content added in. So this isn't happening at the network level for us; this is in our HTML parser. We'll go, 'Hey, we think we've got enough useful content that it's better for the user to push some pixels earlier.' But there are some sites that will do this deliberately: they'll send the first kilobyte of HTML source immediately, and then some additional parts will come in, and they'll flush those over the network connection later. So that's a trick that some sites can do, for example. All of these little decisions, like how often you yield, what your policy is, matter a lot at the extremes. And the HTML spec is an extreme. I've talked with Brian about this: the HTML spec is an extreme case, but because of that, it's good for measuring some things and not good for measuring other things. It's not a representative page. It is an extreme case, and we did have this performance cliff where the parser was yielding to the rendering engine way too often, and so it would be a lot slower. But yeah, it's complicated.
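The yield policy Ian describes can be sketched as a parser that buffers incoming chunks and decides, per chunk, when to hand what it has to the rendering engine. The byte threshold is invented for illustration; real heuristics are more involved:

```javascript
// Sketch of a parser yield policy: yield the accumulated document to the
// rendering engine every `yieldEveryBytes` bytes, plus once at the end.
function* parseWithYields(chunks, yieldEveryBytes) {
  let buffered = '';
  let bytesSinceYield = 0;
  for (const chunk of chunks) {
    buffered += chunk;
    bytesSinceYield += chunk.length;
    if (bytesSinceYield >= yieldEveryBytes) {
      // Yield to the rendering engine: it can build DOM, recalc style,
      // lay out, and paint this still-incomplete document now.
      yield buffered;
      bytesSinceYield = 0;
    }
  }
  if (bytesSinceYield > 0) yield buffered; // final, complete document
}

const chunks = ['<p>a</p>', '<p>b</p>', '<p>c</p>', '<p>d</p>'];
// A small threshold yields often: pixels sooner, but more rework later...
const eager = [...parseWithYields(chunks, 8)];
// ...a large threshold yields once: pixels later, but less rework.
const patient = [...parseWithYields(chunks, 1000)];
```

Each extra yield means another style recalc, layout, and paint over content that will be redone once more HTML arrives, which is exactly the trade-off behind the hundred-yields bug.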
  • Eric Meyer: One of the things that I've heard a lot about in CSS Working Group discussions, or at least in the past, maybe this is no longer the case, but in the past there was an overwhelming fear of multi-pass layout, and in the early discussions around Flexbox and grid, it seemed like every third point of discussion was about how many passes it would require to do a thing. I mean, I think most of us can intuitively grasp: oh, if you have to go through something more than once to figure out how to lay it out, of course that means it will take longer to lay out. But how bad is that these days? Have the improved engine speeds made that less of a concern, or is it still just as much of a concern as it was back then?
  • Ian Kilpatrick: It is still a concern. It's less of a concern for us now. So, the problem with multi-pass layouts. To back up a bit: take Flexbox, for example. If you've got a row of flex items, you need to do one layout initially to work out how tall everything is going to be. And then you do a second layout which basically stretches all of those items to be the size of the largest item. So they all look the same size, but none of them have chopped-off content.
  • Eric Meyer: So you go through everything to just figure out what is the sort of minimum height and width, and then once you've done that for everything in the row, then you go back through and say, 'Okay, well the tallest one is this many pixels tall, so we got to stretch all the other heights to match that.'
  • Ian Kilpatrick: Yep, yep, exactly right. Exactly right.
  • Eric Meyer: Okay. Cool.
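The two-pass row sizing just described can be sketched in a few lines. This is an illustrative simplification, not the actual flex algorithm:

```javascript
// Pass one measures each item's natural height; pass two stretches
// everything to the tallest item in the row.
function layoutFlexRow(items) {
  // Pass 1: measure each item at its natural size.
  const measured = items.map((item) => ({ ...item, height: item.naturalHeight }));
  // Pass 2: stretch every item to the height of the tallest one.
  const rowHeight = Math.max(...measured.map((item) => item.height));
  return measured.map((item) => ({ ...item, height: rowHeight }));
}

const row = layoutFlexRow([
  { name: 'a', naturalHeight: 40 },
  { name: 'b', naturalHeight: 120 },
  { name: 'c', naturalHeight: 75 },
]);
```

The expensive part in a real engine is that "measure" and "stretch" are each full layouts of the item's subtree, which is what makes nesting dangerous, as Ian explains next.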
  • Ian Kilpatrick: And so pretty much every layout mode has a multi-pass component to it. It's just a question of how often you can hit it. So block layout has one that's a really, really rare edge case that I won't get into. Tables have this. Flexbox was the first one that was using this by default, and grid has it in spades, is the long and short of it. The reason that multi-pass layouts are tricky, or were bad five years ago, for example, is that when you nest them, the time complexity goes exponential. So imagine you've got a Flexbox, and that flex item is also a Flexbox, and turtles all the way down. You do one layout to measure that Flexbox, and that Flexbox will do a measure, measure, measure, measure, measure, all the way down. And then you go, 'Great, I figured out I need to be this tall,' for example. And then you stretch that Flexbox, and then that inner Flexbox will go, 'Oh great, I've got this new height constraint. I need to remeasure all of my children.' And so then you need to remeasure that whole subtree, and wash, rinse, repeat. And so this goes into exponential territory super, super quickly if you don't cache anything, which is an important caveat. You've got, say, 20 elements on a page, and that can then take 10 seconds to lay out, easily, on a top-tier desktop machine. And the important thing there is that rendering engines need to cache each of those two phases correctly. So we need to cache the result of measuring something, and then we need to also cache the final layout of something. Previously, we didn't really have to worry about this as much. This was the novel thing with flex, in that we had to cache each of those two measure and layout phases separately. So each layout mode brings some new interesting performance constraint. The trick, which is why it was kind of fine for Flexbox, is that as long as you don't have any children that have percentage heights or anything like that, it's very cheap to stretch something height-wise.
And that's a super common case, so you can do that optimization. We had this bug in grid for quite a while where you just nest a whole bunch of grids and we could spend seconds in layout very, very easily. We introduced this new cache that would allow us to cache these individual layouts separately. We had a very, very specialized cache for Flexbox previously, but it didn't generalize to grid, and basically we just needed to generalize it some more. So each new layout mode brings new performance challenges. We also had this bug previously for Flexbox, before we moved it over to our new engine, where if you constructed a Flexbox with percentage heights nested all the way down, you could spend tens of seconds, minutes, laying out a page with 20 or 30 elements, type of thing. So that's a good example of a performance cliff. It's interesting, because it's very easy to go to the extremes of, it's taking seconds, but even if you're only using, say, three or four nested grids, this can hurt your users pretty substantially. The additional trouble here is that previously, we would try and go, 'Oh, the measure pass that we've done here is identical to the layout pass, so we don't have to do this secondary layout,' which would save a bunch of time. But if you skip that second layout pass when you should have done it, that can often result in correctness bugs. I remember we had this Flexbox regression chain that went on for 10 or 12 Chrome releases, where we would fix some correctness issue, and that would cause a performance problem on some site that had a particular structure, because we were doing the two layouts and it was causing exponential layout. Then we'd fix that performance issue, and that would cause a correctness issue. It was just a trail of tears, 10 or 12 releases long. So it's very, very tricky to get right if you're inspecting this stuff manually.
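The exponential blow-up from nesting can be made concrete by counting layout computations for a chain of nested boxes, where each parent lays out its single child twice (measure pass, then final pass). A sketch, not real engine accounting:

```javascript
// Without caching: each level does two full child layouts, so cost
// roughly doubles per level of nesting. f(d) = 1 + 2*f(d-1) = 2^(d+1) - 1.
function uncachedLayouts(depth) {
  if (depth === 0) return 1;
  return 1 + 2 * uncachedLayouts(depth - 1);
}

// With a cache keyed on the layout inputs, the second pass reuses the
// measure pass's result, so every node is computed exactly once.
function cachedLayouts(depth) {
  return depth + 1;
}
```

Twenty levels of nesting is over two million layouts without the cache and twenty-one with it. The catch, which is exactly the correctness-versus-performance tension Ian describes, is that the cached result is only reusable when the second pass's inputs really match, which percentage heights can break.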
  • Eric Meyer: Wow. And you have all of those problems; just intrinsically trying to lay out the page is clearly difficult enough as it is. And we haven't even gotten into things like formatting lines of text, the complexity of which still completely boggles me every time. But it's not a web purely of documents anymore. There can be stuff watching things change from the scripted side, and then that might trigger something. And so you have to maintain your performance while also allowing people to do all this crazy dynamic stuff.
  • Ian Kilpatrick: Yeah, exactly. I mean, that was one of the big design considerations of ResizeObserver. ResizeObserver does allow for a class of things which we didn't allow previously. And ResizeObserver will purposely keep stepping down in the page.
  • Eric Meyer: What does that mean?
  • Ian Kilpatrick: So for example, if you are observing the size of, say, the HTML element and the body element, and we end up resizing both of them, the first phase of ResizeObserver will deliver two resize notifications. We'll go, 'Hey, your HTML element changed, and then your body element changed.' And then you go, 'Great, I'm going to do something, like change the DOM, something, something.' On the next pass, even if you've changed both the HTML element and the body element, we'll only notify you of the body element changing, because we have to exit out of these resize observations at some stage. And we do want to allow you to go, 'Oh, something above me changed substantially. I need to rechange my DOM completely. I became smaller. I need to remove a whole bunch of buttons because I can't fit them anymore.' So that was the design consideration there. We do want to exit at some point. We can't just continually spin there forever; people will write infinite loops. But we want to allow you to respond to some other ResizeObserver changing the DOM substantially. So that was the design consideration of ResizeObserver. Text rendering I can get into as well. It's very counterintuitive how it works, and all the design considerations there.
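The stepping-down behavior Ian describes can be sketched as a delivery loop whose depth limit ratchets down: each iteration only delivers notifications for elements deeper than the shallowest one delivered last time, so the loop must terminate even if callbacks keep resizing things. This is a simplified model of the idea, not the spec's actual algorithm:

```javascript
// Sketch of a ResizeObserver-style delivery loop with a depth limit.
// `getResized` returns the currently resized elements; `callback` is the
// observer callback, which may cause further resizes.
function deliverResizeObservations(getResized, callback) {
  let depthLimit = -1; // deliver everything on the first pass
  const delivered = [];
  for (;;) {
    const batch = getResized().filter((el) => el.depth > depthLimit);
    if (batch.length === 0) break;
    delivered.push(batch.map((el) => el.name));
    depthLimit = Math.min(...batch.map((el) => el.depth));
    callback(batch); // may resize more elements; shallower ones are skipped next pass
  }
  return delivered;
}

// Simulate Ian's example: html (depth 0) and body (depth 1) both resize,
// the callback resizes both again, but the second pass only reports body.
let pass = 0;
const resized = [
  [{ name: 'html', depth: 0 }, { name: 'body', depth: 1 }],
  [{ name: 'html', depth: 0 }, { name: 'body', depth: 1 }],
  [],
];
const log = deliverResizeObservations(() => resized[pass], () => { pass++; });
```

Running this, `log` records `html` and `body` on the first pass but only `body` on the second, matching the behavior described above.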
  • Eric Meyer: One of the things that occurs to me here is that you talked about how the design of ResizeObserver had to take all of this stuff into account: what the algorithm is for how it behaves, what it looks at, what it ignores, I guess, in some cases, and what it tells you. But there's a whole lot of stuff, Brian was mentioning this the other day when we were talking, there's a whole lot of stuff that was not designed with this in mind. Old features of the web that were added in before this was really the kind of concern that it is now. Are there examples of legacy web features that had to be rethought or changed or otherwise worked around because of these kinds of considerations, especially as new layout modes get added?
  • Ian Kilpatrick: Yeah. I don't think that there was too much in layout specifically, or really anything fundamental in the rendering engine. A lot of the time, the sort of original multi-cache thing was kind of tables, where you need to cache the min-content size and the max-content size separately from layout. That gets really important for tables. But yeah, I don't think there was too much that was... I mean, at least in my experience; I know there were other things, obviously, in the web. I think, Brian, you might have mentioned this earlier, that document.write is a really, really nasty API for throughput performance. And what was the API where you could pause script by sending up an alert dialogue? Was it just document.alert or something like that?
  • Brian Kardell: Yep. Just alert.
  • Ian Kilpatrick: Yeah, just alert.
  • Brian Kardell: Alert prompts any of those, they just stop everything.
  • Ian Kilpatrick: Stop everything, yeah, stop the world. So broadly speaking, synchronous JS APIs that do non-trivial amounts of work are pretty bad. But we do allow that on workers. A good example of this is the Atomics API in script. You've got some SharedArrayBuffer. You can listen for a change on that ArrayBuffer and then get notified when it does. I think it's Atomics.waitAsync or something like that. But on workers, there's a synchronous version of this, and that's purposely not available on the main thread, because someone will create some script that will be bad for user experience, that will just block the main thread for a long period of time. And that's something that we don't want users to experience. That one's a little bit contentious, because there are people who will argue that there are use cases for synchronously blocking the main thread, but it sort of comes down to what your priorities are.
  • Brian Kardell: XMLHttpRequest also had a synchronous mode, if you remember that.
  • Ian Kilpatrick: Oh, yeah, I remember that.
  • Brian Kardell: Yeah. And there were a lot of people who argued that we needed that and there were useful things we could do. I wrote a very simple module system for JavaScript using that, and it is very easy to reason about if you do it that way. It's not exceptionally performant, but it's very easy to think about. The logic is very simple and everything.
  • Ian Kilpatrick: Yeah, at every turn, there's always something that you can do with these synchronous APIs. I think my position would be: yes, but at what user experience cost? Someone will always write something. The web is amazing. The web is a tapestry of different ideas, and someone will always write that thing that you didn't expect. You can say in the spec, 'Hey, don't nest grids more than three times or you're going to have bad performance problems,' and then someone will nest them deeper. So we always have to design for that. A good recent example for us is we had this artificial limit on the number of rows and columns you could have in a grid. We limited it to a thousand. And this is complex, because previously, we expanded all of those rows and columns out in memory. So we'd have a thousand slots in some array representing each one, and traversing over those thousand elements takes time. We recently now basically allow uint max, whatever that is.
  • Eric Meyer: Right, 2 to the 20th or whatever.
  • Ian Kilpatrick: Yeah, exactly. And the reason that we allowed that is because the Microsoft folks changed the representation. So instead of expanding it all out in memory, we'll store them as compressed repeaters. So we'll say, 'Hey, there's a thousand tracks of one pixel size, and there's another one of 20, which is where the grid item is, and then there's the remaining however many million of one pixel size.' So we only have to iterate over three things instead of X million. Because we had bug reports. We limited it to a thousand, and then we consistently had bug reports of, 'Hey, I need 100,000.' And we're like, 'Sorry.' And I think now that-
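The representation change Ian describes is essentially run-length encoding: instead of materializing millions of track slots, store (count, size) runs and walk the runs. A toy sketch with invented names and sizes, nothing like Blink's actual data structures:

```javascript
// Expanded form would allocate one array slot per track: a million
// entries for repeat(1000000, 1px). Compressed form stores runs of
// identical tracks instead.
const tracks = [
  { count: 1000, sizePx: 1 },    // e.g. repeat(1000, 1px) before the item
  { count: 1, sizePx: 20 },      // the track the grid item sits in
  { count: 999000, sizePx: 1 },  // the remaining tracks
];

// Total track count and total size fall out of a walk over 3 runs,
// not 1,000,001 individual slots.
function totalTracks(runs) {
  return runs.reduce((n, r) => n + r.count, 0);
}
function totalSizePx(runs) {
  return runs.reduce((n, r) => n + r.count * r.sizePx, 0);
}
```

Any per-track loop that only depends on the run (sizing, summing, searching) costs O(runs) rather than O(tracks), which is what makes huge track counts affordable.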
  • Brian Kardell: But it's pretty easy to say, that seems unreasonable. That seems really unreasonable. And then people show you and you go, boy, I mean...
  • Eric Meyer: Dang. No, it's reasonable.
  • Brian Kardell: Maybe it's not as unreasonable as I think it is.
  • Ian Kilpatrick: Yeah, exactly. So I think now Firefox has the lowest limit of all of the browsers, and so now they have bug reports of, 'Hey, this is not enough. Chrome supports larger.' So there's always that frustration: developers will run up to some limit and complain to whichever browser has the lowest limit. So we'll get there slowly.
  • Eric Meyer: And then in all of that, to come back to a point that I sort of diverted this away from, you got to render text.
  • Ian Kilpatrick: Yeah. So text is amazing. Text is pretty counterintuitive. So a good example of this is, say you've got the width of 'foo', for example, and then you've got some width of 'bar'. What would be the width then of 'foobar'? Would it be smaller, the same size, or larger?
  • Brian Kardell: Let me ask a question just to clarify that mentally. So when you say the result of foo and bar, you're talking about rendering them sort of smashed together with no spaces or anything or just-
  • Ian Kilpatrick: Yeah. So we can get into spaces later, so let's say just with no spaces, for example.
  • Brian Kardell: Okay, so you just flow into one another.
  • Ian Kilpatrick: Just flow into one another. Intuitively, you're just like, oh, the width of 'foobar' is going to equal the width of 'foo' plus the width of 'bar'. We're done. But the answer is it can be smaller, it can be larger, it can be the same size. We don't know. We have to invoke text shaping to actually know what the final size is going to be.
  • Brian Kardell: How could that be smaller?
  • Ian Kilpatrick: So you might have some kerning that will mean the O and the B overlap slightly.
  • Eric Meyer: Letter spacing minus one px.
  • Ian Kilpatrick: Yep. So you might have some kerning in the font table or something like that. The other thing that might happen is you might have something in your ligature table. So the font that you're using might have something special where it's like, oh, the O and the B have some special glyph I want to render them as. And that could be wider or smaller than the O and B glyphs smooshed together separately, if that makes sense. So it gets very complex, and this is just Latin-type fonts. There's a whole separate conversation about more complex fonts. Arabic, Thai, is super, super complex. There was a really good talk done by a native speaker, someone who understands the Arabic script, on just the basics of how complex even basic words can get. And you'll add some character to your string and then the whole word will change effectively. So text is very, very tricky basically, and then you get into spaces and you think-
  • Brian Kardell: We also have mathematical text, I suppose.
  • Ian Kilpatrick: Yeah. It's tricky. Again, there's all sorts of offsets that you want to do to get things to look correct to the eye. So then you get into like, 'Oh, surely spaces are fine. So if I just cache the width of the words separately, the space is always going to be so many pixels wide.' Spaces can also be in ligature tables, and spaces can also have kerning. And so it gets very complicated very quickly still.
  • Brian Kardell: Would it be helpful to define the words that you're using?
  • Ian Kilpatrick: Yeah, potentially.
  • Brian Kardell: Ligature table I think is not a thing that most... Yeah.
  • Ian Kilpatrick: So a good example of this is when you type into your favorite editor, browser, whatever, in a lot of fonts, 'fi' will get replaced with a different glyph. So a different rendering of the actual characters, and they'll get rendered as one unit together. They'll be smooshed, where the I will sort of fit very, very closely underneath the F, and the I dot will disappear and stuff like that. So when I say glyph, I mean something that the text engine has outputted that may represent multiple characters. The fi ligature is the common one that you'll see in Latin text, but depending on the font, there could be a whole bunch of them. So when I say ligature table, substitution table, it's going, 'Hey, I've got the F and the I characters; instead of rendering the F glyph and then the I glyph, I need to look at the fi glyph instead.' And this is one of many things; fonts are a whole subspecialty that you can spend an absolute lifetime on. We're fortunate, we're super fortunate, that we use HarfBuzz, and Behdad, who maintains that; it's an exceptionally good library. So there's all these subtleties in text. The naive way that you'll build up something that produces text is to go, okay, I've got some width available to me, say 100 pixels. I'm going to keep on adding words to the line until I hit that 100 pixel limit or something like that. But you might have things like spaces that could be substituted into one glyph. You might have complex kerning, a whole bunch of other stuff. The process of converting characters and fonts into the final widths, into what glyphs you use, is called shaping. There's other steps before that, but let's just call it shaping for the moment. And the only way to do it safely is you give the shaping engine the whole content every single time you want to measure it, because you just don't know what the font may request. So there's optimizations that you need to do.
If you do it that way, you can do it that way; it can work. There's a whole bunch of checking that you need to do to make sure that you're not breaking any of these invariants. Blink sort of does it in a kind of interesting way in that we will work backwards, which is a little bit counterintuitive. So we will lay out the whole paragraph of text. We'll give the whole paragraph to the text shaper as one single line and we'll go, just measure this whole thing. It's probably not going to fit, but we don't care. And then we'll go, great, you're 5,000 pixels wide or whatever. And then we'll search for what's called a break opportunity, so a place where we can break the text onto the next line, and go, okay, we'll pretend that we'll break here. Then we give that subsection from the start to that break point to the shaping engine. And the counterintuitive thing is that with that break point, the text might increase in size again; it still might not fit. And so we continually walk backwards until we find something that will fit, or we're at the first break opportunity. So yeah, that's a very, very simplified version. Text layout is counterintuitive and probably doesn't work how you'd expect, because it gets complicated with languages. For example, Thai. Thai does have spaces, but they don't separate what we think of as words. They separate phrases instead, for example. And a very simple text rendering engine, which just splits everything on spaces, wouldn't render Thai correctly, for example, because it would be separating subsentences instead of actual words. Then you've got the CJK class of scripts where, obviously, each word, quote, unquote, is 'one single glyph'. Text is super complicated. You can spend a lifetime on this sort of stuff. But I should caveat all of this with: I've got a very, very high level understanding of how fonts and text rendering works.
There are definitely better experts that you can get on the show to talk about how it all works. But yeah, it's very, very complex to get these complex scripts, and we call them complex scripts because they're complex, scripts like Arabic and Thai, to have fast throughput, throughput equivalent to, say, Latin text, which is sort of what we index on a lot of the time, as a lot of us come from Latin backgrounds. I should write a blog post at some point on just the 10 falsehoods that programmers believe about text rendering and go into all of this in detail. Actually, someone else should write that. It'll be much easier that way.
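Ian's two points, that shaped width isn't additive and that Blink walks backwards through break opportunities, can be sketched together with a toy shaper. Everything here (the advance widths, the kerning pair, the ligature, the function names) is invented for illustration; a real engine delegates shaping to a library like HarfBuzz and uses Unicode line-breaking rules, not bare spaces:

```javascript
// Invented font data: per-glyph advance widths, one kerning pair,
// and one ligature substitution.
const ADVANCES = { f: 5, o: 6, b: 7, a: 6, r: 5, i: 4, fi: 8, " ": 4 };
const KERNING = { ob: -1 };      // 'o' followed by 'b' tucks in 1px
const LIGATURES = { fi: "fi" };  // 'f'+'i' substitutes to one glyph

// Toy shaper: substitute ligatures, then sum advances plus kerning.
// Because of these tables, shape(a) + shape(b) need not equal shape(a+b).
function shape(text) {
  const glyphs = [];
  for (let i = 0; i < text.length; i++) {
    const pair = text.slice(i, i + 2);
    if (LIGATURES[pair]) { glyphs.push(LIGATURES[pair]); i++; }
    else glyphs.push(text[i]);
  }
  let width = 0;
  for (let i = 0; i < glyphs.length; i++) {
    width += ADVANCES[glyphs[i]] ?? 0;
    if (i > 0) width += KERNING[glyphs[i - 1] + glyphs[i]] ?? 0;
  }
  return width;
}

// Backwards line breaking, simplified: measure the whole text, and if it
// overflows, walk the break opportunities (spaces here) from the end,
// re-shaping each candidate prefix, because breaking can change the
// shaped width. If nothing fits, take the first break opportunity.
function firstLine(text, availablePx) {
  if (shape(text) <= availablePx) return text;
  const breaks = [];
  for (let i = 0; i < text.length; i++) {
    if (text[i] === " ") breaks.push(i);
  }
  for (let i = breaks.length - 1; i >= 0; i--) {
    const prefix = text.slice(0, breaks[i]);
    if (shape(prefix) <= availablePx) return prefix;
  }
  return breaks.length ? text.slice(0, breaks[0]) : text;
}
```

With these invented tables, shape('foo') + shape('bar') is 35 but shape('foobar') is 34 because of the kerning pair, and shape('fi') is narrower than shape('f') + shape('i') because of the ligature: the non-additivity Ian describes, which is why each candidate prefix has to be re-shaped.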
  • Brian Kardell: There's a thing that is in our show notes that I thought was interesting. One of the things that's nice about the web is that it really changed our design thinking. We were used to designing for really fixed things, and then the web came along and it said no. I mean, the nature of the web is fluid. You have to break the fluid nature. I mean, it took a long time until we got responsive design and things like that. There's two aspects of that that are kind of interesting. I don't know if you can speak to them or not. So one is when we introduced media queries. And that is interesting because it introduced a thing where at some point in time, you sort of shove a whole new style sheet into the calculation or pull it out. I expect that that means you have sort of no choice but to recompute the world. But the other one is that one of the things that makes this all really possible is scrolling: if you don't have enough room, this is how much room you get, and then you scroll. And I think it was you that added a note about auto scrollbars, so I want to give you the opportunity to speak about that.
  • Ian Kilpatrick: Yeah, auto scrollbars are the bane of most layout engineers' existence. They're awful. We do some very, very dirty hacks to basically get them to be fast and stable. So the long story short is that when you lay out a box, you don't know if you need a scrollbar or not. And so the thing that we typically do is that we'll assume that no scrollbar exists. So we'll lay out our content and we'll go, great, we've laid out our content and, oh no, it overflows. So we need to add a scrollbar. And so now, because we've added a scrollbar, we need to go back to the start of layout and start everything again, because the width of everything will have changed by default. And then we go through, and there is a possibility that then you decide, oh no, we need the other scrollbar. So we need the scrollbar at the bottom, for example. And then you need to start layout again. And then there's subproblems: you can get into a state where you'll add that right scrollbar, for example, and then on that second layout pass, you don't need it anymore. So what do you do? Because you're in this state where you both need a scrollbar and don't need it, which is very, very counterintuitive. So this is a quintessential multi-pass layout that I don't think any engine caches particularly well. Having auto scrollbars on everything is super popular in enterprise-type use cases and can trigger very, very poor performance. The way that you can sort of mitigate this is the CSS Working Group has added a few properties like scrollbar-gutter: stable, which will reserve space for scrollbars on each side, like extra padding. I should also say that the auto scrollbar problem only exists when you've got your OS system settings set to always show scrollbars. So this broadly isn't a problem for overlay scrollbars, which are more and more the default these days on OSes.
So we don't have this problem on Android, and by default on ChromeOS or macOS, for example, and I think by default on Windows 11. I suspect because the OSes also hate auto scrollbars for this reason. But auto scrollbars can get into super complexity really, really quickly. And it's very, very tricky to get correct and performant. The other thing that can happen, and this gets into a little bit of an advanced topic, is that the intrinsic size can change, because a scrollbar is kind of like border or padding, so it can widen your boxes. It's a whole separate conversation of how that should work, and it gets complicated very quickly.
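The multi-pass dance Ian describes can be sketched as a fixed-point loop: lay out assuming no scrollbars, check for overflow, reserve a scrollbar, and lay out again with the reduced space, bounded because reserving one scrollbar can newly require the other. All names and numbers here are invented; in a real engine the content itself also reflows when the inner width changes, which is where the need-it/don't-need-it flip-flopping comes from:

```javascript
const SCROLLBAR_PX = 15; // classic (non-overlay) scrollbar thickness, invented

// Toy layout: content has a fixed intrinsic size; reserving a scrollbar
// shrinks the box's inner space, which can newly require the other
// scrollbar. Iterate to a fixed point with a bounded pass count.
function layoutScrollbars(box, content) {
  let needV = false, needH = false, passes = 0;
  while (passes < 4) { // a real engine must also cap its passes
    passes++;
    const innerW = box.width - (needV ? SCROLLBAR_PX : 0);
    const innerH = box.height - (needH ? SCROLLBAR_PX : 0);
    const nextV = content.height > innerH;
    const nextH = content.width > innerW;
    if (nextV === needV && nextH === needH) {
      return { needV, needH, innerW, innerH, passes };
    }
    needV = nextV;
    needH = nextH;
  }
  // Didn't converge: pick a policy, e.g. keep both scrollbars.
  return { needV: true, needH: true,
           innerW: box.width - SCROLLBAR_PX,
           innerH: box.height - SCROLLBAR_PX, passes };
}
```

With a 100×100 box and 95×120 content, pass one finds no scrollbars, pass two adds the vertical one, and pass three discovers the narrower inner box now also needs the horizontal one: three layout passes for a single box, which is why engines dislike this and why reserving the gutter up front helps.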
  • Brian Kardell: Performance is an optimization problem, right? So you can write something that's technically correct and will get the job done, but it might take way too long. I think one of the things that's interesting to think about here is that we're designing a standard that's being implemented in multiple different engines, which have their own architecture and baggage and past and future plans. I have heard this, I've been in discussions before where an idea gets posited like, what if we did this in the standard? And different engines can have different opinions because it's not the same ask of all engines, right?
  • Ian Kilpatrick: That's right.
  • Brian Kardell: It's a different ask. Yeah.
  • Ian Kilpatrick: Yeah. I mean, that's sort of why, in Blink, we had this project called LayoutNG, which basically rearchitected our layout engine, because we were effectively running into all of the same problems. The long and the short of it is that we had these exponential performance blowout costs with Flexbox and grid that were really, really tricky to get a handle on, evidenced by that 10-to-12 regression chain in Flexbox that we had. It was really, really difficult to add new features as well. And then some of the asks that we saw were just like, 'Oh boy, that's going to take years' worth of work to get a quality solution.' We could do a short-term thing that's kind of buggy, that's 70% of the way there, but we were really, really constrained by our fundamental architecture at that point in time.
  • Brian Kardell: Yeah. People can relate to it being much more difficult to add a certain concept into one code base than another. And just thinking about the standards versus the engines, and why sometimes we don't do things, or why they take longer because we have to rearchitect, because we're like, 'Boy, there's so many good ideas here, but we're not prepared for this and we can't consider it right now. But boy, we sure would like to.'
  • Ian Kilpatrick: Yeah. And we're also constrained by our past, by features that were sort of bolted on that didn't necessarily have the right underlying architecture. So a really good example of this for us was vertical writing modes, which was effectively bolted on to the layout engine that we had relatively quickly, and didn't think about all of the nuances that are involved with vertical writing modes. I can get into that, but it's complicated. And so with our rearchitecture, writing modes were going to be a first-class citizen. We're not going to do any of the sort of hacks that we had to do previously to get it to work. And so for us now, vertical writing modes are trivial. Each engine has a different past. We found, at least, that when you try and quickly bolt something on versus doing a partial rearchitecture to rethink something, you might get there quicker, but it might be a lot more buggy than what you'd likely desire. And then you might have to do a whole bunch of fixes later, or continued fixes, to get it up to the standard that developers expect. So each engine has their own constraints, their own resources, their own history and technical debt, such that one engine might go, hey, this feature is trivial to implement, we can do it in six months, and then another engine might be sitting there just going, oh boy, that's a two to three year project for us. For us personally, we were fortunate in that we had the right team. We had the right amount of time that we could sort of invest in this. It was a hell of a lot of work, but we got the performance and the correctness gains that we expected out of it. One really nice thing that we saw consistently is that when we switched each layout mode to the new architecture, effectively the bugs in that area basically dropped by half. So if you think about that, instead of an engineer spending, say, three to five days on a bug, multiplied by 200 bugs, that's years' worth of work.
We found it almost cheaper counterintuitively to rearchitect the whole thing, and then we just found that we closed all of these bugs.
  • Eric Meyer: Lots and lots of difficulties.
  • Ian Kilpatrick: So many difficulties. Yeah. I've got so many gray hairs from all the different stuff we've run into previously.
  • Brian Kardell: Because you're only as fast as your weakest link. Do you ever run into a change where something either upstream from you or downstream from you, not in the open source sense but in the pipeline sense, is too slow or too greedy, and then as it improves, now you're the problem?
  • Ian Kilpatrick: Yeah, a little bit, but not as much as you would expect. It's sort of interesting because occasionally, there are developers that are pretty aware of what the slow parts of your engine are. And so the bug reports that we get from these web developers are really, really high quality. They're fantastic. They'll go, 'Hey, I've got these thousand elements and with this style, it's super slow, but if I change it to this, it's gone away. I've worked around it, but you should probably fix this at some point.' We're like, 'Yep, we should probably fix that. That's kind of bad.' We find that web developers will discover these performance cliffs. And this is why, personally, I spend a lot of time thinking about performance cliffs versus raw throughput, because this is what developers often run into: they will come up with some mitigation to work around your bug effectively. There are times where another part of the engine is causing some problems for you. So the classical example here is that HTML spec thing where the HTML parser was just yielding way too often. And this came up as a layout bug, because the vast majority of time, you're just spending in layout, and people were asking, 'Hey, we're spending far more time in layout in Blink versus Firefox. Why is your layout engine slow?' And then we dug into it and we're just like, 'Well, it's not that the layout engine is slow, it's the fact that we're running layout a hundred times versus once.' So some characteristic of the pipeline upstream of you can affect what the perceived problem is, but it might be some fundamentally different problem, if that's a different way of thinking about it. But similarly, there might be some new feature that people are using. Grid, for example, is a good example. And Igalia did the first work on grid in this area for Blink, quite a solid implementation.
People were using it, but then sites started to use grid more and more and more, and then ran into these performance-type issues of just like, 'Hey, we're constrained by the number of tracks that we need, or constrained by how many grid levels we're nesting.' We had one fun bug report where we fixed some performance issue and then suddenly someone's typing was taking 200 milliseconds each time they did a keystroke. So it's a complex space. The ecosystem will often change and start pressing on what you thought you could get away with performance-wise. Yeah, it's complicated. This is, again, why I've got gray hairs. Web rendering engines are used to solve all sorts of use cases, from enterprise to the consumer space to generating PDFs on the server side, all sorts of stuff. And we get-
  • Brian Kardell: Lots of embedded stuff now too.
  • Ian Kilpatrick: Lots of embedded stuff as well, and it's always a complicated task of trying to prioritize performance and bug fixing work for all of those different constituents.
  • Eric Meyer: Yeah, cool or not cool, but cool.
  • Ian Kilpatrick: Or scary. On that note, yeah. This is why I enjoy working on the web, just on a personal note: web rendering engines bring so much value to society at large, just the web generally. But if you want to scope it down a little bit to rendering engines, there's so much economic value, even if you want to put a number to it, that effectively we give away for free. And that's what keeps me in this game.
  • Eric Meyer: Well, this has been a really interesting conversation, Ian. And as we said before, we can probably go on for hours and hours, but-
  • Ian Kilpatrick: We should probably limit it at some point.
  • Eric Meyer: Limit it at some point. Maybe come back for part two.
  • Ian Kilpatrick: Yeah, exactly. I mean I'm more than happy to chat about all sorts of layout things and all sorts of things more generally. Thank you for having me. It was a pleasure as always.
  • Eric Meyer: Yeah, thanks for being here.
  • Ian Kilpatrick: Hopefully, this was interesting to listeners as well.
  • Eric Meyer: I don't think there will be any concerns on that front. So thanks again and we'll talk to you soon.
  • Ian Kilpatrick: Thank you very much. See you later, Eric and Brian.
  • Eric Meyer: Thanks.