Building the Death Star #1

"Vertical slice" is from the game industry: instead of showing off rough drafts of all the levels, you focus intently on one small section of gameplay. You build it out and polish it up to the point that people can start to sense how it will all feel when it comes together.

To this point in the development process, I've built a couple of prototypes that got me excited about the potential. But as I was building the marketing page, it became very clear how hard it is to talk about the app without an actual demo. This series will walk through the building of that demo, and I'm going to try to make it a proper vertical slice.

Andor feels like the perfect story to test on. It's 12 episodes, well-structured, thematically strong, reasonably intricate in its plotting, character-focused, with a good mix of drama and excitement -- and right up our alley here at Potboiler Labs, in the sense that it's way better than it needs to be considering the budget and IP at its disposal. Plus, it tells the story of one of the guys who steals the plans to the Death Star, and sometimes the scope of this project feels like my own personal Death Star, so it's all apropos.

There was an even better reason to choose Andor, by the way -- Tony Gilroy, showrunner, announced his plans to release the shooting scripts for the first season. But then the Writers' Guild went on strike, and that fell by the wayside, so I've got to hack together my own scripts.

Cleaning up SRTs with Subtitle Edit

Subtitle files are freely available, so it's easy enough to grab a set of those. But these need cleanup: subtitles also include sound effects like "(PANTING)", or nearly inaudible background dialogue which only exists to fill out the soundscape.

I downloaded a program called Subtitle Edit and found it to be one of those delightfully useful pieces of freeware you'll stumble across on the internet every now and then. It's inspiring while I'm trying to build something equally useful (tbd on the freeware, though). What makes it so useful is that it's very clearly designed to solve common problems, so just by reading the menus I can learn something about the problem space.

The tools menu in Subtitle Edit

As someone who spends too much of his life digging through menus, a dropdown with a long list of clear, imperative commands fills me with gratitude.

A screenshot of Subtitle Edit's 'Remove text for the hearing impaired' tool

This filters out a lot of the extraneous dialogue.

After that, I did a first pass on merging and splitting the dialogue, so that the text files would look more like a script. The UI for this required more clicking than my carpal tunnels would prefer, though, so I knew I'd have to account for this functionality in phase 2.

Building a tool to assign speakers

One skill that programmers develop is the ability to make their own tools. Everyone who writes code probably has a natural instinct to automate things whenever possible -- sometimes to our detriment, when the brute force solution would still be faster than tinkering with something purpose-built.

In this case, the brute force solution would probably be dumping everything into a spreadsheet and going from there, but I knew I was going to need the ability to jump backward to catch bits of dialogue I missed. I've also got another personalized tool for managing my painting reference images, and I've built plenty of UX to manage all the tagging there, so I knew the ergonomics would be worth the effort.

So I built a small Svelte app.

These days, when I need to build a JS tool for myself, my rule of thumb is to pick "anything but React", and it's been a success every time. Svelte's got two-way data binding, so I could just do the most straightforward thing at every turn. For instance, when I wanted to build in some checkboxes to enable bulk-assignment, I didn't need to hoist a callback to some higher-order component... I just used a DOM selector to grab the relevant inputs, modified their values, and triggered their "change" events. The results were then dumped into window.localStorage, and that was that.

The killer feature here was the play button. If I click on that, it will submit an XHR request to my VLC media player, specifying a timestamp to jump to. This lets me loop back if the dialogue gets too fast to keep up with, and proved immensely useful: with this setup, I was able to annotate episodes in realtime, like a court reporter.

VLC's web API is not the prettiest thing, but since VLC is another one of those unbelievably useful pieces of freeware, it works. (I have so much confidence in VLC that I instinctively knew it would offer some interoperability with my JavaScript frontend.) Here's how I set that up:

A screenshot of a customized Windows shortcut that can launch VLC's web interface
  • I'm on Windows, which lets you configure a shortcut that will accept additional flags. I used this command: "C:\Program Files\VideoLAN\VLC\vlc.exe" --extraintf=http --http-port=8083 --http-password=password. Launching VLC through this shortcut will enable a tiny web app to control your VLC (running on localhost:8083 behind basic HTTP authentication in this case), but we don't want to use it -- we just want to submit directly to its endpoint.
  • Unfortunately, sending an XHR from localhost:5173 to localhost:8083 is blocked by the browser's CORS policy. To fix that, you'd normally tell the server to accept requests from the other origin, but I had no clear way to mess with whatever server VLC was running here. So instead of worrying about that, I spun up a reverse proxy in Flask. Its only job was to listen for the XHR request and forward it along to the VLC web app.

I used Postman to build the XHR request: if you weren't aware, it'll do some codegen for you too.
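For reference, the seek call itself is simple enough to sketch in a few lines. This is an approximation, assuming the port and password from the shortcut above, and using VLC's status.xml endpoint (in the real app, the request goes through the proxy, but the shape is the same):

```python
import requests

# Build (but don't send) the seek request, assuming the shortcut's
# --http-port=8083 and --http-password=password from above.
req = requests.Request(
    "GET",
    "http://localhost:8083/requests/status.xml",
    params={"command": "seek", "val": 135},  # jump to 2:15
    auth=("", "password"),                   # VLC's HTTP auth uses a blank username
).prepare()

print(req.url)
# requests.Session().send(req) would actually fire it, if VLC is running
```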

A Flask app that proxies requests for VLC

This is the whole app
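If the screenshot is hard to read, a minimal version of such a proxy might look like the sketch below. The port and password are taken from the shortcut above; the dev-server origin (localhost:5173) is Vite's default, and the route shape is my guess at matching VLC's endpoints:

```python
import requests
from flask import Flask, Response, request

app = Flask(__name__)
VLC = "http://localhost:8083"  # assumes the shortcut's --http-port

@app.route("/requests/<path:endpoint>")
def proxy(endpoint):
    # Forward the query string to VLC's web interface and relay its reply
    upstream = requests.get(f"{VLC}/requests/{endpoint}",
                            params=request.args.to_dict(),
                            auth=("", "password"))
    resp = Response(upstream.content, status=upstream.status_code,
                    content_type=upstream.headers.get("Content-Type", "text/xml"))
    # The one header VLC won't send for us: let the Svelte dev server in
    resp.headers["Access-Control-Allow-Origin"] = "http://localhost:5173"
    return resp

# app.run(port=5000)  # any port that isn't 8083 or 5173
```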

I will say this: the second-most viral I ever went was when I built a social network of the A Song of Ice and Fire novels. I did a whole machine learning thing to predict the speakers, then combed through and confirmed each label by hand. That took ~75 hours for 668,000 words of dialogue, so as far as I'm concerned, 8 hours isn't too bad.

Once I had this all working, it was just a matter of time. Time-intensiveness is on my mind whenever I'm building something for this app. My theory is that, since a novel will take 500-2000 hours to write, even an extra 100 hours of data entry wouldn't be completely intolerable. But tedious data entry will be a killer, even if I'm targeting this to people who are very comfortable at a keyboard. Andor is about 8 hours long, and I was feeling a little stir-crazy at the end of this process, which isn't very encouraging... but I think that's the point of dogfooding. I'm proposing to solve certain pain points for writers: use this app and you won't have to worry so much about keeping track of characters, or missing an obvious narrative opportunity. But my solution brings pain points of its own, namely mind-numbing data entry. Having a strong emotional sense of this even in this earliest stage will give me a good instinct for where to focus my efforts. Anything I can do to reduce friction in that direction will be time well spent.

This work was a satisfying reminder of how good the FOSS ecosystem is. If you know how to code, you can assemble these ad hoc pipelines that would be a tremendous annoyance otherwise. For instance, my naive assumption was that I could just stick the video files into a <video> tag, but Chrome struggles to seek through files of that size. Instead of trying to troubleshoot that, I just put my trust in VLC.

Initial Reports

After five days of work, you want to take a second to see what kind of progress you've made. I now have:

  • A Svelte frontend with the ability to:
    • jump between episodes
    • filter by timestamp
    • tag a chunk of dialogue with metadata
  • A way to integrate the source video into this frontend
  • A database containing 8,374 tagged lines of dialogue, representing 42,810 words

If you don't know word counts, 42k puts Andor solidly in the novella range. National Novel Writing Month requires 50,000 words over 30 days, and a 300-page novel is going to be around 100k -- which suggests that a writer tagging their manuscript would be faced with ~16 hours of data entry right now. I'll have to do what I can to pull that number down.

Can we pull anything interesting out of our new dataset? Two ideas come to mind:

Minutes Talking

Since .srt files have very precise timestamps, we can easily figure out how many minutes each actor gets to talk.

Character Minutes of speech
Cassian Andor 35.05
Luthen Rael 27.46
Dedra Meero 21.92
Mon Mothma 17.64
Vel Sartha 13.90
Karis Nemik 13.16
Kino Loy 11.57
Syril Karn 10.78
Maarva Andor 10.78
Sergeant Linus Mosk 9.93
Major Partagaz 8.65
Arvel Skeen 6.84
Bix Caleen 6.51
Lieutenant Supervisor Blevin 5.70
Lieutenant Gorn 5.66
Taramyn Barcona 5.59
Commandant Jayhold Beehaz 5.42
Eedy Karn 5.07
Brasso 4.75
Tay Kolma 4.29
Colonel Yularen 4.19
Imperial CO 3.94
Kleya Marki 3.93
Perrin Fertha 3.84
Chief Hyne 3.60
Davo Sculdun 3.47
Supervisor Lonni Jung 3.28
Jemboc 3.25
Saw Gerrera 3.13
Pegla 2.97
Doctor Gorst 2.96
B2EMO 2.92
Willi 2.68
Prisoner 2.59
Cinta Kaz 2.58
Clem Andor 2.52
Narkina PA 2.41
Captain Vanis Tigo 2.41
Attendant Heert 1.99
Xanwan 1.97
Lieutenant Keysax 1.95
Melshi 1.91
Nurchi 1.79
Kravas 1.61
Corv 1.49
Intake Warden 1.47
Captain Elk 1.44
Timm Karlo 1.40
Leida 1.31
Hostess 1.31
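This tally falls out of the timestamps almost for free. A sketch of the computation, with made-up rows (the real data lives in the tagging database):

```python
from collections import defaultdict
from datetime import timedelta

def parse_ts(ts):
    # "00:01:02,500" (SRT format) -> timedelta
    hms, ms = ts.split(",")
    h, m, s = map(int, hms.split(":"))
    return timedelta(hours=h, minutes=m, seconds=s, milliseconds=int(ms))

def minutes_per_speaker(rows):
    # rows: (speaker, "HH:MM:SS,mmm --> HH:MM:SS,mmm") tuples
    totals = defaultdict(timedelta)
    for speaker, timing in rows:
        start, end = (parse_ts(t.strip()) for t in timing.split("-->"))
        totals[speaker] += end - start
    return {s: round(t.total_seconds() / 60, 2) for s, t in totals.items()}

# Hypothetical sample rows, not actual Andor data
rows = [("Cassian Andor", "00:00:01,000 --> 00:00:04,000"),
        ("Luthen Rael",   "00:01:00,000 --> 00:01:30,000"),
        ("Cassian Andor", "00:02:00,000 --> 00:02:57,000")]
print(minutes_per_speaker(rows))  # {'Cassian Andor': 1.0, 'Luthen Rael': 0.5}
```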

Dialogue Maps

I've been interested in "distant reading" since 2007, so I've built this kind of visualization before. It's fun to look at, but not really helpful for a writer. A couple of things I'll say in its favor:

  • It's interesting to think of these parts like instrument sections in an orchestra. Cassian is a steady drumbeat, running throughout. Syril is, I don't know, a bassoon? He gets a nice solo in the first movement, but gets sidelined as a result of his failure on Ferrix. There's less for him to do at this point, but the writers still want him involved in the finale, so they diligently check in with him -- typically early in the episode -- just to keep him fresh in our minds.
  • Seems like there's a tendency to cluster a character's scenes within an episode. Again, check out Syril's timeline. In episodes 7, 8, and 9, he'll have a scene, we cut to some other characters for one scene, and then we cut right back to Syril for a capper.
  • I paired up satellite characters with their main character: you can see how Kleya Marki, B2EMO, and Eedy Karn serve as supports for Luthen, Maarva, and Syril, respectively. TV demands these types of characters, since we can't read minds. Literature doesn't have that constraint, and I bet that has informed the construction of every novel ever written.
  • Long, unbroken lines are interesting: Vel, Gorn, Taramyn, Kino, and Sgt. Mosk really get to take over the screen for big chunks of episodes. Compare that to Mon Mothma, who's such a crucial part of the story but never gets to dribble the ball for that long.

Here are some improvements I'll work on next:

  • Social network view, showing conversational partners. If you know the show and are willing to squint, you can figure out that Cassian and Luthen only have three scenes together, but that can be made much clearer with a graph view. To create that, I'll need to annotate each conversation to see who is being addressed with each line of dialogue. I also realized that eavesdroppers are good to keep track of: in a spy show like Andor, the sometimes asymmetric flow of information is all part of the fun.
  • Think about how to represent non-verbal actions. Obviously a script will have action beats written out -- how should we add those?
  • Integrate plot points, like the Aldhani heist. I imagine that key events will make some of the activity more legible.
  • Start to qualify each line of dialogue. Not all of those boxes are created equal: Nemik's big manifesto or Maarva's big speech take up minutes of screentime, and you wouldn't know it from here.
  • Similarly, most of the Aldhani crew gets tons of dialogue, but a lot of it is bank robber stuff: "Move! On the ground! Open that door!", etc. That makes it hard to see the big, dramatically significant moments.
  • We need better ways of showing simultaneous action. I think the shape of this visualization is right -- it looks like a video timeline editor, and I think that track mentality is good. Just need to make it easier to work with.

The Right Moment

To do these state machines, I feel like I need to pin down the story's timeline, because there's a lot of emphasis on who knows what and when. It's easy to look at a machine and forget that everything has a duration, and that these transitions aren't a snap of the finger. In this story there are oceans to be sailed, expeditions to be outfitted, jungles to be trekked -- everything takes time.

Most crucially, feelings take time, and that's really why I wanted to slot these beats into a calendar. Obviously the beatsheet itself is a kind of timeline, since it tracks cause and effect, but it's oddly non-specific even so -- because it only pays attention to immediate cause and effect. That isn't realistic, since all of our personal plotlines are mixing together with the recent past's events in a big stew. So if we want to help our "tracking", we need to do that work.

This exercise had an immediate payoff, the biggest I've had since starting this project... it uncovered at least half a dozen serious inconsistencies. I keep mentioning this, but anytime you ask a writer, "Hey, is there any connection between <X> and <Y>?", writing occurs. The reason this program is so good at surfacing these questions is that computers have no common sense. Non-programmers underrate just how profound that statement is. I see it all the time in my professional life -- gnarly technical problems get completely glossed over because these problems require a grade-schooler's intelligence to solve, and non-programmers forget that computers don't have that. Programs are not all-purpose geniuses but idiot savants, who need everything defined for them before they can begin working their magic.

For this dogfooding phase, I'm doing that work via a Python script which uses the standard datetime library and a small extension, dateutil, which lets you define relative deltas (i.e. <X> occurs three months before <Y>). As a coding challenge, this is a breeze! But it does get me thinking about how, prior to releasing this to a general audience, I'll need to come up with a pretty involved UI to make it intuitive to work with.
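The core of it looks something like this. The beats and the anchor date here are placeholders, not the real timeline:

```python
from datetime import date
from dateutil.relativedelta import relativedelta

# Pin one beat to a (placeholder) calendar date, then hang the
# others off it with relative deltas
heist = date(2023, 6, 1)
recruitment = heist - relativedelta(months=3)  # "<X> occurs three months before <Y>"
aftermath = heist + relativedelta(weeks=2)

print(recruitment, aftermath)  # 2023-03-01 2023-06-15
```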

I should also mention that it was easy to get a toehold on this problem: I just used a regex (year|month|day|week) to look for time words, and that took me right to the places where I'd committed to some timestamps.
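In sketch form, that toehold looks like this (I've added word boundaries and an optional plural, which the bare regex above doesn't have):

```python
import re

# Flag any beat that commits to a duration or a date
TIME_WORDS = re.compile(r"\b(year|month|day|week)s?\b", re.IGNORECASE)

# Hypothetical beats, for illustration only
beats = ["Three months before the heist, Vel recruits Cassian.",
         "The team drills for weeks on Aldhani.",
         "Cassian leaves the next morning."]

flagged = [b for b in beats if TIME_WORDS.search(b)]
print(flagged)  # the first two beats carry explicit timestamps
```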

Dramatis Personae

I'm coming back after a long layoff and trying to build towards these "state machines" -- more about what those are in a future post. But to build them, I need to assign each beat in the beatsheet to the characters involved, and that means I need to create a matrix: in the first column, all the beats, and in the first row, all of the characters.

While creating the beatsheet, I used spaCy's named entity recognition to very easily pull out all the proper nouns from the text, and there were 41 in total. Smaller than I would have thought, considering the scope of the book. And it's very top-heavy, with the Pareto distribution you might expect: the top 8 characters represent 73% of all mentions.

But the biggest surprise was which names were unfamiliar to me. I started writing this book 10 years ago, and have read it quite a bit... I thought I would have had them all down by now! But there are 3 guys in the middle of the list who I cannot distinguish, and I've mentioned them 40+ times in the text. Whereas "Laine", a bit character with 13 mentions, is instantly recognizable to me. And then there's "Per", who's got 121 mentions, good for the 10th spot on this list... and I can't say much about him at all.

These surprises are all good surprises, since that's exactly the kind of thing I'm trying to find in revision. You go through the text, scrutinizing everything, looking for anything that's out of place, nudging it back into place. This type of statistic is impossible to sense while reading, so this is an easy win for the program -- now I know I need to do something more with Per's character, and I need to do some genuine work to characterize those anonymous three stooges.

The more dogfood I eat, the clearer my goal becomes: to invent new ways of reading, which will give me metrics to measure the significance of each piece of the story. I'm so happy with how unambiguous these metrics have been -- without the quality or the quantity of prose to distract me (they say to "murder your darlings" as a writer, but I am a doting father), each event takes on equal significance, and so anything can catch my eye as meaningful. Right now I'm assigning each beat to the characters who would have some stake in it, and it's clear as day that events affecting the whole cast are a big deal. Some of these I haven't thought much about, and it's only through this bone-dry quantification process that I'm realizing it.

Also exciting: each time I run a new analysis on the story, I need to sort the story in a different way. And when you re-sort, different elements fall next to each other. It's semi-random, but writers are monster synthesizers, so my brain is weaving away as I scroll through the shuffled story. A lot of this dogfooding has created good insights and general ideas, but something as simple as sorting my beatsheet alphabetically has actually inspired scenes with clear emotional stakes, and that's solid gold to me during revision.

This reordering is impossible in the manuscript file itself or in a pile of 3x5 notecards, but with a click of the button I can sort by: who's driving the action in the scene, when events occur logically, when events occur within the narrative (those last two are different, because of things like flashbacks), and even randomly.

Exploring The Solution Space

I've committed to building my beatsheet in WebGL. But, since we're just getting started, let's be precise about our scope: I've committed to trying a WebGL solution here. This isn't do-or-die, and I can always fall back to a low-tech solution that requires more elbow grease. I think it's important to establish this frame of mind, because I want to think clearly at this early stage, not reactively.

My emotional concern was that my lack of experience would force me to waste a lot of time going down blind alleys. I can minimize that with a little research. Who's tried big SVGs in 3D before? Am I prematurely committing, here? Is there anything prebuilt? Horror stories? I start Googling.

Couple interesting finds:

  • Canvas isn't the way to go. This benchmark shows that Canvas is maybe 4x better than SVG, while WebGL blows them both out of the water. https://ahoak.github.io/renderer-benchmark/
  • PixiJS, which the above benchmark is built with, looks like a contender. It's a 2D renderer built for WebGL. I'm intrigued by the fact that it's exclusively 2D -- that specialization suggests performance to me. Plus, they claim their text rendering is just as performant as their sprites, which is a big deal for a beatsheet that has thousands of words of text.

I also have a number of technical concerns. Here are my must-have features:

  • Editable content. I need to add, delete, and update elements of the graph as I'm building it
  • Interactivity. That editing will include dragging nodes around and cutting/creating linkages between them
  • Performance. 30FPS+ with 1000 nodes & all the linkages between them
  • Freedom of movement. Camera that can pan and zoom, so I can explore the map
  • Smooth navigation. A minimap that can teleport you to another region of the graph

These constraints are a blessing, because they give me a place to focus my early exploration and a clear signal to bail if they can't be accommodated.

Since Pixi.js seems like a nice middle-ground solution, I decided to start there. I even found a library that purported to do exactly what I needed. Though the writeup didn't inspire confidence (old habit of mine -- I put a lot of stock in clear writing), I fired it up... and all my Chrome tabs went black. Never seen that before!

Well, what're the odds it was going to work first try? The second try, however:

I fed the SVG into a Pixi Sprite, it rasterized successfully, and pixi-viewport has a pretty nice feel without any tweaks.

Now we've got a toehold. No matter what else happens, this is an improvement on where we started. Only upside from here on out.

Intimidation and Prototyping

During the prototyping phase, you will try a number of different tools. For my "beatsheet", a giant flowchart capturing the novel's plot, I created a Python script that would convert a text document into something parseable by Graphviz. It's easy to design an API when you are both the creator and sole user of said API -- the text document used whitespace to communicate cause and effect.

This happens first
And because this is indented, we know this happens later

All I needed beyond that was a simple way to add metadata, and I handled that with some square brackets. Easy!
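A sketch of what such a parser might look like. The two-space indent and the trailing [key=value] metadata syntax are my guesses at the format, not the actual script:

```python
import re

def to_dot(text, indent=2):
    """Convert an indented outline into Graphviz DOT edges.
    A line's parent is the nearest preceding line one indent level up."""
    edges = []
    stack = {}  # depth -> most recent node seen at that depth
    meta = re.compile(r"\s*\[[^\]]*\]\s*$")  # strip trailing [metadata]
    for line in text.splitlines():
        if not line.strip():
            continue
        depth = (len(line) - len(line.lstrip())) // indent
        label = meta.sub("", line).strip()
        stack[depth] = label
        if depth > 0:
            edges.append((stack[depth - 1], label))
    body = "\n".join(f'  "{a}" -> "{b}";' for a, b in edges)
    return "digraph beats {\n" + body + "\n}"

print(to_dot("This happens first\n  And this happens later"))
```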

After that, I used Graphviz on the command line to spit out an SVG, which I then dumped into React. To handle some barebones interactivity (clicking on a box in the chart and getting its metadata) and establish a line of communication with the database, I started using DOT's JSON output to draw the SVG manually. This worked well and was quite easy to implement, but I knew that adding more robust interactivity would be a time suck. So I turned to ReactFlow, an off-the-shelf library with nice drag-and-drop capabilities.

But my beatsheet has over 800 nodes within it, and ReactFlow is not built to accommodate that type of scale. Frankly, neither is SVG -- my experience in data viz and D3 made me dubious that there was much upside to optimizing on that.

Since this is going to be core functionality, I started to think about what it would take to hand roll this thing. My first thought was to go 3D. 3D in the browser seems to be pretty good these days, and I found a tool that would allow React to interface with the 3D package I'm both familiar with and excited to learn more about: Three.js.

But this is where things get intimidating. Performance optimization is hard. Two reasons:

  1. If you're trying to debug a 3D renderer, you're taking a big step closer to the metal... though I'm confident there are plenty of good tools to help analyze even this more machine-level code.
  2. My lack of intuition about the guts of a rendering engine. I already know I'll be slow to explore this solution space, since my 3D programming knowledge amounts to a little bit of Python scripting within Blender. And because I don't have any instincts, I'm going to make bad guesses as to which directions are profitable. As I'll detail below, there is a wide array of solutions to try here, and a process of elimination with no guarantee of success is a big part of the intimidation.

This intimidation is a gut feeling, and those always merit attention. I took a second to wonder: am I overengineering?

There is an intermediate option between SVG and full-blown 3D: the <canvas> element. That has better performance than SVG and is made for rendering 2D, which is actually all I need -- the third dimension isn't necessary for what I'm doing... is it?

On the other hand, I know for a fact that Three.js has cameras, and those cameras can zoom. That's an important feature for a huge map. Another one of my features, a minimap, is very much the kind of thing 3D does all the time -- they're called LODs (levels of detail), and they represent a potentially polygon-heavy model at increasingly lower resolutions for when that object is sitting off in the distance. So while my beatsheet isn't necessarily 3D, it can benefit from the 3D paradigm.

Plus, the upside of learning Three.js is much higher than that of building some kind of canvas plotter. And the downside of an HTML canvas is that it's memoryless -- you're just dropping pixels. A cursory look at how others have done interactivity with canvas was not at all promising; even with no hands-on experience in this subdomain of programming, I know jank when I see it.

At this point, I've got one goal and one solution to that goal. That solution has a pro and a con. To recap:

The Goal: build something that's going to make mapping a plot into a delightful experience

The Solution: use Three.js to create a flowchart editor

Pro: Seems well suited for the job

Con: It's intimidating to tackle a large, unknown challenge in the midst of a prototyping phase, which is supposed to be quick & exploratory

This is a great moment for analysis paralysis to kick in, but frankly, I don't have the time or the interest to indulge it. There is a way to tackle this challenge in bite-sized chunks, and my intimidation isn't that high: I've been coding for years now, and one thing I know for sure is that I can solve any problem given enough time.

So let's build this thing in 3D.

Plot Arcs

Something fascinating about this:

Plot arcs mapped in Gephi

It actually looks like the bifurcation diagram of the logistic map:

The bifurcation diagram of the logistic map

What's cool about that is the logistic map is a pretty simple function that exhibits chaotic behavior. And I certainly believe stories are chaotic systems, so it's neat to see a visual similarity.
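If you've never played with it, the logistic map really is just one line of arithmetic iterated over and over. A quick sketch:

```python
# The logistic map: x_{n+1} = r * x_n * (1 - x_n)
def orbit(r, x=0.5, skip=500, keep=8):
    for _ in range(skip):          # let transients die out
        x = r * x * (1 - x)
    points = []
    for _ in range(keep):          # then record where it settles
        x = r * x * (1 - x)
        points.append(round(x, 4))
    return points

print(orbit(2.8))  # settles to a single fixed point
print(orbit(3.2))  # oscillates between two values
```

Keep cranking r up and those branches keep doubling until the behavior turns fully chaotic -- that's what the bifurcation diagram is drawing.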

I was very excited to see literal plot arcs. They look cool, though they're not very functional: graphs like this become illegible very quickly. So all we need from this particular view is a way to assess each beat in our story, and a metric called betweenness centrality does a nice job of that. The principle behind it is pretty simple. In graph theory, a geodesic is the shortest path between two nodes. To measure the betweenness of a node, you look at the shortest paths between all pairs of nodes, and count how many times your node pops up. You can think of it like a bridge crossing a river: no matter where you're coming from or going to on either bank, you're very likely to cross the river at the bridge.

In my story, I was pleased to see that the major turning points all had high betweenness. Conversely, if a node has low betweenness, that may suggest it's a candidate for pruning -- particularly if it's in the middle of the story. Since the beatsheet is very unidirectional, with little connectivity from later in the story back to the beginning, the earlier nodes don't have much chance to accrue betweenness: there just isn't much before them. But if a story point sits in the middle and has low betweenness, maybe it's a dead end.
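Here's the idea in miniature, using networkx and a toy graph (the beat names are my shorthand, not the real beatsheet):

```python
import networkx as nx

# A toy beatsheet: two threads funnel into the heist, which then fans out
G = nx.DiGraph([
    ("recruitment", "training"), ("training", "heist"),
    ("mon mothma's funding", "heist"),
    ("heist", "prison arc"), ("heist", "isb crackdown"),
    ("prison arc", "finale"), ("isb crackdown", "finale"),
])

bc = nx.betweenness_centrality(G)
print(max(bc, key=bc.get))  # the chokepoint beat, "heist", dominates
```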

Another way to find dead ends is to look at degree, which measures the number of connections a node has. Since our graph is directed, we can track the connections coming in and going out separately. In story terms, a beat's "in-degree" is the number of causes that lead to it, and its "out-degree" is the number of events it causes in the future. When a beat has an out-degree of zero, that suggests either that you haven't connected it up properly (and it really is influencing something down the line), or that the beat has no ramifications at all. Either way, it's nice to have an easy way to visualize this.
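The same toy-graph treatment works here. A sketch with networkx and hypothetical beats, where one beat has been written but never wired up:

```python
import networkx as nx

# Hypothetical beats; "orphan beat" exists but causes nothing downstream
G = nx.DiGraph([
    ("timm's jealousy", "timm informs"),
    ("timm informs", "corpo raid"),
    ("corpo raid", "cassian flees"),
])
G.add_node("orphan beat")

# out-degree 0 = a beat with no downstream consequences (the story's
# final beats will show up here too, which is expected)
dead_ends = [n for n in G.nodes if G.out_degree(n) == 0]
print(sorted(dead_ends))  # ['cassian flees', 'orphan beat']
```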

One thing I found surprisingly useful about this graph was noticing who had a voice in it. When you're writing prose, there are lots of scenes that demand characters speak up, but at this very abstract level, only the decision-makers crop up. I was amazed at how often the antagonist showed up -- though I guess I shouldn't have been, because antagonists are an absolute godsend when it comes to making things happen. But there's one character who showed up a lot in the rough draft, and I noticed that in the beatsheet I didn't need him for much. That made me wonder if I could cut back his storyline; just having that in mind gave me lots of great ideas, and I was able to reduce his impact on the story. Similarly, you get a sense of which moments are turning points just from the visual layout. At the moment everything funnels into a chokepoint, you want to make sure the events immediately preceding it feel solid and well supported. I had a character death that led to a lot of action, yet the character is nowhere to be found elsewhere in the beatsheet -- which suggests I need to beef up his role in the story.

Finally, tagging each beat with the characters involved gives a nice opportunity to track how characters move throughout the story.

I also started to develop a gut feeling for what a good-looking passage in the beatsheet looks like. You want some branches, but not too many, and you definitely want merging branches. In the middle, though, there's a very natural tendency for the graph to get very wide, because a lot of threads have ramified into hanging ends that haven't wrapped up yet. Part of this overload is actually imaginary, because the graph lays things out according to pure causal sequence and doesn't respect the linear order in which the beats must occur. So on one hand we want to preserve some of this simultaneous feeling, because it accurately displays the number of things the reader will be juggling; on the other, it can be hard to track who knows what when, and what has already happened at any given point. Definitely something to work on.