Metamuse by Adam Wiggins, Mark McGranaghan

January 12, 2025

00:00:00 - Speaker 1: There’s been very little innovation and research

more generally into what is a good interface for inputting equations. So

I think most people are probably familiar with Microsoft Word or Excel

have these equation editors where you basically open this palette and

there is a preview and there is a button for every possible mathematical

symbol or operator you can imagine.

00:00:28 - Speaker 2: Hello and welcome to Metamuse. Muse is a tool for

thought on iPad and Mac. This podcast isn’t about Muse the product,

it’s about Muse the company and the small team behind it. I’m Adam

Wiggins here today with my colleague Mark McGranaghan. Hey, Adam. And

joined by our guest Sarah Lim, who goes by Slim. Hello, hello, and Slim,

you’ve got various interesting affiliations including UC Berkeley,

Notion, Ink & Switch, but what I’m interested in right now is the

lessons you’ve learned from playing classic video games. Tell me about

that.

00:01:01 - Speaker 1: So this arose when I was deciding whether to get

the 14 inch or 16 inch M1 MacBook Pro and a critical question of our

age, let’s be

00:01:10 - Speaker 1: honest. Exactly, exactly. I couldn’t decide. I

posted a request for comments on Twitter, and then I had this

realization that when I was 6 years old playing Oregon Trail 5, which is

a remake of Oregon Trail 2, which is itself a remake of the original. I

was in the initial outfitting stage, and you have 3 choices for your

farm wagon. You can get the small farm wagon, the large farm wagon, and

the Conestoga wagon. I actually don’t know if I’m pronouncing that

correctly, but let’s assume I am. So I just naively chose the Conestoga

wagon because as a 6 year old, I figured that bigger must be better and

being able to store more supplies for your expedition would make it more

successful. I eventually learned that the fact that the wagon is much

larger and can store a lot more weight means that it’s a lot easier to

overload it. Among other things, this requires constantly abandoning

supplies to cut weight. It makes the river fording minigame much

more perilous. It’s a lot harder to control the wagon. And yeah, I

never chose that wagon again on subsequent playthroughs, and I decided

to get the 14-inch laptop.

00:02:12 - Speaker 2: Makes perfect sense to me, and what a great

lesson for a six-year-old. Trade-offs, I feel like it’s one of the most

important kind of fundamental concepts to understand as a human in this

world, and I think many folks struggle with that well into adulthood. At

least I feel like I’ve often been in certainly business conversations

where trying to explain trade-offs is met with confusion.

00:02:35 - Speaker 1: They should just play Oregon Trail.

00:02:37 - Speaker 2: Clearly that’s the solution. And tell us a little

bit about your background.

00:02:42 - Speaker 1: Yeah, so I’ve been interested in basically all

permutations really of user interfaces and programming languages for a

really long time, so this includes the very different programming

languages as user interfaces and programming languages for user

interfaces, and then, you know, the combination of the two. So right now

I’m doing a PhD in programming languages, interested in more of like

the theoretical perspective, but in the past, I’ve worked on I guess,

end user computing, which is really the broader vision of Notion. I was

at Khan Academy for a while on the long term research team.

00:03:18 - Speaker 2: Yeah, and there I think you worked with Andy

Matuschak, who’s a good friend of ours and a previous guest on the

podcast.

00:03:24 - Speaker 1: Yes, definitely. That was the first time I worked

with Andy in real depth, and I still really enjoy talking to him and

occasionally collaborating with him today.

So, I guess, prior to that, I was doing a lot of research at the

intersection of HCI or human computer interaction and programming tools,

programming systems, I guess. So, one of the big projects that I worked

on as an undergrad was focused on inspecting

CSS on a webpage or more generally trying to understand what are the

properties of like the code that influence how the page looks or a

visual outcome of interest, and there I was really motivated by the fact

that these software tools have their own mental model, I guess,

or just model of how code works and how different parts of the program

interact to produce some output and then you have the user who has often

this entirely different intuitive model of what matters, what’s

important.

So they don’t care if this line of code is or isn’t evaluated, they

care whether it actually has a visible effect on the output. So trying

to reconcile those two paradigms, I think is a recurring theme in a lot

of my work.

00:04:30 - Speaker 2: And I remember seeing a little demo maybe of some

of the, I don’t know if it was a prototype or a full open source tool,

but essentially a visualizer that helps you better understand which CSS

rules are being applied. Am I remembering that right?

00:04:43 - Speaker 1: Yeah, so that was both part of the prototype and

the eventual implementation in Firefox, but the idea there is the syntax

of CSS really elides the complexity, I think, because syntactically it

looks like you have all of these independent properties like color: red,

you know, font-size: 16 pixels, and they seem to be all equally

important and at the same level of nesting, I guess, and what that

really hides is the fact that there are a lot of dependencies between

properties, so a certain property like z-index, you know, the perennial

favorite z-index: 9999999, doesn’t take effect unless the element has

like position: relative, for example, and it’s not at all apparent if

you’re writing those two properties that there is a dependency between

them.

So I was working on visualizing kind of what those dependencies were.

This actually arose because I wrote to Bert Bos, who is one of the

co-creators of CSS and was like, Hi, I’m interested in building a tool

that visualizes these dependencies. Where can I find the computer

readable list of all such dependencies? And he was like, oh, we don’t

have one, you know, we have this SVG that tries to map out the

dependencies between CSS 2.1 modules, and even there you can see all

these circular dependencies, but we don’t have anything like what

you’re looking for. That to me was totally bananas because it was the

basic blocker to most people being able to go from writing really

trivial CSS to more complicated layouts. So I was like, well, I guess

this thing doesn’t exist, so I’d better go invent it.
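As a rough illustration of what a computer-readable dependency list could look like, here is a minimal Python sketch. The table is a hypothetical, hand-written fragment rather than anything published by the CSS working group or used in the Firefox tool, and the function name `inactive_properties` is made up for this example.

```python
# Hypothetical, hand-written fragment of a property-dependency table.
# It is a simplification: for example, z-index can also apply to flex items.
POSITIONED = {"relative", "absolute", "fixed", "sticky"}

# property -> (property it depends on, values that make it take effect)
DEPENDENCIES = {
    "z-index": ("position", POSITIONED),
    "top": ("position", POSITIONED),
    "left": ("position", POSITIONED),
}

# Value used when the prerequisite property is not declared at all.
INITIAL_VALUES = {"position": "static"}


def inactive_properties(declarations):
    """Return declared properties whose prerequisite is not satisfied."""
    inactive = []
    for prop in declarations:
        if prop not in DEPENDENCIES:
            continue
        required_prop, allowed_values = DEPENDENCIES[prop]
        actual = declarations.get(required_prop, INITIAL_VALUES[required_prop])
        if actual not in allowed_values:
            inactive.append(prop)
    return inactive


# z-index is declared, but position is still the initial "static".
print(inactive_properties({"color": "red", "z-index": "9999999"}))
# Adding position: relative satisfies the dependency.
print(inactive_properties({"position": "relative", "z-index": "9999999"}))
```

A real inspector would need the full set of dependencies, which is exactly the list that did not exist.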

00:06:12 - Speaker 2: Perfect way to find good research problems. Now,

more recently, two projects I wanted to make sure we reference because

they connect to what we’ll talk about today, which is that you recently worked

on the equation editor at Notion, and then you worked on a rich text CRDT

called Peritext at Ink & Switch. I would love to hear a little bit

about those projects.

00:06:34 - Speaker 1: Yeah, definitely. So I guess the Peritext project,

which was the most recent one, was a collaboration with Geoffrey Litt,

Martin Kleppmann, and Peter van Hardenberg, and that one was really

exciting because we were trying to build a CRDT that could handle rich

text formatting and traditionally, you have all of these CRDTs that are

designed for fairly bespoke applications. They’re things like a counter

data type or a set data type that has certain behavior when you combine

two sets, and we’re still at the stage of CRDT development where aside

from things like JSON CRDTs like Automerge, we don’t really have a one

size fits all CRDT framework or solution. You still mostly have to hand

design and implement the CRDT for a given application.
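To make the "counter data type" example concrete, here is a minimal sketch of a grow-only counter, one of the simplest CRDTs. This is a textbook construction and not code from Peritext or Automerge; the class and method names are chosen for this illustration.

```python
class GCounter:
    """Grow-only counter: each replica only increments its own slot."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica id -> increments seen from that replica

    def increment(self, amount=1):
        current = self.counts.get(self.replica_id, 0)
        self.counts[self.replica_id] = current + amount

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Per-replica maximum: merging in any order, any number of times,
        # gives the same result.
        for replica_id, count in other.counts.items():
            self.counts[replica_id] = max(self.counts.get(replica_id, 0), count)


# Two replicas work offline, then sync in both directions.
alice = GCounter("alice")
bob = GCounter("bob")
alice.increment(2)
bob.increment(3)
alice.merge(bob)
bob.merge(alice)
print(alice.value(), bob.value())
```

The merge rule here is specific to counting. Rich text needs its own hand-designed merge rules, which is the point being made about bespoke CRDTs.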

And it turns out that in the case of something like rich text, it’s a

lot harder than just saying, oh, you know, we’ll store annotations in

an array and call it a day, because the semantics for how you want

different types of formatting to combine when people split and rejoin

sessions and things like that are all very complex and it turns out that

we have a lot of learned behaviors that arise, even from just like,

design decisions in Microsoft Word, where you expect certain annotations

to be able to extend, certain annotations to not extend, things like

that. Capturing all of the nuance in that behavior turns out to be

really difficult and requires a lot of domain specific thinking.

But we think we have an approach that works and I would really encourage

everyone to read the essay that we published and try to poke holes in it

too. This was like the 5th version of the algorithm, right? So like

months ago, we were like, all right, let’s start writing and then

Martin, who has just an incredible talent for these things is like, hey,

everyone, you know, I found some issues with the approach and, you know,

oh no, and we sort of fix those, we’re like, all right, you know,

this one’s good and just repeat this like week after week. So I really

have to give him a ton of credit for both coming up with a lot of these

problems and also figuring out ways to work around them.

00:08:33 - Speaker 2: We talked with Peter a little bit recently, Peter

van Hardenberg, about the pencils down element of the lab, but also just

research generally, which is there’s always more to solve, you know,

it’s the classic XKCD, more research needed is always the end of every

paper ever written, which is indeed the pursuit of the unknown. That’s

part of what makes science and seeking new knowledge exciting and

interesting, but at some point you do have to say we have a new quantum

of knowledge and it’s worth publishing that. But then I think if it’s

just straight up wrong or you see major problems that you feel

embarrassed by, then you want to invest more.

00:09:09 - Speaker 1: Right, exactly. I think in this case, there was a

distinction between, there’s always more we can tack on versus we

wanted to get it right, you know, and in particular, the history of both

operational transforms, or OT, and CRDTs for rich text, or just text in

general is such that it’s this minefield of I guess to use kind of a

gruesome visual metaphor, just dead bodies everywhere.

You’re like, oh, you know, such and such algorithm was published and

at such and such time and it was new hotness for a while and then we

realized, oh, it was actually wrong and this new paper came out which

proved like 4 of the algorithms were wrong and so on.

And so with correctness being such an important part of any algorithm,

of course, but also kind of this white whale in the rich text field, we

thought it was important to at least make a credible effort at having a

correct algorithm.

00:09:57 - Speaker 2: Yeah, makes sense. Yeah, I can highly recommend

the Peritext essay.

One of the things I found interesting about it, maybe just for anyone

who’s listening, whose head is spinning from all the specialized jargon

here, CRDTs are a data structure for doing collaborative software,

collaborative documents, and then, yeah, rich text, Microsoft Word

is the canonical example there.

You can bold things, you can italicize things, you can make things bigger

and smaller.

Well, part of what I enjoyed about this paper was actually that I felt,

even if you have no interest in CRDTs, it has these lovely

visualizations that show kind of the data representation of a sentence

like the quick brown fox, and then if you bold quick, and then later

someone else bolds fox, you know, how do those things merge together.

But even aside from the merging and the collaborative aspect, which

obviously is the research, the novel research here. I felt it gave me a

greater understanding of just how rich text editing works under the

hood, which I guess I had a vague idea of, but hadn’t thought about it

so deeply. So, highly recommend that paper. Just skim the figures,

even if you don’t want to read the thousands of words.
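For readers who want something to poke at, here is a deliberately naive Python sketch of the quick brown fox scenario. It is not the Peritext algorithm: it assumes the text itself never changes, gives every character a stable ID, and treats "bold" as a set of character IDs that merges by union. All function names are invented for this illustration.

```python
def make_doc(text):
    """Give every character a stable ID (here, simply its original index)."""
    return [(index, char) for index, char in enumerate(text)]


def bold_word(doc, word):
    """Return the set of character IDs covered by the first match of `word`."""
    text = "".join(char for _, char in doc)
    start = text.index(word)
    return {doc[i][0] for i in range(start, start + len(word))}


def merge_bold(bold_a, bold_b):
    """Naive merge: a character is bold if either user made it bold."""
    return bold_a | bold_b


def render(doc, bold_ids):
    """Render bold runs with Markdown-style ** markers."""
    out = []
    in_bold = False
    for char_id, char in doc:
        is_bold = char_id in bold_ids
        if is_bold != in_bold:
            out.append("**")
            in_bold = is_bold
        out.append(char)
    if in_bold:
        out.append("**")
    return "".join(out)


doc = make_doc("The quick brown fox")
user_a = bold_word(doc, "quick")   # one user bolds "quick"
user_b = bold_word(doc, "fox")     # another user bolds "fox"
print(render(doc, merge_bold(user_a, user_b)))
# The **quick** brown **fox**
```

The hard cases the essay deals with, such as concurrent insertions at the edge of a bold span or one user removing formatting another user added, are exactly what this sketch leaves out.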

00:11:05 - Speaker 1: I’m glad you like the figures. They were a real

labor of Figma.

00:11:08 - Speaker 2: Perfect, yeah, so.

00:11:10 - Speaker 1: The one thing I would add is that CRDTs are a

technology for collaboration, but the way they differ from operational

transforms or OTs is that a CRDT is basically designed to operate in a

decentralized setting, so you don’t need a persistent network

connection to all the parts. You don’t need a centralized server. The

idea is you can fluidly recover from network partitions by merging all

of the data and operations that happened while you were offline, and

this turns out to be really important to our vision of how collaborative

editing should work because we think it’s really important for people

to be able to do things like not always be editing in the same document

at the same time as everyone. Maybe I want to take some space for myself

to write in private and then have my changes sync up with everyone else

thereafter. Maybe I’m, you know, self-conscious about other people

seeing my work in progress, but I think that it would be

interesting and helpful to look at what the main document looks like and

how that’s evolving while I’m working in private, and you can have

that kind of one way visibility with something like a CRDT versus with

something like Google Docs, where you’re just sort of always online or

always offline, editing in your own personal editor. Conversely, maybe I’m

OK with everyone else seeing the work that I’m doing in progress, but I

just find it really visually jarring to have all these cursors and

different colors jumping around and people inserting text, bumping my

paragraphs down the page. I’ve definitely been there. I’m not

particularly precious about people seeing my work in progress, but I

just cannot focus on writing when the page is just changing all around

me. So in that situation, maybe I would want to allow other people to

see my work in progress, so that we don’t duplicate effort or something

like that, but I just have like a focus mode where incoming changes

don’t disrupt my writing environment, and these kinds of fork-join, one-way

window, micro-Git-style branching paradigms are really only enabled

by a technology like CRDTs where you have the flexibility to separate

and then come back together.

00:13:12 - Speaker 2: And I’m incredibly excited by the design research

that needs to go into that.

Now at this point, I think we’re still on the technology level, you

know, one way to think of it is Google Docs came along, I don’t know,

15, it’s almost 20 years ago now, I can’t even remember, let’s say 15

years ago, and this novel idea that we could both have a shared document

or several people could have a shared document, all see the up-to-date

version and type into it and get, you know, a reasonable response or

have that be coherent was an amazing breakthrough at the time and has

since been kind of widely copied: Notion, Figma, many others.

But now maybe we can go beyond that, much more granularity, like you

said, maybe borrowing from the developer version control workflows a

little bit in a lightweight way, giving a lot more control and

flexibility, and giving us a lot more choices about how we want to work

most effectively.

But before we can even get onto those design decisions and how do we

present all these different things to the user, what are the different

options? We need this like fundamental underlying merge technology,

hence the endless fascination that we have at the lab, and increasingly the

technology industry generally has, with CRDTs, because it has the

potential to enable all that.

00:14:23 - Speaker 1: Yeah, when we were working on the Peritext project,

Peter was pushing really hard for, don’t make this just a technology

project.

It’s a socio-technical endeavor and we need to invest a lot of time in

the design component, also just doing user interviews, identifying how

people interact with and

how people collaborate in the status quo on text, and Geoffrey and I

actually did do a bunch of user interviews with people from all kinds of

backgrounds. We’ve talked to people who write plays, people who produce

a dramatic podcast kind of in this style of Night Vale.

I love Night Vale. Yeah, people who are in the writer’s room kind of

working together with their collaborators on that, people who write

lessons, video lessons for educational platforms. And there was a ton of

really interesting insights into user behavior around collaborative

text.

We ended up just torn because we had this 12 week project and we were

like, how should we best spend our time? Clearly, this is not just a

technical area and we need to invest a lot in getting the design right,

understanding what the design space even looks like since it hasn’t

really been explored.

I really want to avoid, and this is a recurring theme in my work, I

really want to avoid publishing or shipping something and having it be

this like, very broad, very shallow exploration into all the things that

are possible. I think that this kind of work plays an important role,

and there are a lot of people who do this well, just fermenting the

space of possibilities and getting these ideas in a lot of people’s

heads, who can then go on and do really cool things with them.

My personal style, I never want to feel like something is half baked, I

guess, I would much rather ship this cohesive contribution like, here is

an algorithm for building rich text. We think that this is a technical

prerequisite to all of these interesting design choices, but the

alternative with a 12 week period, and in fact, you know, this, the

correctness and revision phase extended way over that. So thanks a lot

to Martin and Geoffrey for leading during that part.

But it’s just already so hard to get it correct that trying to tack on

a really substantive design exploration that does the area justice on

top of that, I was just really worried it would be stretched too thin.

So absolutely lots of room for future work in this particular project.

It’s very much a challenge in any area where you have simultaneously

this rich design space that’s just asking to be explored with tons of

prototypes and things like that, and then also to even realize the most

simple of those prototypes, you require fundamentally new technology.

00:16:53 - Speaker 2: Yeah, I’ve been down that same path on many

research projects as well, and often it’s that I’m excited for what

the technology will enable, but also that in many cases it’s a

combination, you know, some kind of peer to peer networking thing, but

with that will enable us to provide a certain benefit to the user and I

want to explore both of those things, but then that’s too much and then

the whole thing is half baked exactly as you said. I’ve never found a

perfect or even a good way to really manage that tradeoff. You just

kind of pick your battles and hope for the best. Yeah, definitely. Well,

I do want to hear about the equation editor project, but first I feel I

should introduce our topic here, which I think folks could probably have

gleaned is going to be rich text and rich text editing, and maybe we

could just step back a moment and define that a little bit.

I think we know that texts, you know, symbolic representation of

language is a pretty key thing, writing and the printing press and all

that sort of thing. We wrote about that a little bit in our text blocks

memo, which I’ll link in the show notes. But typically, I think

computers for a lot of their early time and even now with something like

computer code is typically plain text, that’s the dot TXT file is kind

of almost the native style of text that you have and then rich text

typically layers something on top of that. I don’t know, so maybe you

could better define rich text for us to have a more concrete discussion

about it.

00:18:21 - Speaker 1: Yeah, I think rich text for most people basically

evokes things like bold, italic, underline, the ability to augment plain

text with annotations that are useful in formatting. Actually, I think

Notepad to WordPad is the archetypal jump in software, if you’re

thinking about it from the old Windows perspective.

In the past few years, I think we’ve started to see a real expansion of

what rich text can look like. So, of course, we started out with

something like Markdown, which is, of course, a plain text

representation. But it’s designed to be able to capture more nuance in

plain text and be rendered to something like HTML which very much

supports rich text.

So in Markdown, you have not only these kinds of inline formatting

elements like bold and italic and hyperlinks as well.

You also have support for images, which you could think of as more block

level rich text elements, I guess, and I don’t think there’s a real

clear consensus across editors on how block level rich text elements

should be displayed.

Of course, in between you have things like bulleted lists and those tend

to be handled in a fairly standard manner with nested lists and so on,

but it quickly becomes like a question of taste. Which kinds of

annotations you support.

So in editors like Coda or Notion, you have all these different block

types where the block is really the atom of collaboration and editing,

and then you can have things like, you know, file embeds or even

database views, things like that.

So I think we’re at a point now where both block-based editors, I’m

using block based editors in like the text or writing sense, not the

structured editors for programming sense, although I have other things

to say about that, but we’re at a point where you’re starting to see

these block-based editors appear and I think that there are a lot of

really interesting patterns that this permits that the paragraph as a

linear sequence of characters, including new lines and whitespace, does

not permit, or at least doesn’t allow you to build as structured

tooling around.

00:20:30 - Speaker 2: I’m trying to think what is actually the core of

the difference between a block-based editor, that’s Notion, Roam, Muse is

working on its own block text implementation, and a flow of characters,

so that’s Microsoft Word, Google Docs, maybe even text editors. I guess

it’s sort of like paragraphs are separated by like these sort of

nested elements or have a parent to the document versus like two new

lines embedded in the stream of characters, but I don’t know, that

seems too unsophisticated, maybe you have a better definition for us.

00:21:03 - Speaker 1: So, I actually think about this very similarly to

in the like programming languages and editor tools space.

There is a distinction between structured editors and regular plain text

editors for programs. The idea is that you might have a text-based

programming language and you can write that perfectly fine in any buffer

that allows you to put sequential characters, often ASCII is sufficient for

some languages, and then on the other hand, these programs might have a

lot of inherent structure. A simple example is with Lisps, which are

built out of these parenthesized S-expressions; everything is, you know,

an S-expression. You can think about like the structure of the tree

formed by, I guess a forest, formed by having like these S-expressions

with subelements and stuff like that, and then you can do manipulations

directly on the structure in a way that allows you to always have a

syntactically correct program or at least a partial syntactically

correct program by doing things like I’m just going to take this

subtree, which is a sub-expression and move it somewhere else where

there’s room for another subexpression. So, I think of block-based

editors as capturing a very similar zeitgeist to structured editors for

code, because instead of just having this linear buffer of characters

that can have, you know, formatting or things like that, you can have

new lines, you actually have more of a forest structure where you have

lots of like individual blocks, and then you can have blocks that are

children of other blocks and so on, and that allows you to do things

like move an entire subtree representing an outline to another position

in the document without selecting all of the characters, you know, cut

them and then paste them somewhere else. So things like reparenting

becomes a lot easier, things like setting the background of an entire

subtree becomes a lot easier. Just in general, you have more structure

and there’s more things you can do with that structure, I guess is how

I would phrase it. One of my favorite things that you can do with this

model in Notion is you can change the type of a block very easily. So

let’s say I have a bullet list item, and then I hit enter and enter

these like subnotes or something like that as children of the initial

bullet list item. I can turn the bullet list item into a page, and then

all of a sudden it’s just a subpage in the document, and the sub

bullets that were there before are just like top level bullets in that

page. And this is particularly important for my workflow because I care

a lot about starting out with something like really rough and sketchy

and then progressively improving it or moving up and down the ladder of

like fidelity into something more polished. So you might, for instance,

start off with just an outline list or even a one dimensional list of to

do blocks when you’re trying to do project planning or something. And

then later on, let’s say I want to put these into like a tasks database

with support for like a kanban view or something like that. I don’t

actually want to sit there and like recreate all of these tasks in Jira.

I’ve been there, you know, I’ve been the person making all the tasks

in Jira after the meeting and then assigning them to people. The

workflow that I think Notion is poised to enable, and can certainly do a

better job in this regard, but already offers some benefits on is like,

can I just highlight all of these blocks because everything is a block,

move them into some existing database and have them match the schema.

That kind of like allowing people to do fast and loose prototyping with

very unstructured primitives and then promote them into something more

structured like in a relational database setting or similar, I think is

the sweet spot, structured editing provides the sweet spot between like

just completely unstructured text and these very high fidelity, high

effort interfaces that allows you to kind of move between them.
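The block model described here can be sketched in a few lines of Python. This is an illustrative toy, not Notion's actual data model: the `Block` class, its fields, and the `move` and `turn_into` helpers are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # compare blocks by identity, not by content
class Block:
    type: str                       # e.g. "page", "bullet", "todo"
    text: str
    children: List["Block"] = field(default_factory=list)
    parent: Optional["Block"] = field(default=None, repr=False)

    def add(self, child):
        child.parent = self
        self.children.append(child)
        return child


def move(block, new_parent):
    """Reparent a whole subtree without touching its characters.

    No cycle check: moving a block under its own descendant is not guarded.
    """
    if block.parent is not None:
        block.parent.children.remove(block)
    new_parent.add(block)


def turn_into(block, new_type):
    """Change a block's type; its children stay attached as they were."""
    block.type = new_type


doc = Block("page", "Planning")
item = doc.add(Block("bullet", "Launch"))
item.add(Block("bullet", "Write announcement"))
item.add(Block("bullet", "Update docs"))
tasks = doc.add(Block("page", "Tasks"))

turn_into(item, "page")            # the bullet becomes a subpage
move(item.children[1], tasks)      # one sub-bullet moves to another page
print(item.type, [c.text for c in item.children], [c.text for c in tasks.children])
```

Because structure lives in the tree rather than in newlines and indentation, both operations are small pointer changes instead of cut-and-paste over a character buffer.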

00:24:47 - Speaker 3: Yeah, I really like that direction and framing,

and if I can extend it a little bit, I think we can also look at a

continuum of richness in terms of the content itself.

So you have plain text, what you might classically call rich text with

links and bold and underline. And then you maybe start to throw a few

images in, and then what if you can put in videos and what if you

have a whole table, and that table is actually a database query, and you

can nest a Figma document, and this way you can see that there’s

sort of a continuum on the richness of the document. One reason I think

Notion has been so successful, they’ve been pushing along that

continuum while maintaining a sort of foundation of rich textness, which

is very familiar and the important basic use case for a lot of people.

A related idea is that I think we’re seeing a lot of the classic

document types converge. So if you look at a rich text like a Microsoft

Word and a PowerPoint and increasingly spreadsheets, those all used to

be 3 distinct Microsoft Office applications, and we’re seeing the value

of them being in or being the same document.

This is actually one of the motivating ideas behind Muse and a lot of the

research we’ve done in the lab, and it’s kind of something Slim was

saying, you want to take your idea continuously through different media

and different modalities and different degrees of fidelity, and you

don’t want to jump between different applications to do that. You want to

be able to do it on the same canvas. That’s by the way, one of the

reasons I like Canvas. It’s not only because it’s a free multimedia

surface, but also it evokes this idea of like flexibility and

potentiality, and I think that’s one of the things that’s really

exciting about these mixed media documents.

00:26:16 - Speaker 2: And I know if Geoffrey were here, he might jump in

and say that one downside to our current application silo world is that

the only way to have this deeply rich text where it’s images, video, a

table, a database query, something like that, is to have the uber

application, to have the everything app, and certainly Notion has

probably gotten pretty far on that, but others kind of in some ways

are forced to do that, like we have to do some of that in Muse as well.

People come in and ask for all these different types here as well, and

there’s more of like an OpenDoc-inspired or Unix-inspired future that

maybe Geoffrey and others, including me, would hope for, which would be

more that applications could be these individual data types and you

could put them all together through some kind of more operating system

connection.

But that is so completely reversed from kind of how all our computing

devices work today. It’s hard to see how we might get to that.

00:27:14 - Speaker 3: Yeah, I’m certainly sympathetic to that concern,

although I suspect the way out is through, and you get platforms from

working killer apps.

And so the way we got the whole Unix ecosystem was they wanted to build

a computer for, you know, writing and running programs and then

eventually got all this generalized text processing stuff, but it’s not

like they started in like, oh, I’m gonna make a generalized text

processing machine.

I don’t think that was really the way that they approached it and

developed a success. So, I’m still hopeful we could do this, but I

think you got to extract it from something that’s already working as an

app, but it always helps to have an eye towards that, and I think we’ve

done some of that with Muse.

00:27:46 - Speaker 1: I was just going to say that it’s not me talking

about texts, unless I bring up my favorite piece of software of all

time, which is Pandoc.

And I think that Pandoc actually is very relevant to this discussion. So

for those who aren’t as familiar with it, Pandoc brands itself as this

Swiss Army knife for document formats, and its sort of headline

contribution is that it allows you to convert between all kinds of

documents.

For instance, I can take a Word document and convert it to a PDF, or Word

documents to something like, I don’t know, IPython notebook, Jupyter

notebook, back and forth across this incredible bipartite graph of

formats, but I think that the subtler contribution that Pandoc makes,

which is extremely significant, is that Pandoc has this form of Markdown

called Pandoc Markdown that essentially aligns and supersedes all of the

different fragments of markdown that we’ve seen before.

So the problem with markdown basically is that the original

specification is sort of ill-defined. There are several cases in which

the behavior is not super clear and then on top of that, it’s not very

expressive.

There aren’t very many constructs. So things like fenced code blocks,

which many people associate very closely with Markdown today, that was

only added by GitHub Flavored Markdown, which is certainly widely used

among the programming community, but not everyone is on GitHub, of

course. And then you have things like table formatting or even like

strikethrough. Really, strikethrough wasn’t defined in the original

markdown specification either. And so you have markdown and then you

have like GitHub Flavored Markdown, CommonMark is sort of this unifying

effort, R Markdown, all these different... it’s the Markdown Cinematic

Universe. I tried to make a joke about this. I had this joke ready for

the Markdown Cinematic Universe when the last Marvel movie came out.

But then like, it didn’t get nearly the traction in my timeline as

Dune did, perhaps understandably. So really, I’m just going to have to

wait till the next movie comes out. It’s a real, real tragedy. No, but

like, I guess you have this real pluralism of forms and it becomes very

difficult to use markdown truly as a portable format because the way it

renders in one editor or even parses can very much differ from editor to

editor. So, Pandoc provides this format that essentially serves as an IR

or intermediate representation between all these kinds of documents

using a markdown superset that somehow magically encapsulates

everything.

00:30:18 - Speaker 2: And that includes not just markdown, but also like

PDFs or Microsoft Word, that seems.

00:30:24 - Speaker 1: Well, so the way it works is it’s this

compilation pipeline, I guess, that allows you to go from a markdown

document.

It compiles it to PDF using pdfLaTeX or something. It outputs

LaTeX, it outputs HTML, various things, and you can think of it as

being this intermediate representation because you start with this like

Word document, you can turn that into markdown and you can go from that

markdown format into any of these output formats, which turns out to be

like really powerful because the main issue with these kinds of

conversions is that it’s often lossy, there are features that are

supported by LaTeX, for instance, that aren’t supported by the web

natively, there are features that are part of like Word documents that

aren’t necessarily supported by HTML and so on and so forth.

So Pandoc serves this role of like basically saying, OK, what is an

intermediate language that can encapsulate all the different

implementations of the same concept across different input and output

formats.
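
As a rough sketch of the pipeline described here, the Python below only assembles pandoc command lines for a Word to markdown to HTML round trip. The file names are hypothetical placeholders and the helper `pandoc_command` is invented for illustration; `-o` (output file) and `-t` (target format) are standard pandoc options.

```python
def pandoc_command(source, target, to_format=None):
    """Build a pandoc command line that converts `source` into `target`.

    Pandoc infers formats from file extensions; `to_format` makes the
    output format explicit via the -t option.
    """
    command = ["pandoc", source, "-o", target]
    if to_format is not None:
        command += ["-t", to_format]
    return command


# Hypothetical file names: markdown sits in the middle as the
# intermediate representation between the two other formats.
steps = [
    pandoc_command("draft.docx", "draft.md", to_format="markdown"),
    pandoc_command("draft.md", "draft.html"),
]

for step in steps:
    # Printing only; pass each list to subprocess.run to actually convert.
    print(" ".join(step))
```

Each list can be handed to `subprocess.run` on a machine where pandoc is installed and the input file exists.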

And what I think is so remarkable about it is that oftentimes when you

are using a piece of software and you’re like, oh darn, you know, now I

need to support this other thing too. You quickly end up in a situation

where you have the snowball and things start to feel tacked on.

So you’re like, Oh man, it’s very clear that they just glommed on this

additional syntax for this feature. And with Pandoc, everything feels

like very principled in its inclusion. And at the same time, whenever

I’m using Pandoc and I’m like, darn, I really wish there was a

construct that I could use to express this particular thing, I look it up

in the documentation and it’s always supported. So, as one of my

favorite examples, one of the output formats that Pandoc supports is

various slideshow frameworks. So Beamer for people who use LaTeX and

reveal.js for people who use HTML and CSS, and these slideshow frameworks

basically allow you to replace something like PowerPoint, Keynote,

Google Slides with essentially like a text-based format. I really like

doing slideshows in Pandoc markdown. There are a few reasons for that.

The first reason is that it’s really useful to be able to reuse some of

the same content from like my blog post or essay even in the slideshow.

There are some really minor and almost petty, but really significant

reasons. Like, I like to have equations or code blocks with syntax

highlighting in my slideshows, and there’s not really a good solution

to putting like a syntax highlighted code block in Keynote right now.

00:32:39 - Speaker 2: Last I remembered, the gold standard at the Ruby

conferences I used to frequent was to take a screenshot of TextMate and

paste that in.

00:32:47 - Speaker 1: Yeah, it’s awful. I don’t want to see your like

Monokai editor with like the weird background that contrasts weirdly

with the slide background. I just, ah, and it doesn’t scale on a huge

conference display anyway. I digress, but the other reason why I really

like doing my slideshows in text is actually that there is often a

hierarchical structure to my presentations, right? I’ll have like these

main top level sections and then I’ll have subsections, and then I’ll

have like sub subsections and all of these manifest in slides. But in

the GUI thumbnail view of most of these existing slideshow editors

like PowerPoint or Google Slides, it reduces it all to like this linear

list. It’s like, here are all of your thumbnails in order. And it makes

it very hard, as soon as I have like an hour-long conference talk, how

do I like jump to this subsection that I know exists, aside from like

scrolling past like 117 thumbnails and trying to find the right one,

right? And moreover, let’s say I want to reorder a certain part of the

talk because I think it better fits the narrative structure. Now I have

to like figure out which thumbnails I need to drag to which other place

or worse, go into the individual slide, select the text from that, move

that somewhere else, and it’s just way, way clunkier actually than

reordering some text in like a bullet list outline in my editor.

And then the other part is that I was talking about how Pandoc has

really great support, expressive support for idioms of different

formats, and one thing you often have in slideshows is that I have some

element on the screen and then I press, you know, the next button again

and then another element will appear.

So in Pandoc you can denote this with just like an ellipsis basically so

like dot dot dot and then if I have a slide where I have a paragraph and

then the dot dot dot and then another paragraph, it will render with

just the first paragraph visible and then I press next and then like the

subsequent paragraph comes in.
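
To make the pause idea concrete, here is a small Python sketch that mimics what a slide renderer does with a paragraph consisting of three spaced dots (`. . .`, the form the Pandoc manual documents for pauses). The function name `reveal_steps` and the slide text are made up for illustration; this is not Pandoc's own implementation.

```python
PAUSE = ". . ."  # a paragraph containing only three dots separated by spaces


def reveal_steps(slide_markdown):
    """Return the cumulative text visible after each press of "next"."""
    chunks, current = [], []
    for line in slide_markdown.splitlines():
        if line.strip() == PAUSE:
            # A pause closes the current chunk and starts a new one.
            chunks.append("\n".join(current).strip())
            current = []
        else:
            current.append(line)
    chunks.append("\n".join(current).strip())
    # Step i shows every chunk up to and including chunk i.
    return ["\n\n".join(chunks[: i + 1]) for i in range(len(chunks))]


slide = """# Why text-based slides

First paragraph, visible immediately.

. . .

Second paragraph, revealed on the next press.
"""

steps = reveal_steps(slide)
for number, visible in enumerate(steps, start=1):
    print(f"--- step {number} ---")
    print(visible)
```

The same source file stays an ordinary outline in a text editor; the pause markers are the only animation-specific syntax in it.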

And that’s like just a very lightweight way to handle these stepped

animations compared to going to the animation pane and then clicking the

element that I want to animate in and so on and so forth.

So it started off with me being like, I’ll just prototype in this

format, but then it ended up supporting columns, it supports all these

things that you actually want. And I was like, this is in many ways a

more ergonomic way to handle long technical slideshows. Anyway, I have

to shill for Pandoc anytime I talk about rich text, I’m contractually

obligated to do so.

00:35:08 - Speaker 2: Yeah, it’s a great piece of software, I use it here

and there. I think I was doing some AsciiDoc kind of manuals many years

ago and yeah, just in general, it’s also worth looking at the homepage

that you mentioned. The plot they have where it shows all the different

formats it can convert between is quite fun. You click on that, you

can zoom in.

00:35:26 - Speaker 1: Yeah, I had this really elaborate plan when I

decided to go to Berkeley, that I was going to print out a door-sized

poster of like that graph that shows all the formats they convert

between and then show up at John MacFarlane’s door and ask him to sign

it. But then the pandemic interfered with some of those plans.

Nonetheless, it remains on my list.

00:35:48 - Speaker 2: Good bucket list item, pretty unique one at that.

00:35:51 - Speaker 1: Also, I found my tweet, or I found the draft of my

tweet, which is about Eternals, and I said, directed by Chloé Zhao, the

latest entry in the Markdown Cinematic Universe features an ensemble

cast of MultiMarkdown, GitHub Flavored Markdown, PHP Markdown Extra, R

Markdown, and CommonMark as they join forces in battle against

mankind’s ancient enemy, DOCX. Nice.

00:36:12 - Speaker 2: Wow. You would have gotten the like from me.

00:36:16 - Speaker 1: Yeah, we’ll see if it ever sees the light of

Twitter.com.

00:36:20 - Speaker 2: You briefly mentioned there equations and LaTeX,

and maybe that’s a good chance to talk about the equation project you

did for Notion. And part of what I thought was so interesting or what I

think in general is interesting about equations is that they are

obviously an extremely important symbolic format, but in many ways

extremely different from the prose we’ve been talking about.

So English or other languages, even languages that are right to left or

something like that, they all have the same kind of basic flow and the

way that we represent sound. So with these little squiggly symbols, even

though the symbols themselves and sounds vary and how we put them

together into words across languages, that’s a common thing. If you go

to the mathematical realm, you have symbolic representation, but

equations are their own beast, and I think one that has gotten a lot

less attention from kind of the software and editing world. So tell us

about that rabbit hole.

00:37:16 - Speaker 1: Yeah, so just as context for people, Notion and

many other applications actually have long supported block equations, an

equation that basically takes up, you know, most of the page

horizontally.

What is much more uncommon in editors is support for inline equations

and so this can be something as simple as saying, you want to type let X

be a variable, and X should be formatted or stylized mathematically.

Being able to refer to elements of a block level equation in inline text

is a prerequisite for being able to do any kind of serious mathematical

writing, yet because this is kind of this niche area that has

historically been the purview of Overleaf and other LaTeX editors,

it’s really not implemented in most editors.

So I pushed really hard to add inline equations and inline math to

Notion, because I was like, there’s a huge opportunity for people to

write scientific or mathematical documents that take advantage of all of

Notion’s other features like being able to embed Figma or embed

illustrations and things like that, right? So, it turns out that it’s

kind of difficult, exactly as you’re describing to do this equation

format.

There’s been very little innovation and research more generally into

what is like a good interface for inputting equations. So I think most

people are probably familiar with how Microsoft Word or Excel have these

equation editors, or even like operating system level sometimes, where

you basically like open this palette, and there is a preview and there

is a button for every possible mathematical symbol or operator you can

imagine. And then for composite symbols like the fraction bar or

integral or something like that, you find the button for that, you click

it, and then you click into like the little subboxes and then you find

whatever symbol you want and you put those there too. So it’s kind of a

structured editor, but like in an unimaginably cumbersome interface.

This is what I used to do my lab reports in high school, for example.

And then at the other end of the spectrum, you have things like

LaTeX. LaTeX is basically how everyone, at least in computer science

and mathematics, chooses to typeset their work, typesets complex

mathematics. One of the real selling points of LaTeX, I think, is that

it turns out that operator spacing is really important, and there’s a

big difference between, say, a dash that’s used like a hyphen or a dash

character that’s used in text, and a hyphen or a dash character that’s

used as a minus sign in an equation, the spacing is subtly different.

And one of the big things that LaTeX does is it basically allows you

to declare certain operations in certain contexts as like a math

operator versus just a symbol versus just like a tagged group of

characters, and it correctly handles the spacing depending on what kinds

of characters are around the operator in question. And so LaTeX

basically produces really nice looking mathematics at the cost of this

markup which looks like I kind of smashed my keyboard that only had

like 3 characters. It’s the exact opposite of the equation editors:

instead of having a button for every imaginable character, you only have

3 buttons. The buttons are backslash, open curly brace, and closed curly

brace, and somehow like permuting those characters is supposed to get

you like any possible mathematical output. Those are just the two ends of the

spectrum.
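
The spacing point can be seen in a few lines of LaTeX. This is a minimal illustrative document rather than anything from the Notion project; it assumes the `amsmath` package for `\operatorname`, and the particular symbols are arbitrary choices.

```latex
\documentclass{article}
\usepackage{amsmath} % provides \operatorname
\begin{document}

% In text the "-" key is a hyphen; in math mode the same key becomes a
% minus sign with binary-operator spacing on both sides.
A well-known result, compared with $a - b$.

% The same symbol declared as a binary operator, a relation, and an
% ordinary symbol: LaTeX spaces each of the three differently.
$a \mathbin{\star} b \qquad a \mathrel{\star} b \qquad a \mathord{\star} b$

% A named operator: upright type, with operator spacing around it.
$\operatorname{rank}(A) - 1$

\end{document}
```

Nearly everything here is built from backslashes and curly braces, which is the "three buttons" end of the spectrum.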

00:40:41 - Speaker 3: Yeah, I used to do my analysis homework in college

in LaTeX, and I remember when I first looked up how you would input

these formulas in LaTeX, I was like, that can’t be right. This is not the

best way in the world to do this. In fact, that’s it, that’s the one

and only way.

00:40:53 - Speaker 1: It really is, it’s terrifying. It’s the one and

only way and the wild part is there are people who are like super, super

good at LaTeX. They can like live-TeX their lecture notes. I was

never nearly like that fast, but some people can do it, usually with

extensive use of macros. Macros are another selling point of

LaTeX, as you can define these kind of custom shorthands for operators you

use a lot. But anyway, yeah, so you have LaTeX sort of at the

other end of the spectrum, like really quite unreadable, oftentimes,

like, it’s like a write-only format, many times.
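
For readers who have not seen LaTeX macros, a minimal sketch follows. The macro names `\R` and `\norm` are illustrative choices, not ones mentioned in the conversation, and the example assumes the `amsmath` and `amssymb` packages.

```latex
\documentclass{article}
\usepackage{amsmath,amssymb} % \lVert, \rVert, \mathbb
% Custom shorthands for notation that is typed over and over.
\newcommand{\R}{\mathbb{R}}             % the real numbers
\newcommand{\norm}[1]{\lVert #1 \rVert} % norm of its one argument
\begin{document}

Let $x \in \R^n$ with $\norm{x} \le 1$.

\end{document}
```

Shorthands like these are part of what makes fast note-taking feasible, and also part of why someone else's source can be hard to read.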

00:41:23 - Speaker 2: And of course regular expressions come to mind on that

as well, yeah.

00:41:26 - Speaker 1: It’s exactly the same zeitgeist, I think. It

turns out that figuring out how to have like a combination GUI, plain

text interface that allows you to be like in a rich text editor like

Notion, then go into an inline equation field to have like an inline

symbol and then go back into the GUI editor was like just very

unexplored territory.

And it kind of makes sense that lots of people don’t prioritize this

because many people at Notion rightfully had the question like, oh, is

this something we should be working on? But first of all, it turned out

that if you actually tallied up like our user requests, inline math was

like near the top of editor feature-based requests. And then more

generally, it turns out

that because this is like a prerequisite for many researchers and for

students, you can get a lot of people on your platform who rely on it,

you know, as a student to take notes and something like that, because

there’s literally no alternative. And then they are able to stick

around and use the platform for all kinds of other things.

So this is just kind of a plug that more editors should implement this.

But yeah, I thought that this project was really interesting because in

the interaction paradigm, you want to capture a lot of the things that

are very fluid about editing regular text. So for instance, we knew it

was important that you should be able to use the arrow keys to move left

and right, kind of straight through a token without editing it if you

wanted, or if you wanted to be able to go into a token and edit it

using the arrow keys, you shouldn’t have to like use the mouse to

click, although, of course, you should also be able to use the mouse to

click. And when you have this formatted equation, we made the decision

that the rendered equation would be represented as this atomic token. So

if you were highlighting text to copy and paste and move around, it

would be like highlighting a single character that would just be like

the whole equation. But of course, you could go in and edit the

equation any way you wanted in kind of this pop-up text editing

interface.
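
One way to picture the "atomic token" decision is the Python sketch below. It is a hypothetical model, not Notion's actual data structure: a line is a list of tokens, each text character is one token, and a whole equation is also exactly one token, so the cursor and selection treat it like a single character.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Equation:
    """An inline equation: many characters of source, one cursor stop."""
    source: str


Token = Union[str, Equation]  # a str token is a single text character


def build_line(before: str, equation_source: str, after: str) -> List[Token]:
    """Text, then one atomic equation token, then more text."""
    return list(before) + [Equation(equation_source)] + list(after)


def move_right(line: List[Token], cursor: int) -> int:
    """Right arrow: step over exactly one token, equation or character."""
    return min(cursor + 1, len(line))


def copy_source(line: List[Token], start: int, end: int) -> str:
    """Selected tokens as plain text, equations wrapped in $...$."""
    parts = []
    for token in line[start:end]:
        parts.append(f"${token.source}$" if isinstance(token, Equation) else token)
    return "".join(parts)


line = build_line("Let ", r"\hat{\mathsf{x}}", " be a variable.")
cursor = move_right(line, 4)        # one key press crosses the equation
print(cursor, copy_source(line, 4, cursor))
```

Editing inside the equation would then be a separate mode that rewrites `source`, which matches the pop-up editor described above.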

I think another thing that’s a subtle interface challenge here is

that, like Mark was saying, there is often a disproportionately large

number of characters used to represent the equivalent of like one

character with a formatted output. And so that’s something you don’t

really take into account. The output is like X with a hat in a sans-serif

font, and then there’s like 25 characters of markup that goes into

that, and you just need to like scale the interface appropriately to

take that into account.

But I think that it’s really interesting because it shows the power of

combining different input and output formats in like the same atom,

right? So you have like a single line of text, and you want to have rich

text that’s formatted and stylized and so on, hyperlinks, and then also

equations or whatever inline rendered output of another input format

that you have. I think that that’s really where GUI editors and WYSIWYG

editors can shine, is being able to combine these like input formats

and output formats like in the same line. And yeah, I guess you

can’t really do that at all with the terminal or something like that,

and I say this as someone who uses like CLI Vim for everything.

00:44:34 - Speaker 3: This is bringing back so many memories. I wish I

had Notion with equation support back when I was a math undergrad. It’s

so nice.

00:44:41 - Speaker 1: I’m like the Notion math stan guardian, I don’t

know, something like that. And I’m always keeping track of like all the

cool things people are doing using equations in Notion.

A lot of people are doing like math blogs in Notion, which is really

awesome for me to see. Also, I just feel like, having tried lots

of other things, there just like really isn’t a good alternative

short of like actually writing LaTeX for your blog, which no one

really likes. And yeah, I mean, certainly it’s the kind of thing that I

implemented originally, kind of, I was like, I’m gonna do this for

myself, and then realized that lots of people would be able to benefit

from it.

It’s been really cool to see the reception it gets, like the

inline math tweets on the Notion Twitter account overwhelmingly get

the most engagement and interaction.

And initially, like the marketing team was shocked. They thought this

would be this super niche feature, but no, it turns out that people love

math and like, they may not be the most vocal proponents or they’re

used to no one caring about math typesetting, things like that.

For a while, I think it was the case that when I did find an editor that

had support for equations of some kind, to me, it was overwhelmingly

obvious that the people who implemented it did not regularly use

equations for writing. I think you can often tell that with different

features. So I think that having that kind of representation is not

quite the right word, but being able to see a feature that was designed

by someone who really cares about using it themselves is really cool for

people who are interested in typesetting, students, researchers, people

who are interested in typesetting more mathematical text.

00:46:11 - Speaker 3: Yeah, and I think it’s really important, like you

were saying that it’s mixed media because you’re combining the

equations, the inline equation and the block equation, by the way, in

the world-class form, which is LaTeX-based, with a world-class rich

text editor with text and images and stuff. It’s really nice. I do

think there’s still one frontier here, especially for math, which is

the fully gradual process from you’re taking handwritten notes and

you’re working out a problem and you’re drawing squiggly diagrams all

the way up through your finished homework. I remember when I was a math

undergrad, I would basically have to do the homework twice. You do it

once on paper. Nobody could read that, including myself, so then, you

know, do it in LaTeX again. And I always wish there was a way to do it

incrementally. You sort of change equation by equation and diagram by

diagram into the final product. And I know there has been some research

on turning equations into LaTeX formulas with machine learning. I

don’t know if it can do handwriting, but perhaps someday we’ll get the

new support for equations and you can go all the way to the end.

00:47:02 - Speaker 1: Yeah, like you, I share exactly the same

frustration that you have to essentially do lots of things twice, and

the relative position of everything is ambiguous, and LaTeX is what

allows you to do things like have subscripts of subscripts, which would

be really inscrutable in most people’s handwriting, including my own,

and, you know, subscripts of subscripts along with superscripts and

things like that. There are just so many ambiguous details and it turns

out in my experience with like, anything that tries to automate the

transition is that I always end up going through and like really

rewriting all of the details to be structured in a readable way.

You have this other problem which back in the days of like WYSIWYG web

editors like Dreamweaver and Microsoft FrontPage and things like that,

you would often end up with this problem where you try to do like any

edit in the WYSIWYG side and then you look at the generated HTML and

it’s ridiculous. There’s just like 16 nested empty span tags, and no

one would ever be able to maintain that.

And my worry is basically that when you automatically create markup for

something that has a very complex graphical representation, it’s really

like one way, you know, maybe it will help you produce a compiled

output, but it doesn’t actually help you go back in and like edit and

tweak the representation later or it’s just so inscrutable if you do

that it’s kind of also a regex type situation.

I think we really need to get to some kind of like good intermediate

representation that allows you to flexibly go both ways.

And that goes back to something that I think Adam and I were chatting

about earlier, which is that a lot of people gripe and complain that

like LaTeX is the best we have and, you know, I’m one of them, but

it really is the case that, you know, LaTeX was just this like

monumental effort by really a few people, an amount of effort that would

be like considered really impressive if I were to try to do the same

thing but better today, and not a lot of people just have like spare time

to do this all-in-one text formatting, packaging, document

representation project, even though it would have huge impact on the way

people write and publish these kinds of documents. And so in many ways

we’re sort of just bottlenecked on the fact that it’s hard to do

incremental improvements to this particular area. We really depend on

these like software monoliths to keep us afloat.

00:49:19 - Speaker 2: I’m not nearly as mathy as either of you, but I

can’t help but make the comparison on these equation editing to what

you mentioned earlier with kind of structured editors and programming,

where whether there’s lightweight help from your text editor, things

like code folding, syntax highlighting and autocomplete, or full

structured editing, some of the visual programming stuff we talked about

with Maggie Appleton, like Scratch, for example, or these flow based

systems that are fully graphical and you sort of can’t have it in a bad

state. And I can’t help but to think there might be some direction like

that that is not necessarily the write-only inscrutable LaTeX, but is not

the Microsoft Word one button literally for every symbol you might ever

want.

It does seem like there might be some other path, and yeah, I agree

it’s a monumental effort, but I mean, mathematics is so important and

foundational to so much of human endeavor that certainly seems like one

worth investing in, although perhaps hard to reap a profit from, and

that makes it harder to put concentrated capital behind it.

00:50:20 - Speaker 1: Yeah, I think that there’s definitely very clear

demand for I think something exactly like what you’re describing, which

is somewhere in between the two extremes, and it is really relevant

because ACM, which is the Association for Computing Machinery, the

academic and professional body really for computer science, they are

currently undergoing this fiasco, maybe, I probably shouldn’t go on the

record as calling it a fiasco.

The ACM is currently undergoing this initiative called TAPS, which is

The ACM Publishing System, where they are attempting to revise the

template by which all computer science research is published and

disseminated, and the idea behind this is that right now, computer

science research is published to these PDFs. Initially they were all two

column PDFs, now I think there’s some one column PDFs. They want to

output HTML as the archival format for various reasons, including that

it offers much better reading experience on different screen widths, so

like phones or tablets, which are increasingly how people are reading

papers, not just printed out. And they are much more accessible than

PDFs. PDFs are just like really quite inaccessible, especially to screen

readers and other assistive technologies that are trying to parse out

all the different math or whatever arbitrary formatting you’ve decided

to use. The upshot of this, I guess, is that there are currently a group

of very smart people who are trying to figure out how in the world

we’re going to get people to start writing all of their papers and

outputting them in a different format, in a world where everyone is

already used to preparing their publications and preprints in LaTeX.

And turns out that even if you solve the problem of like what the input

syntax should be, rendering math in the browser is like an extremely

unsolved problem.

00:52:05 - Speaker 3: Yeah, isn’t the state of the art that it like

generates PNG and sticks it in the web page?

00:52:09 - Speaker 1: Not exactly, but like almost. OK. So MathML, which

is like an XML dialect or like mathematical markup language, was this

effort to build an HTML/XML-style syntax for typesetting mathematics.

Naturally, it is only implemented in Firefox, so that’s really

unfortunate. So in terms of the state of the art, there are basically

two libraries that you can use to typeset mathematics. There’s

MathJax and KaTeX.

MathJax supports basically all valid LaTeX, including, you know,

different environments and equations and things like that.

The problem is that MathJax is very slow. So if you ever go on

MathOverflow or another like related Stack Exchange and you see like all of

these answers with like weird gaps, and then as you watch before you,

the page starts to like load all of the rendered equations like bumping

everything down one level at a time. That’s MathJax in action.

And oftentimes it is doing what you’re describing where it is

outputting like an SVG or a PNG or something like that, and it’s just

like reflowing the page with every equation.

So then you have KaTeX, which was a library developed at Khan Academy

where they realized that MathJax’s performance was basically just

like not satisfactory for their exercises and things like that. So KaTeX

supports a much more limited subset of all of LaTeX syntax, but it

does it all using CSS basically, and it doesn’t reflow the page for

every equation. It’s basically instant to render.

So KaTeX is what we use at Notion, it’s also what’s used in like

Facebook Messenger, which supports equations if you ever tried that, and

many other websites, and basically it means that your options, if you

want to render math, are: only target Firefox; use a limited subset of

math that’s supported by KaTeX; or consign yourself to like extremely

slow, dozens of reflows, full expressive power rendering to inline

PNGs.

And so that’s just not like a great situation to be in, and we haven’t

even gotten to the question of like how people write math. So I would

say that people underestimate like how open this problem space is.

00:54:17 - Speaker 3: Yeah, man.

00:54:19 - Speaker 1: Just take a moment of silence to like recognize

the gravity of the situation.

00:54:23 - Speaker 3: This is an aside, I don’t know if you want to put

this in the episode, but now I’m curious. It sounds like both of those

are interpreted in the sense that the equations are rendered at load

time instead of being compiled down to some like HTML and CSS that you

can render without JavaScript. Like, basically, do you need JavaScript

to render these pages?

00:54:39 - Speaker 1: Yeah, basically, I should say you also need

JavaScript, unless you’re precompiling to MathML and then hoping

that people are using Firefox.

00:54:47 - Speaker 3: Man, I feel like there’s no way that that stuff

loads in 10 years, but we’ll see.

00:54:52 - Speaker 1: I actually had this exact argument, again, I

don’t know if you want to put this in the episode.

I had this exact argument with Jonathan Aldrich, who’s on the TAPS

committee when we were talking about this, and I think the point was not

so much that you can guarantee that the artifact loads exactly the same

way in 10 years, but that the representation is rich enough that one

could feasibly build software that renders it the same way in 10 years.

So it’s more about the fidelity of the like underlying representation

where like a team of, I guess, digital, you know, archaeologists could

recover the work that we were doing and not so much like we trust in the

vendors to like keep everything stable, which is obviously never going

to happen. You know, the only reason like PDFs are stable is because how

many trillions of dollars of IP depend on being able to load the PDF the

same way as it was written, you know, 30 years ago.

00:55:45 - Speaker 3: Yeah, interesting.

00:55:46 - Speaker 1: Nice. Going back to this idea earlier that Mark

mentioned of the spectrum of like plain text, rich text, WYSIWYG

editors.

One recurring theme for me is thinking about decoupling this spectrum

into like what is the format and then what are like the editors and

tools that we can use to interact with this format, so, structured,

unstructured, etc.

I want to call out Bear, which is a native application for macOS and iOS

that does a really great job with this, which is that Bear is basically

something in between a WYSIWYG and a plain text editor in that

you’re always editing markdown documents and indeed, when you have

something that’s bold, you can see the like asterisks around it that

delimits that character.

But all of these standard, you know, Control B, U, editor shortcuts work

as you would expect.

And more importantly, you can see like the formatting applied in real

time.

So that when you do star star, hello, star star, it suddenly becomes

boldface in this GUI.
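
The behavior described here, styling text in place while leaving the markdown delimiters visible, can be sketched in a few lines of Python. This is only an illustration of the idea, not how Bear is implemented; the regular expression handles just the simple `**bold**` case and the name `style_runs` is invented.

```python
import re

# Simple case only: a non-greedy run of characters between two ** pairs.
BOLD = re.compile(r"\*\*(.+?)\*\*")


def style_runs(text):
    """Split text into (substring, is_bold) runs.

    The ** delimiters stay inside the bold run, so the source characters
    remain visible and editable while still being drawn in bold.
    """
    runs, last = [], 0
    for match in BOLD.finditer(text):
        if match.start() > last:
            runs.append((text[last:match.start()], False))
        runs.append((match.group(0), True))
        last = match.end()
    if last < len(text):
        runs.append((text[last:], False))
    return runs


runs = style_runs("say **hello** world")
for substring, is_bold in runs:
    print(repr(substring), "bold" if is_bold else "plain")
```

Because the runs concatenate back to the original string, the document on disk is still plain markdown; only the drawing changes.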

And so in many ways it combines like the fluidity and the real-time

preview of a rich text editor or previewer with the flexibility of like

ultimately just writing plain text characters. And I think this is like

a really unexplored area.

I don’t just mean something like open VS Code or Vim and type

characters and then see like different formatting labels attached to the

results.

I mean like a native application that’s really designed like for end

use or end users, that doesn’t fully obscure the input syntax but does

real time rendering in place.

It’s not even like in monospace font, right? It makes it feel much more

like this is actually the output that you’re targeting. And not just

like an input step that needs to be pre-processed. I think that there is

a lot of room for applications that are kind of in between and in that

same space, where it doesn’t entirely obscure what you are writing, but

it does give you a lot of the benefits of previewing things and having

like a GUI application outside of the terminal in terms of like

capturing the richness of the possible results.

00:57:52 - Speaker 3: Yeah, I like the Bear approach a lot. Now, are

there particular domains or types of documents that you think would be

susceptible to this approach, or is it just for rich text specifically?

00:58:01 - Speaker 1: So I was making a list of like all of the

different traditionally graphical outputs that have corresponding plain

text representations and a lot of them I was thinking about, for

example, in engraving sheet music, right, traditionally you would use a

desktop program like Finale or Sibelius. Nowadays you have options like

MuseScore and Flat, which are more web-based editors, but you see the

staff and you click notes in the staff, like corresponding to where you

want the note, and you know you use the quarter note or the 8th note

cursor to pick the duration and so on.

And then at the other end of the spectrum you have LilyPond, which is

kind of like LaTeX, I guess, for engraving sheet music, where you type a

very like LaTeX-esque syntax and out comes, you know, beautifully

typeset sheet music. For me this is like a little bit too Gnu edgy,

just because when I think of like composing music, I’m very much

thinking about like what the staff looks like, just to be able to

visualize chords and counterpoints and things like that.

But I think the upshot is that like you could very easily have something

in between where you have like a text-based or non-binary representation

of like a piece of music or a composition, and then you can edit it

either using like the text editor or using the structured editor of an

existing WYSIWYG GUI, like composition software, or notation software

rather, and edit the same representation both ways. And then likewise,

you have diagram generation. This is an area that’s been a real

pain point for me historically because you can basically do something

like really low fidelity, like sketching on paper, but then if you

don’t want to like take a picture and upload it to whatever document,

right? All of the options are like very high fidelity, like there’s

OmniGraffle and Whimsical and Figma, which is even more involved, where

you get all of these nice things like lots of styles and force-directed

layout and so on and so forth, but it’s like quite cumbersome to input

a diagram that you sketched in all of 30 seconds into OmniGraffle in its

full glory. And then you have, on the plain text end of the

spectrum, software like Graphviz, or TikZ for LaTeX. Things I really

like are Mermaid, which is a Markdown-type syntax for quickly generating

diagrams. There’s Svgbob, which is incredible. It basically lets you

turn ASCII art into formatted SVG. Though, as a brief aside, I don’t

actually know what problem this is solving, aside from being incredibly

cool, because at least for me, I consider myself someone who’s like

fairly artistic, and it takes at least as much effort to figure out how

to make a really nice ASCII art like thought bubble as it does to figure

out how to actually like do the SVG. I’ve always really wanted an

intermediate representation for diagrams where you can edit it either on

the text end using something like mermaid to do really fast prototyping

for a flow chart or something like that. And then if I wanted to have

more precision and control, I could also pull it into software like

OmniGraffle or Figma and make fine-grained tweaks, get like my nice

force-directed layout or control where individual nodes were if I want fine-

grained control over positioning, things like that. I guess I think there

are lots of different areas outside of just traditional documents that

are ripe for an editor or a representation that learns some things from

the plain text approach and some things from the WYSIWYG approach. I

think that we are, yeah, we’re getting close to being able to explore

those, but I would love to see more work in this area.
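
As a rough illustration of the intermediate representation idea described above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Diagram` class, the `parse_edges` helper, and the one-edge-per-line `A -> B` input syntax are hypothetical and do not come from Mermaid, OmniGraffle, or Figma. Only the emitted `graph TD` text follows Mermaid's flowchart syntax. The same object can be built from quick text input and then adjusted node by node, the way a graphical editor might.

```python
from dataclasses import dataclass, field


@dataclass
class Diagram:
    """Hypothetical shared representation for a diagram."""
    nodes: dict[str, str] = field(default_factory=dict)  # node id -> label
    edges: list[tuple[str, str]] = field(default_factory=list)
    # Optional manual layout, as a graphical editor might record it.
    positions: dict[str, tuple[float, float]] = field(default_factory=dict)

    def to_mermaid(self) -> str:
        """Emit the structure as Mermaid flowchart text (layout is not included)."""
        lines = ["graph TD"]
        for node_id, label in self.nodes.items():
            lines.append(f"    {node_id}[{label}]")
        for src, dst in self.edges:
            lines.append(f"    {src} --> {dst}")
        return "\n".join(lines)

    def move(self, node_id: str, x: float, y: float) -> None:
        """Simulate a direct-manipulation edit: pin one node to a position."""
        if node_id not in self.nodes:
            raise KeyError(node_id)
        self.positions[node_id] = (x, y)


def parse_edges(text: str) -> Diagram:
    """Build a Diagram from lines of the (made-up) form 'Source -> Target'."""
    diagram = Diagram()
    ids: dict[str, str] = {}  # label -> generated node id
    for line in text.splitlines():
        if "->" not in line:
            continue  # ignore blank or unrelated lines
        src_label, dst_label = (part.strip() for part in line.split("->", 1))
        for label in (src_label, dst_label):
            if label not in ids:
                ids[label] = f"n{len(ids)}"
                diagram.nodes[ids[label]] = label
        diagram.edges.append((ids[src_label], ids[dst_label]))
    return diagram


if __name__ == "__main__":
    sketch = parse_edges("Idea -> Draft\nDraft -> Publish")
    sketch.move("n1", 120.0, 40.0)  # a fine-grained tweak made "visually"
    print(sketch.to_mermaid())
    print(sketch.positions)
```

The design choice in this sketch is that the text edit and the positional tweak both mutate one structure, so neither kind of edit has to overwrite the other.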

01:01:40 - Speaker 3: Yeah, this is very interesting. One challenge here

I think is with plain text and rich text, the structure of the text and

structure of the final output are going to be pretty close.

And so that makes it most feasible to have the thing where you’re

seeing both worlds superimposed with the double asterisks on both

sides and the bold text, for example. With something like a diagram, if

you were to represent a diagram in just, like, Markdown-like input, it

would be a complete mess. It would bear basically no resemblance to the final

output. It would just be like a string of really opaque characters, and then it

would compile out to a nice graph, but it’s kind of hard to go back and

forth because of that.

One way to combine these two worlds would be to invoke the command

palette metaphor that we see emerging so often, as you can imagine, OK,

you’re editing a score or you’re editing a graph. And instead of having

1000 buttons around the edge of your screen like you do with these

typical applications, the only interface is you can click on stuff and

then you can type stuff in the command palette. So you click up where you

want to add a note and you say like B, you know, BQ, and it puts in the

B quarter note and so on. And similarly with graphs, you could click on a

node and you could invoke little commands with your text editor or

perhaps edit the little node locally represented as a little text box.

That’s kind of a way to bridge this issue that a pure text representation

would have no obvious correspondence to a 2D or a 3D image, but if you

have some way to get more local nodes, it could work well.

01:03:00 - Speaker 1: Yeah, definitely.

01:03:01 - Speaker 2: And the thing that brings to mind for me is our

oft-cited favorite tool for thought, which is the spreadsheet where you

do have this, it’s a very, very simple version of that, this 2D layout,

but in fact you do click on cells and type in symbolic there, so you are

mixing a visual spatial layout, a very lightweight one with some

symbolic representation.

01:03:23 - Speaker 3: Spreadsheet remains undefeated.

01:03:25 - Speaker 1: One thing that I find really interesting about

spreadsheets, that’s I think often very unexplored, is that many

applications like Airtable (Notion is also very much guilty of this)

ask how to capture like the power of the spreadsheet as like a relational

database, or like what happens if we impose better structure onto the

different columns and things like that, but there’s like a separate,

totally untapped, underexplored area of spreadsheets, which is that

it’s basically this canvas, right? Spreadsheets capture everything that

people liked about table-based layouts in HTML with none of the stigma

associated with it. And so you can create these like really complex

interfaces that basically just do data and things like that, and put

things, be like, OK, I’m going to like copy this data and bring it over

closer to where I’m working now so I can reference it more easily.

It’s basically just this grid, right? And that’s totally unstructured.

It doesn’t correspond to any kind of relational format, but it’s also

a really powerful computation paradigm.

01:04:20 - Speaker 3: Yeah, totally. I think people really love to be

able to click somewhere and put stuff there. And a lot of spreadsheet

use is just that. They just want to click there and put text or put a

color, and there’s no formulas at all. And by the way, this goes back

to our idea of convergence of the Office document types. I see people

using Figma for this a lot, like they’re not designers, they’re not

designing interfaces. They want to click and put pictures on a 2D canvas,

and they want to click and put text there and you could see a sort of

continuation of this world where these things continue to merge as the

software gets more sophisticated.

01:04:49 - Speaker 1: Yeah, and then on the subject of diagrams real

quickly, I remember that I want to mention Sketch-n-Sketch, which is

this project by Brian Hempel, Justin Lubin, Ravi Chugh at University of

Chicago from a couple of years ago, and the idea there is you have

direct manipulation programming for SVG.

So you have this editor, and then on the left side you might

see the code that outputs a certain SVG, and on the right side you see the

SVG itself, and you should be able to do things like directly go in with

the mouse, click an anchor point and drag it somewhere or do other kinds

of transformations that people are used to when SVG editing, and it

should obviously be reflected in the output, but also change the code

that goes into it, and then you can make changes to the code and it will

modify the output.

I think this is one of the most successful examples I’ve seen of an

editor that actually manages to keep this bidirectional linkage working

and when you make manual edits with the direct manipulation edits with

the cursor, it doesn’t totally botch your code and when you make

changes with the code, it doesn’t lose all of your edits with the

visual side. I think it would be great to see like more things like this

for more structured areas like diagramming or things like that.

01:05:59 - Speaker 3: So many research projects to do.

01:06:02 - Speaker 1: Yes, lots to do.

01:06:04 - Speaker 2: So Slim, I see a recurring theme in how you think

about all of this, whether it’s equations, prose, rich text, musical

score, or diagrams, is this intermediate format concept, and maybe like

a straw man or an outside view might come at this thinking, well, being

able to see something like a markdown is sort of exposing plumbing that

nerdy programmer types might like, but the reason we invented what you

see is what you get word processors, whatever, 40 years ago or whatever

it was, was to potentially liberate us from that.

But I see that you see the future is not one where those go away.

We want to expose that. There’s some value to that separately from a

fully visual 100% mapping the rendered output and the way you edit it

looking precisely

the same. So I think that illuminates somewhat how I would imagine

you would answer the question I was going to ask you about the future,

but with that in mind, I’ll basically say, yeah, if you look forward,

say 5 or 10 years to what advances either have happened or that you hope

to see happen in terms of how rich text works on our computing devices,

what does that look like?

01:07:16 - Speaker 1: Yeah, I think it’s exactly like you were

describing, we originally had this idea that you would be able to get a

WYSIWYG editor or something like Microsoft Word and totally decouple

yourself from this underlying representation. I think that works up

until the point where you have lots of different output formats or

different ways of viewing the document that people would like to use.

And as soon as you are in a world where even something like, let’s say

I want to have two different views in a GUI application, all of a sudden

it becomes much more beneficial to have some kind of intermediate format

so that you don’t have to do like N times different renderers and

parsers and compilation pipelines for all of these internal things.
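
The point about avoiding a separate pipeline for every pairing can be made concrete with a small Python sketch. This is an illustration under assumptions, not how any editor mentioned here works: the `Span` type and the three `render_*` functions are hypothetical names, and the "document" is just a flat list of text runs with a bold flag. Each output format needs one function that reads the shared representation, so adding a view means adding one renderer rather than one converter per existing format.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Span:
    """Hypothetical intermediate unit: a run of text with one style flag."""
    text: str
    bold: bool = False


def render_markdown(spans: list[Span]) -> str:
    """Plain-text view that keeps the input syntax visible."""
    return "".join(f"**{s.text}**" if s.bold else s.text for s in spans)


def render_html(spans: list[Span]) -> str:
    """Rich view for a browser; text is escaped before it is wrapped in tags."""
    return "".join(
        f"<strong>{escape(s.text)}</strong>" if s.bold else escape(s.text)
        for s in spans
    )


def render_plain(spans: list[Span]) -> str:
    """Unstyled view, for example for search indexing."""
    return "".join(s.text for s in spans)


if __name__ == "__main__":
    document = [Span("Say "), Span("hello", bold=True), Span(" & wave")]
    for render in (render_markdown, render_html, render_plain):
        print(render(document))
```

With N input formats and M views, converting directly between each pair needs on the order of N times M converters, while going through one shared representation needs roughly N parsers plus M renderers.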

01:08:01 - Speaker 2: So a simple example of that: earlier you

mentioned reading academic papers on different size screens, you

know, a phone versus a desktop versus a printout, that even just the

basic reflow of the text, simple as that seems to a narrower or wider

screen, actually is pretty complicated. There was an approach of

designing for several different screen sizes, but now we know that

that’s not very futureproof and doesn’t work the way we want. And so as

soon as you have anything that’s even slightly dynamic, even something

as simple as text reflowing, that’s the place where you think an

intermediate format is necessary.

01:08:36 - Speaker 1: Yeah, exactly, like it’s not tractable to design

like a phone version of the website for every possible phone and then

like a tablet version for all the tablets and then a desktop version.

And also like a projector version, things like that.

So the layout and the appearance is driven by the content itself. And I

think that there’s an idea of that for outputting a paper.

If you’re thinking about outputting another artifact like a diagram or

something, I think there are situations where it’s really useful to be

able to do standard direct manipulation diagram editing and then also

situations where it’s really useful to be able to like select all of

the text that corresponds to a certain subgraph and just like move it

somewhere else and allowing people the flexibility of choosing between

those different edit options, depending on what task they’re trying to

perform, what problem they’re trying to solve is like a really big area

of opportunity.

So I think like, we’re still at a stage where with all of these

different new editors like Coda or Notion or even Bear, Craft, editors

are still like very much borrowing from each other a lot and

periodically striking out in the direction of like, here’s a new kind

of block or a new kind of like cell or type of text that you can have,

and I think that while we’re still in the stage of like

feature churn around what are the editing primitives that people care

about, what things go in a document, it’s going to be hard to develop

any kind of like unifying framework or IR for these documents to work

together.

I’m hopeful that once we reach a scenario where there’s a little more

stasis and maybe more overlap in the capabilities and interests of

different editors, you could have like this intermediate platform that

extends from things like Roam to Notion or Notion to Airtable or

something like that for the components that make sense to go into those

other platforms, and then you can actually really flexibly move your

data around between these areas and likewise, within applications, maybe

you want to be able to start off with something really low fidelity and

gradually get something higher fidelity, like it would be really nice to

have a slider, almost, that allows you to move up and down the ladder of

abstraction, but failing that, like an intermediate tool that you can

plug in and be like, OK, I want to take this like bullet list of to do

items and upgrade it into a database for like things or something like

that. Something that’s more plug and play that also handles structured

data in the same way we have a tool like Pandoc for text. That’s what

I’m really excited to see because I really think of rich text as slowly

expanding to include all the things you might want to have in a

document, which might include embedded views of other databases or

things like that. So just having a more expansive interpretation of rich

text that is less constraining with respect to the kinds of artifacts

that you can produce, allows you to combine more things together, has

like a notion of structure that enables these kinds of really powerful

edits like re-parenting an entire subtree, while also allowing you to do

things like select a linear region of text and copy it somewhere else. I

think that’s kind of the direction we’re moving in, where we combine a

lot of flexibility of plain text editors that we’ve seen to date with

some of the power of having more structure.

01:11:51 - Speaker 3: It’s a pretty exciting future.

01:11:53 - Speaker 1: Yeah, let’s hope we get there.

01:11:54 - Speaker 2: Well, let’s wrap it there. Thanks everyone for

listening. If you have feedback, write us on Twitter at @museapphq. You can

reach us on email, hello at museapp.com. You can also leave us a review

on Apple Podcasts. And Slim, thanks for your drive and passion for all things text.

In fact, my mind has been expanded on what we would even

think of text as being and what these intermediate formats can do for us

in the future. So I’m really excited that you’re on the forefront of

this and pushing forward our tools.

01:12:29 - Speaker 1: Yeah, thanks so much for having me. This was great

to talk about.
