How to check if createMemo should be used?

I'm trying to figure out whether using createMemo is worth it. I know that for super basic equality checks there is probably no point, but what about something slow, like code highlighting? What I don't understand is what happens when my content changes. I think the whole function needs to be re-run, so the memo doesn't help. Or does it, still? Also, what about the case when this whole component is just an item in a <For> loop? I'm parsing the markdown like this:
const tokens: TokensList = processor.lexer(input)

return tokens
  .filter((t) => t.type !== 'space')
  .map((t) => {
    if (t.type === 'code') {
      return { type: 'code', content: t.text, lang: t.lang }
    } else {
      const html = processor.parse(t.raw)
      return { type: 'html', content: html }
    }
  })
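For reference, the CodeDisplay component later in the thread types its prop as Extract<MarkdownToken, { type: 'code' }>, but the union itself is never shown. A plausible reconstruction from the mapping above (the type name and the guard helper are my assumptions, not from the thread):

```typescript
// Hypothetical union reconstructed from the mapping code above;
// the thread never shows the actual MarkdownToken definition.
type MarkdownToken =
  | { type: 'code'; content: string; lang?: string }
  | { type: 'html'; content: string };

// narrowing helper matching the Extract<MarkdownToken, { type: 'code' }> prop
const isCodeToken = (
  t: MarkdownToken
): t is Extract<MarkdownToken, { type: 'code' }> => t.type === 'code';
```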
So I have a lot of small tokens which are processed in a <For> component, like this:
const tokens = getMarkdownTokens(props.block.content)
return <For each={tokens}>{(tok) => <TokenDisplay token={tok} />}</For>
export const CodeDisplay = (props: { token: Extract<MarkdownToken, { type: 'code' }> }) => {
  const highlighted = createMemo(() => {
    if (props.token.lang && hljs.getLanguage(props.token.lang)) {
      return hljs.highlight(props.token.content, { language: props.token.lang }).value
    } else {
      return hljs.highlightAuto(props.token.content).value
    }
  })

  return (
    <pre {...stylex.attrs(styles.base)}>
      <code innerHTML={highlighted()} />
    </pre>
  )
}
So what is happening here exactly with Solid? I have a changing Markdown input -> a changing array of parsed tokens -> potentially a lot of possible code reuse for syntax highlighting. Is createMemo helping here or not? How can I check if it helps? By putting a console.log inside createMemo and checking that it doesn't trigger?
hyperknot (OP) · 3d ago
What I'm suspecting is that the new array coming from my source function "throws out" all the optimisations Solid does in the For loop, including the memo.
zulu · 3d ago
In the markdown case, I am guessing that every time you change the input and the markdown produces the tokens, your <For> will re-render all the tokens, because there is no referential identity, so I don't think it will be an optimal update. My second guess is that the memo will not help in this case. The only thing that could have helped is if you could somehow update your list of tokens in a way that doesn't completely replace the whole token list. Even then I am not sure how optimal that would be, but probably better than what you have now.
Madaxen86 · 3d ago
createMemo holds the computed value until a dependency changes, so it only makes sense when you access the value (call the function) multiple times. In your example you call it only once in the For component, so createMemo does not optimise anything there. BTW, For does memo each item, so it supports granular updates.
hyperknot (OP) · 3d ago
So createMemo doesn't use a global input->output cache, it uses a locally scoped cache, right? So basically, because my loop generates a new array each time, there is no way createMemo would remember what has happened, right? Basically the solution is to create my own cache, which I manually reset, right?
Madaxen86 · 3d ago
Hm, I don't know exactly what you mean. But you can see if createMemo reruns by placing a simple console.log inside, so you can see if there's a difference between a regular function call and createMemo. Maybe you can move the logic of the tokens.map inside the For so you don't recreate a new array on every change. Something like:
const tokens: any[] = [];

return (
  <For each={tokens}>
    {(t) => {
      return (
        <Switch>
          <Match when={t.type === "code" && t}>
            {/* preprocess code logic in component */}
            {(code) => <CodeDisplay token={code()} />}
          </Match>
          <Match when={t.type !== "space"}>
            {/* preprocess html in component */}
            <HTMLDisplay token={t} />
          </Match>
        </Switch>
      );
    }}
  </For>
);
hyperknot (OP) · 3d ago
I mean, what I need to do is make my own memo, independent of the component, I think.
// wrapped in a factory so the return statement has an enclosing scope
const makeHighlighter = () => {
  const cache = new Map()
  return (content, lang) => {
    const key = `${lang || 'auto'}-${content}`
    if (cache.has(key)) {
      return cache.get(key)
    } else {
      let result
      if (lang && hljs.getLanguage(lang)) {
        result = hljs.highlight(content, { language: lang }).value
      } else {
        result = hljs.highlightAuto(content).value
      }
      cache.set(key, result)
      return result
    }
  }
}
bigmistqke · 3d ago
Yes, a memo is not a cache. It's simply a computation that does (by default) a shallow equality check against the previous computation. Your cache idea will work, but it will still be slow with big text sizes. vscode-textmate has a tokenizeLine which allows passing a previous context to it, so that all the lines before that line do not have to be recalculated. I was playing around with syntax highlighting a while back too; here you can see that tokenizeLine in action.
hyperknot (OP) · 3d ago
Thanks! I'm thinking that in a chat response there are multiple code snippets of max. 200 lines, for example. The last one is constantly being rewritten / re-colorized, but all the ones before it should be cacheable.
bigmistqke · 3d ago
Yes, but you can't just highlight each line individually; you will need to pass the context of the previous lines to it.
hyperknot (OP) · 3d ago
For how to speed up the currently extended code block, I have no idea. I think it needs a full refresh, or possibly some token-level optimisation in the highlighting library. You are right, IDEs probably had to figure this out in a super optimised way.
bigmistqke · 3d ago
Ah, I misread this sentence; you mean caching the whole code block. Yes, by saving the previously tokenized lines as a stack of tokens; that's how vscode-textmate does it. You can get really optimized with that: https://pota.quack.uy/Reactivity/mutable-tests (this is using my tm-textarea under the hood; oof, editing of that file is not so great, lol). But you are right, for max-200-loc snippets it's a bit unnecessary. If there are no signals updated in the memo, it should not recalculate the memo either.
peerreynders · 3d ago
1. That type of caching assumes that your tokens are value objects, i.e. token equality is based entirely on the prop values of the token, not its identity. For certain types of analysis, where a token appears in the list relative to the tokens before or after it helps to establish its identity. Now, if you can guarantee that every token in your list will always be unique (i.e. tokens with the same properties cannot appear in multiple places within the same list), you should be OK. However, if separate tokens with identical properties can appear in multiple places, you'll likely confuse the hell out of the For, because you will replace the separate occurrences with the same identical reference. For doesn't expect references to appear more than once in the list; it needs them to be unique. That's how it tracks a DOM fragment as belonging to an item reference, so that it can move the fragment around when the item's position moves.
2. Whenever you generate a new token list, transfer the tokens you keep from the old Map to a new Map. That way you discard removed tokens immediately and don't have them hanging around, unnecessarily growing the cache.
3. createMemo can help you manage the “cache”:
const reconciled = createMemo(
  ([_lastTokens, lastMap]) => {
    const tokens = getMarkdownTokens(props.block.content);
    return reconcileTokens(tokens, lastMap);
  },
  [[], new Map()]
);

return <For each={reconciled()[0]}>{(tok) => <TokenDisplay token={tok} />}</For>;
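The reconcileTokens helper itself is not shown in the thread. A hedged sketch of what such a helper could look like (the body is entirely my assumption) following the hash-bucket idea described in this thread: reuse the old token object whenever a new token has the same content key, so unchanged items keep stable references for <For>, while duplicates still get distinct references.

```typescript
// Hypothetical reconcileTokens: reuses old token objects by content key.
type Token = { type: string; content: string; lang?: string };

const tokenKey = (t: Token) => `${t.type}|${t.lang ?? ''}|${t.content}`;

function reconcileTokens(
  tokens: Token[],
  lastMap: Map<string, Token[]>
): [Token[], Map<string, Token[]>] {
  const nextMap = new Map<string, Token[]>();
  const out = tokens.map((t) => {
    // pop a previous token with the same key, if any; the bucket array
    // keeps duplicate keys from collapsing to one shared reference
    const key = tokenKey(t);
    const old = lastMap.get(key)?.pop();
    const kept = old ?? t;
    const bucket = nextMap.get(key);
    if (bucket) bucket.push(kept);
    else nextMap.set(key, [kept]);
    return kept;
  });
  // old tokens left in lastMap are simply dropped, so the cache can't grow
  return [out, nextMap];
}
```

Returning both the reconciled list and the new Map is what lets the createMemo above thread the "previous token list map" from one update to the next.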
hyperknot (OP) · 3d ago
I'll measure how long it takes to render and see if I need more optimisations. So the marked markdown parser gives me a list of objects:
const tokens: TokensList = processor.lexer(input)

return tokens
  .filter((t) => t.type !== 'space')
  .map((t) => {
    if (t.type === 'code') {
      return { type: 'code', content: t.text, lang: t.lang }
    } else {
      const html = processor.parse(t.raw)
      return { type: 'html', content: html }
    }
  })
Many of these will be identical; for example, all the <hr> lines will be exactly the same. So you mean that the Solid <For> loop tracking these items will break? I mean, it displays correctly, but I don't know how optimal it is. I suspect that on every single update the whole For loop is being recreated, so maybe all the optimisations are just not doing anything?
peerreynders · 3d ago
So you mean that the Solid <For> loop tracking these items will break?
Right now each <hr> has its own referential identity. Your proposed caching scheme would collapse that to one single referential identity. My prediction is that the For will have a pretty good chance of glitching out.
hyperknot (OP) · 3d ago
But the whole caching would be hidden inside the component. The For wouldn't see any difference, or would it?
peerreynders · 3d ago
For the optimization to work, you have to manage the referential identity of the items passed to the For. Right now each new list will simply drop all previous fragments and create fresh ones as there is no overlap in referential identity of the items.
hyperknot (OP) · 3d ago
Yes, it'd be great to keep the old items. But what can I do? My input is an array from a 3rd-party library. I can calculate a hash on each item, for example, but how do I tell Solid not to destroy and re-render everything? And how can I measure how long it takes Solid to render a component?
peerreynders · 3d ago
I can calculate a hash on each item.
Well, imagine this:
- Process the previous token list into a Map; key: the hash, value: an array of tokens (in reverse order of appearance in the list) with the matching hash.
- Once the new token list comes in, then for each token generate the hash. If the hash is NOT in the Map, continue. If the hash IS in the Map, remove a token from the end of that array and replace the token in the new token list with that old token.
That way you stabilize the referential identity of items to reuse the DOM fragments. And of course you can use a createMemo to transfer that “previous token list map” from one update to the next.
hyperknot (OP) · 3d ago
So basically I should try to have a single reference array, which I modify with .push() and similar, instead of always taking a new array from my function? You are saying that I don't even need anything more advanced, like solid-primitives/keyed?
peerreynders · 3d ago
If you are managing referential identity yourself, Solid will just do the rest. Also note that you can build the Map for the next update as you consume the old Map, to minimize how many times you run the hash.
instead of always taking a new array from my function
You are still getting that new array from your function; you are just running a rudimentary diff to determine which items to (referentially) keep from the last update. So sure, you are creating a new array of the “old items” where possible, only using “new items” when there is no match in the “old items” (and in the process dropping old items that are no longer relevant).
hyperknot (OP) · 3d ago
I see. So basically I'm building my own reference-tracking function, taking an immutable array and diffing it into a mutable array. But at this point I might as well write that hash into an id and use Keyed, shouldn't I?
<Key each={items()} by="id">
https://primitives.solidjs.community/package/keyed
peerreynders · 3d ago
A hash isn't an ID.
hyperknot (OP) · 3d ago
Only because of the duplicates, right?
peerreynders · 3d ago
By definition an ID is expected to be unique.
hyperknot (OP) · 3d ago
Then I can append an index to the ID.
peerreynders · 3d ago
The idea is that IDs are consistent between updates.
hyperknot (OP) · 3d ago
That would be the case, as long as only the end of the array changes, so I think it could work.
peerreynders · 3d ago
What I mean is that you would have to track the used index to create an ID for each hash individually to be "somewhat" consistent; otherwise you would rarely have matches between updates, rendering the effort moot. keyed applies in situations where
model1 !== model2
but
model1.id === model2.id
telling us that model2 from the current update slots into where model1 was used on the last update. For stores, reconcile accomplishes that.
hyperknot (OP) · 3d ago
I haven't looked deeply into it, but I thought I'd have to do the same anyway. Also, wouldn't keyed work with duplicate IDs in the array? I mean, there is no reason why it couldn't render the same element twice.
peerreynders · 3d ago
The ID establishes the strict relationship between the data and the DOM fragment that was rendered based on it (and, more importantly, the connecting reactive props). There can't be duplicate IDs, otherwise the relationship is ambiguous. That's why in React you have to provide a key prop. Solid uses the referential identity of the data instead, something that React cannot do: because its data is immutable, correlation of data over time is accomplished with key (a unique ID). Keyed is closer to what React does and is useful with data coming from the server, as it isn't possible to have referential identity between updates from the server; so an ID of your choice is used to correlate data instead. reconcile, on the other hand, is used to maintain referential identity inside a store while using the configured key to orchestrate the necessary updates to match the supplied data.
hyperknot (OP) · 2d ago
Thank you for the explanation!
