C# · 2y ago
Dusty

❔ Markdown RegEx substitution

Hey, I need some help writing a RegEx to convert StackExchange Markdown to Discord (or rather CommonMark) Markdown. From:
<!-- language: lang-js -->
let a = null;

function doSomething() {
// abc
}

<!-- language: lang-html -->

<p>abc</p>
To:
´´´js
let a = null;

function doSomething() {
// abc
}
´´´

´´´html
<p>abc</p>
´´´
This is what I tried so far however I can't get it to work: https://regex101.com/r/mxeeiW/1
6 Replies
n8ta · 2y ago
Regex doesn't produce output; it only identifies sections of the input and matches them. You'll need to combine regex with some other code to accomplish this. Can you clarify your approach?
FestivalDelGelato
This seems like a task that requires non-trivial effort...
Dusty (OP) · 2y ago
You can substitute with regex using capture groups. So I basically want to capture the first line (the HTML comment) and match the coding language, so in the example above "js" or "html". After that I want to capture everything that's indented. At the link above you can almost see it working. Any help is appreciated; the pattern I've linked is almost working.
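A minimal sketch of the capture-group substitution described above, using Regex.Replace with a match evaluator. It is not the pattern from the regex101 link. The names StackExchangeMarkdown and ToFenced are hypothetical, and the sketch assumes that every block starts with a language comment and runs until the next language comment or the end of the input, so any prose following a block would be swept into it.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class StackExchangeMarkdown
{
    // "lang": the name after "lang-", e.g. "js".
    // "code": everything up to the next language comment or the end of input
    //         (lazy match plus lookahead; Singleline lets '.' cross newlines).
    private static readonly Regex BlockPattern = new Regex(
        @"<!--\s*language:\s*lang-(?<lang>[\w-]+)\s*-->\r?\n(?<code>.*?)(?=<!--\s*language:|\z)",
        RegexOptions.Singleline);

    public static string ToFenced(string input)
    {
        // Three backticks, built at runtime so this sample does not contain
        // a literal fence inside its own code block.
        string fence = new string('`', 3);

        string result = BlockPattern.Replace(input, m =>
        {
            // Drop the blank lines around the block.
            string code = m.Groups["code"].Value.Trim('\r', '\n');
            // Assumption: StackExchange code is indented by four spaces;
            // lines without that indentation are left unchanged.
            code = Regex.Replace(code, "^ {4}", "", RegexOptions.Multiline);
            return fence + m.Groups["lang"].Value + "\n" + code + "\n" + fence + "\n\n";
        });

        return result.TrimEnd();
    }
}
```

Calling StackExchangeMarkdown.ToFenced on the "From:" text should yield the two fenced blocks from the "To:" example, with real backticks in place of the ´´´ used in the post.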
Denis · 2y ago
If the input is as simple as having XML comments before the code content, then I believe using regex might be a bit overkill. I'd suggest going over each line of the input string, and only using regex to check whether the given line is the XML tag stating the code block language. Once you find such a tag, you append each following line to a Model representing a code block, until either the end of the input string or the next XML tag. This way the logic for converting the content to Markdown will be much more transparent, and maybe even more performant. Here's what Bing Chat has generated for me:
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

namespace ConsoleApp1
{
    class Program
    {
        static void Main(string[] args)
        {
            string input = @"<!-- language: lang-js -->
let a = null;

function doSomething() {
// abc
}

<!-- language: lang-html -->

<p>abc</p>";

            var models = new List<Model>();
            // Split on both CRLF and LF so the result does not depend on
            // the line endings of the input.
            var lines = input.Split(new[] { "\r\n", "\n" }, StringSplitOptions.None);
            var currentLanguage = string.Empty;
            var currentCode = string.Empty;

            foreach (var line in lines)
            {
                if (line.StartsWith("<!-- language:"))
                {
                    // A new language tag closes the block collected so far.
                    if (!string.IsNullOrEmpty(currentCode))
                    {
                        models.Add(new Model { Language = currentLanguage, Code = currentCode });
                        currentCode = string.Empty;
                    }

                    // Extract the text between "language: " and " -->", e.g. "lang-js".
                    currentLanguage = Regex.Match(line, @"(?<=language:\s)(.*)(?=\s-->)").Value;
                }
                else
                {
                    currentCode += line + Environment.NewLine;
                }
            }

            // Flush the last block, which has no following tag.
            if (!string.IsNullOrEmpty(currentCode))
            {
                models.Add(new Model { Language = currentLanguage, Code = currentCode });
            }

            foreach (var model in models)
            {
                Console.WriteLine($"Language: {model.Language}");
                Console.WriteLine($"Code: {model.Code}");
            }
        }
    }

    public class Model
    {
        public string Language { get; set; }
        public string Code { get; set; }
    }
}
From here it is elementary to implement logic in the model for outputting it either in the original XML format or in the Markdown format. You can utilize new C# features to create a source-generated regex (the [GeneratedRegex] attribute in .NET 7 and later) for matching the language tag. Not sure if it makes sense to replace the StartsWith with the regex check directly; you'll have to do a performance check. This should be the output of the Console.WriteLine calls:
Language: lang-js
Code: let a = null;

function doSomething() {
// abc
}


Language: lang-html
Code: <p>abc</p>
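The Markdown output step mentioned above could look roughly like the sketch below. ToMarkdown and MarkdownWriter.Render are hypothetical names added for illustration; they are not part of the generated program. The sketch assumes the Language values keep the "lang-" prefix seen in the output above, and that leading and trailing blank lines in Code can be dropped.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Model
{
    public string Language { get; set; } = string.Empty;
    public string Code { get; set; } = string.Empty;

    // Hypothetical method: renders the block as a fenced code block.
    public string ToMarkdown()
    {
        // Three backticks, built at runtime so this sample does not contain
        // a literal fence inside its own code block.
        string fence = new string('`', 3);

        // "lang-js" -> "js"; other values are passed through unchanged.
        string lang = Language.StartsWith("lang-", StringComparison.Ordinal)
            ? Language.Substring("lang-".Length)
            : Language;

        // Drop the blank lines collected before and after the code.
        string code = Code.Trim('\r', '\n');

        return fence + lang + "\n" + code + "\n" + fence;
    }
}

public static class MarkdownWriter
{
    // Joins all blocks with one blank line between them.
    public static string Render(IEnumerable<Model> models) =>
        string.Join("\n\n", models.Select(m => m.ToMarkdown()));
}
```

Keeping the rendering on the model means the parsing loop stays unchanged; only the final foreach would call MarkdownWriter.Render(models) instead of printing each property.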
This is the prompt I've used:
Write a C# program, that will go over a string input, and look for XML comment tags, indicating the programming language in which the following lines are written in. The program will find such an XML comment, and create an instance of a Model representing a code block. This model will have a string property indicating what language it is (this will be loaded from the XML comment), and the code that follows after the XML comment. The program will fill the model with lines after the XML comment until either it is the end of the input, or it finds another XML comment. Here is an example input: (I've pasted your example here)
Dusty (OP) · 2y ago
Thanks, appreciate it. I thought regex would be simpler, but this makes sense.
Accord · 2y ago
Was this issue resolved? If so, run /close - otherwise I will mark this as stale and this post will be archived until there is new activity.
