❔ Split on new line preserving empty line

So I'm trying to create a method that, given a string containing Environment.NewLine, \r\n, \r or \n, converts it to an array while preserving each new line in the form of an empty entry.
string[] newLineArray = { Environment.NewLine };
string[] textArray1 = text.Split(newLineArray, StringSplitOptions.None);
string[] textArray = text.Split(Environment.NewLine.ToArray(), StringSplitOptions.None);
While testing, I'm having a hard time understanding why there is a difference between the first 2 lines and the 3rd line. Given a string such as "First line\r\nAnd more in new line", the first 2 lines split it into an array of 2 strings, while textArray ends up with 3 entries, one of them empty. In the end I want my C# library, which creates Word documents, to let people provide a string with different kinds of new lines and have them treated properly. But for some reason Split on newLineArray delivers no empty lines, and the only time I can get empty lines is when using Environment.NewLine.ToArray().
private WordParagraph ConvertToTextWithBreaks(string text) {
    string[] newLineArray = { Environment.NewLine, "\n", "\r\n", "\n\r" };
    //string[] newLineArray = { Environment.NewLine };
    string[] textArray = text.Split(newLineArray, StringSplitOptions.None);
    //string[] textArray = text.Split(Environment.NewLine.ToArray(), StringSplitOptions.None);

    WordParagraph wordParagraph = null;
    foreach (string line in textArray) {
        if (line == "") {
            wordParagraph = AddBreak();
        } else {
            wordParagraph = new WordParagraph(this._document, this._paragraph, new Run());
            wordParagraph.Text = line;
            this._paragraph.Append(wordParagraph._run);
        }
    }
    return wordParagraph;
}
What am I missing?
43 Replies
phaseshift (2y ago)
Using Environment.NewLine.ToArray() means you're splitting on \r and also on \n. So if the line is "blah\r\n" then you get "blah", "". The same goes for "blah\ra\n": you get "blah", "a".
ero (2y ago)
You get "blah", "", and "".
phaseshift (2y ago)
Not sure how 'implementation defined' this is, but this might be what you want: .Split(new[]{"\r\n", "\r", "\n"}, StringSplitOptions.None); Importantly, "\r\n" is the first split string, so that's always checked first, at least in my local experiment.
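A minimal sketch of the difference being discussed, using an explicit "\r\n" in the input so the result does not depend on the platform's Environment.NewLine. With char separators, \r and \n act as two independent delimiters; with string separators, "\r\n" can match as a single delimiter. As far as I can tell, the String.Split documentation describes separators as being tried in array order at each position, which would be consistent with the local experiment mentioned above. The variable names are illustrative and not from the thread.

```csharp
using System;

// Explicit Windows-style line ending in the sample input.
string text = "First line\r\nAnd more in new line";

// char[] separators: '\r' and '\n' are separate delimiters, so the pair
// "\r\n" leaves an empty entry between them.
string[] byChars = text.Split(new[] { '\r', '\n' }, StringSplitOptions.None);

// string[] separators: "\r\n" is listed first and matches as one delimiter.
string[] byStrings = text.Split(new[] { "\r\n", "\r", "\n" }, StringSplitOptions.None);

Console.WriteLine(byChars.Length);   // 3
Console.WriteLine(byStrings.Length); // 2
```

The empty entry in the char[] case comes from the zero-length gap between \r and \n, not from Split converting the newline into something.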
ero (2y ago)
Splitting on just \r doesn't really make sense. I think you can have a \r without a new line.
przemyslawklys (OP, 2y ago)
That's not what I am seeing. It splits just once and properly shows an empty line in place of the newline, both for \r\n and for the explicit case. My problem from the first question is that, even though StringSplitOptions.None is supposed to act the same, I get 2 different results: the split works with text.Split(newLineArray, StringSplitOptions.None); but it's not preserving the empty entry.
przemyslawklys (OP, 2y ago)
This doesn't seem to work for me. I mean the Split itself does work; the problem is that the result is not "blah", "", "something else" but "blah", "something else". The only time "blah", "", "something else" is retained is if I use text.Split(Environment.NewLine.ToArray(), StringSplitOptions.None); I have no clue why.
przemyslawklys (OP, 2y ago)
The breakpoint never gets hit if I define text.Split the suggested way.
przemyslawklys (OP, 2y ago)
It hits just fine when using the single .ToArray() approach. I guess the difference comes from var test = Environment.NewLine.ToArray(); because what it shows now is that test is actually char[]. That would mean char[] splits differently than string[].
phaseshift (2y ago)
It seems like you're expecting Split to keep new lines or convert them. It just doesn't.
przemyslawklys (OP, 2y ago)
var paragraph1 = "First line\r\nAnd more in new line";

char[] testChars = Environment.NewLine.ToArray();
string[] testStrings = { Environment.NewLine };
string[] testStringsMultiple = { Environment.NewLine, "\n", "\r\n" };

string[] textArray1 = paragraph1.Split(testChars, StringSplitOptions.None);
string[] textArray2 = paragraph1.Split(testStrings, StringSplitOptions.None);
string[] textArray3 = paragraph1.Split(testStringsMultiple, StringSplitOptions.None);

Console.WriteLine(textArray1.Length);
Console.WriteLine(textArray2.Length);
Console.WriteLine(textArray3.Length);
phaseshift (2y ago)
It doesn't 'show empty line in place of new line'.
przemyslawklys (OP, 2y ago)
The first 3 Console.WriteLine calls give 3, 2, 2, which means char[] is treated differently and preserves the new line on which we actually split; when a string[] is passed, the empty line is not preserved.
phaseshift (2y ago)
You're just doing different things. They're not treated differently. You give different input, you get a different answer.
przemyslawklys (OP, 2y ago)
I've edited the example above to show the same string split in 3 different but very similar ways. The first result gives 3, the second gives 2, the third gives 2. To my noob ass the input looks the same 😉
phaseshift (2y ago)
This again
przemyslawklys (OP, 2y ago)
hrmms
phaseshift (2y ago)
It's all irrelevant. Split will not convert the split thing into something else... The split thing gets removed.
przemyslawklys (OP, 2y ago)
OK, so what would be the proper way to get this done? I want to convert a string into an array and retain new lines in the form of empty strings.
phaseshift (2y ago)
Split on everything you want to split on. Then you have all your lines. Then you can add in an empty line after every real line if you really want to.
przemyslawklys (OP, 2y ago)
But how do I know where it is? That's my problem.
phaseshift (2y ago)
Because that's by definition what your array entries are: the things between the split tokens. Or use a regex replace to give a single token, then split on that token.
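The "regex replace, then split" idea above could look roughly like this. It is only a sketch: the choice of "\n" as the single token and the sample input are assumptions made for illustration, not something specified in the thread.

```csharp
using System;
using System.Text.RegularExpressions;

// Sample input mixing three newline flavours.
string text = "a\r\nb\rc\nd";

// Collapse every newline flavour into one token. "\r\n" is listed first in
// the alternation so a Windows line ending is replaced as a single unit.
string normalized = Regex.Replace(text, "\r\n|\r|\n", "\n");

// Split on that single token.
string[] lines = normalized.Split('\n');

Console.WriteLine(string.Join("|", lines)); // a|b|c|d
```

Each array entry is the text between two tokens, so an entry at index i > 0 always sits directly after one newline in the original string.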
przemyslawklys (OP, 2y ago)
🤯
phaseshift (2y ago)
Btw, what you're trying to do, I wouldn't suggest anyway. After you're done, you don't know if an empty line is really an empty line, or if it's a 'new line'.
przemyslawklys (OP, 2y ago)
How so? It's for displaying in Word. The user is only able to provide a string, so it has to be "explicit" in what they provide.
public static void Example_BasicWordWithTabStopsTabChars(string folderPath, bool openWord) {
    Console.WriteLine("[*] Creating standard document with different default style (PL)");
    string filePath = System.IO.Path.Combine(folderPath, "BasicWordWithTabs.docx");
    using (WordDocument document = WordDocument.Create(filePath)) {

        var paragraph1 = document.AddParagraph("To jest po polsku");

        Console.WriteLine("Paragraph count (expected 1): " + document.Paragraphs.Count);

        var paragraph2 = document.AddParagraph("To jest po polsku \t\t And more");

        Console.WriteLine("Paragraph count (expected 2): " + document.Paragraphs.Count);

        var paragraph3 = document.AddParagraph("To jest po polsku \t\t And more");
        paragraph3.Underline = UnderlineValues.DashLong;

        Console.WriteLine("Paragraph count (expected 3): " + document.Paragraphs.Count);

        var paragraph4 = document.AddParagraph("First line\r\nAnd more in new line");

        var paragraph6 = document.AddParagraph("First line\nnd more in new line");

        var paragraph7 = document.AddParagraph("First line" + Environment.NewLine + "And more in new line");

        Console.WriteLine("Paragraph count (expected 6): " + document.Paragraphs.Count);

        document.Save(openWord);
    }
}
I don't see how someone could feed an empty line that would cause a problem for my conversion, but maybe I'm missing something.
using System.Collections.Generic;

List<string> Convert(string text) {
    string[] testStringsMultiple = { Environment.NewLine, "\r\n", "\n" };
    string[] textArray3 = text.Split(testStringsMultiple, StringSplitOptions.None);
    var list = new List<string>();
    for (int i = 0; i < textArray3.Length; i++) {
        list.Add(textArray3[i]);
        list.Add("");
    }
    if (list.Count > 0) {
        if (list[list.Count - 1] == "") {
            list.RemoveAt(list.Count - 1);
        }
    }
    return list;
}

Console.WriteLine("----");
foreach (var item in Convert("First line\r\nAnd more in new line\r\n")) {
    Console.WriteLine(item);
}
Console.WriteLine("----");

Console.WriteLine("----");
foreach (var item in Convert("First line\r\nAnd more \r\n in new line")) {
    Console.WriteLine(item);
}
Console.WriteLine("----");

Console.WriteLine("----");
foreach (var item in Convert("First line\nAnd more in new line\r\n")) {
    Console.WriteLine(item);
}
Console.WriteLine("----");
So I wrote this, but it's not really working properly if the text ends with a new line.
phaseshift (2y ago)
Because any string can be empty.
przemyslawklys (OP, 2y ago)
But I would not split on an empty string, so I would treat it without splitting and without adding a break.
phaseshift (2y ago)
Wdym 'you wouldn't split'? Split on an empty string is going to give an array of size 1 containing an empty string. If you don't want empty entries, which is what you're complaining about in the case where it ends with a new line, then use the option that removes empty entries.
przemyslawklys (OP, 2y ago)
using System.Collections.Generic;

List<string> Convert(string text) {
    string[] testStringsMultiple = { Environment.NewLine, "\r\n", "\n" };
    string[] textArray3 = text.Split(testStringsMultiple, StringSplitOptions.RemoveEmptyEntries);
    var list = new List<string>();
    for (int i = 0; i < textArray3.Length; i++) {
        list.Add(textArray3[i]);
        if (i < textArray3.Length - 1) {
            list.Add("");
        } else {
            if (text.EndsWith(Environment.NewLine)) {
                list.Add("");
            } else if (text.EndsWith("\r\n")) {
                list.Add("");
            } else if (text.EndsWith("\n")) {
                list.Add("");
            }
        }

    }
    return list;
}

Console.WriteLine("----");
var test1 = Convert("First line\r\nAnd more \r\n in new line\r\n");
foreach (var item in test1) {
    Console.WriteLine(item);
}

Console.WriteLine("----");
var test2 = Convert("First line\r\nAnd more \r\n in new line");
foreach (var item in test2) {
    Console.WriteLine(item);
}
Console.WriteLine("----");
var test3 = Convert("First line\r\nAnd more \r\n in new line");
foreach (var item in test3) {
    Console.WriteLine(item);
}
Console.WriteLine("----");
This seems to work fine, it just seems a bit ugly. But it doesn't work if the first element is a new line.
using System.Collections.Generic;

List<string> Convert(string text) {
    string[] splitStrings = { Environment.NewLine, "\r\n", "\n" };
    string[] textSplit = text.Split(splitStrings, StringSplitOptions.RemoveEmptyEntries);
    var list = new List<string>();
    for (int i = 0; i < textSplit.Length; i++) {
        if (i == 0 && text.StartsWith(Environment.NewLine)) {
            list.Add("");
        } else if (i == 0 && text.StartsWith("\r\n")) {
            list.Add("");
        } else if (i == 0 && text.StartsWith("\n")) {
            list.Add("");
        }

        list.Add(textSplit[i]);

        if (i < textSplit.Length - 1) {
            list.Add("");
        } else {
            if (text.EndsWith(Environment.NewLine)) {
                list.Add("");
            } else if (text.EndsWith("\r\n")) {
                list.Add("");
            } else if (text.EndsWith("\n")) {
                list.Add("");
            }
        }
    }
    return list;
}

Console.WriteLine("----");
var test1 = Convert("First line\r\nAnd more \r\n in new line\r\n");
foreach (var item in test1) {
    Console.WriteLine(item);
}
Console.WriteLine("----");
var test2 = Convert("First line\r\nAnd more \r\n in new line");
foreach (var item in test2) {
    Console.WriteLine(item);
}
Console.WriteLine("----");
var test3 = Convert("First line\r\nAnd more \n in new line");
foreach (var item in test3) {
    Console.WriteLine(item);
}
Console.WriteLine("----");
var test4 = Convert("\nFirst line\r\nAnd more \r\n in new line\r\n");
foreach (var item in test4) {
    Console.WriteLine(item);
}
Console.WriteLine("----");
var test5 = Convert("\nFirst line\r\nAnd more " + Environment.NewLine + "in new line\r\n");
foreach (var item in test5) {
    Console.WriteLine(item);
}
Console.WriteLine("----");
That seems to work, but it seems a bit overkill 😉
mtreit (2y ago)
This code honestly makes no sense. If you split on newline characters, no token in the resulting set will start with a newline character.
MODiX (2y ago)
mtreit#6470
REPL Result: Success
var text = $"\r\na\nb\r\n\n\n\r\nc{Environment.NewLine}d\ne\r\nf";
var splitStrings = new string[] { Environment.NewLine, "\r\n", "\n" };
var textSplit = text.Split(splitStrings, StringSplitOptions.RemoveEmptyEntries);

foreach (var s in textSplit)
{
    Console.WriteLine(s);
}

Console.WriteLine();
var hasNewLine = textSplit.Where(x => x.StartsWith(Environment.NewLine) || x.StartsWith("\n"));
Console.WriteLine($"Strings starting with a newline: {hasNewLine.Count()}");
Console Output
a
b
c
d
e
f

Strings starting with a newline: 0
Compile: 770.253ms | Execution: 104.113ms
przemyslawklys (OP, 2y ago)
That's what I get when I run my code
przemyslawklys (OP, 2y ago)
on your string, which is exactly what I need. I would love to get this output from your string using something simpler, but I am pretty noobish; I don't understand what I am doing most of the time.
List<string> Convert(string text) {
    string[] splitStrings = { Environment.NewLine, "\r\n", "\n" };
    string[] textSplit = text.Split(splitStrings, StringSplitOptions.RemoveEmptyEntries);
    var list = new List<string>();
    for (int i = 0; i < textSplit.Length; i++) {
        // check if there's new line at the beginning of the text
        // if there is add empty string to the list
        if (i == 0 && text.StartsWith(Environment.NewLine)) {
            list.Add("");
        } else if (i == 0 && text.StartsWith("\r\n")) {
            list.Add("");
        } else if (i == 0 && text.StartsWith("\n")) {
            list.Add("");
        }
        // add splitted text to the list
        list.Add(textSplit[i]);

        if (i < textSplit.Length - 1) {
            // for every element in the list except the last element add empty string to the list
            list.Add("");
        } else {
            // check if there's new line at the end of the text
            // if there is add an empty string to the list
            if (text.EndsWith(Environment.NewLine)) {
                list.Add("");
            } else if (text.EndsWith("\r\n")) {
                list.Add("");
            } else if (text.EndsWith("\n")) {
                list.Add("");
            }
        }
    }
    return list;
}
That's the best that I can do
ero (2y ago)
So, what, the result should just be a list of strings where every other item is an empty string...?
przemyslawklys (OP, 2y ago)
That's the idea... because then for every empty line in the list I will add a Break() to the Word document.
ero (2y ago)
text
    .Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries)
    .SelectMany(static (s, i) => i == 0 ? new[] { s } : new[] { "", s })
    .ToArray()
Or something
przemyslawklys (OP, 2y ago)
It's close, but not the same. It's missing something, either the first or the last element.
przemyslawklys (OP, 2y ago)
Yeah, actually it's missing both. If the string starts with a new line and ends with a new line, it completely ignores them.
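One way to keep leading and trailing breaks without the StartsWith/EndsWith checks is to split with StringSplitOptions.None and rely on the fact that every piece after the first was preceded by exactly one newline. This is only a sketch under that assumption: SplitWithBreaks is a hypothetical name, not part of the thread or of any library, and unlike the RemoveEmptyEntries versions above it emits one "" per newline, so consecutive newlines produce consecutive breaks.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: each newline becomes one "" marker, and the text
// between newlines is kept unchanged.
List<string> SplitWithBreaks(string text) {
    string[] pieces = text.Split(new[] { "\r\n", "\r", "\n" }, StringSplitOptions.None);
    var result = new List<string>();
    for (int i = 0; i < pieces.Length; i++) {
        // Every piece after the first follows exactly one newline.
        if (i > 0) {
            result.Add("");
        }
        // Empty pieces carry no text; their newline was recorded above.
        if (pieces[i].Length > 0) {
            result.Add(pieces[i]);
        }
    }
    return result;
}

var parts = SplitWithBreaks("\nFirst line\r\nAnd more\n");
Console.WriteLine(string.Join("|", parts)); // |First line||And more|
```

A caller could then add a break for each "" entry and a text run for everything else, as in the loop from the original question.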
przemyslawklys (OP, 2y ago)
GitHub
Add ability to use NewLines ("\r\n"/Environment.NewLine) in AddPara...
This PR adds the ability to create line breaks using \n, \r\n or Environment.NewLine var test = document.AddParagraph("TestMe").AddText("\nFirst line\r\nAnd more " + Environment...