C# - Scalable solution for splitting words in a document?
I have a document in which the words are separated by blank spaces, and I need to extract them. For this purpose I used the following code.
string[] words = s.Split(' ');
Now the problem is that I am going to use this code in the parser of a search engine. Because of that, there are hundreds of thousands, if not millions, of webpages that need to be split into words.
Is my concern right that, using the above code, the process will take a long time, or is it unfounded? If it is right, suggestions for an alternative, scalable solution are welcome.
Write your own implementation that returns an IEnumerable<string> and defers execution. For example:
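using System;
using System.Collections.Generic;
using System.Text;

public static class StringExtensions
{
    private static IEnumerable<string> CreateSplitDeferredEnumerable(string str, char delimiter)
    {
        var buffer = new StringBuilder();
        foreach (var ch in str)
        {
            if (ch == delimiter)
            {
                // Emit the word collected so far and reset the buffer.
                yield return buffer.ToString();
                buffer.Length = 0;
            }
            else
            {
                buffer.Append(ch);
            }
        }
        // Emit the trailing word, if any.
        if (buffer.Length != 0)
        {
            yield return buffer.ToString();
        }
    }

    // Validates the argument eagerly, then hands off to the lazy iterator.
    public static IEnumerable<string> SplitDeferred(this string self, char delimiter)
    {
        if (self == null)
        {
            throw new ArgumentNullException("self");
        }
        return CreateSplitDeferredEnumerable(self, delimiter);
    }
}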
Instead of splitting the string in one shot and returning an array of every single substring (which can consume a huge amount of memory), you can enumerate the returned enumerable, and the string is split into pieces on the fly. Assuming you don't keep the enumerated string objects around after each iteration, they become eligible for garbage collection.
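To make the "enumerate and discard" point concrete, here is a minimal sketch of how a parser might consume the deferred split. The class name WordCounter and the helper CountWords are hypothetical, not part of the original answer, and the split logic is repeated as a plain static method only so the example compiles on its own. It assumes a word-frequency count is the kind of per-word work the parser does.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class WordCounter
{
    // Same deferred-split idea as in the answer, repeated so this sketch is self-contained.
    public static IEnumerable<string> SplitDeferred(string text, char delimiter)
    {
        var buffer = new StringBuilder();
        foreach (var ch in text)
        {
            if (ch == delimiter)
            {
                yield return buffer.ToString();
                buffer.Length = 0;
            }
            else
            {
                buffer.Append(ch);
            }
        }
        if (buffer.Length != 0)
        {
            yield return buffer.ToString();
        }
    }

    // Hypothetical helper: tallies words one at a time as they are produced.
    // No array of all substrings is ever built.
    public static Dictionary<string, int> CountWords(string text, char delimiter)
    {
        var counts = new Dictionary<string, int>();
        foreach (var word in SplitDeferred(text, delimiter))
        {
            if (word.Length == 0)
            {
                continue; // consecutive delimiters yield empty strings; skip them
            }
            int current;
            counts.TryGetValue(word, out current); // current stays 0 if the word is new
            counts[word] = current + 1;
        }
        return counts;
    }

    public static void Main()
    {
        var counts = CountWords("the quick  brown the", ' ');
        foreach (var pair in counts)
        {
            Console.WriteLine(pair.Key + ": " + pair.Value);
        }
    }
}
```

Note that the dictionary still holds one string per distinct word, so memory grows with the vocabulary rather than with the total number of words in the page. Whether this is measurably faster than s.Split(' ') for typical page sizes is something to profile rather than assume.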