C#: Import CSV file to strongly typed data structure in .NET

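What's the best way to import a CSV file into a strongly typed data structure?

Microsoft's TextFieldParser is stable and follows RFC 4180 for CSV files. Don't be put off by the Microsoft.VisualBasic namespace; it's a standard component in the .NET Framework, and you only need to add a reference to the global Microsoft.VisualBasic assembly.

If you're compiling for Windows (as opposed to Mono) and don't anticipate having to parse "broken" (non-RFC-compliant) CSV files, this is the obvious choice: it's free, unrestricted, stable, and actively supported, most of which cannot be said for FileHelpers.

See also: "How to: Read From Comma-Delimited Text Files in Visual Basic" for a VB code example.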
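Since the linked example is in VB, here is a rough C# equivalent. It is a minimal sketch, not a definitive implementation: the Employee class, its two columns, and the assumption that the first row is a header are all hypothetical and chosen only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

// Hypothetical record type; adjust the properties to match your columns.
public class Employee
{
    public string Name { get; set; }
    public string Title { get; set; }
}

public static class TextFieldParserDemo
{
    // Reads comma-delimited records from any TextReader and maps them to Employee.
    public static List<Employee> ReadEmployees(TextReader input)
    {
        List<Employee> employees = new List<Employee>();
        using (TextFieldParser parser = new TextFieldParser(input))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true;

            // Assumption: the first row holds column names, so read and discard it.
            if (!parser.EndOfData)
            {
                parser.ReadFields();
            }

            while (!parser.EndOfData)
            {
                string[] fields = parser.ReadFields();
                employees.Add(new Employee { Name = fields[0], Title = fields[1] });
            }
        }
        return employees;
    }

    public static void Main()
    {
        // The quoted first field contains a comma, which a plain Split would break apart.
        string csv = "Name,Title\n\"Smith, Bill\",Supervisor\nDrake,Janitor\n";
        foreach (Employee e in ReadEmployees(new StringReader(csv)))
        {
            Console.WriteLine(e.Name + " / " + e.Title);
        }
    }
}
```

To read from disk instead, pass a StreamReader over the file (or use the TextFieldParser constructor that takes a path).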

Check out the FileHelpers open source library.

Use an OleDB connection.

// Data Source is the folder; the CSV file name is used as the table name in the query.
string connectionString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\\InputDirectory\\;Extended Properties='text;HDR=Yes;FMT=Delimited'";
DataTable dt = new DataTable();
using (OleDbConnection conn = new OleDbConnection(connectionString))
using (OleDbDataAdapter adapter = new OleDbDataAdapter("SELECT * FROM file.csv", conn))
{
    // Fill opens the connection if it is closed and closes it again afterwards.
    adapter.Fill(dt);
}
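A DataTable is still loosely typed, so one more step is needed to reach a strongly typed collection. The following is a minimal sketch of that step, assuming the CSV has columns named Name, Address, and DOB; those column names, the Person class, and the DataTableMapper helper are hypothetical, and the table in Main is built by hand as a stand-in for the one filled by the adapter.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Globalization;

// Hypothetical target type for the rows.
public class Person
{
    public string Name { get; set; }
    public string Address { get; set; }
    public DateTime DOB { get; set; }
}

public static class DataTableMapper
{
    // Converts each DataRow into a Person, looking columns up by name.
    public static List<Person> ToPeople(DataTable table)
    {
        List<Person> people = new List<Person>();
        foreach (DataRow row in table.Rows)
        {
            people.Add(new Person
            {
                Name = Convert.ToString(row["Name"]),
                Address = Convert.ToString(row["Address"]),
                DOB = Convert.ToDateTime(row["DOB"], CultureInfo.InvariantCulture)
            });
        }
        return people;
    }

    public static void Main()
    {
        // Stand-in for the DataTable that the OleDb adapter would fill.
        DataTable dt = new DataTable();
        dt.Columns.Add("Name", typeof(string));
        dt.Columns.Add("Address", typeof(string));
        dt.Columns.Add("DOB", typeof(string));
        dt.Rows.Add("Ann", "1 Main St", "1980-01-02");

        foreach (Person p in ToPeople(dt))
        {
            Console.WriteLine(p.Name + ", born " + p.DOB.Year);
        }
    }
}
```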

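If you're expecting fairly complex CSV parsing scenarios, don't even think about rolling your own parser. There are plenty of excellent tools out there, such as FileHelpers, or even the ones on CodeProject.

The point is that this is a fairly common problem, and you can bet that plenty of software developers have already thought about it and solved it.

I agree with @NotMyself. FileHelpers is well tested and handles all kinds of edge cases that you'll eventually have to deal with if you do it yourself. Take a look at what FileHelpers does, and only write your own if you're absolutely sure that either (1) you will never need to handle the edge cases FileHelpers handles, or (2) you love writing this kind of stuff and will be overjoyed when you have to parse something like this: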

1,"Bill","Smith","Supervisor","No Comment"
2,"Drake","O'Malley","Janitor"
Oops, I'm not quoted and I'm on a new line!

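Brian gives a nice solution for converting it to a strongly typed collection.

Most of the CSV parsing methods given don't take into account escaped fields or some of the other subtleties of CSV files (such as trimming fields). Here is the code I personally use. It's a bit rough around the edges and has almost no error reporting.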

public static IList<IList<string>> Parse(string content)
{
    IList<IList<string>> records = new List<IList<string>>();

    StringReader stringReader = new StringReader(content);

    bool inQoutedString = false;
    IList<string> record = new List<string>();
    StringBuilder fieldBuilder = new StringBuilder();
    while (stringReader.Peek() != -1)
    {
        char readChar = (char)stringReader.Read();

        if (readChar == '\n' || (readChar == '\r' && stringReader.Peek() == '\n'))
        {
            // If it's a \r\n combo consume the \n part and throw it away.
            if (readChar == '\r')
            {
                stringReader.Read();
            }

            if (inQoutedString)
            {
                if (readChar == '\r')
                {
                    fieldBuilder.Append('\r');
                }
                fieldBuilder.Append('\n');
            }
            else
            {
                record.Add(fieldBuilder.ToString().TrimEnd());
                fieldBuilder = new StringBuilder();

                records.Add(record);
                record = new List<string>();

                inQoutedString = false;
            }
        }
        else if (fieldBuilder.Length == 0 && !inQoutedString)
        {
            if (char.IsWhiteSpace(readChar))
            {
                // Ignore leading whitespace
            }
            else if (readChar == '"')
            {
                inQoutedString = true;
            }
            else if (readChar == ',')
            {
                record.Add(fieldBuilder.ToString().TrimEnd());
                fieldBuilder = new StringBuilder();
            }
            else
            {
                fieldBuilder.Append(readChar);
            }
        }
        else if (readChar == ',')
        {
            if (inQoutedString)
            {
                fieldBuilder.Append(',');
            }
            else
            {
                record.Add(fieldBuilder.ToString().TrimEnd());
                fieldBuilder = new StringBuilder();
            }
        }
        else if (readChar == '"')
        {
            if (inQoutedString)
            {
                if (stringReader.Peek() == '"')
                {
                    stringReader.Read();
                    fieldBuilder.Append('"');
                }
                else
                {
                    inQoutedString = false;
                }
            }
            else
            {
                fieldBuilder.Append(readChar);
            }
        }
        else
        {
            fieldBuilder.Append(readChar);
        }
    }
    record.Add(fieldBuilder.ToString().TrimEnd());
    records.Add(record);

    return records;
}

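Note that this doesn't handle the edge case of fields that aren't delimited by double quotes but merely contain a quoted string. See this post for a somewhat better explanation, as well as links to some proper libraries.

I was bored, so I modified some stuff I had written. It tries to encapsulate the parsing in an OO manner while cutting down on the number of iterations through the file; it iterates only once, at the top-level foreach.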

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            // Usage:
            // Note this won't run, as getting streams is not implemented,
            // but it will get you started.
            CSVFileParser fileParser = new CSVFileParser();

            // TODO: configure fileParser

            PersonParser personParser = new PersonParser(fileParser);

            List<Person> persons = new List<Person>();
            // If the file is large and there is a good way to limit
            // without having to reparse the whole file, you can use a
            // LINQ query if you desire.
            foreach (Person person in personParser.GetPersons())
            {
                persons.Add(person);
            }

            // Now we have a list of Person objects.
        }
    }

    public abstract class CSVParser
    {
        protected String[] deliniators = { "," };

        protected internal IEnumerable<String[]> GetRecords()
        {
            // Dispose the reader (and its stream) once enumeration finishes.
            using (StreamReader reader = new StreamReader(GetStream()))
            {
                while (!reader.EndOfStream)
                {
                    yield return reader.ReadLine().Split(deliniators,
                                                         StringSplitOptions.None);
                }
            }
        }

        protected abstract Stream GetStream();
    }

    public class CSVFileParser : CSVParser
    {
        // TODO: add logic to get a stream from a file
        protected override Stream GetStream()
        {
            throw new NotImplementedException();
        }
    }

    public class CSVWebParser : CSVParser
    {
        // TODO: add logic to get a stream from a web request
        protected override Stream GetStream()
        {
            throw new NotImplementedException();
        }
    }

    public class Person
    {
        public String Name { get; set; }
        public String Address { get; set; }
        public DateTime DOB { get; set; }
    }

    public class PersonParser
    {
        public PersonParser(CSVParser parser)
        {
            this.Parser = parser;
        }

        public CSVParser Parser { get; set; }

        public IEnumerable<Person> GetPersons()
        {
            foreach (String[] record in this.Parser.GetRecords())
            {
                yield return new Person()
                {
                    Name = record[0],
                    Address = record[1],
                    DOB = DateTime.Parse(record[2]),
                };
            }
        }
    }
}
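The stubbed GetStream could be filled in roughly as follows. This is only a sketch under stated assumptions: LineSplitParser and FileBackedParser are hypothetical stand-ins for CSVParser and CSVFileParser so that the block compiles on its own, the CSV is assumed to live in a local file, and like the original it splits on commas without handling quoted fields.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical stand-in for the CSVParser base class above.
public abstract class LineSplitParser
{
    protected string[] delimiters = { "," };

    public IEnumerable<string[]> GetRecords()
    {
        // The using block disposes the reader and the underlying stream
        // once the caller finishes (or abandons) the enumeration.
        using (StreamReader reader = new StreamReader(GetStream()))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                yield return line.Split(delimiters, StringSplitOptions.None);
            }
        }
    }

    protected abstract Stream GetStream();
}

// Hypothetical file-backed implementation of the stubbed CSVFileParser.
public class FileBackedParser : LineSplitParser
{
    private readonly string path;

    public FileBackedParser(string path)
    {
        this.path = path;
    }

    protected override Stream GetStream()
    {
        return File.OpenRead(path);
    }
}

public static class FileBackedParserDemo
{
    public static void Main()
    {
        // Write a small throwaway file so the example runs anywhere.
        string path = Path.GetTempFileName();
        File.WriteAllText(path, "Ann,1 Main St,1980-01-02\nBob,2 Oak Ave,1975-06-30\n");

        foreach (string[] record in new FileBackedParser(path).GetRecords())
        {
            Console.WriteLine(string.Join(" | ", record));
        }

        File.Delete(path);
    }
}
```

Passing the path through the constructor keeps the base class unaware of where the data comes from, which is the point of the abstract GetStream in the original design.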

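There are two articles on CodeProject that provide code for a solution: one uses StreamReader, and the other imports CSV data using the Microsoft Text Driver.

A good, simple way to do it is to open the file and read each line into an array, linked list, or data structure of your choice. Be careful about handling the first line, though.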
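As a rough illustration of that approach, the sketch below skips the header line and maps each remaining line to a typed object. The Reading class and its two columns are hypothetical, and the plain Split is only safe under the assumption that no field contains a comma or a quoted value.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;

// Hypothetical record type for a two-column file.
public class Reading
{
    public string Sensor { get; set; }
    public double Value { get; set; }
}

public static class LineReaderDemo
{
    public static List<Reading> Load(IEnumerable<string> lines)
    {
        return lines
            .Skip(1)                                  // the first line holds column names
            .Where(line => line.Trim().Length > 0)    // ignore blank lines
            .Select(line => line.Split(','))
            .Select(parts => new Reading
            {
                Sensor = parts[0].Trim(),
                // Parse with the invariant culture so "1.5" works on any machine.
                Value = double.Parse(parts[1], CultureInfo.InvariantCulture)
            })
            .ToList();
    }

    public static void Main()
    {
        // Write a small throwaway file so the example runs anywhere.
        string path = Path.GetTempFileName();
        File.WriteAllLines(path, new[] { "Sensor,Value", "a,1.5", "b,2.25" });

        foreach (Reading r in Load(File.ReadLines(path)))
        {
            Console.WriteLine(r.Sensor + " = " + r.Value.ToString(CultureInfo.InvariantCulture));
        }

        File.Delete(path);
    }
}
```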

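This may be cumbersome, but it also seems possible to access the files directly using a connection string.

Why not try Python instead of C# or VB? It has a nice CSV module you can import that does all the heavy lifting for you.

I typed in some code. The result in the DataGridView looked good. It parses a single line of text into an ArrayList of objects.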

enum quotestatus
{
    none,
    firstquote,
    secondquote
}

public static System.Collections.ArrayList Parse(string line, string delimiter)
{
    System.Collections.ArrayList ar = new System.Collections.ArrayList();
    StringBuilder field = new StringBuilder();
    quotestatus status = quotestatus.none;
    foreach (char ch in line.ToCharArray())
    {
        string chOmsch = "char";
        if (ch == Convert.ToChar(delimiter))
        {
            if (status == quotestatus.firstquote)
            {
                chOmsch = "char";
            }
            else
            {
                chOmsch = "delimiter";
            }
        }

        if (ch == Convert.ToChar(34))
        {
            chOmsch = "quotes";
            if (status == quotestatus.firstquote)
            {
                status = quotestatus.secondquote;
            }
            if (status == quotestatus.none)
            {
                status = quotestatus.firstquote;
            }
        }

        switch (chOmsch)
        {
            case "char":
                field.Append(ch);
                break;
            case "delimiter":
                ar.Add(field.ToString());
                field.Clear();
                break;
            case "quotes":
                if (status == quotestatus.firstquote)
                {
                    field.Clear();
                }
                if (status == quotestatus.secondquote)
                {
                    status = quotestatus.none;
                }
                break;
        }
    }
    if (field.Length != 0)
    {
        ar.Add(field.ToString());
    }
    return ar;
}

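I had to use a CSV parser in .NET for a project this summer and settled on the Microsoft Jet Text Driver. You specify a folder using a connection string, then query a file using a SQL SELECT statement. You can specify strong types using a schema.ini file. I didn't do this at first, but then I got bad results where the type of the data wasn't immediately apparent, such as IP numbers or an entry like "XYQ 3.9 SP1".

One limitation I ran into is that it cannot handle column names longer than 64 characters; it truncates them. This shouldn't be a problem, except that I was dealing with very poorly designed input data. It returns an ADO.NET DataSet.

This was the best solution I found. I would be wary of rolling my own CSV parser, since I would probably miss some of the edge cases, and I didn't find any other free CSV parsing packages for .NET.

EDIT: Also, there can only be one schema.ini file per directory, so I dynamically appended to it to strongly type the needed columns. It strongly types only the columns specified and infers the type of any unspecified field. I really appreciated this, as I was importing a fluid CSV with 70+ columns and didn't want to specify every column, only the misbehaving ones.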
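Appending a section to schema.ini might look roughly like the sketch below. It is not the code from this answer: the SchemaIniSketch helper, the folder, the file name, and the column names are all hypothetical, and listing only some columns relies on the inference behavior described above rather than on anything verified here. The key names (Format, ColNameHeader, ColN) follow the text driver's schema.ini format.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class SchemaIniSketch
{
    // Builds one schema.ini section for a CSV file. Keys are 1-based column
    // positions; values are "ColumnName Type" pairs such as "ClientIP Text".
    public static string BuildSection(string csvFileName, IDictionary<int, string> typedColumns)
    {
        StringBuilder section = new StringBuilder();
        section.AppendLine("[" + csvFileName + "]");
        section.AppendLine("Format=CSVDelimited");
        section.AppendLine("ColNameHeader=True");
        foreach (KeyValuePair<int, string> column in typedColumns)
        {
            section.AppendLine("Col" + column.Key + "=" + column.Value);
        }
        return section.ToString();
    }

    public static void Main()
    {
        // Hypothetical folder and columns; only the troublesome columns are listed.
        string folder = Path.Combine(Path.GetTempPath(), "csv-input");
        Directory.CreateDirectory(folder);

        SortedDictionary<int, string> columns = new SortedDictionary<int, string>();
        columns.Add(3, "ClientIP Text");
        columns.Add(7, "Version Text");

        string sectionText = BuildSection("file.csv", columns);

        // Append rather than overwrite, since one schema.ini serves every file in the folder.
        File.AppendAllText(Path.Combine(folder, "schema.ini"), sectionText);
        Console.WriteLine(sectionText);
    }
}
```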

If you can guarantee that there are no commas in the data, the simplest way is probably to use String.Split.

For example:
