
Reading a Table in a Csv File in C++ From Main Class?

This series will explore diverse aspects of importing a CSV file with comma-separated values (.csv) into a SQL-Server database. CSV files are a common way to share data in plain text format from sources such as database table(s) to another database, e.g. from SQL-Server to an Oracle database.

The accompanying source code and code blocks have been kept very simple so that following along and learning the basics is not overwhelming, as this can easily happen the deeper into the import process you get.

When data is exported from a database to a customer database that has matching database table(s) with matching columns, the process is not always simple; for example, business rules may dictate that new incoming data can't overwrite existing data, or that incoming data needs to be merged with existing data.

In the wild a simple import is rarely possible, as database data types all share the same basic types but are handled differently from database to database. Couple this with the fact that a flat CSV file may need to be split into multiple database tables.

Part 1 of the series

The following should always be considered when importing CSV files.

  • All columns are suspect to be missing altogether or missing in one or more rows.
  • Mixed data types: consider a column with dates where some rows may have malformed dates, dates set up for a different culture, or columns that should be numeric where some rows have no value or an unexpected format (a culture-aware parsing sketch follows this list).
  • Columns which have values that are not valid for your business, e.g. a list of products that need to map to a product table where there are products that you don't handle.
  • Column values out of range, e.g. a numeric column has a range of 1 through 10 but incoming data has values 1 through 100.
  • The file is in use by another process and is locked.
  • The file is extremely large and processing time may take hours; have a plan, such as running a nightly job.
  • Handling rows/columns that don't fit into the database; have a plan to handle them, as several examples will be shown in this series.
  • Offer clients a method(s) to review suspect data, and to modify or reject the data.
  • Consider an intermediate database table so that processing suspect data can be done over time, especially when there is a large data set that may take hours or days to process.
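As a concrete illustration of the mixed data types bullet above, here is a minimal sketch (not part of the original article's code) of parsing a date column defensively: first against the invariant culture, then against an assumed second culture such as German, where day/month order differs.

using System;
using System.Globalization;

internal static class DateParsingSample
{
    // Tries the invariant culture first, then a second, assumed culture.
    // Returns false for dates that are malformed under both.
    public static bool TryParseFlexible(string input, out DateTime result)
    {
        if (DateTime.TryParse(input, CultureInfo.InvariantCulture,
                DateTimeStyles.None, out result))
        {
            return true;
        }

        return DateTime.TryParse(input, new CultureInfo("de-DE"),
            DateTimeStyles.None, out result);
    }
}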

Consider working with CSV files as a puzzle no matter what the structure should be, and expect that parsing different files usually comes with its own quirks.

Part 1 goals

To read a simple CSV file of just over 7,500 records, nine columns with types ranging from integer, float, date time and string, with malformed data.

To parse data a TextFieldParser will be used to read and parse data. Alternatives to a TextFieldParser are reading data using a Stream (StreamReader) or OleDb when sticking with pure Microsoft classes. Outside of these there are several libraries that can handle reading CSV files, yet as stated this series is solely for working with Microsoft classes.

During parsing, assertions are performed to validate that data is of the proper types, not empty, and within valid ranges. Data read in is placed into a list of a class designed to hold the data read from the CSV file.

The TextFieldParser class does a great job at processing incoming data, which is why this class was selected. As with any class there can be unknowns which become known once you have worked with them and learned them. With the TextFieldParser, when looping through lines in a file, empty lines are skipped. In the code sample nothing is done about this, but the line count will be off by the number of empty lines encountered compared to what might be learned from opening the file in Notepad++ or a similar text editor. Using OleDb or a Stream, lines are not ignored, but nothing is truly gained if the record count is correct; e.g. there are 150 lines where 50 lines are empty and you expect 100 lines of valid data. This means you have received the correct amount of data, just that there are empty lines to filter out.

Requires

Visual interface

The interface is done using a Windows Forms project, as these types of projects are easy to set up; and unlike a web project, a Windows Forms project need not be installed on a user's machine but instead may be executed from a shared location.

File selection

In the code samples below a hard-coded file is used; in the wild a file may be selected via a file selection dialog or by reading one or more files from a directory listing (a sketch of both approaches follows). If the process were driven from a directory listing then the results would go directly to an intermediate table for review, while in the code samples provided here they are sent directly to a DataGridView.
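A minimal sketch of both selection approaches; the helper class and method names here are hypothetical, not part of the article's code.

using System.IO;
using System.Windows.Forms;

internal static class FileSelection
{
    // Dialog-based selection: returns an empty string when cancelled.
    public static string SelectFile()
    {
        using (var dialog = new OpenFileDialog { Filter = "CSV files (*.csv)|*.csv" })
        {
            return dialog.ShowDialog() == DialogResult.OK ? dialog.FileName : string.Empty;
        }
    }

    // Directory-based selection: every CSV file in a folder.
    public static string[] SelectFiles(string folder)
    {
        return Directory.GetFiles(folder, "*.csv");
    }
}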

Parsing data using StreamReader

First check to ensure the file to parse exists. In the following code block mHasException and mLastException are from a base exception class which the parsing class inherits. The return type is a ValueTuple (installed using NuGet Package Manager).

if (!File.Exists(_inputFileName))
{
    mHasException = true;
    mLastException = new FileNotFoundException($"Missing {_inputFileName}");

    return (IsSuccessFul, new List<DataItem>(), new List<DataItemInvalid>());
}
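The article does not show the base exception class itself. Here is a minimal sketch of what it might look like, inferred from the members referenced above and the IsSuccessFul property used in the return statements; the class name and exact shape are assumptions.

using System;

public class BaseExceptionProperties
{
    protected bool mHasException;
    protected Exception mLastException;

    // True when the last operation raised or flagged an exception.
    public bool HasException => mHasException;

    // The last exception captured, if any.
    public Exception LastException => mLastException;

    // Convenience inverse used as the Success element of the returned tuples.
    public bool IsSuccessFul => !mHasException;
}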

If the file exists, the next step is to set up several variables which will be used for validation purposes, plus the return types which will contain valid and, if present, invalid data read in from the CSV file.

var validRows = new List<DataItem>();
var invalidRows = new List<DataItemInvalid>();
var validateBad = 0;

int index = 0;
int district = 0;
int grid = 0;
int nCode = 0;
float latitude = 0;
float longitude = 0;

The following code block follows the code block higher up.

A while statement is used to loop through each line in the CSV file. For each line, split the line by comma, in this case the most common delimiter. Next validate that there are nine elements in the string array. If there are not nine elements in the array then place the line into a possible-reject container.

Note that the first line contains column names, which is skipped by checking the index/line number stored in the variable index.

Following the check for nine elements in a line, seven elements in the string array are checked to ensure they can be converted to the expected data types, ranging from date to numerics, and also checked for empty string values.

Passing the type check above, the section under the comment Questionable fields will do several more checks, e.g. does the NCIC field contain data that is not in an expected range. Note that not all data can be checked here, such as the data in parts[3], as this can be subjective to the data in other elements in the array, so this is left to the review process, which provides a grid with a dropdown of valid selections to select from. If there are issues requiring review of a record, a property is set to flag the data for a manual review process and the record is loaded into a list.

try
{
    using (var readFile = new StreamReader(_inputFileName))
    {
        string line;
        string[] parts;

        while ((line = readFile.ReadLine()) != null)
        {
            parts = line.Split(',');

            if (parts == null)
            {
                break;
            }

            index += 1;
            validateBad = 0;

            if (parts.Length != 9)
            {
                invalidRows.Add(new DataItemInvalid() { Row = index, Line = string.Join(",", parts) });
                continue;
            }

            // Skip first row which in this case is a header with column names
            if (index <= 1) continue;

            /*
             * These columns are checked for proper types
             */
            var validRow = DateTime.TryParse(parts[0], out var d) &&
                           float.TryParse(parts[7].Trim(), out latitude) &&
                           float.TryParse(parts[8].Trim(), out longitude) &&
                           int.TryParse(parts[2], out district) &&
                           int.TryParse(parts[4], out grid) &&
                           !string.IsNullOrWhiteSpace(parts[5]) &&
                           int.TryParse(parts[6], out nCode);

            /*
             * Questionable fields
             */
            if (string.IsNullOrWhiteSpace(parts[1]))
            {
                validateBad += 1;
            }

            if (string.IsNullOrWhiteSpace(parts[3]))
            {
                validateBad += 1;
            }

            // NCIC code must be 909 or greater
            if (nCode < 909)
            {
                validateBad += 1;
            }

            if (validRow)
            {
                validRows.Add(new DataItem()
                {
                    Id = index,
                    Date = d,
                    Address = parts[1],
                    District = district,
                    Beat = parts[3],
                    Grid = grid,
                    Description = parts[5],
                    NcicCode = nCode,
                    Latitude = latitude,
                    Longitude = longitude,
                    Inspect = validateBad > 0
                });
            }
            else
            {
                // fields to review in specific rows
                invalidRows.Add(new DataItemInvalid() { Row = index, Line = string.Join(",", parts) });
            }
        }
    }
}
catch (Exception ex)
{
    mHasException = true;
    mLastException = ex;
}

Once the above code has completed, the following line of code returns data to the calling form/window as a ValueTuple.

return (IsSuccessFul, validRows, invalidRows);
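On the calling side the tuple can be deconstructed directly. A sketch under assumptions: the class name CsvParser, the method name LoadCsvFileStreamReader and the controls dataGridView1 and InspectComboBox are placeholders, since the article only shows the method body.

var parser = new CsvParser();
var (success, validRows, invalidRows) = parser.LoadCsvFileStreamReader();

if (success)
{
    // Bind valid rows to the grid for display.
    dataGridView1.DataSource = validRows;

    // Populate the inspection dropdown with rows flagged for review.
    InspectComboBox.DataSource = invalidRows;
}
else
{
    MessageBox.Show(parser.LastException.Message);
}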


Parsing data using TextFieldParser

This example uses a TextFieldParser to process data. Rather than splitting lines manually as done above, the TextFieldParser.ReadFields method handles the splitting by the delimiter assigned in parser.Delimiters. The remainder of the data validation is no different than with StreamReader. One major difference is that empty lines are ignored, unlike with StreamReader. Note that TextFieldParser lives in the Microsoft.VisualBasic assembly, in the Microsoft.VisualBasic.FileIO namespace.

public (bool Success, List<DataItem>, List<DataItemInvalid>, int EmptyLineCount) LoadCsvFileTextFieldParser()
{
    mHasException = false;

    var validRows = new List<DataItem>();
    var invalidRows = new List<DataItemInvalid>();
    var validateBad = 0;

    int index = 0;
    int district = 0;
    int grid = 0;
    int nCode = 0;
    float latitude = 0;
    float longitude = 0;

    var emptyLineCount = 0;
    var line = "";

    try
    {
        /*
         * If interested in blank line count
         */
        using (var reader = File.OpenText(_inputFileName))
        {
            while ((line = reader.ReadLine()) != null) // EOF
            {
                if (string.IsNullOrWhiteSpace(line))
                {
                    emptyLineCount++;
                }
            }
        }

        using (var parser = new TextFieldParser(_inputFileName))
        {
            parser.Delimiters = new[] { "," };

            while (true)
            {
                string[] parts = parser.ReadFields();

                if (parts == null)
                {
                    break;
                }

                index += 1;
                validateBad = 0;

                if (parts.Length != 9)
                {
                    invalidRows.Add(new DataItemInvalid() { Row = index, Line = string.Join(",", parts) });
                    continue;
                }

                // Skip first row which in this case is a header with column names
                if (index <= 1) continue;

                /*
                 * These columns are checked for proper types
                 */
                var validRow = DateTime.TryParse(parts[0], out var d) &&
                               float.TryParse(parts[7].Trim(), out latitude) &&
                               float.TryParse(parts[8].Trim(), out longitude) &&
                               int.TryParse(parts[2], out district) &&
                               int.TryParse(parts[4], out grid) &&
                               !string.IsNullOrWhiteSpace(parts[5]) &&
                               int.TryParse(parts[6], out nCode);

                /*
                 * Questionable fields
                 */
                if (string.IsNullOrWhiteSpace(parts[1]))
                {
                    validateBad += 1;
                }

                if (string.IsNullOrWhiteSpace(parts[3]))
                {
                    validateBad += 1;
                }

                // NCIC code must be 909 or greater
                if (nCode < 909)
                {
                    validateBad += 1;
                }

                if (validRow)
                {
                    validRows.Add(new DataItem()
                    {
                        Id = index,
                        Date = d,
                        Address = parts[1],
                        District = district,
                        Beat = parts[3],
                        Grid = grid,
                        Description = parts[5],
                        NcicCode = nCode,
                        Latitude = latitude,
                        Longitude = longitude,
                        Inspect = validateBad > 0
                    });
                }
                else
                {
                    // fields to review in specific rows
                    invalidRows.Add(new DataItemInvalid() { Row = index, Line = string.Join(",", parts) });
                }
            }
        }
    }
    catch (Exception ex)
    {
        mHasException = true;
        mLastException = ex;
    }

    return (IsSuccessFul, validRows, invalidRows, emptyLineCount);
}


Parsing data using OleDb

This method "reads" lines from a CSV file with the disadvantage that the fields are not typed and carry more baggage than needed for processing lines from the CSV file, which will make a difference in processing time with larger CSV files.

public DataTable LoadCsvFileOleDb()
{
    // Note: the remainder of the connection string is elided in the source article.
    var connString = $@"Provider=Microsoft.Jet.OleDb.4.0;.....";

    var dt = new DataTable();

    try
    {
        using (var cn = new OleDbConnection(connString))
        {
            cn.Open();

            var selectStatement = "SELECT * FROM [" + Path.GetFileName(_inputFileName) + "]";

            using (var adapter = new OleDbDataAdapter(selectStatement, cn))
            {
                var ds = new DataSet("Demo");
                adapter.Fill(ds);
                ds.Tables[0].TableName = Path.GetFileNameWithoutExtension(_inputFileName);
                dt = ds.Tables[0];
            }
        }
    }
    catch (Exception ex)
    {
        mHasException = true;
        mLastException = ex;
    }

    return dt;
}
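A sketch of consuming the result; the parser class name is a placeholder, and since the text driver may hand fields back untyped, conversions mirror the TryParse checks from the earlier examples.

var parser = new CsvParser(); // assumed class exposing LoadCsvFileOleDb
DataTable table = parser.LoadCsvFileOleDb();

foreach (DataRow row in table.Rows)
{
    // Depending on the driver's type inference, fields may arrive as strings.
    var rawDate = Convert.ToString(row[0]);

    if (!DateTime.TryParse(rawDate, out var d))
    {
        // flag the row for review, as in the StreamReader example
    }
}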


Reviewing

The following window has several buttons at the bottom. The Process button executes reading the CSV file, in this case using StreamReader. The dropdown will contain any line numbers which need to be inspected; pressing the Inspect button moves to that line in the grid. This would be for a small number of lines with issues, or to get a visual on a possibly larger problem. The button labeled Review will pop up a child window to allow edits that will update the main window below.

Child window shown when pressing the "Review" button.

The only true validation done on this window is to provide a list of valid values for the beat field using a dropdown fed from a static list. As this series continues, a database reference table will supersede the static list.

Code for validating through a drop-down in the DataGridView.

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;
using WindowsFormsApp1.Classes;

namespace WindowsFormsApp1
{
    public partial class ReviewForm : Form
    {
        private BindingSource _bs = new BindingSource();
        private List<DataItem> _data;

        /// <summary>
        /// Provide access by the calling form to the data presented
        /// </summary>
        public List<DataItem> Data
        {
            get { return _data; }
        }

        /// <summary>
        /// Acceptable values for beat field. In part 2 these will be read from a database reference table.
        /// </summary>
        private List<string> _beatList = new List<string>()
        {
            "1A", "1B", "1C", "2A", "2B", "2C", "3A", "3B", "3C", "3M", "4A",
            "4B", "4C", "5A", "5B", "5C", "6A", "6B", "6C"
        };

        public ReviewForm()
        {
            InitializeComponent();
        }

        public ReviewForm(List<DataItem> pData)
        {
            InitializeComponent();

            _data = pData;
            Shown += ReviewForm_Shown;
        }

        private void ReviewForm_Shown(object sender, EventArgs e)
        {
            dataGridView1.AutoGenerateColumns = false;

            // ReSharper disable once PossibleNullReferenceException
            ((DataGridViewComboBoxColumn) dataGridView1.Columns["beatColumn"]).DataSource = _beatList;

            _bs.DataSource = _data;
            dataGridView1.DataSource = _bs;
            dataGridView1.ExpandColumns();

            dataGridView1.EditingControlShowing += DataGridView1_EditingControlShowing;
        }

        /// <summary>
        /// Setup to provide access to changes to the current row, here we are only interested in the beat field.
        /// Other fields would use similar logic for providing valid selections.
        /// </summary>
        /// <param name="sender"></param>
        /// <param name="e"></param>
        private void DataGridView1_EditingControlShowing(object sender, DataGridViewEditingControlShowingEventArgs e)
        {
            if (dataGridView1.CurrentCell.IsComboBoxCell())
            {
                if (dataGridView1.Columns[dataGridView1.CurrentCell.ColumnIndex].Name == "beatColumn")
                {
                    if (e.Control is ComboBox cb)
                    {
                        cb.SelectionChangeCommitted -= _SelectionChangeCommitted;
                        cb.SelectionChangeCommitted += _SelectionChangeCommitted;
                    }
                }
            }
        }

        /// <summary>
        /// Update current row beat field
        /// </summary>
        /// <param name="sender"></param>
        /// <param name="e"></param>
        private void _SelectionChangeCommitted(object sender, EventArgs e)
        {
            if (_bs.Current != null)
            {
                if (!string.IsNullOrWhiteSpace(((DataGridViewComboBoxEditingControl)sender).Text))
                {
                    var currentRow = (DataItem) _bs.Current;
                    currentRow.Beat = ((DataGridViewComboBoxEditingControl) sender).Text;
                    currentRow.Inspect = false;
                }
            }
        }
    }
}

Extension methods used in the above code blocks.

using System.Linq;
using System.Windows.Forms;

namespace WindowsFormsApp1.Classes
{
    public static class DataGridViewExtensions
    {
        /// <summary>
        /// Expand all columns
        /// </summary>
        /// <param name="sender"></param>
        public static void ExpandColumns(this DataGridView sender)
        {
            sender.Columns.Cast<DataGridViewColumn>().ToList()
                .ForEach(col => col.AutoSizeMode = DataGridViewAutoSizeColumnMode.AllCells);
        }

        /// <summary>
        /// Used to determine if the current cell type is a ComboBoxCell
        /// </summary>
        /// <param name="sender"></param>
        /// <returns></returns>
        public static bool IsComboBoxCell(this DataGridViewCell sender)
        {
            var result = false;

            if (sender.EditType != null)
            {
                if (sender.EditType == typeof(DataGridViewComboBoxEditingControl))
                {
                    result = true;
                }
            }

            return result;
        }
    }
}

Data classes to contain the data read from the CSV file.

Good/questionable data class

using System;

namespace WindowsFormsApp1.Classes
{
    public class DataItem
    {
        public int Id { get; set; }
        public DateTime Date { get; set; }
        public string Address { get; set; }
        public int District { get; set; }
        public string Beat { get; set; }
        public int Grid { get; set; }
        public string Description { get; set; }
        public int NcicCode { get; set; }
        public float Latitude { get; set; }
        public float Longitude { get; set; }
        public bool Inspect { get; set; }

        public string Line => $"{Id},{Date},{Address},{District},{Beat}," +
                              $"{Grid},{Description},{NcicCode},{Latitude},{Longitude}";

        public override string ToString()
        {
            return Id.ToString();
        }
    }
}


Invalid data class.

namespace WindowsFormsApp1.Classes
{
    public class DataItemInvalid
    {
        public int Row { get; set; }
        public string Line { get; set; }

        public override string ToString()
        {
            return $"[{Row}] '{Line}'";
        }
    }
}


In this article thoughts, ideas, and suggestions have been presented for dealing with CSV files; consider this a building block which continues in part two of this series.


Source: https://social.technet.microsoft.com/wiki/contents/articles/52030.c-processing-csv-files-part-1.aspx
