Hello, Everyone: Greetings!
I am new to SQL Server (Express edition) and am not even sure I'm making the right choice. So here I am, seeking advice.
My database needs are nothing sophisticated. They just involve:
(a) creating tens of thousands of separate data files, each under a unique file name of up to 8 characters, with the file names read in from a pre-determined file-name list;
(b) storing/inserting VOLUMINOUS numerical data into each of the data files, with the data indexed by date and time, plus maybe one or two additional character or string fields;
(c) for each data file, retrieving a subset of its data, performing extensive numerical calculations, and then storing the results in one or more separate corresponding files; e.g., if a file name in (b) is F12345, then (c) creates F12345R1, F12345R2, F12345R3, etc., which store different sets of calculated results.
Thus, this is purely a console application doing a batch job, requiring no graphical user interface. Both automation and speed are important here, due to the large number of data files that must be created and/or updated, and the very extensive numerical calculations on the data.
The goal is to automate the daily or weekly creation of each of the tens of thousands of SQL Server database files, insert fresh data (read in from a fresh ASCII file) into each file, numerically process the data, and then store the results in one or more separate, corresponding result files, with all steps automated and no need for a GUI. Once coding is done, the entire data-processing session is expected to run for many hours, or even days, without human intervention.
What would be the most efficient way of doing this under Visual Basic Express (which is what I'm learning to use) by directly calling SQL Server Express, without having to go through a GUI to create database files? What is the proper interface utility or library for making direct database function calls without the need to learn the SQL language? Are Visual Basic and/or SQL Server even good choices for what I want to do? I want to be able to call the basic, simple database functions directly and simply from program code, in a non-GUI, non-interactive manner, for the tens of thousands of separate data files that will be used.
I really miss the good old days when one could do a straightforward batch job via a console application, with simple, direct calls to create new data files, insert and index fresh data, retrieve any subset of the data for extensive calculations, create new files to store the results, etc., all under automated program control, iterating through an unlimited number of data files until the job is finished, without human intervention during processing.
Or am I missing something, because all this can still be done simply and easily under VB and SQL Server? I have several books here about Visual Basic 2005 and Visual Basic 2005 Express, all showing how to create a database via a GUI utility. That's fine if one needs to create just one or two databases, but not hundreds, or even tens of thousands (as in my case), of them on the fly.
So, I am looking for the simplest and most direct database interface that will let me do the above from VB program code alone, and easily. For something as simple as what I have described, I don't think I should have to learn the SQL language or manually create each database file.
As you can see, I just want to get a heavy-duty numerical-processing job done over tens of thousands of data files as simply and efficiently as possible, with as little fanciful detour as possible. So, ironically, I am trying to use Visual Basic without being cluttered by having to learn its "Visual" aspects, yet the GUI is what most VB books devote themselves to or emphasize heavily. Similarly, I would much rather use simple, "lean and mean", direct database function calls than have to learn a new vocabulary of "English-like" SQL.
Yes, I'm not used to this tedious detour of learning the GUI aspects of VB, or the Structured Query Language of SQL Server, just to do something simple that I need to do in batch mode via a console application.
Are there any good books or other helpful URLs for a guy like me? Am I even using the wrong language and the wrong database for what I want to do? What are the better alternatives, if any? Any advice, experience, and pointers on the issues raised above would be very much appreciated. Thank you!
Regards,
ConsoleApp
You've asked a fairly large, fairly non-trivial question...
SQL Server 2005 Express Edition is a good place to start, as you've surmised. Depending on your background and expertise, Visual C# or Visual Basic from Visual Studio 2005 would be a great programming environment. You'll be able to create a new database and add tables, as well as use a database project to manage the T-SQL scripts that create these database objects on the fly when your application runs. Depending on your indexing-vs.-partitioning decision, you may only ever need to create the database and tables once.
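As a concrete illustration of "creating database objects on the fly" from code, here is a minimal VB.NET console-app sketch using ADO.NET (`SqlConnection`/`SqlCommand`). It assumes a local Express instance named `.\SQLEXPRESS` with Windows authentication; the database name `MarketData`, table name `Readings`, and column names are placeholders, not anything from the original post.

```vb
' Sketch only: assumes a local .\SQLEXPRESS instance and Windows authentication.
' Names (MarketData, Readings, F12345) are illustrative placeholders.
Imports System.Data.SqlClient

Module BatchLoader
    Sub Main()
        Using conn As New SqlConnection( _
                "Server=.\SQLEXPRESS;Integrated Security=True")
            conn.Open()

            ' Create the database on the fly -- no GUI involved.
            Dim ddl As String = _
                "IF DB_ID('MarketData') IS NULL CREATE DATABASE MarketData;"
            Using cmd As New SqlCommand(ddl, conn)
                cmd.ExecuteNonQuery()
            End Using

            ' Create a table keyed by series name and timestamp.
            conn.ChangeDatabase("MarketData")
            ddl = "IF OBJECT_ID('Readings') IS NULL " & _
                  "CREATE TABLE Readings (" & _
                  "  SeriesName CHAR(8) NOT NULL," & _
                  "  ReadingAt  DATETIME NOT NULL," & _
                  "  Value      FLOAT NOT NULL," & _
                  "  PRIMARY KEY (SeriesName, ReadingAt));"
            Using cmd As New SqlCommand(ddl, conn)
                cmd.ExecuteNonQuery()
            End Using

            ' Parameterized insert -- the same pattern, inside a loop,
            ' handles rows read from a fresh ASCII input file.
            Using cmd As New SqlCommand( _
                    "INSERT INTO Readings VALUES (@name, @at, @val)", conn)
                cmd.Parameters.AddWithValue("@name", "F12345")
                cmd.Parameters.AddWithValue("@at", DateTime.Now)
                cmd.Parameters.AddWithValue("@val", 42.0)
                cmd.ExecuteNonQuery()
            End Using
        End Using
    End Sub
End Module
```

The one unavoidable bit of SQL lives in the command strings, but it is a handful of statements learned once, not a GUI workflow repeated tens of thousands of times.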
In both C# and Visual Basic, a console app that does what you're asking should be possible. My only concern (follow up in one of the Visual Studio forums for an authoritative answer and guide to this) is the ability to use some of the great data designer tools in a UI-less application -- the designers and associated runtime controls are targeted toward a rich client application and thus assume a windowed application. For that reason, you may find it easier to prototype your application as a Windows Forms application and transition to a non-UI application at a later date (note that a windowed app can still serve as the container for a batch-processing app such as this... what you're asking for is an application that doesn't display UI that blocks the progress of the job).
As to some of the other aspects of your application, you will most likely want to create lots of tables, not databases. There is a very rough correlation between databases and directories, and between tables and files... that general mindset will probably help you transition your mental model. Also, question the justification for partitioning your data into many, many files... in a database, you can use indexing to differentiate these groupings of data (e.g. SELECT * FROM [Data] WHERE [SensorID]=4). For extremely large data sets, there may still be some value in partitioning, but I suspect that this is a tertiary consideration at this point.
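To make the "one indexed table instead of many files" idea concrete, here is a hypothetical T-SQL sketch. It assumes a `Readings` table whose `SeriesName` column plays the role the file name used to play; the index and column names are illustrative, not from the original thread.

```sql
-- An index on (SeriesName, ReadingAt) makes per-series, per-date-range
-- lookups fast, replacing the need for one physical file per series.
CREATE INDEX IX_Readings_Series ON Readings (SeriesName, ReadingAt);

-- Pull one "file's" worth of data for a date range:
SELECT ReadingAt, Value
FROM Readings
WHERE SeriesName = 'F12345'
  AND ReadingAt >= '2007-01-01' AND ReadingAt < '2007-02-01'
ORDER BY ReadingAt;
```

The WHERE clause does in one query what opening, scanning, and closing a named file would have done in the old file-per-series design.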
And finally, your processed data can be stored in additional tables, much as your input data was. On those tables you can then do further analysis and reporting...
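One possible shape for those result tables, sketched in T-SQL under the same assumed schema as above: a single `Results` table keyed by series name and a result-set number stands in for the per-file outputs (F12345R1, F12345R2, ...) described in the question. All names here are hypothetical.

```sql
-- Sketch: one Results table replaces the F12345R1, F12345R2, ... result files.
CREATE TABLE Results (
    SeriesName CHAR(8) NOT NULL,
    ResultSet  INT NOT NULL,      -- 1 for "R1", 2 for "R2", ...
    ComputedAt DATETIME NOT NULL,
    Value      FLOAT NOT NULL,
    PRIMARY KEY (SeriesName, ResultSet, ComputedAt)
);

-- Each calculation pass appends its output rows:
INSERT INTO Results (SeriesName, ResultSet, ComputedAt, Value)
VALUES ('F12345', 1, GETDATE(), 3.14);
```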
Are you parsing out non-contiguous subsets of data on a regular basis (e.g. contiguous ranges like [1-10], [11-20], [21-30], or scattered sets like [1,2,4,10,12], [2,20,21,30,33], etc.)? Or do you process data files in their entirety?