
Thursday, March 8, 2012

Automated Batch Job Over Numerous Data Files, How?


Hello, Everyone: Greetings!

I am new to Sql Server [Express version] and am not even sure I'm making the right choice. So here I am, seeking advice.

My database needs are nothing sophisticated. They just involve:
(a) create tens of thousands of separate data files, each under a unique file name of up to 8 characters, with file names read in from a pre-determined file name list.
(b) store/insert VOLUMINOUS numerical data into each of the data files, with the data indexed by date and time, plus maybe one or two additional character or string fields.
(c) for each data file, retrieve a subset of its data, perform extensive numerical calculations, and then store the results in one or more separate corresponding files; e.g. if a file name in (b) is F12345, (c) creates F12345R1, F12345R2, F12345R3, etc., which store different sets of calculated results.

Thus, this is purely a console application, doing a batch job, and requiring no graphical user interface. Both automation and speed are important here, due to the large number of data files that must be created and/or updated, and the very extensive numerical calculations on the data.

The goal is to automate the daily or weekly creation of each of the tens of thousands of Sql Server database files, insert fresh data (read in from a fresh ASCII file) into each file, numerically process the data and then store the results in one or more separate, corresponding result data files, with all the steps automated and without need for GUI. Once coding is done, the entire data processing session is expected to run for many hours, or even days, in an automated manner, and without human intervention.

What would be the most efficient way of doing this under Visual Basic Express (which is what I'm learning to use) by directly calling Sql Server Express without having to go through GUI to create database files? What is the proper interface utility or library to use to enable direct database function calls without the need to learn SQL language? Is Visual Basic and/or Sql Server even good choices for what I want to do? I want to be able to call the basic, simple database functions directly and simply from program code in a non-GUI, non-interactive manner for the tens of thousands of separate data files that will be used.

I really miss the good old days when one could do a straightforward batch job via a console application, with simple, direct calls to create new data files, insert and index fresh data, retrieve any subset of data to do extensive calculations, create new files to store the results, etc., all under automated program control, iterating through an unlimited number of data files until the job is finished, without human intervention during processing.

Or am I missing something because all this can still be done simply and easily under VB and Sql Server? I've several books here about Visual Basic 2005 and Visual Basic 2005 Express, all showing how to create a database via a GUI utility. That's fine if one needs to create just one or two databases, but not hundreds, or even tens of thousands (as in my case) of them on the fly.

So, I am looking for the simplest and most direct database interface that will allow me to do the above under VB program code alone, and easily. For something as simple as I have described above, I don't think I should have to learn the SQL language or manually create each database file.

As you can see, I just want to get some heavy-duty numerical processing done over tens of thousands of data files as simply and efficiently as possible, and with as little fanciful detour as possible. So, ironically, I am trying to use Visual Basic without being cluttered by its "Visual" aspects, yet the GUI is what most VB books devote themselves to or emphasize heavily. Similarly, I would much rather use simple, "lean and mean", direct database function calls than have to learn a new vocabulary of the "English-like" SQL language.
Yes, I'm not used to this tedious detour of learning the GUI aspect of VB, or learning the Structured Query Language of Sql Server, just to try to do something simple that I need to do in batch mode via a console application.

Are there any good books or other helpful URLs that will help a guy like me? Am I even using the wrong language and the wrong database to do what I want to do? What are the better alternatives, if any? Any advice, experience and pointers on any of the above issues raised would be very much appreciated. Thank you!

Regards,
ConsoleApp

You've asked a fairly large, fairly non-trivial question...

SQL Server 2005 Express Edition is a good place to start, as you've surmised. Depending on your background and expertise, Visual C# or Visual Basic from Visual Studio 2005 would be a great programming environment. You'll be able to create a new database and add tables, as well as use the database project for managing the TSQL scripts that can be used to create these database objects on the fly when your application runs. Depending on your indexing vs. partitioning decision, you may only ever need to create the database and tables one time.
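As a concrete illustration -- a minimal sketch only, with BatchData, SensorData, and the column names all placeholder choices rather than anything the question prescribes -- the kind of script your app would run might look like this:

CREATE DATABASE BatchData;
GO
USE BatchData;
GO
CREATE TABLE SensorData (
    ReadingTime datetime NOT NULL,  -- the date & time index the question asks for
    SeriesName  char(8)  NOT NULL,  -- plays the role of the 8-character file name
    Value       float    NOT NULL
);
GO
CREATE CLUSTERED INDEX IX_SensorData ON SensorData (SeriesName, ReadingTime);
GO

Because this is plain TSQL, a console app can execute it through an ordinary connection and command object, with no designer or GUI involved.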

In both C# and Visual Basic, a console app that does what you're asking should be possible. My only concern (follow up in one of the Visual Studio forums for an authoritative answer and guide to this) is the ability to use some of the great data designer tools in a UI-less application -- the designers and associated runtime controls are targeted toward a rich client application and thus assume a windowed application. For that reason, you may find it easier to prototype your application as a Windows Forms application and transition to a non-UI application at a later date (note that a windowed app can still serve as the container for a batch-processing app such as this... what you're asking for is an application that doesn't display UI that blocks the progress of the job).

As to some of the other aspects of your application, you will more likely want to create lots of tables, not databases. There is a very rough correlation between databases and directories and between tables and files...that general mindset will probably help you transition your mental model. Also, look at the justification behind partitioning your data into many, many files...in a database, you can use indexing to differentiate these groupings of data (e.g. SELECT * FROM [Data] WHERE [SensorID]=4). For extremely large data sets, there may still be some value in partitioning, but I suspect that this is a tertiary consideration at this point.
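To make that mindset shift concrete, here is a sketch (reusing the placeholder names above) of how "open file F12345" becomes "filter one shared table by key":

-- instead of opening file F12345, select its rows out of the shared table
SELECT ReadingTime, Value
FROM SensorData
WHERE SeriesName = 'F12345'
  AND ReadingTime >= '20120301' AND ReadingTime < '20120308'
ORDER BY ReadingTime;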

And finally, your processed data could be stored in additional tables much as your input data was. On these tables you could then do additional analysis/reporting...
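For example (again with placeholder names), the F12345R1/F12345R2/F12345R3 naming scheme from the question can become a column rather than thousands of separate files:

CREATE TABLE CalcResults (
    SeriesName  char(8)  NOT NULL,
    ResultSet   tinyint  NOT NULL,  -- replaces the R1/R2/R3 suffix
    ReadingTime datetime NOT NULL,
    ResultValue float    NOT NULL
);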

Are you parsing out subsets of data on a regular basis, and if so, are they contiguous ranges (e.g. [1-10], [11-20], [21-30]) or scattered sets (e.g. [1,2,4,10,12], [2,20,21,30,33])? Or do you process data files in their entirety?

Wednesday, March 7, 2012

Automate syncing login ids of master db at remote server

Hi All,
I was hoping for some advice about automating syncing of login ids at
remote servers. I have implemented simple log shipping and that seems
to be working fine; any ids created on the database are replicated to
the standby database without issue. However, any changes to the master
database logins are not replicated.
I have looked at the DTS "transfer logins task" however there appear to
be some limitations to this.
I have also looked at several scripts (sp_help_revlogin) provided by
MS, http://www.support.microsoft.com/?id=246133, however this process
seems to be manual, and I would like this to be automated, so that
should there be any changes to logins within the master db, they are
replicated to the standby server the next time the task is run.
Similarly I would like to sync any orphaned users at the standby server.
If anyone could please provide further information I would be grateful.
Many Thanks
Andrew
The way I do this in an automated manner is completely unsupported by
Microsoft.
I create a linked server for the standby. Then I alter sp_addlogin as well
as sp_changepassword on the local server to add a call to sp_addlogin and
sp_changepassword on the linked server, passing the same input arguments
along. This causes the logins to be added to both instances and handles
password changes as well. It is unsupported, because I modify system
objects.
Mike
http://www.solidqualitylearning.com
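A sketch of the same idea without modifying system objects would be a user-defined wrapper that your own tooling calls instead of calling sp_addlogin directly; the procedure name and the linked server name STANDBY here are hypothetical:

CREATE PROCEDURE usp_AddLoginBothServers
    @loginame sysname,
    @passwd sysname = NULL
AS
BEGIN
    -- add the login on the primary
    EXEC sp_addlogin @loginame, @passwd
    -- repeat the call on the standby through the linked server
    EXEC [STANDBY].master.dbo.sp_addlogin @loginame, @passwd
END

The trade-off is real, though: the wrapper is supported, but anything that calls sp_addlogin directly (Enterprise Manager included) bypasses it, which is exactly why Mike modifies the system procedures instead.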
"astrally2005" <andrewdritchie@.yahoo.com.au> wrote in message
news:1142333250.388701.45050@.j33g2000cwa.googlegro ups.com...
> Hi All,
> I was hoping for some advice about automating syncing of login ids at
> remote servers. I have implemented simple log shipping and that seems
> to be working fine, any ids created on the database are replicated to
> the standby database without issue, however any changes to the master
> database logins are not replicated.
> I have looked at the DTS "transfer logins task" however there appear to
> be some limitations to this.
> I have also looked at several scripts (sp_help_revlogin) provided by
> MS, http://www.support.microsoft.com/?id=246133, however this process
> seems to be manual, and I would like this to be automated, so that
> should there be any changes to logins within the master db, they are
> replicated to the standby server the next time the task is run.
> Similarly I would like to sync any orphaned users at the standy server.
> If anyone could please provide further information I would be grateful.
> Many Thanks
> Andrew
>
Mike,
Thanks for your help,
It is a novel approach, and ensures up to date login information.
As I understand it, whenever a user is created via either an application or
Enterprise Manager, the sp_addlogin sp is called.
However what if the remote server is unavailable at the time that a new
user is being created on the primary db?
I am not sure if I have the skills to amend these scripts, particularly
given that they are system objects.
Thanks once again,
Andrew
I have continued to search the web for a solution and have found some
scripts provided by Umachandar Jayachandran at
http://www.sqlmag.com/Article/Articl...er_25710.html,
which seem to work, in that the login ids are located within the
sysxlogins table within the master db on the standby server; however,
these logins are not visible under users within Enterprise Manager.
Does this matter?
CREATE PROCEDURE sp_Syncronize_Logins_from_prod AS
set ANSI_NULLS OFF
set ANSI_WARNINGS OFF
DECLARE @logins cursor
DECLARE @name sysname, @password sysname,
    @dbname sysname, @language sysname,
    @sid binary(16), @isntuser bit
-- read the logins from the production instance over the linked server
SET @logins = cursor fast_forward FOR
    SELECT l.loginname, l.password, l.dbname, l.language, l.sid,
        l.isntuser
    FROM [server\instance].master.dbo.syslogins AS l
    WHERE l.loginname IS NOT NULL
OPEN @logins
WHILE (1 = 1)
BEGIN
    FETCH @logins INTO @name, @password, @dbname,
        @language, @sid, @isntuser
    IF @@fetch_status < 0 BREAK
    -- is_srvrolemember returns non-NULL only for logins that already
    -- exist here, so this skips logins the standby already has
    IF is_srvrolemember( 'sysadmin', @name ) IS NOT NULL
        CONTINUE
    IF @isntuser = 0
        -- SQL login: recreate it with the same SID and password hash
        EXEC sp_addlogin @name, @password, @dbname,
            @language, @sid, 'skip_encryption'
    ELSE
    BEGIN
        -- Windows login: grant access and copy the defaults
        EXEC sp_grantlogin @name
        EXEC sp_defaultdb @name, @dbname
        EXEC sp_defaultlanguage @name, @language
    END
END
DEALLOCATE @logins
They should be visible. Enterprise Manager is looking at the sysxlogins
table.
Mike
http://www.solidqualitylearning.com
"astrally2005" <andrewdritchie@.yahoo.com.au> wrote in message
news:1142602814.147621.269160@.e56g2000cwe.googlegr oups.com...
>I have continued to search the web for a solution and have found some
> scripts provided by Umachandar Jayachandran at
> http://www.sqlmag.com/Article/Articl...er_25710.html,
> which seem to work, in that the logins ids are located within
> sysxlogins table within the master db on the standby server, however
> these logins are not visible under users within Enterprise Manager,
> does this matter?
> CREATE PROCEDURE sp_Syncronize_Logins_from_prod AS
> set ANSI_NULLS OFF
> set ANSI_WARNINGS OFF
> DECLARE @.logins cursor
> DECLARE @.name sysname, @.password sysname,
> @.dbname sysname, @.language sysname,
> @.sid binary(16), @.isntuser bit
> SET @.logins = cursor fast_forward FOR
> SELECT l.loginname, l.password, l.dbname, l.language, l.sid,
> l.isntuser
> FROM [server\instance].master.dbo.syslogins AS l
> WHERE l.loginname IS NOT NULL
> OPEN @.logins
> WHILE(1=1)
> BEGIN
> FETCH @.logins INTO @.name, @.password, @.dbname,
> @.language, @.sid, @.isntuser
> IF @.@.fetch_status < 0 break
> IF is_srvrolemember( 'sysadmin', @.name ) IS NOT NULL
> CONTINUE
> IF @.isntuser = 0
> EXEC sp_addlogin @.name, @.password, @.dbname,
> @.language, @.sid, 'skip_encryption'
> ELSE
> BEGIN
> EXEC sp_grantlogin @.name
> EXEC sp_defaultdb @.name, @.dbname
> EXEC sp_defaultlanguage @.name, @.language
> END
> END
> DEALLOCATE @.logins
>

Saturday, February 25, 2012

Auto-Increment of varchar primary key

Hi All
I am looking for a bit of advice.
I am in the process of creating a database in which it has been decided that
all primary keys are going to be varchar(40). Not my decision, but anyway.
When inserting into each table it will be possible to specify a value for
the primary key, but if none is specified a value should be auto-generated.
That means the values in the primary key field can be a mixture of
numbers and letters, but an auto-generated value should just be a number.
What would be the best way to generate these values when no value is
specified in the insert?
TIA
Klaus

Depends really -- if auto-generated, does it just need to be a number? Any old
number, or a specific format and range?
I always recommend putting a surrogate key on the tables and using that as
the foreign key and inside the application (not for display purposes, but
for use as the value in a listbox, for instance); that can be an int column
with the IDENTITY property, not null, with a unique constraint on it.
You could set the value of the primary key to that if not specified, that
would save calculating a new unique number.
Otherwise, you could use an 'instead of' trigger, for example...
Instead of using MAX, you could take the value from a table that holds the
last number used.
create table testtrg (
    mycol int not null unique
)
go
insert testtrg ( mycol ) values ( 1 )
go
create trigger trgTestTrg on testtrg instead of insert
as
begin
    if @@rowcount = 0
        return
    declare @nextid int
    begin tran
        set @nextid = ( select max( mycol )
                        from testtrg with (tablockx) )
        set @nextid = isnull( @nextid, 0 ) + 1
        insert testtrg values( @nextid )
    commit tran
end
go
-- Note: inserting 1 but it already exists so should give a key violation,
-- but the instead of trigger code kicks in and gives the next id.
select * from testtrg
insert testtrg ( mycol ) values( 1 )
select * from testtrg
insert testtrg ( mycol ) values( 1 )
select * from testtrg
insert testtrg ( mycol ) values( 1 )
select * from testtrg
go
Tony Rogerson
SQL Server MVP
http://sqlserverfaq.com - free video tutorials
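For the variant Tony mentions -- taking the value from a table that holds the last number used -- a minimal sketch might look like this (the NextId table is hypothetical):

create table NextId (
    LastUsed int not null
)
go
insert NextId ( LastUsed ) values ( 0 )
go
-- UPDATE can assign the variable and the column in one atomic statement,
-- so two concurrent callers cannot hand out the same number
declare @nextid int
update NextId set @nextid = LastUsed = LastUsed + 1
select @nextid
go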
"Klaus" <Klaus@.discussions.microsoft.com> wrote in message
news:FE28E558-F88F-4A9F-9AAA-40837A9966E9@.microsoft.com...
> Hi All
> I am looking for a bit of advice.
> I am in the process of creating a database in which it has been decided
> that
> all primary keys are going to varchar(40). Not my decision, but anyway.
> When inserting into each table it will be possible to specify a value for
> the primary, but if not specified a value should be auto-generated. That
> means that the values in the primary key field can be a mixture of both
> numbers and letters, but if auto-generated it should just be a number.
> What be the best way to make this autogenerated values if no value is
> being
> specified in the insert?
> TIA
> Klaus
Thanks a lot, Tony. That was very helpful.
I will create a unique field on each of my tables. The value for this will
be auto-generated using identity. A trigger will then keep an eye on the
inserts. If no value is specified for the primary key, the identity
value will be copied into the varchar(40) primary key field.
-- Klaus
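A minimal sketch of that plan (the table name, trigger name, and the '' sentinel are all hypothetical choices; it assumes rows are inserted one at a time, since two unkeyed rows in a single statement would collide on the sentinel before the trigger fires):

create table Items (
    ItemKey varchar(40) not null primary key, -- caller-supplied, or '' to auto-generate
    ItemId int identity(1,1) not null unique
)
go
create trigger trgItemsKey on Items after insert
as
begin
    -- replace the sentinel with the identity value, rendered as text
    update it
    set ItemKey = cast( it.ItemId as varchar(40) )
    from Items it
    join inserted i on i.ItemId = it.ItemId
    where i.ItemKey = ''
end
go
-- usage: one explicit key, one auto-generated key
insert Items ( ItemKey ) values ( 'ABC-123' )
insert Items ( ItemKey ) values ( '' )
select * from Items
go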
"Tony Rogerson" wrote:

> Depends really, if auto-generated does it just need to be a number? Any ol
d
> number, or a specific format and range?
> I always recommend putting a surrogate key on the tables and use that as t
he
> foriegn key and inside the application (not for display purposes, but for
> use as the value in a listbox for instance), that can be a int column with
> the IDENTITY property, not null and have a unique constraint on it.
> You could set the value of the primary key to that if not specified, that
> would save calculating a new unique number.
> Otherwise, you could use an 'instead of' trigger, for example...
> Instead of using MAX, you could take the value from a table that holds the
> last number used.
> create table testtrg (
> mycol int not null unique
> )
> go
>
> insert testtrg ( mycol ) values ( 1 )
> go
>
> create trigger trgTestTrg on testtrg instead of insert
> as
> begin
> if @.@.rowcount = 0
> return
>
> declare @.nextid int
>
> begin tran
>
> set @.nextid = ( select max( mycol )
> from testtrg with (tablockx) )
>
> set @.nextid = isnull( @.nextid, 0 ) + 1
>
> insert testtrg values( @.nextid )
>
> commit tran
>
> end
> go
>
> -- Note, inserting 1 but it already exists so should give a key violation
,
> -- but the instead of trigger code kicks in and gives the next id.
> select * from testtrg
> insert testtrg ( mycol ) values( 1 )
> select * from testtrg
> insert testtrg ( mycol ) values( 1 )
> select * from testtrg
> insert testtrg ( mycol ) values( 1 )
> select * from testtrg
> go
>
>
> --
> Tony Rogerson
> SQL Server MVP
> http://sqlserverfaq.com - free video tutorials
>
> "Klaus" <Klaus@.discussions.microsoft.com> wrote in message
> news:FE28E558-F88F-4A9F-9AAA-40837A9966E9@.microsoft.com...
>
The second part of Tony's point should not be lost. You should use that int
key as the FK for relationships with other tables. If you need to show your
client the benefit of using an int instead of a varchar(40), load up a
couple of tables with some test data and perform joins using varchar(40) as
the keys, then the same using int as the key. The performance difference
is noticeable.
So use an int (or even bigint) PK, put a unique constraint on the
varchar(40) column, and for all business logic purposes the varchar(40)
field is the "key". But behind the scenes in the database the far more
efficient int is the key.
John Scragg
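A sketch of the layout John describes (all names hypothetical): the int key carries the relationship, and the varchar(40) business key just gets a unique constraint:

create table Customers (
    CustomerId int identity(1,1) not null primary key,
    BusinessKey varchar(40) not null unique -- the "key" the business sees
)
create table Orders (
    OrderId int identity(1,1) not null primary key,
    CustomerId int not null references Customers ( CustomerId )
)
go
-- joins run on the narrow int column, not the 40-byte varchar
select o.OrderId, c.BusinessKey
from Orders o
join Customers c on c.CustomerId = o.CustomerId
go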
"Klaus" wrote:
> Thanks a lot, Tony. That was very helpfull.
> I will create a unique field on each of my tables. The value for this will
> be auto-generated using identity. A trigger will then keep an eye on the
> inserts. If no value is being specified for the Primary key, the Identity
> value will be copied into the varchar(40) primary key field.
> -- Klaus
> "Tony Rogerson" wrote:
Also, if you need the data in a varchar(40) field you can use a calculated
column (if they don't need to enter it).
I concur with Tony & John: if you're doing joins, definitely use the INT
field as the joining field; joining on varchar fields gets very slow at
medium to high data volumes.
create table t (
    id int identity(1,1) primary key,
    myPK as cast( id as varchar(40) ) -- computed column supplies the varchar key
)
"John Scragg" <JohnScragg@.discussions.microsoft.com> wrote in message
news:EDB86798-0F96-415A-9D8D-733ED2E0CA02@.microsoft.com...
> The second part of Tony's point should not be lost. You should use that
int
> key as the FK for relationships with other tables. If you need to show
your
> client the benefit of using an int instead of a varchar(40). Load up a
> couple of tables with some test data. Perform join's using varchar(40) as
> the keys and then the same using int as the key. The performance
difference
> is noticable.
> So use an int (or even bigint) PK, put a unique constraint on the
> varchar(40) column and for all business logic purposes, the varchar(40)
field
> is the "key". But behind the scenes in the database the far more
efficient
> int is the key.
> John Scragg
> "Klaus" wrote:
>
will
Identity
Any old
as the
for
with
that
the
violation,
decided
anyway.
value for
That
both
number.
is