Listed below are some SQL Server Integration Services (SSIS) best
practices:
1.Keep it simple.
Avoid using components
unnecessarily. For example:
Step 1. Declare the variable
varServerDate.
Step 2. Use ExecuteSQLTask in the control flow to execute
a SQL query to get the server date-time and store it in the variable
Step
3. Use the dataflow task and insert/update database with the server date-time
from the variable varServerDate.
This sequence is advisable only in cases
where the time difference from step 2 to step 3 really matters. If that doesn't
really matter, then just use the getdate() command at step 3, as shown
below:
--create table #table1 (Lap_Id int, LAP_Date
datetime)
Insert into #table1 (Lap_Id, LAP_Date) values (1,
getdate())2.Calling a child package multiple times from a parent with different
parameter values.
When a child package is executed from a master package,
the parameters that are passed from the master need to be configured in the
child package. For this, you can use the ‘Parent Package Configuration’ option
in the child package. But, for using the ‘Parent Package Configuration’, you
need to specify the name of the ‘Parent Package Variable’ that is passed to the
child package. If you want to call the same child package multiple times (each
time with a different parameter value), declare the parent package variables
(with the same name as given in the child package) with a scope limited to
‘Execute Package Tasks’.
SSIS allows declaring variables with the same
name but the scope limited to different tasks – all inside the same
package!
3.SQL job with many atomic steps.
For the SQL job
that calls the SSIS packages, make multiple steps, each doing small tasks,
rather than a single step doing all the tasks. In the first case, the
transaction log grows too big, and if a rollback happens, it may take the full
processing space of the server.
4.Avoid unnecessary typecasts.
Avoid unnecessary type casts. For example, the flat file connection
manager, by default, uses the string [DT_STR] data type for all the columns. In
case you want to use the actual data types, you have to manually change it.
Better to change it at the source-level itself to avoid unnecessary type
castings.
5.Transactions.
Usually, the ETL processes handle
large volumes of data. In such a scenario, do not attempt a transaction on the
whole package logic. However, SSIS supports transaction, and it is advisable to
use transactions where the atomicity of the transaction is taken care
of.
For example, consider a scenario where a source record is to be
spitted into 25 records at the target - where either all the 25 records reach
the destination or zero. In this scenario, using a transaction, we can ensure
either all the 25 records reach the destination or
zero.
6.Distributed transaction spanning multiple
tasks.
The control flow of an SSIS package threads together various
control tasks. It is possible to set a transaction that can span into multiple
tasks using the same connection. To enable this, the “retainsameconnection”
property of the Connection Manager should be set to “True”.
7.Limit
the package names to a maximum of 100 characters.
When an SSIS
package with a package name exceeding 100 chars is deployed into SQL Server, it
trims the package name to 100 chars, which may cause an execution failure. So,
limit the package names to a maximum of 100 characters.
8.Select *
from…
Make sure that you are not passing any unnecessary columns from
the source to the downstream. With the OLEDB connection manager source, using
the ‘Table or View’ data access mode is equivalent to ‘SELECT * FROM
’, which will fetch all the columns. Use ‘SQL command’ to fetch only
the required columns, and pass that to the downstream. At each down-stream
component, filter out the unnecessary
columns.
9.Sorting.
Sorting in SSIS is a time consuming
operation. At the places where we know that data is coming from database tables,
it’s better to perform the sorting operation at the database query
itself.
10.Excel Source and 64-bit runtime.
The Excel
Source or Excel Connection manager works only with the 32 bit runtime. When a
package using the Excel Source is enabled for 64-bit runtime (by default, it is
enabled), it will fail on the production server using the 64-bit runtime. Go to
the solution property pages\debugging and set Run64BitRuntime to
False.
11.On failure of a component, stop/continue the execution with
the next component.
When a component fails, the property
failParentonFailure can be effectively used either to stop the package execution
or continue with the next component - exception - stop/continue with the next
component in a sequence container. The value of the constraint connecting the
components in the sequence should be set to "Completion", and the
failParentonFailure property should be set to False
(default).
12.Protection.
To avoid most of the package
deployment error from one system to another system, set the package protection
level to ‘DontSaveSenstive’.
13.Copy pasting the Script
component.
Once you copy-paste a script component and execute the
package, it may fail. Just open the script editor of the pasted script
component, save the script, and execute the package – it will
work.
14.Configuration filter – Use as a filter.
It is a
best practice to use the package name as the configuration filter for all the
configuration items that are specific to a package. It is especially useful when
there are so many packages with package-specific configuration items. For the
configuration items that are general to many packages, use a generic
name.
15.Optimal use of configuration records.
Avoid the
same configuration item recorded under different filter/object names. For
example, if two packages are using the same connection string, you need only one
configuration record. To enable this, use the same name for the connection
manager in both the packages. Also, use a generic configuration filter. This is
quite convenient at the time of porting from one environment to another (e.g.:
from UAT to production).
No comments:
Post a Comment