Friday, January 13, 2012

Production Support – Story About Smooth Transition



The following information is protected by US and international law. Any use of this publication or any part of it requires permission. Consultation is available: leave your email address as a comment.
Foreword

Production support is the last, largest, and one of the most complicated and difficult parts of the Software Development Life Cycle (SDLC). Unlike development, it deals with the real environment, not an imagined one. It is the main course of the dinner, and all participants in the IT process are involved in production support. That is why these notes are addressed not only to the production support team but also, and perhaps first of all, to managers, developers, and DBAs.
Production support usually begins with a release. A release can be either the first one (initial release, or IR), when a new project moves into regular operation for its users, or a release that follows the IR (following release, or FR), applied to an already working IT system.
Initial Release
An IR usually means a step-by-step process of moving from an existing (legacy) system, or from scratch, to a new one, and replacing the developers with a production support (PS) team (PST). It is probably the most exciting moment, because users begin to get the data they expected and the PST begins to take care of this newborn system. The main issue here is transferring knowledge from the developers to the users and the PST. Although everybody in IT knows that it is necessary to have clear, complete, and well-structured documentation, reality is different. In real life, the transition from developers to the PST is usually provided in a restricted form: short-term training accompanied by brief, fragmentary notes on some key points of the initial design.
Probably the best way to provide a smooth IR is to include the most knowledgeable and experienced members of the PST in the development team.
These PST agents will get the necessary knowledge from the developers, starting from integration testing, and will write documentation for the PST. Obviously, as PST members, these agents will do their best to prepare the best documentation they can for their team. Then they transfer their knowledge to the rest of the PST, using and polishing the documentation they prepared. This should usually happen shortly before the IR, leaving enough time to transfer knowledge and mentor the other members of the PST. Depending on how big the new IT process is, this transfer can take from a few days to a few months.
At the same time, the developers would gain a lot from such a transfer of knowledge through the PST agents: they will not spend much time writing documentation (just periodic discussions with the PST agents), and they will train only the PST agents rather than the whole PST.
As a result, the IR would run faster and more smoothly, with fewer issues.

Following Release
An FR means adding some functions, or just a regular fix, to an already released IT system. The main issue here is not to damage the existing system, which sometimes runs around the clock. Such a release should be done not abruptly, as one shot transforming the running system into the modified one, but step by step, as smoothly as possible. This is especially relevant to data modification, because creating a new set of processed data takes much more time than switching to new programs. To illustrate some methods, let me give a couple of examples from my experience supporting a 7 TB Oracle data warehouse.
The first method of such a transition is to run the modified data processing in parallel with the existing one.
- To exclude unnecessary data from processing in a table of tens of GB, two new tables were built: one with the necessary data to be used in the modified system, and another containing the unused data. The tables were created and loaded almost completely prior to the release; the last portion (the current partition of data) was loaded on the first day of the release. The existing old table was still used as the source for reports.
- Then, by running a script of a few statements (it took a small fraction of a second), the existing table was renamed with the suffix '_old', and the two new tables were merged (using a UNION) into one view with the same name the processed table had. In this way, the existing table was divided into two: one with the necessary data and the other with the rest. The switch was seamless to users, and a rollback script that renamed everything back was prepared and kept ready to run.
- After the data was split, the data loading procedure was replaced with an intermediate one that loaded data both into the old table (in case of rollback) and into the new pair of tables used through the view created in the previous step.
- A couple of days after switching to the split data, once all applications using it were working stably, the unused data was cut off: the view created on the first day of the release was dropped, and the new table containing only the used data was renamed to the same name the initial (now '_old') table had. Again, it took a small fraction of a second.
- Then the intermediate procedure was replaced with a new one that loaded only the used data into the new table, which by then had the same name as before the release. The release was complete.
- After that, the data that existed before the release and the unused data were compressed, kept temporarily for a couple of months, and then dropped.
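The switchover and rollback described above can be sketched in Oracle SQL roughly as follows. The table and view names are hypothetical (the post does not show the original scripts), and UNION ALL is used instead of UNION, on the assumption that the two tables are disjoint, so duplicate elimination is unnecessary:

```sql
-- Assumed setup (hypothetical names): SALES is the original table,
-- already split ahead of the release into SALES_USED and SALES_UNUSED.

-- Day 1: seamless switch, DDL only, a fraction of a second.
ALTER TABLE sales RENAME TO sales_old;

CREATE VIEW sales AS
  SELECT * FROM sales_used
  UNION ALL
  SELECT * FROM sales_unused;

-- Rollback script, prepared and kept ready but not run:
--   DROP VIEW sales;
--   ALTER TABLE sales_old RENAME TO sales;

-- A few days later, after stable operation: cut off the unused data.
DROP VIEW sales;
ALTER TABLE sales_used RENAME TO sales;
```

Because every step is a rename or a view definition (metadata-only DDL), each switch is near-instant and reversible, which is what makes the "sliding step" possible.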
Such a release has some advantages:
- It was seamless to users, with no shutdown or access disruption.
- There was no risk of losing any data.
- At any moment it was possible to step back immediately, operating only with DDL statements, not DML.
- It allowed operating on tens of GB of data by preloading the old data.
This approach is similar to an alpinist climbing to a mountain peak while always touching the rock with two hands and a foot, or two feet and a hand, minimizing the risk of falling. It can also be called a "sliding step", because each step of the release is made by putting one foot close to the other.
The extra precautions are similar to the extra confirmation box (sometimes more than one) that appears when you click "delete" on a file, after which the deleted file is still saved in the Recycle Bin.
The second example illustrates extending an existing data processing system by smoothly transforming it into a new one. In Oracle 9, there was a permanent problem of adding extra disk space for permanently growing data. The traditional DBA approach was to physically relocate data files to newly added disk space. Such an operation took time and impacted users' access to the data. To eliminate shutdown and access disruption, I went in the other direction: not relocating the existing files, but creating new ones on the new disk space and assigning them to the tablespaces that needed to grow. As a result, increasing the space for tables became seamless to users, eliminating access disruption. Later, a similar approach was implemented in subsequent versions of Oracle.
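Assuming a tablespace named USERS_DATA and a file path on the newly added disk (both hypothetical), the idea can be sketched as:

```sql
-- Instead of physically relocating existing data files (which takes time
-- and disrupts access), add a new data file on the new disk space and
-- let the tablespace grow into it. The statement is metadata-only from
-- the users' point of view: existing files and sessions are untouched.
ALTER TABLESPACE users_data
  ADD DATAFILE '/u05/oradata/prod/users_data_02.dbf'
  SIZE 10G
  AUTOEXTEND ON NEXT 1G MAXSIZE UNLIMITED;
```

The design choice is the same as in the first example: extend the system with an external addition rather than restructure what is already running.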
The same approach can be applied to other issues of a growing data processing system. It can be formulated as striving not to restructure the existing system, but to add an external addendum that extends it to a new, more powerful level. In the new, extended system, the old one continues to exist as a subset that does not contradict the new functionality but works as a part of it.
Final conclusion: to make the release process more efficient and successful, strive to make it smooth, gradually moving from the existing system to the new one, always ready to step back. Such a release looks longer than one big jump, but in real life it does not impact users, which is what matters most, and eventually the transition takes less time and is done with less effort and expense.
Rules of releases:
1. Proceed step by step.
2. Maximize parallel processing (run both the old and the new ways of data processing).
3. Make the transition smooth.
4. Spending extra time on reliability reduces losses from possible crashes.
5. Be ready to roll back; the rollback should be included in the plan and tested.
6. Save the old programs and, especially, the old data, because the data is not reproducible.

Epilogue
Production support is the longest and most expensive phase of the SDLC, but it has not been described much in IT theory. That is why I am posting this first article on the subject, expecting to get more examples of best practices and notes about small and big issues. In this way, we can help each other and enrich our knowledge in this area.