How do I fix Discovery "duplicating" AIX records when an AIX server moves between hardware frames?

Thomas Vorster
Tera Guru

We are seeing CI duplication when Discovery identifies AIX server CIs by serial number: the problem occurs whenever we migrate virtual AIX servers between physical hardware frames. I am hoping that someone has encountered this before and was able to fix it.

Any help will be appreciated.

For example, moving an LPAR from one of our Power 8 servers to another created a duplicate CI. Then, when we moved other AIX virtual hosts onto that first Power 8, they adopted the old serial number, so we now have entries with incorrect hostnames, IPs, and so on. The server-to-service relationships were adopted as well, which produces incorrect service maps.

 

Let me break it down in more detail, using the servers ucmodc2 and twebap1a as an example:

 

  1. Starting point: the AIX LPAR (logical partition / virtual server) called ucmodc2 is located on the P8 with serial number 7806A88. Its partition ID is 16. The ServiceNow serial number for the CI entry ucmodc2 is thus 7806A88_16.

  2. We moved the AIX LPAR ucmodc2 to a new physical host, the P8 with serial number 7806A68, where it received partition ID 5. A new CI entry is created for ucmodc2, so there are now two entries for ucmodc2: the old one with serial number 7806A88_16 (with service mapping) and a new one with serial number 7806A68_5 (no service mapping).

  3. We complete firmware upgrades on the P8 with serial number 7806A88 (where ucmodc2 was originally) and need to move other AIX LPARs onto it so that we can upgrade the next P8 (7806A78). At this point twebap1a is located on 7806A78 with partition ID 16, giving it the serial number 7806A78_16.

  4. I now move the AIX LPAR called twebap1a to P8 7806A88. It lands on 7806A88 with partition ID 16 and thus gets the serial number 7806A88_16, which is the old serial number of ucmodc2. The CI entry is now confused: twebap1a has taken over the details of the old ucmodc2 CI entry because it has the same serial number. The service mapping is incorrect, the IP is incorrect, and so on.

  5. End state: ucmodc2 is currently running on 7806A68_5 (orphan CI entry), twebap1a is running on 7806A88_16 (CI merged with the old entry for ucmodc2), and the old twebap1a entry (on the old serial number 7806A78_16) is now a duplicate entry, although it does have the correct mappings and information.
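
The steps above can be replayed in a small toy model. It assumes, purely for illustration, that a discovered server is matched to an existing CI on the composite serial number alone (frame serial, underscore, partition ID). The `discover` function, the `cmdb` dictionary, and the service labels are hypothetical and are not ServiceNow's actual identification logic.

```python
# Toy CMDB: one CI per composite serial number (illustrative assumption).
cmdb = {}


def discover(name, frame_serial, partition_id):
    """Match on serial number only; update the CI if found, else insert."""
    serial = f"{frame_serial}_{partition_id}"
    if serial in cmdb:
        cmdb[serial]["name"] = name  # existing relationships are kept as-is
        return "updated"
    cmdb[serial] = {"name": name, "services": []}
    return "inserted"


# Initial state: both LPARs are discovered and mapped to (hypothetical) services.
discover("ucmodc2", "7806A88", 16)
cmdb["7806A88_16"]["services"].append("service of ucmodc2")
discover("twebap1a", "7806A78", 16)
cmdb["7806A78_16"]["services"].append("service of twebap1a")

# ucmodc2 moves to frame 7806A68 and receives partition ID 5.
move_ucmodc2 = discover("ucmodc2", "7806A68", 5)

# twebap1a moves to frame 7806A88 and lands on partition ID 16.
move_twebap1a = discover("twebap1a", "7806A88", 16)

for serial, ci in sorted(cmdb.items()):
    print(serial, ci["name"], ci["services"])
```

In this model the final state holds three CIs: an orphan for ucmodc2 with no services, the stale twebap1a record with the correct service, and the former ucmodc2 record, now named twebap1a but still carrying ucmodc2's service relationship.
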
1 ACCEPTED SOLUTION

Thomas,

I've seen this behaviour before with another client, except it was more extreme than this: the serial number was just that of the frame the server was on, with nothing concatenated on the end for the slot. My suggestion would be to modify the identification rule for AIX Server and remove the serial number lookup from it altogether. From what I am hearing, name is what should actually be matched for each of these servers, regardless of which frame they happen to be on at that point, so you can probably set name to run first. The reason I recommend deactivating the serial number rule is that at some point you will have a new server, never discovered before, whose serial number already exists on an older item (maybe that item has moved frames), and you will end up updating the older CI instead of creating a new one.
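
To illustrate that last point, here is a minimal sketch, assuming a simplified matcher that tries each rule in order and returns the first CI whose attribute equals the incoming value. The `identify` function, the rule lists, and the server name `newlpar1` are hypothetical; the real identification engine is more involved.

```python
def identify(cis, payload, rule_order):
    """Return the first CI matched by the rules, tried in the given order."""
    for attribute in rule_order:
        for ci in cis:
            if ci[attribute] == payload[attribute]:
                return ci
    return None  # no match: a new CI would be created


# A stale record that still carries the serial of a slot it no longer occupies.
cis = [{"name": "ucmodc2", "serial_number": "7806A88_16"}]

# A never-before-seen server lands on that same frame and partition.
payload = {"name": "newlpar1", "serial_number": "7806A88_16"}

# Name first, serial number still active as a fallback: the stale CI matches.
with_serial_rule = identify(cis, payload, ["name", "serial_number"])

# Name only: nothing matches, so a new CI would be inserted.
name_only = identify(cis, payload, ["name"])

print(with_serial_rule)
print(name_only)
```

With the serial number rule left active, the new server falls through to it and is matched to the old ucmodc2 record; with name as the only rule, no record matches.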

I hope this helps.

5 REPLIES

Ashutosh Munot1
Kilo Patron

Hi,

@Patrick DeCarlo  @doug.schulze 

Thanks,
Ashutosh

Ievgen Galych
Tera Contributor

Hello Thomas,

The AIX Server class has 4 identification rules:

  1. Serial Number, Serial Number Type - Priority 100
  2. Serial Number - Priority 200
  3. Name - Priority 300
  4. IP Address, MAC Address - Priority 400

These rules are matched one by one, in priority order, to avoid duplicates. In your case the serial number is not the best attribute for identification. As an option, you can give the Name attribute a higher priority (50) so that Name is matched first and AIX servers do not adopt the old serial numbers.

For this option to work, you need to manage server names appropriately.
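
A rough sketch of how the evaluation order changes the outcome, assuming rules are tried in ascending priority number and the first rule that finds a CI wins. Only the Serial Number and Name rules are modelled; `match_ci`, the rule tables, and the sample records are hypothetical stand-ins, not ServiceNow APIs.

```python
# Rule tables: priority number -> attribute compared (two of the four rules).
DEFAULT_RULES = {200: "serial_number", 300: "name"}
NAME_FIRST_RULES = {50: "name", 200: "serial_number"}


def match_ci(cis, payload, rules):
    """Try rules in ascending priority; return (priority, ci) or None."""
    for priority in sorted(rules):
        attribute = rules[priority]
        for ci in cis:
            if ci[attribute] == payload[attribute]:
                return priority, ci
    return None


cis = [
    {"name": "ucmodc2", "serial_number": "7806A88_16"},  # stale serial number
    {"name": "twebap1a", "serial_number": "7806A78_16"},
]

# twebap1a is rediscovered after moving to frame 7806A88, partition ID 16.
payload = {"name": "twebap1a", "serial_number": "7806A88_16"}

default_hit = match_ci(cis, payload, DEFAULT_RULES)
name_first_hit = match_ci(cis, payload, NAME_FIRST_RULES)

print(default_hit[1]["name"])     # CI chosen when serial number is tried first
print(name_first_hit[1]["name"])  # CI chosen when name is tried first
```

With the serial number rule evaluated first, the payload is matched to the stale ucmodc2 record; with Name at priority 50 it is matched to twebap1a's own record.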

You can merge the existing duplicates and their relationships using the de-duplication task and the remediation process it suggests. Please let me know if you have any additional questions.

Thanks

 

Thomas Vorster
Tera Guru

Hi Ievgen.

Thank you so much for your response. It is highly appreciated!

The issue here is not really about actual duplicates being created, though, and it cannot be managed through de-duplication. Let's call it "false" duplication for want of a better word.

Will the promotion in priority for the name attribute address the following scenarios?

When an AIX server is first discovered, the name is unique and the serial number is based on the underpinning hardware, concatenated with the partition ID allocated to the LPAR on the frame.

First scenario: Server AIXLAB is installed on an IBM P8 frame with chassis serial 12345 and is allocated partition ID 10 on that frame. Discovery will create the CI record for AIXLAB with serial number 12345_10.

When the server AIXLAB is moved from the P8 frame to a P9 frame with chassis serial 67890, the name remains constant, but the partition ID allocated to AIXLAB on the new frame could change as well. Let's say the new partition ID is 20...

So now Discovery will find AIXLAB on serial 67890 with partition ID 20 and will insert a new orphan server record in the CMDB with serial number 67890_20. It is deemed orphaned in our case because it is not related to a service CI at that point in time. It is not duplicated in the true sense of the word from an IRE perspective.

Next scenario: Now let's say a server called AIXLAB2 is migrated from a P9 to the P8 chassis with serial 12345 and is allocated partition ID 10. AIXLAB2 will then be discovered with serial number 12345_10, and the CI record is updated from AIXLAB to AIXLAB2. The update is valid from an IRE perspective, but the server record is now misrepresented, and the relationships to downstream service CIs that were associated with AIXLAB now point to AIXLAB2, which is not correct.
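
One way to surface records affected by the scenarios above is to group server CIs by name and flag any name that appears on more than one record. The sketch below assumes the records have already been exported as a list of dictionaries. The field names, the helper `names_with_multiple_records`, and the earlier P9 serial 67890_7 for AIXLAB2 are hypothetical.

```python
from collections import defaultdict

# Hypothetical export of the CMDB after both scenarios have played out.
records = [
    {"name": "AIXLAB2", "serial_number": "12345_10"},  # was AIXLAB before the overwrite
    {"name": "AIXLAB", "serial_number": "67890_20"},   # orphan inserted after the move
    {"name": "AIXLAB2", "serial_number": "67890_7"},   # assumed earlier AIXLAB2 record
]


def names_with_multiple_records(records):
    """Map each name that occurs on more than one CI to its serial numbers."""
    serials_by_name = defaultdict(list)
    for record in records:
        serials_by_name[record["name"]].append(record["serial_number"])
    return {
        name: serials
        for name, serials in serials_by_name.items()
        if len(serials) > 1
    }


suspects = names_with_multiple_records(records)
print(suspects)
```

This only flags names with several records. An orphan whose original record has since been overwritten (AIXLAB here) appears once, so it would still need a separate check against its service relationships.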

Thanks and regards

Thomas

Hi Thomas,

Regarding your scenarios, this is the behavior I would expect and the possible reason for what you are seeing:

First scenario: this is NOT expected behavior with OOTB identification rules. Promoting the Name attribute will NOT help.

"So now Discovery will find AIXLAB with serial 67890 allocated to Partition ID 20 and will insert a new orphan server record in the CMDB with serial number 6789_20 CMDB"

The record should not be inserted, since the Name rule (priority 300) should match and the existing record should be updated.

I assume that the identification rules were customized for the Hardware class (the AIX class extends Hardware->Computer->Server->UNIX Server and uses the identification rules defined for the Hardware class).

Next scenario: this is expected behavior with OOTB identification rules. Promoting the Name attribute will help.

So I think you need to check the identification rules for Hardware and any custom rules for child classes. I assume the Name attribute is not used at the moment. You need to create dedicated identification rules for AIX Server with a high-priority Name attribute, so that you do not have to change the rules for the Hardware class and affect other classes.
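
The inheritance described here can be pictured as a walk up the class hierarchy until a class with its own identification rules is found. The sketch below assumes that behaviour and uses the class chain named above. The `PARENT` map, the `effective_rules` helper, and the rule contents are illustrative only.

```python
# Parent of each class, following the chain given in this reply.
PARENT = {
    "AIX Server": "UNIX Server",
    "UNIX Server": "Server",
    "Server": "Computer",
    "Computer": "Hardware",
    "Hardware": None,
}


def effective_rules(cls, rules_by_class):
    """Return (owning class, rules) for the nearest class that defines rules."""
    current = cls
    while current is not None:
        if current in rules_by_class:
            return current, rules_by_class[current]
        current = PARENT[current]
    raise LookupError(f"no identification rules found for {cls}")


# Rules defined only on Hardware: AIX Server inherits them.
inherited = {"Hardware": {200: "serial_number", 300: "name"}}

# A dedicated AIX Server rule set; the Hardware rules stay untouched.
dedicated = dict(inherited)
dedicated["AIX Server"] = {50: "name"}

print(effective_rules("AIX Server", inherited))
print(effective_rules("AIX Server", dedicated))
print(effective_rules("Server", dedicated))
```

In this model, adding rules at the AIX Server level changes matching for that class only; Server and the other classes continue to resolve to the Hardware rules.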

I agree that your situation is not really duplication but "false" duplication, which is even worse. I assume de-duplication tasks are not being created at the moment because IRE doesn't use the Name attribute for this class.

Thanks,

Ievgen