Wednesday, September 6, 2023

ChatGPT actually gave some useful suggestions!


Thought I'd share them here.

Physical impacts can cause a range of issues, from visible damage to subtle problems that only manifest under specific conditions. Here's a systematic approach to surveying the motherboard for damage:


1. **Visual Inspection**:

   - **Magnifying Glass or Loupe**: Use a magnifying glass or a loupe to inspect the motherboard closely. Look for any visible cracks, broken traces, or damaged components.

   - **Check Connectors**: Ensure that all connectors, including the PCIe slots, are intact and not bent or broken.

   - **Inspect Solder Joints**: Look for any solder joints that might have cracked or become loose due to the impact.

   - **Check for Debris**: Ensure there's no debris or foreign objects lodged between components or traces.


2. **Physical Inspection**:

   - **Press Gently**: Gently press down on various parts of the motherboard to see if there's any unusual movement, which could indicate a broken stand-off or mounting point.

   - **Check the Back**: Turn the motherboard over and inspect the back for any signs of damage or unusual marks.


3. **Functional Testing**:

   - **Power On**: If the visual and physical inspections don't reveal any issues, consider powering on the server. Listen for any unusual sounds and observe any irregular LED patterns.

   - **BIOS/UEFI Check**: Access the BIOS/UEFI and check for any hardware errors or warnings. Some systems run a self-test during startup and might report issues (a log-scan sketch for checking from the OS side follows after this list).

   - **PCIe Slot Test**: Since the impact dislodged a card from the PCIe slot, test that specific slot with a known working card to ensure it's still functional. Also, test the previously dislodged card in another slot or system to ensure the card itself isn't damaged.

   - **Stress Testing**: Use software tools to stress-test the system. Tools like Prime95, Memtest86, and FurMark can help identify issues with the CPU, memory, and GPU, respectively (a minimal CPU burn-in sketch follows at the end of this post).


4. **Check Peripheral Components**:

   - **RAM**: Ensure that all RAM modules are seated correctly. Consider running a memory test to ensure they're functioning correctly.

   - **Storage**: Check that all storage devices (HDDs, SSDs) are securely connected and are recognized by the system.

   - **Cooling**: Ensure that all fans are operational and that heatsinks are securely attached.


5. **External Inspection**:

   - **Case and Chassis**: Check the server case and chassis for any signs of deformation or damage. Ensure that the motherboard is securely and correctly mounted within the case.


6. **Seek Expert Assistance**:

   - If you're unsure about any findings or if the server behaves erratically, consider seeking assistance from a professional or a specialized repair service. They might have specialized equipment, like multimeters or oscilloscopes, to diagnose issues more accurately.


7. **Document Everything**:

   - Take photos of any visible damage or irregularities. This can be useful for insurance claims, future reference, or if you decide to seek professional assistance.
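
If the machine boots to an OS, part of step 3 can be automated from the OS side. Here's a minimal sketch (my own addition, not from ChatGPT) that greps the kernel ring buffer for common hardware-error keywords; it assumes a Linux host with util-linux `dmesg` and enough privileges to read the log.

```python
# Scan the kernel log for hardware-error keywords after power-on.
# Assumes Linux with util-linux dmesg; run with sufficient privileges.
import subprocess

KEYWORDS = ("mce", "machine check", "aer", "pcieport", "hardware error")

log = subprocess.run(["dmesg", "--level=err,warn"],
                     capture_output=True, text=True, check=False).stdout
for line in log.splitlines():
    if any(k in line.lower() for k in KEYWORDS):
        print(line)
```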


Remember, even if the server appears to function normally after the impact, there could be latent issues that might manifest later. It's essential to be thorough in your inspection and testing to ensure the server's reliability and longevity.
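
To chase those latent issues, a quick way to load every core while you watch temperatures and logs is a simple burn-in loop. This is only a sketch of the idea, not a replacement for Prime95 or Memtest86, and the five-minute duration is an arbitrary choice.

```python
# Minimal CPU burn-in: one busy-looping process per core.
import multiprocessing
import time

def burn(seconds: float) -> None:
    """Busy-loop on one core for roughly `seconds` seconds."""
    end = time.monotonic() + seconds
    x = 1.0001
    while time.monotonic() < end:
        x = x * x % 1e9  # meaningless maths, just heat

if __name__ == "__main__":
    cores = multiprocessing.cpu_count()
    procs = [multiprocessing.Process(target=burn, args=(300.0,))
             for _ in range(cores)]
    for p in procs:
        p.start()
    print(f"Loading {cores} cores for 5 minutes; watch temperatures.")
    for p in procs:
        p.join()
    print("Done. Check dmesg for MCEs or thermal-throttling messages.")
```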


Tuesday, September 5, 2023

ML350 G9, the continuing saga.



Part I: received the server; the box was pretty 'bashed up'.

The case had taken a hard impact on the power-supply side (the delivery people probably used that end to rest the case on the ground).


Also, the PCIe storage card (for the tape drive and the CD-ROM drive) had 'jumped' out of its PCIe slot. Not good signs. I repaired the power board (the power supplies would not be recognised); in the meantime I had a new power board on the way ($20).

I've since replaced the power board too; no luck so far. The same error keeps popping up. It mentions an EFUSE (20h), but I have no idea where that is. I suspect it might be protecting the PCIe slots (maybe some of the pins have shorted?), but I have no idea where to look. Pulling the full IPMI event log might narrow it down (see the sketch below).
A new motherboard is now on order (~$100; these older parts are getting quite cheap).

According to this post, it could be the PSUs, but they give a 'green light' when plugged in: https://community.hpe.com/t5/proliant-servers-ml-dl-sl/error-power-on-fault-system-board-aux-main-efuse-regulator-1-20h/td-p/7181745

So: Motherboard first, then some 'flex' power supplies. Let's see where this goes.
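
While I wait for parts, one way to pin the fault down further is to pull the full System Event Log over IPMI. A hedged sketch below: ILO_HOST and the credentials are placeholders for my iLO, and it assumes ipmitool is installed on the machine you run it from.

```python
# Pull the BMC's System Event Log, where the EFUSE fault should be
# logged with a sensor/entity reference. Host and credentials are
# placeholders - substitute your own iLO details.
import subprocess

ILO_HOST = "ilo.example.lan"  # placeholder
cmd = ["ipmitool", "-I", "lanplus", "-H", ILO_HOST,
       "-U", "Administrator", "-P", "password", "sel", "elist"]
print(subprocess.run(cmd, capture_output=True, text=True).stdout)
```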

In the meantime, I also have a storj.io node now. I've already 'made' $0.07.

In other news, also expanded my NAS by 8Tbyte, as I am now running overseerr and people can request stuff.

Just to get it all linked back to one place, here is the link to the HPE forums thread on the same problem (no resolution): https://community.hpe.com/t5/proliant-servers-ml-dl-sl/ml350-gen9-not-booting-with-critical-error-aux-main-efuse/m-p/7180208/thread-id/180199
And my own post on Reddit describing my 'pains' with the server board: https://www.reddit.com/r/homelab/comments/168o7ib/help_me_resurrect_my_ml350_g9/




Monday, August 21, 2023

ML350G9, the 'final' server.

Finally, I have found my 'dream' server.

HPE ML350G9, capable of carrying 6 modules of 4x 3.5" or 8x 2.5" drives, so 24x LFF or 48x SFF drives in total. I'm using zfs with a SLOG/ZIL for data security. I have some 5x 8Tbyte QVO Samsung disks.
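
For the curious, here is roughly what that looks like in zpool terms. This is a sketch of one possible layout, not a final decision: the device paths are placeholders, raidz1 is just one way to arrange five disks, and /dev/pmem0 assumes the NVDIMM shows up via the Linux pmem driver.

```python
# One possible layout: five QVO disks in raidz1 plus an NVDIMM SLOG.
# Device paths are placeholders; the zpool commands themselves are real.
import subprocess

disks = [f"/dev/disk/by-id/ata-Samsung_SSD_870_QVO_{i}" for i in range(1, 6)]
subprocess.run(["zpool", "create", "tank", "raidz1", *disks], check=True)
# A separate log device (SLOG) absorbs synchronous writes:
subprocess.run(["zpool", "add", "tank", "log", "/dev/pmem0"], check=True)
```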


It can take 2x E5 v4 processors; in my case this will be 2x 2630L v4 10-core processors at 55W TDP. For now I intend to add:

  1. 128Gbytes of LRDIMM ECC memory (power saving and error correcting)
  2. 8/16 Gbytes of NVDIMM for the zfs SLOG/ZIL

  And in the PCIe slots:

  1. 2-port 10Gbit SFP+ ethernet card, PCIe x8
  2. 2-port 56Gbit QSFP+ Infiniband card, PCIe x8
  3. Fujitsu IT-mode SAS 12G card, PCIe x8
  4. PCIe switch card with 4x 1Tbyte NVMe storage (cache), PCIe x8
  5. PCIe-to-NVMe card for the boot drive (2Tbyte NVMe), hoping I can boot from it, PCIe x4
  6. SAS expander (12G) (no PCIe lanes, just power)
  7. Nvidia Quadro P2000 for transcoding (multiple streams possible; no external power needed, 75W limit), PCIe x16

It will be running Debian 11 (Infiniband drivers are not yet available for Debian 12), along with docker/k3s (not decided yet).
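
A quick lane-budget sanity check, using my own arithmetic on the card list above. Each E5 v4 exposes 40 PCIe 3.0 lanes (how many actually reach physical slots depends on the board's slot wiring):

```python
# Do the planned cards fit the lane budget? Two E5 v4 CPUs provide
# 40 PCIe 3.0 lanes each, 80 in total (slots may expose fewer).
cards = {
    "10GbE SFP+":           8,
    "56G IB QSFP+":         8,
    "Fujitsu SAS 12G":      8,
    "PCIe switch, 4x NVMe": 8,
    "PCIe-to-NVMe boot":    4,
    "SAS expander":         0,  # power only, no data lanes
    "Quadro P2000":        16,
}
print(f"{sum(cards.values())} of 80 lanes used")  # -> 52 of 80
```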





Tuesday, June 20, 2023

The challenge of charging Tesla packs.

The problem:

When connecting LiPo cells in parallel, massive equalisation currents can flow, especially if the voltages differ significantly. https://www.rcgroups.com/forums/showthread.php?2297114-Equalizing-two-LiPo-s
It's been a bit of a 'pain' to charge 16 packs to the same voltage. Once I got to the 16th pack, the 1st one was out of whack again. 

The solution:

Instead of using resistors, I've come up with a different idea, using diodes.

The diodes will stop the cells discharging into each other, but will allow charge to flow in from the charger. There will be a slight voltage drop across the diodes, but I can turn the power supply up to match it. To minimise that drop, I have ordered Schottky diodes, which should have a lower forward voltage.

I'll let you know how it goes. There have been several suggestions that diodes are not very reliable and might fail shorted. If one does, it will most likely burn out, like a fuse. Some back-of-the-envelope numbers follow below.
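
The numbers here are assumptions, not measurements: the pack internal resistance (~25 mOhm each) and the Schottky forward drop (~0.3 V) are guesses, just to show the scale of the surge and of the compensation.

```python
# All numbers are assumptions, not measurements.
delta_v = 0.5        # V difference between two packs when paralleled
r_int = 0.025        # Ohm, guessed internal resistance per pack
surge = delta_v / (2 * r_int)    # both packs' resistance in the loop
print(f"Direct parallel: ~{surge:.0f} A equalisation surge")  # ~10 A

v_drop = 0.3         # V, typical Schottky forward drop (assumed)
v_full = 25.2        # V, a 6S module at 4.2 V/cell (assumed config)
print(f"Supply setpoint: ~{v_full + v_drop:.1f} V to compensate")
```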

Friday, April 14, 2023

The storage 'endgame'.

 I've been 'playing' with my NAS options. So far:

  • The original Xeon E5-2630L v4 NAS. 200W, multiple 10G interfaces (built in 2017/2018)
  • Zimaboard - 2x 1Gbit Network ports, 2 SATA ports, PCIe x4
  • TMM Lenovo Thinkcentre M900 1xSATA, 2xNVMe PCIe x16
  • Topton NAS board, 6xSATA, 2xNVMe, 4x 2.5Gbit ethernet
  • Framework laptop motherboard with an A+E-key 2230-to-M.2-2280 adapter and an M.2 2280 card with 8x SATA ports: https://nl.aliexpress.com/item/1005004417694518.html
All of these have some kind of 'severe' limitation in one way or another, be it form-factor, the lack of a real 'case', or just being plain inefficient. Mostly, on everything except the E5, I'm 'missing' bandwidth (PCIe lanes).

Back to the original NAS, but with a 'twist'.
Storage is going to be SATA SSDs, plus, for performance, some T4510 Intel SSDs attached directly to a PCIe switch. That should hopefully keep power low, as I won't need a real SAS controller for those disks. It should also keep performance really high.

I'll be using a backplane from Supermicro to house all the SATA drives, hopefully. But using a SAS controller with only 4/8 ports should keep power-usage down.


The SAS expander in more detail (photos not reproduced here).
I'm going to be making something similar to this, but for 2.5" drives and specifically for this setup, I hope: https://www.thingiverse.com/thing:5803558

Maybe I can even 'reduce' the backplane to a 10" format, making it suitable for a 10" rack. Or maybe it can be used 'sideways'.