I just finished my first year as a Software Performance Engineer at Intel Corp. in Hillsboro, OR. My main project has been the performance analysis of Microsoft SQL Server Big Data Clusters, on which we recently published a whitepaper. This product is a big step up in the popular Microsoft SQL Server product line, and I have been excited and grateful for the opportunity to contribute.
In this post, I'd like to share my experience as a Software Performance Engineer.
1) Hardware/Software optimization is challenging and time-consuming
There are just way too many knobs to control, in both the software and hardware layers. Based on guidance from our partners, we quickly settled on the problem statement and the software configuration. On the hardware side, we started with the best we had and kept upgrading the components that we observed to be our performance bottlenecks. It is important to keep the customer in mind: what their challenges and pain points would be, their potential technical debt (carried over from the previous product), and so on.
2) Hardware/Software optimization is highly rewarding
I realized this after finishing our whitepaper. A customer reading it gets a head start with the best-known methods and configuration (BKM/BKC) for setting up the software stack, plus the recommended hardware stack for optimal performance. What took us multiple man-months can be achieved by them in a few days or weeks.
3) Automation frees up the mind for analysis and insights
The more time and energy we spent on manual work, the less we had left to analyze the results and collect valuable insights. Beyond experiment runs and data collection, I automated a significant chunk of the data analysis as well, to the point of spitting out the results in the “exact” spreadsheet format that I need, significantly freeing up my mind (and making the work more ergonomic).
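To make this concrete, here is a minimal sketch of the kind of analysis script I mean (not our actual pipeline): it gathers per-run benchmark results from CSV files and writes a single spreadsheet with a raw sheet and a summary sheet. The directory, file names, and metric columns (throughput, latency_ms) are hypothetical, and it assumes pandas and openpyxl are installed.

```python
# A minimal sketch, not our actual pipeline: gather per-run benchmark CSVs
# and emit one spreadsheet laid out the way the analysis expects.
# File names and metric columns below are hypothetical placeholders.
from pathlib import Path

import pandas as pd  # assumes pandas + openpyxl are available

RESULTS_DIR = Path("results")       # hypothetical: one CSV per experiment run
OUTPUT_XLSX = Path("summary.xlsx")  # hypothetical output workbook

# Load every run's raw metrics into one table, tagging each row with its run name.
frames = []
for csv_path in sorted(RESULTS_DIR.glob("*.csv")):
    df = pd.read_csv(csv_path)
    df["run"] = csv_path.stem
    frames.append(df)
runs = pd.concat(frames, ignore_index=True)

# Aggregate into the exact layout used for review: one row per run,
# with mean throughput and p95 latency (hypothetical metric names).
summary = runs.groupby("run").agg(
    throughput_mean=("throughput", "mean"),
    latency_p95=("latency_ms", lambda s: s.quantile(0.95)),
)

# Raw data on one sheet, ready-to-read summary on another.
with pd.ExcelWriter(OUTPUT_XLSX) as writer:
    runs.to_excel(writer, sheet_name="raw", index=False)
    summary.to_excel(writer, sheet_name="summary")
```

The point is less the specific metrics than the habit: once the script produces the final spreadsheet directly, every rerun of an experiment is one command away from a reviewable result.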
4) BKM documents make things systematic
I developed a practice of writing BKM documents as we went along and keeping them in Git (along with the config files). That way it was easy to trace our steps back if something went wrong. It also became super easy to train new team members and get everyone on the same page. And since we iterate through the document every day, the end result is highly polished, so we feel comfortable sharing it with stakeholders. Along with automation, this significantly reduced the unforced errors in our experiments.
5) Developing a solid relationship with software partners takes years of partnership and trust – the quality of the analysis and results clearly shows it.
Some of the challenges I experienced as a software performance engineer:
- Prioritizing learning concepts while having results to show regularly – It takes a while to understand a new software stack or benchmark. But to keep a cadence with our partners and stakeholders, we need to show results regularly, even if they are not perfect or carry known unknowns. Coming from an academic background, I was used to sharing results only when I had everything “figured out.” It took a few months to get used to such work-in-progress presentations, and to presenting current results (in a spreadsheet) rather than polished presentations (in a slideshow).
- Prioritizing projects and tasks – Understanding the priorities of our organization vs. those of our partners/stakeholders, and balancing deliverables against learning.
- Mastering a wide range of skills in a relatively short time
Our role involves a wide range of skills – setting up (computer) systems in our lab, configuring hardware and software, integrating new hardware (say, accelerators), choosing and running workloads, collecting and analyzing data, and sharing the results. This is somewhat broader than the typical software engineering/research roles I have been in. It took a while to get used to, but I love it!