Zhonghao He
Hi! I am Zhonghao, a master’s student at the University of Cambridge.
My research interests started with interpretability and AI alignment. On the one hand it’s about understanding the machines in front of us; on the other hand it’s about effective cooperation between humans and machines.
Ultimately, I want to build AIs for human excellence (or “arete”, in the Greek conception), which requires both sound societal mechanism design and epistemic tools with which individuals can better exercise their agency.
Operationalizing those concepts takes a lot of effort. Currently I am actively exploring the following topics: mechanistic interpretability, computational neuroscience, AI ethics, alignment, political philosophy, virtue ethics, multi-agent systems, AI for science, human-computer interfaces, and collective intelligence.
Research
You may read my published work on Google Scholar.
My current “Hamming Problems” (the most important problems I can work on) are:
- How is knowledge diversity lost through training and using LLMs? (an ongoing research paper aiming for ICLR)
- How do we better understand neural networks with mechanistic interpretability, information theory, and neuroscience? (a survey here)
- What kind of knowledge assistant helps humans think better? (exploring this topic in this doc)
I strive to become a “full stack researcher,” which, in my definition, means combining technical sophistication (experiments, mathematics, and engineering) with deep engagement with problems (both technical and societal). Building technologies for human betterment is hard, so let’s get this one right.
Here is a list of research projects I am interested in working on.
Contacts
I love free-flowing research conversations! You may simply book a quick call via Calendly. (I have blocked off deep work, sleep, and private time, so don’t worry!) You may also drop me an email at zh378@cam.ac.uk.