@mikachip
Playing with Llama 2 this morning - several interesting results.
Overall conclusion: probably the best model available to run locally right now. Haven't fully tested it, but my hypothesis is that a 4-bit quantised version of the 13B model is going to be the sweet spot for local inference for now.
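For anyone who wants to try that setup, here's a minimal sketch of what loading a 4-bit 13B might look like using Hugging Face transformers + bitsandbytes. The gated meta-llama/Llama-2-13b-chat-hf checkpoint, the prompt, and the generation settings are my assumptions, not something from the thread; llama.cpp with a GGML quantisation is an equally common route.

```python
# Sketch: load Llama 2 13B in 4-bit via bitsandbytes and run one generation.
# Assumes you have accepted Meta's licence and have access to the gated repo.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-13b-chat-hf"  # assumed checkpoint name

# 4-bit quantisation config (NF4 with fp16 compute is a common default)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # spread layers across available GPU(s)/CPU
)

prompt = "Explain why 4-bit quantisation helps local inference."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```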
A few interesting results below...