LLMs and coding

Author

Marie-Hélène Burle

For the past few years, large language models have been everywhere, but not everyone has the same level of knowledge and experience with them. So let’s start from the basics. If you have steered clear of the topic, whether due to ethical concerns, personal reasons, or lack of interest, this section aims to catch you up.

What are LLMs?

Large language models (LLMs) are a type of generative deep learning model used in natural language processing (NLP). They can generate text in both human and programming languages.

They do so by repeatedly predicting the next token (a word, part of a word, or code element) in a sequence. They are, at heart, probabilistic: the model assigns a probability to each candidate token and the output is drawn from that distribution. This means that the same question put several times to the same model will usually lead to slightly different answers. This is in contrast to classic computer programs, which are deterministic (the same code run several times under the same conditions leads to the same results).
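Here is a toy sketch of this difference in Python. The candidate words and their probabilities are invented for illustration. A real model computes such probabilities itself, over a vocabulary of many thousands of tokens, and the function names are hypothetical.

```python
import random

# Made-up probabilities for the token that follows "The cat sat on the".
# These numbers are an assumption for the example, not real model output.
next_token_probs = {"mat": 0.6, "sofa": 0.25, "roof": 0.1, "moon": 0.05}


def most_likely_token(probs):
    """Deterministic choice: always return the highest-probability token."""
    return max(probs, key=probs.get)


def sample_token(probs, rng):
    """Probabilistic choice: draw one token according to its probability."""
    tokens = list(probs)
    weights = list(probs.values())
    return rng.choices(tokens, weights=weights, k=1)[0]


rng = random.Random()  # unseeded, so the sampled words differ between runs

# Picking the top token gives the same answer every time.
print("Always the same:", [most_likely_token(next_token_probs) for _ in range(5)])

# Sampling gives answers that vary from one run to the next.
print("Varies by run:  ", [sample_token(next_token_probs, rng) for _ in range(5)])
```

Running the script several times should print the same first line each time, while the second line changes. This is roughly why an LLM can answer the same question differently on different occasions.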

Related generative models, some built on the same underlying architecture, are capable of generating images, videos, and sounds.

LLMs and ethics

There are numerous ethical concerns around the topic of AI:

A lot of the data used to train large language models was obtained via copyright infringement.
Training and using models requires a lot of electricity and the construction of large data centres which produce a lot of heat, so AI comes with a high environmental cost.
Data centres are sometimes built in developing countries, putting a burden on their infrastructure and thus on the local populations.
There are many justified concerns that AI will put a lot of people out of work and disrupt societies.
Training some types of models requires huge amounts of processed data (e.g. labelled data: pictures associated with labels describing what is in them). The labelling is done by underpaid people in developing countries.
AI is so transformative that a new arms race has started between the US and China.
Some AI can be used to automate drones and killing devices.
Some people worry that at some point in the future extremely intelligent AI might put humanity at risk through lack of alignment.
AI is already used for spamming, scams, disinformation, and deepfake abuse. There are concerns that it could be used in the future for terrorism.
Models (when not open-source) are owned by private companies and this raises questions about concentration of power.
LLMs hallucinate, make mistakes, and can be biased.
LLM-based “friends” or “romantic partners” can have all sorts of psychological impacts on people, particularly young ones.

That said, these models are extremely useful for research, and people who refuse to use them risk falling behind in what becomes an unfair competition.

Whatever one thinks about AI and its problems, I personally think that it is important to be as educated as possible about both the problems these models raise and the solutions they offer, and to know how to use them.

LLMs and code

While LLMs are capable of producing human language, they are particularly good at writing code.

For people with no coding background who could greatly benefit from coding in their research, LLMs are particularly useful, which is why I decided to bring them into this course for the first time this year.

In previous years, people attending the introduction to Python course at DHSI encountered several types of problems and frustrations:

It is impossible to learn enough Python in a week to be able to use it for research: reaching a level of competency that can be useful takes months.
Programming is unforgiving in that any slight typo or syntax error in the code will break it. People in previous years would get frustrated by code that was not working due to a series of small mistakes. The nitty-gritty of fixing syntax would take away the joy of writing code.
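To illustrate the second point, here is a minimal sketch in which a single missing colon is enough to make a tiny program invalid. The two snippets and the helper function `check_syntax` are made up for this example. The code is stored as text so that the check can run without crashing the script.

```python
# Two versions of the same tiny program, stored as strings.
correct_code = "for word in ['a', 'b']:\n    print(word)\n"
typo_code = "for word in ['a', 'b']\n    print(word)\n"  # the colon is missing


def check_syntax(source):
    """Return 'ok' if Python can parse the source, else a short error message."""
    try:
        # compile() parses the code without running it.
        compile(source, "<example>", "exec")
        return "ok"
    except SyntaxError as error:
        return f"SyntaxError: {error.msg}"


print(check_syntax(correct_code))
print(check_syntax(typo_code))
```

The first call reports that the code is fine. The second reports a syntax error, whose exact wording depends on the Python version.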

LLMs go a long way towards solving both problems. Once you know a little bit of Python and how to use these models, you can actually produce code that is useful for your research.

Which LLM to choose?

This is a purely personal decision. Moreover, the field is fast evolving.

With most companies, you either have to create an account or will get access to more features if you do. Some services are free, others require a subscription. Some universities have subscriptions with some companies, so you might want to check whether your institution does. Some services (e.g. GitHub Copilot) are free for instructors.

Usually, what you get for free involves some of the following (but it is company dependent and can change over time):

older and less powerful models,
limited number of queries per month, week, or day,
limited functionality (e.g. you might only be able to upload a few documents for the model to process),
possible use of your queries by the company to train its next models.

Subscriptions, by contrast, tend to provide some of the following:

the latest models, which tend to perform better since the field is evolving fast,
unlimited usage,
full functionality (e.g. being able to upload unlimited numbers of documents that the model can use to better help you),
no use of your queries for training.

Different models have different styles and some perform better than others in some contexts, but the competition is tight. Some models have open-source weights (e.g. LLaMA and DeepSeek) while most are, for the moment, closed-source. Open-source weights mean that anybody can download the trained weights and run, fine-tune, or build on the model themselves.

As of 2025, popular options include various versions of the following models (in random order and non-exhaustively):

Companies such as Perplexity AI also provide search engines that can use a multitude of the above models under the hood to answer your queries.

How to use an LLM for coding?

LLMs can be used for a variety of tasks. In particular, they can:

auto-complete code,
explain programming concepts,
explain code,
write code,
debug code,
write software tests,
write entire programs, websites, and applications (vibe coding).

In this course, we will use LLMs to write and understand Python code.
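To make two of the tasks above concrete (writing code and writing tests), here is a hypothetical example of what one might obtain by asking a model for "a Python function that counts how often each word appears in a text, with a test". The code was written by hand for illustration and is not the output of any particular model. The function name and the sample sentences are assumptions.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count how often each word appears in a text, ignoring case."""
    # Lowercase the text, then keep runs of letters and apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


def test_word_frequencies():
    """A small test checking the function on a sentence with known counts."""
    counts = word_frequencies("The cat saw the other cat.")
    assert counts["the"] == 2
    assert counts["cat"] == 2
    assert counts["saw"] == 1
    assert counts["dog"] == 0  # a Counter returns 0 for words it never saw


test_word_frequencies()
print(word_frequencies("To be, or not to be").most_common(2))
```

Even when a model writes the code, running a small test like this one is a simple way to check that the result does what you asked for.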