Lesson 7 has three parts A, B, C which can be completed in any order.
So far, we have been using strings (items of str type) only in simple ways. In this lesson we show how to manipulate strings: how to take them apart, how to combine them, and how to view the individual characters that make up a string.
What is a string?
All data stored on a computer is ultimately stored as a sequence of 0s and 1s. This includes text, digital books, images, songs, videos, and "executable files" like games and applications. Strings, an example of text data, are stored in the following way:
- a string is a sequence of characters (e.g., the string "Hello, World!" contains 13 characters, including letters like "H" and "e", punctuation like "," and "!", and a space)
- each character is actually represented by a number (e.g., "H" is represented by the number 72; this is its ASCII/Unicode value)
(Numbers are stored internally in a 0-1 binary format.)
Manipulating strings as sequences of characters: S[]
In order to manipulate a string, we need to be able to access the individual characters that make it up. In Python this is done in the following way: for a string S and an integer index, the notation S[index] returns the character of S at position index. By convention the string starts at index 0: so S[0] is the first character, S[1] is the second character, etc. In "Hello, World!" the list of characters is:
Index:  0  1  2  3  4  5  6  7  8  9 10 11 12
Char.:  H  e  l  l  o  ,     W  o  r  l  d  !
Note that the character at index 6 is a space.
In many other programming languages, there is a separate type for characters. In Python, characters are the same as length-1 strings, so their type is str.
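As a minimal sketch of the indexing notation described above (the variable names are only illustrative):

```python
# Index into a string with square brackets; positions start at 0.
S = "Hello, World!"

first = S[0]     # the first character
seventh = S[6]   # the character at index 6, which is a space
last = S[12]     # the character at the final index

print(first)
print(repr(seventh))  # repr() adds quotes so the space is visible
print(last)
print(type(first))    # a single character is still of type str
```

Asking for an index that does not exist, such as S[13] here, raises an IndexError.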
Finding the number of characters in a string: len
To get the number of characters in a string, we use the Python function len. For example, len("Hello, World!") is 13.
For any string S, len(S) gives you the total number of characters in the string. Since indexing starts at 0, the last character is at index len(S)-1.
Here is an example of using len and [], the two tools we just introduced.
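One possible sketch of len and [] working together, using the same example string as earlier (the names n and last_char are made up for this illustration):

```python
S = "Hello, World!"

n = len(S)            # total number of characters in S
last_char = S[n - 1]  # indices run from 0 to len(S)-1

print(n)
print(last_char)
```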
Cutting strings: S[:]
Cutting out some part of a string gives you a substring. For example, the strings "eat" and "ted" are substrings of "repeated". To extract a substring in Python, we use the syntax S[firstIndex:tailIndex] to get the substring starting at index firstIndex and ending at index tailIndex-1. Try to figure out the output of the following code before you run it.
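A short sketch of the substring syntax, offered as one possible example rather than the lesson's original program (the variable names are illustrative, and the outputs are deliberately left for you to predict):

```python
S = "repeated"

first_piece = S[3:6]   # characters at indices 3, 4 and 5
second_piece = S[5:8]  # characters at indices 5, 6 and 7
middle = S[2:7]        # characters at indices 2 through 6

print(first_piece)
print(second_piece)
print(middle)
print(len(middle))
```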
Note that in taking substrings, firstIndex is included, while tailIndex is not included. This is a common source of errors. However, it has some nice effects. For example, because of this choice, the length of the substring S[i:j] is always j-i (as long as i and j are valid indices with i no larger than j). This convention is often depicted like a ruler, with the index positions marking the boundaries between characters.
Pasting strings: +
We all know that 1+2=3. With strings, instead we get the following result:
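A minimal sketch of that comparison, with S and T as illustrative variable names:

```python
S = "1"
T = "2"

glued = S + T   # strings are joined end to end, not added as numbers
total = 1 + 2   # integers are added as usual

print(glued)
print(total)

greeting = "Hello, " + "World!"
print(greeting)
```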
As you can see, the effect of S+T is to create a new string that starts with S and has T immediately afterwards. This string-gluing operation is also called concatenation.
If you want to concatenate numbers, you need to convert them to str first. Otherwise you will get one of two errors, depending on the order you tried. Run this program to see the errors that can occur.
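The sketch below is one way to see both errors in a single run. It uses try/except, which is only a device here to keep the program going after the first error, and the list named errors is hypothetical. The exact wording of each message depends on your Python version.

```python
errors = []  # collects the error messages so we can compare them

# A string followed by a number.
try:
    result = "Lesson " + 7
except TypeError as err:
    errors.append(str(err))
    print("str + int failed:", err)

# A number followed by a string.
try:
    result = 7 + " lessons"
except TypeError as err:
    errors.append(str(err))
    print("int + str failed:", err)
```

Both attempts raise a TypeError, but the message differs depending on which operand comes first.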
Here is a correct example: the str() function converts the number to a string before concatenation.
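A minimal sketch of the conversion (the variable names are illustrative):

```python
lessons = 7

# str(lessons) turns the integer 7 into the string "7",
# so every operand of + is now a string.
message = "Lesson " + str(lessons) + " is about strings"
print(message)
```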
As we mentioned in Lesson 4, you can multiply strings and integers: S * n is short for S + S + ... + S (n copies of S).
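A quick sketch of string multiplication, checking that it agrees with repeated concatenation (the names tripled and same are illustrative):

```python
S = "ab"

tripled = S * 3   # three copies of S glued together
same = S + S + S  # the long way of writing the same thing

print(tripled)
print(tripled == same)
```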
Character codes: ord, chr
As we mentioned in the introduction of this lesson, your computer actually represents every character as a number. Which number corresponds to which character? Generally, it can depend on which encoding your computer uses, but nearly all modern computers have a standard set of characters for the numbers between 32 and 255. Here is a list of the characters with numbers between 32 and 127:
ord:  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47
chr:       !   "   #   $   %   &   '   (   )   *   +   ,   -   .   /
ord:  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63
chr:   0   1   2   3   4   5   6   7   8   9   :   ;   <   =   >   ?
ord:  64  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79
chr:   @   A   B   C   D   E   F   G   H   I   J   K   L   M   N   O
ord:  80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95
chr:   P   Q   R   S   T   U   V   W   X   Y   Z   [   \   ]   ^   _
ord:  96  97  98  99 100 101 102 103 104 105 106 107 108 109 110 111
chr:   `   a   b   c   d   e   f   g   h   i   j   k   l   m   n   o
ord: 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127
chr:   p   q   r   s   t   u   v   w   x   y   z   {   |   }   ~
Later, in lesson 8, you will write a program to generate this table.
It is not so useful to personally memorize the entire table, but there are some useful facts to remember:
- the lowercase characters a, b, c, ..., z have consecutive character codes
- the uppercase characters A, B, C, ..., Z have consecutive character codes
- the digit characters 0, 1, 2, ..., 9 have consecutive character codes
Character 32 is a space, while character 127 is one of several special "control" characters. Some useful control characters are 9, which is tab, and 10 and 13 which are used for newlines.
In Python, you can convert a character into its corresponding numerical code using the ord function. The chr function does the reverse: it takes a number as input, and returns the character with that code.
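As a small sketch of ord and chr, including the "consecutive codes" facts listed above (all variable names are illustrative):

```python
code = ord("H")   # character -> number
letter = chr(72)  # number -> character
print(code, letter)

# Letters have consecutive codes, so adding 1 to the code of "a"
# gives the code of the next letter.
next_letter = chr(ord("a") + 1)
print(next_letter)

# Digit characters are consecutive too, so subtracting ord("0")
# recovers the numeric value of a digit character.
value = ord("7") - ord("0")
print(value)
```

For any single character c, chr(ord(c)) gives back c.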
Some systems only support printable characters between 32 and 127; others have printable characters up to 255 or 65535; Unicode defines well over a hundred thousand characters.