SIPPING WG                                                      D. Bryan
Internet-Draft                               College of William and Mary
Expires: April 7, 2004                                       C. Jennings
                                                           Cisco Systems
                                                         October 8, 2003

                   Telephony tones using MIDI in SIP
                      draft-bryan-sipping-midi-00

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on April 7, 2004.

Copyright Notice

Copyright (C) The Internet Society (2003). All Rights Reserved.

Abstract

This document describes conventions for using MIDI to generate tones on SIP UAs. It does not define any changes to SIP. Instead, it describes how to use the existing Alert-Info header to play tones that indicate specialized PSTN tones, such as the ringback tones of various countries. The tones are described using MIDI, which gives a compact representation that is simple for a UA to generate and easy to render into audio.

Bryan & Jennings             Expires April 7, 2004                [Page 1]

Internet-Draft              MIDI tones in SIP                October 2003

Table of Contents

   1.  Conventions
   2.  Introduction
   3.  Discussion
   4.  MIDI Background
   5.  MIDI to Represent Tones
   6.  Examples
       6.1 Signaling to Play Tones
       6.2 US and ITU Tones
   7.  UA behavior
   8.  Required MIDI support
   9.  Constructing Telephone Tones
   10. Security Considerations
   11. Open Issues
   12. Acknowledgements
   13. Appendix A: Country Reference
       Normative References
       Informative References
       Authors' Addresses
       Intellectual Property and Copyright Statements

1. Conventions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC-2119 [1].

2. Introduction

Countries, and regions within countries, use different tones to convey information about the state and progress of telephone calls. Different tones alert the user that the phone is ready to make a call, that the remote party is being rung, that the remote party is busy, and so on. When a call is placed to a remote party through the PSTN, the caller usually hears the tones of the destination country.

Most work at the IETF today is intended to ensure that the user experience is localized to the form users understand and prefer to use.
This work does the opposite. It provides a way to replicate the PSTN experience of phoning a foreign country and hearing tones you may never have heard before and have no way to understand.

SIP telephones have traditionally generated these tones locally, so the tones heard are the same no matter where the called party resides. Many people feel that this localized and familiar behavior is correct. However, there have been discussions of situations where it is desirable to keep the behavior of the PSTN and reproduce the local tones of the destination. In addition, there has been no standard way to encode the tones used in a particular country. As the reach of SIP endpoints increases, it has become increasingly important to localize these devices so that they behave correctly for the region in which they are installed. This document presents a method to represent these tones using an existing standard, and a method for conveying this information using SIP.

3. Discussion

The basic mechanism is for the UAS to include a MIDI tone in a SIP response, such as a 180, to indicate the tone it wants to be played. The MIDI tone description is carried in a body of the SIP message, and the Alert-Info header contains a cid URL that points at that body. This would typically be used in a 180 to indicate the ringback tone to play. It could also be sent in a response such as a 486 to indicate what sort of busy tone to play, or in an UPDATE request.

4. MIDI Background

MIDI (Musical Instrument Digital Interface) is a protocol defined by the MIDI Manufacturers Association. It is used by computerized musical instruments such as synthesizers and keyboards, and to represent musical instrument scores on computer systems. MIDI is a very complete and expressive protocol, capable of representing virtually everything required for a complete performance on one or more instruments. MIDI has also been used to convey other signaling information related to musical performances, such as controlling lighting and changing settings on sound effect and public address equipment. MIDI has been used extensively for many years (since 1983) and is a stable and well-established protocol. In telephony, MIDI is currently used to represent ringtones for mobile phones. Its expressiveness is far richer than needed here, but the protocol can also represent the various tones that phones generate in each country.

MIDI is an event-based system. Events are presented in linear order and interpreted by the playback device in that order. For example, one MIDI event might indicate that a note should begin to play, and the following event could indicate that the note should be released and stop playing. Each event is prefixed by a delta time, which indicates how much time should elapse after the previous event before this event occurs. If the note mentioned above is to sound for 1 second, the release event would be prefixed by a delta time of 1 second, so that the note plays for one second before being released.

MIDI, as a language for musicians, is oriented towards notes and beats rather than the frequencies and times more common in the telecommunications and networking worlds. MIDI does, however, support using absolute times in place of beats, as discussed in the Open Issues section of this draft.
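The delta-time model described above can be illustrated with a short Python sketch. It is not taken from any MIDI library, and the event tuples and the helper name `absolute_times` are hypothetical. The sketch assumes the delta times have already been converted to milliseconds. It shows how a player recovers when each event occurs by summing the deltas in file order, using the one-second note as input.

```python
from itertools import accumulate

# Hypothetical in-memory form of a track: (delta_ms, description) pairs in
# file order. Each delta is measured from the previous event, as in MIDI.
events = [
    (0, "note on"),      # the note starts immediately
    (1000, "note off"),  # and is released one second later
]

def absolute_times(events):
    """Return (absolute_ms, description) pairs by summing the deltas."""
    starts = accumulate(delta for delta, _ in events)
    return [(t, desc) for t, (_, desc) in zip(starts, events)]

print(absolute_times(events))
```

A real player would also carry the event bytes themselves. Only the timing arithmetic is shown here.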
Delta times, discussed above, are typically expressed as counts of clocks, with a fixed number of clocks making up each beat. Notes are represented as integers that identify a particular note in the chromatic (western) scale rather than a given frequency.

In addition to defining the events that can be generated, MIDI defines a file format for storing songs as sequences of such events. The MIDI protocol is often used independently of this file format, for example to let one musical instrument control another on the fly. This document uses the file format because it needs to represent a predefined sequence of musical events.

The General MIDI standard was introduced in 1991 to standardize some features that were not fully specified in the original MIDI specification. Among other things, General MIDI defines a numbering scheme for patches, which are the types of musical instrument sound to be used (Grand Piano is sound 1, Sawtooth Wave is sound 82, and so on). Unfortunately for telecommunications, no sine wave sound is specified. General MIDI also suggests default values for most MIDI parameters. This document takes advantage of these suggested defaults, as they are widely used in the industry.

5. MIDI to Represent Tones

MIDI offers far more expressiveness than simple tone generation requires, but several MIDI events are still needed to properly reproduce the tones generated by a typical endpoint. MIDI represents start and stop events for particular musical notes. The tones used in telephony signaling, however, are often not musical notes at all but tones of arbitrary frequency. To represent such an arbitrary pitch, the event that starts the tone must be preceded by a pitch bend event. That event shifts the note away from its chromatic musical pitch to the desired frequency before the note is "struck".
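Choosing the note to bend from is the first step. The following Python sketch assumes the usual equal-tempered MIDI tuning, in which note 69 is A at 440Hz and each note number is one half step. The helper names are hypothetical. The sketch picks the note whose nominal frequency is closest in Hz to the target. For 480Hz and 620Hz this gives notes 70 and 75, the notes used in the Constructing Telephone Tones section.

```python
import math

def note_frequency(note: int) -> float:
    """Equal-tempered frequency of a MIDI note number (note 69 = A = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)

def nearest_note(freq_hz: float) -> int:
    """MIDI note whose nominal frequency is closest, in Hz, to freq_hz."""
    # Note at or just below the target; the small epsilon keeps an exact
    # note frequency (e.g. 440 Hz) from flooring to the note beneath it.
    below = math.floor(69 + 12.0 * math.log2(freq_hz / 440.0) + 1e-9)
    above = below + 1
    if freq_hz - note_frequency(below) <= note_frequency(above) - freq_hz:
        return below
    return above

for target in (350, 425, 440, 480, 620):
    n = nearest_note(target)
    print(f"{target} Hz: note {n} (0x{n:02X}) at {note_frequency(n):.3f} Hz")
```

The remaining difference between the chosen note and the target frequency is what the pitch bend has to make up.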
Pitch bends are commonly used in MIDI, both for expressive bends and to represent notes of non-western music whose frequencies do not correspond to the chromatic musical notes. MIDI pitch bends (also called pitch wheel events) express how far the note is shifted as a 14 bit integer (0-16383). The midpoint (8192) represents no pitch bend, and the two extremes represent the maximum and minimum possible bend for a particular MIDI device. The protocol does not enforce the pitch range these values correspond to. The General MIDI specification recommends a range of two half steps up and two half steps down (four half steps in total), and most devices use it. A half step is the change from one MIDI note to the next, so any arbitrary pitch can be reached by bending up or down by less than one half step from some note.

In a MIDI file, the 14 bit value is carried in two data bytes, each of which uses only its low seven bits. The first byte carries bits 0 to 6 of the 14 bit value, and the second byte carries bits 7 to 13. The midpoint is therefore written in hex as 0x0040 (the byte 0x00 followed by the byte 0x40). Bends of one half step up and one half step down are written as 0x0060 and 0x0020 respectively. A worked example of calculating the pitch bend events needed to generate arbitrary tones appears in the Constructing Telephone Tones section.

MIDI can represent time either in beats or in absolute time. Absolute time might seem the better choice for a technology-driven application, but the beat-based system is more common in MIDI files. This document describes the beat-based mechanism here and leaves discussion of the merits of each system to the Open Issues section.

The header for MIDI files includes data that specifies the time to be used.
Two bytes in this header give, in big-endian order, the number of clocks in a quarter note of music. A time signature (such as 4/4 or 3/4) and a tempo in beats per minute (bpm) can be specified later. General MIDI specifies that, unless modified, a time signature of 4/4 and a tempo of 120 bpm are assumed. This means that 120 quarter notes occur in a minute, so a quarter note lasts half a second. If the header assigns 500 clocks to one quarter note, each clock used in the delta times in the file represents 1ms, a convenient time unit for recreating telephony tones.

With the time specified in the header, delta times between events can be expressed in milliseconds. It is still necessary to understand how MIDI encodes delta times in the file. Delta times are represented as Variable Length Quantities. This mechanism allows numbers of arbitrary size to be represented, with the representation in the file growing longer as the value grows. The low-order seven bits of each byte carry data, most significant group first. The high-order bit is 1 if another byte follows and 0 in the final byte. Numbers from 0 to 127 (2^7 - 1) can thus be represented using one byte. A second byte allows numbers up to 2^14 - 1, a third byte allows numbers up to 2^21 - 1, and so on. A worked example of calculating delta times as Variable Length Quantities appears in the Constructing Telephone Tones section.

The tones to be generated are also pure sinusoids, whereas most music is generated using samples of actual musical instruments, which have more complex waveforms. General MIDI offers predefined patches for some of the most common musical instruments, as well as sawtooth and square waves, but it does not offer a predefined patch for a sine wave. This is discussed further in the Open Issues section.
Our recommendation is that sound 125 (0x7D), mapped to "Telephone Ring" in General MIDI, be interpreted as a sine wave.

While developing tone files on a conventional synthesizer, one may wish to use sound 82 (0x52), mapped to "Sawtooth Wave" in General MIDI. This sound most closely resembles a sine wave and produces tones that sound very close when played back with a General MIDI sequencer.

MIDI files consist of a header and one or more tracks containing MIDI events. The header provides basic information about the file, including the time information discussed above. The examples in this document use only one track, and we recommend that users creating files do the same.

MIDI files have no standard mechanism to specify that a file should be played repeatedly. Most tones representing phone sounds are either infinitely repeating patterns or are played for a certain amount of time before switching to a different tone (for example, from dialtone to the off-hook tone). Whether to repeat a file indefinitely, for a certain time, and so on is conveyed in the accompanying signaling information, as specified in the Signaling to Play Tones section of this draft. If possible, users should create tone files that are as short as possible and can be looped easily.

6. Examples

6.1 Signaling to Play Tones

The following example shows the signaling for a typical use of a ringback tone. Alice calls a PSTN gateway, labeled Bob in the flow below. The gateway sends back a 180 Ringing response that carries a MIDI description of the ringback tone.
      Alice                      Bob
        |(1) INVITE               |
        |------------------------>|
        |                         |
        |(2) 180                  |
        |<------------------------|

Message 1 from Alice to Bob is:

   INVITE sip:bob@b.example.com SIP/2.0
   Via: SIP/2.0/UDP a.example.com;branch=z9hG4bKnashds8
   To: Bob
   From: Alice ;tag=1928301774
   Call-ID: a84b4c76e66710
   CSeq: 314159 INVITE
   Max-Forwards: 70
   Date: Thu, 21 Feb 2002 13:02:03 GMT
   Contact:
   Content-Type: application/sdp
   Content-Length: 147

   v=0
   o=UserA 2890844526 2890844526 IN IP4 example.com
   s=Session SDP
   c=IN IP4 a.example.com
   t=0 0
   m=audio 49172 RTP/AVP 0
   a=rtpmap:0 PCMU/8000

If message 2 does not contain any bodies other than the MIDI body, then message 2 from Bob to Alice is:

   SIP/2.0 180 Ringing
   Alert-Info:
   Via: SIP/2.0/UDP a.example.com;branch=z9hG4bKnashds8
   To: Bob ;tag=a6c85cf
   From: Alice ;tag=1928301774
   Call-ID: a84b4c76e66710
   Contact:
   CSeq: 314159 INVITE
   Content-Type: audio/midi
   Content-Disposition: render;handling=optional
   Content-Length: XXX
   Content-ID:

   ***********************
   * BINARY BLOB of MIDI *
   ***********************

If message 2 contains both the MIDI body and SDP, then message 2 from Bob to Alice uses multipart MIME and looks like:

   SIP/2.0 180 Ringing
   Alert-Info:
   Via: SIP/2.0/UDP a.example.com;branch=z9hG4bKnashds8
   To: Bob ;tag=a6c85cf
   From: Alice ;tag=1928301774
   Call-ID: a84b4c76e66710
   Contact:
   CSeq: 314159 INVITE
   Content-Type: multipart/mixed;boundary=bound42
   Content-Length: xxx

   --bound42
   Content-Type: audio/midi
   Content-Disposition: render;handling=optional
   Content-Length: yyy
   Content-Transfer-Encoding: binary
   Content-ID:

   ***********************
   * BINARY BLOB of MIDI *
   ***********************
   --bound42
   Content-Type: application/sdp
   Content-Disposition: session;handling=required
   Content-Length: zzz

   v=0
   o=UserB 2890844526 2890844526 IN IP4 example.com
   s=Session SDP
   c=IN IP4 b.example.com
   t=0 0
   m=audio 47172 RTP/AVP 0
   a=rtpmap:0 PCMU/8000
   --bound42--

6.2 US and ITU Tones

For detailed examples of how to construct a tone file, and exactly what each line in a MIDI file represents, see the section "Constructing Telephone Tones" later in this document. The following MIDI files represent the standard US and ITU tones.

U.S. Dialtone Signal - continuous 350 and 440Hz tones. (5 second duration used for example)

   +------------------+------------------------------------------------+
   | Bytes (hex)      | Meaning                                        |
   +------------------+------------------------------------------------+
   | 4D 54 68 64      | The text MThd is the magic word indicating     |
   |                  | this is a MIDI header.                         |
   | 00 00 00 06      | The length of the header data to follow is six |
   |                  | bytes.                                         |
   | 00 01            | MIDI format 1.                                 |
   | 00 01            | The file contains 1 track.                     |
   | 01 F4            | 500 (0x01F4) clocks for a quarter note.        |
   | 4D 54 72 6B      | The text MTrk is the magic word indicating     |
   |                  | this is a MIDI track.                          |
   | 00 00 00 1F      | The length of the track data (31 bytes).       |
   | 00 C0 7D         | Change channel 0 (C0) patch to "Telephone      |
   |                  | Ringing" at delta time zero.                   |
   | 00 C1 7D         | Change channel 1 (C1) patch to "Telephone      |
   |                  | Ringing" at delta time zero.                   |
   | 00 E1 18 41      | Set the pitch bend for channel 1 at delta time |
   |                  | zero, raising note 0x41 (349.23Hz) to 350Hz.   |
   | 00 90 45 40      | Sound the first tone on channel 0.             |
   | 00 91 41 40      | Sound the second tone on channel 1.            |
   | A7 08 80 45 7F   | After a delta time of 5.0 seconds, silence the |
   |                  | first tone.                                    |
   | 00 81 41 7F      | Silence the second tone.                       |
   | 00 FF 2F 00      | End of Track meta event, at delta time zero.   |
   +------------------+------------------------------------------------+

U.S. Ringing Signal - 440 and 480Hz tones, on for 2.0 seconds, off for 4.0 seconds.

   +------------------+------------------------------------------------+
   | Bytes (hex)      | Meaning                                        |
   +------------------+------------------------------------------------+
   | 4D 54 68 64      | The text MThd is the magic word indicating     |
   |                  | this is a MIDI header.                         |
   | 00 00 00 06      | The length of the header data to follow is six |
   |                  | bytes.                                         |
   | 00 01            | MIDI format 1.                                 |
   | 00 01            | The file contains 1 track.                     |
   | 01 F4            | 500 (0x01F4) clocks for a quarter note.        |
   | 4D 54 72 6B      | The text MTrk is the magic word indicating     |
   |                  | this is a MIDI track.                          |
   | 00 00 00 28      | The length of the track data (40 bytes).       |
   | 00 C0 7D         | Change channel 0 (C0) patch to "Telephone      |
   |                  | Ringing" at delta time zero.                   |
   | 00 C1 7D         | Change channel 1 (C1) patch to "Telephone      |
   |                  | Ringing" at delta time zero.                   |
   | 00 E0 7C 4F      | Set the pitch bend for channel 0 at delta time |
   |                  | zero, raising note 0x46 (466.16Hz) to 480Hz.   |
   | 00 90 46 40      | Sound the first tone on channel 0.             |
   | 00 91 45 40      | Sound the second tone on channel 1.            |
   | 8F 50 80 46 7F   | After a delta time of 2.0 seconds, silence the |
   |                  | first tone.                                    |
   | 00 81 45 7F      | Silence the second tone.                       |
   | 9F 20 80 46 7F   | After 4.0 seconds of silence, silence the      |
   |                  | first tone again.                              |
   | 00 81 45 7F      | Silence the second tone again.                 |
   | 00 FF 2F 00      | End of Track meta event, at delta time zero.   |
   +------------------+------------------------------------------------+

U.S. Busy Signal - 480 and 620Hz tones, on for 0.5 seconds, off for 0.5 seconds.

   +------------------+------------------------------------------------+
   | Bytes (hex)      | Meaning                                        |
   +------------------+------------------------------------------------+
   | 4D 54 68 64      | The text MThd is the magic word indicating     |
   |                  | this is a MIDI header.                         |
   | 00 00 00 06      | The length of the header data to follow is six |
   |                  | bytes.                                         |
   | 00 01            | MIDI format 1.                                 |
   | 00 01            | The file contains 1 track.                     |
   | 01 F4            | 500 (0x01F4) clocks for a quarter note.        |
   | 4D 54 72 6B      | The text MTrk is the magic word indicating     |
   |                  | this is a MIDI track.                          |
   | 00 00 00 2C      | The length of the track data (44 bytes).       |
   | 00 C0 7D         | Change channel 0 (C0) patch to "Telephone      |
   |                  | Ringing" at delta time zero.                   |
   | 00 C1 7D         | Change channel 1 (C1) patch to "Telephone      |
   |                  | Ringing" at delta time zero.                   |
   | 00 E0 7C 4F      | Set the pitch bend for channel 0 at delta time |
   |                  | zero, raising note 0x46 (466.16Hz) to 480Hz.   |
   | 00 E1 78 3D      | Set the pitch bend for channel 1 at delta time |
   |                  | zero, lowering note 0x4B (622.25Hz) to 620Hz.  |
   | 00 90 46 40      | Sound the first tone on channel 0.             |
   | 00 91 4B 40      | Sound the second tone on channel 1.            |
   | 83 74 80 46 7F   | After a delta time of 0.5 seconds, silence the |
   |                  | first tone.                                    |
   | 00 81 4B 7F      | Silence the second tone.                       |
   | 83 74 80 46 7F   | After another 0.5 seconds of silence, silence  |
   |                  | the first tone again.                          |
   | 00 81 4B 7F      | Silence the second tone again.                 |
   | 00 FF 2F 00      | End of Track meta event, at delta time zero.   |
   +------------------+------------------------------------------------+

ITU Dialtone Signal - continuous 425Hz tone. (5 second duration used for example)

   +------------------+------------------------------------------------+
   | Bytes (hex)      | Meaning                                        |
   +------------------+------------------------------------------------+
   | 4D 54 68 64      | The text MThd is the magic word indicating     |
   |                  | this is a MIDI header.                         |
   | 00 00 00 06      | The length of the header data to follow is six |
   |                  | bytes.                                         |
   | 00 01            | MIDI format 1.                                 |
   | 00 01            | The file contains 1 track.                     |
   | 01 F4            | 500 (0x01F4) clocks for a quarter note.        |
   | 4D 54 72 6B      | The text MTrk is the magic word indicating     |
   |                  | this is a MIDI track.                          |
   | 00 00 00 14      | The length of the track data (20 bytes).       |
   | 00 C0 7D         | Change channel 0 (C0) patch to "Telephone      |
   |                  | Ringing" at delta time zero.                   |
   | 00 E0 48 4C      | Set the pitch bend for channel 0 at delta time |
   |                  | zero, raising note 0x44 (415.30Hz) to 425Hz.   |
   | 00 90 44 40      | Sound the tone on channel 0.                   |
   | A7 08 80 44 7F   | After a delta time of 5.0 seconds, silence the |
   |                  | tone.                                          |
   | 00 FF 2F 00      | End of Track meta event, at delta time zero.   |
   +------------------+------------------------------------------------+

ITU Ringing Signal - 425Hz tone, on for 0.67-1.5 seconds, off for 3-5 seconds (varies by location; 1.5 and 5 used in example)

   +------------------+------------------------------------------------+
   | Bytes (hex)      | Meaning                                        |
   +------------------+------------------------------------------------+
   | 4D 54 68 64      | The text MThd is the magic word indicating     |
   |                  | this is a MIDI header.                         |
   | 00 00 00 06      | The length of the header data to follow is six |
   |                  | bytes.                                         |
   | 00 01            | MIDI format 1.                                 |
   | 00 01            | The file contains 1 track.                     |
   | 01 F4            | 500 (0x01F4) clocks for a quarter note.        |
   | 4D 54 72 6B      | The text MTrk is the magic word indicating     |
   |                  | this is a MIDI track.                          |
   | 00 00 00 19      | The length of the track data (25 bytes).       |
   | 00 C0 7D         | Change channel 0 (C0) patch to "Telephone      |
   |                  | Ringing" at delta time zero.                   |
   | 00 E0 48 4C      | Set the pitch bend for channel 0 at delta time |
   |                  | zero, raising note 0x44 (415.30Hz) to 425Hz.   |
   | 00 90 44 40      | Sound the tone on channel 0.                   |
   | 8B 5C 80 44 7F   | After a delta time of 1.5 seconds, silence the |
   |                  | tone.                                          |
   | A7 08 80 44 7F   | After 5.0 seconds of silence, silence the tone |
   |                  | again.                                         |
   | 00 FF 2F 00      | End of Track meta event, at delta time zero.   |
   +------------------+------------------------------------------------+

7. UA behavior

A SIP UAC that can play MIDI tones SHOULD indicate in the Accept header that it supports the MIDI media type (audio/midi).

8. Required MIDI support

As specified by MIDI, endpoints MUST assume 4/4 time and 120 bpm if no other time is specified. Users generating tone files MUST NOT specify an alternate time signature or tempo. Devices interpreting these files MAY interpret time signature and tempo change events, but can reasonably expect that these events will not appear in files sent to them.
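A receiving device's timing arithmetic under these rules can be sketched in Python as follows. This is an illustration only. The function names are hypothetical, only the clocks-per-quarter-note form of the header's time division is handled, and the tempo is fixed at the 120 bpm default required above. The sketch reads the header used in the examples, decodes a Variable Length Quantity delta time, and converts it to milliseconds.

```python
import struct

def parse_header(data: bytes):
    """Parse an MThd chunk; return (format, number_of_tracks, clocks_per_quarter)."""
    if data[:4] != b"MThd":
        raise ValueError("missing MThd magic word")
    length, fmt, ntracks, division = struct.unpack(">IHHH", data[4:14])
    if length != 6:
        raise ValueError("unexpected header length")
    if division & 0x8000:
        # High bit set selects SMPTE-based timing, which this sketch ignores.
        raise ValueError("only clocks-per-quarter-note timing is handled")
    return fmt, ntracks, division

def decode_vlq(data: bytes, pos: int = 0):
    """Decode a Variable Length Quantity at data[pos]; return (value, next_pos)."""
    value = 0
    while True:
        byte = data[pos]              # IndexError if the quantity is truncated
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:           # high bit clear marks the final byte
            return value, pos

def ms_per_clock(clocks_per_quarter: int, bpm: float = 120.0) -> float:
    """Duration of one clock in milliseconds at the given tempo (default 120 bpm)."""
    return 60000.0 / bpm / clocks_per_quarter

header = bytes.fromhex("4D546864 00000006 0001 0001 01F4")
fmt, ntracks, clocks = parse_header(header)
delta, _ = decode_vlq(bytes.fromhex("83 74"))
print(fmt, ntracks, clocks, delta * ms_per_clock(clocks))
```

With this header the sketch reports format 1, one track and 500 clocks per quarter note. The delta time 0x83 0x74 comes out as 500 ms.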
Users creating files MAY specify a number of clocks per beat other than 500, and devices interpreting these files MUST correctly interpret the clocks-per-beat value in the MIDI file header.

A device interpreting a MIDI file to generate telephone tones MUST interpret the range of the pitch wheel to be four half steps in total (two half steps up and two half steps down).

A device interpreting a MIDI file MUST interpret General MIDI sound number 125 ("Telephone Ringing Effect") as a pure sinusoidal wave.

A device interpreting a MIDI file MUST be able to interpret a format 1 MIDI file. It MAY interpret other MIDI formats. Users generating tone files MUST use MIDI format 1.

A device interpreting a MIDI file MUST be able to handle one track of data. It MAY accept more than one track. Users generating tone files MUST use a single track of data.

Users generating tones SHOULD use a velocity of 0x40 (medium velocity) unless they require one tone to be louder than another. TODO: They should use velocity 0xXX to represent XX dB less, 0xXX to represent XX dB less, and 0xXX to represent XX dB less.

Users generating tones should end notes with a release velocity of 0x7F (silence as quickly as possible) unless there is a reason to fade the note.

Users generating tones MUST encode them so that they can be looped, if the tone is designed to be repeated infinitely. Users SHOULD make the sample as small as possible to minimize the size of the file. For example, an alternating on-off pattern should specify only one on period and one off period. Users SHOULD make the sample 5 seconds long if the file contains a continuous tone.

9. Constructing Telephone Tones

As an example, this section constructs a file that represents the US busy signal. The US busy signal consists of 480Hz and 620Hz tones played simultaneously in a repeated pattern of 0.5 seconds of tone followed by 0.5 seconds of silence.
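The following Python sketch assembles the complete US busy signal file from the values worked out by hand in the rest of this section. Those values are notes 0x46 and 0x4B, pitch bends 0x7C4F and 0x783D, and a delta time of 500 clocks (0.5 seconds at 500 clocks per quarter note and 120 bpm). It is an illustration rather than a reference encoder, and the helper names are hypothetical. In a Standard MIDI File every track event, including the End of Track meta event, is preceded by a delta time. The sketch therefore writes End of Track as 00 FF 2F 00, and the track length counts that leading byte.

```python
import struct

def vlq(value: int) -> bytes:
    """Encode a non-negative integer as a MIDI Variable Length Quantity."""
    out = [value & 0x7F]                   # final byte: high bit clear
    value >>= 7
    while value:
        out.append((value & 0x7F) | 0x80)  # earlier bytes: high bit set
        value >>= 7
    return bytes(reversed(out))

def chunk(kind: bytes, body: bytes) -> bytes:
    """A MIDI file chunk: 4-byte type, 4-byte big-endian length, then the body."""
    return kind + struct.pack(">I", len(body)) + body

def event(delta: int, *data: int) -> bytes:
    """One track event: delta time (in clocks) followed by the event bytes."""
    return vlq(delta) + bytes(data)

def us_busy_tone() -> bytes:
    half_second = 500  # clocks; 1 clock = 1 ms with 500 clocks per quarter note
    track = b"".join([
        event(0, 0xC0, 0x7D),                  # channel 0 patch 0x7D
        event(0, 0xC1, 0x7D),                  # channel 1 patch 0x7D
        event(0, 0xE0, 0x7C, 0x4F),            # bend channel 0: note 70 up to 480 Hz
        event(0, 0xE1, 0x78, 0x3D),            # bend channel 1: note 75 down to 620 Hz
        event(0, 0x90, 0x46, 0x40),            # note on, channel 0, note 70
        event(0, 0x91, 0x4B, 0x40),            # note on, channel 1, note 75
        event(half_second, 0x80, 0x46, 0x7F),  # note off after 0.5 s
        event(0, 0x81, 0x4B, 0x7F),
        event(half_second, 0x80, 0x46, 0x7F),  # pad 0.5 s of silence for looping
        event(0, 0x81, 0x4B, 0x7F),
        event(0, 0xFF, 0x2F, 0x00),            # End of Track meta event
    ])
    # Format 1, one track, 500 clocks per quarter note.
    header = chunk(b"MThd", struct.pack(">HHH", 1, 1, 500))
    return header + chunk(b"MTrk", track)

data = us_busy_tone()
print(len(data), data.hex(" ").upper())
```

Writing the bytes to a file, for example with `open("us-busy.mid", "wb").write(us_busy_tone())`, produces a file that ordinary MIDI tools can inspect. A General MIDI player will render patch 0x7D with its own sound rather than a sine wave.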
Neither 480Hz nor 620Hz corresponds to a standard musical pitch, so both must be pitch bent from a nearby musical note to reach the right frequency. To produce 480Hz, we bend up from note 70 (0x46), an A-sharp at approximately 466.164Hz, toward note 71, a B at approximately 493.883Hz.

A MIDI pitch bend is represented by a 14 bit integer with values ranging from 0 to 16383. The midpoint of this range, 8192, means that the note receives no pitch bend. We assume the suggested standard, in which a value of 0 represents a bend of two half steps down and a value of 16383 represents a bend of two half steps up. We then calculate the pitch bend value by linear interpolation. The two notes are one half step apart, a difference of 27.719Hz. One half step is 4096 "units" of pitch bend, so each unit corresponds to about 0.00677Hz. Reaching 480Hz requires bending up 13.836Hz from note 70 at 466.164Hz, or 2044 units. Since 8192 represents the center of the pitch bend range (half of a 14 bit value), we add 2044 to it to bend up. The pitch bend value is therefore 10236, which is 10011111111100 in binary. We split this into two 7 bit groups and pad each with a leading 0 bit to form two bytes:

   10011111111100 => 1001111 1111100 => 01001111 01111100

The MIDI pitch bend message sends the low-order byte first, so we reverse the order of the two bytes:

   01111100 01001111 => 0x7C4F

We can calculate the pitch bend for 620Hz in the same way. Note 74 is 587.330Hz and note 75 is 622.254Hz, so we bend down 2.254Hz from note 75. The difference between the notes is 34.924Hz, so each unit of pitch bend corresponds to about 0.00853Hz. We therefore need to bend down 264 units from note 75 (0x4B). With 8192 as the center, we need a pitch bend value of 7928. Converting this to the 16 bit form as above gives 0x783D as the required pitch bend.
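The arithmetic above can be checked with the following Python sketch. It follows the text's method: linear interpolation in Hz across the adjacent half step, with the result truncated toward zero, which reproduces the 2044 and 264 unit figures. It then packs the 14 bit value low-order 7 bits first. A half step is a fixed frequency ratio rather than a fixed number of hertz, so the sketch also computes a ratio-based bend for comparison. That second method is added here for illustration and is not specified by the text. For these tones the two methods differ by a few tens of units at most. The function names are hypothetical.

```python
import math

CENTER = 8192               # pitch bend value meaning "no bend"
UNITS_PER_HALF_STEP = 4096  # assumes the General MIDI range of +/-2 half steps

def note_frequency(note: int) -> float:
    """Equal-tempered frequency of a MIDI note (note 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)

def linear_bend(target_hz: float, note: int) -> int:
    """Bend value taking `note` to target_hz by linear interpolation in Hz
    across the adjacent half step, truncated toward zero as in the text."""
    base = note_frequency(note)
    if target_hz >= base:
        step_hz = note_frequency(note + 1) - base   # bending up
    else:
        step_hz = base - note_frequency(note - 1)   # bending down
    return CENTER + int((target_hz - base) / step_hz * UNITS_PER_HALF_STEP)

def ratio_bend(target_hz: float, note: int) -> int:
    """Alternative (not from the text): bend based on the frequency ratio."""
    half_steps = 12.0 * math.log2(target_hz / note_frequency(note))
    return CENTER + int(half_steps * UNITS_PER_HALF_STEP)

def bend_bytes(value: int) -> str:
    """Two data bytes of the pitch bend message, low-order 7 bits first."""
    return f"0x{value & 0x7F:02X}{(value >> 7) & 0x7F:02X}"

for hz, note in ((480, 70), (620, 75), (350, 65), (425, 68)):
    value = linear_bend(hz, note)
    print(f"{hz} Hz from note {note}: {value} -> {bend_bytes(value)}"
          f" (ratio-based: {ratio_bend(hz, note)})")
```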
We now know the notes and pitch bends we need, but we still need to calculate the timing. As set up above, the file uses 500 clocks to represent a quarter note and assumes 4/4 time at 120 bpm. A quarter note is therefore one beat, and there are 120 beats per minute. So there are 120 quarter notes in a minute, each 500 clocks long, which gives 60000 clocks per minute, or 1000 clocks per second. Each clock is therefore 1ms.

A busy signal alternates 0.5 seconds on and 0.5 seconds off, so we need 500 clocks between turning the notes on and turning them off again. We need to represent 500 as a Variable Length Quantity. Utilities for this conversion can be found online, but we will convert it by hand. 500 is 111110100 in binary. Breaking this into 7 bit groups and padding with leading zeros gives 0000011 1110100. We need two bytes to represent this value. The first byte begins with a 1 bit, indicating that a second byte follows. The second begins with a 0 bit, indicating that no more bytes follow. This gives the following two bytes:

   10000011 01110100 => 0x83 0x74

We use this as the delta time before we silence the notes.

With these values calculated, we can look at the complete MIDI file for the US busy signal. The file begins with a MIDI header:

   4D 54 68 64 00 00 00 06 00 01 00 01 01 F4

The header can be broken down as follows:

   +------------------+------------------------------------------------+
   | Bytes (hex)      | Meaning                                        |
   +------------------+------------------------------------------------+
   | 4D 54 68 64      | The text MThd is the magic word indicating     |
   |                  | this is a MIDI header.                         |
   | 00 00 00 06      | The length of the header data that follows is  |
   |                  | 6 bytes.                                       |
   | 00 01            | MIDI format. Format 1 is one or more           |
   |                  | simultaneous tracks and is the most common     |
   |                  | type of MIDI file.                             |
   | 00 01            | Number of tracks. We use only 1.               |
   | 01 F4            | Number of clocks for a quarter note. We use    |
   |                  | 500 (0x01F4) as discussed above.               |
   +------------------+------------------------------------------------+

The file then contains the MIDI data needed to represent the events:

   +------------------+------------------------------------------------+
   | Bytes (hex)      | Meaning                                        |
   +------------------+------------------------------------------------+
   | 4D 54 72 6B      | The text MTrk is the magic word indicating     |
   |                  | this is a MIDI track.                          |
   | 00 00 00 2C      | The length of the track data. 0x2C is 44       |
   |                  | bytes.                                         |
   | 00 C0 7D         | The first byte is a time offset from the last  |
   |                  | event, in this case, the start of the file or  |
   |                  | time zero. This event occurs immediately. C0   |
   |                  | is the code for the MIDI event changing the    |
   |                  | sound for MIDI channel 0. We change to sound   |
   |                  | 0x7D, or 125, which is the Telephone Ringing   |
   |                  | effect.                                        |
   | 00 C1 7D         | Change channel 1 (C1) to the same sound.       |
   | 00 E0 7C 4F      | Again, the first byte is a time offset of      |
   |                  | zero. Event E0 is a pitch bend event on        |
   |                  | channel 0. We bend by 0x7C4F, as calculated    |
   |                  | above.                                         |
   | 00 E1 78 3D      | Set the bend for channel 1 to be 0x783D, as    |
   |                  | calculated above.                              |
   | 00 90 46 40      | Time delta of zero, event 90 is a start note   |
   |                  | for channel 0, 0x46 is note 70, as calculated  |
   |                  | above. The final byte, 0x40, indicates how     |
   |                  | hard we strike the note. 0x40 represents a     |
   |                  | middle velocity or volume note.                |
   | 00 91 4B 40      | Start the second note on channel 1. We now     |
   |                  | have both tones sounding.                      |
   | 83 74 80 46 7F   | Wait 0x83 0x74 time units (0.5 seconds), then  |
   |                  | signal event 80. 80 turns off a note on        |
   |                  | channel 0. The note to be turned off is note   |
   |                  | 0x46, which we turned on with event 90 above.  |
   |                  | The final byte, 7F, is the release velocity;   |
   |                  | 7F means turn the note off immediately, with   |
   |                  | no fade.                                       |
   | 00 81 4B 7F      | After an additional 0 seconds (in other words, |
   |                  | at the same time as the event above), silence  |
   |                  | the other note. Event 81 turns off a note on   |
   |                  | channel 1; the note to turn off is the 4B we   |
   |                  | created with the 91 event, and we turn it off  |
   |                  | as quickly as possible.                        |
   | 83 74 80 46 7F   | Wait another 0x83 0x74 time units (0.5         |
   |                  | seconds), then signal event 80 again. This is  |
   |                  | designed to insert an additional half second   |
   |                  | of silence at the end of the file.             |
   | 00 81 4B 7F      | Turn off the note on channel 1 again.          |
   | 00 FF 2F 00      | End of Track meta event, at delta time zero.   |
   +------------------+------------------------------------------------+

10. Security Considerations

This section still needs work. A UA needs to be very careful about whom it accepts tones from. The user attributes particular meaning to the tones, and a wrong tone may not correspond to what is really happening.

11. Open Issues

Someone should get IANA to register audio/midi.

Get someone with a frequency counter or oscilloscope to check the math and make sure the calculated frequencies are correct. Sounding right does not mean they are 100% correct.

There is no sine wave, so we use 125 for Telephone Ringing. Are there problems with this?

Timing note style (more common) vs. absolute (more "techie" and less musician). Should we use the style more common in MIDI or the one more familiar to telephony/network people?

Measure and determine velocity values that produce sounds a given number of dB louder or softer than others. This is likely to be inexact, since different synthesizers may play the files differently. Endpoints should be calibrated to make them correct.

Look in the spec and ensure the mechanism presented for rests at the very end of the MIDI sequence is handled correctly.
   This is needed to make the sounds loopable, but we must be certain it is standards compliant.

   Verify that there is no standard way of representing looping, and discuss what we want to do to allow for it in the absence of a standard representation.

   Explain how the modulated form used in some countries works, and determine whether there is a standard way to reproduce it in MIDI.

   Write the security section.

12. Acknowledgements

   This document reuses work from earlier drafts by Adam Roach [4] and Rohan Mahy [3].

13. Appendix A: Country Reference

   This appendix lists tones that might be used in various countries as an aid to developers. It is not known to be correct and is not normative in any way. Frequencies are in Hz; + means play both tones simultaneously, * means modulate one tone with the other, and / means play the first tone and then the second. Duration is the on time followed by the off time, in seconds.
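   As a further aid to developers, the Python sketch below shows one way an entry of the form "f1+f2" with an on/off duration might be turned into a file laid out like the worked example above: a header, program changes, pitch bends, note-ons, note-offs after the on time, a second set of note-offs after the off time, and End of Track. It is illustrative only and not normative. The helper names (vlq, note_and_bend, two_tone_file) are hypothetical. It assumes equal temperament with A4 = 440 Hz as MIDI note 69, a receiver pitch bend range of plus or minus two semitones, and 500 clocks per quarter note at the default 120 bpm, so that one clock is 1 ms. Modulated (*) and sequential (/) entries are not handled.

```python
import math

TICKS_PER_QUARTER = 500  # 500 clocks per quarter note; at the default 120 bpm, 1 clock = 1 ms
BEND_RANGE_CENTS = 200   # assumption: the receiver's pitch bend range is +/-2 semitones
TELEPHONE_RING = 0x7C    # General MIDI program 125; program change data bytes count from zero


def vlq(value: int) -> bytes:
    """Encode a non-negative integer as a MIDI Variable Length Quantity."""
    groups = [value & 0x7F]                   # final byte: high bit clear
    value >>= 7
    while value:
        groups.append(0x80 | (value & 0x7F))  # earlier bytes: high bit set
        value >>= 7
    return bytes(reversed(groups))            # most significant group first


def note_and_bend(freq_hz: float) -> tuple[int, int]:
    """Return the nearest MIDI note and a 14-bit bend value (8192 = no bend)."""
    exact = 69 + 12 * math.log2(freq_hz / 440.0)  # assumes A4 = 440 Hz = note 69
    note = round(exact)
    cents = (exact - note) * 100                  # between -50 and +50 cents
    bend = 8192 + round(cents / BEND_RANGE_CENTS * 8192)
    return note, max(0, min(16383, bend))


def two_tone_file(freqs, on_ms: int, off_ms: int, velocity: int = 0x40) -> bytes:
    """Build a one-track Standard MIDI File for tones played together.

    Each frequency gets its own channel (so at most 16). Because one clock is
    1 ms, on_ms and off_ms are used directly as delta times in clocks.
    """
    tones = [note_and_bend(f) for f in freqs]
    track = bytearray()
    for ch in range(len(tones)):                  # select the sound at time zero
        track += bytes([0x00, 0xC0 | ch, TELEPHONE_RING])
    for ch, (_, bend) in enumerate(tones):        # pitch bend: low 7 bits, then high 7 bits
        track += bytes([0x00, 0xE0 | ch, bend & 0x7F, bend >> 7])
    for ch, (note, _) in enumerate(tones):        # start all tones together
        track += bytes([0x00, 0x90 | ch, note, velocity])
    # Stop the tones after on_ms, then repeat the note-offs after off_ms to pad
    # the trailing silence, as in the worked example.
    for wait in (on_ms, off_ms):
        for ch, (note, _) in enumerate(tones):
            delta = vlq(wait) if ch == 0 else vlq(0)
            track += delta + bytes([0x80 | ch, note, 0x7F])
    track += vlq(0) + b"\xFF\x2F\x00"             # End of Track, with its own delta time
    header = (b"MThd" + (6).to_bytes(4, "big")     # header data length: 6
              + (1).to_bytes(2, "big")             # format 1
              + (1).to_bytes(2, "big")             # one track
              + TICKS_PER_QUARTER.to_bytes(2, "big"))
    return header + b"MTrk" + len(track).to_bytes(4, "big") + bytes(track)


data = two_tone_file([480, 620], on_ms=500, off_ms=500)
print(data.hex(" ").upper())
```

   With the US busy-signal frequencies (480+620 Hz, 0.5 s on, 0.5 s off), the header bytes match those shown above. Because the sketch rounds to the nearest note, it plays 480 Hz as note 71 bent down rather than note 70 bent up, and its bend values come from the formula rather than from the calculation earlier in this document, so the track bytes differ from the example.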
Normative References

   [1]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

   [2]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A., Peterson, J., Sparks, R., Handley, M. and E. Schooler, "SIP: Session Initiation Protocol", RFC 3261, June 2002.

Informative References

   [3]  Mahy, R., "Conveying Tones in the Session Initiation Protocol (SIP)", draft-mahy-sipping-tones-00 (work in progress), June 2003.
   [4]  Roach, A., "Ringback tones in SIP-Based Telephony", draft-roach-voip-ringtone-00 (work in progress), November 2000.

   [5]  Freed, N. and N. Borenstein, "Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies", RFC 2045, November 1996.

Authors' Addresses

   David Bryan
   College of William and Mary
   Department of Computer Science
   P.O. Box 8795
   Williamsburg, VA 23187
   USA

   Phone:
   EMail: bryan@cs.wm.edu


   Cullen Jennings
   Cisco Systems
   170 West Tasman Drive
   MS: SJC-21/3
   San Jose, CA 95134
   USA

   Phone: +1 408 902 3341
   EMail: fluffy@cisco.com

Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any intellectual property or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; neither does it represent that it has made any effort to identify any such rights. Information on the IETF's procedures with respect to rights in standards-track and standards-related documentation can be found in BCP-11. Copies of claims of rights made available for publication and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementors or users of this specification can be obtained from the IETF Secretariat.

   The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights which may cover technology that may be required to practice this standard. Please address the information to the IETF Executive Director.

Full Copyright Statement

   Copyright (C) The Internet Society (2003). All Rights Reserved.
   This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

   The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assignees.

   This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgement

   Funding for the RFC Editor function is currently provided by the Internet Society.