FIO34-C. Distinguish between characters read from a file and EOF or WEOF - SEI CERT C Coding Standard (2024)

The EOF macro represents a negative value that indicates the file is exhausted and no data remains to be read. EOF is an example of an in-band error indicator. In-band error indicators are problematic to work with, and the creation of new in-band error indicators is discouraged by ERR02-C. Avoid in-band error indicators.

The byte I/O functions fgetc(), getc(), and getchar() all read a character from a stream and return it as an int. (See STR00-C. Represent characters using an appropriate type.) If the stream is at the end of the file, the end-of-file indicator for the stream is set and the function returns EOF. If a read error occurs, the error indicator for the stream is set and the function returns EOF. If these functions succeed, they return the character read as an unsigned char converted to an int.
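The three outcomes described above can be told apart by consulting the stream indicators whenever EOF comes back. The following is a minimal sketch; the names read_status and read_byte are hypothetical and do not come from the standard library or from this rule.

```c
#include <stdio.h>

/* Hypothetical status codes for a single read attempt. */
enum read_status { READ_OK, READ_END, READ_ERROR };

/*
 * Hypothetical helper: reads one byte from stream into *out.
 * The result of fgetc() is kept in an int so that EOF is not
 * truncated before it is examined.
 */
enum read_status read_byte(FILE *stream, unsigned char *out) {
  int c = fgetc(stream);

  if (c == EOF) {
    if (ferror(stream)) {
      return READ_ERROR;   /* error indicator set */
    }
    if (feof(stream)) {
      return READ_END;     /* end-of-file indicator set */
    }
    /* Neither indicator is set: the value was a real character
       (possible only where int is no wider than char). */
  }

  *out = (unsigned char)c;
  return READ_OK;
}
```

A caller would typically loop while read_byte() returns READ_OK and then branch on READ_END versus READ_ERROR.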

Because EOF is negative, it should not match any unsigned character value. However, this is only true for implementations where the int type is wider than char. On an implementation where int and char have the same width, a character-reading function can read and return a valid character that has the same bit-pattern as EOF. This could occur, for example, if an attacker inserted a value that looked like EOF into the file or data stream to alter the behavior of the program.

The C Standard requires only that the int type be able to represent a maximum value of +32767 and that a char type be no larger than an int. Although uncommon, this situation can result in the integer constant expression EOF being indistinguishable from a valid character; that is, (int)(unsigned char)65535 == -1. Consequently, failing to use feof() and ferror() to detect end-of-file and file errors can result in incorrectly identifying the EOF character on rare implementations where sizeof(int) == sizeof(char).
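Whether a given implementation is susceptible can be estimated from the limits in <limits.h>. The sketch below assumes that the property of interest is "every unsigned char value fits in a nonnegative int"; the function names eof_is_unambiguous and report_char_limits are hypothetical.

```c
#include <limits.h>
#include <stdio.h>

/*
 * Hypothetical check: returns 1 when every unsigned char value is
 * representable as a nonnegative int, so that no character returned
 * by fgetc() can compare equal to the negative EOF.
 */
int eof_is_unambiguous(void) {
  return UCHAR_MAX <= INT_MAX;
}

/* Hypothetical diagnostic: prints the limits the check depends on. */
void report_char_limits(void) {
  printf("CHAR_BIT=%d UCHAR_MAX=%u INT_MAX=%d EOF=%d\n",
         CHAR_BIT, (unsigned)UCHAR_MAX, INT_MAX, EOF);
}
```

On common desktop and server platforms, where char has 8 bits and int has 32, the check is expected to return 1.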

This problem is much more common when reading wide characters. The fgetwc(), getwc(), and getwchar() functions return a value of type wint_t. This value can represent the next wide character read, or it can represent WEOF, which indicates end-of-file for wide character streams. On most implementations, the wchar_t type has the same width as wint_t, and these functions can return a character indistinguishable from WEOF.
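A rough way to see which situation applies on a given platform is to compare the sizes of the two types. This is only a sketch: it compares storage sizes rather than value ranges, which is an approximation, and the function name wint_wider_than_wchar is hypothetical.

```c
#include <stdio.h>
#include <wchar.h>

/*
 * Hypothetical check: returns 1 when wint_t occupies more storage
 * than wchar_t, which suggests WEOF can lie outside the range of
 * wide-character values. Returns 0 when the two are the same size,
 * in which case WEOF may collide with a wchar_t value.
 */
int wint_wider_than_wchar(void) {
  return sizeof(wint_t) > sizeof(wchar_t);
}

/* Hypothetical diagnostic: prints both sizes in bytes. */
void report_wide_sizes(void) {
  printf("sizeof(wchar_t)=%zu sizeof(wint_t)=%zu\n",
         sizeof(wchar_t), sizeof(wint_t));
}
```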

In the UTF-16 character set, 0xFFFF is guaranteed not to be a character, which allows WEOF to be represented as the value -1. Similarly, all UTF-32 characters are positive when viewed as a signed 32-bit integer. All widely used character sets are designed with at least one value that does not represent a character. Consequently, it would require a custom character set designed without consideration of the C programming language for this problem to occur with wide characters or with ordinary characters that are as wide as int.

The C Standard feof() and ferror() functions are not subject to the problems associated with character and integer sizes and should be used to verify end-of-file and file errors for susceptible implementations [Kettlewell 2002]. Calling both functions on each iteration of a loop adds significant overhead, so a good strategy is to temporarily trust EOF and WEOF within the loop but verify them with feof() and ferror() following the loop.
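The "trust inside the loop, verify after the loop" strategy can be sketched as follows. The helper name count_bytes and its return convention are assumptions made for illustration only.

```c
#include <stdio.h>

/*
 * Hypothetical helper: counts the bytes remaining in stream.
 * Returns 0 on a genuine end-of-file and -1 on a read error.
 * EOF is trusted inside the inner loop and verified with
 * ferror() and feof() only after the loop exits.
 */
int count_bytes(FILE *stream, unsigned long *count) {
  int c;

  *count = 0;
  for (;;) {
    while ((c = fgetc(stream)) != EOF) {
      ++*count;
    }
    if (ferror(stream)) {
      return -1;          /* read error */
    }
    if (feof(stream)) {
      return 0;           /* real end-of-file */
    }
    /* Neither indicator is set: EOF was a data value.
       Count it and resume reading. */
    ++*count;
  }
}
```

The indicator checks run once per apparent EOF rather than once per character, so the common path pays only for the comparison against EOF.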

Noncompliant Code Example

This noncompliant code example loops while the character c is not EOF:
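A minimal sketch of the loop being described, assuming it has the same shape as the portable compliant solution that follows but without the feof() and ferror() checks:

```c
#include <stdio.h>

void func(void) {
  int c;

  do {
    c = getchar();
  } while (c != EOF);   /* trusts EOF without verifying it */
}
```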

Although EOF is guaranteed to be negative and distinct from the value of any unsigned character, it is not guaranteed to be different from any such value when converted to an int. Consequently, when int has the same width as char, this loop may terminate prematurely.

Compliant Solution (Portable)

This compliant solution uses feof() and ferror() to test whether the EOF was an actual character or a real EOF because of end-of-file or errors:

    #include <stdio.h>

    void func(void) {
      int c;

      do {
        c = getchar();
      } while (c != EOF || (!feof(stdin) && !ferror(stdin)));
    }

Noncompliant Code Example (Nonportable)

This noncompliant code example uses an assertion to ensure that the code is executed only on architectures where int is wider than char and EOF is guaranteed not to be a valid character value. However, this code example is noncompliant because the variable c is declared as a char rather than an int, making it possible for a valid character value to compare equal to the value of the EOF macro when char is signed because of sign extension:

    #include <assert.h>
    #include <limits.h>
    #include <stdio.h>

    void func(void) {
      char c;
      static_assert(UCHAR_MAX < UINT_MAX, "FIO34-C violation");

      do {
        c = getchar();
      } while (c != EOF);
    }

Assuming that a char is a signed 8-bit type and an int is a 32-bit type, if getchar() returns the character value '\xff' (decimal 255), it will be interpreted as EOF because this value is sign-extended to 0xFFFFFFFF (the value of EOF) to perform the comparison. (See STR34-C. Cast characters to unsigned char before converting to larger integer sizes.)

Compliant Solution (Nonportable)

This compliant solution declares c to be an int. Consequently, the loop will terminate only when the file is exhausted.

    #include <assert.h>
    #include <stdio.h>
    #include <limits.h>

    void func(void) {
      int c;
      static_assert(UCHAR_MAX < UINT_MAX, "FIO34-C violation");

      do {
        c = getchar();
      } while (c != EOF);
    }

Noncompliant Code Example (Wide Characters)

In this noncompliant example, the result of the call to the C standard library function getwc() is stored into a variable of type wchar_t and is subsequently compared with WEOF:

    #include <stddef.h>
    #include <stdio.h>
    #include <wchar.h>

    enum { BUFFER_SIZE = 32 };

    void g(void) {
      wchar_t buf[BUFFER_SIZE];
      wchar_t wc;
      size_t i = 0;

      while ((wc = getwc(stdin)) != L'\n' && wc != WEOF) {
        if (i < (BUFFER_SIZE - 1)) {
          buf[i++] = wc;
        }
      }
      buf[i] = L'\0';
    }

This code suffers from two problems. First, the value returned by getwc() is immediately converted to wchar_t before being compared with WEOF. Second, there is no check to ensure that wint_t is wider than wchar_t. Both of these problems make it possible for an attacker to terminate the loop prematurely by supplying the wide-character value matching WEOF in the file.

Compliant Solution (Portable)

This compliant solution declares wc to be a wint_t to match the integer type returned by getwc(). Furthermore, it does not rely on WEOF to determine end-of-file definitively.

    #include <stddef.h>
    #include <stdio.h>
    #include <wchar.h>

    enum { BUFFER_SIZE = 32 };

    void g(void) {
      wchar_t buf[BUFFER_SIZE];
      wint_t wc;
      size_t i = 0;

      while ((wc = getwc(stdin)) != L'\n' && wc != WEOF) {
        if (i < BUFFER_SIZE - 1) {
          buf[i++] = wc;
        }
      }

      /* The loop ends on a newline, on end-of-file, or on a read error */
      if (wc == L'\n' || feof(stdin) || ferror(stdin)) {
        buf[i] = L'\0';
      } else {
        /* Received a wide character that resembles WEOF; handle error */
      }
    }

Exceptions

FIO34-C-EX1: A number of C functions do not return characters but can return EOF as a status code. These functions include fclose(), fflush(), fputs(), fscanf(), puts(), scanf(), sscanf(), vfscanf(), and vscanf(). These return values can be compared to EOF without validating the result.
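As an illustration of this exception, the sketch below compares the status results of fputs() and fflush() directly with EOF. The helper name write_line and its return convention are hypothetical.

```c
#include <stdio.h>

/*
 * Hypothetical helper: writes text followed by a newline and flushes
 * the stream. Returns 0 on success and -1 on failure. fputs() and
 * fflush() return EOF only as a status code, never as character
 * data, so the comparisons need no feof() or ferror() follow-up.
 */
int write_line(FILE *stream, const char *text) {
  if (fputs(text, stream) == EOF) {
    return -1;
  }
  if (fputs("\n", stream) == EOF) {
    return -1;
  }
  if (fflush(stream) == EOF) {
    return -1;
  }
  return 0;
}
```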

Risk Assessment

Incorrectly assuming characters from a file cannot match EOF or WEOF has resulted in significant vulnerabilities, including command injection attacks. (See the CA-1996-22 advisory.)

| Rule | Severity | Likelihood | Remediation Cost | Priority | Level |
|---|---|---|---|---|---|
| FIO34-C | High | Probable | Medium | P12 | L1 |

Automated Detection

| Tool | Version | Checker | Description |
|---|---|---|---|
| Axivion Bauhaus Suite | 7.2.0 | CertC-FIO34 | |
| CodeSonar | 8.1p0 | LANG.CAST.COERCE | Coercion alters value |
| Compass/ROSE | | | |
| Coverity | 2017.07 | CHAR_IO | Identifies defects when the return value of fgetc(), getc(), or getchar() is incorrectly assigned to a char instead of an int. Coverity Prevent cannot discover all violations of this rule, so further verification is necessary |
| ECLAIR | 1.2 | CC2.FIO34 | Partially implemented |
| Helix QAC | 2024.1 | C2676, C2678; C++2676, C++2678, C++3001, C++3010, C++3051, C++3137, C++3717 | |
| Klocwork | 2024.1 | CWARN.CMPCHR.EOF | |
| LDRA tool suite | 9.7.1 | 662 S | Fully implemented |
| Parasoft C/C++test | 2023.1 | CERT_C-FIO34-a | The macro EOF should be compared with the unmodified return value from the Standard Library function |
| Polyspace Bug Finder | R2024a | CERT C: Rule FIO34-C | Checks for character values absorbed into EOF (rule partially covered) |
| Splint | 3.1.1 | | |

Related Vulnerabilities

Search for vulnerabilities resulting from the violation of this rule on the CERT website.

| Taxonomy | Taxonomy item | Relationship |
|---|---|---|
| CERT C Secure Coding Standard | STR00-C. Represent characters using an appropriate type | Prior to 2018-01-12: CERT: Unspecified Relationship |
| CERT C Secure Coding Standard | INT31-C. Ensure that integer conversions do not result in lost or misinterpreted data | Prior to 2018-01-12: CERT: Unspecified Relationship |
| CERT Oracle Secure Coding Standard for Java | FIO08-J. Use an int to capture the return value of methods that read a character or byte | Prior to 2018-01-12: CERT: Unspecified Relationship |
| ISO/IEC TS 17961:2013 | Using character values that are indistinguishable from EOF [chreof] | Prior to 2018-01-12: CERT: Unspecified Relationship |
| CWE 2.11 | CWE-197 | 2017-06-14: CERT: Rule subset of CWE |

CERT-CWE Mapping Notes

CWE-197 and FIO34-C

Independent(FLP34-C, INT31-C)

FIO34-C = Subset(INT31-C)

Therefore: FIO34-C = Subset(CWE-197)

Bibliography

[Kettlewell 2002] Section 1.2, "<stdio.h> and Character Types"
[NIST 2006] SAMATE Reference Dataset Test Case ID 000-000-088
[Summit 2005] Question 12.2
